* backing chain & block status & filters @ 2020-04-28 8:55 Vladimir Sementsov-Ogievskiy 2020-04-28 11:08 ` Max Reitz 0 siblings, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-28 8:55 UTC (permalink / raw) To: qemu block; +Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel, Max Reitz Hi! I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and is_allocated_above", and returned to all the inconsistencies about block-status. I keep in mind Max's series about child-access functions, and Andrey's work about using COR filter in block-stream, which depends on Max's series (because, without them COR fitler with file child breaks backing chains).. And, it seems that it's better to discuss some questions before resending. First, problems about block-status: 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer * BDRV_BLOCK_ZERO: offset reads as zero * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this * layer rather than any backing, set by block layer This actually means, that we should always have BDRV_BLOCK_ALLOCATED for formats which doesn't support backing. So, all such format drivers must return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for example, iscsi - doesn't. 2. ZERO. The meaning differs a bit for generic block_status and for drivers.. I think, we at least should document it like this: BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at this layer and read as ZERO. If generic block_status returns ZERO, it only mean that it reads as zero, but the region may be allocated on underlying level. 3. bdi.unallocated_blocks_are_zero I think it's very bad, that we have formats, that supports backing, but doesn't report bdi.unallocated_blocks_are_zero as true. It means that UNALLOCATED region reads as zero if we have short backing file, and not as zero if we remove this short backing file. I can live with it but this is weird logic. These bad drivers are qcow (not qcow2), parallels and vmdk. I hope, they actually just forget to set unallocated_blocks_are_zero to true. Next. But what about drivers which doesn't support backing? As we considered above, they should always return ZERO or DATA, as everything is allocated in this backing-chain level (last level, of course).. So again unallocated_blocks_are_zero is meaningless. So, I think, that driver which doesn't support backings, should be fixed to return always ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all. 3. Short backing files in allocated_above: we must consider space after EOF as ALLOCATED, if short backing file is inside requested backing-chain part, as it produced exactly because of this short file (and we never go to backing). (current realization of allocated_above is buggy, see my outdated series "[PATCH 0/4] fix & merge block_status_above and is_allocated_above") 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have a backing chain of non-backing child.. I just remember that we didn't reach the consensus. 5. Filters.. OK we have two functions for them: bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I think both are wrong: bdrv_co_block_status_from_file leads to problem [4], when we can report UNALLOCATED, which refers not to the current backing chain, but to sub backing chain of file child, which is inconsistent with block_status_above and is_allocated_above iteration. bdrv_co_block_status_from_backing is also is not consistent with block_status_above iteration.. At least at leads to querying the same node twice. ================= So, about filters and backing chains. Keeping (OK, just, trying to keep) all these things in mind, I think that it's better to keep backing chains exactly *backing* chains, so that "backing" child is the only "not own" child of the node. So, its close to current behavior and we don't need child-access functions. Then how filters should work: Filter with backing child, should always return UNALLOCATED (i.e. no DATA, no ZERO), it is honest: everything is on the other level of backing chain. Filter with file child should always return BDRV_BLOCK_DATA | BDRV_BLOCK_RECURSE, to show that: 1. everything is allocated in *this* level of backing chain 2. filter is too lazy to dig in it's file child (and, maybe the whole sub-tree of it) and asks generic layer to do it by itself, if it wants zeroes. Then, of course, if we want some filter to be inside backing chain, it should have not "file" child but "backing". For this, we may support in current public filter both variants: backing or file, as user prefer. I.e., filter is opened either with file option or with backing and operate correspondingly. And newer filters (like backup-top) may support only backing variants. ================= So, I propose to complicate a bit code of old file-based filters, to support backing child (which may be done on demand, when needed. For example, Virtuozzo now need only COR filter (and only to satisfy iotests, actually)), and keep generic code about block-status and handling backing-chains simpler (and simpler to document). Another benefit is that block-graph is already visible through QAPI, and it would be good to keep backing link meaning, not adding the thing that for filters "file" link behaves as "backing". So again, I'm for limiting backing-behavior to bs->backing child only (as it currently behaves). What do you think? -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 8:55 backing chain & block status & filters Vladimir Sementsov-Ogievskiy @ 2020-04-28 11:08 ` Max Reitz 2020-04-28 11:28 ` Kevin Wolf 2020-04-28 14:51 ` Vladimir Sementsov-Ogievskiy 0 siblings, 2 replies; 16+ messages in thread From: Max Reitz @ 2020-04-28 11:08 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, qemu block Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel [-- Attachment #1.1: Type: text/plain, Size: 6699 bytes --] On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: > Hi! > > I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and > is_allocated_above", and returned to all the inconsistencies about > block-status. I keep in mind Max's series about child-access functions, > and Andrey's work about using COR filter in block-stream, which depends > on Max's series (because, without them COR fitler with file child breaks > backing chains).. And, it seems that it's better to discuss some > questions before resending. > > First, problems about block-status: > > 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: > > * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer > * BDRV_BLOCK_ZERO: offset reads as zero > * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing > raw data > * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this > * layer rather than any backing, set by block > layer > > This actually means, that we should always have BDRV_BLOCK_ALLOCATED for > formats which doesn't support backing. So, all such format drivers must > return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for > example, iscsi - doesn't. Hm. I could imagine that there are formats that have non-zero holes (e.g. 0xff or just garbage). It would be a bit wrong for them to return ZERO or DATA then. But OTOH we don’t care about such cases, do we? We need to know whether ranges are zero, data, or unallocated. If they aren’t zero, we only care about whether reading from it will return data from this layer or not. So I suppose that anything that doesn’t support backing files (or filtered children) should always return ZERO and/or DATA. > 2. ZERO. The meaning differs a bit for generic block_status and for > drivers.. I think, we at least should document it like this: > > BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer > BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at > this layer and read as ZERO. If generic block_status returns ZERO, it > only mean that it reads as zero, but the region may be allocated on > underlying level. Hm. What does that mean? One of the problems is that “allocated” has two meanings: (1) reading data returns data defined at this backing layer, (2) actually allocated, i.e. takes up space on the file represented by this BDS. As far as I understand, we actually don’t care about (2) in the context of block_status, but just about (1). So if a layer returns ZERO, it is by definition (1)-allocated. (It isn’t necessarily (2)-allocated.) > 3. bdi.unallocated_blocks_are_zero > > I think it's very bad, that we have formats, that supports backing, but > doesn't report bdi.unallocated_blocks_are_zero as true. It means that > UNALLOCATED region reads as zero if we have short backing file, and not > as zero if we remove this short backing file. What do you mean by “remove this short backing file”? Because generally one can’t just drop a backing file. So maybe a case like block-stream? Wouldn’t that be a bug in block-stream them, i.e. shouldn’t it stream zeros after the end of the backing file? > I can live with it but > this is weird logic. These bad drivers are qcow (not qcow2), parallels > and vmdk. I hope, they actually just forget to set > unallocated_blocks_are_zero to true. qcow definitely sounds like it. > Next. But what about drivers which doesn't support backing? As we > considered above, they should always return ZERO or DATA, as everything > is allocated in this backing-chain level (last level, of course).. So > again unallocated_blocks_are_zero is meaningless. So, I think, that > driver which doesn't support backings, should be fixed to return always > ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all. Agreed. > 3. The second 3.? :) > Short backing files in allocated_above: we must consider space after > EOF as ALLOCATED, if short backing file is inside requested > backing-chain part, as it produced exactly because of this short file > (and we never go to backing). Sounds correct. > (current realization of allocated_above is > buggy, see my outdated series "[PATCH 0/4] fix & merge > block_status_above and is_allocated_above") > > 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have > a backing chain of non-backing child.. I just remember that we didn't > reach the consensus. Possible? :) > 5. Filters.. OK we have two functions for them: > bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I > think both are wrong: > > bdrv_co_block_status_from_file leads to problem [4], when we can report > UNALLOCATED, which refers not to the current backing chain, but to sub > backing chain of file child, which is inconsistent with > block_status_above and is_allocated_above iteration. > > bdrv_co_block_status_from_backing is also is not consistent with > block_status_above iteration.. At least at leads to querying the same > node twice. Well, yes. > So, about filters and backing chains. Keeping (OK, just, trying to keep) > all these things in mind, I think that it's better to keep backing > chains exactly *backing* chains, so that "backing" child is the only > "not own" child of the node. So, its close to current behavior and we > don't need child-access functions. Then how filters should work: > > Filter with backing child, should always return UNALLOCATED (i.e. no > DATA, no ZERO), it is honest: everything is on the other level of > backing chain. I disagree, because filters with or without backing children should work exactly the same way and just not appear in the backing chain. > Filter with file child should always return BDRV_BLOCK_DATA | > BDRV_BLOCK_RECURSE, to show that: > 1. everything is allocated in *this* level of backing chain > 2. filter is too lazy to dig in it's file child (and, maybe the whole > sub-tree of it) and asks generic layer to do it by itself, if it wants > zeroes. > > Then, of course, if we want some filter to be inside backing chain, it > should have not "file" child but "backing". For this, we may support in > current public filter both variants: backing or file, as user prefer. > I.e., filter is opened either with file option or with backing and > operate correspondingly. And newer filters (like backup-top) may support > only backing variants. I disagree again for the same reason. Max [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 11:08 ` Max Reitz @ 2020-04-28 11:28 ` Kevin Wolf 2020-04-28 15:13 ` Vladimir Sementsov-Ogievskiy 2020-04-28 14:51 ` Vladimir Sementsov-Ogievskiy 1 sibling, 1 reply; 16+ messages in thread From: Kevin Wolf @ 2020-04-28 11:28 UTC (permalink / raw) To: Max Reitz Cc: Andrey Shinkevich, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu block [-- Attachment #1: Type: text/plain, Size: 2434 bytes --] Am 28.04.2020 um 13:08 hat Max Reitz geschrieben: > On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: > > Hi! > > > > I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and > > is_allocated_above", and returned to all the inconsistencies about > > block-status. I keep in mind Max's series about child-access functions, > > and Andrey's work about using COR filter in block-stream, which depends > > on Max's series (because, without them COR fitler with file child breaks > > backing chains).. And, it seems that it's better to discuss some > > questions before resending. > > > > First, problems about block-status: > > > > 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: > > > > * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer > > * BDRV_BLOCK_ZERO: offset reads as zero > > * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing > > raw data > > * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this > > * layer rather than any backing, set by block > > layer > > > > This actually means, that we should always have BDRV_BLOCK_ALLOCATED for > > formats which doesn't support backing. So, all such format drivers must > > return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for > > example, iscsi - doesn't. > > Hm. I could imagine that there are formats that have non-zero holes > (e.g. 0xff or just garbage). It would be a bit wrong for them to return > ZERO or DATA then. > > But OTOH we don’t care about such cases, do we? We need to know whether > ranges are zero, data, or unallocated. If they aren’t zero, we only > care about whether reading from it will return data from this layer or not. > > So I suppose that anything that doesn’t support backing files (or > filtered children) should always return ZERO and/or DATA. I'm not sure I agree with the notion that everything should be BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today at least. If we want to change this, we will have to check all callers of bdrv_is_allocated() and friends who might use this to find holes in the file. Basically, the way bdrv_is_allocated() works today is that we assume an implicit zeroed backing layer even for block drivers that don't support backing files. Kevin [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 11:28 ` Kevin Wolf @ 2020-04-28 15:13 ` Vladimir Sementsov-Ogievskiy 2020-04-28 16:18 ` Eric Blake 0 siblings, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-28 15:13 UTC (permalink / raw) To: Kevin Wolf, Max Reitz; +Cc: Andrey Shinkevich, qemu-devel, qemu block 28.04.2020 14:28, Kevin Wolf wrote: > Am 28.04.2020 um 13:08 hat Max Reitz geschrieben: >> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: >>> Hi! >>> >>> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and >>> is_allocated_above", and returned to all the inconsistencies about >>> block-status. I keep in mind Max's series about child-access functions, >>> and Andrey's work about using COR filter in block-stream, which depends >>> on Max's series (because, without them COR fitler with file child breaks >>> backing chains).. And, it seems that it's better to discuss some >>> questions before resending. >>> >>> First, problems about block-status: >>> >>> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: >>> >>> * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >>> * BDRV_BLOCK_ZERO: offset reads as zero >>> * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing >>> raw data >>> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >>> * layer rather than any backing, set by block >>> layer >>> >>> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for >>> formats which doesn't support backing. So, all such format drivers must >>> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for >>> example, iscsi - doesn't. >> >> Hm. I could imagine that there are formats that have non-zero holes >> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >> ZERO or DATA then. >> >> But OTOH we don’t care about such cases, do we? We need to know whether >> ranges are zero, data, or unallocated. If they aren’t zero, we only >> care about whether reading from it will return data from this layer or not. >> >> So I suppose that anything that doesn’t support backing files (or >> filtered children) should always return ZERO and/or DATA. > > I'm not sure I agree with the notion that everything should be > BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today > at least. If we want to change this, we will have to check all callers > of bdrv_is_allocated() and friends who might use this to find holes in > the file. Yes. Because they are doing incorrect (or at least undocumented and unreliable) thing. > > Basically, the way bdrv_is_allocated() works today is that we assume an > implicit zeroed backing layer even for block drivers that don't support > backing files. But read doesn't work so: it will read data from the bottom layer, not from this implicit zeroed backing layer. And it is inconsistent. On read data comes exactly from this layer, not from its implicit backing. So it should return BDRV_BLOCK_ALLOCATED, accordingly to its definition.. Or, we should at least document current behavior: BDRV_BLOCK_ALLOCATED: the content of the block is determined by this layer rather than any backing, set by block. Attention: it may not be set for drivers without backing support, still data is of course read from this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean allocation on fs level, which occupies real space on disk.. So, for such drivers ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or (most probably) not, don't look at ALLOCATED flag, as it is added by generic layer for another logic, not related to fs-allocation. 0 means that, most probably, data doesn't occupy space on fs, zero-status is unknown (most probably non-zero) -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 15:13 ` Vladimir Sementsov-Ogievskiy @ 2020-04-28 16:18 ` Eric Blake 2020-04-28 16:46 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 16+ messages in thread From: Eric Blake @ 2020-04-28 16:18 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, Kevin Wolf, Max Reitz Cc: Andrey Shinkevich, qemu-devel, qemu block On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote: >>> Hm. I could imagine that there are formats that have non-zero holes >>> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >>> ZERO or DATA then. >>> >>> But OTOH we don’t care about such cases, do we? We need to know whether >>> ranges are zero, data, or unallocated. If they aren’t zero, we only >>> care about whether reading from it will return data from this layer >>> or not. >>> >>> So I suppose that anything that doesn’t support backing files (or >>> filtered children) should always return ZERO and/or DATA. >> >> I'm not sure I agree with the notion that everything should be >> BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today >> at least. If we want to change this, we will have to check all callers >> of bdrv_is_allocated() and friends who might use this to find holes in >> the file. > > Yes. Because they are doing incorrect (or at least undocumented and > unreliable) thing. Here's some previous mails discussing the same question about what block_status should actually mean. At the time, I was so scared of the prospect of something breaking if I changed things that I ended up keeping status quo, so here we are revisiting the topic several years later, still asking the same questions. https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html > >> >> Basically, the way bdrv_is_allocated() works today is that we assume an >> implicit zeroed backing layer even for block drivers that don't support >> backing files. > > But read doesn't work so: it will read data from the bottom layer, not from > this implicit zeroed backing layer. And it is inconsistent. On read data > comes exactly from this layer, not from its implicit backing. So it should > return BDRV_BLOCK_ALLOCATED, accordingly to its definition.. > > Or, we should at least document current behavior: > > BDRV_BLOCK_ALLOCATED: the content of the block is determined by this > layer rather than any backing, set by block. Attention: it may not be > set > for drivers without backing support, still data is of course read from > this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean > allocation on fs level, which occupies real space on disk.. So, for > such drivers > > ZERO | ALLOCATED means that, read as zero, data may be allocated on > fs, or > (most probably) not, > don't look at ALLOCATED flag, as it is added by generic layer for > another logic, > not related to fs-allocation. > > 0 means that, most probably, data doesn't occupy space on fs, > zero-status is > unknown (most probably non-zero) > That may be right in describing the current situation, but again, needs a GOOD audit of what we are actually using it for, and whether it is what we really WANT to be using it for. If we're going to audit/refactor the code, we might as well get semantics that are actually useful, rather than painfully contorted to documentation that happens to match our current contorted code. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 16:18 ` Eric Blake @ 2020-04-28 16:46 ` Vladimir Sementsov-Ogievskiy 2020-04-28 18:37 ` Kevin Wolf 2020-04-28 19:44 ` Vladimir Sementsov-Ogievskiy 0 siblings, 2 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-28 16:46 UTC (permalink / raw) To: Eric Blake, Kevin Wolf, Max Reitz Cc: Andrey Shinkevich, qemu-devel, qemu block 28.04.2020 19:18, Eric Blake wrote: > On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote: > >>>> Hm. I could imagine that there are formats that have non-zero holes >>>> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >>>> ZERO or DATA then. >>>> >>>> But OTOH we don’t care about such cases, do we? We need to know whether >>>> ranges are zero, data, or unallocated. If they aren’t zero, we only >>>> care about whether reading from it will return data from this layer or not. >>>> >>>> So I suppose that anything that doesn’t support backing files (or >>>> filtered children) should always return ZERO and/or DATA. >>> >>> I'm not sure I agree with the notion that everything should be >>> BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today >>> at least. If we want to change this, we will have to check all callers >>> of bdrv_is_allocated() and friends who might use this to find holes in >>> the file. >> >> Yes. Because they are doing incorrect (or at least undocumented and unreliable) thing. > > Here's some previous mails discussing the same question about what block_status should actually mean. At the time, I was so scared of the prospect of something breaking if I changed things that I ended up keeping status quo, so here we are revisiting the topic several years later, still asking the same questions. > > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html > >> >>> >>> Basically, the way bdrv_is_allocated() works today is that we assume an >>> implicit zeroed backing layer even for block drivers that don't support >>> backing files. >> >> But read doesn't work so: it will read data from the bottom layer, not from >> this implicit zeroed backing layer. And it is inconsistent. On read data >> comes exactly from this layer, not from its implicit backing. So it should >> return BDRV_BLOCK_ALLOCATED, accordingly to its definition.. >> >> Or, we should at least document current behavior: >> >> BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >> layer rather than any backing, set by block. Attention: it may not be set >> for drivers without backing support, still data is of course read from >> this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean >> allocation on fs level, which occupies real space on disk.. So, for such drivers >> >> ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or >> (most probably) not, >> don't look at ALLOCATED flag, as it is added by generic layer for another logic, >> not related to fs-allocation. >> >> 0 means that, most probably, data doesn't occupy space on fs, zero-status is >> unknown (most probably non-zero) >> > > That may be right in describing the current situation, but again, needs a GOOD audit of what we are actually using it for, and whether it is what we really WANT to be using it for. If we're going to audit/refactor the code, we might as well get semantics that are actually useful, rather than painfully contorted to documentation that happens to match our current contorted code. > Honest enough:) I'll try to make a table. I don't think that reporting fs-allocation status is a bad thing. But I'm sure that it should be separated from backing-chain-allocated concept. -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 16:46 ` Vladimir Sementsov-Ogievskiy @ 2020-04-28 18:37 ` Kevin Wolf 2020-04-28 19:44 ` Vladimir Sementsov-Ogievskiy 1 sibling, 0 replies; 16+ messages in thread From: Kevin Wolf @ 2020-04-28 18:37 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy Cc: Andrey Shinkevich, qemu-devel, qemu block, Max Reitz Am 28.04.2020 um 18:46 hat Vladimir Sementsov-Ogievskiy geschrieben: > 28.04.2020 19:18, Eric Blake wrote: > > On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote: > > > > > > > Hm. I could imagine that there are formats that have non-zero holes > > > > > (e.g. 0xff or just garbage). It would be a bit wrong for them to return > > > > > ZERO or DATA then. > > > > > > > > > > But OTOH we don’t care about such cases, do we? We need to know whether > > > > > ranges are zero, data, or unallocated. If they aren’t zero, we only > > > > > care about whether reading from it will return data from this layer or not. > > > > > > > > > > So I suppose that anything that doesn’t support backing files (or > > > > > filtered children) should always return ZERO and/or DATA. > > > > > > > > I'm not sure I agree with the notion that everything should be > > > > BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today > > > > at least. If we want to change this, we will have to check all callers > > > > of bdrv_is_allocated() and friends who might use this to find holes in > > > > the file. > > > > > > Yes. Because they are doing incorrect (or at least undocumented and unreliable) thing. > > > > Here's some previous mails discussing the same question about what block_status should actually mean. At the time, I was so scared of the prospect of something breaking if I changed things that I ended up keeping status quo, so here we are revisiting the topic several years later, still asking the same questions. > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html > > https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html > > > > > > > > > > > > > Basically, the way bdrv_is_allocated() works today is that we assume an > > > > implicit zeroed backing layer even for block drivers that don't support > > > > backing files. > > > > > > But read doesn't work so: it will read data from the bottom layer, not from > > > this implicit zeroed backing layer. And it is inconsistent. On read data > > > comes exactly from this layer, not from its implicit backing. So it should > > > return BDRV_BLOCK_ALLOCATED, accordingly to its definition.. > > > > > > Or, we should at least document current behavior: > > > > > > BDRV_BLOCK_ALLOCATED: the content of the block is determined by this > > > layer rather than any backing, set by block. Attention: it may not be set > > > for drivers without backing support, still data is of course read from > > > this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean > > > allocation on fs level, which occupies real space on disk.. So, for such drivers > > > > > > ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or > > > (most probably) not, > > > don't look at ALLOCATED flag, as it is added by generic layer for another logic, > > > not related to fs-allocation. > > > > > > 0 means that, most probably, data doesn't occupy space on fs, zero-status is > > > unknown (most probably non-zero) > > > > > > > That may be right in describing the current situation, but again, > > needs a GOOD audit of what we are actually using it for, and whether > > it is what we really WANT to be using it for. If we're going to > > audit/refactor the code, we might as well get semantics that are > > actually useful, rather than painfully contorted to documentation > > that happens to match our current contorted code. > > > > Honest enough:) I'll try to make a table. > > I don't think that reporting fs-allocation status is a bad thing. But > I'm sure that it should be separated from backing-chain-allocated > concept. I think we could easily agree on what would be a good concept. My concern is just that existing code probably uses existing semantics and not what we consider more logical now. So if we change it, we must make sure that we change all places that expect the old semantics. Kevin ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 16:46 ` Vladimir Sementsov-Ogievskiy 2020-04-28 18:37 ` Kevin Wolf @ 2020-04-28 19:44 ` Vladimir Sementsov-Ogievskiy 2020-04-29 9:15 ` Vladimir Sementsov-Ogievskiy 1 sibling, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-28 19:44 UTC (permalink / raw) To: Eric Blake, Kevin Wolf, Max Reitz Cc: Andrey Shinkevich, qemu-devel, qemu block [-- Attachment #1: Type: text/plain, Size: 3661 bytes --] 28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote: > 28.04.2020 19:18, Eric Blake wrote: >> On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote: >> >>>>> Hm. I could imagine that there are formats that have non-zero holes >>>>> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >>>>> ZERO or DATA then. >>>>> >>>>> But OTOH we don’t care about such cases, do we? We need to know whether >>>>> ranges are zero, data, or unallocated. If they aren’t zero, we only >>>>> care about whether reading from it will return data from this layer or not. >>>>> >>>>> So I suppose that anything that doesn’t support backing files (or >>>>> filtered children) should always return ZERO and/or DATA. >>>> >>>> I'm not sure I agree with the notion that everything should be >>>> BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today >>>> at least. If we want to change this, we will have to check all callers >>>> of bdrv_is_allocated() and friends who might use this to find holes in >>>> the file. >>> >>> Yes. Because they are doing incorrect (or at least undocumented and unreliable) thing. >> >> Here's some previous mails discussing the same question about what block_status should actually mean. At the time, I was so scared of the prospect of something breaking if I changed things that I ended up keeping status quo, so here we are revisiting the topic several years later, still asking the same questions. >> >> https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html >> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html >> >>> >>>> >>>> Basically, the way bdrv_is_allocated() works today is that we assume an >>>> implicit zeroed backing layer even for block drivers that don't support >>>> backing files. >>> >>> But read doesn't work so: it will read data from the bottom layer, not from >>> this implicit zeroed backing layer. And it is inconsistent. On read data >>> comes exactly from this layer, not from its implicit backing. So it should >>> return BDRV_BLOCK_ALLOCATED, accordingly to its definition.. >>> >>> Or, we should at least document current behavior: >>> >>> BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >>> layer rather than any backing, set by block. Attention: it may not be set >>> for drivers without backing support, still data is of course read from >>> this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean >>> allocation on fs level, which occupies real space on disk.. So, for such drivers >>> >>> ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or >>> (most probably) not, >>> don't look at ALLOCATED flag, as it is added by generic layer for another logic, >>> not related to fs-allocation. >>> >>> 0 means that, most probably, data doesn't occupy space on fs, zero-status is >>> unknown (most probably non-zero) >>> >> >> That may be right in describing the current situation, but again, needs a GOOD audit of what we are actually using it for, and whether it is what we really WANT to be using it for. If we're going to audit/refactor the code, we might as well get semantics that are actually useful, rather than painfully contorted to documentation that happens to match our current contorted code. >> > > Honest enough:) I'll try to make a table. > > I don't think that reporting fs-allocation status is a bad thing. But I'm sure that it should be separated from backing-chain-allocated concept. > As a first step, I've don brief analysis of .bdrv_co_block_status of drivers (attached) -- Best regards, Vladimir [-- Attachment #2: block-status-report --] [-- Type: text/plain, Size: 7213 bytes --] = Filter functions = mirror,commit,backup filters: bdrv_co_block_status_from_backing blklogwrites,compress,COR,throttle: bdrv_co_block_status_from_file return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID - semantics of BDRV_BLOCK_RAW is unclean, behavior is broken blkdebug: blkdebug_co_block_status - actually, uses bdrv_co_block_status_from_file, after additional blkdebug-related things not influincing the result = raw = raw: raw_co_block_status return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID - semantics of BDRV_BLOCK_RAW is unclean, behavior is broken Summary: we need to fix BDRV_BLOCK_RAW-recursion semantics to not interfere with block_status_above/is_allocated_above loops. = Format drivers with supports_backing = true = qed: bdrv_qed_co_block_status bdi->unallocated_blocks_are_zero = true; 0 - go to backing ZERO - metadata-zero DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data parallels: parallels_co_block_status unallocated_blocks_are_zero is unset, but they are actually read as zero if no backing 0 - go to backing DATA | OFFSET_VALID with @map set and @file = file-child : metadat-allocated-data qcow2: qcow2_co_block_status bdi->unallocated_blocks_are_zero = true; ZERO | OFFSET_VALID with @map set and @file = something : metadata-allocated-zero DATA | OFFSET_VALID with @map set and @file = something : metadata-allocated-data RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data (hint, that you'd better dig zeroes in *file) ZERO : metadata-zero DATA : compressed clusters. So, it's metadata-allocated-data, but no offset we can provide 0: go to backing qcow: qcow_co_block_status unallocated_blocks_are_zero is unset, but they are actually read as zero if no backing 0: go to backing DATA: compressed DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data vmdk: vmdk_co_block_status unallocated_blocks_are_zero is unset, but they are actually read as zero if no backing 0: go to backing ZERO: metadata-zero DATA with @file set to something: compressed DATA | OFFSET_VALID with @map and @file set : metadata-allocated-data RECURSE | DATA | OFFSET_VALID with @map and @file set : metadata-allocated-data (with hint) Summary: So, obviously unallocated_blocks_are_zero is not needed, as all they behave so common semantics: just corresponds to specification: DATA | ZERO means backing-chain-allocated-at-this-level ZERO means reads as zero OFFSET_VALID means that it may be read from @file (which should be child of this) at @map offset additional RECURSE is hint to dig zeroes in @file of @map offset (if it is not set, digging is most probably useless) of course, nothing here about fs-allocation status or occupying space on disk = Format drivers with supports_backing = false = vdi: vdi_co_block_status bdi->unallocated_blocks_are_zero = true; 0: metadata-unallocated.. in this case vid_co_preadv zeroize qiov. So, it actually should be ZERO DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data (hint, that you'd better dig zeroes in *file) vpc: vpc_co_block_status bdi->unallocated_blocks_are_zero = true; 0: unallocated? in this case vpc_co_preadv zeroize qiov. So, it actually should be ZERO DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data RECURSE | DATA | OFFSET_VALID with @map set and @file = file-child : metadata-allocated-data (hint, that you'd better dig zeroes in *file) Summary: Again, unallocated_blocks_are_zero is always true, so it's not actually needed Moreover, I think drivers should return ZERO by themselves So common semantics should be the same as for backing-supporting formats, except for they must always set at least one of DATA and ZERO flags of course, nothing here about fs-allocation status or occupying space on disk == Protocol drivers, with no children == iscsi, iser: iscsi_co_block_status bdi->unallocated_blocks_are_zero = iscsilun->lbprz; but block-status never return 0, if iscsilun->lbprz always: OFFSET_VALID with @map set and @file = this also: DATA : normal-data 0 - fs-unallocated, not zero ZERO - fs-unallocated, zero nbd: nbd_client_co_block_status special case: ZERO: for offset > nbd EOF (which may be not sector-aligned) I think, no reason to not report OFFSET_VALID combination, as byte-accurate EOF is actually internal nbd driver information otherwise: always: OFFSET_VALID with @map=@offset and @file = this also: DATA: if feature unsupported, so unknown any combination of DATA and ZERO, mapping NBD protocol, !DATA means: most probably doesn't occupy space on fs ZERO means: reads as zero null-co, null-aio: null_co_block_status Hmm, is it a protocol driver? :) always: OFFSET_VALID with @map set and @file = this also: ZERO - if s->read_zeroes (read will zeroize qiov) 0 - otherwise (read does nothing) gluster: qemu_gluster_co_block_status always: OFFSET_VALID with @map=@offset and @file=this also: DATA: if not want_zero, or unknown, or known-data - normal-data ZERO: fs-unallocated-zero file-posix: raw_co_block_status bdi->unallocated_blocks_are_zero = s->discard_zeroes; but it never report unallocated :) always: OFFSET_VALID with @map=@offset and @file=this also: DATA: if not want_zero, or unknown, or known-data - normal-data ZERO: fs-unallocated-zero sheepdog: sd_co_block_status Hmm. sheepdog is a dead project, yes? Should we deprecate it? 0: fs-unallocated DATA | OFFSET_VALID with @map=@offset and @file=this vvfat: vvfat_co_block_status always: DATA Summary: First, unallocated_blocks_are_zero is obviously unused. Second, I think all should always return at least OFFSET_VALID with @map=@offset and @file=this, as it's exactly where from we read in terms of block-layer. And good to be consistent about it. Next, OK, seems, most of drivers wants to report fs-allocation/disk-occupying status, 0 - most probably doesn't occupy space, not-zero ZERO - most probably doesn't occupy space, zero DATA - most probably occupy space, but also is a safe default DATA | ZERO - most probably occupy space, but for some reason known zeroes. Only nbd can theoretically do it. = Testing = test: bdrv_test_co_block_status always 0. test_sync_op_block_status() tests different scenarios of bdrv_is_allocated. If we decide to change the behavior, not a problem to change test results and bdrv_test_co_block_status() = Drivers without block_status = A lot of them. Not a problem. Worth mentioning vhdx: vhdx bdi->unallocated_blocks_are_zero = (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) == 0; - it is not used, as DATA|ALLOCATED is set for drivers without block_status ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 19:44 ` Vladimir Sementsov-Ogievskiy @ 2020-04-29 9:15 ` Vladimir Sementsov-Ogievskiy 2020-04-29 10:50 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-29 9:15 UTC (permalink / raw) To: Eric Blake, Kevin Wolf, Max Reitz Cc: Andrey Shinkevich, qemu-devel, qemu block [-- Attachment #1: Type: text/plain, Size: 3851 bytes --] 28.04.2020 22:44, Vladimir Sementsov-Ogievskiy wrote: > 28.04.2020 19:46, Vladimir Sementsov-Ogievskiy wrote: >> 28.04.2020 19:18, Eric Blake wrote: >>> On 4/28/20 10:13 AM, Vladimir Sementsov-Ogievskiy wrote: >>> >>>>>> Hm. I could imagine that there are formats that have non-zero holes >>>>>> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >>>>>> ZERO or DATA then. >>>>>> >>>>>> But OTOH we don’t care about such cases, do we? We need to know whether >>>>>> ranges are zero, data, or unallocated. If they aren’t zero, we only >>>>>> care about whether reading from it will return data from this layer or not. >>>>>> >>>>>> So I suppose that anything that doesn’t support backing files (or >>>>>> filtered children) should always return ZERO and/or DATA. >>>>> >>>>> I'm not sure I agree with the notion that everything should be >>>>> BDRV_BLOCK_ALLOCATED at the lowest layer. It's not what it means today >>>>> at least. If we want to change this, we will have to check all callers >>>>> of bdrv_is_allocated() and friends who might use this to find holes in >>>>> the file. >>>> >>>> Yes. Because they are doing incorrect (or at least undocumented and unreliable) thing. >>> >>> Here's some previous mails discussing the same question about what block_status should actually mean. At the time, I was so scared of the prospect of something breaking if I changed things that I ended up keeping status quo, so here we are revisiting the topic several years later, still asking the same questions. >>> >>> https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00069.html >>> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg03757.html >>> >>>> >>>>> >>>>> Basically, the way bdrv_is_allocated() works today is that we assume an >>>>> implicit zeroed backing layer even for block drivers that don't support >>>>> backing files. >>>> >>>> But read doesn't work so: it will read data from the bottom layer, not from >>>> this implicit zeroed backing layer. And it is inconsistent. On read data >>>> comes exactly from this layer, not from its implicit backing. So it should >>>> return BDRV_BLOCK_ALLOCATED, accordingly to its definition.. >>>> >>>> Or, we should at least document current behavior: >>>> >>>> BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >>>> layer rather than any backing, set by block. Attention: it may not be set >>>> for drivers without backing support, still data is of course read from >>>> this layer. Note, that for such drivers BDRV_BLOCK_ALLOCATED may mean >>>> allocation on fs level, which occupies real space on disk.. So, for such drivers >>>> >>>> ZERO | ALLOCATED means that, read as zero, data may be allocated on fs, or >>>> (most probably) not, >>>> don't look at ALLOCATED flag, as it is added by generic layer for another logic, >>>> not related to fs-allocation. >>>> >>>> 0 means that, most probably, data doesn't occupy space on fs, zero-status is >>>> unknown (most probably non-zero) >>>> >>> >>> That may be right in describing the current situation, but again, needs a GOOD audit of what we are actually using it for, and whether it is what we really WANT to be using it for. If we're going to audit/refactor the code, we might as well get semantics that are actually useful, rather than painfully contorted to documentation that happens to match our current contorted code. >>> >> >> Honest enough:) I'll try to make a table. >> >> I don't think that reporting fs-allocation status is a bad thing. But I'm sure that it should be separated from backing-chain-allocated concept. >> > > As a first step, I've don brief analysis of .bdrv_co_block_status of drivers (attached) > As a second step, here is brief analysis of all block_status usage -- Best regards, Vladimir [-- Attachment #2: block-status-usage-report --] [-- Type: text/plain, Size: 4098 bytes --] Public interface of block-status is: bdrv_block_status bdrv_block_status_above bdrv_is_allocated bdrv_is_allocated_above = bdrv_block_status = bdrv_make_zero: works on current level of backing-chain, want's to skip zeroes, not interested in @map and @file img convert: convert_iteration_sectors: wants to distinguish ZERO, DATA and go-to-backing. It also tries to not write zeroes, if have short backing file, but does it a bit wrong. Treats unallocated as DATA if no backing. img-map: get_block_status: distinguish ZERO, DATA and go-to-backing. Count depth of the backing. Just reports final ZERO and DATA. So, fs-unallocated thing is reported to user = bdrv_block_status_above = block-copy: block_copy_block_status: wants two things: 1. skip go-to-backing holes in top layer for top mode 2. do write_zero for ZERO areas mirror: call on the whole backing chain - for DATA (and for DATA|ZERO which is bad) do just copy - for ZERO do just ZERO - for 0 (which means that bottom layer doesn't report that unallocated are zero) does DISCARD (which is most-probably zeroing) - absolutely wrong thing qcow2: is_zero: call on the whole backing chain, want's just to check is reads-as-zero or not. qcow2: qcow2_measure: call on the whole backing chain: - skip ZERO - count clusters with both DATA and ALLOCATED set. Hmm. ALLOCATED is always set for DATA. Seems the function actually tries to calculate disk occupation, assuming that BDRV_BLOCK_ALLOCATED helps in it, but it actually doesn't.. I think, correct solution is to support offset and bytes in bdrv_measure, and split it from block_status. Then qcow2_measure will just recursively call bdrv_measure on its children. This would be clean. nbd: nbd_co_send_sparse_read: call on the whole backing chain: - wants to distinguish zeroes nbd: blockstatus_to_extents: call on the whole backing chain: !ALLOCATED -> NBD_HOLE ZERO -> NBD_ZERO So, we report HOLE only if it's not BDRV_BLOCK_ALLOCATED on any layer.. That's wrong. I think, we should report HOLE in a lot more cases. Actually, when not occupy real space on disk. img-compare: call on the whole backing chain: - do not compare zeroes - do not compare if both report unallocated.. it's actually not correct for protocols which reports fs-unallocated-non-zeroes. As reads may differ actually. Still, read from fs-unallocated area is not guaranteed to return same thing each time, yes? At least, null-co doesn't guarantee it :).. So, it may be correct to skip these areas. Or may be better to always report them different?? - consider data-zeroes equal to unallocated.. it's definitely not correct for protocols which reports fs-unallocated-non-zeroes I think, img-compare must only consider zero/non-zero, and don't touch other block-status features. Otherwise it's a mess img-convert: convert_iteration_sectors: call on the whole backing chain: already described in bdrv_block_status section = bdrv_is_allocated = Obvious thing for backing-chain related operation (still wrong that some protocol drivers may return fs-unallocated and it is treated as go-to-backing areas): block-copy, commit, copy-on-read, stream, img-rebase Others: vvfat: o_O it has qcow child.. and operates like self is a backing of this child. But yes, it just uses bdrv_is_allocated to understand is chunk is rewritten in qcow. migration/block: skip unallocated for top mode (shared_base, as it called here) io-alloc: just report number of allocated in top layer io-map: map_is_allocated: same thing as io-alloc, but report chunks test_sync_op_block_status: just check what it returns = bdrv_is_allocated_above = Obvious usage for backing-chain related: commit, mirror, stream, img-rebase. Wrong for fs-unallocated-non-zero reporting drivers Others: qcow2: is_unallocated: call for the whole backing chain. Used to check is-zero.. Wrong for fs-unallocated-non-zero reporting drivers, and may be more efficient if consider also ZERO status.. but in some smart-fast way. replication: allocated or not in backing-chain: common case ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-29 9:15 ` Vladimir Sementsov-Ogievskiy @ 2020-04-29 10:50 ` Vladimir Sementsov-Ogievskiy 0 siblings, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-29 10:50 UTC (permalink / raw) To: Eric Blake, Kevin Wolf, Max Reitz Cc: Andrey Shinkevich, qemu-devel, qemu block Full summary What drivers returns? Filters and raw are just broken together with their BDRV_BLOCK_RAW. Format drivers behaves as follows (except for backing-not-supporting, which should be fixed): 0 - go to backing (not-backing supporting will never return it, backing-supporting will return it even if there is not backing file, but read is guaranteed to return zeroes, if there is no backing file at the moment) ZERO - zero at this level DATA - data at this level DATA | ZERO actually never returned. Protocol drivers 0 - fs-unallocated, read may return any garbage (null-co, iscsi) from SCSI: Logical Block Provisioning Read Zeros (LBPRZ) bit 1 If the logical block provisioning read zeros (LBPRZ) bit is set to one, then, for an unmapped LBA specified by a read operation, the deviceserver shall send user data with all bits set to zero to the data-in buffer. 0 If the TPRZ bit is set to zero, then, for an unmapped LBA specified by a read operation, the device server may send user data with all bitsset to any value to the data-in buffer. so, yes, this 0 actually have significant meaning. And null-co matches it too. And nbd should be fixed to match it, I think. ZERO - fs-unallocated, reads as zero DATA - fs-allocated data or safe default ZERO | DATA - only nbd may return it, and it seems wrong. let's fix it. So, seems, we may can with it as is. The only thing to be documented is meaning of zero status returned by driver: for backing-supporting it means go-to-backing, with a guarantee to read zeroes if there is no backing file, for backing-not-supporting it means most probably not occupy disk space, read returns garbage. And format drivers without backing support should never return 0. =================== How it is used? And here we definitely fail. As there are a lot of places where 0 return of drivers is misused: we consider it fs-unallocated, but it may be just go-to-backing, we consider it go-to-backing, but it may be fs-unallocated. let's go again through our public API = bdrv_block_status = img-convert and bdrv_make_zero wants only zero and go-to-backing information img-map is better to distinguish fs-unallocated and go-to-backing and report them in different way. = bdrv_block_status_above = most callers needs only zero and go-to-backing information, others are doing wrong things and should be rewritten anyway = bdrv_is_allocated = most callers need only go-to-backing information. io-alloc io-map - are reporting utilities, and they probably want to show fs-unallocated as well... but we never documented, how actually img-map, io-map and io-alloc should work and what they are trying to say :). Anyway, they may use same interface as img-map = bdrv_is_allocated_above = callers only need go-to-backing information OK, sounds good. So, for most things we only need zero/data/go-to-backing information. fs-unallocated should be treated as DATA, it's garbage, but on read we will directly read this garbage, so we should consider it DATA. And only for reporting through img-map, io-alloc and io-map we may want fs-unallocated information. Do we actually need it? Basing on this, I think that there should be four block-status types: ZERO: unrelated to backing, reads as zero from this layer DATA: unrelated to backing, reads from this layer, may be non-zero BACKING: go to backing for information, reads from backing as well RAW: I'm a filter or 'raw' driver, I don't know what to do. I give you my child and offset in it, take care of it. Be careful: it may be backing child or not, don't break your backing-chain loop on me! So we may require that at most one of DATA, ZERO, RAW is set. And if nothing is set it means BACKING. And if we want to report fs-unallocated things, we just need additional flag for it, like BDRV_FS_UNALLOCATED_GARBAGE, which may be combined with DATA type chunks. Or, may be even better, to split type from flags: block_status will return type, which is one of these four types, and other things (RECURSE, EOF, OFFSET_VALID, FS_UNALLOCATED_GARBAGE) goes to additional flags out-argument. Note also, that the only user of @map and @file parameters is img-map. And all other callers have to pass NULL, NULL. I think, this definitely should be refactored. -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 11:08 ` Max Reitz 2020-04-28 11:28 ` Kevin Wolf @ 2020-04-28 14:51 ` Vladimir Sementsov-Ogievskiy 2020-04-30 19:12 ` Vladimir Sementsov-Ogievskiy 2020-05-07 12:58 ` Max Reitz 1 sibling, 2 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-28 14:51 UTC (permalink / raw) To: Max Reitz, qemu block; +Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel 28.04.2020 14:08, Max Reitz wrote: > On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: >> Hi! >> >> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and >> is_allocated_above", and returned to all the inconsistencies about >> block-status. I keep in mind Max's series about child-access functions, >> and Andrey's work about using COR filter in block-stream, which depends >> on Max's series (because, without them COR fitler with file child breaks >> backing chains).. And, it seems that it's better to discuss some >> questions before resending. >> >> First, problems about block-status: >> >> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: >> >> * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >> * BDRV_BLOCK_ZERO: offset reads as zero >> * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing >> raw data >> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >> * layer rather than any backing, set by block >> layer >> >> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for >> formats which doesn't support backing. So, all such format drivers must >> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for >> example, iscsi - doesn't. > > Hm. I could imagine that there are formats that have non-zero holes > (e.g. 0xff or just garbage). It would be a bit wrong for them to return > ZERO or DATA then. > > But OTOH we don’t care about such cases, do we? We need to know whether > ranges are zero, data, or unallocated. If they aren’t zero, we only > care about whether reading from it will return data from this layer or not. > > So I suppose that anything that doesn’t support backing files (or > filtered children) should always return ZERO and/or DATA. > >> 2. ZERO. The meaning differs a bit for generic block_status and for >> drivers.. I think, we at least should document it like this: >> >> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at >> this layer and read as ZERO. If generic block_status returns ZERO, it >> only mean that it reads as zero, but the region may be allocated on >> underlying level. > > Hm. What does that mean? > > One of the problems is that “allocated” has two meanings: > (1) reading data returns data defined at this backing layer, > (2) actually allocated, i.e. takes up space on the file represented by > this BDS. > > As far as I understand, we actually don’t care about (2) in the context > of block_status, but just about (1). > > So if a layer returns ZERO, it is by definition (1)-allocated. (It > isn’t necessarily (2)-allocated.) > >> 3. bdi.unallocated_blocks_are_zero >> >> I think it's very bad, that we have formats, that supports backing, but >> doesn't report bdi.unallocated_blocks_are_zero as true. It means that >> UNALLOCATED region reads as zero if we have short backing file, and not >> as zero if we remove this short backing file. > > What do you mean by “remove this short backing file”? Because generally > one can’t just drop a backing file. > > So maybe a case like block-stream? Wouldn’t that be a bug in > block-stream them, i.e. shouldn’t it stream zeros after the end of the > backing file? > >> I can live with it but >> this is weird logic. These bad drivers are qcow (not qcow2), parallels >> and vmdk. I hope, they actually just forget to set >> unallocated_blocks_are_zero to true. > > qcow definitely sounds like it. > >> Next. But what about drivers which doesn't support backing? As we >> considered above, they should always return ZERO or DATA, as everything >> is allocated in this backing-chain level (last level, of course).. So >> again unallocated_blocks_are_zero is meaningless. So, I think, that >> driver which doesn't support backings, should be fixed to return always >> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all. > > Agreed. > >> 3. > > The second 3.? :) > >> Short backing files in allocated_above: we must consider space after >> EOF as ALLOCATED, if short backing file is inside requested >> backing-chain part, as it produced exactly because of this short file >> (and we never go to backing). > > Sounds correct. > >> (current realization of allocated_above is >> buggy, see my outdated series "[PATCH 0/4] fix & merge >> block_status_above and is_allocated_above") >> >> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have >> a backing chain of non-backing child.. I just remember that we didn't >> reach the consensus. > > Possible? :) > >> 5. Filters.. OK we have two functions for them: >> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I >> think both are wrong: >> >> bdrv_co_block_status_from_file leads to problem [4], when we can report >> UNALLOCATED, which refers not to the current backing chain, but to sub >> backing chain of file child, which is inconsistent with >> block_status_above and is_allocated_above iteration. >> >> bdrv_co_block_status_from_backing is also is not consistent with >> block_status_above iteration.. At least at leads to querying the same >> node twice. > > Well, yes. > >> So, about filters and backing chains. Keeping (OK, just, trying to keep) >> all these things in mind, I think that it's better to keep backing >> chains exactly *backing* chains, so that "backing" child is the only >> "not own" child of the node. So, its close to current behavior and we >> don't need child-access functions. Then how filters should work: >> >> Filter with backing child, should always return UNALLOCATED (i.e. no >> DATA, no ZERO), it is honest: everything is on the other level of >> backing chain. > > I disagree, because filters with or without backing children should work > exactly the same way and just not appear in the backing chain. They work the same way in my variant too. The only difference is that if you use file-child-based filters, you can't do stream/commit around them. I just think, that binding the concept to the "backing" link of the node is simpler and more generic. In blockdev era, when all nodes will be named and libvirt will take care of all nodes including filters, we will not need any filter-skipping magic, libvirt will directly point to exact nodes it means. And we can deprecate "file" children of existing filters, to finally reach simple architecture with simple backing chain of nodes (any nodes). And after deprecating old filename-based interfaces, we'll drop all filter-skipping magic.. > >> Filter with file child should always return BDRV_BLOCK_DATA | >> BDRV_BLOCK_RECURSE, to show that: >> 1. everything is allocated in *this* level of backing chain >> 2. filter is too lazy to dig in it's file child (and, maybe the whole >> sub-tree of it) and asks generic layer to do it by itself, if it wants >> zeroes. >> >> Then, of course, if we want some filter to be inside backing chain, it >> should have not "file" child but "backing". For this, we may support in >> current public filter both variants: backing or file, as user prefer. >> I.e., filter is opened either with file option or with backing and >> operate correspondingly. And newer filters (like backup-top) may support >> only backing variants. > > I disagree again for the same reason. > > Max > -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 14:51 ` Vladimir Sementsov-Ogievskiy @ 2020-04-30 19:12 ` Vladimir Sementsov-Ogievskiy 2020-05-01 3:04 ` Andrey Shinkevich 2020-05-07 12:58 ` Max Reitz 1 sibling, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-04-30 19:12 UTC (permalink / raw) To: Max Reitz, qemu block; +Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel 28.04.2020 17:51, Vladimir Sementsov-Ogievskiy wrote: > 28.04.2020 14:08, Max Reitz wrote: >> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: >>> Hi! >>> >>> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and >>> is_allocated_above", and returned to all the inconsistencies about >>> block-status. I keep in mind Max's series about child-access functions, >>> and Andrey's work about using COR filter in block-stream, which depends >>> on Max's series (because, without them COR fitler with file child breaks >>> backing chains).. And, it seems that it's better to discuss some >>> questions before resending. >>> >>> First, problems about block-status: >>> >>> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: >>> >>> * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >>> * BDRV_BLOCK_ZERO: offset reads as zero >>> * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing >>> raw data >>> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this >>> * layer rather than any backing, set by block >>> layer >>> >>> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for >>> formats which doesn't support backing. So, all such format drivers must >>> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for >>> example, iscsi - doesn't. >> >> Hm. I could imagine that there are formats that have non-zero holes >> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >> ZERO or DATA then. >> >> But OTOH we don’t care about such cases, do we? We need to know whether >> ranges are zero, data, or unallocated. If they aren’t zero, we only >> care about whether reading from it will return data from this layer or not. >> >> So I suppose that anything that doesn’t support backing files (or >> filtered children) should always return ZERO and/or DATA. >> >>> 2. ZERO. The meaning differs a bit for generic block_status and for >>> drivers.. I think, we at least should document it like this: >>> >>> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >>> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at >>> this layer and read as ZERO. If generic block_status returns ZERO, it >>> only mean that it reads as zero, but the region may be allocated on >>> underlying level. >> >> Hm. What does that mean? >> >> One of the problems is that “allocated” has two meanings: >> (1) reading data returns data defined at this backing layer, >> (2) actually allocated, i.e. takes up space on the file represented by >> this BDS. >> >> As far as I understand, we actually don’t care about (2) in the context >> of block_status, but just about (1). >> >> So if a layer returns ZERO, it is by definition (1)-allocated. (It >> isn’t necessarily (2)-allocated.) >> >>> 3. bdi.unallocated_blocks_are_zero >>> >>> I think it's very bad, that we have formats, that supports backing, but >>> doesn't report bdi.unallocated_blocks_are_zero as true. It means that >>> UNALLOCATED region reads as zero if we have short backing file, and not >>> as zero if we remove this short backing file. >> >> What do you mean by “remove this short backing file”? Because generally >> one can’t just drop a backing file. >> >> So maybe a case like block-stream? Wouldn’t that be a bug in >> block-stream them, i.e. shouldn’t it stream zeros after the end of the >> backing file? >> >>> I can live with it but >>> this is weird logic. These bad drivers are qcow (not qcow2), parallels >>> and vmdk. I hope, they actually just forget to set >>> unallocated_blocks_are_zero to true. >> >> qcow definitely sounds like it. >> >>> Next. But what about drivers which doesn't support backing? As we >>> considered above, they should always return ZERO or DATA, as everything >>> is allocated in this backing-chain level (last level, of course).. So >>> again unallocated_blocks_are_zero is meaningless. So, I think, that >>> driver which doesn't support backings, should be fixed to return always >>> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all. >> >> Agreed. >> >>> 3. >> >> The second 3.? :) >> >>> Short backing files in allocated_above: we must consider space after >>> EOF as ALLOCATED, if short backing file is inside requested >>> backing-chain part, as it produced exactly because of this short file >>> (and we never go to backing). >> >> Sounds correct. >> >>> (current realization of allocated_above is >>> buggy, see my outdated series "[PATCH 0/4] fix & merge >>> block_status_above and is_allocated_above") >>> >>> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have >>> a backing chain of non-backing child.. I just remember that we didn't >>> reach the consensus. >> >> Possible? :) >> >>> 5. Filters.. OK we have two functions for them: >>> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I >>> think both are wrong: >>> >>> bdrv_co_block_status_from_file leads to problem [4], when we can report >>> UNALLOCATED, which refers not to the current backing chain, but to sub >>> backing chain of file child, which is inconsistent with >>> block_status_above and is_allocated_above iteration. >>> >>> bdrv_co_block_status_from_backing is also is not consistent with >>> block_status_above iteration.. At least at leads to querying the same >>> node twice. >> >> Well, yes. >> >>> So, about filters and backing chains. Keeping (OK, just, trying to keep) >>> all these things in mind, I think that it's better to keep backing >>> chains exactly *backing* chains, so that "backing" child is the only >>> "not own" child of the node. So, its close to current behavior and we >>> don't need child-access functions. Then how filters should work: >>> >>> Filter with backing child, should always return UNALLOCATED (i.e. no >>> DATA, no ZERO), it is honest: everything is on the other level of >>> backing chain. >> >> I disagree, because filters with or without backing children should work >> exactly the same way and just not appear in the backing chain. > > They work the same way in my variant too. The only difference is that if you use file-child-based filters, you can't do stream/commit around them. I just think, that binding the concept to the "backing" link of the node is simpler and more generic. In blockdev era, when all nodes will be named and libvirt will take care of all nodes including filters, we will not need any filter-skipping magic, libvirt will directly point to exact nodes it means. And we can deprecate "file" children of existing filters, to finally reach simple architecture with simple backing chain of nodes (any nodes). And after deprecating old filename-based interfaces, we'll drop all filter-skipping magic.. What do you think? -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-30 19:12 ` Vladimir Sementsov-Ogievskiy @ 2020-05-01 3:04 ` Andrey Shinkevich 2020-05-06 5:56 ` Vladimir Sementsov-Ogievskiy 0 siblings, 1 reply; 16+ messages in thread From: Andrey Shinkevich @ 2020-05-01 3:04 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu block Cc: Kevin Wolf, qemu-devel [-- Attachment #1: Type: text/plain, Size: 1378 bytes --] Sounds good to me generally. Also, we need to identify the filter by its node name when the file names of a node and of the filter above it are the same. And what about automatically generated node name for the filter? We will want to pass it to the stream routine. Andrey ________________________________ From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Sent: Thursday, April 30, 2020 10:12 PM To: Max Reitz <mreitz@redhat.com>; qemu block <qemu-block@nongnu.org> Cc: Kevin Wolf <kwolf@redhat.com>; qemu-devel <qemu-devel@nongnu.org>; Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Subject: Re: backing chain & block status & filters > > .... The only difference is that if you use file-child-based filters, you can't do stream/commit around them. I just think, that binding the concept to the "backing" link of the node is simpler and more generic. In blockdev era, when all nodes will be named and libvirt will take care of all nodes including filters, we will not need any filter-skipping magic, libvirt will directly point to exact nodes it means. And we can deprecate "file" children of existing filters, to finally reach simple architecture with simple backing chain of nodes (any nodes). And after deprecating old filename-based interfaces, we'll drop all filter-skipping magic.. What do you think? -- Best regards, Vladimir [-- Attachment #2: Type: text/html, Size: 2506 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-05-01 3:04 ` Andrey Shinkevich @ 2020-05-06 5:56 ` Vladimir Sementsov-Ogievskiy 0 siblings, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-05-06 5:56 UTC (permalink / raw) To: Andrey Shinkevich, Max Reitz, qemu block; +Cc: Kevin Wolf, qemu-devel 01.05.2020 6:04, Andrey Shinkevich wrote: > Sounds good to me generally. > Also, we need to identify the filter by its node name when the file names of a node and of the filter above it are the same. And what about automatically generated node name for the filter? We will want to pass it to the stream routine. > As I understand, there should not be auto-generated node-names in blockdev era. All nodes are named by libvirt with full-control on them. -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-04-28 14:51 ` Vladimir Sementsov-Ogievskiy 2020-04-30 19:12 ` Vladimir Sementsov-Ogievskiy @ 2020-05-07 12:58 ` Max Reitz 2020-05-07 19:34 ` Vladimir Sementsov-Ogievskiy 1 sibling, 1 reply; 16+ messages in thread From: Max Reitz @ 2020-05-07 12:58 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, qemu block Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel [-- Attachment #1.1: Type: text/plain, Size: 8317 bytes --] On 28.04.20 16:51, Vladimir Sementsov-Ogievskiy wrote: > 28.04.2020 14:08, Max Reitz wrote: >> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: >>> Hi! >>> >>> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and >>> is_allocated_above", and returned to all the inconsistencies about >>> block-status. I keep in mind Max's series about child-access functions, >>> and Andrey's work about using COR filter in block-stream, which depends >>> on Max's series (because, without them COR fitler with file child breaks >>> backing chains).. And, it seems that it's better to discuss some >>> questions before resending. >>> >>> First, problems about block-status: >>> >>> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: >>> >>> * BDRV_BLOCK_DATA: allocation for data at offset is tied to this >>> layer >>> * BDRV_BLOCK_ZERO: offset reads as zero >>> * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing >>> raw data >>> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by >>> this >>> * layer rather than any backing, set by block >>> layer >>> >>> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for >>> formats which doesn't support backing. So, all such format drivers must >>> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for >>> example, iscsi - doesn't. >> >> Hm. I could imagine that there are formats that have non-zero holes >> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >> ZERO or DATA then. >> >> But OTOH we don’t care about such cases, do we? We need to know whether >> ranges are zero, data, or unallocated. If they aren’t zero, we only >> care about whether reading from it will return data from this layer or >> not. >> >> So I suppose that anything that doesn’t support backing files (or >> filtered children) should always return ZERO and/or DATA. >> >>> 2. ZERO. The meaning differs a bit for generic block_status and for >>> drivers.. I think, we at least should document it like this: >>> >>> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >>> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at >>> this layer and read as ZERO. If generic block_status returns ZERO, it >>> only mean that it reads as zero, but the region may be allocated on >>> underlying level. >> >> Hm. What does that mean? >> >> One of the problems is that “allocated” has two meanings: >> (1) reading data returns data defined at this backing layer, >> (2) actually allocated, i.e. takes up space on the file represented by >> this BDS. >> >> As far as I understand, we actually don’t care about (2) in the context >> of block_status, but just about (1). >> >> So if a layer returns ZERO, it is by definition (1)-allocated. (It >> isn’t necessarily (2)-allocated.) >> >>> 3. bdi.unallocated_blocks_are_zero >>> >>> I think it's very bad, that we have formats, that supports backing, but >>> doesn't report bdi.unallocated_blocks_are_zero as true. It means that >>> UNALLOCATED region reads as zero if we have short backing file, and not >>> as zero if we remove this short backing file. >> >> What do you mean by “remove this short backing file”? Because generally >> one can’t just drop a backing file. >> >> So maybe a case like block-stream? Wouldn’t that be a bug in >> block-stream them, i.e. shouldn’t it stream zeros after the end of the >> backing file? >> >>> I can live with it but >>> this is weird logic. These bad drivers are qcow (not qcow2), parallels >>> and vmdk. I hope, they actually just forget to set >>> unallocated_blocks_are_zero to true. >> >> qcow definitely sounds like it. >> >>> Next. But what about drivers which doesn't support backing? As we >>> considered above, they should always return ZERO or DATA, as everything >>> is allocated in this backing-chain level (last level, of course).. So >>> again unallocated_blocks_are_zero is meaningless. So, I think, that >>> driver which doesn't support backings, should be fixed to return always >>> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at >>> all. >> >> Agreed. >> >>> 3. >> >> The second 3.? :) >> >>> Short backing files in allocated_above: we must consider space after >>> EOF as ALLOCATED, if short backing file is inside requested >>> backing-chain part, as it produced exactly because of this short file >>> (and we never go to backing). >> >> Sounds correct. >> >>> (current realization of allocated_above is >>> buggy, see my outdated series "[PATCH 0/4] fix & merge >>> block_status_above and is_allocated_above") >>> >>> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have >>> a backing chain of non-backing child.. I just remember that we didn't >>> reach the consensus. >> >> Possible? :) >> >>> 5. Filters.. OK we have two functions for them: >>> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I >>> think both are wrong: >>> >>> bdrv_co_block_status_from_file leads to problem [4], when we can report >>> UNALLOCATED, which refers not to the current backing chain, but to sub >>> backing chain of file child, which is inconsistent with >>> block_status_above and is_allocated_above iteration. >>> >>> bdrv_co_block_status_from_backing is also is not consistent with >>> block_status_above iteration.. At least at leads to querying the same >>> node twice. >> >> Well, yes. >> >>> So, about filters and backing chains. Keeping (OK, just, trying to keep) >>> all these things in mind, I think that it's better to keep backing >>> chains exactly *backing* chains, so that "backing" child is the only >>> "not own" child of the node. So, its close to current behavior and we >>> don't need child-access functions. Then how filters should work: >>> >>> Filter with backing child, should always return UNALLOCATED (i.e. no >>> DATA, no ZERO), it is honest: everything is on the other level of >>> backing chain. >> >> I disagree, because filters with or without backing children should work >> exactly the same way and just not appear in the backing chain. > > They work the same way in my variant too. The only difference is that if > you use file-child-based filters, you can't do stream/commit around > them. Which is a bug that’s been know for a long time, and the primary cause for the “Deal with filters” series. > I just think, that binding the concept to the "backing" link of > the node is simpler and more generic. In blockdev era, when all nodes > will be named and libvirt will take care of all nodes including filters, > we will not need any filter-skipping magic, libvirt will directly point > to exact nodes it means. And we can deprecate "file" children of > existing filters, One thing to note is that all user-creatable filter drivers currently use “file”, so this would change them all. The idea of just using backing for filters just always seems to me like trying to take the easy way out. It seems to me like many things around filters are broken because we weren’t careful enough when introducing them (mostly a combination of “let’s see whether it just works” and “it seems to mostly work”), and the “Deal with filters” series attempts to rectify that. As evidenced by the reviews that required even more preliminary work (like the still on-list and under-discussion BdrvChildRole series), there is really a ton of design that should be improved. That makes me very hesitant to just switch filters over to backing, because I feel that might alleviate some of the most pressing symptoms, while not addressing the underlying issues. And without symptoms that hurt, nobody might want to fix the issues. Basically, it feels like more of the same “Let’s try whether it mostly works” and “it seems to mostly work”. (Also, my naïve feeling is that if just treating backing and file the same for filter drivers, the deal with filters series would be ten patches long and v1 would have been uncontroversial.) Max [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: backing chain & block status & filters 2020-05-07 12:58 ` Max Reitz @ 2020-05-07 19:34 ` Vladimir Sementsov-Ogievskiy 0 siblings, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2020-05-07 19:34 UTC (permalink / raw) To: Max Reitz, qemu block; +Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel 07.05.2020 15:58, Max Reitz wrote: > On 28.04.20 16:51, Vladimir Sementsov-Ogievskiy wrote: >> 28.04.2020 14:08, Max Reitz wrote: >>> On 28.04.20 10:55, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi! >>>> >>>> I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and >>>> is_allocated_above", and returned to all the inconsistencies about >>>> block-status. I keep in mind Max's series about child-access functions, >>>> and Andrey's work about using COR filter in block-stream, which depends >>>> on Max's series (because, without them COR fitler with file child breaks >>>> backing chains).. And, it seems that it's better to discuss some >>>> questions before resending. >>>> >>>> First, problems about block-status: >>>> >>>> 1. We consider ALLOCATED = ZERO | DATA, and documented as follows: >>>> >>>> * BDRV_BLOCK_DATA: allocation for data at offset is tied to this >>>> layer >>>> * BDRV_BLOCK_ZERO: offset reads as zero >>>> * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing >>>> raw data >>>> * BDRV_BLOCK_ALLOCATED: the content of the block is determined by >>>> this >>>> * layer rather than any backing, set by block >>>> layer >>>> >>>> This actually means, that we should always have BDRV_BLOCK_ALLOCATED for >>>> formats which doesn't support backing. So, all such format drivers must >>>> return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for >>>> example, iscsi - doesn't. >>> >>> Hm. I could imagine that there are formats that have non-zero holes >>> (e.g. 0xff or just garbage). It would be a bit wrong for them to return >>> ZERO or DATA then. >>> >>> But OTOH we don’t care about such cases, do we? We need to know whether >>> ranges are zero, data, or unallocated. If they aren’t zero, we only >>> care about whether reading from it will return data from this layer or >>> not. >>> >>> So I suppose that anything that doesn’t support backing files (or >>> filtered children) should always return ZERO and/or DATA. >>> >>>> 2. ZERO. The meaning differs a bit for generic block_status and for >>>> drivers.. I think, we at least should document it like this: >>>> >>>> BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer >>>> BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at >>>> this layer and read as ZERO. If generic block_status returns ZERO, it >>>> only mean that it reads as zero, but the region may be allocated on >>>> underlying level. >>> >>> Hm. What does that mean? >>> >>> One of the problems is that “allocated” has two meanings: >>> (1) reading data returns data defined at this backing layer, >>> (2) actually allocated, i.e. takes up space on the file represented by >>> this BDS. >>> >>> As far as I understand, we actually don’t care about (2) in the context >>> of block_status, but just about (1). >>> >>> So if a layer returns ZERO, it is by definition (1)-allocated. (It >>> isn’t necessarily (2)-allocated.) >>> >>>> 3. bdi.unallocated_blocks_are_zero >>>> >>>> I think it's very bad, that we have formats, that supports backing, but >>>> doesn't report bdi.unallocated_blocks_are_zero as true. It means that >>>> UNALLOCATED region reads as zero if we have short backing file, and not >>>> as zero if we remove this short backing file. >>> >>> What do you mean by “remove this short backing file”? Because generally >>> one can’t just drop a backing file. >>> >>> So maybe a case like block-stream? Wouldn’t that be a bug in >>> block-stream them, i.e. shouldn’t it stream zeros after the end of the >>> backing file? >>> >>>> I can live with it but >>>> this is weird logic. These bad drivers are qcow (not qcow2), parallels >>>> and vmdk. I hope, they actually just forget to set >>>> unallocated_blocks_are_zero to true. >>> >>> qcow definitely sounds like it. >>> >>>> Next. But what about drivers which doesn't support backing? As we >>>> considered above, they should always return ZERO or DATA, as everything >>>> is allocated in this backing-chain level (last level, of course).. So >>>> again unallocated_blocks_are_zero is meaningless. So, I think, that >>>> driver which doesn't support backings, should be fixed to return always >>>> ZERO or DATA, than we don't need this unallocated_blocks_are_zero at >>>> all. >>> >>> Agreed. >>> >>>> 3. >>> >>> The second 3.? :) >>> >>>> Short backing files in allocated_above: we must consider space after >>>> EOF as ALLOCATED, if short backing file is inside requested >>>> backing-chain part, as it produced exactly because of this short file >>>> (and we never go to backing). >>> >>> Sounds correct. >>> >>>> (current realization of allocated_above is >>>> buggy, see my outdated series "[PATCH 0/4] fix & merge >>>> block_status_above and is_allocated_above") >>>> >>>> 4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have >>>> a backing chain of non-backing child.. I just remember that we didn't >>>> reach the consensus. >>> >>> Possible? :) >>> >>>> 5. Filters.. OK we have two functions for them: >>>> bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I >>>> think both are wrong: >>>> >>>> bdrv_co_block_status_from_file leads to problem [4], when we can report >>>> UNALLOCATED, which refers not to the current backing chain, but to sub >>>> backing chain of file child, which is inconsistent with >>>> block_status_above and is_allocated_above iteration. >>>> >>>> bdrv_co_block_status_from_backing is also is not consistent with >>>> block_status_above iteration.. At least at leads to querying the same >>>> node twice. >>> >>> Well, yes. >>> >>>> So, about filters and backing chains. Keeping (OK, just, trying to keep) >>>> all these things in mind, I think that it's better to keep backing >>>> chains exactly *backing* chains, so that "backing" child is the only >>>> "not own" child of the node. So, its close to current behavior and we >>>> don't need child-access functions. Then how filters should work: >>>> >>>> Filter with backing child, should always return UNALLOCATED (i.e. no >>>> DATA, no ZERO), it is honest: everything is on the other level of >>>> backing chain. >>> >>> I disagree, because filters with or without backing children should work >>> exactly the same way and just not appear in the backing chain. >> >> They work the same way in my variant too. The only difference is that if >> you use file-child-based filters, you can't do stream/commit around >> them. > > Which is a bug that’s been know for a long time, and the primary cause > for the “Deal with filters” series. > >> I just think, that binding the concept to the "backing" link of >> the node is simpler and more generic. In blockdev era, when all nodes >> will be named and libvirt will take care of all nodes including filters, >> we will not need any filter-skipping magic, libvirt will directly point >> to exact nodes it means. And we can deprecate "file" children of >> existing filters, > > One thing to note is that all user-creatable filter drivers currently > use “file”, so this would change them all. > > The idea of just using backing for filters just always seems to me like > trying to take the easy way out. It seems to me like many things around > filters are broken because we weren’t careful enough when introducing > them (mostly a combination of “let’s see whether it just works” and “it > seems to mostly work”), and the “Deal with filters” series attempts to > rectify that. As evidenced by the reviews that required even more > preliminary work (like the still on-list and under-discussion > BdrvChildRole series), there is really a ton of design that should be > improved. > > That makes me very hesitant to just switch filters over to backing, > because I feel that might alleviate some of the most pressing symptoms, > while not addressing the underlying issues. And without symptoms that > hurt, nobody might want to fix the issues. Basically, it feels like > more of the same “Let’s try whether it mostly works” and “it seems to > mostly work”. > > (Also, my naïve feeling is that if just treating backing and file the > same for filter drivers, the deal with filters series would be ten > patches long and v1 would have been uncontroversial.) > Hmm. OK, I don't know. Probably it's not a big difference: use CAF vs move filters to backing child. And anyway we may move them later. Probably it's simpler to fix all the issues, if move the filters to backing child first, but we already have you series and rebasing it on another concept is an extra work, not worth doing. -- Best regards, Vladimir ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-05-07 19:36 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-28 8:55 backing chain & block status & filters Vladimir Sementsov-Ogievskiy 2020-04-28 11:08 ` Max Reitz 2020-04-28 11:28 ` Kevin Wolf 2020-04-28 15:13 ` Vladimir Sementsov-Ogievskiy 2020-04-28 16:18 ` Eric Blake 2020-04-28 16:46 ` Vladimir Sementsov-Ogievskiy 2020-04-28 18:37 ` Kevin Wolf 2020-04-28 19:44 ` Vladimir Sementsov-Ogievskiy 2020-04-29 9:15 ` Vladimir Sementsov-Ogievskiy 2020-04-29 10:50 ` Vladimir Sementsov-Ogievskiy 2020-04-28 14:51 ` Vladimir Sementsov-Ogievskiy 2020-04-30 19:12 ` Vladimir Sementsov-Ogievskiy 2020-05-01 3:04 ` Andrey Shinkevich 2020-05-06 5:56 ` Vladimir Sementsov-Ogievskiy 2020-05-07 12:58 ` Max Reitz 2020-05-07 19:34 ` Vladimir Sementsov-Ogievskiy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).