On 13.08.19 15:21, Kevin Wolf wrote:
> On 13.08.2019 at 14:01, Kevin Wolf wrote:
>> On 13.08.2019 at 13:28, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 14:04, Kevin Wolf wrote:
>>>> On 12.08.2019 at 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>>> returned file. But is it correct behavior at all? If returned file
>>>>> itself has a backing file, we may report as totally unallocated an
>>>>> area which actually has data in bottom backing file.
>>>>>
>>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>>> by following commit with a test. Let's make raw-format behave more
>>>>> correctly returning BDRV_BLOCK_DATA.
>>>>>
>>>>> Suggested-by: Max Reitz
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>>>
>>>> After some reading, I think I came to the conclusion that RAW is the
>>>> correct thing to do. There is indeed a problem, but this patch is trying
>>>> to fix it in the wrong place.
>>>>
>>>> In the case where the backing file contains some data, and we have a
>>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>>> block is not defined by the queried backing file layer, so it is
>>>> completely correct that bdrv_is_allocated() returns false, like it would
>>>> if you queried the qcow2 layer directly. If it returned true, we would
>>>> copy everything, which isn't right either (the test cases should maybe
>>>> add the qemu-img map output of the target so this becomes visible).
>>>>
>>>> The problem is that we try to recurse along the backing chain, but we
>>>> fail to make the step from the raw node to the backing file.
>>>
>>> I'd say, the problem is that we ignore backing chain of non-backing
>>> child
>>
>> Yes, exactly. And I know even less about what happens if a child is
>> neither bs->file nor bs->backing. Imagine a qcow2 image with an external
>> data file that is a qcow2 image with a backing file itself. :-)
>>
>> Actually, just having two qcow2 layers nested with bs->file probably
>> already fails.
>>
>>>> Note that just extending Max's "deal with filters" is not enough to fix
>>>> this because raw doesn't actually meet all of the criteria for being a
>>>> filter in this sense (at least because the 'offset' option can change
>>>> offsets between raw and its child).
>>>>
>>>> I think this is essentially a result of special-casing backing files
>>>> everywhere instead of treating them like children like any other.
>>>
>>> But we need to special-case them, as we have interfaces operating on
>>> backing chain,
>>
>> I'm not sure yet if this means that these interfaces are wrong, but it
>> might. But in any case, I think we depend on special-casing in more
>> places than we should.
>>
>>>> bdrv_co_block_status_above() probably shouldn't recurse along the
>>>> backing chain, but along the returned *file pointers, and consider the
>>>> returned offset in *map.
>>>
>>> So, you mean that in case of unallocated, format layer should return
>>> its backing file as file?
>>
>> Yes, because that's where it's reading the data from.
>>
>> Hm... Now I wonder what this means for DATA... In theory it would have
>> to be set for backing files, but that would make it completely useless.
>> We can distinguish the cases by looking at *file, but how does the
>> generic block layer know which child should be counted as "allocated"
>> and which shouldn't?
>
> Possible answer to my own question:
>
> bdrv_is_allocated(bs) isn't even asking a complete question. What we
> really need to ask is whether a specific child is where data comes from.
>
> What the current callers of bdrv_is_allocated() are interested in is
> whether the data comes from bs->backing or from somewhere else. That is,
> if removing bs from the graph (so that all parents of bs would point to
> bs->backing instead) would still result in the same data in the given
> block.

Maybe callers of bdrv_is_allocated() should just ensure that the node they
pass actually has a backing file. (If it doesn’t, they should skip all
filters until it does.)

Max