Am 05.05.2020 um 13:59 hat Max Reitz geschrieben: > On 05.05.20 13:19, Kevin Wolf wrote: > > Am 18.02.2020 um 13:42 hat Max Reitz geschrieben: > >> This mask will supplement BdrvChildClass when it comes to what role (or > >> combination of roles) a child takes for its parent. It consists of > >> BdrvChildRoleBits values (which is an enum). > >> > >> Because empty enums are not allowed, let us just start with it filled. > >> > >> Signed-off-by: Max Reitz > >> --- > >> include/block/block.h | 38 ++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 38 insertions(+) > >> > >> diff --git a/include/block/block.h b/include/block/block.h > >> index fd89eb6c75..8c23948d08 100644 > >> --- a/include/block/block.h > >> +++ b/include/block/block.h > >> @@ -268,6 +268,44 @@ enum { > >> DEFAULT_PERM_UNCHANGED = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH, > >> }; > >> > >> +enum BdrvChildRoleBits { > >> + /* Child stores data */ > >> + BDRV_CHILD_DATA = (1 << 0), > >> + > >> + /* Child stores metadata */ > >> + BDRV_CHILD_METADATA = (1 << 1), > >> + > >> + /* > >> + * A child to which the parent forwards all reads and writes. It > > > > Is "_all_ reads and writes" really required? Imagine a caching block > > driver, should it not be considered a filter because it may just > > complete the requests from its cache rather than asking the child? > > Hm. The important thing is that parent and child always show exactly > the same data, which is the second part of the definition given here. > So maybe we should flip both sentences, e.g.: > > “A child which always presents exactly the same visibile data as the > parent, e.g. by virtue of the parent forwarding all reads and writes.” > > ? Turns it into an example, which is a good way of explaining things that are commonly, but not universally true. I like it. > >> + * must present exactly the same visible data as the parent. > >> + * Any node may have at most one filtered child at a time. > >> + */ > >> + BDRV_CHILD_FILTERED = (1 << 2), > >> + > >> + /* > >> + * Child from which to read all data that isn’t allocated in the > >> + * parent (i.e., the backing child); such data is copied to the > >> + * parent through COW (and optionally COR). > >> + */ > >> + BDRV_CHILD_COW = (1 << 3), > >> + > >> + /* > >> + * The primary child. For most drivers, this is the child whose > >> + * filename applies best to the parent node. > >> + * Each parent must give this flag to no more than one child at a > >> + * time. > >> + */ > >> + BDRV_CHILD_PRIMARY = (1 << 4), > > > > And I assume some drivers like quorum don't set it on any child. > > I thought “no more than one” implies that. Technically no, though it's probably the first assumption of most people. It's just a suggestion for clarification, feel free to ignore it. > >> + /* Useful combination of flags */ > >> + BDRV_CHILD_IMAGE = BDRV_CHILD_DATA > >> + | BDRV_CHILD_METADATA > >> + | BDRV_CHILD_PRIMARY, > >> +}; > >> + > >> +/* Mask of BdrvChildRoleBits values */ > >> +typedef unsigned int BdrvChildRole; > >> + > >> char *bdrv_perm_names(uint64_t perm); > >> uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm); > > > > The list intuitively makes sense to me. Let me try to think of some > > interesting cases to see whether the documentation is complete or > > whether it could be improved. > > > > > > qcow2 is what everyone has in mind, so it should be obvious: > > > > * Without a data file: > > * file: BDRV_CHILD_IMAGE > > * backing: BDRV_CHILD_COW > > > > * With a data file: > > * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA > > * data-file: BDRV_CHILD_DATA > > * backing: BDRV_CHILD_COW > > > > > > We can use VMDK to make things a bit more interesting: > > > > * file: BDRV_CHILD_PRIMARY | BDRV_CHILD_METADATA > > * extents.*: BDRV_CHILD_METADATA | BDRV_CHILD_DATA > > * backing: BDRV_CHILD_COW > > > > In other words, we can have multiple data and metadata children. Is this > > correct or should extents not be marked as metadata? (Checked the final > > code: yes we do have multiple of them in vmdk.) Should this be mentioned > > in the documentation? > > If the extents contain metadata (I thought not, but I think I was just > wrong; sparse extents must contain their respective mapping structures), > then yes, they should all be marked as metadata children. > > I’m not sure whether that needs to be mentioned explicitly in the doc, > because “Child stores metadata” seems sufficient to me. When you're the author, the meaning of everything is clear to you. :-) In case of doubt, I would be more explicit so that the comment gives a clear guideline for which role to use in which scenario. > > Do we then also want to allow multiple BDRV_CHILD_COW children? We don't > > currently have a driver that needs it, but maybe it would be consistent > > with DATA and METADATA then. However, it would contradict the > > documentation that it's the "Child from which to read all data". > > Yes. I would revisit that problem when the need arises. > > It seems to me like this would open a whole can of worms, just like > allowing multiple filtered children does. Okay. Shall we document it explicitly like we do for the filter role? > > blkverify: > > > > * x-image: BDRV_CHILD_PRIMARY | BDRV_CHILD_DATA | BDRV_CHILD_FILTERED > > * x-raw: BDRV_CHILD_DATA | BDRV_CHILD_FILTERED > > > > Hm, according to the documentation, this doesn't work, FILTERED can be > > set only for one node. But the condition ("the parent forwards all reads > > and writes") applies to both children. I think the documentation should > > mention what needs to be done in such cases. > > I don’t know. blkverify is a rare exception by design, because it can > abort when both children don’t match. (I suppose we could theoretically > have a quorum mode where a child gets ejected once a mismatch is > detected, but that isn’t the case now.) Well, yes, this is exceptional. I would ignore that property for assigning roles because when it comes to play, roles don't matter any more because the whole process is gone. So... > Furthermore, I would argue that blkverify actually expects the formatted > image to sometimes differ from the raw image, if anything, because the > format driver is to be tested. This is the reason why I chose x-raw to > be the filtered child. ...I don't think this case is relevant. If blkverify returns something, both children have the same data. > So there is no general instruction on what to do in such cases that I > followed here, I specifically chose one child based on what blkverify is > and what it’s supposed to do. Therefore, I can’t really give a general > instruction on “what needs to be done in such cases”. Maybe the missing part for me is what FILTERED is even used for. I assume it's for skipping over filters in certain functions in the generic block layer? In this case, maybe the right answer is that... > > For blkverify, both > > children are not equal in intention, so I guess the "real" filtered > > child is x-image. But for quorum, you can't make any such distinction. I > > assume the recommendation should be not to set FILTERED for any child > > then. > > Quorum just isn’t a filter driver. ...blkverify isn't one either because performing an operation on only one child usually won't be correct. > > Looking at the final code... Hm, your choice looks quite different: You > > don't have DATA for x-raw, but you make it the PRIMARY and FILTERED > > child. I think PRIMARY/FILTERED is just a bug (e.g. getlength and flush > > being forwarded only to x-image show that it's primary). > > I rather consider getlength() a special case. Ideally, we’d forward > getlength() to both and compare the results; however, image formats > might have different size resolution than raw files, so there could be a > difference, but it’d be irrelevant. > > It makes then sense to forward it to the formatted image, because > generally formats have byte resolution for the disk size, whereas for > raw files it depends on caching and the filesystem, I think. > > As for flush, yes, why do we forward it only to x-image? Why is “the > raw file not important”? Because it's the copy that is used to check whether the main image is correct. If things break, you just create a new copy. At least that's how blkverify was supposed to be used. In fact, I guess in the typical use cases for blkverify, cache=unsafe is enough anyway because you would start over from scratch, so... not a strong argument. > > I do wonder > > whether I have a different interpretation of DATA than you, though. > > I never set DATA for FILTERED, because I consider FILTERED to be > stronger than DATA, so once FILTERED is set, it doesn’t matter whether > DATA is set or not. I suppose that should either be mentioned in the > documentation, or we decide that we should always set DATA regardless. Either option should be fine. I guess documenting it is less work. > > Also, the comparison makes me wonder whether FILTERED always implies > > PRIMARY? Would there ever be a scenario where a child is FILTERED, but > > not PRIMARY? > > I don’t know. I suppose it does. But what’s the implication? *shrug* I was just asking to see if I understand things right. We could document it, but I don't have a good reason why we must do that. Maybe the more relevant question would be if a FILTERED child must be the only child to avoid the problems we're discussing for blkverify. But I think I already answered that question for myself with "no", so probably not much use asking it. Kevin