On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote: > 09.08.2019 19:14, Max Reitz wrote: >> Currently, check_to_replace_node() only allows mirror to replace a node >> in the chain of the source node, and only if it is the first non-filter >> node below the source. Well, technically, the idea is that you can >> exactly replace a quorum child by mirroring from quorum. >> >> This has (probably) two reasons: >> (1) We do not want to create loops. >> (2) @replaces and @device should have exactly the same content so >> replacing them does not cause visible data to change. >> >> This has two issues: >> (1) It is overly restrictive. It is completely fine for @replaces to be >> a filter. >> (2) It is not restrictive enough. You can create loops with this as >> follows: >> >> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M >> $ qemu-system-x86_64 -qmp stdio >> {"execute": "qmp_capabilities"} >> {"execute": "object-add", >> "arguments": {"qom-type": "throttle-group", "id": "tg0"}} >> {"execute": "blockdev-add", >> "arguments": { >> "node-name": "source", >> "driver": "throttle", >> "throttle-group": "tg0", >> "file": { >> "node-name": "filtered", >> "driver": "qcow2", >> "file": { >> "driver": "file", >> "filename": "/tmp/source.qcow2" >> } } } } >> {"execute": "drive-mirror", >> "arguments": { >> "job-id": "mirror", >> "device": "source", >> "target": "/tmp/target.qcow2", >> "format": "qcow2", >> "node-name": "target", >> "sync" :"none", >> "replaces": "filtered" >> } } >> {"execute": "block-job-complete", "arguments": {"device": "mirror"}} >> >> And qemu crashes because of a stack overflow due to the loop being >> created (target's backing file is source, so when it replaces filtered, >> it points to itself through source). >> >> (blockdev-mirror can be broken similarly.) >> >> So let us make the checks for the two conditions above explicit, which >> makes the whole function exactly as restrictive as it needs to be. >> >> Signed-off-by: Max Reitz >> --- >> include/block/block.h | 1 + >> block.c | 83 +++++++++++++++++++++++++++++++++++++++---- >> blockdev.c | 34 ++++++++++++++++-- >> 3 files changed, 110 insertions(+), 8 deletions(-) >> >> diff --git a/include/block/block.h b/include/block/block.h >> index 6ba853fb90..8da706cd89 100644 >> --- a/include/block/block.h >> +++ b/include/block/block.h >> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate); >> >> /* check if a named node can be replaced when doing drive-mirror */ >> BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs, >> + BlockDriverState *backing_bs, >> const char *node_name, Error **errp); >> >> /* async block I/O */ >> diff --git a/block.c b/block.c >> index 915b80153c..4858d3e718 100644 >> --- a/block.c >> +++ b/block.c >> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate) >> return false; >> } >> >> +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent) >> +{ >> + BdrvChild *c; >> + >> + if (!parent) { >> + return false; >> + } >> + >> + QLIST_FOREACH(c, &parent->children, next) { >> + if (c->bs == child || is_child_of(child, c->bs)) { >> + return true; >> + } >> + } >> + >> + return false; >> +} >> + >> +/* >> + * Return true if there are only filters in [@top, @base). Note that >> + * this may include quorum (which bdrv_chain_contains() cannot >> + * handle). > > More presizely: return true if exists chain of filters from top to base or if > top == base. > > I keep in mind backup-top filter: > > [backup-top] > | \target backup-top can’t be a filter if it has two children with different contents, though. (commit-top and mirror-top aren’t filters either.) That’s why there must be a unique chain [@top, @base). I should probably not that it will return true if top == base, though, yes. > |backing -------->[target] > V / > [source] <---------/backing > >> + */ >> +static bool is_filtered_child(BlockDriverState *top, BlockDriverState *base) >> +{ >> + BdrvChild *c; >> + >> + if (!top) { >> + return false; >> + } >> + >> + if (top == base) { >> + return true; >> + } >> + >> + if (!top->drv->is_filter) { >> + return false; >> + } >> + >> + QLIST_FOREACH(c, &top->children, next) { >> + if (is_filtered_child(c->bs, base)) { >> + return true; >> + } >> + } > > interesting, how much is it better to somehow reuse DFS search written in should_update_child().. > [just note, don't do it in these series please] > >> + >> + return false; >> +} >> + >> +/* >> + * @parent_bs is mirror's source BDS, @backing_bs is the BDS which >> + * will be attached to the target when mirror completes. >> + */ >> BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs, >> + BlockDriverState *backing_bs, >> const char *node_name, Error **errp) >> { >> BlockDriverState *to_replace_bs = bdrv_find_node(node_name); >> @@ -6309,13 +6361,32 @@ BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs, >> goto out; >> } >> >> - /* We don't want arbitrary node of the BDS chain to be replaced only the top >> - * most non filter in order to prevent data corruption. >> - * Another benefit is that this tests exclude backing files which are >> - * blocked by the backing blockers. >> + /* >> + * If to_replace_bs is (recursively) a child of backing_bs, >> + * replacing it may create a loop. We cannot allow that. >> */ >> - if (!bdrv_recurse_is_first_non_filter(parent_bs, to_replace_bs)) { >> - error_setg(errp, "Only top most non filter can be replaced"); >> + if (to_replace_bs == backing_bs || is_child_of(to_replace_bs, backing_bs)) { > > first condition is covered by second, so first may be omitted. It is not. is_child_of() does not return true if child == parent. >> + error_setg(errp, "Replacing this node would result in a loop"); >> + to_replace_bs = NULL; >> + goto out; >> + } >> + >> + /* >> + * Mirror is designed in such a way that when it completes, the >> + * source BDS is seamlessly replaced. > > Not source but to_replace_bs is replaced? It has originally been designed to replace the source. If it could replace any arbitrary BDS, all of this would be moot. >> It is therefore not allowed >> + * to replace a BDS where this condition would be violated, as that >> + * would defeat the purpose of mirror and could lead to data >> + * corruption. >> + * Therefore, between parent_bs and to_replace_bs there may be >> + * only filters (and the one on top must be a filter, too), so >> + * their data always stays in sync and mirror can complete and >> + * replace to_replace_bs without any possible corruptions. >> + */ >> + if (!is_filtered_child(parent_bs, to_replace_bs) && >> + !is_filtered_child(to_replace_bs, parent_bs)) >> + { >> + error_setg(errp, "The node to be replaced must be connected to the " >> + "source through filter nodes only"); > > "and the one on top must be a filter, too" not mentioned in the error.. Well, unless the source node is the node to be replaced. Hm... This gets very hard to express. I think I’d prefer to keep this as it is, even though it is not quite correct, unless you have a better suggestion of what to report. :-/ >> to_replace_bs = NULL; >> goto out; >> } >> diff --git a/blockdev.c b/blockdev.c >> index 4e72f6f701..758e0b5431 100644 >> --- a/blockdev.c >> +++ b/blockdev.c >> @@ -3887,7 +3887,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs, >> } >> >> if (has_replaces) { >> - BlockDriverState *to_replace_bs; >> + BlockDriverState *to_replace_bs, *backing_bs; >> AioContext *replace_aio_context; >> int64_t bs_size, replace_size; >> >> @@ -3897,7 +3897,37 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs, >> return; >> } >> >> - to_replace_bs = check_to_replace_node(bs, replaces, errp); >> + if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN || >> + backing_mode == MIRROR_OPEN_BACKING_CHAIN) >> + { >> + /* >> + * While we do not quite know what OPEN_BACKING_CHAIN >> + * (used for mode=existing) will yield, it is probably >> + * best to restrict it exactly like SOURCE_BACKING_CHAIN, >> + * because that is our best guess. >> + */ >> + switch (sync) { >> + case MIRROR_SYNC_MODE_FULL: >> + backing_bs = NULL; >> + break; >> + >> + case MIRROR_SYNC_MODE_TOP: >> + backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)); > > why not bdrv_backing_chain_next(bs) like in mirror_start? Good question. I suppose it should be bdrv_filtered_cow_bs(bdrv_backing_chain_next(bs)) in mirror_start()? Because with sync=top, we just want to remove the topmost COW node (and filters on top), but keep filters behind it. Max >> + break; >> + >> + case MIRROR_SYNC_MODE_NONE: >> + backing_bs = bs; >> + break; >> + >> + default: >> + abort(); >> + } >> + } else { >> + assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN); >> + backing_bs = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(target)); >> + } >> + >> + to_replace_bs = check_to_replace_node(bs, backing_bs, replaces, errp); >> if (!to_replace_bs) { >> return; >> } >> > >