Am 06.02.2020 um 11:21 hat Max Reitz geschrieben: > On 05.02.20 16:55, Kevin Wolf wrote: > > Am 11.11.2019 um 17:02 hat Max Reitz geschrieben: > >> Signed-off-by: Max Reitz > >> --- > >> block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 62 insertions(+) > >> > >> diff --git a/block/quorum.c b/block/quorum.c > >> index 3a824e77e3..8ee03e9baf 100644 > >> --- a/block/quorum.c > >> +++ b/block/quorum.c > >> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs, > >> return false; > >> } > >> > >> +static bool quorum_recurse_can_replace(BlockDriverState *bs, > >> + BlockDriverState *to_replace) > >> +{ > >> + BDRVQuorumState *s = bs->opaque; > >> + int i; > >> + > >> + for (i = 0; i < s->num_children; i++) { > >> + /* > >> + * We have no idea whether our children show the same data as > >> + * this node (@bs). It is actually highly likely that > >> + * @to_replace does not, because replacing a broken child is > >> + * one of the main use cases here. > >> + * > >> + * We do know that the new BDS will match @bs, so replacing > >> + * any of our children by it will be safe. It cannot change > >> + * the data this quorum node presents to its parents. > >> + * > >> + * However, replacing @to_replace by @bs in any of our > >> + * children's chains may change visible data somewhere in > >> + * there. We therefore cannot recurse down those chains with > >> + * bdrv_recurse_can_replace(). > >> + * (More formally, bdrv_recurse_can_replace() requires that > >> + * @to_replace will be replaced by something matching the @bs > >> + * passed to it. We cannot guarantee that.) > >> + * > >> + * Thus, we can only check whether any of our immediate > >> + * children matches @to_replace. > >> + * > >> + * (In the future, we might add a function to recurse down a > >> + * chain that checks that nothing there cares about a change > >> + * in data from the respective child in question. For > >> + * example, most filters do not care when their child's data > >> + * suddenly changes, as long as their parents do not care.) > >> + */ > >> + if (s->children[i].child->bs == to_replace) { > >> + Error *local_err = NULL; > >> + > >> + /* > >> + * We now have to ensure that there is no other parent > >> + * that cares about replacing this child by a node with > >> + * potentially different data. > >> + */ > >> + s->children[i].to_be_replaced = true; > >> + bdrv_child_refresh_perms(bs, s->children[i].child, &local_err); > >> + > >> + /* Revert permissions */ > >> + s->children[i].to_be_replaced = false; > >> + bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort); > > > > Quite a hack. The two obvious problems are: > > > > 1. We can't guarantee that we can actually revert the permissions. I > > think we ignore failure to loosen permissions meanwhile so that at > > least the &error_abort doesn't trigger, but bs could still be in the > > wrong state afterwards. > > I thought we guaranteed that loosening permissions never fails. > > (Well, you know. It may “leak” permissions, but we’d never get an error > here so there’s nothing to handle anyway.) This is what I meant. We ignore the failure (i.e. don't return an error), but the result still isn't completely correct ("leaked" permissions). > > It would be cleaner to use check+abort instead of actually setting > > the new permission. > > Oh. Yes. Maybe. It does require more code, though, because I’d rather > not use bdrv_check_update_perm() from here as-is. I'm not saying you need to do it, just that it would be cleaner. :-) > > 2. As aborting the permission change makes more obvious, we're checking > > something that might not be true any more when we actually make the > > change. > > True. I tried to do it right by having a post-replace cleanup function, > but after a while that was just going nowhere, really. So I just went > with what’s patch 13 here. > > But isn’t 13 enough, actually? It check can_replace right before > replacing in a drained section. I can’t imagine the permissions to > change there. Permissions are tied to file locks, so an external process can just grab the locks in between. But if I understand correctly, all we try here is to have an additional safeguard to prevent the user from doing stupid things. So I guess not being 100% is fine as long as it's documented in the code. Kevin