On 06.02.20 16:42, Kevin Wolf wrote: > Am 06.02.2020 um 16:19 hat Max Reitz geschrieben: >> On 06.02.20 15:42, Kevin Wolf wrote: >>> Am 06.02.2020 um 11:21 hat Max Reitz geschrieben: >>>> On 05.02.20 16:55, Kevin Wolf wrote: >>>>> Am 11.11.2019 um 17:02 hat Max Reitz geschrieben: >>>>>> Signed-off-by: Max Reitz >>>>>> --- >>>>>> block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> 1 file changed, 62 insertions(+) >>>>>> >>>>>> diff --git a/block/quorum.c b/block/quorum.c >>>>>> index 3a824e77e3..8ee03e9baf 100644 >>>>>> --- a/block/quorum.c >>>>>> +++ b/block/quorum.c >>>>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs, >>>>>> return false; >>>>>> } >>>>>> >>>>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs, >>>>>> + BlockDriverState *to_replace) >>>>>> +{ >>>>>> + BDRVQuorumState *s = bs->opaque; >>>>>> + int i; >>>>>> + >>>>>> + for (i = 0; i < s->num_children; i++) { >>>>>> + /* >>>>>> + * We have no idea whether our children show the same data as >>>>>> + * this node (@bs). It is actually highly likely that >>>>>> + * @to_replace does not, because replacing a broken child is >>>>>> + * one of the main use cases here. >>>>>> + * >>>>>> + * We do know that the new BDS will match @bs, so replacing >>>>>> + * any of our children by it will be safe. It cannot change >>>>>> + * the data this quorum node presents to its parents. >>>>>> + * >>>>>> + * However, replacing @to_replace by @bs in any of our >>>>>> + * children's chains may change visible data somewhere in >>>>>> + * there. We therefore cannot recurse down those chains with >>>>>> + * bdrv_recurse_can_replace(). >>>>>> + * (More formally, bdrv_recurse_can_replace() requires that >>>>>> + * @to_replace will be replaced by something matching the @bs >>>>>> + * passed to it. We cannot guarantee that.) >>>>>> + * >>>>>> + * Thus, we can only check whether any of our immediate >>>>>> + * children matches @to_replace. >>>>>> + * >>>>>> + * (In the future, we might add a function to recurse down a >>>>>> + * chain that checks that nothing there cares about a change >>>>>> + * in data from the respective child in question. For >>>>>> + * example, most filters do not care when their child's data >>>>>> + * suddenly changes, as long as their parents do not care.) >>>>>> + */ >>>>>> + if (s->children[i].child->bs == to_replace) { >>>>>> + Error *local_err = NULL; >>>>>> + >>>>>> + /* >>>>>> + * We now have to ensure that there is no other parent >>>>>> + * that cares about replacing this child by a node with >>>>>> + * potentially different data. >>>>>> + */ >>>>>> + s->children[i].to_be_replaced = true; >>>>>> + bdrv_child_refresh_perms(bs, s->children[i].child, &local_err); >>>>>> + >>>>>> + /* Revert permissions */ >>>>>> + s->children[i].to_be_replaced = false; >>>>>> + bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort); >>>>> >>>>> Quite a hack. The two obvious problems are: >>>>> >>>>> 1. We can't guarantee that we can actually revert the permissions. I >>>>> think we ignore failure to loosen permissions meanwhile so that at >>>>> least the &error_abort doesn't trigger, but bs could still be in the >>>>> wrong state afterwards. >>>> >>>> I thought we guaranteed that loosening permissions never fails. >>>> >>>> (Well, you know. It may “leak” permissions, but we’d never get an error >>>> here so there’s nothing to handle anyway.) >>> >>> This is what I meant. We ignore the failure (i.e. don't return an error), >>> but the result still isn't completely correct ("leaked" permissions). >>> >>>>> It would be cleaner to use check+abort instead of actually setting >>>>> the new permission. >>>> >>>> Oh. Yes. Maybe. It does require more code, though, because I’d rather >>>> not use bdrv_check_update_perm() from here as-is. >>> >>> I'm not saying you need to do it, just that it would be cleaner. :-) >> >> It would. Thanks for the suggestion, I obviously didn’t think of it. >> (Or there’d be a comment on how this is not the best way in theory, but >> in practice it’s good enough.) I suppose I’ll see how what I can do. >> >>>>> 2. As aborting the permission change makes more obvious, we're checking >>>>> something that might not be true any more when we actually make the >>>>> change. >>>> >>>> True. I tried to do it right by having a post-replace cleanup function, >>>> but after a while that was just going nowhere, really. So I just went >>>> with what’s patch 13 here. >>>> >>>> But isn’t 13 enough, actually? It check can_replace right before >>>> replacing in a drained section. I can’t imagine the permissions to >>>> change there. >>> >>> Permissions are tied to file locks, so an external process can just grab >>> the locks in between. >> >> Ah, right, I didn’t think of that. >> >>> But if I understand correctly, all we try here is >>> to have an additional safeguard to prevent the user from doing stupid >>> things. So I guess not being 100% is fine as long as it's documented in >>> the code. >> >> Yes. I just think it actually would be 100 % in practice, so I wondered >> whether it would need to be documented. >> >> You’re right, though, it isn’t 100 %, so it should definitely be >> documented. Maybe something like >> >> In theory, we would have to keep the permissions tightened until the >> node is replaced. In practice, that would require post-replacement >> cleanup infrastructure, which we do not have, and which would be >> unreasonably complex to implement. > > Sounds good until here. > >> Therefore, all we can do is require >> anyone who wants to replace one node by some potentially unrelated other >> node (i.e., the mirror job on completion) to invoke >> bdrv_recurse_can_replace() immediately before and thus minimize the time >> during which some condition may arise that might forbid the swap. >> >> ? > > This second part of your suggested comment could be dropped, as far as > I'm concerned. If anything, it's part of the contract and would belong > in the bdrv_recurse_can_replace() documentation. > > However, I think I would mention why not being 100% is okay: The part > with "additional safeguard to prevent the user from doing stupid > things", and that it doesn't make a difference if the user runs the > correct command. OK. Max