From: Kevin Wolf <kwolf@redhat.com>
To: Max Reitz <mreitz@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
Alberto Garcia <berto@igalia.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
Date: Thu, 6 Feb 2020 16:42:01 +0100
Message-ID: <20200206154201.GF4926@linux.fritz.box>
In-Reply-To: <1bb2e344-e66d-de37-0d49-f4a8a5a6eb40@redhat.com>
On 06.02.2020 at 16:19, Max Reitz wrote:
> On 06.02.20 15:42, Kevin Wolf wrote:
> > On 06.02.2020 at 11:21, Max Reitz wrote:
> >> On 05.02.20 16:55, Kevin Wolf wrote:
> >>> On 11.11.2019 at 17:02, Max Reitz wrote:
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> ---
> >>>> block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> 1 file changed, 62 insertions(+)
> >>>>
> >>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>> index 3a824e77e3..8ee03e9baf 100644
> >>>> --- a/block/quorum.c
> >>>> +++ b/block/quorum.c
> >>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
> >>>>      return false;
> >>>>  }
> >>>>
> >>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
> >>>> +                                       BlockDriverState *to_replace)
> >>>> +{
> >>>> +    BDRVQuorumState *s = bs->opaque;
> >>>> +    int i;
> >>>> +
> >>>> +    for (i = 0; i < s->num_children; i++) {
> >>>> +        /*
> >>>> +         * We have no idea whether our children show the same data as
> >>>> +         * this node (@bs).  It is actually highly likely that
> >>>> +         * @to_replace does not, because replacing a broken child is
> >>>> +         * one of the main use cases here.
> >>>> +         *
> >>>> +         * We do know that the new BDS will match @bs, so replacing
> >>>> +         * any of our children by it will be safe.  It cannot change
> >>>> +         * the data this quorum node presents to its parents.
> >>>> +         *
> >>>> +         * However, replacing @to_replace by @bs in any of our
> >>>> +         * children's chains may change visible data somewhere in
> >>>> +         * there.  We therefore cannot recurse down those chains with
> >>>> +         * bdrv_recurse_can_replace().
> >>>> +         * (More formally, bdrv_recurse_can_replace() requires that
> >>>> +         * @to_replace will be replaced by something matching the @bs
> >>>> +         * passed to it.  We cannot guarantee that.)
> >>>> +         *
> >>>> +         * Thus, we can only check whether any of our immediate
> >>>> +         * children matches @to_replace.
> >>>> +         *
> >>>> +         * (In the future, we might add a function to recurse down a
> >>>> +         * chain that checks that nothing there cares about a change
> >>>> +         * in data from the respective child in question.  For
> >>>> +         * example, most filters do not care when their child's data
> >>>> +         * suddenly changes, as long as their parents do not care.)
> >>>> +         */
> >>>> +        if (s->children[i].child->bs == to_replace) {
> >>>> +            Error *local_err = NULL;
> >>>> +
> >>>> +            /*
> >>>> +             * We now have to ensure that there is no other parent
> >>>> +             * that cares about replacing this child by a node with
> >>>> +             * potentially different data.
> >>>> +             */
> >>>> +            s->children[i].to_be_replaced = true;
> >>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
> >>>> +
> >>>> +            /* Revert permissions */
> >>>> +            s->children[i].to_be_replaced = false;
> >>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
> >>>
> >>> Quite a hack. The two obvious problems are:
> >>>
> >>> 1. We can't guarantee that we can actually revert the permissions. I
> >>> think we ignore failure to loosen permissions meanwhile so that at
> >>> least the &error_abort doesn't trigger, but bs could still be in the
> >>> wrong state afterwards.
> >>
> >> I thought we guaranteed that loosening permissions never fails.
> >>
> >> (Well, you know. It may “leak” permissions, but we’d never get an error
> >> here so there’s nothing to handle anyway.)
> >
> > This is what I meant. We ignore the failure (i.e. don't return an error),
> > but the result still isn't completely correct ("leaked" permissions).
> >
> >>> It would be cleaner to use check+abort instead of actually setting
> >>> the new permission.
> >>
> >> Oh. Yes. Maybe. It does require more code, though, because I’d rather
> >> not use bdrv_check_update_perm() from here as-is.
> >
> > I'm not saying you need to do it, just that it would be cleaner. :-)
>
> It would. Thanks for the suggestion, I obviously didn’t think of it.
> (Or there’d be a comment on how this is not the best way in theory, but
> in practice it’s good enough.) I suppose I’ll see what I can do.
>
> >>> 2. As aborting the permission change makes more obvious, we're checking
> >>> something that might not be true any more when we actually make the
> >>> change.
> >>
> >> True. I tried to do it right by having a post-replace cleanup function,
> >> but after a while that was just going nowhere, really. So I just went
> >> with what’s patch 13 here.
> >>
> >> But isn’t 13 enough, actually? It checks can_replace right before
> >> replacing in a drained section. I can’t imagine the permissions to
> >> change there.
> >
> > Permissions are tied to file locks, so an external process can just grab
> > the locks in between.
>
> Ah, right, I didn’t think of that.
>
> > But if I understand correctly, all we try here is
> > to have an additional safeguard to prevent the user from doing stupid
> > things. So I guess not being 100% is fine as long as it's documented in
> > the code.
>
> Yes. I just think it actually would be 100 % in practice, so I wondered
> whether it would need to be documented.
>
> You’re right, though, it isn’t 100 %, so it should definitely be
> documented. Maybe something like
>
> In theory, we would have to keep the permissions tightened until the
> node is replaced. In practice, that would require post-replacement
> cleanup infrastructure, which we do not have, and which would be
> unreasonably complex to implement.
Sounds good until here.
> Therefore, all we can do is require
> anyone who wants to replace one node by some potentially unrelated other
> node (i.e., the mirror job on completion) to invoke
> bdrv_recurse_can_replace() immediately before and thus minimize the time
> during which some condition may arise that might forbid the swap.
>
> ?
This second part of your suggested comment could be dropped, as far as
I'm concerned. If anything, it's part of the contract and would belong
in the bdrv_recurse_can_replace() documentation.
However, I think I would mention why not being 100% is okay: this check
is only an "additional safeguard to prevent the user from doing stupid
things", and it doesn't make a difference if the user runs the correct
command.
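To make the "check+abort" versus "set and revert" distinction from earlier
in the thread concrete, here is a minimal toy sketch. It is not QEMU code:
ToyChild, the TOY_PERM_* values and all toy_* functions are hypothetical
stand-ins made up for illustration. The real permission code has a
different interface and more states.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only: none of these names exist in QEMU.  A child node has a
 * mask of permissions that one parent is willing to share, and a mask of
 * permissions that the other parents currently hold.
 */
enum {
    TOY_PERM_CONSISTENT_READ = 1u << 0,
    TOY_PERM_WRITE           = 1u << 1,
    TOY_PERM_ALL             = TOY_PERM_CONSISTENT_READ | TOY_PERM_WRITE,
};

typedef struct ToyChild {
    unsigned shared;      /* what this parent shares with other parents */
    unsigned held_others; /* what the other parents actually hold */
} ToyChild;

/* Pure check: would sharing only @new_shared conflict with other parents? */
static bool toy_check_shared(const ToyChild *c, unsigned new_shared)
{
    return (c->held_others & ~new_shared) == 0;
}

/* Check and, on success, commit the new share mask. */
static bool toy_set_shared(ToyChild *c, unsigned new_shared)
{
    if (!toy_check_shared(c, new_shared)) {
        return false;
    }
    c->shared = new_shared;
    return true;
}

/*
 * Variant A, set and revert (the shape of the quoted hunk): tighten for
 * real, then loosen again.  The node is briefly in the tightened state,
 * and the revert has to be assumed infallible.
 */
static bool toy_can_replace_set_revert(ToyChild *c, unsigned tightened)
{
    unsigned saved = c->shared;
    bool ok = toy_set_shared(c, tightened);
    bool reverted = toy_set_shared(c, saved);

    assert(reverted); /* stands in for &error_abort */
    (void)reverted;   /* silence unused warning when NDEBUG is defined */
    return ok;
}

/*
 * Variant B, check and abort: only ask whether tightening would succeed.
 * The const pointer documents that nothing is modified.
 *
 * In both variants the answer is only a snapshot.  It may be stale by the
 * time the node is actually replaced.
 */
static bool toy_can_replace_check_only(const ToyChild *c, unsigned tightened)
{
    return toy_check_shared(c, tightened);
}
```

Both variants give the same answer in this model. The difference is that
variant B never touches the node, so there is nothing to revert and nothing
that could be left in the wrong state. Neither variant helps with point 2:
the result can still be outdated when the replacement happens.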
Kevin