qemu-devel.nongnu.org archive mirror
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Alberto Garcia <berto@igalia.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [PATCH for-5.0 v2 10/23] quorum: Implement .bdrv_recurse_can_replace()
Date: Thu, 6 Feb 2020 16:19:07 +0100
Message-ID: <1bb2e344-e66d-de37-0d49-f4a8a5a6eb40@redhat.com>
In-Reply-To: <20200206144207.GC4926@linux.fritz.box>


On 06.02.20 15:42, Kevin Wolf wrote:
> On 06.02.2020 at 11:21, Max Reitz wrote:
>> On 05.02.20 16:55, Kevin Wolf wrote:
>>> On 11.11.2019 at 17:02, Max Reitz wrote:
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  block/quorum.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 62 insertions(+)
>>>>
>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>> index 3a824e77e3..8ee03e9baf 100644
>>>> --- a/block/quorum.c
>>>> +++ b/block/quorum.c
>>>> @@ -825,6 +825,67 @@ static bool quorum_recurse_is_first_non_filter(BlockDriverState *bs,
>>>>      return false;
>>>>  }
>>>>  
>>>> +static bool quorum_recurse_can_replace(BlockDriverState *bs,
>>>> +                                       BlockDriverState *to_replace)
>>>> +{
>>>> +    BDRVQuorumState *s = bs->opaque;
>>>> +    int i;
>>>> +
>>>> +    for (i = 0; i < s->num_children; i++) {
>>>> +        /*
>>>> +         * We have no idea whether our children show the same data as
>>>> +         * this node (@bs).  It is actually highly likely that
>>>> +         * @to_replace does not, because replacing a broken child is
>>>> +         * one of the main use cases here.
>>>> +         *
>>>> +         * We do know that the new BDS will match @bs, so replacing
>>>> +         * any of our children by it will be safe.  It cannot change
>>>> +         * the data this quorum node presents to its parents.
>>>> +         *
>>>> +         * However, replacing @to_replace by @bs in any of our
>>>> +         * children's chains may change visible data somewhere in
>>>> +         * there.  We therefore cannot recurse down those chains with
>>>> +         * bdrv_recurse_can_replace().
>>>> +         * (More formally, bdrv_recurse_can_replace() requires that
>>>> +         * @to_replace will be replaced by something matching the @bs
>>>> +         * passed to it.  We cannot guarantee that.)
>>>> +         *
>>>> +         * Thus, we can only check whether any of our immediate
>>>> +         * children matches @to_replace.
>>>> +         *
>>>> +         * (In the future, we might add a function to recurse down a
>>>> +         * chain that checks that nothing there cares about a change
>>>> +         * in data from the respective child in question.  For
>>>> +         * example, most filters do not care when their child's data
>>>> +         * suddenly changes, as long as their parents do not care.)
>>>> +         */
>>>> +        if (s->children[i].child->bs == to_replace) {
>>>> +            Error *local_err = NULL;
>>>> +
>>>> +            /*
>>>> +             * We now have to ensure that there is no other parent
>>>> +             * that cares about replacing this child by a node with
>>>> +             * potentially different data.
>>>> +             */
>>>> +            s->children[i].to_be_replaced = true;
>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &local_err);
>>>> +
>>>> +            /* Revert permissions */
>>>> +            s->children[i].to_be_replaced = false;
>>>> +            bdrv_child_refresh_perms(bs, s->children[i].child, &error_abort);
>>>
>>> Quite a hack. The two obvious problems are:
>>>
>>> 1. We can't guarantee that we can actually revert the permissions. I
>>>    think we ignore failure to loosen permissions meanwhile so that at
>>>    least the &error_abort doesn't trigger, but bs could still be in the
>>>    wrong state afterwards.
>>
>> I thought we guaranteed that loosening permissions never fails.
>>
>> (Well, you know.  It may “leak” permissions, but we’d never get an error
>> here so there’s nothing to handle anyway.)
> 
> This is what I meant. We ignore the failure (i.e. don't return an error),
> but the result still isn't completely correct ("leaked" permissions).
> 
>>>    It would be cleaner to use check+abort instead of actually setting
>>>    the new permission.
>>
>> Oh.  Yes.  Maybe.  It does require more code, though, because I’d rather
>> not use bdrv_check_update_perm() from here as-is.
> 
> I'm not saying you need to do it, just that it would be cleaner. :-)

It would.  Thanks for the suggestion, I obviously didn’t think of it.
(Or there’d be a comment on how this is not the best way in theory, but
in practice it’s good enough.)  I suppose I’ll see what I can do.
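
As a rough illustration of the difference between the two approaches, here
is a toy model.  Every name in it (ToyChild, toy_check_perm(), and so on) is
hypothetical and invented for this sketch; it is not QEMU's permission API
and says nothing about how bdrv_check_update_perm() behaves internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT QEMU's permission API. */
typedef struct ToyChild {
    unsigned held;            /* permissions this parent currently holds */
    unsigned others_unshared; /* permissions other parents refuse to share */
} ToyChild;

/* Pure check: would holding @wanted conflict with another parent? */
static bool toy_check_perm(const ToyChild *c, unsigned wanted)
{
    return (wanted & c->others_unshared) == 0;
}

/* "Check + abort": answer the question without ever touching c->held. */
static bool toy_can_replace_dry_run(const ToyChild *c, unsigned extra)
{
    return toy_check_perm(c, c->held | extra);
}

/* "Set + revert": same shape as the code under review. */
static bool toy_can_replace_set_revert(ToyChild *c, unsigned extra)
{
    unsigned old = c->held;
    bool ok = toy_check_perm(c, old | extra);

    if (ok) {
        c->held = old | extra;  /* tighten */
    }
    /*
     * Revert.  In this toy the revert always succeeds; the concern in
     * the discussion is that a real revert may "leak" permissions.
     */
    c->held = old;
    assert(c->held == old);
    return ok;
}
```

Both variants give the same answer here; the dry-run one simply never has a
window in which the state is modified, so there is nothing to revert.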

>>> 2. As aborting the permission change makes more obvious, we're checking
>>>    something that might not be true any more when we actually make the
>>>    change.
>>
>> True.  I tried to do it right by having a post-replace cleanup function,
>> but after a while that was just going nowhere, really.  So I just went
>> with what’s patch 13 here.
>>
>> But isn’t 13 enough, actually?  It checks can_replace right before
>> replacing in a drained section.  I can’t imagine the permissions to
>> change there.
> 
> Permissions are tied to file locks, so an external process can just grab
> the locks in between.

Ah, right, I didn’t think of that.

> But if I understand correctly, all we try here is
> to have an additional safeguard to prevent the user from doing stupid
> things. So I guess not being 100% is fine as long as it's documented in
> the code.

Yes.  I just think it actually would be 100 % in practice, so I wondered
whether it would need to be documented.

You’re right, though, it isn’t 100 %, so it should definitely be
documented.  Maybe something like

In theory, we would have to keep the permissions tightened until the
node is replaced.  In practice, that would require post-replacement
cleanup infrastructure, which we do not have, and which would be
unreasonably complex to implement.  Therefore, all we can do is require
anyone who wants to replace one node by some potentially unrelated other
node (i.e., the mirror job on completion) to invoke
bdrv_recurse_can_replace() immediately before and thus minimize the time
during which some condition may arise that might forbid the swap.

?
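
The "check immediately before" pattern that comment describes could look
roughly like this.  It is a minimal sketch with made-up names (ToyGraph,
toy_can_replace(), toy_replace()); the swap_forbidden flag merely stands in
for any condition that may arise between an early check and the actual swap,
such as an external process grabbing a file lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT the QEMU block graph. */
typedef struct ToyGraph {
    int child;            /* id of the node currently attached */
    bool swap_forbidden;  /* stand-in for any condition forbidding the swap */
} ToyGraph;

/* Best-effort check; its result can go stale at any time. */
static bool toy_can_replace(const ToyGraph *g, int to_replace)
{
    assert(g);
    return g->child == to_replace && !g->swap_forbidden;
}

/*
 * Re-check right before swapping, so that the window in which the
 * answer can go stale is as short as possible.  This narrows the race,
 * it does not eliminate it.
 */
static bool toy_replace(ToyGraph *g, int to_replace, int new_node)
{
    if (!toy_can_replace(g, to_replace)) {
        return false;
    }
    g->child = new_node;
    return true;
}
```

A caller that checked earlier (say, at job start) would still go through
toy_replace() at completion and has to be prepared for it to fail.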

Max

