Am 06.02.2020 um 17:47 hat Max Reitz geschrieben:
> On 06.02.20 17:43, Max Reitz wrote:
> > On 06.02.20 16:51, Kevin Wolf wrote:
> >> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >>> On 06.02.20 15:58, Kevin Wolf wrote:
> >>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>>> replaced without breaking anything else.
> >>>>>>>
> >>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>>> ---
> >>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>>  1 file changed, 25 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>>> --- a/block/quorum.c
> >>>>>>> +++ b/block/quorum.c
> >>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>>  
> >>>>>>>  typedef struct QuorumChild {
> >>>>>>>      BdrvChild *child;
> >>>>>>> +
> >>>>>>> +    /*
> >>>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>>> +     * WRITE permission.
> >>>>>>> +     */
> >>>>>>> +    bool to_be_replaced;
> >>>>>>
> >>>>>> I don't understand these permission changes. How does (preparing for)
> >>>>>> detaching a node from quorum make its content invalid?
> >>>>>
> >>>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>>> some other node with some other content.
> >>>>>
> >>>>>> And why do we
> >>>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>>> read-only?
> >>>>>>
> >>>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>>> Is this really "make sure that"?
> >>>>>
> >>>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>>> maybe the code is also.)
> >>>>>
> >>>>> This series is about fixing at least some things about replacing nodes
> >>>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>>> broken quorum children: The other children are still intact, so you read
> >>>>> from the quorum node and replace the broken child (which maybe shows
> >>>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>>
> >>>>> Replacing that broken node by the fixed one changes the data that’s
> >>>>> visible on that node.
> >>>>
> >>>> Hm, yes, that's true. But I wonder if this is really something that the
> >>>> permission system must catch. Like other graph manipulations, it's
> >>>> essentially the user saying "trust me, I know what I'm doing, this node
> >>>> makes sense in this place".
> >>>>
> >>>> Because if you assume that the user could add a node with unsuitable
> >>>> content and you want to prevent this, where do we stop?
> >>>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>>> visible data change. Should we therefore only allow snapshots when
> >>>> shared writes are allowed? This doesn't work obviously.
> >>>>
> >>>> So I'm inclined to say that this is the user's responsibility and we
> >>>> don't have to jump through hoops to prevent every possible way that the
> >>>> user could mess up. (Which often also result in preventing legitimate
> >>>> cases like here a quorum of read-only nodes.)
> >>>
> >>> Well, if you ask the question “where do we stop”, we also have to ask
> >>> the question “where do we start”.  If we say the user knows what they’re
> >>> doing, we might as well drop the whole can_replace infrastructure
> >>> altogether and just assume that you can replace any node by anything.
> >>
> >> Well, I don't actually know if that would be completely unreasonable.
> >> The idea was obviously to keep graph changes restricted to very specific
> >> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> >> have quite a few more operations that allow changing the graph.
> >>
> >> So if preventing some cases gives us headaches and is probably more work
> >> than dealing with any bugs they might reveal, maybe preventing them is
> >> wrong.
> >>
> >> I'm just afraid that we might be overengineering this and waste time on
> >> things that we don't actually get much use from.
> > 
> > That’s why I’m asking.
> 
> (One thing to consider here, though, is that this series exists and has
> been reviewed by Vladimir in full, so most of the engineering effort has
> already been done.  In contrast, writing a new series to drop the whole
> can_replace infrastructure with no replacement may actually cost more.)

Fair enough. If the artificial permission changes didn't feel so wrong,
I think I would just say "let's merge it and forget about it". But they
do feel wrong, so I'm not as sure.

Kevin