qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v6 04/42] block: Add child access functions
Date: Tue, 10 Sep 2019 13:36:47 +0200	[thread overview]
Message-ID: <00aa6729-5fa0-31e0-8af5-1a91ae034f28@redhat.com> (raw)
In-Reply-To: <20190910104748.GC4446@localhost.localdomain>


[-- Attachment #1.1: Type: text/plain, Size: 12125 bytes --]

On 10.09.19 12:47, Kevin Wolf wrote:
> Am 10.09.2019 um 11:14 hat Max Reitz geschrieben:
>> On 09.09.19 18:13, Kevin Wolf wrote:
>>> Am 09.09.2019 um 16:04 hat Max Reitz geschrieben:
>>>> On 09.09.19 11:36, Kevin Wolf wrote:
>>>>> Am 09.09.2019 um 09:56 hat Max Reitz geschrieben:
>>>>>> On 04.09.19 18:16, Kevin Wolf wrote:
>>>>>>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>>>>>>>> There are BDS children that the general block layer code can access,
>>>>>>>> namely bs->file and bs->backing.  Since the introduction of filters and
>>>>>>>> external data files, their meaning is not quite clear.  bs->backing can
>>>>>>>> be a COW source, or it can be an R/W-filtered child; bs->file can be an
>>>>>>>> R/W-filtered child, it can be data and metadata storage, or it can be
>>>>>>>> just metadata storage.
>>>>>>>>
>>>>>>>> This overloading really is not helpful.  This patch adds function that
>>>>>>>> retrieve the correct child for each exact purpose.  Later patches in
>>>>>>>> this series will make use of them.  Doing so will allow us to handle
>>>>>>>> filter nodes and external data files in a meaningful way.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>
>>>>>>> Each time I look at this patch, I'm confused by the function names.
>>>>>>> Maybe I should just ask what the idea there was, or more specifically:
>>>>>>> What does the "filtered" in "filtered child" really mean?
>>>>>>>
>>>>>>> Apparently any child of a filter node is "filtered" (which makes sense),
>>>>>>
>>>>>> It isn’t, filters can have non-filter children.  For example, backup-top
>>>>>> could have the source as a filtered child and the target as a non-filter
>>>>>> child.
>>>>>
>>>>> Hm, okay, makes sense. I had a definition in mind that says that filter
>>>>> nodes only have a single child node. Is it that a filter may have only a
>>>>> single _filtered_ child node?
>>>>
>>>> Well, there’s Quorum...
>>>
>>> Ah, nice, quorum sets is_filter = true even though it neither fulfulls
>>> the conditions for it before this series, nor the changed conditions
>>> after this series.
>>>
>>> So either quorum lies and isn't actually a filter driver, or our
>>> definition in the documentation of is_filter is wrong.
>>
>> You could say it lies because in FIFO mode it clearly isn’t a filter for
>> all of its children.
>>
>> There is a reason for lying, though, which is
>> bdrv_recurse_is_first_non_filter(), which is necessary to use the whole
>> to_replace mirror stuff.
> 
> Hm, actually, now that you mention bdrv_recurse_is_first_non_filter(),
> quorum was the first driver to declare itself a filter, so strictly
> speaking, if there is an inconsistency, it's the other uses that are
> abusing the field...
> 
>> (You mirror from a quorum with a failed child and then replace the
>> failed child.  mirror needs to ensure that there are only R/W filters
>> between the child and the mirror source so that replacing it will not
>> suddenly change any visible data.  Which is actually a lie for quorum,
>> because the child is clearly broken and thus precisely doesn’t show the
>> same data...)
>>
>> Maybe we should stop declaring Quorum a filter and then rename the
>> bdrv_recurse_is_first_non_filter() to, I don’t know,
>> bdrv_recurse_can_be_replaced_by_mirror()?
> 
> Why not.

It feels difficult to do in this series because this is a whole new can
of worms.

In patch 35, I actually replace the mirror use case by
is_filtered_child().  So it looks to me as if that should not be done,
because I should instead fix bdrv_recurse_is_first_non_filter() (and
rename it), because quorum does allow replacing its children by mirror,
even if it does not act as a filter for them.

OTOH, there are other users of bdrv_is_first_non_filter().  Those are
qmp_block_resize() and external_snapshot_prepare(), who throw an error
if that returns false.

I think that’s just wrong.  First of all, I don’t even know why we have
that restriction anymore (I can imagine why it used to make sense before
the permission system).  qmp_block_resize() should always work as long
as it can get BLK_PERM_RESIZE; and I don’t know why the parents of some
node would care if you take a snapshot of their child.

>>>>>>> but also bs->backing of a qcow2 image, while bs->file of qcow2 isn't.
>>>>>>> raw doesn't have any "filtered" child. What's the system behind this?
>>>>>>
>>>>>> “filtered” means: If the parent node returns data from this child, it
>>>>>> won’t modify it, neither its content nor its position.  COW and R/W
>>>>>> filters differ in how they handle writes; R/W filters pass them through
>>>>>> to the filtered child, COW filters copy them off to some other child
>>>>>> node (and then the filtered child’s data will no longer be visible at
>>>>>> that location).
>>>>>
>>>>> But there is no reason why a node couldn't fulfill this condition for
>>>>> more than one child node. bdrv_filtered_child() isn't well-defined then.
>>>>> Technically, the description "Return any filtered child" is correct
>>>>> because "any" can be interpreted as "an arbitrary", but obviously that
>>>>> makes the function useless.
>>>>
>>>> Which is why it currently returns NULL for Quorum.
>>>
>>> Which is about the only possible choice that breaks the contract...
>>>
>>>  * Return any filtered child, independently of how it reacts to write
>>
>> I don’t know if you’re serious about this proposition, because I don’t
>> know whether that could be useful in any way. :-?
> 
> Huh? This is just quoting the contract from your code?

I see.  I was thinking about “any of COW/RW, of which only one exists”.
 There is an assertion for that (that only one filtered child exists at
a time) in the code.  (And I consider assertions part of the contract.)

>>>  * accesses and whether data is copied onto this BDS through COR.
>>
>> I meant the contract as “Return the single filtered child there is, or NULL”
> 
> Then that should probably be spelt out in the contract.Probably even
> explicitly "NULL if there is either no filtered child or multiple
> filtered children".

Well, it’s spelled out through the assertion, but not in the
documentation, yes.

>>> Maybe the documentation of bdrv_filtered_child() needs to be rephrased?
>>>
>>> Going back to qcow2, it's really not much different as it has multiple
>>> (two) filtered children, too.
>>
>> Well, it doesn’t.  It isn’t an R/W filter.
> 
> What do I have to look at to see whether something is an R/W filter or
> not? qcow2 matches your criteria for an R/W filter.

No.  Some qcow2 nodes match the criteria.  But not all, which makes the
qcow2 driver not a filter driver.

> You say that it's
> not useful, so it's not an R/W filter anyway. But where in the code
> could I get this information?

“Where in the code”?  Do you want to add a comment to every BlockDriver
structure on why it does or doesn’t set .is_filter?

> This just doesn't make sense to me. If a driver matches the criteria for
> an R/W filter, then it should be one. If qcow2 should not be considered
> a R/W filter, then the criteria must be changed so that it isn't.

See below.

>> Maybe what we actually need to rephrase is the definition of .is_filter.
>>  (Namely something along the lines of “Fulfills these guarantees (same
>> data, etc. pp.), *and* should be skipped for allocation information
>> queries etc.”.
> 
> Hm - does this imply that .is_filter == this is a R/W filter? Because
> this was never spelt out, neither in code comments nor in commit
> messages.

While I’m not a fan of comment-less code, I do think that it’s possible
to read code.  Which clearly stated this.

> If we called R/W filters just "filters" (which makes it obvious how it
> relates to .is_filter) and COW nodes something that doesn't include the
> word "filter", things might become a lot clearer.

Because you apparently wrote this before reading that I agreed to your
renaming proposal, I now feel free to argue that I could just as well
rename .is_filter to .is_rw_filter.

Obviously I won’t because I prefer your proposal.

[...]

>>>>> Specficially, according to your definition, qcow2 filters both the
>>>>> backing file (COW filter) and the external data file (R/W filter).
>>>>
>>>> Not wrong.  But the same question as for raw arises: Is there any use to
>>>> declaring qcow2 an R/W filter driver just because it fits the definition?
>>>
>>> Wait, where is there even a place where this could be declared?
>>>
>>> The once thing I see that a driver even can declare is drv->is_filter,
>>> which is about the whole driver and not about nodes. It is false for
>>> qcow2.
>>
>> That’s correct.  But that’s not a fundamental problem, of course, we
>> could make it a per-BDS attribute if that made sense.
> 
> I was thinking per-child, actually, because you declare one BdrvChild
> filtered and another not filtered.

Why don’t you say so from the start then?

(Sorry, but honestly about 30 % of this discussion to me feels like
you’re playing games with me.  Please don’t take this the wrong way, I
mean it very neutrally.  It’s just that I feel like I’m explaining
things to you that you very much know, but you just want me to say them.
 And that feels unproductive and sometimes indeed frustrating.)

One thing is that this wouldn’t make the quorum case any easier because
it actually doesn’t know for which children it acts as a filter and for
which it doesn’t.

> But by now I think most of the confusion is really just a result of COW
> being considered a filter in some respects (mainly just the names of the
> child access functions), but not in others (like .is_filter).

I don’t quite see how it’s “by now” when in your first mail you already
basically wrote that functionally, everything works (leaving out
quorum), but that you’re confused (or claim to be confused, I have no
idea what’s real and what’s pretended anymore) by the names.


We have come to two results, as far as I can see:

First, naming COW backing nodes “COW filtered children” clashes with our
existing use of ”filter”.  There is no point in forcing the ”filter”
label on everything.  We can just keep calling (R/W) filters filters and
COW backing children COW children.  The names are succinct enough.

In some cases, we don’t care whether something is a COW or filtered
child, in such a case a caller can be bothered to use the slightly
longer bdrv_cow_or_filtered_child().


Second, most of the time we want a filter node to have a clear and
unique path to go down.  This is the important property of filters: That
you can skip them and go to the node that actually has the data.

Quorum breaks this by having multiple children, and nobody knows which
of them has the data we will see on the next read operation.

All “filters” who could have multiple children would have this problem.
 Hence a filter must always have a single unique data child.  I think.

[...]

>>> Either use a narrow definition, or use a broad one. But use only one and
>>> use it consistently.
>>
>> I think the problem appears because you restrict the process to a single
>> step where there’s actually two.
>>
>> Drivers can be either
>> (1) R/W filters (e.g. throttle)
>> (2) COW filters (e.g. qcow2)
>> (3) None of the above (e.g. vhdx, curl)
>>
>> This choice is made on the driver level, not on the node level (for good
>> reason, see below*).
> 
> What prevents a driver from being
> (4) COW filter and R/W filter (e.g. qcow2 if it were useful)?
> 
> I mean, conceptually, not in the implementation.

An R/W filter always shows the same data as the filtered child.  So the
COW child‘s data can never be visible, and as such you couldn’t have a
COW child at the same time.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-09-10 11:49 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09 16:13 [Qemu-devel] [PATCH v6 00/42] block: Deal with filters Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 01/42] block: Mark commit and mirror as filter drivers Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 02/42] copy-on-read: Support compressed writes Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 03/42] throttle: " Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 04/42] block: Add child access functions Max Reitz
2019-08-09 16:56   ` Eric Blake
2019-09-04 16:16   ` Kevin Wolf
2019-09-09  7:56     ` Max Reitz
2019-09-09  9:36       ` Kevin Wolf
2019-09-09 14:04         ` Max Reitz
2019-09-09 16:13           ` Kevin Wolf
2019-09-10  9:14             ` Max Reitz
2019-09-10 10:47               ` Kevin Wolf
2019-09-10 11:36                 ` Max Reitz [this message]
2019-09-10 12:48                   ` Kevin Wolf
2019-09-10 12:59                     ` Max Reitz
2019-09-10 13:10                       ` Kevin Wolf
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 05/42] block: Add chain helper functions Max Reitz
2019-08-09 17:01   ` Eric Blake
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 06/42] qcow2: Implement .bdrv_storage_child() Max Reitz
2019-08-09 17:07   ` Eric Blake
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 07/42] block: *filtered_cow_child() for *has_zero_init() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 08/42] block: bdrv_set_backing_hd() is about bs->backing Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 09/42] block: Include filters when freezing backing chain Max Reitz
2019-08-10 13:32   ` Vladimir Sementsov-Ogievskiy
2019-08-12 12:56     ` Max Reitz
2019-09-05 13:05   ` Kevin Wolf
2019-09-09  8:02     ` Max Reitz
2019-09-09  9:40       ` Kevin Wolf
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 10/42] block: Drop bdrv_is_encrypted() Max Reitz
2019-08-10 13:42   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 11/42] block: Add bdrv_supports_compressed_writes() Max Reitz
2019-09-05 13:11   ` Kevin Wolf
2019-09-09  8:09     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 12/42] block: Use bdrv_filtered_rw* where obvious Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 13/42] block: Use CAFs in block status functions Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 14/42] block: Use CAFs when working with backing chains Max Reitz
2019-08-10 15:19   ` Vladimir Sementsov-Ogievskiy
2019-09-05 14:05   ` Kevin Wolf
2019-09-09  8:25     ` Max Reitz
2019-09-09  9:55       ` Kevin Wolf
2019-09-09 14:08         ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 15/42] block: Re-evaluate backing file handling in reopen Max Reitz
2019-08-10 16:05   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 16/42] block: Flush all children in generic code Max Reitz
2019-08-10 15:36   ` Vladimir Sementsov-Ogievskiy
2019-08-12 12:58     ` Max Reitz
2019-09-05 16:24       ` Kevin Wolf
2019-09-09  8:31         ` Max Reitz
2019-09-09 10:01           ` Kevin Wolf
2019-09-09 14:15             ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 17/42] block: Use CAFs in bdrv_refresh_limits() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 18/42] block: Use CAFs in bdrv_refresh_filename() Max Reitz
2019-08-10 16:22   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 19/42] block: Use CAF in bdrv_co_rw_vmstate() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 20/42] block/snapshot: Fix fallback Max Reitz
2019-08-10 16:34   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:06     ` Max Reitz
2019-09-10 11:56   ` Kevin Wolf
2019-09-10 12:04     ` Max Reitz
2019-09-10 12:49       ` Kevin Wolf
2019-09-10 13:06         ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 21/42] block: Use CAFs for debug breakpoints Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback Max Reitz
2019-08-10 16:41   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:09     ` Max Reitz
2019-08-12 17:14       ` Vladimir Sementsov-Ogievskiy
2019-08-12 19:15         ` Max Reitz
2019-09-10 14:52   ` Kevin Wolf
2019-09-11  6:20     ` Max Reitz
2019-09-11  6:55       ` Kevin Wolf
2019-09-11  7:37         ` Max Reitz
2019-09-11  8:27           ` Kevin Wolf
2019-09-11 10:00             ` Max Reitz
2019-09-11 10:31               ` Kevin Wolf
2019-09-11 11:00                 ` Max Reitz
2019-09-12 10:34                   ` Kevin Wolf
2019-11-14 13:11                   ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 23/42] blockdev: Use CAF in external_snapshot_prepare() Max Reitz
2019-09-10 15:02   ` Kevin Wolf
2019-09-11  6:21     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 24/42] block: Use child access functions for QAPI queries Max Reitz
2019-08-10 16:57   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 25/42] mirror: Deal with filters Max Reitz
2019-08-12 11:09   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:26     ` Max Reitz
2019-08-14 15:17       ` Vladimir Sementsov-Ogievskiy
2019-08-31  9:57   ` Vladimir Sementsov-Ogievskiy
2019-09-02 14:35     ` Max Reitz
2019-09-03  8:32       ` Vladimir Sementsov-Ogievskiy
2019-09-09  7:41         ` Max Reitz
2019-09-13 12:55   ` Kevin Wolf
2019-09-16 10:26     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 26/42] backup: " Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 27/42] commit: " Max Reitz
2019-08-31 10:44   ` Vladimir Sementsov-Ogievskiy
2019-09-02 14:55     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 28/42] stream: " Max Reitz
2019-08-12 11:55   ` Vladimir Sementsov-Ogievskiy
2019-09-13 14:16   ` Kevin Wolf
2019-09-16  9:52     ` Max Reitz
2019-09-16 14:47       ` Kevin Wolf
2019-12-11 12:52       ` Max Reitz
2019-12-11 15:52         ` Kevin Wolf
2019-12-11 16:12           ` Max Reitz
2019-12-11 16:35             ` Kevin Wolf
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 29/42] nbd: Use CAF when looking for dirty bitmap Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 30/42] qemu-img: Use child access functions Max Reitz
2019-08-12 12:14   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:28     ` Max Reitz
2019-08-14 16:04   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 31/42] block: Drop backing_bs() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 32/42] block: Make bdrv_get_cumulative_perm() public Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 33/42] blockdev: Fix active commit choice Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 34/42] block: Inline bdrv_co_block_status_from_*() Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node() Max Reitz
2019-08-15 15:21   ` Vladimir Sementsov-Ogievskiy
2019-08-15 17:01     ` Max Reitz
2019-08-16 11:01       ` Vladimir Sementsov-Ogievskiy
2019-08-16 13:30         ` Max Reitz
2019-08-16 14:24           ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 36/42] iotests: Add tests for mirror @replaces loops Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 37/42] block: Leave BDS.backing_file constant Max Reitz
2019-08-16 16:16   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 38/42] iotests: Let complete_and_wait() work with commit Max Reitz
2019-08-23  5:59   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 39/42] iotests: Add filter commit test cases Max Reitz
2019-08-31 11:41   ` Vladimir Sementsov-Ogievskiy
2019-09-02 15:06     ` Max Reitz
2019-08-31 12:35   ` Vladimir Sementsov-Ogievskiy
2019-09-02 15:09     ` Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 40/42] iotests: Add filter mirror " Max Reitz
2019-08-31 12:35   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 41/42] iotests: Add test for commit in sub directory Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 42/42] iotests: Test committing to overridden backing Max Reitz
2019-09-03  9:18   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=00aa6729-5fa0-31e0-8af5-1a91ae034f28@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).