All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions
Date: Tue, 7 May 2019 15:15:58 +0200	[thread overview]
Message-ID: <cf8740d9-9a28-e4a8-f3aa-5d5caa43f791@redhat.com> (raw)
In-Reply-To: <36a2896f-003a-49ce-0bbf-cf59faf67e0a@virtuozzo.com>

[-- Attachment #1: Type: text/plain, Size: 15287 bytes --]

On 07.05.19 11:32, Vladimir Sementsov-Ogievskiy wrote:
> 24.04.2019 19:36, Max Reitz wrote:
>> On 19.04.19 12:23, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.04.2019 19:22, Max Reitz wrote:
>>>> On 16.04.19 12:02, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 10.04.2019 23:20, Max Reitz wrote:
>>>>>> What bs->file and bs->backing mean depends on the node.  For filter
>>>>>> nodes, both signify a node that will eventually receive all R/W
>>>>>> accesses.  For format nodes, bs->file contains metadata and data, and
>>>>>> bs->backing will not receive writes -- instead, writes are COWed to
>>>>>> bs->file.  Usually.
>>>>>>
>>>>>> In any case, it is not trivial to guess what a child means exactly with
>>>>>> our currently limited form of expression.  It is better to introduce
>>>>>> some functions that actually guarantee a meaning:
>>>>>>
>>>>>> - bdrv_filtered_cow_child() will return the child that receives requests
>>>>>>      filtered through COW.  That is, reads may or may not be forwarded
>>>>>>      (depending on the overlay's allocation status), but writes never go to
>>>>>>      this child.
>>>>>>
>>>>>> - bdrv_filtered_rw_child() will return the child that receives requests
>>>>>>      filtered through some very plain process.  Reads and writes issued to
>>>>>>      the parent will go to the child as well (although timing, etc. may be
>>>>>>      modified).
>>>>>>
>>>>>> - All drivers but quorum (but quorum is pretty opaque to the general
>>>>>>      block layer anyway) always only have one of these children: All read
>>>>>>      requests must be served from the filtered_rw_child (if it exists), so
>>>>>>      if there was a filtered_cow_child in addition, it would not receive
>>>>>>      any requests at all.
>>>>>>      (The closest here is mirror, where all requests are passed on to the
>>>>>>      source, but with write-blocking, write requests are "COWed" to the
>>>>>>      target.  But that just means that the target is a special child that
>>>>>>      cannot be introspected by the generic block layer functions, and that
>>>>>>      source is a filtered_rw_child.)
>>>>>>      Therefore, we can also add bdrv_filtered_child() which returns that
>>>>>>      one child (or NULL, if there is no filtered child).
>>>>>>
>>>>>> Also, many places in the current block layer should be skipping filters
>>>>>> (all filters or just the ones added implicitly, it depends) when going
>>>>>> through a block node chain.  They do not do that currently, but this
>>>>>> patch makes them.
>>>>>>
>>>>>> One example for this is qemu-img map, which should skip filters and only
>>>>>> look at the COW elements in the graph.  The change to iotest 204's
>>>>>> reference output shows how using blkdebug on top of a COW node used to
>>>>>> make qemu-img map disregard the rest of the backing chain, but with this
>>>>>> patch, the allocation in the base image is reported correctly.
>>>>>>
>>>>>> Furthermore, a note should be made that sometimes we do want to access
>>>>>> bs->backing directly.  This is whenever the operation in question is not
>>>>>> about accessing the COW child, but the "backing" child, be it COW or
>>>>>> not.  This is the case in functions such as bdrv_open_backing_file() or
>>>>>> whenever we have to deal with the special behavior of @backing as a
>>>>>> blockdev option, which is that it does not default to null like all
>>>>>> other child references do.
>>>>>>
>>>>>> Finally, the query functions (query-block and query-named-block-nodes)
>>>>>> are modified to return any filtered child under "backing", not just
>>>>>> bs->backing or COW children.  This is so that filters do not interrupt
>>>>>> the reported backing chain.  This changes the output of iotest 184, as
>>>>>> the throttled node now appears as a backing child.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>     qapi/block-core.json           |   4 +
>>>>>>     include/block/block.h          |   1 +
>>>>>>     include/block/block_int.h      |  40 +++++--
>>>>>>     block.c                        | 210 +++++++++++++++++++++++++++------
>>>>>>     block/backup.c                 |   8 +-
>>>>>>     block/block-backend.c          |  16 ++-
>>>>>>     block/commit.c                 |  33 +++---
>>>>>>     block/io.c                     |  45 ++++---
>>>>>>     block/mirror.c                 |  21 ++--
>>>>>>     block/qapi.c                   |  30 +++--
>>>>>>     block/stream.c                 |  13 +-
>>>>>>     blockdev.c                     |  88 +++++++++++---
>>>>>>     migration/block-dirty-bitmap.c |   4 +-
>>>>>>     nbd/server.c                   |   6 +-
>>>>>>     qemu-img.c                     |  29 ++---
>>>>>>     tests/qemu-iotests/184.out     |   7 +-
>>>>>>     tests/qemu-iotests/204.out     |   1 +
>>>>>>     17 files changed, 411 insertions(+), 145 deletions(-)
>>>>>
>>>>> really huge... didn't you consider conversion file-by-file?
>>>>
>>>> Frankly, no, I just didn’t consider it.
>>>>
>>>> Hm.  I don’t know, 30-patch series always look so frightening.
>>>>
>>>>>> diff --git a/block.c b/block.c
>>>>>> index 16615bc876..e8f6febda0 100644
>>>>>> --- a/block.c
>>>>>> +++ b/block.c
>>>>>
>>>>> [..]
>>>>>
>>>>>>     
>>>>>> @@ -3467,14 +3469,17 @@ static int bdrv_reopen_parse_backing(BDRVReopenState *reopen_state,
>>>>>>         /*
>>>>>>          * Find the "actual" backing file by skipping all links that point
>>>>>>          * to an implicit node, if any (e.g. a commit filter node).
>>>>>> +     * We cannot use any of the bdrv_skip_*() functions here because
>>>>>> +     * those return the first explicit node, while we are looking for
>>>>>> +     * its overlay here.
>>>>>>          */
>>>>>>         overlay_bs = bs;
>>>>>> -    while (backing_bs(overlay_bs) && backing_bs(overlay_bs)->implicit) {
>>>>>> -        overlay_bs = backing_bs(overlay_bs);
>>>>>> +    while (overlay_bs->backing && bdrv_filtered_bs(overlay_bs)->implicit) {
>>>>>
>>>>> So, you don't want to skip implicit filters with 'file' child? Then, why not to use
>>>>> child_bs(overlay_bs->backing), like in following if condition?
>>>>
>>>> I think it was an artifact of writing the patch.  I started with
>>>> bdrv_filtered_bs() and then realized this depends on ->backing,
>>>> actually.  There was no functional difference so I left it as it was.
>>>>
>>>> But you’re right, it is more clear to use child_bs(overlay_bs->backing)
>>>> isntead.
>>>>
>>>>> Could we instead make backing-based filters equal to file-based, to make it possible
>>>>> to use file-based filters in backing-chain related scenarios (like upcoming copy-on-read
>>>>> filter for stream)? So, to expand backing-chain concept to include filters with file child?
>>>>
>>>> If I understand you correctly, that’s basically the purpose of this
>>>> series and especially this patch here.  As far as it is possible and
>>>> reasonable, I want filters that use bs->backing and bs->file behave the
>>>> same.
>>>>
>>>> However, there are cases where this is not possible and
>>>> bdrv_reopen_parse_backing() is one such case.  bs->backing and bs->file
>>>> correspond to QAPI names, namely 'backing' and 'file'.  If that
>>>> distinction was already visible to the user, we cannot change it now.
>>>>
>>>> We definitely cannot make file-based filters use bs->backing now because
>>>> you can create them over QAPI and they use 'file' as their child name.
>>>> Can we make backing-based filters use bs->file?  Seems more likely,
>>>> because all of them are implicit nodes, so the user usually doesn’t see
>>>> them.  But usually isn’t always; they do become user-visible once the
>>>> user specifies a node-name for mirror or commit.
>>>>
>>>> I found it more reasonable to introduce new functions that explicitly
>>>> express what kind of child they expect and then apply them everywhere as
>>>> I saw fit, instead of making the mirror/commit filter drivers use
>>>> bs->file and hope it works; not least because I’d still have to go
>>>> through the whole block layer and check every instance of bs->backing to
>>>> see whether it really needs bs->backing or whether it should use either
>>>> of bs->backing or bs->file.
>>>>
>>>>>> +        overlay_bs = bdrv_filtered_bs(overlay_bs);
>>>>>>         }
>>>>>>     
>>>>>>         /* If we want to replace the backing file we need some extra checks */
>>>>>> -    if (new_backing_bs != backing_bs(overlay_bs)) {
>>>>>> +    if (new_backing_bs != child_bs(overlay_bs->backing)) { >           /* Check for implicit nodes between bs and its backing file */
>>>>>>             if (bs != overlay_bs) {
>>>>>>                 error_setg(errp, "Cannot change backing link if '%s' has "
>>>>>
>>>>> [..]
>>>>>
>>>>>> @@ -4203,8 +4208,8 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>>>>>     BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>>>>>>                                         BlockDriverState *bs)
>>>>>>     {
>>>>>> -    while (active && bs != backing_bs(active)) {
>>>>>> -        active = backing_bs(active);
>>>>>> +    while (active && bs != bdrv_filtered_bs(active)) {
>>>>>
>>>>> hmm and here you actually support backing-chain with file-child-based filters in it..
>>>>
>>>> Yes, because this is not about the QAPI 'backing' link.  This function
>>>> should continue to work even if there are filters in the backing chain.
>>>
>>> this is a generic function to find overlay in backing chain and it may be used from different places,
>>> for example it is used in Andrey's series about filter for block-stream.
>>
>> Well, all places that use it accept backing chains with filters inside
>> of them.
>>
>>> It is used from qmp_block_commit, isn't it about QAPI?
>>
>> By "QAPI 'backing' link" I mean the user-visible block graph.  Hm.  I
>> wrote in my other mail that you could use query-named-block-nodes to see
>> that graph; apparently you can’t.  So besides x-debug-query-block-graph,
>> we still don’t have any facility to query the block graph?  I don’t know
>> what to say.
>>
>> Anyway, you can still construct the graph with blockdev-add, so it is
>> user-visible.  And in that block graph, there is a 'backing' link, and
>> there is a 'file' link -- this is what I mean with "QAPI link".
>>
>> We have commands that are abstract and don’t work on specific graph
>> links.  For instance, block-commit commits across a backing chain, so it
>> doesn’t matter whether the graph link is called 'backing' or whatever,
>> what is important is that it’s a COW link.  But we should also ignore
>> filters on the way, so this patch makes block-commit and others use
>> those more abstract child access functions.
>>
>> But whenever it is about exactly the "file" or the "backing" link, we
>> have to use bs->file and bs->backing, respectively.  That's just how it
>> currently is.
>>
>>>>>> +        active = bdrv_filtered_bs(active);
>>>>>>         }
>>>>>>     
>>>>>>         return active;
>>>>>> @@ -4226,11 +4231,11 @@ bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
>>>>>>     {
>>>>>>         BlockDriverState *i;
>>>>>>     
>>>>>> -    for (i = bs; i != base; i = backing_bs(i)) {
>>>>>> +    for (i = bs; i != base; i = child_bs(i->backing)) {
>>>>>
>>>>> and here don't..
>>>>
>>>> Yes, because this function is about the QAPI 'backing' link.
>>>
>>> And this again a generic thing, that may be used in same places as bdrv_find_overlay,
>>
>> But it isn’t.
>>
>>> and it is used in series about block-stream filter too. So, for further developments
>>> we'll have to keep in mind all these differences between generic block layer functions,
>>> which supports .file children inside backing chain and which are not...
>>
>> I was wrong about bdrv_is_backing_chain_frozen(), if that helps (as I
>> wrote in my other (previous) mail).
>>
>> But for example bdrv_set_backing_hd() always has to use bs->backing,
>> because that’s what it’s about (and I do change its descriptive comment
>> to reflect that, so you don’t need to keep it in mind).  Same for
>> bdrv_open_backing_file().
>>
>> Hm, what other cases are there...
>>
>> bdrv_reopen_parse_backing(): Fundamentally, this too is about the
>> user-visible "backing" link (as specified through x-blockdev-reopen).
>> But the loop it contains is more difficult to translate than I had
>> thought.  At some point, there needs to be a bs->backing link, because
>> that is what this function is about, but it should also skip all
>> implicit filters in the way, I think.  So e.g. this should be recognized:
>>
>> bs  ---backing-->  COR ---file-->  base
>>
>> @overlay_bs should be COR, I think...?  I mean, as long as COR is an
>> implicit node.  So the loop really should use bdrv_filtered_bs()
>> everywhere, and then the same afterwards.  I think that we should also
>> ensure that @bs can support a ->backing child, but how would I check
>> that?  Maybe it’s safe to just omit such a check...
>>
>> But then another issue comes in: The link to replace (in the above case
>> from "COR" to "base") is no longer necessarily a backing link.  So
>> bdrv_reopen_commit() has to be capable of replacing both bs->backing and
>> bs->file.
>>
>> Actually, how does bdrv_reopen_commit() handle implicit nodes at all?
>> bdrv_reopen_parse_backing() just sets reopen_state->replace_backing_bs
>> and ->new_backing_bs.  It doesn’t communicate anything about overlay_bs.
>>   bdrv_reopen_commit() then asserts that !bs->backing->bs->implicit and
>> replaces bs->backing.  So it seems to just fail on the implicit nodes
>> that bdrv_reopen_parse_backing() took care to skip...
>>
>>
>> OK, what else...  bdrv_reopen_prepare() checks
>> reopen_state->bs->backing, which I claim is correct because while there
>> may be implicit filters in the chain, the first link has to be a
>> ->backing link.
> 
> [sorry for a long delay]
> Are you working on next version or waiting for more reviews?

I haven’t worked on the next version yet, but that’s just because other
things were more important, not because of reviews.

> Why first link should be backing? We want to skip all implicit filters, including
> file-child-based in following call to bdrv_reopen_parse_backing(). So, don't we
> want something like bdrv_backing_chain_next() here? But then a question, could
> reopen_state->bs be filter itself...

Because this function is about the 'backing' option.  As I explained
above, this must correspond to a bs->backing child.  If there is an
implicit filter, it will still be under bs->backing.

Max

>> bdrv_backing_overridden() has to query bs->backing because this function
>> is used when it is about a specific characteristic of the backing link:
>> There is a non-null default (given by the image header), so if the
>> current bs->backing matches this default, you do not have to specify the
>> backing filename in either blockdev-add or a filename.  Same in
>> bdrv_refresh_filename().
>>
>>
>> I hope that was all...?
>>
>> Max
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-05-07 13:17 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-10 20:20 [Qemu-devel] [PATCH v4 00/11] block: Deal with filters Max Reitz
2019-04-10 20:20 ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 01/11] block: Mark commit and mirror as filter drivers Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 02/11] block: Filtered children access functions Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-16 10:02   ` Vladimir Sementsov-Ogievskiy
2019-04-17 16:22     ` Max Reitz
2019-04-18  8:36       ` Vladimir Sementsov-Ogievskiy
2019-04-24 15:23         ` Max Reitz
2019-04-19 10:23       ` Vladimir Sementsov-Ogievskiy
2019-04-24 16:36         ` Max Reitz
2019-05-07  9:32           ` Vladimir Sementsov-Ogievskiy
2019-05-07 13:15             ` Max Reitz [this message]
2019-05-07 13:33               ` Vladimir Sementsov-Ogievskiy
2019-05-31 16:26     ` Max Reitz
2019-05-31 17:02       ` Max Reitz
2019-05-07 13:30   ` Vladimir Sementsov-Ogievskiy
2019-05-07 15:13     ` Max Reitz
2019-05-17 11:50       ` Vladimir Sementsov-Ogievskiy
2019-05-23 14:49         ` Max Reitz
2019-05-23 15:08           ` Vladimir Sementsov-Ogievskiy
2019-05-23 15:56             ` Max Reitz
2019-05-17 14:50   ` Vladimir Sementsov-Ogievskiy
2019-05-23 17:27     ` Max Reitz
2019-05-24  8:12       ` Vladimir Sementsov-Ogievskiy
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 03/11] block: Storage child access function Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-05-20 10:41   ` Vladimir Sementsov-Ogievskiy
2019-05-28 18:09     ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 04/11] block: Inline bdrv_co_block_status_from_*() Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-05-21  8:57   ` Vladimir Sementsov-Ogievskiy
2019-05-28 17:58     ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 05/11] block: Fix check_to_replace_node() Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 06/11] iotests: Add tests for mirror @replaces loops Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 07/11] block: Leave BDS.backing_file constant Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 08/11] iotests: Add filter commit test cases Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 09/11] iotests: Add filter mirror " Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 10/11] iotests: Add test for commit in sub directory Max Reitz
2019-04-10 20:20   ` Max Reitz
2019-04-10 20:20 ` [Qemu-devel] [PATCH v4 11/11] iotests: Test committing to overridden backing Max Reitz
2019-04-10 20:20   ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf8740d9-9a28-e4a8-f3aa-5d5caa43f791@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.