qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback
Date: Wed, 11 Sep 2019 12:00:40 +0200	[thread overview]
Message-ID: <aa9a6a3d-73ae-c7d4-78c9-a9c6c18fcfa4@redhat.com> (raw)
In-Reply-To: <20190911082743.GC4907@localhost.localdomain>


[-- Attachment #1.1: Type: text/plain, Size: 8271 bytes --]

On 11.09.19 10:27, Kevin Wolf wrote:
> Am 11.09.2019 um 09:37 hat Max Reitz geschrieben:
>> On 11.09.19 08:55, Kevin Wolf wrote:
>>> Am 11.09.2019 um 08:20 hat Max Reitz geschrieben:
>>>> On 10.09.19 16:52, Kevin Wolf wrote:
>>>>> Am 09.08.2019 um 18:13 hat Max Reitz geschrieben:
>>>>>> If the driver does not implement bdrv_get_allocated_file_size(), we
>>>>>> should fall back to cumulating the allocated size of all non-COW
>>>>>> children instead of just bs->file.
>>>>>>
>>>>>> Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>
>>>>> This smells like an overgeneralisation, but if we want to count all vmdk
>>>>> extents, the qcow2 external data file, etc. it's an improvement anyway.
>>>>> A driver that has a child that should not be counted must just remember
>>>>> to implement the callback.
>>>>>
>>>>> Let me think of an example... How about quorum, for a change? :-)
>>>>> Or the second blkverify child.
>>>>>
>>>>> Or eventually the block job filter nodes.
>>>>
>>>> I actually think it makes sense for all of these nodes to report the sum
>>>> of all of their children’s allocated sizes.
>>>
>>> Hm... Yes, in a way. But not much more than it would make sense to
>>> report the sum of the sizes of all images in the whole backing chain
>>> (this is a useful thing to ask for, just maybe not the right thing to
>>> return for a low-level interface). But I can accept that it's maybe a
>>> bit more expected for quorum and blkverify than for COW images.
>>>
>>> If you include the block job filter nodes, I have to disagree, though.
>>> If mirror_top_bs (or any other job filter) sits in the middle of the
>>> source chain, then I certainly don't want to see the target size added
>>> to it.
>>
>> Hm, I don’t care much either way.  I think it makes complete sense to
>> add the target size there, but OTOH it’s only temporary while the job
>> runs, so it may be a bit confusing if it suddenly goes up and then down
>> again.
> 
> I think the number that most users are interested in is knowing how much
> space the image for their /dev/vda takes up on the host.
> 
> I can see how they might be interested in not only that one image file,
> but all other image files connected to it, i.e. their /dev/vda with all
> of its snapshots. This would mean counting backing files. I think adding
> up the numbers for this should be done in the management layer.

My main argument against counting backing files is that we’ve never done it.

(Whereas for quorum, I’d argue we just forgot to adjust
bdrv_get_allocated_file_size() for it.)

> I can possibly also imagine users wanting to count everything that's
> even loosely connected to their /dev/vda, like copies of it. I doubt,
> however, they want to count only copies that are currently being made,
> but not snapshots and copies that have been completed earlier. So this
> is clearly a management layer thing, too.

OK.

>> But I think this is the special case, so this is what should be handled
>> in a driver callback.
> 
> It's a special case, yes. But see below.
> 
>>>> If a quorum node has three children with allocated sizes of 3 MB, 1 MB,
>>>> and 2 MB, respectively (totally possible if some have explicit zeroes
>>>> and others don’t; it may also depend on the protocol, the filesystem,
>>>> etc.), then I think it makes most sense to report indeed 6 MB for the
>>>> quorum subtree as a whole.  What would you report?  3 MB?
>>>
>>> Do it's the quorum way: Just vote!
>>
>> Add an option for it?  Average, maximum, median, majority, sum? :-)
> 
> We could also introduce a mode with an Electoral College so that
> sometimes an image that missed the majority has a chance to win anyway.

That’s actually a good idea for a quorum mode in general.  Who says the
majority is right?  Better let someone with more authority cross-check
the result.

>>> No, you're right, of course. -ENOTSUP is probably the only other thing
>>> you could do then.
>>>
>>>>> Ehm... Maybe I should just take back what I said first. It almost feels
>>>>> like it would be better if qcow2 and vmdk explicitly used a handler that
>>>>> counts all children (could still be a generic one in block.c) rather
>>>>> than having to remember to disable the functionality everywhere where we
>>>>> don't want to have it.
>>>>
>>>> I don’t, because everywhere we don’t want this functionality, we still
>>>> need to choose a child.  This has to be done by the driver anyway.
>>>
>>> Well, by default the primary child, which should cover like 90% of the
>>> drivers?
>>
>> Hm, yes.
>>
>> But I still think that the drivers that do not want to count every
>> single non-COW child are the exception.
> 
> They are, but drivers that want to count more than their primary node
> are exceptions, too. And I think you're more likely to remember adding
> the callback when you want to have a certain feature, not when you don't
> want to have it.
> 
> I really think we're likely to forget adding the callback where we need
> to disable the feature.

Well, I mean, we did forget adding it for qcow2.

> I can see two options that should address both of our views:
> 
> 1. Just don't have a fallback at all, make the callback mandatory and
>    provide implementations in block.c that can be referred to in
>    BlockDriver. Not specifying the callback causes an assertion failure,
>    so we'd hopefully notice it quite early (assuming that we run either
>    'qemu-img info' or 'query-block' on a configuration with the block
>    driver, but I think that's faily safe to assume).

Hm.  Seems a bit much, but if we can’t agree on what’s a good general
implementation that works for everything, this is probably the only
thing that would actually keep us from forgetting to add special cases.

Though I actually don’t know.  I’d probably add two globally available
helpers, one that returns the sum of everything but the backing node,
and one that just returns the primary node.

Now if I were to make qcow2 use the primary node helper function, would
we have remembered changing it once we added a data file?

Hmm.  Maybe not, but it should be OK to just make everything use the sum
helper, except the drivers that want the primary node.  That should work
for all cases.  (I think that whenever a format driver suddenly gains
more child nodes, we probably will want to count them.  OTOH, everything
that has nodes that shouldn’t be counted probably always wants to use
the primary node helper function from the start.)

> 2. Make the 90% solution a 100% solution: Allow drivers to have multiple
>    storage children (for vmdk) and then have the fallback add up the
>    primary child plus all storage children. This is what I suggested as
>    the documented semantics in my initial reply to this patch (that you
>    chose not to answer).

I didn’t answer that because I didn’t disagree.

>    Adding the size of storage children covers qcow2 and vmdk.

That’s of course exactly what we’re trying to do, but the question is,
how do we figure out that storage children?  Make it a per-BdrvChild
attribute?  That seems rather heavy-handed, because I think we’d need it
only here.

>    As the job filter won't declare the target or any other involved
>    nodes their storage nodes (I hope), this will do the right thing for
>    them, too.
> 
>    For quorum and blkverify both ways could be justifiable. I think they
>    probably shouldn't declare their children as storage nodes. They are
>    more like filters that don't have a single filtered node. So some
>    kind of almost-filters.

I don’t think quorum is a filter, and blkverify can only be justified to
be a filter because it quits qemu when there is a mismatch.

The better example is replication, but that has a clear filtered child
(the primary node).


So all in all I think it’s best to make the callback mandatory and add
two global helper functions.  That’s simple enough and should prevent us
from making mistakes by forgetting to adjust something in the future.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-09-11 10:02 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09 16:13 [Qemu-devel] [PATCH v6 00/42] block: Deal with filters Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 01/42] block: Mark commit and mirror as filter drivers Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 02/42] copy-on-read: Support compressed writes Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 03/42] throttle: " Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 04/42] block: Add child access functions Max Reitz
2019-08-09 16:56   ` Eric Blake
2019-09-04 16:16   ` Kevin Wolf
2019-09-09  7:56     ` Max Reitz
2019-09-09  9:36       ` Kevin Wolf
2019-09-09 14:04         ` Max Reitz
2019-09-09 16:13           ` Kevin Wolf
2019-09-10  9:14             ` Max Reitz
2019-09-10 10:47               ` Kevin Wolf
2019-09-10 11:36                 ` Max Reitz
2019-09-10 12:48                   ` Kevin Wolf
2019-09-10 12:59                     ` Max Reitz
2019-09-10 13:10                       ` Kevin Wolf
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 05/42] block: Add chain helper functions Max Reitz
2019-08-09 17:01   ` Eric Blake
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 06/42] qcow2: Implement .bdrv_storage_child() Max Reitz
2019-08-09 17:07   ` Eric Blake
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 07/42] block: *filtered_cow_child() for *has_zero_init() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 08/42] block: bdrv_set_backing_hd() is about bs->backing Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 09/42] block: Include filters when freezing backing chain Max Reitz
2019-08-10 13:32   ` Vladimir Sementsov-Ogievskiy
2019-08-12 12:56     ` Max Reitz
2019-09-05 13:05   ` Kevin Wolf
2019-09-09  8:02     ` Max Reitz
2019-09-09  9:40       ` Kevin Wolf
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 10/42] block: Drop bdrv_is_encrypted() Max Reitz
2019-08-10 13:42   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 11/42] block: Add bdrv_supports_compressed_writes() Max Reitz
2019-09-05 13:11   ` Kevin Wolf
2019-09-09  8:09     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 12/42] block: Use bdrv_filtered_rw* where obvious Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 13/42] block: Use CAFs in block status functions Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 14/42] block: Use CAFs when working with backing chains Max Reitz
2019-08-10 15:19   ` Vladimir Sementsov-Ogievskiy
2019-09-05 14:05   ` Kevin Wolf
2019-09-09  8:25     ` Max Reitz
2019-09-09  9:55       ` Kevin Wolf
2019-09-09 14:08         ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 15/42] block: Re-evaluate backing file handling in reopen Max Reitz
2019-08-10 16:05   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 16/42] block: Flush all children in generic code Max Reitz
2019-08-10 15:36   ` Vladimir Sementsov-Ogievskiy
2019-08-12 12:58     ` Max Reitz
2019-09-05 16:24       ` Kevin Wolf
2019-09-09  8:31         ` Max Reitz
2019-09-09 10:01           ` Kevin Wolf
2019-09-09 14:15             ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 17/42] block: Use CAFs in bdrv_refresh_limits() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 18/42] block: Use CAFs in bdrv_refresh_filename() Max Reitz
2019-08-10 16:22   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 19/42] block: Use CAF in bdrv_co_rw_vmstate() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 20/42] block/snapshot: Fix fallback Max Reitz
2019-08-10 16:34   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:06     ` Max Reitz
2019-09-10 11:56   ` Kevin Wolf
2019-09-10 12:04     ` Max Reitz
2019-09-10 12:49       ` Kevin Wolf
2019-09-10 13:06         ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 21/42] block: Use CAFs for debug breakpoints Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback Max Reitz
2019-08-10 16:41   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:09     ` Max Reitz
2019-08-12 17:14       ` Vladimir Sementsov-Ogievskiy
2019-08-12 19:15         ` Max Reitz
2019-09-10 14:52   ` Kevin Wolf
2019-09-11  6:20     ` Max Reitz
2019-09-11  6:55       ` Kevin Wolf
2019-09-11  7:37         ` Max Reitz
2019-09-11  8:27           ` Kevin Wolf
2019-09-11 10:00             ` Max Reitz [this message]
2019-09-11 10:31               ` Kevin Wolf
2019-09-11 11:00                 ` Max Reitz
2019-09-12 10:34                   ` Kevin Wolf
2019-11-14 13:11                   ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 23/42] blockdev: Use CAF in external_snapshot_prepare() Max Reitz
2019-09-10 15:02   ` Kevin Wolf
2019-09-11  6:21     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 24/42] block: Use child access functions for QAPI queries Max Reitz
2019-08-10 16:57   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 25/42] mirror: Deal with filters Max Reitz
2019-08-12 11:09   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:26     ` Max Reitz
2019-08-14 15:17       ` Vladimir Sementsov-Ogievskiy
2019-08-31  9:57   ` Vladimir Sementsov-Ogievskiy
2019-09-02 14:35     ` Max Reitz
2019-09-03  8:32       ` Vladimir Sementsov-Ogievskiy
2019-09-09  7:41         ` Max Reitz
2019-09-13 12:55   ` Kevin Wolf
2019-09-16 10:26     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 26/42] backup: " Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 27/42] commit: " Max Reitz
2019-08-31 10:44   ` Vladimir Sementsov-Ogievskiy
2019-09-02 14:55     ` Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 28/42] stream: " Max Reitz
2019-08-12 11:55   ` Vladimir Sementsov-Ogievskiy
2019-09-13 14:16   ` Kevin Wolf
2019-09-16  9:52     ` Max Reitz
2019-09-16 14:47       ` Kevin Wolf
2019-12-11 12:52       ` Max Reitz
2019-12-11 15:52         ` Kevin Wolf
2019-12-11 16:12           ` Max Reitz
2019-12-11 16:35             ` Kevin Wolf
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 29/42] nbd: Use CAF when looking for dirty bitmap Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 30/42] qemu-img: Use child access functions Max Reitz
2019-08-12 12:14   ` Vladimir Sementsov-Ogievskiy
2019-08-12 13:28     ` Max Reitz
2019-08-14 16:04   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 31/42] block: Drop backing_bs() Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 32/42] block: Make bdrv_get_cumulative_perm() public Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 33/42] blockdev: Fix active commit choice Max Reitz
2019-08-09 16:13 ` [Qemu-devel] [PATCH v6 34/42] block: Inline bdrv_co_block_status_from_*() Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 35/42] block: Fix check_to_replace_node() Max Reitz
2019-08-15 15:21   ` Vladimir Sementsov-Ogievskiy
2019-08-15 17:01     ` Max Reitz
2019-08-16 11:01       ` Vladimir Sementsov-Ogievskiy
2019-08-16 13:30         ` Max Reitz
2019-08-16 14:24           ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 36/42] iotests: Add tests for mirror @replaces loops Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 37/42] block: Leave BDS.backing_file constant Max Reitz
2019-08-16 16:16   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 38/42] iotests: Let complete_and_wait() work with commit Max Reitz
2019-08-23  5:59   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 39/42] iotests: Add filter commit test cases Max Reitz
2019-08-31 11:41   ` Vladimir Sementsov-Ogievskiy
2019-09-02 15:06     ` Max Reitz
2019-08-31 12:35   ` Vladimir Sementsov-Ogievskiy
2019-09-02 15:09     ` Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 40/42] iotests: Add filter mirror " Max Reitz
2019-08-31 12:35   ` Vladimir Sementsov-Ogievskiy
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 41/42] iotests: Add test for commit in sub directory Max Reitz
2019-08-09 16:14 ` [Qemu-devel] [PATCH v6 42/42] iotests: Test committing to overridden backing Max Reitz
2019-09-03  9:18   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa9a6a3d-73ae-c7d4-78c9-a9c6c18fcfa4@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).