All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Qemu-block <qemu-block@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: bitmap migration bug with -drive while block mirror runs
Date: Tue, 1 Oct 2019 12:23:02 -0400	[thread overview]
Message-ID: <2114e2a7-8cf7-ce27-6d84-00526fd2ab04@redhat.com> (raw)
In-Reply-To: <20191001155859.GE4688@linux.fritz.box>



On 10/1/19 11:58 AM, Kevin Wolf wrote:
> Am 01.10.2019 um 17:09 hat John Snow geschrieben:
>>
>>
>> On 10/1/19 5:54 AM, Kevin Wolf wrote:
>>> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 01.10.2019 3:09, John Snow wrote:
>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>> found and thought you'd like to see it:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>
>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>
>>>>>
>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>> source-node? It looks like at the moment both consider their only parent
>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>
>>>>>
>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>> above that node.)
>>>>
>>>>
>>>> Better would be to migrate by node-name only.. But am I right that
>>>> node-names are different on source and destination? Or this situation
>>>> changed?
>>>
>>> Traditionally, I think migration assumes that frontends (guest devices)
>>> must match exactly, but backends may and usually will differ.
>>>
>>> Of course, dirty bitmaps are a backend feature that isn't really related
>>> to guest devices, so this doesn't really work out any more in your case.
>>> BlockBackend names are unusable for this purpose (especially as we're
>>> moving towards anonymous BlockBackends everywhere), which I guess
>>> essentially means node-name is the only option left.
>>>
>>
>> The problem as I see it involves API stability.
>>
>> We allow block-dirty-bitmap-add against e.g. "drive1" through the
>> block-backend name (the name of the "drive" as the user sees it.)
>>
>> Of course, once you start mirror, you aren't able to access that bitmap
>> through that namepair anymore -- the "address" of the bitmap has "changed"!
>>
>> (In actual fact, the bitmap always had two addresses; and simply we lost
>> an alias -- but it's the one that the user likely used to create the
>> bitmap, so that's bad.)
> 
> So if I understand correctly, the problem is that without a filter, in
> some setups we get a usable BlockBackend name like "drive1", but when a
> filter is added, we return the node-name instead which is
> auto-generated and will be different on the destination.
> 
> Looking at the ChildRole documentation:
> 
>     /* Returns a name that is supposedly more useful for human users than the
>      * node name for identifying the node in question (in particular, a BB
>      * name), or NULL if the parent can't provide a better name. */
>     const char *(*get_name)(BdrvChild *child);
> 
> I'd argue that a BlockBackend name is more useful for a human user even
> across filter, so I'd support a .get_name implementation for a filter
> child role (which Max wanted to introduce anyway for his filter series).
> 
> Of course, if you have a function that is made to return a convenient
> text for human users, and you use it for a stable ABI like the migration
> stream, this is an idea that would certainly have caused an entertaining
> Linus rant in the good old kernel times.
> 
>>> Is bitmap migration something that must be enabled explicitly or does
>>> it happen automatically? If it's explicit, then making an additional
>>> requirement (matching node-names) shouldn't be a problem.
>>
>> This means that bitmap migration becomes a blockdev-only feature.
> 
> I meant this more as the preferred way for the future rather than the
> only thing supported.
> 

My confusion then is how do you get the node-names to match *without*
using blockdev?

Or maybe I misunderstand.

We can always use explicit node-names whenever available, but if we
don't make blockdev a requirement, we need a fallback to determine an
addressable name.

> But Peter has actually mentioned that for libvirt it will be
> blockdev-only anyway. So do we even have a good reason to invest much
> for the non-blockdev case?
> 
> Maybe making it blockdev-only is actually pretty reasonable.
> 
>> Serious question: do we have plans to formally deprecate things like
>> -drive and mandate a blockdev workflow, or otherwise work to unify the
>> actual graph that gets created between the two methods?
> 
> It's high on my wishlist, though we can't before libvirt uses blockdev.
> 
> Maybe something to talk about at KVM Forum?
> 

Yes, please!

I've not well-understood the trajectory the block graph is taking. I
even wrote a graph visualization so I could understand the problem.

I think I have a better grip on what we have *currently* right now, so
I'd be keen to hear your roadmap for what we want to have and how it
ties into the CLI and QAPI.

>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>> purpose other than to allow the child to have a parent link and find who
>>>>> it's """real""" parent is.
>>>>>
>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>> feasible this quick idea might be, though.
>>>>>
>>>>>
>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>
>>>>> (Why not? because the target is incapable of matching the node-name
>>>>> because they are randomly generated AND you cannot specify node-names
>>>>> with # prefixes as they are especially reserved!
>>>>>
>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>
>>>>
>>>> In other words, we need a well defined way to match nodes on source and destination,
>>>> keeping in mind filters, to migrate bitmaps correctly.
>>>>
>>
>> Yes, exactly.
>>
>>>> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
>>>> mirror-top filter during mirror job:)
>>>>
>>>> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
>>>>
>>>> If node-names are different on source and destination, what is the same? Top blk name
>>>> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
>>>
>>> blk_name has to be assumed to be "". The BdrvChild path changes when
>>> filters are inserted (and inserting filters on the destination that
>>> aren't present on the source, or vice versa, sounds like something that
>>> should just work).
>>>
>>> So both parts of this are not great ways for addressing nodes.
>>>
>>>> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
>>>> defines this node directly (we must not have children with same name for some node in
>>>> the path).
>>>>
>>>> And I think it's a correct way to define node in migration stream - by path.
>>>
>>> I'm afraid node-name is the only thing that could possibly work reliably
>>> for identifying nodes.
>>>
>>> Kevin
>>>
>>
>> It sounds like you are saying that bitmaps must become a blockdev-only
>> feature.
> 
> More like we can't rely on BlockBackend names in a blockdev setup.

Agree; so this discussion is now mostly about what the fallback looks like.

> BlockBackend names can work in a purely traditional setup if we make
> some effort to find the block backend even with filters in bettwen, but
> they aren't universal even then. And node-names work in the blockdev
> case, but they aren't universal either.
> 
> But as I said above, you made me wonder whether making it a
> blockdev-only feature wouldn't actually be a good idea.
> 

O:-) It's the most logically consistent, but we should figure out how to
make new CLI sugar that helps make our story consistent. Or get rid of
the CLI sugar, I guess.

(I kind of like sugar, perhaps infamously, but if our view is that QEMU
is a library and API and nothing more, the sugar can go and be replaced
by sugar in binding libraries that provide said sugar.)

>> I'm not sure if I have arrived at that conclusion yet, but it's at least
>> inarguable that with blockdev it's a lot simpler to guarantee correctness.
>>
>> However, we still have -cdrom and -hda and -drive and any number of
>> sugars that I think we aren't committed to getting rid of yet... (or ever?)
> 
> -hda and -cdrom currently have hard-coded BlockBackend names. I don't
> see what would stop us from changing this to hard-coded node names.
> 

Ideally, I'd see something like -hda and -cdrom become strict sugar for
-drive, which becomes strict sugar for -device/-blockdevice.

So that everything is using the same mechanisms and is well-defined.

I guess. Perhaps.

Let's really talk very seriously about this at KVM Forum -- it aligns
with some pet projects I want to investigate; namely simplifying the
QEMU CLI such that we can generate better documentation.

> -drive is something that we probably need to remove in its current form
> and introduce something new that is both user-friendly and powerful.
> Actually, we should probably add the replacement first. :-)
> 

Great! Looking forward to Lyon.

--js


  parent reply	other threads:[~2019-10-01 16:59 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-01  0:09 bitmap migration bug with -drive while block mirror runs John Snow
2019-10-01  4:28 ` Peter Krempa
2019-10-01  9:07   ` Vladimir Sementsov-Ogievskiy
2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
2019-10-01  9:54   ` Kevin Wolf
2019-10-01 10:05     ` Vladimir Sementsov-Ogievskiy
2019-10-01 13:24     ` Peter Krempa
2019-10-01 15:09     ` John Snow
2019-10-01 15:58       ` Kevin Wolf
2019-10-01 16:12         ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:24           ` Kevin Wolf
2019-10-01 16:23         ` John Snow [this message]
2019-10-01 11:45   ` Peter Krempa
2019-10-01  9:17 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:10   ` John Snow
2019-10-01 15:57     ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:07       ` John Snow
2019-10-02  8:12         ` Kevin Wolf
2019-10-02 10:46         ` Peter Krempa
2019-10-02 11:11           ` Kevin Wolf
2019-10-02 12:22             ` Vladimir Sementsov-Ogievskiy
2019-10-02 13:48               ` Peter Krempa
2019-10-02 13:43             ` Peter Krempa
2019-10-02 14:03               ` Vladimir Sementsov-Ogievskiy
2019-10-02 21:35           ` John Snow
2019-10-03 10:14             ` Vladimir Sementsov-Ogievskiy
2019-10-03 23:34               ` John Snow
2019-10-04  8:33                 ` Peter Krempa
2019-10-04  9:21                   ` Vladimir Sementsov-Ogievskiy
2019-10-06  3:15                   ` John Snow
2019-10-04  9:24                 ` Vladimir Sementsov-Ogievskiy
2019-10-04 13:07                   ` Eric Blake
2019-10-06  3:19                     ` John Snow
2019-10-01 16:16       ` Kevin Wolf
2019-10-01 16:17         ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:13   ` Max Reitz
2019-10-01 14:27     ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:34       ` Max Reitz
2019-10-01 14:53         ` Vladimir Sementsov-Ogievskiy
2019-10-01 15:26           ` Max Reitz
2019-10-02  7:34             ` Peter Krempa
2019-10-01 15:09         ` Kevin Wolf
2019-10-01 15:27           ` Max Reitz
2019-10-01 16:12             ` Kevin Wolf
2019-10-01 16:17               ` Max Reitz
2019-10-01 16:22                 ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2114e2a7-8cf7-ce27-6d84-00526fd2ab04@redhat.com \
    --to=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.