All of lore.kernel.org
 help / color / mirror / Atom feed
* bitmap migration bug with -drive while block mirror runs
@ 2019-10-01  0:09 John Snow
  2019-10-01  4:28 ` Peter Krempa
                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: John Snow @ 2019-10-01  0:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz

Hi folks, I identified a problem with the migration code that Red Hat QE
found and thought you'd like to see it:

https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20

Very, very briefly: drive-mirror inserts a filter node that changes what
bdrv_get_device_or_node_name() returns, which causes a migration problem.


Ignorant question #1: Can we multi-parent the filter node and
source-node? It looks like at the moment both consider their only parent
to be the block-job and don't have a link back to their parents otherwise.


Otherwise: I have a lot of cloudy ideas on how to solve this, but
ultimately what we want is to be able to find the "addressable" name for
the node the bitmap is attached to, which would be the name of the first
ancestor node that isn't a filter. (OR, the name of the block-backend
above that node.)

A simple way to do this might be a "child_unfiltered" BdrvChild role
that simply bypasses the filter that was inserted and serves no real
purpose other than to allow the child to have a parent link and find who
it's """real""" parent is.

Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
feasible this quick idea might be, though.


- Corollary fix #1: call error_setg if the bitmap node name that's about
to go over the wire is an autogenerated node: this is never correct!

(Why not? because the target is incapable of matching the node-name
because they are randomly generated AND you cannot specify node-names
with # prefixes as they are especially reserved!

(This raises a related problem: if you explicitly add bitmaps to nodes
with autogenerated names, you will be unable to migrate them.))

--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  0:09 bitmap migration bug with -drive while block mirror runs John Snow
@ 2019-10-01  4:28 ` Peter Krempa
  2019-10-01  9:07   ` Vladimir Sementsov-Ogievskiy
  2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Peter Krempa @ 2019-10-01  4:28 UTC (permalink / raw)
  To: John Snow; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

On Mon, Sep 30, 2019 at 20:09:28 -0400, John Snow wrote:
> Hi folks, I identified a problem with the migration code that Red Hat QE
> found and thought you'd like to see it:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> 
> Very, very briefly: drive-mirror inserts a filter node that changes what
> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>
> Ignorant question #1: Can we multi-parent the filter node and
> source-node? It looks like at the moment both consider their only parent
> to be the block-job and don't have a link back to their parents otherwise.
> 
> 
> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> ultimately what we want is to be able to find the "addressable" name for
> the node the bitmap is attached to, which would be the name of the first
> ancestor node that isn't a filter. (OR, the name of the block-backend
> above that node.)

One possibility if there isn't an elegant qemu-based solution would be
to add a migration feature libvirt could enable which would simply stop
bitmaps from being copied and libvirt would do that in the synchronised
phase of the migration explicitly.

Libvirt might possibly need to do it anyways for inactive bitmaps if
the automatic bitmap copying includes only active bitmaps.

I'm not sure though how that would combine with post-copy migration or
what the impact on latency would be, but if you are migrating with
storage I think performance will not be stelar anyways.

> A simple way to do this might be a "child_unfiltered" BdrvChild role
> that simply bypasses the filter that was inserted and serves no real
> purpose other than to allow the child to have a parent link and find who
> it's """real""" parent is.
> 
> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> feasible this quick idea might be, though.
> 
> 
> - Corollary fix #1: call error_setg if the bitmap node name that's about
> to go over the wire is an autogenerated node: this is never correct!
> 
> (Why not? because the target is incapable of matching the node-name
> because they are randomly generated AND you cannot specify node-names
> with # prefixes as they are especially reserved!
> 
> (This raises a related problem: if you explicitly add bitmaps to nodes
> with autogenerated names, you will be unable to migrate them.))

I think this should be okay. In libvirt I opted to forbid checkpoints
which map to bitmap creation until blockdev will be supported where we
manage node names ourselves.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  0:09 bitmap migration bug with -drive while block mirror runs John Snow
  2019-10-01  4:28 ` Peter Krempa
@ 2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
  2019-10-01  9:54   ` Kevin Wolf
  2019-10-01 11:45   ` Peter Krempa
  2019-10-01  9:17 ` Vladimir Sementsov-Ogievskiy
  2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
  3 siblings, 2 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01  8:57 UTC (permalink / raw)
  To: John Snow; +Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz

01.10.2019 3:09, John Snow wrote:
> Hi folks, I identified a problem with the migration code that Red Hat QE
> found and thought you'd like to see it:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> 
> Very, very briefly: drive-mirror inserts a filter node that changes what
> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> 
> 
> Ignorant question #1: Can we multi-parent the filter node and
> source-node? It looks like at the moment both consider their only parent
> to be the block-job and don't have a link back to their parents otherwise.
> 
> 
> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> ultimately what we want is to be able to find the "addressable" name for
> the node the bitmap is attached to, which would be the name of the first
> ancestor node that isn't a filter. (OR, the name of the block-backend
> above that node.)


Better would be to migrate by node-name only.. But am I right that node-names
are different on source and destination? Or this situation changed?

> 
> A simple way to do this might be a "child_unfiltered" BdrvChild role
> that simply bypasses the filter that was inserted and serves no real
> purpose other than to allow the child to have a parent link and find who
> it's """real""" parent is.
> 
> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> feasible this quick idea might be, though.
> 
> 
> - Corollary fix #1: call error_setg if the bitmap node name that's about
> to go over the wire is an autogenerated node: this is never correct!
> 
> (Why not? because the target is incapable of matching the node-name
> because they are randomly generated AND you cannot specify node-names
> with # prefixes as they are especially reserved!
> 
> (This raises a related problem: if you explicitly add bitmaps to nodes
> with autogenerated names, you will be unable to migrate them.))
> 

In other words, we need a well defined way to match nodes on source and destination,
keeping in mind filters, to migrate bitmaps correctly.

Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
mirror-top filter during mirror job:)

Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?

If node-names are different on source and destination, what is the same? Top blk name
and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).

So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
defines this node directly (we must not have children with same name for some node in
the path).

And I think it's a correct way to define node in migration stream - by path.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  4:28 ` Peter Krempa
@ 2019-10-01  9:07   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01  9:07 UTC (permalink / raw)
  To: Peter Krempa, John Snow; +Cc: qemu-devel, Qemu-block, Max Reitz

01.10.2019 7:28, Peter Krempa wrote:
> On Mon, Sep 30, 2019 at 20:09:28 -0400, John Snow wrote:
>> Hi folks, I identified a problem with the migration code that Red Hat QE
>> found and thought you'd like to see it:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>
>> Very, very briefly: drive-mirror inserts a filter node that changes what
>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>
>> Ignorant question #1: Can we multi-parent the filter node and
>> source-node? It looks like at the moment both consider their only parent
>> to be the block-job and don't have a link back to their parents otherwise.
>>
>>
>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>> ultimately what we want is to be able to find the "addressable" name for
>> the node the bitmap is attached to, which would be the name of the first
>> ancestor node that isn't a filter. (OR, the name of the block-backend
>> above that node.)
> 
> One possibility if there isn't an elegant qemu-based solution would be
> to add a migration feature libvirt could enable which would simply stop
> bitmaps from being copied and libvirt would do that in the synchronised
> phase of the migration explicitly.

Hmm. How it can do it?

I now the only way to migrate bitmaps without migration-capability: through
shared storage during migration downtime. But this means that downtime becomes
long if there are a lot of bitmap data and storage is not fast enough.

> 
> Libvirt might possibly need to do it anyways for inactive bitmaps if
> the automatic bitmap copying includes only active bitmaps.

All bitmaps are migrated.

> 
> I'm not sure though how that would combine with post-copy migration or
> what the impact on latency would be, but if you are migrating with
> storage I think performance will not be stelar anyways.

Just keep in mind that bitmap with default granularity (64k) for 16TB disk
is 32M, and we may have several such bitmaps. Storing/loading them
during migration downtime will of course influence this downtime

> 
>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>> that simply bypasses the filter that was inserted and serves no real
>> purpose other than to allow the child to have a parent link and find who
>> it's """real""" parent is.
>>
>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>> feasible this quick idea might be, though.
>>
>>
>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>> to go over the wire is an autogenerated node: this is never correct!
>>
>> (Why not? because the target is incapable of matching the node-name
>> because they are randomly generated AND you cannot specify node-names
>> with # prefixes as they are especially reserved!
>>
>> (This raises a related problem: if you explicitly add bitmaps to nodes
>> with autogenerated names, you will be unable to migrate them.))
> 
> I think this should be okay. In libvirt I opted to forbid checkpoints
> which map to bitmap creation until blockdev will be supported where we
> manage node names ourselves.
> 

Hmm, I'm afraid that we'll have to fix this problem sooner.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  0:09 bitmap migration bug with -drive while block mirror runs John Snow
  2019-10-01  4:28 ` Peter Krempa
  2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01  9:17 ` Vladimir Sementsov-Ogievskiy
  2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01  9:17 UTC (permalink / raw)
  To: John Snow; +Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz

01.10.2019 3:09, John Snow wrote:
> Hi folks, I identified a problem with the migration code that Red Hat QE
> found and thought you'd like to see it:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> 
> Very, very briefly: drive-mirror inserts a filter node that changes what
> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> 
> 
> Ignorant question #1: Can we multi-parent the filter node and
> source-node? It looks like at the moment both consider their only parent
> to be the block-job and don't have a link back to their parents otherwise.
> 
> 
> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> ultimately what we want is to be able to find the "addressable" name for
> the node the bitmap is attached to, which would be the name of the first
> ancestor node that isn't a filter. (OR, the name of the block-backend
> above that node.)
> 
> A simple way to do this might be a "child_unfiltered" BdrvChild role
> that simply bypasses the filter that was inserted and serves no real
> purpose other than to allow the child to have a parent link and find who
> it's """real""" parent is.
> 
> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> feasible this quick idea might be, though.
> 
> 
> - Corollary fix #1: call error_setg if the bitmap node name that's about
> to go over the wire is an autogenerated node: this is never correct!
> 
> (Why not? because the target is incapable of matching the node-name
> because they are randomly generated AND you cannot specify node-names
> with # prefixes as they are especially reserved!
> 
> (This raises a related problem: if you explicitly add bitmaps to nodes
> with autogenerated names, you will be unable to migrate them.))
> 

Related problem:

# cd tests/qemu-iotests; git grep -il migration ??? | xargs git grep -il mirror | xargs git grep -il bitmap | wc -l
0



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01  9:54   ` Kevin Wolf
  2019-10-01 10:05     ` Vladimir Sementsov-Ogievskiy
                       ` (2 more replies)
  2019-10-01 11:45   ` Peter Krempa
  1 sibling, 3 replies; 47+ messages in thread
From: Kevin Wolf @ 2019-10-01  9:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 01.10.2019 3:09, John Snow wrote:
> > Hi folks, I identified a problem with the migration code that Red Hat QE
> > found and thought you'd like to see it:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> > 
> > Very, very briefly: drive-mirror inserts a filter node that changes what
> > bdrv_get_device_or_node_name() returns, which causes a migration problem.
> > 
> > 
> > Ignorant question #1: Can we multi-parent the filter node and
> > source-node? It looks like at the moment both consider their only parent
> > to be the block-job and don't have a link back to their parents otherwise.
> > 
> > 
> > Otherwise: I have a lot of cloudy ideas on how to solve this, but
> > ultimately what we want is to be able to find the "addressable" name for
> > the node the bitmap is attached to, which would be the name of the first
> > ancestor node that isn't a filter. (OR, the name of the block-backend
> > above that node.)
> 
> 
> Better would be to migrate by node-name only.. But am I right that
> node-names are different on source and destination? Or this situation
> changed?

Traditionally, I think migration assumes that frontends (guest devices)
must match exactly, but backends may and usually will differ.

Of course, dirty bitmaps are a backend feature that isn't really related
to guest devices, so this doesn't really work out any more in your case.
BlockBackend names are unusable for this purpose (especially as we're
moving towards anonymous BlockBackends everywhere), which I guess
essentially means node-name is the only option left.

Is bitmap migration something that must be enabled explicitly or does
it happen automatically? If it's explicit, then making an additional
requirement (matching node-names) shouldn't be a problem.

> > A simple way to do this might be a "child_unfiltered" BdrvChild role
> > that simply bypasses the filter that was inserted and serves no real
> > purpose other than to allow the child to have a parent link and find who
> > it's """real""" parent is.
> > 
> > Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> > feasible this quick idea might be, though.
> > 
> > 
> > - Corollary fix #1: call error_setg if the bitmap node name that's about
> > to go over the wire is an autogenerated node: this is never correct!
> > 
> > (Why not? because the target is incapable of matching the node-name
> > because they are randomly generated AND you cannot specify node-names
> > with # prefixes as they are especially reserved!
> > 
> > (This raises a related problem: if you explicitly add bitmaps to nodes
> > with autogenerated names, you will be unable to migrate them.))
> > 
> 
> In other words, we need a well defined way to match nodes on source and destination,
> keeping in mind filters, to migrate bitmaps correctly.
> 
> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
> mirror-top filter during mirror job:)
> 
> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
> 
> If node-names are different on source and destination, what is the same? Top blk name
> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).

blk_name has to be assumed to be "". The BdrvChild path changes when
filters are inserted (and inserting filters on the destination that
aren't present on the source, or vice versa, sounds like something that
should just work).

So both parts of this are not great ways for addressing nodes.

> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
> defines this node directly (we must not have children with same name for some node in
> the path).
> 
> And I think it's a correct way to define node in migration stream - by path.

I'm afraid node-name is the only thing that could possibly work reliably
for identifying nodes.

Kevin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  9:54   ` Kevin Wolf
@ 2019-10-01 10:05     ` Vladimir Sementsov-Ogievskiy
  2019-10-01 13:24     ` Peter Krempa
  2019-10-01 15:09     ` John Snow
  2 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 10:05 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Peter Krempa, Qemu-block, qemu-devel, Max Reitz,
	Nikolay Shirokovskiy, John Snow

01.10.2019 12:54, Kevin Wolf wrote:
> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 01.10.2019 3:09, John Snow wrote:
>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>> found and thought you'd like to see it:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>
>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>
>>>
>>> Ignorant question #1: Can we multi-parent the filter node and
>>> source-node? It looks like at the moment both consider their only parent
>>> to be the block-job and don't have a link back to their parents otherwise.
>>>
>>>
>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>> ultimately what we want is to be able to find the "addressable" name for
>>> the node the bitmap is attached to, which would be the name of the first
>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>> above that node.)
>>
>>
>> Better would be to migrate by node-name only.. But am I right that
>> node-names are different on source and destination? Or this situation
>> changed?
> 
> Traditionally, I think migration assumes that frontends (guest devices)
> must match exactly, but backends may and usually will differ.
> 
> Of course, dirty bitmaps are a backend feature that isn't really related
> to guest devices, so this doesn't really work out any more in your case.
> BlockBackend names are unusable for this purpose (especially as we're
> moving towards anonymous BlockBackends everywhere), which I guess
> essentially means node-name is the only option left.
> 
> Is bitmap migration something that must be enabled explicitly or does
> it happen automatically? If it's explicit, then making an additional
> requirement (matching node-names) shouldn't be a problem.

The problem is that mirror filter is already in Rhel qemu and we have this bug,
but libvirt is not yet prepared to match node-name on migration, or am I wrong?

> 
>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>> that simply bypasses the filter that was inserted and serves no real
>>> purpose other than to allow the child to have a parent link and find who
>>> it's """real""" parent is.
>>>
>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>> feasible this quick idea might be, though.
>>>
>>>
>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>> to go over the wire is an autogenerated node: this is never correct!
>>>
>>> (Why not? because the target is incapable of matching the node-name
>>> because they are randomly generated AND you cannot specify node-names
>>> with # prefixes as they are especially reserved!
>>>
>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>> with autogenerated names, you will be unable to migrate them.))
>>>
>>
>> In other words, we need a well defined way to match nodes on source and destination,
>> keeping in mind filters, to migrate bitmaps correctly.
>>
>> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
>> mirror-top filter during mirror job:)
>>
>> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
>>
>> If node-names are different on source and destination, what is the same? Top blk name
>> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
> 
> blk_name has to be assumed to be "". The BdrvChild path changes when
> filters are inserted (and inserting filters on the destination that
> aren't present on the source, or vice versa, sounds like something that
> should just work).

Skipping filters is not a problem until we don't want bitmaps in filters.

> 
> So both parts of this are not great ways for addressing nodes.

Not great, yes..

> 
>> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
>> defines this node directly (we must not have children with same name for some node in
>> the path).
>>
>> And I think it's a correct way to define node in migration stream - by path.
> 
> I'm afraid node-name is the only thing that could possibly work reliably
> for identifying nodes.
> 

Can we do it to fix bug in current Rhel's qemu? How much effort on libvirt part is needed?


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
  2019-10-01  9:54   ` Kevin Wolf
@ 2019-10-01 11:45   ` Peter Krempa
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Krempa @ 2019-10-01 11:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

On Tue, Oct 01, 2019 at 08:57:37 +0000, Vladimir Sementsov-Ogievskiy wrote:
> 01.10.2019 3:09, John Snow wrote:
> > Hi folks, I identified a problem with the migration code that Red Hat QE
> > found and thought you'd like to see it:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> > 
> > Very, very briefly: drive-mirror inserts a filter node that changes what
> > bdrv_get_device_or_node_name() returns, which causes a migration problem.
> > 
> > 
> > Ignorant question #1: Can we multi-parent the filter node and
> > source-node? It looks like at the moment both consider their only parent
> > to be the block-job and don't have a link back to their parents otherwise.
> > 
> > 
> > Otherwise: I have a lot of cloudy ideas on how to solve this, but
> > ultimately what we want is to be able to find the "addressable" name for
> > the node the bitmap is attached to, which would be the name of the first
> > ancestor node that isn't a filter. (OR, the name of the block-backend
> > above that node.)
> 
> 
> Better would be to migrate by node-name only.. But am I right that node-names
> are different on source and destination? Or this situation changed?

They may be different. Some time ago when I asked I was told that node
names are not bound to any state in qemu and thus can be re-created on
destination. Otherwise they technically become guest ABI which would
require changes to libvirt's blockdev usage.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  9:54   ` Kevin Wolf
  2019-10-01 10:05     ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 13:24     ` Peter Krempa
  2019-10-01 15:09     ` John Snow
  2 siblings, 0 replies; 47+ messages in thread
From: Peter Krempa @ 2019-10-01 13:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

On Tue, Oct 01, 2019 at 11:54:16 +0200, Kevin Wolf wrote:
> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 01.10.2019 3:09, John Snow wrote:
> > > Hi folks, I identified a problem with the migration code that Red Hat QE
> > > found and thought you'd like to see it:
> > > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> > > 
> > > Very, very briefly: drive-mirror inserts a filter node that changes what
> > > bdrv_get_device_or_node_name() returns, which causes a migration problem.
> > > 
> > > 
> > > Ignorant question #1: Can we multi-parent the filter node and
> > > source-node? It looks like at the moment both consider their only parent
> > > to be the block-job and don't have a link back to their parents otherwise.
> > > 
> > > 
> > > Otherwise: I have a lot of cloudy ideas on how to solve this, but
> > > ultimately what we want is to be able to find the "addressable" name for
> > > the node the bitmap is attached to, which would be the name of the first
> > > ancestor node that isn't a filter. (OR, the name of the block-backend
> > > above that node.)
> > 
> > 
> > Better would be to migrate by node-name only.. But am I right that
> > node-names are different on source and destination? Or this situation
> > changed?
> 
> Traditionally, I think migration assumes that frontends (guest devices)
> must match exactly, but backends may and usually will differ.
> 
> Of course, dirty bitmaps are a backend feature that isn't really related
> to guest devices, so this doesn't really work out any more in your case.
> BlockBackend names are unusable for this purpose (especially as we're
> moving towards anonymous BlockBackends everywhere), which I guess
> essentially means node-name is the only option left.
> 
> Is bitmap migration something that must be enabled explicitly or does
> it happen automatically? If it's explicit, then making an additional
> requirement (matching node-names) shouldn't be a problem.

I think a far better and more reasonable solution is to provide a map on
migration which would match the node names explicitly. If they have to
be the same it moves node names together with the frontend options as
guest ABI.

This means that changing the disk backend on migration will become very
painful.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  0:09 bitmap migration bug with -drive while block mirror runs John Snow
                   ` (2 preceding siblings ...)
  2019-10-01  9:17 ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
  2019-10-01 14:10   ` John Snow
  2019-10-01 14:13   ` Max Reitz
  3 siblings, 2 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 14:00 UTC (permalink / raw)
  To: John Snow; +Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz

01.10.2019 3:09, John Snow wrote:
> Hi folks, I identified a problem with the migration code that Red Hat QE
> found and thought you'd like to see it:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> 
> Very, very briefly: drive-mirror inserts a filter node that changes what
> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> 
> 
> Ignorant question #1: Can we multi-parent the filter node and
> source-node? It looks like at the moment both consider their only parent
> to be the block-job and don't have a link back to their parents otherwise.
> 
> 
> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> ultimately what we want is to be able to find the "addressable" name for
> the node the bitmap is attached to, which would be the name of the first
> ancestor node that isn't a filter. (OR, the name of the block-backend
> above that node.)

Not the name of ancestor node, it will break mapping: it must be name of the
node itself or name of parent (may be through several filters) block-backend

> 
> A simple way to do this might be a "child_unfiltered" BdrvChild role
> that simply bypasses the filter that was inserted and serves no real
> purpose other than to allow the child to have a parent link and find who
> it's """real""" parent is.
> 
> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> feasible this quick idea might be, though.
> 
> 
> - Corollary fix #1: call error_setg if the bitmap node name that's about
> to go over the wire is an autogenerated node: this is never correct!
> 
> (Why not? because the target is incapable of matching the node-name
> because they are randomly generated AND you cannot specify node-names
> with # prefixes as they are especially reserved!
> 
> (This raises a related problem: if you explicitly add bitmaps to nodes
> with autogenerated names, you will be unable to migrate them.))
> 
> --js
> 

What about the following:

diff --git a/block.c b/block.c
index 5944124845..6739c19be9 100644
--- a/block.c
+++ b/block.c
@@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
      *child_flags = flags;
  }

+static const char *bdrv_child_get_name(BdrvChild *child)
+{
+    BlockDriverState *parent = child->opaque;
+
+    if (parent->drv && parent->drv->is_filter) {
+        return bdrv_get_parent_name(parent);
+    }
+
+    return NULL;
+}
+
  const BdrvChildRole child_file = {
      .parent_is_bds   = true,
+    .get_name        = bdrv_child_get_name,
      .get_parent_desc = bdrv_child_get_parent_desc,
      .inherit_options = bdrv_inherited_options,
      .drained_begin   = bdrv_child_cb_drained_begin,
@@ -1163,6 +1175,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,

  const BdrvChildRole child_backing = {
      .parent_is_bds   = true,
+    .get_name        = bdrv_child_get_name,
      .get_parent_desc = bdrv_child_get_parent_desc,
      .attach          = bdrv_backing_attach,
      .detach          = bdrv_backing_detach,


-- 
Best regards,
Vladimir

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 14:10   ` John Snow
  2019-10-01 15:57     ` Vladimir Sementsov-Ogievskiy
  2019-10-01 14:13   ` Max Reitz
  1 sibling, 1 reply; 47+ messages in thread
From: John Snow @ 2019-10-01 14:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz



On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>> ultimately what we want is to be able to find the "addressable" name for
>> the node the bitmap is attached to, which would be the name of the first
>> ancestor node that isn't a filter. (OR, the name of the block-backend
>> above that node.)
> Not the name of ancestor node, it will break mapping: it must be name of the
> node itself or name of parent (may be through several filters) block-backend
> 

Ah, you are right of course -- because block-backends are the only
"nodes" for which we actually descend the graph and add the bitmap to
its child.

So the real back-resolution mechanism is:

- Find the first non-filter ancestor, A
- if A is not a block-backend, we must use our node-local name.
- if A's name is empty, we must use our node-local name.
- If the name we have chosen is not id_wellformed, we have no
migration-stable addressable name for this bitmap and the migration must
fail!


For resolving bitmap addresses via QMP (node, name) pairs; the
resolution method would be this:

- if the node-name N is a block-backend, descend the tree until we find
the first non-filter node V.
- if the node-name N is a BlockDriverState, use this node directly.


(I don't have the time to investigate the code snippet right now; my
attention is being pulled to a different project. sorry!)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
  2019-10-01 14:10   ` John Snow
@ 2019-10-01 14:13   ` Max Reitz
  2019-10-01 14:27     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 47+ messages in thread
From: Max Reitz @ 2019-10-01 14:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, John Snow
  Cc: Peter Krempa, qemu-devel, Qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 2694 bytes --]

On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
> 01.10.2019 3:09, John Snow wrote:
>> Hi folks, I identified a problem with the migration code that Red Hat QE
>> found and thought you'd like to see it:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>
>> Very, very briefly: drive-mirror inserts a filter node that changes what
>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>
>>
>> Ignorant question #1: Can we multi-parent the filter node and
>> source-node? It looks like at the moment both consider their only parent
>> to be the block-job and don't have a link back to their parents otherwise.
>>
>>
>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>> ultimately what we want is to be able to find the "addressable" name for
>> the node the bitmap is attached to, which would be the name of the first
>> ancestor node that isn't a filter. (OR, the name of the block-backend
>> above that node.)
> 
> Not the name of ancestor node, it will break mapping: it must be name of the
> node itself or name of parent (may be through several filters) block-backend
> 
>>
>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>> that simply bypasses the filter that was inserted and serves no real
>> purpose other than to allow the child to have a parent link and find who
>> it's """real""" parent is.
>>
>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>> feasible this quick idea might be, though.
>>
>>
>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>> to go over the wire is an autogenerated node: this is never correct!
>>
>> (Why not? because the target is incapable of matching the node-name
>> because they are randomly generated AND you cannot specify node-names
>> with # prefixes as they are especially reserved!
>>
>> (This raises a related problem: if you explicitly add bitmaps to nodes
>> with autogenerated names, you will be unable to migrate them.))
>>
>> --js
>>
> 
> What about the following:
> 
> diff --git a/block.c b/block.c
> index 5944124845..6739c19be9 100644
> --- a/block.c
> +++ b/block.c
> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>       *child_flags = flags;
>   }
> 
> +static const char *bdrv_child_get_name(BdrvChild *child)
> +{
> +    BlockDriverState *parent = child->opaque;
> +
> +    if (parent->drv && parent->drv->is_filter) {
> +        return bdrv_get_parent_name(parent);
> +    }
> +
> +    return NULL;
> +}
> +

Why would we skip filters explicitly added by the user?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:13   ` Max Reitz
@ 2019-10-01 14:27     ` Vladimir Sementsov-Ogievskiy
  2019-10-01 14:34       ` Max Reitz
  0 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 14:27 UTC (permalink / raw)
  To: Max Reitz, John Snow; +Cc: Peter Krempa, qemu-devel, Qemu-block

01.10.2019 17:13, Max Reitz wrote:
> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>> 01.10.2019 3:09, John Snow wrote:
>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>> found and thought you'd like to see it:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>
>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>
>>>
>>> Ignorant question #1: Can we multi-parent the filter node and
>>> source-node? It looks like at the moment both consider their only parent
>>> to be the block-job and don't have a link back to their parents otherwise.
>>>
>>>
>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>> ultimately what we want is to be able to find the "addressable" name for
>>> the node the bitmap is attached to, which would be the name of the first
>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>> above that node.)
>>
>> Not the name of ancestor node, it will break mapping: it must be name of the
>> node itself or name of parent (may be through several filters) block-backend
>>
>>>
>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>> that simply bypasses the filter that was inserted and serves no real
>>> purpose other than to allow the child to have a parent link and find who
>>> it's """real""" parent is.
>>>
>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>> feasible this quick idea might be, though.
>>>
>>>
>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>> to go over the wire is an autogenerated node: this is never correct!
>>>
>>> (Why not? because the target is incapable of matching the node-name
>>> because they are randomly generated AND you cannot specify node-names
>>> with # prefixes as they are especially reserved!
>>>
>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>> with autogenerated names, you will be unable to migrate them.))
>>>
>>> --js
>>>
>>
>> What about the following:
>>
>> diff --git a/block.c b/block.c
>> index 5944124845..6739c19be9 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>        *child_flags = flags;
>>    }
>>
>> +static const char *bdrv_child_get_name(BdrvChild *child)
>> +{
>> +    BlockDriverState *parent = child->opaque;
>> +
>> +    if (parent->drv && parent->drv->is_filter) {
>> +        return bdrv_get_parent_name(parent);
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
> 
> Why would we skip filters explicitly added by the user?
> 

Why not? Otherwise migration of bitmaps will not work: we may have different set
of filters on source and destination, and we still should map nodes with bitmaps
automatically.

I like John's idea of explicitly defined node mapping, but now we need simpler fix,
not involving libvirt changes if possible.

Hmm, or you mean that by this patch I touch not only migration but all users of
bdrv_get_device_or_node_name? Than I can't predict all the consequences...

Is it better to add .get_name_for_bitmaps_migration handler (with simpler name of course)?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:27     ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 14:34       ` Max Reitz
  2019-10-01 14:53         ` Vladimir Sementsov-Ogievskiy
  2019-10-01 15:09         ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Max Reitz @ 2019-10-01 14:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, John Snow
  Cc: Peter Krempa, qemu-devel, Qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 3807 bytes --]

On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
> 01.10.2019 17:13, Max Reitz wrote:
>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>>> 01.10.2019 3:09, John Snow wrote:
>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>> found and thought you'd like to see it:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>
>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>
>>>>
>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>> source-node? It looks like at the moment both consider their only parent
>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>
>>>>
>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>> ultimately what we want is to be able to find the "addressable" name for
>>>> the node the bitmap is attached to, which would be the name of the first
>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>> above that node.)
>>>
>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>> node itself or name of parent (may be through several filters) block-backend
>>>
>>>>
>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>> that simply bypasses the filter that was inserted and serves no real
>>>> purpose other than to allow the child to have a parent link and find who
>>>> it's """real""" parent is.
>>>>
>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>> feasible this quick idea might be, though.
>>>>
>>>>
>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>
>>>> (Why not? because the target is incapable of matching the node-name
>>>> because they are randomly generated AND you cannot specify node-names
>>>> with # prefixes as they are especially reserved!
>>>>
>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>> with autogenerated names, you will be unable to migrate them.))
>>>>
>>>> --js
>>>>
>>>
>>> What about the following:
>>>
>>> diff --git a/block.c b/block.c
>>> index 5944124845..6739c19be9 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>>        *child_flags = flags;
>>>    }
>>>
>>> +static const char *bdrv_child_get_name(BdrvChild *child)
>>> +{
>>> +    BlockDriverState *parent = child->opaque;
>>> +
>>> +    if (parent->drv && parent->drv->is_filter) {
>>> +        return bdrv_get_parent_name(parent);
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>
>> Why would we skip filters explicitly added by the user?
>>
> 
> Why not? Otherwise migration of bitmaps will not work: we may have different set
> of filters on source and destination, and we still should map nodes with bitmaps
> automatically.

Why would we have a different set of explicitly added filters on source
and destination and allow them to be automatically changed during
migration?  Shouldn’t users only change them pre or post migration?

And you can also add bitmaps on filters, of course.

> I like John's idea of explicitly defined node mapping, but now we need simpler fix,
> not involving libvirt changes if possible.
> 
> Hmm, or you mean that by this patch I touch not only migration but all users of
> bdrv_get_device_or_node_name? Than I can't predict all the consequences...

I’m hinting at the fact that maybe we cannot drop the concept of
implicit filters quite as fast as we wanted.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:34       ` Max Reitz
@ 2019-10-01 14:53         ` Vladimir Sementsov-Ogievskiy
  2019-10-01 15:26           ` Max Reitz
  2019-10-01 15:09         ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 14:53 UTC (permalink / raw)
  To: Max Reitz, John Snow; +Cc: Peter Krempa, qemu-devel, Qemu-block

01.10.2019 17:34, Max Reitz wrote:
> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
>> 01.10.2019 17:13, Max Reitz wrote:
>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>>>> 01.10.2019 3:09, John Snow wrote:
>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>> found and thought you'd like to see it:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>
>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>
>>>>>
>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>> source-node? It looks like at the moment both consider their only parent
>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>
>>>>>
>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>> above that node.)
>>>>
>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>> node itself or name of parent (may be through several filters) block-backend
>>>>
>>>>>
>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>> purpose other than to allow the child to have a parent link and find who
>>>>> it's """real""" parent is.
>>>>>
>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>> feasible this quick idea might be, though.
>>>>>
>>>>>
>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>
>>>>> (Why not? because the target is incapable of matching the node-name
>>>>> because they are randomly generated AND you cannot specify node-names
>>>>> with # prefixes as they are especially reserved!
>>>>>
>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>
>>>>> --js
>>>>>
>>>>
>>>> What about the following:
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 5944124845..6739c19be9 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>>>         *child_flags = flags;
>>>>     }
>>>>
>>>> +static const char *bdrv_child_get_name(BdrvChild *child)
>>>> +{
>>>> +    BlockDriverState *parent = child->opaque;
>>>> +
>>>> +    if (parent->drv && parent->drv->is_filter) {
>>>> +        return bdrv_get_parent_name(parent);
>>>> +    }
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>
>>> Why would we skip filters explicitly added by the user?
>>>
>>
>> Why not? Otherwise migration of bitmaps will not work: we may have different set
>> of filters on source and destination, and we still should map nodes with bitmaps
>> automatically.
> 
> Why would we have a different set of explicitly added filters on source
> and destination and allow them to be automatically changed during
> migration?  Shouldn’t users only change them pre or post migration?

Hm I didn't say to automatically change filters.

I meant, that we must understand the mapping of nodes with bitmaps: if we have source
node N with bitmaps, in which target node these bitmaps should be after migration?
And we have to understand it without user interaction.

Still, if you don't like skipping explicit filters, I'm OK with implicit only, it will
fix the bug we are saying about.

But why you don't like creating some additional explicit filters on target (prior to
migration process) which we didn't have on source?

> 
> And you can also add bitmaps on filters, of course.
> 
>> I like John's idea of explicitly defined node mapping, but now we need simpler fix,
>> not involving libvirt changes if possible.
>>
>> Hmm, or you mean that by this patch I touch not only migration but all users of
>> bdrv_get_device_or_node_name? Than I can't predict all the consequences...
> 
> I’m hinting at the fact that maybe we cannot drop the concept of
> implicit filters quite as fast as we wanted.
> 
> Max
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:34       ` Max Reitz
  2019-10-01 14:53         ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 15:09         ` Kevin Wolf
  2019-10-01 15:27           ` Max Reitz
  1 sibling, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2019-10-01 15:09 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block

[-- Attachment #1: Type: text/plain, Size: 4168 bytes --]

Am 01.10.2019 um 16:34 hat Max Reitz geschrieben:
> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
> > 01.10.2019 17:13, Max Reitz wrote:
> >> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
> >>> 01.10.2019 3:09, John Snow wrote:
> >>>> Hi folks, I identified a problem with the migration code that Red Hat QE
> >>>> found and thought you'd like to see it:
> >>>>
> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> >>>>
> >>>> Very, very briefly: drive-mirror inserts a filter node that changes what
> >>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> >>>>
> >>>>
> >>>> Ignorant question #1: Can we multi-parent the filter node and
> >>>> source-node? It looks like at the moment both consider their only parent
> >>>> to be the block-job and don't have a link back to their parents otherwise.
> >>>>
> >>>>
> >>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>>> ultimately what we want is to be able to find the "addressable" name for
> >>>> the node the bitmap is attached to, which would be the name of the first
> >>>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>>> above that node.)
> >>>
> >>> Not the name of ancestor node, it will break mapping: it must be name of the
> >>> node itself or name of parent (may be through several filters) block-backend
> >>>
> >>>>
> >>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
> >>>> that simply bypasses the filter that was inserted and serves no real
> >>>> purpose other than to allow the child to have a parent link and find who
> >>>> it's """real""" parent is.
> >>>>
> >>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> >>>> feasible this quick idea might be, though.
> >>>>
> >>>>
> >>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
> >>>> to go over the wire is an autogenerated node: this is never correct!
> >>>>
> >>>> (Why not? because the target is incapable of matching the node-name
> >>>> because they are randomly generated AND you cannot specify node-names
> >>>> with # prefixes as they are especially reserved!
> >>>>
> >>>> (This raises a related problem: if you explicitly add bitmaps to nodes
> >>>> with autogenerated names, you will be unable to migrate them.))
> >>>>
> >>>> --js
> >>>>
> >>>
> >>> What about the following:
> >>>
> >>> diff --git a/block.c b/block.c
> >>> index 5944124845..6739c19be9 100644
> >>> --- a/block.c
> >>> +++ b/block.c
> >>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
> >>>        *child_flags = flags;
> >>>    }
> >>>
> >>> +static const char *bdrv_child_get_name(BdrvChild *child)
> >>> +{
> >>> +    BlockDriverState *parent = child->opaque;
> >>> +
> >>> +    if (parent->drv && parent->drv->is_filter) {
> >>> +        return bdrv_get_parent_name(parent);
> >>> +    }
> >>> +
> >>> +    return NULL;
> >>> +}
> >>> +
> >>
> >> Why would we skip filters explicitly added by the user?
> >>
> > 
> > Why not? Otherwise migration of bitmaps will not work: we may have different set
> > of filters on source and destination, and we still should map nodes with bitmaps
> > automatically.
> 
> Why would we have a different set of explicitly added filters on source
> and destination and allow them to be automatically changed during
> migration?  Shouldn’t users only change them pre or post migration?

We never made a requirement that the backend must be the same on the
source and the destination. Basically, migration copies the state of
frontends and the user is responsible for having these frontends created
and connected to the right backends on the destination.

Using different paths on the destination is a very obvious requirement
for block devices. It's less obvious for the graph structure, but I
don't see a reason why it couldn't change on migration. Say we were
using local storage on the source, but now we did storage migration to
some network storage, access to which should be throttled.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01  9:54   ` Kevin Wolf
  2019-10-01 10:05     ` Vladimir Sementsov-Ogievskiy
  2019-10-01 13:24     ` Peter Krempa
@ 2019-10-01 15:09     ` John Snow
  2019-10-01 15:58       ` Kevin Wolf
  2 siblings, 1 reply; 47+ messages in thread
From: John Snow @ 2019-10-01 15:09 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Qemu-block, Max Reitz



On 10/1/19 5:54 AM, Kevin Wolf wrote:
> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 01.10.2019 3:09, John Snow wrote:
>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>> found and thought you'd like to see it:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>
>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>
>>>
>>> Ignorant question #1: Can we multi-parent the filter node and
>>> source-node? It looks like at the moment both consider their only parent
>>> to be the block-job and don't have a link back to their parents otherwise.
>>>
>>>
>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>> ultimately what we want is to be able to find the "addressable" name for
>>> the node the bitmap is attached to, which would be the name of the first
>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>> above that node.)
>>
>>
>> Better would be to migrate by node-name only.. But am I right that
>> node-names are different on source and destination? Or this situation
>> changed?
> 
> Traditionally, I think migration assumes that frontends (guest devices)
> must match exactly, but backends may and usually will differ.
> 
> Of course, dirty bitmaps are a backend feature that isn't really related
> to guest devices, so this doesn't really work out any more in your case.
> BlockBackend names are unusable for this purpose (especially as we're
> moving towards anonymous BlockBackends everywhere), which I guess
> essentially means node-name is the only option left.
> 

The problem as I see it involves API stability.

We allow block-dirty-bitmap-add against e.g. "drive1" through the
block-backend name (the name of the "drive" as the user sees it.)

Of course, once you start mirror, you aren't able to access that bitmap
through that namepair anymore -- the "address" of the bitmap has "changed"!

(In actual fact, the bitmap always had two addresses; and simply we lost
an alias -- but it's the one that the user likely used to create the
bitmap, so that's bad.)

> Is bitmap migration something that must be enabled explicitly or does
> it happen automatically? If it's explicit, then making an additional
> requirement (matching node-names) shouldn't be a problem.
> 

This means that bitmap migration becomes a blockdev-only feature.

Serious question: do we have plans to formally deprecate things like
-drive and mandate a blockdev workflow, or otherwise work to unify the
actual graph that gets created between the two methods?

>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>> that simply bypasses the filter that was inserted and serves no real
>>> purpose other than to allow the child to have a parent link and find who
>>> it's """real""" parent is.
>>>
>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>> feasible this quick idea might be, though.
>>>
>>>
>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>> to go over the wire is an autogenerated node: this is never correct!
>>>
>>> (Why not? because the target is incapable of matching the node-name
>>> because they are randomly generated AND you cannot specify node-names
>>> with # prefixes as they are especially reserved!
>>>
>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>> with autogenerated names, you will be unable to migrate them.))
>>>
>>
>> In other words, we need a well defined way to match nodes on source and destination,
>> keeping in mind filters, to migrate bitmaps correctly.
>>

Yes, exactly.

>> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
>> mirror-top filter during mirror job:)
>>
>> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
>>
>> If node-names are different on source and destination, what is the same? Top blk name
>> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
> 
> blk_name has to be assumed to be "". The BdrvChild path changes when
> filters are inserted (and inserting filters on the destination that
> aren't present on the source, or vice versa, sounds like something that
> should just work).
> 
> So both parts of this are not great ways for addressing nodes.
> 
>> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
>> defines this node directly (we must not have children with same name for some node in
>> the path).
>>
>> And I think it's a correct way to define node in migration stream - by path.
> 
> I'm afraid node-name is the only thing that could possibly work reliably
> for identifying nodes.
> 
> Kevin
> 

It sounds like you are saying that bitmaps must become a blockdev-only
feature.

I'm not sure if I have arrived at that conclusion yet, but it's at least
inarguable that with blockdev it's a lot simpler to guarantee correctness.

However, we still have -cdrom and -hda and -drive and any number of
sugars that I think we aren't committed to getting rid of yet... (or ever?)

--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:53         ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 15:26           ` Max Reitz
  2019-10-02  7:34             ` Peter Krempa
  0 siblings, 1 reply; 47+ messages in thread
From: Max Reitz @ 2019-10-01 15:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, John Snow
  Cc: Peter Krempa, qemu-devel, Qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 4676 bytes --]

On 01.10.19 16:53, Vladimir Sementsov-Ogievskiy wrote:
> 01.10.2019 17:34, Max Reitz wrote:
>> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
>>> 01.10.2019 17:13, Max Reitz wrote:
>>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 01.10.2019 3:09, John Snow wrote:
>>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>>> found and thought you'd like to see it:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>>
>>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>>
>>>>>>
>>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>>> source-node? It looks like at the moment both consider their only parent
>>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>>
>>>>>>
>>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>>> above that node.)
>>>>>
>>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>>> node itself or name of parent (may be through several filters) block-backend
>>>>>
>>>>>>
>>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>>> purpose other than to allow the child to have a parent link and find who
>>>>>> it's """real""" parent is.
>>>>>>
>>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>>> feasible this quick idea might be, though.
>>>>>>
>>>>>>
>>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>>
>>>>>> (Why not? because the target is incapable of matching the node-name
>>>>>> because they are randomly generated AND you cannot specify node-names
>>>>>> with # prefixes as they are especially reserved!
>>>>>>
>>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>>
>>>>>> --js
>>>>>>
>>>>>
>>>>> What about the following:
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index 5944124845..6739c19be9 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>>>>         *child_flags = flags;
>>>>>     }
>>>>>
>>>>> +static const char *bdrv_child_get_name(BdrvChild *child)
>>>>> +{
>>>>> +    BlockDriverState *parent = child->opaque;
>>>>> +
>>>>> +    if (parent->drv && parent->drv->is_filter) {
>>>>> +        return bdrv_get_parent_name(parent);
>>>>> +    }
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>
>>>> Why would we skip filters explicitly added by the user?
>>>>
>>>
>>> Why not? Otherwise migration of bitmaps will not work: we may have different set
>>> of filters on source and destination, and we still should map nodes with bitmaps
>>> automatically.
>>
>> Why would we have a different set of explicitly added filters on source
>> and destination and allow them to be automatically changed during
>> migration?  Shouldn’t users only change them pre or post migration?
> 
> Hm I didn't say to automatically change filters.
> 
> I meant, that we must understand the mapping of nodes with bitmaps: if we have source
> node N with bitmaps, in which target node these bitmaps should be after migration?
> And we have to understand it without user interaction.

Well, I’d think it should be on the one with the same node name, but it
appears others don’t want a node-name-based mapping, so maybe I should
just stop trying to be part of the discussion. :-)

> Still, if you don't like skipping explicit filters, I'm OK with implicit only, it will
> fix the bug we are saying about.
> 
> But why you don't like creating some additional explicit filters on target (prior to
> migration process) which we didn't have on source?

Because I feel like (without having too much insight into migration, I
admit) that migration is generally a process where you move from one VM
to another, but both should have the same configuration.  If you want to
change the configuration, you do that before or after the migration.
(I’d think.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:09         ` Kevin Wolf
@ 2019-10-01 15:27           ` Max Reitz
  2019-10-01 16:12             ` Kevin Wolf
  0 siblings, 1 reply; 47+ messages in thread
From: Max Reitz @ 2019-10-01 15:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 4608 bytes --]

On 01.10.19 17:09, Kevin Wolf wrote:
> Am 01.10.2019 um 16:34 hat Max Reitz geschrieben:
>> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
>>> 01.10.2019 17:13, Max Reitz wrote:
>>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 01.10.2019 3:09, John Snow wrote:
>>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>>> found and thought you'd like to see it:
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>>
>>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>>
>>>>>>
>>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>>> source-node? It looks like at the moment both consider their only parent
>>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>>
>>>>>>
>>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>>> above that node.)
>>>>>
>>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>>> node itself or name of parent (may be through several filters) block-backend
>>>>>
>>>>>>
>>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>>> purpose other than to allow the child to have a parent link and find who
>>>>>> it's """real""" parent is.
>>>>>>
>>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>>> feasible this quick idea might be, though.
>>>>>>
>>>>>>
>>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>>
>>>>>> (Why not? because the target is incapable of matching the node-name
>>>>>> because they are randomly generated AND you cannot specify node-names
>>>>>> with # prefixes as they are especially reserved!
>>>>>>
>>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>>
>>>>>> --js
>>>>>>
>>>>>
>>>>> What about the following:
>>>>>
>>>>> diff --git a/block.c b/block.c
>>>>> index 5944124845..6739c19be9 100644
>>>>> --- a/block.c
>>>>> +++ b/block.c
>>>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>>>>        *child_flags = flags;
>>>>>    }
>>>>>
>>>>> +static const char *bdrv_child_get_name(BdrvChild *child)
>>>>> +{
>>>>> +    BlockDriverState *parent = child->opaque;
>>>>> +
>>>>> +    if (parent->drv && parent->drv->is_filter) {
>>>>> +        return bdrv_get_parent_name(parent);
>>>>> +    }
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>
>>>> Why would we skip filters explicitly added by the user?
>>>>
>>>
>>> Why not? Otherwise migration of bitmaps will not work: we may have different set
>>> of filters on source and destination, and we still should map nodes with bitmaps
>>> automatically.
>>
>> Why would we have a different set of explicitly added filters on source
>> and destination and allow them to be automatically changed during
>> migration?  Shouldn’t users only change them pre or post migration?
> 
> We never made a requirement that the backend must be the same on the
> source and the destination. Basically, migration copies the state of
> frontends and the user is responsible for having these frontends created
> and connected to the right backends on the destination.
> 
> Using different paths on the destination is a very obvious requirement
> for block devices. It's less obvious for the graph structure, but I
> don't see a reason why it couldn't change on migration. Say we were
> using local storage on the source, but now we did storage migration to
> some network storage, access to which should be throttled.

I don’t quite see why we couldn’t add such filters before or after
migration.  And it was my impression that bitmap migration was a problem
now precisely because it is bound to the graph structure.

But anyway.  I’ll gladly remove myself from this discussion because I
don’t know much about migration and actually I’d prefer to keep it that
way.  (Sorry.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 14:10   ` John Snow
@ 2019-10-01 15:57     ` Vladimir Sementsov-Ogievskiy
  2019-10-01 16:07       ` John Snow
  2019-10-01 16:16       ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 15:57 UTC (permalink / raw)
  To: John Snow; +Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz

01.10.2019 17:10, John Snow wrote:
> 
> 
> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>> ultimately what we want is to be able to find the "addressable" name for
>>> the node the bitmap is attached to, which would be the name of the first
>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>> above that node.)
>> Not the name of ancestor node, it will break mapping: it must be name of the
>> node itself or name of parent (may be through several filters) block-backend
>>
> 
> Ah, you are right of course -- because block-backends are the only
> "nodes" for which we actually descend the graph and add the bitmap to
> its child.
> 
> So the real back-resolution mechanism is:
> 
> - Find the first non-filter ancestor, A
> - if A is not a block-backend, we must use our node-local name.
> - if A's name is empty, we must use our node-local name.
> - If the name we have chosen is not id_wellformed, we have no
> migration-stable addressable name for this bitmap and the migration must
> fail!
> 
> 
> For resolving bitmap addresses via QMP (node, name) pairs; the
> resolution method would be this:
> 
> - if the node-name N is a block-backend, descend the tree until we find
> the first non-filter node V.
> - if the node-name N is a BlockDriverState, use this node directly.
> 

Looks good for me.

I also think if on destination we have both block-backend with name N and
block-node with name N and the latter is not (filtered) child of the former,
we should fail migration of at least that bitmap. (Hope, nobody reuse
block-backend names as node-names in practice.. (should we restrict it
explicitly ?))

> 
> (I don't have the time to investigate the code snippet right now; my
> attention is being pulled to a different project. sorry!)
> 

So, you are not working on this? Then I'll make patches.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:09     ` John Snow
@ 2019-10-01 15:58       ` Kevin Wolf
  2019-10-01 16:12         ` Vladimir Sementsov-Ogievskiy
  2019-10-01 16:23         ` John Snow
  0 siblings, 2 replies; 47+ messages in thread
From: Kevin Wolf @ 2019-10-01 15:58 UTC (permalink / raw)
  To: John Snow; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

Am 01.10.2019 um 17:09 hat John Snow geschrieben:
> 
> 
> On 10/1/19 5:54 AM, Kevin Wolf wrote:
> > Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 01.10.2019 3:09, John Snow wrote:
> >>> Hi folks, I identified a problem with the migration code that Red Hat QE
> >>> found and thought you'd like to see it:
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> >>>
> >>> Very, very briefly: drive-mirror inserts a filter node that changes what
> >>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> >>>
> >>>
> >>> Ignorant question #1: Can we multi-parent the filter node and
> >>> source-node? It looks like at the moment both consider their only parent
> >>> to be the block-job and don't have a link back to their parents otherwise.
> >>>
> >>>
> >>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>> ultimately what we want is to be able to find the "addressable" name for
> >>> the node the bitmap is attached to, which would be the name of the first
> >>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>> above that node.)
> >>
> >>
> >> Better would be to migrate by node-name only.. But am I right that
> >> node-names are different on source and destination? Or this situation
> >> changed?
> > 
> > Traditionally, I think migration assumes that frontends (guest devices)
> > must match exactly, but backends may and usually will differ.
> > 
> > Of course, dirty bitmaps are a backend feature that isn't really related
> > to guest devices, so this doesn't really work out any more in your case.
> > BlockBackend names are unusable for this purpose (especially as we're
> > moving towards anonymous BlockBackends everywhere), which I guess
> > essentially means node-name is the only option left.
> > 
> 
> The problem as I see it involves API stability.
> 
> We allow block-dirty-bitmap-add against e.g. "drive1" through the
> block-backend name (the name of the "drive" as the user sees it.)
> 
> Of course, once you start mirror, you aren't able to access that bitmap
> through that namepair anymore -- the "address" of the bitmap has "changed"!
> 
> (In actual fact, the bitmap always had two addresses; and simply we lost
> an alias -- but it's the one that the user likely used to create the
> bitmap, so that's bad.)

So if I understand correctly, the problem is that without a filter, in
some setups we get a usable BlockBackend name like "drive1", but when a
filter is added, we return the node-name instead which is
auto-generated and will be different on the destination.

Looking at the ChildRole documentation:

    /* Returns a name that is supposedly more useful for human users than the
     * node name for identifying the node in question (in particular, a BB
     * name), or NULL if the parent can't provide a better name. */
    const char *(*get_name)(BdrvChild *child);

I'd argue that a BlockBackend name is more useful for a human user even
across filter, so I'd support a .get_name implementation for a filter
child role (which Max wanted to introduce anyway for his filter series).

Of course, if you have a function that is made to return a convenient
text for human users, and you use it for a stable ABI like the migration
stream, this is an idea that would certainly have caused an entertaining
Linus rant in the good old kernel times.

> > Is bitmap migration something that must be enabled explicitly or does
> > it happen automatically? If it's explicit, then making an additional
> > requirement (matching node-names) shouldn't be a problem.
> 
> This means that bitmap migration becomes a blockdev-only feature.

I meant this more as the preferred way for the future rather than the
only thing supported.

But Peter has actually mentioned that for libvirt it will be
blockdev-only anyway. So do we even have a good reason to invest much
for the non-blockdev case?

Maybe making it blockdev-only is actually pretty reasonable.

> Serious question: do we have plans to formally deprecate things like
> -drive and mandate a blockdev workflow, or otherwise work to unify the
> actual graph that gets created between the two methods?

It's high on my wishlist, though we can't before libvirt uses blockdev.

Maybe something to talk about at KVM Forum?

> >>> A simple way to do this might be a "child_unfiltered" BdrvChild role
> >>> that simply bypasses the filter that was inserted and serves no real
> >>> purpose other than to allow the child to have a parent link and find who
> >>> it's """real""" parent is.
> >>>
> >>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> >>> feasible this quick idea might be, though.
> >>>
> >>>
> >>> - Corollary fix #1: call error_setg if the bitmap node name that's about
> >>> to go over the wire is an autogenerated node: this is never correct!
> >>>
> >>> (Why not? because the target is incapable of matching the node-name
> >>> because they are randomly generated AND you cannot specify node-names
> >>> with # prefixes as they are especially reserved!
> >>>
> >>> (This raises a related problem: if you explicitly add bitmaps to nodes
> >>> with autogenerated names, you will be unable to migrate them.))
> >>>
> >>
> >> In other words, we need a well defined way to match nodes on source and destination,
> >> keeping in mind filters, to migrate bitmaps correctly.
> >>
> 
> Yes, exactly.
> 
> >> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
> >> mirror-top filter during mirror job:)
> >>
> >> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
> >>
> >> If node-names are different on source and destination, what is the same? Top blk name
> >> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
> > 
> > blk_name has to be assumed to be "". The BdrvChild path changes when
> > filters are inserted (and inserting filters on the destination that
> > aren't present on the source, or vice versa, sounds like something that
> > should just work).
> > 
> > So both parts of this are not great ways for addressing nodes.
> > 
> >> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
> >> defines this node directly (we must not have children with same name for some node in
> >> the path).
> >>
> >> And I think it's a correct way to define node in migration stream - by path.
> > 
> > I'm afraid node-name is the only thing that could possibly work reliably
> > for identifying nodes.
> > 
> > Kevin
> > 
> 
> It sounds like you are saying that bitmaps must become a blockdev-only
> feature.

More like we can't rely on BlockBackend names in a blockdev setup.
BlockBackend names can work in a purely traditional setup if we make
some effort to find the block backend even with filters in bettwen, but
they aren't universal even then. And node-names work in the blockdev
case, but they aren't universal either.

But as I said above, you made me wonder whether making it a
blockdev-only feature wouldn't actually be a good idea.

> I'm not sure if I have arrived at that conclusion yet, but it's at least
> inarguable that with blockdev it's a lot simpler to guarantee correctness.
> 
> However, we still have -cdrom and -hda and -drive and any number of
> sugars that I think we aren't committed to getting rid of yet... (or ever?)

-hda and -cdrom currently have hard-coded BlockBackend names. I don't
see what would stop us from changing this to hard-coded node names.

-drive is something that we probably need to remove in its current form
and introduce something new that is both user-friendly and powerful.
Actually, we should probably add the replacement first. :-)

Kevin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:57     ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 16:07       ` John Snow
  2019-10-02  8:12         ` Kevin Wolf
  2019-10-02 10:46         ` Peter Krempa
  2019-10-01 16:16       ` Kevin Wolf
  1 sibling, 2 replies; 47+ messages in thread
From: John Snow @ 2019-10-01 16:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Peter Krempa, qemu-devel, Qemu-block, Max Reitz



On 10/1/19 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> 01.10.2019 17:10, John Snow wrote:
>>
>>
>> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>> ultimately what we want is to be able to find the "addressable" name for
>>>> the node the bitmap is attached to, which would be the name of the first
>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>> above that node.)
>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>> node itself or name of parent (may be through several filters) block-backend
>>>
>>
>> Ah, you are right of course -- because block-backends are the only
>> "nodes" for which we actually descend the graph and add the bitmap to
>> its child.
>>
>> So the real back-resolution mechanism is:
>>

Amendment:
   - If our local node-name N is well-formed, use this.
   - Otherwise:
>> - Find the first non-filter ancestor, A
>> - if A is not a block-backend, we must use our node-local name.
     Amendment: If it's not a BB, we have no addressable name
                for the bitmap and this is an error.
>> - if A's name is empty, we must use our node-local name.
     Amendment: If it's empty, we have no addressable name
                for the bitmap and this is an error.
>> - If the name we have chosen is not id_wellformed, we have no
>> migration-stable addressable name for this bitmap and the migration must
>> fail!
     (Handled by above amendments.)

The reason for the change is to prefer user-defined node names whenever
possible; only trying to find a "device" or "backend" name whenever we
fail to find one.

>>
>>
>> For resolving bitmap addresses via QMP (node, name) pairs; the
>> resolution method would be this:
>>
>> - if the node-name N is a block-backend, descend the tree until we find
>> the first non-filter node V.
>> - if the node-name N is a BlockDriverState, use this node directly.
>>
> 
> Looks good for me.
> 

I hope the increased leeway of the name resolution doesn't make things
worse with respect to filters. I specified this to match the
back-resolution process.

I suppose if you want to add bitmaps to filters, you are required to add
them by node-name specifically.

> I also think if on destination we have both block-backend with name N and
> block-node with name N and the latter is not (filtered) child of the former,
> we should fail migration of at least that bitmap. (Hope, nobody reuse
> block-backend names as node-names in practice.. (should we restrict it
> explicitly ?))
> 

:(

>>
>> (I don't have the time to investigate the code snippet right now; my
>> attention is being pulled to a different project. sorry!)
>>
> 
> So, you are not working on this? Then I'll make patches.
> 

Right; I'm not prototyping a fix currently.

--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:27           ` Max Reitz
@ 2019-10-01 16:12             ` Kevin Wolf
  2019-10-01 16:17               ` Max Reitz
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2019-10-01 16:12 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block

[-- Attachment #1: Type: text/plain, Size: 5649 bytes --]

Am 01.10.2019 um 17:27 hat Max Reitz geschrieben:
> On 01.10.19 17:09, Kevin Wolf wrote:
> > Am 01.10.2019 um 16:34 hat Max Reitz geschrieben:
> >> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
> >>> 01.10.2019 17:13, Max Reitz wrote:
> >>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
> >>>>> 01.10.2019 3:09, John Snow wrote:
> >>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
> >>>>>> found and thought you'd like to see it:
> >>>>>>
> >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> >>>>>>
> >>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
> >>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> >>>>>>
> >>>>>>
> >>>>>> Ignorant question #1: Can we multi-parent the filter node and
> >>>>>> source-node? It looks like at the moment both consider their only parent
> >>>>>> to be the block-job and don't have a link back to their parents otherwise.
> >>>>>>
> >>>>>>
> >>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>>>>> ultimately what we want is to be able to find the "addressable" name for
> >>>>>> the node the bitmap is attached to, which would be the name of the first
> >>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>>>>> above that node.)
> >>>>>
> >>>>> Not the name of ancestor node, it will break mapping: it must be name of the
> >>>>> node itself or name of parent (may be through several filters) block-backend
> >>>>>
> >>>>>>
> >>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
> >>>>>> that simply bypasses the filter that was inserted and serves no real
> >>>>>> purpose other than to allow the child to have a parent link and find who
> >>>>>> it's """real""" parent is.
> >>>>>>
> >>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> >>>>>> feasible this quick idea might be, though.
> >>>>>>
> >>>>>>
> >>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
> >>>>>> to go over the wire is an autogenerated node: this is never correct!
> >>>>>>
> >>>>>> (Why not? because the target is incapable of matching the node-name
> >>>>>> because they are randomly generated AND you cannot specify node-names
> >>>>>> with # prefixes as they are especially reserved!
> >>>>>>
> >>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
> >>>>>> with autogenerated names, you will be unable to migrate them.))
> >>>>>>
> >>>>>> --js
> >>>>>>
> >>>>>
> >>>>> What about the following:
> >>>>>
> >>>>> diff --git a/block.c b/block.c
> >>>>> index 5944124845..6739c19be9 100644
> >>>>> --- a/block.c
> >>>>> +++ b/block.c
> >>>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
> >>>>>        *child_flags = flags;
> >>>>>    }
> >>>>>
> >>>>> +static const char *bdrv_child_get_name(BdrvChild *child)
> >>>>> +{
> >>>>> +    BlockDriverState *parent = child->opaque;
> >>>>> +
> >>>>> +    if (parent->drv && parent->drv->is_filter) {
> >>>>> +        return bdrv_get_parent_name(parent);
> >>>>> +    }
> >>>>> +
> >>>>> +    return NULL;
> >>>>> +}
> >>>>> +
> >>>>
> >>>> Why would we skip filters explicitly added by the user?
> >>>>
> >>>
> >>> Why not? Otherwise migration of bitmaps will not work: we may have different set
> >>> of filters on source and destination, and we still should map nodes with bitmaps
> >>> automatically.
> >>
> >> Why would we have a different set of explicitly added filters on source
> >> and destination and allow them to be automatically changed during
> >> migration?  Shouldn’t users only change them pre or post migration?
> > 
> > We never made a requirement that the backend must be the same on the
> > source and the destination. Basically, migration copies the state of
> > frontends and the user is responsible for having these frontends created
> > and connected to the right backends on the destination.
> > 
> > Using different paths on the destination is a very obvious requirement
> > for block devices. It's less obvious for the graph structure, but I
> > don't see a reason why it couldn't change on migration. Say we were
> > using local storage on the source, but now we did storage migration to
> > some network storage, access to which should be throttled.
> 
> I don’t quite see why we couldn’t add such filters before or after
> migration.

Possibly. But why would we when the source doesn't need the filter? We
don't change the image path before migration either.

I think the tricky part is coming up with rules and "keep the frontend
the same, the backend can change arbitrarily" is a very easy rule.

> And it was my impression that bitmap migration was a problem now
> precisely because it is bound to the graph structure.

So apparently I wasn't completely wrong when I preferred just writing
bitmaps back to the image instead of transferring them in the migration
stream...

It's not really bound to the graph structure per se, but to node names
and for non-anonymous BlockBackends to the link between the BB and its
root node. The latter is part of the graph structure, but only a very
small part, and it exists only for legacy (non-blockdev) configurations.

> But anyway.  I’ll gladly remove myself from this discussion because I
> don’t know much about migration and actually I’d prefer to keep it that
> way.  (Sorry.)

Good idea, let's have the migration maintainers handle this.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:58       ` Kevin Wolf
@ 2019-10-01 16:12         ` Vladimir Sementsov-Ogievskiy
  2019-10-01 16:24           ` Kevin Wolf
  2019-10-01 16:23         ` John Snow
  1 sibling, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 16:12 UTC (permalink / raw)
  To: Kevin Wolf, John Snow; +Cc: qemu-devel, Qemu-block, Max Reitz

01.10.2019 18:58, Kevin Wolf wrote:
> Am 01.10.2019 um 17:09 hat John Snow geschrieben:
>>
>>
>> On 10/1/19 5:54 AM, Kevin Wolf wrote:
>>> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 01.10.2019 3:09, John Snow wrote:
>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>> found and thought you'd like to see it:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>
>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>
>>>>>
>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>> source-node? It looks like at the moment both consider their only parent
>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>
>>>>>
>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>> above that node.)
>>>>
>>>>
>>>> Better would be to migrate by node-name only.. But am I right that
>>>> node-names are different on source and destination? Or this situation
>>>> changed?
>>>
>>> Traditionally, I think migration assumes that frontends (guest devices)
>>> must match exactly, but backends may and usually will differ.
>>>
>>> Of course, dirty bitmaps are a backend feature that isn't really related
>>> to guest devices, so this doesn't really work out any more in your case.
>>> BlockBackend names are unusable for this purpose (especially as we're
>>> moving towards anonymous BlockBackends everywhere), which I guess
>>> essentially means node-name is the only option left.
>>>
>>
>> The problem as I see it involves API stability.
>>
>> We allow block-dirty-bitmap-add against e.g. "drive1" through the
>> block-backend name (the name of the "drive" as the user sees it.)
>>
>> Of course, once you start mirror, you aren't able to access that bitmap
>> through that namepair anymore -- the "address" of the bitmap has "changed"!
>>
>> (In actual fact, the bitmap always had two addresses; and simply we lost
>> an alias -- but it's the one that the user likely used to create the
>> bitmap, so that's bad.)
> 
> So if I understand correctly, the problem is that without a filter, in
> some setups we get a usable BlockBackend name like "drive1", but when a
> filter is added, we return the node-name instead which is
> auto-generated and will be different on the destination.
> 
> Looking at the ChildRole documentation:
> 
>      /* Returns a name that is supposedly more useful for human users than the
>       * node name for identifying the node in question (in particular, a BB
>       * name), or NULL if the parent can't provide a better name. */
>      const char *(*get_name)(BdrvChild *child);
> 
> I'd argue that a BlockBackend name is more useful for a human user even
> across filter, so I'd support a .get_name implementation for a filter
> child role (which Max wanted to introduce anyway for his filter series).
> 
> Of course, if you have a function that is made to return a convenient
> text for human users, and you use it for a stable ABI like the migration
> stream, this is an idea that would certainly have caused an entertaining
> Linus rant in the good old kernel times.
> 
>>> Is bitmap migration something that must be enabled explicitly or does
>>> it happen automatically? If it's explicit, then making an additional
>>> requirement (matching node-names) shouldn't be a problem.
>>
>> This means that bitmap migration becomes a blockdev-only feature.
> 
> I meant this more as the preferred way for the future rather than the
> only thing supported.
> 
> But Peter has actually mentioned that for libvirt it will be
> blockdev-only anyway. So do we even have a good reason to invest much
> for the non-blockdev case?
> 
> Maybe making it blockdev-only is actually pretty reasonable.

We in Virtuozzo use bitmap migration, so I'd have to fix it at least
downstream (it seems easier than switch downstream libvirt to blockdev now).

And what about original bug
https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
?

Also, if we make it blockdev-only upstream, what we mean by that? Node names
on destination must match, or we add additional qmp command
migration-set-bitmap-node-mapping, to specify mapping between node names on
source and target?




-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:57     ` Vladimir Sementsov-Ogievskiy
  2019-10-01 16:07       ` John Snow
@ 2019-10-01 16:16       ` Kevin Wolf
  2019-10-01 16:17         ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2019-10-01 16:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

Am 01.10.2019 um 17:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 01.10.2019 17:10, John Snow wrote:
> > 
> > 
> > On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> >>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>> ultimately what we want is to be able to find the "addressable" name for
> >>> the node the bitmap is attached to, which would be the name of the first
> >>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>> above that node.)
> >> Not the name of ancestor node, it will break mapping: it must be name of the
> >> node itself or name of parent (may be through several filters) block-backend
> >>
> > 
> > Ah, you are right of course -- because block-backends are the only
> > "nodes" for which we actually descend the graph and add the bitmap to
> > its child.
> > 
> > So the real back-resolution mechanism is:
> > 
> > - Find the first non-filter ancestor, A
> > - if A is not a block-backend, we must use our node-local name.
> > - if A's name is empty, we must use our node-local name.
> > - If the name we have chosen is not id_wellformed, we have no
> > migration-stable addressable name for this bitmap and the migration must
> > fail!
> > 
> > 
> > For resolving bitmap addresses via QMP (node, name) pairs; the
> > resolution method would be this:
> > 
> > - if the node-name N is a block-backend, descend the tree until we find
> > the first non-filter node V.
> > - if the node-name N is a BlockDriverState, use this node directly.
> > 
> 
> Looks good for me.
> 
> I also think if on destination we have both block-backend with name N and
> block-node with name N and the latter is not (filtered) child of the former,
> we should fail migration of at least that bitmap. (Hope, nobody reuse
> block-backend names as node-names in practice.. (should we restrict it
> explicitly ?))

You can't have a node and a BlockBackend of the same name, they share a
single namespace. If you try to do so, you get an error.

Kevin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 16:16       ` Kevin Wolf
@ 2019-10-01 16:17         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 16:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

01.10.2019 19:16, Kevin Wolf wrote:
> Am 01.10.2019 um 17:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 01.10.2019 17:10, John Snow wrote:
>>>
>>>
>>> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>> above that node.)
>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>> node itself or name of parent (may be through several filters) block-backend
>>>>
>>>
>>> Ah, you are right of course -- because block-backends are the only
>>> "nodes" for which we actually descend the graph and add the bitmap to
>>> its child.
>>>
>>> So the real back-resolution mechanism is:
>>>
>>> - Find the first non-filter ancestor, A
>>> - if A is not a block-backend, we must use our node-local name.
>>> - if A's name is empty, we must use our node-local name.
>>> - If the name we have chosen is not id_wellformed, we have no
>>> migration-stable addressable name for this bitmap and the migration must
>>> fail!
>>>
>>>
>>> For resolving bitmap addresses via QMP (node, name) pairs; the
>>> resolution method would be this:
>>>
>>> - if the node-name N is a block-backend, descend the tree until we find
>>> the first non-filter node V.
>>> - if the node-name N is a BlockDriverState, use this node directly.
>>>
>>
>> Looks good for me.
>>
>> I also think if on destination we have both block-backend with name N and
>> block-node with name N and the latter is not (filtered) child of the former,
>> we should fail migration of at least that bitmap. (Hope, nobody reuse
>> block-backend names as node-names in practice.. (should we restrict it
>> explicitly ?))
> 
> You can't have a node and a BlockBackend of the same name, they share a
> single namespace. If you try to do so, you get an error.
> 


Oh, thanks, that's cool.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 16:12             ` Kevin Wolf
@ 2019-10-01 16:17               ` Max Reitz
  2019-10-01 16:22                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 47+ messages in thread
From: Max Reitz @ 2019-10-01 16:17 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block


[-- Attachment #1.1: Type: text/plain, Size: 6046 bytes --]

On 01.10.19 18:12, Kevin Wolf wrote:
> Am 01.10.2019 um 17:27 hat Max Reitz geschrieben:
>> On 01.10.19 17:09, Kevin Wolf wrote:
>>> Am 01.10.2019 um 16:34 hat Max Reitz geschrieben:
>>>> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 01.10.2019 17:13, Max Reitz wrote:
>>>>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> 01.10.2019 3:09, John Snow wrote:
>>>>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>>>>> found and thought you'd like to see it:
>>>>>>>>
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>>>>
>>>>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>>>>
>>>>>>>>
>>>>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>>>>> source-node? It looks like at the moment both consider their only parent
>>>>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>>>>
>>>>>>>>
>>>>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>>>>> above that node.)
>>>>>>>
>>>>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>>>>> node itself or name of parent (may be through several filters) block-backend
>>>>>>>
>>>>>>>>
>>>>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>>>>> purpose other than to allow the child to have a parent link and find who
>>>>>>>> it's """real""" parent is.
>>>>>>>>
>>>>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>>>>> feasible this quick idea might be, though.
>>>>>>>>
>>>>>>>>
>>>>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>>>>
>>>>>>>> (Why not? because the target is incapable of matching the node-name
>>>>>>>> because they are randomly generated AND you cannot specify node-names
>>>>>>>> with # prefixes as they are especially reserved!
>>>>>>>>
>>>>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>>>>
>>>>>>>> --js
>>>>>>>>
>>>>>>>
>>>>>>> What about the following:
>>>>>>>
>>>>>>> diff --git a/block.c b/block.c
>>>>>>> index 5944124845..6739c19be9 100644
>>>>>>> --- a/block.c
>>>>>>> +++ b/block.c
>>>>>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>>>>>>        *child_flags = flags;
>>>>>>>    }
>>>>>>>
>>>>>>> +static const char *bdrv_child_get_name(BdrvChild *child)
>>>>>>> +{
>>>>>>> +    BlockDriverState *parent = child->opaque;
>>>>>>> +
>>>>>>> +    if (parent->drv && parent->drv->is_filter) {
>>>>>>> +        return bdrv_get_parent_name(parent);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return NULL;
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> Why would we skip filters explicitly added by the user?
>>>>>>
>>>>>
>>>>> Why not? Otherwise migration of bitmaps will not work: we may have different set
>>>>> of filters on source and destination, and we still should map nodes with bitmaps
>>>>> automatically.
>>>>
>>>> Why would we have a different set of explicitly added filters on source
>>>> and destination and allow them to be automatically changed during
>>>> migration?  Shouldn’t users only change them pre or post migration?
>>>
>>> We never made a requirement that the backend must be the same on the
>>> source and the destination. Basically, migration copies the state of
>>> frontends and the user is responsible for having these frontends created
>>> and connected to the right backends on the destination.
>>>
>>> Using different paths on the destination is a very obvious requirement
>>> for block devices. It's less obvious for the graph structure, but I
>>> don't see a reason why it couldn't change on migration. Say we were
>>> using local storage on the source, but now we did storage migration to
>>> some network storage, access to which should be throttled.
>>
>> I don’t quite see why we couldn’t add such filters before or after
>> migration.
> 
> Possibly. But why would we when the source doesn't need the filter? We
> don't change the image path before migration either.
> 
> I think the tricky part is coming up with rules and "keep the frontend
> the same, the backend can change arbitrarily" is a very easy rule.

OK, indeed.

>> And it was my impression that bitmap migration was a problem now
>> precisely because it is bound to the graph structure.
> 
> So apparently I wasn't completely wrong when I preferred just writing
> bitmaps back to the image instead of transferring them in the migration
> stream...
> 
> It's not really bound to the graph structure per se, but to node names
> and for non-anonymous BlockBackends to the link between the BB and its
> root node. The latter is part of the graph structure, but only a very
> small part, and it exists only for legacy (non-blockdev) configurations.
> 
>> But anyway.  I’ll gladly remove myself from this discussion because I
>> don’t know much about migration and actually I’d prefer to keep it that
>> way.  (Sorry.)
> 
> Good idea, let's have the migration maintainers handle this.

:-)

That’s always how it goes, isn’t it?  Migration maintainers don’t know
the block side, and we don’t know the migration side...

Anyway.  It’s just a fact that I don’t have much to add to the
discussion, whereas there seems to be a productive discussion without me
already.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 16:17               ` Max Reitz
@ 2019-10-01 16:22                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-01 16:22 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf; +Cc: John Snow, qemu-devel, Qemu-block

01.10.2019 19:17, Max Reitz wrote:
> On 01.10.19 18:12, Kevin Wolf wrote:
>> Am 01.10.2019 um 17:27 hat Max Reitz geschrieben:
>>> On 01.10.19 17:09, Kevin Wolf wrote:
>>>> Am 01.10.2019 um 16:34 hat Max Reitz geschrieben:
>>>>> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 01.10.2019 17:13, Max Reitz wrote:
>>>>>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> 01.10.2019 3:09, John Snow wrote:
>>>>>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>>>>>> found and thought you'd like to see it:
>>>>>>>>>
>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>>>>>
>>>>>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>>>>>> source-node? It looks like at the moment both consider their only parent
>>>>>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>>>>>> above that node.)
>>>>>>>>
>>>>>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>>>>>> node itself or name of parent (may be through several filters) block-backend
>>>>>>>>
>>>>>>>>>
>>>>>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>>>>>> purpose other than to allow the child to have a parent link and find who
>>>>>>>>> it's """real""" parent is.
>>>>>>>>>
>>>>>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>>>>>> feasible this quick idea might be, though.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>>>>>
>>>>>>>>> (Why not? because the target is incapable of matching the node-name
>>>>>>>>> because they are randomly generated AND you cannot specify node-names
>>>>>>>>> with # prefixes as they are especially reserved!
>>>>>>>>>
>>>>>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>>>>>
>>>>>>>>> --js
>>>>>>>>>
>>>>>>>>
>>>>>>>> What about the following:
>>>>>>>>
>>>>>>>> diff --git a/block.c b/block.c
>>>>>>>> index 5944124845..6739c19be9 100644
>>>>>>>> --- a/block.c
>>>>>>>> +++ b/block.c
>>>>>>>> @@ -1009,8 +1009,20 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
>>>>>>>>         *child_flags = flags;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> +static const char *bdrv_child_get_name(BdrvChild *child)
>>>>>>>> +{
>>>>>>>> +    BlockDriverState *parent = child->opaque;
>>>>>>>> +
>>>>>>>> +    if (parent->drv && parent->drv->is_filter) {
>>>>>>>> +        return bdrv_get_parent_name(parent);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return NULL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>
>>>>>>> Why would we skip filters explicitly added by the user?
>>>>>>>
>>>>>>
>>>>>> Why not? Otherwise migration of bitmaps will not work: we may have different set
>>>>>> of filters on source and destination, and we still should map nodes with bitmaps
>>>>>> automatically.
>>>>>
>>>>> Why would we have a different set of explicitly added filters on source
>>>>> and destination and allow them to be automatically changed during
>>>>> migration?  Shouldn’t users only change them pre or post migration?
>>>>
>>>> We never made a requirement that the backend must be the same on the
>>>> source and the destination. Basically, migration copies the state of
>>>> frontends and the user is responsible for having these frontends created
>>>> and connected to the right backends on the destination.
>>>>
>>>> Using different paths on the destination is a very obvious requirement
>>>> for block devices. It's less obvious for the graph structure, but I
>>>> don't see a reason why it couldn't change on migration. Say we were
>>>> using local storage on the source, but now we did storage migration to
>>>> some network storage, access to which should be throttled.
>>>
>>> I don’t quite see why we couldn’t add such filters before or after
>>> migration.
>>
>> Possibly. But why would we when the source doesn't need the filter? We
>> don't change the image path before migration either.
>>
>> I think the tricky part is coming up with rules and "keep the frontend
>> the same, the backend can change arbitrarily" is a very easy rule.
> 
> OK, indeed.
> 
>>> And it was my impression that bitmap migration was a problem now
>>> precisely because it is bound to the graph structure.
>>
>> So apparently I wasn't completely wrong when I preferred just writing
>> bitmaps back to the image instead of transferring them in the migration
>> stream...
>>
>> It's not really bound to the graph structure per se, but to node names
>> and for non-anonymous BlockBackends to the link between the BB and its
>> root node. The latter is part of the graph structure, but only a very
>> small part, and it exists only for legacy (non-blockdev) configurations.
>>
>>> But anyway.  I’ll gladly remove myself from this discussion because I
>>> don’t know much about migration and actually I’d prefer to keep it that
>>> way.  (Sorry.)
>>
>> Good idea, let's have the migration maintainers handle this.
> 
> :-)
> 
> That’s always how it goes, isn’t it?  Migration maintainers don’t know
> the block side, and we don’t know the migration side...

Haha, luckily I'm not a maintainer :)

> 
> Anyway.  It’s just a fact that I don’t have much to add to the
> discussion, whereas there seems to be a productive discussion without me
> already.
> 
> Max
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:58       ` Kevin Wolf
  2019-10-01 16:12         ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 16:23         ` John Snow
  1 sibling, 0 replies; 47+ messages in thread
From: John Snow @ 2019-10-01 16:23 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz



On 10/1/19 11:58 AM, Kevin Wolf wrote:
> Am 01.10.2019 um 17:09 hat John Snow geschrieben:
>>
>>
>> On 10/1/19 5:54 AM, Kevin Wolf wrote:
>>> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 01.10.2019 3:09, John Snow wrote:
>>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
>>>>> found and thought you'd like to see it:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
>>>>>
>>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
>>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
>>>>>
>>>>>
>>>>> Ignorant question #1: Can we multi-parent the filter node and
>>>>> source-node? It looks like at the moment both consider their only parent
>>>>> to be the block-job and don't have a link back to their parents otherwise.
>>>>>
>>>>>
>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>> above that node.)
>>>>
>>>>
>>>> Better would be to migrate by node-name only.. But am I right that
>>>> node-names are different on source and destination? Or this situation
>>>> changed?
>>>
>>> Traditionally, I think migration assumes that frontends (guest devices)
>>> must match exactly, but backends may and usually will differ.
>>>
>>> Of course, dirty bitmaps are a backend feature that isn't really related
>>> to guest devices, so this doesn't really work out any more in your case.
>>> BlockBackend names are unusable for this purpose (especially as we're
>>> moving towards anonymous BlockBackends everywhere), which I guess
>>> essentially means node-name is the only option left.
>>>
>>
>> The problem as I see it involves API stability.
>>
>> We allow block-dirty-bitmap-add against e.g. "drive1" through the
>> block-backend name (the name of the "drive" as the user sees it.)
>>
>> Of course, once you start mirror, you aren't able to access that bitmap
>> through that namepair anymore -- the "address" of the bitmap has "changed"!
>>
>> (In actual fact, the bitmap always had two addresses; and simply we lost
>> an alias -- but it's the one that the user likely used to create the
>> bitmap, so that's bad.)
> 
> So if I understand correctly, the problem is that without a filter, in
> some setups we get a usable BlockBackend name like "drive1", but when a
> filter is added, we return the node-name instead which is
> auto-generated and will be different on the destination.
> 
> Looking at the ChildRole documentation:
> 
>     /* Returns a name that is supposedly more useful for human users than the
>      * node name for identifying the node in question (in particular, a BB
>      * name), or NULL if the parent can't provide a better name. */
>     const char *(*get_name)(BdrvChild *child);
> 
> I'd argue that a BlockBackend name is more useful for a human user even
> across filter, so I'd support a .get_name implementation for a filter
> child role (which Max wanted to introduce anyway for his filter series).
> 
> Of course, if you have a function that is made to return a convenient
> text for human users, and you use it for a stable ABI like the migration
> stream, this is an idea that would certainly have caused an entertaining
> Linus rant in the good old kernel times.
> 
>>> Is bitmap migration something that must be enabled explicitly or does
>>> it happen automatically? If it's explicit, then making an additional
>>> requirement (matching node-names) shouldn't be a problem.
>>
>> This means that bitmap migration becomes a blockdev-only feature.
> 
> I meant this more as the preferred way for the future rather than the
> only thing supported.
> 

My confusion then is how do you get the node-names to match *without*
using blockdev?

Or maybe I misunderstand.

We can always use explicit node-names whenever available, but if we
don't make blockdev a requirement, we need a fallback to determine an
addressable name.

> But Peter has actually mentioned that for libvirt it will be
> blockdev-only anyway. So do we even have a good reason to invest much
> for the non-blockdev case?
> 
> Maybe making it blockdev-only is actually pretty reasonable.
> 
>> Serious question: do we have plans to formally deprecate things like
>> -drive and mandate a blockdev workflow, or otherwise work to unify the
>> actual graph that gets created between the two methods?
> 
> It's high on my wishlist, though we can't before libvirt uses blockdev.
> 
> Maybe something to talk about at KVM Forum?
> 

Yes, please!

I've not well-understood the trajectory the block graph is taking. I
even wrote a graph visualization so I could understand the problem.

I think I have a better grip on what we have *currently* right now, so
I'd be keen to hear your roadmap for what we want to have and how it
ties into the CLI and QAPI.

>>>>> A simple way to do this might be a "child_unfiltered" BdrvChild role
>>>>> that simply bypasses the filter that was inserted and serves no real
>>>>> purpose other than to allow the child to have a parent link and find who
>>>>> it's """real""" parent is.
>>>>>
>>>>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
>>>>> feasible this quick idea might be, though.
>>>>>
>>>>>
>>>>> - Corollary fix #1: call error_setg if the bitmap node name that's about
>>>>> to go over the wire is an autogenerated node: this is never correct!
>>>>>
>>>>> (Why not? because the target is incapable of matching the node-name
>>>>> because they are randomly generated AND you cannot specify node-names
>>>>> with # prefixes as they are especially reserved!
>>>>>
>>>>> (This raises a related problem: if you explicitly add bitmaps to nodes
>>>>> with autogenerated names, you will be unable to migrate them.))
>>>>>
>>>>
>>>> In other words, we need a well defined way to match nodes on source and destination,
>>>> keeping in mind filters, to migrate bitmaps correctly.
>>>>
>>
>> Yes, exactly.
>>
>>>> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
>>>> mirror-top filter during mirror job:)
>>>>
>>>> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
>>>>
>>>> If node-names are different on source and destination, what is the same? Top blk name
>>>> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
>>>
>>> blk_name has to be assumed to be "". The BdrvChild path changes when
>>> filters are inserted (and inserting filters on the destination that
>>> aren't present on the source, or vice versa, sounds like something that
>>> should just work).
>>>
>>> So both parts of this are not great ways for addressing nodes.
>>>
>>>> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
>>>> defines this node directly (we must not have children with same name for some node in
>>>> the path).
>>>>
>>>> And I think it's a correct way to define node in migration stream - by path.
>>>
>>> I'm afraid node-name is the only thing that could possibly work reliably
>>> for identifying nodes.
>>>
>>> Kevin
>>>
>>
>> It sounds like you are saying that bitmaps must become a blockdev-only
>> feature.
> 
> More like we can't rely on BlockBackend names in a blockdev setup.

Agree; so this discussion is now mostly about what the fallback looks like.

> BlockBackend names can work in a purely traditional setup if we make
> some effort to find the block backend even with filters in bettwen, but
> they aren't universal even then. And node-names work in the blockdev
> case, but they aren't universal either.
> 
> But as I said above, you made me wonder whether making it a
> blockdev-only feature wouldn't actually be a good idea.
> 

O:-) It's the most logically consistent, but we should figure out how to
make new CLI sugar that helps make our story consistent. Or get rid of
the CLI sugar, I guess.

(I kind of like sugar, perhaps infamously, but if our view is that QEMU
is a library and API and nothing more, the sugar can go and be replaced
by sugar in binding libraries that provide said sugar.)

>> I'm not sure if I have arrived at that conclusion yet, but it's at least
>> inarguable that with blockdev it's a lot simpler to guarantee correctness.
>>
>> However, we still have -cdrom and -hda and -drive and any number of
>> sugars that I think we aren't committed to getting rid of yet... (or ever?)
> 
> -hda and -cdrom currently have hard-coded BlockBackend names. I don't
> see what would stop us from changing this to hard-coded node names.
> 

Ideally, I'd see something like -hda and -cdrom become strict sugar for
-drive, which becomes strict sugar for -device/-blockdevice.

So that everything is using the same mechanisms and is well-defined.

I guess. Perhaps.

Let's really talk very seriously about this at KVM Forum -- it aligns
with some pet projects I want to investigate; namely simplifying the
QEMU CLI such that we can generate better documentation.

> -drive is something that we probably need to remove in its current form
> and introduce something new that is both user-friendly and powerful.
> Actually, we should probably add the replacement first. :-)
> 

Great! Looking forward to Lyon.

--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 16:12         ` Vladimir Sementsov-Ogievskiy
@ 2019-10-01 16:24           ` Kevin Wolf
  0 siblings, 0 replies; 47+ messages in thread
From: Kevin Wolf @ 2019-10-01 16:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

Am 01.10.2019 um 18:12 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 01.10.2019 18:58, Kevin Wolf wrote:
> > Am 01.10.2019 um 17:09 hat John Snow geschrieben:
> >>
> >>
> >> On 10/1/19 5:54 AM, Kevin Wolf wrote:
> >>> Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>> 01.10.2019 3:09, John Snow wrote:
> >>>>> Hi folks, I identified a problem with the migration code that Red Hat QE
> >>>>> found and thought you'd like to see it:
> >>>>>
> >>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> >>>>>
> >>>>> Very, very briefly: drive-mirror inserts a filter node that changes what
> >>>>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> >>>>>
> >>>>>
> >>>>> Ignorant question #1: Can we multi-parent the filter node and
> >>>>> source-node? It looks like at the moment both consider their only parent
> >>>>> to be the block-job and don't have a link back to their parents otherwise.
> >>>>>
> >>>>>
> >>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>>>> ultimately what we want is to be able to find the "addressable" name for
> >>>>> the node the bitmap is attached to, which would be the name of the first
> >>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>>>> above that node.)
> >>>>
> >>>>
> >>>> Better would be to migrate by node-name only.. But am I right that
> >>>> node-names are different on source and destination? Or this situation
> >>>> changed?
> >>>
> >>> Traditionally, I think migration assumes that frontends (guest devices)
> >>> must match exactly, but backends may and usually will differ.
> >>>
> >>> Of course, dirty bitmaps are a backend feature that isn't really related
> >>> to guest devices, so this doesn't really work out any more in your case.
> >>> BlockBackend names are unusable for this purpose (especially as we're
> >>> moving towards anonymous BlockBackends everywhere), which I guess
> >>> essentially means node-name is the only option left.
> >>>
> >>
> >> The problem as I see it involves API stability.
> >>
> >> We allow block-dirty-bitmap-add against e.g. "drive1" through the
> >> block-backend name (the name of the "drive" as the user sees it.)
> >>
> >> Of course, once you start mirror, you aren't able to access that bitmap
> >> through that namepair anymore -- the "address" of the bitmap has "changed"!
> >>
> >> (In actual fact, the bitmap always had two addresses; and simply we lost
> >> an alias -- but it's the one that the user likely used to create the
> >> bitmap, so that's bad.)
> > 
> > So if I understand correctly, the problem is that without a filter, in
> > some setups we get a usable BlockBackend name like "drive1", but when a
> > filter is added, we return the node-name instead which is
> > auto-generated and will be different on the destination.
> > 
> > Looking at the ChildRole documentation:
> > 
> >      /* Returns a name that is supposedly more useful for human users than the
> >       * node name for identifying the node in question (in particular, a BB
> >       * name), or NULL if the parent can't provide a better name. */
> >      const char *(*get_name)(BdrvChild *child);
> > 
> > I'd argue that a BlockBackend name is more useful for a human user even
> > across filter, so I'd support a .get_name implementation for a filter
> > child role (which Max wanted to introduce anyway for his filter series).
> > 
> > Of course, if you have a function that is made to return a convenient
> > text for human users, and you use it for a stable ABI like the migration
> > stream, this is an idea that would certainly have caused an entertaining
> > Linus rant in the good old kernel times.
> > 
> >>> Is bitmap migration something that must be enabled explicitly or does
> >>> it happen automatically? If it's explicit, then making an additional
> >>> requirement (matching node-names) shouldn't be a problem.
> >>
> >> This means that bitmap migration becomes a blockdev-only feature.
> > 
> > I meant this more as the preferred way for the future rather than the
> > only thing supported.
> > 
> > But Peter has actually mentioned that for libvirt it will be
> > blockdev-only anyway. So do we even have a good reason to invest much
> > for the non-blockdev case?
> > 
> > Maybe making it blockdev-only is actually pretty reasonable.
> 
> We in Virtuozzo use bitmap migration, so I'd have to fix it at least
> downstream (it seems easier than switch downstream libvirt to blockdev now).
> 
> And what about original bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> ?
> 
> Also, if we make it blockdev-only upstream, what we mean by that? Node names
> on destination must match, or we add additional qmp command
> migration-set-bitmap-node-mapping, to specify mapping between node names on
> source and target?

Good question. :-)

I would have thought that just having matching node-names would be
pretty convenient for users, but Peter seems to disagree.

With a separate migration-set-bitmap-node-mapping, what would we do if
migrate is called before a mapping is configured? Would this cause
migration failure? I would find that pretty heavy.

Maybe default to a 1:1 mapping and allow the user to override it? (And
if so, do we do the mapping on the source or the destination?)

Kevin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 15:26           ` Max Reitz
@ 2019-10-02  7:34             ` Peter Krempa
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Krempa @ 2019-10-02  7:34 UTC (permalink / raw)
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block

On Tue, Oct 01, 2019 at 17:26:13 +0200, Max Reitz wrote:
> On 01.10.19 16:53, Vladimir Sementsov-Ogievskiy wrote:
> > 01.10.2019 17:34, Max Reitz wrote:
> >> On 01.10.19 16:27, Vladimir Sementsov-Ogievskiy wrote:
> >>> 01.10.2019 17:13, Max Reitz wrote:
> >>>> On 01.10.19 16:00, Vladimir Sementsov-Ogievskiy wrote:
> >>>>> 01.10.2019 3:09, John Snow wrote:

[...]

> Well, I’d think it should be on the one with the same node name, but it
> appears others don’t want a node-name-based mapping, so maybe I should
> just stop trying to be part of the discussion. :-)
> 
> > Still, if you don't like skipping explicit filters, I'm OK with implicit only, it will
> > fix the bug we are saying about.
> > 
> > But why you don't like creating some additional explicit filters on target (prior to
> > migration process) which we didn't have on source?
> 
> Because I feel like (without having too much insight into migration, I
> admit) that migration is generally a process where you move from one VM
> to another, but both should have the same configuration.  If you want to

During migrations you might want to change backends of the devices
though. (E.g. reconfigure disk paths) or network connections so that the
VM works on the different host.

In addition some bits of migration are not symmetrical. In libvirt we
use a blockdev/drive-mirror to copy over disks if you request migration
with non-shared storage. The destination runs an NBD server and the
source runs the blockjob to copy it over. There can't be symmetry in
this case.

> change the configuration, you do that before or after the migration.
> (I’d think.)




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 16:07       ` John Snow
@ 2019-10-02  8:12         ` Kevin Wolf
  2019-10-02 10:46         ` Peter Krempa
  1 sibling, 0 replies; 47+ messages in thread
From: Kevin Wolf @ 2019-10-02  8:12 UTC (permalink / raw)
  To: John Snow; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

Am 01.10.2019 um 18:07 hat John Snow geschrieben:
> >> So the real back-resolution mechanism is:
> 
> Amendment:
>    - If our local node-name N is well-formed, use this.
>    - Otherwise:
> >> - Find the first non-filter ancestor, A
> >> - if A is not a block-backend, we must use our node-local name.
>      Amendment: If it's not a BB, we have no addressable name
>                 for the bitmap and this is an error.
> >> - if A's name is empty, we must use our node-local name.
>      Amendment: If it's empty, we have no addressable name
>                 for the bitmap and this is an error.
> >> - If the name we have chosen is not id_wellformed, we have no
> >> migration-stable addressable name for this bitmap and the migration must
> >> fail!
>      (Handled by above amendments.)
> 
> The reason for the change is to prefer user-defined node names whenever
> possible; only trying to find a "device" or "backend" name whenever we
> fail to find one.

I think we also need to clarify what "find the first non-filter" means:
"first" means not skipping over non-filters, but there can still be
multiple parents, and we can take the name of any of them.

In legacy setups, we can expect that there is only one BB that has a
defined name (i.e. nodes aren't reused in multiple subtrees) whereas the
other BBs may come from things like block jobs. We must not give up and
error out when the job BB is returned first.

Kevin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-01 16:07       ` John Snow
  2019-10-02  8:12         ` Kevin Wolf
@ 2019-10-02 10:46         ` Peter Krempa
  2019-10-02 11:11           ` Kevin Wolf
  2019-10-02 21:35           ` John Snow
  1 sibling, 2 replies; 47+ messages in thread
From: Peter Krempa @ 2019-10-02 10:46 UTC (permalink / raw)
  To: John Snow; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

On Tue, Oct 01, 2019 at 12:07:54 -0400, John Snow wrote:
> 
> 
> On 10/1/19 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> > 01.10.2019 17:10, John Snow wrote:
> >>
> >>
> >> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> >>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>>> ultimately what we want is to be able to find the "addressable" name for
> >>>> the node the bitmap is attached to, which would be the name of the first
> >>>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>>> above that node.)
> >>> Not the name of ancestor node, it will break mapping: it must be name of the
> >>> node itself or name of parent (may be through several filters) block-backend
> >>>
> >>
> >> Ah, you are right of course -- because block-backends are the only
> >> "nodes" for which we actually descend the graph and add the bitmap to
> >> its child.
> >>
> >> So the real back-resolution mechanism is:
> >>
> 
> Amendment:
>    - If our local node-name N is well-formed, use this.

I'd like to re-iterate that the necessity to keep node names same on
both sides of migration is unexpected, undocumented and in some cases
impossible.

If you want to mandate that they must be kept the same please document
it and also note the following:

- during migrations the storage layout may change e.g. a backing chain
  may become flattened, thus keeping node names stable beyond the top
  layer is impossible

- in some cases (readonly image in a cdrom not present on destination,
  thus not relevant here probably) it may even become impossible to
  create any node thus keeping the top node may be impossible

- it should be documented when and why this happens and how management
  tools are supposed to do it

- please let me know what's actually expected, since libvirt
  didn't enable blockdev yet we can fix any unexpected expectations

- Document it so that the expectations don't change after this.

- Ideally node names will not be bound to anything and freely
  changeable. If necessary we can provide a map to qemu during migration
  which is probably less painful and more straightforward than keeping
  them in sync somehow ...

>    - Otherwise:
> >> - Find the first non-filter ancestor, A
> >> - if A is not a block-backend, we must use our node-local name.
>      Amendment: If it's not a BB, we have no addressable name
>                 for the bitmap and this is an error.
> >> - if A's name is empty, we must use our node-local name.
>      Amendment: If it's empty, we have no addressable name
>                 for the bitmap and this is an error.
> >> - If the name we have chosen is not id_wellformed, we have no
> >> migration-stable addressable name for this bitmap and the migration must
> >> fail!
>      (Handled by above amendments.)



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 10:46         ` Peter Krempa
@ 2019-10-02 11:11           ` Kevin Wolf
  2019-10-02 12:22             ` Vladimir Sementsov-Ogievskiy
  2019-10-02 13:43             ` Peter Krempa
  2019-10-02 21:35           ` John Snow
  1 sibling, 2 replies; 47+ messages in thread
From: Kevin Wolf @ 2019-10-02 11:11 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block,
	Max Reitz

Am 02.10.2019 um 12:46 hat Peter Krempa geschrieben:
> On Tue, Oct 01, 2019 at 12:07:54 -0400, John Snow wrote:
> > 
> > 
> > On 10/1/19 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > 01.10.2019 17:10, John Snow wrote:
> > >>
> > >>
> > >> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> > >>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> > >>>> ultimately what we want is to be able to find the "addressable" name for
> > >>>> the node the bitmap is attached to, which would be the name of the first
> > >>>> ancestor node that isn't a filter. (OR, the name of the block-backend
> > >>>> above that node.)
> > >>> Not the name of ancestor node, it will break mapping: it must be name of the
> > >>> node itself or name of parent (may be through several filters) block-backend
> > >>>
> > >>
> > >> Ah, you are right of course -- because block-backends are the only
> > >> "nodes" for which we actually descend the graph and add the bitmap to
> > >> its child.
> > >>
> > >> So the real back-resolution mechanism is:
> > >>
> > 
> > Amendment:
> >    - If our local node-name N is well-formed, use this.
> 
> I'd like to re-iterate that the necessity to keep node names same on
> both sides of migration is unexpected, undocumented and in some cases
> impossible.

I think the (implicitly made) requirement is not that all node-names are
kept the same, but only the node-names of those nodes for which
migration transfers some state.

It seems to me that bitmap migration is the first case of putting
something in the migration stream that isn't related to a frontend, but
to the backend, so the usual device hierarchy to address information
doesn't work here. And it seems the implications of this weren't really
considered sufficiently, resulting in the design problem we're
discussing now.

What we need to transfer is dirty bitmaps, which can be attached to any
node in the block graph. If we accept that the way to transfer this is
the migration stream, we need a way to tell which bitmap belongs to
which node. Matching node-name is the obvious answer, just like a
matching device tree hierarchy is used for frontends.

If we don't want to use the migration stream for backends, we would need
to find another way to transfer the bitmaps. I would welcome removing
backend data from the migration stream, but if this includes
non-persistent bitmaps, I don't see what the alternative could be.

> If you want to mandate that they must be kept the same please document
> it and also note the following:
> 
> - during migrations the storage layout may change e.g. a backing chain
>   may become flattened, thus keeping node names stable beyond the top
>   layer is impossible

You don't want to transfer bitmaps of nodes that you're going to drop.
I'm not an expert for these bitmaps, but I think this just means you
would have to disable any bitmaps on the backing files to be dropped on
the source host before you migrate.

> - in some cases (readonly image in a cdrom not present on destination,
>   thus not relevant here probably) it may even become impossible to
>   create any node thus keeping the top node may be impossible

Same thing, you don't want to transfer a bitmap for a node that
disappears.

> - it should be documented when and why this happens and how management
>   tools are supposed to do it
> 
> - please let me know what's actually expected, since libvirt
>   didn't enable blockdev yet we can fix any unexpected expectations
> 
> - Document it so that the expectations don't change after this.

Yes, we need a good and ideally future-proof rule of which node-names
need to stay the same. Currently it's only bitmaps, but might we get
another feature later where we want to transfer more backend data?

> - Ideally node names will not be bound to anything and freely
>   changeable. If necessary we can provide a map to qemu during migration
>   which is probably less painful and more straightforward than keeping
>   them in sync somehow ...

A map feels painful for the average user (and for the QEMU
implementation), even if it looks convenient for libvirt. If anything,
I'd make it optional and default to 1:1 mappings for anything that isn't
explicitly mapped.

Kevin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 11:11           ` Kevin Wolf
@ 2019-10-02 12:22             ` Vladimir Sementsov-Ogievskiy
  2019-10-02 13:48               ` Peter Krempa
  2019-10-02 13:43             ` Peter Krempa
  1 sibling, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-02 12:22 UTC (permalink / raw)
  To: Kevin Wolf, Peter Krempa; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

02.10.2019 14:11, Kevin Wolf wrote:
> Am 02.10.2019 um 12:46 hat Peter Krempa geschrieben:
>> On Tue, Oct 01, 2019 at 12:07:54 -0400, John Snow wrote:
>>>
>>>
>>> On 10/1/19 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> 01.10.2019 17:10, John Snow wrote:
>>>>>
>>>>>
>>>>> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
>>>>>>> ultimately what we want is to be able to find the "addressable" name for
>>>>>>> the node the bitmap is attached to, which would be the name of the first
>>>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
>>>>>>> above that node.)
>>>>>> Not the name of ancestor node, it will break mapping: it must be name of the
>>>>>> node itself or name of parent (may be through several filters) block-backend
>>>>>>
>>>>>
>>>>> Ah, you are right of course -- because block-backends are the only
>>>>> "nodes" for which we actually descend the graph and add the bitmap to
>>>>> its child.
>>>>>
>>>>> So the real back-resolution mechanism is:
>>>>>
>>>
>>> Amendment:
>>>     - If our local node-name N is well-formed, use this.
>>
>> I'd like to re-iterate that the necessity to keep node names same on
>> both sides of migration is unexpected, undocumented and in some cases
>> impossible.
> 
> I think the (implicitly made) requirement is not that all node-names are
> kept the same, but only the node-names of those nodes for which
> migration transfers some state.
> 
> It seems to me that bitmap migration is the first case of putting
> something in the migration stream that isn't related to a frontend, but
> to the backend, so the usual device hierarchy to address information
> doesn't work here. And it seems the implications of this weren't really
> considered sufficiently, resulting in the design problem we're
> discussing now.
> 
> What we need to transfer is dirty bitmaps, which can be attached to any
> node in the block graph. If we accept that the way to transfer this is
> the migration stream, we need a way to tell which bitmap belongs to
> which node. Matching node-name is the obvious answer, just like a
> matching device tree hierarchy is used for frontends.
> 
> If we don't want to use the migration stream for backends, we would need
> to find another way to transfer the bitmaps. I would welcome removing
> backend data from the migration stream, but if this includes
> non-persistent bitmaps, I don't see what the alternative could be.

But how to migrate persistent bitmaps if storage is not shared?

And even with only persistent bitmaps and shared storage: bitmaps data may
be large, and storing/loading it during migration downtime will increase
it.

> 
>> If you want to mandate that they must be kept the same please document
>> it and also note the following:
>>
>> - during migrations the storage layout may change e.g. a backing chain
>>    may become flattened, thus keeping node names stable beyond the top
>>    layer is impossible
> 
> You don't want to transfer bitmaps of nodes that you're going to drop.
> I'm not an expert for these bitmaps, but I think this just means you
> would have to disable any bitmaps on the backing files to be dropped on
> the source host before you migrate.

You mean remove them.. But yes, any way it's not a problem. If corresponding
node isn't exist on target, we don't need any bitmaps for it.

> 
>> - in some cases (readonly image in a cdrom not present on destination,
>>    thus not relevant here probably) it may even become impossible to
>>    create any node thus keeping the top node may be impossible
> 
> Same thing, you don't want to transfer a bitmap for a node that
> disappears.
> 
>> - it should be documented when and why this happens and how management
>>    tools are supposed to do it
>>
>> - please let me know what's actually expected, since libvirt
>>    didn't enable blockdev yet we can fix any unexpected expectations
>>
>> - Document it so that the expectations don't change after this.
> 
> Yes, we need a good and ideally future-proof rule of which node-names
> need to stay the same. Currently it's only bitmaps, but might we get
> another feature later where we want to transfer more backend data?
> 
>> - Ideally node names will not be bound to anything and freely
>>    changeable. If necessary we can provide a map to qemu during migration
>>    which is probably less painful and more straightforward than keeping
>>    them in sync somehow ...
> 
> A map feels painful for the average user (and for the QEMU
> implementation), even if it looks convenient for libvirt. If anything,
> I'd make it optional and default to 1:1 mappings for anything that isn't
> explicitly mapped.
> 

Hmm, I don't think that optional map is painful.

What about the following:

1. If map is provided:
- migrate only bitmaps in nodes, specified by map
- bitmaps migrated only accordingly to the map, block device names are not involved at all

2. If map not provided:
- For nodes directly bound to named block backends, or through several filters, use name of this
block backend.
- For other nodes use node-name

===

And I think [2.] should be done now to fix current bug, and [1.] may be postponed until we
really need it.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 11:11           ` Kevin Wolf
  2019-10-02 12:22             ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 13:43             ` Peter Krempa
  2019-10-02 14:03               ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Krempa @ 2019-10-02 13:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, John Snow, qemu-devel, Qemu-block,
	Max Reitz

On Wed, Oct 02, 2019 at 13:11:47 +0200, Kevin Wolf wrote:
> Am 02.10.2019 um 12:46 hat Peter Krempa geschrieben:

[...]

> > I'd like to re-iterate that the necessity to keep node names same on
> > both sides of migration is unexpected, undocumented and in some cases
> > impossible.
> 
> I think the (implicitly made) requirement is not that all node-names are
> kept the same, but only the node-names of those nodes for which
> migration transfers some state.

[1] This also implies that node names of the nodes where migration should
not transfer state must be unique on the both sides since you can't
control it otherwise.

> It seems to me that bitmap migration is the first case of putting
> something in the migration stream that isn't related to a frontend, but
> to the backend, so the usual device hierarchy to address information
> doesn't work here. And it seems the implications of this weren't really
> considered sufficiently, resulting in the design problem we're
> discussing now.

This should then also be a moment to carefully think about the
semantics of migrating data for backends which don't need to be
identical on both sides of the migration for the VM to work correctly.

I think that any feature which touches backends should ideally be an
opt-in. This would call for a explicit action to use it which would also
allow management apps to consider expectations and implications of
enabling it rather than doing it automatically. One possibility would be
also to make it introspectable in such a way that it can be made opt-in
by disabling all unknown features programatically in the mgmt app.

In case of migration of bitmaps if node-names are going to be used for
the matching, it can have interresting consequences (such as matching
wrong ones if they don't match) and thus the feature should be used
knowingly.

> What we need to transfer is dirty bitmaps, which can be attached to any
> node in the block graph. If we accept that the way to transfer this is
> the migration stream, we need a way to tell which bitmap belongs to
> which node. Matching node-name is the obvious answer, just like a
> matching device tree hierarchy is used for frontends.



> If we don't want to use the migration stream for backends, we would need
> to find another way to transfer the bitmaps. I would welcome removing
> backend data from the migration stream, but if this includes
> non-persistent bitmaps, I don't see what the alternative could be.
> 
> > If you want to mandate that they must be kept the same please document
> > it and also note the following:
> > 
> > - during migrations the storage layout may change e.g. a backing chain
> >   may become flattened, thus keeping node names stable beyond the top
> >   layer is impossible
> 
> You don't want to transfer bitmaps of nodes that you're going to drop.
> I'm not an expert for these bitmaps, but I think this just means you
> would have to disable any bitmaps on the backing files to be dropped on
> the source host before you migrate.

Well it depends actually on what's happening. In some cases we use a
drive/blockdev-mirror to transfer the image if it's non-shared which
also by default flattens the backing chain. In such case you still might
want to transfer the bitmaps over and merge them during migration so
that backups can be taken despite the chain being flattened. In this
case we still want to use the bitmap on the destination with all new
changes merged in, but it must also allow the NBD server.

> > - in some cases (readonly image in a cdrom not present on destination,
> >   thus not relevant here probably) it may even become impossible to
> >   create any node thus keeping the top node may be impossible
> 
> Same thing, you don't want to transfer a bitmap for a node that
> disappears.

[1]

> > - it should be documented when and why this happens and how management
> >   tools are supposed to do it
> > 
> > - please let me know what's actually expected, since libvirt
> >   didn't enable blockdev yet we can fix any unexpected expectations
> > 
> > - Document it so that the expectations don't change after this.
> 
> Yes, we need a good and ideally future-proof rule of which node-names
> need to stay the same. Currently it's only bitmaps, but might we get
> another feature later where we want to transfer more backend data?

As I've said above, ideally anything like that will be an explicit
opt-in so that we can actually make sure that it's doing the right thing
if the user reconfigures the storage backends at the destination of
migration.

> > - Ideally node names will not be bound to anything and freely
> >   changeable. If necessary we can provide a map to qemu during migration
> >   which is probably less painful and more straightforward than keeping
> >   them in sync somehow ...
> 
> A map feels painful for the average user (and for the QEMU
> implementation), even if it looks convenient for libvirt. If anything,
> I'd make it optional and default to 1:1 mappings for anything that isn't
> explicitly mapped.

Well, without an explicit map the above cases (drop the bitmaps if the
destination does not have the corresponding image) become hard to tell
apart from failures. That will mean that any missing mapping will be
ignored and on any wrong configuration bitmaps just vanish rather than
reporting error.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 12:22             ` Vladimir Sementsov-Ogievskiy
@ 2019-10-02 13:48               ` Peter Krempa
  0 siblings, 0 replies; 47+ messages in thread
From: Peter Krempa @ 2019-10-02 13:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Kevin Wolf, John Snow, qemu-devel, Qemu-block, Max Reitz

On Wed, Oct 02, 2019 at 12:22:01 +0000, Vladimir Sementsov-Ogievskiy wrote:
> 02.10.2019 14:11, Kevin Wolf wrote:
> > Am 02.10.2019 um 12:46 hat Peter Krempa geschrieben:
> >> On Tue, Oct 01, 2019 at 12:07:54 -0400, John Snow wrote:
> >>>
> >>>
> >>> On 10/1/19 11:57 AM, Vladimir Sementsov-Ogievskiy wrote:
> >>>> 01.10.2019 17:10, John Snow wrote:
> >>>>>
> >>>>>
> >>>>> On 10/1/19 10:00 AM, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>>>>>> ultimately what we want is to be able to find the "addressable" name for
> >>>>>>> the node the bitmap is attached to, which would be the name of the first
> >>>>>>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>>>>>> above that node.)
> >>>>>> Not the name of ancestor node, it will break mapping: it must be name of the
> >>>>>> node itself or name of parent (may be through several filters) block-backend
> >>>>>>
> >>>>>
> >>>>> Ah, you are right of course -- because block-backends are the only
> >>>>> "nodes" for which we actually descend the graph and add the bitmap to
> >>>>> its child.
> >>>>>
> >>>>> So the real back-resolution mechanism is:
> >>>>>
> >>>
> >>> Amendment:
> >>>     - If our local node-name N is well-formed, use this.
> >>
> >> I'd like to re-iterate that the necessity to keep node names same on
> >> both sides of migration is unexpected, undocumented and in some cases
> >> impossible.
> > 
> > I think the (implicitly made) requirement is not that all node-names are
> > kept the same, but only the node-names of those nodes for which
> > migration transfers some state.
> > 
> > It seems to me that bitmap migration is the first case of putting
> > something in the migration stream that isn't related to a frontend, but
> > to the backend, so the usual device hierarchy to address information
> > doesn't work here. And it seems the implications of this weren't really
> > considered sufficiently, resulting in the design problem we're
> > discussing now.
> > 
> > What we need to transfer is dirty bitmaps, which can be attached to any
> > node in the block graph. If we accept that the way to transfer this is
> > the migration stream, we need a way to tell which bitmap belongs to
> > which node. Matching node-name is the obvious answer, just like a
> > matching device tree hierarchy is used for frontends.
> > 
> > If we don't want to use the migration stream for backends, we would need
> > to find another way to transfer the bitmaps. I would welcome removing
> > backend data from the migration stream, but if this includes
> > non-persistent bitmaps, I don't see what the alternative could be.
> 
> But how to migrate persistent bitmaps if storage is not shared?

In that case we are exporting new fresh images from destination qemu
using NBD and mapping them into the source and using
blockdev/drive-mirror to copy the data.

At that point (since it's mapped) we could theoretically copy them over.

We certainly do want to copy over bitmaps using blockdev-mirror when
doing a standar virDomainBlockCopy operation in libvirt, so this could
share the code path.

> 
> And even with only persistent bitmaps and shared storage: bitmaps data may
> be large, and storing/loading it during migration downtime will increase
> it.

Note that for non-shared storage migration the downtime is bigger
anyways due to us setting up the NBD and having to terminate the jobs
etc.

> 
> > 
> >> If you want to mandate that they must be kept the same please document
> >> it and also note the following:
> >>
> >> - during migrations the storage layout may change e.g. a backing chain
> >>    may become flattened, thus keeping node names stable beyond the top
> >>    layer is impossible
> > 
> > You don't want to transfer bitmaps of nodes that you're going to drop.
> > I'm not an expert for these bitmaps, but I think this just means you
> > would have to disable any bitmaps on the backing files to be dropped on
> > the source host before you migrate.
> 
> You mean remove them.. But yes, any way it's not a problem. If corresponding
> node isn't exist on target, we don't need any bitmaps for it.
> 
> > 
> >> - in some cases (readonly image in a cdrom not present on destination,
> >>    thus not relevant here probably) it may even become impossible to
> >>    create any node thus keeping the top node may be impossible
> > 
> > Same thing, you don't want to transfer a bitmap for a node that
> > disappears.
> > 
> >> - it should be documented when and why this happens and how management
> >>    tools are supposed to do it
> >>
> >> - please let me know what's actually expected, since libvirt
> >>    didn't enable blockdev yet we can fix any unexpected expectations
> >>
> >> - Document it so that the expectations don't change after this.
> > 
> > Yes, we need a good and ideally future-proof rule of which node-names
> > need to stay the same. Currently it's only bitmaps, but might we get
> > another feature later where we want to transfer more backend data?
> > 
> >> - Ideally node names will not be bound to anything and freely
> >>    changeable. If necessary we can provide a map to qemu during migration
> >>    which is probably less painful and more straightforward than keeping
> >>    them in sync somehow ...
> > 
> > A map feels painful for the average user (and for the QEMU
> > implementation), even if it looks convenient for libvirt. If anything,
> > I'd make it optional and default to 1:1 mappings for anything that isn't
> > explicitly mapped.
> > 
> 
> Hmm, I don't think that optional map is painful.
> 
> What about the following:
> 
> 1. If map is provided:
> - migrate only bitmaps in nodes, specified by map
> - bitmaps migrated only accordingly to the map, block device names are not involved at all
> 
> 2. If map not provided:
> - For nodes directly bound to named block backends, or through several filters, use name of this
> block backend.
> - For other nodes use node-name

As I've pointed out in a different response, it feels wrong to me to
enable such a feature by default if it can mis-map the targets for the
bitmap migration unknowingly. Especially since users weren't expecting
that the implications of naming nodes in conjunction with migration.

> 
> ===
> 
> And I think [2.] should be done now to fix current bug, and [1.] may be postponed until we
> really need it.
> 
> -- 
> Best regards,
> Vladimir


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 13:43             ` Peter Krempa
@ 2019-10-02 14:03               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-02 14:03 UTC (permalink / raw)
  To: Peter Krempa, Kevin Wolf; +Cc: John Snow, qemu-devel, Qemu-block, Max Reitz

02.10.2019 16:43, Peter Krempa wrote:
> On Wed, Oct 02, 2019 at 13:11:47 +0200, Kevin Wolf wrote:
>> Am 02.10.2019 um 12:46 hat Peter Krempa geschrieben:
> 
> [...]
> 
>>> I'd like to re-iterate that the necessity to keep node names same on
>>> both sides of migration is unexpected, undocumented and in some cases
>>> impossible.
>>
>> I think the (implicitly made) requirement is not that all node-names are
>> kept the same, but only the node-names of those nodes for which
>> migration transfers some state.
> 
> [1] This also implies that node names of the nodes where migration should
> not transfer state must be unique on the both sides since you can't
> control it otherwise.
> 
>> It seems to me that bitmap migration is the first case of putting
>> something in the migration stream that isn't related to a frontend, but
>> to the backend, so the usual device hierarchy to address information
>> doesn't work here. And it seems the implications of this weren't really
>> considered sufficiently, resulting in the design problem we're
>> discussing now.
> 
> This should then also be a moment to carefully think about the
> semantics of migrating data for backends which don't need to be
> identical on both sides of the migration for the VM to work correctly.
> 
> I think that any feature which touches backends should ideally be an
> opt-in. This would call for a explicit action to use it which would also
> allow management apps to consider expectations and implications of
> enabling it rather than doing it automatically. One possibility would be
> also to make it introspectable in such a way that it can be made opt-in
> by disabling all unknown features programatically in the mgmt app.

Currently bitmaps are migrated only if bitmaps migration capability is enabled,
so it's not so bad.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 10:46         ` Peter Krempa
  2019-10-02 11:11           ` Kevin Wolf
@ 2019-10-02 21:35           ` John Snow
  2019-10-03 10:14             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 47+ messages in thread
From: John Snow @ 2019-10-02 21:35 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

On 10/2/19 6:46 AM, Peter Krempa wrote:

[ * poof * ]

> 
> I'd like to re-iterate that the necessity to keep node names same on
> both sides of migration is unexpected, undocumented and in some cases
> impossible.
> 
> If you want to mandate that they must be kept the same please document
> it and also note the following:
> 
> - during migrations the storage layout may change e.g. a backing chain
>   may become flattened, thus keeping node names stable beyond the top
>   layer is impossible
> 

The struct and layout of the graph is entirely unrelated to the
requirement that a bitmap attached to a node with a name on the source
needs to have a node with the same name on the destination. It's an
addressability requirement only.

Change it entirely to a new drive if you want, move it up or down the
graph, it doesn't matter.

Libvirt is in the best position to understand where bitmaps already are
and where it wants them to go.

> - in some cases (readonly image in a cdrom not present on destination,
>   thus not relevant here probably) it may even become impossible to
>   create any node thus keeping the top node may be impossible
> 

It's not mandatory to recreate the graph exactly. Consider what you are
saying:

- Libvirt adds a bitmap to node "N"
- Libvirt asks QEMU for bitmap migration
- Libvirt migrates to a QEMU instance that not only does not have a node
"N", but has no analogous node at all!

I believe this is right to fail as there is no way to fulfill the
request as-is.

(More below if you feel it's valid to migrate only some bitmaps.)

> - it should be documented when and why this happens and how management
>   tools are supposed to do it
> 

OK, agreed, and I am sorry that our existing story has been hand-wavey.

Let me tell you the exact specifics of the current broken logic so you
can understand the requirements as they exist right now.

1. Bitmaps attempt to use their device name to migrate, if available.
This covers 99% of use cases where a bitmap was added to a node that was
attached directly to a device model.

This includes almost all usual cases: bitmaps loaded from qcow2 files,
bitmaps added via QMP to a root node, bitmaps added via QMP to a drive name.

(It does not include cases where bitmaps are intentionally added to
nodes that aren't a device root. Libvirt, I believe, can simply never do
this and it will never come up.)

2. If a device name isn't available because this bitmap is not attached
to a root OR the BB does not have a name, we migrate using the node name.

3. No attention is paid whatsoever to whether a node name is
automatically generated or not. In effect, if the device name lookup
fails we currently "trust" that the node name is something meaningful.

4. The bug as I originally perceived of it relates specifically to our
failure to resolve the device name after graph manipulations.


Under these rules, if we "fixed" #4, node-names wouldn't show up in the
stream at all if you never attached a bitmap to a non-root node. This is
probably what you expected.

Node-names only feature in cases where we can't find a device/drive
name, which is:
A. When a bitmap is attached to a non-root node specifically. Libvirt
can simply never do this!
B. When under a graph transformation for drive-mirror; point #4 above.


The workaround for this bug if we don't find a good policy:

1. Use blockdev.
2. Give explicit, semantic names to the root nodes that represent the drive.
3. Any name used to add a bitmap must appear on the destination in a
migration.


> - please let me know what's actually expected, since libvirt
>   didn't enable blockdev yet we can fix any unexpected expectations
> 

I have been and will continue to be diligent in CCing you and libvirt list.

At the moment I am still leaning towards the idea that libvirt should
expect that any bitmaps attached to a node with an explicit node-name
will want to use those names to migrate, but that we might be able to
limit the cases such that you will be able to avoid the circumstance
entirely.

However, QEMU's actual implementation is that they are node object. QEMU
is ill-equipped to make semantic decisions about what the bitmaps "mean"
or "represent"; the name is unfortunately the most explicit identifier
we have to convey what bitmap we are talking about.

It will be libvirt's job to use node names to help facilitate QEMU's
transfer of these objects during migration in a semantically helpful way.

> - Document it so that the expectations don't change after this.
> 

OK. I will take charge on this, once we reach a consensus.

> - Ideally node names will not be bound to anything and freely
>   changeable. If necessary we can provide a map to qemu during migration
>   which is probably less painful and more straightforward than keeping
>   them in sync somehow ...

Why do you want node names to be freely manipulable?

The only constraint we've actually added is that a root node (that has a
bitmap) attached to a device needs to have a name that is available on
the target.

(Oh, and, that the virtual size of that target matches the source.)

> 



Phew. In terms of non-direct replies to Peter's questions above, I've
written out like a dozen failed replies to this, so I'm still quite
confused but need to work on other things today.

I currently think that:


1. If a user uses block-dirty-bitmap-add, we have some sense of where
they wanted the bitmap to go in the graph because they specified a name.
Migration, if left as an automatic (opt-in) process, should try to
migrate in-kind:

- If the user used a drive name, try to use a drive name to migrate. If
there is no drive name and our node name is autogenerated, we cannot
migrate this bitmap.
- If the user used an explicit, non-generated node name, use the node
name. If the user used an implicit node-name, we need to try to resolve
the device name again. If that's not possible, the bitmap cannot be
migrated.


This implies that QEMU will try to "guess" where bitmaps go when using
-drive/-device, but will rely on explicit configuration when using
blockdev. I think the spirit of this idea is correct.

(Vladimir: this is indeed different from EITHER of my suggested
resolution orders over the last two days.)



2. I like Vladimir's idea of providing a "default" migration approach,
but allowing libvirt to override some features of it if necessary.

Overrides that I think will be helpful in alleviating any pain in the
long term:

- Whitelists / Blacklists

The ability to provide either a whitelist or a blacklist for bitmaps
that we desire to migrate. The default can continue to be: "All bitmaps
with a name." This will allow libvirt to drop bitmaps at its discretion
if it performs a block graph reconfiguration on migration and the bitmap
is no longer semantically relevant or appropriate for whatever reason.
This is superior to explicitly deleting bitmaps or dropping nodes in
order to have a valid recourse on failed migrations.


- The ability to override specific mappings on an as-needed basis. I
believe the default resolution mechanism should be one that behaves like
I specify above; but if that resolution is untenable for some reason,
you can specify a remapping if you really require.

I am actually hoping that remapping is actually not necessary, because I
think it's sufficient to use node-names to explicitly direct bitmaps to
their intended targets.

But if we truly do need that power, I'm open to providing an interface
to specify it.



I hope everyone is as confused as I am, now.
--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-02 21:35           ` John Snow
@ 2019-10-03 10:14             ` Vladimir Sementsov-Ogievskiy
  2019-10-03 23:34               ` John Snow
  0 siblings, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-03 10:14 UTC (permalink / raw)
  To: John Snow, Peter Krempa; +Cc: qemu-devel, Qemu-block, Max Reitz

03.10.2019 0:35, John Snow wrote:
> On 10/2/19 6:46 AM, Peter Krempa wrote:
> 
> [ * poof * ]
> 
>>
>> I'd like to re-iterate that the necessity to keep node names same on
>> both sides of migration is unexpected, undocumented and in some cases
>> impossible.
>>
>> If you want to mandate that they must be kept the same please document
>> it and also note the following:
>>
>> - during migrations the storage layout may change e.g. a backing chain
>>    may become flattened, thus keeping node names stable beyond the top
>>    layer is impossible
>>
> 
> The struct and layout of the graph is entirely unrelated to the
> requirement that a bitmap attached to a node with a name on the source
> needs to have a node with the same name on the destination. It's an
> addressability requirement only.
> 
> Change it entirely to a new drive if you want, move it up or down the
> graph, it doesn't matter.
> 
> Libvirt is in the best position to understand where bitmaps already are
> and where it wants them to go.
> 
>> - in some cases (readonly image in a cdrom not present on destination,
>>    thus not relevant here probably) it may even become impossible to
>>    create any node thus keeping the top node may be impossible
>>
> 
> It's not mandatory to recreate the graph exactly. Consider what you are
> saying:
> 
> - Libvirt adds a bitmap to node "N"
> - Libvirt asks QEMU for bitmap migration
> - Libvirt migrates to a QEMU instance that not only does not have a node
> "N", but has no analogous node at all!
> 
> I believe this is right to fail as there is no way to fulfill the
> request as-is.
> 
> (More below if you feel it's valid to migrate only some bitmaps.)
> 
>> - it should be documented when and why this happens and how management
>>    tools are supposed to do it
>>
> 
> OK, agreed, and I am sorry that our existing story has been hand-wavey.
> 
> Let me tell you the exact specifics of the current broken logic so you
> can understand the requirements as they exist right now.
> 
> 1. Bitmaps attempt to use their device name to migrate, if available.
> This covers 99% of use cases where a bitmap was added to a node that was
> attached directly to a device model.
> 
> This includes almost all usual cases: bitmaps loaded from qcow2 files,
> bitmaps added via QMP to a root node, bitmaps added via QMP to a drive name.
> 
> (It does not include cases where bitmaps are intentionally added to
> nodes that aren't a device root. Libvirt, I believe, can simply never do
> this and it will never come up.)
> 
> 2. If a device name isn't available because this bitmap is not attached
> to a root OR the BB does not have a name, we migrate using the node name.
> 
> 3. No attention is paid whatsoever to whether a node name is
> automatically generated or not. In effect, if the device name lookup
> fails we currently "trust" that the node name is something meaningful.
> 
> 4. The bug as I originally perceived of it relates specifically to our
> failure to resolve the device name after graph manipulations.
> 
> 
> Under these rules, if we "fixed" #4, node-names wouldn't show up in the
> stream at all if you never attached a bitmap to a non-root node. This is
> probably what you expected.
> 
> Node-names only feature in cases where we can't find a device/drive
> name, which is:
> A. When a bitmap is attached to a non-root node specifically. Libvirt
> can simply never do this!
> B. When under a graph transformation for drive-mirror; point #4 above.
> 
> 
> The workaround for this bug if we don't find a good policy:
> 
> 1. Use blockdev.
> 2. Give explicit, semantic names to the root nodes that represent the drive.
> 3. Any name used to add a bitmap must appear on the destination in a
> migration.
> 
> 
>> - please let me know what's actually expected, since libvirt
>>    didn't enable blockdev yet we can fix any unexpected expectations
>>
> 
> I have been and will continue to be diligent in CCing you and libvirt list.
> 
> At the moment I am still leaning towards the idea that libvirt should
> expect that any bitmaps attached to a node with an explicit node-name
> will want to use those names to migrate, but that we might be able to
> limit the cases such that you will be able to avoid the circumstance
> entirely.
> 
> However, QEMU's actual implementation is that they are node object. QEMU
> is ill-equipped to make semantic decisions about what the bitmaps "mean"
> or "represent"; the name is unfortunately the most explicit identifier
> we have to convey what bitmap we are talking about.
> 
> It will be libvirt's job to use node names to help facilitate QEMU's
> transfer of these objects during migration in a semantically helpful way.
> 
>> - Document it so that the expectations don't change after this.
>>
> 
> OK. I will take charge on this, once we reach a consensus.
> 
>> - Ideally node names will not be bound to anything and freely
>>    changeable. If necessary we can provide a map to qemu during migration
>>    which is probably less painful and more straightforward than keeping
>>    them in sync somehow ...
> 
> Why do you want node names to be freely manipulable?
> 
> The only constraint we've actually added is that a root node (that has a
> bitmap) attached to a device needs to have a name that is available on
> the target.
> 
> (Oh, and, that the virtual size of that target matches the source.)
> 
>>
> 
> 
> 
> Phew. In terms of non-direct replies to Peter's questions above, I've
> written out like a dozen failed replies to this, so I'm still quite
> confused but need to work on other things today.
> 
> I currently think that:
> 
> 
> 1. If a user uses block-dirty-bitmap-add, we have some sense of where
> they wanted the bitmap to go in the graph because they specified a name.
> Migration, if left as an automatic (opt-in) process, should try to
> migrate in-kind:
> 
> - If the user used a drive name, try to use a drive name to migrate. If
> there is no drive name and our node name is autogenerated, we cannot
> migrate this bitmap.
> - If the user used an explicit, non-generated node name, use the node
> name. If the user used an implicit node-name, we need to try to resolve
> the device name again. If that's not possible, the bitmap cannot be
> migrated.
> 

Personally, I don't like the idea of separating auto-generated node-names from
user specified.. But I inderstand that there is a risk of selecting wrong node
on target because of this..

Also, I'm afraid we can't rely on which name user used when created bitmap, as
we don't want to store this information into qcow2, when we shutdown vm..

> 
> This implies that QEMU will try to "guess" where bitmaps go when using
> -drive/-device, but will rely on explicit configuration when using
> blockdev. I think the spirit of this idea is correct.
> 
> (Vladimir: this is indeed different from EITHER of my suggested
> resolution orders over the last two days.)
> 
> 
> 
> 2. I like Vladimir's idea of providing a "default" migration approach,
> but allowing libvirt to override some features of it if necessary.
> 
> Overrides that I think will be helpful in alleviating any pain in the
> long term:
> 
> - Whitelists / Blacklists
> 
> The ability to provide either a whitelist or a blacklist for bitmaps
> that we desire to migrate. The default can continue to be: "All bitmaps
> with a name." This will allow libvirt to drop bitmaps at its discretion
> if it performs a block graph reconfiguration on migration and the bitmap
> is no longer semantically relevant or appropriate for whatever reason.
> This is superior to explicitly deleting bitmaps or dropping nodes in
> order to have a valid recourse on failed migrations.
> 
> 
> - The ability to override specific mappings on an as-needed basis. I
> believe the default resolution mechanism should be one that behaves like
> I specify above; but if that resolution is untenable for some reason,
> you can specify a remapping if you really require.
> 
> I am actually hoping that remapping is actually not necessary, because I
> think it's sufficient to use node-names to explicitly direct bitmaps to
> their intended targets.
> 
> But if we truly do need that power, I'm open to providing an interface
> to specify it.
> 
> 
> 
> I hope everyone is as confused as I am, now.
> --js
> 

I now feel myself closer to the idea that node-names are a property of running
Qemu instance, and may change after vm restart or migration. We may add/remove
drives which are including persistent dirty bitmaps, so node-name is not a
persistent property of the bitmap. Note, that we store bitmap names in qcow2,
but we never store node-name.

Therefore, I think it's OK to drop node-name default matching at all, because:

1. We don't need it for backward compatibility, as it never worked, because of
autogeneration of node-names.
2. It may lead to occasional migration of the bitmap to wrong node, which may
lead to some kind of corrupted backup. (it's always safer to drop bitmap than to
wrongly migrate it)

So, I'd leave default to migrate bitmaps in root node (may be, though some filters)
by Blk node name. And don't touch node-names.

====

If we implement migration-set-bitmaps-mapping qmp command, I think it should
override the default, so that blk names are not involved at all. And this map
should define the whole bitmaps migration picture, without any defaults, like this:

MigrationMap = [ MigrationEntry, ... ]
MigrationEntry = { MigrationSource, MigrationTarget }
MigrationSource = NodeName (means all bitmaps from the node) | {NodeName, BitmapName}
MigrationTarget = NodeName | Null (means remove the bitmap)

All named bitmaps must be covered by this definition, otherwise - return error.

The definition is extendable, if we decide at some point to allow to change bitmap name
or to change properties (persistence, enabled) it will be possible.

====

If it's a problem for libvirt to keep same node-names, why should we insist?


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-03 10:14             ` Vladimir Sementsov-Ogievskiy
@ 2019-10-03 23:34               ` John Snow
  2019-10-04  8:33                 ` Peter Krempa
  2019-10-04  9:24                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 47+ messages in thread
From: John Snow @ 2019-10-03 23:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Peter Krempa
  Cc: qemu-devel, Qemu-block, Max Reitz



On 10/3/19 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
> 03.10.2019 0:35, John Snow wrote:
>> On 10/2/19 6:46 AM, Peter Krempa wrote:
>>
>> [ * poof * ]
>>
>>>
>>> I'd like to re-iterate that the necessity to keep node names same on
>>> both sides of migration is unexpected, undocumented and in some cases
>>> impossible.
>>>
>>> If you want to mandate that they must be kept the same please document
>>> it and also note the following:
>>>
>>> - during migrations the storage layout may change e.g. a backing chain
>>>    may become flattened, thus keeping node names stable beyond the top
>>>    layer is impossible
>>>
>>
>> The struct and layout of the graph is entirely unrelated to the
>> requirement that a bitmap attached to a node with a name on the source
>> needs to have a node with the same name on the destination. It's an
>> addressability requirement only.
>>
>> Change it entirely to a new drive if you want, move it up or down the
>> graph, it doesn't matter.
>>
>> Libvirt is in the best position to understand where bitmaps already are
>> and where it wants them to go.
>>
>>> - in some cases (readonly image in a cdrom not present on destination,
>>>    thus not relevant here probably) it may even become impossible to
>>>    create any node thus keeping the top node may be impossible
>>>
>>
>> It's not mandatory to recreate the graph exactly. Consider what you are
>> saying:
>>
>> - Libvirt adds a bitmap to node "N"
>> - Libvirt asks QEMU for bitmap migration
>> - Libvirt migrates to a QEMU instance that not only does not have a node
>> "N", but has no analogous node at all!
>>
>> I believe this is right to fail as there is no way to fulfill the
>> request as-is.
>>
>> (More below if you feel it's valid to migrate only some bitmaps.)
>>
>>> - it should be documented when and why this happens and how management
>>>    tools are supposed to do it
>>>
>>
>> OK, agreed, and I am sorry that our existing story has been hand-wavey.
>>
>> Let me tell you the exact specifics of the current broken logic so you
>> can understand the requirements as they exist right now.
>>
>> 1. Bitmaps attempt to use their device name to migrate, if available.
>> This covers 99% of use cases where a bitmap was added to a node that was
>> attached directly to a device model.
>>
>> This includes almost all usual cases: bitmaps loaded from qcow2 files,
>> bitmaps added via QMP to a root node, bitmaps added via QMP to a drive name.
>>
>> (It does not include cases where bitmaps are intentionally added to
>> nodes that aren't a device root. Libvirt, I believe, can simply never do
>> this and it will never come up.)
>>
>> 2. If a device name isn't available because this bitmap is not attached
>> to a root OR the BB does not have a name, we migrate using the node name.
>>
>> 3. No attention is paid whatsoever to whether a node name is
>> automatically generated or not. In effect, if the device name lookup
>> fails we currently "trust" that the node name is something meaningful.
>>
>> 4. The bug as I originally perceived of it relates specifically to our
>> failure to resolve the device name after graph manipulations.
>>
>>
>> Under these rules, if we "fixed" #4, node-names wouldn't show up in the
>> stream at all if you never attached a bitmap to a non-root node. This is
>> probably what you expected.
>>
>> Node-names only feature in cases where we can't find a device/drive
>> name, which is:
>> A. When a bitmap is attached to a non-root node specifically. Libvirt
>> can simply never do this!
>> B. When under a graph transformation for drive-mirror; point #4 above.
>>
>>
>> The workaround for this bug if we don't find a good policy:
>>
>> 1. Use blockdev.
>> 2. Give explicit, semantic names to the root nodes that represent the drive.
>> 3. Any name used to add a bitmap must appear on the destination in a
>> migration.
>>
>>
>>> - please let me know what's actually expected, since libvirt
>>>    didn't enable blockdev yet we can fix any unexpected expectations
>>>
>>
>> I have been and will continue to be diligent in CCing you and libvirt list.
>>
>> At the moment I am still leaning towards the idea that libvirt should
>> expect that any bitmaps attached to a node with an explicit node-name
>> will want to use those names to migrate, but that we might be able to
>> limit the cases such that you will be able to avoid the circumstance
>> entirely.
>>
>> However, QEMU's actual implementation is that they are node object. QEMU
>> is ill-equipped to make semantic decisions about what the bitmaps "mean"
>> or "represent"; the name is unfortunately the most explicit identifier
>> we have to convey what bitmap we are talking about.
>>
>> It will be libvirt's job to use node names to help facilitate QEMU's
>> transfer of these objects during migration in a semantically helpful way.
>>
>>> - Document it so that the expectations don't change after this.
>>>
>>
>> OK. I will take charge on this, once we reach a consensus.
>>
>>> - Ideally node names will not be bound to anything and freely
>>>    changeable. If necessary we can provide a map to qemu during migration
>>>    which is probably less painful and more straightforward than keeping
>>>    them in sync somehow ...
>>
>> Why do you want node names to be freely manipulable?
>>
>> The only constraint we've actually added is that a root node (that has a
>> bitmap) attached to a device needs to have a name that is available on
>> the target.
>>
>> (Oh, and, that the virtual size of that target matches the source.)
>>
>>>
>>
>>
>>
>> Phew. In terms of non-direct replies to Peter's questions above, I've
>> written out like a dozen failed replies to this, so I'm still quite
>> confused but need to work on other things today.
>>
>> I currently think that:
>>
>>
>> 1. If a user uses block-dirty-bitmap-add, we have some sense of where
>> they wanted the bitmap to go in the graph because they specified a name.
>> Migration, if left as an automatic (opt-in) process, should try to
>> migrate in-kind:
>>
>> - If the user used a drive name, try to use a drive name to migrate. If
>> there is no drive name and our node name is autogenerated, we cannot
>> migrate this bitmap.
>> - If the user used an explicit, non-generated node name, use the node
>> name. If the user used an implicit node-name, we need to try to resolve
>> the device name again. If that's not possible, the bitmap cannot be
>> migrated.
>>
> 
> Personally, I don't like the idea of separating auto-generated node-names from
> user specified.. But I inderstand that there is a risk of selecting wrong node
> on target because of this..
> 

The way I see it, we know an auto-generated node name will never be
correct, but an explicitly specified one represents an explicit user
configuration.

It's wrong to use generated names for migration details, but it's never
wrong to use explicit configuration.

So I believe they are /already/ distinct in nature. We even already have
code that can tell them apart.

> Also, I'm afraid we can't rely on which name user used when created bitmap, as
> we don't want to store this information into qcow2, when we shutdown vm..
> 

Right, I was thinking about this as I wrote my long email, but believe
that it poses no real problem in the "no explicit mapping" default case.

There are four cases here:

- The bitmap is loaded to a root node with an explicit name
- The bitmap is loaded to a non-root node with an explicit name

The blockdev case with persistence. The name represents explicit user
configuration and can be relied upon in the destination.

- The bitmap is loaded to a root node with an implicit name, with a named BB

The -drive case. The named BB represents the explicit user configuration
and can be relied upon.

- The bitmap is loaded to a non-root node with an implicit name.

There is no known explicit user configuration that identifies the node
this bitmap is attached to; it cannot be migrated without an explicit
override mapping.


Why I like this approach:
- We always use explicit user configuration.
- We do not attempt to migrate bitmaps that were not explicitly
positioned by the user without further configuration.

>>
>> This implies that QEMU will try to "guess" where bitmaps go when using
>> -drive/-device, but will rely on explicit configuration when using
>> blockdev. I think the spirit of this idea is correct.
>>
>> (Vladimir: this is indeed different from EITHER of my suggested
>> resolution orders over the last two days.)
>>
>>
>>
>> 2. I like Vladimir's idea of providing a "default" migration approach,
>> but allowing libvirt to override some features of it if necessary.
>>
>> Overrides that I think will be helpful in alleviating any pain in the
>> long term:
>>
>> - Whitelists / Blacklists
>>
>> The ability to provide either a whitelist or a blacklist for bitmaps
>> that we desire to migrate. The default can continue to be: "All bitmaps
>> with a name." This will allow libvirt to drop bitmaps at its discretion
>> if it performs a block graph reconfiguration on migration and the bitmap
>> is no longer semantically relevant or appropriate for whatever reason.
>> This is superior to explicitly deleting bitmaps or dropping nodes in
>> order to have a valid recourse on failed migrations.
>>
>>
>> - The ability to override specific mappings on an as-needed basis. I
>> believe the default resolution mechanism should be one that behaves like
>> I specify above; but if that resolution is untenable for some reason,
>> you can specify a remapping if you really require.
>>
>> I am actually hoping that remapping is actually not necessary, because I
>> think it's sufficient to use node-names to explicitly direct bitmaps to
>> their intended targets.
>>
>> But if we truly do need that power, I'm open to providing an interface
>> to specify it.
>>
>>
>>
>> I hope everyone is as confused as I am, now.
>> --js
>>
> 
> I now feel myself closer to the idea that node-names are a property of running
> Qemu instance, and may change after vm restart or migration. We may add/remove
> drives which are including persistent dirty bitmaps, so node-name is not a
> persistent property of the bitmap. Note, that we store bitmap names in qcow2,
> but we never store node-name.
> 

You're right that node-names aren't an indelible property of the bitmap
itself in the same way that granularity and size are.

However, I do think that node-names are a run-time attribute  of bitmaps
in that bitmaps have an "address" -- their (node, name) pair -- that has
always served as their identifier in the QMP API.

To say that this is an internal detail is misleading, I think. It has
always been the single most visible identifier of bitmaps at runtime.

Does it make sense to think that a bitmap you added with "add
node-name=NodeX bitmap=MyBitmap" would do anything other than associate
with NodeX on the destination?

That's surprising to me.

> Therefore, I think it's OK to drop node-name default matching at all, because:
> 
> 1. We don't need it for backward compatibility, as it never worked, because of
> autogeneration of node-names.

Well, part of it worked. I always tested with explicitly named nodes on
both ends of the migration because I assumed that this was a reasonable
constraint -- I expected it to break if you named them incorrectly.
That's not too different from how migration works already: If you
misconfigure the target, it's going to go poorly.

What I never tested was a configuration that accidentally sent a
generated node-name, which is something that I am adamant we must
prevent. However, on this subject and related to your point below:

> 2. It may lead to occasional migration of the bitmap to wrong node, which may
> lead to some kind of corrupted backup. (it's always safer to drop bitmap than to
> wrongly migrate it)
> 
> So, I'd leave default to migrate bitmaps in root node (may be, though some filters)
> by Blk node name. And don't touch node-names.
> 

... I'm actually OK with this approach for right now, provided that we
ensure the drive-mirror case works again. This is apparently quite
complex and I'm okay with taking the time to make sure we all agree.

However, this might cause a regression for anyone relying on named node
migrations already; we should be careful there.

> ====
> 
> If we implement migration-set-bitmaps-mapping qmp command, I think it should
> override the default, so that blk names are not involved at all. And this map
> should define the whole bitmaps migration picture, without any defaults, like this:
> 
> MigrationMap = [ MigrationEntry, ... ]
> MigrationEntry = { MigrationSource, MigrationTarget }
> MigrationSource = NodeName (means all bitmaps from the node) | {NodeName, BitmapName}
> MigrationTarget = NodeName | Null (means remove the bitmap)
> 
> All named bitmaps must be covered by this definition, otherwise - return error.
> 

I like this grammar. If we wind up wanting or needing this, this is a
good start.

> The definition is extendable, if we decide at some point to allow to change bitmap name
> or to change properties (persistence, enabled) it will be possible.
> 

Sometimes I like the idea of an overlay configuration where you don't
have to specify the entire mapping, but instead just extend a default
behavior with the ability to specify the entire mapping.

Libvirt, of course, would naturally always specify the entire
configuration explicitly.

In my vision:

- QEMU migrates any bitmap it has a "concrete name" for that was
unambiguously provided directly by the user.
- For bitmaps with ambiguous locations (no named BB direct ancestor, no
explicit node-name) we cannot migrate without further config; fail.
- QEMU allows as-needed mappings/overrides on a per-bitmap basis to
change behavior as desired.

Why I like this approach:
- It actually doesn't involve any magic; it's using only explicit user
configuration.
- The defaults are arguably correct for any situation that does not
incur an error during initialization.
- It does not involve any values that were guessed without user input,
so the migration should not be dangerous. If the user gets the node
names wrong on the destination, that is explicitly their fault for
asking for the wrong thing.
- It allows a user to engage a bitmap migration with less scaffolding to
accomplish it. If it is at all possible, I prefer an API that does not
require extremely verbose configuration to operate in normative cases. I
recognize that this is not always possible.

If you are worried that some defaults might "slip through" without
getting overridden, it's always OK to add a "nodefaults" flag so that
libvirt can be solidly assured that is in complete and total control of
every last detail.

(But, if it's simpler to say that manual configuration is always "all or
nothing" for the sake of implementation, that's a good reason not to do
something fancier. I'm just stating my preference in terms of what I
like as a user.)

> ====
> 
> If it's a problem for libvirt to keep same node-names, why should we insist?
> 
> 

Is it really a problem? If libvirt requests migration of bitmaps, those
bitmaps need to go somewhere. Even if it constructs a different graph
for valid reasons, it should still understand which qcow2 nodes
correlate to which other qcow2 nodes and name them accordingly.

I don't see why this is actually a terrible constraint. Even in our
mapping proposal we're still using node-names.


So here's a summary of where I stand right now:

- I want an API in QEMU that doesn't require libvirt.

- I want to accommodate libvirt, but don't understand the requirement
that node-names must be ephemeral.

- I would like to define a set of default behaviors (when bitmap
migration is enabled) that migrates only bitmaps it is confident it can
do so correctly and error out when it cannot.

- I'd like to amend the bitmap device name resolution to accommodate the
drive-mirror case.

- Acknowledging that there might be cases where the defaults just simply
aren't powerful enough, allow a manual configuration that allows us to
select or deselect bitmaps and explicitly set their destination node-name.


Sorry to have typed so many words about this; the path forward was not
necessarily clear and we all live in quite different timezones.

--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-03 23:34               ` John Snow
@ 2019-10-04  8:33                 ` Peter Krempa
  2019-10-04  9:21                   ` Vladimir Sementsov-Ogievskiy
  2019-10-06  3:15                   ` John Snow
  2019-10-04  9:24                 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 2 replies; 47+ messages in thread
From: Peter Krempa @ 2019-10-04  8:33 UTC (permalink / raw)
  To: John Snow; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz

On Thu, Oct 03, 2019 at 19:34:56 -0400, John Snow wrote:
> On 10/3/19 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
> > 03.10.2019 0:35, John Snow wrote:
> >> On 10/2/19 6:46 AM, Peter Krempa wrote:
> > ====

[...]

(I'm sorry if I ignored something which might require input in the
trimmed part but I don't have enough mental capacity to follow this
thread fully)

> > 
> > If it's a problem for libvirt to keep same node-names, why should we insist?
> > 
> > 
> 
> Is it really a problem? If libvirt requests migration of bitmaps, those
> bitmaps need to go somewhere. Even if it constructs a different graph
> for valid reasons, it should still understand which qcow2 nodes
> correlate to which other qcow2 nodes and name them accordingly.

Well, no it is not a problem. Since bitmap migration has a migration
capability and libvirt by default disables all unknown migration
capabilities we can deal with it.

We have measures to transfer state to the destination we can
basically do the equivalent of the explicit mapping but with extra
steps.

We know where we want to place the bitmap and thus we can configure
those nodes appropriately and generate new names for everything else so
that nothing gets accidentally copied to wrong place.

My concern is though about the future. Since this is the first instance
of such a  migration feature which requires node names it's okay because
we can cheat by naming the destination "appropriately". The problem
will start though if there will be something else bound to the backend
of a disk addressed by node names which will have different semantics.

In that case we won't be able to cheat again.

Let's assume the following example:

qemu adds a new feature of migrating the qcow2 L2 cache. This will
obviously have different implications on when it can be used than
bitmaps.

If we'd like to use either of the features but not both together on a
node there wouldn't be a possibility to achieve that.

The thing about bitmaps is that they are not really bound to the image
itself but rather the data in the image. As long as the user provides a
image with exactly the same contents the VM can use it and the bitmap
will be correct for it.

We use this in non-shared storage migration where we by default flatten
the backing chain into a new image. In such case a bitmap is still valid
but the cache in the hypothetical example is not valid to be copied over
for the same node name.

At the very least the nuances of the capability should be documented so
that we don't have to second guess what is going to happen.

> I don't see why this is actually a terrible constraint. Even in our
> mapping proposal we're still using node-names.
> 
> 
> So here's a summary of where I stand right now:
> 
> - I want an API in QEMU that doesn't require libvirt.
> 
> - I want to accommodate libvirt, but don't understand the requirement
> that node-names must be ephemeral.

As I've outlined above, the node names must not be ephemeral but on the
same note it's then necessary to clarify when they must be stable
accross migration and when they must be different.

In the above example I'm outlining an image which has the same data but
it's a different image (it was converted for example). In that case the
bitmap migration would imply the same node name, but at the same time
the image is completely different and any other feature may be
incompatible with it.

The same is possible e.g. when you have multiple protocols to access the
same data are they the same thing and thus warrant the same node name?
or are they different.

Treating node names as ephemeral has the advantage of not trying to
assume the equivalence of the images on the migration channel and not
having to try to figure out whether they are "euqivalent enough" for the
given feature.

> 
> - I would like to define a set of default behaviors (when bitmap
> migration is enabled) that migrates only bitmaps it is confident it can
> do so correctly and error out when it cannot.

This requires also defining a set of external constraints when it will
work. Note that they can differ with other features.

> 
> - I'd like to amend the bitmap device name resolution to accommodate the
> drive-mirror case.
> 
> - Acknowledging that there might be cases where the defaults just simply
> aren't powerful enough, allow a manual configuration that allows us to
> select or deselect bitmaps and explicitly set their destination node-name.

This tangentially brings me to another question. In case when the
destination image already contains a bitmap with the same name, will the
migration of bitmaps overwrite it or merge with it?

This is again one thing that should be documented.

In the outlined case of non-shared storage migration libvirt would
obviously prefer merge or having it configurable, but as said, we have
means to work this around by renaming the bitmap temporarily  during
migration and then merging it explicitly.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-04  8:33                 ` Peter Krempa
@ 2019-10-04  9:21                   ` Vladimir Sementsov-Ogievskiy
  2019-10-06  3:15                   ` John Snow
  1 sibling, 0 replies; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-04  9:21 UTC (permalink / raw)
  To: Peter Krempa, John Snow; +Cc: qemu-devel, Qemu-block, Max Reitz

04.10.2019 11:33, Peter Krempa wrote:
> On Thu, Oct 03, 2019 at 19:34:56 -0400, John Snow wrote:
>> On 10/3/19 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 03.10.2019 0:35, John Snow wrote:
>>>> On 10/2/19 6:46 AM, Peter Krempa wrote:
>>> ====
> 
> [...]
> 
> (I'm sorry if I ignored something which might require input in the
> trimmed part but I don't have enough mental capacity to follow this
> thread fully)
> 
>>>
>>> If it's a problem for libvirt to keep same node-names, why should we insist?
>>>
>>>
>>
>> Is it really a problem? If libvirt requests migration of bitmaps, those
>> bitmaps need to go somewhere. Even if it constructs a different graph
>> for valid reasons, it should still understand which qcow2 nodes
>> correlate to which other qcow2 nodes and name them accordingly.
> 
> Well, no it is not a problem. Since bitmap migration has a migration
> capability and libvirt by default disables all unknown migration
> capabilities we can deal with it.
> 
> We have measures to transfer state to the destination we can
> basically do the equivalent of the explicit mapping but with extra
> steps.
> 
> We know where we want to place the bitmap and thus we can configure
> those nodes appropriately and generate new names for everything else so
> that nothing gets accidentally copied to wrong place.
> 
> My concern is though about the future. Since this is the first instance
> of such a  migration feature which requires node names it's okay because
> we can cheat by naming the destination "appropriately". The problem
> will start though if there will be something else bound to the backend
> of a disk addressed by node names which will have different semantics.
> 
> In that case we won't be able to cheat again.
> 
> Let's assume the following example:
> 
> qemu adds a new feature of migrating the qcow2 L2 cache. This will
> obviously have different implications on when it can be used than
> bitmaps.
> 
> If we'd like to use either of the features but not both together on a
> node there wouldn't be a possibility to achieve that.
> 
> The thing about bitmaps is that they are not really bound to the image
> itself but rather the data in the image. As long as the user provides a
> image with exactly the same contents the VM can use it and the bitmap
> will be correct for it.
> 
> We use this in non-shared storage migration where we by default flatten
> the backing chain into a new image. In such case a bitmap is still valid
> but the cache in the hypothetical example is not valid to be copied over
> for the same node name.
> 
> At the very least the nuances of the capability should be documented so
> that we don't have to second guess what is going to happen.
> 
>> I don't see why this is actually a terrible constraint. Even in our
>> mapping proposal we're still using node-names.
>>
>>
>> So here's a summary of where I stand right now:
>>
>> - I want an API in QEMU that doesn't require libvirt.
>>
>> - I want to accommodate libvirt, but don't understand the requirement
>> that node-names must be ephemeral.
> 
> As I've outlined above, the node names must not be ephemeral but on the
> same note it's then necessary to clarify when they must be stable
> accross migration and when they must be different.
> 
> In the above example I'm outlining an image which has the same data but
> it's a different image (it was converted for example). In that case the
> bitmap migration would imply the same node name, but at the same time
> the image is completely different and any other feature may be
> incompatible with it.
> 
> The same is possible e.g. when you have multiple protocols to access the
> same data are they the same thing and thus warrant the same node name?
> or are they different.
> 
> Treating node names as ephemeral has the advantage of not trying to
> assume the equivalence of the images on the migration channel and not
> having to try to figure out whether they are "euqivalent enough" for the
> given feature.
> 
>>
>> - I would like to define a set of default behaviors (when bitmap
>> migration is enabled) that migrates only bitmaps it is confident it can
>> do so correctly and error out when it cannot.
> 
> This requires also defining a set of external constraints when it will
> work. Note that they can differ with other features.
> 
>>
>> - I'd like to amend the bitmap device name resolution to accommodate the
>> drive-mirror case.
>>
>> - Acknowledging that there might be cases where the defaults just simply
>> aren't powerful enough, allow a manual configuration that allows us to
>> select or deselect bitmaps and explicitly set their destination node-name.
> 
> This tangentially brings me to another question. In case when the
> destination image already contains a bitmap with the same name, will the
> migration of bitmaps overwrite it or merge with it?

It will fail if bitmap already exists.

> 
> This is again one thing that should be documented.
> 
> In the outlined case of non-shared storage migration libvirt would
> obviously prefer merge or having it configurable, but as said, we have
> means to work this around by renaming the bitmap temporarily  during
> migration and then merging it explicitly.
> 

Why merge? source and destination disks are not merged, by destination is replaced
by source...

And where from will you have bitmaps in destination?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-03 23:34               ` John Snow
  2019-10-04  8:33                 ` Peter Krempa
@ 2019-10-04  9:24                 ` Vladimir Sementsov-Ogievskiy
  2019-10-04 13:07                   ` Eric Blake
  1 sibling, 1 reply; 47+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-04  9:24 UTC (permalink / raw)
  To: John Snow, Peter Krempa; +Cc: qemu-devel, Qemu-block, Max Reitz

04.10.2019 2:34, John Snow wrote:
> 
> 
> On 10/3/19 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 03.10.2019 0:35, John Snow wrote:
>>> On 10/2/19 6:46 AM, Peter Krempa wrote:
>>>
>>> [ * poof * ]
>>>
>>>>
>>>> I'd like to re-iterate that the necessity to keep node names same on
>>>> both sides of migration is unexpected, undocumented and in some cases
>>>> impossible.
>>>>
>>>> If you want to mandate that they must be kept the same please document
>>>> it and also note the following:
>>>>
>>>> - during migrations the storage layout may change e.g. a backing chain
>>>>     may become flattened, thus keeping node names stable beyond the top
>>>>     layer is impossible
>>>>
>>>
>>> The struct and layout of the graph is entirely unrelated to the
>>> requirement that a bitmap attached to a node with a name on the source
>>> needs to have a node with the same name on the destination. It's an
>>> addressability requirement only.
>>>
>>> Change it entirely to a new drive if you want, move it up or down the
>>> graph, it doesn't matter.
>>>
>>> Libvirt is in the best position to understand where bitmaps already are
>>> and where it wants them to go.
>>>
>>>> - in some cases (readonly image in a cdrom not present on destination,
>>>>     thus not relevant here probably) it may even become impossible to
>>>>     create any node thus keeping the top node may be impossible
>>>>
>>>
>>> It's not mandatory to recreate the graph exactly. Consider what you are
>>> saying:
>>>
>>> - Libvirt adds a bitmap to node "N"
>>> - Libvirt asks QEMU for bitmap migration
>>> - Libvirt migrates to a QEMU instance that not only does not have a node
>>> "N", but has no analogous node at all!
>>>
>>> I believe this is right to fail as there is no way to fulfill the
>>> request as-is.
>>>
>>> (More below if you feel it's valid to migrate only some bitmaps.)
>>>
>>>> - it should be documented when and why this happens and how management
>>>>     tools are supposed to do it
>>>>
>>>
>>> OK, agreed, and I am sorry that our existing story has been hand-wavey.
>>>
>>> Let me tell you the exact specifics of the current broken logic so you
>>> can understand the requirements as they exist right now.
>>>
>>> 1. Bitmaps attempt to use their device name to migrate, if available.
>>> This covers 99% of use cases where a bitmap was added to a node that was
>>> attached directly to a device model.
>>>
>>> This includes almost all usual cases: bitmaps loaded from qcow2 files,
>>> bitmaps added via QMP to a root node, bitmaps added via QMP to a drive name.
>>>
>>> (It does not include cases where bitmaps are intentionally added to
>>> nodes that aren't a device root. Libvirt, I believe, can simply never do
>>> this and it will never come up.)
>>>
>>> 2. If a device name isn't available because this bitmap is not attached
>>> to a root OR the BB does not have a name, we migrate using the node name.
>>>
>>> 3. No attention is paid whatsoever to whether a node name is
>>> automatically generated or not. In effect, if the device name lookup
>>> fails we currently "trust" that the node name is something meaningful.
>>>
>>> 4. The bug as I originally perceived of it relates specifically to our
>>> failure to resolve the device name after graph manipulations.
>>>
>>>
>>> Under these rules, if we "fixed" #4, node-names wouldn't show up in the
>>> stream at all if you never attached a bitmap to a non-root node. This is
>>> probably what you expected.
>>>
>>> Node-names only feature in cases where we can't find a device/drive
>>> name, which is:
>>> A. When a bitmap is attached to a non-root node specifically. Libvirt
>>> can simply never do this!
>>> B. When under a graph transformation for drive-mirror; point #4 above.
>>>
>>>
>>> The workaround for this bug if we don't find a good policy:
>>>
>>> 1. Use blockdev.
>>> 2. Give explicit, semantic names to the root nodes that represent the drive.
>>> 3. Any name used to add a bitmap must appear on the destination in a
>>> migration.
>>>
>>>
>>>> - please let me know what's actually expected, since libvirt
>>>>     didn't enable blockdev yet we can fix any unexpected expectations
>>>>
>>>
>>> I have been and will continue to be diligent in CCing you and libvirt list.
>>>
>>> At the moment I am still leaning towards the idea that libvirt should
>>> expect that any bitmaps attached to a node with an explicit node-name
>>> will want to use those names to migrate, but that we might be able to
>>> limit the cases such that you will be able to avoid the circumstance
>>> entirely.
>>>
>>> However, QEMU's actual implementation is that they are node object. QEMU
>>> is ill-equipped to make semantic decisions about what the bitmaps "mean"
>>> or "represent"; the name is unfortunately the most explicit identifier
>>> we have to convey what bitmap we are talking about.
>>>
>>> It will be libvirt's job to use node names to help facilitate QEMU's
>>> transfer of these objects during migration in a semantically helpful way.
>>>
>>>> - Document it so that the expectations don't change after this.
>>>>
>>>
>>> OK. I will take charge on this, once we reach a consensus.
>>>
>>>> - Ideally node names will not be bound to anything and freely
>>>>     changeable. If necessary we can provide a map to qemu during migration
>>>>     which is probably less painful and more straightforward than keeping
>>>>     them in sync somehow ...
>>>
>>> Why do you want node names to be freely manipulable?
>>>
>>> The only constraint we've actually added is that a root node (that has a
>>> bitmap) attached to a device needs to have a name that is available on
>>> the target.
>>>
>>> (Oh, and, that the virtual size of that target matches the source.)
>>>
>>>>
>>>
>>>
>>>
>>> Phew. In terms of non-direct replies to Peter's questions above, I've
>>> written out like a dozen failed replies to this, so I'm still quite
>>> confused but need to work on other things today.
>>>
>>> I currently think that:
>>>
>>>
>>> 1. If a user uses block-dirty-bitmap-add, we have some sense of where
>>> they wanted the bitmap to go in the graph because they specified a name.
>>> Migration, if left as an automatic (opt-in) process, should try to
>>> migrate in-kind:
>>>
>>> - If the user used a drive name, try to use a drive name to migrate. If
>>> there is no drive name and our node name is autogenerated, we cannot
>>> migrate this bitmap.
>>> - If the user used an explicit, non-generated node name, use the node
>>> name. If the user used an implicit node-name, we need to try to resolve
>>> the device name again. If that's not possible, the bitmap cannot be
>>> migrated.
>>>
>>
>> Personally, I don't like the idea of separating auto-generated node-names from
>> user specified.. But I inderstand that there is a risk of selecting wrong node
>> on target because of this..
>>
> 
> The way I see it, we know an auto-generated node name will never be
> correct, but an explicitly specified one represents an explicit user
> configuration.
> 
> It's wrong to use generated names for migration details, but it's never
> wrong to use explicit configuration.
> 
> So I believe they are /already/ distinct in nature. We even already have
> code that can tell them apart.

Is it restricted to create user node-names formatted like automated ones?

> 
>> Also, I'm afraid we can't rely on which name user used when created bitmap, as
>> we don't want to store this information into qcow2, when we shutdown vm..
>>
> 
> Right, I was thinking about this as I wrote my long email, but believe
> that it poses no real problem in the "no explicit mapping" default case.
> 
> There are four cases here:
> 
> - The bitmap is loaded to a root node with an explicit name
> - The bitmap is loaded to a non-root node with an explicit name
> 
> The blockdev case with persistence. The name represents explicit user
> configuration and can be relied upon in the destination.
> 
> - The bitmap is loaded to a root node with an implicit name, with a named BB
> 
> The -drive case. The named BB represents the explicit user configuration
> and can be relied upon.
> 
> - The bitmap is loaded to a non-root node with an implicit name.

So, do you suggest to save information of haw bitmap was loaded or created in
BdrvDirtyBitmap structure, to distinguish, how to identify it, by blk name or
by node-name? And how this information would be updated on bitmap merge? And
what about creating bitmaps?

So if one bitmap created in node N by blk name B, and another bitmap created in
same node N by node-name N, will we migrated these bitmaps in different ways?

> 
> There is no known explicit user configuration that identifies the node
> this bitmap is attached to; it cannot be migrated without an explicit
> override mapping.
> 
> 
> Why I like this approach:
> - We always use explicit user configuration.
> - We do not attempt to migrate bitmaps that were not explicitly
> positioned by the user without further configuration.
> 
>>>
>>> This implies that QEMU will try to "guess" where bitmaps go when using
>>> -drive/-device, but will rely on explicit configuration when using
>>> blockdev. I think the spirit of this idea is correct.
>>>
>>> (Vladimir: this is indeed different from EITHER of my suggested
>>> resolution orders over the last two days.)
>>>
>>>
>>>
>>> 2. I like Vladimir's idea of providing a "default" migration approach,
>>> but allowing libvirt to override some features of it if necessary.
>>>
>>> Overrides that I think will be helpful in alleviating any pain in the
>>> long term:
>>>
>>> - Whitelists / Blacklists
>>>
>>> The ability to provide either a whitelist or a blacklist for bitmaps
>>> that we desire to migrate. The default can continue to be: "All bitmaps
>>> with a name." This will allow libvirt to drop bitmaps at its discretion
>>> if it performs a block graph reconfiguration on migration and the bitmap
>>> is no longer semantically relevant or appropriate for whatever reason.
>>> This is superior to explicitly deleting bitmaps or dropping nodes in
>>> order to have a valid recourse on failed migrations.
>>>
>>>
>>> - The ability to override specific mappings on an as-needed basis. I
>>> believe the default resolution mechanism should be one that behaves like
>>> I specify above; but if that resolution is untenable for some reason,
>>> you can specify a remapping if you really require.
>>>
>>> I am actually hoping that remapping is actually not necessary, because I
>>> think it's sufficient to use node-names to explicitly direct bitmaps to
>>> their intended targets.
>>>
>>> But if we truly do need that power, I'm open to providing an interface
>>> to specify it.
>>>
>>>
>>>
>>> I hope everyone is as confused as I am, now.
>>> --js
>>>
>>
>> I now feel myself closer to the idea that node-names are a property of running
>> Qemu instance, and may change after vm restart or migration. We may add/remove
>> drives which are including persistent dirty bitmaps, so node-name is not a
>> persistent property of the bitmap. Note, that we store bitmap names in qcow2,
>> but we never store node-name.
>>
> 
> You're right that node-names aren't an indelible property of the bitmap
> itself in the same way that granularity and size are.
> 
> However, I do think that node-names are a run-time attribute  of bitmaps
> in that bitmaps have an "address" -- their (node, name) pair -- that has
> always served as their identifier in the QMP API.
> 
> To say that this is an internal detail is misleading, I think. It has
> always been the single most visible identifier of bitmaps at runtime.

Not internal. But it's an option of one running Qemu process.

> 
> Does it make sense to think that a bitmap you added with "add
> node-name=NodeX bitmap=MyBitmap" would do anything other than associate
> with NodeX on the destination?
> 
> That's surprising to me.
> 
>> Therefore, I think it's OK to drop node-name default matching at all, because:
>>
>> 1. We don't need it for backward compatibility, as it never worked, because of
>> autogeneration of node-names.
> 
> Well, part of it worked. I always tested with explicitly named nodes on
> both ends of the migration because I assumed that this was a reasonable
> constraint -- I expected it to break if you named them incorrectly.
> That's not too different from how migration works already: If you
> misconfigure the target, it's going to go poorly.
> 
> What I never tested was a configuration that accidentally sent a
> generated node-name, which is something that I am adamant we must
> prevent. However, on this subject and related to your point below:
> 
>> 2. It may lead to occasional migration of the bitmap to wrong node, which may
>> lead to some kind of corrupted backup. (it's always safer to drop bitmap than to
>> wrongly migrate it)
>>
>> So, I'd leave default to migrate bitmaps in root node (may be, though some filters)
>> by Blk node name. And don't touch node-names.
>>
> 
> ... I'm actually OK with this approach for right now, provided that we
> ensure the drive-mirror case works again. This is apparently quite
> complex and I'm okay with taking the time to make sure we all agree.
> 
> However, this might cause a regression for anyone relying on named node
> migrations already; we should be careful there.

Hmm, that's true too...

> 
>> ====
>>
>> If we implement migration-set-bitmaps-mapping qmp command, I think it should
>> override the default, so that blk names are not involved at all. And this map
>> should define the whole bitmaps migration picture, without any defaults, like this:
>>
>> MigrationMap = [ MigrationEntry, ... ]
>> MigrationEntry = { MigrationSource, MigrationTarget }
>> MigrationSource = NodeName (means all bitmaps from the node) | {NodeName, BitmapName}
>> MigrationTarget = NodeName | Null (means remove the bitmap)
>>
>> All named bitmaps must be covered by this definition, otherwise - return error.
>>
> 
> I like this grammar. If we wind up wanting or needing this, this is a
> good start.
> 
>> The definition is extendable, if we decide at some point to allow to change bitmap name
>> or to change properties (persistence, enabled) it will be possible.
>>
> 
> Sometimes I like the idea of an overlay configuration where you don't
> have to specify the entire mapping, but instead just extend a default
> behavior with the ability to specify the entire mapping.

Hmm, it may be tuned by
MigrationMap = { map: [ MigrationEntry], default: 'none' | 'node-name-match' }



> 
> Libvirt, of course, would naturally always specify the entire
> configuration explicitly.
> 
> In my vision:
> 
> - QEMU migrates any bitmap it has a "concrete name" for that was
> unambiguously provided directly by the user.
> - For bitmaps with ambiguous locations (no named BB direct ancestor, no
> explicit node-name) we cannot migrate without further config; fail.
> - QEMU allows as-needed mappings/overrides on a per-bitmap basis to
> change behavior as desired.
> 
> Why I like this approach:
> - It actually doesn't involve any magic; it's using only explicit user
> configuration.
> - The defaults are arguably correct for any situation that does not
> incur an error during initialization.
> - It does not involve any values that were guessed without user input,
> so the migration should not be dangerous. If the user gets the node
> names wrong on the destination, that is explicitly their fault for
> asking for the wrong thing.
> - It allows a user to engage a bitmap migration with less scaffolding to
> accomplish it. If it is at all possible, I prefer an API that does not
> require extremely verbose configuration to operate in normative cases. I
> recognize that this is not always possible.
> 
> If you are worried that some defaults might "slip through" without
> getting overridden, it's always OK to add a "nodefaults" flag so that
> libvirt can be solidly assured that is in complete and total control of
> every last detail.
> 
> (But, if it's simpler to say that manual configuration is always "all or
> nothing" for the sake of implementation, that's a good reason not to do
> something fancier. I'm just stating my preference in terms of what I
> like as a user.)
> 
>> ====
>>
>> If it's a problem for libvirt to keep same node-names, why should we insist?
>>
>>
> 
> Is it really a problem? If libvirt requests migration of bitmaps, those
> bitmaps need to go somewhere. Even if it constructs a different graph
> for valid reasons, it should still understand which qcow2 nodes
> correlate to which other qcow2 nodes and name them accordingly.
> 
> I don't see why this is actually a terrible constraint. Even in our
> mapping proposal we're still using node-names.
> 
> 
> So here's a summary of where I stand right now:
> 
> - I want an API in QEMU that doesn't require libvirt.
> 
> - I want to accommodate libvirt, but don't understand the requirement
> that node-names must be ephemeral.
> 
> - I would like to define a set of default behaviors (when bitmap
> migration is enabled) that migrates only bitmaps it is confident it can
> do so correctly and error out when it cannot.
> 
> - I'd like to amend the bitmap device name resolution to accommodate the
> drive-mirror case.
> 
> - Acknowledging that there might be cases where the defaults just simply
> aren't powerful enough, allow a manual configuration that allows us to
> select or deselect bitmaps and explicitly set their destination node-name.
> 
> 
> Sorry to have typed so many words about this; the path forward was not
> necessarily clear and we all live in quite different timezones.
> 
> --js
> 

Ok, I'm not completely against node-name matching, keeping in mind that it is
current default behavior anyway. And I see Peter not completely against too.

But I'd prefer to select default name from current moment, not involving information
of "how bitmap was created or loaded", as it may lead to migrating bitmaps from one
node in different ways which seems inconsistent.

Current default is blk name. And node-name if blk name is not available. So I think
the only thing to fix right now is skipping filters. We possibly may additionally
restrict migrating bitmaps without blk name and with generated node-name.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-04  9:24                 ` Vladimir Sementsov-Ogievskiy
@ 2019-10-04 13:07                   ` Eric Blake
  2019-10-06  3:19                     ` John Snow
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Blake @ 2019-10-04 13:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, John Snow, Peter Krempa
  Cc: qemu-devel, Qemu-block, Max Reitz

On 10/4/19 4:24 AM, Vladimir Sementsov-Ogievskiy wrote:

>> The way I see it, we know an auto-generated node name will never be
>> correct, but an explicitly specified one represents an explicit user
>> configuration.
>>
>> It's wrong to use generated names for migration details, but it's never
>> wrong to use explicit configuration.
>>
>> So I believe they are /already/ distinct in nature. We even already have
>> code that can tell them apart.
> 
> Is it restricted to create user node-names formatted like automated ones?

Yes. Explicit node names cannot begin with '#', while all generated node 
names do.


>> There are four cases here:
>>
>> - The bitmap is loaded to a root node with an explicit name
>> - The bitmap is loaded to a non-root node with an explicit name
>>
>> The blockdev case with persistence. The name represents explicit user
>> configuration and can be relied upon in the destination.
>>
>> - The bitmap is loaded to a root node with an implicit name, with a named BB
>>
>> The -drive case. The named BB represents the explicit user configuration
>> and can be relied upon.
>>
>> - The bitmap is loaded to a non-root node with an implicit name.
> 
> So, do you suggest to save information of haw bitmap was loaded or created in
> BdrvDirtyBitmap structure, to distinguish, how to identify it, by blk name or
> by node-name? And how this information would be updated on bitmap merge? And
> what about creating bitmaps?
> 
> So if one bitmap created in node N by blk name B, and another bitmap created in
> same node N by node-name N, will we migrated these bitmaps in different ways?

In the -drive case (historical libvirt), the block device is named, and 
node names are generated (it may be possible to use -drive and still 
create explicit node names, but libvirt will never do that).  You can 
create a bitmap using either ('drive-name','bitmap-name'), or 
('generated-node-name','bitmap-name'), but for the purposes of 
migration, only the 'drive-name' variant is migrateable.

In the -blockdev case (upcoming libvirt), the block device is anonymous, 
and all node names are given by libvirt.  Thus, you can only create a 
bitmap using ('node-name','bitmap-name'), but it is also obvious that 
migration will use the 'node-name' variant.


>>>
>>> If it's a problem for libvirt to keep same node-names, why should we insist?
>>>
>>>
>>
>> Is it really a problem? If libvirt requests migration of bitmaps, those
>> bitmaps need to go somewhere. Even if it constructs a different graph
>> for valid reasons, it should still understand which qcow2 nodes
>> correlate to which other qcow2 nodes and name them accordingly.
>>
>> I don't see why this is actually a terrible constraint. Even in our
>> mapping proposal we're still using node-names.
>>
>>

The obvious case I see is that if we have a source:

Backing.qcow2 (contains bitmap1) <- Active.qcow2 (contains bitmap2)

and we want to migrate AND flatten at the same time, but still preserve 
the bitmaps as a record of changes between the points in time, then 
libvirt needs a way to specify migration to:

Flattened.qcow2 (contains bitmap1 and bitmap2)

No matter which node name libvirt assigns to Flattened.qcow2, at least 
one of the two bitmaps on the source will be migrated to a different 
node name on the destination, while still giving the net result of a 
bitmap logically associated with the drive (and not any particular node).


> Ok, I'm not completely against node-name matching, keeping in mind that it is
> current default behavior anyway. And I see Peter not completely against too.
> 
> But I'd prefer to select default name from current moment, not involving information
> of "how bitmap was created or loaded", as it may lead to migrating bitmaps from one
> node in different ways which seems inconsistent.

As long as a bitmap never has both names non-generated, we should be 
fine (it either has an explicit drive name and generated node name, or 
it has no drive name and an explicit node name).

> 
> Current default is blk name. And node-name if blk name is not available. So I think
> the only thing to fix right now is skipping filters. We possibly may additionally
> restrict migrating bitmaps without blk name and with generated node-name.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-04  8:33                 ` Peter Krempa
  2019-10-04  9:21                   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-06  3:15                   ` John Snow
  1 sibling, 0 replies; 47+ messages in thread
From: John Snow @ 2019-10-06  3:15 UTC (permalink / raw)
  To: Peter Krempa
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Qemu-block, Max Reitz



On 10/4/19 4:33 AM, Peter Krempa wrote:
> On Thu, Oct 03, 2019 at 19:34:56 -0400, John Snow wrote:
>> On 10/3/19 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 03.10.2019 0:35, John Snow wrote:
>>>> On 10/2/19 6:46 AM, Peter Krempa wrote:
>>> ====
> 
> [...]
> 
> (I'm sorry if I ignored something which might require input in the
> trimmed part but I don't have enough mental capacity to follow this
> thread fully)
> 

Yeah, understandable -- it's getting a bit long, but I'm trying to make
sure I understand the nuance everywhere before I start pursuing a
particular solution.

I think you caught the important parts in your replies below.

>>>
>>> If it's a problem for libvirt to keep same node-names, why should we insist?
>>>
>>>
>>
>> Is it really a problem? If libvirt requests migration of bitmaps, those
>> bitmaps need to go somewhere. Even if it constructs a different graph
>> for valid reasons, it should still understand which qcow2 nodes
>> correlate to which other qcow2 nodes and name them accordingly.
> 
> Well, no it is not a problem. Since bitmap migration has a migration
> capability and libvirt by default disables all unknown migration
> capabilities we can deal with it.
> 
> We have measures to transfer state to the destination we can
> basically do the equivalent of the explicit mapping but with extra
> steps.
> 
> We know where we want to place the bitmap and thus we can configure
> those nodes appropriately and generate new names for everything else so
> that nothing gets accidentally copied to wrong place.
> 
> My concern is though about the future. Since this is the first instance
> of such a  migration feature which requires node names it's okay because
> we can cheat by naming the destination "appropriately". The problem
> will start though if there will be something else bound to the backend
> of a disk addressed by node names which will have different semantics.
> 
> In that case we won't be able to cheat again.
> 

OK, I see the concern now. Though we're free to name nodes to achieve
the bitmap semantics we want right now, graph reconfigurations in the
future might not be able to fit within the same constraints simultaneously.

Reasonable concern.

Thank you for the illustrative hypothetical.

> Let's assume the following example:
> 
> qemu adds a new feature of migrating the qcow2 L2 cache. This will
> obviously have different implications on when it can be used than
> bitmaps.
> 
> If we'd like to use either of the features but not both together on a
> node there wouldn't be a possibility to achieve that.
> 
> The thing about bitmaps is that they are not really bound to the image
> itself but rather the data in the image. As long as the user provides a
> image with exactly the same contents the VM can use it and the bitmap
> will be correct for it.
> 
> We use this in non-shared storage migration where we by default flatten
> the backing chain into a new image. In such case a bitmap is still valid
> but the cache in the hypothetical example is not valid to be copied over
> for the same node name.
> 
> At the very least the nuances of the capability should be documented so
> that we don't have to second guess what is going to happen.
> 

OK, understood.

>> I don't see why this is actually a terrible constraint. Even in our
>> mapping proposal we're still using node-names.
>>
>>
>> So here's a summary of where I stand right now:
>>
>> - I want an API in QEMU that doesn't require libvirt.
>>
>> - I want to accommodate libvirt, but don't understand the requirement
>> that node-names must be ephemeral.
> 
> As I've outlined above, the node names must not be ephemeral but on the
> same note it's then necessary to clarify when they must be stable
> accross migration and when they must be different.
> 
> In the above example I'm outlining an image which has the same data but
> it's a different image (it was converted for example). In that case the
> bitmap migration would imply the same node name, but at the same time
> the image is completely different and any other feature may be
> incompatible with it.
> 
> The same is possible e.g. when you have multiple protocols to access the
> same data are they the same thing and thus warrant the same node name?
> or are they different.
> 
> Treating node names as ephemeral has the advantage of not trying to
> assume the equivalence of the images on the migration channel and not
> having to try to figure out whether they are "euqivalent enough" for the
> given feature.
> 
>>
>> - I would like to define a set of default behaviors (when bitmap
>> migration is enabled) that migrates only bitmaps it is confident it can
>> do so correctly and error out when it cannot.
> 
> This requires also defining a set of external constraints when it will
> work. Note that they can differ with other features.
> 
>>
>> - I'd like to amend the bitmap device name resolution to accommodate the
>> drive-mirror case.
>>
>> - Acknowledging that there might be cases where the defaults just simply
>> aren't powerful enough, allow a manual configuration that allows us to
>> select or deselect bitmaps and explicitly set their destination node-name.
> 
> This tangentially brings me to another question. In case when the
> destination image already contains a bitmap with the same name, will the
> migration of bitmaps overwrite it or merge with it?
> 

It will emit an error, currently.

> This is again one thing that should be documented.
> 
> In the outlined case of non-shared storage migration libvirt would
> obviously prefer merge or having it configurable, but as said, we have
> means to work this around by renaming the bitmap temporarily  during
> migration and then merging it explicitly.
> 

Well, we don't know what bitmap is already there or what semantics it
has -- so at the moment we just bail out because we can't tell what's
going on.

If you find a use case for merge or overwrite, we can add that as a
feature, but right now we do the "safe thing".



I would have still liked to find a way to migrate bitmap with a "sane
default", but because I don't know what I don't know, it might be the
case that node-names in the stream that aren't explicitly requested
could be a very limiting factor in the future. Sadly, we are already
using them -- but perhaps we can find a way to back out of that.


So I think the safest thing, given your concern, is:
- Any bitmap attached to a root node can be migrated using the root's
drive name, if it has one. (This includes the drive-mirror case, if we
fix our ability to resolve the root through the filter node.)

- Any bitmap that doesn't have a named device or backend at its root
should not be migrated without further configuration, because doing so
requires node-names in the stream, for which we have a poor
understanding of possible future competing constraints.


I'm a little sad that this means that blockdev configurations can't be
migrated without a much more verbose migration statement.

I'm open to suggestions, but it sounds like we do want the ability to
specify a migration mapping to keep migration as flexible as possible.

I'll sleep on it.

--js


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: bitmap migration bug with -drive while block mirror runs
  2019-10-04 13:07                   ` Eric Blake
@ 2019-10-06  3:19                     ` John Snow
  0 siblings, 0 replies; 47+ messages in thread
From: John Snow @ 2019-10-06  3:19 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, Peter Krempa
  Cc: qemu-devel, Qemu-block, Max Reitz



On 10/4/19 9:07 AM, Eric Blake wrote:
> On 10/4/19 4:24 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>> The way I see it, we know an auto-generated node name will never be
>>> correct, but an explicitly specified one represents an explicit user
>>> configuration.
>>>
>>> It's wrong to use generated names for migration details, but it's never
>>> wrong to use explicit configuration.
>>>
>>> So I believe they are /already/ distinct in nature. We even already have
>>> code that can tell them apart.
>>
>> Is it restricted to create user node-names formatted like automated ones?
> 
> Yes. Explicit node names cannot begin with '#', while all generated node
> names do.

Right, we already have id_wellformed which tells us which kind of node
names are which. Automatic ones are not wellformed, explicit ones are.

Peter's latest reply to my wall of text is cooling me off on the plan I
laid out in that missive, though.

> 
> 
>>> There are four cases here:
>>>
>>> - The bitmap is loaded to a root node with an explicit name
>>> - The bitmap is loaded to a non-root node with an explicit name
>>>
>>> The blockdev case with persistence. The name represents explicit user
>>> configuration and can be relied upon in the destination.
>>>
>>> - The bitmap is loaded to a root node with an implicit name, with a
>>> named BB
>>>
>>> The -drive case. The named BB represents the explicit user configuration
>>> and can be relied upon.
>>>
>>> - The bitmap is loaded to a non-root node with an implicit name.
>>
>> So, do you suggest to save information of haw bitmap was loaded or
>> created in
>> BdrvDirtyBitmap structure, to distinguish, how to identify it, by blk
>> name or
>> by node-name? And how this information would be updated on bitmap
>> merge? And
>> what about creating bitmaps?
>>
>> So if one bitmap created in node N by blk name B, and another bitmap
>> created in
>> same node N by node-name N, will we migrated these bitmaps in
>> different ways?
> 
> In the -drive case (historical libvirt), the block device is named, and
> node names are generated (it may be possible to use -drive and still
> create explicit node names, but libvirt will never do that).  You can
> create a bitmap using either ('drive-name','bitmap-name'), or
> ('generated-node-name','bitmap-name'), but for the purposes of
> migration, only the 'drive-name' variant is migrateable.
> 
> In the -blockdev case (upcoming libvirt), the block device is anonymous,
> and all node names are given by libvirt.  Thus, you can only create a
> bitmap using ('node-name','bitmap-name'), but it is also obvious that
> migration will use the 'node-name' variant.
> 
> 
>>>>
>>>> If it's a problem for libvirt to keep same node-names, why should we
>>>> insist?
>>>>
>>>>
>>>
>>> Is it really a problem? If libvirt requests migration of bitmaps, those
>>> bitmaps need to go somewhere. Even if it constructs a different graph
>>> for valid reasons, it should still understand which qcow2 nodes
>>> correlate to which other qcow2 nodes and name them accordingly.
>>>
>>> I don't see why this is actually a terrible constraint. Even in our
>>> mapping proposal we're still using node-names.
>>>
>>>
> 
> The obvious case I see is that if we have a source:
> 
> Backing.qcow2 (contains bitmap1) <- Active.qcow2 (contains bitmap2)
> 
> and we want to migrate AND flatten at the same time, but still preserve
> the bitmaps as a record of changes between the points in time, then
> libvirt needs a way to specify migration to:
> 
> Flattened.qcow2 (contains bitmap1 and bitmap2)
> 
> No matter which node name libvirt assigns to Flattened.qcow2, at least
> one of the two bitmaps on the source will be migrated to a different
> node name on the destination, while still giving the net result of a
> bitmap logically associated with the drive (and not any particular node).
> 

A good example that clearly demonstrates the need for an explicit
mapping provided by libvirt.

> 
>> Ok, I'm not completely against node-name matching, keeping in mind
>> that it is
>> current default behavior anyway. And I see Peter not completely
>> against too.
>>
>> But I'd prefer to select default name from current moment, not
>> involving information
>> of "how bitmap was created or loaded", as it may lead to migrating
>> bitmaps from one
>> node in different ways which seems inconsistent.
> 
> As long as a bitmap never has both names non-generated, we should be
> fine (it either has an explicit drive name and generated node name, or
> it has no drive name and an explicit node name).
> 
>>
>> Current default is blk name. And node-name if blk name is not
>> available. So I think
>> the only thing to fix right now is skipping filters. We possibly may
>> additionally
>> restrict migrating bitmaps without blk name and with generated node-name.
>>
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2019-10-06  3:20 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-01  0:09 bitmap migration bug with -drive while block mirror runs John Snow
2019-10-01  4:28 ` Peter Krempa
2019-10-01  9:07   ` Vladimir Sementsov-Ogievskiy
2019-10-01  8:57 ` Vladimir Sementsov-Ogievskiy
2019-10-01  9:54   ` Kevin Wolf
2019-10-01 10:05     ` Vladimir Sementsov-Ogievskiy
2019-10-01 13:24     ` Peter Krempa
2019-10-01 15:09     ` John Snow
2019-10-01 15:58       ` Kevin Wolf
2019-10-01 16:12         ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:24           ` Kevin Wolf
2019-10-01 16:23         ` John Snow
2019-10-01 11:45   ` Peter Krempa
2019-10-01  9:17 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:10   ` John Snow
2019-10-01 15:57     ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:07       ` John Snow
2019-10-02  8:12         ` Kevin Wolf
2019-10-02 10:46         ` Peter Krempa
2019-10-02 11:11           ` Kevin Wolf
2019-10-02 12:22             ` Vladimir Sementsov-Ogievskiy
2019-10-02 13:48               ` Peter Krempa
2019-10-02 13:43             ` Peter Krempa
2019-10-02 14:03               ` Vladimir Sementsov-Ogievskiy
2019-10-02 21:35           ` John Snow
2019-10-03 10:14             ` Vladimir Sementsov-Ogievskiy
2019-10-03 23:34               ` John Snow
2019-10-04  8:33                 ` Peter Krempa
2019-10-04  9:21                   ` Vladimir Sementsov-Ogievskiy
2019-10-06  3:15                   ` John Snow
2019-10-04  9:24                 ` Vladimir Sementsov-Ogievskiy
2019-10-04 13:07                   ` Eric Blake
2019-10-06  3:19                     ` John Snow
2019-10-01 16:16       ` Kevin Wolf
2019-10-01 16:17         ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:13   ` Max Reitz
2019-10-01 14:27     ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:34       ` Max Reitz
2019-10-01 14:53         ` Vladimir Sementsov-Ogievskiy
2019-10-01 15:26           ` Max Reitz
2019-10-02  7:34             ` Peter Krempa
2019-10-01 15:09         ` Kevin Wolf
2019-10-01 15:27           ` Max Reitz
2019-10-01 16:12             ` Kevin Wolf
2019-10-01 16:17               ` Max Reitz
2019-10-01 16:22                 ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.