All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] Intermediate block mirroring
@ 2018-04-12 17:07 Alberto Garcia
  2018-04-13 14:23 ` Max Reitz
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-04-12 17:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Max Reitz, Eric Blake, Stefan Hajnoczi

Hello,

I mentioned this some time ago, but I'd like to retake it now: I'm
checking how to copy arbitrary nodes on a backing chain, so if I have
e.g.

   [A] <- [B] <- [C] <- [D]

I'd like to end up with

   [A] <- [E] <- [C] <- [D]

where [E] is a copy of [B]. The most obvious use case is to move [B]
to a different storage backend.

At the moment we can already copy [B] into [E] (outside QEMU) and then
do

   change-backing-file device=[D] image-node-name=[C] backing-file=[E]

However this only changes the image on disk and the bs->backing_file
string in memory. QEMU keeps using the [B] BlockDriverState, and we
need to make it switch to [E].

One possible way to do this would be to modify blockdev-mirror so the
source image can be located anywhere on a chain. Currently there's
bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
refuses to take any non-root source node. Other than permission
changes I don't think the algorithm itself needs any additional
modification, although I suppose we'd like to change the backing file
as part of the same job, and that would require API changes.

One other way is to have a more generic replace-node command which
would call bdrv_replace_node(), but I don't know if we want to expose
that and I don't have any other use case for it at the moment.

Opinions, comments?

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-12 17:07 [Qemu-devel] [RFC] Intermediate block mirroring Alberto Garcia
@ 2018-04-13 14:23 ` Max Reitz
  2018-04-16 14:59   ` Alberto Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Max Reitz @ 2018-04-13 14:23 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On 2018-04-12 19:07, Alberto Garcia wrote:
> Hello,
> 
> I mentioned this some time ago, but I'd like to retake it now: I'm
> checking how to copy arbitrary nodes on a backing chain, so if I have
> e.g.
> 
>    [A] <- [B] <- [C] <- [D]
> 
> I'd like to end up with
> 
>    [A] <- [E] <- [C] <- [D]
> 
> where [E] is a copy of [B]. The most obvious use case is to move [B]
> to a different storage backend.
> 
> At the moment we can already copy [B] into [E] (outside QEMU) and then
> do
> 
>    change-backing-file device=[D] image-node-name=[C] backing-file=[E]
> 
> However this only changes the image on disk and the bs->backing_file
> string in memory. QEMU keeps using the [B] BlockDriverState, and we
> need to make it switch to [E].
> 
> One possible way to do this would be to modify blockdev-mirror so the
> source image can be located anywhere on a chain. Currently there's
> bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
> refuses to take any non-root source node. Other than permission
> changes I don't think the algorithm itself needs any additional
> modification, although I suppose we'd like to change the backing file
> as part of the same job, and that would require API changes.

Another way would be to write blockdev-copy. :-)

I don't think there should be a limitation which nodes can be used as
the source of blockdev-mirror, and I think the hardest part about
lifting that limitation is about thinking what might break...  Maybe we
should just drop the limitation and deal with the fallout later?  I
can't imagine a single node configuration where we'd want to disallow
copying off a node, so I suppose if that breaks in some edge case that
wouldn't be something we'd have to disallow again but we'd just have to
fix it.

> One other way is to have a more generic replace-node command which
> would call bdrv_replace_node(), but I don't know if we want to expose
> that and I don't have any other use case for it at the moment.

I think we do want to expose that.  As far as I'm informed, for graph
manipulation we want a command that can replace nodes (including
replacing nothing by a new node or replacing an existing node by
nothing, if the parent supports that).

Also, well, there is blockdev-mirror already has a @replaces parameter,
right?  That was supposed to work with quorum, but it seems obvious to
use it here, too.

Max

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-13 14:23 ` Max Reitz
@ 2018-04-16 14:59   ` Alberto Garcia
  2018-04-16 15:15     ` Max Reitz
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-04-16 14:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote:
> On 2018-04-12 19:07, Alberto Garcia wrote:
>> Hello,
>> 
>> I mentioned this some time ago, but I'd like to retake it now: I'm
>> checking how to copy arbitrary nodes on a backing chain, so if I have
>> e.g.
>> 
>>    [A] <- [B] <- [C] <- [D]
>> 
>> I'd like to end up with
>> 
>>    [A] <- [E] <- [C] <- [D]
>> 
>> where [E] is a copy of [B]. The most obvious use case is to move [B]
>> to a different storage backend.
>> 
>> At the moment we can already copy [B] into [E] (outside QEMU) and then
>> do
>> 
>>    change-backing-file device=[D] image-node-name=[C] backing-file=[E]
>> 
>> However this only changes the image on disk and the bs->backing_file
>> string in memory. QEMU keeps using the [B] BlockDriverState, and we
>> need to make it switch to [E].
>> 
>> One possible way to do this would be to modify blockdev-mirror so the
>> source image can be located anywhere on a chain. Currently there's
>> bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
>> refuses to take any non-root source node. Other than permission
>> changes I don't think the algorithm itself needs any additional
>> modification, although I suppose we'd like to change the backing file
>> as part of the same job, and that would require API changes.
>
> Another way would be to write blockdev-copy. :-)

Something like blockdev-backup but without copying data from backing
files? blockdev-mirror already allows configuring that using the
MirrorSyncMode parameter.

> I don't think there should be a limitation which nodes can be used as
> the source of blockdev-mirror, and I think the hardest part about
> lifting that limitation is about thinking what might break...  Maybe
> we should just drop the limitation and deal with the fallout later?  I
> can't imagine a single node configuration where we'd want to disallow
> copying off a node, so I suppose if that breaks in some edge case that
> wouldn't be something we'd have to disallow again but we'd just have
> to fix it.

I can't think of any problem either. The important question is what to
do if the source node is read/write, but that's what blockdev-mirror is
already about. In all other cases the source node is going to be read
only, isn't it?

>> One other way is to have a more generic replace-node command which
>> would call bdrv_replace_node(), but I don't know if we want to expose
>> that and I don't have any other use case for it at the moment.
>
> I think we do want to expose that.  As far as I'm informed, for graph
> manipulation we want a command that can replace nodes (including
> replacing nothing by a new node or replacing an existing node by
> nothing, if the parent supports that).

Was there any discussion about this in the mailing list? (proposed name
for this function, features, etc.)

> Also, well, there is blockdev-mirror already has a @replaces
> parameter, right?  That was supposed to work with quorum, but it seems
> obvious to use it here, too.

Yes, it could be used for this.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-16 14:59   ` Alberto Garcia
@ 2018-04-16 15:15     ` Max Reitz
  2018-04-18 15:34       ` Alberto Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Max Reitz @ 2018-04-16 15:15 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 4920 bytes --]

On 2018-04-16 16:59, Alberto Garcia wrote:
> On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote:
>> On 2018-04-12 19:07, Alberto Garcia wrote:
>>> Hello,
>>>
>>> I mentioned this some time ago, but I'd like to retake it now: I'm
>>> checking how to copy arbitrary nodes on a backing chain, so if I have
>>> e.g.
>>>
>>>    [A] <- [B] <- [C] <- [D]
>>>
>>> I'd like to end up with
>>>
>>>    [A] <- [E] <- [C] <- [D]
>>>
>>> where [E] is a copy of [B]. The most obvious use case is to move [B]
>>> to a different storage backend.
>>>
>>> At the moment we can already copy [B] into [E] (outside QEMU) and then
>>> do
>>>
>>>    change-backing-file device=[D] image-node-name=[C] backing-file=[E]
>>>
>>> However this only changes the image on disk and the bs->backing_file
>>> string in memory. QEMU keeps using the [B] BlockDriverState, and we
>>> need to make it switch to [E].
>>>
>>> One possible way to do this would be to modify blockdev-mirror so the
>>> source image can be located anywhere on a chain. Currently there's
>>> bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
>>> refuses to take any non-root source node. Other than permission
>>> changes I don't think the algorithm itself needs any additional
>>> modification, although I suppose we'd like to change the backing file
>>> as part of the same job, and that would require API changes.
>>
>> Another way would be to write blockdev-copy. :-)
> 
> Something like blockdev-backup but without copying data from backing
> files? blockdev-mirror already allows configuring that using the
> MirrorSyncMode parameter.

No, a block job that unites blockdev-backup, blockdev-mirror, and
block-commit.

Active commit already is just mirror, non-active commit probably can
easily be implemented with mirror (we just need an auto-complete), and
it should be possible to add a backup mode to mirror (like active
mirroring).  So we'd just have a single blockdev-copy job that can
simply... Copy data from any node to any other.

I think we wanted to exclude streaming, because that works differently.
But, well, streaming would be just installing a COR filter node and
doing a blockdev-copy to null-co://, right? :-)

>> I don't think there should be a limitation which nodes can be used as
>> the source of blockdev-mirror, and I think the hardest part about
>> lifting that limitation is about thinking what might break...  Maybe
>> we should just drop the limitation and deal with the fallout later?  I
>> can't imagine a single node configuration where we'd want to disallow
>> copying off a node, so I suppose if that breaks in some edge case that
>> wouldn't be something we'd have to disallow again but we'd just have
>> to fix it.
> 
> I can't think of any problem either. The important question is what to
> do if the source node is read/write, but that's what blockdev-mirror is
> already about. In all other cases the source node is going to be read
> only, isn't it?

I guess so.

I just now remembered a bitmap issue, but that probably shouldn't stop
you.  There is an issue with bitmaps on nodes with shared-writing children.

Imagine you have some qcow2 node or whatever, and you have a guest
running on it.  Then you attach a filter to it (which allows shared
writing) and put a mirror job on top of the filter.  Now whenever the
guest writes to the qcow2 node, the bitmap the mirror job installed on
the filter won't reflect that.  So that's an issue, but I think that
case isn't forbidden by the current permission system, so we're already
screwed there.

So on the contrary, I'd suspect that running mirror inside of some
backing chain or whatever may be less problematic than running it on
top, actually...

>>> One other way is to have a more generic replace-node command which
>>> would call bdrv_replace_node(), but I don't know if we want to expose
>>> that and I don't have any other use case for it at the moment.
>>
>> I think we do want to expose that.  As far as I'm informed, for graph
>> manipulation we want a command that can replace nodes (including
>> replacing nothing by a new node or replacing an existing node by
>> nothing, if the parent supports that).
> 
> Was there any discussion about this in the mailing list? (proposed name
> for this function, features, etc.)

Well, there is x-blockdev-change.  But we probably want to expose
bdrv_reopen() in the long run.

http://lists.nongnu.org/archive/html/qemu-block/2017-12/msg00522.html is
what I can offer you.  That also has some info on blockdev-copy and the
bitmap issue ("omniscient bitmaps" is the keyword).

Max

>> Also, well, there is blockdev-mirror already has a @replaces
>> parameter, right?  That was supposed to work with quorum, but it seems
>> obvious to use it here, too.
> 
> Yes, it could be used for this.
> 
> Berto
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-16 15:15     ` Max Reitz
@ 2018-04-18 15:34       ` Alberto Garcia
  2018-04-20 13:13         ` Max Reitz
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-04-18 15:34 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Mon 16 Apr 2018 05:15:21 PM CEST, Max Reitz wrote:
> On 2018-04-16 16:59, Alberto Garcia wrote:
>> On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote:
>>> On 2018-04-12 19:07, Alberto Garcia wrote:
>>>> Hello,
>>>>
>>>> I mentioned this some time ago, but I'd like to retake it now: I'm
>>>> checking how to copy arbitrary nodes on a backing chain, so if I have
>>>> e.g.
>>>>
>>>>    [A] <- [B] <- [C] <- [D]
>>>>
>>>> I'd like to end up with
>>>>
>>>>    [A] <- [E] <- [C] <- [D]
>>>>
>>>> where [E] is a copy of [B]. The most obvious use case is to move [B]
>>>> to a different storage backend.
>>>>
>>>> At the moment we can already copy [B] into [E] (outside QEMU) and then
>>>> do
>>>>
>>>>    change-backing-file device=[D] image-node-name=[C] backing-file=[E]
>>>>
>>>> However this only changes the image on disk and the bs->backing_file
>>>> string in memory. QEMU keeps using the [B] BlockDriverState, and we
>>>> need to make it switch to [E].
>>>>
>>>> One possible way to do this would be to modify blockdev-mirror so the
>>>> source image can be located anywhere on a chain. Currently there's
>>>> bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
>>>> refuses to take any non-root source node. Other than permission
>>>> changes I don't think the algorithm itself needs any additional
>>>> modification, although I suppose we'd like to change the backing file
>>>> as part of the same job, and that would require API changes.
>>>
>>> Another way would be to write blockdev-copy. :-)
>> 
>> Something like blockdev-backup but without copying data from backing
>> files? blockdev-mirror already allows configuring that using the
>> MirrorSyncMode parameter.
>
> No, a block job that unites blockdev-backup, blockdev-mirror, and
> block-commit.

Yeah, I was reading again the notes from December. This would then unify
all those existing commands.

Is there any other plan or roadmap for this command, or was just an idea
that was discussed as nice to have? Also, does that mean that the old
commands should not be changed now? I'm asking because there's a big
difference between updating blockdev-mirror to allow non-root source
nodes and creating this new command that would support all features of
the old ones :-)

> I think we wanted to exclude streaming, because that works
> differently.  But, well, streaming would be just installing a COR
> filter node and doing a blockdev-copy to null-co://, right? :-)

Yeah I suppose, but I don't think you can make the filter step part of
the blockdev-copy API.

> I just now remembered a bitmap issue, but that probably shouldn't stop
> you.  There is an issue with bitmaps on nodes with shared-writing
> children.
>
> Imagine you have some qcow2 node or whatever, and you have a guest
> running on it.  Then you attach a filter to it (which allows shared
> writing) and put a mirror job on top of the filter.  Now whenever the
> guest writes to the qcow2 node, the bitmap the mirror job installed on
> the filter won't reflect that.  So that's an issue, but I think that
> case isn't forbidden by the current permission system, so we're
> already screwed there.
>
> So on the contrary, I'd suspect that running mirror inside of some
> backing chain or whatever may be less problematic than running it on
> top, actually...

Yeah I don't think there's a problem there.

>>>> One other way is to have a more generic replace-node command which
>>>> would call bdrv_replace_node(), but I don't know if we want to
>>>> expose that and I don't have any other use case for it at the
>>>> moment.
>>>
>>> I think we do want to expose that.  As far as I'm informed, for
>>> graph manipulation we want a command that can replace nodes
>>> (including replacing nothing by a new node or replacing an existing
>>> node by nothing, if the parent supports that).
>> 
>> Was there any discussion about this in the mailing list? (proposed
>> name for this function, features, etc.)
>
> Well, there is x-blockdev-change.  But we probably want to expose
> bdrv_reopen() in the long run.

Exposing bdrv_reopen() itself shouldn't be too much work (it takes just
two node names, or am I missing something?).

There's still the question of how to update the backing file string (on
the overlay). stream and commit do it automatically, but if we do
bdrv_reopen() or the aforementioned modification to blockdev-mirror we'd
need to do it explicitly with change-backing-file, or extend the API to
allow for that.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-18 15:34       ` Alberto Garcia
@ 2018-04-20 13:13         ` Max Reitz
  2018-04-25 12:58           ` Alberto Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Max Reitz @ 2018-04-20 13:13 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 5432 bytes --]

On 2018-04-18 17:34, Alberto Garcia wrote:
> On Mon 16 Apr 2018 05:15:21 PM CEST, Max Reitz wrote:
>> On 2018-04-16 16:59, Alberto Garcia wrote:
>>> On Fri 13 Apr 2018 04:23:07 PM CEST, Max Reitz wrote:
>>>> On 2018-04-12 19:07, Alberto Garcia wrote:
>>>>> Hello,
>>>>>
>>>>> I mentioned this some time ago, but I'd like to retake it now: I'm
>>>>> checking how to copy arbitrary nodes on a backing chain, so if I have
>>>>> e.g.
>>>>>
>>>>>    [A] <- [B] <- [C] <- [D]
>>>>>
>>>>> I'd like to end up with
>>>>>
>>>>>    [A] <- [E] <- [C] <- [D]
>>>>>
>>>>> where [E] is a copy of [B]. The most obvious use case is to move [B]
>>>>> to a different storage backend.
>>>>>
>>>>> At the moment we can already copy [B] into [E] (outside QEMU) and then
>>>>> do
>>>>>
>>>>>    change-backing-file device=[D] image-node-name=[C] backing-file=[E]
>>>>>
>>>>> However this only changes the image on disk and the bs->backing_file
>>>>> string in memory. QEMU keeps using the [B] BlockDriverState, and we
>>>>> need to make it switch to [E].
>>>>>
>>>>> One possible way to do this would be to modify blockdev-mirror so the
>>>>> source image can be located anywhere on a chain. Currently there's
>>>>> bs->backing_blocker preventing that, plus qmp_blockdev_mirror()
>>>>> refuses to take any non-root source node. Other than permission
>>>>> changes I don't think the algorithm itself needs any additional
>>>>> modification, although I suppose we'd like to change the backing file
>>>>> as part of the same job, and that would require API changes.
>>>>
>>>> Another way would be to write blockdev-copy. :-)
>>>
>>> Something like blockdev-backup but without copying data from backing
>>> files? blockdev-mirror already allows configuring that using the
>>> MirrorSyncMode parameter.
>>
>> No, a block job that unites blockdev-backup, blockdev-mirror, and
>> block-commit.
> 
> Yeah, I was reading again the notes from December. This would then unify
> all those existing commands.
> 
> Is there any other plan or roadmap for this command, or was just an idea
> that was discussed as nice to have?

Mostly the latter, I think.  Well, there definitely isn't a roadmap and
there is no plan apart from what's in the December notes.

>                                     Also, does that mean that the old
> commands should not be changed now? I'm asking because there's a big
> difference between updating blockdev-mirror to allow non-root source
> nodes and creating this new command that would support all features of
> the old ones :-)

No, changing old commands is OK.  We just should think about how to
incorporate those changes into blockdev-copy (so don't introduce some
way of doing the same thing with two different interfaces to both mirror
and backup).

Especially adding things to mirror should be OK, because I suppose
mirror is going to be our blockdev-copy (and we'll just extend it to
cover backup and non-active commit).

Of course, things usually don't get done because they're "nice to have"
but because someone actually needs them.  So that's the reason why I
mentioned blockdev-copy. O:-)

>> I think we wanted to exclude streaming, because that works
>> differently.  But, well, streaming would be just installing a COR
>> filter node and doing a blockdev-copy to null-co://, right? :-)
> 
> Yeah I suppose, but I don't think you can make the filter step part of
> the blockdev-copy API.

No, definitely not.

Or, well, actually...  If you could start block jobs by installing their
filter nodes via blockdev-add, then maybe you could in a sense, but
let's not get into that rabbit hole now. O:-)

[...]

>>>>> One other way is to have a more generic replace-node command which
>>>>> would call bdrv_replace_node(), but I don't know if we want to
>>>>> expose that and I don't have any other use case for it at the
>>>>> moment.
>>>>
>>>> I think we do want to expose that.  As far as I'm informed, for
>>>> graph manipulation we want a command that can replace nodes
>>>> (including replacing nothing by a new node or replacing an existing
>>>> node by nothing, if the parent supports that).
>>>
>>> Was there any discussion about this in the mailing list? (proposed
>>> name for this function, features, etc.)
>>
>> Well, there is x-blockdev-change.  But we probably want to expose
>> bdrv_reopen() in the long run.
> 
> Exposing bdrv_reopen() itself shouldn't be too much work (it takes just
> two node names, or am I missing something?).

I suppose Kevin knows more about this.  I guess there may be things in
the block layer that cannot cope with an arbitrary reopen from the user,
but that's the only thing that would come to my mind.

So from my perspective...  If you think it's easy, why don't you try it
and then we'll see? *cough*

> There's still the question of how to update the backing file string (on
> the overlay). stream and commit do it automatically, but if we do
> bdrv_reopen() or the aforementioned modification to blockdev-mirror we'd
> need to do it explicitly with change-backing-file, or extend the API to
> allow for that.

Well, the December notes do say that we probably want an option to
specify the target's filename which is what will be written into the
overlays of @to_replace as their backing file string, so I think
extending the API should be fine.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-20 13:13         ` Max Reitz
@ 2018-04-25 12:58           ` Alberto Garcia
  2018-04-25 13:06             ` Max Reitz
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-04-25 12:58 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Fri 20 Apr 2018 03:13:49 PM CEST, Max Reitz wrote:
>>>>>> One other way is to have a more generic replace-node command
>>>>>> which would call bdrv_replace_node(), but I don't know if we want
>>>>>> to expose that and I don't have any other use case for it at the
>>>>>> moment.
>>>>>
>>>>> I think we do want to expose that.  As far as I'm informed, for
>>>>> graph manipulation we want a command that can replace nodes
>>>>> (including replacing nothing by a new node or replacing an
>>>>> existing node by nothing, if the parent supports that).
>>>>
>>>> Was there any discussion about this in the mailing list? (proposed
>>>> name for this function, features, etc.)
>>>
>>> Well, there is x-blockdev-change.  But we probably want to expose
>>> bdrv_reopen() in the long run.
>> 
>> Exposing bdrv_reopen() itself shouldn't be too much work (it takes
>> just two node names, or am I missing something?).

I just realized that I meant "Exposing bdrv_replace_node()" here.

> So from my perspective...  If you think it's easy, why don't you try
> it and then we'll see? *cough*

I'm doing it :-)

>> There's still the question of how to update the backing file string
>> (on the overlay). stream and commit do it automatically, but if we do
>> bdrv_reopen() or the aforementioned modification to blockdev-mirror

bdrv_replace_node() again

>> we'd need to do it explicitly with change-backing-file, or extend the
>> API to allow for that.
>
> Well, the December notes do say that we probably want an option to
> specify the target's filename which is what will be written into the
> overlays of @to_replace as their backing file string, so I think
> extending the API should be fine.

Yes, but how do you find out what the overlays are (without an
additional parameter, that is)? I thought we didn't want to support
walking the backing chain in reverse direction.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-25 12:58           ` Alberto Garcia
@ 2018-04-25 13:06             ` Max Reitz
  2018-04-25 13:42               ` Alberto Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Max Reitz @ 2018-04-25 13:06 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

On 2018-04-25 14:58, Alberto Garcia wrote:
> On Fri 20 Apr 2018 03:13:49 PM CEST, Max Reitz wrote:
>>>>>>> One other way is to have a more generic replace-node command
>>>>>>> which would call bdrv_replace_node(), but I don't know if we want
>>>>>>> to expose that and I don't have any other use case for it at the
>>>>>>> moment.
>>>>>>
>>>>>> I think we do want to expose that.  As far as I'm informed, for
>>>>>> graph manipulation we want a command that can replace nodes
>>>>>> (including replacing nothing by a new node or replacing an
>>>>>> existing node by nothing, if the parent supports that).
>>>>>
>>>>> Was there any discussion about this in the mailing list? (proposed
>>>>> name for this function, features, etc.)
>>>>
>>>> Well, there is x-blockdev-change.  But we probably want to expose
>>>> bdrv_reopen() in the long run.
>>>
>>> Exposing bdrv_reopen() itself shouldn't be too much work (it takes
>>> just two node names, or am I missing something?).
> 
> I just realized that I meant "Exposing bdrv_replace_node()" here.
> 
>> So from my perspective...  If you think it's easy, why don't you try
>> it and then we'll see? *cough*
> 
> I'm doing it :-)
> 
>>> There's still the question of how to update the backing file string
>>> (on the overlay). stream and commit do it automatically, but if we do
>>> bdrv_reopen() or the aforementioned modification to blockdev-mirror
> 
> bdrv_replace_node() again

But the question stands whether we need simple node replacement when we
want bdrv_reopen() anyway.  In addition, we don't need just replacement,
we also need addition and removal (e.g. for backing files or quorum
children) -- and especially in the case of quorum, that is going to be a
pain (mostly naming the children).

With bdrv_reopen(), we can just require the user to respecify all
children, so we don't run into the issue of how to name things at least.

(Other things we we'd want bdrv_reopen() are listed in the meeting notes.)

>>> we'd need to do it explicitly with change-backing-file, or extend the
>>> API to allow for that.
>>
>> Well, the December notes do say that we probably want an option to
>> specify the target's filename which is what will be written into the
>> overlays of @to_replace as their backing file string, so I think
>> extending the API should be fine.
> 
> Yes, but how do you find out what the overlays are (without an
> additional parameter, that is)? I thought we didn't want to support
> walking the backing chain in reverse direction.

We already have an .update_filename() BdrvChildRole callback, though.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-25 13:06             ` Max Reitz
@ 2018-04-25 13:42               ` Alberto Garcia
  2018-04-25 14:03                 ` Max Reitz
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-04-25 13:42 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 25 Apr 2018 03:06:34 PM CEST, Max Reitz wrote:
>>>>>>>> One other way is to have a more generic replace-node command
>>>>>>>> which would call bdrv_replace_node(), but I don't know if we
>>>>>>>> want to expose that and I don't have any other use case for it
>>>>>>>> at the moment.
>>>>>>>
>>>>>>> I think we do want to expose that.  As far as I'm informed, for
>>>>>>> graph manipulation we want a command that can replace nodes
>>>>>>> (including replacing nothing by a new node or replacing an
>>>>>>> existing node by nothing, if the parent supports that).
>>>>>>
>>>>>> Was there any discussion about this in the mailing list?
>>>>>> (proposed name for this function, features, etc.)
>>>>>
>>>>> Well, there is x-blockdev-change.  But we probably want to expose
>>>>> bdrv_reopen() in the long run.
>>>>
>>>> Exposing bdrv_reopen() itself shouldn't be too much work (it takes
>>>> just two node names, or am I missing something?).
>> 
>> I just realized that I meant "Exposing bdrv_replace_node()" here.
>> 
>>> So from my perspective...  If you think it's easy, why don't you try
>>> it and then we'll see? *cough*
>> 
>> I'm doing it :-)
>> 
>>>> There's still the question of how to update the backing file string
>>>> (on the overlay). stream and commit do it automatically, but if we do
>>>> bdrv_reopen() or the aforementioned modification to blockdev-mirror
>> 
>> bdrv_replace_node() again
>
> But the question stands whether we need simple node replacement when
> we want bdrv_reopen() anyway.  In addition, we don't need just
> replacement, we also need addition and removal (e.g. for backing files
> or quorum children) -- and especially in the case of quorum, that is
> going to be a pain (mostly naming the children).
>
> With bdrv_reopen(), we can just require the user to respecify all
> children, so we don't run into the issue of how to name things at
> least.

So in this example, if you want to replace [B] with [E] you would reopen
[C] specifying the new backing file?

   [A] <- [B] <- [C] <- [D]

          [E]

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-25 13:42               ` Alberto Garcia
@ 2018-04-25 14:03                 ` Max Reitz
  2018-05-02 13:07                   ` Alberto Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Max Reitz @ 2018-04-25 14:03 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2269 bytes --]

On 2018-04-25 15:42, Alberto Garcia wrote:
> On Wed 25 Apr 2018 03:06:34 PM CEST, Max Reitz wrote:
>>>>>>>>> One other way is to have a more generic replace-node command
>>>>>>>>> which would call bdrv_replace_node(), but I don't know if we
>>>>>>>>> want to expose that and I don't have any other use case for it
>>>>>>>>> at the moment.
>>>>>>>>
>>>>>>>> I think we do want to expose that.  As far as I'm informed, for
>>>>>>>> graph manipulation we want a command that can replace nodes
>>>>>>>> (including replacing nothing by a new node or replacing an
>>>>>>>> existing node by nothing, if the parent supports that).
>>>>>>>
>>>>>>> Was there any discussion about this in the mailing list?
>>>>>>> (proposed name for this function, features, etc.)
>>>>>>
>>>>>> Well, there is x-blockdev-change.  But we probably want to expose
>>>>>> bdrv_reopen() in the long run.
>>>>>
>>>>> Exposing bdrv_reopen() itself shouldn't be too much work (it takes
>>>>> just two node names, or am I missing something?).
>>>
>>> I just realized that I meant "Exposing bdrv_replace_node()" here.
>>>
>>>> So from my perspective...  If you think it's easy, why don't you try
>>>> it and then we'll see? *cough*
>>>
>>> I'm doing it :-)
>>>
>>>>> There's still the question of how to update the backing file string
>>>>> (on the overlay). stream and commit do it automatically, but if we do
>>>>> bdrv_reopen() or the aforementioned modification to blockdev-mirror
>>>
>>> bdrv_replace_node() again
>>
>> But the question stands whether we need simple node replacement when
>> we want bdrv_reopen() anyway.  In addition, we don't need just
>> replacement, we also need addition and removal (e.g. for backing files
>> or quorum children) -- and especially in the case of quorum, that is
>> going to be a pain (mostly naming the children).
>>
>> With bdrv_reopen(), we can just require the user to respecify all
>> children, so we don't run into the issue of how to name things at
>> least.
> 
> So in this example, if you want to replace [B] with [E] you would reopen
> [C] specifying the new backing file?
> 
>    [A] <- [B] <- [C] <- [D]
> 
>           [E]

You can just use child node name references like in blockdev-add.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-04-25 14:03                 ` Max Reitz
@ 2018-05-02 13:07                   ` Alberto Garcia
  2018-05-02 14:12                     ` Max Reitz
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-05-02 13:07 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 25 Apr 2018 04:03:22 PM CEST, Max Reitz wrote:
>>> But the question stands whether we need simple node replacement when
>>> we want bdrv_reopen() anyway.  In addition, we don't need just
>>> replacement, we also need addition and removal (e.g. for backing
>>> files or quorum children) -- and especially in the case of quorum,
>>> that is going to be a pain (mostly naming the children).
>>>
>>> With bdrv_reopen(), we can just require the user to respecify all
>>> children, so we don't run into the issue of how to name things at
>>> least.
>> 
>> So in this example, if you want to replace [B] with [E] you would
>> reopen [C] specifying the new backing file?
>> 
>>    [A] <- [B] <- [C] <- [D]
>> 
>>           [E]
>
> You can just use child node name references like in blockdev-add.

Were the (more or less) exact requirements of QMP blockdev-reopen
discussed? How is it different from qemu-io's "reopen" command? What are
the options that you can and can not change?

Also, you could replace an element in the backing chain using reopen,
but can you replace a node that is not a backing image of another one?

With QMP blockdev-replace you could use the existing blockdev-add
command and create an arbitrary tree and then simply do blockdev-replace
so I suppose that for some use cases at least it would be pretty
straightforward.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-05-02 13:07                   ` Alberto Garcia
@ 2018-05-02 14:12                     ` Max Reitz
  2018-05-03 10:32                       ` Alberto Garcia
  2018-05-03 12:22                       ` Kevin Wolf
  0 siblings, 2 replies; 21+ messages in thread
From: Max Reitz @ 2018-05-02 14:12 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2937 bytes --]

On 2018-05-02 15:07, Alberto Garcia wrote:
> On Wed 25 Apr 2018 04:03:22 PM CEST, Max Reitz wrote:
>>>> But the question stands whether we need simple node replacement when
>>>> we want bdrv_reopen() anyway.  In addition, we don't need just
>>>> replacement, we also need addition and removal (e.g. for backing
>>>> files or quorum children) -- and especially in the case of quorum,
>>>> that is going to be a pain (mostly naming the children).
>>>>
>>>> With bdrv_reopen(), we can just require the user to respecify all
>>>> children, so we don't run into the issue of how to name things at
>>>> least.
>>>
>>> So in this example, if you want to replace [B] with [E] you would
>>> reopen [C] specifying the new backing file?
>>>
>>>    [A] <- [B] <- [C] <- [D]
>>>
>>>           [E]
>>
>> You can just use child node name references like in blockdev-add.
> 
> Were the (more or less) exact requirements of QMP blockdev-reopen
> discussed? How is it different from qemu-io's "reopen" command? What are
> the options that you can and can not change?

I can't quite remember, I'm afraid.  I think it was supposed to be
pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
cannot change the driver (obviously) or probably the node name, because
either would result in the node being replaced by a completely new one.

Other than that, it probably depends on what the block driver supports,
but ideally you should be able to change everything.

> Also, you could replace an element in the backing chain using reopen,
> but can you replace a node that is not a backing image of another one?

Node replacement only makes sense if the node is used as the child of
something.  Otherwise it's just a blockdev-add + blockdev-del.

But it doesn't have to be in a backing chain, of course.  It would work
for Quorum children, too.

> With QMP blockdev-replace you could use the existing blockdev-add
> command and create an arbitrary tree and then simply do blockdev-replace
> so I suppose that for some use cases at least it would be pretty
> straightforward.

You can do the same with reopen.  Just use a reference for the child name.

Say you have a qcow2 node "overlay" with a backing child.  Now you want
to replace that with a different child "new-backing".

For both blockdev-replace and reopen the first thing you can do is
blockdev-add the new child.  Then it depends:

blockdev-replace: You call it with parent=overlay, child=backing,
node=new-backing.

reopen: You call it like blockdev-add, that is:
    { "node-name": "overlay",
      "backing": "new-backing" }
In theory every option you do not specify is unchanged, so I suppose you
can leave out e.g. the "driver" option here.

Sure, with reopen you can also respecify the whole tree if you so desire
(so you can get away without blockdev-add'ing new-backing before), but
you don't have to.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-05-02 14:12                     ` Max Reitz
@ 2018-05-03 10:32                       ` Alberto Garcia
  2018-05-03 12:22                       ` Kevin Wolf
  1 sibling, 0 replies; 21+ messages in thread
From: Alberto Garcia @ 2018-05-03 10:32 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, Kevin Wolf, Eric Blake, Stefan Hajnoczi

On Wed 02 May 2018 04:12:49 PM CEST, Max Reitz wrote:
> On 2018-05-02 15:07, Alberto Garcia wrote:
>> On Wed 25 Apr 2018 04:03:22 PM CEST, Max Reitz wrote:
>>>>> But the question stands whether we need simple node replacement when
>>>>> we want bdrv_reopen() anyway.  In addition, we don't need just
>>>>> replacement, we also need addition and removal (e.g. for backing
>>>>> files or quorum children) -- and especially in the case of quorum,
>>>>> that is going to be a pain (mostly naming the children).
>>>>>
>>>>> With bdrv_reopen(), we can just require the user to respecify all
>>>>> children, so we don't run into the issue of how to name things at
>>>>> least.
>>>>
>>>> So in this example, if you want to replace [B] with [E] you would
>>>> reopen [C] specifying the new backing file?
>>>>
>>>>    [A] <- [B] <- [C] <- [D]
>>>>
>>>>           [E]
>>>
>>> You can just use child node name references like in blockdev-add.
>> 
>> Were the (more or less) exact requirements of QMP blockdev-reopen
>> discussed? How is it different from qemu-io's "reopen" command? What are
>> the options that you can and can not change?
>
> I can't quite remember, I'm afraid.  I think it was supposed to be
> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
> cannot change the driver (obviously) or probably the node name,
> because either would result in the node being replaced by a completely
> new one.

I was testing qemu-io's reopen and it didn't seem like you could change
the backing chain at all, so this would probably need to do more things.

> Other than that, it probably depends on what the block driver
> supports, but ideally you should be able to change everything.

I agree.

>> Also, you could replace an element in the backing chain using reopen,
>> but can you replace a node that is not a backing image of another
>> one?
>
> Node replacement only makes sense if the node is used as the child of
> something.  Otherwise it's just a blockdev-add + blockdev-del.

But how do you do blockdev-del of a root node if it's being used by the
guest?

> But it doesn't have to be in a backing chain, of course.  It would
> work for Quorum children, too.

Sure, I meant "child" when I said "backing image" :-)

>> With QMP blockdev-replace you could use the existing blockdev-add
>> command and create an arbitrary tree and then simply do
>> blockdev-replace so I suppose that for some use cases at least it
>> would be pretty straightforward.
>
> You can do the same with reopen.  Just use a reference for the child
> name.

Yes, yes, I agree. I mean that the implementation with blockdev-replace
seems much simpler.

Although there are cases where blockdev-replace would probably not work,
e.g. if you want to change the options of an image opened in read-write
mode. In this case you cannot open the image again with the new options
and replace the old with the new one because of the image locks. So you
would need to reopen the image instead.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-05-02 14:12                     ` Max Reitz
  2018-05-03 10:32                       ` Alberto Garcia
@ 2018-05-03 12:22                       ` Kevin Wolf
  2018-05-03 12:33                         ` Alberto Garcia
                                           ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Kevin Wolf @ 2018-05-03 12:22 UTC (permalink / raw)
  To: Max Reitz
  Cc: Alberto Garcia, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2130 bytes --]

Am 02.05.2018 um 16:12 hat Max Reitz geschrieben:
> On 2018-05-02 15:07, Alberto Garcia wrote:
> > Were the (more or less) exact requirements of QMP blockdev-reopen
> > discussed? How is it different from qemu-io's "reopen" command? What are
> > the options that you can and can not change?
> 
> I can't quite remember, I'm afraid.  I think it was supposed to be
> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
> cannot change the driver (obviously) or probably the node name, because
> either would result in the node being replaced by a completely new one.
> 
> Other than that, it probably depends on what the block driver supports,
> but ideally you should be able to change everything.

Honestly the design of bdrv_reopen() is quite broken because of the way
it tries to maintain old options if they aren't specified, and guesses
what you might mean when you add flags to the mix. The exact semantics
are quite complicated and I'd rather avoid them in a stable API.

A clean QMP command would probably apply the same defaults as
blockdev-add, so you just get to specify the full options again.

> reopen: You call it like blockdev-add, that is:
>     { "node-name": "overlay",
>       "backing": "new-backing" }
> In theory every option you do not specify is unchanged, so I suppose you
> can leave out e.g. the "driver" option here.

That already doesn't work because "driver" is the discriminator for the
QAPI union, so it must be mandatory. I'd go further and reuse
BlockdevOptions, which makes everything mandatory that is mandatory when
creating the image.

blockdev-reopen would basically mean "replace the whole set of options
of this node".

> Sure, with reopen you can also respecify the whole tree if you so desire
> (so you can get away without blockdev-add'ing new-backing before), but
> you don't have to.

It might be reasonable to forbid inline declarations for BlockdevRef in
blockdev-reopen, so that always have to specify node names instead. On
the other hand, maybe support for inline declarations falls out almost
automatically.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-05-03 12:22                       ` Kevin Wolf
@ 2018-05-03 12:33                         ` Alberto Garcia
  2018-05-09 14:22                         ` Alberto Garcia
  2018-06-01 10:51                         ` Alberto Garcia
  2 siblings, 0 replies; 21+ messages in thread
From: Alberto Garcia @ 2018-05-03 12:33 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz; +Cc: qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:
> Am 02.05.2018 um 16:12 hat Max Reitz geschrieben:
>> On 2018-05-02 15:07, Alberto Garcia wrote:
>> > Were the (more or less) exact requirements of QMP blockdev-reopen
>> > discussed? How is it different from qemu-io's "reopen" command? What are
>> > the options that you can and can not change?
>> 
>> I can't quite remember, I'm afraid.  I think it was supposed to be
>> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
>> cannot change the driver (obviously) or probably the node name, because
>> either would result in the node being replaced by a completely new one.
>> 
>> Other than that, it probably depends on what the block driver supports,
>> but ideally you should be able to change everything.
>
> Honestly the design of bdrv_reopen() is quite broken because of the way
> it tries to maintain old options if they aren't specified, and guesses
> what you might mean when you add flags to the mix. The exact semantics
> are quite complicated and I'd rather avoid them in a stable API.
>
> A clean QMP command would probably apply the same defaults as
> blockdev-add, so you just get to specify the full options again.

Ok, so the end result would be more or less equivalent to "open the new
block device with blockdev-add and the user-specified options, then
replace the old one with the new one", but implemented with reopen
instead of open + replace.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-05-03 12:22                       ` Kevin Wolf
  2018-05-03 12:33                         ` Alberto Garcia
@ 2018-05-09 14:22                         ` Alberto Garcia
  2018-06-01 10:51                         ` Alberto Garcia
  2 siblings, 0 replies; 21+ messages in thread
From: Alberto Garcia @ 2018-05-09 14:22 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz; +Cc: qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:

> A clean QMP command would probably apply the same defaults as
> blockdev-add, so you just get to specify the full options again.
>
>> reopen: You call it like blockdev-add, that is:
>>     { "node-name": "overlay",
>>       "backing": "new-backing" }
>> In theory every option you do not specify is unchanged, so I suppose
>> you can leave out e.g. the "driver" option here.
>
> That already doesn't work because "driver" is the discriminator for
> the QAPI union, so it must be mandatory. I'd go further and reuse
> BlockdevOptions, which makes everything mandatory that is mandatory
> when creating the image.
>
> blockdev-reopen would basically mean "replace the whole set of options
> of this node".

There's at least one case which would need special consideration. Let's
say we want to reopen an image ("top") that has a backing file ("base").

When specifying the options there are at least these alternatives for
the backing image:

1) "backing" is a dict with the options of the backing image.

   { ..., "backing": { "driver": "qcow2", "node-name": "base", ... }

   In this case I suppose the backing image would be reopened with the
   new options. Any attempt to change the node name or the file name
   should fail. The graph itself doesn't change, only the options of
   each node.

2) "backing" is reference to another node.

   { ..., "backing": "base2" }

   In this case this should replace "base" with "base2" (which must have
   been opened already).

3) "backing" is null

   { ..., "backing": null }

   This is similar to (2). "base" would be removed from the backing
   chain and "top" would no longer have a backing image. 

4) "backing" is omitted altogether

   In the case of 'blockdev-add' this means that the default backing
   image (specified in the header) is opened with the default
   options. What would it mean for 'blockdev-reopen' ? We keep the
   backing image with the current options (e.g we remove it from the
   BlockReopenQueue)?

There are similar considerations with other children. In the case of
"file" the child is mandatory and non-null, so options (3) and (4) are
not possible.

In the case of Quorum (where there are no default children) we should
keep the children that are specified and remove all others.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-05-03 12:22                       ` Kevin Wolf
  2018-05-03 12:33                         ` Alberto Garcia
  2018-05-09 14:22                         ` Alberto Garcia
@ 2018-06-01 10:51                         ` Alberto Garcia
  2018-06-11 12:20                           ` Kevin Wolf
  2 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2018-06-01 10:51 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz; +Cc: qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:
>> > Were the (more or less) exact requirements of QMP blockdev-reopen
>> > discussed? How is it different from qemu-io's "reopen" command?
>> > What are the options that you can and can not change?
>> 
>> I can't quite remember, I'm afraid.  I think it was supposed to be
>> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
>> cannot change the driver (obviously) or probably the node name, because
>> either would result in the node being replaced by a completely new one.
>> 
>> Other than that, it probably depends on what the block driver supports,
>> but ideally you should be able to change everything.
>
> Honestly the design of bdrv_reopen() is quite broken because of the
> way it tries to maintain old options if they aren't specified, and
> guesses what you might mean when you add flags to the mix. The exact
> semantics are quite complicated and I'd rather avoid them in a stable
> API.
>
> A clean QMP command would probably apply the same defaults as
> blockdev-add, so you just get to specify the full options again.

I have a prototype of this working and almost ready to be published, but
there's a tricky thing with this part:

If we want blockdev-reopen to apply the defaults for all options except
from the ones expliclity specified by the user, then it means that we
need to check not just the options that are present, but also the ones
that are omitted.

For example:

   { "execute": "blockdev-add",
     "arguments": { "driver": "null-aio",
                    "node-name": "root",
                    "size": 1024 }

This adds a null-aio block device with the "size" option set to 1024
(the default is 1 << 30).

null_reopen_prepare() allows reopening that block device, but it does
not allow changing any of its options. Attempting to change the value of
"size" is detected by the loop that checks unhandled options at the end
of bdrv_reopen_prepare() and returns "Cannot change the option 'size'".

So far, so good. We have this generic check for all options that works
with all drivers, so as long as we only specify options that we know
that can be changed, everything is fine.

However if we want blockdev-reopen to apply the default values for all
omitted options, then omitting "size" would be equivalent to setting it
to its default value (1 << 30). And if "size" cannot be changed then
QEMU should complain unless we explicitly set "size" to 1024 again on
reopen.

This complicates things a bit, because we would go from "the options
that can't be changed are the ones that are not handled by each driver's
_prepare() function" to "options that are absent can also produce an
error".

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-06-01 10:51                         ` Alberto Garcia
@ 2018-06-11 12:20                           ` Kevin Wolf
  2018-06-11 12:23                             ` Alberto Garcia
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2018-06-11 12:20 UTC (permalink / raw)
  To: Alberto Garcia
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

Am 01.06.2018 um 12:51 hat Alberto Garcia geschrieben:
> On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:
> >> > Were the (more or less) exact requirements of QMP blockdev-reopen
> >> > discussed? How is it different from qemu-io's "reopen" command?
> >> > What are the options that you can and can not change?
> >> 
> >> I can't quite remember, I'm afraid.  I think it was supposed to be
> >> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
> >> cannot change the driver (obviously) or probably the node name, because
> >> either would result in the node being replaced by a completely new one.
> >> 
> >> Other than that, it probably depends on what the block driver supports,
> >> but ideally you should be able to change everything.
> >
> > Honestly the design of bdrv_reopen() is quite broken because of the
> > way it tries to maintain old options if they aren't specified, and
> > guesses what you might mean when you add flags to the mix. The exact
> > semantics are quite complicated and I'd rather avoid them in a stable
> > API.
> >
> > A clean QMP command would probably apply the same defaults as
> > blockdev-add, so you just get to specify the full options again.
> 
> I have a prototype of this working and almost ready to be published, but
> there's a tricky thing with this part:
> 
> If we want blockdev-reopen to apply the defaults for all options except
> from the ones expliclity specified by the user, then it means that we
> need to check not just the options that are present, but also the ones
> that are omitted.
> 
> For example:
> 
>    { "execute": "blockdev-add",
>      "arguments": { "driver": "null-aio",
>                     "node-name": "root",
>                     "size": 1024 }
> 
> This adds a null-aio block device with the "size" option set to 1024
> (the default is 1 << 30).
> 
> null_reopen_prepare() allows reopening that block device, but it does
> not allow changing any of its options. Attempting to change the value of
> "size" is detected by the loop that checks unhandled options at the end
> of bdrv_reopen_prepare() and returns "Cannot change the option 'size'".
> 
> So far, so good. We have this generic check for all options that works
> with all drivers, so as long as we only specify options that we know
> that can be changed, everything is fine.
> 
> However if we want blockdev-reopen to apply the default values for all
> omitted options, then omitting "size" would be equivalent to setting it
> to its default value (1 << 30). And if "size" cannot be changed then
> QEMU should complain unless we explicitly set "size" to 1024 again on
> reopen.
> 
> This complicates things a bit, because we would go from "the options
> that can't be changed are the ones that are not handled by each driver's
> _prepare() function" to "options that are absent can also produce an
> error".

To be honest, I think this is fine. If the user can specify the size
once (in blockdev-add), they can do it again (in blockdev-reopen).

We just need to make sure that we don't break existing bdrv_reopen*()
calls that come from places other than the monitor.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2018-06-11 12:20                           ` Kevin Wolf
@ 2018-06-11 12:23                             ` Alberto Garcia
  0 siblings, 0 replies; 21+ messages in thread
From: Alberto Garcia @ 2018-06-11 12:23 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Stefan Hajnoczi

On Mon 11 Jun 2018 02:20:06 PM CEST, Kevin Wolf wrote:
> Am 01.06.2018 um 12:51 hat Alberto Garcia geschrieben:
>> On Thu 03 May 2018 02:22:41 PM CEST, Kevin Wolf wrote:
>> >> > Were the (more or less) exact requirements of QMP blockdev-reopen
>> >> > discussed? How is it different from qemu-io's "reopen" command?
>> >> > What are the options that you can and can not change?
>> >> 
>> >> I can't quite remember, I'm afraid.  I think it was supposed to be
>> >> pretty much qemu-io's reopen (so just bdrv_reopen()).  I suppose you
>> >> cannot change the driver (obviously) or probably the node name, because
>> >> either would result in the node being replaced by a completely new one.
>> >> 
>> >> Other than that, it probably depends on what the block driver supports,
>> >> but ideally you should be able to change everything.
>> >
>> > Honestly the design of bdrv_reopen() is quite broken because of the
>> > way it tries to maintain old options if they aren't specified, and
>> > guesses what you might mean when you add flags to the mix. The exact
>> > semantics are quite complicated and I'd rather avoid them in a stable
>> > API.
>> >
>> > A clean QMP command would probably apply the same defaults as
>> > blockdev-add, so you just get to specify the full options again.
>> 
>> I have a prototype of this working and almost ready to be published, but
>> there's a tricky thing with this part:
>> 
>> If we want blockdev-reopen to apply the defaults for all options except
>> from the ones expliclity specified by the user, then it means that we
>> need to check not just the options that are present, but also the ones
>> that are omitted.
>> 
>> For example:
>> 
>>    { "execute": "blockdev-add",
>>      "arguments": { "driver": "null-aio",
>>                     "node-name": "root",
>>                     "size": 1024 }
>> 
>> This adds a null-aio block device with the "size" option set to 1024
>> (the default is 1 << 30).
>> 
>> null_reopen_prepare() allows reopening that block device, but it does
>> not allow changing any of its options. Attempting to change the value of
>> "size" is detected by the loop that checks unhandled options at the end
>> of bdrv_reopen_prepare() and returns "Cannot change the option 'size'".
>> 
>> So far, so good. We have this generic check for all options that works
>> with all drivers, so as long as we only specify options that we know
>> that can be changed, everything is fine.
>> 
>> However if we want blockdev-reopen to apply the default values for all
>> omitted options, then omitting "size" would be equivalent to setting it
>> to its default value (1 << 30). And if "size" cannot be changed then
>> QEMU should complain unless we explicitly set "size" to 1024 again on
>> reopen.
>> 
>> This complicates things a bit, because we would go from "the options
>> that can't be changed are the ones that are not handled by each driver's
>> _prepare() function" to "options that are absent can also produce an
>> error".
>
> To be honest, I think this is fine. If the user can specify the size
> once (in blockdev-add), they can do it again (in blockdev-reopen).
>
> We just need to make sure that we don't break existing bdrv_reopen*()
> calls that come from places other than the monitor.

Good. I have the first revision of the series almost ready, expect it
this week.

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC] Intermediate block mirroring
  2015-04-02 13:28 Alberto Garcia
@ 2015-04-02 16:56 ` Eric Blake
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Blake @ 2015-04-02 16:56 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3110 bytes --]

On 04/02/2015 07:28 AM, Alberto Garcia wrote:
> Hi,
> 
> I'm interested in adding the possibility to mirror an intermediate
> node in a disk image chain, but I would like to have some feedback
> before sending any patches.
> 
> The goal would be to convert this:
> 
>    [A] -> [B] -> [C] -> [D]
> 
> into this:
> 
>    [A] -> [B] -> [X] -> [D]
> 
> where [D] is the active image and [X] would be a copy of [C]. The
> latter would be unlinked from the chain.

Seems useful, if for no other reason than to be another tool in the
arsenal of low-level manipulations that can be strung together for cool
high-level operations.

> 
> A use case would be to move disk images across different storage
> backends.
> 
> My idea is to extend the drive-mirror command. Similar to what we
> discussed in the case of the intermediate block streaming, I can reuse
> the 'device' parameter to refer to a node name. So the API doesn't
> need any changes other than the extended semantics for this parameter.
> 
> One difference with the current functionality is that once the block
> job is completed, the node above the mirrored one would have to change
> its backing image to point to the new one. One solution is to iterate
> over all devices (bdrv_next()) and check which ones are connected
> directly or indirectly to the mirrored node (bdrv_find_overlay()).
> 
> drive-mirror has three different sync modes: top, full and none. This
> would be the chain from the example using each one of these modes:
> 
>   top:
> 
>      [A] -> [B] -> [X] -> [D]

That is, X becomes the mirror of C, and then a later command lets us
rebase D onto X (since we know the guest-visible contents accessible
from X and C are identical).

> 
>   full:
> 
>      [X] -> [D]

That is, X becomes the mirror of the full chain A through C, and then a
later command lets us rebase D onto X (since we know the guest-visible
contents accessible from X and C are identical).

> 
>   none:
> 
>      [A] -> [B] -> [C] -> [X] -> [D]

That is, X becomes a new file that tracks changes made since a point in
time which are also going into C; and if we desire we can issue a later
command to rebase D onto X (since we know the guest-visible contents
accessible from X and C are identical at that time), and even later
start cleaning up C (we could use dirty bitmaps to see what got moved
into X to clean those sectors out of C and reduce its size)

> 
> My understanding is that in the 'sync=full' case, [A] and [B] would
> also need to be blocked during the operation since they are going to
> disappear from the chain.
> 
> I have some code and in principle everything seems to be working fine,
> but I'd like to test it a bit more.
> 
> What's anyway your opinion about this proposal?

Certainly seems like something worth having.  The devil may be in the
details, but we can get there when you post proposed patches.

> 
> Thanks,
> 
> Berto
> 
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [RFC] Intermediate block mirroring
@ 2015-04-02 13:28 Alberto Garcia
  2015-04-02 16:56 ` Eric Blake
  0 siblings, 1 reply; 21+ messages in thread
From: Alberto Garcia @ 2015-04-02 13:28 UTC (permalink / raw)
  To: qemu-devel, qemu-block

Hi,

I'm interested in adding the possibility to mirror an intermediate
node in a disk image chain, but I would like to have some feedback
before sending any patches.

The goal would be to convert this:

   [A] -> [B] -> [C] -> [D]

into this:

   [A] -> [B] -> [X] -> [D]

where [D] is the active image and [X] would be a copy of [C]. The
latter would be unlinked from the chain.

A use case would be to move disk images across different storage
backends.

My idea is to extend the drive-mirror command. Similar to what we
discussed in the case of the intermediate block streaming, I can reuse
the 'device' parameter to refer to a node name. So the API doesn't
need any changes other than the extended semantics for this parameter.

One difference with the current functionality is that once the block
job is completed, the node above the mirrored one would have to change
its backing image to point to the new one. One solution is to iterate
over all devices (bdrv_next()) and check which ones are connected
directly or indirectly to the mirrored node (bdrv_find_overlay()).

drive-mirror has three different sync modes: top, full and none. This
would be the chain from the example using each one of these modes:

  top:

     [A] -> [B] -> [X] -> [D]

  full:

     [X] -> [D]

  none:

     [A] -> [B] -> [C] -> [X] -> [D]

My understanding is that in the 'sync=full' case, [A] and [B] would
also need to be blocked during the operation since they are going to
disappear from the chain.

I have some code and in principle everything seems to be working fine,
but I'd like to test it a bit more.

What's anyway your opinion about this proposal?

Thanks,

Berto

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-06-11 12:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12 17:07 [Qemu-devel] [RFC] Intermediate block mirroring Alberto Garcia
2018-04-13 14:23 ` Max Reitz
2018-04-16 14:59   ` Alberto Garcia
2018-04-16 15:15     ` Max Reitz
2018-04-18 15:34       ` Alberto Garcia
2018-04-20 13:13         ` Max Reitz
2018-04-25 12:58           ` Alberto Garcia
2018-04-25 13:06             ` Max Reitz
2018-04-25 13:42               ` Alberto Garcia
2018-04-25 14:03                 ` Max Reitz
2018-05-02 13:07                   ` Alberto Garcia
2018-05-02 14:12                     ` Max Reitz
2018-05-03 10:32                       ` Alberto Garcia
2018-05-03 12:22                       ` Kevin Wolf
2018-05-03 12:33                         ` Alberto Garcia
2018-05-09 14:22                         ` Alberto Garcia
2018-06-01 10:51                         ` Alberto Garcia
2018-06-11 12:20                           ` Kevin Wolf
2018-06-11 12:23                             ` Alberto Garcia
  -- strict thread matches above, loose matches on Subject: below --
2015-04-02 13:28 Alberto Garcia
2015-04-02 16:56 ` Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.