All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] block-insert-node and block-job-delete
@ 2017-07-26 14:19 Manos Pitsidianakis
  2017-07-26 15:12 ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Manos Pitsidianakis @ 2017-07-26 14:19 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Stefan Hajnoczi, Kevin Wolf, Alberto Garcia

[-- Attachment #1: Type: text/plain, Size: 2541 bytes --]

This proposal follows a discussion we had with Kevin and Stefan on 
filter node management.

With block filter drivers arises a need to configure filter nodes on 
runtime with QMP on live graphs. A problem with doing live graph 
modifications is that some block jobs modify the graph when they are 
done and don't operate under any synchronisation, resulting in a race 
condition if we try to insert a filter node in the place of an existing 
edge.

The race can be overcome if we introduce an optional manual-delete flag 
in the creation of a block job to indicate that they will not be deleted 
automatically, but rather wait until the user explicitly calls 
block-job-delete to remove the block job and apply the graph 
modifications. This makes filter insertion require that there are no 
active block jobs with manual-delete set to false. The graph operations 
will be synchronous unlike block-job-complete; the 
BlockJobDeferToMainLoopFn completion callback will be executed from QMP 
instead of using block_job_defer_to_main_loop.

block-job-delete will be defined as 
{ 'command': 'block-job-delete', 'data': { 'device': 'str' } }

With this change we can define block-insert-node. New nodes will be 
created with blockdev-add with the appropriate children nodes. On 
calling block-insert-node we specify the node to add, and the edge we 
wish to replace:
{ 'command': 'block-insert-node',
    'data' : { '*parent' :  'str', 'child' : 'str', 'node' : 'str', 
    '*device' : 'str' } }

This will be similar to x-blockdev-change, but instead of bdrv_add_child 
the parent driver must implement bdrv_reopen_* to change the child. If 
instead of parent we specify device, the node will be inserted as the 
root bs of the specified BlockBackend device. If 'child' is not in the 
bs->children of 'node', we should abort. I'm not certain if there's need 
to implement an option between bs->file/bs->backing.

In the following example we insert a throttle filter node (T) between A 
and B:

    A
    |
    B

{ "execute": "blockdev-add",
  "arguments": {
    "driver": "throttle",
    "node-name": "T",
    "throttling-group": "foobar",
    "limits" : {
      "iops-total" : 1000,
    },
    "file": "B"
  }
}

  A  T
  \ /
   B

{ "execute" : "block-insert-node",
 "arguments" : {
  "parent" : "A",
  "child" : "B",
  "node" : "T"
  }
}
  A
  |
  T
  |
  B

If bdrv_reopen is ever introduced to QMP this command, except for the BB 
root case, might be obsolete.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-26 14:19 [Qemu-devel] [RFC] block-insert-node and block-job-delete Manos Pitsidianakis
@ 2017-07-26 15:12 ` Stefan Hajnoczi
  2017-07-26 18:23   ` Manos Pitsidianakis
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2017-07-26 15:12 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-block, qemu-devel, Kevin Wolf, Alberto Garcia
  Cc: John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> This proposal follows a discussion we had with Kevin and Stefan on filter
> node management.
> 
> With block filter drivers arises a need to configure filter nodes on runtime
> with QMP on live graphs. A problem with doing live graph modifications is
> that some block jobs modify the graph when they are done and don't operate
> under any synchronisation, resulting in a race condition if we try to insert
> a filter node in the place of an existing edge.

Block jobs *do* operate under synchronization.  They only manipulate the
graph from the main loop (under the QEMU global mutex just like a
monitor command).

But maybe you are thinking about higher level race conditions between
QMP commands.  Can you give an example of the race?

CCing John Snow (interested in commands for 'reaping' completed block
jobs) and Jeff Cody (block jobs maintainer).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-26 15:12 ` Stefan Hajnoczi
@ 2017-07-26 18:23   ` Manos Pitsidianakis
  2017-07-27 10:07     ` Stefan Hajnoczi
  2017-07-27 22:09     ` John Snow
  0 siblings, 2 replies; 13+ messages in thread
From: Manos Pitsidianakis @ 2017-07-26 18:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, qemu-devel, Kevin Wolf, Alberto Garcia, John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 2181 bytes --]

On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
>On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
>> This proposal follows a discussion we had with Kevin and Stefan on filter
>> node management.
>>
>> With block filter drivers arises a need to configure filter nodes on runtime
>> with QMP on live graphs. A problem with doing live graph modifications is
>> that some block jobs modify the graph when they are done and don't operate
>> under any synchronisation, resulting in a race condition if we try to insert
>> a filter node in the place of an existing edge.
>
>But maybe you are thinking about higher level race conditions between
>QMP commands.  Can you give an example of the race?

Exactly, synchronisation between the QMP/user actions. Here's an 
example, correct me if I'm wrong:

User issues a block-commit to this tree to commit A onto B:
    BB -> A -> B

This inserts the dummy mirror node at the top since this is a mirror job 
(active commit)
    BB -> dummy -> A -> B

User wants to add a filter node F below the BB. First we create the 
node:
    BB -> dummy -> A -> B
       F /

Commit finishes before we do block-insert-node, dummy and A is removed 
from the BB's chain, mirror_exit calls bdrv_replace_node()

    BB ------------> B
                F -> /

Now inserting F under BB will error since dummy does not exist any more.

The proposal doesn't guard against the following:
User issues a block-commit to this tree to commit B onto C:
    BB -> A -> B -> C

The dummy node (commit's top bs is A):
    BB -> A -> dummy -> B -> C

blockdev-add of a filter we wish to eventually be between A and C:
    BB -> A -> dummy -> B -> C
             F/

if commit finishes before we issue block-insert-node (commit_complete()
calls bdrv_set_backing_hd() which only touches A)
    BB -> A --------------> C
           F -> dummy -> B /
    resulting in error when issuing block-insert-node,

else if we set the job to manual-delete and insert F:
    BB -> A -> F -> dummy -> B -> C
deleting the job will result in F being lost since the job's top bs 
wasn't updated.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-26 18:23   ` Manos Pitsidianakis
@ 2017-07-27 10:07     ` Stefan Hajnoczi
  2017-07-28 12:08       ` Kevin Wolf
  2017-07-27 22:09     ` John Snow
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2017-07-27 10:07 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-block, qemu-devel, Kevin Wolf,
	Alberto Garcia, John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 3378 bytes --]

On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
> On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > This proposal follows a discussion we had with Kevin and Stefan on filter
> > > node management.
> > > 
> > > With block filter drivers arises a need to configure filter nodes on runtime
> > > with QMP on live graphs. A problem with doing live graph modifications is
> > > that some block jobs modify the graph when they are done and don't operate
> > > under any synchronisation, resulting in a race condition if we try to insert
> > > a filter node in the place of an existing edge.
> > 
> > But maybe you are thinking about higher level race conditions between
> > QMP commands.  Can you give an example of the race?
> 
> Exactly, synchronisation between the QMP/user actions. Here's an example,
> correct me if I'm wrong:
> 
> User issues a block-commit to this tree to commit A onto B:
>    BB -> A -> B
> 
> This inserts the dummy mirror node at the top since this is a mirror job
> (active commit)
>    BB -> dummy -> A -> B
> 
> User wants to add a filter node F below the BB. First we create the node:
>    BB -> dummy -> A -> B
>       F /
> 
> Commit finishes before we do block-insert-node, dummy and A is removed from
> the BB's chain, mirror_exit calls bdrv_replace_node()
> 
>    BB ------------> B
>                F -> /
> 
> Now inserting F under BB will error since dummy does not exist any more.

I see the diagrams but the flow isn't clear without the user's QMP
commands.

F is created using blockdev-add?  What are the parameters and how does
it know about dummy?  I think this is an interesting question in itself
because dummy is a transient internal node that QMP users don't
necessarily know about.

What is the full block-insert-node command?

> The proposal doesn't guard against the following:
> User issues a block-commit to this tree to commit B onto C:
>    BB -> A -> B -> C
> 
> The dummy node (commit's top bs is A):
>    BB -> A -> dummy -> B -> C
> 
> blockdev-add of a filter we wish to eventually be between A and C:
>    BB -> A -> dummy -> B -> C
>             F/
> 
> if commit finishes before we issue block-insert-node (commit_complete()
> calls bdrv_set_backing_hd() which only touches A)
>    BB -> A --------------> C
>           F -> dummy -> B /
>    resulting in error when issuing block-insert-node,
> 
> else if we set the job to manual-delete and insert F:
>    BB -> A -> F -> dummy -> B -> C
> deleting the job will result in F being lost since the job's top bs wasn't
> updated.

manual-delete is a solution for block jobs.  The write threshold feature
is a plain QMP command that in the future will create a transient
internal node (before write notifier).

I'm not sure it makes sense to turn the write threshold feature into a
block job.  I guess it could work, but it seems a little unnatural and
maybe there will be other features that use transient internal nodes.

What I'm getting at is that there might be alternative to manual-delete
that work in the general case.  For example, if blockdev-add +
block-insert-node are part of a QMP 'transaction' command, can we lock
the graph to protect it against modifications?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-26 18:23   ` Manos Pitsidianakis
  2017-07-27 10:07     ` Stefan Hajnoczi
@ 2017-07-27 22:09     ` John Snow
  2017-07-28  8:49       ` Manos Pitsidianakis
  2017-07-28 11:55       ` Kevin Wolf
  1 sibling, 2 replies; 13+ messages in thread
From: John Snow @ 2017-07-27 22:09 UTC (permalink / raw)
  To: Manos Pitsidianakis, Stefan Hajnoczi, qemu-block, qemu-devel,
	Kevin Wolf, Alberto Garcia, Jeff Cody



On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
>> On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
>>> This proposal follows a discussion we had with Kevin and Stefan on 
>>> filter
>>> node management.
>>>
>>> With block filter drivers arises a need to configure filter nodes on 
>>> runtime
>>> with QMP on live graphs. A problem with doing live graph 
>>> modifications is
>>> that some block jobs modify the graph when they are done and don't 
>>> operate
>>> under any synchronisation, resulting in a race condition if we try to 
>>> insert
>>> a filter node in the place of an existing edge.
>>
>> But maybe you are thinking about higher level race conditions between
>> QMP commands.  Can you give an example of the race?
> 
> Exactly, synchronisation between the QMP/user actions. Here's an 
> example, correct me if I'm wrong:
> 
> User issues a block-commit to this tree to commit A onto B:
>     BB -> A -> B
> 
> This inserts the dummy mirror node at the top since this is a mirror job 
> (active commit)
>     BB -> dummy -> A -> B
> 
> User wants to add a filter node F below the BB. First we create the node:
>     BB -> dummy -> A -> B
>        F /
> 
> Commit finishes before we do block-insert-node, dummy and A is removed 
> from the BB's chain, mirror_exit calls bdrv_replace_node()
> 
>     BB ------------> B
>                 F -> /
> 
> Now inserting F under BB will error since dummy does not exist any more.
> 

Exactly how much of a problem is this? You, the user, have instructed 
QEMU to do two conflicting things:

(1) Squash A down into B, and
(2) Insert a filter above A

Shame on you for deleting the node you were trying to reference, no? And 
as an added bonus, QEMU will even error out on your command instead of 
trying to do something and getting it wrong.

Is this a problem?

> The proposal doesn't guard against the following:
> User issues a block-commit to this tree to commit B onto C:
>     BB -> A -> B -> C
> 
> The dummy node (commit's top bs is A):
>     BB -> A -> dummy -> B -> C
> 
> blockdev-add of a filter we wish to eventually be between A and C:
>     BB -> A -> dummy -> B -> C
>              F/
> 
> if commit finishes before we issue block-insert-node (commit_complete()
> calls bdrv_set_backing_hd() which only touches A)
>     BB -> A --------------> C
>            F -> dummy -> B /
>     resulting in error when issuing block-insert-node,

Because "B" and "dummy" have already actually been deleted, I assume. 
(The diagram is confusing.)

> 
> else if we set the job to manual-delete and insert F:
>     BB -> A -> F -> dummy -> B -> C
> deleting the job will result in F being lost since the job's top bs 
> wasn't updated.
> 

Seems like a fairly good reason not to pursue this avenue then, yes?

I think I agree with Stefan's concerns that this is more of a temporary 
workaround for the benefit of jobs, but that there may well be other 
reasons (especially in the future) where graph manipulations may occur 
invisibly to the user.

Do you have any use cases for failure here that don't involve the user 
themselves asking for two conflicting things? That is, QEMU doing some 
graph reorganization without the user's knowledge coupled with the user 
asking for a new filter node?

Are there any failures that result in anything more dangerous or bad 
than a "404: node not found" style error where we simply reject the 
premise of what the user was attempting to accomplish?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-27 22:09     ` John Snow
@ 2017-07-28  8:49       ` Manos Pitsidianakis
  2017-07-28 11:55       ` Kevin Wolf
  1 sibling, 0 replies; 13+ messages in thread
From: Manos Pitsidianakis @ 2017-07-28  8:49 UTC (permalink / raw)
  To: John Snow
  Cc: Stefan Hajnoczi, qemu-block, qemu-devel, Kevin Wolf,
	Alberto Garcia, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 4425 bytes --]

On Thu, Jul 27, 2017 at 06:09:04PM -0400, John Snow wrote:
>
>
>On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
>>On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
>>>On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
>>>>This proposal follows a discussion we had with Kevin and Stefan 
>>>>on filter
>>>>node management.
>>>>
>>>>With block filter drivers arises a need to configure filter 
>>>>nodes on runtime
>>>>with QMP on live graphs. A problem with doing live graph 
>>>>modifications is
>>>>that some block jobs modify the graph when they are done and 
>>>>don't operate
>>>>under any synchronisation, resulting in a race condition if we 
>>>>try to insert
>>>>a filter node in the place of an existing edge.
>>>
>>>But maybe you are thinking about higher level race conditions between
>>>QMP commands.  Can you give an example of the race?
>>
>>Exactly, synchronisation between the QMP/user actions. Here's an 
>>example, correct me if I'm wrong:
>>
>>User issues a block-commit to this tree to commit A onto B:
>>    BB -> A -> B
>>
>>This inserts the dummy mirror node at the top since this is a mirror 
>>job (active commit)
>>    BB -> dummy -> A -> B
>>
>>User wants to add a filter node F below the BB. First we create the node:
>>    BB -> dummy -> A -> B
>>       F /
>>
>>Commit finishes before we do block-insert-node, dummy and A is 
>>removed from the BB's chain, mirror_exit calls bdrv_replace_node()
>>
>>    BB ------------> B
>>                F -> /
>>
>>Now inserting F under BB will error since dummy does not exist any more.
>>
>
>Exactly how much of a problem is this? You, the user, have instructed 
>QEMU to do two conflicting things:
>
>(1) Squash A down into B, and
>(2) Insert a filter above A
>
>Shame on you for deleting the node you were trying to reference, no? 
>And as an added bonus, QEMU will even error out on your command 
>instead of trying to do something and getting it wrong.
>
>Is this a problem?


>
>>The proposal doesn't guard against the following:
>>User issues a block-commit to this tree to commit B onto C:
>>    BB -> A -> B -> C
>>
>>The dummy node (commit's top bs is A):
>>    BB -> A -> dummy -> B -> C
>>
>>blockdev-add of a filter we wish to eventually be between A and C:
>>    BB -> A -> dummy -> B -> C
>>             F/
>>
>>if commit finishes before we issue block-insert-node (commit_complete()
>>calls bdrv_set_backing_hd() which only touches A)
>>    BB -> A --------------> C
>>           F -> dummy -> B /
>>    resulting in error when issuing block-insert-node,
>
>Because "B" and "dummy" have already actually been deleted, I assume. 
>(The diagram is confusing.)

because we have an edge (A, C) and C is not a child of F, so we cannot 
insert it as a child to A.  (I rotated the usual ascii art by 90 deg to 
save me from handling whitespace, but I might have done it worse)
>>
>>else if we set the job to manual-delete and insert F:
>>    BB -> A -> F -> dummy -> B -> C
>>deleting the job will result in F being lost since the job's top bs 
>>wasn't updated.
>>
>
>Seems like a fairly good reason not to pursue this avenue then, yes?

I think this is just a different problem, not a race but of broadcasting 
graph operations to affected parties (block jobs).
>
>I think I agree with Stefan's concerns that this is more of a 
>temporary workaround for the benefit of jobs, but that there may well 
>be other reasons (especially in the future) where graph manipulations 
>may occur invisibly to the user.
>
>Do you have any use cases for failure here that don't involve the user 
>themselves asking for two conflicting things? That is, QEMU doing some 
>graph reorganization without the user's knowledge coupled with the 
>user asking for a new filter node?
>
>Are there any failures that result in anything more dangerous or bad 
>than a "404: node not found" style error where we simply reject the 
>premise of what the user was attempting to accomplish?
>

I couldn't come up with a case that caused catastrophic failure, no 
(though I lack familiarity with block jobs). If this isn't a problem 
after all, good news.

block-insert-node needs to work with no dangerous conflicts to other 
graph operations, basically. Is there a case where there's not just an 
error and we can't recover?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-27 22:09     ` John Snow
  2017-07-28  8:49       ` Manos Pitsidianakis
@ 2017-07-28 11:55       ` Kevin Wolf
  2017-08-02 10:47         ` Stefan Hajnoczi
  1 sibling, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2017-07-28 11:55 UTC (permalink / raw)
  To: John Snow
  Cc: Manos Pitsidianakis, Stefan Hajnoczi, qemu-block, qemu-devel,
	Alberto Garcia, Jeff Cody, armbru

Am 28.07.2017 um 00:09 hat John Snow geschrieben:
> On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > > This proposal follows a discussion we had with Kevin and Stefan
> > > > on filter
> > > > node management.
> > > > 
> > > > With block filter drivers arises a need to configure filter
> > > > nodes on runtime
> > > > with QMP on live graphs. A problem with doing live graph
> > > > modifications is
> > > > that some block jobs modify the graph when they are done and
> > > > don't operate
> > > > under any synchronisation, resulting in a race condition if we
> > > > try to insert
> > > > a filter node in the place of an existing edge.
> > > 
> > > But maybe you are thinking about higher level race conditions between
> > > QMP commands.  Can you give an example of the race?
> > 
> > Exactly, synchronisation between the QMP/user actions. Here's an
> > example, correct me if I'm wrong:
> > 
> > User issues a block-commit to this tree to commit A onto B:
> >     BB -> A -> B
> > 
> > This inserts the dummy mirror node at the top since this is a mirror job
> > (active commit)
> >     BB -> dummy -> A -> B
> > 
> > User wants to add a filter node F below the BB. First we create the node:
> >     BB -> dummy -> A -> B
> >        F /
> > 
> > Commit finishes before we do block-insert-node, dummy and A is removed
> > from the BB's chain, mirror_exit calls bdrv_replace_node()
> > 
> >     BB ------------> B
> >                 F -> /
> > 
> > Now inserting F under BB will error since dummy does not exist any more.

So the basic scenario is the one from the last section of the STR block
meeting notes:

http://lists.gnu.org/archive/html/qemu-block/2016-12/msg00045.html

At the time, this scenario seemed to make sense, but in fact, as your
explanation show, it's not a real problem. The theory that inserting the
filter would bring the mirror_top node back is false because removing
the mirror_top node updates _all_ of its parents (including the new
throttling node) rather than just the BB to point to the BDS below.

However, whether block-insert-node actually fails depends on how you
address the edge(s) on which you want to insert a filter node. We would
only get an error if you specify it as "insert filter above this node".
However, this is not a way to unambiguously reference a single edge, you
could only update _all_ incoming edges for a node at once. Therefore I
don't think this is a good way to do things.

An unambiguous way to identify a single edge is the name of the parent
node (usually a node-name, but possibly a BB name - they share a single
namespace, so having one QMP argument for both could be enough) and of
the child (as in BdrvChild), e.g. "insert as new child=backing of
node=myimage",

This way, the example scenario would actually just work.

> Exactly how much of a problem is this? You, the user, have instructed QEMU
> to do two conflicting things:
> 
> (1) Squash A down into B, and
> (2) Insert a filter above A
> 
> Shame on you for deleting the node you were trying to reference, no? And as
> an added bonus, QEMU will even error out on your command instead of trying
> to do something and getting it wrong.
> 
> Is this a problem?

I don't think these two actions can reasonably be considered
conflicting, adding a filter on top doesn't influence the commit
operation at all.

We used to just forbid everything while doing block jobs, but these
operations are block jobs because they are long-running, and just
forbidding everything for a long time isn't very helpful. There is no
reason why a user shouldn't be able to take a snapshot while they are
creating a backup, etc.

As I said above, filter nodes automagically going away at some point
without an explicit user command isn't necessarily a problem, but I
also don't feel very comfortable with it. It certainly feels like an
easy way to introduce bugs. Things should become much easier to manage
when all graph modifications are explicit.

I'm surprised that you seem to be opposed to an explicit job-delete now
as we already discussed this before and it would solve several problems:

* Block job completion issues only an event. If the management tools
  reconnects, it has no way to get the same information with a query-*
  command. If jobs stay around until job-delete, this is not a problem.

* Completion is usually rather complicated with all of the graph changes
  and involves operations that can fail, but we don't have a way to
  actually report errors to the user. job-delete could do that.

* And finally the management problem we're discussing here. Yes, if I
  want to snapshot below a mirror node and the mirror goes away
  immediately before I send the command, libvirt will just get an error
  and can process all events and update its graph view and then retry
  with another node name.

  Though even if it's possible, do you really believe this will be fun
  to implement for management tools and help to avoid bugs? Or will
  people even remember to properly implement this?

On the other hand, the downsides of job-delete:

* Er... Haven't heard of any yet. Maybe inconvenient for human users?
  But they shouldn't be using QMP anyway.

> > The proposal doesn't guard against the following:
> > User issues a block-commit to this tree to commit B onto C:
> >     BB -> A -> B -> C
> > 
> > The dummy node (commit's top bs is A):
> >     BB -> A -> dummy -> B -> C
> > 
> > blockdev-add of a filter we wish to eventually be between A and C:
> >     BB -> A -> dummy -> B -> C
> >              F/
> > 
> > if commit finishes before we issue block-insert-node (commit_complete()
> > calls bdrv_set_backing_hd() which only touches A)
> >     BB -> A --------------> C
> >            F -> dummy -> B /
> >     resulting in error when issuing block-insert-node,
> 
> Because "B" and "dummy" have already actually been deleted, I assume. (The
> diagram is confusing.)

Why would there be a graph change at all before job-delete?

> > else if we set the job to manual-delete and insert F:
> >     BB -> A -> F -> dummy -> B -> C
> > deleting the job will result in F being lost since the job's top bs
> > wasn't updated.
> 
> Seems like a fairly good reason not to pursue this avenue then, yes?

That's an implementation problem of the commit job. That it requires
knowledge about the graph above the commit_top node is just completely
wrong. The current implementation works for the trivial case, but not
for much more. You don't even have to add filters dynamically for it to
break.

A future correct commit job would just replace the commit_top node by
base when it completes.

I have patches to implement this change, but unfortunately they revealed
some bugs between bdrv_reopen() and op blockers, so they aren't ready
yet to be posted.

> I think I agree with Stefan's concerns that this is more of a
> temporary workaround for the benefit of jobs, but that there may well
> be other reasons (especially in the future) where graph manipulations
> may occur invisibly to the user.

We should do our best to avoid this. Implicit graph changes only make
libvirt's life hard. I agree that we'll have many more graph changes in
the future, but let's keep them as explicit as possible.

> Do you have any use cases for failure here that don't involve the user
> themselves asking for two conflicting things? That is, QEMU doing some graph
> reorganization without the user's knowledge coupled with the user asking for
> a new filter node?
> 
> Are there any failures that result in anything more dangerous or bad than a
> "404: node not found" style error where we simply reject the premise of what
> the user was attempting to accomplish?

"404: node not found" is already very bad if you want management tools
to be able to cope with it.

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-27 10:07     ` Stefan Hajnoczi
@ 2017-07-28 12:08       ` Kevin Wolf
  2017-07-31 14:53         ` Stefan Hajnoczi
  2017-07-31 17:30         ` Manos Pitsidianakis
  0 siblings, 2 replies; 13+ messages in thread
From: Kevin Wolf @ 2017-07-28 12:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Manos Pitsidianakis, qemu-block, qemu-devel, Alberto Garcia,
	John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 4691 bytes --]

Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben:
> On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
> > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > > This proposal follows a discussion we had with Kevin and Stefan on filter
> > > > node management.
> > > > 
> > > > With block filter drivers arises a need to configure filter nodes on runtime
> > > > with QMP on live graphs. A problem with doing live graph modifications is
> > > > that some block jobs modify the graph when they are done and don't operate
> > > > under any synchronisation, resulting in a race condition if we try to insert
> > > > a filter node in the place of an existing edge.
> > > 
> > > But maybe you are thinking about higher level race conditions between
> > > QMP commands.  Can you give an example of the race?
> > 
> > Exactly, synchronisation between the QMP/user actions. Here's an example,
> > correct me if I'm wrong:
> > 
> > User issues a block-commit to this tree to commit A onto B:
> >    BB -> A -> B
> > 
> > This inserts the dummy mirror node at the top since this is a mirror job
> > (active commit)
> >    BB -> dummy -> A -> B
> > 
> > User wants to add a filter node F below the BB. First we create the node:
> >    BB -> dummy -> A -> B
> >       F /
> > 
> > Commit finishes before we do block-insert-node, dummy and A is removed from
> > the BB's chain, mirror_exit calls bdrv_replace_node()
> > 
> >    BB ------------> B
> >                F -> /
> > 
> > Now inserting F under BB will error since dummy does not exist any more.
> 
> I see the diagrams but the flow isn't clear without the user's QMP
> commands.
> 
> F is created using blockdev-add?  What are the parameters and how does
> it know about dummy?  I think this is an interesting question in itself
> because dummy is a transient internal node that QMP users don't
> necessarily know about.

We can expect that users of block-insert-node also know about block job
filter nodes, simply because the former is newer than the latter.

(I also don't like the name "dummy", this makes it sound like it's not
really a proper node. In reality, there is little reason to treat it
specially.)

> What is the full block-insert-node command?
> 
> > The proposal doesn't guard against the following:
> > User issues a block-commit to this tree to commit B onto C:
> >    BB -> A -> B -> C
> > 
> > The dummy node (commit's top bs is A):
> >    BB -> A -> dummy -> B -> C
> > 
> > blockdev-add of a filter we wish to eventually be between A and C:
> >    BB -> A -> dummy -> B -> C
> >             F/
> > 
> > if commit finishes before we issue block-insert-node (commit_complete()
> > calls bdrv_set_backing_hd() which only touches A)
> >    BB -> A --------------> C
> >           F -> dummy -> B /
> >    resulting in error when issuing block-insert-node,
> > 
> > else if we set the job to manual-delete and insert F:
> >    BB -> A -> F -> dummy -> B -> C
> > deleting the job will result in F being lost since the job's top bs wasn't
> > updated.
> 
> manual-delete is a solution for block jobs.  The write threshold
> feature is a plain QMP command that in the future will create a
> transient internal node (before write notifier).

Yes, that becomes a legacy QMP command then. The new way is blockdev-add
and block-insert-node for write threshold, too.

Apart from that, a write threshold node never disappears by itself, it
is only inserted or removed in the context of a QMP command. That makes
it a lot less dangerous than automatic block completion.

> I'm not sure it makes sense to turn the write threshold feature into a
> block job.  I guess it could work, but it seems a little unnatural and
> maybe there will be other features that use transient internal nodes.

Yeah, no reason to do so. Just a manually created filter node.

> What I'm getting at is that there might be alternative to manual-delete
> that work in the general case.  For example, if blockdev-add +
> block-insert-node are part of a QMP 'transaction' command, can we lock
> the graph to protect it against modifications?

We thought about this at the last STR meeting. However, not for much
longer than a minute. Markus and I both agreed that we wouldn't be the
ones to try and make blockdev-add transactionable. It sounds like a huge
amount of work and a high risk of bugs for little benefit.

Let's just make sure that we provide a way for management tools to avoid
any surprise changes in the block graph.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-28 12:08       ` Kevin Wolf
@ 2017-07-31 14:53         ` Stefan Hajnoczi
  2017-07-31 17:30         ` Manos Pitsidianakis
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2017-07-31 14:53 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Manos Pitsidianakis, qemu-block, qemu-devel, Alberto Garcia,
	John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 4019 bytes --]

On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote:
> Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben:
> > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
> > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > > > This proposal follows a discussion we had with Kevin and Stefan on filter
> > > > > node management.
> > > > > 
> > > > > With block filter drivers arises a need to configure filter nodes on runtime
> > > > > with QMP on live graphs. A problem with doing live graph modifications is
> > > > > that some block jobs modify the graph when they are done and don't operate
> > > > > under any synchronisation, resulting in a race condition if we try to insert
> > > > > a filter node in the place of an existing edge.
> > > > 
> > > > But maybe you are thinking about higher level race conditions between
> > > > QMP commands.  Can you give an example of the race?
> > > 
> > > Exactly, synchronisation between the QMP/user actions. Here's an example,
> > > correct me if I'm wrong:
> > > 
> > > User issues a block-commit to this tree to commit A onto B:
> > >    BB -> A -> B
> > > 
> > > This inserts the dummy mirror node at the top since this is a mirror job
> > > (active commit)
> > >    BB -> dummy -> A -> B
> > > 
> > > User wants to add a filter node F below the BB. First we create the node:
> > >    BB -> dummy -> A -> B
> > >       F /
> > > 
> > > Commit finishes before we do block-insert-node, dummy and A is removed from
> > > the BB's chain, mirror_exit calls bdrv_replace_node()
> > > 
> > >    BB ------------> B
> > >                F -> /
> > > 
> > > Now inserting F under BB will error since dummy does not exist any more.
> > 
> > I see the diagrams but the flow isn't clear without the user's QMP
> > commands.
> > 
> > F is created using blockdev-add?  What are the parameters and how does
> > it know about dummy?  I think this is an interesting question in itself
> > because dummy is a transient internal node that QMP users don't
> > necessarily know about.
> 
> We can expect that users of block-insert-node also know about block job
> filter nodes, simply because the former is newer than the latter.
> 
> (I also don't like the name "dummy", this makes it sound like it's not
> really a proper node. In reality, there is little reason to treat it
> specially.)
> 
> > What is the full block-insert-node command?
> > 
> > > The proposal doesn't guard against the following:
> > > User issues a block-commit to this tree to commit B onto C:
> > >    BB -> A -> B -> C
> > > 
> > > The dummy node (commit's top bs is A):
> > >    BB -> A -> dummy -> B -> C
> > > 
> > > blockdev-add of a filter we wish to eventually be between A and C:
> > >    BB -> A -> dummy -> B -> C
> > >             F/
> > > 
> > > if commit finishes before we issue block-insert-node (commit_complete()
> > > calls bdrv_set_backing_hd() which only touches A)
> > >    BB -> A --------------> C
> > >           F -> dummy -> B /
> > >    resulting in error when issuing block-insert-node,
> > > 
> > > else if we set the job to manual-delete and insert F:
> > >    BB -> A -> F -> dummy -> B -> C
> > > deleting the job will result in F being lost since the job's top bs wasn't
> > > updated.
> > 
> > manual-delete is a solution for block jobs.  The write threshold
> > feature is a plain QMP command that in the future will create a
> > transient internal node (before write notifier).
> 
> Yes, that becomes a legacy QMP command then. The new way is blockdev-add
> and block-insert-node for write threshold, too.
> 
> Apart from that, a write threshold node never disappears by itself, it
> is only inserted or removed in the context of a QMP command. That makes
> it a lot less dangerous than automatic block completion.

Nice thinking - write threshold can be a node too!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-28 12:08       ` Kevin Wolf
  2017-07-31 14:53         ` Stefan Hajnoczi
@ 2017-07-31 17:30         ` Manos Pitsidianakis
  2017-08-01 13:50           ` Kevin Wolf
  1 sibling, 1 reply; 13+ messages in thread
From: Manos Pitsidianakis @ 2017-07-31 17:30 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, qemu-block, qemu-devel, Alberto Garcia,
	John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 4874 bytes --]

On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote:
>Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben:
>> On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
>> > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
>> > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
>> > > > This proposal follows a discussion we had with Kevin and Stefan on filter
>> > > > node management.
>> > > >
>> > > > With block filter drivers arises a need to configure filter nodes on runtime
>> > > > with QMP on live graphs. A problem with doing live graph modifications is
>> > > > that some block jobs modify the graph when they are done and don't operate
>> > > > under any synchronisation, resulting in a race condition if we try to insert
>> > > > a filter node in the place of an existing edge.
>> > >
>> > > But maybe you are thinking about higher level race conditions between
>> > > QMP commands.  Can you give an example of the race?
>> >
>> > Exactly, synchronisation between the QMP/user actions. Here's an example,
>> > correct me if I'm wrong:
>> >
>> > User issues a block-commit to this tree to commit A onto B:
>> >    BB -> A -> B
>> >
>> > This inserts the dummy mirror node at the top since this is a mirror job
>> > (active commit)
>> >    BB -> dummy -> A -> B
>> >
>> > User wants to add a filter node F below the BB. First we create the node:
>> >    BB -> dummy -> A -> B
>> >       F /
>> >
>> > Commit finishes before we do block-insert-node, dummy and A is removed from
>> > the BB's chain, mirror_exit calls bdrv_replace_node()
>> >
>> >    BB ------------> B
>> >                F -> /
>> >
>> > Now inserting F under BB will error since dummy does not exist any more.
>>
>> I see the diagrams but the flow isn't clear without the user's QMP
>> commands.
>>
>> F is created using blockdev-add?  What are the parameters and how does
>> it know about dummy?  I think this is an interesting question in itself
>> because dummy is a transient internal node that QMP users don't
>> necessarily know about.
>
>We can expect that users of block-insert-node also know about block job
>filter nodes, simply because the former is newer than the latter.
>
>(I also don't like the name "dummy", this makes it sound like it's not
>really a proper node. In reality, there is little reason to treat it
>specially.)
>
>> What is the full block-insert-node command?
>>
>> > The proposal doesn't guard against the following:
>> > User issues a block-commit to this tree to commit B onto C:
>> >    BB -> A -> B -> C
>> >
>> > The dummy node (commit's top bs is A):
>> >    BB -> A -> dummy -> B -> C
>> >
>> > blockdev-add of a filter we wish to eventually be between A and C:
>> >    BB -> A -> dummy -> B -> C
>> >             F/
>> >
>> > if commit finishes before we issue block-insert-node (commit_complete()
>> > calls bdrv_set_backing_hd() which only touches A)
>> >    BB -> A --------------> C
>> >           F -> dummy -> B /
>> >    resulting in error when issuing block-insert-node,
>> >
>> > else if we set the job to manual-delete and insert F:
>> >    BB -> A -> F -> dummy -> B -> C
>> > deleting the job will result in F being lost since the job's top bs wasn't
>> > updated.
>>
>> manual-delete is a solution for block jobs.  The write threshold
>> feature is a plain QMP command that in the future will create a
>> transient internal node (before write notifier).
>
>Yes, that becomes a legacy QMP command then. The new way is blockdev-add
>and block-insert-node for write threshold, too.
>
>Apart from that, a write threshold node never disappears by itself, it
>is only inserted or removed in the context of a QMP command. That makes
>it a lot less dangerous than automatic block completion.
>
>> I'm not sure it makes sense to turn the write threshold feature into a
>> block job.  I guess it could work, but it seems a little unnatural and
>> maybe there will be other features that use transient internal nodes.
>
>Yeah, no reason to do so. Just a manually created filter node.
>

Can you explain what you have in mind? The current workflow is using 
block-set-write-threshold on the targetted bs. If we want write 
threshold to be on node level we would want a new filter driver so that 
it can take options special to write-threshold. Unless we make the 
notify filter be a write threshold by default, and when using it 
internally by the backup job, disable the threshold and add our backup 
notifier to the node's list. Of course the current write threshold code 
could be refactored into a new driver so that it doesn't have to rely on 
notifiers.

The way I've currently done the conversion is to add an implicit filter 
when enabling the threshold, just like in backup jobs.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-31 17:30         ` Manos Pitsidianakis
@ 2017-08-01 13:50           ` Kevin Wolf
  2017-08-01 13:57             ` Manos Pitsidianakis
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2017-08-01 13:50 UTC (permalink / raw)
  To: Manos Pitsidianakis, Stefan Hajnoczi, qemu-block, qemu-devel,
	Alberto Garcia, John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 6005 bytes --]

Am 31.07.2017 um 19:30 hat Manos Pitsidianakis geschrieben:
> On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote:
> > Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben:
> > > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
> > > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > > > > > This proposal follows a discussion we had with Kevin and Stefan on filter
> > > > > > node management.
> > > > > >
> > > > > > With block filter drivers arises a need to configure filter nodes on runtime
> > > > > > with QMP on live graphs. A problem with doing live graph modifications is
> > > > > > that some block jobs modify the graph when they are done and don't operate
> > > > > > under any synchronisation, resulting in a race condition if we try to insert
> > > > > > a filter node in the place of an existing edge.
> > > > >
> > > > > But maybe you are thinking about higher level race conditions between
> > > > > QMP commands.  Can you give an example of the race?
> > > >
> > > > Exactly, synchronisation between the QMP/user actions. Here's an example,
> > > > correct me if I'm wrong:
> > > >
> > > > User issues a block-commit to this tree to commit A onto B:
> > > >    BB -> A -> B
> > > >
> > > > This inserts the dummy mirror node at the top since this is a mirror job
> > > > (active commit)
> > > >    BB -> dummy -> A -> B
> > > >
> > > > User wants to add a filter node F below the BB. First we create the node:
> > > >    BB -> dummy -> A -> B
> > > >       F /
> > > >
> > > > Commit finishes before we do block-insert-node, dummy and A is removed from
> > > > the BB's chain, mirror_exit calls bdrv_replace_node()
> > > >
> > > >    BB ------------> B
> > > >                F -> /
> > > >
> > > > Now inserting F under BB will error since dummy does not exist any more.
> > > 
> > > I see the diagrams but the flow isn't clear without the user's QMP
> > > commands.
> > > 
> > > F is created using blockdev-add?  What are the parameters and how does
> > > it know about dummy?  I think this is an interesting question in itself
> > > because dummy is a transient internal node that QMP users don't
> > > necessarily know about.
> > 
> > We can expect that users of block-insert-node also know about block job
> > filter nodes, simply because the former is newer than the latter.
> > 
> > (I also don't like the name "dummy", this makes it sound like it's not
> > really a proper node. In reality, there is little reason to treat it
> > specially.)
> > 
> > > What is the full block-insert-node command?
> > > 
> > > > The proposal doesn't guard against the following:
> > > > User issues a block-commit to this tree to commit B onto C:
> > > >    BB -> A -> B -> C
> > > >
> > > > The dummy node (commit's top bs is A):
> > > >    BB -> A -> dummy -> B -> C
> > > >
> > > > blockdev-add of a filter we wish to eventually be between A and C:
> > > >    BB -> A -> dummy -> B -> C
> > > >             F/
> > > >
> > > > if commit finishes before we issue block-insert-node (commit_complete()
> > > > calls bdrv_set_backing_hd() which only touches A)
> > > >    BB -> A --------------> C
> > > >           F -> dummy -> B /
> > > >    resulting in error when issuing block-insert-node,
> > > >
> > > > else if we set the job to manual-delete and insert F:
> > > >    BB -> A -> F -> dummy -> B -> C
> > > > deleting the job will result in F being lost since the job's top bs wasn't
> > > > updated.
> > > 
> > > manual-delete is a solution for block jobs.  The write threshold
> > > feature is a plain QMP command that in the future will create a
> > > transient internal node (before write notifier).
> > 
> > Yes, that becomes a legacy QMP command then. The new way is blockdev-add
> > and block-insert-node for write threshold, too.
> > 
> > Apart from that, a write threshold node never disappears by itself, it
> > is only inserted or removed in the context of a QMP command. That makes
> > it a lot less dangerous than automatic block completion.
> > 
> > > I'm not sure it makes sense to turn the write threshold feature into a
> > > block job.  I guess it could work, but it seems a little unnatural and
> > > maybe there will be other features that use transient internal nodes.
> > 
> > Yeah, no reason to do so. Just a manually created filter node.
> > 
> 
> Can you explain what you have in mind? The current workflow is using
> block-set-write-threshold on the targetted bs. If we want write threshold to
> be on node level we would want a new filter driver so that it can take
> options special to write-threshold.

Yes, I think that's what we want.

> Unless we make the notify filter be a write threshold by default, and
> when using it internally by the backup job, disable the threshold and
> add our backup notifier to the node's list. Of course the current
> write threshold code could be refactored into a new driver so that it
> doesn't have to rely on notifiers.

As discussed on IRC, we shouldn't implement a bad interface (which
notifiers are, they bypass normal block device configuration) in a
different way, but we should replace it by something that fits better in
the block layer design.

In other words, don't add magic to provide the same notifier functions
as we had, but convert their callers to have a block node of their own
(i.e. one backup_top driver, one write-treshold driver, etc.).

> The way I've currently done the conversion is to add an implicit
> filter when enabling the threshold, just like in backup jobs.

This is fine, but it should be specifically a write-threshold filter
that a user can also manually insert whereever they like. The old QMP
interfaces would be considered legacy commands then, because
blockdev-add and blockdev-insert-node can provide the same thing.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-08-01 13:50           ` Kevin Wolf
@ 2017-08-01 13:57             ` Manos Pitsidianakis
  0 siblings, 0 replies; 13+ messages in thread
From: Manos Pitsidianakis @ 2017-08-01 13:57 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, qemu-block, qemu-devel, Alberto Garcia,
	John Snow, Jeff Cody

[-- Attachment #1: Type: text/plain, Size: 6095 bytes --]

On Tue, Aug 01, 2017 at 03:50:36PM +0200, Kevin Wolf wrote:
>Am 31.07.2017 um 19:30 hat Manos Pitsidianakis geschrieben:
>> On Fri, Jul 28, 2017 at 02:08:43PM +0200, Kevin Wolf wrote:
>> > Am 27.07.2017 um 12:07 hat Stefan Hajnoczi geschrieben:
>> > > On Wed, Jul 26, 2017 at 09:23:20PM +0300, Manos Pitsidianakis wrote:
>> > > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
>> > > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
>> > > > > > This proposal follows a discussion we had with Kevin and Stefan on filter
>> > > > > > node management.
>> > > > > >
>> > > > > > With block filter drivers arises a need to configure filter nodes on runtime
>> > > > > > with QMP on live graphs. A problem with doing live graph modifications is
>> > > > > > that some block jobs modify the graph when they are done and don't operate
>> > > > > > under any synchronisation, resulting in a race condition if we try to insert
>> > > > > > a filter node in the place of an existing edge.
>> > > > >
>> > > > > But maybe you are thinking about higher level race conditions between
>> > > > > QMP commands.  Can you give an example of the race?
>> > > >
>> > > > Exactly, synchronisation between the QMP/user actions. Here's an example,
>> > > > correct me if I'm wrong:
>> > > >
>> > > > User issues a block-commit to this tree to commit A onto B:
>> > > >    BB -> A -> B
>> > > >
>> > > > This inserts the dummy mirror node at the top since this is a mirror job
>> > > > (active commit)
>> > > >    BB -> dummy -> A -> B
>> > > >
>> > > > User wants to add a filter node F below the BB. First we create the node:
>> > > >    BB -> dummy -> A -> B
>> > > >       F /
>> > > >
>> > > > Commit finishes before we do block-insert-node, dummy and A is removed from
>> > > > the BB's chain, mirror_exit calls bdrv_replace_node()
>> > > >
>> > > >    BB ------------> B
>> > > >                F -> /
>> > > >
>> > > > Now inserting F under BB will error since dummy does not exist any more.
>> > >
>> > > I see the diagrams but the flow isn't clear without the user's QMP
>> > > commands.
>> > >
>> > > F is created using blockdev-add?  What are the parameters and how does
>> > > it know about dummy?  I think this is an interesting question in itself
>> > > because dummy is a transient internal node that QMP users don't
>> > > necessarily know about.
>> >
>> > We can expect that users of block-insert-node also know about block job
>> > filter nodes, simply because the former is newer than the latter.
>> >
>> > (I also don't like the name "dummy", this makes it sound like it's not
>> > really a proper node. In reality, there is little reason to treat it
>> > specially.)
>> >
>> > > What is the full block-insert-node command?
>> > >
>> > > > The proposal doesn't guard against the following:
>> > > > User issues a block-commit to this tree to commit B onto C:
>> > > >    BB -> A -> B -> C
>> > > >
>> > > > The dummy node (commit's top bs is A):
>> > > >    BB -> A -> dummy -> B -> C
>> > > >
>> > > > blockdev-add of a filter we wish to eventually be between A and C:
>> > > >    BB -> A -> dummy -> B -> C
>> > > >             F/
>> > > >
>> > > > if commit finishes before we issue block-insert-node (commit_complete()
>> > > > calls bdrv_set_backing_hd() which only touches A)
>> > > >    BB -> A --------------> C
>> > > >           F -> dummy -> B /
>> > > >    resulting in error when issuing block-insert-node,
>> > > >
>> > > > else if we set the job to manual-delete and insert F:
>> > > >    BB -> A -> F -> dummy -> B -> C
>> > > > deleting the job will result in F being lost since the job's top bs wasn't
>> > > > updated.
>> > >
>> > > manual-delete is a solution for block jobs.  The write threshold
>> > > feature is a plain QMP command that in the future will create a
>> > > transient internal node (before write notifier).
>> >
>> > Yes, that becomes a legacy QMP command then. The new way is blockdev-add
>> > and block-insert-node for write threshold, too.
>> >
>> > Apart from that, a write threshold node never disappears by itself, it
>> > is only inserted or removed in the context of a QMP command. That makes
>> > it a lot less dangerous than automatic block completion.
>> >
>> > > I'm not sure it makes sense to turn the write threshold feature into a
>> > > block job.  I guess it could work, but it seems a little unnatural and
>> > > maybe there will be other features that use transient internal nodes.
>> >
>> > Yeah, no reason to do so. Just a manually created filter node.
>> >
>>
>> Can you explain what you have in mind? The current workflow is using
>> block-set-write-threshold on the targetted bs. If we want write threshold to
>> be on node level we would want a new filter driver so that it can take
>> options special to write-threshold.
>
>Yes, I think that's what we want.
>
>> Unless we make the notify filter be a write threshold by default, and
>> when using it internally by the backup job, disable the threshold and
>> add our backup notifier to the node's list. Of course the current
>> write threshold code could be refactored into a new driver so that it
>> doesn't have to rely on notifiers.
>
>As discussed on IRC, we shouldn't implement a bad interface (which
>notifiers are, they bypass normal block device configuration) in a
>different way, but we should replace it by something that fits better in
>the block layer design.
>
>In other words, don't add magic to provide the same notifier functions
>as we had, but convert their callers to have a block node of their own
>(i.e. one backup_top driver, one write-treshold driver, etc.).
>
>> The way I've currently done the conversion is to add an implicit
>> filter when enabling the threshold, just like in backup jobs.
>
>This is fine, but it should be specifically a write-threshold filter
>that a user can also manually insert whereever they like. The old QMP
>interfaces would be considered legacy commands then, because
>blockdev-add and blockdev-insert-node can provide the same thing.
>
>Kevin

Alright thanks, this will make it easier.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] block-insert-node and block-job-delete
  2017-07-28 11:55       ` Kevin Wolf
@ 2017-08-02 10:47         ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2017-08-02 10:47 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: John Snow, Manos Pitsidianakis, qemu-block, qemu-devel,
	Alberto Garcia, Jeff Cody, armbru

[-- Attachment #1: Type: text/plain, Size: 1054 bytes --]

On Fri, Jul 28, 2017 at 01:55:38PM +0200, Kevin Wolf wrote:
> Am 28.07.2017 um 00:09 hat John Snow geschrieben:
> > On 07/26/2017 02:23 PM, Manos Pitsidianakis wrote:
> > > On Wed, Jul 26, 2017 at 04:12:21PM +0100, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 26, 2017 at 05:19:24PM +0300, Manos Pitsidianakis wrote:
> > I think I agree with Stefan's concerns that this is more of a
> > temporary workaround for the benefit of jobs, but that there may well
> > be other reasons (especially in the future) where graph manipulations
> > may occur invisibly to the user.
> 
> We should do our best to avoid this. Implicit graph changes only make
> libvirt's life hard. I agree that we'll have many more graph changes in
> the future, but let's keep them as explicit as possible.

Speaking of implicit graph changes: the COLO replication driver
(block/replication.c) uses the backup job internally and that requires
before write notifiers.  That means there is an implicit graph change
when COLO decides to complete the backup job.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-08-02 10:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-26 14:19 [Qemu-devel] [RFC] block-insert-node and block-job-delete Manos Pitsidianakis
2017-07-26 15:12 ` Stefan Hajnoczi
2017-07-26 18:23   ` Manos Pitsidianakis
2017-07-27 10:07     ` Stefan Hajnoczi
2017-07-28 12:08       ` Kevin Wolf
2017-07-31 14:53         ` Stefan Hajnoczi
2017-07-31 17:30         ` Manos Pitsidianakis
2017-08-01 13:50           ` Kevin Wolf
2017-08-01 13:57             ` Manos Pitsidianakis
2017-07-27 22:09     ` John Snow
2017-07-28  8:49       ` Manos Pitsidianakis
2017-07-28 11:55       ` Kevin Wolf
2017-08-02 10:47         ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.