* [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
@ 2015-12-18 13:15 Markus Armbruster
  2015-12-23  8:33 ` Stefan Hajnoczi
  2015-12-23 10:15 ` Fam Zheng
  0 siblings, 2 replies; 9+ messages in thread
From: Markus Armbruster @ 2015-12-18 13:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Jeff Cody, Max Reitz,
	Stefan Hajnoczi, Alberto Garcia, John Snow

Kevin, Max and I used an opportunity to meet and discuss block layer
matters.  We examined two topics in some depth: BlockBackend, and block
filters and dynamic reconfiguration.

Not nearly enough people to call it a block summit.  But the local
dialect is known for its use of diminutives, and "Gipfele" is the
diminutive of "summit" :)


= BlockBackend =

Background: BlockBackend (BB) was split off BlockDriverState (BDS) to
separate the block layer's external interface (BB) from its internal
building block (BDS).  Block layer clients such as device models and the
NBD server attach to a BB by BB name.  A BB has zero or one BDS (zero
means no medium).

Multiple device models using the same BB is dangerous, so we allow
attaching only one.  We don't currently enforce an "only one"
restriction for other clients.  This is problematic, because

* Different clients may want to configure the BB in conflicting ways,
  e.g. writeback caching mode (still to be moved from the BDS's
  enable_write_cache to the BB).

* When the BDS graph gets dynamically reconfigured, say when a block
  filter gets spliced in, clients that started out in the same spot may
  need to move differently.

Instead, each client should connect to its own BB.

This leads to the next question: how should this BB be created?

Initially, what is now the BB was mashed into the BDS.  In a way, the BB
got created along with the BDS.

The current code lets you create a BB along with a BDS when you need
one, or create a new BB for an existing BDS.  The BB has a name, and the
BDS may have a node-name.

The obvious low-level building blocks would be "create BB", "connect BB
to a BDS" (we have that as x-blockdev-insert-medium), "disconnect BB
from a BDS" (x-blockdev-remove-medium) and "destroy BB"
(x-blockdev-del).
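
As a rough sketch, that sequence could look like the following.  Argument
names are from memory rather than the QAPI schema, the x- commands are
experimental and may still change, and a stand-alone "create BB" command
doesn't exist yet (today blockdev-add creates the BB together with the
BDS when an "id" is given):

    # create a BDS (and, today, also a BB named "drive0")
    { "execute": "blockdev-add",
      "arguments": { "options": {
          "id": "drive0", "node-name": "node0",
          "driver": "qcow2",
          "file": { "driver": "file", "filename": "disk.qcow2" } } } }

    # detach the BDS from the BB (the BB now has "no medium")
    { "execute": "x-blockdev-remove-medium",
      "arguments": { "device": "drive0" } }

    # attach another, previously added BDS to the BB
    { "execute": "x-blockdev-insert-medium",
      "arguments": { "device": "drive0", "node-name": "node1" } }

    # delete the BB
    { "execute": "x-blockdev-del",
      "arguments": { "id": "drive0" } }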

Management applications probably don't mind having to work at this low
level, but for human users, it's cumbersome.  Perhaps the BB should be
created along with the client, at least optionally.

Means to create BBs separately are mostly useful when the BB needs to be
configured by the user: instead of duplicating the BB configuration
within each client, we keep it neatly separate.  We're not aware of
user-configurable knobs, though.

Currently, a client is configured to attach to a BB by specifying a BB
name.  For instance, a device model has a "drive" property that names a
BB.  If we create the BB automatically, we need client configuration to
name a BDS instead, i.e. we need a node-name instead of a BB name.
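
A sketch of the proposed shape (none of this works today; in particular,
a "drive" property that accepts a node-name and has the BB created
automatically behind it is exactly the assumption under discussion):

    { "execute": "blockdev-add",
      "arguments": { "options": {
          "node-name": "disk0",
          "driver": "qcow2",
          "file": { "driver": "file", "filename": "disk.qcow2" } } } }

    { "execute": "device_add",
      "arguments": { "driver": "virtio-blk-pci",
                     "id": "vblk0",
                     "drive": "disk0" } }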

Of course, we'll have to keep the legacy configuration working.  The
"drive" property will have to refer to a BDS, like it did before BBs
were invented.  We could:

* Move the BB name back into the BDS.

* Move the BB name into DriveInfo, where the other legacy stuff lives.
  DriveInfo needs to be changed to hang off BDS rather than BB.

Regardless, dynamic reconfiguration may have to move the name / the
DriveInfo to a different BDS.

We're not entirely sure whether automatic creation of BBs is worthwhile.

Next steps:

* Support multiple BBs sharing the same BDS.

* Restrict BB to only one client of any kind instead of special-casing
  device models.

* Block jobs should go through BB.

* Investigate automatic creation of BB.


= Block filters =

We already have a few block filters:

* blkdebug, blkverify, quorum

Encryption should become another one.
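
For illustration, a filter is just another node in the graph.  An
explicitly configured blkverify node, for instance, can be set up along
these lines (a sketch; option names from memory rather than the QAPI
schema):

    { "execute": "blockdev-add",
      "arguments": { "options": {
          "driver": "blkverify",
          "node-name": "verify0",
          "test": { "driver": "qcow2",
                    "file": { "driver": "file", "filename": "test.qcow2" } },
          "raw":  { "driver": "file", "filename": "reference.raw" } } } }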

Moreover, we have a few things mashed into BDS that should be filters:

* throttle (only at a root, i.e. right below a BB), copy-on-read,
  notifier (for backup block job), detect-zero

Dynamic reconfiguration means altering the BDS graph while it's in use.
Existing mutators:

* snapshot, mirror-complete, commit-complete, x-blockdev-change.

Things become interesting when nodes get implicitly inserted into the
graph, e.g.:

* A backup job inserts its notifier filter

* We create an implicit throttle filter to implement legacy throttling
  configuration

And so forth.  Nothing of the sort exists just yet.

What should happen when the user asks for a mutation at a place where we
have implicit filter(s)?

First, let's examine how such a chain could look.  If we read the
current code correctly, it behaves as if we had a chain

        BB
         |
      throttle
         |
     detect-zero
         |
    copy-on-read
         |
        BDS

Except for the backup job, which behaves as if we had

               backup job
              /
      notifier
         |
     detect-zero
         |
        BDS

We believe that the following cleaned-up filter stack should work:

        BB
         |
      throttle      \
         |           \
    copy-on-read      ) fixed at creation time
         |           /
     detect-zero    /
         |
         |     backup job
         |    /
      notifier      ) dynamically inserted by the job
         |
        BDS

Clients (device model, NBD server) connect through a BB on top.

Snapshot cuts in between the BDS and its implicit filters, like this:

        BB
         |
      throttle
         |
    copy-on-read
         |
     detect-zero
         |
       qcow2            \
         |  \            ) inserted by snapshot
         |   overlay    /
        BDS

The notifier filter is not shown, because we can't currently snapshot while
a block job is active.

Still to do: similar analysis for the other mutators.

* Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
  2015-12-18 13:15 [Qemu-devel] Minutes from the "Stuttgart block Gipfele" Markus Armbruster
@ 2015-12-23  8:33 ` Stefan Hajnoczi
  2016-01-11 15:10   ` Kevin Wolf
  2015-12-23 10:15 ` Fam Zheng
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2015-12-23  8:33 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Fam Zheng, qemu-block, Jeff Cody, qemu-devel,
	Max Reitz, Alberto Garcia, John Snow

On Fri, Dec 18, 2015 at 02:15:38PM +0100, Markus Armbruster wrote:
> What should happen when the user asks for a mutation at a place where we
> have implicit filter(s)?

Please suspend your disbelief for a second:

In principle it's simplest not having implicit filters.  The client
needs to set up throttling nodes or the backup filter explicitly.

Okay, now it's time to tear this apart:

For backwards compatibility it's necessary to support throttling,
copy-on-read, backup notifier, etc.  It may be possible to tag implicit
filter nodes so that mutation operations that wouldn't be possible today
are rejected.  The client must use the explicit syntax to do mutations
on implicit filters.  This is easier said than done; I'm not sure it can
be implemented cleanly.

Another problem is that the backup block job and other operations that
require a single command today could require sequences of low-level
setup commands to create filter nodes.  The QMP client would need to
first create a write notifier filter and then start the backup block job
with the write notifier node name.  It's clumsy.
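
Roughly like this -- everything below is hypothetical: neither a
"write-notifier" filter driver nor a backup option taking its node name
exists, and the node/device names are placeholders:

    (1) create the filter node ("write-notifier" is an invented driver
        name, "node0" stands for the node to be filtered):

        { "execute": "blockdev-add",
          "arguments": { "options": {
              "driver": "write-notifier",
              "node-name": "notify0",
              "file": "node0" } } }

    (2) start the job against that node ("notifier-node" is equally
        invented):

        { "execute": "drive-backup",
          "arguments": { "device": "drive0",
                         "notifier-node": "notify0",
                         "target": "backup.qcow2",
                         "sync": "full" } }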

Stefan

* Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
  2015-12-18 13:15 [Qemu-devel] Minutes from the "Stuttgart block Gipfele" Markus Armbruster
  2015-12-23  8:33 ` Stefan Hajnoczi
@ 2015-12-23 10:15 ` Fam Zheng
  2016-01-04  5:16   ` Stefan Hajnoczi
  1 sibling, 1 reply; 9+ messages in thread
From: Fam Zheng @ 2015-12-23 10:15 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, Jeff Cody, qemu-devel,
	Max Reitz, Stefan Hajnoczi, John Snow

On Fri, 12/18 14:15, Markus Armbruster wrote:
> First, let's examine how such a chain could look.  If we read the
> current code correctly, it behaves as if we had a chain
> 
>         BB
>          |
>       throttle
>          |
>      detect-zero
>          |
>     copy-on-read
>          |
>         BDS
> 
> Except for the backup job, which behaves as if we had
> 
>                backup job
>               /
>       notifier
>          |
>      detect-zero
>          |
>         BDS

Just to brainstorm block jobs in the dynamically reconfigured node graph (not
sure if this is useful):

Nothing stops us from viewing backup as a self-contained filter,

        [backup]
           |
       detect-zero
           |
          BDS

where its .bdrv_co_writev copies out the old data, and at instantiation time
it also creates a long-running coroutine (backup_run).

In that theory, all other block job types, mirror/stream/commit, fit into a
"pull" model, which follows a specified dirty bitmap and copies data from a
specified src BDS. In this pull model,

mirror (device=d0 target=d1) becomes a pull filter:

        BB[d0]            BB[d1]
           |                 |
        throttle        [pull,src=d0]
           |                 |
       detect-zero       detect-zero
           |                 |
      copy-on-read      copy-on-read
           |                 |
          BDS               BDS

Note: the pull reuses most of the block/mirror.c code except the
s->dirty_bitmap will be initialized depending on the block job type. In the
case of mirror, it is trivially the same as now.

stream (device=d0 base=base0) becomes a pull filter:

          BB[d0]
           |
     [pull,src=base0]
           |
       detect-zero
           |
      copy-on-read
           |
          BDS
           |
          BDS[base0]

Note: s->dirty_bitmap will be initialized with the blocks which should be copied
by block-stream.

Similarly, active commit (device=d0 base=base0) becomes a pull filter:

          BB[d0]
           |
       detect-zero
           |
      copy-on-read
           |
          BDS
           |
      [pull,src=d0]
           |
          BDS[base0]

and commit (device=d0 top=base1 base=base0) becomes a pull filter:

          BB[d0]
           |
       detect-zero
           |
      copy-on-read
           |
          BDS
           |
          BDS[base1]
           |
      [pull,src=base1]
           |
          BDS[base0]


If this could work, I'm looking forward to a pretty-looking diffstat if we can
unify the coroutine code of all four jobs. :)

Fam

* Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
  2015-12-23 10:15 ` Fam Zheng
@ 2016-01-04  5:16   ` Stefan Hajnoczi
  2016-01-04  7:28     ` Fam Zheng
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2016-01-04  5:16 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, qemu-devel, Jeff Cody,
	Markus Armbruster, Max Reitz, John Snow

On Wed, Dec 23, 2015 at 06:15:20PM +0800, Fam Zheng wrote:
> On Fri, 12/18 14:15, Markus Armbruster wrote:
> In that theory, all other block job types, mirror/stream/commit, fit into a
> "pull" model, which follows a specified dirty bitmap and copies data from a
> specified src BDS. In this pull model,
> 
> mirror (device=d0 target=d1) becomes a pull filter:
> 
>         BB[d0]            BB[d1]
>            |                 |
>         throttle        [pull,src=d0]
>            |                 |
>        detect-zero       detect-zero
>            |                 |
>       copy-on-read      copy-on-read
>            |                 |
>           BDS               BDS
> 
> Note: the pull reuses most of the block/mirror.c code except the
> s->dirty_bitmap will be initialized depending on the block job type. In the
> case of mirror, it is trivially the same as now.

I don't understand the pull filter.  Is there also a mirror block job
coroutine?

Does anything perform I/O to BB[d1]?

If nothing is writing to/reading from BB[d1], then I don't understand
the purpose of the pull filter.

Stefan

* Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
  2016-01-04  5:16   ` Stefan Hajnoczi
@ 2016-01-04  7:28     ` Fam Zheng
  2016-01-07  5:23       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 9+ messages in thread
From: Fam Zheng @ 2016-01-04  7:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Alberto Garcia, qemu-block, qemu-devel, Jeff Cody,
	Markus Armbruster, Max Reitz, John Snow

On Mon, 01/04 13:16, Stefan Hajnoczi wrote:
> On Wed, Dec 23, 2015 at 06:15:20PM +0800, Fam Zheng wrote:
> > On Fri, 12/18 14:15, Markus Armbruster wrote:
> > In that theory, all other block job types, mirror/stream/commit, fit into a
> > "pull" model, which follows a specified dirty bitmap and copies data from a
> > specified src BDS. In this pull model,
> > 
> > mirror (device=d0 target=d1) becomes a pull filter:
> > 
> >         BB[d0]            BB[d1]
> >            |                 |
> >         throttle        [pull,src=d0]
> >            |                 |
> >        detect-zero       detect-zero
> >            |                 |
> >       copy-on-read      copy-on-read
> >            |                 |
> >           BDS               BDS
> > 
> > Note: the pull reuses most of the block/mirror.c code except the
> > s->dirty_bitmap will be initialized depending on the block job type. In the
> > case of mirror, it is trivially the same as now.
> 
> I don't understand the pull filter.  Is there also a mirror block job
> coroutine?
> 
> Does anything perform I/O to BB[d1]?

Yes, the filter will have a mirror block job coroutine, and it writes to the
BDS behind BB[d1]. This is conceptually different from the "block jobs have
their own BBs" design.

> 
> If nothing is writing to/reading from BB[d1], then I don't understand
> the purpose of the pull filter.
> 
> Stefan

* Re: [Qemu-devel] [Qemu-block] Minutes from the "Stuttgart block Gipfele"
  2016-01-04  7:28     ` Fam Zheng
@ 2016-01-07  5:23       ` Stefan Hajnoczi
  2016-01-07  9:32         ` Fam Zheng
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2016-01-07  5:23 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, qemu-block, qemu-devel, Markus Armbruster, Max Reitz,
	Stefan Hajnoczi

On Mon, Jan 04, 2016 at 03:28:36PM +0800, Fam Zheng wrote:
> On Mon, 01/04 13:16, Stefan Hajnoczi wrote:
> > On Wed, Dec 23, 2015 at 06:15:20PM +0800, Fam Zheng wrote:
> > > On Fri, 12/18 14:15, Markus Armbruster wrote:
> > > In that theory, all other block job types, mirror/stream/commit, fit into a
> > > "pull" model, which follows a specified dirty bitmap and copies data from a
> > > specified src BDS. In this pull model,
> > > 
> > > mirror (device=d0 target=d1) becomes a pull filter:
> > > 
> > >         BB[d0]            BB[d1]
> > >            |                 |
> > >         throttle        [pull,src=d0]
> > >            |                 |
> > >        detect-zero       detect-zero
> > >            |                 |
> > >       copy-on-read      copy-on-read
> > >            |                 |
> > >           BDS               BDS
> > > 
> > > Note: the pull reuses most of the block/mirror.c code except the
> > > s->dirty_bitmap will be initialized depending on the block job type. In the
> > > case of mirror, it is trivially the same as now.
> > 
> > I don't understand the pull filter.  Is there also a mirror block job
> > coroutine?
> > 
> > Does anything perform I/O to BB[d1]?
> 
> Yes, the filter will have a mirror block job coroutine, and it writes to the
> BDS behind BB[d1]. This is conceptually different from the "block jobs have
> their own BBs" design.

Are any of the pull filter's callbacks (.bdrv_co_readv(),
.bdrv_co_writev()) being invoked?

I still don't understand your idea because it seems like the coroutine
is doing all the work and the filter serves no purpose.

Stefan

* Re: [Qemu-devel] [Qemu-block] Minutes from the "Stuttgart block Gipfele"
  2016-01-07  5:23       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2016-01-07  9:32         ` Fam Zheng
  0 siblings, 0 replies; 9+ messages in thread
From: Fam Zheng @ 2016-01-07  9:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-block, qemu-devel, Markus Armbruster, Max Reitz,
	Stefan Hajnoczi

On Thu, 01/07 13:23, Stefan Hajnoczi wrote:
> On Mon, Jan 04, 2016 at 03:28:36PM +0800, Fam Zheng wrote:
> > On Mon, 01/04 13:16, Stefan Hajnoczi wrote:
> > > On Wed, Dec 23, 2015 at 06:15:20PM +0800, Fam Zheng wrote:
> > > > On Fri, 12/18 14:15, Markus Armbruster wrote:
> > > > In that theory, all other block job types, mirror/stream/commit, fit into a
> > > > "pull" model, which follows a specified dirty bitmap and copies data from a
> > > > specified src BDS. In this pull model,
> > > > 
> > > > mirror (device=d0 target=d1) becomes a pull filter:
> > > > 
> > > >         BB[d0]            BB[d1]
> > > >            |                 |
> > > >         throttle        [pull,src=d0]
> > > >            |                 |
> > > >        detect-zero       detect-zero
> > > >            |                 |
> > > >       copy-on-read      copy-on-read
> > > >            |                 |
> > > >           BDS               BDS
> > > > 
> > > > Note: the pull reuses most of the block/mirror.c code except the
> > > > s->dirty_bitmap will be initialized depending on the block job type. In the
> > > > case of mirror, it is trivially the same as now.
> > > 
> > > I don't understand the pull filter.  Is there also a mirror block job
> > > coroutine?
> > > 
> > > Does anything perform I/O to BB[d1]?
> > 
> > Yes, the filter will have a mirror block job coroutine, and it writes to the
> > BDS behind BB[d1]. This is conceptually different from the "block jobs have
> > their own BBs" design.
> 
> Are any of the pull filter's callbacks (.bdrv_co_readv(),
> .bdrv_co_writev()) being invoked?
> 
> I still don't understand your idea because it seems like the coroutine
> is doing all the work and the filter serves no purpose.
> 

OK, it's more of a mental model. My point is to let the dynamic filter
reconfiguration interface manage block jobs in units of filters; the
internal mechanism should be the same for the "pull" filter and the
"mirror" job coroutine.

Fam

* Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
  2015-12-23  8:33 ` Stefan Hajnoczi
@ 2016-01-11 15:10   ` Kevin Wolf
  2016-01-14 11:25     ` Stefan Hajnoczi
  0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2016-01-11 15:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, qemu-block, qemu-devel, Jeff Cody, Markus Armbruster,
	Max Reitz, Alberto Garcia, John Snow

On 23.12.2015 at 09:33, Stefan Hajnoczi wrote:
> On Fri, Dec 18, 2015 at 02:15:38PM +0100, Markus Armbruster wrote:
> > What should happen when the user asks for a mutation at a place where we
> > have implicit filter(s)?
> 
> Please suspend your disbelief for a second:
> 
> In principle it's simplest not having implicit filters.  The client
> needs to set up throttling nodes or the backup filter explicitly.
> 
> Okay, now it's time to tear this apart:
> 
> For backwards compatibility it's necessary to support throttling,
> copy-on-read, backup notifier, etc.  It may be possible to tag implicit
> filter nodes so that mutation operations that wouldn't be possible today
> are rejected.  The client must use the explicit syntax to do mutations
> on implicit filters.  This is easier said than done; I'm not sure it can
> be implemented cleanly.

Yes, backwards compatibility is what complicates things a bit.

> Another problem is that the backup block job and other operations that
> require a single command today could require sequences of low-level
> setup commands to create filter nodes.  The QMP client would need to
> first create a write notifier filter and then start the backup block job
> with the write notifier node name.  It's clumsy.

I don't think splitting it up into several low-level commands is
necessary. We don't expect the user to set any options for the filter
(and if we did, they would probably have to match the same options for
the block job), so we can keep the BDS creation as a part of starting
the block job.

The important part is that the management tool knows that a filter is
going to be inserted and how to address it. In order to achieve that, we
could simply add a 'filter-node-name' option to the QMP command starting
the job.
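
A sketch of what that could look like ('filter-node-name' is the
proposed option; it doesn't exist yet):

    { "execute": "drive-backup",
      "arguments": { "device": "drive0",
                     "target": "backup.qcow2",
                     "sync": "full",
                     "filter-node-name": "backup-filter0" } }

The management tool then knows the filter node's name up front and can
address it in later graph manipulation commands.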

Kevin

* Re: [Qemu-devel] Minutes from the "Stuttgart block Gipfele"
  2016-01-11 15:10   ` Kevin Wolf
@ 2016-01-14 11:25     ` Stefan Hajnoczi
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2016-01-14 11:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Fam Zheng, qemu-block, qemu-devel, Jeff Cody, Markus Armbruster,
	Max Reitz, Alberto Garcia, John Snow

On Mon, Jan 11, 2016 at 04:10:12PM +0100, Kevin Wolf wrote:
> On 23.12.2015 at 09:33, Stefan Hajnoczi wrote:
> > On Fri, Dec 18, 2015 at 02:15:38PM +0100, Markus Armbruster wrote:
> > Another problem is that the backup block job and other operations that
> > require a single command today could require sequences of low-level
> > setup commands to create filter nodes.  The QMP client would need to
> > first create a write notifier filter and then start the backup block job
> > with the write notifier node name.  It's clumsy.
> 
> I don't think splitting it up into several low-level commands is
> necessary. We don't expect the user to set any options for the filter
> (and if we did, they would probably have to match the same options for
> the block job), so we can keep the BDS creation as a part of starting
> the block job.
> 
> The important part is that the management tool knows that a filter is
> going to be inserted and how to address it. In order to achieve that, we
> could simply add a 'filter-node-name' option to the QMP command starting
> the job.

That sounds like a good approach.

Stefan
