* [Qemu-devel] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Daniel P. Berrange @ 2017-11-02 12:02 UTC
  To: qemu-devel, qemu-block, Eric Blake

I've been thinking about a potential design/impl improvement for the way
that OpenStack Nova handles disk images when booting virtual machines, and
thinking if some enhancements to qemu-nbd could be beneficial...

At a high level, OpenStack has a repository of disk images (Glance), and
when we go to boot a VM, Nova copies the disk image out of the repository
onto the local host's image cache. While doing this, Nova may also enlarge
the disk image (e.g. if the original image is 10GB, it may do a qemu-img
resize to 40GB). Nova then creates a qcow2 overlay with a backing file
pointing to its local cache. Multiple VMs can be booted in parallel, each
with their own overlay pointing to the same backing file.
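
For illustration, that image preparation boils down to something like this
minimal sketch (paths and sizes are hypothetical, and the plain 'cp' stands
in for Nova's use of the Glance API):

   # copy the image out of the repository into the local cache
   cp /glance/images/disk1 /var/lib/nova/cache/disk1

   # enlarge the cached image; only metadata changes, so this is quick
   qemu-img resize /var/lib/nova/cache/disk1 40G

   # create a per-VM qcow2 overlay pointing at the shared cache
   qemu-img create -f qcow2 \
       -o backing_file=/var/lib/nova/cache/disk1,backing_fmt=qcow2 \
       vm-a-disk1.qcow2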

The problem with this approach is that VM startup is delayed while we copy
the disk image from the glance repository to the local cache, and again
while we do the image resize (though the latter is pretty quick really,
since it's just changing metadata in the image and/or host filesystem).

One might suggest that we avoid the local disk copy and just point the
VM directly at an NBD server running in the remote image repository, but
this introduces a centralized point of failure. With the local disk copy
VMs can safely continue running even if the image repository dies. Running
from the local image cache can offer better performance too, particularly
if the host has SSD storage.

Conceptually what I want to start with is a 3 layer chain

   master-disk1.qcow2  (qemu-nbd)
          |
          |  (format=raw, proto=nbd)
          |
   cache-disk1.qcow2   (qemu-system-XXX)
          |
          |  (format=qcow2, proto=file)
          |
          +-  vm-a-disk1.qcow2   (qemu-system-XXX)
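
A sketch of how this chain could be wired up, with hypothetical host names
and paths (this relies only on existing features: qemu-nbd named exports
via -x, and QEMU's nbd:// URI syntax):

   # on the image repository host: export the master image read-only
   qemu-nbd -x disk1 -r -t -f qcow2 master-disk1.qcow2

   # on the virt host: create the cache disk backed by the NBD export
   qemu-img create -f qcow2 \
       -o backing_file=nbd://repo.example.com:10809/disk1,backing_fmt=raw \
       cache-disk1.qcow2

   # and a per-VM overlay on top of the local cache
   qemu-img create -f qcow2 \
       -o backing_file=cache-disk1.qcow2,backing_fmt=qcow2 \
       vm-a-disk1.qcow2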

NB vm-?-disk1.qcow2 sizes may differ from the backing file.
Sometimes OS disk images are built with a fairly small root filesystem
size, and the guest OS will grow its root FS to fill the actual disk
size allowed to the specific VM instance.

The cache-disk1.qcow2 is on each local virt host that needs disk1, and is
created when the first VM is launched. Further launched VMs can all use
this same cached disk.  Now the cache-disk1.qcow2 is not useful as-is,
because it has no allocated clusters, so after it's created we need to
be able to stream content into it from master-disk1.qcow2, in parallel
with VM A booting off vm-a-disk1.qcow2.

If there were only a single VM, this would be easy enough, because we
could use the drive-mirror monitor command to pull master-disk1.qcow2 data
into cache-disk1.qcow2 and then remove the backing chain, leaving just

   cache-disk1.qcow2   (qemu-system-XXX)
          |
          |  (format=qcow2, proto=file)
          |
          +-  vm-a-disk1.qcow2  (qemu-system-XXX)
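
In QMP terms the closest existing primitive looks something like this
sketch (node name hypothetical; strictly speaking this is the 'stream'
job rather than 'mirror', a distinction that gets cleared up later in
the thread):

   { "execute": "block-stream",
     "arguments": { "job-id": "populate-cache",
                    "device": "cache-disk1" } }

With no 'base' argument the whole backing chain is copied in, and on
completion the backing link to the master image is dropped.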

The problem is that many VMs want to use cache-disk1.qcow2 as
their disk's backing file, and only one process is permitted to be
writing to a disk backing file at any time. So I can't use drive-mirror
in the QEMU processes to deal with this; all QEMUs must see their
backing file in a consistent, read-only state.

I've been wondering if it is possible to add an extra layer of NBD to
deal with this scenario. i.e. start off with:

   master-disk1.qcow2  (qemu-nbd)
          |
          |  (format=raw, proto=nbd)
          |
   cache-disk1.qcow2  (qemu-nbd)
          |
          |  (format=raw, proto=nbd)
          |
          +-  vm-a-disk1.qcow2  (qemu-system-XXX)
          +-  vm-b-disk1.qcow2  (qemu-system-XXX)
          +-  vm-c-disk1.qcow2  (qemu-system-XXX)


In this model 'cache-disk1.qcow2' would be opened read-write by a
qemu-nbd server process, but exported read-only to QEMU. qemu-nbd
would then do a drive mirror to stream the contents of
master-disk1.qcow2 into its cache-disk1.qcow2, concurrently with
servicing read requests from many QEMUs' vm-*-disk1.qcow2 files
over NBD. When the drive mirror is complete, we would again cut
the backing file to give

   cache-disk1.qcow2  (qemu-nbd)
          |
          |  (format=raw, proto=nbd)
          |
          +-  vm-a-disk1.qcow2  (qemu-system-XXX)
          +-  vm-b-disk1.qcow2  (qemu-system-XXX)
          +-  vm-c-disk1.qcow2  (qemu-system-XXX)

Since qemu-nbd no longer needs to write to cache-disk1.qcow2 at this point,
we can further pivot all the QEMU servers to make vm-*-disk1.qcow2 use
format=qcow2,proto=file, allowing the local qemu-nbd to close the disk
image, and potentially exit (assuming it doesn't have other disks to
service). This would leave

   cache-disk1.qcow2  (qemu-system-XXX)
          |
          |  (format=qcow2, proto=file)
          |
          +-  vm-a-disk1.qcow2  (qemu-system-XXX)
          +-  vm-b-disk1.qcow2  (qemu-system-XXX)
          +-  vm-c-disk1.qcow2  (qemu-system-XXX)
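
In each VM's QMP monitor that re-pointing could look roughly like this
(device/node names hypothetical; note that change-backing-file only
rewrites the backing file string in the image header, so fully detaching
the running chain from the NBD node would need extra plumbing):

   { "execute": "change-backing-file",
     "arguments": { "device": "drive-virtio-disk0",
                    "image-node-name": "vm-a-disk1",
                    "backing-file": "cache-disk1.qcow2" } }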

Conceptually QEMU has all the pieces necessary to support this kind of
approach to disk images, but they're not exposed by qemu-nbd as it has
no QMP interface of its own.

Another more minor issue is that the disk image repository may have
thousands of images in it, and I don't want to be running thousands of
qemu-nbd instances. I'd like one server to export many disks. I could
use iSCSI in the disk image repository instead to deal with that,
only having the qemu-nbd processes running on the local virt host
for the duration of populating cache-disk1.qcow2 from master-disk1.qcow2.
The iSCSI server admin commands are pretty unpleasant to use compared
to QMP though, so it's appealing to use NBD for everything.

After all that long background explanation, what I'm wondering is whether
there is any interest / desire to extend qemu-nbd to have a more advanced
feature set than simply exporting a single disk image which must be listed
at startup time.

 - Ability to start qemu-nbd up with no initial disk image connected
 - Option to have a QMP interface to control qemu-nbd
 - Commands to add / remove individual disk image exports
 - Commands for doing the drive-mirror / backing file pivot
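
To make that concrete, a session against such a qemu-nbd might look like
the following sketch. None of this exists today -- the --qmp flag is
invented for illustration -- though the QMP commands themselves are
borrowed from the system emulator:

   qemu-nbd --qmp unix:/run/qemu-nbd.sock,server     # hypothetical flag

   { "execute": "blockdev-add",
     "arguments": { "driver": "qcow2", "node-name": "cache-disk1",
                    "file": { "driver": "file",
                              "filename": "cache-disk1.qcow2" } } }
   { "execute": "nbd-server-add",
     "arguments": { "device": "cache-disk1", "writable": false } }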

It feels like this wouldn't require significant new functionality in either
QMP or the block layer. It ought to be mostly a case of taking existing QMP
code and wiring it up in qemu-nbd, and only exposing a whitelisted subset
of existing QMP commands related to block backends.

One alternative approach to doing this would be to suggest that we should
instead just spawn qemu-system-x86_64 with '--machine none' and use that
as a replacement for qemu-nbd, since it already has a built-in NBD server
which can do many exports at once and arbitrary block jobs.
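
That alternative already works today, along these lines (a sketch; socket
paths are arbitrary):

   qemu-system-x86_64 -machine none -nodefaults -nographic \
       -qmp unix:/run/qmp.sock,server,nowait \
       -drive if=none,id=cache-disk1,file=cache-disk1.qcow2,format=qcow2

   { "execute": "nbd-server-start",
     "arguments": { "addr": { "type": "unix",
                              "data": { "path": "/run/nbd.sock" } } } }
   { "execute": "nbd-server-add",
     "arguments": { "device": "cache-disk1", "writable": false } }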

I'm concerned that this could end up being a game of whack-a-mole
though, constantly trying to cut out/down all the bits of system emulation
in the machine emulators to get their resource overhead to match the low
overhead of standalone qemu-nbd.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Kashyap Chamarthy @ 2017-11-02 16:40 UTC
  To: Daniel P. Berrange; +Cc: qemu-devel, qemu-block, Eric Blake, mbooth

[Cc: Matt Booth from Nova upstream; so not snipping the email to retain
context for Matt.]

On Thu, Nov 02, 2017 at 12:02:23PM +0000, Daniel P. Berrange wrote:
> I've been thinking about a potential design/impl improvement for the way
> that OpenStack Nova handles disk images when booting virtual machines, and
> thinking if some enhancements to qemu-nbd could be beneficial...

Just read through it; very interesting idea.  A couple of things inline.

> At a high level, OpenStack has a repository of disk images (Glance), and
> when we go to boot a VM, Nova copies the disk image out of the repository
> onto the local host's image cache. While doing this, Nova may also enlarge
> the disk image (e.g. if the original image is 10GB, it may do a qemu-img
> resize to 40GB). Nova then creates a qcow2 overlay with a backing file
> pointing to its local cache. Multiple VMs can be booted in parallel, each
> with their own overlay pointing to the same backing file.
> 
> The problem with this approach is that VM startup is delayed while we copy
> the disk image from the glance repository to the local cache, and again
> while we do the image resize (though the latter is pretty quick really,
> since it's just changing metadata in the image and/or host filesystem).
> 
> One might suggest that we avoid the local disk copy and just point the
> VM directly at an NBD server running in the remote image repository, but
> this introduces a centralized point of failure. With the local disk copy
> VMs can safely continue running even if the image repository dies. Running
> from the local image cache can offer better performance too, particularly
> if the host has SSD storage.
> 
> Conceptually what I want to start with is a 3 layer chain
> 
>    master-disk1.qcow2  (qemu-nbd)
>           |
>           |  (format=raw, proto=nbd)
>           |
>    cache-disk1.qcow2   (qemu-system-XXX)
>           |
>           |  (format=qcow2, proto=file)
>           |
>           +-  vm-a-disk1.qcow2   (qemu-system-XXX)
> 
> NB vm-?-disk1.qcow2 sizes may differ from the backing file.
> Sometimes OS disk images are built with a fairly small root filesystem
> size, and the guest OS will grow its root FS to fill the actual disk
> size allowed to the specific VM instance.
> 
> The cache-disk1.qcow2 is on each local virt host that needs disk1, and is
> created when the first VM is launched. Further launched VMs can all use
> this same cached disk.  Now the cache-disk1.qcow2 is not useful as-is,
> because it has no allocated clusters, so after it's created we need to
> be able to stream content into it from master-disk1.qcow2, in parallel
> with VM A booting off vm-a-disk1.qcow2.
> 
> If there were only a single VM, this would be easy enough, because we
> could use the drive-mirror monitor command to pull master-disk1.qcow2 data
> into cache-disk1.qcow2 and then remove the backing chain, leaving just
> 
>    cache-disk1.qcow2   (qemu-system-XXX)
>           |

Just for my own understanding: in this hypothetical single-VM diagram,
you denote a QEMU binary ("qemu-system-XXX") for 'cache-disk1.qcow2'
because it will be issuing 'drive-mirror' / 'blockdev-mirror' to the
'qemu-nbd' that exported 'master-disk1.qcow2', and "un-chain" it
post completion of the 'mirror' job.  Yes?

>           |  (format=qcow2, proto=file)
>           |
>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> 
> The problem is that many VMs want to use cache-disk1.qcow2 as
> their disk's backing file, and only one process is permitted to be
> writing to a disk backing file at any time.

Can you explain a bit more about how many VMs are trying to write to
the same backing file 'cache-disk1.qcow2'?  I'd assume it's
just the "immutable" local backing store (once the previous 'mirror' job
is completed), based on which Nova creates a qcow2 overlay for each
instance it boots.

When I pointed this e-mail of yours to Matt Booth on the Freenode Nova IRC
channel, he said the intermediate image (cache-disk1.qcow2) is a COR
(Copy-On-Read).  I realize what COR is -- every time you read a cluster
from the backing file, you write it locally, to avoid reading it
again.

> So I can't use drive-mirror
> in the QEMU processes to deal with this; all QEMUs must see their
> backing file in a consistent, read-only state.
> 
> I've been wondering if it is possible to add an extra layer of NBD to
> deal with this scenario. i.e. start off with:
> 
>    master-disk1.qcow2  (qemu-nbd)
>           |
>           |  (format=raw, proto=nbd)
>           |
>    cache-disk1.qcow2  (qemu-nbd)
>           |
>           |  (format=raw, proto=nbd)
>           |
>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
>           +-  vm-b-disk1.qcow2  (qemu-system-XXX)
>           +-  vm-c-disk1.qcow2  (qemu-system-XXX)
> 
> 
> In this model 'cache-disk1.qcow2' would be opened read-write by a
> qemu-nbd server process, but exported read-only to QEMU. qemu-nbd
> would then do a drive mirror to stream the contents of
> master-disk1.qcow2 into its cache-disk1.qcow2, concurrently with
> servicing read requests from many QEMUs' vm-*-disk1.qcow2 files
> over NBD. When the drive mirror is complete, we would again cut
> the backing file to give
> 
>    cache-disk1.qcow2  (qemu-nbd)
>           |
>           |  (format=raw, proto=nbd)
>           |
>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
>           +-  vm-b-disk1.qcow2  (qemu-system-XXX)
>           +-  vm-c-disk1.qcow2  (qemu-system-XXX)
> 
> Since qemu-nbd no longer needs to write to cache-disk1.qcow2 at this point,
> we can further pivot all the QEMU servers to make vm-*-disk1.qcow2 use
> format=qcow2,proto=file, allowing the local qemu-nbd to close the disk
> image, and potentially exit (assuming it doesn't have other disks to
> service). This would leave
> 
>    cache-disk1.qcow2  (qemu-system-XXX)
>           |
>           |  (format=qcow2, proto=file)
>           |
>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
>           +-  vm-b-disk1.qcow2  (qemu-system-XXX)
>           +-  vm-c-disk1.qcow2  (qemu-system-XXX)
> 
> Conceptually QEMU has all the pieces necessary to support this kind of
> approach to disk images, but they're not exposed by qemu-nbd as it has
> no QMP interface of its own.
> 
> Another more minor issue is that the disk image repository may have
> thousands of images in it, and I don't want to be running thousands of
> qemu-nbd instances. I'd like one server to export many disks. I could
> use iSCSI in the disk image repository instead to deal with that,
> only having the qemu-nbd processes running on the local virt host
> for the duration of populating cache-disk1.qcow2 from master-disk1.qcow2.
> The iSCSI server admin commands are pretty unpleasant to use compared
> to QMP though, so it's appealing to use NBD for everything.
> 
> After all that long background explanation, what I'm wondering is whether
> there is any interest / desire to extend qemu-nbd to have a more advanced
> feature set than simply exporting a single disk image which must be listed
> at startup time.
> 
>  - Ability to start qemu-nbd up with no initial disk image connected
>  - Option to have a QMP interface to control qemu-nbd
>  - Commands to add / remove individual disk image exports
>  - Commands for doing the drive-mirror / backing file pivot
> 
> It feels like this wouldn't require significant new functionality in either
> QMP or the block layer. It ought to be mostly a case of taking existing QMP
> code and wiring it up in qemu-nbd, and only exposing a whitelisted subset
> of existing QMP commands related to block backends.
> 
> One alternative approach to doing this would be to suggest that we should
> instead just spawn qemu-system-x86_64 with '--machine none' and use that
> as a replacement for qemu-nbd, since it already has a built-in NBD server
> which can do many exports at once and arbitrary block jobs.
> 
> I'm concerned that this could end up being a game of whack-a-mole
> though, constantly trying to cut out/down all the bits of system emulation
> in the machine emulators to get their resource overhead to match the low
> overhead of standalone qemu-nbd.
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 

-- 
/kashyap


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Daniel P. Berrange @ 2017-11-02 17:04 UTC
  To: Kashyap Chamarthy; +Cc: qemu-devel, qemu-block, Eric Blake, mbooth

On Thu, Nov 02, 2017 at 05:40:28PM +0100, Kashyap Chamarthy wrote:
> [Cc: Matt Booth from Nova upstream; so not snipping the email to retain
> context for Matt.]
> 
> On Thu, Nov 02, 2017 at 12:02:23PM +0000, Daniel P. Berrange wrote:
> > I've been thinking about a potential design/impl improvement for the way
> > that OpenStack Nova handles disk images when booting virtual machines, and
> > thinking if some enhancements to qemu-nbd could be beneficial...
> 
> Just read through it; very interesting idea.  A couple of things inline.
> 
> > At a high level, OpenStack has a repository of disk images (Glance), and
> > when we go to boot a VM, Nova copies the disk image out of the repository
> > onto the local host's image cache. While doing this, Nova may also enlarge
> > the disk image (e.g. if the original image is 10GB, it may do a qemu-img
> > resize to 40GB). Nova then creates a qcow2 overlay with a backing file
> > pointing to its local cache. Multiple VMs can be booted in parallel, each
> > with their own overlay pointing to the same backing file.
> > 
> > The problem with this approach is that VM startup is delayed while we copy
> > the disk image from the glance repository to the local cache, and again
> > while we do the image resize (though the latter is pretty quick really,
> > since it's just changing metadata in the image and/or host filesystem).
> > 
> > One might suggest that we avoid the local disk copy and just point the
> > VM directly at an NBD server running in the remote image repository, but
> > this introduces a centralized point of failure. With the local disk copy
> > VMs can safely continue running even if the image repository dies. Running
> > from the local image cache can offer better performance too, particularly
> > if the host has SSD storage.
> > 
> > Conceptually what I want to start with is a 3 layer chain
> > 
> >    master-disk1.qcow2  (qemu-nbd)
> >           |
> >           |  (format=raw, proto=nbd)
> >           |
> >    cache-disk1.qcow2   (qemu-system-XXX)
> >           |
> >           |  (format=qcow2, proto=file)
> >           |
> >           +-  vm-a-disk1.qcow2   (qemu-system-XXX)
> > 
> > NB vm-?-disk1.qcow2 sizes may differ from the backing file.
> > Sometimes OS disk images are built with a fairly small root filesystem
> > size, and the guest OS will grow its root FS to fill the actual disk
> > size allowed to the specific VM instance.
> > 
> > The cache-disk1.qcow2 is on each local virt host that needs disk1, and is
> > created when the first VM is launched. Further launched VMs can all use
> > this same cached disk.  Now the cache-disk1.qcow2 is not useful as-is,
> > because it has no allocated clusters, so after it's created we need to
> > be able to stream content into it from master-disk1.qcow2, in parallel
> > with VM A booting off vm-a-disk1.qcow2.
> > 
> > If there were only a single VM, this would be easy enough, because we
> > could use the drive-mirror monitor command to pull master-disk1.qcow2 data
> > into cache-disk1.qcow2 and then remove the backing chain, leaving just
> > 
> >    cache-disk1.qcow2   (qemu-system-XXX)
> >           |
> 
> Just for my own understanding: in this hypothetical single-VM diagram,
> you denote a QEMU binary ("qemu-system-XXX") for 'cache-disk1.qcow2'
> because it will be issuing 'drive-mirror' / 'blockdev-mirror' to the
> 'qemu-nbd' that exported 'master-disk1.qcow2', and "un-chain" it
> post completion of the 'mirror' job.  Yes?

In this diagram the same QEMU process has both cache-disk1.qcow2 and
vm-a-disk1.qcow2 open - it's just a regular backing file setup.

> 
> >           |  (format=qcow2, proto=file)
> >           |
> >           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> > 
> > The problem is that many VMs want to use cache-disk1.qcow2 as
> > their disk's backing file, and only one process is permitted to be
> > writing to a disk backing file at any time.
> 
> Can you explain a bit more about how many VMs are trying to write to
> the same backing file 'cache-disk1.qcow2'?  I'd assume it's
> just the "immutable" local backing store (once the previous 'mirror' job
> is completed), based on which Nova creates a qcow2 overlay for each
> instance it boots.

An arbitrary number of vm-*-disk1.qcow2 files could exist, all using
the same cache-disk1.qcow2 image. It's only limited by how many VMs
you can fit on the host. By definition you can only ever have a single
process writing to a qcow2 file though, otherwise corruption will quickly
follow.

> When I pointed this e-mail of yours to Matt Booth on the Freenode Nova IRC
> channel, he said the intermediate image (cache-disk1.qcow2) is a COR
> (Copy-On-Read).  I realize what COR is -- every time you read a cluster
> from the backing file, you write it locally, to avoid reading it
> again.

qcow2 doesn't give you COR, only COW. So every read request would have a miss
in cache-disk1.qcow2 and thus have to be fetched from master-disk1.qcow2. The
use of drive-mirror to pull master-disk1.qcow2 contents into cache-disk1.qcow2
makes up for the lack of COR by populating cache-disk1.qcow2 in the background.

> > So I can't use drive-mirror
> > in the QEMU processes to deal with this; all QEMUs must see their
> > backing file in a consistent, read-only state.
> > 
> > I've been wondering if it is possible to add an extra layer of NBD to
> > deal with this scenario. i.e. start off with:
> > 
> >    master-disk1.qcow2  (qemu-nbd)
> >           |
> >           |  (format=raw, proto=nbd)
> >           |
> >    cache-disk1.qcow2  (qemu-nbd)
> >           |
> >           |  (format=raw, proto=nbd)
> >           |
> >           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> >           +-  vm-b-disk1.qcow2  (qemu-system-XXX)
> >           +-  vm-c-disk1.qcow2  (qemu-system-XXX)
> > 
> > 
> > In this model 'cache-disk1.qcow2' would be opened read-write by a
> > qemu-nbd server process, but exported read-only to QEMU. qemu-nbd
> > would then do a drive mirror to stream the contents of
> > master-disk1.qcow2 into its cache-disk1.qcow2, concurrently with
> > servicing read requests from many QEMUs' vm-*-disk1.qcow2 files
> > over NBD. When the drive mirror is complete, we would again cut
> > the backing file to give
> > 
> >    cache-disk1.qcow2  (qemu-nbd)
> >           |
> >           |  (format=raw, proto=nbd)
> >           |
> >           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> >           +-  vm-b-disk1.qcow2  (qemu-system-XXX)
> >           +-  vm-c-disk1.qcow2  (qemu-system-XXX)
> > 
> > Since qemu-nbd no longer needs to write to cache-disk1.qcow2 at this point,
> > we can further pivot all the QEMU servers to make vm-*-disk1.qcow2 use
> > format=qcow2,proto=file, allowing the local qemu-nbd to close the disk
> > image, and potentially exit (assuming it doesn't have other disks to
> > service). This would leave
> > 
> >    cache-disk1.qcow2  (qemu-system-XXX)
> >           |
> >           |  (format=qcow2, proto=file)
> >           |
> >           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> >           +-  vm-b-disk1.qcow2  (qemu-system-XXX)
> >           +-  vm-c-disk1.qcow2  (qemu-system-XXX)
> > 
> > Conceptually QEMU has all the pieces necessary to support this kind of
> > approach to disk images, but they're not exposed by qemu-nbd as it has
> > no QMP interface of its own.
> > 
> > Another more minor issue is that the disk image repository may have
> > thousands of images in it, and I don't want to be running thousands of
> > qemu-nbd instances. I'd like one server to export many disks. I could
> > use iSCSI in the disk image repository instead to deal with that,
> > only having the qemu-nbd processes running on the local virt host
> > for the duration of populating cache-disk1.qcow2 from master-disk1.qcow2.
> > The iSCSI server admin commands are pretty unpleasant to use compared
> > to QMP though, so it's appealing to use NBD for everything.
> > 
> > After all that long background explanation, what I'm wondering is whether
> > there is any interest / desire to extend qemu-nbd to have a more advanced
> > feature set than simply exporting a single disk image which must be listed
> > at startup time.
> > 
> >  - Ability to start qemu-nbd up with no initial disk image connected
> >  - Option to have a QMP interface to control qemu-nbd
> >  - Commands to add / remove individual disk image exports
> >  - Commands for doing the drive-mirror / backing file pivot
> > 
> > It feels like this wouldn't require significant new functionality in either
> > QMP or the block layer. It ought to be mostly a case of taking existing QMP
> > code and wiring it up in qemu-nbd, and only exposing a whitelisted subset
> > of existing QMP commands related to block backends.
> > 
> > One alternative approach to doing this would be to suggest that we should
> > instead just spawn qemu-system-x86_64 with '--machine none' and use that
> > as a replacement for qemu-nbd, since it already has a built-in NBD server
> > which can do many exports at once and arbitrary block jobs.
> > 
> > I'm concerned that this could end up being a game of whack-a-mole
> > though, constantly trying to cut out/down all the bits of system emulation
> > in the machine emulators to get their resource overhead to match the low
> > overhead of standalone qemu-nbd.
> > 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Eric Blake @ 2017-11-02 17:50 UTC
  To: Daniel P. Berrange, Kashyap Chamarthy; +Cc: qemu-devel, qemu-block, mbooth


On 11/02/2017 12:04 PM, Daniel P. Berrange wrote:

> vm-a-disk1.qcow2 open - it's just a regular backing file setup.
> 
>>
>>>           |  (format=qcow2, proto=file)
>>>           |
>>>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
>>>
>>> The problem is that many VMs want to use cache-disk1.qcow2 as
>>> their disk's backing file, and only one process is permitted to be
>>> writing to a disk backing file at any time.
>>
>> Can you explain a bit more about how many VMs are trying to write to
>> the same backing file 'cache-disk1.qcow2'?  I'd assume it's
>> just the "immutable" local backing store (once the previous 'mirror' job
>> is completed), based on which Nova creates a qcow2 overlay for each
>> instance it boots.
> 
> An arbitrary number of vm-*-disk1.qcow2 files could exist, all using
> the same cache-disk1.qcow2 image. It's only limited by how many VMs
> you can fit on the host. By definition you can only ever have a single
> process writing to a qcow2 file though, otherwise corruption will quickly
> follow.

So if I'm following, your argument is that the local qemu-nbd process is
the only one writing to the file, while all other overlays are backed by
the NBD process; and then as any one of the VMs reads, the qemu-nbd
process pulls those sectors into the local storage as a result.

> 
>> When I pointed this e-mail of yours to Matt Booth on the Freenode Nova IRC
>> channel, he said the intermediate image (cache-disk1.qcow2) is a COR
>> (Copy-On-Read).  I realize what COR is -- every time you read a cluster
>> from the backing file, you write it locally, to avoid reading it
>> again.
> 
> qcow2 doesn't give you COR, only COW. So every read request would have a miss
> in cache-disk1.qcow2 and thus have to be fetched from master-disk1.qcow2. The
> use of drive-mirror to pull master-disk1.qcow2 contents into cache-disk1.qcow2
> makes up for the lack of COR by populating cache-disk1.qcow2 in the background.

Ah, but qcow2 (or more precisely, any protocol qemu BDS) DOES have
copy-on-read, built in to the block layer.  See qemu-iotest 197 for an
example of it in use.  If we use COR correctly, then every initial read
request will miss in the cache, but the COR will populate the cache
without having to have a background drive-mirror.  A background
drive-mirror may still be useful to populate the cache faster, but COR
populates the parts you want now regardless of how fast the background
task is running.
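
For reference, a minimal sketch of enabling it (where the option is set
determines which layer gets populated; the spelling below is the existing
-drive one):

   qemu-system-x86_64 ... \
       -drive file=vm-a-disk1.qcow2,format=qcow2,if=virtio,copy-on-read=on

Every read that misses in the active layer is fetched from the backing
chain and then written back into the active layer.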

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Paolo Bonzini @ 2017-11-02 18:06 UTC
  To: Daniel P. Berrange, qemu-devel, qemu-block, Eric Blake

On 02/11/2017 13:02, Daniel P. Berrange wrote:
> 
> After all that long background explanation, what I'm wondering is whether
> there is any interest / desire to extend qemu-nbd to have a more advanced
> feature set than simply exporting a single disk image which must be listed
> at startup time.
> 
>  - Ability to start qemu-nbd up with no initial disk image connected
>  - Option to have a QMP interface to control qemu-nbd
>  - Commands to add / remove individual disk image exports
>  - Commands for doing the drive-mirror / backing file pivot
> 
> It feels like this wouldn't require significant new functionality in either
> QMP or the block layer. It ought to be mostly a case of taking existing QMP
> code and wiring it up in qemu-nbd, and only exposing a whitelisted subset
> of existing QMP commands related to block backends.

I think adding a QMP interface is a good idea; if you're using Unix
sockets I don't see much benefit in using multiple disk image exports
from a single qemu-nbd instance, but maybe you aren't?

At this point it does indeed feel a lot like --machine none.  Perhaps we
should just have a new binary /usr/bin/qemu-noguest instead of
augmenting qemu-nbd.

Would you also add rerror/werror support to qemu-nbd at the same time?

Paolo


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Max Reitz @ 2017-11-02 21:38 UTC
  To: Daniel P. Berrange, qemu-devel, qemu-block, Eric Blake,
	Markus Armbruster, Kevin Wolf


On 2017-11-02 13:02, Daniel P. Berrange wrote:
[...]
> One alternative approach to doing this would be to suggest that we should
> instead just spawn qemu-system-x86_64 with '--machine none' and use that
> as a replacement for qemu-nbd, since it already has a built-in NBD server
> which can do many exports at once and arbitrary block jobs.

As far as I know, we had wanted to add QMP support to qemu-nbd maybe one
or two years ago, but nobody ever did it.

I've had some discussions about this with Markus and Kevin at KVM Forum.
 They appeared to strongly prefer this approach.  I agree with them that
design-wise, a qemu with no machine at all (and not even -M none) and
early QMP is the way we want to go anyway, and then this would be the
correct tool to use.

> I'm concerned that this could end up being a game of whack-a-mole
> though, constantly trying to cut out/down all the bits of system emulation
> in the machine emulators to get their resource overhead to match the low
> overhead of standalone qemu-nbd.

However, I personally share your concern.  Especially, I think that
getting to a point where we can have no machine at all and early QMP
will take much longer than just adding QMP to qemu-nbd -- or adding a
qmp command to qemu-img (because you can add NBD exports through QMP, so
qemu-nbd's hardcoded focus on NBD exports seems kind of weird then)[1].

I'm very much torn here.  There are two approaches: Stripping fat qemu
down, or fattening lean qemu-img (?) up.  The latter is very simple.
The former is what we want anyway.

Markus says it's not too hard to strip down qemu.  If that is true,
there is no point in fattening qemu-img now.  I personally am not
convinced at all, but he knows the state of that project much better
than me, so I cannot reasonably disagree.

So my mail is more of a CC to Markus and Kevin -- but I think both are
on PTO right now.

I guess the main question is: If someone were to introduce a qemu-img
QMP subcommand -- would it be accepted? :-)

Max


[1] Also, adding QMP should trivially add block jobs and multiple
exports to whatever tool we are talking about (in fact, qemu-img already
does perform the mirror block job for committing).




* Re: [Qemu-devel] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Fam Zheng @ 2017-11-03  6:00 UTC
  To: Daniel P. Berrange; +Cc: qemu-devel, qemu-block, Eric Blake, pbonzini, stefanha

On Thu, 11/02 12:02, Daniel P. Berrange wrote:
> One alternative approach to doing this would be to suggest that we should
> instead just spawn qemu-system-x86_64 with '--machine none' and use that
> as a replacement for qemu-nbd, since it already has a built-in NBD server
> which can do many exports at once and arbitrary block jobs.

Here is a crazy idea from the KVM Forum discussions that may relate, so I'll
mention it here: we could move the QEMU block layer to a separate program, and
guests can use vhost-user-{blk,scsi} for I/O. It would look something like this:


   master-disk1.qcow2  (qemu-nbd)
          ^
          |  backing
          |
   cache-disk1.qcow2   (qemu-vhost)     <-------------.
          ^                                           |
          |  backing                                  |  backing
          |                                           |
          +-  vm-a-disk1.qcow2   (qemu-vhost)         +-  vm-a-disk2.qcow2   (qemu-vhost)
                    ^                                             ^
                    |  vhost-user-blk                             |  vhost-user-blk
                    |                                             |
                    +- VM-1 (qemu-system-XXX)                     +- VM-2 (qemu-system-XXX)


So on the compute node, there will be N qemu-system-XXX processes (where N is
the number of VMs) and 1 qemu-vhost process.

The hypothetical qemu-vhost program needs to support QMP as well and it runs the
COR/mirroring jobs from master disk to cache disk, just like what you propose to
do with the extended qemu-nbd. The only difference is replacing the local NBD
with vhost-user, which is more efficient.
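
For reference, the guest side of such a vhost-user-blk connection would
look roughly like this (a sketch: vhost-user-blk-pci is not merged at the
time of this thread, the socket path is made up, and the shared-memory
backend is needed so the external process can access guest RAM):

   qemu-system-XXX -m 1G \
       -object memory-backend-file,id=mem0,size=1G,mem-path=/dev/shm,share=on \
       -numa node,memdev=mem0 \
       -chardev socket,id=vub0,path=/run/qemu-vhost/vm-a-disk1.sock \
       -device vhost-user-blk-pci,chardev=vub0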

Fam


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Stefan Hajnoczi @ 2017-11-03  9:59 UTC
  To: Max Reitz
  Cc: Daniel P. Berrange, qemu-devel, qemu-block, Eric Blake,
	Markus Armbruster, Kevin Wolf


On Thu, Nov 02, 2017 at 10:38:16PM +0100, Max Reitz wrote:
> On 2017-11-02 13:02, Daniel P. Berrange wrote:
> [...]
> > One alternative approach to doing this would be to suggest that we should
> > instead just spawn qemu-system-x86_64 with '--machine none' and use that
> > as a replacement for qemu-nbd, since it already has a built-in NBD server
> > which can do many exports at once and arbitrary block jobs.
> 
> As far as I know, we had wanted to add QMP support to qemu-nbd maybe one
> or two years ago, but nobody ever did it.

Benoît Canet worked on this in 2014:
https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02533.html

> I've had some discussions about this with Markus and Kevin at KVM Forum.
>  They appeared to strongly prefer this approach.  I agree with them that
> design-wise, a qemu with no machine at all (and not even -M none) and
> early QMP is the way we want to go anyway, and then this would be the
> correct tool to use.
> 
> > I'm concerned that this could end up being a game of whack-a-mole
> > though, constantly trying to cut out/down all the bits of system emulation
> > in the machine emulators to get their resource overhead to match the low
> > overhead of standalone qemu-nbd.
> 
> However, I personally share your concern.  Especially, I think that
> getting to a point where we can have no machine at all and early QMP
> will take much longer than just adding QMP to qemu-nbd -- or adding a
> qmp command to qemu-img (because you can add NBD exports through QMP, so
> qemu-nbd's hardcoded focus on NBD exports seems kind of weird then)[1].
> 
> I'm very much torn here.  There are two approaches: Stripping fat qemu
> down, or fattening lean qemu-img (?) up.  The latter is very simple.
> The former is what we want anyway.
> 
> Markus says it's not too hard to strip down qemu.  If that is true,
> there is no point in fattening qemu-img now.  I personally am not
> convinced at all, but he knows the state of that project much better
> than me, so I cannot reasonably disagree.
> 
> So my mail is more of a CC to Markus and Kevin -- but I think both are
> on PTO right now.
> 
> I guess the main question is: If someone were to introduce a qemu-img
> QMP subcommand -- would it be accepted? :-)

I'm in favor of the -machine none approach since it seems inevitable
that both qemu-img and qemu-nbd will become QMP interfaces.  qemu-io has
already become a monitor interface...

> [1] Also, adding QMP should trivially add block jobs and multiple
> exports to whatever tool we are talking about (in fact, qemu-img already
> does perform the mirror block job for committing).
> 



* Re: [Qemu-devel] [Qemu-block]  RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Stefan Hajnoczi @ 2017-11-03 10:01 UTC
  To: Fam Zheng; +Cc: Daniel P. Berrange, stefanha, pbonzini, qemu-devel, qemu-block


On Fri, Nov 03, 2017 at 02:00:46PM +0800, Fam Zheng wrote:
> On Thu, 11/02 12:02, Daniel P. Berrange wrote:
> > One alternative approach to doing this would be to suggest that we should
> > instead just spawn qemu-system-x86_64 with '--machine none' and use that
> > as a replacement for qemu-nbd, since it already has a built-in NBD server
> > which can do many exports at once and arbitrary block jobs.
> 
> Here is a crazy idea from the KVM Forum discussions that may relate, so I'll
> mention it here: we could move the QEMU block layer to a separate program, and
> guests can use vhost-user-{blk,scsi} for I/O. It would look something like this:
> 
> 
>    master-disk1.qcow2  (qemu-nbd)
>           ^
>           |  backing
>           |
>    cache-disk1.qcow2   (qemu-vhost)     <-------------.
>           ^                                           |
>           |  backing                                  |  backing
>           |                                           |
>           +-  vm-a-disk1.qcow2   (qemu-vhost)         +-  vm-a-disk2.qcow2   (qemu-vhost)
>                     ^                                             ^
>                     |  vhost-user-blk                             |  vhost-user-blk
>                     |                                             |
>                     +- VM-1 (qemu-system-XXX)                     +- VM-2 (qemu-system-XXX)
> 
> 
> So on the compute node, there will be N qemu-system-XXX processes (where N is
> the number of VMs) and 1 qemu-vhost process.
> 
> The hypothetical qemu-vhost program needs to support QMP as well and it runs the
> COR/mirroring jobs from master disk to cache disk, just like what you propose to
> do with the extended qemu-nbd. The only difference is replacing the local NBD
> with vhost-user, which is more efficient.

The vhost-user part can be added later.  I think the first step is
whether to add a QMP interface to qemu-nbd or use -machine none.

Stefan



* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Stefan Hajnoczi @ 2017-11-03 10:04 UTC
  To: Eric Blake
  Cc: Daniel P. Berrange, Kashyap Chamarthy, qemu-devel, qemu-block, mbooth


On Thu, Nov 02, 2017 at 12:50:39PM -0500, Eric Blake wrote:
> On 11/02/2017 12:04 PM, Daniel P. Berrange wrote:
> 
> > vm-a-disk1.qcow2 open - it's just a regular backing file setup.
> > 
> >>
> >>>           |  (format=qcow2, proto=file)
> >>>           |
> >>>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> >>>
> >>> The problem is that many VMs want to use cache-disk1.qcow2 as
> >>> their disk's backing file, and only one process is permitted to be
> >>> writing to a disk backing file at any time.
> >>
> >> Can you explain a bit more about how many VMs are trying to write to
> >> the same backing file 'cache-disk1.qcow2'?  I'd assume it's
> >> just the "immutable" local backing store (once the previous 'mirror' job
> >> is completed), based on which Nova creates a qcow2 overlay for each
> >> instance it boots.
> > 
> > An arbitrary number of vm-*-disk1.qcow2 files could exist, all using
> > the same cache-disk1.qcow2 image. It's only limited by how many VMs
> > you can fit on the host. By definition you can only ever have a single
> > process writing to a qcow2 file though, otherwise corruption will quickly
> > follow.
> 
> So if I'm following, your argument is that the local qemu-nbd process is
> the only one writing to the file, while all other overlays are backed by
> the NBD process; and then as any one of the VMs reads, the qemu-nbd
> process pulls those sectors into the local storage as a result.
> 
> > 
> >> When I pointed this e-mail of yours to Matt Booth on the Freenode Nova IRC
> >> channel, he said the intermediate image (cache-disk1.qcow2) is a COR
> >> (Copy-On-Read).  I realize what COR is -- every time you read a cluster
> >> from the backing file, you write it locally, to avoid reading it
> >> again.
> > 
> > qcow2 doesn't give you COR, only COW. So every read request would have a miss
> > in cache-disk1.qcow2 and thus have to be fetched from master-disk1.qcow2. The
> > use of drive-mirror to pull master-disk1.qcow2 contents into cache-disk1.qcow2
> > makes up for the lack of COR by populating cache-disk1.qcow2 in the background.
> 
> Ah, but qcow2 (or more precisely, any protocol qemu BDS) DOES have
> copy-on-read, built in to the block layer.  See qemu-iotest 197 for an
> example of it in use.  If we use COR correctly, then every initial read
> request will miss in the cache, but the COR will populate the cache
> without having to have a background drive-mirror.  A background
> drive-mirror may still be useful to populate the cache faster, but COR
> populates the parts you want now regardless of how fast the background
> task is running.

-drive copy-on-read=on and the stream block job were added exactly for
this provisioning use case.  They can be used together.

I was a little surprised that the discussion has been about the mirror
job rather than the stream job.

One difference between stream and mirror is that stream doesn't pivot
the image file on completion.  Instead it clears the backing file so the
link to the remote server no longer exists.
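
As a sketch of that difference (device name hypothetical): mirror raises
a BLOCK_JOB_READY event and waits for an explicit pivot,

   { "execute": "drive-mirror",
     "arguments": { "device": "drive0",
                    "target": "/var/lib/nova/cache/disk1",
                    "sync": "full" } }
   ... BLOCK_JOB_READY ...
   { "execute": "block-job-complete",
     "arguments": { "device": "drive0" } }

whereas block-stream simply runs to completion and clears the
backing-file link.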

Stefan



* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Daniel P. Berrange @ 2017-11-03 10:16 UTC
  To: Stefan Hajnoczi
  Cc: Eric Blake, Kashyap Chamarthy, qemu-devel, qemu-block, mbooth

On Fri, Nov 03, 2017 at 10:04:57AM +0000, Stefan Hajnoczi wrote:
> On Thu, Nov 02, 2017 at 12:50:39PM -0500, Eric Blake wrote:
> > On 11/02/2017 12:04 PM, Daniel P. Berrange wrote:
> > 
> > > vm-a-disk1.qcow2 open - it's just a regular backing file setup.
> > > 
> > >>
> > >>>           |  (format=qcow2, proto=file)
> > >>>           |
> > >>>           +-  vm-a-disk1.qcow2  (qemu-system-XXX)
> > >>>
> > >>> The problem is that many VMs want to use cache-disk1.qcow2 as
> > >>> their disk's backing file, and only one process is permitted to be
> > >>> writing to a disk backing file at any time.
> > >>
> > >> Can you explain a bit more about how many VMs are trying to write to
> > >> the same backing file 'cache-disk1.qcow2'?  I'd assume it's
> > >> just the "immutable" local backing store (once the previous 'mirror' job
> > >> is completed), based on which Nova creates a qcow2 overlay for each
> > >> instance it boots.
> > > 
> > > An arbitrary number of vm-*-disk1.qcow2 files could exist, all using
> > > the same cache-disk1.qcow2 image. It's only limited by how many VMs
> > > you can fit on the host. By definition you can only ever have a single
> > > process writing to a qcow2 file though, otherwise corruption will quickly
> > > follow.
> > 
> > So if I'm following, your argument is that the local qemu-nbd process is
> > the only one writing to the file, while all other overlays are backed by
> > the NBD process; and then as any one of the VMs reads, the qemu-nbd
> > process pulls those sectors into the local storage as a result.
> > 
> > > 
> > >> When I pointed this e-mail of yours to Matt Booth on the Freenode Nova IRC
> > >> channel, he said the intermediate image (cache-disk1.qcow2) is a COR
> > >> (Copy-On-Read).  I realize what COR is -- every time you read a cluster
> > >> from the backing file, you write it locally, to avoid reading it
> > >> again.
> > > 
> > > qcow2 doesn't give you COR, only COW. So every read request would have a miss
> > > in cache-disk1.qcow2 and thus have to be fetched from master-disk1.qcow2. The
> > > use of drive-mirror to pull master-disk1.qcow2 contents into cache-disk1.qcow2
> > > makes up for the lack of COR by populating cache-disk1.qcow2 in the background.
> > 
> > Ah, but qcow2 (or more precisely, any protocol qemu BDS) DOES have
> > copy-on-read, built in to the block layer.  See qemu-iotest 197 for an
> > example of it in use.  If we use COR correctly, then every initial read
> > request will miss in the cache, but the COR will populate the cache
> > without having to have a background drive-mirror.  A background
> > drive-mirror may still be useful to populate the cache faster, but COR
> > populates the parts you want now regardless of how fast the background
> > task is running.
> 
> -drive copy-on-read=on and the stream block job were added exactly for
> this provisioning use case.  They can be used together.
> 
> I was a little surprised that the discussion has been about the mirror
> job rather than the stream job.
> 
> One difference between stream and mirror is that stream doesn't pivot
> the image file on completion.  Instead it clears the backing file so the
> link to the remote server no longer exists.

The confusion between 'stream' and 'mirror' is entirely down to my lack of
understanding. Just substitute whichever makes sense :-)

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Markus Armbruster @ 2017-11-09 13:54 UTC
  To: Max Reitz
  Cc: Daniel P. Berrange, qemu-devel, qemu-block, Eric Blake, Kevin Wolf

Max Reitz <mreitz@redhat.com> writes:

> On 2017-11-02 13:02, Daniel P. Berrange wrote:
> [...]
>> One alternative approach to doing this would be to suggest that we should
>> instead just spawn qemu-system-x86_64 with '--machine none' and use that
>> as a replacement for qemu-nbd, since it already has a built-in NBD server
>> which can do many exports at once and arbitrary block jobs.
>
> As far as I know, we had wanted to add QMP support to qemu-nbd maybe one
> or two years ago, but nobody ever did it.
>
> I've had some discussions about this with Markus and Kevin at KVM Forum.
>  They appeared to strongly prefer this approach.  I agree with them that
> design-wise, a qemu with no machine at all (and not even -M none) and
> early QMP is the way we want to go anyway, and then this would be the
> correct tool to use.

"Strongly" is perhaps a bit strong, at least as far as I'm concerned.  I
just believe that we want the capability to run QEMU without a machine
anyway, and if that's good enough, then why bother duplicating so much
of qemu-system-FOO in qemu-nbd & friends?  Besides, once you start to
duplicate, you'll likely find it hard to stop.

>> I'm concerned that this could end up being a game of whack-a-mole
>> though, constantly trying to cut out/down all the bits of system emulation
>> in the machine emulators to get their resource overhead to match the low
>> overhead of standalone qemu-nbd.
>
> However, I personally share your concern.  Especially, I think that
> getting to a point where we can have no machine at all and early QMP
> will take much longer than just adding QMP to qemu-nbd -- or adding a
> qmp command to qemu-img (because you can add NBD exports through QMP, so
> qemu-nbd's hardcoded focus on NBD exports seems kind of weird then)[1].
>
> I'm very much torn here.  There are two approaches: Stripping fat qemu
> down, or fattening lean qemu-img (?) up.  The latter is very simple.

"Very simple" is perhaps debatable, but I think we can agree on
"temptingly simple".

> The former is what we want anyway.

Yes.

> Markus says it's not too hard to strip down qemu.  If that is true,

To find out, we need to experimentally remodel main() with an axe.
Volunteers?

> there is no point in fattening qemu-img now.  I personally am not
> convinced at all, but he knows the state of that project much better
> than me, so I cannot reasonably disagree.
>
> So my mail is more of a CC to Markus and Kevin -- but I think both are
> on PTO right now.

Back, nursing my conference cold.

> I guess the main question is: If someone were to introduce a qemu-img
> QMP subcommand -- would it be accepted? :-)
>
> Max
>
>
> [1] Also, adding QMP should trivially add block jobs and multiple
> exports to whatever tool we are talking about (in fact, qemu-img already
> does perform the mirror block job for committing).


* Re: [Qemu-devel] [Qemu-block] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?
From: Daniel P. Berrange @ 2017-11-09 16:02 UTC
  To: Markus Armbruster
  Cc: Max Reitz, qemu-devel, qemu-block, Eric Blake, Kevin Wolf

On Thu, Nov 09, 2017 at 02:54:35PM +0100, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
> 
> > On 2017-11-02 13:02, Daniel P. Berrange wrote:
> > [...]
> >> One alternative approach to doing this would be to suggest that we should
> >> instead just spawn qemu-system-x86_64 with '--machine none' and use that
> >> as a replacement for qemu-nbd, since it already has a built-in NBD server
> >> which can do many exports at once and arbitrary block jobs.
> >
> > As far as I know, we had wanted to add QMP support to qemu-nbd maybe one
> > or two years ago, but nobody ever did it.
> >
> > I've had some discussions about this with Markus and Kevin at KVM Forum.
> >  They appeared to strongly prefer this approach.  I agree with them that
> > design-wise, a qemu with no machine at all (and not even -M none) and
> > early QMP is the way we want to go anyway, and then this would be the
> > correct tool to use.
> 
> "Strongly" is perhaps a bit strong, at least as far as I'm concerned.  I
> just believe that we want the capability to run QEMU without a machine
> anyway, and if that's good enough, then why bother duplicating so much
> of qemu-system-FOO in qemu-nbd & friends?  Besides, once you start to
> duplicate, you'll likely find it hard to stop.
> 
> >> I'm concerned that this could end up being a game of whack-a-mole
> >> though, constantly trying to cut out/down all the bits of system emulation
> >> in the machine emulators to get their resource overhead to match the low
> >> overhead of standalone qemu-nbd.
> >
> > However, I personally share your concern.  Especially, I think that
> > getting to a point where we can have no machine at all and early QMP
> > will take much longer than just adding QMP to qemu-nbd -- or adding a
> > qmp command to qemu-img (because you can add NBD exports through QMP, so
> > qemu-nbd's hardcoded focus on NBD exports seems kind of weird then)[1].
> >
> > I'm very much torn here.  There are two approaches: Stripping fat qemu
> > down, or fattening lean qemu-img (?) up.  The latter is very simple.
> 
> "Very simple" is perhaps debatable, but I think we can agree on
> "temptingly simple".

My other concern with using the QEMU system emulator binary is that even
if you make it possible to run it with no guest machine instantiated,
it is still a massive binary containing all the stuff you get with a
QEMU system emulator. To be able to lock down the security of this
QEMU to the same level that we could do with qemu-nbd will take even
more work than we would already have to do to make "no machine" be
possible. Even then, getting the right security level would require
the invoker to enable the right magic combo of options. And then we
would also need full QEMU backend modularization to be done so that
we could have a binary for serving NBD which didn't pull in irrelevant
stuff like SPICE, GTK & Xorg libraries. This all sounds like we
are queuing up unfeasibly large amounts of work, a lot of which we've
talked about for 10 years and still not been able to make a step forward
on implementing.

It feels like the key important factor is to avoid re-inventing the
wheel in multiple places. I don't think this implies that we need to
have a single binary for doing two completely separate tasks. On the
qemu-nbd side it should basically involve assembling building blocks
that we already have available through the QEMU code base. The block
layer is already well isolated & reusable, as evidenced by the fact that
it is used across many programs already with little code duplication.
In theory the QMP monitor is fairly well isolated, since we already
reuse the infra for the QEMU guest agent too. For qemu-nbd we could
reuse even more than with the guest agent, since we can pull in some
of the actual command impls for sharing. There's doubtless still
some refactoring of QMP needed to make it possible, but it's nowhere near
the kind of scope we'd be looking at to take the QEMU system emulator
and enable a "no machine" startup, make it secure by default and
modularize all its dependencies.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

