From: John G Johnson <john.g.johnson@oracle.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
	"fam@euphon.net" <fam@euphon.net>,
	"Swapnil Ingle" <swapnil.ingle@nutanix.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Walker, Benjamin" <benjamin.walker@intel.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	"jag.raman@oracle.com" <jag.raman@oracle.com>,
	"Harris, James R" <james.r.harris@intel.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"kanth.ghatraju@oracle.com" <kanth.ghatraju@oracle.com>,
	"Felipe Franciosi" <felipe@nutanix.com>,
	"thuth@redhat.com" <thuth@redhat.com>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"liran.alon@oracle.com" <liran.alon@oracle.com>,
	"Thanos Makatos" <thanos.makatos@nutanix.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"\"Daniel P. Berrangé\"" <berrange@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"ross.lagerwall@citrix.com" <ross.lagerwall@citrix.com>,
	"marcandre.lureau@gmail.com" <marcandre.lureau@gmail.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>
Subject: Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
Date: Mon, 13 Jan 2020 17:56:25 -0800
Message-ID: <39A027D6-21C3-484F-8F90-9F04DCB9E4CF@oracle.com>
In-Reply-To: <20200103155920.GB281236@stefanha-x1.localdomain>



> On Jan 3, 2020, at 7:59 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> On Thu, Jan 02, 2020 at 11:03:22AM +0000, Felipe Franciosi wrote:
>>> On Jan 2, 2020, at 10:42 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Fri, Dec 20, 2019 at 10:22:37AM +0000, Daniel P. Berrangé wrote:
>>>> On Fri, Dec 20, 2019 at 09:47:12AM +0000, Stefan Hajnoczi wrote:
>>>>> On Thu, Dec 19, 2019 at 12:55:04PM +0000, Daniel P. Berrangé wrote:
>>>>>> On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
>>>>>>>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>>>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>>>>>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>>>>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>>>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>>>>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>>>>> To be clear: I'm very happy to have a userspace-only option for this,
>>>>>>> I just don't want to ditch the kernel module (yet, anyway). :)
>>>>>> 
>>>>>> If it doesn't create too large of a burden to support both, then I think
>>>>>> it is very desirable. IIUC, this is saying a kernel-based solution is the
>>>>>> optimized/optimal solution, and a userspace UNIX socket-based option is the
>>>>>> generic "works everywhere" fallback solution.
>>>>> 
>>>>> I'm slightly in favor of the kernel implementation because it keeps us
>>>>> better aligned with VFIO.  That means solving problems in one place only
>>>>> and less reinventing the wheel.
>>>>> 
>>>>> Knowing that a userspace implementation is possible is a plus though.
>>>>> Maybe that option will become attractive in the future and someone will
>>>>> develop it.  In fact, a userspace implementation may be a cool Google
>>>>> Summer of Code project idea that I'd like to co-mentor.
>>>> 
>>>> If it is technically viable as an approach, then I think we should be
>>>> treating a fully unprivileged muser-over-UNIX socket as a higher priority
>>>> than just "maybe a GSoC student will want to do it".
>>>> 
>>>> Libvirt is getting a strong message from the KubeVirt project that they want
>>>> to be running both libvirtd and QEMU fully unprivileged. This allows their
>>>> containers to be unprivileged. Anything that requires privileges requires
>>>> jumping through extra hoops writing custom code in KubeVirt to do things
>>>> outside libvirt in side-loaded privileged containers, and this limits how
>>>> and where those features can be used.
>>> 
>>> Okay this makes sense.
>>> 
>>> There needs to be a consensus on whether to go with a qdev-over-socket
>>> approach that is QEMU-specific and strongly discourages third-party
>>> device distribution or a muser-over-socket approach that offers a stable
>>> API for VMM interoperability and third-party device distribution.
>> 
>> The reason I dislike yet another offloading protocol (ie. there is
>> vhost, there is vfio, and then there would be qdev-over-socket) is
>> that we keep reinventing the wheel. I very much prefer picking
>> something solid (eg. VFIO) and keep investing in it.
> 
> I like the idea of sticking close to VFIO too.  The first step is
> figuring out whether VFIO can be mapped to a UNIX domain socket protocol
> and how many non-VFIO protocol messages are required.  Hopefully that
> extra non-VFIO stuff isn't too large.
> 


	I looked at this and think we could map VFIO commands over a
UNIX socket without a lot of difficulty.  We'd have to use SCM_RIGHTS
messages to pass file descriptors from the QEMU process to the
emulation process for certain operations, but that shouldn't be
a big problem.  Here are the mission mode operations:

configuration

	VFIO defines a number of configuration ioctl()s that we could
turn into messages, but if we make the protocol specific to PCI, then
all of the information they transmit (e.g., device regions and
interrupts) can be discovered by parsing the device's PCI config
space.  A lot of the current VFIO code that parses config space could
be re-used to do this.
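
	As a rough illustration of the discovery step described above,
here is a minimal Python sketch that walks a PCI type-0 config header
to find the implemented BARs and the INTx pin.  The register offsets
are the standard PCI ones; parse_config_space(), example_config() and
the sample register values are made up for this example and are not
taken from QEMU.  It treats an all-zero BAR register as unimplemented
and does not size the BARs, both of which are simplifications.

```python
import struct

# Offsets from the standard PCI type-0 configuration header.
PCI_VENDOR_ID = 0x00        # 16-bit vendor ID, followed by 16-bit device ID
PCI_BASE_ADDRESS_0 = 0x10   # first of six 32-bit BAR registers
PCI_INTERRUPT_PIN = 0x3D    # 0 = no INTx, 1 = INTA#, ...
PCI_NUM_BARS = 6


def parse_config_space(cfg):
    """Return vendor/device IDs, implemented BARs and the INTx pin."""
    vendor_id, device_id = struct.unpack_from("<HH", cfg, PCI_VENDOR_ID)
    bars = []
    index = 0
    while index < PCI_NUM_BARS:
        (raw,) = struct.unpack_from("<I", cfg, PCI_BASE_ADDRESS_0 + 4 * index)
        if raw == 0:
            # Simplification: treat an all-zero register as "no BAR here".
            index += 1
            continue
        is_io = bool(raw & 0x1)                       # bit 0: I/O space
        is_64bit = not is_io and ((raw >> 1) & 0x3) == 0x2
        bars.append({"index": index, "io": is_io, "is_64bit": is_64bit})
        # A 64-bit memory BAR consumes the next register as its upper half.
        index += 2 if is_64bit else 1
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "bars": bars,
        "interrupt_pin": cfg[PCI_INTERRUPT_PIN],
    }


def example_config():
    """Build a 64-byte header with illustrative (made-up) values."""
    cfg = bytearray(64)
    struct.pack_into("<HH", cfg, PCI_VENDOR_ID, 0x1000, 0x0012)
    struct.pack_into("<I", cfg, PCI_BASE_ADDRESS_0, 0x0000C001)      # BAR0: I/O
    struct.pack_into("<I", cfg, PCI_BASE_ADDRESS_0 + 4, 0xFEB00004)  # BAR1: 64-bit memory
    cfg[PCI_INTERRUPT_PIN] = 1                                       # INTA#
    return bytes(cfg)
```

MSI and MSI-X would be found by walking the capability list, which
this sketch leaves out.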

MMIO

	VFIO uses reads and writes on the VFIO file descriptor to
perform MMIOs to the device.  The read/write offset encodes the VFIO
region and offset of the MMIO (the VFIO regions correspond to PCI
BARs).  These would have to be changed to send messages that include
the VFIO region and offset (and data for writes) to the emulation
process.
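
	To make the MMIO message idea concrete, here is a small Python
sketch.  The message layout (command, region, size, offset, data) is
purely hypothetical; nothing like it has been agreed on.  The 40-bit
shift mirrors how the Linux vfio-pci driver currently lays out region
offsets on the device file descriptor; userspace is normally expected
to learn those offsets from VFIO_DEVICE_GET_REGION_INFO rather than
hard-code them, so treat the constant as an assumption too.

```python
import struct

# Region index lives above bit 40 of the file offset in today's vfio-pci
# driver (an implementation detail, assumed here for illustration).
VFIO_PCI_OFFSET_SHIFT = 40
VFIO_PCI_OFFSET_MASK = (1 << VFIO_PCI_OFFSET_SHIFT) - 1

# Hypothetical wire format: cmd (u8), region (u8), size (u16),
# offset (u64), data (u64), little-endian, no padding.
MMIO_READ, MMIO_WRITE = 0, 1
MMIO_MSG = struct.Struct("<BBHQQ")


def split_vfio_offset(file_offset):
    """Split a VFIO file offset into (region index, offset within region)."""
    return file_offset >> VFIO_PCI_OFFSET_SHIFT, file_offset & VFIO_PCI_OFFSET_MASK


def encode_mmio(cmd, file_offset, size, data=0):
    """Build the message QEMU would send instead of pread()/pwrite()."""
    region, offset = split_vfio_offset(file_offset)
    return MMIO_MSG.pack(cmd, region, size, offset, data)


def decode_mmio(msg):
    """Unpack a message on the emulation-process side."""
    cmd, region, size, offset, data = MMIO_MSG.unpack(msg)
    return {"cmd": cmd, "region": region, "size": size,
            "offset": offset, "data": data}
```

A 4-byte write to offset 0x10 of region 2 would be built with
encode_mmio(MMIO_WRITE, (2 << 40) | 0x10, 4, 0xdeadbeef); a read would
need a reply message carrying the data back, which is not shown.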

interrupts

	VFIO creates eventfds that are sent to the kernel driver so it
can inject interrupts into a guest.  We would have to send these
eventfds over the socket to the emulation process using SCM messages.
The emulation process could then trigger interrupts by writing on the
eventfd.

DMA

	This is one place where I might diverge from VFIO.  It uses an
ioctl to tell the kernel driver what areas of guest memory the device
can address.  The driver then pins that memory so it can be programmed
into a HW IOMMU.  We could avoid pinning of guest memory by adopting
the vhost-user idea of sending the file descriptors used by QEMU to
create guest memory to the emulation process, and having it mmap() the
guest memory itself.  IOMMUs would be handled by having the emulation
process request device DMA to guest PA translations from QEMU.
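
	Here is a minimal Python sketch of the mmap() side of that idea.
A temporary file stands in for the file QEMU would use to back guest
RAM, and the fd is used directly instead of being received over the
socket.  GuestRegion, DmaMap and demo() are hypothetical names, and
IOMMU translation requests are not modelled; addresses are treated as
guest PAs.

```python
import mmap
import tempfile


class GuestRegion:
    """One chunk of guest RAM mapped from a file descriptor."""

    def __init__(self, guest_pa, size, fd, fd_offset=0):
        self.guest_pa = guest_pa
        self.size = size
        # fd_offset must be page-aligned for mmap().
        self.map = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                             prot=mmap.PROT_READ | mmap.PROT_WRITE,
                             offset=fd_offset)

    def contains(self, pa, length):
        return self.guest_pa <= pa and pa + length <= self.guest_pa + self.size


class DmaMap:
    """Guest-PA to mapping lookup kept by the emulation process."""

    def __init__(self):
        self.regions = []

    def add_region(self, guest_pa, size, fd, fd_offset=0):
        self.regions.append(GuestRegion(guest_pa, size, fd, fd_offset))

    def _find(self, pa, length):
        for region in self.regions:
            if region.contains(pa, length):
                return region, pa - region.guest_pa
        raise ValueError("DMA address 0x%x not mapped" % pa)

    def write(self, pa, data):
        region, off = self._find(pa, len(data))
        region.map[off:off + len(data)] = data

    def read(self, pa, length):
        region, off = self._find(pa, length)
        return bytes(region.map[off:off + length])


def demo():
    """Device-side DMA write becomes visible through QEMU's own mapping."""
    size = 2 * mmap.PAGESIZE
    with tempfile.TemporaryFile() as backing:   # stand-in for guest RAM file
        backing.truncate(size)
        qemu_view = mmap.mmap(backing.fileno(), size)   # QEMU's mapping
        dma = DmaMap()                                  # emulation process
        dma.add_region(guest_pa=0x100000, size=size, fd=backing.fileno())
        dma.write(0x100010, b"dma!")
        return bytes(qemu_view[0x10:0x14])
```

Because both mappings are MAP_SHARED views of the same file, no
pinning or copying is involved; QEMU would only need to tell the
emulation process which guest PA range each fd covers.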



> If implementations can use the kernel uapi vfio header files then we're
> on track for compatibility with VFIO.
> 
>>> This is just a more elaborate explanation for the "the cat is out of the
>>> bag" comments that have already been made on licensing.  Does anyone
>>> still disagree or want to discuss further?
>>> 
>>> If there is agreement that a stable API is okay then I think the
>>> practical way to do this is to first merge a cleaned-up version of
>>> multi-process QEMU as an unstable experimental API.  Once it's being
>>> tested and used we can write a protocol specification and publish it as
>>> a stable interface when the spec has addressed most use cases.
>>> 
>>> Does this sound good?
>> 
>> In that case, wouldn't it be preferable to revive our proposal from
>> Edinburgh (KVM Forum 2018)? Our prototypes moved more of the Qemu VFIO
>> code to "common" and added a "user" backend underneath it, similar to
>> how vhost-user-scsi moved some of vhost-scsi to vhost-scsi-common and
>> added vhost-user-scsi. It was centric on PCI, but it doesn't have to
>> be. The other side can be implemented in libmuser for facilitating things.
> 
> That sounds good.
> 

       The emulation program API could be based on the current
libmuser API or the libvfio-user API.  The protocol itself wouldn’t
care which is chosen.  Our multi-process QEMU project would have to
change how devices are specified from the QEMU command line to the
emulation process command line.

							JJ



