qemu-devel.nongnu.org archive mirror
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Felipe Franciosi <felipe@nutanix.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	"fam@euphon.net" <fam@euphon.net>,
	Swapnil Ingle <swapnil.ingle@nutanix.com>,
	"john.g.johnson@oracle.com" <john.g.johnson@oracle.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Walker, Benjamin" <benjamin.walker@intel.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	"jag.raman@oracle.com" <jag.raman@oracle.com>,
	"Harris, James R" <james.r.harris@intel.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"kanth.ghatraju@oracle.com" <kanth.ghatraju@oracle.com>,
	"thuth@redhat.com" <thuth@redhat.com>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"liran.alon@oracle.com" <liran.alon@oracle.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"ross.lagerwall@citrix.com" <ross.lagerwall@citrix.com>,
	"marcandre.lureau@gmail.com" <marcandre.lureau@gmail.com>,
	Thanos Makatos <thanos.makatos@nutanix.com>
Subject: Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
Date: Thu, 19 Dec 2019 12:55:04 +0000
Message-ID: <20191219125504.GI1190276@redhat.com>
In-Reply-To: <772D9CF3-D15D-42D1-B9CF-1279619D7C20@nutanix.com>

On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> Hello,
> 
> (I've added Jim and Ben from the SPDK team to the thread.)
> 
> > On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > 
> > On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> >>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> >>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> >>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> >>> Questions I've seen when discussing muser with people have been:
> >>> 
> >>> 1. Can unprivileged containers create muser devices?  If not, this is a
> >>>  blocker for use cases that want to avoid root privileges entirely.
> >> 
> >> Yes you can. Muser device creation follows the same process as general
> >> mdev device creation (ie. you write to a sysfs path). That creates an
> >> entry in /dev/vfio and the control plane can further drop privileges
> >> there (set selinux contexts, &c.)
> > 
> > In this case there is still a privileged step during setup.  What about
> > completely unprivileged scenarios like a regular user without root or a
> > rootless container?
> 
> Oh, I see what you are saying. I suppose we need to investigate
> adjusting the privileges of the sysfs path correctly beforehand to
> allow devices to be created by non-root users. The credentials used on
> creation should be reflected on the vfio endpoint (ie. /dev/vfio/<group>).
> 
> I need to look into that and get back to you.
> 
> > 
> >>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
> >>>  security reasons)?  A similar library could be implemented in
> >>>  userspace along the lines of the vhost-user protocol.  Although VMMs
> >>>  would then need to use a new libmuser-client library instead of
> >>>  reusing their VFIO code to access the device.
> >> 
> >> Doing it in userspace was the flow we proposed back in last year's KVM
> >> Forum (Edinburgh), but it got turned down. That's why we pursued the
> >> kernel approach, which turned out to have some advantages:
> >> - No changes needed to Qemu
> >> - No Qemu needed at all for userspace drivers
> >> - Device emulation process restart is trivial
> >>  (it therefore makes device code upgrades much easier)
> >> 
> >> Having said that, nothing stops us from enhancing libmuser to talk
> >> directly to Qemu (for the Qemu case). I envision at least two ways of
> >> doing that:
> >> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
> >> - Hooking Qemu with CUSE and implementing the muser.ko interface
> >> 
> >> For the latter, libmuser would talk to a character device just like it
> >> talks to the vfio character device. We "just" need to implement that
> >> backend in Qemu. :)
> > 
> > What about:
> > * libmuser's API stays mostly unchanged but the library speaks a
> >   VFIO-over-UNIX domain sockets protocol instead of talking to
> >   mdev/vfio in the host kernel.
> 
> As I said above, there are advantages to the kernel model. The key one
> is transparent device emulation restarts. Today, muser.ko keeps the
> "device memory" internally in a prefix tree. Upon restart, a new
> device emulator can recover state (eg. from a state file in /dev/shm
> or similar) and remap the same memory that is already configured to
> the guest via Qemu. We have a pending work item for muser.ko to also
> keep the eventfds so we can recover those, too. Another advantage is
> working with any userspace driver and not requiring a VMM at all.
> 
> If done entirely in userspace, the device emulator needs to allocate
> the device memory somewhere that remains accessible (eg. tmpfs), with
> the difference that now we may be talking about non-trivial amounts of
> memory. Also, that may not be the kind of content you want lingering
> around the filesystem (for the same reasons Qemu unlinks memory files
> from /dev/hugepages after mmap'ing it).
> 
> That's why I'd prefer to rephrase what you said to "in addition"
> instead of "instead".
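
For illustration, the unlink-after-mmap pattern mentioned above can be
sketched as follows. This is a minimal Python sketch, not QEMU code:
map_unlinked() is a hypothetical helper, and the temporary directory
stands in for tmpfs or /dev/hugepages. It shows why nothing lingers in
the filesystem, and also why a restarted emulator cannot find the memory
by name afterwards and has to be handed the file descriptor by whoever
still holds it.

```python
import mmap
import os
import tempfile


def map_unlinked(size, directory=None):
    """Create a backing file, map it shared, then remove its name.

    Returns (fd, path, mapping). After the call, `path` no longer exists;
    the memory stays reachable only through `fd` and `mapping`.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)
        mapping = mmap.mmap(fd, size)  # MAP_SHARED by default on Unix
    except BaseException:
        os.close(fd)
        raise
    finally:
        # The name is removed whether or not the mapping succeeded.
        os.unlink(path)
    return fd, path, mapping
```

A caller would keep fd open for as long as another process may need to
map the same memory again.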
> 
> > * VMMs can implement this protocol directly for POSIX-portable and
> >   unprivileged operation.
> > * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
> >   still take advantage of libmuser devices.
> 
> I'm happy with that.
> We need to think the credential aspect through to ensure nodes can
> be created in the right places with the right privileges.
> 
> > 
> > Assuming this is feasible, would you lose any important
> > features/advantages of the muser.ko approach?  I don't know enough about
> > VFIO to identify any blocker or obvious performance problems.
> 
> That's what I elaborated above. The fact that muser.ko can keep
> various metadata (and other resources) about the device in the kernel
> and grant it back to userspace as needed. There are ways around it,
> but it requires some orchestration with tmpfs and the VMM (only so
> much can be kept in tmpfs; the eventfds need to be retransmitted from
> the machine emulator on request).
> 
> Restarting is a critical aspect of this. One key use case for the
> project is to be able to emulate various devices from one process (for
> polling). That must be able to restart for upgrades or recovery.
> 
> > 
> > Regarding recovery, it seems straightforward to keep state in a tmpfs
> > file that can be reopened when the device is restarted.  I don't think
> > kernel code is necessary?
> 
> It adds a dependency, but isn't a show stopper. If we can work through
> permission issues, making sure the VMM can reconnect and retransmit
> eventfds and other state, then it should be ok.
> 
> To be clear: I'm very happy to have a userspace-only option for this,
> I just don't want to ditch the kernel module (yet, anyway). :)
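
To make the "retransmit eventfds" step concrete: over a UNIX domain
socket this would normally be done with SCM_RIGHTS ancillary data. The
sketch below is only an illustration in Python under my own assumptions.
The helper names and the message payload are hypothetical, it is not the
muser or vhost-user wire format, and any file descriptor (a pipe end
here) can stand in for an eventfd.

```python
import array
import socket


def send_fds(sock, payload, fds):
    """Send `payload` plus the given file descriptors over a UNIX socket."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", fds).tobytes())]
    return sock.sendmsg([payload], ancillary)


def recv_fds(sock, bufsize, maxfds):
    """Receive a payload and up to `maxfds` descriptors from a UNIX socket."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)
```

On reconnect, the VMM side would call send_fds() once per notifier and
the restarted emulator would call recv_fds(); the received descriptors
are new numbers in the receiver that refer to the same kernel objects.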

If it doesn't create too large a burden to support both, then I think
it is very desirable. IIUC, this is saying the kernel-based solution is the
optimized/optimal solution, and the userspace UNIX-socket-based option is the
generic "works everywhere" fallback solution.
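
As a rough sketch of what "optimal path first, generic fallback second"
could look like from a client's point of view (Python, purely
illustrative; pick_transport() and both paths are hypothetical and do not
correspond to any existing QEMU or libmuser interface):

```python
import os


def pick_transport(vfio_group_path, socket_path):
    """Prefer the kernel (VFIO) endpoint, else fall back to a UNIX socket.

    Returns a (kind, path) tuple where kind is "vfio" or "unix".
    """
    # Kernel path: only usable if the node exists and we may open it.
    if os.access(vfio_group_path, os.R_OK | os.W_OK):
        return ("vfio", vfio_group_path)
    # Fallback: a socket path needs no kernel module or extra privileges.
    if os.path.exists(socket_path):
        return ("unix", socket_path)
    raise FileNotFoundError(
        "neither %s nor %s is available" % (vfio_group_path, socket_path))
```

The design point is just that the caller does not need to know in advance
which backend is present; an unprivileged or non-Linux host naturally
ends up on the socket path.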



Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



