From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Felipe Franciosi <felipe@nutanix.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
"fam@euphon.net" <fam@euphon.net>,
Swapnil Ingle <swapnil.ingle@nutanix.com>,
"john.g.johnson@oracle.com" <john.g.johnson@oracle.com>,
Stefan Hajnoczi <stefanha@gmail.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Walker, Benjamin" <benjamin.walker@intel.com>,
"kraxel@redhat.com" <kraxel@redhat.com>,
"jag.raman@oracle.com" <jag.raman@oracle.com>,
"Harris, James R" <james.r.harris@intel.com>,
"quintela@redhat.com" <quintela@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"armbru@redhat.com" <armbru@redhat.com>,
"kanth.ghatraju@oracle.com" <kanth.ghatraju@oracle.com>,
"thuth@redhat.com" <thuth@redhat.com>,
"ehabkost@redhat.com" <ehabkost@redhat.com>,
"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"liran.alon@oracle.com" <liran.alon@oracle.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"rth@twiddle.net" <rth@twiddle.net>,
"kwolf@redhat.com" <kwolf@redhat.com>,
"mreitz@redhat.com" <mreitz@redhat.com>,
"ross.lagerwall@citrix.com" <ross.lagerwall@citrix.com>,
"marcandre.lureau@gmail.com" <marcandre.lureau@gmail.com>,
Thanos Makatos <thanos.makatos@nutanix.com>
Subject: Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
Date: Thu, 19 Dec 2019 12:55:04 +0000
Message-ID: <20191219125504.GI1190276@redhat.com>
In-Reply-To: <772D9CF3-D15D-42D1-B9CF-1279619D7C20@nutanix.com>
On Thu, Dec 19, 2019 at 12:33:15PM +0000, Felipe Franciosi wrote:
> Hello,
>
> (I've added Jim and Ben from the SPDK team to the thread.)
>
> > On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> > On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
> >>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
> >>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
> >>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
> >>> Questions I've seen when discussing muser with people have been:
> >>>
> >>> 1. Can unprivileged containers create muser devices? If not, this is a
> >>> blocker for use cases that want to avoid root privileges entirely.
> >>
> >> Yes you can. Muser device creation follows the same process as general
> >> mdev device creation (ie. you write to a sysfs path). That creates an
> >> entry in /dev/vfio and the control plane can further drop privileges
> >> there (set selinux contexts, &c.)
> >
> > In this case there is still a privileged step during setup. What about
> > completely unprivileged scenarios like a regular user without root or a
> > rootless container?
>
> Oh, I see what you are saying. I suppose we need to investigate
> adjusting the privileges of the sysfs path correctly beforehand to
> allow devices to be created by non-root users. The credentials used on
> creation should be reflected on the vfio endpoint (ie. /dev/vfio/<group>).
>
> I need to look into that and get back to you.
>
> >
> >>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
> >>> security reasons)? A similar library could be implemented in
> >>> userspace along the lines of the vhost-user protocol. Although VMMs
> >>> would then need to use a new libmuser-client library instead of
> >>> reusing their VFIO code to access the device.
> >>
> >> Doing it in userspace was the flow we proposed back in last year's KVM
> >> Forum (Edinburgh), but it got turned down. That's why we pursued the
> >> kernel approach, which turned out to have some advantages:
> >> - No changes needed to Qemu
> >> - No Qemu needed at all for userspace drivers
> >> - Device emulation process restart is trivial
> >> (it therefore makes device code upgrades much easier)
> >>
> >> Having said that, nothing stops us from enhancing libmuser to talk
> >> directly to Qemu (for the Qemu case). I envision at least two ways of
> >> doing that:
> >> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
> >> - Hooking Qemu with CUSE and implementing the muser.ko interface
> >>
> >> For the latter, libmuser would talk to a character device just like it
> >> talks to the vfio character device. We "just" need to implement that
> >> backend in Qemu. :)
> >
> > What about:
> > * libmuser's API stays mostly unchanged but the library speaks a
> > VFIO-over-UNIX domain sockets protocol instead of talking to
> > mdev/vfio in the host kernel.
>
> As I said above, there are advantages to the kernel model. The key one
> is transparent device emulation restarts. Today, muser.ko keeps the
> "device memory" internally in a prefix tree. Upon restart, a new
> device emulator can recover state (eg. from a state file in /dev/shm
> or similar) and remap the same memory that is already configured to
> the guest via Qemu. We have a pending work item for muser.ko to also
> keep the eventfds so we can recover those, too. Another advantage is
> working with any userspace driver and not requiring a VMM at all.
>
> If done entirely in userspace, the device emulator needs to allocate
> the device memory somewhere that remains accessible (eg. tmpfs), with
> the difference that now we may be talking about non-trivial amounts of
> memory. Also, that may not be the kind of content you want lingering
> around the filesystem (for the same reasons Qemu unlinks memory files
> from /dev/hugepages after mmap'ing it).
>
> That's why I'd prefer to rephrase what you said to "in addition"
> instead of "instead".
>
> > * VMMs can implement this protocol directly for POSIX-portable and
> > unprivileged operation.
> > * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
> > still take advantage of libmuser devices.
>
> I'm happy with that.
> We need to think the credential aspect through to ensure nodes can
> be created in the right places with the right privileges.
>
> >
> > Assuming this is feasible, would you lose any important
> > features/advantages of the muser.ko approach? I don't know enough about
> > VFIO to identify any blocker or obvious performance problems.
>
> That's what I elaborated above. The fact that muser.ko can keep
> various metadata (and other resources) about the device in the kernel
> and grant it back to userspace as needed. There are ways around it,
> but it requires some orchestration with tmpfs and the VMM (only so
> much can be kept in tmpfs; the eventfds need to be retransmitted from
> the machine emulator on request).
>
> Restarting is a critical aspect of this. One key use case for the
> project is to be able to emulate various devices from one process (for
> polling). That must be able to restart for upgrades or recovery.
>
> >
> > Regarding recovery, it seems straightforward to keep state in a tmpfs
> > file that can be reopened when the device is restarted. I don't think
> > kernel code is necessary?
>
> It adds a dependency, but isn't a show stopper. If we can work through
> permission issues, making sure the VMM can reconnect and retransmit
> eventfds and other state, then it should be ok.
>
> To be clear: I'm very happy to have a userspace-only option for this,
> I just don't want to ditch the kernel module (yet, anyway). :)
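To make the tmpfs recovery idea quoted above concrete, here is a minimal
sketch, not anything taken from muser or QEMU. It assumes a made-up
fixed-size state record (a generation counter plus one register value),
and it uses a temporary directory as a stand-in for a tmpfs mount such as
/dev/shm. The names open_state, save and load are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical state layout: u32 generation counter, u64 register value.
STATE_FMT = "<IQ"
STATE_SIZE = struct.calcsize(STATE_FMT)


def open_state(path):
    """Open (or create) the state file and map it shared.

    Writes through a shared mapping land in the file, so they outlive
    the process that made them.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < STATE_SIZE:
            os.ftruncate(fd, STATE_SIZE)
        # On Unix, mmap.mmap() defaults to a shared read/write mapping.
        return mmap.mmap(fd, STATE_SIZE)
    finally:
        # The mapping keeps its own reference to the file.
        os.close(fd)


def save(mapping, generation, reg):
    struct.pack_into(STATE_FMT, mapping, 0, generation, reg)
    mapping.flush()


def load(mapping):
    return struct.unpack_from(STATE_FMT, mapping, 0)


# Assumption: a temp dir stands in for a tmpfs mount such as /dev/shm.
state_dir = tempfile.mkdtemp()
path = os.path.join(state_dir, "dev0.state")

# First "emulator instance" records some state, then goes away.
first = open_state(path)
save(first, 1, 0xDEADBEEF)
first.close()

# Restarted instance reopens the same file and recovers the state.
second = open_state(path)
recovered = load(second)
second.close()
```

This only covers plain data. As noted in the quoted text, eventfds and
similar resources cannot live in a file and would have to be retransmitted
by the VMM after a restart.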
If it doesn't create too large of a burden to support both, then I think
it is very desirable. IIUC, this positions the kernel based solution as the
optimized/optimal solution, and the userspace UNIX socket based option as
the generic "works everywhere" fallback solution.
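For the UNIX socket fallback, the part that needs a little care is handing
file descriptors (such as eventfds) from the VMM to the device emulator,
including again after an emulator restart. A rough sketch of that step,
assuming Python 3.9+ for socket.send_fds()/recv_fds(): a socketpair
stands in for the VMM-to-emulator connection, a pipe stands in for an
eventfd, and the "irqfd" tag is a made-up message, not part of any
existing protocol.

```python
import os
import socket

# Assumption: a socketpair stands in for the VMM <-> emulator connection.
vmm_sock, emu_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Assumption: a pipe stands in for an eventfd-like notification descriptor.
notify_r, notify_w = os.pipe()

# VMM side: (re)transmit the descriptor alongside a small tag message.
# The descriptor travels as SCM_RIGHTS ancillary data.
socket.send_fds(vmm_sock, [b"irqfd"], [notify_w])

# Emulator side: receive the tag and a new descriptor referring to the
# same underlying object.
tag, fds, _flags, _addr = socket.recv_fds(emu_sock, 16, 1)
received_w = fds[0]

# The emulator can now signal through the descriptor it was handed.
os.write(received_w, b"\x01")
kick = os.read(notify_r, 1)

for fd in (notify_r, notify_w, received_w):
    os.close(fd)
vmm_sock.close()
emu_sock.close()
```

Nothing here needs root or a kernel module, which is the appeal of the
fallback. The cost is that the VMM has to take part in every reconnect.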
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|