qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jag Raman <jag.raman@oracle.com>
To: Felipe Franciosi <felipe@nutanix.com>,
	Stefan Hajnoczi <stefanha@gmail.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	"fam@euphon.net" <fam@euphon.net>,
	Swapnil Ingle <swapnil.ingle@nutanix.com>,
	"john.g.johnson@oracle.com" <john.g.johnson@oracle.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Walker, Benjamin" <benjamin.walker@intel.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	"Harris, James R" <james.r.harris@intel.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"kanth.ghatraju@oracle.com" <kanth.ghatraju@oracle.com>,
	"thuth@redhat.com" <thuth@redhat.com>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"liran.alon@oracle.com" <liran.alon@oracle.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"berrange@redhat.com" <berrange@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"ross.lagerwall@citrix.com" <ross.lagerwall@citrix.com>,
	"marcandre.lureau@gmail.com" <marcandre.lureau@gmail.com>,
	Thanos Makatos <thanos.makatos@nutanix.com>
Subject: Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
Date: Thu, 19 Dec 2019 11:40:05 -0500	[thread overview]
Message-ID: <a98221a8-0a62-46f8-499d-84447fb4980a@oracle.com> (raw)
In-Reply-To: <772D9CF3-D15D-42D1-B9CF-1279619D7C20@nutanix.com>



On 12/19/2019 7:33 AM, Felipe Franciosi wrote:
> Hello,
> 
> (I've added Jim and Ben from the SPDK team to the thread.)
> 
>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>
>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>> Questions I've seen when discussing muser with people have been:
>>>>
>>>> 1. Can unprivileged containers create muser devices?  If not, this is a
>>>>   blocker for use cases that want to avoid root privileges entirely.
>>>
>>> Yes you can. Muser device creation follows the same process as general
>>> mdev device creation (ie. you write to a sysfs path). That creates an
>>> entry in /dev/vfio and the control plane can further drop privileges
>>> there (set selinux contexts, &c.)
>>
>> In this case there is still a privileged step during setup.  What about
>> completely unprivileged scenarios like a regular user without root or a
>> rootless container?
> 
> Oh, I see what you are saying. I suppose we need to investigate
> adjusting the privileges of the sysfs path correctly beforehand to
> allow devices to be created by non-root users. The credentials used on
> creation should be reflected on the vfio endpoint (ie. /dev/fio/<group>).
> 
> I need to look into that and get back to you.

As a prerequisite to using the "vfio-pci" device in QEMU, the user
assigns the PCI device on the host bus to the VFIO kernel driver by
writing to "/sys/bus/pci/drivers/vfio-pci/new_id" and
"/sys/bus/pci/drivers/vfio-pci/bind"

I believe a privileged control plane is required to perform these
prerequisite steps. Therefore, I wonder how rootless containers or
unprivileged users currently go about using a VFIO device with QEMU/KVM.

Thanks!
--
Jag

> 
>>
>>>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>>>>   security reasons)?  A similar library could be implemented in
>>>>   userspace along the lines of the vhost-user protocol.  Although VMMs
>>>>   would then need to use a new libmuser-client library instead of
>>>>   reusing their VFIO code to access the device.
>>>
>>> Doing it in userspace was the flow we proposed back in last year's KVM
>>> Forum (Edinburgh), but it got turned down. That's why we procured the
>>> kernel approach, which turned out to have some advantages:
>>> - No changes needed to Qemu
>>> - No Qemu needed at all for userspace drivers
>>> - Device emulation process restart is trivial
>>>   (it therefore makes device code upgrades much easier)
>>>
>>> Having said that, nothing stops us from enhancing libmuser to talk
>>> directly to Qemu (for the Qemu case). I envision at least two ways of
>>> doing that:
>>> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
>>> - Hooking Qemu with CUSE and implementing the muser.ko interface
>>>
>>> For the latter, libmuser would talk to a character device just like it
>>> talks to the vfio character device. We "just" need to implement that
>>> backend in Qemu. :)
>>
>> What about:
>> * libmuser's API stays mostly unchanged but the library speaks a
>>    VFIO-over-UNIX domain sockets protocol instead of talking to
>>    mdev/vfio in the host kernel.
> 
> As I said above, there are advantages to the kernel model. The key one
> is transparent device emulation restarts. Today, muser.ko keeps the
> "device memory" internally in a prefix tree. Upon restart, a new
> device emulator can recover state (eg. from a state file in /dev/shm
> or similar) and remap the same memory that is already configured to
> the guest via Qemu. We have a pending work item for muser.ko to also
> keep the eventfds so we can recover those, too. Another advantage is
> working with any userspace driver and not requiring a VMM at all.
> 
> If done entirely in userspace, the device emulator needs to allocate
> the device memory somewhere that remains accessible (eg. tmpfs), with
> the difference that now we may be talking about non-trivial amounts of
> memory. Also, that may not be the kind of content you want lingering
> around the filesystem (for the same reasons Qemu unlinks memory files
> from /dev/hugepages after mmap'ing it).
> 
> That's why I'd prefer to rephrase what you said to "in addition"
> instead of "instead".
> 
>> * VMMs can implement this protocol directly for POSIX-portable and
>>    unprivileged operation.
>> * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
>>    still take advantage of libmuser devices.
> 
> I'm happy with that.
> We need to think the credential aspect throughout to ensure nodes can
> be created in the right places with the right privileges.
> 
>>
>> Assuming this is feasible, would you lose any important
>> features/advantages of the muser.ko approach?  I don't know enough about
>> VFIO to identify any blocker or obvious performance problems.
> 
> That's what I elaborated above. The fact that muser.ko can keep
> various metadata (and other resources) about the device in the kernel
> and grant it back to userspace as needed. There are ways around it,
> but it requires some orchestration with tmpfs and the VMM (only so
> much can be kept in tmpfs; the eventfds need to be retransmitted from
> the machine emulator on request).
> 
> Restarting is a critical aspect of this. One key use case for the
> project is to be able to emulate various devices from one process (for
> polling). That must be able to restart for upgrades or recovery.
> 
>>
>> Regarding recovery, it seems straightforward to keep state in a tmpfs
>> file that can be reopened when the device is restarted.  I don't think
>> kernel code is necessary?
> 
> It adds a dependency, but isn't a show stopper. If we can work through
> permission issues, making sure the VMM can reconnect and retransmit
> eventfds and other state, then it should be ok.
> 
> To be clear: I'm very happy to have a userspace-only option for this,
> I just don't want to ditch the kernel module (yet, anyway). :)
> 
> F.
> 
>>
>> Stefan
> 


  parent reply	other threads:[~2019-12-19 16:43 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 01/49] multi-process: memory: alloc RAM from file at offset Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread Jagannathan Raman
2019-11-13 15:30   ` Stefan Hajnoczi
2019-11-13 15:38     ` Jag Raman
2019-11-13 15:51       ` Daniel P. Berrangé
2019-11-13 16:04         ` Jag Raman
2019-11-13 16:35           ` Daniel P. Berrangé
2019-10-24  9:08 ` [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file Jagannathan Raman
2019-11-13 15:35   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 04/49] multi-process: Add stub functions to facilate build of multi-process Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 05/49] multi-process: Add config option for multi-process QEMU Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 06/49] multi-process: build system for remote device process Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object Jagannathan Raman
2019-11-11 16:41   ` Stefan Hajnoczi
2019-11-13 15:47     ` Jag Raman
2019-11-13 15:53   ` Stefan Hajnoczi
2019-11-18 15:26     ` Jag Raman
2019-10-24  9:08 ` [RFC v4 PATCH 08/49] multi-process: add functions to synchronize proxy and remote endpoints Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device Jagannathan Raman
2019-11-13 16:07   ` Stefan Hajnoczi
2019-11-18 15:25     ` Jag Raman
2019-11-21 10:37       ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process Jagannathan Raman
2019-11-13 16:22   ` Stefan Hajnoczi
2019-11-18 15:29     ` Jag Raman
2019-10-24  9:08 ` [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device Jagannathan Raman
2019-11-13 16:33   ` Stefan Hajnoczi
2019-11-13 16:34     ` Jag Raman
2019-10-24  9:08 ` [RFC v4 PATCH 12/49] multi-process: remote process initialization Jagannathan Raman
2019-11-13 16:38   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 13/49] multi-process: introduce proxy object Jagannathan Raman
2019-11-21 11:09   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 14/49] mutli-process: build remote command line args Jagannathan Raman
2019-11-21 11:23   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
2019-11-21 11:33   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object Jagannathan Raman
2019-11-21 11:35   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory Jagannathan Raman
2019-11-21 11:44   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq Jagannathan Raman
2019-11-21 12:02   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 19/49] multi-process: configure remote side devices Jagannathan Raman
2019-11-21 12:05   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices Jagannathan Raman
2019-11-21 12:16   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 21/49] multi-process: remote: add setup_devices and setup_drive msg processing Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 22/49] multi-process: remote: use fd for socket from parent process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 23/49] multi-process: remote: add create_done condition Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 24/49] multi-process: add processing of remote drive and device command line Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 25/49] multi-process: Introduce build flags to separate remote process code Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 26/49] multi-process: refractor vl.c code to re-use in remote Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 27/49] multi-process: add remote option Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 28/49] multi-process: add remote options parser Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 29/49] multi-process: add parse_cmdline in remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote Jagannathan Raman
2019-11-11 16:27   ` Stefan Hajnoczi
2019-11-13 16:01     ` Jag Raman
2019-11-21 12:19       ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 31/49] multi-process: handle heartbeat messages in remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel Jagannathan Raman
2019-11-11 16:21   ` Stefan Hajnoczi
2019-11-13 16:14     ` Jag Raman
2019-11-21 12:31       ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process Jagannathan Raman
2019-11-11 16:19   ` Stefan Hajnoczi
2019-11-13 16:15     ` Jag Raman
2019-10-24  9:09 ` [RFC v4 PATCH 34/49] multi-process/mon: choose HMP commands based on target Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 35/49] multi-process/mon: stub functions to enable QMP module for remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 36/49] multi-process/mon: enable QMP module support in the " Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 37/49] multi-process/mon: Refactor monitor/chardev functions out of vl.c Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 38/49] multi-process/mon: Initialize QMP module for remote processes Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 39/49] multi-process: prevent duplicate memory initialization in remote Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 40/49] multi-process/mig: build migration module in the remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object Jagannathan Raman
2019-11-13 15:50   ` Daniel P. Berrangé
2019-11-13 16:32     ` Jag Raman
2019-11-13 17:11       ` Daniel P. Berrangé
2019-11-18 15:42         ` Jag Raman
2019-11-22 10:34           ` Dr. David Alan Gilbert
2019-10-24  9:09 ` [RFC v4 PATCH 42/49] multi-process/mig: Send VMSD of remote to " Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 43/49] multi-process/mig: Load VMSD in the proxy object Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 44/49] multi-process/mig: refactor runstate_check into common file Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process Jagannathan Raman
2019-11-11 16:17   ` Stefan Hajnoczi
2019-11-13 16:33     ` Jag Raman
2019-10-24  9:09 ` [RFC v4 PATCH 46/49] multi-process/mig: Restore the VMSD in " Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote Jagannathan Raman
2019-11-11 16:15   ` Stefan Hajnoczi
2019-11-13 16:21     ` Jag Raman
2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
2019-10-25 19:33   ` Elena Ufimtseva
2019-11-07 15:50   ` Stefan Hajnoczi
2019-11-11 15:41   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 49/49] multi-process: add configure and usage information Jagannathan Raman
2019-11-07 14:02   ` Stefan Hajnoczi
2019-11-07 14:33     ` Michael S. Tsirkin
2019-11-08 11:17       ` Stefan Hajnoczi
2019-11-08 11:32         ` Daniel P. Berrangé
2019-11-07 14:39     ` Daniel P. Berrangé
2019-11-07 15:53       ` Jag Raman
2019-11-08 11:14         ` Stefan Hajnoczi
2019-10-25  2:08 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu no-reply
2019-10-25  2:08 ` no-reply
2019-10-25  2:10 ` no-reply
2019-11-21 12:46 ` Stefan Hajnoczi
2019-12-10  6:47 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update Elena Ufimtseva
2019-12-13 10:41   ` Stefan Hajnoczi
2019-12-16 19:46     ` Elena Ufimtseva
2019-12-16 19:57       ` Felipe Franciosi
2019-12-17 16:33         ` Stefan Hajnoczi
2019-12-17 22:57           ` Felipe Franciosi
2019-12-18  0:00             ` Paolo Bonzini
2019-12-19 13:36               ` Stefan Hajnoczi
2019-12-20 17:15                 ` John G Johnson
2020-01-02 10:00                   ` Stefan Hajnoczi
2020-01-02 10:04                   ` Stefan Hajnoczi
2019-12-19 11:55             ` Stefan Hajnoczi
2019-12-19 12:33               ` Felipe Franciosi
2019-12-19 12:55                 ` Daniel P. Berrangé
2019-12-20  9:47                   ` Stefan Hajnoczi
2019-12-20  9:50                     ` Paolo Bonzini
2019-12-20 14:14                       ` Felipe Franciosi
2019-12-20 15:25                         ` Alex Williamson
2019-12-20 16:00                           ` Felipe Franciosi
2020-02-25  9:16                           ` Thanos Makatos
2019-12-20 10:22                     ` Daniel P. Berrangé
2020-01-02 10:42                       ` Stefan Hajnoczi
2020-01-02 11:03                         ` Felipe Franciosi
2020-01-02 18:55                           ` Marc-André Lureau
2020-01-08 16:31                             ` Stefan Hajnoczi
2020-01-03 15:59                           ` Stefan Hajnoczi
2020-01-14  1:56                             ` John G Johnson
2020-01-17 17:25                               ` Dr. David Alan Gilbert
2019-12-19 16:40                 ` Jag Raman [this message]
2019-12-19 12:50             ` Daniel P. Berrangé
2019-12-19 16:46               ` Daniel P. Berrangé
2020-01-02 16:01           ` Elena Ufimtseva
2020-01-03 15:00             ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a98221a8-0a62-46f8-499d-84447fb4980a@oracle.com \
    --to=jag.raman@oracle.com \
    --cc=armbru@redhat.com \
    --cc=benjamin.walker@intel.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=fam@euphon.net \
    --cc=felipe@nutanix.com \
    --cc=james.r.harris@intel.com \
    --cc=john.g.johnson@oracle.com \
    --cc=kanth.ghatraju@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kraxel@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=liran.alon@oracle.com \
    --cc=marcandre.lureau@gmail.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=ross.lagerwall@citrix.com \
    --cc=rth@twiddle.net \
    --cc=stefanha@gmail.com \
    --cc=swapnil.ingle@nutanix.com \
    --cc=thanos.makatos@nutanix.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).