All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jag Raman <jag.raman@oracle.com>
To: Felipe Franciosi <felipe@nutanix.com>,
	Stefan Hajnoczi <stefanha@gmail.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	"fam@euphon.net" <fam@euphon.net>,
	Swapnil Ingle <swapnil.ingle@nutanix.com>,
	"john.g.johnson@oracle.com" <john.g.johnson@oracle.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Walker, Benjamin" <benjamin.walker@intel.com>,
	"kraxel@redhat.com" <kraxel@redhat.com>,
	"Harris, James R" <james.r.harris@intel.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"kanth.ghatraju@oracle.com" <kanth.ghatraju@oracle.com>,
	"thuth@redhat.com" <thuth@redhat.com>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"liran.alon@oracle.com" <liran.alon@oracle.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"berrange@redhat.com" <berrange@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"ross.lagerwall@citrix.com" <ross.lagerwall@citrix.com>,
	"marcandre.lureau@gmail.com" <marcandre.lureau@gmail.com>,
	Thanos Makatos <thanos.makatos@nutanix.com>
Subject: Re: [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update
Date: Thu, 19 Dec 2019 11:40:05 -0500	[thread overview]
Message-ID: <a98221a8-0a62-46f8-499d-84447fb4980a@oracle.com> (raw)
In-Reply-To: <772D9CF3-D15D-42D1-B9CF-1279619D7C20@nutanix.com>



On 12/19/2019 7:33 AM, Felipe Franciosi wrote:
> Hello,
> 
> (I've added Jim and Ben from the SPDK team to the thread.)
> 
>> On Dec 19, 2019, at 11:55 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>
>> On Tue, Dec 17, 2019 at 10:57:17PM +0000, Felipe Franciosi wrote:
>>>> On Dec 17, 2019, at 5:33 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>> On Mon, Dec 16, 2019 at 07:57:32PM +0000, Felipe Franciosi wrote:
>>>>>> On 16 Dec 2019, at 20:47, Elena Ufimtseva <elena.ufimtseva@oracle.com> wrote:
>>>>>> On Fri, Dec 13, 2019 at 10:41:16AM +0000, Stefan Hajnoczi wrote:
>>>> Questions I've seen when discussing muser with people have been:
>>>>
>>>> 1. Can unprivileged containers create muser devices?  If not, this is a
>>>>   blocker for use cases that want to avoid root privileges entirely.
>>>
>>> Yes you can. Muser device creation follows the same process as general
>>> mdev device creation (ie. you write to a sysfs path). That creates an
>>> entry in /dev/vfio and the control plane can further drop privileges
>>> there (set selinux contexts, &c.)
>>
>> In this case there is still a privileged step during setup.  What about
>> completely unprivileged scenarios like a regular user without root or a
>> rootless container?
> 
> Oh, I see what you are saying. I suppose we need to investigate
> adjusting the privileges of the sysfs path correctly beforehand to
> allow devices to be created by non-root users. The credentials used on
> creation should be reflected on the vfio endpoint (ie. /dev/fio/<group>).
> 
> I need to look into that and get back to you.

As a prerequisite to using the "vfio-pci" device in QEMU, the user
assigns the PCI device on the host bus to the VFIO kernel driver by
writing to "/sys/bus/pci/drivers/vfio-pci/new_id" and
"/sys/bus/pci/drivers/vfio-pci/bind"

I believe a privileged control plane is required to perform these
prerequisite steps. Therefore, I wonder how rootless containers or
unprivileged users currently go about using a VFIO device with QEMU/KVM.

Thanks!
--
Jag

> 
>>
>>>> 2. Does muser need to be in the kernel (e.g. slower to develop/ship,
>>>>   security reasons)?  A similar library could be implemented in
>>>>   userspace along the lines of the vhost-user protocol.  Although VMMs
>>>>   would then need to use a new libmuser-client library instead of
>>>>   reusing their VFIO code to access the device.
>>>
>>> Doing it in userspace was the flow we proposed back in last year's KVM
>>> Forum (Edinburgh), but it got turned down. That's why we procured the
>>> kernel approach, which turned out to have some advantages:
>>> - No changes needed to Qemu
>>> - No Qemu needed at all for userspace drivers
>>> - Device emulation process restart is trivial
>>>   (it therefore makes device code upgrades much easier)
>>>
>>> Having said that, nothing stops us from enhancing libmuser to talk
>>> directly to Qemu (for the Qemu case). I envision at least two ways of
>>> doing that:
>>> - Hooking up libmuser with Qemu directly (eg. over a unix socket)
>>> - Hooking Qemu with CUSE and implementing the muser.ko interface
>>>
>>> For the latter, libmuser would talk to a character device just like it
>>> talks to the vfio character device. We "just" need to implement that
>>> backend in Qemu. :)
>>
>> What about:
>> * libmuser's API stays mostly unchanged but the library speaks a
>>    VFIO-over-UNIX domain sockets protocol instead of talking to
>>    mdev/vfio in the host kernel.
> 
> As I said above, there are advantages to the kernel model. The key one
> is transparent device emulation restarts. Today, muser.ko keeps the
> "device memory" internally in a prefix tree. Upon restart, a new
> device emulator can recover state (eg. from a state file in /dev/shm
> or similar) and remap the same memory that is already configured to
> the guest via Qemu. We have a pending work item for muser.ko to also
> keep the eventfds so we can recover those, too. Another advantage is
> working with any userspace driver and not requiring a VMM at all.
> 
> If done entirely in userspace, the device emulator needs to allocate
> the device memory somewhere that remains accessible (eg. tmpfs), with
> the difference that now we may be talking about non-trivial amounts of
> memory. Also, that may not be the kind of content you want lingering
> around the filesystem (for the same reasons Qemu unlinks memory files
> from /dev/hugepages after mmap'ing it).
> 
> That's why I'd prefer to rephrase what you said to "in addition"
> instead of "instead".
> 
>> * VMMs can implement this protocol directly for POSIX-portable and
>>    unprivileged operation.
>> * A CUSE VFIO adapter simulates /dev/vfio so that VFIO-only VMMs can
>>    still take advantage of libmuser devices.
> 
> I'm happy with that.
> We need to think the credential aspect throughout to ensure nodes can
> be created in the right places with the right privileges.
> 
>>
>> Assuming this is feasible, would you lose any important
>> features/advantages of the muser.ko approach?  I don't know enough about
>> VFIO to identify any blocker or obvious performance problems.
> 
> That's what I elaborated above. The fact that muser.ko can keep
> various metadata (and other resources) about the device in the kernel
> and grant it back to userspace as needed. There are ways around it,
> but it requires some orchestration with tmpfs and the VMM (only so
> much can be kept in tmpfs; the eventfds need to be retransmitted from
> the machine emulator on request).
> 
> Restarting is a critical aspect of this. One key use case for the
> project is to be able to emulate various devices from one process (for
> polling). That must be able to restart for upgrades or recovery.
> 
>>
>> Regarding recovery, it seems straightforward to keep state in a tmpfs
>> file that can be reopened when the device is restarted.  I don't think
>> kernel code is necessary?
> 
> It adds a dependency, but isn't a show stopper. If we can work through
> permission issues, making sure the VMM can reconnect and retransmit
> eventfds and other state, then it should be ok.
> 
> To be clear: I'm very happy to have a userspace-only option for this,
> I just don't want to ditch the kernel module (yet, anyway). :)
> 
> F.
> 
>>
>> Stefan
> 


  parent reply	other threads:[~2019-12-19 16:43 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-24  9:08 [RFC v4 PATCH 00/49] Initial support of multi-process qemu Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 01/49] multi-process: memory: alloc RAM from file at offset Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread Jagannathan Raman
2019-11-13 15:30   ` Stefan Hajnoczi
2019-11-13 15:38     ` Jag Raman
2019-11-13 15:51       ` Daniel P. Berrangé
2019-11-13 16:04         ` Jag Raman
2019-11-13 16:35           ` Daniel P. Berrangé
2019-10-24  9:08 ` [RFC v4 PATCH 03/49] multi-process: add a command line option for debug file Jagannathan Raman
2019-11-13 15:35   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 04/49] multi-process: Add stub functions to facilate build of multi-process Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 05/49] multi-process: Add config option for multi-process QEMU Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 06/49] multi-process: build system for remote device process Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object Jagannathan Raman
2019-11-11 16:41   ` Stefan Hajnoczi
2019-11-13 15:47     ` Jag Raman
2019-11-13 15:53   ` Stefan Hajnoczi
2019-11-18 15:26     ` Jag Raman
2019-10-24  9:08 ` [RFC v4 PATCH 08/49] multi-process: add functions to synchronize proxy and remote endpoints Jagannathan Raman
2019-10-24  9:08 ` [RFC v4 PATCH 09/49] multi-process: setup PCI host bridge for remote device Jagannathan Raman
2019-11-13 16:07   ` Stefan Hajnoczi
2019-11-18 15:25     ` Jag Raman
2019-11-21 10:37       ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process Jagannathan Raman
2019-11-13 16:22   ` Stefan Hajnoczi
2019-11-18 15:29     ` Jag Raman
2019-10-24  9:08 ` [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device Jagannathan Raman
2019-11-13 16:33   ` Stefan Hajnoczi
2019-11-13 16:34     ` Jag Raman
2019-10-24  9:08 ` [RFC v4 PATCH 12/49] multi-process: remote process initialization Jagannathan Raman
2019-11-13 16:38   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 13/49] multi-process: introduce proxy object Jagannathan Raman
2019-11-21 11:09   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 14/49] mutli-process: build remote command line args Jagannathan Raman
2019-11-21 11:23   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 15/49] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
2019-11-21 11:33   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 16/49] multi-process: Add LSI device proxy object Jagannathan Raman
2019-11-21 11:35   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 17/49] multi-process: Synchronize remote memory Jagannathan Raman
2019-11-21 11:44   ` Stefan Hajnoczi
2019-10-24  9:08 ` [RFC v4 PATCH 18/49] multi-process: create IOHUB object to handle irq Jagannathan Raman
2019-11-21 12:02   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 19/49] multi-process: configure remote side devices Jagannathan Raman
2019-11-21 12:05   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 20/49] multi-process: add qdev_proxy_add to create proxy devices Jagannathan Raman
2019-11-21 12:16   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 21/49] multi-process: remote: add setup_devices and setup_drive msg processing Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 22/49] multi-process: remote: use fd for socket from parent process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 23/49] multi-process: remote: add create_done condition Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 24/49] multi-process: add processing of remote drive and device command line Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 25/49] multi-process: Introduce build flags to separate remote process code Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 26/49] multi-process: refractor vl.c code to re-use in remote Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 27/49] multi-process: add remote option Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 28/49] multi-process: add remote options parser Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 29/49] multi-process: add parse_cmdline in remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 30/49] multi-process: send heartbeat messages to remote Jagannathan Raman
2019-11-11 16:27   ` Stefan Hajnoczi
2019-11-13 16:01     ` Jag Raman
2019-11-21 12:19       ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 31/49] multi-process: handle heartbeat messages in remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 32/49] multi-process: Use separate MMIO communication channel Jagannathan Raman
2019-11-11 16:21   ` Stefan Hajnoczi
2019-11-13 16:14     ` Jag Raman
2019-11-21 12:31       ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process Jagannathan Raman
2019-11-11 16:19   ` Stefan Hajnoczi
2019-11-13 16:15     ` Jag Raman
2019-10-24  9:09 ` [RFC v4 PATCH 34/49] multi-process/mon: choose HMP commands based on target Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 35/49] multi-process/mon: stub functions to enable QMP module for remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 36/49] multi-process/mon: enable QMP module support in the " Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 37/49] multi-process/mon: Refactor monitor/chardev functions out of vl.c Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 38/49] multi-process/mon: Initialize QMP module for remote processes Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 39/49] multi-process: prevent duplicate memory initialization in remote Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 40/49] multi-process/mig: build migration module in the remote process Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object Jagannathan Raman
2019-11-13 15:50   ` Daniel P. Berrangé
2019-11-13 16:32     ` Jag Raman
2019-11-13 17:11       ` Daniel P. Berrangé
2019-11-18 15:42         ` Jag Raman
2019-11-22 10:34           ` Dr. David Alan Gilbert
2019-10-24  9:09 ` [RFC v4 PATCH 42/49] multi-process/mig: Send VMSD of remote to " Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 43/49] multi-process/mig: Load VMSD in the proxy object Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 44/49] multi-process/mig: refactor runstate_check into common file Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process Jagannathan Raman
2019-11-11 16:17   ` Stefan Hajnoczi
2019-11-13 16:33     ` Jag Raman
2019-10-24  9:09 ` [RFC v4 PATCH 46/49] multi-process/mig: Restore the VMSD in " Jagannathan Raman
2019-10-24  9:09 ` [RFC v4 PATCH 47/49] multi-process: Enable support for multiple devices in remote Jagannathan Raman
2019-11-11 16:15   ` Stefan Hajnoczi
2019-11-13 16:21     ` Jag Raman
2019-10-24  9:09 ` [RFC v4 PATCH 48/49] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
2019-10-25 19:33   ` Elena Ufimtseva
2019-11-07 15:50   ` Stefan Hajnoczi
2019-11-11 15:41   ` Stefan Hajnoczi
2019-10-24  9:09 ` [RFC v4 PATCH 49/49] multi-process: add configure and usage information Jagannathan Raman
2019-11-07 14:02   ` Stefan Hajnoczi
2019-11-07 14:33     ` Michael S. Tsirkin
2019-11-08 11:17       ` Stefan Hajnoczi
2019-11-08 11:32         ` Daniel P. Berrangé
2019-11-07 14:39     ` Daniel P. Berrangé
2019-11-07 15:53       ` Jag Raman
2019-11-08 11:14         ` Stefan Hajnoczi
2019-10-25  2:08 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu no-reply
2019-10-25  2:08 ` no-reply
2019-10-25  2:10 ` no-reply
2019-11-21 12:46 ` Stefan Hajnoczi
2019-12-10  6:47 ` [RFC v4 PATCH 00/49] Initial support of multi-process qemu - status update Elena Ufimtseva
2019-12-13 10:41   ` Stefan Hajnoczi
2019-12-16 19:46     ` Elena Ufimtseva
2019-12-16 19:57       ` Felipe Franciosi
2019-12-17 16:33         ` Stefan Hajnoczi
2019-12-17 22:57           ` Felipe Franciosi
2019-12-18  0:00             ` Paolo Bonzini
2019-12-19 13:36               ` Stefan Hajnoczi
2019-12-20 17:15                 ` John G Johnson
2020-01-02 10:00                   ` Stefan Hajnoczi
2020-01-02 10:04                   ` Stefan Hajnoczi
2019-12-19 11:55             ` Stefan Hajnoczi
2019-12-19 12:33               ` Felipe Franciosi
2019-12-19 12:55                 ` Daniel P. Berrangé
2019-12-20  9:47                   ` Stefan Hajnoczi
2019-12-20  9:50                     ` Paolo Bonzini
2019-12-20 14:14                       ` Felipe Franciosi
2019-12-20 15:25                         ` Alex Williamson
2019-12-20 16:00                           ` Felipe Franciosi
2020-02-25  9:16                           ` Thanos Makatos
2019-12-20 10:22                     ` Daniel P. Berrangé
2020-01-02 10:42                       ` Stefan Hajnoczi
2020-01-02 11:03                         ` Felipe Franciosi
2020-01-02 18:55                           ` Marc-André Lureau
2020-01-08 16:31                             ` Stefan Hajnoczi
2020-01-03 15:59                           ` Stefan Hajnoczi
2020-01-14  1:56                             ` John G Johnson
2020-01-17 17:25                               ` Dr. David Alan Gilbert
2019-12-19 16:40                 ` Jag Raman [this message]
2019-12-19 12:50             ` Daniel P. Berrangé
2019-12-19 16:46               ` Daniel P. Berrangé
2020-01-02 16:01           ` Elena Ufimtseva
2020-01-03 15:00             ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a98221a8-0a62-46f8-499d-84447fb4980a@oracle.com \
    --to=jag.raman@oracle.com \
    --cc=armbru@redhat.com \
    --cc=benjamin.walker@intel.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=fam@euphon.net \
    --cc=felipe@nutanix.com \
    --cc=james.r.harris@intel.com \
    --cc=john.g.johnson@oracle.com \
    --cc=kanth.ghatraju@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kraxel@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=liran.alon@oracle.com \
    --cc=marcandre.lureau@gmail.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=ross.lagerwall@citrix.com \
    --cc=rth@twiddle.net \
    --cc=stefanha@gmail.com \
    --cc=swapnil.ingle@nutanix.com \
    --cc=thanos.makatos@nutanix.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.