From: Frank Yang <lfy@google.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Roman Kiryanov <rkir@google.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	virtio-dev@lists.oasis-open.org
Subject: Re: [virtio-dev] Memory sharing device
Date: Tue, 5 Feb 2019 07:17:27 -0800
Message-ID: <CAEkmjvWdSyMyy0=2CEN7LtX7XpDYNpVXeZfQTV2mz5rXkWmWmQ@mail.gmail.com>
In-Reply-To: <20190205100427.GA2693@work-vm>


Hi all,

I'm Frank; I've been using Roman's goldfish address space driver to back
Vulkan host-visible memory in the emulator. Some more in-depth replies inline.

On Tue, Feb 5, 2019 at 2:04 AM Dr. David Alan Gilbert <dgilbert@redhat.com>
wrote:

> * Roman Kiryanov (rkir@google.com) wrote:
> > Hi Gerd,
> >
> > > virtio-gpu specifically needs that to support vulkan and opengl
> > > extensions for coherent buffers, which must be allocated by the host
> > > gpu driver.  It's WIP still.
> >
>
> Hi Roman,
>
> > the proposed spec says:
> >
> > +Shared memory regions MUST NOT be used to control the operation
> > +of the device, nor to stream data; those should still be performed
> > +using virtqueues.
>
> Yes, I put that in.
>
> > Is there a strong reason to prohibit using memory regions for control
> > purposes? Our long term goal is to have as few kernel drivers as
> > possible and to move "drivers" into userspace. If we go with the
> > virtqueues, is there a general purpose device/driver to talk between
> > our host and guest to support custom hardware (with own blobs)? Could
> > you please advise if we can use something else to achieve this goal?
>
> My reason for that paragraph was to try and think about what should
> still be in the virtqueues; after all a device that *just* shares a
> block of memory and does everything in the block of memory itself isn't
> really a virtio device - it's the standardised queue structure that
> makes it a virtio device.
> However, I'd be happy to accept the 'MUST NOT' might be a bit strong for
> some cases where there's stuff that makes sense in the queues and
> stuff that makes sense differently.
>
>
Currently we drive the gl/vk host coherent memory with a host memory sharing
device and a meta pipe device used in tandem. The pipe device, goldfish_pipe,
sends the control messages (it is also currently used to send the API call
parameters themselves and to drive other devices like sensors and camera;
it's a bit of a catch-all), while the host memory sharing device does the
actual sharing of memory from the host and tells the guest which physical
addresses to send with the glMapBufferRange/vkMapMemory calls over the pipe
to the host.

In the interest of having fewer custom kernel drivers for the emulator, we
were thinking of two major approaches to upstreaming the control message /
meta pipe part:

   1. Come up with a new virtio driver that captures what goldfish_pipe
   does; it would have a virtqueue and would be something like a virtio
   driver for drivers defined in userspace, interacting closely with a
   host memory sharing driver (virtio-userspace?). It would be used with
   the host memory sharing driver not just to share coherent mappings,
   but also to deliver the API call parameters. It'd have a single ioctl
   that pushes a message into the virtqueue notifying the host a) what
   kind of userspace driver it is and b) how much data to send/receive
   (a rough sketch of such an ioctl follows this list).
      a. On the host side, the control message would decide which virtual
      device code to run, resolved by a plugin DLL to QEMU. So once we
      decide to add new functionality, we would at most need to increment
      a version number sent in an initial control message, or change an
      enumeration in a handshake at the beginning, so no changes would
      have to be made to the guest kernel or QEMU itself.
      b. This is useful for standardizing the Android Emulator drivers in
      the short term, but in the long term it could be useful for quickly
      specifying new drivers/devices in situations where the developer
      has some control over both the guest and host bits. We'd use this
      also for:
         - Media codecs: the guest is given a pointer to host codec
         input/output buffers, downloads compressed data to the input
         buffer, and ioctl-pings the host. The host then asynchronously
         decodes and populates the codec output buffer.
         - One-off extension functionality for Vulkan, such as
         VK_KHR_external_memory_fd/win32. Suppose we want to simulate an
         OPAQUE_FD Vulkan external memory in the guest, but we are
         running on a win32 host (this will be an important part of our
         use case). Define a new driver type in userspace, say
         VulkanOpaqueFdWrapper = 55, then open virtio-userspace and run
         ioctls to define that fd as that kind of driver. On the host
         side, the filp would then be associated with a host-side win32
         Vulkan handle. This can be a modular way to handle further
         Vulkan functionality as it comes up without requiring kernel /
         QEMU changes.
   2. Add a raw ioctl for the above control messages to the proposed host
   memory sharing driver, and make those control messages part of the
   host memory sharing driver's virtqueue.
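
For concreteness, here is a minimal sketch of what that single ioctl could
look like. Everything below is hypothetical -- "virtio-userspace", the struct,
and the ioctl number are illustrations, not an existing kernel interface:

    /* Hypothetical uapi for "virtio-userspace"; not an existing kernel ABI. */
    #include <linux/ioctl.h>
    #include <linux/types.h>

    struct virtio_userspace_msg {
            __u32 driver_type;   /* which userspace-defined driver, e.g. 55 */
            __u32 opcode;        /* driver-defined control code */
            __u64 send_size;     /* bytes the guest wants to send */
            __u64 recv_size;     /* bytes the guest expects back */
            __u64 data;          /* guest pointer to the payload buffer */
    };

    /* Single entry point: push one message into the virtqueue and wait. */
    #define VIRTIO_USERSPACE_XFER \
            _IOWR('u', 0x00, struct virtio_userspace_msg)

A userspace "driver" would open the device node, fill in driver_type/opcode,
and issue this ioctl; on the host, a plugin keyed on driver_type would decide
which device code handles the message.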

I heard somewhere that having this kind of thing might run up against the
virtio design philosophy of having fewer 'generic' pipes; however, it could
be valuable to have a generic way of defining driver/device functionality
that is configurable without needing to change guest kernels or QEMU
directly.

> > I saw there were registers added, could you please elaborate how new
> > address regions are added and associated with the host memory (and
> > backwards)?
>
> In virtio-fs we have two separate stages:
>   a) A shared arena is setup (and that's what the spec Stefan pointed
>      to is about) - it's statically allocated at device creation and
>      corresponds to a chunk of guest physical address space
>

This is quite like what we're doing for goldfish address space and Vulkan
host visible currently:


   - Our address space device reserves a fixed 16 GB region in guest
   physical address space on device realization.
   - At the level of Vulkan, on Vulkan device creation, we map a sizable
   amount of host visible memory on the host, and then use the address
   space device to expose it to the guest. It then occupies some offset
   into the address space device's PCI resource.
   - At the level of the guest Vulkan user, we satisfy host visible
   VkDeviceMemory allocations by faking them: creating guest-only handles,
   suballocating into that initial host visible memory, and editing the
   memory offset/size parameters to correspond to the actual memory before
   the API calls reach the host driver (a sketch of this suballocation
   follows the list).
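
As a purely illustrative sketch of that suballocation step -- assuming one
pre-mapped block and a simple bump allocator rather than our actual
allocator:

    /* Illustrative only: satisfy host-visible allocations by carving
     * offsets out of one big block that was mapped up front. */
    #include <stdint.h>

    struct host_visible_block {
            uint8_t  *base;      /* guest mapping of the shared region */
            uint64_t  capacity;  /* e.g. a slice of the 16 GB PCI resource */
            uint64_t  next;      /* bump-allocator cursor */
    };

    /* Returns the offset of a suballocation, or UINT64_MAX on failure.
     * align must be a power of two.  A guest-only VkDeviceMemory handle
     * would record this offset so the memory offset/size parameters can
     * be rewritten before the call reaches the host driver. */
    static uint64_t suballoc(struct host_visible_block *blk,
                             uint64_t size, uint64_t align)
    {
            uint64_t off = (blk->next + align - 1) & ~(align - 1);

            if (off + size > blk->capacity)
                    return UINT64_MAX;
            blk->next = off + size;
            return off;
    }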

>   b) During operation the guest kernel asks for files to be mapped into
>      part of that arena dynamically, using commands sent over the queue
>      - our queue carries FUSE commands, and we've added two new FUSE
>      commands to perform the map/unmap.  They talk in terms of offsets
>      within the shared arena, rather than GPAs.
>

Yes, we'll most likely be operating in a similar manner for OpenGL and
Vulkan.

>
> So I'd tried to start by doing the spec for (a).
>
> > We allocate a region from the guest first and pass its offset to the
> > host to plug
> > real RAM into it and then we mmap this offset:
> >
> > https://photos.app.goo.gl/NJvPBvvFS3S3n9mn6
>
> How do you transmit the glMapBufferRange command from QEMU driver to
> host?
>
This is done through an ioctl in the address space driver together with
meta pipe commands (a rough guest-side sketch follows these steps):

   1. Using the address space driver, run an ioctl to "allocate" a region,
   which reserves some space. An offset into the region is returned.
   2. Using the meta pipe driver, tell the host about the offset and the
   API call parameters of glMapBufferRange. On the host, glMapBufferRange
   is run for real, and the resulting host pointer is mapped into the
   guest via KVM_SET_USER_MEMORY_REGION at (PCI resource start + that
   offset).
   3. mmap the region with the supplied offset in the guest.
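
Put together, the guest-side flow looks roughly like this. The ioctl name,
struct, and send_map_buffer_range_cmd() below are placeholders standing in
for the address space and meta pipe interfaces, not their exact ABI:

    /* Rough guest-side sketch; names are placeholders, not the real ABI. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    struct as_alloc { uint64_t size; uint64_t offset; };
    #define AS_ALLOCATE _IOWR('A', 0, struct as_alloc)   /* made-up number */

    /* Provided elsewhere by the guest GL library (hypothetical). */
    void send_map_buffer_range_cmd(int pipe_fd, uint64_t offset, uint64_t size);

    void *map_host_coherent(int pipe_fd, uint64_t map_size)
    {
            struct as_alloc a = { .size = map_size };
            int as_fd = open("/dev/goldfish_address_space", O_RDWR);

            /* 1. Reserve a region; the driver returns an offset into the
             *    device's PCI resource. */
            ioctl(as_fd, AS_ALLOCATE, &a);

            /* 2. Tell the host the offset plus the glMapBufferRange
             *    parameters; the host maps for real and plugs the pointer
             *    in at (PCI resource start + a.offset) via
             *    KVM_SET_USER_MEMORY_REGION. */
            send_map_buffer_range_cmd(pipe_fd, a.offset, map_size);

            /* 3. mmap the same offset in the guest. */
            return mmap(NULL, map_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, as_fd, a.offset);
    }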

> Dave
>
> > Thank you.
> >
> > Regards,
> > Roman.
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
