Subject: [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
Date: 2019-02-24 21:18 UTC
From: Frank Yang
  To: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert, Roman Kiryanov

virtio-hostmem is a proposed way to share host memory to the guest and
communicate notifications. One potential use case is to have userspace
drivers for virtual machines.

The latest version of the spec proposal can be found at

https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex

The revision history so far:

https://github.com/741g/virtio-spec/commit/7c479f79ef6236a064471c5b1b8bc125c887b948
- originally called virtio-user
https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0
- renamed to virtio-hostmem and removed dependence on host callbacks
https://github.com/741g/virtio-spec/commit/e3e5539b08cfbaab22bf644fd4e50c00ec428928
- removed a straggling mention of a host callback
https://github.com/741g/virtio-spec/commit/61c500d5585552658a7c98ef788a625ffe1e201c
- Added an example usage of virtio-hostmem

This first RFC email includes replies to comments from mst@redhat.com:

  > \item Guest allocates into the PCI region via config virtqueue messages.

Michael: OK so who allocates memory out of the PCI region?
Response:

Allocation will be split by guest address space versus host address space.

Guest address space: The guest driver determines the offset into the BAR at
which to allocate the new region. The implementation of the allocator itself
may live on the host (with the guest triggering such allocations via the
config virtqueue messages), but ownership of region offsets and sizes stays
with the guest. This allows easy reuse of existing guest ref-counting
mechanisms, such as the last close() calling release(), to clean up the
memory regions in the guest.
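
To make this concrete, a guest-to-host "create region" request on the config
virtqueue might look roughly like the struct below. This is an illustrative
sketch only; the message, field, and constant names are mine, not taken from
the spec draft linked above.

/* Illustrative only: names are not from the virtio-hostmem draft. */
#include <stdint.h>

#define HOSTMEM_OP_CREATE_REGION  1
#define HOSTMEM_OP_DESTROY_REGION 2

struct hostmem_config_msg {
    uint32_t op;          /* HOSTMEM_OP_CREATE_REGION / _DESTROY_REGION */
    uint32_t instance_id; /* host-side context this region belongs to */
    uint64_t offset;      /* guest-chosen, page-aligned offset into the BAR */
    uint64_t size;        /* requested region size in bytes */
};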

Host address space: The backing of such memory regions is considered
completely optional. The host may service a guest region with memory of its
choice, depending on how the device is used. This servicing may happen at any
time after the guest sends the message to create a memory region and before
the guest destroys it. In the meantime, some examples of how the host may
respond to the allocation request:

   - The host does not back the region at all, and a page fault occurs on
   access.
   - The host has already allocated host memory of some kind (from
   vkMapMemory, malloc(), mmap(), etc.) and maps a page-aligned host pointer
   to the guest physical address corresponding to the region.
   - The host has already set up an MMIO region (such as via the
   MemoryRegion API in QEMU) and maps that MMIO region to the guest physical
   address, allowing MMIO callbacks to fire on reads and writes to that
   memory region.
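
As a rough sketch of how the host-pointer case could be wired up in QEMU
(HostmemState, s->bar, and the helper name are placeholders of mine, not an
existing device model):

/* Sketch only: wraps an existing page-aligned host allocation in a RAM
 * MemoryRegion and exposes it inside the shared-memory BAR at the
 * guest-chosen offset. */
#include "qemu/osdep.h"
#include "exec/memory.h"

static void hostmem_back_region(HostmemState *s, uint64_t bar_offset,
                                uint64_t size, void *host_ptr)
{
    MemoryRegion *mr = g_new0(MemoryRegion, 1);

    /* Back a MemoryRegion with the host pointer (vkMapMemory/malloc/mmap
     * result)... */
    memory_region_init_ram_ptr(mr, OBJECT(s), "hostmem-region", size,
                               host_ptr);

    /* ...and map it into the BAR so guest physical accesses in that window
     * hit the host memory directly. */
    memory_region_add_subregion(&s->bar, bar_offset, mr);
}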

  > \item Guest: After a packet of compressed video stream is downloaded to
>     the buffer, another message, like a doorbell, is sent on the ping
>     virtqueue to consume existing compressed data. The ping message's
>     offset field is set to the proper offset into the shared-mem object.

Michael: BTW is this terminology e.g. "download", "ping message" standard
somewhere?
Response:

Conceptually, it has a lot in common with a "virtqueue notification" or a
"doorbell register". We should settle on more standard terminology; what
about "notification"?

  > \item Large bidirectional transfers are possible with zero copy.

Michael: However just to make sure, sending small amounts of data
is slower since you get to do all the mmap dance.

Response: Yes, it will be very slow if the user chooses to perform an mmap
for each transfer. However, we expect that users who want to perform frequent
transfers of small amounts of data, such as the sensors / codec use cases,
will mmap once at instance creation with a single message to create a memory
region; after that, each transfer needs only the notification message, while
the existing mmap'ed region is reused. We expect the regions to remain fairly
stable over the life of the instance in most cases; the guest userspace will
also mmap() once to get direct access to the host memory, then reuse it many
times while sending traffic.
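
To illustrate the intended pattern, a hypothetical guest userspace flow might
look like the sketch below. The device node and ioctl names are made up for
illustration; only the mmap-once / notify-often structure is the point.

/* Hypothetical: "/dev/virtio-hostmem0" and the ioctl codes are placeholders. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define HOSTMEM_IOC_CREATE_REGION 0x40  /* placeholder request codes */
#define HOSTMEM_IOC_PING          0x41

struct hostmem_region { uint64_t offset, size; };

int main(void)
{
    int fd = open("/dev/virtio-hostmem0", O_RDWR);   /* assumed device node */
    struct hostmem_region region = { 0, 1 << 20 };

    /* One-time setup: create the region and map it into our address space. */
    ioctl(fd, HOSTMEM_IOC_CREATE_REGION, &region);
    uint8_t *buf = mmap(NULL, region.size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, region.offset);

    /* Per-transfer cost is just writing the payload and one small ping. */
    for (int i = 0; i < 1000; i++) {
        buf[0] = (uint8_t)i;                 /* produce data in shared memory */
        ioctl(fd, HOSTMEM_IOC_PING, &region.offset);
    }

    munmap(buf, region.size);
    close(fd);
    return 0;
}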

  > \item It is not necessary to use socket datagrams or data streams to
>     communicate the ping messages; they can be raw structs fresh off the
>         virtqueue.

Michael: OK and ping messages are all fixed size?
Response:

Yes, all ping messages are fixed size.
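
For illustration, a fixed-size ping could look like the struct below; the
field names are my shorthand, and the authoritative layout is in the spec
draft linked above.

#include <stdint.h>

/* Illustrative fixed-size ping/notification message. */
struct hostmem_ping {
    uint32_t instance_id;  /* which host-side context to notify */
    uint32_t events;       /* device-specific event bits */
    uint64_t shm_offset;   /* offset into the shared-mem object */
    uint64_t size;         /* bytes to consume starting at shm_offset */
};
/* 24 bytes on common ABIs, so every ping fits in a single descriptor and
 * needs no length negotiation. */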

Michael: OK I still owe you that write-up about vhost pci.  will try to
complete
that early next week. But generally if I got it right that the host
allocates buffers then what you describe does seem to fit a bit better
with the vhost pci host/guest interface idea.

One question that was asked about vhost pci is whether it is in fact
necessary to share a device between multiple applications.
Or is it enough to just have one id per device?

Response:
Yes, looking forward! I'm starting to get a rough idea of what you may be
referring to with vhost-pci, perhaps something like using a shared-memory
channel (as Chrome does) to drive vhost. I'll wait for the full response
before designing more in this area, though :)

For now, it's not necessary to share the device between multiple VMs, but it
is necessary to share it between multiple guest processes, so multiple
instance ids need to be supported for each device id.

It is also possible to share one instance id across guest processes. In the
codec example, the codec may run in a separate guest process from the guest
process that consumes the data, so to avoid copies, ideally both would have a
view of the same host memory. Similar patterns show up when running Vulkan or
gralloc/dmabuf-like mechanisms; in recent versions of gralloc, for example,
one process allocates the memory while other processes share that memory by
mapping it directly.

However, those cases are between guest processes. For inter-VM communication
I am still a bit tentative, but they show that instance ids fundamentally
reflect a host-side context and its resources. Two VMs could map the same
host memory in principle (though I have not tried it with KVM; I'm not sure
whether things explode if a user memory region is set up for the same host
memory across two VMs), and if it makes sense for them to communicate over
that memory, then it makes sense for the instance id to be shared across
the two VMs as well.
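
For the mechanics only, setting up that untested case with KVM would look
something like the sketch below: one mmap'ed host buffer registered as a
memory slot in two separate VM file descriptors via
KVM_SET_USER_MEMORY_REGION. Whether this actually behaves well is exactly the
part I have not tried.

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Register an existing host buffer as guest RAM in a given KVM VM. */
static int add_shared_slot(int vm_fd, uint32_t slot, uint64_t gpa,
                           void *host_mem, uint64_t size)
{
    struct kvm_userspace_memory_region region = {
        .slot            = slot,
        .flags           = 0,
        .guest_phys_addr = gpa,
        .memory_size     = size,
        .userspace_addr  = (uintptr_t)host_mem,
    };
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}

/* Usage sketch: the same host pointer backs a slot in both VMs.
 *
 *   void *shared = mmap(NULL, size, PROT_READ | PROT_WRITE,
 *                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
 *   add_shared_slot(vm_fd_a, 5, 0xc0000000, shared, size);
 *   add_shared_slot(vm_fd_b, 5, 0xc0000000, shared, size);
 */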

Anyway, thanks for the feedback!

Best,

Frank
