* [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
@ 2019-02-24 21:18 Frank Yang
  2019-02-24 21:22 ` [virtio-comment] " Frank Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Frank Yang @ 2019-02-24 21:18 UTC (permalink / raw)
  To: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert, Roman Kiryanov


virtio-hostmem is a proposed way to share host memory to the guest and
communicate notifications. One potential use case is to have userspace
drivers for virtual machines.

The latest version of the spec proposal can be found at

https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex

The revision history so far:

https://github.com/741g/virtio-spec/commit/7c479f79ef6236a064471c5b1b8bc125c887b948
- originally called virtio-user
https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0
- renamed to virtio-hostmem and removed dependence on host callbacks
https://github.com/741g/virtio-spec/commit/e3e5539b08cfbaab22bf644fd4e50c00ec428928
- removed a straggling mention of a host callback
https://github.com/741g/virtio-spec/commit/61c500d5585552658a7c98ef788a625ffe1e201c
- Added an example usage of virtio-hostmem

This first RFC email includes replies to comments from mst@redhat.com:

  > \item Guest allocates into the PCI region via config virtqueue messages.

Michael: OK so who allocates memory out of the PCI region?
Response:

Allocation will be split by guest address space versus host address space.

Guest address space: The guest driver determines the offset into the BAR in
which to allocate the new region. The implementation of the allocator
itself may live on the host (while guest triggers such allocations via the
config virtqueue messages), but the ownership of region offsets and sizes
will be in the guest. This allows for the easy use of existing guest
ref-counting mechanisms such as last close() calling release() to clean up
the memory regions in the guest.

Host address space: The backing of such memory regions is considered
completely optional. The host may service a guest region with memory of its
choice, depending on how the device is used. This servicing may happen at any
time after the guest sends the message to create a memory region, but before
the guest destroys that region. In the meantime, some examples of how the
host may respond to the allocation request:

   - The host does not back the region at all and a page fault happens.
   - The host has already allocated host RAM (from some source;
   vkMapMemory, malloc(), mmap, etc) memory of some kind and maps a
   page-aligned host pointer to the guest physical address corresponding to
   the region.
   - The host has already set up an MMIO region (such as via the
   MemoryRegion API in QEMU) and maps that MMIO region to the guest physical
   address, allowing for MMIO callbacks to happen on reads/writes to that
   memory region.
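
To make the guest-side half of this concrete, below is a minimal sketch of
what a region-creation request over the config virtqueue could look like.
The struct and field names are hypothetical illustrations, not taken from the
spec draft; the point is only that the guest owns the offset/size bookkeeping
while the host decides how and when to back the range.

#include <stdint.h>

/* Hypothetical config-virtqueue message for creating a memory region.
 * Field names are illustrative only; the actual layout is whatever the
 * virtio-hostmem spec draft ends up defining. */
struct hostmem_create_region {
    uint32_t instance_id;   /* which sub-device instance is asking */
    uint64_t offset;        /* guest-chosen offset into the shared-memory BAR */
    uint64_t size;          /* requested size in bytes */
};

/* Guest-side allocation: the driver picks a free offset inside the BAR and
 * tells the host about it; the host may back the range immediately, lazily,
 * or with an MMIO region. A trivial bump allocator is enough to show the
 * ownership split. */
static uint64_t next_free_offset;

static struct hostmem_create_region alloc_region(uint32_t instance, uint64_t size)
{
    struct hostmem_create_region msg = {
        .instance_id = instance,
        .offset = next_free_offset,
        .size = size,
    };
    next_free_offset += (size + 4095) & ~4095ULL;  /* keep regions page-aligned */
    return msg;  /* the driver would then place this on the config virtqueue */
}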

  > \item Guest: After a packet of compressed video stream is downloaded to the
>     buffer, another message, like a doorbell, is sent on the ping virtqueue to
>         consume existing compressed data. The ping message's offset field is
>         set to the proper offset into the shared-mem object.

Michael: BTW is this terminology e.g. "download", "ping message" standard
somewhere?
Response:

Conceptually, it has a lot in common with "virtqueue notification" or
"doorbell register". We should resolve to a more standard terminology; what
about "notification"?

  > \item Large bidirectional transfers are possible with zero copy.

Michael: However just to make sure, sending small amounts of data
is slower since you get to do all the mmap dance.

Response: Yes, it will be very slow if the user chooses to perform mmap for
each transfer. However, for users who want to perform frequent transfers of
small amounts of data, such as the sensors / codec use cases, we expect the
mmap to happen once on instance creation with a single message to create a
memory region; after that, every transfer only needs the notification
message, while the existing mmap'ed region is reused. We expect the regions
to remain fairly stable over the use of the instance, in most cases; the
guest userspace will also mmap() once to get direct access to the host
memory, then reuse it many times while sending traffic.
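
As a rough illustration of that map-once, notify-many flow, guest userspace
might look like the sketch below. The device node name and the ioctl are
hypothetical placeholders for whatever the guest driver exposes; only the
pattern (one mmap at instance creation, then a small fixed-size notification
per transfer) is the point.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical notification payload and ioctl number; the real driver
 * interface is implementation-defined and out of scope for the spec. */
struct hostmem_notify { uint64_t offset; uint64_t size; uint64_t metadata; };
#define HOSTMEM_IOC_NOTIFY _IOW('h', 1, struct hostmem_notify)

int main(void)
{
    int fd = open("/dev/hostmem0", O_RDWR);          /* hypothetical node */
    size_t region_size = 1 << 20;
    uint8_t *shm = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);          /* map once at startup */

    for (int i = 0; i < 1000; i++) {
        memset(shm, i & 0xff, 4096);                 /* small payload, no remap */
        struct hostmem_notify n = { .offset = 0, .size = 4096, .metadata = i };
        ioctl(fd, HOSTMEM_IOC_NOTIFY, &n);           /* doorbell-style ping */
    }

    munmap(shm, region_size);
    close(fd);
    return 0;
}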

  > \item It is not necessary to use socket datagrams or data streams to
>     communicate the ping messages; they can be raw structs fresh off the
>         virtqueue.

Michael: OK and ping messages are all fixed size?
Response:

Yes, all ping messages are fixed size.
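
For reference, a fixed-size notification message could be as small as the
sketch below. The exact fields are only illustrative (not the spec's layout);
the key property is that every message has the same size, so it can be read
straight off the virtqueue as a raw struct without any framing.

#include <stdint.h>

/* Illustrative fixed-size ping/notification message. */
struct hostmem_ping {
    uint32_t instance_id;  /* which sub-device instance this refers to */
    uint32_t opcode;       /* use-case-specific operation */
    uint64_t offset;       /* offset into the shared memory region */
    uint64_t size;         /* size of the data the host should look at */
    uint64_t metadata;     /* optional use-case-specific extra word */
};

/* Fixed size is what lets both sides skip datagram/stream framing. */
_Static_assert(sizeof(struct hostmem_ping) == 32, "ping messages are fixed size");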

Michael: OK I still owe you that write-up about vhost pci.  will try to
complete that early next week. But generally if I got it right that the host
allocates buffers then what you describe does seem to fit a bit better
with the vhost pci host/guest interface idea.

One question that was asked about vhost pci is whether it is in fact
necessary to share a device between multiple applications.
Or is it enough to just have one id per device?

Response:
Yes, looking forward! I'm kind of getting a rough idea now of what you
may be referring to with vhost pci; perhaps we can use a shared memory
channel like Chrome's to drive vhost or something. I'll wait for the full
response before designing more into this area, though :)

For now, it's not necessary to share the device between multiple VMs, but
it is necessary to share between multiple guest processes, so multiple
instance ids need to be supported for each device id.

It is also possible to share one instance id across guest processes. In the
codec example, the codec may run in a separate guest process from the guest
process that consumes the data, so to prevent copies, ideally both would have
a view of the same host memory. Similar things show up when running Vulkan or
gralloc/dmabuf-like mechanisms; in recent versions of gralloc, for example,
one process allocates the memory while other processes share that memory by
mapping it directly.

However, those are between guest processes. For inter-VM communication, I am
still a bit tentative on this, but it shows that instance ids fundamentally
reflect a host-side context and its resources. Two VMs could map the same
host memory in principle (though I have not tried it with KVM; I'm not sure
whether things explode if KVM_SET_USER_MEMORY_REGION is used for the same
host memory across two VMs), and if it makes sense for them to communicate
over that memory, then it makes sense for the instance id to be shared across
the two VMs as well.

Anyway, thanks for the feedback!

Best,

Frank


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-24 21:18 [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device) Frank Yang
@ 2019-02-24 21:22 ` Frank Yang
  2019-02-25  5:15 ` Roman Kiryanov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Frank Yang @ 2019-02-24 21:22 UTC (permalink / raw)
  To: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert, Roman Kiryanov


Missed this:

  > \item Large bidirectional transfers are possible without scatterlists, because
>     the memory is always physically contiguous.

Michael: It might get fragmented though. I think it would be up to
host to try and make sure it's not too fragmented, right?

Response:

Yes, it's definitely possible to get fragmented, and it is up to the host
to make sure it's not too fragmented in the end.

This is more likely to be achieved if the device is typically used in such a
way that allocations on the host tend to be page-granularity and live for a
long time. For use cases such as codecs and graphics, allocations will tend
to be larger than a page and have the long lifetime discussed above (well, at
least if we consider the lifetime of a guest process using the device "long"
enough).


On Sun, Feb 24, 2019 at 1:18 PM Frank Yang <lfy@google.com> wrote:

> virtio-hostmem is a proposed way to share host memory to the guest and
> communicate notifications. One potential use case is to have userspace
> drivers for virtual machines.
>
> The latest version of the spec proposal can be found at
>
> https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex
>
> The revision history so far:
>
>
> https://github.com/741g/virtio-spec/commit/7c479f79ef6236a064471c5b1b8bc125c887b948
> - originally called virtio-user
>
> https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0
> - renamed to virtio-hostmem and removed dependence on host callbacks
>
> https://github.com/741g/virtio-spec/commit/e3e5539b08cfbaab22bf644fd4e50c00ec428928
> - removed a straggling mention of a host callback
>
> https://github.com/741g/virtio-spec/commit/61c500d5585552658a7c98ef788a625ffe1e201c
> - Added an example usage of virtio-hostmem
>
> This first RFC email includes replies to comments from mst@redhat.com:
>
>   > \item Guest allocates into the PCI region via config virtqueue
> messages.
>
> Michael: OK so who allocates memory out of the PCI region?
> Response:
>
> Allocation will be split by guest address space versus host address space.
>
> Guest address space: The guest driver determines the offset into the BAR
> in which to allocate the new region. The implementation of the allocator
> itself may live on the host (while guest triggers such allocations via the
> config virtqueue messages), but the ownership of region offsets and sizes
> will be in the guest. This allows for the easy use of existing guest
> ref-counting mechanisms such as last close() calling release() to clean up
> the memory regions in the guest.
>
> Host address space: The backing of such memory regions is considered
> completely optional. The host may service a guest region with a memory of
> its choice that depends on the usage of the device. The time this servicing
> happens may be any time after the guest communicates the message to create
> a memory region, but before the guest destroys the memory region. In the
> meantime, some examples of how the host may respond to the allocation
> request:
>
>    - The host does not back the region at all and a page fault happens.
>    - The host has already allocated host RAM (from some source;
>    vkMapMemory, malloc(), mmap, etc) memory of some kind and maps a
>    page-aligned host pointer to the guest physical address corresponding to
>    the region.
>    - The host has already set up a MMIO region (such as via the
>    MemoryRegion API in QEMU) and maps that MMIO region to the guest physical
>    address, allowing for MMIO callbacks to happen on read/writes to that
>    memory region.
>
>   > \item Guest: After a packet of compressed video stream is downloaded to the
> >     buffer, another message, like a doorbell, is sent on the ping virtqueue to
> >         consume existing compressed data. The ping message's offset field is
> >         set to the proper offset into the shared-mem object.
>
> Michael: BTW is this terminology e.g. "download", "ping message" standard
> somewhere?
> Response:
>
> Conceptually, it has a lot in common with "virtqueue notification" or
> "doorbell register". We should resolve to a more standard terminology; what
> about "notification"?
>
>   > \item Large bidirectional transfers are possible with zero copy.
>
> Michael: However just to make sure, sending small amounts of data
> is slower since you get to do all the mmap dance.
>
> Response: Yes it will be very slow if the user chooses to perform mmap for
> each transfer. However, we expect that for users who want to perform
> frequent transfers of small amounts of data, such as for the sensors /
> codec use cases, that the mmap happens once on instance creation with a
> single message to create a memory region, and then every time a transfer
> happens, only the notification message is needed, while the existing mmap'ed
> region is reused. We expect the regions to remain fairly stable over the
> use of the instance, in most cases; the guest userspace will also mmap()
> once to get direct access to the host memory, then reuse it many times
> while sending traffic.
>
>   > \item It is not necessary to use socket datagrams or data streams to
> >     communicate the ping messages; they can be raw structs fresh off the
> >         virtqueue.
>
> Michael: OK and ping messages are all fixed size?
> Response:
>
> Yes, all ping messages are fixed size.
>
> Michael: OK I still owe you that write-up about vhost pci.  will try to
> complete
> that early next week. But generally if I got it right that the host
> allocates buffers then what you describe does seem to fit a bit better
> with the vhost pci host/guest interface idea.
>
> One question that was asked about vhost pci is whether it is in fact
> necessary to share a device between multiple applications.
> Or is it enough to just have one id per device?
>
> Response:
> Yes, looking forward! I'm kind of getting some rough idea now of what you
> may be referring to with vhost pci, perhaps if we can use a shared memory
> channel like Chrome to drive vhost or something. I'll wait for the full
> response before designing more into this area though :)
>
> For now, it's not necessary to share the device between multiple VMs, but
> it is necessary to share between multiple guest processes, so multiple
> instance ids need to be supported for each device id.
>
> It is also possible to share one instance id across guest processes as
> well. In the codec example, the codec may run in a separate guest process
> from the guest process that consumes the data, so to prevent copies,
> ideally both would have a view of the same host memory. Similar things
> shows up when running Vulkan or gralloc/dmabuf-like mechanisms; in recent
> versions of gralloc for example, one process allocates the memory while
> other processes share that memory by mapping it directly.
>
> However those are between guest processes. For inter-VM communication, I
> am still a bit tentative on this but it shows that instance id's
> fundamentally reflect a host-side context and its resources. Two VMs could
> map the same host memory in principle (though I have not tried it with KVM,
> I'm not sure if things explode if set user memory region happens for the
> same host memory across two VMs), and if it makes sense for them to
> communicate over that memory, then it makes sense for the instance id to be
> shared across the two VMs as well.
>
> Anyway, thanks for the feedback!
>
> Best,
>
> Frank
>


* Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-24 21:18 [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device) Frank Yang
  2019-02-24 21:22 ` [virtio-comment] " Frank Yang
@ 2019-02-25  5:15 ` Roman Kiryanov
  2019-02-25  5:27 ` [virtio-comment] " Roman Kiryanov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Roman Kiryanov @ 2019-02-25  5:15 UTC (permalink / raw)
  To: Frank Yang
  Cc: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert

> Michael: OK so who allocates memory out of the PCI region?
> Response:
>
> Allocation will be split by guest address space versus host address space.
>
> Guest address space: The guest driver determines the offset into the BAR in which to allocate the new region. The implementation of the allocator itself may live on the host (while guest triggers such allocations via the config virtqueue messages), but the ownership of region offsets and sizes will be in the guest. This allows for the easy use of existing guest ref-counting mechanisms such as last close() calling release() to clean up the memory regions in the guest.
>
> Host address space: The backing of such memory regions is considered completely optional. The host may service a guest region with a memory of its choice that depends on the usage of the device. The time this servicing happens may be any time after the guest communicates the message to create a memory region, but before the guest destroys the memory region. In the meantime, some examples of how the host may respond to the allocation request:

Should we note here what happens if a guest releases (a user process
dies) the region without asking the host to un-back it?

Regards,
Roman.


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-24 21:18 [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device) Frank Yang
  2019-02-24 21:22 ` [virtio-comment] " Frank Yang
  2019-02-25  5:15 ` Roman Kiryanov
@ 2019-02-25  5:27 ` Roman Kiryanov
  2019-02-25 12:56 ` [virtio-comment] " Dr. David Alan Gilbert
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Roman Kiryanov @ 2019-02-25  5:27 UTC (permalink / raw)
  To: Frank Yang
  Cc: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert

> Michael: OK so who allocates memory out of the PCI region?
> Response:
>
> Allocation will be split by guest address space versus host address space.
>
> Guest address space: The guest driver determines the offset into the BAR in which to allocate the new region. The implementation of the allocator itself may live on the host (while guest triggers such allocations via the config virtqueue messages), but the ownership of region offsets and sizes will be in the guest. This allows for the easy use of existing guest ref-counting mechanisms such as last close() calling release() to clean up the memory regions in the guest.
>
> Host address space: The backing of such memory regions is considered completely optional. The host may service a guest region with a memory of its choice that depends on the usage of the device. The time this servicing happens may be any time after the guest communicates the message to create a memory region, but before the guest destroys the memory region. In the meantime, some examples of how the host may respond to the allocation request:

Should we note here what happens if a guest releases (a user process
dies) the region without asking the host to un-back it?

Regards,
Roman.


* Re: [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-24 21:18 [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device) Frank Yang
                   ` (2 preceding siblings ...)
  2019-02-25  5:27 ` [virtio-comment] " Roman Kiryanov
@ 2019-02-25 12:56 ` Dr. David Alan Gilbert
  2019-02-25 13:50 ` [virtio-comment] " Michael S. Tsirkin
  2019-03-06 16:36 ` [virtio-comment] " Stefan Hajnoczi
  5 siblings, 0 replies; 13+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-25 12:56 UTC (permalink / raw)
  To: Frank Yang
  Cc: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Roman Kiryanov

* Frank Yang (lfy@google.com) wrote:
> virtio-hostmem is a proposed way to share host memory to the guest and
> communicate notifications. One potential use case is to have userspace
> drivers for virtual machines.
> 
> The latest version of the spec proposal can be found at
> 
> https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex
> 
> The revision history so far:
> 
> https://github.com/741g/virtio-spec/commit/7c479f79ef6236a064471c5b1b8bc125c887b948
> - originally called virtio-user
> https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0
> - renamed to virtio-hostmem and removed dependence on host callbacks
> https://github.com/741g/virtio-spec/commit/e3e5539b08cfbaab22bf644fd4e50c00ec428928
> - removed a straggling mention of a host callback
> https://github.com/741g/virtio-spec/commit/61c500d5585552658a7c98ef788a625ffe1e201c
> - Added an example usage of virtio-hostmem
> 
> This first RFC email includes replies to comments from mst@redhat.com:
> 
>   > \item Guest allocates into the PCI region via config virtqueue messages.
> 
> Michael: OK so who allocates memory out of the PCI region?
> Response:
> 
> Allocation will be split by guest address space versus host address space.
> 
> Guest address space: The guest driver determines the offset into the BAR in
> which to allocate the new region. The implementation of the allocator
> itself may live on the host (while guest triggers such allocations via the
> config virtqueue messages), but the ownership of region offsets and sizes
> will be in the guest. This allows for the easy use of existing guest
> ref-counting mechanisms such as last close() calling release() to clean up
> the memory regions in the guest.
> 
> Host address space: The backing of such memory regions is considered
> completely optional. The host may service a guest region with a memory of
> its choice that depends on the usage of the device. The time this servicing
> happens may be any time after the guest communicates the message to create
> a memory region, but before the guest destroys the memory region. In the
> meantime, some examples of how the host may respond to the allocation
> request:
> 
>    - The host does not back the region at all and a page fault happens.

Note that a mapping missing on the host won't necessarily turn into a
page fault in the guest; on QEMU, for example, if you have a memory
region like this where the guest accesses an area with no mapping, I
think we hit a KVM error.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-24 21:18 [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device) Frank Yang
                   ` (3 preceding siblings ...)
  2019-02-25 12:56 ` [virtio-comment] " Dr. David Alan Gilbert
@ 2019-02-25 13:50 ` Michael S. Tsirkin
  2019-02-25 18:54   ` Roman Kiryanov
  2019-03-06 16:36 ` [virtio-comment] " Stefan Hajnoczi
  5 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-02-25 13:50 UTC (permalink / raw)
  To: Frank Yang
  Cc: virtio-comment, Cornelia Huck, Gerd Hoffmann, Stefan Hajnoczi,
	Dr. David Alan Gilbert, Roman Kiryanov

On Sun, Feb 24, 2019 at 01:18:11PM -0800, Frank Yang wrote:
> virtio-hostmem is a proposed way to share host memory to the guest and
> communicate notifications. One potential use case is to have userspace drivers
> for virtual machines.
> 
> The latest version of the spec proposal can be found at
> 
> https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex
> 
> The revision history so far:
> 
> https://github.com/741g/virtio-spec/commit/7c479f79ef6236a064471c5b1b8bc125c887b948
> - originally called virtio-user
> https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0
> - renamed to virtio-hostmem and removed dependence on host callbacks
> https://github.com/741g/virtio-spec/commit/e3e5539b08cfbaab22bf644fd4e50c00ec428928
> - removed a straggling mention of a host callback
> https://github.com/741g/virtio-spec/commit/61c500d5585552658a7c98ef788a625ffe1e201c
> - Added an example usage of virtio-hostmem
> 
> This first RFC email includes replies to comments from mst@redhat.com:
> 
>   > \item Guest allocates into the PCI region via config virtqueue messages.
> 
> Michael: OK so who allocates memory out of the PCI region? 
> Response:
> 
> Allocation will be split by guest address space versus host address space.
> 
> Guest address space: The guest driver determines the offset into the BAR in
> which to allocate the new region. The implementation of the allocator itself
> may live on the host (while guest triggers such allocations via the config
> virtqueue messages), but the ownership of region offsets and sizes will be in
> the guest. This allows for the easy use of existing guest ref-counting
> mechanisms such as last close() calling release() to clean up the memory
> regions in the guest.
> 
> Host address space: The backing of such memory regions is considered completely
> optional. The host may service a guest region with a memory of its choice that
> depends on the usage of the device. The time this servicing happens may be any
> time after the guest communicates the message to create a memory region, but
> before the guest destroys the memory region. In the meantime, some examples of
> how the host may respond to the allocation request:
> 
>   • The host does not back the region at all and a page fault happens.

Then what? Guest dies?
That doesn't sound reasonable, in particular if you want to
allow userspace to map this memory.

>   • The host has already allocated host RAM (from some source; vkMapMemory,
>     malloc(), mmap, etc) memory of some kind and maps a page-aligned host
>     pointer to the guest physical address corresponding to the region.

I'm not sure what "of some kind" means here.
Also, host and guest might have different ideas about
what page-aligned means.

>   • The host has already set up a MMIO region (such as via the MemoryRegion API
>     in QEMU) and maps that MMIO region to the guest physical address, allowing
>     for MMIO callbacks to happen on read/writes to that memory region.

Callbacks are an implementation detail.


What is missing here is a description of how the device behaves
from the guest's point of view.
And this will affect how the guest behaves.
For example, should the guest map the memory uncacheable? WB?
MMIO would need uncacheable. RAM would need WB.

If we are following the vhost-pci design then the memory should behave as
RAM. It can be faulted in lazily, but that is transparent to the guest.
Actions trigger through queues, not MMIO.
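
To illustrate the RAM-like, write-back behaviour described here (an assumed
semantic for the shared region, not something the draft states yet), a Linux
guest driver would map the range as ordinary cacheable memory rather than as
MMIO. A minimal sketch, where bar_phys/region_offset/region_size are
placeholders for whatever the transport reports:

#include <linux/io.h>
#include <linux/types.h>

/* Sketch only: map a shared host-memory region as write-back RAM.
 * MEMREMAP_WB gives a normal cacheable mapping, matching RAM semantics;
 * an uncacheable MMIO-style mapping would use ioremap() instead. */
static void *hostmem_map_region(phys_addr_t bar_phys, size_t region_offset,
                                size_t region_size)
{
    return memremap(bar_phys + region_offset, region_size, MEMREMAP_WB);
}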


>   > \item Guest: After a packet of compressed video stream is downloaded to the
> >     buffer, another message, like a doorbell, is sent on the ping virtqueue to
> >         consume existing compressed data. The ping message's offset field is
> >         set to the proper offset into the shared-mem object.
> 
> Michael: BTW is this terminology e.g. "download", "ping message" standard
> somewhere?
> Response:
> 
> Conceptually, it has a lot in common with "virtqueue notification" or "doorbell
> register". We should resolve to a more standard terminology; what about
> "notification"?


Virtio uses the terms "available buffer notification" and
"used buffer notification". If this follows the vhost-pci design,
then the available buffer notification is sent host to guest,
and the used buffer notification is sent guest to host.
Virtio is the reverse.

>   > \item Large bidirectional transfers are possible with zero copy.
> 
> Michael: However just to make sure, sending small amounts of data
> is slower since you get to do all the mmap dance.
> 
> Response: Yes it will be very slow if the user chooses to perform mmap for each
> transfer. However, we expect that for users who want to perform frequent
> transfers of small amounts of data, such as for the sensors / codec use cases,
> that the mmap happens once on instance creation with a single message to create
> a memory region, and then every time a transfer happens, only the notification
> message is needed, while the existing mmap'ed region is reused. We expect the
> regions to remain fairly stable over the use of the instance, in most cases;
> the guest userspace will also mmap() once to get direct access to the host
> memory, then reuse it many times while sending traffic.
> 
>   > \item It is not necessary to use socket datagrams or data streams to
> >     communicate the ping messages; they can be raw structs fresh off the
> >         virtqueue.
> 
> Michael: OK and ping messages are all fixed size? 
> Response:
> 
> Yes, all ping messages are fixed size. 
> 
> Michael: OK I still owe you that write-up about vhost pci.  will try to
> complete
> that early next week. But generally if I got it right that the host
> allocates buffers then what you describe does seem to fit a bit better
> with the vhost pci host/guest interface idea.
> 
> One question that was asked about vhost pci is whether it is in fact
> necessary to share a device between multiple applications.
> Or is it enough to just have one id per device?  
> 
> Response:
> Yes, looking forward! I'm kind of getting some rough idea now of what you may
> be referring to with vhost pci, perhaps if we can use a shared memory channel
> like Chrome to drive vhost or something. I'll wait for the full response before
> designing more into this area though :)
> 
> For now, it's not necessary to share the device between multiple VMs, but it is
> necessary to share between multiple guest processes, so multiple instance ids
> need to be supported for each device id.
> 
> It is also possible to share one instance id across guest processes as well. In
> the codec example, the codec may run in a separate guest process from the guest
> process that consumes the data, so to prevent copies, ideally both would have a
> view of the same host memory.

Especially in this case, this needs some security model enforced by the
guest kernel.

> Similar things shows up when running Vulkan or
> gralloc/dmabuf-like mechanisms; in recent versions of gralloc for example, one
> process allocates the memory while other processes share that memory by mapping
> it directly. 

I'm only vaguely familiar with that.
The mapping is done through the kernel, right?

> However those are between guest processes. For inter-VM communication, I am
> still a bit tentative on this but it shows that instance id's fundamentally
> reflect a host-side context and its resources. Two VMs could map the same host
> memory in principle (though I have not tried it with KVM, I'm not sure if
> things explode if set user memory region happens for the same host memory
> across two VMs), and if it makes sense for them to communicate over that
> memory, then it makes sense for the instance id to be shared across the two VMs
> as well.
> 
> Anyway, thanks for the feedback!
> 
> Best,
> 
> Frank


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-25 13:50 ` [virtio-comment] " Michael S. Tsirkin
@ 2019-02-25 18:54   ` Roman Kiryanov
  2019-02-25 20:34     ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Kiryanov @ 2019-02-25 18:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Frank Yang, virtio-comment, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert

> >   • The host does not back the region at all and a page fault happens.
>
> Then what? Guest dies?
> That doesn't sound reasonable, in particular if you want to
> allow userspace to map this memory.

In our implementation we call mmap after asking the host to back the region.

https://photos.app.goo.gl/NJvPBvvFS3S3n9mn6

Nothing prevents a guest from calling mmap on an unbacked region; then the
guest will die. If it is possible for the device to figure out whether an
address range is backed in the VM, the guest driver could talk to the device
to fail an mmap call if a region is not accessible.

> >   • The host has already allocated host RAM (from some source; vkMapMemory,
> >     malloc(), mmap, etc) memory of some kind and maps a page-aligned host
> >     pointer to the guest physical address corresponding to the region.
>
> I'm not sure what does "of some kind" mean here.

Memory from any API call that could be used for access through this
address range.

> Also host and guest might have different ideas about
> what does page-aligned mean.

In our implementation we do the aligning (for VM operations) and unaligning
in guest userspace (because mmap is page aligned) so we can hand back
pointers that land in the middle of a page (we have no control over pointers
returned from a third-party API).
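
A small sketch of that align/unalign dance in guest userspace, assuming a
plain POSIX mmap of the region exposed by the guest driver (the fd and
offsets are placeholders): the caller asks for an arbitrary, possibly
unaligned offset, the helper maps the surrounding page-aligned range, and
returns the pointer adjusted back by the in-page delta.

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map "len" bytes starting at an arbitrary (possibly unaligned) offset
 * into the shared region, by aligning the mmap down to a page boundary
 * and handing back the pointer at the original offset. */
static void *map_unaligned(int fd, uint64_t region_offset, size_t len)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t aligned = region_offset & ~(page - 1);   /* align down */
    uint64_t delta = region_offset - aligned;

    uint8_t *base = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, (off_t)aligned);
    if (base == MAP_FAILED)
        return NULL;
    return base + delta;   /* exact, unaligned pointer the caller wanted */
}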

Regards,
Roman.


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-25 18:54   ` Roman Kiryanov
@ 2019-02-25 20:34     ` Michael S. Tsirkin
  2019-02-25 23:08       ` Roman Kiryanov
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-02-25 20:34 UTC (permalink / raw)
  To: Roman Kiryanov
  Cc: Frank Yang, virtio-comment, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert

On Mon, Feb 25, 2019 at 10:54:03AM -0800, Roman Kiryanov wrote:
> > >   • The host does not back the region at all and a page fault happens.
> >
> > Then what? Guest dies?
> > That doesn't sound reasonable, in particular if you want to
> > allow userspace to map this memory.
> 
> In our implementation we call mmap after asking the host to back the region.

So I guess the spec should not say the host does not have to back the
region, then.


> https://photos.app.goo.gl/NJvPBvvFS3S3n9mn6
> 
> Nothing prevents a guest to call mmap on an unbacked region, then the
> guest will die. If it is possible for the device to figure out if an
> address range
> is backed in VM, the guest driver could talk to the device to fail an mmap
> call if a region is not accessible.

So if the driver needs specific knowledge from the device, that needs to
be in the spec.

> > >   • The host has already allocated host RAM (from some source; vkMapMemory,
> > >     malloc(), mmap, etc) memory of some kind and maps a page-aligned host
> > >     pointer to the guest physical address corresponding to the region.
> >
> > I'm not sure what does "of some kind" mean here.
> 
> Memory from any API call that could be used for access through this
> address range.

So just RAM really?

> > Also host and guest might have different ideas about
> > what does page-aligned mean.
> 
> In our implementation we do aligning (for VM operations) and unaligning in the
> guest userspace (because mmap is page aligned) to get the pointer to handle
> pointers in the middle of a page (we have no control on pointers returned
> from a third party API).
> 
> Regards,
> Roman.

I'm not sure how the above answers the comment.  I understand you are
using all kinds of APIs internally in your hypervisor, but please put
things in terms that apply to host/guest communication. I can kind
of read it between the lines if I squint hard enough, but this makes my
head hurt and there's no guarantee I do it correctly.

To try and put things in your terms: if you try to map a range of memory,
you get access to a page that can be bigger than the range you asked
for.  That can cause two ranges to violate a security boundary, cause
information leaks, etc. A library can play with offsets and give a
well-behaved application the illusion of a private range, but if it ends up
sharing a page of memory with a malicious application then there's no
security boundary between them.

HTH

-- 
MST


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-25 20:34     ` Michael S. Tsirkin
@ 2019-02-25 23:08       ` Roman Kiryanov
  2019-02-25 23:45         ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Kiryanov @ 2019-02-25 23:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Frank Yang, virtio-comment, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert

Michael, thank you for your comments.

> I'm not sure how does above answer the comment.

Sorry for leaving this unclear; our guest driver tells the
device the guest's page size and then we do the aligning/unaligning.

> To try and put things in your terms, if you try to map a range of memory
> you get access to a page that can be bigger than the range you asked
> for.

This is correct.

>  It can cause two ranges to violate a security boundary, cause
> information leaks, etc.

Could you please correct me if I am wrong: if I ask glMapBufferRange
(without hosts and guests) for a 1K buffer with 4K pages, I will have
access to the other 3K. If a driver decides to put sensitive bits there,
will this be the same situation?

We assume pages are not shared between processes.
If this assumption does not work then it is hard to share arbitrary pointers.

Regards,
Roman.


* [virtio-comment] Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-25 23:08       ` Roman Kiryanov
@ 2019-02-25 23:45         ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-02-25 23:45 UTC (permalink / raw)
  To: Roman Kiryanov
  Cc: Frank Yang, virtio-comment, Cornelia Huck, Gerd Hoffmann,
	Stefan Hajnoczi, Dr. David Alan Gilbert

On Mon, Feb 25, 2019 at 03:08:19PM -0800, Roman Kiryanov wrote:
> Michael, thank you for your comments.
> 
> > I'm not sure how does above answer the comment.
> 
> Sorry for leaving this unclear, our guest driver tells the
> device guest's page size and then we do aligning-unaligning.


This might work. Note that the host page size might be different.
If it's bigger, the host needs to be careful about allocating
full host pages anyway.
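
A tiny sketch of the rounding that implies (just the arithmetic, nothing
from the spec): whichever side sizes a region has to round it to the larger
of the two page sizes, otherwise the host ends up splitting a host page
between regions.

#include <stdint.h>

/* Round a requested region size up to the larger of the guest and host
 * page sizes (both assumed to be powers of two), so host backing never
 * has to share a host page between two regions. */
static uint64_t round_region_size(uint64_t size,
                                  uint64_t guest_page, uint64_t host_page)
{
    uint64_t page = guest_page > host_page ? guest_page : host_page;
    return (size + page - 1) & ~(page - 1);
}

/* Example: a 5000-byte request with 4 KiB guest pages and 16 KiB host
 * pages rounds up to 16384 bytes. */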

> > To try and put things in your terms, if you try to map a range of memory
> > you get access to a page that can be bigger than the range you asked
> > for.
> 
> This is correct.
> 
> >  It can cause two ranges to violate a security boundary, cause
> > information leaks, etc.
> 
> Could you please correct me if I am wrong. If I ask glMapBufferRange
> (without hosts and guests) for a 1K buffer with 4K pages, I will have
> access to other 3K. If a driver decides to put sensitive bits there -
> will this be the same situation?

Sounds similar.

> We assume pages are not shared between processes.
> If this assumption does not work then it is hard to share arbitrary pointers.
> 
> Regards,
> Roman.

Right. Details on how memory is allocated in the proposed scheme are
scant, but I think the above shows that it can't all be up to the guest.


-- 
MST


* Re: [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-02-24 21:18 [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device) Frank Yang
                   ` (4 preceding siblings ...)
  2019-02-25 13:50 ` [virtio-comment] " Michael S. Tsirkin
@ 2019-03-06 16:36 ` Stefan Hajnoczi
  2019-03-06 17:28   ` Michael S. Tsirkin
  5 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2019-03-06 16:36 UTC (permalink / raw)
  To: Frank Yang
  Cc: virtio-comment, Michael S. Tsirkin, Cornelia Huck, Gerd Hoffmann,
	Dr. David Alan Gilbert, Roman Kiryanov


> \section{Host Memory Device}\label{sec:Device Types / Host Memory Device}

The Host Memory Device defines an entirely new device model that bypasses
VIRTIO.  Why make this a VIRTIO device if the VIRTIO device model is not a good
fit for what you're trying to achieve?  This device seems out of scope to me.

>
> Note: This depends on the upcoming shared-mem type of virtio
> that allows sharing of host memory to the guest.
>
> virtio-hostmem is a device for sharing host memory all the way to the guest userspace.
> It runs on top of virtio-pci for virtqueue messages and
> uses the PCI address space for direct access like virtio-fs does.

Perhaps a more general way to express this is to explain that it provides
direct access to memory using Shared Memory Resources.  (The ones defined by
David Gilbert's work-in-progress spec.)  Then you could remove explicit
references to virtio-pci, PCI, and virtio-fs.

>
> virtio-hostmem's purpose is
> to allow high performance general memory accesses between guest and host,
> and to allow the guest to access host memory constructed at runtime,
> such as mapped memory from graphics APIs.
>
> Note that vhost-pci/vhost-vsock, virtio-vsock, and virtio-fs

vhost-pci and virtio-vhost-user (is that what you meant?) are unlikely to be in
the VIRTIO specification any time soon, so readers might not be aware of them.

> are also general ways to share data between the guest and host,
> but they are specialized to socket APIs in the guest plus
> having host OS-dependent socket communication mechanism,
> or depend on a FUSE implementation.
>
> virtio-hostmem provides guest/host communication mechanisms over raw host memory,
> as opposed to sockets,
> which has benefits of being more portable across hypervisors and guest OSes,
> and potentially higher performance due to always being physically contiguous to the guest.

/to the guest/in guest memory/?

> \subsection{Fixed Host Memory Regions}\label{sec:Device Types / Host Memory Device / Communication over Fixed Host Memory Regions}
>
> Shmids will be set up as a set of fixed ranges on boot,

What does this mean?  Where is the meaning of each shmid defined and how is it
represented?

> one for each sub-device available.

Please explain sub-devices first.

> This means that the boot-up sequence plus the guest kernel

boot-up sequence -> the guest boot-up sequence?

> configures memory regions used by sub-devices once on startup,
> and does not touch them again;

Normally the VIRTIO specification talks about a "device" and a "driver" rather
than the guest, guest kernel, guest userspace, etc.  It focuses on the
device/driver interface rather than on other layers of the stack.

After reading this paragraph it's still not clear how sub-devices are detected,
configured, numbered, etc and who is really responsible for that.

> this simplifies the set of possible behavior,
> and bounds the maximum amount of guest physical memory that can be used.
>
> It is always assumed that the memory regions act as RAM
> and are backed in some way by the host.
> There is no caching, all access is coherent.

What does no caching mean?  The memory pages should be mapped like normal RAM,
right?

> When the guest sends notifications to the host,
> Memory fence instructions are automatically deployed
> for architectures without store/load coherency.
>
> The host is allowed to lazily back / modify which host pointers
> correspond to the physical address ranges of such regions,
> such as in response to sub-device specific protocols,
> but is never allowed to leave those mappings unmapped.

The part about lazily backing address ranges is a device implementation detail
that is not visible in the device-driver interface.  I suggest removing these
kinds of statements and focusing on the device-driver interface instead.

Regarding leaving mappings unmapped, a device normative section can specify
that "every page in the address range MUST be accessible so that the driver
does not encounter a CPU architecture-specific error when accessing a page".

> These guest physical memory regions persist throughout the lifetime of the VMM;

Memory region == sub-device == shmid?

> they are not created or destroyed dynamically even if the virtio-hostmem device
> is realized as a PCI device that can hotplug.

I'm not sure what the hotplug statement means.  If the device is not plugged in
yet, then the memory regions are absent from the guest memory space, right?
Once the device is plugged in they become present?  Once the device is removed
they are gone again?  If another device is hotplugged later it could have
different sub-devices/memory regions?

>
> The guest physical regions must not overlap and must not be shared
> directly across different sub-device types.

Please move this into a device normative section and s/must/MUST/.

What does "shared directly" mean?

>
> \subsection{Sub-devices and Instances}\label{sec:Device Types / Host Memory Device / Sub-devices and Instances}
>
> The guest and host communicate
> over the config, notification (tx) and event (rx) virtqueues.
> The config virtqueue is used for instance creation/destruction.
>
> The guest can create "instances" which capture
> a particular use case of the device.
> Different use cases are distinguished by different sub-device IDs;
> virtio-hostmem is like virtio-input in that the guest can query
> for sub-devices that are implemented on the host via device and vendor ID's;
> the guest provides vendor and device id in a configuration message.
> The host then accepts or rejects the instance creation request.
>
> Each instance can only touch memory regions
> associated with its particular sub-device,
> and only knows the offset into the associated memory region.
> It is up to the userspace driver / device implementation to
> resolve how offsets into the memory region are shared across instances or not.

An earlier statement says regions "must not be shared directly across different
sub-device types".  So they can be shared across instances but not sub-device
types?

>
> This means that it is possible to share the same physical pages across multiple processes,
> which is useful for implementing functionality such as gralloc/ashmem/dmabuf;
> virtio-hostmem only guarantees the security boundaries where
> no sub-device instance is allowed to access the memory of an instance of
> a different sub-device.
>
> Indeed, it is possible for a malicious guest process to improperly access
> the shared memory of a gralloc/ashmem/dmabuf implementation on virtio-hostmem,
> but we regard that as a flaw in the security model of the guest,
> not the security model of virtio-hostmem.
>
> When a virtio-hostmem instance in the guest is created,
> a use-case-specific initialization happens on the host
> in response to the creation request.
>
> In operating the device, a notification virtqueue is used for the guest to notify the host
> when something interesting has happened in the shared memory via communicating
> the offset / size of any transaction, if applicable, and metadata.
> This makes it well suited for many kinds of high performance / low latency
> devices such as graphics API forwarding, audio/video codecs, sensors, etc;
> no actual memory is sent over the virtqueue.
>
> Note that this is asymmetric;
> there will be one tx notification virtqueue for each guest instance,
> while there is only one rx event virtqueue for host to guest notifications.
> This is because it can be faster not to share the same virtqueue
> if multiple guest instances all use high bandwidth/low memory operations
> over the virtio-hostmem device to send data to the host;
> this is especially common for the case of graphics API forwarding
> and media codecs.
>
> Both guest kernel and userspace drivers can be written using operations
> on virtio-hostmem in a way that mirrors UIO for Linux;
> open()/close()/ioctl()/read()/write()/mmap(),
> but concrete implementations are outside the scope of this spec.
>
> \subsection{Example Use Case}\label{sec:Device Types / Host Memory Device / Example Use Case}
>
> Suppose the guest wants to decode a compressed video buffer.
>
> \begin{enumerate}
>
> \item VMM is configured for codec support and a vendor/device/revision id is associated
>     with the codec device.
>
> \item On startup, a physical address range, say 128 MB, is associated with the codec device.
>     The range must be usable as RAM, so the host backs it as part of the guest startup process.
>
> \item To save memory, the codec device implementation on the host
>     can begin servicing this range via mapping them all to the same host page. But this is not required;
>         the host can initialize a codec library buffer on the host on bootup and pre-allocate the entire region there.
>         The main invariant is that the physical address range is never not mapped as RAM and usable as RAM.
>
> \item Guest creates an instance for the codec vendor id / device id / revision
>     via sending a message over the config virtqueue.
>
> \item Guest codec driver does an implementation dependent suballocation operation and communicates via
>     notification virtqueue to the host that this instance wants to use that sub-region.
>
> \item The host now does an implementation dependent operation to back the sub-region with usable memory.
>     But this is not required; the host could have set the entire region up at startup.
>
> \item Guest downloads compressed video buffers into that region.
>
> \item Guest: After a packet of compressed video stream is downloaded to the
>     buffer, another message, like a doorbell, is sent on the notification virtqueue to
>         consume existing compressed data. The notification message's offset field is
>         set to the proper offset into the shared-mem object.
>
> \item Host: Codec implementation decodes the video and puts the decoded frames
>     to either a host-side display library (thus with no further guest
>         communication necessary), or puts the raw decompressed frame to a
>         further offset in the shared memory region that the guest knows about.
>
> \item Guest: Continue downloading video streams and sending notifications,
>     or optionally, wait until the host is done first. If scheduling is not that
>         big of an impact, this can be done without even any further VM exit, by
>         the host writing to an agreed memory location when decoding is done,
>         then the guest uses a polling sleep(N) where N is the correctly tuned
>         timeout such that only a few poll spins are necessary.
>
> \item Guest: Or, the host can send back on the event virtqueue \field{revents}
>     and the guest can perform a blocking read() for it.
>
> \end{enumerate}
>
> The unique / interesting aspects of virtio-hostmem are demonstrated:
>
> \begin{enumerate}
>
> \item During instance creation the host was allowed to reject the request if
>     the codec device did not exist on host.
>
> \item The host can expose a codec library buffer directly to the guest,
>     allowing the guest to write into it with zero copy and the host to decompress again without copying.
>
> \item Large bidirectional transfers are possible with zero copy.
>
> \item Large bidirectional transfers are possible without scatterlists, because
>     the memory is always physically contiguous.
>
> \item It is not necessary to use socket datagrams or data streams to
>     communicate the notification messages; they can be raw structs fresh off the
>         virtqueue.
>
> \item After decoding, the guest has the option but not the requirement to wait
>     for the host round trip, allowing for async operation of the codec.
>
> \item The guest has the option but not the requirement to wait for the host
>     round trip, allowing for async operation of the codec.
>
> \end{enumerate}

I skipped everything until here because I really need to see the driver/device
interface before further statements will make any sense.  Perhaps the spec can
be reordered; it's hard to read it linearly.

>
> \subsection{Device ID}\label{sec:Device Types / Host Memory Device / Device ID}
>
> 21
>
> \subsection{Virtqueues}\label{sec:Device Types / Host Memory Device / Virtqueues}
>
> \begin{description}
> \item[0] config
> \item[1] event
> \item[2..n] notification
> \end{description}

The text previously talked about rx/tx queues.  I don't see them here?  Please
pick one term and use it consistently.

>
> There is one notification virtqueue for each instance.
> The maximum number of virtqueue is kept to an implementation-specific limit

s/virtqueue/virtqueues/

> that is stored in the configuration layout.
>
> \subsection{Feature bits}\label{sec: Device Types / Host Memory Device / Feature bits }
>
> No feature bits.
>
> \subsubsection{Feature bit requirements}\label{sec:Device Types / Host Memory Device / Feature bit requirements}
>
> No feature bit requirements.
>
> \subsection{Device configuration layout}\label{sec:Device Types / Host Memory Device / Device configuration layout}
>
> The configuration layout enumerates all sub-devices and a guest physical memory region
> for each sub-device.
>
> \begin{lstlisting}
> struct virtio_hostmem_device_memory_region {
>     le64 phys_addr_start;
>     le64 phys_addr_end;
> }
>
> struct virtio_hostmem_device_info {
>     le32 vendor_id;
>     le32 device_id;
>     le32 revision;
>     struct virtio_hostmem_device_memory_region mem_region;
> }
>
> struct virtio_hostmem_config {
>     le32 num_devices;
>     virtio_hostmem_device_info available_devices[MAX_DEVICES];
>     le32 MAX_INSTANCES;
> };

Where is MAX_DEVICES defined?

>
> One shared memory shmid is associated with each sub-device.

I'm confused.  The code only mentions "device" but the spec mentions
sub-device.  Are they different concepts?
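
For what it's worth, here is a sketch of how a driver might walk the config
layout as currently written, ignoring endianness handling for brevity.
MAX_DEVICES is assumed to be a spec-defined constant (one of the open
questions above); the value used here is purely illustrative.

#include <stddef.h>
#include <stdint.h>

#define MAX_DEVICES 32   /* illustrative; the spec needs to define this */

struct virtio_hostmem_device_memory_region {
    uint64_t phys_addr_start;
    uint64_t phys_addr_end;
};

struct virtio_hostmem_device_info {
    uint32_t vendor_id;
    uint32_t device_id;
    uint32_t revision;
    struct virtio_hostmem_device_memory_region mem_region;
};

struct virtio_hostmem_config {
    uint32_t num_devices;
    struct virtio_hostmem_device_info available_devices[MAX_DEVICES];
    uint32_t max_instances;
};

/* Find the sub-device entry matching a vendor/device pair, if present. */
static const struct virtio_hostmem_device_info *
find_subdevice(const struct virtio_hostmem_config *cfg,
               uint32_t vendor, uint32_t device)
{
    uint32_t n = cfg->num_devices < MAX_DEVICES ? cfg->num_devices : MAX_DEVICES;
    for (uint32_t i = 0; i < n; i++) {
        if (cfg->available_devices[i].vendor_id == vendor &&
            cfg->available_devices[i].device_id == device)
            return &cfg->available_devices[i];
    }
    return NULL;
}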


* Re: [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-03-06 16:36 ` [virtio-comment] " Stefan Hajnoczi
@ 2019-03-06 17:28   ` Michael S. Tsirkin
  2019-03-07 17:33     ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2019-03-06 17:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Frank Yang, virtio-comment, Cornelia Huck, Gerd Hoffmann,
	Dr. David Alan Gilbert, Roman Kiryanov

On Wed, Mar 06, 2019 at 04:36:16PM +0000, Stefan Hajnoczi wrote:
> > \section{Host Memory Device}\label{sec:Device Types / Host Memory Device}
> 
> The Host Memory Device defines an entirely new device model that bypasses
> VIRTIO.  Why make this a VIRTIO device if the VIRTIO device model is not a good
> fit for what you're trying to achieve?  This device seems out of scope to me.

It's different from virtio pci.
But this looks a bit like a special transport to me.
I'll look at it more next week sometime.

-- 
MST


* Re: [virtio-comment] RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)
  2019-03-06 17:28   ` Michael S. Tsirkin
@ 2019-03-07 17:33     ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2019-03-07 17:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Frank Yang, virtio-comment, Cornelia Huck, Gerd Hoffmann,
	Dr. David Alan Gilbert, Roman Kiryanov


On Wed, Mar 06, 2019 at 12:28:30PM -0500, Michael S. Tsirkin wrote:
> On Wed, Mar 06, 2019 at 04:36:16PM +0000, Stefan Hajnoczi wrote:
> > > \section{Host Memory Device}\label{sec:Device Types / Host Memory Device}
> > 
> > The Host Memory Device defines an entirely new device model that bypasses
> > VIRTIO.  Why make this a VIRTIO device if the VIRTIO device model is not a good
> > fit for what you're trying to achieve?  This device seems out of scope to me.
> 
> It's different from virtio pci.
> But this looks a bit like a special transport to me.
> I'll look at it more next week sometime.

It resembles ivshmem.

Frank: ivshmem is an existing (non-VIRTIO) PCI device that offers
similar functionality to what you're proposing.  It was not actively
maintained over the years and is not widely used, but you can look at
QEMU's docs/specs/ivshmem-spec.txt and hw/misc/ivshmem.c.

Stefan

