* [virtio-dev] Memory sharing device
From: Roman Kiryanov @ 2019-02-01 20:34 UTC (permalink / raw)
To: virtio-dev
(72+ messages in thread)

Hi,

for our purposes we need to access the host's memory (e.g. Vulkan buffers,
but we are also considering other things, like running all drivers in
userspace) directly from a Linux guest (Android). I implemented a device
in QEMU and a Linux driver for it:

https://android.googlesource.com/kernel/goldfish/+/android-goldfish-4.14-dev/drivers/misc/goldfish_address_space.c
https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c

While upstreaming the driver it was suggested that developing a virtio
spec could be a better approach than inventing our own driver and device.

Could you please advise how to start?

Thank you.

Regards,
Roman.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
* Re: [virtio-dev] Memory sharing device
From: Stefan Hajnoczi @ 2019-02-04 5:40 UTC (permalink / raw)
To: Roman Kiryanov; +Cc: virtio-dev, Dr. David Alan Gilbert, kraxel

On Fri, Feb 01, 2019 at 12:34:07PM -0800, Roman Kiryanov wrote:
> for our purposes we need to access host's memory (e.g. Vulkan buffers,
> but we also considering other things, like running all drivers in
> userspace) directly from a linux guest (Android). I implemented a
> device in QEMU and a linux driver for it:
>
> https://android.googlesource.com/kernel/goldfish/+/android-goldfish-4.14-dev/drivers/misc/goldfish_address_space.c
> https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c
>
> during upstreaming the driver it was suggested that developing a
> virtio spec could be a better approach than inventing our specific
> driver and device.
>
> Could you please advise how to start?

Hi Roman,
David Gilbert, Gerd Hoffmann, and I have discussed adding shared memory
resources to VIRTIO. That means memory made available by the device to
the driver instead of the usual other way around.

virtio-gpu needs this and perhaps that use case overlaps with yours too.
virtio-fs and virtio-vhost-user both also need shared memory.

Here is David Gilbert's latest draft spec:
https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html

Is this what you were thinking of?

Stefan
* Re: [virtio-dev] Memory sharing device
From: Gerd Hoffmann @ 2019-02-04 10:13 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Roman Kiryanov, virtio-dev, Dr. David Alan Gilbert

On Mon, Feb 04, 2019 at 01:40:53PM +0800, Stefan Hajnoczi wrote:
> On Fri, Feb 01, 2019 at 12:34:07PM -0800, Roman Kiryanov wrote:
> > during upstreaming the driver it was suggested that developing a
> > virtio spec could be a better approach than inventing our specific
> > driver and device.
> >
> > Could you please advise how to start?
>
> David Gilbert, Gerd Hoffmann, and I have discussed adding shared memory
> resources to VIRTIO. That means memory made available by the device to
> the driver instead of the usual other way around.
>
> virtio-gpu needs this and perhaps that use case overlaps with yours too.

virtio-gpu specifically needs that to support vulkan and opengl
extensions for coherent buffers, which must be allocated by the host gpu
driver. It's WIP still.

cheers,
  Gerd
* Re: [virtio-dev] Memory sharing device
From: Roman Kiryanov @ 2019-02-04 10:18 UTC (permalink / raw)
To: Gerd Hoffmann; +Cc: Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert

> > David Gilbert, Gerd Hoffmann, and I have discussed adding shared memory
> > resources to VIRTIO. That means memory made available by the device to
> > the driver instead of the usual other way around.
> >
> > virtio-gpu needs this and perhaps that use case overlaps with yours too.
>
> virtio-gpu specifically needs that to support vulkan and opengl
> extensions for coherent buffers, which must be allocated by the host gpu
> driver. It's WIP still.

Thank you, Stefan and Gerd, for looking into it. Yes, Vulkan is our first
use case for the memory sharing device. I will take a look at the draft
Stefan provided to see how it is different from what we are thinking
about.

Regards,
Roman.
* Re: [virtio-dev] Memory sharing device
From: Roman Kiryanov @ 2019-02-05 7:42 UTC (permalink / raw)
To: Gerd Hoffmann; Cc: Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert, Lingfeng Yang

Hi Gerd,

> virtio-gpu specifically needs that to support vulkan and opengl
> extensions for coherent buffers, which must be allocated by the host gpu
> driver. It's WIP still.

the proposed spec says:

+Shared memory regions MUST NOT be used to control the operation
+of the device, nor to stream data; those should still be performed
+using virtqueues.

Is there a strong reason to prohibit using memory regions for control
purposes? Our long term goal is to have as few kernel drivers as possible
and to move "drivers" into userspace. If we go with the virtqueues, is
there a general-purpose device/driver to talk between our host and guest
to support custom hardware (with our own blobs)? Could you please advise
if we can use something else to achieve this goal?

I saw there were registers added; could you please elaborate how new
address regions are added and associated with the host memory (and
backwards)? We allocate a region from the guest first and pass its offset
to the host to plug real RAM into it, and then we mmap this offset:

https://photos.app.goo.gl/NJvPBvvFS3S3n9mn6

Thank you.

Regards,
Roman.
* Re: [virtio-dev] Memory sharing device
From: Dr. David Alan Gilbert @ 2019-02-05 10:04 UTC (permalink / raw)
To: Roman Kiryanov; +Cc: Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Lingfeng Yang

* Roman Kiryanov (rkir@google.com) wrote:
> Hi Gerd,
>
> > virtio-gpu specifically needs that to support vulkan and opengl
> > extensions for coherent buffers, which must be allocated by the host gpu
> > driver. It's WIP still.

Hi Roman,

> the proposed spec says:
>
> +Shared memory regions MUST NOT be used to control the operation
> +of the device, nor to stream data; those should still be performed
> +using virtqueues.

Yes, I put that in.

> Is there a strong reason to prohibit using memory regions for control
> purposes? Our long term goal is to have as few kernel drivers as
> possible and to move "drivers" into userspace. If we go with the
> virtqueues, is there a general-purpose device/driver to talk between
> our host and guest to support custom hardware (with our own blobs)?
> Could you please advise if we can use something else to achieve this
> goal?

My reason for that paragraph was to try and think about what should
still be in the virtqueues; after all, a device that *just* shares a
block of memory and does everything in the block of memory itself isn't
really a virtio device - it's the standardised queue structure that
makes it a virtio device.

However, I'd be happy to accept that the 'MUST NOT' might be a bit
strong for some cases where there's stuff that makes sense in the
queues and stuff that makes sense elsewhere.

> I saw there were registers added, could you please elaborate how new
> address regions are added and associated with the host memory (and
> backwards)?

In virtio-fs we have two separate stages:

a) A shared arena is setup (and that's what the spec Stefan pointed to
   is about) - it's statically allocated at device creation and
   corresponds to a chunk of guest physical address space

b) During operation the guest kernel asks for files to be mapped into
   part of that arena dynamically, using commands sent over the queue -
   our queue carries FUSE commands, and we've added two new FUSE
   commands to perform the map/unmap. They talk in terms of offsets
   within the shared arena, rather than GPAs.

So I'd tried to start by doing the spec for (a).

> We allocate a region from the guest first and pass its offset to the
> host to plug real RAM into it and then we mmap this offset:
>
> https://photos.app.goo.gl/NJvPBvvFS3S3n9mn6

How do you transmit the glMapBufferRange command from QEMU driver to
host?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [virtio-dev] Memory sharing device
From: Frank Yang @ 2019-02-05 15:17 UTC (permalink / raw)
To: Dr. David Alan Gilbert; Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev

Hi all,

I'm Frank, who's been using Roman's goldfish address space driver for
Vulkan host visible memory for the emulator. Some more in-depth replies
inline.

On Tue, Feb 5, 2019 at 2:04 AM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> My reason for that paragraph was to try and think about what should
> still be in the virtqueues; after all a device that *just* shares a
> block of memory and does everything in the block of memory itself isn't
> really a virtio device - it's the standardised queue structure that
> makes it a virtio device.
>
> However, I'd be happy to accept the 'MUST NOT' might be a bit strong for
> some cases where there's stuff that makes sense in the queues and
> stuff that makes sense differently.

Currently, how we drive the gl/vk host coherent memory is that a host
memory sharing device and a meta pipe device are used in tandem. The pipe
device, goldfish_pipe, is used to send control messages (but is also
currently used to send the API call parameters themselves over, and to
drive other devices like sensors and camera; it's a bit of a catch-all),
while the host memory sharing device does the act of sharing memory from
the host and telling the guest which physical addresses are to be sent
with the glMapBufferRange/vkMapMemory calls over the pipe to the host.

In the interest of having fewer custom kernel drivers for the emulator,
we were thinking of two major approaches to upstreaming the control
message / meta pipe part:

1. Come up with a new virtio driver that captures what goldfish_pipe
   does; it would have a virtqueue and it would be something like a
   virtio driver for drivers defined in userspace that interacts closely
   with a host memory sharing driver (virtio-userspace?). It would be
   used with the host memory sharing driver not just to share coherent
   mappings, but also to deliver the API call parameters. It'd have a
   single ioctl that pushes a message into the virtqueue that notifies
   the host a) what kind of userspace driver it is and b) how much data
   to send/receive.
   1. On the host side, we would make the resolution of what virtual
      device code to run based on the control message decided by a
      plugin DLL to qemu. So once we decide to add new functionality, we
      would at most need to increment some version number that is sent
      in some initial control message, or change some enumeration in a
      handshake at the beginning, so no changes would have to be made to
      the guest kernel or QEMU itself.
   2. This is useful for standardizing the Android Emulator drivers in
      the short term, but in the long term, it could be useful for
      quickly specifying new drivers/devices in situations where the
      developer has some control over both the guest/host bits. We'd use
      this also for:
      1. Media codecs: the guest is given a pointer to host codec
         input/output buffers, downloads compressed data to the input
         buffer, and ioctl-pings the host. Then the host asynchronously
         decodes and populates the codec output buffer.
      2. One-off extension functionalities for Vulkan, such as
         VK_KHR_external_memory_fd/win32. Suppose we want to simulate an
         OPAQUE_FD Vulkan external memory in the guest, but we are
         running on a win32 host (this will be an important part of our
         use case). Define a new driver type in userspace, say enum
         VulkanOpaqueFdWrapper = 55, then open virtio-userspace and run
         ioctls to define that fd as that kind of driver. On the host
         side, it would then associate the filp with a host-side win32
         vulkan handle. This can be a modular way to handle further
         functionality in Vulkan as it comes up without requiring
         kernel / QEMU changes.
2. Add a raw ioctl for the above control messages to the proposed host
   memory sharing driver, and make those control messages part of the
   host memory sharing driver's virtqueue.

I heard somewhere that having this kind of thing might run up against
virtio design philosophies of having fewer 'generic' pipes; however, it
could be valuable to have a generic way of defining driver/device
functionality that is configurable without needing to change guest
kernels / qemu directly.

> > I saw there were registers added, could you please elaborate how new
> > address regions are added and associated with the host memory (and
> > backwards)?
>
> In virtio-fs we have two separate stages:
> a) A shared arena is setup (and that's what the spec Stefan pointed to
>    is about) - it's statically allocated at device creation and
>    corresponds to a chunk of guest physical address space

This is quite like what we're doing for goldfish address space and
Vulkan host visible currently.

- Our address space device reserves a fixed region (16 GB) in guest
  physical address space on device realization.
- At the level of Vulkan, on Vulkan device creation, we map a sizable
  amount of host visible memory on the host, and then use the address
  space device to expose it to the guest. It then occupies some offset
  into the address space device's pci resource.
- At the level of the guest Vulkan user, we satisfy host visible
  VkDeviceMemory allocations by faking them; creating guest-only handles
  and suballocating into that initial host visible memory, and then
  editing memory offset/size parameters to correspond to the actual
  memory before the API calls get to the host driver.

> b) During operation the guest kernel asks for files to be mapped into
>    part of that arena dynamically, using commands sent over the queue -
>    our queue carries FUSE commands, and we've added two new FUSE
>    commands to perform the map/unmap. They talk in terms of offsets
>    within the shared arena, rather than GPAs.

Yes, we'll most likely be operating in a similar manner for OpenGL and
Vulkan.

> So I'd tried to start by doing the spec for (a).
>
> > We allocate a region from the guest first and pass its offset to the
> > host to plug real RAM into it and then we mmap this offset:
> >
> > https://photos.app.goo.gl/NJvPBvvFS3S3n9mn6
>
> How do you transmit the glMapBufferRange command from QEMU driver to
> host?

This is done through an ioctl in the address space driver together with
meta pipe commands:

1. Using the address space driver, run an ioctl to "Allocate" a region,
   which reserves some space. An offset into the region is returned.
2. Using the meta pipe driver, tell the host about the offset and the
   API call parameters of glMapBufferRange. On the host,
   glMapBufferRange is run for real, and the resulting host pointer is
   mapped via KVM_SET_USER_MEMORY_REGION at pci resource start + that
   offset.
3. mmap the region with the supplied offset in the guest.
* Re: [virtio-dev] Memory sharing device
From: Frank Yang @ 2019-02-05 15:21 UTC (permalink / raw)
To: Dr. David Alan Gilbert; Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev

On Tue, Feb 5, 2019 at 7:17 AM Frank Yang <lfy@google.com> wrote:
> Currently, how we drive the gl/vk host coherent memory is that a host
> memory sharing device and a meta pipe device are used in tandem. The
> pipe device, goldfish_pipe, is used to send control messages (but is
> also currently used to send the API call parameters themselves over,
> and to drive other devices like sensors and camera; it's a bit of a
> catch-all), while the host memory sharing device does the act of
> sharing memory from the host and telling the guest which physical
> addresses are to be sent with the glMapBufferRange/vkMapMemory calls
> over the pipe to the host.

On this note, it might also be beneficial to run host memory sharing
drivers where host memory backs the API call parameters as well. This
way, the guest has to do less work to notify the host of which API calls
need to run. In particular, the "scatterlist" to reconstruct such API
calls will always be length 1, as the memory is always physically
contiguous.

[...]
* Re: [virtio-dev] Memory sharing device
From: Roman Kiryanov @ 2019-02-05 21:06 UTC (permalink / raw)
To: Dr. David Alan Gilbert; Cc: Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Lingfeng Yang

Hi Dave,

> In virtio-fs we have two separate stages:
> a) A shared arena is setup (and that's what the spec Stefan pointed to
>    is about) - it's statically allocated at device creation and
>    corresponds to a chunk of guest physical address space

We do exactly the same:
https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659

> b) During operation the guest kernel asks for files to be mapped into
>    part of that arena dynamically, using commands sent over the queue -
>    our queue carries FUSE commands, and we've added two new FUSE
>    commands to perform the map/unmap. They talk in terms of offsets
>    within the shared arena, rather than GPAs.

In our case we have no files to map, only pointers returned from OpenGL
or Vulkan. Do you have an approach to share for this use case?

> How do you transmit the glMapBufferRange command from QEMU driver to
> host?

In December we did this by passing these bits over our guest-host channel
(another driver, goldfish_pipe). Frank is currently working on moving
this into our memory mapping device as "something changed in the memory
you shared".

Do you think it is possible to have a virtio-pipe where we could send
arbitrary blobs between guest and host? We want to move all our drivers
into userspace, so we could share memory using the device you are
currently working on and use this virtio-pipe to pass MMIOs and IRQs to
control our devices, avoiding kernel drivers altogether.

Thank you.

Regards,
Roman.
* Re: [virtio-dev] Memory sharing device 2019-02-05 21:06 ` Roman Kiryanov @ 2019-02-06 7:03 ` Gerd Hoffmann 2019-02-06 15:09 ` Frank Yang 2019-02-06 20:14 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-06 7:03 UTC (permalink / raw) To: Roman Kiryanov Cc: Dr. David Alan Gilbert, Stefan Hajnoczi, virtio-dev, Lingfeng Yang On Tue, Feb 05, 2019 at 01:06:42PM -0800, Roman Kiryanov wrote: > Hi Dave, > > > In virtio-fs we have two separate stages: > > a) A shared arena is setup (and that's what the spec Stefan pointed to is about) - > > it's statically allocated at device creation and corresponds to a chunk > > of guest physical address space > > We do exactly the same: > https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659 > > > b) During operation the guest kernel asks for files to be mapped into > > part of that arena dynamically, using commands sent over the queue > > - our queue carries FUSE commands, and we've added two new FUSE > > commands to perform the map/unmap. They talk in terms of offsets > > within the shared arena, rather than GPAs. > > In our case we have no files to map, only pointers returned from > OpenGL or Vulkan. > Do you have the approach to share for this use case? Fundamentally the same: The memory region (PCI bar in case of virtio-pci) reserves address space. The guest manages the address space, it can ask the host to map host gpu resources there. Well, that is at least the plan. Some incomplete WIP patches exist, I'm still busy hammering virtio-gpu ttm code into shape so it can support different kinds of gpu objects. > Do you this it is possible to have virtio-pipe where we could send > arbitrary blobs between > guest and host? Well, there are virtio-serial and virtio-vsock which both give you a pipe between host and guest, simliar to a serial line / tcp socket. Dunno how good they are at handling larger blobs though. 
cheers, Gerd ^ permalink raw reply [flat|nested] 72+ messages in thread
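[Editor's sketch] The pattern Gerd and Dave describe above — map/unmap requests carried over a virtqueue that speak in offsets within the shared arena rather than guest physical addresses — can be illustrated with a small C fragment. Everything here is hypothetical: the command names, fields, and encoding function are invented for illustration and are not taken from the virtio-fs or virtio-gpu draft specs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format for the map/unmap commands discussed above.
 * The names, values, and field layout are invented; the only point
 * carried over from the thread is that requests are expressed as
 * offsets within the shared arena, not as guest physical addresses. */
#define SHM_CMD_MAP   1u
#define SHM_CMD_UNMAP 2u

struct shm_map_cmd {
    uint32_t type;       /* SHM_CMD_MAP or SHM_CMD_UNMAP */
    uint32_t flags;      /* e.g. read/write permission bits */
    uint64_t shm_offset; /* offset within the shared arena */
    uint64_t size;       /* length of the mapping */
    uint64_t handle;     /* host-side resource identifier (file, GPU object) */
};

/* Serialize a map request into a buffer destined for the virtqueue. */
static size_t shm_encode_map(uint8_t *buf, uint64_t off, uint64_t size,
                             uint64_t handle)
{
    struct shm_map_cmd cmd = {
        .type = SHM_CMD_MAP,
        .flags = 0,
        .shm_offset = off,
        .size = size,
        .handle = handle,
    };
    memcpy(buf, &cmd, sizeof(cmd));
    return sizeof(cmd);
}
```

The guest driver would place such a buffer on the queue and the device would reply with success or failure; the actual virtio-fs proposal carries the equivalent information inside FUSE commands instead.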
* Re: [virtio-dev] Memory sharing device 2019-02-06 7:03 ` Gerd Hoffmann @ 2019-02-06 15:09 ` Frank Yang 2019-02-06 15:11 ` Frank Yang 2019-02-08 7:57 ` Stefan Hajnoczi 0 siblings, 2 replies; 72+ messages in thread From: Frank Yang @ 2019-02-06 15:09 UTC (permalink / raw) To: Gerd Hoffmann Cc: Roman Kiryanov, Dr. David Alan Gilbert, Stefan Hajnoczi, virtio-dev [-- Attachment #1: Type: text/plain, Size: 2215 bytes --] On Tue, Feb 5, 2019 at 11:03 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > On Tue, Feb 05, 2019 at 01:06:42PM -0800, Roman Kiryanov wrote: > > Hi Dave, > > > > > In virtio-fs we have two separate stages: > > > a) A shared arena is setup (and that's what the spec Stefan pointed > to is about) - > > > it's statically allocated at device creation and corresponds to a > chunk > > > of guest physical address space > > > > We do exactly the same: > > > https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659 > > > > > b) During operation the guest kernel asks for files to be mapped into > > > part of that arena dynamically, using commands sent over the queue > > > - our queue carries FUSE commands, and we've added two new FUSE > > > commands to perform the map/unmap. They talk in terms of offsets > > > within the shared arena, rather than GPAs. > > > > In our case we have no files to map, only pointers returned from > > OpenGL or Vulkan. > > Do you have the approach to share for this use case? > > Fundamentally the same: The memory region (PCI bar in case of > virtio-pci) reserves address space. The guest manages the address > space, it can ask the host to map host gpu resources there. > > Well, that is at least the plan. Some incomplete WIP patches exist, I'm > still busy hammering virtio-gpu ttm code into shape so it can support > different kinds of gpu objects. > > > Do you this it is possible to have virtio-pipe where we could send > > arbitrary blobs between > > guest and host? 
> > Well, there are virtio-serial and virtio-vsock which both give you a > pipe between host and guest, simliar to a serial line / tcp socket. > Dunno how good they are at handling larger blobs though. > > I've looked at virtio-vsock and it seems general, but requires Unix sockets, which is not going to work for us on Windows and not going to work as expected on macOS (most likely). Is there anything that is similar to and as portable as goldfish pipe which is more like a raw virtqueue? This would then work on memory in the same process, with callbacks registered to trigger upon transmission. cheers, > Gerd > > [-- Attachment #2: Type: text/html, Size: 3110 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-06 15:09 ` Frank Yang @ 2019-02-06 15:11 ` Frank Yang 2019-02-08 7:57 ` Stefan Hajnoczi 1 sibling, 0 replies; 72+ messages in thread From: Frank Yang @ 2019-02-06 15:11 UTC (permalink / raw) To: Gerd Hoffmann Cc: Roman Kiryanov, Dr. David Alan Gilbert, Stefan Hajnoczi, virtio-dev [-- Attachment #1: Type: text/plain, Size: 2509 bytes --] (Virtio-serial also doesn't seem like a good option due to specialization to console forwarding and having an encoded limit on the number of connections) On Wed, Feb 6, 2019 at 7:09 AM Frank Yang <lfy@google.com> wrote: > > > On Tue, Feb 5, 2019 at 11:03 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > >> On Tue, Feb 05, 2019 at 01:06:42PM -0800, Roman Kiryanov wrote: >> > Hi Dave, >> > >> > > In virtio-fs we have two separate stages: >> > > a) A shared arena is setup (and that's what the spec Stefan pointed >> to is about) - >> > > it's statically allocated at device creation and corresponds to >> a chunk >> > > of guest physical address space >> > >> > We do exactly the same: >> > >> https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659 >> > >> > > b) During operation the guest kernel asks for files to be mapped >> into >> > > part of that arena dynamically, using commands sent over the >> queue >> > > - our queue carries FUSE commands, and we've added two new FUSE >> > > commands to perform the map/unmap. They talk in terms of offsets >> > > within the shared arena, rather than GPAs. >> > >> > In our case we have no files to map, only pointers returned from >> > OpenGL or Vulkan. >> > Do you have the approach to share for this use case? >> >> Fundamentally the same: The memory region (PCI bar in case of >> virtio-pci) reserves address space. The guest manages the address >> space, it can ask the host to map host gpu resources there. >> >> Well, that is at least the plan. 
Some incomplete WIP patches exist, I'm >> still busy hammering virtio-gpu ttm code into shape so it can support >> different kinds of gpu objects. >> >> > Do you this it is possible to have virtio-pipe where we could send >> > arbitrary blobs between >> > guest and host? >> >> Well, there are virtio-serial and virtio-vsock which both give you a >> pipe between host and guest, simliar to a serial line / tcp socket. >> Dunno how good they are at handling larger blobs though. >> >> > I've looked at virtio-vsock and it seems general, but requires Unix > sockets, which is not going to work for us on Windows and not going to work > as expected on macOS (most likely). Is there anything that is similar to > and as portable as goldfish pipe which is more like a raw virtqueue? This > would then work on memory in the same process, with callbacks registered to > trigger upon transmission. > > cheers, >> Gerd >> >> [-- Attachment #2: Type: text/html, Size: 3626 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-06 15:09 ` Frank Yang 2019-02-06 15:11 ` Frank Yang @ 2019-02-08 7:57 ` Stefan Hajnoczi 2019-02-08 14:46 ` Frank Yang 1 sibling, 1 reply; 72+ messages in thread From: Stefan Hajnoczi @ 2019-02-08 7:57 UTC (permalink / raw) To: Frank Yang Cc: Gerd Hoffmann, Roman Kiryanov, Dr. David Alan Gilbert, virtio-dev [-- Attachment #1: Type: text/plain, Size: 975 bytes --] On Wed, Feb 06, 2019 at 07:09:36AM -0800, Frank Yang wrote: > I've looked at virtio-vsock and it seems general, but requires Unix > sockets, which is not going to work for us on Windows and not going to work > as expected on macOS (most likely). Is there anything that is similar to > and as portable as goldfish pipe which is more like a raw virtqueue? This > would then work on memory in the same process, with callbacks registered to > trigger upon transmission. virtio-vsock is independent of UNIX domain sockets. I'm not sure what you mean here. I think Linaro implemented virtio-vsock inside QEMU for the Android emulator but I'm not sure how far they got. Today virtio-vsock relies on a Linux host machine because the vhost_vsock.ko driver is used to integrate into the host network stack. The Linaro implementation moved that into QEMU userspace (with the drawback that socket(AF_VSOCK) no longer works on the host and you need to talk to QEMU instead). Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-08 7:57 ` Stefan Hajnoczi @ 2019-02-08 14:46 ` Frank Yang 0 siblings, 0 replies; 72+ messages in thread From: Frank Yang @ 2019-02-08 14:46 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Gerd Hoffmann, Roman Kiryanov, Dr. David Alan Gilbert, virtio-dev [-- Attachment #1: Type: text/plain, Size: 1552 bytes --] On Thu, Feb 7, 2019 at 11:57 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > On Wed, Feb 06, 2019 at 07:09:36AM -0800, Frank Yang wrote: > > I've looked at virtio-vsock and it seems general, but requires Unix > > sockets, which is not going to work for us on Windows and not going to > work > > as expected on macOS (most likely). Is there anything that is similar to > > and as portable as goldfish pipe which is more like a raw virtqueue? This > > would then work on memory in the same process, with callbacks registered > to > > trigger upon transmission. > > virtio-vsock is independent of UNIX domain sockets. I'm not sure what > you mean here. > > I think Linaro implemented virtio-vsock inside QEMU for the Android > emulator but I'm not sure how far they got. Today virtio-vsock relies > on a Linux host machine because the vhost_vsock.ko driver is used to > integrate into the host network stack. The Linaro implementation moved > that into QEMU userspace (with the drawback that socket(AF_VSOCK) no > longer works on the host and you need to talk to QEMU instead). > > Thanks for the info! Since we prefer to keep things as canonical as possible (and thus not mess with existing infrastructure; benefit being that we can use any upstream QEMU / Linux kernel), it doesn't solve our vsock issue. We'd also like to decouple the concept of dynamically defined drivers/devices from the transport. Finally, when host memory is introduced, it's also possible to be faster than a raw virtqueue for all data. I'll send out a spec. 
> Stefan > [-- Attachment #2: Type: text/html, Size: 2143 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-05 21:06 ` Roman Kiryanov 2019-02-06 7:03 ` Gerd Hoffmann @ 2019-02-06 20:14 ` Dr. David Alan Gilbert 2019-02-06 20:27 ` Frank Yang 1 sibling, 1 reply; 72+ messages in thread From: Dr. David Alan Gilbert @ 2019-02-06 20:14 UTC (permalink / raw) To: Roman Kiryanov; +Cc: Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Lingfeng Yang * Roman Kiryanov (rkir@google.com) wrote: > Hi Dave, > > > In virtio-fs we have two separate stages: > > a) A shared arena is setup (and that's what the spec Stefan pointed to is about) - > > it's statically allocated at device creation and corresponds to a chunk > > of guest physical address space > > We do exactly the same: > https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659 > > > b) During operation the guest kernel asks for files to be mapped into > > part of that arena dynamically, using commands sent over the queue > > - our queue carries FUSE commands, and we've added two new FUSE > > commands to perform the map/unmap. They talk in terms of offsets > > within the shared arena, rather than GPAs. > > In our case we have no files to map, only pointers returned from > OpenGL or Vulkan. > Do you have the approach to share for this use case? I should say that the spec I'm talking about is my 1st virtio spec change; so take my ideas with a large pinch of salt! > > How do you transmit the glMapBufferRange command from QEMU driver to > > host? > > In December we did this by passing these bits over our guest-host channel > (another driver, goldfish_pipe). Frank is currently working on moving > this into our memory > mapping device as "something changed in the memory you shared". > > Do you think it is possible to have virtio-pipe where we could send > arbitrary blobs between > guest and host?
We want to move all our drivers into userspace so we > could share memory > using the device you are currently working on and this virtio-pipe to > pass MMIOs and IRQs > to control our devices to avoid dealing with kernel drivers at all. It sounds to me like you want something like a virtio-pipe, with a shared arena (like specified using the spec change I suggested) but with either a separate queue, or commands in the queue to do the mapping/unmapping of your GL pointers from your arena. Dave > Thank you. > > Regards, > Roman. -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-06 20:14 ` Dr. David Alan Gilbert @ 2019-02-06 20:27 ` Frank Yang 2019-02-07 12:10 ` Cornelia Huck 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-06 20:27 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev [-- Attachment #1: Type: text/plain, Size: 3917 bytes --] On Wed, Feb 6, 2019 at 12:14 PM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > * Roman Kiryanov (rkir@google.com) wrote: > > Hi Dave, > > > > > In virtio-fs we have two separate stages: > > > a) A shared arena is setup (and that's what the spec Stefan pointed > to is about) - > > > it's statically allocated at device creation and corresponds to a > chunk > > > of guest physical address space > > > > We do exactly the same: > > > https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659 > > > > > b) During operation the guest kernel asks for files to be mapped into > > > part of that arena dynamically, using commands sent over the queue > > > - our queue carries FUSE commands, and we've added two new FUSE > > > commands to perform the map/unmap. They talk in terms of offsets > > > within the shared arena, rather than GPAs. > > > > In our case we have no files to map, only pointers returned from > > OpenGL or Vulkan. > > Do you have the approach to share for this use case? > > I should say that the spec I'm talking aobut is my 1st virito spec > change; so take my ideas with a large pinch of salt! > > > > How do you transmit the glMapBufferRange command from QEMU driver to > > > host? > > > > In December we did this by passing these bits over our guest-host channel > > (another driver, goldfish_pipe). Frank is currently working on moving > > this into our memory > > mapping device as "something changed in the memory you shared". 
> > > > Do you this it is possible to have virtio-pipe where we could send > > arbitrary blobs between > > guest and host? We want to move all our drivers into userspace so we > > could share memory > > using the device you are currently working on and this virtio-pipe to > > pass MMIOs and IRQs > > to control our devices to avoid dealing with kernel drivers at all. > > It sounds to me like you want something like a virtio-pipe, with > a shared arena (like specified using the spec change I suggested) > but with either a separate queue, or commands in the queue to do the > mapping/unmapping of your GL pointers from your arena. > This sounds close to what we want, but the current suggestions to use virtio-serial/virtio-vsock are difficult to deal with as they add on the req of console forwarding/hard limits on the number of queues, or coupling to unix sockets on the host. What about this: A new spec, called "virtio-pipe". It only sends control messages. It's meant to work in tandem with the current virtio host memory proposal. It's not specialized to anything; it doesn't use sockets on the host either, instead, it uses dlopen/dlsym on the host to load a library implementing the wanted userspace devices, together with a minimal ioctl in the guest to capture everything: There is one ioctl: ioctl_ping: u64 offset (to the virtio host memory object) u64 size u64 metadata (driver-dependent data) u32 wait (whether the guest is waiting for the host to be done with something) These are sent over virtqueue. On the host, these pings arrive and call some dlsym'ed functions: u32 on_context_create - when guest userspace open()'s the virtio-pipe, this returns a new id. on_context_destroy(u32 id) - on last close of the pipe on_ioctl_ping(u32 id, u64 physaddr, u64 size, u64 metadata, u32 wait) - called when the guest ioctl ping's. 
There would need to be some kind of IRQ-like mechanism (either done with actual virtual irqs, or polling, or a mprotect/mwait-like mechanism) that tells the guest the host is done with something. This would be the absolute minimum and most general way to send anything to/from the host with explicit control messages; any device can be defined on top of this with no changes to virtio or qemu. > Dave > > > Thank you. > > > > Regards, > > Roman. > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > [-- Attachment #2: Type: text/html, Size: 5331 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
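[Editor's sketch] The ping message and the dlsym'ed host hook table listed above can be written down as C declarations. This is a direct transcription of the fields Frank names, with one assumption added: fixed-width, explicitly padded fields, as is conventional for buffers placed on a virtqueue. The struct and type names themselves are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format for the single ioctl-style ping described
 * above; fields follow the list in the message, padded to 8-byte
 * multiples so the layout is unambiguous across guest and host. */
struct pipe_ping {
    uint64_t offset;   /* offset into the virtio host memory object */
    uint64_t size;     /* size of the region being referred to */
    uint64_t metadata; /* driver-dependent data */
    uint32_t wait;     /* nonzero if the guest waits for the host */
    uint32_t rsvd;     /* explicit padding, must be zero */
};

/* Host-side entry points, resolved with dlsym() from the library that
 * implements the userspace device, per the proposal above. */
struct pipe_host_ops {
    uint32_t (*on_context_create)(void);
    void (*on_context_destroy)(uint32_t id);
    void (*on_ioctl_ping)(uint32_t id, uint64_t phys, uint64_t size,
                          uint64_t metadata, uint32_t wait);
};
```

With a layout like this, the guest driver only copies a `struct pipe_ping` into a virtqueue buffer, and the host dispatches it to `on_ioctl_ping` for the matching context id.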
* Re: [virtio-dev] Memory sharing device 2019-02-06 20:27 ` Frank Yang @ 2019-02-07 12:10 ` Cornelia Huck 0 siblings, 0 replies; 72+ messages in thread From: Cornelia Huck @ 2019-02-07 12:10 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev [Jumping into the discussion here; I have read through the discussion so far, but I might have misunderstood things, as I'm not really familiar with Vulkan et al.] On Wed, 6 Feb 2019 12:27:48 -0800 Frank Yang <lfy@google.com> wrote: > On Wed, Feb 6, 2019 at 12:14 PM Dr. David Alan Gilbert <dgilbert@redhat.com> > wrote: > > > * Roman Kiryanov (rkir@google.com) wrote: > > > Hi Dave, > > > > > > > In virtio-fs we have two separate stages: > > > > a) A shared arena is setup (and that's what the spec Stefan pointed > > to is about) - > > > > it's statically allocated at device creation and corresponds to a > > chunk > > > > of guest physical address space > > > > > > We do exactly the same: > > > > > https://android.googlesource.com/platform/external/qemu/+/emu-master-dev/hw/pci/goldfish_address_space.c#659 > > > > > > > b) During operation the guest kernel asks for files to be mapped into > > > > part of that arena dynamically, using commands sent over the queue > > > > - our queue carries FUSE commands, and we've added two new FUSE > > > > commands to perform the map/unmap. They talk in terms of offsets > > > > within the shared arena, rather than GPAs. > > > > > > In our case we have no files to map, only pointers returned from > > > OpenGL or Vulkan. > > > Do you have the approach to share for this use case? > > > > I should say that the spec I'm talking aobut is my 1st virito spec > > change; so take my ideas with a large pinch of salt! > > > > > > How do you transmit the glMapBufferRange command from QEMU driver to > > > > host? > > > > > > In December we did this by passing these bits over our guest-host channel > > > (another driver, goldfish_pipe). 
Frank is currently working on moving > > > this into our memory > > > mapping device as "something changed in the memory you shared". > > > > > > Do you think it is possible to have virtio-pipe where we could send > > > arbitrary blobs between > > > guest and host? We want to move all our drivers into userspace so we > > > could share memory > > > using the device you are currently working on and this virtio-pipe to > > > pass MMIOs and IRQs > > > to control our devices to avoid dealing with kernel drivers at all. > > > > It sounds to me like you want something like a virtio-pipe, with > > a shared arena (like specified using the spec change I suggested) > > but with either a separate queue, or commands in the queue to do the > > mapping/unmapping of your GL pointers from your arena. > > > > This sounds close to what we want, but the current suggestions to use > virtio-serial/virtio-vsock are difficult to deal with as they add on the > req of console forwarding/hard limits on the number of queues, or coupling > to unix sockets on the host. If existing devices don't work for your use case, adding a new type is completely fine; however, I'm worried that it might end up too generic. A loose specification might not have enough information to write either a device or a driver that interacts with an existing driver or device; if it relies on both device and driver being controlled by the same instance, it's not a good fit for the virtio spec IMHO. > > What about this: > > A new spec, called "virtio-pipe". It only sends control messages. It's > meant to work in tandem with the current virtio host memory proposal.
It's > not specialized to anything; it doesn't use sockets on the host either, > instead, it uses dlopen/dlsym on the host to load a library implementing > the wanted userspace devices, together with a minimal ioctl in the guest to > capture everything: > > There is one ioctl: > > ioctl_ping: > u64 offset (to the virtio host memory object) > u64 size > u64 metadata (driver-dependent data) > u32 wait (whether the guest is waiting for the host to be done with > something) > > These are sent over virtqueue. One thing you need to keep in mind is that the virtio spec does not specify anything like ioctls; what the individual device and driver implementations do is up to the specific environment they're run in. IOW, if you want your user space driver to be able to submit and receive some information, you must make sure that everything it needs is transmitted via virtqueues and shared regions; how it actually accessed it is up to the implementation. If you frame your ioctl structure as "this is the format of the buffers that are transmitted via the virtqueue", it seems like something that we can build upon. > > On the host, these pings arrive and call some dlsym'ed functions: > > u32 on_context_create - when guest userspace open()'s the virtio-pipe, this > returns a new id. > on_context_destroy(u32 id) - on last close of the pipe > on_ioctl_ping(u32 id, u64 physaddr, u64 size, u64 metadata, u32 wait) - > called when the guest ioctl ping's. Same here: If the information transmitted in the virtqueue buffers is sufficient, the host side can implement whatever it needs. > > There would need to be some kind of IRQ-like mechanism (either done with > actual virtual irqs, or polling, or a mprotect/mwait-like mechanism) that > tells the guest the host is done with something. If you frame the virtqueue buffers nicely, the generic virtqueue notifications should probably be sufficient, I guess. 
> > This would be the absolute minimum and most general way to send anything > to/from the host with explicit control messages; any device can be defined > on top of this with no changes to virtio or qemu. Ok, this brings me back to my "too generic" concern. If you do everything in user space on the host and guest sides and the virtio device is basically only a dumb pipe, correct functioning depends entirely on correct implementations in the user space components. What you're throwing away are some nice features of virtio like feature bit negotiation. If, for some reason, the user space implementations on guest and host side run out of sync, or you accidentally pair up two incompatible types, virtio will continue to cheerfully shuffle around data until it goes boom. I'm not sure about all of the future use cases for this, but I'd advise to specify some way for: - the driver to find out what kind of blobs the device supports (can maybe be done via feature bits) - some kind of versioning, so you can extend the control messages should they turn out to be missing something --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
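[Editor's sketch] Cornelia's two suggestions — discovering supported blob kinds via feature bits and versioning the control messages — could look like the following. All names and bit assignments are invented for illustration; only the negotiation idiom (driver and device use the intersection of offered and accepted features) is standard virtio practice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits a device could advertise for blob kinds. */
#define PIPEF_BLOB_RAW (1u << 0)
#define PIPEF_BLOB_GPU (1u << 1)

/* Every control message starts with a versioned header so the message
 * set can be extended later without breaking older implementations. */
struct pipe_ctrl_hdr {
    uint32_t version; /* bumped on incompatible layout changes */
    uint32_t type;    /* valid only if its feature bit was negotiated */
    uint32_t len;     /* payload bytes following this header */
    uint32_t rsvd;    /* must be zero; room for future use */
};

/* Virtio-style negotiation: both sides use only the intersection of
 * what the device offers and what the driver understands. */
static uint32_t pipe_negotiate(uint32_t device_offers, uint32_t driver_wants)
{
    return device_offers & driver_wants;
}
```

This keeps the "dumb pipe" generic while still letting mismatched guest/host user space components fail cleanly at negotiation time instead of shuffling data until it goes boom.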
* Re: [virtio-dev] Memory sharing device 2019-02-05 7:42 ` Roman Kiryanov 2019-02-05 10:04 ` Dr. David Alan Gilbert @ 2019-02-11 14:49 ` Michael S. Tsirkin 2019-02-11 15:14 ` Frank Yang 2019-02-12 8:27 ` Roman Kiryanov 1 sibling, 2 replies; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-11 14:49 UTC (permalink / raw) To: Roman Kiryanov Cc: Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert, Lingfeng Yang On Mon, Feb 04, 2019 at 11:42:25PM -0800, Roman Kiryanov wrote: > Hi Gerd, > > > virtio-gpu specifically needs that to support vulkan and opengl > > extensions for coherent buffers, which must be allocated by the host gpu > > driver. It's WIP still. > > the proposed spec says: > > +Shared memory regions MUST NOT be used to control the operation > +of the device, nor to stream data; those should still be performed > +using virtqueues. > > Is there a strong reason to prohibit using memory regions for control purposes? That's in order to make virtio have portability implications, such that if people see a virtio device in lspci they know there's no lock-in, their guest can be moved between hypervisors and will still work. > Our long term goal is to have as few kernel drivers as possible and to move > "drivers" into userspace. If we go with the virtqueues, is there > a general purpose > device/driver to talk between our host and guest to support custom hardware > (with own blobs)? The challenge is to answer the following question: how to do this without losing the benefits of standardization? > Could you please advise if we can use something else to > achieve this goal? I am not sure what the goal is though. Blobs is a means I guess or it should be :) E.g. is it about being able to iterate quickly? Maybe you should look at vhost-user-gpu patches on qemu? Would this address your need? Acks for these patches would be a good thing.
-- MST ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-11 14:49 ` Michael S. Tsirkin @ 2019-02-11 15:14 ` Frank Yang 2019-02-11 15:25 ` Frank Yang 2019-02-11 16:57 ` Michael S. Tsirkin 2019-02-12 8:27 ` Roman Kiryanov 1 sibling, 2 replies; 72+ messages in thread From: Frank Yang @ 2019-02-11 15:14 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 2046 bytes --] On Mon, Feb 11, 2019 at 6:49 AM Michael S. Tsirkin <mst@redhat.com> wrote: > On Mon, Feb 04, 2019 at 11:42:25PM -0800, Roman Kiryanov wrote: > > Hi Gerd, > > > > > virtio-gpu specifically needs that to support vulkan and opengl > > > extensions for coherent buffers, which must be allocated by the host > gpu > > > driver. It's WIP still. > > > > the proposed spec says: > > > > +Shared memory regions MUST NOT be used to control the operation > > +of the device, nor to stream data; those should still be performed > > +using virtqueues. > > > > Is there a strong reason to prohibit using memory regions for control > purposes? > > That's in order to make virtio have portability implications, such that if > people see a virtio device in lspci they know there's > no lock-in, their guest can be moved between hypervisors > and will still work. > > > Our long term goal is to have as few kernel drivers as possible and to > move > > "drivers" into userspace. If we go with the virtqueues, is there > > general a purpose > > device/driver to talk between our host and guest to support custom > hardware > > (with own blobs)? > > The challenge is to answer the following question: > how to do this without losing the benefits of standartization? 
> > Draft spec is incoming, but the basic idea is to standardize how to enumerate, discover, and operate (with high performance) such userspace drivers/devices; the basic operations would be standardized, and userspace drivers would be constructed out of the resulting primitives. > Could you please advise if we can use something else to > > achieve this goal? > > I am not sure what the goal is though. Blobs is a means I guess > or it should be :) E.g. is it about being able to iterate quickly? > > Maybe you should look at vhost-user-gpu patches on qemu? > Would this address your need? > Acks for these patches would be a good thing. > > Is this it: https://patchwork.kernel.org/patch/10444089/ ? I'll check it out and try to discuss. Is there a draft spec for it as well? > > -- > MST > [-- Attachment #2: Type: text/html, Size: 3030 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-11 15:14 ` Frank Yang @ 2019-02-11 15:25 ` Frank Yang 2019-02-12 13:01 ` Michael S. Tsirkin ` (2 more replies) 2019-02-11 16:57 ` Michael S. Tsirkin 1 sibling, 3 replies; 72+ messages in thread From: Frank Yang @ 2019-02-11 15:25 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 3486 bytes --] BTW, I have a few concerns about the upcoming shared-mem virtio type. This is mostly directed at David and kraxel. We've found that for many applications, simply telling the guest to create a new host pointer of Vulkan or OpenGL has quite some overhead in just telling the hypervisor to map it, and in fact, it's easy to run out of KVM slots by doing so. So for Vulkan, we rely on having one large host visible region on the host that is a single region of host shared memory. That, is then sub-allocated for the guest. So there is no Vulkan host pointer that is being shared to the guest 1:1; we suballocate, then generate the right 'underlying' Vulkan device memory offset and size parameters for the host. In general though, this means that the ideal usage of host pointers would be to set a few regions up front for certain purposes, then share that out amongst other device contexts. This also facilitates sharing the memory between guest processes, which is useful for implementing things like compositors. This also features heavily for our "virtio userspace" thing. Since this is a common pattern, should this sharing concept be standardized somehow? I.e., should there be a standard way to send Shmid/offset/size to other devices, or have that be a standard struct in the hypervisor? On Mon, Feb 11, 2019 at 7:14 AM Frank Yang <lfy@google.com> wrote: > > > On Mon, Feb 11, 2019 at 6:49 AM Michael S. 
Tsirkin <mst@redhat.com> wrote: > >> On Mon, Feb 04, 2019 at 11:42:25PM -0800, Roman Kiryanov wrote: >> > Hi Gerd, >> > >> > > virtio-gpu specifically needs that to support vulkan and opengl >> > > extensions for coherent buffers, which must be allocated by the host >> gpu >> > > driver. It's WIP still. >> > >> > the proposed spec says: >> > >> > +Shared memory regions MUST NOT be used to control the operation >> > +of the device, nor to stream data; those should still be performed >> > +using virtqueues. >> > >> > Is there a strong reason to prohibit using memory regions for control >> purposes? >> >> That's in order to make virtio have portability implications, such that if >> people see a virtio device in lspci they know there's >> no lock-in, their guest can be moved between hypervisors >> and will still work. >> >> > Our long term goal is to have as few kernel drivers as possible and to >> move >> > "drivers" into userspace. If we go with the virtqueues, is there >> > general a purpose >> > device/driver to talk between our host and guest to support custom >> hardware >> > (with own blobs)? >> >> The challenge is to answer the following question: >> how to do this without losing the benefits of standartization? >> >> Draft spec is incoming, but the basic idea is to standardize how to > enumerate, discover, and operate (with high performance) such userspace > drivers/devices; the basic operations would be standardized, and userspace > drivers would be constructed out of the resulting primitives. > > > Could you please advise if we can use something else to >> > achieve this goal? >> >> I am not sure what the goal is though. Blobs is a means I guess >> or it should be :) E.g. is it about being able to iterate quickly? >> >> Maybe you should look at vhost-user-gpu patches on qemu? >> Would this address your need? >> Acks for these patches would be a good thing. >> >> > Is this it: > > https://patchwork.kernel.org/patch/10444089/ ? 
> > I'll check it out and try to discuss. Is there a draft spec for it as well? > > >> >> -- >> MST >> > [-- Attachment #2: Type: text/html, Size: 4774 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
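[Editorial note] The sub-allocation scheme Frank describes — one large host-visible region registered with the hypervisor once, then carved into per-allocation slices identified by shmid/offset/size — can be sketched in a few lines of C. The struct layout and names below are illustrative assumptions, not from any spec; a real allocator would also need freeing and the alignment rules imposed by the host GPU driver.

```c
#include <stdint.h>

/* Hypothetical descriptor for one sub-allocation inside a shared
 * region: shmid names the region, offset/size name the slice. */
struct shm_slice {
    uint32_t shmid;
    uint64_t offset;
    uint64_t size;
};

/* Minimal bump sub-allocator over a single region (no free list).
 * One KVM memslot backs the whole arena, so individual Vulkan
 * allocations never cost an extra hypervisor mapping. */
struct shm_arena {
    uint32_t shmid;
    uint64_t capacity;
    uint64_t next;      /* next free offset */
};

static int shm_arena_alloc(struct shm_arena *a, uint64_t size,
                           uint64_t align, struct shm_slice *out)
{
    /* align is assumed to be a nonzero power of two */
    uint64_t off = (a->next + align - 1) & ~(align - 1);
    if (off > a->capacity || a->capacity - off < size)
        return -1;              /* arena exhausted */
    a->next = off + size;
    out->shmid = a->shmid;
    out->offset = off;
    out->size = size;
    return 0;
}
```

The shm_slice triple is exactly what the thread later debates passing between devices.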
* Re: [virtio-dev] Memory sharing device 2019-02-11 15:25 ` Frank Yang @ 2019-02-12 13:01 ` Michael S. Tsirkin 2019-02-12 13:16 ` Dr. David Alan Gilbert 2019-02-19 7:12 ` Gerd Hoffmann 2 siblings, 0 replies; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-12 13:01 UTC (permalink / raw) To: Frank Yang Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert On Mon, Feb 11, 2019 at 07:25:18AM -0800, Frank Yang wrote: > BTW, I have a few concerns about the upcoming shared-mem virtio type. This is > mostly directed at David and kraxel. Gerd, David any feedback on this? --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-11 15:25 ` Frank Yang 2019-02-12 13:01 ` Michael S. Tsirkin @ 2019-02-12 13:16 ` Dr. David Alan Gilbert 2019-02-12 13:27 ` Michael S. Tsirkin 2019-02-19 7:12 ` Gerd Hoffmann 2 siblings, 1 reply; 72+ messages in thread From: Dr. David Alan Gilbert @ 2019-02-12 13:16 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev * Frank Yang (lfy@google.com) wrote: > BTW, I have a few concerns about the upcoming shared-mem virtio type. This > is mostly directed at David and kraxel. > > We've found that for many applications, simply telling the guest to create > a new host pointer of Vulkan or OpenGL has quite some overhead in just > telling the hypervisor to map it, and in fact, it's easy to run out of KVM > slots by doing so. So for Vulkan, we rely on having one large host visible > region on the host that is a single region of host shared memory. That, is > then sub-allocated for the guest. So there is no Vulkan host pointer that > is being shared to the guest 1:1; we suballocate, then generate the right > 'underlying' Vulkan device memory offset and size parameters for the host. That's the same for our DAX in virtio-fs; we just allocate a big 'arena' and then map stuff within that arena; it's the arena that's described as one of the shared regions in the spec that I presented. All the requests to map/unmap in that arena then happen as commands over the virtqueue. > In general though, this means that the ideal usage of host pointers would > be to set a few regions up front for certain purposes, then share that out > amongst other device contexts. This also facilitates sharing the memory > between guest processes, which is useful for implementing things like > compositors. This also features heavily for our "virtio userspace" thing. Yes, that makes sense. > Since this is a common pattern, should this sharing concept be standardized > somehow? 
I.e., should there be a standard way to send Shmid/offset/size to > other devices, or have that be a standard struct in the hypervisor? That I don't know how to do - because then you need a way to associate different devices. Dave > On Mon, Feb 11, 2019 at 7:14 AM Frank Yang <lfy@google.com> wrote: > > > > > > > On Mon, Feb 11, 2019 at 6:49 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > >> On Mon, Feb 04, 2019 at 11:42:25PM -0800, Roman Kiryanov wrote: > >> > Hi Gerd, > >> > > >> > > virtio-gpu specifically needs that to support vulkan and opengl > >> > > extensions for coherent buffers, which must be allocated by the host > >> gpu > >> > > driver. It's WIP still. > >> > > >> > the proposed spec says: > >> > > >> > +Shared memory regions MUST NOT be used to control the operation > >> > +of the device, nor to stream data; those should still be performed > >> > +using virtqueues. > >> > > >> > Is there a strong reason to prohibit using memory regions for control > >> purposes? > >> > >> That's in order to make virtio have portability implications, such that if > >> people see a virtio device in lspci they know there's > >> no lock-in, their guest can be moved between hypervisors > >> and will still work. > >> > >> > Our long term goal is to have as few kernel drivers as possible and to > >> move > >> > "drivers" into userspace. If we go with the virtqueues, is there > >> > general a purpose > >> > device/driver to talk between our host and guest to support custom > >> hardware > >> > (with own blobs)? > >> > >> The challenge is to answer the following question: > >> how to do this without losing the benefits of standartization? > >> > >> Draft spec is incoming, but the basic idea is to standardize how to > > enumerate, discover, and operate (with high performance) such userspace > > drivers/devices; the basic operations would be standardized, and userspace > > drivers would be constructed out of the resulting primitives. 
> > > > > Could you please advise if we can use something else to > >> > achieve this goal? > >> > >> I am not sure what the goal is though. Blobs is a means I guess > >> or it should be :) E.g. is it about being able to iterate quickly? > >> > >> Maybe you should look at vhost-user-gpu patches on qemu? > >> Would this address your need? > >> Acks for these patches would be a good thing. > >> > >> > > Is this it: > > > > https://patchwork.kernel.org/patch/10444089/ ? > > > > I'll check it out and try to discuss. Is there a draft spec for it as well? > > > > > >> > >> -- > >> MST > >> > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
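[Editorial note] The arena pattern David describes for virtio-fs DAX — a fixed shared-memory window plus map/unmap commands on a virtqueue — implies a request layout roughly like the following. All field names here are invented for illustration; the actual virtio-fs request format is defined by its own draft spec.

```c
#include <stdint.h>

/* Hypothetical map/unmap command carried over a virtqueue.  The device
 * responds by mapping (or unmapping) host memory into the fixed arena
 * window that the guest already sees as a shared memory region. */
enum arena_map_op { ARENA_MAP = 1, ARENA_UNMAP = 2 };

struct arena_map_req {
    uint32_t op;          /* enum arena_map_op */
    uint32_t shmid;       /* which shared region (the arena) */
    uint64_t arena_off;   /* destination offset inside the arena */
    uint64_t len;         /* length of the mapping */
    uint64_t src_off;     /* source offset, e.g. into a host file */
};

/* Sanity check the device would perform: the request must fit entirely
 * inside the arena, written to guard against offset+len overflow. */
static int arena_req_in_bounds(const struct arena_map_req *r,
                               uint64_t arena_size)
{
    return r->len > 0 &&
           r->arena_off <= arena_size &&
           arena_size - r->arena_off >= r->len;
}
```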
* Re: [virtio-dev] Memory sharing device 2019-02-12 13:16 ` Dr. David Alan Gilbert @ 2019-02-12 13:27 ` Michael S. Tsirkin 2019-02-12 16:17 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-12 13:27 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Frank Yang, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev On Tue, Feb 12, 2019 at 01:16:39PM +0000, Dr. David Alan Gilbert wrote: > > somehow? I.e., should there be a standard way to send Shmid/offset/size to > > other devices, or have that be a standard struct in the hypervisor? > > That I don't know how to do - because then you need a way to > associate different devices. > > Dave We could come up with a way maybe but why? Are there so many devices? -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 13:27 ` Michael S. Tsirkin @ 2019-02-12 16:17 ` Frank Yang 2019-02-19 7:17 ` Gerd Hoffmann 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-12 16:17 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev [-- Attachment #1: Type: text/plain, Size: 825 bytes --] In implementing a complete graphics system, shared memory objects may transcend device boundaries. Consider for example, Android and gralloc. The media codec versus the GPU rendering are commonly considered as separate devices, but they work better if operating on a common shared memory type (gralloc buffers). On Tue, Feb 12, 2019 at 5:27 AM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 12, 2019 at 01:16:39PM +0000, Dr. David Alan Gilbert wrote: > > > somehow? I.e., should there be a standard way to send > Shmid/offset/size to > > > other devices, or have that be a standard struct in the hypervisor? > > > > That I don't know how to do - because then you need a way to > > associate different devices. > > > > Dave > > We could come up with a way maybe but why? Are there so many devices? > > -- > MST > [-- Attachment #2: Type: text/html, Size: 1194 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 16:17 ` Frank Yang @ 2019-02-19 7:17 ` Gerd Hoffmann 2019-02-19 15:59 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-19 7:17 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, Roman Kiryanov, Stefan Hajnoczi, virtio-dev On Tue, Feb 12, 2019 at 08:17:13AM -0800, Frank Yang wrote: > In implementing a complete graphics system, shared memory objects may > transcend device boundaries. > > Consider for example, Android and gralloc. The media codec versus the GPU > rendering are commonly considered as separate devices, but they work better > if operating on a common shared memory type (gralloc buffers). Linux has dma-bufs for that (driver-api/dma-buf.rst). cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-19 7:17 ` Gerd Hoffmann @ 2019-02-19 15:59 ` Frank Yang 2019-02-20 6:51 ` Gerd Hoffmann 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-19 15:59 UTC (permalink / raw) To: Gerd Hoffmann Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, Roman Kiryanov, Stefan Hajnoczi, virtio-dev [-- Attachment #1: Type: text/plain, Size: 1480 bytes --] On Mon, Feb 18, 2019 at 11:18 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > On Tue, Feb 12, 2019 at 08:17:13AM -0800, Frank Yang wrote: > > In implementing a complete graphics system, shared memory objects may > > transcend device boundaries. > > > > Consider for example, Android and gralloc. The media codec versus the GPU > > rendering are commonly considered as separate devices, but they work > better > > if operating on a common shared memory type (gralloc buffers). > > Linux has dma-bufs for that (driver-api/dma-buf.rst). > > Yes; I also followed a discussion around udmabuf for Vulkan host memory sharing. However, dma-buf seems to require either a Linux kernel or a Linux host. Dma-bufs also aren't 1:1 with Vulkan host visible memory pointers, or v4l2 codec buffers, or ffmpeg codec buffers, etc. For the use case of Vulkan, it's fortunate that there is an external memory dma-buf, but its application seems very limited to linux hosts and host drivers that support that kind of external memory. The proposed device would be able to expose memory for direct access in a way that does not couple to dma-buf, which is highly desirable for our use case. Using the ping/event messages, even win32 handles and general opaque fds can be passed from host to guest and back. You can think of the proposed device as a 'virtio-dmabuf' that tries to expose shareable memory in a way that disregards implementation details of guest and host kernels. > cheers, > Gerd > > [-- Attachment #2: Type: text/html, Size: 2210 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
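[Editorial note] Disregarding kernel implementation details essentially means the host keeps the native object (a Vulkan device memory, a POSIX fd, a win32 HANDLE) in a table and hands the guest only an opaque integer id. A minimal host-side sketch of such a table — all names hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_EXPORTS 64

/* Host-side table mapping opaque ids to native memory objects.  The
 * guest only ever sees the id, never the fd/HANDLE/pointer itself. */
struct export_entry {
    int in_use;
    void *native;   /* e.g. VkDeviceMemory, fd, or HANDLE, type-erased */
};

static struct export_entry exports[MAX_EXPORTS];

static int64_t export_native(void *native)
{
    for (int64_t id = 0; id < MAX_EXPORTS; id++) {
        if (!exports[id].in_use) {
            exports[id].in_use = 1;
            exports[id].native = native;
            return id;          /* opaque id sent to the guest */
        }
    }
    return -1;                  /* table full */
}

static void *lookup_export(int64_t id)
{
    if (id < 0 || id >= MAX_EXPORTS || !exports[id].in_use)
        return NULL;
    return exports[id].native;
}
```

The point of the indirection is portability: the id travels over the virtqueue (or ping/event messages) regardless of whether the host object is a dma-buf, a win32 handle, or neither.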
* Re: [virtio-dev] Memory sharing device 2019-02-19 15:59 ` Frank Yang @ 2019-02-20 6:51 ` Gerd Hoffmann 2019-02-20 15:31 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-20 6:51 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, Roman Kiryanov, Stefan Hajnoczi, virtio-dev Hi, > However, dma-buf seems to require either a Linux kernel or a Linux host. Sure. They allow passing buffers from one linux driver to another without copying the data. > Dma-bufs aren't also 1:1 with Vulkan host visible memory pointers, > or v4l2 codec buffers, or ffmpeg codec buffers, etc. Some v4l2 drivers have dma-buf support, so you can pass buffers from v4l2 to (for example) gpu drivers that way. > The proposed device would be able to expose memory for direct access in a > way that > does not couple to dma-buf which is highly desirable for our use case. > Using the ping/event messages, even win32 handles and general opaque fds > can be passed from host to guest and back. > > You can think of the proposed device as a 'virtio-dmabuf' that > tries to expose shareable memory in a way that disregards implementation > details of > guest and host kernels. That would probably look a lot like virtio-gpu with only the resource handling. virtio-gpu fundamentally is just buffers. Also virtio-dmabuf would be a pretty bad name. dma-bufs are not a virtio concept, they are a linux concept. They can be used by linux guests, to pass buffers from/to virtio-gpu (note: I'm still busy adding driver support for that). They can be used by linux hosts, to pass buffers (with udmabuf help) from qemu to other processes/devices (details are still to be hashed out). Non-linux systems obviously need something else for the job. The guest/host implementation details don't affect the virtio-gpu specs though.
cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 6:51 ` Gerd Hoffmann @ 2019-02-20 15:31 ` Frank Yang 2019-02-21 6:55 ` Gerd Hoffmann 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-20 15:31 UTC (permalink / raw) To: Gerd Hoffmann Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, Roman Kiryanov, Stefan Hajnoczi, virtio-dev [-- Attachment #1: Type: text/plain, Size: 2123 bytes --] On Tue, Feb 19, 2019 at 10:52 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > However, dma-buf seems to require either a Linux kernel or a Linux host. > > Sure. They allow passing buffers from one linux driver to another > without copying the data. > > > Dma-bufs aren't also 1:1 with Vulkan host visible memory pointers, > > or v4l2 codec buffers, or ffmpeg codec buffers, etc. > > Some v4l2 drivers have dma-buf support, so you can pass buffers from > v4l2 to (for example) gpu drivers that way. > > > The proposed device would be able to expose memory for direct access in a > > way that > > does not couple to dma-buf which is highly desirable for our use case. > > Using the ping/event messages, even win32 handles and general opaque fds > > can be passed from host to guest and back. > > > > You can think of the proposed device as a 'virtio-dmabuf' that > > tries to expose shareable memory in a way that disregards implementation > > details of > > guest and host kernels. > > That would probably look alot like virtio-gpu with only the resource > handling. virtio-gpu fundamentally are just buffers. > > This plus new transport makes me wonder if we can have something like a transport/device pair where the transport makes it easy to work directly off host memory pci bar, and the device is virtio-gpu except really just buffers. We'd really like to go for something like this. Also virtio-dmabuf would be a pretty bad name. dma-bufs are not a > virtio concept, they are a linux concept. 
They can be used by linux > guests, to pass buffers from/to virtio-gpu (note: I'm still busy adding > driver support for that). They can be used by linux hosts, to pass > buffers (with udmabuf help) from qemu to other processes/devices > (details are still to be hashed out). > > Got it, that sounds pretty interesting. > Non-linux systems obviously need something else for the job. The > guest/host implementation details don't affect the virtio-gpu specs > though. While we're talking about this: what is your plan for virtio-gpu implementations for non-Linux guests/hosts? > > cheers, > Gerd > > [-- Attachment #2: Type: text/html, Size: 3301 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 15:31 ` Frank Yang @ 2019-02-21 6:55 ` Gerd Hoffmann 0 siblings, 0 replies; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-21 6:55 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, Roman Kiryanov, Stefan Hajnoczi, virtio-dev Hi, > > Non-linux systems obviously need something else for the job. The > > guest/host implementation details don't affect the virtio-gpu specs > > though. > > While we're talking about this: what is your plan for virtio-gpu > implementations for non-Linux guests/hosts? IIRC someone is working on windows guest drivers, don't know what the current status is though. edk2 (uefi firmware) has a driver too. I'm not aware on any other guest drivers, and I don't have any plans personally (and that isn't going to change anytime soon, I'm busy enough with linux alone). On the host side virtio-gpu (without virgl) works just fine on any platform supported by qemu, including *bsd, macos and windows. I have no idea if and how virtio-gpu buffers could be shared with other processes on non-linux platforms though. cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-11 15:25 ` Frank Yang 2019-02-12 13:01 ` Michael S. Tsirkin 2019-02-12 13:16 ` Dr. David Alan Gilbert @ 2019-02-19 7:12 ` Gerd Hoffmann 2019-02-19 16:02 ` Frank Yang 2 siblings, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-19 7:12 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert Hi, > slots by doing so. So for Vulkan, we rely on having one large host visible > region on the host that is a single region of host shared memory. That, is > then sub-allocated for the guest. So there is no Vulkan host pointer that Yes, sub-allocating will be needed for reasonable performance. > In general though, this means that the ideal usage of host pointers would > be to set a few regions up front for certain purposes, then share that out > amongst other device contexts. This also facilitates sharing the memory > between guest processes, which is useful for implementing things like > compositors. Guest processes in the same VM or in different VMs? > This also features heavily for our "virtio userspace" thing. > Since this is a common pattern, should this sharing concept be standardized > somehow? I.e., should there be a standard way to send Shmid/offset/size to > other devices, or have that be a standard struct in the hypervisor? Same question: other devices of the same VM? cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-19 7:12 ` Gerd Hoffmann @ 2019-02-19 16:02 ` Frank Yang 2019-02-20 7:02 ` Gerd Hoffmann 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-19 16:02 UTC (permalink / raw) To: Gerd Hoffmann Cc: Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 1827 bytes --] On Mon, Feb 18, 2019 at 11:12 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > slots by doing so. So for Vulkan, we rely on having one large host > visible > > region on the host that is a single region of host shared memory. That, > is > > then sub-allocated for the guest. So there is no Vulkan host pointer that > > Yes, sub-allocating will be needed for reasonable performance. > > > In general though, this means that the ideal usage of host pointers would > > be to set a few regions up front for certain purposes, then share that > out > > amongst other device contexts. This also facilitates sharing the memory > > between guest processes, which is useful for implementing things like > > compositors. > > Guest processes in the same VM or in different VMs? > The guest processes are in the same VM. Are you also considering the usage in different VMs? In that case, if we had to address it, we would rely on there being an existing method on the host to map shared regions across processes, similar to gralloc, and use the fact that for each VM, they have a host pointer somehow, and expose the relevant host pointer to the guest. > This also features heavily for our "virtio userspace" thing. > > Since this is a common pattern, should this sharing concept be > standardized > > somehow? I.e., should there be a standard way to send Shmid/offset/size > to > > other devices, or have that be a standard struct in the hypervisor? > > Same question: other devices of the same VM? > Yes, also, other devices of the same VM. 
For different VMs, does the same scheme to send shmid/offset/size to other VMs work? Shmid's could be different and alias a common shared memory on the host, or additional metadata could be passed at some other level about which memory is "actually" shared. > cheers, > Gerd > > [-- Attachment #2: Type: text/html, Size: 2707 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-19 16:02 ` Frank Yang @ 2019-02-20 7:02 ` Gerd Hoffmann 2019-02-20 15:32 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-20 7:02 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert > > > In general though, this means that the ideal usage of host pointers would > > > be to set a few regions up front for certain purposes, then share that > > out > > > amongst other device contexts. This also facilitates sharing the memory > > > between guest processes, which is useful for implementing things like > > > compositors. > > > > Guest processes in the same VM or in different VMs? > > > > > The guest processes are in the same VM. > Are you also considering the usage in different VMs? No, I'm just asking whenever that is important to you. For communication between guest processes within the same VM I don't really see a need to involve the hypervisor ... > > This also features heavily for our "virtio userspace" thing. > > > Since this is a common pattern, should this sharing concept be > > standardized > > > somehow? I.e., should there be a standard way to send Shmid/offset/size > > to > > > other devices, or have that be a standard struct in the hypervisor? > > > > Same question: other devices of the same VM? > > > > Yes, also, other devices of the same VM. So why involve the hypervisor here? The guest can handle that on its own. Passing an image data buffer from the usb webcam to the intel gpu for display (on bare metal) isn't fundamentally different from passing a buffer from virtio-camera to virtio-gpu (in a VM). Linux guests will use dma-bufs for that, other OSes probably something else. 
cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 7:02 ` Gerd Hoffmann @ 2019-02-20 15:32 ` Frank Yang 2019-02-21 7:29 ` Gerd Hoffmann 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-20 15:32 UTC (permalink / raw) To: Gerd Hoffmann Cc: Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 2072 bytes --] On Tue, Feb 19, 2019 at 11:02 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > > > > In general though, this means that the ideal usage of host pointers > would > > > > be to set a few regions up front for certain purposes, then share > that > > > out > > > > amongst other device contexts. This also facilitates sharing the > memory > > > > between guest processes, which is useful for implementing things like > > > > compositors. > > > > > > Guest processes in the same VM or in different VMs? > > > > > > > > > The guest processes are in the same VM. > > Are you also considering the usage in different VMs? > > No, I'm just asking whenever that is important to you. > > For communication between guest processes within the same VM I don't > really see a need to involve the hypervisor ... > > Right, once the host memory is set up we can rely on purely guest side stuff map sub-regions of it. > > > This also features heavily for our "virtio userspace" thing. > > > > Since this is a common pattern, should this sharing concept be > > > standardized > > > > somehow? I.e., should there be a standard way to send > Shmid/offset/size > > > to > > > > other devices, or have that be a standard struct in the hypervisor? > > > > > > Same question: other devices of the same VM? > > > > > > > Yes, also, other devices of the same VM. > > So why involve the hypervisor here? The guest can handle that on its > own. 
Passing an image data buffer from the usb webcam to the intel gpu > for display (on bare metal) isn't fundamentally different from passing a > buffer from virtio-camera to virtio-gpu (in a VM). Linux guests will > use dma-bufs for that, other OSes probably something else. > That's true that it can be handled purely in the guest layers, if there is an existing interface in the guest to pass the proposed host memory id's / offsets / sizes between them. However, for the proposed host memory sharing spec, would there be a standard way to share the host memory across different virtio devices without relying on Linux dmabufs? > cheers, > Gerd > > [-- Attachment #2: Type: text/html, Size: 3102 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 15:32 ` Frank Yang @ 2019-02-21 7:29 ` Gerd Hoffmann 2019-02-21 9:24 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-21 7:29 UTC (permalink / raw) To: Frank Yang Cc: Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert Hi, > > For communication between guest processes within the same VM I don't > > really see a need to involve the hypervisor ... > > > Right, once the host memory is set up we can rely on purely guest side stuff > map sub-regions of it. Or just use guest ram ... > > > Yes, also, other devices of the same VM. > > > > So why involve the hypervisor here? The guest can handle that on its > > own. Passing an image data buffer from the usb webcam to the intel gpu > > for display (on bare metal) isn't fundamentally different from passing a > > buffer from virtio-camera to virtio-gpu (in a VM). Linux guests will > > use dma-bufs for that, other OSes probably something else. > > That's true that it can be handled purely in the guest layers, > if there is an existing interface in the guest > to pass the proposed host memory id's / offsets / sizes > between them. Note: I think using a pci memory bar (aka host memory mapped into the guest) as backing storage for dma-bufs isn't going to work. > However, for the proposed host memory sharing spec, > would there be a standard way to share the host memory across > different virtio devices without relying on Linux dmabufs? I think with the current draft for each device (virtio-fs, virtio-gpu, ...) has its own device-specific memory, and there is no mechanism to exchange buffers between devices. Stefan? I'm also not convinced that explicitly avoiding dmabufs is a good idea here. That would put virtio into its own universe and sharing buffers with non-virtio devices will not work. Think about a intel vgpu as display device, or a usb camera attached to the guest using usb pass-through. 
Experience shows that using virtualization-specific features / optimizations / short-cuts often turns out to have drawbacks in the long run, even if it looked like a good idea initially. Just look at the mess we had with virtio-pci dma after iommu emulation landed in qemu. And this is only one example, we have more of this ... cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-21 7:29 ` Gerd Hoffmann @ 2019-02-21 9:24 ` Dr. David Alan Gilbert 2019-02-21 9:59 ` Gerd Hoffmann 0 siblings, 1 reply; 72+ messages in thread From: Dr. David Alan Gilbert @ 2019-02-21 9:24 UTC (permalink / raw) To: Gerd Hoffmann Cc: Frank Yang, Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev * Gerd Hoffmann (kraxel@redhat.com) wrote: > Hi, > > > > For communication between guest processes within the same VM I don't > > > really see a need to involve the hypervisor ... > > > > > Right, once the host memory is set up we can rely on purely guest side stuff > > map sub-regions of it. > > Or just use guest ram ... > > > > > Yes, also, other devices of the same VM. > > > > > > So why involve the hypervisor here? The guest can handle that on its > > > own. Passing an image data buffer from the usb webcam to the intel gpu > > > for display (on bare metal) isn't fundamentally different from passing a > > > buffer from virtio-camera to virtio-gpu (in a VM). Linux guests will > > > use dma-bufs for that, other OSes probably something else. > > > > That's true that it can be handled purely in the guest layers, > > if there is an existing interface in the guest > > to pass the proposed host memory id's / offsets / sizes > > between them. > > Note: I think using a pci memory bar (aka host memory mapped into the > guest) as backing storage for dma-bufs isn't going to work. (Not knowing dma-bufs) but could you explain why? Note in my spec the pci-bar isn't necessarily one chunk of host memory; it's a chunk of host VMA into which multiple mmaps go. Dave > > However, for the proposed host memory sharing spec, > > would there be a standard way to share the host memory across > > different virtio devices without relying on Linux dmabufs? > > I think with the current draft for each device (virtio-fs, virtio-gpu, > ...) has its own device-specific memory, and there is no mechanism to > exchange buffers between devices. 
> > Stefan? > > I'm also not convinced that explicitly avoiding dmabufs is a good idea > here. That would put virtio into its own universe and sharing buffers > with non-virtio devices will not work. Think about a intel vgpu as > display device, or a usb camera attached to the guest using usb > pass-through. > > Experience shows that using virtualization-specific features / > optimizations / short-cuts often turns out to have drawbacks in the long > run, even if it looked like a good idea initially. Just look at the > mess we had with virtio-pci dma after iommu emulation landed in qemu. > And this is only one example, we have more of this ... > > cheers, > Gerd > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
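[Editorial note] David's "chunk of host VMA into which multiple mmaps go" can be demonstrated in plain userspace on Linux: reserve the whole window once with PROT_NONE, then punch individual mappings into it with MAP_FIXED. This is a standalone sketch of the technique, not actual QEMU code; QEMU's real BAR handling involves its memory-region API.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve a large PROT_NONE window, then map an anonymous writable
 * chunk into the middle of it with MAP_FIXED -- the same pattern a
 * hypervisor can use to back one PCI BAR with many host mappings.
 * Returns the byte observed through the window, or 0 on failure. */
static char reserve_and_map_demo(void)
{
    size_t pg = 4096;
    char *base = mmap(NULL, 16 * pg, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return 0;

    /* MAP_FIXED atomically replaces the reservation in that range. */
    char *chunk = mmap(base + 4 * pg, 2 * pg,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (chunk == MAP_FAILED) {
        munmap(base, 16 * pg);
        return 0;
    }

    chunk[0] = 'x';                 /* write via the sub-mapping ...   */
    char seen = base[4 * pg];       /* ... visible through the window  */
    munmap(base, 16 * pg);
    return seen;
}
```

In a real device the sub-mappings would be file- or GPU-memory-backed rather than anonymous, which is where Gerd's struct-page objection below comes in on the guest side.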
* Re: [virtio-dev] Memory sharing device 2019-02-21 9:24 ` Dr. David Alan Gilbert @ 2019-02-21 9:59 ` Gerd Hoffmann 2019-02-21 10:03 ` Dr. David Alan Gilbert 2019-02-22 6:15 ` Michael S. Tsirkin 0 siblings, 2 replies; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-21 9:59 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Frank Yang, Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev Hi, > > Note: I think using a pci memory bar (aka host memory mapped into the > > guest) as backing storage for dma-bufs isn't going to work. > > (Not knowing dma-bufs) but could you explain why? Many places in the linux kernel assume dma-bufs are built out of normal ram pages (i.e. something backed by struct page). Which is not the case for pci memory bars which are typically ioremapped() (completely or in chunks) to access them. cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-21 9:59 ` Gerd Hoffmann @ 2019-02-21 10:03 ` Dr. David Alan Gilbert 2019-02-22 6:15 ` Michael S. Tsirkin 1 sibling, 0 replies; 72+ messages in thread From: Dr. David Alan Gilbert @ 2019-02-21 10:03 UTC (permalink / raw) To: Gerd Hoffmann Cc: Frank Yang, Michael S. Tsirkin, Roman Kiryanov, Stefan Hajnoczi, virtio-dev * Gerd Hoffmann (kraxel@redhat.com) wrote: > Hi, > > > > Note: I think using a pci memory bar (aka host memory mapped into the > > > guest) as backing storage for dma-bufs isn't going to work. > > > > (Not knowing dma-bufs) but could you explain why? > > Many places in the linux kernel assume dma-bufs are built out of normal > ram pages (i.e. something backed by struct page). Which is not the case > for pci memory bars which are typically ioremapped() (completely or in > chunks) to access them. Ah OK, so it's the view from inside the guest that's the problem. Dave > cheers, > Gerd > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-21 9:59 ` Gerd Hoffmann 2019-02-21 10:03 ` Dr. David Alan Gilbert @ 2019-02-22 6:15 ` Michael S. Tsirkin 2019-02-22 6:42 ` Gerd Hoffmann 1 sibling, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-22 6:15 UTC (permalink / raw) To: Gerd Hoffmann Cc: Dr. David Alan Gilbert, Frank Yang, Roman Kiryanov, Stefan Hajnoczi, virtio-dev On Thu, Feb 21, 2019 at 10:59:02AM +0100, Gerd Hoffmann wrote: > Hi, > > > > Note: I think using a pci memory bar (aka host memory mapped into the > > > guest) as backing storage for dma-bufs isn't going to work. > > > > (Not knowing dma-bufs) but could you explain why? > > Many places in the linux kernel assume dma-bufs are built out of normal > ram pages (i.e. something backed by struct page). Which is not the case > for pci memory bars which are typically ioremapped() (completely or in > chunks) to access them. > > cheers, > Gerd Can't we register PCI memory as ZONE_DEVICE? --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-22 6:15 ` Michael S. Tsirkin @ 2019-02-22 6:42 ` Gerd Hoffmann 0 siblings, 0 replies; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-22 6:42 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Frank Yang, Roman Kiryanov, Stefan Hajnoczi, virtio-dev On Fri, Feb 22, 2019 at 01:15:27AM -0500, Michael S. Tsirkin wrote: > On Thu, Feb 21, 2019 at 10:59:02AM +0100, Gerd Hoffmann wrote: > > Hi, > > > > > > Note: I think using a pci memory bar (aka host memory mapped into the > > > > guest) as backing storage for dma-bufs isn't going to work. > > > > > > (Not knowing dma-bufs) but could you explain why? > > > > Many places in the linux kernel assume dma-bufs are built out of normal > > ram pages (i.e. something backed by struct page). Which is not the case > > for pci memory bars which are typically ioremapped() (completely or in > > chunks) to access them. > > Can't we register PCI memory as ZONE_DEVICE? Oh, didn't know this exists. Yes, that might work. cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
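[Editor's note: for readers unfamiliar with the mechanism Michael refers to, ZONE_DEVICE lets the kernel create `struct page` entries for device memory, which is exactly what the dma-buf paths discussed above require. The following is a rough, kernel-style sketch of memremapping a PCI BAR this way. It is pseudocode against the approximately-v4.20 `devm_memremap_pages()` interface, not buildable as-is; the function name, field layout, and error handling are illustrative and vary between kernel versions.]

```c
/* Sketch only: give a PCI BAR struct-page backing via ZONE_DEVICE.
 * Field names follow the ~4.20-era dev_pagemap; treat as pseudocode. */
#include <linux/memremap.h>
#include <linux/pci.h>

static void *bar_memremap_pages(struct pci_dev *pdev, int bar)
{
    struct dev_pagemap *pgmap;

    pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
    if (!pgmap)
        return ERR_PTR(-ENOMEM);

    /* Describe the BAR's physical range to the pagemap. */
    pgmap->res.start = pci_resource_start(pdev, bar);
    pgmap->res.end   = pci_resource_end(pdev, bar);
    pgmap->res.flags = pci_resource_flags(pdev, bar);

    /* Creates struct pages for the range and returns a mapping, so
     * dma-buf exporters that demand page-backed memory could accept
     * this region. */
    return devm_memremap_pages(&pdev->dev, pgmap);
}
```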
* Re: [virtio-dev] Memory sharing device 2019-02-11 15:14 ` Frank Yang 2019-02-11 15:25 ` Frank Yang @ 2019-02-11 16:57 ` Michael S. Tsirkin 1 sibling, 0 replies; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-11 16:57 UTC (permalink / raw) To: Frank Yang Cc: Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert On Mon, Feb 11, 2019 at 07:14:53AM -0800, Frank Yang wrote: > > > On Mon, Feb 11, 2019 at 6:49 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Feb 04, 2019 at 11:42:25PM -0800, Roman Kiryanov wrote: > > Hi Gerd, > > > > > virtio-gpu specifically needs that to support vulkan and opengl > > > extensions for coherent buffers, which must be allocated by the host > gpu > > > driver. It's WIP still. > > > > the proposed spec says: > > > > +Shared memory regions MUST NOT be used to control the operation > > +of the device, nor to stream data; those should still be performed > > +using virtqueues. > > > > Is there a strong reason to prohibit using memory regions for control > purposes? > > That's in order to make virtio have portability implications, such that if > people see a virtio device in lspci they know there's > no lock-in, their guest can be moved between hypervisors > and will still work. > > > Our long term goal is to have as few kernel drivers as possible and to > move > > "drivers" into userspace. If we go with the virtqueues, is there > > general a purpose > > device/driver to talk between our host and guest to support custom > hardware > > (with own blobs)? > > The challenge is to answer the following question: > how to do this without losing the benefits of standartization? > > > Draft spec is incoming, but the basic idea is to standardize how to enumerate, > discover, and operate (with high performance) such userspace drivers/devices; > the basic operations would be standardized, and userspace drivers would be > constructed out of the resulting primitives. 
As long as standardization facilitates functionality, e.g. if we can support moving between hypervisors, this seems in-scope for virtio. > > Could you please advise if we can use something else to > > achieve this goal? > > I am not sure what the goal is though. Blobs is a means I guess > or it should be :) E.g. is it about being able to iterate quickly? > > Maybe you should look at vhost-user-gpu patches on qemu? > Would this address your need? > Acks for these patches would be a good thing. > > > > Is this it: > > https://patchwork.kernel.org/patch/10444089/ ? > > I'll check it out and try to discuss. Is there a draft spec for it as well? virtio gpu is part of the csprd01 draft. > > > -- > MST > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-11 14:49 ` Michael S. Tsirkin 2019-02-11 15:14 ` Frank Yang @ 2019-02-12 8:27 ` Roman Kiryanov 2019-02-12 11:25 ` Dr. David Alan Gilbert 2019-02-12 13:00 ` Michael S. Tsirkin 1 sibling, 2 replies; 72+ messages in thread From: Roman Kiryanov @ 2019-02-12 8:27 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert, Lingfeng Yang > > Our long term goal is to have as few kernel drivers as possible and to move > > "drivers" into userspace. If we go with the virtqueues, is there > > general a purpose > > device/driver to talk between our host and guest to support custom hardware > > (with own blobs)? > > The challenge is to answer the following question: > how to do this without losing the benefits of standartization? We looked into UIO and it still requires some kernel driver to tell where the device is, it also has limitations on sharing a device between processes. The benefit of standardization could be in avoiding everybody writing their own UIO drivers for virtual devices. Our emulator uses a battery, sound, accelerometer and more. We need to support all of this. I looked into the spec, "5 Device types", and seems "battery" is not there. We can invent our own drivers but we see having one flexible driver is a better idea. Yes, I realize that a guest could think it is using the same device as the host advertised (because strings matched) while it is not. We control both the host and the guest and we can live with this. Regards, Roman. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 8:27 ` Roman Kiryanov @ 2019-02-12 11:25 ` Dr. David Alan Gilbert 2019-02-12 13:47 ` Cornelia Huck 2019-02-12 13:00 ` Michael S. Tsirkin 1 sibling, 1 reply; 72+ messages in thread From: Dr. David Alan Gilbert @ 2019-02-12 11:25 UTC (permalink / raw) To: Roman Kiryanov Cc: Michael S. Tsirkin, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Lingfeng Yang * Roman Kiryanov (rkir@google.com) wrote: > > > Our long term goal is to have as few kernel drivers as possible and to move > > > "drivers" into userspace. If we go with the virtqueues, is there > > > general a purpose > > > device/driver to talk between our host and guest to support custom hardware > > > (with own blobs)? > > > > The challenge is to answer the following question: > > how to do this without losing the benefits of standartization? > > We looked into UIO and it still requires some kernel driver to tell > where the device is, it also has limitations on sharing a device > between processes. The benefit of standardization could be in avoiding > everybody writing their own UIO drivers for virtual devices. > > Our emulator uses a battery, sound, accelerometer and more. We need to > support all of this. I looked into the spec, "5 Device types", and > seems "battery" is not there. We can invent our own drivers but we see > having one flexible driver is a better idea. Can you group these devices together at all in their requirements? For example, battery and accelerometers (to me) sound like low-bandwidth 'sensors' with a set of key,value pairs that update occasionally and a limited (no?) amount of control from the VM->host. A 'virtio-values' device that carried a string list of keys that it supported might make sense and be enough for at least two of your device types. Dave > Yes, I realize that a guest could think it is using the same device as > the host advertised (because strings matched) while it is not. 
We > control both the host and the guest and we can live with this. > > Regards, > Roman. -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 11:25 ` Dr. David Alan Gilbert @ 2019-02-12 13:47 ` Cornelia Huck 2019-02-12 14:03 ` Michael S. Tsirkin 0 siblings, 1 reply; 72+ messages in thread From: Cornelia Huck @ 2019-02-12 13:47 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Roman Kiryanov, Michael S. Tsirkin, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Lingfeng Yang On Tue, 12 Feb 2019 11:25:47 +0000 "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > * Roman Kiryanov (rkir@google.com) wrote: > > > > Our long term goal is to have as few kernel drivers as possible and to move > > > > "drivers" into userspace. If we go with the virtqueues, is there > > > > general a purpose > > > > device/driver to talk between our host and guest to support custom hardware > > > > (with own blobs)? > > > > > > The challenge is to answer the following question: > > > how to do this without losing the benefits of standartization? > > > > We looked into UIO and it still requires some kernel driver to tell > > where the device is, it also has limitations on sharing a device > > between processes. The benefit of standardization could be in avoiding > > everybody writing their own UIO drivers for virtual devices. > > > > Our emulator uses a battery, sound, accelerometer and more. We need to > > support all of this. I looked into the spec, "5 Device types", and > > seems "battery" is not there. We can invent our own drivers but we see > > having one flexible driver is a better idea. > > Can you group these devices together at all in their requirements? > For example, battery and accelerometers (to me) sound like low-bandwidth > 'sensors' with a set of key,value pairs that update occasionally > and a limited (no?) amount of control from the VM->host. > A 'virtio-values' device that carried a string list of keys that it > supported might make sense and be enough for at least two of your > device types. 
Maybe not a 'virtio-values' device -- but a 'virtio-sensors' device looks focused enough without being too inflexible. It can easily advertise its type (battery, etc.) and therefore avoid the mismatch problem that a too loosely defined device would be susceptible to. > > Yes, I realize that a guest could think it is using the same device as > > the host advertised (because strings matched) while it is not. We > > control both the host and the guest and we can live with this. The problem is that this is not true for the general case if you have a standardized device type. It must be possible in theory to switch to an alternative implementation of the device or the driver, as long as they conform to the spec. I think a more concretely specified device type (like the suggested virtio-values or virtio-sensors) is needed for that. --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
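[Editor's note: the "set of key,value pairs" shape that Dave and Cornelia converge on is easy to picture concretely. Below is a small, self-contained C sketch of what a virtio-sensors-style snapshot and guest-side lookup might look like; every name, key string, and field width here is invented for illustration and comes from no spec.]

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record format for the "virtio-sensors" idea above:
 * a flat list of key/value pairs the device advertises. */
struct sensor_kv {
    char key[32];    /* NUL-terminated key, e.g. "battery.capacity_pct" */
    int64_t value;   /* integer reading; unit implied by the key */
};

/* Example snapshot a device might expose. */
static const struct sensor_kv snapshot[] = {
    { "battery.capacity_pct", 87 },
    { "accel.x_mg",          -12 },
    { "accel.y_mg",            3 },
};

/* Look up a key in the advertised list; returns 0 on success. */
static int kv_get(const struct sensor_kv *kvs, size_t n,
                  const char *key, int64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kvs[i].key, key) == 0) {
            *out = kvs[i].value;
            return 0;
        }
    }
    return -1;
}
```

A driver built on this shape only needs to re-read the snapshot when the device signals an update, which matches the "low-bandwidth, occasional update" profile described above.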
* Re: [virtio-dev] Memory sharing device 2019-02-12 13:47 ` Cornelia Huck @ 2019-02-12 14:03 ` Michael S. Tsirkin 2019-02-12 15:56 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-12 14:03 UTC (permalink / raw) To: Cornelia Huck Cc: Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Lingfeng Yang On Tue, Feb 12, 2019 at 02:47:41PM +0100, Cornelia Huck wrote: > On Tue, 12 Feb 2019 11:25:47 +0000 > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > > > * Roman Kiryanov (rkir@google.com) wrote: > > > > > Our long term goal is to have as few kernel drivers as possible and to move > > > > > "drivers" into userspace. If we go with the virtqueues, is there > > > > > general a purpose > > > > > device/driver to talk between our host and guest to support custom hardware > > > > > (with own blobs)? > > > > > > > > The challenge is to answer the following question: > > > > how to do this without losing the benefits of standartization? > > > > > > We looked into UIO and it still requires some kernel driver to tell > > > where the device is, it also has limitations on sharing a device > > > between processes. The benefit of standardization could be in avoiding > > > everybody writing their own UIO drivers for virtual devices. > > > > > > Our emulator uses a battery, sound, accelerometer and more. We need to > > > support all of this. I looked into the spec, "5 Device types", and > > > seems "battery" is not there. We can invent our own drivers but we see > > > having one flexible driver is a better idea. > > > > Can you group these devices together at all in their requirements? > > For example, battery and accelerometers (to me) sound like low-bandwidth > > 'sensors' with a set of key,value pairs that update occasionally > > and a limited (no?) amount of control from the VM->host. 
> > A 'virtio-values' device that carried a string list of keys that it > > supported might make sense and be enough for at least two of your > > device types. > > Maybe not a 'virtio-values' device -- but a 'virtio-sensors' device > looks focused enough without being too inflexible. It can easily > advertise its type (battery, etc.) and therefore avoid the mismatch > problem that a too loosely defined device would be susceptible to. Isn't virtio-vsock/vhost-vsock a good fit for this kind of general string passing? People seem to use it exactly for this. > > > Yes, I realize that a guest could think it is using the same device as > > > the host advertised (because strings matched) while it is not. We > > > control both the host and the guest and we can live with this. > > The problem is that this is not true for the general case if you have a > standardized device type. It must be possible in theory to switch to an > alternative implementation of the device or the driver, as long as they > conform to the spec. I think a more concretely specified device type > (like the suggested virtio-values or virtio-sensors) is needed for that. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
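[Editor's note: for context on the vsock suggestion, AF_VSOCK gives the guest a sockets API addressed by a (CID, port) pair, with the host reachable at the well-known CID 2, which is why it gets used for exactly this kind of general string passing. A minimal Linux-only addressing sketch follows; the port number is an arbitrary example, since a real device/daemon pair would agree on it out of band.]

```c
#include <sys/socket.h>
#include <linux/vm_sockets.h>  /* AF_VSOCK, struct sockaddr_vm (Linux) */
#include <string.h>

/* Build the address of a host-side vsock service as seen from a guest. */
static struct sockaddr_vm host_service_addr(unsigned int port)
{
    struct sockaddr_vm sa;

    memset(&sa, 0, sizeof(sa));
    sa.svm_family = AF_VSOCK;
    sa.svm_cid    = VMADDR_CID_HOST;  /* CID 2: the hypervisor host */
    sa.svm_port   = port;
    return sa;
}

/* A guest client would then do roughly:
 *   int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
 *   struct sockaddr_vm sa = host_service_addr(9999);
 *   connect(fd, (struct sockaddr *)&sa, sizeof(sa));
 */
```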
* Re: [virtio-dev] Memory sharing device 2019-02-12 14:03 ` Michael S. Tsirkin @ 2019-02-12 15:56 ` Frank Yang 2019-02-12 16:46 ` Dr. David Alan Gilbert 2019-02-12 18:22 ` Michael S. Tsirkin 0 siblings, 2 replies; 72+ messages in thread From: Frank Yang @ 2019-02-12 15:56 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Cornelia Huck, Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1.1: Type: text/plain, Size: 6022 bytes --] Thanks Roman for the reply. Yes, we need sensors, sound, codecs, etc. as well. For general string passing, yes, perhaps virtio-vsock can be used. However, I have some concerns about virtio-serial and virtio-vsock (mentioned elsewhere in the thread in reply to Stefan's similar comments) around socket API specialization. Stepping back to standardization and portability concerns, it is also not necessarily desirable to use general pipes to do what we want, because even though that device exists and is part of the spec already, that results in _de-facto_ non-portability. If we had some kind of spec to enumerate such 'user-defined' devices, at least we can have _de-jure_ non-portability; an enumerated device doesn't work as advertised. virtio-gpu: we have concerns around its specialization to virgl and de-facto gallium-based protocol, while we tend to favor API forwarding due to its debuggability and flexibility. We may use virtio-gpu in the future if/when it provides that general "send api data" capability. In any case, I now have a very rough version of the spec in mind (attached as a patch and as a pdf). The part of the intro in there that is relevant to the current thread: """ Note that virtio-serial/virtio-vsock is not considered because they do not standardize the set of devices that operate on top of them, but in practice, are often used for fully general devices. 
Spec-wise, this is not a great situation because we would still have potentially non portable device implementations where there is no standard mechanism to determine whether or not things are portable. virtio-user provides a device enumeration mechanism to better control this. In addition, for performance considerations in applications such as graphics and media, virtio-serial/virtio-vsock have the overhead of sending actual traffic through the virtqueue, while an approach based on shared memory can result in having fewer copies and virtqueue messages. virtio-serial is also limited in being specialized for console forwarding and having a cap on the number of clients. virtio-vsock is also not optimal in its choice of sockets API for transport; shared memory cannot be used, arbitrary strings can be passed without an designation of the device/driver being run de-facto, and the guest must have additional machinery to handle socket APIs. In addition, on the host, sockets are only dependable on Linux, with less predictable behavior from Windows/macOS regarding Unix sockets. Waiting for socket traffic on the host also requires a poll() loop, which is suboptimal for latency. With virtio-user, only the bare set of standard driver calls (open/close/ioctl/mmap/read) is needed, and RAM is a more universal transport abstraction. We also explicitly spec out callbacks on host that are triggered by virtqueue messages, which results in lower latency and makes it easy to dispatch to a particular device implementation without polling. """ On Tue, Feb 12, 2019 at 6:03 AM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 12, 2019 at 02:47:41PM +0100, Cornelia Huck wrote: > > On Tue, 12 Feb 2019 11:25:47 +0000 > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > > > > > * Roman Kiryanov (rkir@google.com) wrote: > > > > > > Our long term goal is to have as few kernel drivers as possible > and to move > > > > > > "drivers" into userspace. 
If we go with the virtqueues, is there > > > > > > general a purpose > > > > > > device/driver to talk between our host and guest to support > custom hardware > > > > > > (with own blobs)? > > > > > > > > > > The challenge is to answer the following question: > > > > > how to do this without losing the benefits of standartization? > > > > > > > > We looked into UIO and it still requires some kernel driver to tell > > > > where the device is, it also has limitations on sharing a device > > > > between processes. The benefit of standardization could be in > avoiding > > > > everybody writing their own UIO drivers for virtual devices. > > > > > > > > Our emulator uses a battery, sound, accelerometer and more. We need > to > > > > support all of this. I looked into the spec, "5 Device types", and > > > > seems "battery" is not there. We can invent our own drivers but we > see > > > > having one flexible driver is a better idea. > > > > > > Can you group these devices together at all in their requirements? > > > For example, battery and accelerometers (to me) sound like > low-bandwidth > > > 'sensors' with a set of key,value pairs that update occasionally > > > and a limited (no?) amount of control from the VM->host. > > > A 'virtio-values' device that carried a string list of keys that it > > > supported might make sense and be enough for at least two of your > > > device types. > > > > Maybe not a 'virtio-values' device -- but a 'virtio-sensors' device > > looks focused enough without being too inflexible. It can easily > > advertise its type (battery, etc.) and therefore avoid the mismatch > > problem that a too loosely defined device would be susceptible to. > > Isn't virtio-vsock/vhost-vsock a good fit for this kind of general > string passing? People seem to use it exactly for this. > > > > > Yes, I realize that a guest could think it is using the same device > as > > > > the host advertised (because strings matched) while it is not. 
We > > > > control both the host and the guest and we can live with this. > > > > The problem is that this is not true for the general case if you have a > > standardized device type. It must be possible in theory to switch to an > > alternative implementation of the device or the driver, as long as they > > conform to the spec. I think a more concretely specified device type > > (like the suggested virtio-values or virtio-sensors) is needed for that. > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > [-- Attachment #1.2: Type: text/html, Size: 7661 bytes --] [-- Attachment #2: 0001-virtio-user-draft-spec.patch --] [-- Type: application/octet-stream, Size: 27537 bytes --] From 4b6bac6e52f86cab1d21f257556822674649eb2e Mon Sep 17 00:00:00 2001 From: Lingfeng Yang <lfy@google.com> Date: Tue, 12 Feb 2019 07:21:08 -0800 Subject: [PATCH] virtio-user draft spec --- content.tex | 1 + virtio-user.tex | 561 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 562 insertions(+) create mode 100644 virtio-user.tex diff --git a/content.tex b/content.tex index 836ee52..5051209 100644 --- a/content.tex +++ b/content.tex @@ -5559,6 +5559,7 @@ descriptor for the \field{sense_len}, \field{residual}, \input{virtio-input.tex} \input{virtio-crypto.tex} \input{virtio-vsock.tex} +\input{virtio-user.tex} \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} diff --git a/virtio-user.tex b/virtio-user.tex new file mode 100644 index 0000000..f9a08cf --- /dev/null +++ b/virtio-user.tex @@ -0,0 +1,561 @@ +\section{User Device}\label{sec:Device Types / User Device} + +Note: This depends on the upcoming shared-mem type of virtio. + +virtio-user is an interface for defining and operating virtual devices +with high performance. 
+It is intended that virtio-user serve a need for defining userspace drivers +for virtual machines, but it can be used for kernel drivers as well, +and there are several benefits to this approach that can potentially +make it more flexible and performant than commonly suggested alternatives. + +virtio-user is configured at virtio-user device realization time. +The host enumerates a set of available devices for virtio-user +and the guest is able to use the available ones +according to a privately-defined protocol +that uses a combination of virtqueues and shared memory. + +virtio-user has three main virtqueue types: config, ping, and event. +The config virtqueue is used to enumerate devices, create instances, and allocate shared memory. +The ping virtqueue is optionally used as a doorbell to notify the host to process data. +The event virtqueue is optionally used to wait for the host to complete operations +from the guest. + +On the host, callbacks specific to the enumerated device are issued +on enumeration, instance creation, shared memory operations, and ping. +These device implementations are stored in shared library plugins +separate from the host hypervisor. +The host hypervisor implements a minimal set of operations to allow +the dispatch to happen and to send back event virtqueue messages. + +The main benefit of virtio-user is +to decouple definition of new drivers and devices +from the underlying transport protocol +and from the host hypervisor implementation. +Virtio-user then serves as +a platform for "userspace" drivers for virtual machines; +"userspace" in the literal sense of allowing guest drivers +to be userspace-defined, decoupled from the guest kernel, +"userspace" in the sense of device implementations +being defined away from the "kernel" of the host hypervisor code. + +The second benefit of virtio-user is high performance via shared memory +(Note: This depends on the upcoming shared-mem type of virtio). 
+Each driver/device created from userspace or the guest kernel +is allowed to create or share regions of shared memory. +Sharing can be done with other virtio-user devices only, +though it may be possible to share with other virtio devices if that is beneficial. + +Another benefit derives from +the separation of the driver definition from the transport protocol. +The implementation of all such user-level drivers +is captured by a set of primitive operations in the guest +and shared library function pointers in the host. +Because of this, virtio-user itself will have a very small implementation footprint, +allowing it to be readily used with a wide variety of guest OSes and host VMMs, +while sharing the same driver/device functionality +defined in the guest and defined in a shared library on the host. +This facilitates using any particular virtual device in many different guest OSes and host VMMs. + +Finally, this has the benefit of being +a general standardization path for existing non-standard devices to use virtio; +if a new device type is introduced that can be used with virtio, +it can be implemented on top of virtio-user first and work immediately +on all existing guest OSes / host hypervisors supporting virtio-user. +virtio-user can be used as a staging area for potential new +virtio device types, and moved to new virtio device types as appropriate. +Currently, virtio-vsock is often suggested as a generic pipe, +but from the standardization point of view, +doing so causes de-facto non-portability; +there is no standard way to enumerate how such generic pipes are used. + +Note that virtio-serial/virtio-vsock is not considered +because they do not standardize the set of devices that operate on top of them, +but in practice, are often used for fully general devices. 
+Spec-wise, this is not a great situation because we would still have potentially +non-portable device implementations where there is no standard mechanism to +determine whether or not things are portable. +virtio-user provides a device enumeration mechanism to better control this. + +In addition, for performance considerations +in applications such as graphics and media, +virtio-serial/virtio-vsock have the overhead of sending actual traffic +through the virtqueue, while an approach based on shared memory +can result in having fewer copies and virtqueue messages. +virtio-serial is also limited in being specialized for console forwarding +and having a cap on the number of clients. +virtio-vsock is also not optimal in its choice of sockets API for +transport; shared memory cannot be used, +arbitrary strings can be passed without a designation of the device/driver being run de-facto, +and the guest must have additional machinery to handle socket APIs. +In addition, on the host, sockets are only dependable on Linux, +with less predictable behavior from Windows/macOS regarding Unix sockets. +Waiting for socket traffic on the host also requires a poll() loop, +which is suboptimal for latency. +With virtio-user, only the bare set of standard driver calls (open/close/ioctl/mmap/read) is needed, +and RAM is a more universal transport abstraction. +We also explicitly spec out callbacks on the host that are triggered by virtqueue messages, +which results in lower latency and makes it easy to dispatch to a particular device implementation +without polling. 
+ +\subsection{Device ID}\label{sec:Device Types / User Device / Device ID} + +21 + +\subsection{Virtqueues}\label{sec:Device Types / User Device / Virtqueues} + +\begin{description} +\item[0] config tx +\item[1] config rx +\item[2] ping +\item[3] event +\end{description} + +\subsection{Feature bits}\label{sec: Device Types / User Device / Feature bits } + +No feature bits, unless we go with this alternative: + +An alternative is to specify the possible drivers/devices in the feature bits themselves. +This ensures that there is a standard place where such devices are defined. +However, changing the feature bits would require updates to the spec, driver, and hypervisor, +which may not be as well suited to fast iteration, +and has the undesirable property of coupling device changes to hypervisor changes. + +\subsubsection{Feature bit requirements}\label{sec:Device Types / User Device / Feature bit requirements} + +No feature bit requirements, unless we go with device enumeration in feature bits. + +\subsection{Device configuration layout}\label{sec:Device Types / User Device / Device configuration layout} + +\begin{lstlisting} +struct virtio_user_config { + le32 enumeration_space_id; +}; +\end{lstlisting} + +These serve to identify the virtio-user instance for purposes of compatibility. +Userspace drivers/devices enumerated under the same \field{enumeration_space_id} that match are considered to be compatible. +The guest may not write to \field{enumeration_space_id}. +The host writes once to \field{enumeration_space_id} on initialization. + +\subsection{Device Initialization}\label{sec:Device Types / User Device / Device Initialization} + +The enumeration space id is read from the host into \field{virtio_user_config.enumeration_space_id}. + +On device startup, the config virtqueue is used to enumerate a set of virtual devices available on the host. +They are then registered to the guest in a way that is specific to the guest OS, +such as misc_register for Linux. 
+
+Buffers are added to the config virtqueues
+to enumerate available userspace drivers,
+to create / destroy userspace device contexts,
+or to alloc / free / import / export shared memory.
+
+Buffers are added to the ping virtqueue to notify the host of device specific operations
+or to notify the host that there is available shared memory to consume.
+This is like a doorbell with user-defined semantics.
+
+Buffers are added to the event virtqueue from the device to the driver to
+notify the driver that an operation has completed.
+
+\subsection{Device Operation}\label{sec:Device Types / User Device / Device Operation}
+
+\subsubsection{Config Virtqueue Messages}\label{sec:Device Types / User Device / Device Operation / Config Virtqueue Messages}
+
+Operation always begins on the config virtqueue.
+Messages transmitted or received on the config virtqueue are of the following structure:
+
+\begin{lstlisting}
+struct virtio_user_config_msg {
+    le32 msg_type;
+    le32 device_count;
+    le32 vendor_ids[MAX_DEVICES];
+    le32 device_ids[MAX_DEVICES];
+    le32 versions[MAX_DEVICES];
+    le64 instance_handle;
+    le64 shm_id;
+    le64 shm_offset;
+    le64 shm_size;
+    le32 shm_flags;
+    le32 error;
+};
+\end{lstlisting}
+
+\field{MAX_DEVICES} is defined as 32.
+\field{msg_type} can only be one of the following:
+
+\begin{lstlisting}
+enum {
+    VIRTIO_USER_CONFIG_OP_ENUMERATE_DEVICES,
+    VIRTIO_USER_CONFIG_OP_CREATE_INSTANCE,
+    VIRTIO_USER_CONFIG_OP_DESTROY_INSTANCE,
+    VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_ALLOC,
+    VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_FREE,
+    VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_EXPORT,
+    VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_IMPORT,
+};
+\end{lstlisting}
+
+\field{error} can only be one of the following:
+
+\begin{lstlisting}
+enum {
+    VIRTIO_USER_ERROR_CONFIG_DEVICE_INITIALIZATION_FAILED,
+    VIRTIO_USER_ERROR_CONFIG_INSTANCE_CREATION_FAILED,
+    VIRTIO_USER_ERROR_CONFIG_SHARED_MEMORY_ALLOC_FAILED,
+    VIRTIO_USER_ERROR_CONFIG_SHARED_MEMORY_EXPORT_FAILED,
+    VIRTIO_USER_ERROR_CONFIG_SHARED_MEMORY_IMPORT_FAILED,
+};
+\end{lstlisting}
+
+When the guest starts, a \field{virtio_user_config_msg}
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_ENUMERATE_DEVICES} is sent
+from the guest to the host on the config tx virtqueue. All other fields are ignored.
+
+The guest then receives a \field{virtio_user_config_msg}
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_ENUMERATE_DEVICES},
+with \field{device_count} populated with the number of available devices,
+the \field{vendor_ids} array populated with \field{device_count} vendor ids,
+the \field{device_ids} array populated with \field{device_count} device ids,
+and the \field{versions} array populated with \field{device_count} device versions.
+
+The results can be obtained more than once, and the same results will always be received
+by the guest as long as there is no change to the existing virtio userspace devices.
+
+The guest now knows which devices are available, in addition to \field{enumeration_space_id}.
+It is guaranteed that host/guest setups with the same \field{enumeration_space_id},
+\field{device_count}, \field{device_ids}, \field{vendor_ids},
+and \field{versions} arrays (up to \field{device_count})
+operate the same way as far as virtio-user devices are concerned.
+There are the following relaxations:
+
+\begin{enumerate}
+\item If a particular combination of IDs in \field{device_ids} / \field{vendor_ids} is missing,
+the guest can still continue with the existing set of devices.
+\item If a particular combination of IDs in \field{device_ids} / \field{vendor_ids} mismatches in \field{versions},
+the guest can still continue provided the version is deemed ``compatible'' by the guest,
+which is determined by the particular device implementation.
+Some devices are never compatible between versions
+while other devices are backward and/or forward compatible.
+\end{enumerate}
+
+Next, instances, which are particular userspace contexts surrounding devices, are created.
+
+Creating instances:
+The guest sends a \field{virtio_user_config_msg}
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_CREATE_INSTANCE}
+on the config tx virtqueue.
+The first entries in \field{vendor_ids}/\field{device_ids}/\field{versions} specify
+the vendor id, device id, and version of the device to instantiate.
+On the host,
+a new \field{instance_handle} is generated,
+and a device-specific instance creation function is run
+based on the vendor, device, and version.
+
+If unsuccessful, \field{error} is set and sent back to the guest
+on the config rx virtqueue, and the \field{instance_handle} is discarded.
+If successful,
+a \field{virtio_user_config_msg}
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_CREATE_INSTANCE}
+and \field{instance_handle} equal to the generated handle
+is sent on the config rx virtqueue.
+
+The instance creation function is a callback function that is tied
+to a plugin associated with the vendor and device id in question:
+
+(le64 instance_handle) -> bool
+
+returning true if instance creation succeeded,
+and false if it failed.
+
+Let's call this \field{on_create_instance}.
+
+Destroying instances:
+The guest sends a \field{virtio_user_config_msg}
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_DESTROY_INSTANCE}
+on the config tx virtqueue.
+The only field that needs to be populated
+is \field{instance_handle}.
+On the host, a device-specific instance destruction function is run:
+
+(instance_handle) -> void
+
+Let's call this \field{on_destroy_instance}.
+
+Also, all \field{shm_id}'s have their memory freed by instance destruction
+only if the shared memory was not exported (detailed below).
+
+Next, shared memory is set up to back device operation.
+This depends on the particular guest in question and what drivers/devices are being used.
+The shared memory configuration operations are as follows:
+
+Allocating shared memory:
+The guest sends a \field{virtio_user_config_msg}
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_ALLOC}
+on the config tx virtqueue.
+\field{instance_handle} needs to be a valid instance handle generated by the host.
+\field{shm_size} must be set and greater than zero.
+A new shared memory region is created in the PCI address space (actual allocation is deferred).
+If any operation fails, a message
+with \field{msg_type} equal to \field{VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_ALLOC}
+and \field{error} equal to \field{VIRTIO_USER_ERROR_CONFIG_SHARED_MEMORY_ALLOC_FAILED}
+is sent on the config rx virtqueue.
+If all operations succeed,
+a new \field{shm_id} is generated along with \field{shm_offset} (the offset into the PCI address space),
+and sent back on the config rx virtqueue.
+
+Freeing shared memory objects works in a similar way,
+with \field{msg_type} set equal to \field{VIRTIO_USER_CONFIG_OP_SHARED_MEMORY_FREE}.
+If the memory has been shared,
+it is refcounted based on how many instances have used it.
+When the refcount reaches 0,
+the host hypervisor will explicitly unmap that shared memory object
+from any existing host pointers.
+
+To export a shared memory object, we need to have a valid \field{instance_handle}
+and an allocated shared memory object with a valid \field{shm_id}.
+The export operation itself for now is mostly administrative;
+it marks that allocated memory as available for sharing.
+
+To import a shared memory object, we need to have a valid \field{instance_handle}
+and a shared memory object with a valid \field{shm_id}
+that has been allocated and exported. A new \field{shm_id} is not generated;
+this is mostly administrative and marks that the \field{shm_id}
+can also be used from the importing instance.
+This is for sharing memory, so \field{instance_handle} need not
+be the same as the \field{instance_handle} that allocated the shared memory.
+
+This is similar to Vulkan \field{VK_KHR_external_memory},
+except over raw PCI address space and \field{shm_id}'s.
+
+For mapping and unmapping shared memory objects,
+we do not include explicit virtqueue methods;
+we instead rely on the guest kernel's memory mapping primitives.
+
+Flow control: Only one config message is allowed to be in flight
+either to or from the host at any time.
+That is, the handshake tx/rx for device enumeration, instance creation, and shared memory operations
+is done in a globally visible, single threaded manner.
+This is to make it easier to synchronize operations on shared memory and instance creation.
+
+\subsubsection{Ping Virtqueue Messages}\label{sec:Device Types / User Device / Device Operation / Ping Virtqueue Messages}
+
+Once the instances have been created and configured with shared memory,
+we can already read/write memory, and for some devices that may already be enough
+if they can operate lock-free and wait-free without needing notifications; we're done!
+
+However, in order to avoid burning up CPU in most cases,
+most devices need some kind of mechanism to trigger activity on the device
+from the guest. This is captured via a new message struct,
+which is separate from the config struct because it is smaller and
+the common case is to send these messages.
+These messages are sent from the guest to the host
+on the ping virtqueue.
+
+\begin{lstlisting}
+struct virtio_user_ping {
+    le64 instance_handle;
+    le64 metadata;
+    le64 shm_id;
+    le64 shm_offset;
+    le64 shm_size;
+    le32 events;
+};
+\end{lstlisting}
+
+\field{instance_handle} must be a valid instance handle.
+\field{shm_id} need not be a valid shm_id.
+If \field{shm_id} is a valid shm_id,
+it need not be allocated on the host yet.
+
+On the device side, each ping results in calling a callback function of type:
+
+(instance_handle, metadata, phys_addr, host_ptr, events) -> revents
+
+Let us call this function \field{on_instance_ping}.
+It returns revents, which is optionally used in event virtqueue replies.
+
+If \field{shm_id} is a valid shm_id,
+\field{phys_addr} is resolved given \field{shm_offset} by either
+the virtio-user driver or the host hypervisor.
+
+If \field{shm_id} is a valid shm_id
+and there is a mapping set up for \field{phys_addr},
+\field{host_ptr} refers to the corresponding memory view in the host address space.
+This allows coherent access to device memory from both the host and guest, given
+a few extra considerations.
+For example, for architectures that do not have store/load coherency (i.e., not x86),
+an explicit set of fence or synchronization instructions will also be run by virtio-user
+both before and after the call to \field{on_instance_ping}.
+An alternative is to leave this up to the implementor of the virtual device,
+but synchronizing views of the same memory is going to be such a common case
+that it is probably a good idea to include synchronization out of the box.
+
+It may also be common to block a guest thread until \field{on_instance_ping}
+completes on the device side.
+That is the purpose of the \field{events} field; the guest can populate it
+if it wants to synchronize on host completion.
+If \field{events} is not zero, then a reply is sent
+back to the guest via the event virtqueue after \field{on_instance_ping} completes,
+with the \field{revents} return value.
+
+Flow control: Arbitrary levels of traffic can be sent
+on the ping virtqueue from multiple instances at the same time,
+but ordering within an instance is strictly preserved.
+Additional resources outside the virtqueue are used to hold incoming messages
+if the virtqueue itself fills up.
+This is similar to how virtio-vsock handles high traffic.
+
+The semantics of ping messages themselves are also not restricted to guest to host only;
+the shared memory region named in the message can also be filled by the host
+and used as receive traffic by the guest.
+The ping message is then suitable for DMA operations in both directions,
+such as glTexImage2D and glReadPixels,
+and audio/video (de)compression (the guest populates shared memory with (de)compressed buffers,
+sends a ping message, and the host (de)compresses into the same memory region).
+
+\subsubsection{Event Virtqueue Messages}\label{sec:Device Types / User Device / Device Operation / Event Virtqueue Messages}
+
+Ping virtqueue messages are enough to cover all async device operations;
+that is, operations that do not require a round trip from the host.
+This is useful for most kinds of graphics API forwarding along
+with media codecs.
+
+However, it can still be important to synchronize the guest on the completion
+of a device operation.
+
+In the userspace driver, the interface can be similar to Linux uio interrupts, for example;
+a blocking read() of a device is done, and after it unblocks,
+the operation has completed.
+The exact way of waiting is dependent on the guest OS.
+
+However, it is all implemented on the event virtqueue. The message type:
+
+\begin{lstlisting}
+struct virtio_user_event {
+    le64 instance_handle;
+    le32 revents;
+};
+\end{lstlisting}
+
+Event messages are sent back to the guest if the \field{events} field is nonzero,
+as detailed in the section on ping virtqueue messages.
+
+The guest driver can distinguish which instance receives which event using
+\field{instance_handle}.
+The field \field{revents} is written from the return value of
+\field{on_instance_ping} on the device side.
+
+\subsection{Kernel Drivers via virtio-user}\label{sec:Device Types / User Device / Kernel Drivers via virtio-user}
+
+It is not a hard restriction for instances to be created from guest userspace;
+there are many kernel mechanisms such as sync fd's and USB devices
+that can benefit from running on top of virtio-user.
+
+Provided the functionality exists in the guest kernel, virtio-user
+shall expose all of its operations to other kernel drivers as well.
+
+\subsection{Kernel and Hypervisor Portability Requirements}\label{sec:Device Types / User Device / Kernel and Hypervisor Portability Requirements}
+
+The main goal of virtio-user is to allow high performance userspace drivers/devices
+to be defined and implemented in a way that is decoupled
+from guest kernels and host hypervisors;
+even socket interfaces are not assumed to exist,
+with only virtqueues and shared memory as the basic transport.
+
+The device implementations themselves live in shared libraries
+that plug in to the host hypervisor.
+The userspace driver implementations use existing guest userspace facilities
+for communicating with drivers,
+such as open()/ioctl()/read()/mmap() on Linux.
+
+This set of configuration and virtqueue message structs
+is meant to be implemented
+across a wide variety of guest kernels and host hypervisors.
+What follows are the requirements to implement virtio-user
+for a given guest kernel and host hypervisor.
+
+\subsubsection{Kernel Portability Requirements}\label{sec:Device Types / User Device / Kernel and Hypervisor Portability Requirements / Kernel Portability Requirements}
+
+First, the guest kernel is required to be able to expose the enumerated devices
+in the existing way in which devices are exposed.
+For example, in Linux, misc_register must be available to add new entries
+to /dev/ for each device.
+Each such device is associated with the vendor id, device id, and version.
+For example, /dev/virtio-user/abcd:ef10:03 refers to vendor id 0xabcd, device id 0xef10, version 3.
+
+The guest kernel also needs some way to expose config operations to userspace
+and to the guest kernel space (as there are a few use cases that would involve implementing
+some kernel drivers in terms of virtio-user, such as sync fd's, usb, etc.).
+In Linux, this is done by mapping open() to instance creation,
+the last close() to instance destruction,
+ioctl() to alloc/free/export/import,
+and mmap() to map memory.
+
+The guest kernel also needs some way to forward ping messages.
+In Linux, this can also be done via ioctl().
+
+The guest kernel also needs some way to expose event waiting.
+In Linux, this can be done via read(),
+and the return value will be revents from the event virtqueue message.
+
+\subsubsection{Hypervisor Portability Requirements}\label{sec:Device Types / User Device / Kernel and Hypervisor Portability Requirements / Hypervisor Portability Requirements}
+
+The first capability the host hypervisor will need to support is runtime mapping of
+host pointers to guest physical addresses.
+As of this writing, this is available in KVM, Intel HAXM, and macOS Hypervisor.framework.
+
+Next, the host hypervisor will need to support shared library plugin loading.
+This is so the device implementation can be separate from the host hypervisor.
+The device implementations live in single shared libraries.
+There is one plugin shared library
+for each vendor/device id.
+The functions exposed by each shared library shall have the following form:
+
+\begin{lstlisting}
+void register_memory_mapping_funcs(
+    bool (*map_guest_ram)(le64 phys_addr, void* host_ptr, le64 size),
+    bool (*unmap_guest_ram)(le64 phys_addr, le64 size));
+void get_device_config_info(le32* vendorId, le32* deviceId, le32* version);
+bool on_create_instance(le64 instance_handle);
+void on_destroy_instance(le64 instance_handle);
+le32 on_instance_ping(le64 instance_handle, le64 metadata, le64 phys_addr, void* host_ptr, le32 events);
+\end{lstlisting}
+
+The host hypervisor's plugin loading system will load a set of such shared libraries
+and resolve their vendor ids, device ids, and versions,
+which populates the information necessary for device enumeration to work.
+
+Each instance is able to use the results of \field{register_memory_mapping_funcs}
+to communicate with the host hypervisor to map/unmap the shared memory
+to host buffers.
+
+When an instance with a given vendor and device id is created via
+\field{VIRTIO_USER_CONFIG_OP_CREATE_INSTANCE}, the host hypervisor runs
+the plugin's \field{on_create_instance} function.
+
+When an instance is destroyed,
+the host hypervisor runs the plugin's \field{on_destroy_instance} call.
+
+When a ping happens,
+the host hypervisor calls the \field{on_instance_ping} of the plugin that is associated
+with the \field{instance_handle}.
+
+If \field{shm_id} and \field{shm_offset} are valid, \field{phys_addr} is populated
+with the corresponding guest physical address.
+
+If the guest physical address is mapped to a host pointer somewhere, then
+\field{host_ptr} is populated.
+
+The return value from the plugin is then used as revents,
+and if \field{events} was nonzero, the event virtqueue will be used to
+send revents back to the guest.
+
+Given a portable guest OS / host hypervisor, an existing set of shared libraries
+implementing a device can be used for many different guest OSes and hypervisors
+that support virtio-user.
+
+On the guest side, there needs to be a similar set of libraries to send
+commands; these depend more on the specifics of the guest OS and how
+virtio-user was exposed, but it will tend to be a parallel set of shared
+libraries in guest userspace where only guest OS-specific customizations need
+to be made while the basic protocol remains the same.
-- 
2.19.0.605.g01d371f741-goog

[-- Attachment #3: virtio-v1.1-wd01.pdf --]
[-- Type: application/pdf, Size: 725309 bytes --]

[-- Attachment #4: Type: text/plain, Size: 208 bytes --]

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply related	[flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device
  2019-02-12 15:56 ` Frank Yang
@ 2019-02-12 16:46 ` Dr. David Alan Gilbert
  2019-02-12 17:20   ` Frank Yang
  2019-02-12 18:22   ` Michael S. Tsirkin
  1 sibling, 1 reply; 72+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-12 16:46 UTC (permalink / raw)
  To: Frank Yang
  Cc: Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann,
	Stefan Hajnoczi, virtio-dev, Greg Hartman

* Frank Yang (lfy@google.com) wrote:
> Thanks Roman for the reply. Yes, we need sensors, sound, codecs, etc. as
> well.
>
> For general string passing, yes, perhaps virtio-vsock can be used. However,
> I have some concerns about virtio-serial and virtio-vsock (mentioned
> elsewhere in the thread in reply to Stefan's similar comments) around socket
> API specialization.
>
> Stepping back to standardization and portability concerns, it is also not
> necessarily desirable to use general pipes to do what we want, because even
> though that device exists and is part of the spec already, that results in
> _de-facto_ non-portability. If we had some kind of spec to enumerate such
> 'user-defined' devices, at least we can have _de-jure_ non-portability; an
> enumerated device doesn't work as advertised.
>
> virtio-gpu: we have concerns around its specialization to virgl and
> de-facto gallium-based protocol, while we tend to favor API forwarding due
> to its debuggability and flexibility. We may use virtio-gpu in the future
> if/when it provides that general "send api data" capability.
>
> In any case, I now have a very rough version of the spec in mind (attached
> as a patch and as a pdf).
Some thoughts (and remember I'm fairly new to virtio):

a) Please don't call it virtio-user - we have vhost-user as one of the
implementations of virtio and that would just get confusing (especially
when we have a vhost-user-user implementation)

b) Your ping and event queues confuse me - they seem to be
reimplementing exactly what virtio-queues already are; aren't virtio
queues already lumps of shared memory with a 'kick' mechanism to wake up
the other end when something interesting happens?

c) I think you actually have two separate types of devices that
should be treated differently;
   1) your high bandwidth gpu/codec
   2) Low bandwidth batteries/sensors

I can imagine you having a total of two device definitions and drivers
for (1) and (2).
(2) feels like it's pretty similar to a socket/pipe/serial - but it
needs a way to enumerate the sensors you have, their ranges etc and a
defined format for transmitting the data. I'm not sure if it's possible
to take one of the existing socket/pipe/serial things and layer on top
of it. (is there any HID like standard for sensors like that?)

Perhaps for (1) for your GPU stuff, maybe a single virtio device
would work, with a small number of shared memory arenas but multiple
virtio queues; each (set of) queues would represent a subdevice (say
a bunch of queues for the GPU another bunch for the CODEC etc).

Dave

> The part of the intro in there that is relevant to the current thread:
>
> """
> Note that virtio-serial/virtio-vsock is not considered because they do not
> standardize the set of devices that operate on top of them, but in practice,
> are often used for fully general devices. Spec-wise, this is not a great
> situation because we would still have potentially non portable device
> implementations where there is no standard mechanism to determine whether or
> not things are portable. virtio-user provides a device enumeration
> mechanism
> to better control this.
> > In addition, for performance considerations in applications such as graphics > and media, virtio-serial/virtio-vsock have the overhead of sending actual > traffic through the virtqueue, while an approach based on shared memory can > result in having fewer copies and virtqueue messages. virtio-serial is also > limited in being specialized for console forwarding and having a cap on the > number of clients. virtio-vsock is also not optimal in its choice of > sockets > API for transport; shared memory cannot be used, arbitrary strings can be > passed without an designation of the device/driver being run de-facto, and > the > guest must have additional machinery to handle socket APIs. In addition, on > the host, sockets are only dependable on Linux, with less predictable > behavior > from Windows/macOS regarding Unix sockets. Waiting for socket traffic on > the > host also requires a poll() loop, which is suboptimal for latency. With > virtio-user, only the bare set of standard driver calls > (open/close/ioctl/mmap/read) is needed, and RAM is a more universal > transport > abstraction. We also explicitly spec out callbacks on host that are > triggered > by virtqueue messages, which results in lower latency and makes it easy to > dispatch to a particular device implementation without polling. > > """ > > On Tue, Feb 12, 2019 at 6:03 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > On Tue, Feb 12, 2019 at 02:47:41PM +0100, Cornelia Huck wrote: > > > On Tue, 12 Feb 2019 11:25:47 +0000 > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > > > > > > > * Roman Kiryanov (rkir@google.com) wrote: > > > > > > > Our long term goal is to have as few kernel drivers as possible > > and to move > > > > > > > "drivers" into userspace. If we go with the virtqueues, is there > > > > > > > general a purpose > > > > > > > device/driver to talk between our host and guest to support > > custom hardware > > > > > > > (with own blobs)? 
> > > > > > > > > > > > The challenge is to answer the following question: > > > > > > how to do this without losing the benefits of standartization? > > > > > > > > > > We looked into UIO and it still requires some kernel driver to tell > > > > > where the device is, it also has limitations on sharing a device > > > > > between processes. The benefit of standardization could be in > > avoiding > > > > > everybody writing their own UIO drivers for virtual devices. > > > > > > > > > > Our emulator uses a battery, sound, accelerometer and more. We need > > to > > > > > support all of this. I looked into the spec, "5 Device types", and > > > > > seems "battery" is not there. We can invent our own drivers but we > > see > > > > > having one flexible driver is a better idea. > > > > > > > > Can you group these devices together at all in their requirements? > > > > For example, battery and accelerometers (to me) sound like > > low-bandwidth > > > > 'sensors' with a set of key,value pairs that update occasionally > > > > and a limited (no?) amount of control from the VM->host. > > > > A 'virtio-values' device that carried a string list of keys that it > > > > supported might make sense and be enough for at least two of your > > > > device types. > > > > > > Maybe not a 'virtio-values' device -- but a 'virtio-sensors' device > > > looks focused enough without being too inflexible. It can easily > > > advertise its type (battery, etc.) and therefore avoid the mismatch > > > problem that a too loosely defined device would be susceptible to. > > > > Isn't virtio-vsock/vhost-vsock a good fit for this kind of general > > string passing? People seem to use it exactly for this. > > > > > > > Yes, I realize that a guest could think it is using the same device > > as > > > > > the host advertised (because strings matched) while it is not. We > > > > > control both the host and the guest and we can live with this. 
> > > > > > The problem is that this is not true for the general case if you have a > > > standardized device type. It must be possible in theory to switch to an > > > alternative implementation of the device or the driver, as long as they > > > conform to the spec. I think a more concretely specified device type > > > (like the suggested virtio-values or virtio-sensors) is needed for that. > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 16:46 ` Dr. David Alan Gilbert @ 2019-02-12 17:20 ` Frank Yang 2019-02-12 17:26 ` Frank Yang 2019-02-19 7:54 ` Gerd Hoffmann 0 siblings, 2 replies; 72+ messages in thread From: Frank Yang @ 2019-02-12 17:20 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 12149 bytes --] On Tue, Feb 12, 2019 at 8:46 AM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > * Frank Yang (lfy@google.com) wrote: > > Thanks Roman for the reply. Yes, we need sensors, sound, codecs, etc. as > > well. > > > > For general string passing, yes, perhaps virtio-vsock can be used. > However, > > I have some concerns about virtio-serial and virtio-vsock (mentioned > > elsewhere in the thread in rely to Stefan's similar comments) around > socket > > API specialization. > > > > Stepping back to standardization and portability concerns, it is also not > > necessarily desirable to use general pipes to do what we want, because > even > > though that device exists and is part of the spec already, that results > in > > _de-facto_ non-portability. If we had some kind of spec to enumerate such > > 'user-defined' devices, at least we can have _de-jure_ non-portability; > an > > enumerated device doesn't work as advertised. > > > > virtio-gpu: we have concerns around its specialization to virgl and > > de-facto gallium-based protocol, while we tend to favor API forwarding > due > > to its debuggability and flexibility. We may use virtio-gpu in the future > > if/when it provides that general "send api data" capability.] > > > > In any case, I now have a very rough version of the spec in mind > (attached > > as a patch and as a pdf). > > Some thoughts (and remember I'm fairly new to virtio): > Thanks for the feedback! These are good points. 
>
> a) Please don't call it virtio-user - we have vhost-user as one of the
> implementations of virtio and that would just get confusing (especially
> when we have a vhost-user-user implementation)
>
>
Thanks for the feedback! These are good points. Ok, the name isn't really
important to me. virtio-meta perhaps?

> b) Your ping and event queues confuse me - they seem to be
> reimplementing exactly what virtio-queues already are; aren't virtio
> queues already lumps of shared memory with a 'kick' mechanism to wake up
> the other end when something interesting happens?
>
>
Right, it does expose that aspect of virtqueues, but there is a unique
part; it adds on a device-specific component (the instance handle and
metadata) to dispatch to and trigger the device-specific callback when it
comes on the host. I wonder if there's a way to define just the instance
handle/metadata/callback parts, with other virtqueue stuff being standard.

> c) I think you actually have two separate types of devices that
> should be treated differently;
>    1) your high bandwidth gpu/codec
>    2) Low bandwidth batteries/sensors
>
Yeah, perhaps there's some way to have general virtio devices for
different bandwidth levels. Note that latency can also be an important
consideration; while sensors don't send much data per second, the time of
their arrival can matter a lot. Sensors themselves can "ping" at 10^2 Hz
or even higher.

> I can imagine you having a total of two device definitions and drivers
> for (1) and (2).
> (2) feels like it's pretty similar to a socket/pipe/serial - but it
> needs a way to enumerate the sensors you have, their ranges etc and a
> defined format for transmitting the data. I'm not sure if it's possible
> to take one of the existing socket/pipe/serial things and layer on top
> of it. (is there any HID like standard for sensors like that?)
>
>
I'm not sure about there being an existing standard (and it's possible to
send different kinds of sensor / battery signals; i.e., a fake source for
testing purposes, raw sensor output from the actual hardware, and an
integrated virtual sensor that passes through derived pose information.
There's a similar breakdown when one is considering other devices such as
battery and camera.) Though I have reservations about using actual
sockets/pipes to communicate, for compatibility and latency reasons.

As another example, consider camera / video playback. I've heard that the
way to approach this is to build on v4l2 with a virtio-v4l2 layer
underneath. However, this would mean either assuming v4l2 on the host,
which is not portable to non-Linux, or requiring additional translation
layers between v4l2 in the guest and $library on the host. With the
proposed spec, a more general 'codec' device can be implemented, tested on
actual programs on multiple guests/hosts easily, and then, if it makes
sense, "promoted" to a new "virtio-codec" device type.

I think there's a central tension here that is process-related and is
similar to the reasons userspace drivers are a thing in Linux; there's a
set of existing guests/hosts that use virtio on some sort of development
schedule that doesn't match the pace of development in kernels supporting
virtio / the virtio spec itself. In virtio, we have a centralized
definition of drivers and devices that is set in the middle layers between
the chrome on top of VMMs and guest userspace: the guest kernel and core
VMM code. When developing for programs such as the Android Emulator, we
aim most of our changes at guest userspace and the chrome on top of the
VMM, but it can often involve new forms of guest/host communication.
It's possible that what's being proposed also serves as a kind of L2 cache (in the form of such a virtio meta device) to deal with changes in third-party programs like the Android Emulator with servicing the cache miss being standardization out of the meta device and into an existing concrete virtio device type. Seen from this perspective, it can be beneficial process-wise as a way to pull in more new devices / standardize at a faster rate, that would otherwise not have been visible. > Perhaps for (1) for your GPU stuff, maybe a single virtio device > would work, with a small number of shared memory arenas but multiple > virtio queues; each (set of) queues would represent a subdevice (say > a bunch of queues for the GPU another bunch for the CODEC etc). Well, if one device definition works well with both bandwidth regimes and has better latency properties, I like having the one definition better. Along these lines, I would also be open to not putting up a new standard, and revising virtio-vsock to incorporate all the unique parts of the features proposed: dropping socket API requirement from guest/host, device enumeration, operation via ping+shared memory, and callback-based operation with device definitions in the shared libraries. The way we're approaching GPU, with the workloads we tend to run (just UI drawing, or in 3d apps, not too many draw calls per second / limited by host GPU doing work), still having one virtqueue seems fine as it's only used for kick (API call parameters themselves would also live in shared memory), though there are probably workloads that benefit from having one virtqueue per "instance". > Dave > > > The part of the intro in there that is relevant to the current thread: > > > > """ > > Note that virtio-serial/virtio-vsock is not considered because they do > not > > standardize the set of devices that operate on top of them, but in > practice, > > are often used for fully general devices. 
Spec-wise, this is not a great > > situation because we would still have potentially non-portable device > > implementations where there is no standard mechanism to determine > whether or > > not things are portable. virtio-user provides a device enumeration > > mechanism > > to better control this. > > > > In addition, for performance considerations in applications such as > graphics > > and media, virtio-serial/virtio-vsock have the overhead of sending actual > > traffic through the virtqueue, while an approach based on shared memory > can > > result in having fewer copies and virtqueue messages. virtio-serial is > also > > limited in being specialized for console forwarding and having a cap on > the > > number of clients. virtio-vsock is also not optimal in its choice of > > sockets > > API for transport; shared memory cannot be used, arbitrary strings can be > > passed without a designation of the device/driver being run de-facto, > and > > the > > guest must have additional machinery to handle socket APIs. In > addition, on > > the host, sockets are only dependable on Linux, with less predictable > > behavior > > from Windows/macOS regarding Unix sockets. Waiting for socket traffic on > > the > > host also requires a poll() loop, which is suboptimal for latency. With > > virtio-user, only the bare set of standard driver calls > > (open/close/ioctl/mmap/read) is needed, and RAM is a more universal > > transport > > abstraction. We also explicitly spec out callbacks on host that are > > triggered > > by virtqueue messages, which results in lower latency and makes it easy > to > > dispatch to a particular device implementation without polling. > > > > """ > > > > On Tue, Feb 12, 2019 at 6:03 AM Michael S. Tsirkin <mst@redhat.com> > wrote: > > > > > On Tue, Feb 12, 2019 at 02:47:41PM +0100, Cornelia Huck wrote: > > > > On Tue, 12 Feb 2019 11:25:47 +0000 > > > > "Dr. 
David Alan Gilbert" <dgilbert@redhat.com> wrote: > > > > > > > > > * Roman Kiryanov (rkir@google.com) wrote: > > > > > > > > Our long term goal is to have as few kernel drivers as > possible > > > and to move > > > > > > > > "drivers" into userspace. If we go with the virtqueues, is > there > > > > > > > > a general purpose > > > > > > > > device/driver to talk between our host and guest to support > > > custom hardware > > > > > > > > (with own blobs)? > > > > > > > > > > > > > > The challenge is to answer the following question: > > > > > > > how to do this without losing the benefits of standardization? > > > > > > > > > > > > We looked into UIO and it still requires some kernel driver to > tell > > > > > > where the device is, it also has limitations on sharing a device > > > > > > between processes. The benefit of standardization could be in > > > avoiding > > > > > > everybody writing their own UIO drivers for virtual devices. > > > > > > > > > > > > Our emulator uses a battery, sound, accelerometer and more. We > need > > > to > > > > > > support all of this. I looked into the spec, "5 Device types", > and > > > > > > seems "battery" is not there. We can invent our own drivers but > we > > > see > > > > > > having one flexible driver is a better idea. > > > > > > > > > > Can you group these devices together at all in their requirements? > > > > > For example, battery and accelerometers (to me) sound like > > > low-bandwidth > > > > > 'sensors' with a set of key,value pairs that update occasionally > > > > > and a limited (no?) amount of control from the VM->host. > > > > > A 'virtio-values' device that carried a string list of keys that it > > > > > supported might make sense and be enough for at least two of your > > > > > device types. > > > > > > > > Maybe not a 'virtio-values' device -- but a 'virtio-sensors' device > > > > looks focused enough without being too inflexible. It can easily > > > > advertise its type (battery, etc.) 
and therefore avoid the mismatch > > > > problem that a too loosely defined device would be susceptible to. > > > > > > Isn't virtio-vsock/vhost-vsock a good fit for this kind of general > > > string passing? People seem to use it exactly for this. > > > > > > > > > Yes, I realize that a guest could think it is using the same > device > > > as > > > > > > the host advertised (because strings matched) while it is not. We > > > > > > control both the host and the guest and we can live with this. > > > > > > > > The problem is that this is not true for the general case if you > have a > > > > standardized device type. It must be possible in theory to switch to > an > > > > alternative implementation of the device or the driver, as long as > they > > > > conform to the spec. I think a more concretely specified device type > > > > (like the suggested virtio-values or virtio-sensors) is needed for > that. > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > > > For additional commands, e-mail: > virtio-dev-help@lists.oasis-open.org > > > > > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > [-- Attachment #2: Type: text/html, Size: 16221 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 17:20 ` Frank Yang @ 2019-02-12 17:26 ` Frank Yang 2019-02-12 19:06 ` Michael S. Tsirkin 2019-02-19 7:54 ` Gerd Hoffmann 1 sibling, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-12 17:26 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 13021 bytes --] On Tue, Feb 12, 2019 at 9:20 AM Frank Yang <lfy@google.com> wrote: > > > On Tue, Feb 12, 2019 at 8:46 AM Dr. David Alan Gilbert < > dgilbert@redhat.com> wrote: > >> * Frank Yang (lfy@google.com) wrote: >> > Thanks Roman for the reply. Yes, we need sensors, sound, codecs, etc. as >> > well. >> > >> > For general string passing, yes, perhaps virtio-vsock can be used. >> However, >> > I have some concerns about virtio-serial and virtio-vsock (mentioned >> > elsewhere in the thread in reply to Stefan's similar comments) around >> socket >> > API specialization. >> > >> > Stepping back to standardization and portability concerns, it is also >> not >> > necessarily desirable to use general pipes to do what we want, because >> even >> > though that device exists and is part of the spec already, that results >> in >> > _de-facto_ non-portability. If we had some kind of spec to enumerate >> such >> > 'user-defined' devices, at least we can have _de-jure_ non-portability; >> an >> > enumerated device doesn't work as advertised. >> > >> > virtio-gpu: we have concerns around its specialization to virgl and >> > de-facto gallium-based protocol, while we tend to favor API forwarding >> due >> > to its debuggability and flexibility. We may use virtio-gpu in the >> future >> > if/when it provides that general "send api data" capability.] >> > >> > In any case, I now have a very rough version of the spec in mind >> (attached >> > as a patch and as a pdf). 
>> >> Some thoughts (and remember I'm fairly new to virtio): >> > > Thanks for the feedback! These are good points. > > >> >> a) Please don't call it virtio-user - we have vhost-user as one of the >> implementations of virtio and that would just get confusing (especially >> when we have a vhost-user-user implementation) >> >> > Ok, the name isn't really important to me. virtio-meta perhaps? > > >> b) Your ping and event queues confuse me - they seem to be >> reimplementing exactly what virtio-queues already are; aren't virtio >> queues already lumps of shared memory with a 'kick' mechanism to wake up >> the other end when something interesting happens? >> >> Right, it does expose that aspect of virtqueues, but there is a unique > part; it adds on a device-specific component > (the instance handle and metadata) to dispatch to and trigger the > device-specific callback when it comes on the host. > I wonder if there's a way to define just the instance > handle/metadata/callback parts, with other virtqueue stuff being standard. > > BTW, the other unique aspect is that the ping messages allow a _host_ pointer to serve as the lump of shared memory; then there is no need to track buffers in the guest kernel and the device implementation can perform specialized buffer space management. Because it is host pointer shared memory, it is also physically contiguous and there is no scatterlist needed to process the traffic. > > >> c) I think you actually have two separate types of devices that >> should be treated differently; >> 1) your high bandwidth gpu/codec >> 2) Low bandwidth batteries/sensors >> > > Yeah, perhaps there's some way to have general virtio devices for > different bandwidth levels. > Note that latency can also be an important consideration; while sensors > don't send much data per second, > the time of their arrival can matter a lot. Sensors themselves can "ping" > at 10^2 Hz or even higher. 
> > I can imagine you having a total of two device definitions and drivers >> for (1) and (2). >> > (2) feels like it's pretty similar to a socket/pipe/serial - but it >> needs a way to enumerate the sensors you have, their ranges etc and a >> defined format for transmitting the data. I'm not sure if it's possible >> to take one of the existing socket/pipe/serial things and layer on top >> of it. (is there any HID like standard for sensors like that?) >> >> I'm not sure about there being an existing standard > (and it's possible to send different kinds of sensor / battery signals; > i.e., a fake source for testing purposes, raw sensor output from the > actual hardware, > and an integrated virtual sensor that passes through derived pose > information. > There's a similar breakdown when one is considering other devices such as > battery and camera.) > Though I have reservations about using actual sockets/pipes to communicate, > for compatibility and latency reasons. > > As another example, consider camera / video playback. I've heard that the > way to approach this > is to build v4l2 and build a virtio-v4l2 layer underneath. However, this > would mean either assuming v4l2 on host > which is not portable to non-Linux, or requiring additional translation > layers between v4l2 in the guest and > $library on the host. With the proposed spec, a more general 'codec' > device can be implemented, > tested on actual programs on multiple guests/hosts easily, and then, if it > makes sense, > "promoted" to a new "virtio-codec" device type. > > I think there's a central tension here that is process-related and is > similar to the reasons userspace drivers are a thing in Linux; > There's a set of existing guests/hosts that use virtio on some sort of > development schedule that > doesn't match the pace of development in kernels supporting virtio / the > virtio spec itself. 
> In Virtio, we have a centralized definition of drivers and devices that is > set in the middle layers > between chrome on top of VMMs and guest userspace: the guest kernel and > core VMM code. > When developing for programs such as the Android Emulator, > we aim most of our changes at the guest userspace and the chrome on top of > the VMM, > but it can often involve new forms of guest/host communication. > > It's possible that what's being proposed also serves as a kind of L2 cache > (in the form of such a virtio meta device) to deal with changes > in third-party programs like the Android Emulator with servicing the cache > miss being standardization out of the meta device and into an existing > concrete virtio device type. > Seen from this perspective, it can be beneficial process-wise as a way to > pull in more new devices / standardize at a faster rate, that would > otherwise not have been visible. > > >> Perhaps for (1) for your GPU stuff, maybe a single virtio device >> would work, with a small number of shared memory arenas but multiple >> virtio queues; each (set of) queues would represent a subdevice (say >> a bunch of queues for the GPU another bunch for the CODEC etc). > > > Well, if one device definition works well with both bandwidth regimes and > has better latency properties, > I like having the one definition better. > > Along these lines, I would also be open to not putting up a new standard, > and revising virtio-vsock to incorporate all the unique parts of the > features proposed: > dropping socket API requirement from guest/host, > device enumeration, operation via ping+shared memory, > and callback-based operation with device definitions in the shared > libraries. 
> > The way we're approaching GPU, with the workloads we tend to run > (just UI drawing, or in 3d apps, not too many draw calls per second / > limited by host GPU doing work), > still having one virtqueue seems fine as it's only used for kick > (API call parameters themselves would also live in shared memory), > though there are probably workloads that benefit from having one virtqueue > per "instance". > > >> > Dave >> >> > The part of the intro in there that is relevant to the current thread: >> > >> > """ >> > Note that virtio-serial/virtio-vsock is not considered because they do >> not >> > standardize the set of devices that operate on top of them, but in >> practice, >> > are often used for fully general devices. Spec-wise, this is not a >> great >> > situation because we would still have potentially non-portable device >> > implementations where there is no standard mechanism to determine >> whether or >> > not things are portable. virtio-user provides a device enumeration >> > mechanism >> > to better control this. >> > >> > In addition, for performance considerations in applications such as >> graphics >> > and media, virtio-serial/virtio-vsock have the overhead of sending >> actual >> > traffic through the virtqueue, while an approach based on shared memory >> can >> > result in having fewer copies and virtqueue messages. virtio-serial is >> also >> > limited in being specialized for console forwarding and having a cap on >> the >> > number of clients. virtio-vsock is also not optimal in its choice of >> > sockets >> > API for transport; shared memory cannot be used, arbitrary strings can >> be >> > passed without a designation of the device/driver being run de-facto, >> and >> > the >> > guest must have additional machinery to handle socket APIs. In >> addition, on >> > the host, sockets are only dependable on Linux, with less predictable >> > behavior >> > from Windows/macOS regarding Unix sockets. 
Waiting for socket traffic >> on >> > the >> > host also requires a poll() loop, which is suboptimal for latency. With >> > virtio-user, only the bare set of standard driver calls >> > (open/close/ioctl/mmap/read) is needed, and RAM is a more universal >> > transport >> > abstraction. We also explicitly spec out callbacks on host that are >> > triggered >> > by virtqueue messages, which results in lower latency and makes it easy >> to >> > dispatch to a particular device implementation without polling. >> > >> > """ >> > >> > On Tue, Feb 12, 2019 at 6:03 AM Michael S. Tsirkin <mst@redhat.com> >> wrote: >> > >> > > On Tue, Feb 12, 2019 at 02:47:41PM +0100, Cornelia Huck wrote: >> > > > On Tue, 12 Feb 2019 11:25:47 +0000 >> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: >> > > > >> > > > > * Roman Kiryanov (rkir@google.com) wrote: >> > > > > > > > Our long term goal is to have as few kernel drivers as >> possible >> > > and to move >> > > > > > > > "drivers" into userspace. If we go with the virtqueues, is >> there >> > > > > > > > a general purpose >> > > > > > > > device/driver to talk between our host and guest to support >> > > custom hardware >> > > > > > > > (with own blobs)? >> > > > > > > >> > > > > > > The challenge is to answer the following question: >> > > > > > > how to do this without losing the benefits of standardization? >> > > > > > >> > > > > > We looked into UIO and it still requires some kernel driver to >> tell >> > > > > > where the device is, it also has limitations on sharing a device >> > > > > > between processes. The benefit of standardization could be in >> > > avoiding >> > > > > > everybody writing their own UIO drivers for virtual devices. >> > > > > > >> > > > > > Our emulator uses a battery, sound, accelerometer and more. We >> need >> > > to >> > > > > > support all of this. I looked into the spec, "5 Device types", >> and >> > > > > > seems "battery" is not there. 
We can invent our own drivers but >> we >> > > see >> > > > > > having one flexible driver is a better idea. >> > > > > >> > > > > Can you group these devices together at all in their requirements? >> > > > > For example, battery and accelerometers (to me) sound like >> > > low-bandwidth >> > > > > 'sensors' with a set of key,value pairs that update occasionally >> > > > > and a limited (no?) amount of control from the VM->host. >> > > > > A 'virtio-values' device that carried a string list of keys that >> it >> > > > > supported might make sense and be enough for at least two of your >> > > > > device types. >> > > > >> > > > Maybe not a 'virtio-values' device -- but a 'virtio-sensors' device >> > > > looks focused enough without being too inflexible. It can easily >> > > > advertise its type (battery, etc.) and therefore avoid the mismatch >> > > > problem that a too loosely defined device would be susceptible to. >> > > >> > > Isn't virtio-vsock/vhost-vsock a good fit for this kind of general >> > > string passing? People seem to use it exactly for this. >> > > >> > > > > > Yes, I realize that a guest could think it is using the same >> device >> > > as >> > > > > > the host advertised (because strings matched) while it is not. >> We >> > > > > > control both the host and the guest and we can live with this. >> > > > >> > > > The problem is that this is not true for the general case if you >> have a >> > > > standardized device type. It must be possible in theory to switch >> to an >> > > > alternative implementation of the device or the driver, as long as >> they >> > > > conform to the spec. I think a more concretely specified device type >> > > > (like the suggested virtio-values or virtio-sensors) is needed for >> that. 
>> > > > >> > > > >> --------------------------------------------------------------------- >> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org >> > > > For additional commands, e-mail: >> virtio-dev-help@lists.oasis-open.org >> > > >> >> >> >> -- >> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >> > [-- Attachment #2: Type: text/html, Size: 17254 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 17:26 ` Frank Yang @ 2019-02-12 19:06 ` Michael S. Tsirkin 2019-02-13 2:50 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-12 19:06 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: > BTW, the other unique aspect is that the ping messages allow a _host_ pointer > to serve as the lump of shared memory; > then there is no need to track buffers in the guest kernel and the device > implementation can perform specialized buffer space management. > Because it is also host pointer shared memory, it is also physically contiguous > and there is no scatterlist needed to process the traffic. Yes at the moment virtio descriptors all pass addresses guest to host. Ability to reverse that was part of the vhost-pci proposal a while ago. BTW that also at least originally had ability to tunnel multiple devices over a single connection. There was nothing wrong with the proposals I think, they just had to be polished a bit before making it into the spec. And that tunneling was dropped but I think it can be brought back if desired, we just didn't see a use for it. How about that? That sounds close to what you were looking for, does it not? That would be something to look into - if your ideas can be used to implement a virtio device backend by code running within a VM, that would be very interesting. -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 19:06 ` Michael S. Tsirkin @ 2019-02-13 2:50 ` Frank Yang 2019-02-13 4:02 ` Michael S. Tsirkin 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-13 2:50 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 2798 bytes --] On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: > > BTW, the other unique aspect is that the ping messages allow a _host_ > pointer > > to serve as the lump of shared memory; > > then there is no need to track buffers in the guest kernel and the device > > implementation can perform specialized buffer space management. > > Because it is also host pointer shared memory, it is also physically > contiguous > > and there is no scatterlist needed to process the traffic. > > Yes at the moment virtio descriptors all pass addresses guest to host. > > Ability to reverse that was part of the vhost-pci proposal a while ago. > BTW that also at least originally had ability to tunnel > multiple devices over a single connection. > > Can there be a similar proposal for virtio-pci without vhost? > There was nothing wrong with the proposals I think, they > just had to be polished a bit before making it into the spec. > And that tunneling was dropped but I think it can be brought back > if desired, we just didn't see a use for it. > > Thinking about it more, I think vhost-pci might be too much for us due to the vhost requirement (sockets and IPC, while we desire a highly process-local solution). But there's nothing preventing us from having the same reversals for virtio-pci devices without vhost, right? That's kind of what's being proposed with the shared memory stuff at the moment, though it is not a device type by itself yet (Arguably, it should be). 
How about that? That sounds close to what you were looking for, > does it not? That would be something to look into - > if your ideas can be used to implement a virtio device > backend by code running within a VM, that would be very interesting. > What about a device type, say, virtio-metapci, that relies on virtio-pci for device enumeration and shared memory handling (assuming it's going to be compatible with the host pointer shared memory implementation), so there's no duplication of the concept of device enumeration nor shared memory operations. But, it works in terms of the ping / event virtqueues, and relies on the host hypervisor to dispatch to device implementation callbacks. A potential issue is that such a metapci device would share the same device id namespace as other virtio-pci devices...but maybe that's OK? If this can build on virtio-pci, I might be able to come up with a spec that assumes virtio-pci as the transport, and assumes (via the WIP host memory sharing work) that host memory can be used as buffer storage. The difference is that it will not contain most of the config virtqueue stuff (except maybe for create/destroy instance), and it should also work with the existing ecosystem around virtio-pci. > > -- > MST > [-- Attachment #2: Type: text/html, Size: 3834 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-13 2:50 ` Frank Yang @ 2019-02-13 4:02 ` Michael S. Tsirkin 2019-02-13 4:19 ` Michael S. Tsirkin 2019-02-13 4:59 ` Frank Yang 0 siblings, 2 replies; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-13 4:02 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote: > > > On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: > > BTW, the other unique aspect is that the ping messages allow a _host_ > pointer > > to serve as the lump of shared memory; > > then there is no need to track buffers in the guest kernel and the device > > implementation can perform specialized buffer space management. > > Because it is also host pointer shared memory, it is also physically > contiguous > > and there is no scatterlist needed to process the traffic. > > Yes at the moment virtio descriptors all pass addresses guest to host. > > Ability to reverse that was part of the vhost-pci proposal a while ago. > BTW that also at least originally had ability to tunnel > multiple devices over a single connection. > > > > Can there be a similar proposal for virtio-pci without vhost? > > There was nothing wrong with the proposals I think, they > just had to be polished a bit before making it into the spec. > And that tunneling was dropped but I think it can be brought back > if desired, we just didn't see a use for it. > > > Thinking about it more, I think vhost-pci might be too much for us due to the > vhost requirement (sockets and IPC while we desire a highly process local > solution) I agree because the patches try to document a bunch of stuff. 
> But there's nothing preventing us from having the same reversals for virtio-pci > devices without vhost, right? Right. I think that if you build something such that vhost pci can be an instance of it on top, then it would have a lot of value. > That's kind of what's being proposed with the shared memory stuff at the > moment, though it is not a device type by itself yet (Arguably, it should be). > > > How about that? That sounds close to what you were looking for, > does it not? That would be something to look into - > if your ideas can be used to implement a virtio device > backend by code running within a VM, that would be very interesting. > > > What about a device type, say, virtio-metapci, I have to say I really dislike that name. It's basically just saying I'm not telling you what it is. Let's try to figure it out. Looks like although it's not a vsock device it's also trying to support creating channels with support for passing two types of messages (data and control) as well as some shared memory access. And it also has enumeration so opened channels can be tied to what? strings? PCI Device IDs? Then the vsock device was designed for this problem space. It might not be a good fit for you e.g. because of some vsock baggage it has. But one of the complex issues it does address is controlling host resource usage so guest socket can't DOS host or starve other sockets by throwing data at host. Things might slow down but progress will be made. If you are building a generic kind of message exchange you could do worse than copy that protocol part. I don't think the question of why not vsock generally was addressed all that well. There's discussion of sockets and poll, but that has nothing to do with virtio which is a host/guest interface. 
If you are basically happy with the host/guest interface but want to bind a different driver to it, with some minor tweaks, we could create a virtio-wsock which is just like virtio-vsock but has a different id, and use that as a starting point. Go wild, build a different driver for it. > that relies on virtio-pci for > device enumeration and shared memory handling > (assuming it's going to be compatible with the host pointer shared memory > implementation), > so there's no duplication of the concept of device enumeration nor shared > memory operations. > But, it works in terms of the ping / event virtqueues, and relies on the host > hypervisor to dispatch to device implementation callbacks. All the talk about dispatch and device implementation is just adding to confusion. This isn't something that belongs in the virtio spec anyway, and e.g. qemu is unlikely to add in-process plugin support just for this. > A potential issue is that such metapci device share the same device id > namespace as other virtio-pci devices...but maybe that's OK? That's a vague question. Same device and vendor id needs to imply same driver works. I think you use terminology that doesn't match virtio. The words device and driver have a specific meaning, and that doesn't include things like implementation callbacks. > If this can build on virtio-pci, I might be able to come up with a spec that > assumes virtio-pci as the transport, > and assumes (via the WIP host memory sharing work) that host memory can be used > as buffer storage. > The difference is that it will not contain most of the config virtqueue stuff > (except maybe for create/destroy instance), > and it should also work with the existing ecosystem around virtio-pci. > I still can't say from above whether it's in scope for virtio or not. All the talk about blobs and controlling both host and guest sounds out of scope. 
But it could be that there are pieces that are in scope, and you would use them for whatever vendor-specific thing you need. And I spent a lot of time on this by now. So could you maybe try to extract specifically the host/guest interface things that you miss? I got the part where you want to take a buffer within a BAR and pass it to the guest. But beyond that I didn't get a lot. E.g. who is sending most data? host? guest? both? There are control messages; are these coming from the guest? Do you want to know when the guest is done with the buffer the host allocated? Can you take a buffer away from the guest? OTOH all the callback discussion is really irrelevant for the virtio tc. If things can't be described without this they are out of scope for virtio. > > -- > MST > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-13 4:02 ` Michael S. Tsirkin @ 2019-02-13 4:19 ` Michael S. Tsirkin 2019-02-13 4:59 ` Frank Yang 2019-02-13 4:59 ` Frank Yang 1 sibling, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-13 4:19 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman On Tue, Feb 12, 2019 at 11:02:19PM -0500, Michael S. Tsirkin wrote: > On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote: > > > > > > On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: > > > BTW, the other unique aspect is that the ping messages allow a _host_ > > pointer > > > to serve as the lump of shared memory; > > > then there is no need to track buffers in the guest kernel and the device > > > implementation can perform specialize buffer space management. > > > Because it is also host pointer shared memory, it is also physically > > contiguous > > > and there is no scatterlist needed to process the traffic. > > > > Yes at the moment virtio descriptors all pass addresses guest to host. > > > > Ability to reverse that was part of the vhost-pci proposal a while ago. > > BTW that also at least originally had ability to tunnel > > multiple devices over a single connection. > > > > > > > > Can there be a similar proposal for virtio-pci without vhsot? > > > > There was nothing wrong with the proposals I think, they > > just had to be polished a bit before making it into the spec. > > And that runneling was dropped but I think it can be brought back > > if desired, we just didn't see a use for it. > > > > > > Thinking about it more, I think vhost-pci might be too much for us due to the > > vhost requirement (sockets and IPC while we desire a highly process local > > solution) > > I agree because the patches try to document a bunch of stuff. 
> But I really just mean taking the host/guest interface > part from there. Tomorrow I'll try to write up a little bit more about the vhost-pci ideas. The patches on the list go deep into envisioned implementation detail within QEMU instead of the actual host/guest interface. You should then be able to figure out whether they are relevant for you. -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-13 4:19 ` Michael S. Tsirkin @ 2019-02-13 4:59 ` Frank Yang 2019-02-13 18:18 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-13 4:59 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 2418 bytes --] On Tue, Feb 12, 2019 at 8:19 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 12, 2019 at 11:02:19PM -0500, Michael S. Tsirkin wrote: > > On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote: > > > > > > > > > On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> > wrote: > > > > > > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: > > > > BTW, the other unique aspect is that the ping messages allow a > _host_ > > > pointer > > > > to serve as the lump of shared memory; > > > > then there is no need to track buffers in the guest kernel and > the device > > > > implementation can perform specialize buffer space management. > > > > Because it is also host pointer shared memory, it is also > physically > > > contiguous > > > > and there is no scatterlist needed to process the traffic. > > > > > > Yes at the moment virtio descriptors all pass addresses guest to > host. > > > > > > Ability to reverse that was part of the vhost-pci proposal a while > ago. > > > BTW that also at least originally had ability to tunnel > > > multiple devices over a single connection. > > > > > > > > > > > > Can there be a similar proposal for virtio-pci without vhsot? > > > > > > There was nothing wrong with the proposals I think, they > > > just had to be polished a bit before making it into the spec. > > > And that runneling was dropped but I think it can be brought back > > > if desired, we just didn't see a use for it. 
> > > > > > > > > Thinking about it more, I think vhost-pci might be too much for us due > to the > > > vhost requirement (sockets and IPC while we desire a highly process > local > > > solution) > > > > I agree because the patches try to document a bunch of stuff. > > But I really just mean taking the host/guest interface > > part from there. > > Tomorrow I'll try to write up a little bit more about the vhost pci > ideas. The patches on list go deep into envisioned implementation > detail within qemu instead of the actual host/guest interface. > You should then be able to figure out they are relevant for you. > > Ok, looking forward! It does seem like a lot of what you are pointing out, and in the vhost-pci patches, what is specified is not strictly dependent on the QEMU implementation. Hopefully we can make the minimal set of changes to support our constraints. > -- > MST > [-- Attachment #2: Type: text/html, Size: 3359 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-13 4:59 ` Frank Yang @ 2019-02-13 18:18 ` Frank Yang 2019-02-14 7:15 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-13 18:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1.1: Type: text/plain, Size: 3595 bytes --] Attached is another spec (and at https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0 ) that tries to be more like just another virtio-pci device, but for the express purpose of controlling host memory. Changes: Name is now virtio-hostmem. Explicitly uses virtio-pci transport. Removed explicit device enumeration operation and made it part of the configuration data; the config virtqueue now mainly captures the concepts of instances and host memory allocation + sharing. Removed mentions of triggering callbacks as that is device implementation specific and out of scope. What it ends up being: promote host memory sharing to a device type, the purpose is guest/host communication but for when sockets are not suitable. Control messages are added. It implicitly expects that the "reversal" for virtio-pci buffers as host backed is already available (IIUC that is what virtio-fs with DAX relies on as well), and builds a thin layer on top. Michael, is this a bit closer to what you were thinking? On Tue, Feb 12, 2019 at 8:59 PM Frank Yang <lfy@google.com> wrote: > > > On Tue, Feb 12, 2019 at 8:19 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >> On Tue, Feb 12, 2019 at 11:02:19PM -0500, Michael S. Tsirkin wrote: >> > On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote: >> > > >> > > >> > > On Tue, Feb 12, 2019 at 11:06 AM Michael S. 
Tsirkin <mst@redhat.com> >> wrote: >> > > >> > > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: >> > > > BTW, the other unique aspect is that the ping messages allow a >> _host_ >> > > pointer >> > > > to serve as the lump of shared memory; >> > > > then there is no need to track buffers in the guest kernel and >> the device >> > > > implementation can perform specialize buffer space management. >> > > > Because it is also host pointer shared memory, it is also >> physically >> > > contiguous >> > > > and there is no scatterlist needed to process the traffic. >> > > >> > > Yes at the moment virtio descriptors all pass addresses guest to >> host. >> > > >> > > Ability to reverse that was part of the vhost-pci proposal a >> while ago. >> > > BTW that also at least originally had ability to tunnel >> > > multiple devices over a single connection. >> > > >> > > >> > > >> > > Can there be a similar proposal for virtio-pci without vhsot? >> > > >> > > There was nothing wrong with the proposals I think, they >> > > just had to be polished a bit before making it into the spec. >> > > And that runneling was dropped but I think it can be brought back >> > > if desired, we just didn't see a use for it. >> > > >> > > >> > > Thinking about it more, I think vhost-pci might be too much for us >> due to the >> > > vhost requirement (sockets and IPC while we desire a highly process >> local >> > > solution) >> > >> > I agree because the patches try to document a bunch of stuff. >> > But I really just mean taking the host/guest interface >> > part from there. >> >> Tomorrow I'll try to write up a little bit more about the vhost pci >> ideas. The patches on list go deep into envisioned implementation >> detail within qemu instead of the actual host/guest interface. >> You should then be able to figure out they are relevant for you. >> >> Ok, looking forward! 
It does seems like a lot of what you are pointing > out, and in the vhost pci, what is specified is not strictly dependent on > the qemu implementation. Hopefully we can make the minimal set of changes > to support our constraints. > > >> -- >> MST >> > [-- Attachment #1.2: Type: text/html, Size: 5041 bytes --] [-- Attachment #2: 0001-virtio-hostmem-draft-spec.patch --] [-- Type: application/octet-stream, Size: 16078 bytes --] From 206b9386d76f2ce18000dfc2b218375e423ac8e0 Mon Sep 17 00:00:00 2001 From: Lingfeng Yang <lfy@google.com> Date: Wed, 13 Feb 2019 10:03:40 -0800 Subject: [PATCH] virtio-hostmem draft spec --- content.tex | 1 + virtio-hostmem.tex | 356 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 357 insertions(+) create mode 100644 virtio-hostmem.tex diff --git a/content.tex b/content.tex index 5051209..fe771ef 100644 --- a/content.tex +++ b/content.tex @@ -5560,6 +5560,7 @@ descriptor for the \field{sense_len}, \field{residual}, \input{virtio-crypto.tex} \input{virtio-vsock.tex} \input{virtio-user.tex} +\input{virtio-hostmem.tex} \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} diff --git a/virtio-hostmem.tex b/virtio-hostmem.tex new file mode 100644 index 0000000..956285a --- /dev/null +++ b/virtio-hostmem.tex @@ -0,0 +1,356 @@ +\section{Host Memory Device}\label{sec:Device Types / Host Memory Device} + +Note: This depends on the upcoming shared-mem type of virtio +that allows sharing of host memory to the guest. + +virtio-hostmem is a device for sharing host memory to the guest. +It runs on top of virtio-pci for virtqueue messages and +uses the PCI address space for direct access like virtio-fs does. + +virtio-hostmem's purpose is +to allow high performance general memory accesses between guest and host, +and to allow the guest to access host memory constructed at runtime, +such as mapped memory from graphics APIs. 
+ +Note that vhost-pci/vhost-vsock, virtio-vsock, and virtio-fs +are also general ways to share data between the guest and host, +but they are specialized to socket APIs in the guest, +or depend on a FUSE implementation. +virtio-hostmem provides such a communication mechanism over raw memory, +which has the benefits of being more portable across hypervisors and guest OSes, +and potentially higher performance due to always being physically contiguous to the guest. + +The guest can create "instances" which capture +a particular use case of the device. +virtio-hostmem is like virtio-input in that the guest can query +for sub-devices with IDs; +the guest provides the vendor and device id in configuration. +The host then accepts or rejects the instance creation request. + +Once instance creation succeeds, +shared-mem objects can be allocated from each instance. +Also, different instances can share the same shared-mem objects +through export/import operations. +On the host, it is assumed that the hypervisor will handle +all backing of the shared memory objects with actual memory of some kind. + +In operating the device, a ping virtqueue is used for the guest to notify the host +when something interesting has happened in the shared memory. +Conversely, the event virtqueue is used for the host to notify the guest. +Note that this is asymmetric; +it is expected that the guest will initiate most operations via the ping virtqueue, +while occasionally using the event virtqueue to wait on host completions. + +Both guest kernel and userspace drivers can be written using operations +on virtio-hostmem in a way that mirrors UIO for Linux +(open()/close()/ioctl()/read()/write()/mmap()), +but concrete implementations are outside the scope of this spec. 
+ +\subsection{Device ID}\label{sec:Device Types / Host Memory Device / Device ID} + +21 + +\subsection{Virtqueues}\label{sec:Device Types / Host Memory Device / Virtqueues} + +\begin{description} +\item[0] config tx +\item[1] config rx +\item[2] ping +\item[3] event +\end{description} + +\subsection{Feature bits}\label{sec: Device Types / Host Memory Device / Feature bits } + +No feature bits. + +\subsubsection{Feature bit requirements}\label{sec:Device Types / Host Memory Device / Feature bit requirements} + +No feature bit requirements. + +\subsection{Device configuration layout}\label{sec:Device Types / Host Memory Device / Device configuration layout} + +\begin{lstlisting} +struct virtio_hostmem_device_info { + le32 vendor_id; + le32 device_id; + le32 revision; +} + +struct virtio_hostmem_config { + le64 reserved_size; + le32 num_devices; + virtio_hostmem_device_info available_devices[MAX_DEVICES]; +}; +\end{lstlisting} + +\field{virtio_hostmem_device_info} describes a particular usage of the device +in terms of the vendor / device ID and revision. + +\field{reserved_size} is the amount of address space taken away from the guest +to support virtio-hostmem. +A sufficient setting for most purposes is 16 GB. + +\field{num_devices} represents the number of valid entries in \field{available_devices}. + +\field{available_devices} represents the set of available usages of virtio-hostmem (up to \field{MAX_DEVICES}). + +\field{MAX_DEVICES} is the maximum number of sub-devices possible (here, set to 32). + +\subsection{Device Initialization}\label{sec:Device Types / Host Memory Device / Device Initialization} + +Initialization of virtio-hostmem works much like other virtio PCI devices. +It will need a PCI device ID. 
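As an illustration of how a guest driver might consume this configuration layout, here is a hedged C sketch. It is not part of the draft: the le32/le64 fields are shown as plain fixed-width integers (assuming a little-endian guest), and the helper name is made up.

```c
#include <stdint.h>

/* Structures mirroring the draft's device configuration layout;
 * le32/le64 rendered as fixed-width integers for a little-endian guest. */
#define MAX_DEVICES 32

struct virtio_hostmem_device_info {
    uint32_t vendor_id;
    uint32_t device_id;
    uint32_t revision;
};

struct virtio_hostmem_config {
    uint64_t reserved_size;
    uint32_t num_devices;
    struct virtio_hostmem_device_info available_devices[MAX_DEVICES];
};

/* Illustrative helper (not in the spec): scan available_devices for a
 * backend matching the vendor/device pair before attempting instance
 * creation. Returns the index of the match, or -1 if the host has no
 * corresponding backend. */
int hostmem_find_device(const struct virtio_hostmem_config *cfg,
                        uint32_t vendor, uint32_t device)
{
    for (uint32_t i = 0; i < cfg->num_devices && i < MAX_DEVICES; i++) {
        const struct virtio_hostmem_device_info *info =
            &cfg->available_devices[i];
        if (info->vendor_id == vendor && info->device_id == device)
            return (int)i;
    }
    return -1;
}
```

A driver would perform this scan once at probe time, and only then issue a create-instance request for the matching entry.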
+ +\subsection{Device Operation}\label{sec:Device Types / Host Memory Device / Device Operation} + +\subsubsection{Config Virtqueue Messages}\label{sec:Device Types / Host Memory Device / Device Operation / Config Virtqueue Messages} + +Operation always begins on the config virtqueue. +Messages transmitted or received on the config virtqueue are of the following structure: + +\begin{lstlisting} +struct virtio_hostmem_config_msg { + le32 msg_type; + le32 vendor_id; + le32 device_id; + le32 revision; + le64 instance_handle; + le64 shm_id; + le64 shm_offset; + le64 shm_size; + le32 shm_flags; + le32 error; +} +\end{lstlisting} + +\field{msg_type} can only be one of the following: + +\begin{lstlisting} +enum { + VIRTIO_HOSTMEM_CONFIG_OP_CREATE_INSTANCE, + VIRTIO_HOSTMEM_CONFIG_OP_DESTROY_INSTANCE, + VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_ALLOC, + VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_FREE, + VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_EXPORT, + VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_IMPORT, +} +\end{lstlisting} + +\field{error} can only be one of the following: + +\begin{lstlisting} +enum { + VIRTIO_HOSTMEM_ERROR_CONFIG_DEVICE_INITIALIZATION_FAILED, + VIRTIO_HOSTMEM_ERROR_CONFIG_INSTANCE_CREATION_FAILED, + VIRTIO_HOSTMEM_ERROR_CONFIG_SHARED_MEMORY_ALLOC_FAILED, + VIRTIO_HOSTMEM_ERROR_CONFIG_SHARED_MEMORY_EXPORT_FAILED, + VIRTIO_HOSTMEM_ERROR_CONFIG_SHARED_MEMORY_IMPORT_FAILED, +} +\end{lstlisting} + +Instances are particular contexts surrounding usages of virtio-hostmem. +They control whether and how shared memory is allocated +and how messages are dispatched to the host. + +\field{vendor_id}, \field{device_id}, and \field{revision} +distinguish how the hostmem device is used. +If the combination is supported on the host, as reported in the device configuration +(that is, if there exists a backend corresponding to those fields), +instance creation will succeed. +The vendor and device id must match, +while \field{revision} can be more flexible depending on the use case. 
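To make the message layout concrete, a minimal C sketch of filling in a create-instance request follows. This is not part of the draft: the numeric op-code value is invented (the draft does not assign numbers to the VIRTIO_HOSTMEM_CONFIG_OP_* constants), le32/le64 are shown as plain integers assuming a little-endian guest, and the helper name is made up.

```c
#include <stdint.h>
#include <string.h>

/* Assumed numeric value for illustration only; the draft assigns none. */
enum {
    VIRTIO_HOSTMEM_CONFIG_OP_CREATE_INSTANCE = 1,
};

/* Message layout from the draft, with le32/le64 as plain integers
 * (little-endian guest assumed). */
struct virtio_hostmem_config_msg {
    uint32_t msg_type;
    uint32_t vendor_id;
    uint32_t device_id;
    uint32_t revision;
    uint64_t instance_handle;
    uint64_t shm_id;
    uint64_t shm_offset;
    uint64_t shm_size;
    uint32_t shm_flags;
    uint32_t error;
};

/* Build a create-instance request: only msg_type and the
 * vendor/device/revision triple are meaningful on the way out.
 * The host's reply on the config rx virtqueue carries either the
 * generated instance_handle or an error code. */
struct virtio_hostmem_config_msg
hostmem_create_instance_msg(uint32_t vendor, uint32_t device, uint32_t rev)
{
    struct virtio_hostmem_config_msg msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_type = VIRTIO_HOSTMEM_CONFIG_OP_CREATE_INSTANCE;
    msg.vendor_id = vendor;
    msg.device_id = device;
    msg.revision = rev;
    return msg;
}
```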
+ +Creating instances: + +The guest sends a \field{virtio_hostmem_config_msg} +with \field{msg_type} equal to \field{VIRTIO_HOSTMEM_CONFIG_OP_CREATE_INSTANCE} +and with the \field{vendor_id}, \field{device_id}, \field{revision} fields set. +The guest sends this message on the config tx virtqueue. +On the host, a new \field{instance_handle} is generated. + +If unsuccessful, \field{error} is set and sent back to the guest +on the config rx virtqueue, and the \field{instance_handle} is discarded. + +If successful, a \field{virtio_hostmem_config_msg} +with \field{msg_type} equal to \field{VIRTIO_HOSTMEM_CONFIG_OP_CREATE_INSTANCE} +and \field{instance_handle} equal to the generated handle +is sent on the config rx virtqueue. + +Destroying instances: + +The guest sends a \field{virtio_hostmem_config_msg} +with \field{msg_type} equal to \field{VIRTIO_HOSTMEM_CONFIG_OP_DESTROY_INSTANCE} +on the config tx virtqueue. +The only field that needs to be populated +is \field{instance_handle}. + +Destroying the instance unmaps all non-exported/imported +allocations of the instance from the guest PCI space +and from the host side. +For exported or imported allocations, unmapping +only occurs when a shared-mem-specific refcount reaches zero. + +The other kinds of config message concern shared host memory regions. +The shared memory configuration operations are as follows: + +Shared memory operations: + +The guest sends a \field{virtio_hostmem_config_msg} +with \field{msg_type} equal to \field{VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_ALLOC} +on the config tx virtqueue. +\field{instance_handle} needs to be a valid instance handle generated by the host. +\field{shm_size} must be set and greater than zero. +A new shared memory region is created in the PCI address space (actual allocation is deferred). 
+If any operation fails, a message on the config rx virtqueue +with \field{msg_type} equal to \field{VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_ALLOC} +and \field{error} equal to \field{VIRTIO_HOSTMEM_ERROR_CONFIG_SHARED_MEMORY_ALLOC_FAILED} +is sent. +If all operations succeed, +a new \field{shm_id} is generated along with \field{shm_offset} (the offset into the PCI region), +and sent back on the config rx virtqueue. + +Freeing shared memory objects works in a similar way, +by setting \field{msg_type} equal to \field{VIRTIO_HOSTMEM_CONFIG_OP_SHARED_MEMORY_FREE}. +If the memory has been shared, +it is refcounted based on how many instances have used it. +When the refcount reaches 0, +the host hypervisor will explicitly unmap that shared memory object +from any existing host pointers. + +To export a shared memory object, we need to have a valid \field{instance_handle} +and an allocated shared memory object with a valid \field{shm_id}. +The export operation itself for now is mostly administrative; +it marks that allocated memory as available for sharing. + +To import a shared memory object, we need to have a valid \field{instance_handle} +and an allocated shared memory object with a valid \field{shm_id} +that has been allocated and exported. A new \field{shm_id} is not generated; +this is mostly administrative and marks that the \field{shm_id} +can also be used from the second instance. +This is for sharing memory, so \field{instance_handle} need not +be the same as the \field{instance_handle} that allocated the shared memory. + +This is similar to Vulkan \field{VK_KHR_external_memory}, +except over raw PCI address space and \field{shm_id}'s. + +For mapping and unmapping shared memory objects, +we do not include explicit virtqueue methods, +and instead rely on the guest kernel's memory mapping primitives. + +Flow control: Only one config message is allowed to be in flight +either to or from the host at any time. 
+That is, the handshake tx/rx for device enumeration, instance creation, and shared memory operations +is done in a globally visible, single-threaded manner. +This is to make it easier to synchronize operations on shared memory and instance creation. + +\subsubsection{Ping Virtqueue Messages}\label{sec:Device Types / Host Memory Device / Device Operation / Ping Virtqueue Messages} + +Once the instances have been created and configured with shared memory, +we can already read/write memory, and for some devices, that may already be enough +if they can operate lock-free and wait-free without needing notifications; we're done! + +However, in order to avoid burning up CPU in the common case, +most devices need some kind of mechanism to trigger activity on the device +from the guest. This is captured via a new message struct, +which is separate from the config struct because it's smaller and +the common case is to send those messages. +These messages are sent from the guest to the host +on the ping virtqueue. + +\begin{lstlisting} +struct virtio_hostmem_ping { + le64 instance_handle; + le64 metadata; + le64 shm_id; + le64 shm_offset; + le64 shm_size; + le64 phys_addr; + le64 host_addr; + le32 events; +} +\end{lstlisting} + +\field{instance_handle} must be a valid instance handle. +\field{shm_id} need not be a valid shm_id. +If \field{shm_id} is a valid shm_id, +it need not be allocated on the host yet. + +If \field{shm_id} is a valid shm_id, +then for security reasons, +\field{phys_addr} is resolved given \field{shm_offset} by +the virtio-hostmem driver after the message arrives at the driver. + +If \field{shm_id} is a valid shm_id +and there is a mapping set up for \field{phys_addr}, +\field{host_addr} refers to the corresponding memory view in the host address space. +For security reasons, +\field{host_addr} is only resolved on the host after the message arrives on the host. 
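A doorbell-style ping based on the struct above can be sketched as follows. Again this is an illustrative sketch, not part of the draft: field widths assume a little-endian guest and the helper name is made up. phys_addr and host_addr are deliberately left zero, since per the draft they are resolved later by the driver and by the host, respectively.

```c
#include <stdint.h>
#include <string.h>

/* Ping layout from the draft, with le64/le32 as plain integers
 * (little-endian guest assumed). */
struct virtio_hostmem_ping {
    uint64_t instance_handle;
    uint64_t metadata;
    uint64_t shm_id;
    uint64_t shm_offset;
    uint64_t shm_size;
    uint64_t phys_addr;   /* resolved later by the guest driver */
    uint64_t host_addr;   /* resolved later on the host */
    uint32_t events;
};

/* Build a doorbell ping for a region of a shared-memory object.
 * A nonzero events mask requests a completion reply on the event
 * virtqueue; zero means fire-and-forget. */
struct virtio_hostmem_ping
hostmem_doorbell(uint64_t instance, uint64_t shm_id,
                 uint64_t offset, uint64_t size, uint32_t events)
{
    struct virtio_hostmem_ping p;
    memset(&p, 0, sizeof(p));
    p.instance_handle = instance;
    p.shm_id = shm_id;
    p.shm_offset = offset;
    p.shm_size = size;
    p.events = events;
    return p;
}
```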
+ +This allows notifications to coherently access device memory +from both the host and the guest, given a few extra considerations. +For example, for architectures that do not have store/load coherency (i.e., not x86), +an explicit set of fence or synchronization instructions will also be run by virtio-hostmem +both before and after the call to \field{on_instance_ping}. +An alternative is to leave this up to the implementor of the virtual device, +but it is going to be such a common case to synchronize views of the same memory +that it is probably a good idea to include synchronization out of the box. + +However, it may be common to block a guest thread until \field{on_instance_ping} +completes on the device side. +That is the purpose of the \field{events} field; the guest can populate it +if it desires to sync on host completion. +If \field{events} is not zero, then a reply shall be sent +back to the guest via the event virtqueue, +with the \field{revents} set to the appropriate value. + +Flow control: Arbitrary levels of traffic can be sent +on the ping virtqueue from multiple instances at the same time, +but ordering within an instance is strictly preserved. +Additional resources outside the virtqueue are used to hold incoming messages +if the virtqueue itself fills up. +This is similar to how virtio-vsock handles high traffic. +However, there will be a limit on the maximum number of messages in flight +to prevent the guest from over-notifying the host. +Once the limit is reached, the guest blocks until the number of messages in flight +is decreased. + +The semantics of ping messages are also not restricted to the guest-to-host direction; +the shared memory region named in the message can also be filled by the host +and used as receive traffic by the guest. 
+The ping message is then suitable for DMA operations in both directions, +such as glTexImage2D and glReadPixels, +and audio/video (de)compression (guest populates shared memory with (de)compressed buffers, +sends ping message, host (de)compresses into the same memory region). + +\subsubsection{Event Virtqueue Messages}\label{sec:Device Types / Host Memory Device / Device Operation / Event Virtqueue Messages} + +Ping virtqueue messages are enough to cover all async device operations; +that is, operations that do not require a round trip from the host. +This is useful for most kinds of graphics API forwarding along +with media codecs. + +However, it can still be important to synchronize the guest on the completion +of a device operation. + +In the driver, the interface can be similar to Linux UIO interrupts, for example; +a blocking read() of a device is done, and after unblocking, +the operation has completed. +The exact way of waiting is dependent on the guest OS. + +However, it is all implemented on the event virtqueue. The message type: + +\begin{lstlisting} +struct virtio_hostmem_event { + le64 instance_handle; + le32 revents; +} +\end{lstlisting} + +Event messages are sent back to the guest if the \field{events} field is nonzero, +as detailed in the section on ping virtqueue messages. + +The guest driver can distinguish which instance receives which ping using +\field{instance_handle}. +The field \field{revents} is set to the return value of +\field{on_instance_ping} from the device side. + -- 2.19.0.605.g01d371f741-goog [-- Attachment #3: virtio-v1.1-wd01.pdf --] [-- Type: application/pdf, Size: 744751 bytes --] [-- Attachment #4: Type: text/plain, Size: 208 bytes --] --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply related [flat|nested] 72+ messages in thread
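The events/revents round trip described in the draft spec above can be modeled with a small host-side C sketch. This is a toy model, not part of the patch: on_instance_ping is the draft's name for the device-specific handler (out of scope for the spec), and its return value is represented here by a plain handler_revents parameter.

```c
#include <stdint.h>

/* Minimal stand-ins for the ping and event messages; only the fields
 * involved in the handshake are modeled. */
struct ping  { uint64_t instance_handle; uint32_t events; };
struct event { uint64_t instance_handle; uint32_t revents; };

/* Host-side handling of one ping: handler_revents stands in for the
 * return value of the device implementation's on_instance_ping.
 * Returns 1 if an event message should be sent back on the event
 * virtqueue, 0 if the ping completes silently (events was zero). */
int host_handle_ping(const struct ping *p, uint32_t handler_revents,
                     struct event *out)
{
    if (p->events == 0)
        return 0;                      /* fire-and-forget ping */
    out->instance_handle = p->instance_handle;
    out->revents = handler_revents;    /* guest matches by handle */
    return 1;
}
```

This mirrors the asymmetry of the design: the guest decides per ping whether it wants a completion, and the host only produces event traffic when asked.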
* Re: [virtio-dev] Memory sharing device 2019-02-13 18:18 ` Frank Yang @ 2019-02-14 7:15 ` Frank Yang 2019-02-22 22:05 ` Michael S. Tsirkin 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-14 7:15 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1.1: Type: text/plain, Size: 7178 bytes --] Revised the spec to clean up a straggling mention of callbacks, and added a concrete example of how it would be used for a video codec. Inlined here: \subsection{Example Use Case}\label{sec:Device Types / Host Memory Device / Example Use Case} Suppose the guest wants to decode a compressed video buffer. \begin{enumerate} \item Guest creates an instance for the codec vendor id / device id / revision. \item Guest allocates into the PCI region via config virtqueue messages. \item Guest sends a message over the ping virtqueue for the host to back that memory. \item Host codec device implementation exposes codec library's buffer directly to guest. \item Guest: now that the memory is host backed, the guest mmap()'s and downloads the compressed video stream directly to the host buffer. \item Guest: After a packet of compressed video stream is downloaded to the buffer, another message, like a doorbell, is sent on the ping virtqueue to consume existing compressed data. The ping message's offset field is set to the proper offset into the shared-mem object. \item Host: The ping message arrives on the host and the offset is resolved to a physical address and then, if possible, the physical address to a host pointer. Since the memory is now backed, the host pointer is also resolved. 
\item Host: Codec implementation decodes the video and puts the decoded frames to either a host-side display library (thus with no further guest communication necessary), or puts the raw decompressed frame to a further offset in the host buffer that the guest knows about. \item Guest: Continue downloading video streams and hitting the doorbell, or optionally, wait until the host is done first. If scheduling is not that big of an impact, this can be done without even any further VM exit, by the host writing to an agreed memory location when decoding is done, then the guest uses a polling sleep(N) where N is the correctly tuned timeout such that only a few poll spins are necessary. \item Guest: Or, the host can send back on the event virtqueue \field{revents} and the guest can perform a blocking read() for it. \end{enumerate} The unique / interesting aspects of virtio-hostmem are demonstrated: \begin{enumerate} \item During instance creation the host was allowed to reject the request if the codec device did not exist on host. \item The host can expose a codec library buffer directly to the guest, allowing the guest to write into it with zero copy and the host to decompress again without copying. \item Large bidirectional transfers are possible with zero copy. \item Large bidirectional transfers are possible without scatterlists, because the memory is always physically contiguous. \item It is not necessary to use socket datagrams or data streams to communicate the ping messages; they can be raw structs fresh off the virtqueue. \item After decoding, the guest has the option but not the requirement to wait for the host round trip, allowing for async operation of the codec. 
\end{enumerate} https://github.com/741g/virtio-spec/blob/61c500d5585552658a7c98ef788a625ffe1e201c/virtio-hostmem.tex On Wed, Feb 13, 2019 at 10:18 AM Frank Yang <lfy@google.com> wrote: > Attached is another spec (and at > https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0 > ) > > that tries to be more like just another virtio-pci device, but for the > express purpose of controlling host memory. > > Changes: > > Name is now virtio-hostmem. > Explicitly uses virtio-pci transport. > Removed explicit device enumeration operation and made it part of the > configuration data; the config virtqueue now mainly captures the concepts > of instances and host memory allocation + sharing. > Removed mentions of triggering callbacks as that is device implementation > specific and out of scope. > > What it ends up being: promote host memory sharing to a device type, the > purpose is guest/host communication but for when sockets are not suitable. > Control messages are added. It implicitly expects that the "reversal" for > virtio-pci buffers as host backed is already available (IIUC that is what > virtio-fs with DAX relies on as well), and builds a thin layer on top. > > Michael, is this a bit closer to what you were thinking? > > On Tue, Feb 12, 2019 at 8:59 PM Frank Yang <lfy@google.com> wrote: > >> >> >> On Tue, Feb 12, 2019 at 8:19 PM Michael S. Tsirkin <mst@redhat.com> >> wrote: >> >>> On Tue, Feb 12, 2019 at 11:02:19PM -0500, Michael S. Tsirkin wrote: >>> > On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote: >>> > > >>> > > >>> > > On Tue, Feb 12, 2019 at 11:06 AM Michael S. 
Tsirkin <mst@redhat.com> >>> wrote: >>> > > >>> > > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: >>> > > > BTW, the other unique aspect is that the ping messages allow a >>> _host_ >>> > > pointer >>> > > > to serve as the lump of shared memory; >>> > > > then there is no need to track buffers in the guest kernel and >>> the device >>> > > > implementation can perform specialize buffer space management. >>> > > > Because it is also host pointer shared memory, it is also >>> physically >>> > > contiguous >>> > > > and there is no scatterlist needed to process the traffic. >>> > > >>> > > Yes at the moment virtio descriptors all pass addresses guest to >>> host. >>> > > >>> > > Ability to reverse that was part of the vhost-pci proposal a >>> while ago. >>> > > BTW that also at least originally had ability to tunnel >>> > > multiple devices over a single connection. >>> > > >>> > > >>> > > >>> > > Can there be a similar proposal for virtio-pci without vhsot? >>> > > >>> > > There was nothing wrong with the proposals I think, they >>> > > just had to be polished a bit before making it into the spec. >>> > > And that runneling was dropped but I think it can be brought back >>> > > if desired, we just didn't see a use for it. >>> > > >>> > > >>> > > Thinking about it more, I think vhost-pci might be too much for us >>> due to the >>> > > vhost requirement (sockets and IPC while we desire a highly process >>> local >>> > > solution) >>> > >>> > I agree because the patches try to document a bunch of stuff. >>> > But I really just mean taking the host/guest interface >>> > part from there. >>> >>> Tomorrow I'll try to write up a little bit more about the vhost pci >>> ideas. The patches on list go deep into envisioned implementation >>> detail within qemu instead of the actual host/guest interface. >>> You should then be able to figure out they are relevant for you. >>> >>> Ok, looking forward! 
It does seem like a lot of what you are pointing >> out, and in the vhost-pci proposal, what is specified is not strictly dependent on >> the qemu implementation. Hopefully we can make the minimal set of changes >> to support our constraints. >> >> >>> -- >>> MST >>> >> [-- Attachment #1.2: Type: text/html, Size: 9966 bytes --] [-- Attachment #2: virtio-v1.1-wd01.pdf --] [-- Type: application/pdf, Size: 748302 bytes --] [-- Attachment #3: Type: text/plain, Size: 208 bytes --] --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
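[Editorial note: the config-virtqueue operations described in the virtio-hostmem message above (instance creation against a vendor id / device id / revision triple, plus host memory allocation and sharing) could be carried by a small fixed-size message. The struct below is a sketch only; every field name, opcode, and layout choice here is an assumption of this write-up, not taken from the linked draft spec.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical virtio-hostmem config-virtqueue message.
 * Field names and opcodes are illustrative only; the real layout
 * is whatever the draft spec linked above ends up defining. */
enum hostmem_op {
    HOSTMEM_OP_CREATE_INSTANCE = 1, /* vendor/device/revision lookup */
    HOSTMEM_OP_ALLOC           = 2, /* carve a region out of the PCI bar */
    HOSTMEM_OP_FREE            = 3,
};

struct hostmem_config_msg {
    uint32_t op;        /* one of enum hostmem_op */
    uint32_t vendor_id; /* used by CREATE_INSTANCE */
    uint32_t device_id;
    uint32_t revision;
    uint64_t size;      /* requested bytes for ALLOC */
    uint64_t offset;    /* written by the device: offset into the bar */
    uint32_t instance;  /* instance handle, assigned on create */
    uint32_t error;     /* nonzero if the host rejected the request */
};
```

A fixed-size message like this is what lets the thread's later claim hold that pings and config operations "can be raw structs fresh off the virtqueue" rather than a stream protocol.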
* Re: [virtio-dev] Memory sharing device 2019-02-14 7:15 ` Frank Yang @ 2019-02-22 22:05 ` Michael S. Tsirkin 2019-02-24 21:19 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-22 22:05 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman On Wed, Feb 13, 2019 at 11:15:12PM -0800, Frank Yang wrote: > Revised the spec to clean up a straggling mention of callbacks, and added a > concrete example of how it would be used for a video codec. > > Inlined here: BTW I'd rather you started sending versioned RFC patches. This thread's too large already. > \subsection{Example Use Case}\label{sec:Device Types / Host Memory Device / > Example Use Case} > > Suppose the guest wants to decode a compressed video buffer. > > \begin{enumerate} > > \item Guest creates an instance for the codec vendor id / device id / revision. OK we'll need to see how we come up with a way to avoid conflicts here, e.g. if multiple vendors will use this device. > \item Guest allocates into the PCI region via config virtqueue messages. OK so who allocates memory out of the PCI region? Is it the host or the guest? E.g. does guest say "I want X bytes" and host would respond "here they are, start at offset X"? > \item Guest sends a message over the ping virtqueue for the host to back that > memory. And all of these will need to be maintained on the host right? How many of these regions need to be supported? > > \item Host codec device implementation exposes codec library's buffer directly > to guest. > > \item Guest: now that the memory is host backed, the guest mmap()'s and > downloads the compressed video stream directly to the host buffer. > > \item Guest: After a packet of compressed video stream is downloaded to the > buffer, another message, like a doorbell, is sent on the ping virtqueue to > consume existing compressed data.
The ping message's offset field is > set to the proper offset into the shared-mem object. BTW is this terminology e.g. "download", "ping message" standard somewhere? > \item Host: The ping message arrives on the host and the offset is resolved to > a physical address and then, if possible, the physical address to a host > pointer. Since the memory is now backed, the host pointer is also > resolved. > > \item Host: Codec implementation decodes the video and puts the decoded frames > to either a host-side display library (thus with no further guest > communication necessary), or puts the raw decompressed frame to a > further offset in the host buffer that the guest knows about. > > \item Guest: Continue downloading video streams and hitting the doorbell, or > optionally, wait until the host is done first. If scheduling is not that > big of an impact, this can be done without even any further VM exit, by > the host writing to an agreed memory location when decoding is done, > then the guest uses a polling sleep(N) where N is the correctly tuned > timeout such that only a few poll spins are necessary. > > > \item Guest: Or, the host can send back on the event virtqueue \field{revents} > and the guest can perform a blocking read() for it. > > \end{enumerate} > > The unique / interesting aspects of virtio-hostmem are demonstrated: > > \begin{enumerate} > > \item During instance creation the host was allowed to reject the request if > the codec device did not exist on host. > > \item The host can expose a codec library buffer directly to the guest, > allowing the guest to write into it with zero copy and the host to decompress > again without copying. > > \item Large bidirectional transfers are possible with zero copy. However just to make sure, sending small amounts of data is slower since you get to do all the mmap dance. > \item Large bidirectional transfers are possible without scatterlists, because > the memory is always physically contiguous. 
It might get fragmented though. I think it would be up to host to try and make sure it's not too fragmented, right? > \item It is not necessary to use socket datagrams or data streams to > communicate the ping messages; they can be raw structs fresh off the > virtqueue. OK and ping messages are all fixed size? > \item After decoding, the guest has the option but not the requirement to wait > for the host round trip, allowing for async operation of the codec. > > \item The guest has the option but not the requirement to wait for the host > round trip, allowing for async operation of the codec. > > \end{enumerate} OK I still owe you that write-up about vhost pci. will try to complete that early next week. But generally if I got it right that the host allocates buffers then what you describe does seem to fit a bit better with the vhost pci host/guest interface idea. One question that was asked about vhost pci is whether it is in fact necessary to share a device between multiple applications. Or is it enough to just have one id per device? Thanks! -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
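[Editorial note: the completion-wait option in the codec example above, a tuned polling sleep on an agreed memory location instead of a VM exit, can be sketched in guest userspace as below. The flag location, interval, and function name are inventions for illustration, not part of any spec.]

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <stdatomic.h>

/* Sketch of the guest-side completion wait from the codec example:
 * after ringing the doorbell on the ping virtqueue, poll a location
 * in the shared region that the host writes when decoding finishes.
 * poll_ns is the tuned timeout so only a few spins are necessary. */
static int wait_for_decode(_Atomic uint32_t *done_flag,
                           long poll_ns, int max_spins)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = poll_ns };
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(done_flag, memory_order_acquire))
            return 0;             /* host finished the frame */
        nanosleep(&ts, NULL);     /* sleep between poll spins */
    }
    return -1;  /* give up and fall back to the event virtqueue read() */
}
```

The -1 path corresponds to the example's alternative: a blocking read on the event virtqueue for \field{revents}.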
* Re: [virtio-dev] Memory sharing device 2019-02-22 22:05 ` Michael S. Tsirkin @ 2019-02-24 21:19 ` Frank Yang 0 siblings, 0 replies; 72+ messages in thread From: Frank Yang @ 2019-02-24 21:19 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 5292 bytes --] On Fri, Feb 22, 2019 at 2:05 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Wed, Feb 13, 2019 at 11:15:12PM -0800, Frank Yang wrote: > > Revised the spec to clean up a straggling mention of callbacks, and > added a > > concrete example of how it would be used for a video codec. > > > > Inlined here: > > BTW I'd rather you started sending versioned RFC patches. > This thread's too large already. > > > Ok, I've started a new thread on virtio-comment that re-shuffles the github links I've sent over so far, and responds to your latest questions. Thanks Michael! > > \subsection{Example Use Case}\label{sec:Device Types / Host Memory > Device / > > Example Use Case} > > > > Suppose the guest wants to decode a compressed video buffer. > > > > \begin{enumerate} > > > > \item Guest creates an instance for the codec vendor id / device id / > revision. > > OK we'll need to see how do we come up with a way to avoid conflicts > here, e.g. if multiple vendors will use this device. > > > \item Guest allocates into the PCI region via config virtqueue messages. > > OK so who allocates memory out of the PCI region? > Is it the host or the guest? > E.g. does guest say "I want X bytes" and host would respond > "here they are, start at offset X"? > > > \item Guest sends a message over the ping virtqueue for the host to back > that > > memory. > > And all of these will need to be maintained on the host right? > How many of these regions need to be supported? > > > > > \item Host codec device implementation exposes codec library's buffer > directly > > to guest. 
> > > > \item Guest: now that the memory is host backed, the guest mmap()'s and > > downloads the compressed video stream directly to the host buffer. > > > > \item Guest: After a packet of compressed video stream is downloaded to > the > > buffer, another message, like a doorbell, is sent on the ping > virtqueue to > > consume existing compressed data. The ping message's offset > field is > > set to the proper offset into the shared-mem object. > > BTW is this terminology e.g. "download", "ping message" standard somewhere? > > > \item Host: The ping message arrives on the host and the offset is > resolved to > > a physical address and then, if possible, the physical address to a > host > > pointer. Since the memory is now backed, the host pointer is also > > resolved. > > > > \item Host: Codec implementation decodes the video and puts the decoded > frames > > to either a host-side display library (thus with no further guest > > communication necessary), or puts the raw decompressed frame to a > > further offset in the host buffer that the guest knows about. > > > > \item Guest: Continue downloading video streams and hitting the > doorbell, or > > optionally, wait until the host is done first. If scheduling is not > that > > big of an impact, this can be done without even any further VM > exit, by > > the host writing to an agreed memory location when decoding is > done, > > then the guest uses a polling sleep(N) where N is the correctly > tuned > > timeout such that only a few poll spins are necessary. > > > > > > \item Guest: Or, the host can send back on the event virtqueue > \field{revents} > > and the guest can perform a blocking read() for it. > > > > \end{enumerate} > > > > The unique / interesting aspects of virtio-hostmem are demonstrated: > > > > \begin{enumerate} > > > > \item During instance creation the host was allowed to reject the > request if > > the codec device did not exist on host. 
> > > > \item The host can expose a codec library buffer directly to the guest, > > allowing the guest to write into it with zero copy and the host to > decompress > > again without copying. > > > > \item Large bidirectional transfers are possible with zero copy. > > However just to make sure, sending small amounts of data > is slower since you get to do all the mmap dance. > > > > \item Large bidirectional transfers are possible without scatterlists, > because > > the memory is always physically contiguous. > > It might get fragmented though. I think it would be up to > host to try and make sure it's not too fragmented, right? > > > > \item It is not necessary to use socket datagrams or data streams to > > communicate the ping messages; they can be raw structs fresh off the > > virtqueue. > > OK and ping messages are all fixed size? > > > > \item After decoding, the guest has the option but not the requirement > to wait > > for the host round trip, allowing for async operation of the codec. > > > > \item The guest has the option but not the requirement to wait for the > host > > round trip, allowing for async operation of the codec. > > > > \end{enumerate} > > OK I still owe you that write-up about vhost pci. will try to complete > that early next week. But generally if I got it right that the host > allocates buffers then what you describe does seem to fit a bit better > with the vhost pci host/guest interface idea. > > One question that was asked about vhost pci is whether it is in fact > necessary to share a device between multiple applications. > Or is it enough to just have one id per device? > > Thanks! > -- > MST > [-- Attachment #2: Type: text/html, Size: 6472 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-13 4:02 ` Michael S. Tsirkin 2019-02-13 4:19 ` Michael S. Tsirkin @ 2019-02-13 4:59 ` Frank Yang 1 sibling, 0 replies; 72+ messages in thread From: Frank Yang @ 2019-02-13 4:59 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 8913 bytes --] On Tue, Feb 12, 2019 at 8:02 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote: > > > > > > On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> > wrote: > > > > On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote: > > > BTW, the other unique aspect is that the ping messages allow a > _host_ > > pointer > > > to serve as the lump of shared memory; > > > then there is no need to track buffers in the guest kernel and the > device > > > implementation can perform specialized buffer space management. > > > Because it is also host pointer shared memory, it is also > physically > > contiguous > > > and there is no scatterlist needed to process the traffic. > > > > Yes at the moment virtio descriptors all pass addresses guest to > host. > > > > Ability to reverse that was part of the vhost-pci proposal a while > ago. > > BTW that also at least originally had ability to tunnel > > multiple devices over a single connection. > > > > > > > > Can there be a similar proposal for virtio-pci without vhost? > > > > There was nothing wrong with the proposals I think, they > > just had to be polished a bit before making it into the spec. > > And that tunneling was dropped but I think it can be brought back > > if desired, we just didn't see a use for it.
> > > > > > Thinking about it more, I think vhost-pci might be too much for us due > to the > > vhost requirement (sockets and IPC while we desire a highly process local > > solution) > > I agree because the patches try to document a bunch of stuff. > But I really just mean taking the host/guest interface > part from there. > > So, are you referring to the new ideas that vhost-pci introduces minus socket IPC/inter-VM communication, and the vhost server being in the same process as qemu? That sounds like we could build something for qemu (Stefan?) that talks to a virtio-pci-user (?) backend with a similar set of command line arguments. > > > But there's nothing preventing us from having the same reversals for > virtio-pci > > devices without vhost, right? > > Right. I think that if you build something such that vhost pci > can be an instance of it on top, then it would have > a lot of value. > > I'd be very eager to chase this down. The more interop with existing virtual PCI concepts the better. > > > That's kind of what's being proposed with the shared memory stuff at the > > moment, though it is not a device type by itself yet (Arguably, it > should be). > > > > > > How about that? That sounds close to what you were looking for, > > does it not? That would be something to look into - > > if your ideas can be used to implement a virtio device > > backend by code running within a VM, that would be very interesting. > > > > > > What about a device type, say, virtio-metapci, > > I have to say I really dislike that name. It's basically just saying I'm > not telling you what it is. Let's try to figure it out. Looks like > although it's not a vsock device it's also trying to support creating > channels with support for passing two types of messages (data and > control) as well as some shared memory access. And it also has > enumeration so opened channels can be tied to what? strings? PCI Device > IDs? 
> > I think we can build this relying on PCI device ids, assuming there are still device IDs readily available. Then the vsock device was designed for this problem space. It might not > be a good fit for you e.g. because of some vsock baggage it has. But one > of the complex issues it does address is controlling host resource usage > so guest socket can't DOS host or starve other sockets by throwing data > at host. Things might slow down but progress will be made. If you are > building a generic kind of message exchange you could do worse than copy > that protocol part. > > That's a good point and I should make sure we've captured that. > I don't think the question of why not vsock generally was addressed all > that well. There's discussion of sockets and poll, but that has nothing > to do with virtio which is a host/guest interface. If you are basically > happy with the host/guest interface but want to bind a different driver > to it, with some minor tweaks, we could create a virtio-wsock which is > just like virtio-vsock but has a different id, and use that as a > starting point. Go wild, build a different driver for it. > > The virtio-wsock notion also sounds good, though (also from Stefan's comments) I'd want to clarify how we would define such a device type that is both pure in terms of host/guest interface (i.e., not assuming sockets either in the guest or host), but doesn't also, at the implementation level, imply that the existing implementation of v(w)sock change to accommodate non-socket-based guest/host interfaces. > > > that relies on virtio-pci for > > device enumeration and shared memory handling > > (assuming it's going to be compatible with the host pointer shared memory > > implementation), > > so there's no duplication of the concept of device enumeration nor shared > > memory operations. > > But, it works in terms of the ping / event virtqueues, and relies on the > host > > hypervisor to dispatch to device implementation callbacks.
> > All the talk about dispatch and device implementation is just adding to > confusion. This isn't something that belongs in virtio spec anyway, and > e.g. qemu is unlikely to add an in-process plugin support just for this. > > A plugin system of some type is what we think is quite valuable, for decoupling device functionality from QEMU. Keeping the same process is also attractive because of the lack of need of IPC. If there's a lightweight, cross platform way to do IPC via function pointer mechanisms, perhaps by dlsym or LoadLibrary on another process, that could work too, though yes, it would need to be integrated to qemu. > > > A potential issue is that such metapci device share the same device id > > namespace as other virtio-pci devices...but maybe that's OK? > > That's a vague question. > Same device and vendor id needs to imply same driver works. > I think you use the terminology that doesn't match virtio. > words device and driver have a specific meaning and that > doesnt include things like implementation callbacks. > > > > If this can build on virtio-pci, I might be able to come up with a spec > that > > assumes virtio-pci as the transport, > > and assumes (via the WIP host memory sharing work) that host memory can > be used > > as buffer storage. > > The difference is that it will not contain most of the config virtqueue > stuff > > (except maybe for create/destroy instance), > > and it should also work with the existing ecosystem around virtio-pci. > > > > I still can't say from above whether it's in scope for virtio or not. > All the talk about blobs and controlling both host and guest sounds > out of scope. But it could be that there are pieces that are > inscope, and you would use them for whatever vendor specific > thing you need. > > And I spent a lot of time on this by now. > > So could you maybe try to extract specifically the host/guest interface > things that you miss? 
I got the part where you want to take a buffer > within BAR and pass it to guest. But beyond that I didn't get a lot. E.g. > who is sending most data? host? guest? both? There are control messages; > are these coming from the guest? Do you want to know when guest is done > with the buffer host allocated? Can you take a buffer away from guest? > > Thanks for taking the time to evaluate; we very much appreciate it and want to resolve the issues you're having! To answer the other questions: - We expect most data to be sent by the guest, except in the cases of image readback, where the host will write a lot of data. - The control messages (ping) are driven by the guest only; at most, the guest can async wait on a long host operation that was triggered by the guest. Hence, the events argument in the spec being guest-driven, and revents being populated by the host. - It is out of scope to know when the guest is done with the host buffer. - Buffers will be owned by the host, and the guest will not own any buffers under the current scheme. This is because it is up to the host-side implementation to decide how to back new memory allocations from the guest. > OTOH all the callback discussion is really irrelevant for the virtio tc. > If things can't be described without this they are out of scope > for virtio. > > We can describe the current proposal for virtio without explicitly naming callbacks on the host, but it would push that kind of implementation burden to the host qemu, so I thought it would be a good idea to lay things out end to end in a way that would be concretely implementable. > > > > -- > > MST > > > [-- Attachment #2: Type: text/html, Size: 11521 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
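[Editorial note: Frank's answers above, guest-driven pings with an offset into a host-owned buffer and a host-filled revents, suggest a poll(2)-style message shape. The struct below is a hypothetical illustration assembled from those answers; none of the names are taken from the draft.]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical ping message following the poll(2)-style naming used
 * in the thread: the guest fills |offset| and |events| before kicking
 * the ping virtqueue; the host writes |revents| back on the event
 * virtqueue when an async operation completes. Illustrative only. */
struct hostmem_ping {
    uint64_t offset;   /* offset into the shared-mem object */
    uint64_t size;     /* number of bytes the host should consume */
    uint32_t events;   /* guest-requested operations / async waits */
    uint32_t revents;  /* host-reported completions */
};
```

Because the message is a fixed-size struct, it can travel "raw" on the virtqueue with no socket framing, which is one of the properties the proposal emphasizes.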
* Re: [virtio-dev] Memory sharing device 2019-02-12 17:20 ` Frank Yang 2019-02-12 17:26 ` Frank Yang @ 2019-02-19 7:54 ` Gerd Hoffmann 2019-02-19 15:54 ` Frank Yang 1 sibling, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-19 7:54 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman Hi, > As another example, consider camera / video playback. I've heard that the > way to approach this > is to build v4l2 and build a virtio-v4l2 layer underneath. That would be one option, yes. virtio-v4l2 is a pretty bad name though, I'd name it virtio-camera or virtio-media (if you want both capture and playback) and you don't have to mimic the v4l2 api at virtio level. That might simplify pass-through of v4l2 host devices, but isn't necessarily the best choice long-term. Alternatively emulate something existing, USB Video Class device for example. > However, this would mean either assuming v4l2 on host which is not > portable to non-Linux, or requiring additional translation layers > between v4l2 in the guest and $library on the host. Well, sure, you need to wire up the host side somehow anyway. Emulate something virtual, feed the guest camera with image data from the host camera, ... > With the proposed spec, a more general 'codec' device can be > implemented, tested on actual programs on multiple guests/hosts > easily, and then, if it makes sense, "promoted" to a new > "virtio-codec" device type. I fail to see how a more general device avoids host-specific code (to talk to the host camera, for example). cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-19 7:54 ` Gerd Hoffmann @ 2019-02-19 15:54 ` Frank Yang 2019-02-20 3:46 ` Michael S. Tsirkin 2019-02-20 6:25 ` Gerd Hoffmann 0 siblings, 2 replies; 72+ messages in thread From: Frank Yang @ 2019-02-19 15:54 UTC (permalink / raw) To: Gerd Hoffmann Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 3935 bytes --] On Mon, Feb 18, 2019 at 11:54 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > As another example, consider camera / video playback. I've heard that the > > way to approach this > > is to build v4l2 and build a virtio-v4l2 layer underneath. > > That would be one option, yes. virtio-v4l2 is a pretty bad name though, > I'd name it virtio-camera or virtio-media (if you want both capture and > playback) and you don't have to mimic the v4l2 api at virtio level. > I think a big thing that's different about our context is that there are issues when running on non-Linux hosts, non-QEMU hypervisors or non-Linux kernels that aren't commonly encountered in the virtio community. I agree that using the v4l2 API itself would bake in all sorts of design decisions that might not be the best idea for a given host-side codec or camera library. > That might simplify pass-through of v4l2 host devices, but isn't > necessarily the best choice long-term. > > Right, but if we're not emulating at the v4l2 api level, then it starts looking a lot like the proposed virtio-hostmem; there's a common pattern of direct access to host memory and a message to notify either guest or host that there is something interesting to do with the host memory, that we'd like to capture with the proposed spec. In addition, we'd like to do things in a way that allows virtual drivers/devices to be defined in a manner that doesn't require the guest kernel to be updated.
For us, maintaining and upgrading guest kernels in response to tweaks to virtual devices is much more work than modifying a userspace shared library driver that communicates to some virtio driver. Thus far, it's suggested that socket or network devices be used for this, because they are general guest/host communication, but they don't have the portability or performance characteristics we want. We'd want to benefit from being able to run reliably on hosts other than Linux (which makes Unix sockets riskier), with RAM being the underlying buffer storage. Alternatively emulate something existing, USB Video Class device for > example. > Using usb or faking some other transport generally also involves being coupled to that set of kernel code in the guest, which introduces complexity and overhead. Exposing RAM can be a more flexible abstraction. > > However, this would mean either assuming v4l2 on host which is not > > portable to non-Linux, or requiring additional translation layers > > between v4l2 in the guest and $library on the host. > > Well, sure, you need to wire up the host side somehow anyway. Emulate > something virtual, feed the guest camera with image data from the host > camera, ... > > Right, but we also don't want to couple the implementations together. As specified in the example, the implementation of how the guest camera and host camera work together is defined in areas away from the guest kernel. As long as there are no conflicting device ID's in the configuration space, we can also tell when a particular usage is supported. > With the proposed spec, a more general 'codec' device can be > > implemented, tested on actual programs on multiple guests/hosts > > easily, and then, if it makes sense, "promoted" to a new > > "virtio-codec" device type. > > I fail to see how a more general device avoids host-specific code (to > talk to the host camera, for example). 
> > Agreed, no, it doesn't; but it facilitates the possibility of the same host-specific code to be defined in a different library from QEMU or whatever VMM is being used. To update driver/device functionality, ideally we want to ship two small shared libraries, one to guest userspace and one to plug in to the host VMM. > cheers, > Gerd > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > > [-- Attachment #2: Type: text/html, Size: 5828 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-19 15:54 ` Frank Yang @ 2019-02-20 3:46 ` Michael S. Tsirkin 2019-02-20 15:24 ` Frank Yang 2019-02-20 6:25 ` Gerd Hoffmann 1 sibling, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-20 3:46 UTC (permalink / raw) To: Frank Yang Cc: Gerd Hoffmann, Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman On Tue, Feb 19, 2019 at 07:54:04AM -0800, Frank Yang wrote: > To update driver/device functionality, ideally we want to ship two small shared > libraries, > one to guest userspace and one to plug in to the host VMM. I don't think we want to support that last in QEMU. Generally you want process isolation, not shared library plugins - definitely not host side - VMM is just too sensitive to allow random plugins - and maybe not guest side either. I'm at a conference tomorrow but I hope to complete review of the proposal and respond by end of week. Thanks! -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 3:46 ` Michael S. Tsirkin @ 2019-02-20 15:24 ` Frank Yang 2019-02-20 19:29 ` Michael S. Tsirkin 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-20 15:24 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Gerd Hoffmann, Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 1828 bytes --] On Tue, Feb 19, 2019 at 7:46 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 19, 2019 at 07:54:04AM -0800, Frank Yang wrote: > > To update driver/device functionality, ideally we want to ship two small > shared > > libraries, > > one to guest userspace and one to plug in to the host VMM. > > I don't think we want to support that last in QEMU. Generally you want > process isolation, not shared library plugins - definitely not host > side - VMM is just too sensitive to allow random plugins - > and maybe not guest side either. > > Yeah that's a good point. I haven't really thought too much about it since I've been planning around, for upstreaming purposes, if not a shared library plugin, then non-shared code living in the VMM, though that does make things more complex for us since we also don't want to further customize QEMU if possible. IPC, though, seems like it would add quite some overhead, unless there's some generally accepted portable way to run via shared memory that doesn't also involve busy waiting in a way that burns up the CPU? Then we could maybe define a new transport that works through that, or something.
Well, regardless of IPC mechanism, we would also need to solve a compatibility issue: on most host OSes, we can't just take any host pointer and map it into another process (then the hypervisor mapping that pointer to the guest); the Vulkan use case for example, ironically, seems to only work well after remapping through a hypervisor; IPC cannot be done with them unless the driver happens to support that flavor of external memory (cross process shareable + host visible). > I'm at a conference tomorrow but I hope to complete review > of the proposal and respond by end of week. > > Thanks! > > Thanks very much Michael for taking the the time to review it. > -- > MST > [-- Attachment #2: Type: text/html, Size: 2731 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 15:24 ` Frank Yang @ 2019-02-20 19:29 ` Michael S. Tsirkin 0 siblings, 0 replies; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-20 19:29 UTC (permalink / raw) To: Frank Yang Cc: Gerd Hoffmann, Dr. David Alan Gilbert, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman On Wed, Feb 20, 2019 at 07:24:20AM -0800, Frank Yang wrote: > IPC though, seems like it would add quite some overhead, > unless there's some generally accepted portable way to run via shared memory > that doesn't also involve busy waiting in a way that burns up the CPU? Let's just say designing host/guest APIs around specific OS limitations doesn't sound like a good direction either. Try talking to Chrome guys, I hear they are using process per plugin for security too? What do they do for portability? -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-19 15:54 ` Frank Yang 2019-02-20 3:46 ` Michael S. Tsirkin @ 2019-02-20 6:25 ` Gerd Hoffmann 2019-02-20 15:30 ` Frank Yang 1 sibling, 1 reply; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-20 6:25 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman Hi, > > That might simlify pass-through of v4l2 host devices, but isn't > > necessarily the best choice long-term. > Right, but if we're not emulating at the v4l2 api level, then it starts > looking a lot > like the proposed virtio-hostmem; there's a common pattern of > direct access to host memory Typically camera hardware can DMA the image data to any place in memory, including guest ram. There is no need to expose host memory to the guest for that. The normal virtio workflow with guest-allocated buffers will work just fine. > In addition, we'd like to do things in a way > that allows virtual drivers/devices to be defined in a manner that doesn't > require the guest kernel to be updated. Hmm. I'm wondering whether you just need a different virtio transport. Very similar to virtio-pci, but instead of using all guest ram (modulo iommu quirks) as address space, use a pci memory bar as address space. virtio rings would live there, all buffers would live there, addresses passed around would be offsets into that pci bar. Then your userspace driver can just mmap() that pci bar and handle (almost?) everything on its own. Maybe a little stub driver in the kernel is needed for ring notifications. > For us, maintaining and upgrading guest kernels in response to tweaks to > virtual devices > is much more work than modifying a userspace shared library driver that > communicates to some virtio driver. Well, one of the design goals of virtio is that such an upgrade is not required. 
Capabilities of device and driver are negotiated using feature flags, which removes the need for lockstep updates of guest and host. > Thus far, it's suggested that socket or > network devices > be used for this, because they are general guest/host communication, > but they don't have the portability or performance characteristics we want. For high-bandwidth fully agree. For low-bandwidth (like the sensors discussed elsewhere in the thread) I think running some protocol (mqtt, something json-based, ...) over virtio-serial or virtio-vsock isn't too bad. But creating virtio-sensors makes sense too. > > Alternatively emulate something existing, USB Video Class device for > > example. > > Using usb or faking some other transport generally also involves being > coupled to that set of kernel code in the guest, which introduces > complexity and overhead. Exposing RAM can be a more flexible > abstraction. Well, you get guest drivers for free and all existing software will run just fine out-of-the-box. Which is the reason why there is no virtio-usb for example. xhci has a hardware design which can be emulated without much overhead, so the performance benefit of virtio-usb over xhci would be pretty close to zero. cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 6:25 ` Gerd Hoffmann @ 2019-02-20 15:30 ` Frank Yang 2019-02-20 15:35 ` Frank Yang 2019-02-21 6:44 ` Gerd Hoffmann 0 siblings, 2 replies; 72+ messages in thread From: Frank Yang @ 2019-02-20 15:30 UTC (permalink / raw) To: Gerd Hoffmann Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 4726 bytes --] On Tue, Feb 19, 2019 at 10:25 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > Hi, > > > > That might simlify pass-through of v4l2 host devices, but isn't > > > necessarily the best choice long-term. > > > Right, but if we're not emulating at the v4l2 api level, then it starts > > looking a lot > > like the proposed virtio-hostmem; there's a common pattern of > > direct access to host memory > > Typically camera hardware can DMA the image data to any place in memory, > including guest ram. There is no need to expose host memory to the > guest for that. The normal virtio workflow with guest-allocated buffers > will work just fine. > > True, thanks for pointing that out. Although I still think it is a bit nicer to do it via a host memory buffer, because it'd be outside of guest ram; we wouldn't have to coordinate as tightly with the kernel to avoid interfering with other guest RAM usages, such as dma_alloc_coherent equivalents with their CMA limitations. > In addition, we'd like to do things in a way > > that allows virtual drivers/devices to be defined in a manner that > doesn't > > require the guest kernel to be updated. > > Hmm. I'm wondering whenever you just need a different virtio transport. > Very simliar to virtio-pci, but instead of using all guest ram (modulo > iommu quirks) as address space use a pci memory bar as address space. 
Then your userspace > driver can just mmap() that pci bar and handle (almost?) everything on > its own. Maybe a little stub driver in the kernel is needed for ring > notifications. > > Yeah, that's (almost?) exactly what we want to do / we're doing already pretty much with the memory sharing scheme we're using currently. Defining it as a transport would also require definition of another virtio device type that the userspace drivers talk to, right? As a transport/device pair, would it break down as, the transport ends up restricting what kinds of addresses can be referred to in the ring messages (in the pci memory bar), then the device provides the userspace drivers with implementation of mmap() + memory allocation/sharing ops and notification? > > For us, maintaining and upgrading guest kernels in response to tweaks to > > virtual devices > > is much more work than modifying a userspace shared library driver that > > communicates to some virtio driver. > > Well, one of the design goals of virtio is that such an upgrade is not > required. Capabilities of device and driver are negotiated using > feature flags, which removes the need for lockstep updates of guest and > host. > > We also have a similar feature flag system, but yeah it would be great to use the capabilities/flags to their fullest extent here. > > Thus far, it's suggested that socket or > > network devices > > be used for this, because they are general guest/host communication, > > but they don't have the portability or performance characteristics we > want. > > For high-bandwidth fully agree. > > For low-bandwidth (like the sensors discussed elsewhere in the thread) I > think running some protocol (mqtt, something json-based, ...) over > virtio-serial or virtio-vsock isn't too bad. > > But creating virtio-sensors makes sense too. > > Yeah for performance considerations, sockets seem to be fine. It's just that I've also got concerns on how well sockets work on non-Linux. 
> > Alternatively emulate something existing, USB Video Class device for > > example. > > > > Using usb or faking some other transport generally also involves being > > coupled to that set of kernel code in the guest, which introduces > > complexity and overhead. Exposing RAM can be a more flexible > > abstraction. > > Well, you get guest drivers for free and all existing software will run > just fine out-of-the-box. > > Which is the reason why there is no virtio-usb for example. xhci has a > hardware design which can be emulated without much overhead, so the > performance benefit of virtio-usb over xhci would be pretty close to > zero. > > That's pretty interesting; from a brief glance at xhci, it looks like what you're proposing is that, since there is usb passthrough already with xhci, and xhci has something that looks like a virtqueue, perhaps we can create our own USB driver on the host, then pass it through. It does make things more inconvenient for users, though, since they would have to install usb drivers out of nowhere. Wonder if a usb forwarder virtual device exists that can talk to some shared library on the host for the xhci queue, though that might be considered an unsafe plugin. > cheers, > Gerd > > [-- Attachment #2: Type: text/html, Size: 6495 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 15:30 ` Frank Yang @ 2019-02-20 15:35 ` Frank Yang 2019-02-21 6:44 ` Gerd Hoffmann 1 sibling, 0 replies; 72+ messages in thread From: Frank Yang @ 2019-02-20 15:35 UTC (permalink / raw) To: Gerd Hoffmann Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 5022 bytes --] On Wed, Feb 20, 2019 at 7:30 AM Frank Yang <lfy@google.com> wrote: > > > On Tue, Feb 19, 2019 at 10:25 PM Gerd Hoffmann <kraxel@redhat.com> wrote: > >> Hi, >> >> > > That might simlify pass-through of v4l2 host devices, but isn't >> > > necessarily the best choice long-term. >> >> > Right, but if we're not emulating at the v4l2 api level, then it starts >> > looking a lot >> > like the proposed virtio-hostmem; there's a common pattern of >> > direct access to host memory >> >> Typically camera hardware can DMA the image data to any place in memory, >> including guest ram. There is no need to expose host memory to the >> guest for that. The normal virtio workflow with guest-allocated buffers >> will work just fine. >> >> > True, thanks for pointing it out. > Although, I'd still think it is still a bit nicer to do it via a host > memory buffer, > because it'd be outside of guest ram; > we wouldn't have to interface as tightly with the kernel in terms of not > interfering with > other guest RAM usages, > such as equivalents of dma_alloc_coherent with the limitation on CMA and > all. > > > In addition, we'd like to do things in a way >> > that allows virtual drivers/devices to be defined in a manner that >> doesn't >> > require the guest kernel to be updated. >> >> Hmm. I'm wondering whenever you just need a different virtio transport. >> Very simliar to virtio-pci, but instead of using all guest ram (modulo >> iommu quirks) as address space use a pci memory bar as address space. 
>> virtio rings would live there, all buffers would live there, addresses >> passed around would be offsets into that pci bar. Then your userspace >> driver can just mmap() that pci bar and handle (almost?) everything on >> its own. Maybe a little stub driver in the kernel is needed for ring >> notifications. >> >> > Yeah, that's (almost?) exactly what we want to do / we're doing already > pretty much > with the memory sharing scheme we're using currently. > > Defining it as a transport would also require definition of another virtio > device type > that the userspace drivers talk to, right? > As a transport/device pair, would it break down as, the transport ends up > restricting what kinds of addresses can be referred to in the ring messages > (in the pci memory bar), > then the device provides the userspace drivers with > implementation of mmap() + memory allocation/sharing ops and notification? > > >> > For us, maintaining and upgrading guest kernels in response to tweaks to >> > virtual devices >> > is much more work than modifying a userspace shared library driver that >> > communicates to some virtio driver. >> >> Well, one of the design goals of virtio is that such an upgrade is not >> required. Capabilities of device and driver are negotiated using >> feature flags, which removes the need for lockstep updates of guest and >> host. >> >> > We also have a similar feature flag system, > but yeah it would be great to use the capabilities/flags > to their fullest extent here. > > >> > Thus far, it's suggested that socket or >> > network devices >> > be used for this, because they are general guest/host communication, >> > but they don't have the portability or performance characteristics we >> want. >> >> For high-bandwidth fully agree. >> >> For low-bandwidth (like the sensors discussed elsewhere in the thread) I >> think running some protocol (mqtt, something json-based, ...) over >> virtio-serial or virtio-vsock isn't too bad. 
>> >> But creating virtio-sensors makes sense too. >> >> > Yeah for performance considerations, sockets seem to be fine. > (for low bandwidth devices that is, just to be clear) > It's just that I've also got concerns on how well sockets work on > non-Linux. > > > > Alternatively emulate something existing, USB Video Class device for >> > > example. >> > >> > Using usb or faking some other transport generally also involves being >> > coupled to that set of kernel code in the guest, which introduces >> > complexity and overhead. Exposing RAM can be a more flexible >> > abstraction. >> >> Well, you get guest drivers for free and all existing software will run >> just fine out-of-the-box. >> >> Which is the reason why there is no virtio-usb for example. xhci has a >> hardware design which can be emulated without much overhead, so the >> performance benefit of virtio-usb over xhci would be pretty close to >> zero. >> >> > That's pretty interesting; from a brief glance at xhci, it looks like what > you're proposing > is that since there usb passthrough already with xhci, > and xhci has something that looks like a virtqueue, > perhaps we can create our own USB driver on the host, > then pass it through. > > It does make things more inconvenient for users though since they would > have to install > usb drivers out of nowhere. Wonder if a usb forwarder virtual device > exists that can talk to some shared library on the host for the xhci queue, > though that might be considered an unsafe plugin. > > >> cheers, >> Gerd >> >> [-- Attachment #2: Type: text/html, Size: 7132 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-20 15:30 ` Frank Yang 2019-02-20 15:35 ` Frank Yang @ 2019-02-21 6:44 ` Gerd Hoffmann 1 sibling, 0 replies; 72+ messages in thread From: Gerd Hoffmann @ 2019-02-21 6:44 UTC (permalink / raw) To: Frank Yang Cc: Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck, Roman Kiryanov, Stefan Hajnoczi, virtio-dev, Greg Hartman > > Hmm. I'm wondering whenever you just need a different virtio transport. > > Very simliar to virtio-pci, but instead of using all guest ram (modulo > > iommu quirks) as address space use a pci memory bar as address space. > > virtio rings would live there, all buffers would live there, addresses > > passed around would be offsets into that pci bar. Then your userspace > > driver can just mmap() that pci bar and handle (almost?) everything on > > its own. Maybe a little stub driver in the kernel is needed for ring > > notifications. > > > > > Yeah, that's (almost?) exactly what we want to do / we're doing already > pretty much > with the memory sharing scheme we're using currently. > > Defining it as a transport would also require definition of another virtio > device type > that the userspace drivers talk to, right? Yes, you still need to define (for example) virtio-sensors. > As a transport/device pair, would it break down as, the transport ends up > restricting what kinds of addresses can be referred to in the ring messages > (in the pci memory bar), > then the device provides the userspace drivers with > implementation of mmap() + memory allocation/sharing ops and notification? I'd expect the userspace driver would just mmap the whole pci bar, then manage it without the kernel's help. > > > Using usb or faking some other transport generally also involves being > > > coupled to that set of kernel code in the guest, which introduces > > > complexity and overhead. Exposing RAM can be a more flexible > > > abstraction. 
> > > > Well, you get guest drivers for free and all existing software will run > > just fine out-of-the-box. > > > > Which is the reason why there is no virtio-usb for example. xhci has a > > hardware design which can be emulated without much overhead, so the > > performance benefit of virtio-usb over xhci would be pretty close to > > zero. > > > > > That's pretty interesting; from a brief glance at xhci, it looks like what > you're proposing > is that since there usb passthrough already with xhci, > and xhci has something that looks like a virtqueue, > perhaps we can create our own USB driver on the host, > then pass it through. > > It does make things more inconvenient for users though since they would > have to install > usb drivers out of nowhere. Wonder if a usb forwarder virtual device exists > that can talk to some shared library on the host for the xhci queue, though > that might be considered an unsafe plugin. Well, we have virtual emulated devices (hw/usb/dev-*.c in source tree). You could add a virtual camera device here ... Alternatively qemu can pass through anything libusb can talk to (typically physical devices, but IIRC you can also create virtual ones with the kernel's usb gadget drivers). Direct access to the xhci queues/rings isn't available; this is abstracted away by the qemu usb subsystem. The virtual usb device emulation drivers get an iov pointing to the buffers in guest memory for a given usb transfer request, no matter whether uhci, ehci or xhci is used. cheers, Gerd --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 15:56 ` Frank Yang 2019-02-12 16:46 ` Dr. David Alan Gilbert @ 2019-02-12 18:22 ` Michael S. Tsirkin 2019-02-12 19:01 ` Frank Yang 1 sibling, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-12 18:22 UTC (permalink / raw) To: Frank Yang Cc: Cornelia Huck, Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman On Tue, Feb 12, 2019 at 07:56:58AM -0800, Frank Yang wrote: > Stepping back to standardization and portability concerns, it is also not > necessarily desirable to use general pipes to do what we want, because even > though that device exists and is part of the spec already, that results in > _de-facto_ non-portability. That's not different from e.g. TCP. > If we had some kind of spec to enumerate such > 'user-defined' devices, at least we can have _de-jure_ non-portability; an > enumerated device doesn't work as advertised. I am not sure distinguishing between different types of non-portability will be in scope for virtio. Actually having devices that are portable would be. ... > Note that virtio-serial/virtio-vsock is not considered because they do not > standardize the set of devices that operate on top of them, but in practice, > are often used for fully general devices. Spec-wise, this is not a great > situation because we would still have potentially non portable device > implementations where there is no standard mechanism to determine whether or > not things are portable. Well it's easy to add an enumeration on top of sockets, and several well known solutions exist. There's an advantage to just reusing these. > virtio-user provides a device enumeration mechanism > to better control this. We'll have to see what it all looks like. For virtio pci transport it's important that you can reason about the device at a basic level based on its PCI ID, and that is quite fundamental. Maybe what you are looking for is a new virtio transport then? 
> In addition, for performance considerations in applications such as graphics > and media, virtio-serial/virtio-vsock have the overhead of sending actual > traffic through the virtqueue, while an approach based on shared memory can > result in having fewer copies and virtqueue messages. virtio-serial is also > limited in being specialized for console forwarding and having a cap on the > number of clients. virtio-vsock is also not optimal in its choice of sockets > API for transport; shared memory cannot be used, arbitrary strings can be > passed without an designation of the device/driver being run de-facto, and the > guest must have additional machinery to handle socket APIs. In addition, on > the host, sockets are only dependable on Linux, with less predictable behavior > from Windows/macOS regarding Unix sockets. Waiting for socket traffic on the > host also requires a poll() loop, which is suboptimal for latency. With > virtio-user, only the bare set of standard driver calls > (open/close/ioctl/mmap/read) is needed, and RAM is a more universal transport > abstraction. We also explicitly spec out callbacks on host that are triggered > by virtqueue messages, which results in lower latency and makes it easy to > dispatch to a particular device implementation without polling. open/close/mmap/read seem to make sense. ioctl gives one pause. Given open/close this begins to look a bit like virtio-fs. Have you looked at that? -- MST --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 18:22 ` Michael S. Tsirkin @ 2019-02-12 19:01 ` Frank Yang 2019-02-12 19:15 ` Michael S. Tsirkin 0 siblings, 1 reply; 72+ messages in thread From: Frank Yang @ 2019-02-12 19:01 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Cornelia Huck, Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman [-- Attachment #1: Type: text/plain, Size: 5255 bytes --] On Tue, Feb 12, 2019 at 10:22 AM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, Feb 12, 2019 at 07:56:58AM -0800, Frank Yang wrote: > > Stepping back to standardization and portability concerns, it is also not > > necessarily desirable to use general pipes to do what we want, because > even > > though that device exists and is part of the spec already, that results > in > > _de-facto_ non-portability. > > That's not different from e.g. TCP. > > > If we had some kind of spec to enumerate such > > 'user-defined' devices, at least we can have _de-jure_ non-portability; > an > > enumerated device doesn't work as advertised. > > I am not sure distinguishing between different types of non portability > will be in scope for virtio. Actually having devices that are portable > would be. > > The device itself is portable; the user-defined drivers that run on them will work or not depending on negotiating device IDs. > ... > > Note that virtio-serial/virtio-vsock is not considered because they do not > > standardize the set of devices that operate on top of them, but in > practice, > > are often used for fully general devices. Spec-wise, this is not a great > > situation because we would still have potentially non portable device > > implementations where there is no standard mechanism to determine > whether or > > not things are portable. > > Well it's easy to add an enumeration on top of sockets, and several well > known solutions exist. There's an advantage to just reusing these. 
Sure, but there are many unique features/desirable properties to having the virtio meta device, because (as explained in the spec) there are limitations to network/socket based communication. > > virtio-user provides a device enumeration mechanism > > to better control this. > > We'll have to see what it all looks like. For virtio pci transport it's > important that you can reason about the device at a basic level based on > it's PCI ID, and that is quite fundamental. > > The spec contains more details; basically the device itself is always portable, and there is a configuration protocol to negotiate whether a particular use of the device is available. This is similar to PCI, but with more defined ways to operate the device in terms of callbacks in shared libraries on the host. > Maybe what you are looking for is a new virtio transport then? > > Perhaps something like a virtio host memory transport? But at the same time, it needs to interact with shared memory, which is best exposed as a PCI device. Can we mix transport types? In any case, the analogs of "PCI ID"s here (the vendor/device/version numbers) are meaningful, with the contract being that the user of the device needs to match on vendor/device id and negotiate on version number. What are the advantages of defining a new virtio transport type? It would be something that has the IDs and is able to handle resolving offsets or physical addresses to host memory addresses, in addition to dispatching to callbacks on the host. But it would be effectively equivalent to having a new virtio device type with device ID enumeration, right? > > In addition, for performance considerations in applications such as > graphics > > and media, virtio-serial/virtio-vsock have the overhead of sending actual > > traffic through the virtqueue, while an approach based on shared memory > can > > result in having fewer copies and virtqueue messages. 
virtio-serial is > also > > limited in being specialized for console forwarding and having a cap on > the > > number of clients. virtio-vsock is also not optimal in its choice of > sockets > > API for transport; shared memory cannot be used, arbitrary strings can be > > passed without an designation of the device/driver being run de-facto, > and the > > guest must have additional machinery to handle socket APIs. In > addition, on > > the host, sockets are only dependable on Linux, with less predictable > behavior > > from Windows/macOS regarding Unix sockets. Waiting for socket traffic > on the > > host also requires a poll() loop, which is suboptimal for latency. With > > virtio-user, only the bare set of standard driver calls > > (open/close/ioctl/mmap/read) is needed, and RAM is a more universal > transport > > abstraction. We also explicitly spec out callbacks on host that are > triggered > > by virtqueue messages, which results in lower latency and makes it easy > to > > dispatch to a particular device implementation without polling. > > open/close/mmap/read seem to make sense. ioctl gives one pause. > > ioctl would be to send ping messages, but I'm not fixated on that choice. write() is also a possibility to send ping messages; I preferred ioctl() because it should be clear that it's a control message not a data message. > Given open/close this begins to look a bit like virtio-fs. > Have you looked at that? > > That's an interesting possibility since virtio-fs maps host pointers as well, which fits our use cases. Another alternative is to add the features unique about virtio-user to virtio-fs: device enumeration, memory sharing operations, operation in terms of callbacks on the host. However, it doesn't seem like a good fit due to being specialized to filesystem operations. > -- > MST > [-- Attachment #2: Type: text/html, Size: 7219 bytes --] ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [virtio-dev] Memory sharing device 2019-02-12 19:01 ` Frank Yang @ 2019-02-12 19:15 ` Michael S. Tsirkin 2019-02-12 20:15 ` Frank Yang 0 siblings, 1 reply; 72+ messages in thread From: Michael S. Tsirkin @ 2019-02-12 19:15 UTC (permalink / raw) To: Frank Yang Cc: Cornelia Huck, Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman On Tue, Feb 12, 2019 at 11:01:21AM -0800, Frank Yang wrote: > > > > On Tue, Feb 12, 2019 at 10:22 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Tue, Feb 12, 2019 at 07:56:58AM -0800, Frank Yang wrote: > > Stepping back to standardization and portability concerns, it is also not > > necessarily desirable to use general pipes to do what we want, because > even > > though that device exists and is part of the spec already, that results > in > > _de-facto_ non-portability. > > That's not different from e.g. TCP. > > > If we had some kind of spec to enumerate such > > 'user-defined' devices, at least we can have _de-jure_ non-portability; > an > > enumerated device doesn't work as advertised. > > I am not sure distinguishing between different types of non portability > will be in scope for virtio. Actually having devices that are portable > would be. > > > The device itself is portable; the user-defined drivers that run on them will > work or not depending on > negotiating device IDs. > > ... > > > Note that virtio-serial/virtio-vsock is not considered because they do > not > > standardize the set of devices that operate on top of them, but in > practice, > > are often used for fully general devices. Spec-wise, this is not a great > > situation because we would still have potentially non portable device > > implementations where there is no standard mechanism to determine whether > or > > not things are portable. > > Well it's easy to add an enumeration on top of sockets, and several well > known solutions exist. There's an advantage to just reusing these. 
> Sure, but there are many unique features/desirable properties of having the
> virtio meta device, because (as explained in the spec) there are limitations
> to network/socket-based communication.
>
> > > virtio-user provides a device enumeration mechanism
> > > to better control this.
> >
> > We'll have to see what it all looks like. For virtio pci transport it's
> > important that you can reason about the device at a basic level based on
> > its PCI ID, and that is quite fundamental.
>
> The spec contains more details; basically the device itself is always
> portable, and there is a configuration protocol to negotiate whether a
> particular use of the device is available. This is similar to PCI, but with
> more defined ways to operate the device in terms of callbacks in shared
> libraries on the host.
>
> > Maybe what you are looking for is a new virtio transport then?
>
> Perhaps, something like virtio host memory transport? But at the same time,
> it needs to interact with shared memory, which is best set up as a PCI
> device. Can we mix transport types? In any case, the analog of "PCI ID"s
> here (the vendor/device/version numbers) are meaningful, with the contract
> being that the user of the device needs to match on vendor/device id and
> negotiate on version number.

Virtio is fundamentally using feature bits, not versions. It's been pretty
successful in maintaining compatibility across a wide range of
hypervisor/guest revisions.

> > What are the advantages of defining a new virtio transport type?
>
> It would be something that has the IDs, and be able to handle resolving
> offsets or physical addresses to host memory addresses, in addition to
> dispatching to callbacks on the host. But it would be effectively
> equivalent to having a new virtio device type with device ID enumeration,
> right?

Under virtio PCI, Device IDs are all defined in the virtio spec. If you want
your own ID scheme, you want an alternative transport. But now what you
describe looks kind of like vhost pci to me.

> > > In addition, for performance considerations in applications such as
> > > graphics and media, virtio-serial/virtio-vsock have the overhead of
> > > sending actual traffic through the virtqueue, while an approach based
> > > on shared memory can result in having fewer copies and virtqueue
> > > messages. virtio-serial is also limited in being specialized for
> > > console forwarding and having a cap on the number of clients.
> > > virtio-vsock is also not optimal in its choice of sockets API for
> > > transport; shared memory cannot be used, arbitrary strings can be
> > > passed without a designation of the device/driver being run de-facto,
> > > and the guest must have additional machinery to handle socket APIs. In
> > > addition, on the host, sockets are only dependable on Linux, with less
> > > predictable behavior from Windows/macOS regarding Unix sockets.
> > > Waiting for socket traffic on the host also requires a poll() loop,
> > > which is suboptimal for latency. With virtio-user, only the bare set
> > > of standard driver calls (open/close/ioctl/mmap/read) is needed, and
> > > RAM is a more universal transport abstraction. We also explicitly spec
> > > out callbacks on the host that are triggered by virtqueue messages,
> > > which results in lower latency and makes it easy to dispatch to a
> > > particular device implementation without polling.
> >
> > open/close/mmap/read seem to make sense. ioctl gives one pause.
>
> ioctl would be to send ping messages, but I'm not fixated on that choice.
> write() is also a possibility to send ping messages; I preferred ioctl()
> because it should be clear that it's a control message, not a data message.

Yes, if the supported ioctls are white-listed and not blindly passed through
(e.g. send a ping message), then it does not matter.

> > Given open/close this begins to look a bit like virtio-fs.
> > Have you looked at that?
>
> That's an interesting possibility since virtio-fs maps host pointers as
> well, which fits our use cases.
> Another alternative is to add the features unique to virtio-user to
> virtio-fs: device enumeration, memory sharing operations, operation in
> terms of callbacks on the host.
> However, it doesn't seem like a good fit due to being specialized to
> filesystem operations.

Well, everything is a file :)

-- 
MST
* Re: [virtio-dev] Memory sharing device
2019-02-12 19:15 ` Michael S. Tsirkin
@ 2019-02-12 20:15 ` Frank Yang
0 siblings, 0 replies; 72+ messages in thread
From: Frank Yang @ 2019-02-12 20:15 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Cornelia Huck, Dr. David Alan Gilbert, Roman Kiryanov, Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Greg Hartman

On Tue, Feb 12, 2019 at 11:15 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Feb 12, 2019 at 11:01:21AM -0800, Frank Yang wrote:
> > > > Stepping back to standardization and portability concerns, it is
> > > > also not necessarily desirable to use general pipes to do what we
> > > > want, because even though that device exists and is part of the spec
> > > > already, that results in de-facto non-portability.
> > >
> > > That's not different from e.g. TCP.
> > >
> > > > If we had some kind of spec to enumerate such 'user-defined'
> > > > devices, at least we can have de-jure non-portability; an enumerated
> > > > device doesn't work as advertised.
> > >
> > > I am not sure distinguishing between different types of
> > > non-portability will be in scope for virtio. Actually having devices
> > > that are portable would be.
> >
> > The device itself is portable; the user-defined drivers that run on
> > them will work or not depending on negotiating device IDs.
> >
> > ...
> >
> > > > Note that virtio-serial/virtio-vsock is not considered because they
> > > > do not standardize the set of devices that operate on top of them,
> > > > but in practice, are often used for fully general devices.
> > > > Spec-wise, this is not a great situation because we would still have
> > > > potentially non-portable device implementations where there is no
> > > > standard mechanism to determine whether or not things are portable.
> > >
> > > Well, it's easy to add an enumeration on top of sockets, and several
> > > well known solutions exist. There's an advantage to just reusing
> > > these.
[...]
> Virtio is fundamentally using feature bits, not versions.
> It's been pretty successful in maintaining compatibility
> across a wide range of hypervisor/guest revisions.

The proposed device should be compatible with all hypervisor/guest revisions
as is, without feature bits.

[...]
> Under virtio PCI, Device IDs are all defined in the virtio spec.
> If you want your own ID scheme, you want an alternative transport.
> But now what you describe looks kind of like vhost pci to me.

Yet, vhost is still not a good fit, due to its reliance on sockets/network
functionality. It looks like we want to build something that has aspects in
common with other virtio devices, but no combination of existing virtio
devices works:

- the locality of virtio-fs, but not specialized to fuse
- the device id enumeration from virtio pci, but not in the pci id space
- the general communication of virtio-vsock, but not specialized to sockets,
  and allowing host memory-based transport

So, I still think it should be a new device (or transport, if that works
better).

[...]
> Yes, if the supported ioctls are white-listed and not blindly passed
> through (e.g. send a ping message), then it does not matter.

There would be one whitelisted ioctl: IOCTL_PING, with a struct close to
what is specified in the proposed spec, with the instance handle field
populated by the kernel.

[...]
> > That's an interesting possibility since virtio-fs maps host pointers
> > as well, which fits our use cases.
> > However, it doesn't seem like a good fit due to being specialized to
> > filesystem operations.
>
> Well, everything is a file :)

Not according to virtio-fs; it specializes in fuse support in the guest
kernel. However, it would work for me if the features unique to virtio-user
were added to it, dropping the requirement to use fuse.
* Re: [virtio-dev] Memory sharing device
2019-02-12 8:27 ` Roman Kiryanov
2019-02-12 11:25 ` Dr. David Alan Gilbert
@ 2019-02-12 13:00 ` Michael S. Tsirkin
1 sibling, 0 replies; 72+ messages in thread
From: Michael S. Tsirkin @ 2019-02-12 13:00 UTC (permalink / raw)
To: Roman Kiryanov
Cc: Gerd Hoffmann, Stefan Hajnoczi, virtio-dev, Dr. David Alan Gilbert, Lingfeng Yang

On Tue, Feb 12, 2019 at 12:27:54AM -0800, Roman Kiryanov wrote:
> > > Our long term goal is to have as few kernel drivers as possible and
> > > to move "drivers" into userspace. If we go with the virtqueues, is
> > > there a general purpose device/driver to talk between our host and
> > > guest to support custom hardware (with own blobs)?
> >
> > The challenge is to answer the following question:
> > how to do this without losing the benefits of standardization?
>
> We looked into UIO and it still requires some kernel driver to tell
> where the device is; it also has limitations on sharing a device between
> processes. The benefit of standardization could be in avoiding everybody
> writing their own UIO drivers for virtual devices.
>
> Our emulator uses a battery, sound, accelerometer and more. We need to
> support all of this. I looked into the spec, "5 Device types", and it
> seems "battery" is not there. We can invent our own drivers but we see
> having one flexible driver as a better idea.

So it sounds like you should use virtio-vsock or a serial device for most
of your needs. For gpu, I'd use virtio-gpu, probably improving it, as IIUC
you have concerns about resource management.

> Yes, I realize that a guest could think it is using the same device as
> the host advertised (because strings matched) while it is not. We
> control both the host and the guest and we can live with this.
>
> Regards,
> Roman.

I suggest you layer on top of some other existing device then. Most people
don't build their own transport layer on a whim just because they control
both communicating sides. There should be more of a reason for this.

-- 
MST
end of thread, other threads: [~2019-02-24 21:19 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-01 20:34 [virtio-dev] Memory sharing device Roman Kiryanov
2019-02-04  5:40 ` Stefan Hajnoczi
2019-02-04 10:13 ` Gerd Hoffmann
2019-02-04 10:18 ` Roman Kiryanov
2019-02-05  7:42 ` Roman Kiryanov
2019-02-05 10:04 ` Dr. David Alan Gilbert
2019-02-05 15:17 ` Frank Yang
2019-02-05 15:21 ` Frank Yang
2019-02-05 21:06 ` Roman Kiryanov
2019-02-06  7:03 ` Gerd Hoffmann
2019-02-06 15:09 ` Frank Yang
2019-02-06 15:11 ` Frank Yang
2019-02-08  7:57 ` Stefan Hajnoczi
2019-02-08 14:46 ` Frank Yang
2019-02-06 20:14 ` Dr. David Alan Gilbert
2019-02-06 20:27 ` Frank Yang
2019-02-07 12:10 ` Cornelia Huck
2019-02-11 14:49 ` Michael S. Tsirkin
2019-02-11 15:14 ` Frank Yang
2019-02-11 15:25 ` Frank Yang
2019-02-12 13:01 ` Michael S. Tsirkin
2019-02-12 13:16 ` Dr. David Alan Gilbert
2019-02-12 13:27 ` Michael S. Tsirkin
2019-02-12 16:17 ` Frank Yang
2019-02-19  7:17 ` Gerd Hoffmann
2019-02-19 15:59 ` Frank Yang
2019-02-20  6:51 ` Gerd Hoffmann
2019-02-20 15:31 ` Frank Yang
2019-02-21  6:55 ` Gerd Hoffmann
2019-02-19  7:12 ` Gerd Hoffmann
2019-02-19 16:02 ` Frank Yang
2019-02-20  7:02 ` Gerd Hoffmann
2019-02-20 15:32 ` Frank Yang
2019-02-21  7:29 ` Gerd Hoffmann
2019-02-21  9:24 ` Dr. David Alan Gilbert
2019-02-21  9:59 ` Gerd Hoffmann
2019-02-21 10:03 ` Dr. David Alan Gilbert
2019-02-22  6:15 ` Michael S. Tsirkin
2019-02-22  6:42 ` Gerd Hoffmann
2019-02-11 16:57 ` Michael S. Tsirkin
2019-02-12  8:27 ` Roman Kiryanov
2019-02-12 11:25 ` Dr. David Alan Gilbert
2019-02-12 13:47 ` Cornelia Huck
2019-02-12 14:03 ` Michael S. Tsirkin
2019-02-12 15:56 ` Frank Yang
2019-02-12 16:46 ` Dr. David Alan Gilbert
2019-02-12 17:20 ` Frank Yang
2019-02-12 17:26 ` Frank Yang
2019-02-12 19:06 ` Michael S. Tsirkin
2019-02-13  2:50 ` Frank Yang
2019-02-13  4:02 ` Michael S. Tsirkin
2019-02-13  4:19 ` Michael S. Tsirkin
2019-02-13  4:59 ` Frank Yang
2019-02-13 18:18 ` Frank Yang
2019-02-14  7:15 ` Frank Yang
2019-02-22 22:05 ` Michael S. Tsirkin
2019-02-24 21:19 ` Frank Yang
2019-02-13  4:59 ` Frank Yang
2019-02-19  7:54 ` Gerd Hoffmann
2019-02-19 15:54 ` Frank Yang
2019-02-20  3:46 ` Michael S. Tsirkin
2019-02-20 15:24 ` Frank Yang
2019-02-20 19:29 ` Michael S. Tsirkin
2019-02-20  6:25 ` Gerd Hoffmann
2019-02-20 15:30 ` Frank Yang
2019-02-20 15:35 ` Frank Yang
2019-02-21  6:44 ` Gerd Hoffmann
2019-02-12 18:22 ` Michael S. Tsirkin
2019-02-12 19:01 ` Frank Yang
2019-02-12 19:15 ` Michael S. Tsirkin
2019-02-12 20:15 ` Frank Yang
2019-02-12 13:00 ` Michael S. Tsirkin