* [virtio-comment] Seeking guidance for custom virtIO device
@ 2020-04-10  9:36 Eftime, Petre
  2020-04-10 10:09 ` Stefano Garzarella
  2020-04-13 11:46 ` Michael S. Tsirkin
  0 siblings, 2 replies; 13+ messages in thread
From: Eftime, Petre @ 2020-04-10  9:36 UTC (permalink / raw)
  To: virtio-comment

Hi all,

I am looking for guidance on how to proceed with regard to either reserving a virtio device ID for a device used in a particular use case, or formalizing a device type that could potentially be used by others.

We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
Our requirements are:
* multiple clients in the guest (support for multiple servers is not required)
* provide an in-order, reliable datagram transport mechanism
* datagram size should be either negotiable or large (16k-64k?)
* performance is not a big concern for our use case

The reasons why we used a special device and not something else are the following:
* vsock spec does not contain a datagram specification (e.g. SOCK_DGRAM, SOCK_SEQPACKET), and the effort of updating the Linux driver and other implementations for this particular purpose seemed relatively high; the path to approaching this problem wasn't clear. Vsock today only works in SOCK_STREAM mode, which is not ideal: the receiver must implement additional state and buffer incoming data, adding complexity and host resource usage.
* rpmsg has some implementation limitations (e.g. from what I understand, Linux can have at most 512 bytes per message), and the programming model seems complicated and doesn't necessarily fit our needs, since it's fully callback-based and would require additional complex logic to make sure it behaves well for our use case (at least in Linux). I am also not entirely convinced you can implement endpoints that cross VM boundaries (e.g. guest <-> host).
* virtio-serial is not datagram based (and therefore has issues similar to vsock in stream mode).

For our device, we decided to:
* use a single virtio queue
* each element in the virtio queue contains two buffers: the first one is the request and the second one is the reply (similar to a virtio-block device)
* the guest driver itself doesn't need to understand the requests or replies; everything is handled between guest userspace and host userspace
* the guest API passes both a send buffer and a receive buffer, and calls are blocking; if you need concurrency, you can create multiple threads (a usage sketch follows below)
* since there is only one endpoint, no source or destination is added to the packets.
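
To make the model concrete, here is a rough sketch of what a blocking guest-side call could look like. All names here (the /dev/example-transport node, struct xfer, the EXAMPLE_XFER ioctl) are hypothetical and invented purely for illustration; they are not our actual driver or library API.

/* Hypothetical userspace usage sketch: one blocking call submits a
 * request buffer and a reply buffer, mirroring the request/reply
 * descriptor pair placed on the virtqueue. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <unistd.h>

struct xfer {
    const void *req;        /* request buffer (device-readable) */
    uint32_t    req_len;
    void       *resp;       /* reply buffer (device-writable) */
    uint32_t    resp_len;
};

#define EXAMPLE_XFER _IOWR('x', 1, struct xfer)   /* made-up ioctl number */

int main(void)
{
    char req[] = "api-call-payload";
    char resp[16384];                  /* datagram-sized reply buffer */
    struct xfer x = { req, sizeof(req), resp, sizeof(resp) };

    int fd = open("/dev/example-transport", O_RDWR); /* hypothetical node */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (ioctl(fd, EXAMPLE_XFER, &x) < 0)   /* blocks until the reply lands */
        perror("xfer");
    else
        printf("got %u reply bytes\n", x.resp_len);
    close(fd);
    return 0;
}

The point is only that the caller owns both buffers for the duration of the call, so the device always has somewhere to put the reply.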

Is there anyone working on a device that solves a similar problem to this one, which we might reuse or collaborate on? I could potentially see this type of device being useful to others, maybe with some minor adjustments.
If not, what is the path to having our device live somewhere in the virtio world? If there is no reason to include our device in the spec proper, we at least want to reserve an identifier for it, to prevent accidental collisions with other devices. I see that there are other devices listed in the spec which are rather specialized and not yet formalized, but I don't fully grasp what steps I need to take in order to get an ID. Would submitting a patch with the ID be enough?

Thank you,
Petre Eftime





Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.


* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-10  9:36 [virtio-comment] Seeking guidance for custom virtIO device Eftime, Petre
@ 2020-04-10 10:09 ` Stefano Garzarella
  2020-04-14 10:50   ` Stefan Hajnoczi
  2020-04-13 11:46 ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2020-04-10 10:09 UTC (permalink / raw)
  To: Eftime, Petre; +Cc: virtio-comment

Hi,

On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> Hi all,
> 
> I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
> 
> We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
> Our requirements are:
> * multiple clients in the guest (multiple servers is not required)
> * provide an in-order, reliable datagram transport mechanism
> * datagram size should be either negotiable or large (16k-64k?)
> * performance is not a big concern for our usecase

It looks really close to vsock.

> 
> The reason why we used a special device and not something else is the following:
> * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.

AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
this feature. (vmci provides SOCK_DGRAM support)

The changes to the virtio-vsock spec and implementation should not be
too intrusive; we already have the "type" field in the packet header
to address this new feature.

We also have the credit mechanism to provide in-order and reliable
packet delivery.
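
For reference, the packet header already carries both the "type" field
and the credit fields. Roughly (following the layout in the Linux uapi
header <linux/virtio_vsock.h>, reproduced here only as a sketch; please
double-check against the spec):

#include <linux/types.h>

struct virtio_vsock_hdr {
    __le64 src_cid;
    __le64 dst_cid;
    __le32 src_port;
    __le32 dst_port;
    __le32 len;
    __le16 type;       /* stream today; a datagram type would be a new value */
    __le16 op;
    __le32 flags;
    __le32 buf_alloc;  /* receive buffer space advertised to the peer */
    __le32 fwd_cnt;    /* bytes consumed, used by the credit mechanism */
} __attribute__((packed));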

Maybe the hardest part will be changing something in the core to handle
multiple transports that provide SOCK_DGRAM, for nested VMs.
We already did this for stream sockets, but we haven't handled datagram
sockets yet.

I am not sure how convenient it is to have two very similar devices...

If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
give you a more complete list of changes to make. :-)

Cheers,
Stefano


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-10  9:36 [virtio-comment] Seeking guidance for custom virtIO device Eftime, Petre
  2020-04-10 10:09 ` Stefano Garzarella
@ 2020-04-13 11:46 ` Michael S. Tsirkin
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2020-04-13 11:46 UTC (permalink / raw)
  To: Eftime, Petre; +Cc: virtio-comment

On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> If not, what is the path to having our device live somewhere in the
> virtio world?

Basically submit a spec patch, address review comments, eventually
ask for a TC vote to include it. Reserving an ID is a helpful
first step for many people since that lets you write
guest and host patches.

> If there is no reason to include our device in the spec
> proper, we at least want to reserve an identifier for it, to prevent
> accidental collisions with other devices. I see that there are other
> devices which are rather specialized and not yet formalized listed in
> the spec, but I don't fully graps what steps I need to take in order
> to get an ID. Would submitting a patch with the ID be enough?

Almost.
https://github.com/oasis-tcs/virtio-spec#use-of-github-issues
see text beginning with:
"To request a TC vote on resolving a specific comment:"

https://github.com/oasis-tcs/virtio-spec#providing-feedback
is also helpful.

-- 
MST



* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-10 10:09 ` Stefano Garzarella
@ 2020-04-14 10:50   ` Stefan Hajnoczi
  2020-04-15 11:23     ` Eftime, Petre
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2020-04-14 10:50 UTC (permalink / raw)
  To: Eftime, Petre; +Cc: sgarzare, virtio-comment


On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
> Hi,
> 
> On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> > Hi all,
> > 
> > I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
> > 
> > We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
> > Our requirements are:
> > * multiple clients in the guest (multiple servers is not required)
> > * provide an in-order, reliable datagram transport mechanism
> > * datagram size should be either negotiable or large (16k-64k?)
> > * performance is not a big concern for our usecase
> 
> It looks really close to vsock.
> 
> > 
> > The reason why we used a special device and not something else is the following:
> > * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.
> 
> AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
> this feature. (vmci provides SOCK_DGRAM support)
> 
> The changes should not be too intrusive in the virtio-vsock specs and
> implementation, we already have the "type" field in the packet header
> to address this new feature.
> 
> We also have the credit-mechanism to provide in-order and reliable
> packets delivery.
> 
> Maybe the hardest part could be change something in the core to handle
> multiple transports that provide SOCK_DGRAM, for nested VMs.
> We already did for stream sockets, but we didn't handle the datagram
> socket for now.
> 
> I am not sure how convenient it is to have two very similar devices...
> 
> If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
> give you a more complete list of changes to make. :-)

I also think this sounds exactly like adding SOCK_DGRAM support to
virtio-vsock.

The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
patches is that the protocol design didn't ensure reliable delivery
semantics.  At that time there were no real users for SOCK_DGRAM so it
was left as a feature to be added later.

The challenge with reusing the SOCK_STREAM credit mechanism for
SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
consists of per-connection state.  Maybe it can be extended to cover
SOCK_DGRAM too.

I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
create another device that does basically what is within the scope of
virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
support into various software components, and doing that again for
another device is more effort than one would think.

If you don't want to modify the Linux guest driver, then let's just
discuss the device spec and protocol.  Someone else could make the Linux
driver changes.

Stefan



* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-14 10:50   ` Stefan Hajnoczi
@ 2020-04-15 11:23     ` Eftime, Petre
  2020-04-17 10:33       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Eftime, Petre @ 2020-04-15 11:23 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: sgarzare, graf, virtio-comment


On 2020-04-14 13:50, Stefan Hajnoczi wrote:
> On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
>> [...]
> I although think this sounds exactly like adding SOCK_DGRAM support to
> virtio-vsock.
>
> The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
> patches is that the prototocol design didn't ensure reliable delivery
> semantics.  At that time there were no real users for SOCK_DGRAM so it
> was left as a feature to be added later.
>
> The challenge with reusing the SOCK_STREAM credit mechanism for
> SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
> consists per-connection state.  Maybe it can be extended to cover
> SOCK_DGRAM too.
>
> I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
> create another device that does basically what is within the scope of
> virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
> support into various software components, and doing that again for
> another device is more effort than one would think.
>
> If you don't want to modify the Linux guest driver, then let's just
> discuss the device spec and protocol.  Someone else could make the Linux
> driver changes.
>
> Stefan


I think it would be great if we could get the virtio-vsock driver to 
support SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.


But one of the reasons I don't really like virtio-vsock at the moment
for my use case in particular is that it doesn't seem well suited to
supporting non-cooperating, live-migratable VMs.  One problem is that,
to avoid guest-visible disconnections to any service while doing a live
migration, there might be a performance impact if vsock is also used
for other purposes.

I'll try to exemplify what I mean with this setup:

     * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM

     * workload 2 sends commands / gets replies once in a while via an 
AF_VSOCK SOCK_SEQPACKET.

Assume the VM needs to be migrated:

         1) If workload 2 is currently not processing anything, then even
if there are some commands for it queued up, everything is fine: the VMM
can pause the guest and serialize.

         2) If there's an outstanding command, the VMM needs to wait for
it to finish and for the receive queue of the request to have enough
capacity for the reply, but since this capacity is guest-driven, this
second part can take a while / forever. This is definitely not ideal.


To fix this issue, the VMM can keep any non-finishable command in the
queue until it can actually finish it properly; that is, it won't remove
the command from the queue until it can push the reply back as well, and
if it needs to migrate, it can restart servicing the commands from the
queue. But this has a big impact on workload 1, which can be blocked
from making meaningful progress by workload 2.

I could potentially solve some of the issues above by playing around
with the credit system, but it seems like it could be challenging to do
well: if the device advertises 32k of buffer space, the guest could
place two 16k commands before the device can tell it that there is
actually 0k of space now. This also doesn't scale all that well: credits
are per stream, so once the device gets one command from one stream, it
needs to tell all the other streams to back off, and when it can handle
another command, it would need to advertise back to all of them that
everything is fine, consuming available capacity.


In short, I think workload 2 needs to be in control of its own queues
for this to work reasonably well, and I don't know whether sharing
ownership of queues can work. The device we defined doesn't have this
problem: first of all, it's on a separate queue, so workload 1 never
competes in any way with workload 2, and workload 2 always has somewhere
to place replies, since it has an attached reply buffer by design.


Perhaps a good compromise would be to have a multi-queue virtio-vsock,
or to allow AF_VSOCK to be backed by more than one device with some
minimal routing on a per-destination basis?


Best,

Petre Eftime





* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-15 11:23     ` Eftime, Petre
@ 2020-04-17 10:33       ` Stefan Hajnoczi
       [not found]         ` <15906829-2e85-ed4c-7b06-431d6e856ae9@amazon.de>
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2020-04-17 10:33 UTC (permalink / raw)
  To: Eftime, Petre; +Cc: sgarzare, graf, virtio-comment


On Wed, Apr 15, 2020 at 02:23:48PM +0300, Eftime, Petre wrote:
> 
> On 2020-04-14 13:50, Stefan Hajnoczi wrote:
> > [...]
> 
> 
> I think it would be great if we could get the virtio-vsock driver to support
> SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.
> 
> 
> But one of the reasons that I don't really like virtio-vsock at the moment
> for my use-case in particular is that it doesn't seem well fitted to support
> non-cooperating live-migrateable VMs all that well.  One problem is that to
> avoid guest-visible disconnections to any service while doing a live
> migration there might be performance impact if using vsock for any other
> reasons.
> 
> I'll try to exemplify what I mean with this setup:
> 
>     * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM
> 
>     * workload 2 sends commands / gets replies once in a while via an
> AF_VSOCK SOCK_SEQPACKET.

af_vsock.ko doesn't support SOCK_SEQPACKET.  Is this what you are
considering adding?

Earlier in this thread I thought we were discussing SOCK_DGRAM, which
has different semantics than SOCK_SEQPACKET.

The good news is that SOCK_SEQPACKET should be easier to add to
net/vmw_vsock than SOCK_DGRAM because the flow control credit mechanism
used for SOCK_STREAM should just work for SOCK_SEQPACKET.
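
From the application's point of view, the end state would look roughly
like the sketch below. SOCK_SEQPACKET on AF_VSOCK does not exist yet, so
this only illustrates the intended semantics; the port number is
arbitrary, and sockaddr_vm comes from <linux/vm_sockets.h>.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

int main(void)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid    = VMADDR_CID_HOST,   /* talk to the host side */
        .svm_port   = 1234,              /* example port */
    };
    char reply[16384];

    int fd = socket(AF_VSOCK, SOCK_SEQPACKET, 0);   /* would fail today */
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("vsock seqpacket");
        return 1;
    }
    /* Message boundaries are preserved and delivery is in order. */
    send(fd, "request", 7, 0);
    ssize_t n = recv(fd, reply, sizeof(reply), 0);
    printf("received %zd bytes\n", n);
    close(fd);
    return 0;
}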

> 
> Assume the VM needs to be migrated:
> 
>         1) If workload 2 currently not processing anything, even if there
> are some commands for it queued up, everything is fine, VMM can pause the
> guest and serialize.
> 
>         2) If there's an outstanding command the VMM needs to wait for it to
> finish and wait for the receive queue of the request to have enough capacity
> for the reply, but since this capacity is guest driven, this second part can
> take a while / forever. This is definitely not ideal.

I think you're describing how to reserve space for control packets so
that the device never has to wait on the driver.

Have you seen the drivers/vhost/vsock.c device implementation?  It has a
strategy for suspending tx queue processing until the rx queue has more
space.  Multiple implementation-specific approaches are possible, so
this isn't in the specification.
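
The idea, in a very simplified form (a conceptual sketch only, modelling
the strategy described above rather than the actual drivers/vhost/vsock.c
code):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct vq_state {
    size_t avail;   /* buffers the driver has made available */
    size_t used;    /* buffers the device has consumed or filled */
};

static bool rx_has_space(const struct vq_state *rx)
{
    return rx->avail > rx->used;     /* at least one free rx buffer */
}

/* Returns the number of tx buffers processed before backing off. */
static size_t process_tx(struct vq_state *tx, struct vq_state *rx)
{
    size_t done = 0;

    while (tx->used < tx->avail) {
        if (!rx_has_space(rx))
            break;                   /* suspend tx; retry on the next rx kick */
        tx->used++;                  /* consume one request ...             */
        rx->used++;                  /* ... and reserve room for its reply  */
        done++;
    }
    return done;
}

int main(void)
{
    struct vq_state tx = { .avail = 8, .used = 0 };
    struct vq_state rx = { .avail = 2, .used = 0 };

    /* Only 2 of the 8 pending requests are processed before tx backs off. */
    printf("processed %zu tx buffers\n", process_tx(&tx, &rx));
    return 0;
}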

> I short, I think workload 2 needs to be in control of its own queues for
> this to work reasonably well, I don't know if sharing ownership of queues
> can work. The device we defined doesn't have this problem: first of all,
> it's on a separate queue, so workload 1 never competes in any way with
> workload 2, and workload 2 always has where to place replies, since it has
> an attached reply buffer by design.

Flow control in vsock works like this:

1. Data packets are accounted against per-socket buffers and removed
   from the virtqueue immediately.  This allows multiple competing data
   streams to share a single virtqueue without starvation.  It's the
   per-socket buffer that can be exhausted, but that only affects the
   application that isn't reading the socket.  The other side
   will stop sending more data when credit is exhausted so that delivery
   can be guaranteed.

2. Control packet replies can be sent in response to pretty much any
   packet.  Therefore, it's necessary to suspend packet processing when
   the other side's virtqueue is full.  This way you don't need to wait
   for them midway through processing a packet.

There is a problem with #2 which hasn't been solved.  If both sides are
operating at N-1 queue capacity (they are almost exhausted), then can we
reach a deadlock where both sides suspend queue processing because they
are waiting for the other side?  This has not been fully investigated or
demonstrated, but it's an area that needs attention sometime.
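
As a side note, the per-socket accounting in point 1 boils down to a
small amount of per-connection state. A minimal sketch of the arithmetic
(field names follow the spec: buf_alloc and fwd_cnt travel in the packet
header, tx_cnt is tracked locally by the sender):

#include <stdint.h>
#include <stdio.h>

struct vsock_credit {
    uint32_t peer_buf_alloc;  /* receive buffer the peer reserved for us */
    uint32_t peer_fwd_cnt;    /* bytes the peer has already consumed */
    uint32_t tx_cnt;          /* bytes we have sent on this connection */
};

/* Bytes we may still send without risking a drop on the receive side. */
static uint32_t credit_available(const struct vsock_credit *c)
{
    uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;  /* wraps safely */
    return c->peer_buf_alloc - in_flight;
}

int main(void)
{
    struct vsock_credit c = { .peer_buf_alloc = 65536,
                              .peer_fwd_cnt = 4096,
                              .tx_cnt = 16384 };
    /* 65536 - (16384 - 4096) = 53248 bytes may still be sent. */
    printf("credit: %u bytes\n", credit_available(&c));
    return 0;
}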

> Perhaps a good compromise would be to have a multi-queue virtio-vsock or

That would mean we've reached the conclusion that it's impossible to
have bi-directional communication with guaranteed delivery over a shared
communications channel.

virtio-serial did this to avoid having to come up with a scheme to avoid
starvation.

Stefan


* Re: [virtio-comment] Seeking guidance for custom virtIO device
       [not found]         ` <15906829-2e85-ed4c-7b06-431d6e856ae9@amazon.de>
@ 2020-04-21  9:37           ` Stefano Garzarella
       [not found]             ` <a26e9938-49c6-6b63-bb4a-6ba582af3d40@amazon.de>
  2020-04-29  9:53           ` Stefan Hajnoczi
  2020-04-29 10:06           ` Stefan Hajnoczi
  2 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2020-04-21  9:37 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Stefan Hajnoczi, Eftime, Petre, virtio-comment

On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
> 
> 
> On 17.04.20 12:33, Stefan Hajnoczi wrote:
> > [...]
> 
> Let me throw in one more problem:
> 
> Imagine that we want to have virtio-vsock communication terminated in
> different domains, each of which has ownership of their own device
> emulation.
> 
> The easiest case where this happens is to have vsock between hypervisor and
> guest as well as between a PCIe implementation via VFIO and a guest. But the
> same can be true for stub domain like setups, where each connection end
> lives in its own stub domain (vhost-user in the vsock case I suppose).
> 
> In that case, it's impossible to share the one queue we have, no?

Maybe it is possible, but we have some restrictions:
- one guest should behave as a host, since every communication assumes
  that one peer is the host (VMADDR_CID_HOST)
- all packets go only between the two guests, without being able to be
  delivered to the host

If performance doesn't matter, we can have a host application in
user space that does this bridging (e.g. socat).

Cheers,
Stefano



* Re: [virtio-comment] Seeking guidance for custom virtIO device
       [not found]             ` <a26e9938-49c6-6b63-bb4a-6ba582af3d40@amazon.de>
@ 2020-04-22  8:40               ` Stefano Garzarella
  0 siblings, 0 replies; 13+ messages in thread
From: Stefano Garzarella @ 2020-04-22  8:40 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Stefan Hajnoczi, Eftime, Petre, virtio-comment

On Tue, Apr 21, 2020 at 09:22:34PM +0200, Alexander Graf wrote:
> 
> 
> On 21.04.20 11:37, Stefano Garzarella wrote:
> > 
> > 
> > 
> > On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
> > > [...]
> > > 
> > > Let me throw in one more problem:
> > > 
> > > Imagine that we want to have virtio-vsock communication terminated in
> > > different domains, each of which has ownership of their own device
> > > emulation.
> > > 
> > > The easiest case where this happens is to have vsock between hypervisor and
> > > guest as well as between a PCIe implementation via VFIO and a guest. But the
> > > same can be true for stub domain like setups, where each connection end
> > > lives in its own stub domain (vhost-user in the vsock case I suppose).
> > > 
> > > In that case, it's impossible to share the one queue we have, no?
> > 
> > Maybe it is possible, but we have some restrictions:
> > - one guest should behave as an host, since every communication assumes
> >    that one peer is the host (VMADDR_CID_HOST)
> > - all packets go only between the two guests, without being able to be
> >    delivered to the host
> > 
> > If performance doesn't matter, we can have an host application in
> > user space that does this bridging (e.g. socat).
> 
> I'm less concerned about ownership than security. The two implementation
> backends of vsock are separate for a good reason: The touch point between
> them is the guest.

Okay, now I got your point, and I think it is feasible to have multiple queues
to reach different endpoints.

We only need to fix the peer addressing to allow the driver to choose
which queue to use, or which direction (nested VM, host, siblings).
At this point, I'm not sure if it is better to support multiple devices
(hot-plug) instead of multiple queues.

As we previously discussed, maybe we need to centralize the CID allocation
at the lowest level (L0), or we need to define some kind of ARP.

> 
> If we now start to create a user space broker on the host, we're suddenly
> creating a single entity that sits right in between the information flow
> again :(.

Yeah, it makes sense if the information is critical.

Cheers,
Stefano



* Re: [virtio-comment] Seeking guidance for custom virtIO device
       [not found]         ` <15906829-2e85-ed4c-7b06-431d6e856ae9@amazon.de>
  2020-04-21  9:37           ` Stefano Garzarella
@ 2020-04-29  9:53           ` Stefan Hajnoczi
       [not found]             ` <fe4a003d-3705-1337-0349-c1c4a0e94125@amazon.de>
  2020-04-29 10:06           ` Stefan Hajnoczi
  2 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2020-04-29  9:53 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Eftime, Petre, sgarzare, virtio-comment


On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
> 
> 
> On 17.04.20 12:33, Stefan Hajnoczi wrote:
> > On Wed, Apr 15, 2020 at 02:23:48PM +0300, Eftime, Petre wrote:
> > > 
> > > On 2020-04-14 13:50, Stefan Hajnoczi wrote:
> > > > On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
> > > > > Hi,
> > > > > 
> > > > > On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
> > > > > > 
> > > > > > We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
> > > > > > Our requirements are:
> > > > > > * multiple clients in the guest (multiple servers is not required)
> > > > > > * provide an in-order, reliable datagram transport mechanism
> > > > > > * datagram size should be either negotiable or large (16k-64k?)
> > > > > > * performance is not a big concern for our usecase
> > > > > It looks really close to vsock.
> > > > > 
> > > > > > The reason why we used a special device and not something else is the following:
> > > > > > * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.
> > > > > AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
> > > > > this feature. (vmci provides SOCK_DGRAM support)
> > > > > 
> > > > > The changes should not be too intrusive in the virtio-vsock specs and
> > > > > implementation, we already have the "type" field in the packet header
> > > > > to address this new feature.
> > > > > 
> > > > > We also have the credit-mechanism to provide in-order and reliable
> > > > > packets delivery.
> > > > > 
> > > > > Maybe the hardest part could be change something in the core to handle
> > > > > multiple transports that provide SOCK_DGRAM, for nested VMs.
> > > > > We already did for stream sockets, but we didn't handle the datagram
> > > > > socket for now.
> > > > > 
> > > > > I am not sure how convenient it is to have two very similar devices...
> > > > > 
> > > > > If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
> > > > > give you a more complete list of changes to make. :-)
> > > > I although think this sounds exactly like adding SOCK_DGRAM support to
> > > > virtio-vsock.
> > > > 
> > > > The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
> > > > patches is that the prototocol design didn't ensure reliable delivery
> > > > semantics.  At that time there were no real users for SOCK_DGRAM so it
> > > > was left as a feature to be added later.
> > > > 
> > > > The challenge with reusing the SOCK_STREAM credit mechanism for
> > > > SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
> > > > consists per-connection state.  Maybe it can be extended to cover
> > > > SOCK_DGRAM too.
> > > > 
> > > > I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
> > > > create another device that does basically what is within the scope of
> > > > virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
> > > > support into various software components, and doing that again for
> > > > another device is more effort than one would think.
> > > > 
> > > > If you don't want to modify the Linux guest driver, then let's just
> > > > discuss the device spec and protocol.  Someone else could make the Linux
> > > > driver changes.
> > > > 
> > > > Stefan
> > > 
> > > 
> > > I think it would be great if we could get the virtio-vsock driver to support
> > > SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.
> > > 
> > > 
> > > But one of the reasons that I don't really like virtio-vsock at the moment
> > > for my use-case in particular is that it doesn't seem well fitted to support
> > > non-cooperating live-migrateable VMs all that well.  One problem is that to
> > > avoid guest-visible disconnections to any service while doing a live
> > > migration there might be performance impact if using vsock for any other
> > > reasons.
> > > 
> > > I'll try to exemplify what I mean with this setup:
> > > 
> > >      * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM
> > > 
> > >      * workload 2 sends commands / gets replies once in a while via an
> > > AF_VSOCK SOCK_SEQPACKET.
> > 
> > af_vsock.ko doesn't support SOCK_SEQPACKET.  Is this what you are
> > considering adding?
> > 
> > Earlier in this thread I thought we were discussing SOCK_DGRAM, which
> > has different semantics than SOCK_SEQPACKET.
> > 
> > The good news is that SOCK_SEQPACKET should be easier to add to
> > net/vmw_vsock than SOCK_DGRAM because the flow control credit mechanism
> > used for SOCK_STREAM should just work for SOCK_SEQPACKET.
> > 
> > > 
> > > Assume the VM needs to be migrated:
> > > 
> > >          1) If workload 2 is currently not processing anything, then even
> > > if there are some commands queued up for it, everything is fine: the VMM
> > > can pause the guest and serialize.
> > >
> > >          2) If there's an outstanding command, the VMM needs to wait for it
> > > to finish and for the receive queue of the request to have enough capacity
> > > for the reply, but since this capacity is guest-driven, this second part can
> > > take a while, or forever. This is definitely not ideal.
> > 
> > I think you're describing how to reserve space for control packets so
> > that the device never has to wait on the driver.
> > 
> > Have you seen the drivers/vhost/vsock.c device implementation?  It has a
> > strategy for suspending tx queue processing until the rx queue has more
> > space.  Multiple implementation-specific approaches are possible, so
> > this isn't in the specification.
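
Roughly, that strategy has the following shape (this is only an illustration
of the idea, not the actual drivers/vhost/vsock.c code, and every name in it
is made up):

    /* Illustrative sketch only -- not the real vhost-vsock code. */
    static void tx_worker(struct fake_vsock_dev *dev)
    {
            struct fake_pkt *pkt;

            while ((pkt = tx_queue_pop(dev)) != NULL) {
                    if (pkt_needs_reply(pkt) && !rx_queue_has_space(dev)) {
                            /* Put the packet back and stop draining tx;
                             * resume once the driver adds more rx buffers. */
                            tx_queue_push_front(dev, pkt);
                            dev->tx_stalled = true;
                            return;
                    }
                    handle_pkt(dev, pkt);
            }
    }
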
> > 
> > > In short, I think workload 2 needs to be in control of its own queues for
> > > this to work reasonably well; I don't know if sharing ownership of queues
> > > can work. The device we defined doesn't have this problem: first of all,
> > > it's on a separate queue, so workload 1 never competes in any way with
> > > workload 2, and workload 2 always has somewhere to place replies, since it
> > > has an attached reply buffer by design.
> > 
> > Flow control in vsock works like this:
> > 
> > 1. Data packets are accounted against per-socket buffers and removed
> >     from the virtqueue immediately.  This allows multiple competing data
> >     streams to share a single virtqueue without starvation.  It's the
> >     per-socket buffer that can be exhausted, but that only affects the
> >     application that isn't reading the socket.  The other side
> >     will stop sending more data when credit is exhausted so that delivery
> >     can be guaranteed.
> > 
> > 2. Control packet replies can be sent in response to pretty much any
> >     packet.  Therefore, it's necessary to suspend packet processing when
> >     the other side's virtqueue is full.  This way you don't need to wait
> >     for them midway through processing a packet.
> > 
> > There is a problem with #2 which hasn't been solved.  If both sides are
> > operating at N-1 queue capacity (they are almost exhausted), then can we
> > reach a deadlock where both sides suspend queue processing because they
> > are waiting for the other side?  This has not been fully investigated or
> > demonstrated, but it's an area that needs attention sometime.
> > 
> > > Perhaps a good compromise would be to have a multi-queue virtio-vsock or
> > 
> > That would mean we've reached the conclusion that it's impossible to
> > have bi-directional communication with guaranteed delivery over a shared
> > communications channel.
> > 
> > virtio-serial did this to avoid having to come up with a scheme to avoid
> > starvation.
> 
> Let me throw in one more problem:
> 
> Imagine that we want to have virtio-vsock communication terminated in
> different domains, each of which has ownership of their own device
> emulation.
> 
> The easiest case where this happens is to have vsock between hypervisor and
> guest as well as between a PCIe implementation via VFIO and a guest. But the
> same can be true for stub domain like setups, where each connection end
> lives in its own stub domain (vhost-user in the vsock case I suppose).
> 
> In that case, it's impossible to share the one queue we have, no?

Do you see any relation to the SOCK_SEQPACKET semantics discussion?
That seems like a completely separate issue to me.

Even if you introduce multiple virtqueues for other reasons, it's
advantageous to keep the virtio-vsock flow control credit mechanism so that
multiple connections can use a single virtqueue.
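
For reference, the credit accounting that makes this sharing possible is
roughly the following (a minimal sketch based on the buf_alloc and fwd_cnt
counters carried in the packet header; the struct and function names are
illustrative, not the actual driver code):

    #include <stdint.h>

    struct credit_state {
            uint32_t tx_cnt;          /* bytes sent on this connection        */
            uint32_t peer_buf_alloc;  /* peer's receive buffer size (header)  */
            uint32_t peer_fwd_cnt;    /* bytes the peer has consumed (header) */
    };

    /* Bytes that may still be sent without overrunning the peer's buffer. */
    static uint32_t credit_available(const struct credit_state *c)
    {
            return c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
    }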

Initially the requirement might only be for one vsock connection but who
knows when that requirement changes and you need a dynamic number of
connections?  Virtqueues cannot be hotplugged.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [virtio-comment] Seeking guidance for custom virtIO device
       [not found]         ` <15906829-2e85-ed4c-7b06-431d6e856ae9@amazon.de>
  2020-04-21  9:37           ` Stefano Garzarella
  2020-04-29  9:53           ` Stefan Hajnoczi
@ 2020-04-29 10:06           ` Stefan Hajnoczi
  2020-04-30  8:44             ` Eftime, Petre
  2 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2020-04-29 10:06 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Eftime, Petre, sgarzare, virtio-comment

[-- Attachment #1: Type: text/plain, Size: 9459 bytes --]

On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
> On 17.04.20 12:33, Stefan Hajnoczi wrote:
> > On Wed, Apr 15, 2020 at 02:23:48PM +0300, Eftime, Petre wrote:
> > > On 2020-04-14 13:50, Stefan Hajnoczi wrote:
> > > > On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
> > > > > Hi,
> > > > > 
> > > > > On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
> > > > > > 
> > > > > > We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
> > > > > > Our requirements are:
> > > > > > * multiple clients in the guest (multiple servers is not required)
> > > > > > * provide an in-order, reliable datagram transport mechanism
> > > > > > * datagram size should be either negotiable or large (16k-64k?)
> > > > > > * performance is not a big concern for our usecase
> > > > > It looks really close to vsock.
> > > > > 
> > > > > > The reason why we used a special device and not something else is the following:
> > > > > > * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.
> > > > > AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
> > > > > this feature. (vmci provides SOCK_DGRAM support)
> > > > > 
> > > > > The changes should not be too intrusive in the virtio-vsock specs and
> > > > > implementation, we already have the "type" field in the packet header
> > > > > to address this new feature.
> > > > > 
> > > > > We also have the credit mechanism to provide in-order and reliable
> > > > > packet delivery.
> > > > > 
> > > > > Maybe the hardest part could be changing something in the core to handle
> > > > > multiple transports that provide SOCK_DGRAM, for nested VMs.
> > > > > We already did this for stream sockets, but we haven't handled the
> > > > > datagram socket yet.
> > > > > 
> > > > > I am not sure how convenient it is to have two very similar devices...
> > > > > 
> > > > > If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
> > > > > give you a more complete list of changes to make. :-)
> > > > I also think this sounds exactly like adding SOCK_DGRAM support to
> > > > virtio-vsock.
> > > > 
> > > > The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
> > > > patches is that the protocol design didn't ensure reliable delivery
> > > > semantics.  At that time there were no real users for SOCK_DGRAM so it
> > > > was left as a feature to be added later.
> > > > 
> > > > The challenge with reusing the SOCK_STREAM credit mechanism for
> > > > SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
> > > > consists of per-connection state.  Maybe it can be extended to cover
> > > > SOCK_DGRAM too.
> > > > 
> > > > I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
> > > > create another device that does basically what is within the scope of
> > > > virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
> > > > support into various software components, and doing that again for
> > > > another device is more effort than one would think.
> > > > 
> > > > If you don't want to modify the Linux guest driver, then let's just
> > > > discuss the device spec and protocol.  Someone else could make the Linux
> > > > driver changes.
> > > > 
> > > > Stefan
> > > 
> > > 
> > > I think it would be great if we could get the virtio-vsock driver to support
> > > SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.
> > > 
> > > 
> > > But one of the reasons I don't really like virtio-vsock at the moment
> > > for my use-case in particular is that it doesn't seem well suited to
> > > supporting non-cooperating, live-migratable VMs.  One problem is that
> > > avoiding guest-visible disconnections to any service during a live
> > > migration may hurt performance if vsock is also used for anything else.
> > > 
> > > I'll try to exemplify what I mean with this setup:
> > > 
> > >      * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM
> > > 
> > >      * workload 2 sends commands / gets replies once in a while via an
> > > AF_VSOCK SOCK_SEQPACKET.
> > 
> > af_vsock.ko doesn't support SOCK_SEQPACKET.  Is this what you are
> > considering adding?
> > 
> > Earlier in this thread I thought we were discussing SOCK_DGRAM, which
> > has different semantics than SOCK_SEQPACKET.
> > 
> > The good news is that SOCK_SEQPACKET should be easier to add to
> > net/vmw_vsock than SOCK_DGRAM because the flow control credit mechanism
> > used for SOCK_STREAM should just work for SOCK_SEQPACKET.
> > 
> > > 
> > > Assume the VM needs to be migrated:
> > > 
> > >          1) If workload 2 is currently not processing anything, then even
> > > if there are some commands queued up for it, everything is fine: the VMM
> > > can pause the guest and serialize.
> > >
> > >          2) If there's an outstanding command, the VMM needs to wait for it
> > > to finish and for the receive queue of the request to have enough capacity
> > > for the reply, but since this capacity is guest-driven, this second part can
> > > take a while, or forever. This is definitely not ideal.
> > 
> > I think you're describing how to reserve space for control packets so
> > that the device never has to wait on the driver.
> > 
> > Have you seen the drivers/vhost/vsock.c device implementation?  It has a
> > strategy for suspending tx queue processing until the rx queue has more
> > space.  Multiple implementation-specific approaches are possible, so
> > this isn't in the specification.
> > 
> > > In short, I think workload 2 needs to be in control of its own queues for
> > > this to work reasonably well; I don't know if sharing ownership of queues
> > > can work. The device we defined doesn't have this problem: first of all,
> > > it's on a separate queue, so workload 1 never competes in any way with
> > > workload 2, and workload 2 always has somewhere to place replies, since it
> > > has an attached reply buffer by design.
> > 
> > Flow control in vsock works like this:
> > 
> > 1. Data packets are accounted against per-socket buffers and removed
> >     from the virtqueue immediately.  This allows multiple competing data
> >     streams to share a single virtqueue without starvation.  It's the
> >     per-socket buffer that can be exhausted, but that only affects the
> >     application that isn't reading the socket.  The other side
> >     will stop sending more data when credit is exhausted so that delivery
> >     can be guaranteed.
> > 
> > 2. Control packet replies can be sent in response to pretty much any
> >     packet.  Therefore, it's necessary to suspend packet processing when
> >     the other side's virtqueue is full.  This way you don't need to wait
> >     for them midway through processing a packet.
> > 
> > There is a problem with #2 which hasn't been solved.  If both sides are
> > operating at N-1 queue capacity (they are almost exhausted), then can we
> > reach a deadlock where both sides suspend queue processing because they
> > are waiting for the other side?  This has not been fully investigated or
> > demonstrated, but it's an area that needs attention sometime.
> > 
> > > Perhaps a good compromise would be to have a multi-queue virtio-vsock or
> > 
> > That would mean we've reached the conclusion that it's impossible to
> > have bi-directional communication with guaranteed delivery over a shared
> > communications channel.
> > 
> > virtio-serial did this to avoid having to come up with a scheme to avoid
> > starvation.
> 
> Let me throw in one more problem:
> 
> Imagine that we want to have virtio-vsock communication terminated in
> different domains, each of which has ownership of their own device
> emulation.
> 
> The easiest case where this happens is to have vsock between hypervisor and
> guest as well as between a PCIe implementation via VFIO and a guest. But the
> same can be true for stub domain like setups, where each connection end
> lives in its own stub domain (vhost-user in the vsock case I suppose).
> 
> In that case, it's impossible to share the one queue we have, no?

By the way, "dedicated virtqueues" could be interesting as a separate
feature.

I just think that SOCK_SEQPACKET should be implemented with flow control
like SOCK_STREAM.

Then, as a separate feature, the device could advertise dedicated
virtqueues allowing SOCK_STREAM and SOCK_SEQPACKET communication
directly in the virtqueue.  The dedicated virtqueue approach should be
faster since it eliminates the need for a memcpy to socket receive
buffers.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [virtio-comment] Seeking guidance for custom virtIO device
       [not found]             ` <fe4a003d-3705-1337-0349-c1c4a0e94125@amazon.de>
@ 2020-04-30  8:43               ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2020-04-30  8:43 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Eftime, Petre, sgarzare, virtio-comment

[-- Attachment #1: Type: text/plain, Size: 11538 bytes --]

On Wed, Apr 29, 2020 at 01:47:16PM +0200, Alexander Graf wrote:
> 
> 
> On 29.04.20 11:53, Stefan Hajnoczi wrote:
> > On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
> > > 
> > > 
> > > On 17.04.20 12:33, Stefan Hajnoczi wrote:
> > > > On Wed, Apr 15, 2020 at 02:23:48PM +0300, Eftime, Petre wrote:
> > > > > 
> > > > > On 2020-04-14 13:50, Stefan Hajnoczi wrote:
> > > > > > On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
> > > > > > > > 
> > > > > > > > We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
> > > > > > > > Our requirements are:
> > > > > > > > * multiple clients in the guest (multiple servers is not required)
> > > > > > > > * provide an in-order, reliable datagram transport mechanism
> > > > > > > > * datagram size should be either negotiable or large (16k-64k?)
> > > > > > > > * performance is not a big concern for our usecase
> > > > > > > It looks really close to vsock.
> > > > > > > 
> > > > > > > > The reason why we used a special device and not something else is the following:
> > > > > > > > * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.
> > > > > > > AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
> > > > > > > this feature. (vmci provides SOCK_DGRAM support)
> > > > > > > 
> > > > > > > The changes should not be too intrusive in the virtio-vsock specs and
> > > > > > > implementation, we already have the "type" field in the packet header
> > > > > > > to address this new feature.
> > > > > > > 
> > > > > > > We also have the credit mechanism to provide in-order and reliable
> > > > > > > packet delivery.
> > > > > > > 
> > > > > > > Maybe the hardest part could be changing something in the core to handle
> > > > > > > multiple transports that provide SOCK_DGRAM, for nested VMs.
> > > > > > > We already did this for stream sockets, but we haven't handled the
> > > > > > > datagram socket yet.
> > > > > > > 
> > > > > > > I am not sure how convenient it is to have two very similar devices...
> > > > > > > 
> > > > > > > If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
> > > > > > > give you a more complete list of changes to make. :-)
> > > > > > I also think this sounds exactly like adding SOCK_DGRAM support to
> > > > > > virtio-vsock.
> > > > > > 
> > > > > > The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
> > > > > > patches is that the protocol design didn't ensure reliable delivery
> > > > > > semantics.  At that time there were no real users for SOCK_DGRAM so it
> > > > > > was left as a feature to be added later.
> > > > > > 
> > > > > > The challenge with reusing the SOCK_STREAM credit mechanism for
> > > > > > SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
> > > > > > consists of per-connection state.  Maybe it can be extended to cover
> > > > > > SOCK_DGRAM too.
> > > > > > 
> > > > > > I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
> > > > > > create another device that does basically what is within the scope of
> > > > > > virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
> > > > > > support into various software components, and doing that again for
> > > > > > another device is more effort than one would think.
> > > > > > 
> > > > > > If you don't want to modify the Linux guest driver, then let's just
> > > > > > discuss the device spec and protocol.  Someone else could make the Linux
> > > > > > driver changes.
> > > > > > 
> > > > > > Stefan
> > > > > 
> > > > > 
> > > > > I think it would be great if we could get the virtio-vsock driver to support
> > > > > SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.
> > > > > 
> > > > > 
> > > > > But one of the reasons I don't really like virtio-vsock at the moment
> > > > > for my use-case in particular is that it doesn't seem well suited to
> > > > > supporting non-cooperating, live-migratable VMs.  One problem is that
> > > > > avoiding guest-visible disconnections to any service during a live
> > > > > migration may hurt performance if vsock is also used for anything else.
> > > > > 
> > > > > I'll try to exemplify what I mean with this setup:
> > > > > 
> > > > >       * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM
> > > > > 
> > > > >       * workload 2 sends commands / gets replies once in a while via an
> > > > > AF_VSOCK SOCK_SEQPACKET.
> > > > 
> > > > af_vsock.ko doesn't support SOCK_SEQPACKET.  Is this what you are
> > > > considering adding?
> > > > 
> > > > Earlier in this thread I thought we were discussing SOCK_DGRAM, which
> > > > has different semantics than SOCK_SEQPACKET.
> > > > 
> > > > The good news is that SOCK_SEQPACKET should be easier to add to
> > > > net/vmw_vsock than SOCK_DGRAM because the flow control credit mechanism
> > > > used for SOCK_STREAM should just work for SOCK_SEQPACKET.
> > > > 
> > > > > 
> > > > > Assume the VM needs to be migrated:
> > > > > 
> > > > >           1) If workload 2 is currently not processing anything, then even
> > > > > if there are some commands queued up for it, everything is fine: the VMM
> > > > > can pause the guest and serialize.
> > > > >
> > > > >           2) If there's an outstanding command, the VMM needs to wait for
> > > > > it to finish and for the receive queue of the request to have enough
> > > > > capacity for the reply, but since this capacity is guest-driven, this
> > > > > second part can take a while, or forever. This is definitely not ideal.
> > > > 
> > > > I think you're describing how to reserve space for control packets so
> > > > that the device never has to wait on the driver.
> > > > 
> > > > Have you seen the drivers/vhost/vsock.c device implementation?  It has a
> > > > strategy for suspending tx queue processing until the rx queue has more
> > > > space.  Multiple implementation-specific approaches are possible, so
> > > > this isn't in the specification.
> > > > 
> > > > > In short, I think workload 2 needs to be in control of its own queues
> > > > > for this to work reasonably well; I don't know if sharing ownership of
> > > > > queues can work. The device we defined doesn't have this problem: first
> > > > > of all, it's on a separate queue, so workload 1 never competes in any
> > > > > way with workload 2, and workload 2 always has somewhere to place
> > > > > replies, since it has an attached reply buffer by design.
> > > > 
> > > > Flow control in vsock works like this:
> > > > 
> > > > 1. Data packets are accounted against per-socket buffers and removed
> > > >      from the virtqueue immediately.  This allows multiple competing data
> > > >      streams to share a single virtqueue without starvation.  It's the
> > > >      per-socket buffer that can be exhausted, but that only affects the
> > > >      application that isn't reading the socket.  The other side
> > > >      will stop sending more data when credit is exhausted so that delivery
> > > >      can be guaranteed.
> > > > 
> > > > 2. Control packet replies can be sent in response to pretty much any
> > > >      packet.  Therefore, it's necessary to suspend packet processing when
> > > >      the other side's virtqueue is full.  This way you don't need to wait
> > > >      for them midway through processing a packet.
> > > > 
> > > > There is a problem with #2 which hasn't been solved.  If both sides are
> > > > operating at N-1 queue capacity (they are almost exhausted), then can we
> > > > reach a deadlock where both sides suspend queue processing because they
> > > > are waiting for the other side?  This has not been fully investigated or
> > > > demonstrated, but it's an area that needs attention sometime.
> > > > 
> > > > > Perhaps a good compromise would be to have a multi-queue virtio-vsock or
> > > > 
> > > > That would mean we've reached the conclusion that it's impossible to
> > > > have bi-directional communication with guaranteed delivery over a shared
> > > > communications channel.
> > > > 
> > > > virtio-serial did this to avoid having to come up with a scheme to avoid
> > > > starvation.
> > > 
> > > Let me throw in one more problem:
> > > 
> > > Imagine that we want to have virtio-vsock communication terminated in
> > > different domains, each of which has ownership of their own device
> > > emulation.
> > > 
> > > The easiest case where this happens is to have vsock between hypervisor and
> > > guest as well as between a PCIe implementation via VFIO and a guest. But the
> > > same can be true for stub domain like setups, where each connection end
> > > lives in its own stub domain (vhost-user in the vsock case I suppose).
> > > 
> > > In that case, it's impossible to share the one queue we have, no?
> > 
> > Do you see any relation to the SOCK_SEQPACKET semantics discussion?
> > That seems like a completely separate issue to me.
> 
> It's a very different issue, yes :).
> 
> > Even if you introduce multiple virtqueues for other reasons, it's
> > advantageous to keep the virtio-vsock flow control credit mechanism so that
> > multiple connections can use a single virtqueue.
> 
> Absolutely, yes!
> 
> > Initially the requirement might only be for one vsock connection but who
> > knows when that requirement changes and you need a dynamic number of
> > connections?  Virtqueues cannot be hotplugged.
> 
> The way I was thinking of this is that which queue to use is 100% driven by
> the CID. So what I was describing above is just that it'd be good to have
> support for:
> 
>   * multiple queues (same host virtio implementation)
>   * multiple PCIe devices (different host virtio implementations per CID)
> 
> Whether to use Queue 0 of device 0 or Queue 1 of device 1 purely depends on
> the CID. However, each connection to/from a target CID should still go
> through the same queue, with the same flow control as today, no?
> 
> It's a bit like a poor man's switch with the CID as the MAC address I guess.

I see value in all of these:

 * multiqueue vsock for SMP scalability
 * per-CID multiqueue vsock for guest<->guest communication in software
 * per-CID multidevice vsock for hardware/vfio
 * per-connection multiqueue for performance (eliminates the memcpy to
   socket receive buffers and the credit update packets)

They are all independent features.  I'm not sure if someone wants to
work on each one, but I think they all make sense and fit within the
scope of virtio-vsock.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-29 10:06           ` Stefan Hajnoczi
@ 2020-04-30  8:44             ` Eftime, Petre
  2020-04-30  8:59               ` Stefano Garzarella
  0 siblings, 1 reply; 13+ messages in thread
From: Eftime, Petre @ 2020-04-30  8:44 UTC (permalink / raw)
  To: Stefan Hajnoczi, Alexander Graf; +Cc: sgarzare, virtio-comment

[-- Attachment #1: Type: text/plain, Size: 10630 bytes --]


On 2020-04-29 13:06, Stefan Hajnoczi wrote:
> On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
>> On 17.04.20 12:33, Stefan Hajnoczi wrote:
>>> On Wed, Apr 15, 2020 at 02:23:48PM +0300, Eftime, Petre wrote:
>>>> On 2020-04-14 13:50, Stefan Hajnoczi wrote:
>>>>> On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
>>>>>>>
>>>>>>> We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
>>>>>>> Our requirements are:
>>>>>>> * multiple clients in the guest (multiple servers is not required)
>>>>>>> * provide an in-order, reliable datagram transport mechanism
>>>>>>> * datagram size should be either negotiable or large (16k-64k?)
>>>>>>> * performance is not a big concern for our usecase
>>>>>> It looks really close to vsock.
>>>>>>
>>>>>>> The reason why we used a special device and not something else is the following:
>>>>>>> * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.
>>>>>> AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
>>>>>> this feature. (vmci provides SOCK_DGRAM support)
>>>>>>
>>>>>> The changes should not be too intrusive in the virtio-vsock specs and
>>>>>> implementation, we already have the "type" field in the packet header
>>>>>> to address this new feature.
>>>>>>
>>>>>> We also have the credit mechanism to provide in-order and reliable
>>>>>> packet delivery.
>>>>>>
>>>>>> Maybe the hardest part could be changing something in the core to handle
>>>>>> multiple transports that provide SOCK_DGRAM, for nested VMs.
>>>>>> We already did this for stream sockets, but we haven't handled the
>>>>>> datagram socket yet.
>>>>>>
>>>>>> I am not sure how convenient it is to have two very similar devices...
>>>>>>
>>>>>> If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
>>>>>> give you a more complete list of changes to make. :-)
>>>>> I also think this sounds exactly like adding SOCK_DGRAM support to
>>>>> virtio-vsock.
>>>>>
>>>>> The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
>>>>> patches is that the protocol design didn't ensure reliable delivery
>>>>> semantics.  At that time there were no real users for SOCK_DGRAM so it
>>>>> was left as a feature to be added later.
>>>>>
>>>>> The challenge with reusing the SOCK_STREAM credit mechanism for
>>>>> SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
>>>>> consists of per-connection state.  Maybe it can be extended to cover
>>>>> SOCK_DGRAM too.
>>>>>
>>>>> I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
>>>>> create another device that does basically what is within the scope of
>>>>> virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
>>>>> support into various software components, and doing that again for
>>>>> another device is more effort than one would think.
>>>>>
>>>>> If you don't want to modify the Linux guest driver, then let's just
>>>>> discuss the device spec and protocol.  Someone else could make the Linux
>>>>> driver changes.
>>>>>
>>>>> Stefan
>>>>
>>>> I think it would be great if we could get the virtio-vsock driver to support
>>>> SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.
>>>>
>>>>
>>>> But one of the reasons I don't really like virtio-vsock at the moment
>>>> for my use-case in particular is that it doesn't seem well suited to
>>>> supporting non-cooperating, live-migratable VMs.  One problem is that
>>>> avoiding guest-visible disconnections to any service during a live
>>>> migration may hurt performance if vsock is also used for anything else.
>>>>
>>>> I'll try to exemplify what I mean with this setup:
>>>>
>>>>       * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM
>>>>
>>>>       * workload 2 sends commands / gets replies once in a while via an
>>>> AF_VSOCK SOCK_SEQPACKET.
>>> af_vsock.ko doesn't support SOCK_SEQPACKET.  Is this what you are
>>> considering adding?
>>>
>>> Earlier in this thread I thought we were discussing SOCK_DGRAM, which
>>> has different semantics than SOCK_SEQPACKET.
>>>
>>> The good news is that SOCK_SEQPACKET should be easier to add to
>>> net/vmw_vsock than SOCK_DGRAM because the flow control credit mechanism
>>> used for SOCK_STREAM should just work for SOCK_SEQPACKET.
>>>
>>>> Assume the VM needs to be migrated:
>>>>
>>>>           1) If workload 2 is currently not processing anything, then even
>>>> if there are some commands queued up for it, everything is fine: the VMM
>>>> can pause the guest and serialize.
>>>>
>>>>           2) If there's an outstanding command, the VMM needs to wait for it
>>>> to finish and for the receive queue of the request to have enough capacity
>>>> for the reply, but since this capacity is guest-driven, this second part can
>>>> take a while, or forever. This is definitely not ideal.
>>> I think you're describing how to reserve space for control packets so
>>> that the device never has to wait on the driver.
>>>
>>> Have you seen the drivers/vhost/vsock.c device implementation?  It has a
>>> strategy for suspending tx queue processing until the rx queue has more
>>> space.  Multiple implementation-specific approaches are possible, so
>>> this isn't in the specification.
>>>
>>>> In short, I think workload 2 needs to be in control of its own queues for
>>>> this to work reasonably well; I don't know if sharing ownership of queues
>>>> can work. The device we defined doesn't have this problem: first of all,
>>>> it's on a separate queue, so workload 1 never competes in any way with
>>>> workload 2, and workload 2 always has somewhere to place replies, since it
>>>> has an attached reply buffer by design.
>>> Flow control in vsock works like this:
>>>
>>> 1. Data packets are accounted against per-socket buffers and removed
>>>      from the virtqueue immediately.  This allows multiple competing data
>>>      streams to share a single virtqueue without starvation.  It's the
>>>      per-socket buffer that can be exhausted, but that only affects the
>>>      application that isn't reading the socket.  The other side
>>>      will stop sending more data when credit is exhausted so that delivery
>>>      can be guaranteed.
>>>
>>> 2. Control packet replies can be sent in response to pretty much any
>>>      packet.  Therefore, it's necessary to suspend packet processing when
>>>      the other side's virtqueue is full.  This way you don't need to wait
>>>      for them midway through processing a packet.
>>>
>>> There is a problem with #2 which hasn't been solved.  If both sides are
>>> operating at N-1 queue capacity (they are almost exhausted), then can we
>>> reach a deadlock where both sides suspend queue processing because they
>>> are waiting for the other side?  This has not been fully investigated or
>>> demonstrated, but it's an area that needs attention sometime.
>>>
>>>> Perhaps a good compromise would be to have a multi-queue virtio-vsock or
>>> That would mean we've reached the conclusion that it's impossible to
>>> have bi-directional communication with guaranteed delivery over a shared
>>> communications channel.
>>>
>>> virtio-serial did this to avoid having to come up with a scheme to avoid
>>> starvation.
>> Let me throw in one more problem:
>>
>> Imagine that we want to have virtio-vsock communication terminated in
>> different domains, each of which has ownership of their own device
>> emulation.
>>
>> The easiest case where this happens is to have vsock between hypervisor and
>> guest as well as between a PCIe implementation via VFIO and a guest. But the
>> same can be true for stub domain like setups, where each connection end
>> lives in its own stub domain (vhost-user in the vsock case I suppose).
>>
>> In that case, it's impossible to share the one queue we have, no?
> By the way, "dedicated virtqueues" could be interesting as a separate
> feature.
>
> I just think that SOCK_SEQPACKET should be implemented with flow control
> like SOCK_STREAM.
>
> Then, as a separate feature, the device could advertise dedicated
> virtqueues allowing SOCK_STREAM and SOCK_SEQPACKET communication
> directly in the virtqueue.  The dedicated virtqueue approach should be
> faster since it eliminates the need for a memcpy to socket receive
> buffers.
>
> Stefan

Yes, SOCK_SEQPACKET seems relatively easy to implement.

Would devices need to add a feature flag to advertise that they support
SOCK_SEQPACKET? I'm not sure I understand all the backwards-compatibility
issues adding this might introduce, but I think it would be required: the
idea is that a message written on a SOCK_SEQPACKET socket must not be
split into multiple messages at the receiver, and both the driver and the
device would need to know that.
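
Something along these lines, perhaps (the feature bit name and value are
hypothetical, just to illustrate the negotiation; virtio_has_feature() from
<linux/virtio_config.h> is the only existing helper used here):

    /* Hypothetical feature bit -- name and value are placeholders. */
    #define VIRTIO_VSOCK_F_SEQPACKET  1

    static bool seqpacket_supported(struct virtio_device *vdev)
    {
            /* Only expose SOCK_SEQPACKET to af_vsock if the device
             * advertised the (hypothetical) feature bit. */
            return virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET);
    }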

A marker in the header should denote that more data for the same message
follows in the next descriptor, so that message boundaries can be preserved
even if the driver doesn't place large enough descriptors in the RX queue.
The device would split and mark messages accordingly, and the driver would
then re-assemble them before passing them to af_vsock.
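
As a sketch of what that marker could look like (the flag name and all of the
reassembly helpers below are hypothetical, only meant to make the idea
concrete):

    /* Hypothetical flag in virtio_vsock_hdr.flags: set on every fragment of
     * a SOCK_SEQPACKET message except the last one. */
    #define VIRTIO_VSOCK_SEQ_MORE  (1u << 0)

    /* Driver-side reassembly sketch; struct conn, reassembly_*() and
     * deliver_to_af_vsock() are illustrative, not existing code. */
    static void rx_seqpacket_fragment(struct conn *c,
                                      const struct virtio_vsock_hdr *hdr,
                                      const void *data, size_t len)
    {
            reassembly_append(c, data, len);

            if (!(le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_MORE))
                    deliver_to_af_vsock(c, reassembly_take(c));
    }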

Otherwise, a minimum descriptor size should be part of the specification,
so that the device can rely on a specific message size rather than on
whatever the driver decides to place in the queue, which so far is
implementation-defined. The first option seems more flexible to me.

Best,
Petre Eftime





Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.

[-- Attachment #2: Type: text/html, Size: 12587 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [virtio-comment] Seeking guidance for custom virtIO device
  2020-04-30  8:44             ` Eftime, Petre
@ 2020-04-30  8:59               ` Stefano Garzarella
  0 siblings, 0 replies; 13+ messages in thread
From: Stefano Garzarella @ 2020-04-30  8:59 UTC (permalink / raw)
  To: Eftime, Petre; +Cc: Stefan Hajnoczi, Alexander Graf, virtio-comment

On Thu, Apr 30, 2020 at 11:44:59AM +0300, Eftime, Petre wrote:
> 
> On 2020-04-29 13:06, Stefan Hajnoczi wrote:
> > On Fri, Apr 17, 2020 at 01:09:16PM +0200, Alexander Graf wrote:
> > > On 17.04.20 12:33, Stefan Hajnoczi wrote:
> > > > On Wed, Apr 15, 2020 at 02:23:48PM +0300, Eftime, Petre wrote:
> > > > > On 2020-04-14 13:50, Stefan Hajnoczi wrote:
> > > > > > On Fri, Apr 10, 2020 at 12:09:22PM +0200, Stefano Garzarella wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Fri, Apr 10, 2020 at 09:36:58AM +0000, Eftime, Petre wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > I am looking for guidance on how to proceed with regards to either reserving a virtio device ID for a specific device for a particular usecase  or for formalizing a device type that could be potentially used by others.
> > > > > > > > 
> > > > > > > > We have developed a virtio device that acts as a transport for API calls between a guest userspace library and a backend server in the host system.
> > > > > > > > Our requirements are:
> > > > > > > > * multiple clients in the guest (multiple servers is not required)
> > > > > > > > * provide an in-order, reliable datagram transport mechanism
> > > > > > > > * datagram size should be either negotiable or large (16k-64k?)
> > > > > > > > * performance is not a big concern for our usecase
> > > > > > > It looks really close to vsock.
> > > > > > > 
> > > > > > > > The reason why we used a special device and not something else is the following:
> > > > > > > > * vsock spec does not contain a datagram specification (eg. SOCK_DGRAM, SOCK_SEQPACKET) and the effort of updating the Linux driver and other implementations for this particular purpose  seemed relatively high. The path to approach this problem wasn't clear. Vsock today only works in SOCK_STREAM mode and this is not ideal: the receiver must implement additional state and buffer incoming data,  adding complexity and host resource usage.
> > > > > > > AF_VSOCK itself supports SOCK_DGRAM, but virtio-vsock doesn't provide
> > > > > > > this feature. (vmci provides SOCK_DGRAM support)
> > > > > > > 
> > > > > > > The changes should not be too intrusive in the virtio-vsock specs and
> > > > > > > implementation, we already have the "type" field in the packet header
> > > > > > > to address this new feature.
> > > > > > > 
> > > > > > > We also have the credit mechanism to provide in-order and reliable
> > > > > > > packet delivery.
> > > > > > > 
> > > > > > > Maybe the hardest part could be changing something in the core to handle
> > > > > > > multiple transports that provide SOCK_DGRAM, for nested VMs.
> > > > > > > We already did this for stream sockets, but we haven't handled the
> > > > > > > datagram socket yet.
> > > > > > > 
> > > > > > > I am not sure how convenient it is to have two very similar devices...
> > > > > > > 
> > > > > > > If you decide to give virtio-vsock a chance to get SOCK_DGRAM, I can try to
> > > > > > > give you a more complete list of changes to make. :-)
> > > > > > I also think this sounds exactly like adding SOCK_DGRAM support to
> > > > > > virtio-vsock.
> > > > > > 
> > > > > > The reason why the SOCK_DGRAM code was dropped from early virtio-vsock
> > > > > > patches is that the protocol design didn't ensure reliable delivery
> > > > > > semantics.  At that time there were no real users for SOCK_DGRAM so it
> > > > > > was left as a feature to be added later.
> > > > > > 
> > > > > > The challenge with reusing the SOCK_STREAM credit mechanism for
> > > > > > SOCK_DGRAM is that datagrams are connectionless.  The credit mechanism
> > > > > > consists of per-connection state.  Maybe it can be extended to cover
> > > > > > SOCK_DGRAM too.
> > > > > > 
> > > > > > I would urge you to add SOCK_DGRAM to virtio-vsock instead of trying to
> > > > > > create another device that does basically what is within the scope of
> > > > > > virtio-vsock.  It took quite a bit of time and effort to get AF_VSOCK
> > > > > > support into various software components, and doing that again for
> > > > > > another device is more effort than one would think.
> > > > > > 
> > > > > > If you don't want to modify the Linux guest driver, then let's just
> > > > > > discuss the device spec and protocol.  Someone else could make the Linux
> > > > > > driver changes.
> > > > > > 
> > > > > > Stefan
> > > > > 
> > > > > I think it would be great if we could get the virtio-vsock driver to support
> > > > > SOCK_DGRAM/SOCK_SEQPACKET as it would make a lot of sense.
> > > > > 
> > > > > 
> > > > > But one of the reasons I don't really like virtio-vsock at the moment
> > > > > for my use-case in particular is that it doesn't seem well suited to
> > > > > supporting non-cooperating, live-migratable VMs.  One problem is that
> > > > > avoiding guest-visible disconnections to any service during a live
> > > > > migration may hurt performance if vsock is also used for anything else.
> > > > > 
> > > > > I'll try to exemplify what I mean with this setup:
> > > > > 
> > > > >       * workload 1 sends data constantly via an AF_VSOCK SOCK_STREAM
> > > > > 
> > > > >       * workload 2 sends commands / gets replies once in a while via an
> > > > > AF_VSOCK SOCK_SEQPACKET.
> > > > af_vsock.ko doesn't support SOCK_SEQPACKET.  Is this what you are
> > > > considering adding?
> > > > 
> > > > Earlier in this thread I thought we were discussing SOCK_DGRAM, which
> > > > has different semantics than SOCK_SEQPACKET.
> > > > 
> > > > The good news is that SOCK_SEQPACKET should be easier to add to
> > > > net/vmw_vsock than SOCK_DGRAM because the flow control credit mechanism
> > > > used for SOCK_STREAM should just work for SOCK_SEQPACKET.
> > > > 
> > > > > Assume the VM needs to be migrated:
> > > > > 
> > > > >           1) If workload 2 is currently not processing anything, then even
> > > > > if there are some commands queued up for it, everything is fine: the VMM
> > > > > can pause the guest and serialize.
> > > > >
> > > > >           2) If there's an outstanding command, the VMM needs to wait for
> > > > > it to finish and for the receive queue of the request to have enough
> > > > > capacity for the reply, but since this capacity is guest-driven, this
> > > > > second part can take a while, or forever. This is definitely not ideal.
> > > > I think you're describing how to reserve space for control packets so
> > > > that the device never has to wait on the driver.
> > > > 
> > > > Have you seen the drivers/vhost/vsock.c device implementation?  It has a
> > > > strategy for suspending tx queue processing until the rx queue has more
> > > > space.  Multiple implementation-specific approaches are possible, so
> > > > this isn't in the specification.
> > > > 
> > > > > In short, I think workload 2 needs to be in control of its own queues
> > > > > for this to work reasonably well; I don't know if sharing ownership of
> > > > > queues can work. The device we defined doesn't have this problem: first
> > > > > of all, it's on a separate queue, so workload 1 never competes in any
> > > > > way with workload 2, and workload 2 always has somewhere to place
> > > > > replies, since it has an attached reply buffer by design.
> > > > Flow control in vsock works like this:
> > > > 
> > > > 1. Data packets are accounted against per-socket buffers and removed
> > > >      from the virtqueue immediately.  This allows multiple competing data
> > > >      streams to share a single virtqueue without starvation.  It's the
> > > >      per-socket buffer that can be exhausted, but that only affects the
> > > >      application that isn't reading the socket.  The other side
> > > >      will stop sending more data when credit is exhausted so that delivery
> > > >      can be guaranteed.
> > > > 
> > > > 2. Control packet replies can be sent in response to pretty much any
> > > >      packet.  Therefore, it's necessary to suspend packet processing when
> > > >      the other side's virtqueue is full.  This way you don't need to wait
> > > >      for them midway through processing a packet.
> > > > 
> > > > There is a problem with #2 which hasn't been solved.  If both sides are
> > > > operating at N-1 queue capacity (they are almost exhausted), then can we
> > > > reach a deadlock where both sides suspend queue processing because they
> > > > are waiting for the other side?  This has not been fully investigated or
> > > > demonstrated, but it's an area that needs attention sometime.
> > > > 
> > > > > Perhaps a good compromise would be to have a multi-queue virtio-vsock or
> > > > That would mean we've reached the conclusion that it's impossible to
> > > > have bi-directional communication with guaranteed delivery over a shared
> > > > communications channel.
> > > > 
> > > > virtio-serial did this to avoid having to come up with a scheme to avoid
> > > > starvation.
> > > Let me throw in one more problem:
> > > 
> > > Imagine that we want to have virtio-vsock communication terminated in
> > > different domains, each of which has ownership of their own device
> > > emulation.
> > > 
> > > The easiest case where this happens is to have vsock between hypervisor and
> > > guest as well as between a PCIe implementation via VFIO and a guest. But the
> > > same can be true for stub domain like setups, where each connection end
> > > lives in its own stub domain (vhost-user in the vsock case I suppose).
> > > 
> > > In that case, it's impossible to share the one queue we have, no?
> > By the way, "dedicated virtqueues" could be interesting as a separate
> > feature.
> > 
> > I just think that SOCK_SEQPACKET should be implemented with flow control
> > like SOCK_STREAM.
> > 
> > Then, as a separate feature, the device could advertise dedicated
> > virtqueues allowing SOCK_STREAM and SOCK_SEQPACKET communication
> > directly in the virtqueue.  The dedicated virtqueue approach should be
> > faster since it eliminates the need for a memcpy to socket receive
> > buffers.
> > 
> > Stefan
> 
> Yes, SOCK_SEQPACKET seems relatively easy to implement.

SOCK_SEQPACKET also requires some changes in af_vsock.c, but it should be
feasible.

> 
> Would devices need to add a feature flag to advertise that they support
> SOCK_SEQPACKET? I'm not sure I understand all the backwards-compatibility
> issues adding this might introduce, but I think it would be required: the
> idea is that a message written on a SOCK_SEQPACKET socket must not be split
> into multiple messages at the receiver, and both the driver and the device
> would need to know that.

In the virtio-vsock packet header (struct virtio_vsock_hdr) we have the
'type' field, and currently the only value supported is VIRTIO_VSOCK_TYPE_STREAM.

We can add a new VIRTIO_VSOCK_TYPE_SEQPACKET to handle SOCK_SEQPACKET
packets.

I did a quick check: we should discard packets with a different type and
send a VIRTIO_VSOCK_OP_RST back, but I haven't actually tried it...

If it works, maybe we don't need a feature flag.
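
For reference, the header layout from the spec, with the new type sketched in
(VIRTIO_VSOCK_TYPE_SEQPACKET and its value are a proposal here, not something
already defined, and type_is_supported() is just an illustration):

    /* Packet header as defined today (include/uapi/linux/virtio_vsock.h). */
    struct virtio_vsock_hdr {
            __le64  src_cid;
            __le64  dst_cid;
            __le32  src_port;
            __le32  dst_port;
            __le32  len;
            __le16  type;           /* VIRTIO_VSOCK_TYPE_* */
            __le16  op;             /* VIRTIO_VSOCK_OP_* */
            __le32  flags;
            __le32  buf_alloc;
            __le32  fwd_cnt;
    } __attribute__((packed));

    enum {
            VIRTIO_VSOCK_TYPE_STREAM    = 1,
            VIRTIO_VSOCK_TYPE_SEQPACKET = 2,  /* proposed, not in the spec yet */
    };

    /* Receive path sketch: anything else would be dropped and answered with
     * a VIRTIO_VSOCK_OP_RST, as described above. */
    static bool type_is_supported(const struct virtio_vsock_hdr *hdr)
    {
            uint16_t t = le16_to_cpu(hdr->type);

            return t == VIRTIO_VSOCK_TYPE_STREAM ||
                   t == VIRTIO_VSOCK_TYPE_SEQPACKET;
    }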

> 
> A marker in the header should denote that more data for the same message
> follows in the next descriptor, so that message boundaries can be preserved
> even if the driver doesn't place large enough descriptors in the RX queue.
> The device would split and mark messages accordingly, and the driver would
> then re-assemble them before passing them to af_vsock.

The 'flags' field in the packet header should help in this case.

> 
> Otherwise, a minimum descriptor size should be part of the specification, so
> that the device can rely on a specific message size rather than on whatever
> the driver decides to place in the queue, which so far is
> implementation-defined. The first option seems more flexible to me.

Me too.

Thanks,
Stefano


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-04-30  8:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-10  9:36 [virtio-comment] Seeking guidance for custom virtIO device Eftime, Petre
2020-04-10 10:09 ` Stefano Garzarella
2020-04-14 10:50   ` Stefan Hajnoczi
2020-04-15 11:23     ` Eftime, Petre
2020-04-17 10:33       ` Stefan Hajnoczi
     [not found]         ` <15906829-2e85-ed4c-7b06-431d6e856ae9@amazon.de>
2020-04-21  9:37           ` Stefano Garzarella
     [not found]             ` <a26e9938-49c6-6b63-bb4a-6ba582af3d40@amazon.de>
2020-04-22  8:40               ` Stefano Garzarella
2020-04-29  9:53           ` Stefan Hajnoczi
     [not found]             ` <fe4a003d-3705-1337-0349-c1c4a0e94125@amazon.de>
2020-04-30  8:43               ` Stefan Hajnoczi
2020-04-29 10:06           ` Stefan Hajnoczi
2020-04-30  8:44             ` Eftime, Petre
2020-04-30  8:59               ` Stefano Garzarella
2020-04-13 11:46 ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.