* [RFC v2] virtio-vsock: add description for datagram type
@ 2021-04-01  4:36 jiang.wang
  2021-04-12 13:50 ` Stefan Hajnoczi
  0 siblings, 1 reply; 34+ messages in thread
From: jiang.wang @ 2021-04-01  4:36 UTC (permalink / raw)
  To: mst, cohuck
  Cc: cong.wang, duanxiongchun, jiang.wang, virtualization, xieyongji,
	stefanha, asias, arseny.krasnov

Add support for the datagram type to virtio-vsock. Datagram
sockets are connectionless and unreliable. To avoid contention
with stream and other sockets, add two more virtqueues and
a new feature bit to indicate whether those two new queues exist.

Also add descriptions for resource management of datagram, which
does not use the existing credit update mechanism associated with
stream sockets.

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
V2 addressed the comments for the previous version.

 virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/virtio-vsock.tex b/virtio-vsock.tex
index da7e641..62c12e0 100644
--- a/virtio-vsock.tex
+++ b/virtio-vsock.tex
@@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
 \begin{description}
 \item[0] rx
 \item[1] tx
+\item[2] datagram rx
+\item[3] datagram tx
+\item[4] event
+\end{description}
+The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DGRAM is set. Otherwise, it
+only uses 3 queues, as follows. The rx and tx queues are always used for stream sockets.
+
+\begin{description}
+\item[0] rx
+\item[1] tx
 \item[2] event
 \end{description}
 
+
 \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
 
-There are currently no feature bits defined for this device.
+\begin{description}
+\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
+\end{description}
 
 \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
 
@@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
 
 \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
 
+Flow control applies to stream sockets; datagram sockets do not have
+flow control.
+
 The tx virtqueue carries packets initiated by applications and replies to
 received packets.  The rx virtqueue carries packets initiated by the device and
 replies to previously transmitted packets.
@@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
 consists of a (cid, port number) tuple. The header fields used for this are
 \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
 
-Currently only stream sockets are supported. \field{type} is 1 for stream
-socket types.
+Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
+socket types. \field{type} is 3 for dgram socket types.
 
 Stream sockets provide in-order, guaranteed, connection-oriented delivery
 without message boundaries.
 
+Datagram sockets provide connectionless unreliable messages of
+a fixed maximum length.
+
 \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
 \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
 stream sockets. The guest and the device publish how much buffer space is
@@ -162,7 +181,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
 u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
 \end{lstlisting}
 
-If there is insufficient buffer space, the sender waits until virtqueue buffers
+For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
 are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
 the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
 available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
@@ -170,16 +189,28 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
 previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
 communicating updates any time a change in buffer space occurs.
 
+Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
+VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
+is split into two parts: the tx side and the rx side. For the tx side, there is
+additional buffer space for each socket.
+The dgram sender sends packets when the virtqueue or the additional buffer is not full.
+When both are full, the sender
+MUST return an appropriate error to the upper layer application.
+For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
+is dropped by the receiver.
+
 \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
-VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
-sufficient free buffer space for the payload.
+For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
+sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
+MAY be transmitted even when the peer buffer is full, in which case the packet is dropped by the receiver.
 
 All packets associated with a stream flow MUST contain valid information in
 \field{buf_alloc} and \field{fwd_cnt} fields.
 
 \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
-VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
-sufficient free buffer space for the payload.
+For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
+sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
+MAY be transmitted even when the peer buffer is full, in which case the packet is dropped by the receiver.
 
 All packets associated with a stream flow MUST contain valid information in
 \field{buf_alloc} and \field{fwd_cnt} fields.
@@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
 The \field{guest_cid} configuration field MUST be used as the source CID when
 sending outgoing packets.
 
-A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
+For stream sockets, a VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
 unknown \field{type} value.
 
 \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
 
 The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
 
-A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
+For stream sockets, a VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
 unknown \field{type} value.
 
 \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
@@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
 destination) address tuple for a new connection while the other peer is still
 processing the old connection.
 
+\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
+
+Datagram (dgram) sockets are connectionless and unreliable. The sender simply sends
+a message to the peer and hopes it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if
+a receiving socket does not exist on the destination.
+If the transmission or receiving buffers are full, the packets
+are dropped. If the transmission buffer is full, an appropriate error message
+is returned to the caller.
+
+Dgram sockets preserve the message boundaries.
+
 \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
 
 Certain events are communicated by the device to the driver using the event
-- 
2.11.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-01  4:36 [RFC v2] virtio-vsock: add description for datagram type jiang.wang
@ 2021-04-12 13:50 ` Stefan Hajnoczi
  2021-04-12 14:21   ` Stefano Garzarella
  2021-04-12 22:39   ` [External] " Jiang Wang .
  0 siblings, 2 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-04-12 13:50 UTC (permalink / raw)
  To: jiang.wang
  Cc: cong.wang, duanxiongchun, mst, cohuck, virtualization, xieyongji,
	arseny.krasnov, asias



On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> Add supports for datagram type for virtio-vsock. Datagram
> sockets are connectionless and unreliable. To avoid contention
> with stream and other sockets, add two more virtqueues and
> a new feature bit to identify if those two new queues exist or not.
> 
> Also add descriptions for resource management of datagram, which
> does not use the existing credit update mechanism associated with
> stream sockets.
> 
> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> ---
> V2 addressed the comments for the previous version.
> 
>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> index da7e641..62c12e0 100644
> --- a/virtio-vsock.tex
> +++ b/virtio-vsock.tex
> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>  \begin{description}
>  \item[0] rx
>  \item[1] tx
> +\item[2] datagram rx
> +\item[3] datagram tx
> +\item[4] event
> +\end{description}
> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> +
> +\begin{description}
> +\item[0] rx
> +\item[1] tx
>  \item[2] event
>  \end{description}
>  

I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
virtqueues and also adding this:

  When behavior differs between stream and datagram rx/tx virtqueues
  their full names are used. Common behavior is simply described in
  terms of rx/tx virtqueues and applies to both stream and datagram
  virtqueues.

This way you won't need to duplicate portions of the spec that deal with
populating the virtqueues, for example.

It's also clearer to use a full name for stream rx/tx virtqueues instead
of calling them rx/tx virtqueues now that we have datagram rx/tx
virtqueues.

> +
>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>  
> -There are currently no feature bits defined for this device.
> +\begin{description}
> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> +\end{description}
>  
>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>  
> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>  
>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>  
> +Flow control applies to stream sockets; datagram sockets do not have
> +flow control.
> +
>  The tx virtqueue carries packets initiated by applications and replies to
>  received packets.  The rx virtqueue carries packets initiated by the device and
>  replies to previously transmitted packets.
> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>  consists of a (cid, port number) tuple. The header fields used for this are
>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>  
> -Currently only stream sockets are supported. \field{type} is 1 for stream
> -socket types.
> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> +socket types. \field{type} is 3 for dgram socket types.
>  
>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>  without message boundaries.
>  
> +Datagram sockets provide connectionless unreliable messages of
> +a fixed maximum length.

Plus unordered (?) and with message boundaries. In other words:

  Datagram sockets provide unordered, unreliable, connectionless message
  with message boundaries and a fixed maximum length.

I didn't think of the fixed maximum length aspect before. I guess the
intention is that the rx buffer size is the message size limit? That's
different from UDP messages, which can be fragmented into multiple IP
packets and can be larger than 64KiB:
https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure

Is it possible to support large datagram messages in vsock? I'm a little
concerned that applications that run successfully over UDP will not be
portable if vsock has this limitation because it would impose extra
message boundaries that the application protocol might not tolerate.

> +
>  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
>  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
>  stream sockets. The guest and the device publish how much buffer space is
> @@ -162,7 +181,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
>  u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
>  \end{lstlisting}
>  
> -If there is insufficient buffer space, the sender waits until virtqueue buffers
> +For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
>  are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
>  the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
>  available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> @@ -170,16 +189,28 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
>  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
>  communicating updates any time a change in buffer space occurs.
>  
> +Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> +VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> +is split to two parts: tx side and rx side. For the tx side, there is
> +additional buffer space for each socket.

Plus:

  ... according to the driver and device's available memory
  resources. The amount of tx buffer space is an implementation detail
  of both the device and the driver. It is not visible to the other side
  and may be controlled by the application or administrative resource
  limits.

What I'm trying to describe here is that the additional tx buffer space
isn't part of the device interface.
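To make that concrete, the tx-side policy could be sketched like this (all names and the buffer model here are invented for illustration; the actual bookkeeping is an implementation detail, as noted above):

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative model only: the extra buffer is purely local driver
 * state and is never visible to the peer. */
struct dgram_tx {
    size_t vq_free;    /* free descriptors in the datagram tx virtqueue */
    size_t extra_free; /* free bytes in the local per-socket buffer */
};

/* Returns 0 if the packet was queued (directly or via the local
 * buffer), or -ENOBUFS when both are exhausted, which is the error
 * the driver would surface to the application. */
int dgram_send(struct dgram_tx *tx, size_t pkt_len)
{
    if (tx->vq_free > 0) {
        tx->vq_free--;              /* enqueue directly on the virtqueue */
        return 0;
    }
    if (tx->extra_free >= pkt_len) {
        tx->extra_free -= pkt_len;  /* stash until descriptors return */
        return 0;
    }
    return -ENOBUFS;                /* both full: report error upward */
}
```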

> +The dgram sender sends packets when the virtqueue or the additional buffer is not full.
> +When both are full, the sender
> +MUST return an appropriate error to the upper layer application.

MUST, SHOULD, etc clauses need to go into the
devicenormative/drivernormative sections. They cannot be in regular
text.

> +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> +is dropped by the receiver.

UDP is connectionless so any number of other sources can send messages
to the same destination, causing buf_alloc's value to be unpredictable.
Can you explain how buf_alloc works with datagram sockets in more
detail?

>  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> -sufficient free buffer space for the payload.
> +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
>  
>  All packets associated with a stream flow MUST contain valid information in
>  \field{buf_alloc} and \field{fwd_cnt} fields.
>  
>  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> -sufficient free buffer space for the payload.
> +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
>  
>  All packets associated with a stream flow MUST contain valid information in
>  \field{buf_alloc} and \field{fwd_cnt} fields.
> @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
>  The \field{guest_cid} configuration field MUST be used as the source CID when
>  sending outgoing packets.
>  
> -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
>  unknown \field{type} value.

What about datagram sockets? Please state what must happen and why.

>  
>  \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
>  
>  The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
>  
> -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
>  unknown \field{type} value.
>  
>  \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> @@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
>  destination) address tuple for a new connection while the other peer is still
>  processing the old connection.
>  
> +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}

s/Stream Sockets/Datagram Sockets/

> +
> +Datagram (dgram) sockets are connectionless and unreliable. The sender just sends
> +a message to the peer and hope it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if

s/hope/hopes/

> +a receiving socket does not exist on the destination.
> +If the transmission or receiving buffers are full, the packets
> +are dropped. If the transmission buffer is full, an appropriate error message
> +is returned to the caller.

It's unclear whether the caller is the driver/device or something else.
I think you're referring to the application interace, which is outside
the scope of the VIRTIO spec. I suggest dropping the last sentence.




* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-12 13:50 ` Stefan Hajnoczi
@ 2021-04-12 14:21   ` Stefano Garzarella
  2021-04-12 22:42     ` Jiang Wang .
  2021-04-12 22:39   ` [External] " Jiang Wang .
  1 sibling, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-12 14:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: cong.wang, duanxiongchun, jiang.wang, mst, cohuck,
	virtualization, xieyongji, asias, arseny.krasnov

On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
>On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
>> Add supports for datagram type for virtio-vsock. Datagram
>> sockets are connectionless and unreliable. To avoid contention
>> with stream and other sockets, add two more virtqueues and
>> a new feature bit to identify if those two new queues exist or not.
>>
>> Also add descriptions for resource management of datagram, which
>> does not use the existing credit update mechanism associated with
>> stream sockets.
>>
>> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> ---
>> V2 addressed the comments for the previous version.
>>
>>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 52 insertions(+), 10 deletions(-)
>>
>> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
>> index da7e641..62c12e0 100644
>> --- a/virtio-vsock.tex
>> +++ b/virtio-vsock.tex
>> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>>  \begin{description}
>>  \item[0] rx
>>  \item[1] tx
>> +\item[2] datagram rx
>> +\item[3] datagram tx
>> +\item[4] event
>> +\end{description}
>> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
>> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
>> +
>> +\begin{description}
>> +\item[0] rx
>> +\item[1] tx
>>  \item[2] event
>>  \end{description}
>>
>
>I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
>virtqueues and also adding this:
>
>  When behavior differs between stream and datagram rx/tx virtqueues
>  their full names are used. Common behavior is simply described in
>  terms of rx/tx virtqueues and applies to both stream and datagram
>  virtqueues.
>
>This way you won't need to duplicate portions of the spec that deal with
>populating the virtqueues, for example.
>
>It's also clearer to use a full name for stream rx/tx virtqueues instead
>of calling them rx/tx virtqueues now that we have datagram rx/tx
>virtqueues.
>
>> +
>>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>>
>> -There are currently no feature bits defined for this device.
>> +\begin{description}
>> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
>> +\end{description}
>>
>>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>>
>> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>>
>>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>>
>> +Flow control applies to stream sockets; datagram sockets do not have
>> +flow control.
>> +
>>  The tx virtqueue carries packets initiated by applications and replies to
>>  received packets.  The rx virtqueue carries packets initiated by the device and
>>  replies to previously transmitted packets.
>> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>>  consists of a (cid, port number) tuple. The header fields used for this are
>>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>>
>> -Currently only stream sockets are supported. \field{type} is 1 for stream
>> -socket types.
>> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
>> +socket types. \field{type} is 3 for dgram socket types.
>>
>>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>>  without message boundaries.
>>
>> +Datagram sockets provide connectionless unreliable messages of
>> +a fixed maximum length.
>
>Plus unordered (?) and with message boundaries. In other words:
>
>  Datagram sockets provide unordered, unreliable, connectionless message
>  with message boundaries and a fixed maximum length.
>
>I didn't think of the fixed maximum length aspect before. I guess the
>intention is that the rx buffer size is the message size limit? That's
>different from UDP messages, which can be fragmented into multiple IP
>packets and can be larger than 64KiB:
>https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>
>Is it possible to support large datagram messages in vsock? I'm a little
>concerned that applications that run successfully over UDP will not be
>portable if vsock has this limitation because it would impose extra
>message boundaries that the application protocol might not tolerate.

Maybe we can reuse the same approach Arseny is using for SEQPACKET. 
Fragment the packets according to the buffers in the virtqueue and set 
the EOR flag to indicate the last packet in the message.
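A minimal sketch of that fragmentation idea, assuming a hypothetical EOR flag value and an illustrative fragment struct (neither is the actual SEQPACKET definition):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value for illustration; the real bit assignment
 * would come from the SEQPACKET proposal mentioned above. */
#define VSOCK_FLAG_EOR 0x1

struct fragment {
    uint32_t len;
    uint32_t flags;  /* EOR set on the last fragment of a message */
};

/* Split a datagram of msg_len bytes into fragments no larger than
 * buf_len (the rx buffer size posted in the virtqueue). Returns the
 * number of fragments written to out (caller guarantees capacity). */
size_t fragment_dgram(uint32_t msg_len, uint32_t buf_len,
                      struct fragment *out)
{
    size_t n = 0;
    uint32_t remaining = msg_len;

    do {
        uint32_t chunk = remaining < buf_len ? remaining : buf_len;
        out[n].len = chunk;
        remaining -= chunk;
        /* EOR marks the final packet so the receiver can rebuild
         * the message boundary. */
        out[n].flags = (remaining == 0) ? VSOCK_FLAG_EOR : 0;
        n++;
    } while (remaining > 0);

    return n;
}
```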

Thanks,
Stefano



* Re: [External] Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-12 13:50 ` Stefan Hajnoczi
  2021-04-12 14:21   ` Stefano Garzarella
@ 2021-04-12 22:39   ` Jiang Wang .
  2021-05-13 14:57     ` Stefan Hajnoczi
  1 sibling, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-04-12 22:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Arseny Krasnov, asias

On Mon, Apr 12, 2021 at 6:50 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > Add supports for datagram type for virtio-vsock. Datagram
> > sockets are connectionless and unreliable. To avoid contention
> > with stream and other sockets, add two more virtqueues and
> > a new feature bit to identify if those two new queues exist or not.
> >
> > Also add descriptions for resource management of datagram, which
> > does not use the existing credit update mechanism associated with
> > stream sockets.
> >
> > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > ---
> > V2 addressed the comments for the previous version.
> >
> >  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 52 insertions(+), 10 deletions(-)
> >
> > diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > index da7e641..62c12e0 100644
> > --- a/virtio-vsock.tex
> > +++ b/virtio-vsock.tex
> > @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> >  \begin{description}
> >  \item[0] rx
> >  \item[1] tx
> > +\item[2] datagram rx
> > +\item[3] datagram tx
> > +\item[4] event
> > +\end{description}
> > +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> > +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> > +
> > +\begin{description}
> > +\item[0] rx
> > +\item[1] tx
> >  \item[2] event
> >  \end{description}
> >
>
> I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> virtqueues and also adding this:

OK

>   When behavior differs between stream and datagram rx/tx virtqueues
>   their full names are used. Common behavior is simply described in
>   terms of rx/tx virtqueues and applies to both stream and datagram
>   virtqueues.

OK

> This way you won't need to duplicate portions of the spec that deal with
> populating the virtqueues, for example.
>
> It's also clearer to use a full name for stream rx/tx virtqueues instead
> of calling them rx/tx virtqueues now that we have datagram rx/tx
> virtqueues.

Sure.

> > +
> >  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >
> > -There are currently no feature bits defined for this device.
> > +\begin{description}
> > +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> > +\end{description}
> >
> >  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> >
> > @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> >
> >  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >
> > +Flow control applies to stream sockets; datagram sockets do not have
> > +flow control.
> > +
> >  The tx virtqueue carries packets initiated by applications and replies to
> >  received packets.  The rx virtqueue carries packets initiated by the device and
> >  replies to previously transmitted packets.
> > @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> >  consists of a (cid, port number) tuple. The header fields used for this are
> >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> >
> > -Currently only stream sockets are supported. \field{type} is 1 for stream
> > -socket types.
> > +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> > +socket types. \field{type} is 3 for dgram socket types.
> >
> >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> >  without message boundaries.
> >
> > +Datagram sockets provide connectionless unreliable messages of
> > +a fixed maximum length.
>
> Plus unordered (?) and with message boundaries. In other words:
>
>   Datagram sockets provide unordered, unreliable, connectionless message
>   with message boundaries and a fixed maximum length.

OK. Will do. In my implementation, the order is preserved because
there is only one virtqueue. But I think we can put unordered in the spec.

> I didn't think of the fixed maximum length aspect before. I guess the
> intention is that the rx buffer size is the message size limit? That's
> different from UDP messages, which can be fragmented into multiple IP
> packets and can be larger than 64KiB:
> https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>
> Is it possible to support large datagram messages in vsock? I'm a little
> concerned that applications that run successfully over UDP will not be
> portable if vsock has this limitation because it would impose extra
> message boundaries that the application protocol might not tolerate.
>
OK. I think one way is to support fragmentation as suggested by Stefano.

> > +
> >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> >  stream sockets. The guest and the device publish how much buffer space is
> > @@ -162,7 +181,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> >  u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> >  \end{lstlisting}
> >
> > -If there is insufficient buffer space, the sender waits until virtqueue buffers
> > +For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
> >  are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
> >  the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
> >  available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> > @@ -170,16 +189,28 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> >  communicating updates any time a change in buffer space occurs.
> >
> > +Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> > +VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> > +is split into two parts: tx side and rx side. For the tx side, there is
> > +additional buffer space for each socket.
>
> Plus:
>
>   ... according to the driver and device's available memory
>   resources. The amount of tx buffer space is an implementation detail
>   of both the device and the driver. It is not visible to the other side
>   and may be controlled by the application or administrative resource
>   limits.
>
> What I'm trying to describe here is that the additional tx buffer space
> isn't part of the device interface.
>
OK. Will do.

> > +The dgram sender sends packets when the virtqueue or the additional buffer is not full.
> > +When both are full, the sender
> > +MUST return an appropriate error to the upper layer application.
>
> MUST, SHOULD, etc clauses need to go into the
> devicenormative/drivernormative sections. They cannot be in regular
> text.
>
OK.
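As a rough sketch of the tx-side behavior described in the patch text above (all names and data structures here are illustrative, not the actual driver's):

```c
#include <errno.h>

/* Hypothetical per-socket tx state. The additional buffer is an
 * implementation detail of the driver/device and is not visible
 * to the peer. */
struct dgram_tx {
    unsigned int vq_free;    /* free descriptors in the dgram tx virtqueue */
    unsigned int buf_free;   /* free bytes in the additional per-socket buffer */
};

/* Returns 0 if the datagram can be queued; -ENOBUFS when both the
 * virtqueue and the additional buffer are full, in which case an
 * error is propagated to the application. */
int dgram_try_send(struct dgram_tx *tx, unsigned int len)
{
    if (tx->vq_free > 0)
        return 0;               /* transmit directly on the virtqueue */
    if (tx->buf_free >= len) {
        tx->buf_free -= len;    /* stage in the additional tx buffer */
        return 0;
    }
    return -ENOBUFS;            /* both full: report error upward */
}
```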

> > +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> > +is dropped by the receiver.
>
> UDP is connectionless so any number of other sources can send messages
> to the same destination, causing buf_alloc's value to be unpredictable.
> Can you explain how buf_alloc works with datagram sockets in more
> detail?

In my Linux kernel prototype, datagram sockets also use
virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() to update
vvs->rx_bytes and compare it against vvs->buf_alloc.
virtio_transport_inc_rx_pkt() is called when enqueuing datagram packets and
virtio_transport_dec_rx_pkt() when dequeuing them.

If multiple sources send messages to the same destination, they will share
the same buf_alloc. Do you want something with finer-grained control?
We could allocate a separate buffer for each remote (CID, port) pair, but
that feels like overkill. Any other suggestions?
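For illustration, that accounting can be sketched like this (simplified: the real kernel helpers operate on a struct virtio_vsock_sock, and the field names here are only indicative):

```c
/* Sketch of the rx-side accounting described above, modeled loosely on
 * virtio_transport_inc_rx_pkt()/virtio_transport_dec_rx_pkt(). */
struct vsock_rx {
    unsigned int buf_alloc; /* advertised receive buffer space */
    unsigned int rx_bytes;  /* bytes currently queued, shared by all senders */
};

/* Called when enqueuing a datagram; returns 0 on success or -1 when
 * buf_alloc would be exceeded and the packet is dropped. */
int rx_inc(struct vsock_rx *rx, unsigned int len)
{
    if (rx->rx_bytes + len > rx->buf_alloc)
        return -1;          /* no space left: drop */
    rx->rx_bytes += len;
    return 0;
}

/* Called when the application dequeues a datagram. */
void rx_dec(struct vsock_rx *rx, unsigned int len)
{
    rx->rx_bytes -= len;
}
```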

> >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> >
> >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> >  The \field{guest_cid} configuration field MUST be used as the source CID when
> >  sending outgoing packets.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
>
> What about datagram sockets? Please state what must happen and why.

I think datagram sockets should do the same thing as stream sockets and
return a VIRTIO_VSOCK_OP_RST. Any other ideas?

> >
> >  \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> >
> >  The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
> >
> >  \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> > @@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
> >  destination) address tuple for a new connection while the other peer is still
> >  processing the old connection.
> >
> > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
>
> s/Stream Sockets/Datagram Sockets/

Will do

> > +
> > +Datagram (dgram) sockets are connectionless and unreliable. The sender just sends
> > +a message to the peer and hope it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if
>
> s/hope/hopes/

got it.

> > +a receiving socket does not exist on the destination.
> > +If the transmission or receiving buffers are full, the packets
> > +are dropped. If the transmission buffer is full, an appropriate error message
> > +is returned to the caller.
>
> It's unclear whether the caller is the driver/device or something else.
> I think you're referring to the application interface, which is outside
> the scope of the VIRTIO spec. I suggest dropping the last sentence.

Yeah, I was thinking about the application interface. I will drop this
sentence.

Thanks for all the feedback.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-12 14:21   ` Stefano Garzarella
@ 2021-04-12 22:42     ` Jiang Wang .
  2021-04-13 12:58       ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-04-12 22:42 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> >> Add supports for datagram type for virtio-vsock. Datagram
> >> sockets are connectionless and unreliable. To avoid contention
> >> with stream and other sockets, add two more virtqueues and
> >> a new feature bit to identify if those two new queues exist or not.
> >>
> >> Also add descriptions for resource management of datagram, which
> >> does not use the existing credit update mechanism associated with
> >> stream sockets.
> >>
> >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >> ---
> >> V2 addressed the comments for the previous version.
> >>
> >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> >>  1 file changed, 52 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> >> index da7e641..62c12e0 100644
> >> --- a/virtio-vsock.tex
> >> +++ b/virtio-vsock.tex
> >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> >>  \begin{description}
> >>  \item[0] rx
> >>  \item[1] tx
> >> +\item[2] datagram rx
> >> +\item[3] datagram tx
> >> +\item[4] event
> >> +\end{description}
> >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DGRAM is set. Otherwise, it
> >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> >> +
> >> +\begin{description}
> >> +\item[0] rx
> >> +\item[1] tx
> >>  \item[2] event
> >>  \end{description}
> >>
> >
> >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> >virtqueues and also adding this:
> >
> >  When behavior differs between stream and datagram rx/tx virtqueues
> >  their full names are used. Common behavior is simply described in
> >  terms of rx/tx virtqueues and applies to both stream and datagram
> >  virtqueues.
> >
> >This way you won't need to duplicate portions of the spec that deal with
> >populating the virtqueues, for example.
> >
> >It's also clearer to use a full name for stream rx/tx virtqueues instead
> >of calling them rx/tx virtqueues now that we have datagram rx/tx
> >virtqueues.
> >
> >> +
> >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >>
> >> -There are currently no feature bits defined for this device.
> >> +\begin{description}
> >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> >> +\end{description}
> >>
> >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> >>
> >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> >>
> >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >>
> >> +Flow control applies to stream sockets; datagram sockets do not have
> >> +flow control.
> >> +
> >>  The tx virtqueue carries packets initiated by applications and replies to
> >>  received packets.  The rx virtqueue carries packets initiated by the device and
> >>  replies to previously transmitted packets.
> >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> >>  consists of a (cid, port number) tuple. The header fields used for this are
> >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> >>
> >> -Currently only stream sockets are supported. \field{type} is 1 for stream
> >> -socket types.
> >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> >> +socket types. \field{type} is 3 for dgram socket types.
> >>
> >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> >>  without message boundaries.
> >>
> >> +Datagram sockets provide connectionless unreliable messages of
> >> +a fixed maximum length.
> >
> >Plus unordered (?) and with message boundaries. In other words:
> >
> >  Datagram sockets provide unordered, unreliable, connectionless message
> >  with message boundaries and a fixed maximum length.
> >
> >I didn't think of the fixed maximum length aspect before. I guess the
> >intention is that the rx buffer size is the message size limit? That's
> >different from UDP messages, which can be fragmented into multiple IP
> >packets and can be larger than 64KiB:
> >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> >
> >Is it possible to support large datagram messages in vsock? I'm a little
> >concerned that applications that run successfully over UDP will not be
> >portable if vsock has this limitation because it would impose extra
> >message boundaries that the application protocol might not tolerate.
>
> Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> Fragment the packets according to the buffers in the virtqueue and set
> the EOR flag to indicate the last packet in the message.
>
Agreed. Another option is to reuse the flags defined for skbs, since we may
need to use skbs for multiple-transport support anyway.
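A minimal sketch of the fragmentation idea being discussed, assuming a fixed rx buffer size and an illustrative EOR flag value (the actual flag name and value would be defined by the spec):

```c
#define VQ_BUF_SIZE 4096u            /* assumed rx buffer size */
#define VIRTIO_VSOCK_FLAG_EOR 0x1u   /* illustrative flag value */

struct fragment {
    unsigned int len;
    unsigned int flags;
};

/* Splits a message of msg_len bytes into virtqueue-buffer-sized
 * fragments, setting EOR only on the last one so the receiver can
 * rebuild the message boundary. Returns the fragment count. */
unsigned int fragment_message(unsigned int msg_len,
                              struct fragment *out, unsigned int max_frags)
{
    unsigned int n = 0;

    while (msg_len > 0 && n < max_frags) {
        unsigned int chunk = msg_len < VQ_BUF_SIZE ? msg_len : VQ_BUF_SIZE;

        out[n].len = chunk;
        out[n].flags = (msg_len == chunk) ? VIRTIO_VSOCK_FLAG_EOR : 0;
        msg_len -= chunk;
        n++;
    }
    return n;
}
```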

> Thanks,
> Stefano
>

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-12 22:42     ` Jiang Wang .
@ 2021-04-13 12:58       ` Stefano Garzarella
  2021-04-13 13:16         ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-13 12:58 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
>On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
>> >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
>> >> Add supports for datagram type for virtio-vsock. Datagram
>> >> sockets are connectionless and unreliable. To avoid contention
>> >> with stream and other sockets, add two more virtqueues and
>> >> a new feature bit to identify if those two new queues exist or not.
>> >>
>> >> Also add descriptions for resource management of datagram, which
>> >> does not use the existing credit update mechanism associated with
>> >> stream sockets.
>> >>
>> >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> >> ---
>> >> V2 addressed the comments for the previous version.
>> >>
>> >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>> >>  1 file changed, 52 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
>> >> index da7e641..62c12e0 100644
>> >> --- a/virtio-vsock.tex
>> >> +++ b/virtio-vsock.tex
>> >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>> >>  \begin{description}
>> >>  \item[0] rx
>> >>  \item[1] tx
>> >> +\item[2] datagram rx
>> >> +\item[3] datagram tx
>> >> +\item[4] event
>> >> +\end{description}
>> >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DGRAM is set. Otherwise, it
>> >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
>> >> +
>> >> +\begin{description}
>> >> +\item[0] rx
>> >> +\item[1] tx
>> >>  \item[2] event
>> >>  \end{description}
>> >>
>> >
>> >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
>> >virtqueues and also adding this:
>> >
>> >  When behavior differs between stream and datagram rx/tx virtqueues
>> >  their full names are used. Common behavior is simply described in
>> >  terms of rx/tx virtqueues and applies to both stream and datagram
>> >  virtqueues.
>> >
>> >This way you won't need to duplicate portions of the spec that deal with
>> >populating the virtqueues, for example.
>> >
>> >It's also clearer to use a full name for stream rx/tx virtqueues instead
>> >of calling them rx/tx virtqueues now that we have datagram rx/tx
>> >virtqueues.
>> >
>> >> +
>> >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> >>
>> >> -There are currently no feature bits defined for this device.
>> >> +\begin{description}
>> >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
>> >> +\end{description}
>> >>
>> >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>> >>
>> >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>> >>
>> >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> >>
>> >> +Flow control applies to stream sockets; datagram sockets do not have
>> >> +flow control.
>> >> +
>> >>  The tx virtqueue carries packets initiated by applications and replies to
>> >>  received packets.  The rx virtqueue carries packets initiated by the device and
>> >>  replies to previously transmitted packets.
>> >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>> >>  consists of a (cid, port number) tuple. The header fields used for this are
>> >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>> >>
>> >> -Currently only stream sockets are supported. \field{type} is 1 for stream
>> >> -socket types.
>> >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
>> >> +socket types. \field{type} is 3 for dgram socket types.
>> >>
>> >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>> >>  without message boundaries.
>> >>
>> >> +Datagram sockets provide connectionless unreliable messages of
>> >> +a fixed maximum length.
>> >
>> >Plus unordered (?) and with message boundaries. In other words:
>> >
>> >  Datagram sockets provide unordered, unreliable, connectionless message
>> >  with message boundaries and a fixed maximum length.
>> >
>> >I didn't think of the fixed maximum length aspect before. I guess the
>> >intention is that the rx buffer size is the message size limit? That's
>> >different from UDP messages, which can be fragmented into multiple IP
>> >packets and can be larger than 64KiB:
>> >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>> >
>> >Is it possible to support large datagram messages in vsock? I'm a little
>> >concerned that applications that run successfully over UDP will not be
>> >portable if vsock has this limitation because it would impose extra
>> >message boundaries that the application protocol might not tolerate.
>>
>> Maybe we can reuse the same approach Arseny is using for SEQPACKET.
>> Fragment the packets according to the buffers in the virtqueue and set
>> the EOR flag to indicate the last packet in the message.
>>
>Agree. Another option is to use the ones for skb since we may need to
>use skbs for multiple transport support anyway.
>

The important thing, I think, is to have a single flag in virtio-vsock
that identifies the same thing in both cases: this is the last fragment
of a series used to rebuild a packet.

We should reuse the same flag for DGRAM and SEQPACKET.

Thanks,
Stefano


* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 12:58       ` Stefano Garzarella
@ 2021-04-13 13:16         ` Michael S. Tsirkin
  2021-04-13 13:38           ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-04-13 13:16 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > 
> > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > > >> Add supports for datagram type for virtio-vsock. Datagram
> > > >> sockets are connectionless and unreliable. To avoid contention
> > > >> with stream and other sockets, add two more virtqueues and
> > > >> a new feature bit to identify if those two new queues exist or not.
> > > >>
> > > >> Also add descriptions for resource management of datagram, which
> > > >> does not use the existing credit update mechanism associated with
> > > >> stream sockets.
> > > >>
> > > >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > >> ---
> > > >> V2 addressed the comments for the previous version.
> > > >>
> > > >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> > > >>  1 file changed, 52 insertions(+), 10 deletions(-)
> > > >>
> > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > > >> index da7e641..62c12e0 100644
> > > >> --- a/virtio-vsock.tex
> > > >> +++ b/virtio-vsock.tex
> > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> > > >>  \begin{description}
> > > >>  \item[0] rx
> > > >>  \item[1] tx
> > > >> +\item[2] datagram rx
> > > >> +\item[3] datagram tx
> > > >> +\item[4] event
> > > >> +\end{description}
> > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DGRAM is set. Otherwise, it
> > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> > > >> +
> > > >> +\begin{description}
> > > >> +\item[0] rx
> > > >> +\item[1] tx
> > > >>  \item[2] event
> > > >>  \end{description}
> > > >>
> > > >
> > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> > > >virtqueues and also adding this:
> > > >
> > > >  When behavior differs between stream and datagram rx/tx virtqueues
> > > >  their full names are used. Common behavior is simply described in
> > > >  terms of rx/tx virtqueues and applies to both stream and datagram
> > > >  virtqueues.
> > > >
> > > >This way you won't need to duplicate portions of the spec that deal with
> > > >populating the virtqueues, for example.
> > > >
> > > >It's also clearer to use a full name for stream rx/tx virtqueues instead
> > > >of calling them rx/tx virtqueues now that we have datagram rx/tx
> > > >virtqueues.
> > > >
> > > >> +
> > > >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > >>
> > > >> -There are currently no feature bits defined for this device.
> > > >> +\begin{description}
> > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> > > >> +\end{description}
> > > >>
> > > >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> > > >>
> > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> > > >>
> > > >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> > > >>
> > > >> +Flow control applies to stream sockets; datagram sockets do not have
> > > >> +flow control.
> > > >> +
> > > >>  The tx virtqueue carries packets initiated by applications and replies to
> > > >>  received packets.  The rx virtqueue carries packets initiated by the device and
> > > >>  replies to previously transmitted packets.
> > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > >>  consists of a (cid, port number) tuple. The header fields used for this are
> > > >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > >>
> > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream
> > > >> -socket types.
> > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> > > >> +socket types. \field{type} is 3 for dgram socket types.
> > > >>
> > > >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > >>  without message boundaries.
> > > >>
> > > >> +Datagram sockets provide connectionless unreliable messages of
> > > >> +a fixed maximum length.
> > > >
> > > >Plus unordered (?) and with message boundaries. In other words:
> > > >
> > > >  Datagram sockets provide unordered, unreliable, connectionless message
> > > >  with message boundaries and a fixed maximum length.
> > > >
> > > >I didn't think of the fixed maximum length aspect before. I guess the
> > > >intention is that the rx buffer size is the message size limit? That's
> > > >different from UDP messages, which can be fragmented into multiple IP
> > > >packets and can be larger than 64KiB:
> > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> > > >
> > > >Is it possible to support large datagram messages in vsock? I'm a little
> > > >concerned that applications that run successfully over UDP will not be
> > > >portable if vsock has this limitation because it would impose extra
> > > >message boundaries that the application protocol might not tolerate.
> > > 
> > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> > > Fragment the packets according to the buffers in the virtqueue and set
> > > the EOR flag to indicate the last packet in the message.
> > > 
> > Agree. Another option is to use the ones for skb since we may need to
> > use skbs for multiple transport support anyway.
> > 
> 
> The important thing I think is to have a single flag in virtio-vsock that
> identifies pretty much the same thing: this is the last fragment of a series
> to rebuild a packet.
> 
> We should reuse the same flag for DGRAM and SEQPACKET.
> 
> Thanks,
> Stefano

Well DGRAM can drop data so I wonder whether it can work ...

-- 
MST


* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 13:16         ` Michael S. Tsirkin
@ 2021-04-13 13:38           ` Stefano Garzarella
  2021-04-13 13:50             ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-13 13:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
>On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
>> On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
>> > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >
>> > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
>> > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
>> > > >> Add supports for datagram type for virtio-vsock. Datagram
>> > > >> sockets are connectionless and unreliable. To avoid contention
>> > > >> with stream and other sockets, add two more virtqueues and
>> > > >> a new feature bit to identify if those two new queues exist or not.
>> > > >>
>> > > >> Also add descriptions for resource management of datagram, which
>> > > >> does not use the existing credit update mechanism associated with
>> > > >> stream sockets.
>> > > >>
>> > > >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> > > >> ---
>> > > >> V2 addressed the comments for the previous version.
>> > > >>
>> > > >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>> > > >>  1 file changed, 52 insertions(+), 10 deletions(-)
>> > > >>
>> > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
>> > > >> index da7e641..62c12e0 100644
>> > > >> --- a/virtio-vsock.tex
>> > > >> +++ b/virtio-vsock.tex
>> > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>> > > >>  \begin{description}
>> > > >>  \item[0] rx
>> > > >>  \item[1] tx
>> > > >> +\item[2] datagram rx
>> > > >> +\item[3] datagram tx
>> > > >> +\item[4] event
>> > > >> +\end{description}
>> > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DGRAM is set. Otherwise, it
>> > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
>> > > >> +
>> > > >> +\begin{description}
>> > > >> +\item[0] rx
>> > > >> +\item[1] tx
>> > > >>  \item[2] event
>> > > >>  \end{description}
>> > > >>
>> > > >
>> > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
>> > > >virtqueues and also adding this:
>> > > >
>> > > >  When behavior differs between stream and datagram rx/tx virtqueues
>> > > >  their full names are used. Common behavior is simply described in
>> > > >  terms of rx/tx virtqueues and applies to both stream and datagram
>> > > >  virtqueues.
>> > > >
>> > > >This way you won't need to duplicate portions of the spec that deal with
>> > > >populating the virtqueues, for example.
>> > > >
>> > > >It's also clearer to use a full name for stream rx/tx virtqueues instead
>> > > >of calling them rx/tx virtqueues now that we have datagram rx/tx
>> > > >virtqueues.
>> > > >
>> > > >> +
>> > > >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> > > >>
>> > > >> -There are currently no feature bits defined for this device.
>> > > >> +\begin{description}
>> > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
>> > > >> +\end{description}
>> > > >>
>> > > >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>> > > >>
>> > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>> > > >>
>> > > >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> > > >>
>> > > >> +Flow control applies to stream sockets; datagram sockets do not have
>> > > >> +flow control.
>> > > >> +
>> > > >>  The tx virtqueue carries packets initiated by applications and replies to
>> > > >>  received packets.  The rx virtqueue carries packets initiated by the device and
>> > > >>  replies to previously transmitted packets.
>> > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>> > > >>  consists of a (cid, port number) tuple. The header fields used for this are
>> > > >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>> > > >>
>> > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream
>> > > >> -socket types.
>> > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
>> > > >> +socket types. \field{type} is 3 for dgram socket types.
>> > > >>
>> > > >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>> > > >>  without message boundaries.
>> > > >>
>> > > >> +Datagram sockets provide connectionless unreliable messages of
>> > > >> +a fixed maximum length.
>> > > >
>> > > >Plus unordered (?) and with message boundaries. In other words:
>> > > >
>> > > >  Datagram sockets provide unordered, unreliable, connectionless message
>> > > >  with message boundaries and a fixed maximum length.
>> > > >
>> > > >I didn't think of the fixed maximum length aspect before. I guess the
>> > > >intention is that the rx buffer size is the message size limit? That's
>> > > >different from UDP messages, which can be fragmented into multiple IP
>> > > >packets and can be larger than 64KiB:
>> > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>> > > >
>> > > >Is it possible to support large datagram messages in vsock? I'm a little
>> > > >concerned that applications that run successfully over UDP will not be
>> > > >portable if vsock has this limitation because it would impose extra
>> > > >message boundaries that the application protocol might not tolerate.
>> > >
>> > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
>> > > Fragment the packets according to the buffers in the virtqueue and set
>> > > the EOR flag to indicate the last packet in the message.
>> > >
>> > Agree. Another option is to use the ones for skb since we may need to
>> > use skbs for multiple transport support anyway.
>> >
>>
>> The important thing I think is to have a single flag in virtio-vsock that
>> identifies pretty much the same thing: this is the last fragment of a series
>> to rebuild a packet.
>>
>> We should reuse the same flag for DGRAM and SEQPACKET.
>>
>> Thanks,
>> Stefano
>
>Well DGRAM can drop data so I wonder whether it can work ...
>

Yep, this is true, but the channel should not be losing packets, so if 
the receiver discards packets, it knows that it must then discard all of 
them until the EOR.
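
A receiver-side sketch of this discard-until-EOR idea (all names here are hypothetical, including the VSOCK_FLAG_EOR bit; the actual flag layout was still under discussion at this point):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit marking the last fragment of a message
 * (an assumption; the spec under discussion has not defined it yet). */
#define VSOCK_FLAG_EOR (1u << 0)

struct vsock_frag {
    uint32_t flags;   /* per-fragment header flags */
    size_t len;       /* payload length of this fragment */
};

/*
 * Receive one fragment. If we ever have to drop a fragment (e.g. the
 * reassembled message would exceed max_msg), keep dropping until the
 * fragment carrying EOR, so the next message starts on a clean boundary.
 * Returns true when a complete message has been reassembled.
 */
static bool rx_frag(const struct vsock_frag *f, bool *dropping,
                    size_t *msg_len, size_t max_msg)
{
    if (*dropping) {
        if (f->flags & VSOCK_FLAG_EOR)
            *dropping = false;    /* tail of the doomed message seen */
        return false;
    }

    if (*msg_len + f->len > max_msg) {
        /* Give up on this message: drop everything up to and incl. EOR. */
        *dropping = !(f->flags & VSOCK_FLAG_EOR);
        *msg_len = 0;
        return false;
    }

    *msg_len += f->len;
    if (f->flags & VSOCK_FLAG_EOR) {
        *msg_len = 0;             /* ready for the next message */
        return true;
    }
    return false;
}
```

The point being that a single sticky "dropping" bit per source is enough state to resynchronize on message boundaries, as long as the channel itself never loses fragments.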

If the transmitter has to discard a packet, actually that could be a 
problem...
We could provide another flag or a special packet to say that the rest 
of the packet won't arrive and so you have to discard it all.

Otherwise we need to go back to boundary packets as in the first 
implementation of SEQPACKET.

Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 13:38           ` Stefano Garzarella
@ 2021-04-13 13:50             ` Michael S. Tsirkin
  2021-04-13 14:03               ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-04-13 13:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > >
> > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > > > > >> Add supports for datagram type for virtio-vsock. Datagram
> > > > > >> sockets are connectionless and unreliable. To avoid contention
> > > > > >> with stream and other sockets, add two more virtqueues and
> > > > > >> a new feature bit to identify if those two new queues exist or not.
> > > > > >>
> > > > > >> Also add descriptions for resource management of datagram, which
> > > > > >> does not use the existing credit update mechanism associated with
> > > > > >> stream sockets.
> > > > > >>
> > > > > >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > >> ---
> > > > > >> V2 addressed the comments for the previous version.
> > > > > >>
> > > > > >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> > > > > >>  1 file changed, 52 insertions(+), 10 deletions(-)
> > > > > >>
> > > > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > > > > >> index da7e641..62c12e0 100644
> > > > > >> --- a/virtio-vsock.tex
> > > > > >> +++ b/virtio-vsock.tex
> > > > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> > > > > >>  \begin{description}
> > > > > >>  \item[0] rx
> > > > > >>  \item[1] tx
> > > > > >> +\item[2] datagram rx
> > > > > >> +\item[3] datagram tx
> > > > > >> +\item[4] event
> > > > > >> +\end{description}
> > > > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> > > > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> > > > > >> +
> > > > > >> +\begin{description}
> > > > > >> +\item[0] rx
> > > > > >> +\item[1] tx
> > > > > >>  \item[2] event
> > > > > >>  \end{description}
> > > > > >>
> > > > > >
> > > > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> > > > > >virtqueues and also adding this:
> > > > > >
> > > > > >  When behavior differs between stream and datagram rx/tx virtqueues
> > > > > >  their full names are used. Common behavior is simply described in
> > > > > >  terms of rx/tx virtqueues and applies to both stream and datagram
> > > > > >  virtqueues.
> > > > > >
> > > > > >This way you won't need to duplicate portions of the spec that deal with
> > > > > >populating the virtqueues, for example.
> > > > > >
> > > > > >It's also clearer to use a full name for stream rx/tx virtqueues instead
> > > > > >of calling them rx/tx virtqueues now that we have datagram rx/tx
> > > > > >virtqueues.
> > > > > >
> > > > > >> +
> > > > > >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > > > >>
> > > > > >> -There are currently no feature bits defined for this device.
> > > > > >> +\begin{description}
> > > > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> > > > > >> +\end{description}
> > > > > >>
> > > > > >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> > > > > >>
> > > > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> > > > > >>
> > > > > >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> > > > > >>
> > > > > >> +Flow control applies to stream sockets; datagram sockets do not have
> > > > > >> +flow control.
> > > > > >> +
> > > > > >>  The tx virtqueue carries packets initiated by applications and replies to
> > > > > >>  received packets.  The rx virtqueue carries packets initiated by the device and
> > > > > >>  replies to previously transmitted packets.
> > > > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > > > >>  consists of a (cid, port number) tuple. The header fields used for this are
> > > > > >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > > > >>
> > > > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream
> > > > > >> -socket types.
> > > > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> > > > > >> +socket types. \field{type} is 3 for dgram socket types.
> > > > > >>
> > > > > >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > > > >>  without message boundaries.
> > > > > >>
> > > > > >> +Datagram sockets provide connectionless unreliable messages of
> > > > > >> +a fixed maximum length.
> > > > > >
> > > > > >Plus unordered (?) and with message boundaries. In other words:
> > > > > >
> > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
> > > > > >  with message boundaries and a fixed maximum length.
> > > > > >
> > > > > >I didn't think of the fixed maximum length aspect before. I guess the
> > > > > >intention is that the rx buffer size is the message size limit? That's
> > > > > >different from UDP messages, which can be fragmented into multiple IP
> > > > > >packets and can be larger than 64KiB:
> > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> > > > > >
> > > > > >Is it possible to support large datagram messages in vsock? I'm a little
> > > > > >concerned that applications that run successfully over UDP will not be
> > > > > >portable if vsock has this limitation because it would impose extra
> > > > > >message boundaries that the application protocol might not tolerate.
> > > > >
> > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> > > > > Fragment the packets according to the buffers in the virtqueue and set
> > > > > the EOR flag to indicate the last packet in the message.
> > > > >
> > > > Agree. Another option is to use the ones for skb since we may need to
> > > > use skbs for multiple transport support anyway.
> > > >
> > > 
> > > The important thing I think is to have a single flag in virtio-vsock that
> > > identifies pretty much the same thing: this is the last fragment of a series
> > > to rebuild a packet.
> > > 
> > > We should reuse the same flag for DGRAM and SEQPACKET.
> > > 
> > > Thanks,
> > > Stefano
> > 
> > Well DGRAM can drop data so I wonder whether it can work ...
> > 
> 
> Yep, this is true, but the channel should not be losing packets, so if the
> receiver discards packets, it knows that it must then discard all of them
> until the EOR.

That is not so easy - they can come mixed up from multiple sources.
Sure, the Linux net core does this, but with fragmentation added in
I start wondering whether you are beginning to reinvent the net stack
...

> If the transmitter has to discard a packet, actually that could be a
> problem...
> We could provide another flag or a special packet to say that the rest of
> the packet won't arrive and so you have to discard it all.
> 
> Otherwise we need to go back to boundary packets as in the first
> implementation of SEQPACKET.
> 
> Stefano


-- 
MST


* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 13:50             ` Michael S. Tsirkin
@ 2021-04-13 14:03               ` Stefano Garzarella
  2021-04-13 19:58                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-13 14:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
>On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
>> On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
>> > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
>> > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
>> > > > > [...]
>> > > > >
>> > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
>> > > > > Fragment the packets according to the buffers in the virtqueue and set
>> > > > > the EOR flag to indicate the last packet in the message.
>> > > > >
>> > > > Agree. Another option is to use the ones for skb since we may need to
>> > > > use skbs for multiple transport support anyway.
>> > > >
>> > >
>> > > The important thing I think is to have a single flag in virtio-vsock that
>> > > identifies pretty much the same thing: this is the last fragment of a series
>> > > to rebuild a packet.
>> > >
>> > > We should reuse the same flag for DGRAM and SEQPACKET.
>> > >
>> > > Thanks,
>> > > Stefano
>> >
>> > Well DGRAM can drop data so I wonder whether it can work ...
>> >
>>
>> Yep, this is true, but the channel should not be losing packets, so if the
>> receiver discards packets, it knows that it must then discard all of them
>> until the EOR.
>
>That is not so easy - they can come mixed up from multiple sources.

I think we can prevent mixing because the virtqueue is point-to-point and 
its use is not thread safe, so access (by the same peer) is already 
serialized.
In the end, the packet would be fragmented only just before copying it to 
the virtqueue.

But maybe I missed something...
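
To illustrate the serialization argument, a minimal sketch, assuming a hypothetical VSOCK_FLAG_EOR bit and a simplified ring in place of the real virtqueue API:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define VSOCK_FLAG_EOR (1u << 0)   /* hypothetical last-fragment flag */
#define RX_BUF_SIZE    4096        /* assumed per-descriptor buffer size */
#define RING_SIZE      64

struct frag_desc {
    uint32_t flags;
    size_t len;
};

/* Stand-in for a datagram tx virtqueue (real drivers use the virtqueue
 * API; this simplified ring only illustrates the locking argument). */
struct dgram_tx {
    pthread_mutex_t lock;          /* serializes virtqueue access */
    struct frag_desc ring[RING_SIZE];
    unsigned head;
};

/*
 * Fragment one datagram into buffer-sized pieces and enqueue them all
 * while holding the queue lock. Because no other sender can interleave
 * its fragments, the receiver always sees frag, ..., frag|EOR for one
 * message before the next message begins.
 */
static void dgram_send(struct dgram_tx *tx, const uint8_t *msg, size_t len)
{
    pthread_mutex_lock(&tx->lock);
    while (len > 0) {
        size_t chunk = len > RX_BUF_SIZE ? RX_BUF_SIZE : len;
        struct frag_desc *d = &tx->ring[tx->head++ % RING_SIZE];

        d->len = chunk;
        d->flags = (chunk == len) ? VSOCK_FLAG_EOR : 0;
        msg += chunk;
        len -= chunk;
    }
    pthread_mutex_unlock(&tx->lock);
}
```

So the fragments of one message are all enqueued under a single critical section, which is why they cannot arrive interleaved with another message from the same peer.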

>Sure linux net core does this but with fragmentation added in,
>I start wondering whether you are beginning to reinvent the net stack
>...

No, I hope not :-), in the end our advantage is that we have a channel 
that doesn't lose packets, so I guess we can make assumptions that the 
network stack can't.

Thanks,
Stefano


* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 14:03               ` Stefano Garzarella
@ 2021-04-13 19:58                 ` Michael S. Tsirkin
  2021-04-13 22:00                   ` Jiang Wang .
  2021-04-14  6:57                   ` Stefano Garzarella
  0 siblings, 2 replies; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-04-13 19:58 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
> On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > >
> > > > > > > > [...]
> > > > > > >
> > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> > > > > > > Fragment the packets according to the buffers in the virtqueue and set
> > > > > > > the EOR flag to indicate the last packet in the message.
> > > > > > >
> > > > > > Agree. Another option is to use the ones for skb since we may need to
> > > > > > use skbs for multiple transport support anyway.
> > > > > >
> > > > >
> > > > > The important thing I think is to have a single flag in virtio-vsock that
> > > > > identifies pretty much the same thing: this is the last fragment of a series
> > > > > to rebuild a packet.
> > > > >
> > > > > We should reuse the same flag for DGRAM and SEQPACKET.
> > > > >
> > > > > Thanks,
> > > > > Stefano
> > > >
> > > > Well DGRAM can drop data so I wonder whether it can work ...
> > > >
> > > 
> > > Yep, this is true, but the channel should not be losing packets, so if the
> > > receiver discards packets, it knows that it must then discard all of them
> > > until the EOR.
> > 
> > That is not so easy - they can come mixed up from multiple sources.
> 
> I think we can prevent mixing because the virtqueue is point-to-point and
> its use is not thread safe, so access (by the same peer) is already
> serialized.
> In the end, the packet would be fragmented only just before copying it to
> the virtqueue.
> 
> But maybe I missed something...

Well, then I ask what the point of fragmenting is. I assume it's so we
can pass huge messages around, in which case you can't keep locks held ...


> > Sure linux net core does this but with fragmentation added in,
> > I start wondering whether you are beginning to reinvent the net stack
> > ...
> 
> No, I hope not :-), in the end our advantage is that we have a channel that
> doesn't lose packets, so I guess we can make assumptions that the network
> stack can't.
> 
> Thanks,
> Stefano

I still don't know how credit accounting will work for datagram,
but the proposals I saw seem to actually lose packets ...

-- 
MST


* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 19:58                 ` Michael S. Tsirkin
@ 2021-04-13 22:00                   ` Jiang Wang .
  2021-04-14  7:07                     ` Stefano Garzarella
  2021-04-14  6:57                   ` Stefano Garzarella
  1 sibling, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-04-13 22:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: cong.wang, Xiongchun Duan, cohuck, virtualization, xieyongji,
	Stefan Hajnoczi, asias, Arseny Krasnov

On Tue, Apr 13, 2021 at 12:58 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
> > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > > > > > > > > >> [...]
> > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
> > > > > > > > >> +a fixed maximum length.
> > > > > > > > >
> > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
> > > > > > > > >
> > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
> > > > > > > > >  with message boundaries and a fixed maximum length.
> > > > > > > > >
> > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
> > > > > > > > >intention is that the rx buffer size is the message size limit? That's
> > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
> > > > > > > > >packets and can be larger than 64KiB:
> > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> > > > > > > > >
> > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
> > > > > > > > >concerned that applications that run successfully over UDP will not be
> > > > > > > > >portable if vsock has this limitation because it would impose extra
> > > > > > > > >message boundaries that the application protocol might not tolerate.
> > > > > > > >
> > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
> > > > > > > > the EOR flag to indicate the last packet in the message.
> > > > > > > >
> > > > > > > Agree. Another option is to use the ones for skb since we may need to
> > > > > > > use skbs for multiple transport support anyway.
> > > > > > >
> > > > > >
> > > > > > The important thing I think is to have a single flag in virtio-vsock that
> > > > > > identifies pretty much the same thing: this is the last fragment of a series
> > > > > > to rebuild a packet.
> > > > > >
> > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > >
> > > > > Well DGRAM can drop data so I wonder whether it can work ...
> > > > >
> > > >
> > > > Yep, this is true, but the channel should not be losing packets, so if the
> > > > receiver discards packets, it knows that it must then discard all of them
> > > > until the EOR.
> > >
> > > That is not so easy - they can come mixed up from multiple sources.
> >
> > I think we can prevent mixing because the virtqueue is point-to-point and
> > its use is not thread-safe, so access (in the same peer) is already
> > serialized.
> > In the end the packet would be fragmented only right before copying it to
> > the virtqueue.
> >
> > But maybe I missed something...
>
> Well I ask what's the point of fragmenting then. I assume it's so we
> can pass huge messages around so you can't keep locks ...
>
I have a related question: how do we determine a suitable fragment
size? Unlike stream or seqpacket sockets, datagram sockets do not know
the listener's available receive buffer. E.g., if the listener's free
receive buffer is 64KB and the sender wants to send a 256KB packet, how
does the sender know what the fragment size is? One option is to use
some default fragment size and try to send those fragments, but those
fragments could still be dropped by the receiver.

>
> > > Sure linux net core does this but with fragmentation added in,
> > > I start wondering whether you are beginning to reinvent the net stack
> > > ...
> >
> > No, I hope not :-), in the end our advantage is that we have a channel that
> > doesn't lose packets, so I guess we can make assumptions that the network
> > stack can't.
> >
> > Thanks,
> > Stefano
>
> I still don't know how will credit accounting work for datagram,
> but proposals I saw seem to actually lose packets ...

Yes, I agree that is a problem. In my mind, the accounting is
different from the credit mechanism used by the stream sockets.
The accounting is for the sender side only and the packets may
still be dropped at the receiving side if a packet is larger than the
available receive buffer.


> --
> MST
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 19:58                 ` Michael S. Tsirkin
  2021-04-13 22:00                   ` Jiang Wang .
@ 2021-04-14  6:57                   ` Stefano Garzarella
  2021-04-14  7:20                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-14  6:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote:
>On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
>> On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
>> > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
>> > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
>> > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
>> > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
>> > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > > > > > >
>> > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
>> > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
>> > > > > > > >> Add supports for datagram type for virtio-vsock. Datagram
>> > > > > > > >> sockets are connectionless and unreliable. To avoid contention
>> > > > > > > >> with stream and other sockets, add two more virtqueues and
>> > > > > > > >> a new feature bit to identify if those two new queues exist or not.
>> > > > > > > >>
>> > > > > > > >> Also add descriptions for resource management of datagram, which
>> > > > > > > >> does not use the existing credit update mechanism associated with
>> > > > > > > >> stream sockets.
>> > > > > > > >>
>> > > > > > > >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> > > > > > > >> ---
>> > > > > > > >> V2 addressed the comments for the previous version.
>> > > > > > > >>
>> > > > > > > >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>> > > > > > > >>  1 file changed, 52 insertions(+), 10 deletions(-)
>> > > > > > > >>
>> > > > > > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
>> > > > > > > >> index da7e641..62c12e0 100644
>> > > > > > > >> --- a/virtio-vsock.tex
>> > > > > > > >> +++ b/virtio-vsock.tex
>> > > > > > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>> > > > > > > >>  \begin{description}
>> > > > > > > >>  \item[0] rx
>> > > > > > > >>  \item[1] tx
>> > > > > > > >> +\item[2] datagram rx
>> > > > > > > >> +\item[3] datagram tx
>> > > > > > > >> +\item[4] event
>> > > > > > > >> +\end{description}
>> > > > > > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
>> > > > > > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
>> > > > > > > >> +
>> > > > > > > >> +\begin{description}
>> > > > > > > >> +\item[0] rx
>> > > > > > > >> +\item[1] tx
>> > > > > > > >>  \item[2] event
>> > > > > > > >>  \end{description}
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
>> > > > > > > >virtqueues and also adding this:
>> > > > > > > >
>> > > > > > > >  When behavior differs between stream and datagram rx/tx virtqueues
>> > > > > > > >  their full names are used. Common behavior is simply described in
>> > > > > > > >  terms of rx/tx virtqueues and applies to both stream and datagram
>> > > > > > > >  virtqueues.
>> > > > > > > >
>> > > > > > > >This way you won't need to duplicate portions of the spec that deal with
>> > > > > > > >populating the virtqueues, for example.
>> > > > > > > >
>> > > > > > > >It's also clearer to use a full name for stream rx/tx virtqueues instead
>> > > > > > > >of calling them rx/tx virtqueues now that we have datagram rx/tx
>> > > > > > > >virtqueues.
>> > > > > > > >
>> > > > > > > >> +
>> > > > > > > >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> > > > > > > >>
>> > > > > > > >> -There are currently no feature bits defined for this device.
>> > > > > > > >> +\begin{description}
>> > > > > > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
>> > > > > > > >> +\end{description}
>> > > > > > > >>
>> > > > > > > >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>> > > > > > > >>
>> > > > > > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>> > > > > > > >>
>> > > > > > > >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> > > > > > > >>
>> > > > > > > >> +Flow control applies to stream sockets; datagram sockets do not have
>> > > > > > > >> +flow control.
>> > > > > > > >> +
>> > > > > > > >>  The tx virtqueue carries packets initiated by applications and replies to
>> > > > > > > >>  received packets.  The rx virtqueue carries packets initiated by the device and
>> > > > > > > >>  replies to previously transmitted packets.
>> > > > > > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>> > > > > > > >>  consists of a (cid, port number) tuple. The header fields used for this are
>> > > > > > > >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>> > > > > > > >>
>> > > > > > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream
>> > > > > > > >> -socket types.
>> > > > > > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
>> > > > > > > >> +socket types. \field{type} is 3 for dgram socket types.
>> > > > > > > >>
>> > > > > > > >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>> > > > > > > >>  without message boundaries.
>> > > > > > > >>
>> > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
>> > > > > > > >> +a fixed maximum length.
>> > > > > > > >
>> > > > > > > >Plus unordered (?) and with message boundaries. In other words:
>> > > > > > > >
>> > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
>> > > > > > > >  with message boundaries and a fixed maximum length.
>> > > > > > > >
>> > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
>> > > > > > > >intention is that the rx buffer size is the message size limit? That's
>> > > > > > > >different from UDP messages, which can be fragmented into multiple IP
>> > > > > > > >packets and can be larger than 64KiB:
>> > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>> > > > > > > >
>> > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
>> > > > > > > >concerned that applications that run successfully over UDP will not be
>> > > > > > > >portable if vsock has this limitation because it would impose extra
>> > > > > > > >message boundaries that the application protocol might not tolerate.
>> > > > > > >
>> > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
>> > > > > > > Fragment the packets according to the buffers in the virtqueue and set
>> > > > > > > the EOR flag to indicate the last packet in the message.
>> > > > > > >
>> > > > > > Agree. Another option is to use the ones for skb since we may need to
>> > > > > > use skbs for multiple transport support anyway.
>> > > > > >
>> > > > >
>> > > > > The important thing I think is to have a single flag in virtio-vsock that
>> > > > > identifies pretty much the same thing: this is the last fragment of a series
>> > > > > to rebuild a packet.
>> > > > >
>> > > > > We should reuse the same flag for DGRAM and SEQPACKET.
>> > > > >
>> > > > > Thanks,
>> > > > > Stefano
>> > > >
>> > > > Well DGRAM can drop data so I wonder whether it can work ...
>> > > >
>> > >
>> > > Yep, this is true, but the channel should not be losing packets, so if the
>> > > receiver discards packets, it knows that it must then discard all of them
>> > > until the EOR.
>> >
>> > That is not so easy - they can come mixed up from multiple sources.
>>
>> I think we can prevent mixing because the virtqueue is point-to-point and
>> its use is not thread-safe, so access (in the same peer) is already
>> serialized.
>> In the end the packet would be fragmented only right before copying it to
>> the virtqueue.
>>
>> But maybe I missed something...
>
>Well I ask what's the point of fragmenting then. I assume it's so we
>can pass huge messages around so you can't keep locks ...
>

Maybe I'm wrong, but isn't this similar to what we do in virtio-net with 
mergeable buffers?

Also in this case I think the fragmentation will happen only in the 
device, since the driver can enqueue the entire buffer.

Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is 
not suitable.

IIUC in the vsock device the fragmentation for DGRAM will happen just
before queueing it in the virtqueue, and the device can check how many
buffers are available in the queue and decide whether to queue them all
up or throw them away.

>
>> > Sure linux net core does this but with fragmentation added in,
>> > I start wondering whether you are beginning to reinvent the net stack
>> > ...
>>
>> No, I hope not :-), in the end our advantage is that we have a channel that
>> doesn't lose packets, so I guess we can make assumptions that the network
>> stack can't.
>>
>> Thanks,
>> Stefano
>
>I still don't know how will credit accounting work for datagram,
>but proposals I saw seem to actually lose packets ...
>

I don't know either, but I think it's not an issue on the RX side,
since if it doesn't have space, it can drop all the fragments.

Another option to avoid fragmentation could be to allocate 64K buffers
for the new DGRAM virtqueue.
In this way we will have packets of at most 64K, which is similar to
UDP/IP, without extra work for fragmentation.

Thanks,
Stefano



* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-13 22:00                   ` Jiang Wang .
@ 2021-04-14  7:07                     ` Stefano Garzarella
  0 siblings, 0 replies; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-14  7:07 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Tue, Apr 13, 2021 at 03:00:50PM -0700, Jiang Wang . wrote:
>On Tue, Apr 13, 2021 at 12:58 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
>> > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
>> > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
>> > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
>> > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
>> > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
>> > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > > > > > > >
>> > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
>> > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
>> > > > > > > > >> Add supports for datagram type for virtio-vsock. Datagram
>> > > > > > > > >> sockets are connectionless and unreliable. To avoid contention
>> > > > > > > > >> with stream and other sockets, add two more virtqueues and
>> > > > > > > > >> a new feature bit to identify if those two new queues exist or not.
>> > > > > > > > >>
>> > > > > > > > >> Also add descriptions for resource management of datagram, which
>> > > > > > > > >> does not use the existing credit update mechanism associated with
>> > > > > > > > >> stream sockets.
>> > > > > > > > >>
>> > > > > > > > >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> > > > > > > > >> ---
>> > > > > > > > >> V2 addressed the comments for the previous version.
>> > > > > > > > >>
>> > > > > > > > >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
>> > > > > > > > >>  1 file changed, 52 insertions(+), 10 deletions(-)
>> > > > > > > > >>
>> > > > > > > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
>> > > > > > > > >> index da7e641..62c12e0 100644
>> > > > > > > > >> --- a/virtio-vsock.tex
>> > > > > > > > >> +++ b/virtio-vsock.tex
>> > > > > > > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
>> > > > > > > > >>  \begin{description}
>> > > > > > > > >>  \item[0] rx
>> > > > > > > > >>  \item[1] tx
>> > > > > > > > >> +\item[2] datagram rx
>> > > > > > > > >> +\item[3] datagram tx
>> > > > > > > > >> +\item[4] event
>> > > > > > > > >> +\end{description}
>> > > > > > > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
>> > > > > > > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
>> > > > > > > > >> +
>> > > > > > > > >> +\begin{description}
>> > > > > > > > >> +\item[0] rx
>> > > > > > > > >> +\item[1] tx
>> > > > > > > > >>  \item[2] event
>> > > > > > > > >>  \end{description}
>> > > > > > > > >>
>> > > > > > > > >
>> > > > > > > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
>> > > > > > > > >virtqueues and also adding this:
>> > > > > > > > >
>> > > > > > > > >  When behavior differs between stream and datagram rx/tx virtqueues
>> > > > > > > > >  their full names are used. Common behavior is simply described in
>> > > > > > > > >  terms of rx/tx virtqueues and applies to both stream and datagram
>> > > > > > > > >  virtqueues.
>> > > > > > > > >
>> > > > > > > > >This way you won't need to duplicate portions of the spec that deal with
>> > > > > > > > >populating the virtqueues, for example.
>> > > > > > > > >
>> > > > > > > > >It's also clearer to use a full name for stream rx/tx virtqueues instead
>> > > > > > > > >of calling them rx/tx virtqueues now that we have datagram rx/tx
>> > > > > > > > >virtqueues.
>> > > > > > > > >
>> > > > > > > > >> +
>> > > > > > > > >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> > > > > > > > >>
>> > > > > > > > >> -There are currently no feature bits defined for this device.
>> > > > > > > > >> +\begin{description}
>> > > > > > > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
>> > > > > > > > >> +\end{description}
>> > > > > > > > >>
>> > > > > > > > >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
>> > > > > > > > >>
>> > > > > > > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
>> > > > > > > > >>
>> > > > > > > > >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> > > > > > > > >>
>> > > > > > > > >> +Flow control applies to stream sockets; datagram sockets do not have
>> > > > > > > > >> +flow control.
>> > > > > > > > >> +
>> > > > > > > > >>  The tx virtqueue carries packets initiated by applications and replies to
>> > > > > > > > >>  received packets.  The rx virtqueue carries packets initiated by the device and
>> > > > > > > > >>  replies to previously transmitted packets.
>> > > > > > > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>> > > > > > > > >>  consists of a (cid, port number) tuple. The header fields used for this are
>> > > > > > > > >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>> > > > > > > > >>
>> > > > > > > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream
>> > > > > > > > >> -socket types.
>> > > > > > > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
>> > > > > > > > >> +socket types. \field{type} is 3 for dgram socket types.
>> > > > > > > > >>
>> > > > > > > > >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>> > > > > > > > >>  without message boundaries.
>> > > > > > > > >>
>> > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
>> > > > > > > > >> +a fixed maximum length.
>> > > > > > > > >
>> > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
>> > > > > > > > >
>> > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
>> > > > > > > > >  with message boundaries and a fixed maximum length.
>> > > > > > > > >
>> > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
>> > > > > > > > >intention is that the rx buffer size is the message size limit? That's
>> > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
>> > > > > > > > >packets and can be larger than 64KiB:
>> > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>> > > > > > > > >
>> > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
>> > > > > > > > >concerned that applications that run successfully over UDP will not be
>> > > > > > > > >portable if vsock has this limitation because it would impose extra
>> > > > > > > > >message boundaries that the application protocol might not tolerate.
>> > > > > > > >
>> > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
>> > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
>> > > > > > > > the EOR flag to indicate the last packet in the message.
>> > > > > > > >
>> > > > > > > Agree. Another option is to use the ones for skb since we may need to
>> > > > > > > use skbs for multiple transport support anyway.
>> > > > > > >
>> > > > > >
>> > > > > > The important thing I think is to have a single flag in virtio-vsock that
>> > > > > > identifies pretty much the same thing: this is the last fragment of a series
>> > > > > > to rebuild a packet.
>> > > > > >
>> > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Stefano
>> > > > >
>> > > > > Well DGRAM can drop data so I wonder whether it can work ...
>> > > > >
>> > > >
>> > > > Yep, this is true, but the channel should not be losing packets, so if the
>> > > > receiver discards packets, it knows that it must then discard all of them
>> > > > until the EOR.
>> > >
>> > > That is not so easy - they can come mixed up from multiple sources.
>> >
>> > I think we can prevent mixing because the virtqueue is point-to-point and
>> > its use is not thread-safe, so access (in the same peer) is already
>> > serialized.
>> > In the end the packet would be fragmented only right before copying it to
>> > the virtqueue.
>> >
>> > But maybe I missed something...
>>
>> Well I ask what's the point of fragmenting then. I assume it's so we
>> can pass huge messages around so you can't keep locks ...
>>
>I have a related question: how do we determine a suitable fragment
>size? Unlike stream or seqpacket sockets, datagram sockets do not know
>the listener's available receive buffer. E.g., if the listener's free
>receive buffer is 64KB and the sender wants to send a 256KB packet, how
>does the sender know what the fragment size is? One option is to use
>some default fragment size and try to send those fragments, but those
>fragments could still be dropped by the receiver.

I guess it depends on the virtqueue buffers.
For the driver I don't think we need to fragment; we can queue the whole
packet.
For the device it depends on the size of the buffers in the virtqueue,
which determines the fragment size.

As I said in a previous email, maybe we can allocate 64K buffers in the
RX virtqueue and allow the user messages of at most 64K, avoiding
fragmentation completely.

>
>>
>> > > Sure linux net core does this but with fragmentation added in,
>> > > I start wondering whether you are beginning to reinvent the net stack
>> > > ...
>> >
>> > No, I hope not :-), in the end our advantage is that we have a channel that
>> > doesn't lose packets, so I guess we can make assumptions that the network
>> > stack can't.
>> >
>> > Thanks,
>> > Stefano
>>
>> I still don't know how will credit accounting work for datagram,
>> but proposals I saw seem to actually lose packets ...
>
>Yes, I agree that is a problem. In my mind, the accounting is
>different from the credit mechanism used by the stream sockets.
>The accounting is for the sender side only and the packets may
>still be dropped at the receiving side if a packet is larger than the
>available receive buffer.
>

I don't think it's a problem if the receiver decides to drop; the 
important thing is that it drops all the fragments.

Thanks,
Stefano



* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-14  6:57                   ` Stefano Garzarella
@ 2021-04-14  7:20                     ` Michael S. Tsirkin
  2021-04-14  9:38                       ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-04-14  7:20 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Jiang Wang .,
	cohuck, virtualization, xieyongji, Stefan Hajnoczi,
	Arseny Krasnov, asias

On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote:
> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
> > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > > > > > > > > >> Add supports for datagram type for virtio-vsock. Datagram
> > > > > > > > > >> sockets are connectionless and unreliable. To avoid contention
> > > > > > > > > >> with stream and other sockets, add two more virtqueues and
> > > > > > > > > >> a new feature bit to identify if those two new queues exist or not.
> > > > > > > > > >>
> > > > > > > > > >> Also add descriptions for resource management of datagram, which
> > > > > > > > > >> does not use the existing credit update mechanism associated with
> > > > > > > > > >> stream sockets.
> > > > > > > > > >>
> > > > > > > > > >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > > > > > >> ---
> > > > > > > > > >> V2 addressed the comments for the previous version.
> > > > > > > > > >>
> > > > > > > > > >>  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> > > > > > > > > >>  1 file changed, 52 insertions(+), 10 deletions(-)
> > > > > > > > > >>
> > > > > > > > > >> diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > > > > > > > > >> index da7e641..62c12e0 100644
> > > > > > > > > >> --- a/virtio-vsock.tex
> > > > > > > > > >> +++ b/virtio-vsock.tex
> > > > > > > > > >> @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> > > > > > > > > >>  \begin{description}
> > > > > > > > > >>  \item[0] rx
> > > > > > > > > >>  \item[1] tx
> > > > > > > > > >> +\item[2] datagram rx
> > > > > > > > > >> +\item[3] datagram tx
> > > > > > > > > >> +\item[4] event
> > > > > > > > > >> +\end{description}
> > > > > > > > > >> +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> > > > > > > > > >> +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> > > > > > > > > >> +
> > > > > > > > > >> +\begin{description}
> > > > > > > > > >> +\item[0] rx
> > > > > > > > > >> +\item[1] tx
> > > > > > > > > >>  \item[2] event
> > > > > > > > > >>  \end{description}
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> > > > > > > > > >virtqueues and also adding this:
> > > > > > > > > >
> > > > > > > > > >  When behavior differs between stream and datagram rx/tx virtqueues
> > > > > > > > > >  their full names are used. Common behavior is simply described in
> > > > > > > > > >  terms of rx/tx virtqueues and applies to both stream and datagram
> > > > > > > > > >  virtqueues.
> > > > > > > > > >
> > > > > > > > > >This way you won't need to duplicate portions of the spec that deal with
> > > > > > > > > >populating the virtqueues, for example.
> > > > > > > > > >
> > > > > > > > > >It's also clearer to use a full name for stream rx/tx virtqueues instead
> > > > > > > > > >of calling them rx/tx virtqueues now that we have datagram rx/tx
> > > > > > > > > >virtqueues.
> > > > > > > > > >
> > > > > > > > > >> +
> > > > > > > > > >>  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > > > > > > > >>
> > > > > > > > > >> -There are currently no feature bits defined for this device.
> > > > > > > > > >> +\begin{description}
> > > > > > > > > >> +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> > > > > > > > > >> +\end{description}
> > > > > > > > > >>
> > > > > > > > > >>  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> > > > > > > > > >>
> > > > > > > > > >> @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> > > > > > > > > >>
> > > > > > > > > >>  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> > > > > > > > > >>
> > > > > > > > > >> +Flow control applies to stream sockets; datagram sockets do not have
> > > > > > > > > >> +flow control.
> > > > > > > > > >> +
> > > > > > > > > >>  The tx virtqueue carries packets initiated by applications and replies to
> > > > > > > > > >>  received packets.  The rx virtqueue carries packets initiated by the device and
> > > > > > > > > >>  replies to previously transmitted packets.
> > > > > > > > > >> @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > > > > > > > >>  consists of a (cid, port number) tuple. The header fields used for this are
> > > > > > > > > >>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > > > > > > > >>
> > > > > > > > > >> -Currently only stream sockets are supported. \field{type} is 1 for stream
> > > > > > > > > >> -socket types.
> > > > > > > > > >> +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> > > > > > > > > >> +socket types. \field{type} is 3 for dgram socket types.
> > > > > > > > > >>
> > > > > > > > > >>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > > > > > > > >>  without message boundaries.
> > > > > > > > > >>
> > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
> > > > > > > > > >> +a fixed maximum length.
> > > > > > > > > >
> > > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
> > > > > > > > > >
> > > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
> > > > > > > > > >  with message boundaries and a fixed maximum length.
> > > > > > > > > >
> > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
> > > > > > > > > >intention is that the rx buffer size is the message size limit? That's
> > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
> > > > > > > > > >packets and can be larger than 64KiB:
> > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> > > > > > > > > >
> > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
> > > > > > > > > >concerned that applications that run successfully over UDP will not be
> > > > > > > > > >portable if vsock has this limitation because it would impose extra
> > > > > > > > > >message boundaries that the application protocol might not tolerate.
> > > > > > > > >
> > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
> > > > > > > > > the EOR flag to indicate the last packet in the message.
> > > > > > > > >
> > > > > > > > Agree. Another option is to use the ones for skb since we may need to
> > > > > > > > use skbs for multiple transport support anyway.
> > > > > > > >
> > > > > > >
> > > > > > > The important thing I think is to have a single flag in virtio-vsock that
> > > > > > > identifies pretty much the same thing: this is the last fragment of a series
> > > > > > > to rebuild a packet.
> > > > > > >
> > > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stefano
> > > > > >
> > > > > > Well DGRAM can drop data so I wonder whether it can work ...
> > > > > >
> > > > >
> > > > > Yep, this is true, but the channel should not be losing packets, so if the
> > > > > receiver discards packets, it knows that it must then discard all of them
> > > > > until the EOR.
> > > >
> > > > That is not so easy - they can come mixed up from multiple sources.
> > > 
> > > I think we can prevent mixing because virtuqueue is point to point and its
> > > use is not thread safe, so the access (in the same peer) is already
> > > serialized.
> > > In the end the packet would be fragmented only before copying it to the
> > > virtuqueue.
> > > 
> > > But maybe I missed something...
> > 
> > Well I ask what's the point of fragmenting then. I assume it's so we
> > can pass huge messages around so you can't keep locks ...
> > 
> 
> Maybe I'm wrong, but isn't this similar to what we do in virtio-net with
> mergeable buffers?

The point of mergeable buffers is to use less memory: both for each
packet and for a full receive vq.

> Also in this case I think the fragmentation will happen only in the device,
> since the driver can enqueue the entire buffer.
> 
> Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not
> suitable.

That sounds very reasonable.

> IIUC in the vsock device the fragmentation for DGRAM will happen just before
> to queue it in the virtqueue, and the device can check how many buffers are
> available in the queue and it can decide whether to queue them all up or
> throw them away.
> > 
> > > > Sure linux net core does this but with fragmentation added in,
> > > > I start wondering whether you are beginning to reinvent the net stack
> > > > ...
> > > 
> > > No, I hope not :-), in the end our advantage is that we have a channel that
> > > doesn't lose packets, so I guess we can make assumptions that the network
> > > stack can't.
> > > 
> > > Thanks,
> > > Stefano
> > 
> > I still don't know how will credit accounting work for datagram,
> > but proposals I saw seem to actually lose packets ...
> > 
> 
> I still don't know too, but I think it's not an issue in the RX side,
> since if it doesn't have space, can drop all the fragment.
> 
> Another option to avoid fragmentation could be to allocate 64K buffers for
> the new DGRAM virtqueue.

That's a lot of buffers ...

> In this way we will have at most 64K packets, which is similar to UDP/IP,
> without extra work for the fragmentation.

IIRC default MTU is 1280 not 64K ...

> 
> Thanks,
> Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-14  7:20                     ` Michael S. Tsirkin
@ 2021-04-14  9:38                       ` Stefano Garzarella
  2021-04-15  3:15                         ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-04-14  9:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, cohuck, virtualization, xieyongji,
	Stefan Hajnoczi, asias, Arseny Krasnov

On Wed, Apr 14, 2021 at 03:20:07AM -0400, Michael S. Tsirkin wrote:
>On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote:
>> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote:
>> > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
>> > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
>> > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
>> > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
>> > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
>> > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
>> > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > > > > > > > >
>> > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
>> > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang 
>> > > > > > > > > >wrote:

[...]

>> > > > > > > > > >>
>> > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
>> > > > > > > > > >> +a fixed maximum length.
>> > > > > > > > > >
>> > > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
>> > > > > > > > > >
>> > > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
>> > > > > > > > > >  with message boundaries and a fixed maximum length.
>> > > > > > > > > >
>> > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
>> > > > > > > > > >intention is that the rx buffer size is the message size limit? That's
>> > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
>> > > > > > > > > >packets and can be larger than 64KiB:
>> > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>> > > > > > > > > >
>> > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
>> > > > > > > > > >concerned that applications that run successfully over UDP will not be
>> > > > > > > > > >portable if vsock has this limitation because it would impose extra
>> > > > > > > > > >message boundaries that the application protocol might not tolerate.
>> > > > > > > > >
>> > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
>> > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
>> > > > > > > > > the EOR flag to indicate the last packet in the message.
>> > > > > > > > >
>> > > > > > > > Agree. Another option is to use the ones for skb since we may need to
>> > > > > > > > use skbs for multiple transport support anyway.
>> > > > > > > >
>> > > > > > >
>> > > > > > > The important thing I think is to have a single flag in virtio-vsock that
>> > > > > > > identifies pretty much the same thing: this is the last fragment of a series
>> > > > > > > to rebuild a packet.
>> > > > > > >
>> > > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Stefano
>> > > > > >
>> > > > > > Well DGRAM can drop data so I wonder whether it can work ...
>> > > > > >
>> > > > >
>> > > > > Yep, this is true, but the channel should not be losing packets, so if the
>> > > > > receiver discards packets, it knows that it must then discard all of them
>> > > > > until the EOR.
>> > > >
>> > > > That is not so easy - they can come mixed up from multiple sources.
>> > >
>> > > I think we can prevent mixing because virtuqueue is point to point and its
>> > > use is not thread safe, so the access (in the same peer) is already
>> > > serialized.
>> > > In the end the packet would be fragmented only before copying it to the
>> > > virtuqueue.
>> > >
>> > > But maybe I missed something...
>> >
>> > Well I ask what's the point of fragmenting then. I assume it's so we
>> > can pass huge messages around so you can't keep locks ...
>> >
>>
>> Maybe I'm wrong, but isn't this similar to what we do in virtio-net with
>> mergeable buffers?
>
>The point of mergeable buffers is to use less memory: both for each
>packet and for a full receive vq.
>
>> Also in this case I think the fragmentation will happen only in the device,
>> since the driver can enqueue the entire buffer.
>>
>> Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not
>> suitable.
>
>That sounds very reasonable.

It should also allow us to save the header for each fragment.

@Jiang Do you want to explore this?
I'm talking about VIRTIO_NET_F_MRG_RXBUF feature.

>
>> IIUC in the vsock device the fragmentation for DGRAM will happen just 
>> before
>> to queue it in the virtqueue, and the device can check how many buffers are
>> available in the queue and it can decide whether to queue them all up or
>> throw them away.
>> >
>> > > > Sure linux net core does this but with fragmentation added in,
>> > > > I start wondering whether you are beginning to reinvent the net stack
>> > > > ...
>> > >
>> > > No, I hope not :-), in the end our advantage is that we have a channel that
>> > > doesn't lose packets, so I guess we can make assumptions that the network
>> > > stack can't.
>> > >
>> > > Thanks,
>> > > Stefano
>> >
>> > I still don't know how will credit accounting work for datagram,
>> > but proposals I saw seem to actually lose packets ...
>> >
>>
>> I still don't know too, but I think it's not an issue in the RX side,
>> since if it doesn't have space, can drop all the fragment.
>>
>> Another option to avoid fragmentation could be to allocate 64K buffers for
>> the new DGRAM virtqueue.
>
>That's a lot of buffers ...

Yep I see, and they would often be mostly unused...

>
>> In this way we will have at most 64K packets, which is similar to 
>> UDP/IP,
>> without extra work for the fragmentation.
>
>IIRC default MTU is 1280 not 64K ...

I was thinking that UDP can carry messages of at most 64K, which IP 
then fragments according to the MTU.

Thanks,
Stefano



* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-14  9:38                       ` Stefano Garzarella
@ 2021-04-15  3:15                         ` Jiang Wang .
  2021-05-04  3:40                           ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-04-15  3:15 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Wed, Apr 14, 2021 at 2:38 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Apr 14, 2021 at 03:20:07AM -0400, Michael S. Tsirkin wrote:
> >On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote:
> >> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote:
> >> > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
> >> > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
> >> > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> >> > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> >> > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> >> > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> >> > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> >> > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang
> >> > > > > > > > > >wrote:
>
> [...]
>
> >> > > > > > > > > >>
> >> > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
> >> > > > > > > > > >> +a fixed maximum length.
> >> > > > > > > > > >
> >> > > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
> >> > > > > > > > > >
> >> > > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
> >> > > > > > > > > >  with message boundaries and a fixed maximum length.
> >> > > > > > > > > >
> >> > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
> >> > > > > > > > > >intention is that the rx buffer size is the message size limit? That's
> >> > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
> >> > > > > > > > > >packets and can be larger than 64KiB:
> >> > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> >> > > > > > > > > >
> >> > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
> >> > > > > > > > > >concerned that applications that run successfully over UDP will not be
> >> > > > > > > > > >portable if vsock has this limitation because it would impose extra
> >> > > > > > > > > >message boundaries that the application protocol might not tolerate.
> >> > > > > > > > >
> >> > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> >> > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
> >> > > > > > > > > the EOR flag to indicate the last packet in the message.
> >> > > > > > > > >
> >> > > > > > > > Agree. Another option is to use the ones for skb since we may need to
> >> > > > > > > > use skbs for multiple transport support anyway.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > The important thing I think is to have a single flag in virtio-vsock that
> >> > > > > > > identifies pretty much the same thing: this is the last fragment of a series
> >> > > > > > > to rebuild a packet.
> >> > > > > > >
> >> > > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Stefano
> >> > > > > >
> >> > > > > > Well DGRAM can drop data so I wonder whether it can work ...
> >> > > > > >
> >> > > > >
> >> > > > > Yep, this is true, but the channel should not be losing packets, so if the
> >> > > > > receiver discards packets, it knows that it must then discard all of them
> >> > > > > until the EOR.
> >> > > >
> >> > > > That is not so easy - they can come mixed up from multiple sources.
> >> > >
> >> > > I think we can prevent mixing because virtuqueue is point to point and its
> >> > > use is not thread safe, so the access (in the same peer) is already
> >> > > serialized.
> >> > > In the end the packet would be fragmented only before copying it to the
> >> > > virtuqueue.
> >> > >
> >> > > But maybe I missed something...
> >> >
> >> > Well I ask what's the point of fragmenting then. I assume it's so we
> >> > can pass huge messages around so you can't keep locks ...
> >> >
> >>
> >> Maybe I'm wrong, but isn't this similar to what we do in virtio-net with
> >> mergeable buffers?
> >
> >The point of mergeable buffers is to use less memory: both for each
> >packet and for a full receive vq.
> >
> >> Also in this case I think the fragmentation will happen only in the device,
> >> since the driver can enqueue the entire buffer.
> >>
> >> Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not
> >> suitable.
> >
> >That sounds very reasonable.
>
> It should also allow us to save the header for each fragment.
>
> @Jiang Do you want to explore this?
> I'm talking about VIRTIO_NET_F_MRG_RXBUF feature.

Sure. Will do.

> >
> >> IIUC in the vsock device the fragmentation for DGRAM will happen just
> >> before
> >> to queue it in the virtqueue, and the device can check how many buffers are
> >> available in the queue and it can decide whether to queue them all up or
> >> throw them away.
> >> >
> >> > > > Sure linux net core does this but with fragmentation added in,
> >> > > > I start wondering whether you are beginning to reinvent the net stack
> >> > > > ...
> >> > >
> >> > > No, I hope not :-), in the end our advantage is that we have a channel that
> >> > > doesn't lose packets, so I guess we can make assumptions that the network
> >> > > stack can't.
> >> > >
> >> > > Thanks,
> >> > > Stefano
> >> >
> >> > I still don't know how will credit accounting work for datagram,
> >> > but proposals I saw seem to actually lose packets ...
> >> >
> >>
> >> I still don't know too, but I think it's not an issue in the RX side,
> >> since if it doesn't have space, can drop all the fragment.
> >>
> >> Another option to avoid fragmentation could be to allocate 64K buffers for
> >> the new DGRAM virtqueue.
> >
> >That's a lot of buffers ...
>
> Yep I see, and they would often be mostly unused...
>
> >
> >> In this way we will have at most 64K packets, which is similar to
> >> UDP/IP,
> >> without extra work for the fragmentation.
> >
> >IIRC default MTU is 1280 not 64K ...
>
> I was thinking that UDP at most can support 64K messages that IP should
> fragment according to MTU.
>
> Thanks,
> Stefano
>


* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-15  3:15                         ` Jiang Wang .
@ 2021-05-04  3:40                           ` Jiang Wang .
  2021-05-04 16:16                             ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-04  3:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

Hi Stefano,

I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
dgram can use that feature too. Do we want to make this feature a
must-have or optional? One idea is to make it optional: when it is not
supported, the dgram rx buffer is 16 KB, which should be good in most
cases. When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buffer is 4 KB
and the maximum packet size is 64 KB.

Also, just to make sure we are on the same page: the current vsock
stream code can also split a big packet into multiple buffers, and the
receive side can reassemble them. But dgram cannot use that code,
because dgram may drop a buffer in the driver code (if there is not
enough space). That means dgram may drop buffers at the beginning, at
the end, or in the middle of a packet, so a packet may not be received
complete. Therefore, we need something like VIRTIO_NET_F_MRG_RXBUF.

If we want to leverage the current stream code without using
VIRTIO_NET_F_MRG_RXBUF, we could add a total_len and an offset to
virtio_vsock_hdr. When sending a packet, the device would split the
big packet into multiple small ones, each with its own header; they
would all carry the same total_len but different offsets. On the
driver side, the driver can check total_len on the fragment with
offset 0 before enqueueing the big packet. If there is enough space,
all the remaining fragments will be received; if not, they will be
dropped. I feel this implementation might be easier than using
VIRTIO_NET_F_MRG_RXBUF, but either one is fine with me.
Any preference? Thanks.

Regards,

Jiang


On Wed, Apr 14, 2021 at 8:15 PM Jiang Wang . <jiang.wang@bytedance.com> wrote:
>
> On Wed, Apr 14, 2021 at 2:38 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Wed, Apr 14, 2021 at 03:20:07AM -0400, Michael S. Tsirkin wrote:
> > >On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote:
> > >> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote:
> > >> > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
> > >> > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
> > >> > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> > >> > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> > >> > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> > >> > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> > >> > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> > >> > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang
> > >> > > > > > > > > >wrote:
> >
> > [...]
> >
> > >> > > > > > > > > >>
> > >> > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
> > >> > > > > > > > > >> +a fixed maximum length.
> > >> > > > > > > > > >
> > >> > > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
> > >> > > > > > > > > >
> > >> > > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
> > >> > > > > > > > > >  with message boundaries and a fixed maximum length.
> > >> > > > > > > > > >
> > >> > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
> > >> > > > > > > > > >intention is that the rx buffer size is the message size limit? That's
> > >> > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
> > >> > > > > > > > > >packets and can be larger than 64KiB:
> > >> > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> > >> > > > > > > > > >
> > >> > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
> > >> > > > > > > > > >concerned that applications that run successfully over UDP will not be
> > >> > > > > > > > > >portable if vsock has this limitation because it would impose extra
> > >> > > > > > > > > >message boundaries that the application protocol might not tolerate.
> > >> > > > > > > > >
> > >> > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> > >> > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
> > >> > > > > > > > > the EOR flag to indicate the last packet in the message.
> > >> > > > > > > > >
> > >> > > > > > > > Agree. Another option is to use the ones for skb since we may need to
> > >> > > > > > > > use skbs for multiple transport support anyway.
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > > The important thing I think is to have a single flag in virtio-vsock that
> > >> > > > > > > identifies pretty much the same thing: this is the last fragment of a series
> > >> > > > > > > to rebuild a packet.
> > >> > > > > > >
> > >> > > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Stefano
> > >> > > > > >
> > >> > > > > > Well DGRAM can drop data so I wonder whether it can work ...
> > >> > > > > >
> > >> > > > >
> > >> > > > > Yep, this is true, but the channel should not be losing packets, so if the
> > >> > > > > receiver discards packets, it knows that it must then discard all of them
> > >> > > > > until the EOR.
> > >> > > >
> > >> > > > That is not so easy - they can come mixed up from multiple sources.
> > >> > >
> > >> > > I think we can prevent mixing because virtuqueue is point to point and its
> > >> > > use is not thread safe, so the access (in the same peer) is already
> > >> > > serialized.
> > >> > > In the end the packet would be fragmented only before copying it to the
> > >> > > virtuqueue.
> > >> > >
> > >> > > But maybe I missed something...
> > >> >
> > >> > Well I ask what's the point of fragmenting then. I assume it's so we
> > >> > can pass huge messages around so you can't keep locks ...
> > >> >
> > >>
> > >> Maybe I'm wrong, but isn't this similar to what we do in virtio-net with
> > >> mergeable buffers?
> > >
> > >The point of mergeable buffers is to use less memory: both for each
> > >packet and for a full receive vq.
> > >
> > >> Also in this case I think the fragmentation will happen only in the device,
> > >> since the driver can enqueue the entire buffer.
> > >>
> > >> Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not
> > >> suitable.
> > >
> > >That sounds very reasonable.
> >
> > It should also allow us to save the header for each fragment.
> >
> > @Jiang Do you want to explore this?
> > I'm talking about VIRTIO_NET_F_MRG_RXBUF feature.
>
> Sure. Will do.
>
> > >
> > >> IIUC in the vsock device the fragmentation for DGRAM will happen just
> > >> before
> > >> to queue it in the virtqueue, and the device can check how many buffers are
> > >> available in the queue and it can decide whether to queue them all up or
> > >> throw them away.
> > >> >
> > >> > > > Sure linux net core does this but with fragmentation added in,
> > >> > > > I start wondering whether you are beginning to reinvent the net stack
> > >> > > > ...
> > >> > >
> > >> > > No, I hope not :-), in the end our advantage is that we have a channel that
> > >> > > doesn't lose packets, so I guess we can make assumptions that the network
> > >> > > stack can't.
> > >> > >
> > >> > > Thanks,
> > >> > > Stefano
> > >> >
> > >> > I still don't know how will credit accounting work for datagram,
> > >> > but proposals I saw seem to actually lose packets ...
> > >> >
> > >>
> > >> I still don't know too, but I think it's not an issue in the RX side,
> > >> since if it doesn't have space, can drop all the fragment.
> > >>
> > >> Another option to avoid fragmentation could be to allocate 64K buffers for
> > >> the new DGRAM virtqueue.
> > >
> > >That's a lot of buffers ...
> >
> > Yep I see, and they would often be mostly unused...
> >
> > >
> > >> In this way we will have at most 64K packets, which is similar to
> > >> UDP/IP,
> > >> without extra work for the fragmentation.
> > >
> > >IIRC default MTU is 1280 not 64K ...
> >
> > I was thinking that UDP at most can support 64K messages that IP should
> > fragment according to MTU.
> >
> > Thanks,
> > Stefano
> >


* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-04  3:40                           ` Jiang Wang .
@ 2021-05-04 16:16                             ` Stefano Garzarella
  2021-05-04 17:06                               ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-05-04 16:16 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

Hi Jiang,

On Mon, May 03, 2021 at 08:40:46PM -0700, Jiang Wang . wrote:
>Hi Stefano,
>
>I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
>dgram can use that feature too.

Cool, thanks for checking!

>Do we want to make this feature a must-have or optional? One idea is
>to make it optional. When not

I think optional is fine, and we should support it for all kinds of 
traffic (stream, dgram, seqpacket).

>supported, dgram rx buf is 16 KB which should be good in most cases.

Why not 4 KB like for stream? Or we could make it configurable.

>When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buf is 4K and the max
>packet size is 64 KB.
>
>Also, just to make sure we are on the same page, the current vsock
>stream code can also split a
>big packet to multiple buffers and the receive side can assemble them
>together.

Yes, sort of. Being a stream, there's no concept of a boundary.

> But dgram cannot
>use that code because the dgram may drop a buffer in the driver code
>(if there is not enough space).
>That means dgram may drop some buffers at the beginning, in the end or in the
>middle of a pkt. And a packet may
>not be received as a complete one. Therefore, we need something like
>VIRTIO_NET_F_MRG_RXBUF.

Yep.

>
>If we want to leverage current stream code without using 
>VIRTIO_NET_F_MRG_RXBUF,
>we could add a total_len and offset to the virtio_vsock_hdr. Then when sending
>packet, the device split the big packet to multiple small ones and
>each has a header. They will have the
>same total_len, but different offsets. On the driver side, the driver
>can check the total_len before
>enqueueing the big packet for the one with offset 0.
>If there is enough space, all the remaining packets will be received.
>If not, the remaining packets will be dropped.
>I feel this implementation might be easier than using
>VIRTIO_NET_F_MRG_RXBUF. But either one is fine with me.
>Any preference? Thanks.

This is very similar to what we discussed with Michael. He pointed out 
that it could be complicated and we could have several problems.

For example, we should also provide an ID to prevent different fragments 
from overlapping. Also we might have problems handling different flows 
at the same time.

Mergeable buffers allow us to avoid these problems and also bring 
advantages for the other types of traffic (stream, seqpacket).

It also allows us to use a single header for the packet and all its 
fragments.

So IMHO, if there are no significant issues, the best way would be to 
implement mergeable buffers in vsock; I think there are only advantages 
to using this feature.
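To make the trade-off concrete, here is a rough sketch of the total_len/offset fragmentation scheme quoted above, on the receive side. All field and function names are hypothetical (nothing here is spec text): the driver decides on the offset-0 fragment whether the whole datagram fits, then accepts or drops the remaining fragments. Note that, as pointed out above, without a datagram ID this only works if fragments of different flows never interleave.

```python
def receive_fragments(fragments, buf_space):
    """Toy receive path for the hypothetical total_len/offset scheme.

    fragments: in-order list of dicts with 'total_len', 'offset', 'data'
    buf_space: free receive buffer space when the first fragment arrives
    Returns the reassembled datagram, or None if it was dropped.
    """
    accepted = bytearray()
    accepting = False
    for frag in fragments:
        if frag["offset"] == 0:
            # Decide once, based on the whole datagram size.
            accepting = frag["total_len"] <= buf_space
        if accepting:
            end = frag["offset"] + len(frag["data"])
            accepted[frag["offset"]:end] = frag["data"]
    return bytes(accepted) if accepting else None
```

If a fragment of a second datagram arrived between two fragments of the first, this logic would corrupt the reassembly, which is exactly why an ID (or mergeable buffers instead) would be needed.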

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-04 16:16                             ` Stefano Garzarella
@ 2021-05-04 17:06                               ` Jiang Wang .
  2021-05-05 10:49                                 ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-04 17:06 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Tue, May 4, 2021 at 9:16 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> Hi Jiang,
>
> On Mon, May 03, 2021 at 08:40:46PM -0700, Jiang Wang . wrote:
> >Hi Stefano,
> >
> >I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
> >dgram can use that feature too.
>
> Cool, thanks for checking!

NP.

> >Do we want to make this feature a must-have or optional? One idea is
> >to make it optional. When not
>
> I think optional is fine, and we should support it for all kind of
> traffic (stream, dgram, seqpacket).

Got it. I was thinking only for dgram originally, but I think it should be fine
for stream and seqpacket too.

Btw, I have a small implementation question. For now, vsock
allocates rx buffers with two scatterlists: one for the header and one
for the payload. After we enable the VIRTIO_NET_F_MRG_RXBUF feature,
do we still want to allocate buffers like that? Or could we just use
one big scatterlist for the whole packet? I think using the same allocation
method is fine, but it may not line up with the real packets well, since
we will skip the header for every buffer of a big packet except the first one.

> >supported, dgram rx buf is 16 KB which should be good in most cases.
>
> Why not 4 KB like for stream? Or we could make it configurable.

OK, sure. 4 KB is fine with me. I mentioned 16 KB because I was thinking
of jumbo frames in the Ethernet world, but I just found out that a jumbo
frame is only about 8 or 9 KB.

If we make it configurable, what kind of interface should we use to
configure it? In Linux, we could use something like the sysfs interface.
I guess we don't need to specify that detail in the spec, though; I will
just state in the spec that the size should be configurable.

> >When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buf is 4K and the max
> >packet size is 64 KB.
> >
> >Also, just to make sure we are on the same page, the current vsock
> >stream code can also split a
> >big packet to multiple buffers and the receive side can assemble them
> >together.
>
> Yes, sort of. Being a stream, there's no concept of a boundary.
>
> > But dgram cannot
> >use that code because the dgram may drop a buffer in the driver code
> >(if there is not enough space).
> >That means dgram may drop some buffers at the beginning, in the end or in the
> >middle of a pkt. And a packet may
> >not be received as a complete one. Therefore, we need something like
> >VIRTIO_NET_F_MRG_RXBUF.
>
> Yep.
>
> >
> >If we want to leverage current stream code without using
> >VIRTIO_NET_F_MRG_RXBUF,
> >we could add a total_len and offset to the virtio_vsock_hdr. Then when sending
> >packet, the device split the big packet to multiple small ones and
> >each has a header. They will have the
> >same total_len, but different offsets. On the driver side, the driver
> >can check the total_len before
> >enqueueing the big packet for the one with offset 0.
> >If there is enough space, all the remaining packets will be received.
> >If not, the remaining packets will be dropped.
> >I feel this implementation might be easier than using
> >VIRTIO_NET_F_MRG_RXBUF. But either one is fine with me.
> >Any preference? Thanks.
>
> This is very similar to what we discussed with Michael. He pointed out
> that it could be complicated and we could have several problems.
>
> For example, we should also provide an ID to prevent different fragments
> from overlapping. Also we might have problems handling different flows
> at the same time.
>
> Mergable buffers allow us to avoid these problems and also bring
> advantages for the other types of traffic (stream, seqpacket).
>
> It also allows us to use a single header for the packet and all its
> fragments.
>
> So IMHO, if there are no significant issues, the best way would be to
> implement mergeable buffers in vsock,
> I think there are only advantages to using this feature.

Sure. Got it. I was thinking only about dgram, which is simpler than
stream and seqpacket. For those two, we would hit the issues you just
mentioned.

Also, just to make sure: for stream and seqpacket, supporting
mergeable buffers is mainly for performance improvements,
right? Or to save memory? I think functionally they will be the
same with or without mergeable buffers. For dgram, the maximum
supported packet size is increased when using MRG_RXBUF if the
rx buf size is fixed, and it can save a lot of memory.

I am a little bit confused about the motivation to support mergeable
buffers for stream and seqpacket. Could you remind me again? Sorry
if it was already mentioned in the old emails.

We could support it only on dgram, since dgram has its own virtqueues.

Btw, my company email system automatically adds [External] to
these emails. I mean to remove it manually when I reply, but I
sometimes forget, so the email threading may not be very accurate.
Sorry about that.


> Thanks,
> Stefano
>

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-04 17:06                               ` Jiang Wang .
@ 2021-05-05 10:49                                 ` Stefano Garzarella
  2021-05-05 16:58                                   ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-05-05 10:49 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Tue, May 04, 2021 at 10:06:02AM -0700, Jiang Wang . wrote:
>On Tue, May 4, 2021 at 9:16 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> Hi Jiang,
>>
>> On Mon, May 03, 2021 at 08:40:46PM -0700, Jiang Wang . wrote:
>> >Hi Stefano,
>> >
>> >I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
>> >dgram can use that feature too.
>>
>> Cool, thanks for checking!
>
>NP.
>
>> >Do we want to make this feature a must-have or optional? One idea is
>> >to make it optional. When not
>>
>> I think optional is fine, and we should support it for all kind of
>> traffic (stream, dgram, seqpacket).
>
>Got it. I was thinking only for dgram originally, but I think it should be fine
>for stream and seqpacket too.
>
>Btw, I have a small implementation question. For now, the vsock
>allocates rx buffers with two scatterlist. One for header and one for the
>payload. After we enable VIRTIO_NET_F_MRG_RXBUF feature,
>do we still want to allocate buffers like that? Or could we just use
>one big scatterlist for the whole packet? I think using the same allocation
>method is fine, but it just may not line up with the real packets well since
>we will skip headers for the big packets except the first buffer.

Good question.

With mergeable buffers, I think it is better to remove the little buffer 
for the header from the scatterlist; this should also avoid doing two 
allocations per packet/buffer in the guest.
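A rough sketch of what the receive side then looks like, assuming (by analogy with virtio-net's VIRTIO_NET_F_MRG_RXBUF) that only the first buffer starts with the packet header and a hypothetical num_buffers field in it says how many buffers the packet spans. Names and the header size constant are illustrative:

```python
HDR_LEN = 44  # size of struct virtio_vsock_hdr in the current spec

def merge_rx_buffers(buffers, num_buffers):
    """Reassemble one packet from `num_buffers` rx buffers.

    With mergeable buffers, only buffers[0] starts with the single
    packet header; the remaining buffers are pure payload, so no
    per-buffer header needs to be skipped. `num_buffers` would come
    from a (hypothetical for vsock) field in that first header.
    """
    payload = bytearray(buffers[0][HDR_LEN:])  # skip the single header
    for buf in buffers[1:num_buffers]:
        payload += buf                          # payload-only buffers
    return bytes(payload)
```

With this layout, each rx buffer can be a single allocation, matching the suggestion above to drop the separate little header buffer.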

>
>> >supported, dgram rx buf is 16 KB which should be good in most cases.
>>
>> Why not 4 KB like for stream? Or we could make it configurable.
>
>OK. sure. 4 KB is fine with me. I mentioned 16 KB because I was thinking
>jumbo frames in the ethernet world. But  I just found out the jumbo frame
>is about 8 KB or 9 KB only.
>
>If we make it configurable, what kind of interface to use to configure it?
>In linux, we could use something like the sysfs interface. I guess we don't

Yes, something like that for the guest driver.

>need to specify that detail in the spec though. I will just put the size should
>be configurable in the spec.

Yeah, I remember that at some point we fixed an issue where the host 
always expected buffers of 4 KB.

Now it should support any buffer size less than or equal to 64 KB.

>
>> >When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buf is 4K and the max
>> >packet size is 64 KB.
>> >
>> >Also, just to make sure we are on the same page, the current vsock
>> >stream code can also split a
>> >big packet to multiple buffers and the receive side can assemble them
>> >together.
>>
>> Yes, sort of. Being a stream, there's no concept of a boundary.
>>
>> > But dgram cannot
>> >use that code because the dgram may drop a buffer in the driver code
>> >(if there is not enough space).
>> >That means dgram may drop some buffers at the beginning, in the end or in the
>> >middle of a pkt. And a packet may
>> >not be received as a complete one. Therefore, we need something like
>> >VIRTIO_NET_F_MRG_RXBUF.
>>
>> Yep.
>>
>> >
>> >If we want to leverage current stream code without using
>> >VIRTIO_NET_F_MRG_RXBUF,
>> >we could add a total_len and offset to the virtio_vsock_hdr. Then when sending
>> >packet, the device split the big packet to multiple small ones and
>> >each has a header. They will have the
>> >same total_len, but different offsets. On the driver side, the driver
>> >can check the total_len before
>> >enqueueing the big packet for the one with offset 0.
>> >If there is enough space, all the remaining packets will be received.
>> >If not, the remaining packets will be dropped.
>> >I feel this implementation might be easier than using
>> >VIRTIO_NET_F_MRG_RXBUF. But either one is fine with me.
>> >Any preference? Thanks.
>>
>> This is very similar to what we discussed with Michael. He pointed 
>> out
>> that it could be complicated and we could have several problems.
>>
>> For example, we should also provide an ID to prevent different 
>> fragments
>> from overlapping. Also we might have problems handling different 
>> flows
>> at the same time.
>>
>> Mergable buffers allow us to avoid these problems and also bring
>> advantages for the other types of traffic (stream, seqpacket).
>>
>> It also allows us to use a single header for the packet and all its
>> fragments.
>>
>> So IMHO, if there are no significant issues, the best way would be to
>> implement mergeable buffers in vsock,
>> I think there are only advantages to using this feature.
>
>Sure. Got it. I was thinking only about dgram, which is simpler than
>stream and seqpacket. For those two, they will have issues as you
>just mentioned.
>
>Also, just to make sure. For steam and seqpacket, supporting
>mergeable buffers is mainly for performance improvements,
>right? Or to save memory? I think functionally, they will be the
>same with or without
>mergeable buffers.

Yes, right!

> For dgram, the maximum supported packet size
>is increased when using MRG_RXBUF if the rx buf size is fixed,
>and it can save lots of memory.
>
>I am a little bit confused about the motivation to support mergeable
>buffers for stream and seqpacket. Could you remind me again? Sorry
>that if it was already mentioned in the old emails.

We can save the header overhead by using a single header for the entire 
"big" packet.

For example, in the current implementation, if the host has a 64 KB 
buffer to send to the guest over a stream socket, it must split it into 
16 packets, using a header for each fragment. With mergeable buffers, we 
would save the extra header for each fragment by using a single initial 
header specifying the number of descriptors used.
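The arithmetic above can be checked with a quick sketch, assuming the 44-byte virtio_vsock_hdr and 4 KB rx buffers, and ignoring (as the example above does) that the header itself consumes some buffer space:

```python
HDR = 44            # bytes in struct virtio_vsock_hdr
BUF = 4 * 1024      # rx buffer size
SEND = 64 * 1024    # host buffer to transmit

packets = SEND // BUF                 # fragments without mergeable buffers
hdr_bytes_now = packets * HDR         # one header per fragment today
hdr_bytes_merged = HDR                # a single initial header
saved = hdr_bytes_now - hdr_bytes_merged

print(packets, saved)                 # 16 fragments, 660 header bytes saved
```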

>
>We could only support it on dgram since dgram has its own virtqueues.

Maybe that is fine for the initial implementation; then we can add 
support for the other types as well.

Please keep this in mind, so it will be easier to reuse it for other 
types.

>
>btw, my company email system automatically adds [External] to
>these emails, and I meant to remove it manually when I reply,
>but forgot to do that sometimes, so the email threading may not
>be very accurate.
>Sorry about that.

Don't worry :-)

Thanks,
Stefano


* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-05 10:49                                 ` Stefano Garzarella
@ 2021-05-05 16:58                                   ` Jiang Wang .
  2021-05-07 16:53                                     ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-05 16:58 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

On Wed, May 5, 2021 at 3:49 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, May 04, 2021 at 10:06:02AM -0700, Jiang Wang . wrote:
> >On Tue, May 4, 2021 at 9:16 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> Hi Jiang,
> >>
> >> On Mon, May 03, 2021 at 08:40:46PM -0700, Jiang Wang . wrote:
> >> >Hi Stefano,
> >> >
> >> >I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
> >> >dgram can use that feature too.
> >>
> >> Cool, thanks for checking!
> >
> >NP.
> >
> >> >Do we want to make this feature a must-have or optional? One idea is
> >> >to make it optional. When not
> >>
> >> I think optional is fine, and we should support it for all kind of
> >> traffic (stream, dgram, seqpacket).
> >
> >Got it. I was thinking only for dgram originally, but I think it should be fine
> >for stream and seqpacket too.
> >
> >Btw, I have a small implementation question. For now, the vsock
> >allocates rx buffers with two scatterlist. One for header and one for the
> >payload. After we enable VIRTIO_NET_F_MRG_RXBUF feature,
> >do we still want to allocate buffers like that? Or could we just use
> >one big scatterlist for the whole packet? I think using the same allocation
> >method is fine, but it just may not line up with the real packets well since
> >we will skip headers for the big packets except the first buffer.
>
> Good question.
>
> With mergeable buffer I think is better to remove the little buffer for
> the header in the scatterlist, this should also avoid to do two
> allocations per packet/buffer in the guest.

Got it. Will do.

> >
> >> >supported, dgram rx buf is 16 KB which should be good in most cases.
> >>
> >> Why not 4 KB like for stream? Or we could make it configurable.
> >
> >OK. sure. 4 KB is fine with me. I mentioned 16 KB because I was thinking
> >jumbo frames in the ethernet world. But  I just found out the jumbo frame
> >is about 8 KB or 9 KB only.
> >
> >If we make it configurable, what kind of interface to use to configure it?
> >In linux, we could use something like the sysfs interface. I guess we don't
>
> Yes, something like that for the guest driver.

Got it.

> >need to specify that detail in the spec though. I will just put the size should
> >be configurable in the spec.
>
> Yeah, I remember that at some point we fixed an issue where the host
> always expected buffer of 4 KB.
>
> Now it should support any buffer sizes less or equal to 64 KB.
>
I see. I will check if there is any issue with that.

> >
> >> >When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buf is 4K and the max
> >> >packet size is 64 KB.
> >> >
> >> >Also, just to make sure we are on the same page, the current vsock
> >> >stream code can also split a
> >> >big packet to multiple buffers and the receive side can assemble them
> >> >together.
> >>
> >> Yes, sort of. Being a stream, there's no concept of a boundary.
> >>
> >> > But dgram cannot
> >> >use that code because the dgram may drop a buffer in the driver code
> >> >(if there is not enough space).
> >> >That means dgram may drop some buffers at the beginning, in the end or in the
> >> >middle of a pkt. And a packet may
> >> >not be received as a complete one. Therefore, we need something like
> >> >VIRTIO_NET_F_MRG_RXBUF.
> >>
> >> Yep.
> >>
> >> >
> >> >If we want to leverage current stream code without using
> >> >VIRTIO_NET_F_MRG_RXBUF,
> >> >we could add a total_len and offset to the virtio_vsock_hdr. Then when sending
> >> >packet, the device split the big packet to multiple small ones and
> >> >each has a header. They will have the
> >> >same total_len, but different offsets. On the driver side, the driver
> >> >can check the total_len before
> >> >enqueueing the big packet for the one with offset 0.
> >> >If there is enough space, all the remaining packets will be received.
> >> >If not, the remaining packets will be dropped.
> >> >I feel this implementation might be easier than using
> >> >VIRTIO_NET_F_MRG_RXBUF. But either one is fine with me.
> >> >Any preference? Thanks.
> >>
> >> This is very similar to what we discussed with Michael. He pointed
> >> out
> >> that it could be complicated and we could have several problems.
> >>
> >> For example, we should also provide an ID to prevent different
> >> fragments
> >> from overlapping. Also we might have problems handling different
> >> flows
> >> at the same time.
> >>
> >> Mergable buffers allow us to avoid these problems and also bring
> >> advantages for the other types of traffic (stream, seqpacket).
> >>
> >> It also allows us to use a single header for the packet and all its
> >> fragments.
> >>
> >> So IMHO, if there are no significant issues, the best way would be to
> >> implement mergeable buffers in vsock,
> >> I think there are only advantages to using this feature.
> >
> >Sure. Got it. I was thinking only about dgram, which is simpler than
> >stream and seqpacket. For those two, they will have issues as you
> >just mentioned.
> >
> >Also, just to make sure. For steam and seqpacket, supporting
> >mergeable buffers is mainly for performance improvements,
> >right? Or to save memory? I think functionally, they will be the
> >same with or without
> >mergeable buffers.
>
> Yes, right!
>
> > For dgram, the maximum supported packet size
> >is increased when using MRG_RXBUF if the rx buf size is fixed,
> >and it can save lots of memory.
> >
> >I am a little bit confused about the motivation to support mergeable
> >buffers for stream and seqpacket. Could you remind me again? Sorry
> >that if it was already mentioned in the old emails.
>
> We can save the header overhead, using a single header for the entire
> "big" packet.
>
> For example, in the current implementation, if the host has a 64KB
> buffer to send to the guest with a stream socket, must split it into 16
> packets, using a header for each fragment. With mergable buffers, we
> would save the extra header for each fragment by using a single initial
> header specifying the number of descriptors used.
>
OK. Sure.
> >
> >We could only support it on dgram since dgram has its own virtqueues.
>
> Maybe for the initial implementation is fine, then we can add support
> also for other types.
>
> Please, keep this in mind, so it will be easier to reuse it for other
> types.
>
Got it. Will do. Thanks for the suggestions and comments. I will
update the spec patch next.

> >
> >btw, my company email system automatically adds [External] to
> >these emails, and I meant to remove it manually when I reply,
> >but forgot to do that sometimes, so the email threading may not
> >be very accurate.
> >Sorry about that.
>
> Don't worry :-)
>
> Thanks,
> Stefano
>

* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-05 16:58                                   ` Jiang Wang .
@ 2021-05-07 16:53                                     ` Jiang Wang .
  2021-05-10 14:50                                       ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-07 16:53 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, asias,
	Arseny Krasnov

Hi guys,

I have one question about adding two new virtqueues for dgram. One new
thought is that we don't add two new virtqueues, but keep using the
existing virtqueues for both stream and dgram.

At the beginning, when I first thought about supporting dgram, I thought
adding two new virtqueues would be easier and have better performance.
But now, after the prototype is done, I think that keeping the existing
virtqueues is also not complicated and could in fact be simpler. The
performance difference may not be very big.

The original code has about 3 places that assume the virtqueues are only
used by stream sockets, but we can change that code. One place is the
pkt len check; we can apply it only to stream pkts. The other two are in
the tx and rx code paths, where the code stops the rx queue and resumes
it later if too many reply pkts are queued. We can keep that same logic.
The dgram will be affected a little bit, but I think that should be
fine. Are there any other places that we should fix?

In short, the virtqueues are at a lower level and can support multiple
flows and socket types. Using the existing virtqueues also makes it more
compatible with old versions.
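A minimal sketch of the shared-virtqueue idea: the rx path simply demultiplexes on the type field already present in every packet header. VIRTIO_VSOCK_TYPE_STREAM is from the current spec; the dgram value is hypothetical, since it is not in the spec yet:

```python
VIRTIO_VSOCK_TYPE_STREAM = 1   # from the virtio spec
VIRTIO_VSOCK_TYPE_DGRAM = 3    # hypothetical value, not yet in the spec

def handle_rx(pkt, stream_deliver, dgram_deliver):
    """Dispatch one packet from the shared rx virtqueue by header type."""
    if pkt["type"] == VIRTIO_VSOCK_TYPE_STREAM:
        stream_deliver(pkt)    # stream path keeps its pkt len/credit checks
    elif pkt["type"] == VIRTIO_VSOCK_TYPE_DGRAM:
        dgram_deliver(pkt)     # dgram path: no credit; may be dropped later
    # unknown types are silently dropped
```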

What do you guys think? I remember Stefano mentioned that we should add
two new virtqueues for dgram. Stefano, do you have some specific reasons
for that? Could we just keep using existing virtqueues? Thanks.

Regards,

Jiang

On Wed, May 5, 2021 at 9:58 AM Jiang Wang . <jiang.wang@bytedance.com> wrote:
>
> On Wed, May 5, 2021 at 3:49 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Tue, May 04, 2021 at 10:06:02AM -0700, Jiang Wang . wrote:
> > >On Tue, May 4, 2021 at 9:16 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >>
> > >> Hi Jiang,
> > >>
> > >> On Mon, May 03, 2021 at 08:40:46PM -0700, Jiang Wang . wrote:
> > >> >Hi Stefano,
> > >> >
> > >> >I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
> > >> >dgram can use that feature too.
> > >>
> > >> Cool, thanks for checking!
> > >
> > >NP.
> > >
> > >> >Do we want to make this feature a must-have or optional? One idea is
> > >> >to make it optional. When not
> > >>
> > >> I think optional is fine, and we should support it for all kind of
> > >> traffic (stream, dgram, seqpacket).
> > >
> > >Got it. I was thinking only for dgram originally, but I think it should be fine
> > >for stream and seqpacket too.
> > >
> > >Btw, I have a small implementation question. For now, the vsock
> > >allocates rx buffers with two scatterlist. One for header and one for the
> > >payload. After we enable VIRTIO_NET_F_MRG_RXBUF feature,
> > >do we still want to allocate buffers like that? Or could we just use
> > >one big scatterlist for the whole packet? I think using the same allocation
> > >method is fine, but it just may not line up with the real packets well since
> > >we will skip headers for the big packets except the first buffer.
> >
> > Good question.
> >
> > With mergeable buffer I think is better to remove the little buffer for
> > the header in the scatterlist, this should also avoid to do two
> > allocations per packet/buffer in the guest.
>
> Got  it. Will do.
>
> > >
> > >> >supported, dgram rx buf is 16 KB which should be good in most cases.
> > >>
> > >> Why not 4 KB like for stream? Or we could make it configurable.
> > >
> > >OK. sure. 4 KB is fine with me. I mentioned 16 KB because I was thinking
> > >jumbo frames in the ethernet world. But  I just found out the jumbo frame
> > >is about 8 KB or 9 KB only.
> > >
> > >If we make it configurable, what kind of interface to use to configure it?
> > >In linux, we could use something like the sysfs interface. I guess we don't
> >
> > Yes, something like that for the guest driver.
>
> Got it.
>
> > >need to specify that detail in the spec though. I will just put the size should
> > >be configurable in the spec.
> >
> > Yeah, I remember that at some point we fixed an issue where the host
> > always expected buffer of 4 KB.
> >
> > Now it should support any buffer sizes less or equal to 64 KB.
> >
> I see. I will if there is any issue with that.
>
> > >
> > >> >When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buf is 4K and the max
> > >> >packet size is 64 KB.
> > >> >
> > >> >Also, just to make sure we are on the same page, the current vsock
> > >> >stream code can also split a
> > >> >big packet to multiple buffers and the receive side can assemble them
> > >> >together.
> > >>
> > >> Yes, sort of. Being a stream, there's no concept of a boundary.
> > >>
> > >> > But dgram cannot
> > >> >use that code because the dgram may drop a buffer in the driver code
> > >> >(if there is not enough space).
> > >> >That means dgram may drop some buffers at the beginning, in the end or in the
> > >> >middle of a pkt. And a packet may
> > >> >not be received as a complete one. Therefore, we need something like
> > >> >VIRTIO_NET_F_MRG_RXBUF.
> > >>
> > >> Yep.
> > >>
> > >> >
> > >> >If we want to leverage current stream code without using
> > >> >VIRTIO_NET_F_MRG_RXBUF,
> > >> >we could add a total_len and offset to the virtio_vsock_hdr. Then when sending
> > >> >packet, the device split the big packet to multiple small ones and
> > >> >each has a header. They will have the
> > >> >same total_len, but different offsets. On the driver side, the driver
> > >> >can check the total_len before
> > >> >enqueueing the big packet for the one with offset 0.
> > >> >If there is enough space, all the remaining packets will be received.
> > >> >If not, the remaining packets will be dropped.
> > >> >I feel this implementation might be easier than using
> > >> >VIRTIO_NET_F_MRG_RXBUF. But either one is fine with me.
> > >> >Any preference? Thanks.
> > >>
> > >> This is very similar to what we discussed with Michael. He pointed
> > >> out
> > >> that it could be complicated and we could have several problems.
> > >>
> > >> For example, we should also provide an ID to prevent different
> > >> fragments
> > >> from overlapping. Also we might have problems handling different
> > >> flows
> > >> at the same time.
> > >>
> > >> Mergable buffers allow us to avoid these problems and also bring
> > >> advantages for the other types of traffic (stream, seqpacket).
> > >>
> > >> It also allows us to use a single header for the packet and all its
> > >> fragments.
> > >>
> > >> So IMHO, if there are no significant issues, the best way would be to
> > >> implement mergeable buffers in vsock,
> > >> I think there are only advantages to using this feature.
> > >
> > >Sure. Got it. I was thinking only about dgram, which is simpler than
> > >stream and seqpacket. For those two, they will have issues as you
> > >just mentioned.
> > >
> > >Also, just to make sure. For steam and seqpacket, supporting
> > >mergeable buffers is mainly for performance improvements,
> > >right? Or to save memory? I think functionally, they will be the
> > >same with or without
> > >mergeable buffers.
> >
> > Yes, right!
> >
> > > For dgram, the maximum supported packet size
> > >is increased when using MRG_RXBUF if the rx buf size is fixed,
> > >and it can save lots of memory.
> > >
> > >I am a little bit confused about the motivation to support mergeable
> > >buffers for stream and seqpacket. Could you remind me again? Sorry
> > >that if it was already mentioned in the old emails.
> >
> > We can save the header overhead, using a single header for the entire
> > "big" packet.
> >
> > For example, in the current implementation, if the host has a 64KB
> > buffer to send to the guest with a stream socket, must split it into 16
> > packets, using a header for each fragment. With mergable buffers, we
> > would save the extra header for each fragment by using a single initial
> > header specifying the number of descriptors used.
> >
> OK. Sure.
> > >
> > >We could only support it on dgram since dgram has its own virtqueues.
> >
> > Maybe for the initial implementation is fine, then we can add support
> > also for other types.
> >
> > Please, keep this in mind, so it will be easier to reuse it for other
> > types.
> >
> Got it. Will do. Thanks for the suggestions and comments. I will
> update the spec patch next.
>
> > >
> > >btw, my company email system automatically adds [External] to
> > >these emails, and I meant to remove it manually when I reply,
> > >but forgot to do that sometimes, so the email threading may not
> > >be very accurate.
> > >Sorry about that.
> >
> > Don't worry :-)
> >
> > Thanks,
> > Stefano
> >

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-07 16:53                                     ` Jiang Wang .
@ 2021-05-10 14:50                                       ` Stefano Garzarella
  2021-05-13 23:26                                         ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-05-10 14:50 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Fri, May 07, 2021 at 09:53:19AM -0700, Jiang Wang . wrote:
>Hi guys,
>
>I have one question about adding two new virtqueues for dgram. One new
>thought is that we don't add two new virtqueues but keep using existing
>virtqueues for both stream and dgram.
>
>At the beginning when I first thought about supporting dgram, I thought
>adding two new virtqueues would be easier and have better performance.
>But now, after the prototype is done, I think that to keep using
>existing virtqueues is also not complicated and could in fact be simpler.
>The performance difference may not be very big.

I honestly didn't think it was easier to use two new queues, quite the 
opposite.

>
>Original code has about 3 places which have assumptions about the
>virtqueues are only used by the stream. But we can change those codes.
>One place is to check pkt len. We can check only for stream pkts.
>Another two are in tx and rx code path where if queued replies pkts are
>too much, the code will stop the rx queue and resume later. We can keep
>that same logic. The dgram will be affected a little bit but that should
>be fine I think. Are there any other places that we should fix?

Did you take a look at Arseny's series?
I think he has already found the places where the type needs to be 
checked, and it seems to me they are the places you listed.

>
>In short, the virtqueues are in a lower level and can support multiple
>flows and socket types. Use existing virtqueues also make it more
>compatible with old versions.

It's not clear to me how compatibility is improved. Can you elaborate on 
this?

>
>What do you guys think? I remember Stefano mentioned that we should add
>two new virtqueues for dgram. Stefano, do you have some specific reasons
>for that? Could we just keep using existing virtqueues? Thanks.
>

My biggest concern was about the credit mechanism for datagrams. I mean 
avoiding datagrams from crowding the queue without limits, preventing 
streams from communicating.

If you've found a way to limit datagram traffic, then maybe it's doable.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [External] Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-04-12 22:39   ` [External] " Jiang Wang .
@ 2021-05-13 14:57     ` Stefan Hajnoczi
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Hajnoczi @ 2021-05-13 14:57 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Arseny Krasnov, asias



On Mon, Apr 12, 2021 at 03:39:36PM -0700, Jiang Wang . wrote:
> On Mon, Apr 12, 2021 at 6:50 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > > +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> > > +is dropped by the receiver.
> >
> > UDP is connectionless so any number of other sources can send messages
> > to the same destination, causing buf_alloc's value to be unpredictable.
> > Can you explain how buf_alloc works with datagram sockets in more
> > detail?
> 
> In the linux kernel in my prototype, datagram sockets also use
> virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() to update
> vvs->rx_bytes and compare it with vvs->buf_alloc. virtio_transport_inc_rx_pkt
> is called when enqueuing the datagram packets.
> virtio_transport_dec_rx_pkt is called when dequeuing those packets.
> 
> If multiple sources send messages to the same destination, they will share
> the same buf_alloc. Do you want something with more control?
> Maybe somehow allocate a buffer for each remote CID and port? But I
> feel that is a little bit overkill. Any other suggestions?

The opposite, I want less control :). It's not possible to track buffer
space accurately with connectionless sockets, so let's not try.

A real example is the iperf3 networking benchmark where you need to set
target bitrate for UDP sockets because there is no flow control (buffer
space) mechanism in UDP. That's how UDP applications work and for
similar reasons I don't think it's possible to achieve flow control with
vsock's buf_alloc.

> > >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > -sufficient free buffer space for the payload.
> > > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> > >
> > >  All packets associated with a stream flow MUST contain valid information in
> > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > >
> > >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > -sufficient free buffer space for the payload.
> > > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> > >
> > >  All packets associated with a stream flow MUST contain valid information in
> > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > > @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> > >  The \field{guest_cid} configuration field MUST be used as the source CID when
> > >  sending outgoing packets.
> > >
> > > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > >  unknown \field{type} value.
> >
> > What about datagram sockets? Please state what must happen and why.
> 
> I think datagram sockets should do the same thing as the stream to
> return a VIRTIO_VSOCK_OP_RST?
> Any other ideas?

Sounds good to me. I just wanted to check and request that you add that
to the spec.


* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-10 14:50                                       ` Stefano Garzarella
@ 2021-05-13 23:26                                         ` Jiang Wang .
  2021-05-14 15:17                                           ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-13 23:26 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Mon, May 10, 2021 at 7:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, May 07, 2021 at 09:53:19AM -0700, Jiang Wang . wrote:
> >Hi guys,
> >
> >I have one question about adding two new virtqueues for dgram. One new
> >thought is that we don't add two new virtqueues but keep using existing
> >virtqueues for both stream and dgram.
> >
> >At the beginning when I first thought about supporting dgram, I thought
> >adding two new virtqueues would be easier and have better performance.
> >But now, after the prototype is done, I think that to keep using
> >existing virtqueues is also not complicated and could in fact be simpler.
> >The performance difference may not be very big.
>
> I honestly didn't think it was easier to use two new queues, quite the
> opposite.

Got it.

> >
> >Original code has about 3 places which have assumptions about the
> >virtqueues are only used by the stream. But we can change those codes.
> >One place is to check pkt len. We can check only for stream pkts.
> >Another two are in tx and rx code path where if queued replies pkts are
> >too much, the code will stop the rx queue and resume later. We can keep
> >that same logic. The dgram will be affected a little bit but that should
> >be fine I think. Are there any other places that we should fix?
>
> Did you take a look at Arseny's series?
> I think he's already found the places where to check the type and it
> seems to me they are the places you listed.

Yes, I checked his patch, and it helps.

> >
> >In short, the virtqueues are in a lower level and can support multiple
> >flows and socket types. Use existing virtqueues also make it more
> >compatible with old versions.
>
> It's not clear to me how compatibility is improved. Can you elaborate on
> this?

I was thinking that if we don't add two new virtqueues, then maybe we
don't need to add a new feature bit either? If the other end does not
support dgram, the packets will just be dropped. What do you think? Do
we still need to add a dgram feature bit? I can have a feature bit for
mergeable buffers.

> >
> >What do you guys think? I remember Stefano mentioned that we should add
> >two new virtqueues for dgram. Stefano, do you have some specific reasons
> >for that? Could we just keep using existing virtqueues? Thanks.
> >
>
> My biggest concern was about the credit mechanism for datagrams. I mean
> avoiding datagrams from crowding the queue without limits, preventing
> streams from communicating.
>
> If you've found a way to limit datagram traffic, then maybe it's doable.

I see. I will add some limits to dgram packets. Also, when the virtqueues
are shared between stream and dgram, both of them need to grab a lock
before using the virtqueue, so one will not completely block the other.

> Thanks,
> Stefano
>

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-13 23:26                                         ` Jiang Wang .
@ 2021-05-14 15:17                                           ` Stefano Garzarella
  2021-05-14 18:55                                             ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-05-14 15:17 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
>On Mon, May 10, 2021 at 7:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> On Fri, May 07, 2021 at 09:53:19AM -0700, Jiang Wang . wrote:

[...]

>I was thinking if we don't add two new virtqueues, then maybe we don't
>need to add new feature bit too? If the other end does not support
>dgram, then the packets will be just dropped. What do you think? Do
>we still need to add dgram feature bits? I can have a feature bit for
>mergeable buffer.

With seqpacket, where we reuse stream queues, we decided to add the new 
feature bit, so I guess we should do the same for dgram.

In this way the driver knows if the protocol is supported, and we can 
avoid, for example, opening a listening socket.

Without the feature bit this would not be possible. I mean, the sender 
will get an error, but the receiver will never know if it can receive or 
not.

>> >What do you guys think? I remember Stefano mentioned that we should 
>> >add
>> >two new virtqueues for dgram. Stefano, do you have some specific reasons
>> >for that? Could we just keep using existing virtqueues? Thanks.
>>
>> My biggest concern was about the credit mechanism for datagrams. I mean
>> avoiding datagrams from crowding the queue without limits, preventing
>> streams from communicating.
>>
>> If you've found a way to limit datagram traffic, then maybe it's doable.
>
>I see. I will add some limit to dgram packets. Also, when the virtqueues
>are shared between stream and dgram, both of them need to grab a lock
>before using the virtqueue, so one will not completely block another one.

I'm not worried about the concurrent access that we definitely need to 
handle with a lock, but more about the uncontrolled packet sending that 
dgram might have, flooding the queues and preventing others from 
communicating.

So having 2 dedicated queues could avoid a credit mechanism at all for 
connection-less sockets, and simply the receiver discards packets that 
it can't handle.

Thanks,
Stefano


* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-14 15:17                                           ` Stefano Garzarella
@ 2021-05-14 18:55                                             ` Jiang Wang .
  2021-05-17 11:02                                               ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-14 18:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
> >On Mon, May 10, 2021 at 7:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> On Fri, May 07, 2021 at 09:53:19AM -0700, Jiang Wang . wrote:
>
> [...]
>
> >I was thinking if we don't add two new virtqueues, then maybe we don't
> >need to add new feature bit too? If the other end does not support
> >dgram, then the packets will be just dropped. What do you think? Do
> >we still need to add dgram feature bits? I can have a feature bit for
> >mergeable buffer.
>
> With seqpacket, where we reuse stream queues, we decided to add the new
> feature bit, so I guess we should do the same for dgram.
>
> In this way the driver knows if the protocol is supported and we can
> avoid for example to open a listening socket.

Sure. I will keep the feature bit.

> Without the feature bit this would not be possible. I mean, the sender
> will get an error, but the receiver will never know if it can receive or
> not.
>
> >> >What do you guys think? I remember Stefano mentioned that we should
> >> >add
> >> >two new virtqueues for dgram. Stefano, do you have some specific reasons
> >> >for that? Could we just keep using existing virtqueues? Thanks.
> >>
> >> My biggest concern was about the credit mechanism for datagrams. I mean
> >> avoiding datagrams from crowding the queue without limits, preventing
> >> streams from communicating.
> >>
> >> If you've found a way to limit datagram traffic, then maybe it's doable.
> >
> >I see. I will add some limit to dgram packets. Also, when the virtqueues
> >are shared between stream and dgram, both of them need to grab a lock
> >before using the virtqueue, so one will not completely block another one.
>
> I'm not worried about the concurrent access that we definitely need to
> handle with a lock, but more about the uncontrolled packet sending that
> dgram might have, flooding the queues and preventing others from
> communicating.

That is a valid concern. Let me explain how I would handle it if we
don't add two new virtqueues. For dgram, I also add a
dgram_send_pkt_list, which is similar to send_pkt_list for stream
(and seqpacket), but with one difference: dgram_send_pkt_list has a
maximum size, and we keep track of how many pkts are in the list.
The tracked count (dgram_send_pkt_list_size) is increased when a
packet is added to the list and decreased when a packet is removed
from the list and added to the virtqueue. In
virtio_transport_send_pkt, if the current dgram_send_pkt_list_size
is equal to the maximum (let's say 128), the packet is not added to
dgram_send_pkt_list and an error is returned to the application.

In this way, the number of pending dgram pkts to be sent is limited.
Both stream and dgram sockets will then compete to hold a lock for
the tx virtqueue. Depending on the Linux scheduler, this competition
will be somewhat fair. As a result, dgram will not block stream
completely; it will compete for and share the virtqueue with stream,
but stream can still send some pkts.

Basically, the virtqueue becomes a first-come, first-served resource
for stream and dgram. When both stream and dgram applications have
lots of data to send, dgram_send_pkt_list and send_pkt_list will
still be of limited size, and each will have some chance to send out
its data via the virtqueue. Does this address your concern?
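
To make the accounting concrete, here is a minimal userspace sketch of
the bounded-list idea. DGRAM_PKT_LIST_MAX, struct dgram_acct, and the
function names are invented for this sketch; they are not the actual
kernel symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DGRAM_PKT_LIST_MAX 128  /* cap on pending dgram packets */

struct dgram_acct {
	size_t list_size;   /* pkts queued but not yet in the virtqueue */
};

/*
 * Send path: refuse the packet (the caller would return an error such
 * as -ENOBUFS to the application) when the pending list is full.
 */
static bool dgram_try_queue(struct dgram_acct *a)
{
	if (a->list_size >= DGRAM_PKT_LIST_MAX)
		return false;   /* list full: reject, don't queue */
	a->list_size++;
	return true;
}

/* Worker path: a packet moved from the list into the tx virtqueue. */
static void dgram_dequeue(struct dgram_acct *a)
{
	if (a->list_size > 0)
		a->list_size--;
}
```

The point is only that dgram_send_pkt_list_size can never exceed the
cap, so the pending dgram backlog stays bounded no matter how fast the
application sends.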


> So having 2 dedicated queues could avoid a credit mechanism at all for
> connection-less sockets, and simply the receiver discards packets that
> it can't handle.

With 2 dedicated queues, we still need some kind of accounting for
dgram, like the dgram_send_pkt_list_size I mentioned above. Otherwise,
it will cause OOM. It is not a real credit mechanism, but the code is
similar with or without 2 dedicated queues in my current prototype.

As for the receiver discarding packets, could you explain more? I
think the receiver's packet-discarding code is also the same with or
without 2 dedicated queues. Both will use
can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
to check whether a packet should be discarded.

Thanks,

Jiang

> Thanks,
> Stefano
>

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-14 18:55                                             ` Jiang Wang .
@ 2021-05-17 11:02                                               ` Stefano Garzarella
  2021-05-18  6:33                                                 ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-05-17 11:02 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
>On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:

[...]

>> >I see. I will add some limit to dgram packets. Also, when the 
>> >virtqueues
>> >are shared between stream and dgram, both of them need to grab a lock
>> >before using the virtqueue, so one will not completely block another one.
>>
>> I'm not worried about the concurrent access that we definitely need to
>> handle with a lock, but more about the uncontrolled packet sending that
>> dgram might have, flooding the queues and preventing others from
>> communicating.
>
>That is a valid concern. Let me explain how I would handle that if we
>don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
>which is similar to send_pkt_list for stream (and seqpacket). But there
>is one difference. The dgram_send_pkt_list has a maximum size setting,
>and keep tracking how many pkts are in the list. The track number
>(dgram_send_pkt_list_size) is  increased when a packet is added
>to the list and is decreased when a packet
>is removed from the list and added to the virtqueue. In
>virtio_transport_send_pkt, if the current
>dgram_send_pkt_list_size is equal
>to the maximum ( let's say 128), then it will not add to the
>dgram_send_pkt_list and return an error to the application.

For stream sockets, we have the send_pkt_list and the send worker
because the virtqueue can be full and the transmitter needs to wait
for available slots, since we can't discard packets.

For dgram I think we don't need this, so we can avoid the 
dgram_send_pkt_list and directly enqueue packets in the virtqueue.

If there are no slots available, we can simply drop the packet.
In this way we don't have to put limits other than the available space 
in the virtqueue.
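
A tiny sketch of that drop-on-full policy; vq_add() stands in for
virtqueue_add_sgs(), and every name here is illustrative rather than
real driver code:

```c
#include <assert.h>
#include <stdbool.h>

struct vq {
	int free_slots;     /* free descriptors in the tx virtqueue */
};

/* Stand-in for virtqueue_add_sgs(): fails when the queue is full. */
static bool vq_add(struct vq *vq)
{
	if (vq->free_slots == 0)
		return false;
	vq->free_slots--;
	return true;
}

/*
 * Datagram send path: no pending list, no waiting. A full virtqueue
 * simply means the packet is dropped, mirroring UDP semantics.
 */
static bool dgram_send(struct vq *vq)
{
	if (!vq_add(vq))
		return false;   /* dropped: acceptable for connectionless */
	/* the real driver would call virtqueue_kick() here */
	return true;
}
```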

>
>In this way, the number of pending dgram pkts to be sent is limited.
>Then both stream and dgram sockets will compete to hold a lock
>for the tx virtqueue. Depending on the linux scheduler, this competition
>will be somewhat fair. As a result, dgram will not block stream completely.
>It will compete and share the virtqueue with stream, but stream
>can still send some pkts.
>
>Basically, the virtqueue becomes a first-come first-serve resource for
>the stream and dgram. When both stream and dgram applications
>have lots of data to send, dgram_send_pkt_list and send_pkt_list
>will still be a limited size, and each will have some chance to send out
>the data via virtqueue.  Does this address your concern?
>
>
>> So having 2 dedicated queues could avoid a credit mechanism at all for
>> connection-less sockets, and simply the receiver discards packets that
>> it can't handle.
>
>With 2 dedicated queues, we still need some kind of accounting for
>the dgram. Like the dgram_send_pkt_size I mentioned above.  Otherwise,
>it will cause OOM. It is not a real credit mechanism, but code is similar
>with or without 2 dedicated queues in my current prototype.

I think in both cases we don't need any accounting; we can drop 
packets if the virtqueue is full.

It's still not clear to me where OOM may occur. Can you elaborate on 
this?

>
>For receiver discarding packets part, could you explain more? I think
>receiver discarding pkts code also is same with or without 2 dedicated
>queues.

Yep.

> Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
>to check if a packet should be discarded or not.

I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
This is based on the credit mechanism (e.g. buf_alloc) that works when 
you have a connection-oriented socket.

With dgram you can have multiple transmitters for a single socket, so 
how can we exchange that information?

If we reuse the VMCI implementation with skbuff, the sk_receive_skb() 
already checks if the sk_rcvbuf is full and discards the packet in this 
case, so we don't need an internal rx_queue.

Maybe someday we should convert stream to skbuff too, it would make our 
lives easier.

Thanks,
Stefano


* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-17 11:02                                               ` Stefano Garzarella
@ 2021-05-18  6:33                                                 ` Jiang Wang .
  2021-05-18 13:02                                                   ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-18  6:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
>
> [...]
>
> >> >I see. I will add some limit to dgram packets. Also, when the
> >> >virtqueues
> >> >are shared between stream and dgram, both of them need to grab a lock
> >> >before using the virtqueue, so one will not completely block another one.
> >>
> >> I'm not worried about the concurrent access that we definitely need to
> >> handle with a lock, but more about the uncontrolled packet sending that
> >> dgram might have, flooding the queues and preventing others from
> >> communicating.
> >
> >That is a valid concern. Let me explain how I would handle that if we
> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
> >which is similar to send_pkt_list for stream (and seqpacket). But there
> >is one difference. The dgram_send_pkt_list has a maximum size setting,
> >and keep tracking how many pkts are in the list. The track number
> >(dgram_send_pkt_list_size) is  increased when a packet is added
> >to the list and is decreased when a packet
> >is removed from the list and added to the virtqueue. In
> >virtio_transport_send_pkt, if the current
> >dgram_send_pkt_list_size is equal
> >to the maximum ( let's say 128), then it will not add to the
> >dgram_send_pkt_list and return an error to the application.
>
> For stream socket, we have the send_pkt_list and the send worker because
> the virtqueue can be full and the transmitter needs to wait available
> slots, because we can't discard packets.
>
> For dgram I think we don't need this, so we can avoid the
> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
>
> If there are no slots available, we can simply drop the packet.
> In this way we don't have to put limits other than the available space
> in the virtqueue.

I considered this approach before. One question that is not clear to
me is whether we should call virtqueue_kick() for every dgram packet.
If I am not wrong, each virtqueue_kick() will lead to a VM exit to
the host. If we call virtqueue_kick() for each dgram packet,
performance might be bad when there are lots of packets. One idea is
to set a threshold and a timer for calling virtqueue_kick(): when
there are lots of packets, we wait until the threshold is reached;
when there are few packets, the timer makes sure those packets are
not delivered too late.

Any other ideas for virtqueue_kick? If the threshold plus timer is
fine, I can go in this direction too.
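
The threshold half of that idea could look roughly like the
following; KICK_THRESHOLD and both function names are made up, and a
real implementation would arm a timer whose callback does the
timer_flush() part:

```c
#include <assert.h>
#include <stdbool.h>

#define KICK_THRESHOLD 16   /* kick after this many queued packets */

struct kick_state {
	int pending;        /* packets added since the last kick */
};

/*
 * Called per transmitted packet: returns true when the caller should
 * kick the virtqueue now; otherwise the timer is expected to flush.
 */
static bool note_packet_and_check_kick(struct kick_state *k)
{
	if (++k->pending >= KICK_THRESHOLD) {
		k->pending = 0;
		return true;
	}
	return false;
}

/* Timer callback: kick if anything is still pending. */
static bool timer_flush(struct kick_state *k)
{
	bool kick = k->pending > 0;

	k->pending = 0;
	return kick;
}
```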

> >
> >In this way, the number of pending dgram pkts to be sent is limited.
> >Then both stream and dgram sockets will compete to hold a lock
> >for the tx virtqueue. Depending on the linux scheduler, this competition
> >will be somewhat fair. As a result, dgram will not block stream completely.
> >It will compete and share the virtqueue with stream, but stream
> >can still send some pkts.
> >
> >Basically, the virtqueue becomes a first-come first-serve resource for
> >the stream and dgram. When both stream and dgram applications
> >have lots of data to send, dgram_send_pkt_list and send_pkt_list
> >will still be a limited size, and each will have some chance to send out
> >the data via virtqueue.  Does this address your concern?
> >
> >
> >> So having 2 dedicated queues could avoid a credit mechanism at all for
> >> connection-less sockets, and simply the receiver discards packets that
> >> it can't handle.
> >
> >With 2 dedicated queues, we still need some kind of accounting for
> >the dgram. Like the dgram_send_pkt_size I mentioned above.  Otherwise,
> >it will cause OOM. It is not a real credit mechanism, but code is similar
> >with or without 2 dedicated queues in my current prototype.
>
> I think in both cases we don't need an accounting, but we can drop
> packets if the virtqueue is full.
>
> It's still not clear to me where OOM may occur. Can you elaborate on
> this?

Whether OOM can occur depends on the implementation details. If we
keep dgram_send_pkt_list and put no limitation on it, it may grow
indefinitely and cause OOM. As long as we have some limitations or
drop packets quickly, we should be fine.

> >
> >For receiver discarding packets part, could you explain more? I think
> >receiver discarding pkts code also is same with or without 2 dedicated
> >queues.
>
> Yep.
>
> > Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> >to check if a packet should be discarded or not.
>
> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
> This is based on the credit mechanism (e.g. buf_alloc) that works when
> you have a connection oriented socket.
>
> With dgram you can have multiple transmitter for a single socket, so how
> we can exchange that information?
>
In my view, we don't need to exchange that information. buf_alloc for
dgram is local to the receiving socket; the sending sockets do not
know about it. The receiving socket just checks whether there is
still buffer space available to decide whether to drop a packet. If
multiple transmitters send to a single socket, they will share the
same buf_alloc, and packets will be dropped randomly across those
transmitters.
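
A simplified model of that receiver-local check; the field names
loosely mirror vvs->rx_bytes and vvs->buf_alloc, but this is an
illustration, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rx_acct {
	size_t rx_bytes;    /* bytes currently queued at the socket */
	size_t buf_alloc;   /* receive budget, local to this socket */
};

/*
 * Enqueue path: all senders targeting this socket share the same
 * budget; a datagram that would exceed it is dropped.
 */
static bool rx_inc(struct rx_acct *a, size_t len)
{
	if (a->rx_bytes + len > a->buf_alloc)
		return false;   /* no room: drop the datagram */
	a->rx_bytes += len;
	return true;
}

/* Dequeue path: the application consumed a packet. */
static void rx_dec(struct rx_acct *a, size_t len)
{
	a->rx_bytes -= len;
}
```

Nothing is exchanged with the senders; the drop decision is entirely
local to the receiver.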

> If we reuse the VMCI implementation with skbuff, the sk_receive_skb()
> already checks if the sk_rcvbuf is full and discards the packet in this
> case, so we don't need an internal rx_queue.

OK.

> Maybe someday we should convert stream to skbuff too, it would make our
> lives easier.
>
Yeah. I remember the original virtio-vsock patch did use skb, but
somehow it no longer used skb by the time it was merged upstream; not
sure why.

Thanks,

Jiang

> Thanks,
> Stefano
>

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-18  6:33                                                 ` Jiang Wang .
@ 2021-05-18 13:02                                                   ` Stefano Garzarella
  2021-05-19  4:59                                                     ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2021-05-18 13:02 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
>On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
>> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
>>
>> [...]
>>
>> >> >I see. I will add some limit to dgram packets. Also, when the
>> >> >virtqueues
>> >> >are shared between stream and dgram, both of them need to grab a lock
>> >> >before using the virtqueue, so one will not completely block another one.
>> >>
>> >> I'm not worried about the concurrent access that we definitely need to
>> >> handle with a lock, but more about the uncontrolled packet sending that
>> >> dgram might have, flooding the queues and preventing others from
>> >> communicating.
>> >
>> >That is a valid concern. Let me explain how I would handle that if we
>> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
>> >which is similar to send_pkt_list for stream (and seqpacket). But there
>> >is one difference. The dgram_send_pkt_list has a maximum size setting,
>> >and keep tracking how many pkts are in the list. The track number
>> >(dgram_send_pkt_list_size) is  increased when a packet is added
>> >to the list and is decreased when a packet
>> >is removed from the list and added to the virtqueue. In
>> >virtio_transport_send_pkt, if the current
>> >dgram_send_pkt_list_size is equal
>> >to the maximum ( let's say 128), then it will not add to the
>> >dgram_send_pkt_list and return an error to the application.
>>
>> For stream socket, we have the send_pkt_list and the send worker because
>> the virtqueue can be full and the transmitter needs to wait available
>> slots, because we can't discard packets.
>>
>> For dgram I think we don't need this, so we can avoid the
>> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
>>
>> If there are no slots available, we can simply drop the packet.
>> In this way we don't have to put limits other than the available space
>> in the virtqueue.
>
>I considered this approach before. One question not clear to me is
>whether we should call virtqueue_kick for every dgram packet. If I
>am not wrong, each virtqueue_kick will lead to a vm_exit to the host.
>If we call virtqueue_kick() for each dgram packet, the performance
>might be bad when there are lots of packets.

Not every virtqueue_kick() will lead to a vm_exit. If the host (see 
vhost_transport_do_send_pkt()) is handling the virtqueue, it disables 
the notification through vhost_disable_notify().

>One idea is to set
>a threshold and a timer to call virtqueue_kick(). When there are
>lots of packets, we wait until a threshold. When there are few packets,
>the timer will make sure those packets will be delivered not too late.

I think it's better to keep the host polling a bit if we want to avoid 
some notifications (see vhost_net_busy_poll()).

>
>Any other ideas for virtqueue_kick? If the threshold plus timer is fine,
>I can go this direction too.

I think it's better to follow the vhost_net_busy_poll() approach.

>
>> >
>> >In this way, the number of pending dgram pkts to be sent is limited.
>> >Then both stream and dgram sockets will compete to hold a lock
>> >for the tx virtqueue. Depending on the Linux scheduler, this competition
>> >will be somewhat fair. As a result, dgram will not block stream completely.
>> >It will compete and share the virtqueue with stream, but stream
>> >can still send some pkts.
>> >
>> >Basically, the virtqueue becomes a first-come, first-served resource for
>> >stream and dgram. When both stream and dgram applications
>> >have lots of data to send, dgram_send_pkt_list and send_pkt_list
>> >will still be of limited size, and each will have some chance to send out
>> >its data via the virtqueue. Does this address your concern?
>> >
>> >
>> >> So having 2 dedicated queues could avoid a credit mechanism at all for
>> >> connection-less sockets, and simply the receiver discards packets that
>> >> it can't handle.
>> >
>> >With 2 dedicated queues, we still need some kind of accounting for
>> >dgram, like the dgram_send_pkt_list_size I mentioned above. Otherwise,
>> >it will cause OOM. It is not a real credit mechanism, but the code is similar
>> >with or without 2 dedicated queues in my current prototype.
>>
>> I think in both cases we don't need an accounting, but we can drop
>> packets if the virtqueue is full.
>>
>> It's still not clear to me where OOM may occur. Can you elaborate on
>> this?
>
>OOM depends on the implementation details. If we keep
>dgram_send_pkt_list and put no limitation on it, it may increase indefinitely
>and cause OOM. As long as we have some limitations or drop packets
>quickly, then we should be fine.

Sure, with the extra list we need some limitations.

>
>> >
>> >As for the receiver discarding packets, could you explain more? I think
>> >the receiver packet-discarding code is also the same with or without 2
>> >dedicated queues.
>>
>> Yep.
>>
>> > Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
>> >to check if a packet should be discarded or not.
>>
>> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
>> This is based on the credit mechanism (e.g. buf_alloc) that works when
>> you have a connection oriented socket.
>>
>> With dgram you can have multiple transmitters for a single socket, so how
>> can we exchange that information?
>>
>In my view, we don't need to exchange that information. Buf_alloc for
>dgram is local to a receiving socket. The sending sockets do not know
>about it. The receiving socket just checks if there is still buffer
>available to decide whether to drop a packet or not. If multiple transmitters
>send to a single socket, they will share the same buf_alloc, and packets
>will be dropped randomly for those transmitters.

I see, but if we reuse skbuff I think we don't need this.

>
>> If we reuse the VMCI implementation with skbuff, the sk_receive_skb()
>> already checks if the sk_rcvbuf is full and discards the packet in 
>> this
>> case, so we don't need an internal rx_queue.
>
>OK.
>
>> Maybe someday we should convert stream to skbuff too, it would make our
>> lives easier.
>>
>Yeah. I remember the original virtio vsock patch did use skb, but somehow
>it did not use skb anymore when merged upstream; not sure why.

Really? I didn't know, then I'll take a look. If you have any links, 
please send :-)

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-18 13:02                                                   ` Stefano Garzarella
@ 2021-05-19  4:59                                                     ` Jiang Wang .
  2021-06-09  4:31                                                       ` Jiang Wang .
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-05-19  4:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, xieyongji, Stefan Hajnoczi, Arseny Krasnov

On Tue, May 18, 2021 at 6:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
> >On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
> >> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
> >>
> >> [...]
> >>
> >> >> >I see. I will add some limit to dgram packets. Also, when the
> >> >> >virtqueues
> >> >> >are shared between stream and dgram, both of them need to grab a lock
> >> >> >before using the virtqueue, so one will not completely block another one.
> >> >>
> >> >> I'm not worried about the concurrent access that we definitely need to
> >> >> handle with a lock, but more about the uncontrolled packet sending that
> >> >> dgram might have, flooding the queues and preventing others from
> >> >> communicating.
> >> >
> >> >That is a valid concern. Let me explain how I would handle that if we
> >> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
> >> >which is similar to send_pkt_list for stream (and seqpacket). But there
> >> >is one difference: the dgram_send_pkt_list has a maximum size setting
> >> >and keeps track of how many pkts are in the list. The counter
> >> >(dgram_send_pkt_list_size) is increased when a packet is added
> >> >to the list and decreased when a packet
> >> >is removed from the list and added to the virtqueue. In
> >> >virtio_transport_send_pkt, if the current
> >> >dgram_send_pkt_list_size is equal
> >> >to the maximum (let's say 128), then it will not add to the
> >> >dgram_send_pkt_list and will return an error to the application.
> >>
> >> For stream socket, we have the send_pkt_list and the send worker because
> >> the virtqueue can be full and the transmitter needs to wait for available
> >> slots, because we can't discard packets.
> >>
> >> For dgram I think we don't need this, so we can avoid the
> >> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
> >>
> >> If there are no slots available, we can simply drop the packet.
> >> In this way we don't have to put limits other than the available space
> >> in the virtqueue.
> >
> >I considered this approach before. One question not clear to me is
> >whether we should call virtqueue_kick for every dgram packet. If I
> >am not wrong, each virtqueue_kick will lead to a vm_exit to the host.
> >If we call virtqueue_kick() for each dgram packet, the performance
> >might be bad when there are lots of packets.
>
> Not every virtqueue_kick() will lead to a vm_exit. If the host (see
> vhost_transport_do_send_pkt()) is handling the virtqueue, it disables
> the notification through vhost_disable_notify().

Got it.

> >One idea is to set
> >a threshold and a timer to call virtqueue_kick(). When there are
> >lots of packets, we wait until a threshold. When there are few packets,
> >the timer will make sure those packets will be delivered not too late.
>
> I think it is better to keep the host polling a bit if we want to avoid
> some notifications (see vhost_net_busy_poll()).

Sure.

> >
> >Any other ideas for virtqueue_kick? If the threshold plus timer is fine,
> >I can go this direction too.
>
> I think it is better to follow the vhost_net_busy_poll() approach.

OK, I will follow the vhost_net_busy_poll() approach.

> >
> >> >
> >> >In this way, the number of pending dgram pkts to be sent is limited.
> >> >Then both stream and dgram sockets will compete to hold a lock
> >> >for the tx virtqueue. Depending on the Linux scheduler, this competition
> >> >will be somewhat fair. As a result, dgram will not block stream completely.
> >> >It will compete and share the virtqueue with stream, but stream
> >> >can still send some pkts.
> >> >
> >> >Basically, the virtqueue becomes a first-come, first-served resource for
> >> >stream and dgram. When both stream and dgram applications
> >> >have lots of data to send, dgram_send_pkt_list and send_pkt_list
> >> >will still be of limited size, and each will have some chance to send out
> >> >its data via the virtqueue. Does this address your concern?
> >> >
> >> >
> >> >> So having 2 dedicated queues could avoid a credit mechanism at all for
> >> >> connection-less sockets, and simply the receiver discards packets that
> >> >> it can't handle.
> >> >
> >> >With 2 dedicated queues, we still need some kind of accounting for
> >> >dgram, like the dgram_send_pkt_list_size I mentioned above. Otherwise,
> >> >it will cause OOM. It is not a real credit mechanism, but the code is similar
> >> >with or without 2 dedicated queues in my current prototype.
> >>
> >> I think in both cases we don't need an accounting, but we can drop
> >> packets if the virtqueue is full.
> >>
> >> It's still not clear to me where OOM may occur. Can you elaborate on
> >> this?
> >
> >OOM depends on the implementation details. If we keep
> >dgram_send_pkt_list and put no limitation on it, it may increase indefinitely
> >and cause OOM. As long as we have some limitations or drop packets
> >quickly, then we should be fine.
>
> Sure, with the extra list we need some limitations.
>
> >
> >> >
> >> >As for the receiver discarding packets, could you explain more? I think
> >> >the receiver packet-discarding code is also the same with or without 2
> >> >dedicated queues.
> >>
> >> Yep.
> >>
> >> > Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> >> >to check if a packet should be discarded or not.
> >>
> >> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
> >> This is based on the credit mechanism (e.g. buf_alloc) that works when
> >> you have a connection oriented socket.
> >>
> >> With dgram you can have multiple transmitters for a single socket, so how
> >> can we exchange that information?
> >>
> >In my view, we don't need to exchange that information. Buf_alloc for
> >dgram is local to a receiving socket. The sending sockets do not know
> >about it. The receiving socket just checks if there is still buffer
> >available to decide whether to drop a packet or not. If multiple transmitters
> >send to a single socket, they will share the same buf_alloc, and packets
> >will be dropped randomly for those transmitters.
>
> I see, but if we reuse skbuff I think we don't need this.
>
> >
> >> If we reuse the VMCI implementation with skbuff, the sk_receive_skb()
> >> already checks if the sk_rcvbuf is full and discards the packet in
> >> this
> >> case, so we don't need an internal rx_queue.
> >
> >OK.
> >
> >> Maybe someday we should convert stream to skbuff too, it would make our
> >> lives easier.
> >>
> >Yeah. I remember the original virtio vsock patch did use skb, but somehow
> >it did not use skb anymore when merged upstream; not sure why.
>
> Really? I didn't know, then I'll take a look. If you have any links,
> please send :-)

No, I take it back. The skb is only used for dgram, not for stream
sockets. Sorry for the confusion. The code I found is not in an easy-to-read
format. Anyway, here is the link if you are interested:

https://kvm.vger.kernel.narkive.com/BMvH9eEr/rfc-v2-0-7-introduce-vm-sockets-virtio-transport#post1

BTW: as major parts of the spec are kind of settled, I will send my current
POC code to the mailing list just to have a reference and see if any
other major changes are needed. I will add the to-dos in the cover letter.

Will update the spec too.


> Thanks,
> Stefano
>

* Re: Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-05-19  4:59                                                     ` Jiang Wang .
@ 2021-06-09  4:31                                                       ` Jiang Wang .
  2021-06-09  7:40                                                         ` Stefano Garzarella
  0 siblings, 1 reply; 34+ messages in thread
From: Jiang Wang . @ 2021-06-09  4:31 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, Yongji Xie, Stefan Hajnoczi, Arseny Krasnov

On Tue, May 18, 2021 at 9:59 PM Jiang Wang . <jiang.wang@bytedance.com> wrote:
>
> On Tue, May 18, 2021 at 6:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
> > >On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >>
> > >> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
> > >> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
> > >>
> > >> [...]
> > >>
> > >> >> >I see. I will add some limit to dgram packets. Also, when the
> > >> >> >virtqueues
> > >> >> >are shared between stream and dgram, both of them need to grab a lock
> > >> >> >before using the virtqueue, so one will not completely block another one.
> > >> >>
> > >> >> I'm not worried about the concurrent access that we definitely need to
> > >> >> handle with a lock, but more about the uncontrolled packet sending that
> > >> >> dgram might have, flooding the queues and preventing others from
> > >> >> communicating.
> > >> >
> > >> >That is a valid concern. Let me explain how I would handle that if we
> > >> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
> > >> >which is similar to send_pkt_list for stream (and seqpacket). But there
> > >> >is one difference: the dgram_send_pkt_list has a maximum size setting
> > >> >and keeps track of how many pkts are in the list. The counter
> > >> >(dgram_send_pkt_list_size) is increased when a packet is added
> > >> >to the list and decreased when a packet
> > >> >is removed from the list and added to the virtqueue. In
> > >> >virtio_transport_send_pkt, if the current
> > >> >dgram_send_pkt_list_size is equal
> > >> >to the maximum (let's say 128), then it will not add to the
> > >> >dgram_send_pkt_list and will return an error to the application.
> > >>
> > >> For stream socket, we have the send_pkt_list and the send worker because
> > >> the virtqueue can be full and the transmitter needs to wait for available
> > >> slots, because we can't discard packets.
> > >> For dgram I think we don't need this, so we can avoid the
> > >> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
> > >>

For the question of whether we need dgram_send_pkt_list, I tried to remove
it, and that causes no problem for virtio vsock in the guest. But on the host, we
still need to keep the dgram_send_pkt_list. The reason is that we cannot
reliably access virtqueue memory in the syscall handling of an
arbitrary process:
the virtqueue memory is in the QEMU process's virtual memory and may be
paged out.

> > >> If there are no slots available, we can simply drop the packet.
> > >> In this way we don't have to put limits other than the available space
> > >> in the virtqueue.
> > >
> > >I considered this approach before. One question not clear to me is
> > >whether we should call virtqueue_kick for every dgram packet. If I
> > >am not wrong, each virtqueue_kick will lead to a vm_exit to the host.
> > >If we call virtqueue_kick() for each dgram packet, the performance
> > >might be bad when there are lots of packets.
> >
> > Not every virtqueue_kick() will lead to a vm_exit. If the host (see
> > vhost_transport_do_send_pkt()) is handling the virtqueue, it disables
> > the notification through vhost_disable_notify().
>
> Got it.
>
> > >One idea is to set
> > >a threshold and a timer to call virtqueue_kick(). When there are
> > >lots of packets, we wait until a threshold. When there are few packets,
> > >the timer will make sure those packets will be delivered not too late.
> >
> > I think it is better to keep the host polling a bit if we want to avoid
> > some notifications (see vhost_net_busy_poll()).
>
> Sure.
>
> > >
> > >Any other ideas for virtqueue_kick? If the threshold plus timer is fine,
> > >I can go this direction too.
> >
> > I think it is better to follow the vhost_net_busy_poll() approach.
>
> OK, I will follow the vhost_net_busy_poll() approach.
>
> > >
> > >> >
> > >> >In this way, the number of pending dgram pkts to be sent is limited.
> > >> >Then both stream and dgram sockets will compete to hold a lock
> > >> >for the tx virtqueue. Depending on the Linux scheduler, this competition
> > >> >will be somewhat fair. As a result, dgram will not block stream completely.
> > >> >It will compete and share the virtqueue with stream, but stream
> > >> >can still send some pkts.
> > >> >
> > >> >Basically, the virtqueue becomes a first-come, first-served resource for
> > >> >stream and dgram. When both stream and dgram applications
> > >> >have lots of data to send, dgram_send_pkt_list and send_pkt_list
> > >> >will still be of limited size, and each will have some chance to send out
> > >> >its data via the virtqueue. Does this address your concern?
> > >> >
> > >> >
> > >> >> So having 2 dedicated queues could avoid a credit mechanism at all for
> > >> >> connection-less sockets, and simply the receiver discards packets that
> > >> >> it can't handle.
> > >> >
> > >> >With 2 dedicated queues, we still need some kind of accounting for
> > >> >dgram, like the dgram_send_pkt_list_size I mentioned above. Otherwise,
> > >> >it will cause OOM. It is not a real credit mechanism, but the code is similar
> > >> >with or without 2 dedicated queues in my current prototype.
> > >>
> > >> I think in both cases we don't need an accounting, but we can drop
> > >> packets if the virtqueue is full.
> > >>
> > >> It's still not clear to me where OOM may occur. Can you elaborate on
> > >> this?
> > >
> > >OOM depends on the implementation details. If we keep
> > >dgram_send_pkt_list and put no limitation on it, it may increase indefinitely
> > >and cause OOM. As long as we have some limitations or drop packets
> > >quickly, then we should be fine.
> >
> > Sure, with the extra list we need some limitations.
> >
> > >
> > >> >
> > >> >As for the receiver discarding packets, could you explain more? I think
> > >> >the receiver packet-discarding code is also the same with or without 2
> > >> >dedicated queues.
> > >>
> > >> Yep.
> > >>
> > >> > Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> > >> >to check if a packet should be discarded or not.
> > >>
> > >> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
> > >> This is based on the credit mechanism (e.g. buf_alloc) that works when
> > >> you have a connection oriented socket.
> > >>
> > >> With dgram you can have multiple transmitters for a single socket, so how
> > >> can we exchange that information?
> > >>
> > >In my view, we don't need to exchange that information. Buf_alloc for
> > >dgram is local to a receiving socket. The sending sockets do not know
> > >about it. The receiving socket just checks if there is still buffer
> > >available to decide whether to drop a packet or not. If multiple transmitters
> > >send to a single socket, they will share the same buf_alloc, and packets
> > >will be dropped randomly for those transmitters.
> >
> > I see, but if we reuse skbuff I think we don't need this.
> >
> > >
> > >> If we reuse the VMCI implementation with skbuff, the sk_receive_skb()
> > >> already checks if the sk_rcvbuf is full and discards the packet in
> > >> this
> > >> case, so we don't need an internal rx_queue.
> > >
> > >OK.
> > >
> > >> Maybe someday we should convert stream to skbuff too, it would make our
> > >> lives easier.
> > >>
> > >Yeah. I remember the original virtio vsock patch did use skb, but somehow
> > >it did not use skb anymore when merged upstream; not sure why.
> >
> > Really? I didn't know, then I'll take a look. If you have any links,
> > please send :-)
>
> No, I take it back. The skb is only used for dgram, not for stream
> sockets. Sorry for the confusion. The code I found is not in an easy-to-read
> format. Anyway, here is the link if you are interested:
>
> https://kvm.vger.kernel.narkive.com/BMvH9eEr/rfc-v2-0-7-introduce-vm-sockets-virtio-transport#post1
>
> BTW: as major parts of the spec are kind of settled, I will send my current
> POC code to the mailing list just to have a reference and see if any
> other major changes are needed. I will add the to-dos in the cover letter.
>
> Will update the spec too.
>
>
> > Thanks,
> > Stefano
> >

* Re: [RFC v2] virtio-vsock: add description for datagram type
  2021-06-09  4:31                                                       ` Jiang Wang .
@ 2021-06-09  7:40                                                         ` Stefano Garzarella
  0 siblings, 0 replies; 34+ messages in thread
From: Stefano Garzarella @ 2021-06-09  7:40 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: cong.wang, Xiongchun Duan, Michael S. Tsirkin, cohuck,
	virtualization, Yongji Xie, Stefan Hajnoczi, Arseny Krasnov

On Tue, Jun 08, 2021 at 09:31:28PM -0700, Jiang Wang . wrote:
>On Tue, May 18, 2021 at 9:59 PM Jiang Wang . <jiang.wang@bytedance.com> wrote:
>>
>> On Tue, May 18, 2021 at 6:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >
>> > On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
>> > >On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >>
>> > >> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
>> > >> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>> > >> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
>> > >>
>> > >> [...]
>> > >>
>> > >> >> >I see. I will add some limit to dgram packets. Also, when the
>> > >> >> >virtqueues
>> > >> >> >are shared between stream and dgram, both of them need to grab a lock
>> > >> >> >before using the virtqueue, so one will not completely block another one.
>> > >> >>
>> > >> >> I'm not worried about the concurrent access that we definitely need to
>> > >> >> handle with a lock, but more about the uncontrolled packet sending that
>> > >> >> dgram might have, flooding the queues and preventing others from
>> > >> >> communicating.
>> > >> >
>> > >> >That is a valid concern. Let me explain how I would handle that if we
>> > >> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
>> > >> >which is similar to send_pkt_list for stream (and seqpacket). But there
>> > >> >is one difference: the dgram_send_pkt_list has a maximum size setting
>> > >> >and keeps track of how many pkts are in the list. The counter
>> > >> >(dgram_send_pkt_list_size) is increased when a packet is added
>> > >> >to the list and decreased when a packet
>> > >> >is removed from the list and added to the virtqueue. In
>> > >> >virtio_transport_send_pkt, if the current
>> > >> >dgram_send_pkt_list_size is equal
>> > >> >to the maximum (let's say 128), then it will not add to the
>> > >> >dgram_send_pkt_list and will return an error to the application.
>> > >>
>> > >> For stream socket, we have the send_pkt_list and the send worker because
>> > >> the virtqueue can be full and the transmitter needs to wait for available
>> > >> slots, because we can't discard packets.
>> > >> For dgram I think we don't need this, so we can avoid the
>> > >> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
>> > >>
>
>For the question of whether we need dgram_send_pkt_list, I tried to remove
>it, and that causes no problem for virtio vsock in the guest. But on the host, we
>still need to keep the dgram_send_pkt_list. The reason is that we cannot
>reliably access virtqueue memory in the syscall handling of an
>arbitrary process:
>the virtqueue memory is in the QEMU process's virtual memory and may be
>paged out.

I see. I think in that case we can use the virtqueue size as the limit for 
the dgram_send_pkt_list.

I mean, for example, if the virtqueue has 128 elements, we can queue at 
most 128 packets in the dgram_send_pkt_list.

If you have a better idea go ahead, we can discuss this implementation 
detail in the RFC linux series :-)

Thanks,
Stefano


end of thread, other threads:[~2021-06-09  7:40 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01  4:36 [RFC v2] virtio-vsock: add description for datagram type jiang.wang
2021-04-12 13:50 ` Stefan Hajnoczi
2021-04-12 14:21   ` Stefano Garzarella
2021-04-12 22:42     ` Jiang Wang .
2021-04-13 12:58       ` Stefano Garzarella
2021-04-13 13:16         ` Michael S. Tsirkin
2021-04-13 13:38           ` Stefano Garzarella
2021-04-13 13:50             ` Michael S. Tsirkin
2021-04-13 14:03               ` Stefano Garzarella
2021-04-13 19:58                 ` Michael S. Tsirkin
2021-04-13 22:00                   ` Jiang Wang .
2021-04-14  7:07                     ` Stefano Garzarella
2021-04-14  6:57                   ` Stefano Garzarella
2021-04-14  7:20                     ` Michael S. Tsirkin
2021-04-14  9:38                       ` Stefano Garzarella
2021-04-15  3:15                         ` Jiang Wang .
2021-05-04  3:40                           ` Jiang Wang .
2021-05-04 16:16                             ` Stefano Garzarella
2021-05-04 17:06                               ` Jiang Wang .
2021-05-05 10:49                                 ` Stefano Garzarella
2021-05-05 16:58                                   ` Jiang Wang .
2021-05-07 16:53                                     ` Jiang Wang .
2021-05-10 14:50                                       ` Stefano Garzarella
2021-05-13 23:26                                         ` Jiang Wang .
2021-05-14 15:17                                           ` Stefano Garzarella
2021-05-14 18:55                                             ` Jiang Wang .
2021-05-17 11:02                                               ` Stefano Garzarella
2021-05-18  6:33                                                 ` Jiang Wang .
2021-05-18 13:02                                                   ` Stefano Garzarella
2021-05-19  4:59                                                     ` Jiang Wang .
2021-06-09  4:31                                                       ` Jiang Wang .
2021-06-09  7:40                                                         ` Stefano Garzarella
2021-04-12 22:39   ` [External] " Jiang Wang .
2021-05-13 14:57     ` Stefan Hajnoczi
