virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: "Jiang Wang ." <jiang.wang@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: cong.wang@bytedance.com,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	cohuck@redhat.com, virtualization@lists.linux-foundation.org,
	xieyongji@bytedance.com,
	Arseny Krasnov <arseny.krasnov@kaspersky.com>,
	asias@redhat.com
Subject: Re: [External] Re: [RFC v2] virtio-vsock: add description for datagram type
Date: Mon, 12 Apr 2021 15:39:36 -0700	[thread overview]
Message-ID: <CAP_N_Z-OMqQtnV04Wpynr7GhZooz1iQQ+0To-P8xPEnA0+sZgQ@mail.gmail.com> (raw)
In-Reply-To: <YHRQGSSnkN8Zipy0@stefanha-x1.localdomain>

On Mon, Apr 12, 2021 at 6:50 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > Add supports for datagram type for virtio-vsock. Datagram
> > sockets are connectionless and unreliable. To avoid contention
> > with stream and other sockets, add two more virtqueues and
> > a new feature bit to identify if those two new queues exist or not.
> >
> > Also add descriptions for resource management of datagram, which
> > does not use the existing credit update mechanism associated with
> > stream sockets.
> >
> > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > ---
> > V2 addressed the comments for the previous version.
> >
> >  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 52 insertions(+), 10 deletions(-)
> >
> > diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > index da7e641..62c12e0 100644
> > --- a/virtio-vsock.tex
> > +++ b/virtio-vsock.tex
> > @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> >  \begin{description}
> >  \item[0] rx
> >  \item[1] tx
> > +\item[2] datagram rx
> > +\item[3] datagram tx
> > +\item[4] event
> > +\end{description}
> > +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> > +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> > +
> > +\begin{description}
> > +\item[0] rx
> > +\item[1] tx
> >  \item[2] event
> >  \end{description}
> >
>
> I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> virtqueues and also adding this:

OK

>   When behavior differs between stream and datagram rx/tx virtqueues
>   their full names are used. Common behavior is simply described in
>   terms of rx/tx virtqueues and applies to both stream and datagram
>   virtqueues.

OK

> This way you won't need to duplicate portions of the spec that deal with
> populating the virtqueues, for example.
>
> It's also clearer to use a full name for stream rx/tx virtqueues instead
> of calling them rx/tx virtqueues now that we have datagram rx/tx
> virtqueues.

Sure.

> > +
> >  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >
> > -There are currently no feature bits defined for this device.
> > +\begin{description}
> > +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> > +\end{description}
> >
> >  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> >
> > @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> >
> >  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >
> > +Flow control applies to stream sockets; datagram sockets do not have
> > +flow control.
> > +
> >  The tx virtqueue carries packets initiated by applications and replies to
> >  received packets.  The rx virtqueue carries packets initiated by the device and
> >  replies to previously transmitted packets.
> > @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> >  consists of a (cid, port number) tuple. The header fields used for this are
> >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> >
> > -Currently only stream sockets are supported. \field{type} is 1 for stream
> > -socket types.
> > +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> > +socket types. \field{type} is 3 for dgram socket types.
> >
> >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> >  without message boundaries.
> >
> > +Datagram sockets provide connectionless unreliable messages of
> > +a fixed maximum length.
>
> Plus unordered (?) and with message boundaries. In other words:
>
>   Datagram sockets provide unordered, unreliable, connectionless message
>   with message boundaries and a fixed maximum length.

OK. Will do. In my implementation, the order is preserved because
there is only one virtqueue. But I think we can put unordered in the spec.

> I didn't think of the fixed maximum length aspect before. I guess the
> intention is that the rx buffer size is the message size limit? That's
> different from UDP messages, which can be fragmented into multiple IP
> packets and can be larger than 64KiB:
> https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>
> Is it possible to support large datagram messages in vsock? I'm a little
> concerned that applications that run successfully over UDP will not be
> portable if vsock has this limitation because it would impose extra
> message boundaries that the application protocol might not tolerate.
>
OK. I think one way is to support fragmentation as suggested by Stefano.

> > +
> >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> >  stream sockets. The guest and the device publish how much buffer space is
> > @@ -162,7 +181,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> >  u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> >  \end{lstlisting}
> >
> > -If there is insufficient buffer space, the sender waits until virtqueue buffers
> > +For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
> >  are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
> >  the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
> >  available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> > @@ -170,16 +189,28 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> >  communicating updates any time a change in buffer space occurs.
> >
> > +Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> > +VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> > +is split to two parts: tx side and rx side. For the tx side, there is
> > +additional buffer space for each socket.
>
> Plus:
>
>   ... according to the the driver and device's available memory
>   resources. The amount of tx buffer space is an implementation detail
>   of both the device and the driver. It is not visible to the other side
>   and may be controlled by the application or administrative resource
>   limits.
>
> What I'm trying to describe here is that the additional tx buffer space
> isn't part of the device interface.
>
OK. Will do.

> > +The dgram sender sends packets when the virtqueue or the additional buffer is not full.
> > +When both are full, the sender
> > +MUST return an appropriate error to the upper layer application.
>
> MUST, SHOULD, etc clauses need to go into the
> devicenormative/drivernormative sections. They cannot be in regular
> text.
>
OK.

> > +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> > +is dropped by the receiver.
>
> UDP is connectionless so any number of other sources can send messages
> to the same destination, causing buf_alloc's value to be unpredictable.
> Can you explain how buf_alloc works with datagram sockets in more
> detail?

In the linux kernel in my prototype, datagram sockets also use
virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() to update
vvs->rx_bytes and compare it with vvs->buf_alloc. virtio_transport_inc_rx_pkt
is called when enqueuing the datagram packets.
virtio_transport_dec_rx_pkt is called when dequeuing those packets.

If multiple sources send messages to the same destination, they will share
the same buf_alloc. Do you want something with more control?
Maybe somehow allocate a buffer for each remote CID and port? But I
feel that is a little bit overkill. Any other suggestions?

> >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> >
> >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> >  The \field{guest_cid} configuration field MUST be used as the source CID when
> >  sending outgoing packets.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
>
> What about datagram sockets? Please state what must happen and why.

I think datagram sockets should do the same thing as the stream to
return a VIRTIO_VSOCK_OP_RST?
Any other ideas?

> >
> >  \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> >
> >  The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
> >
> >  \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> > @@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
> >  destination) address tuple for a new connection while the other peer is still
> >  processing the old connection.
> >
> > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
>
> s/Stream Sockets/Datagram Sockets/

Will do

> > +
> > +Datagram (dgram) sockets are connectionless and unreliable. The sender just sends
> > +a message to the peer and hope it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if
>
> s/hope/hopes/

got it.

> > +a receiving socket does not exist on the destination.
> > +If the transmission or receiving buffers are full, the packets
> > +are dropped. If the transmission buffer is full, an appropriate error message
> > +is returned to the caller.
>
> It's unclear whether the caller is the driver/device or something else.
> I think you're referring to the application interace, which is outside
> the scope of the VIRTIO spec. I suggest dropping the last sentence.

Yeah, I was thinking about the application interface. I will drop this
sentence.

Thanks for all the feedback.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  parent reply	other threads:[~2021-04-12 22:39 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01  4:36 [RFC v2] virtio-vsock: add description for datagram type jiang.wang
2021-04-12 13:50 ` Stefan Hajnoczi
2021-04-12 14:21   ` Stefano Garzarella
2021-04-12 22:42     ` Jiang Wang .
2021-04-13 12:58       ` Stefano Garzarella
2021-04-13 13:16         ` Michael S. Tsirkin
2021-04-13 13:38           ` Stefano Garzarella
2021-04-13 13:50             ` Michael S. Tsirkin
2021-04-13 14:03               ` Stefano Garzarella
2021-04-13 19:58                 ` Michael S. Tsirkin
2021-04-13 22:00                   ` Jiang Wang .
2021-04-14  7:07                     ` Stefano Garzarella
2021-04-14  6:57                   ` Stefano Garzarella
2021-04-14  7:20                     ` Michael S. Tsirkin
2021-04-14  9:38                       ` Stefano Garzarella
2021-04-15  3:15                         ` Jiang Wang .
2021-05-04  3:40                           ` Jiang Wang .
2021-05-04 16:16                             ` Stefano Garzarella
2021-05-04 17:06                               ` Jiang Wang .
2021-05-05 10:49                                 ` Stefano Garzarella
2021-05-05 16:58                                   ` Jiang Wang .
2021-05-07 16:53                                     ` Jiang Wang .
2021-05-10 14:50                                       ` Stefano Garzarella
2021-05-13 23:26                                         ` Jiang Wang .
2021-05-14 15:17                                           ` Stefano Garzarella
2021-05-14 18:55                                             ` Jiang Wang .
2021-05-17 11:02                                               ` Stefano Garzarella
2021-05-18  6:33                                                 ` Jiang Wang .
2021-05-18 13:02                                                   ` Stefano Garzarella
2021-05-19  4:59                                                     ` Jiang Wang .
2021-06-09  4:31                                                       ` Jiang Wang .
2021-06-09  7:40                                                         ` Stefano Garzarella
2021-04-12 22:39   ` Jiang Wang . [this message]
2021-05-13 14:57     ` [External] " Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP_N_Z-OMqQtnV04Wpynr7GhZooz1iQQ+0To-P8xPEnA0+sZgQ@mail.gmail.com \
    --to=jiang.wang@bytedance.com \
    --cc=arseny.krasnov@kaspersky.com \
    --cc=asias@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=cong.wang@bytedance.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=mst@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xieyongji@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).