All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jiang Wang ." <jiang.wang@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: cong.wang@bytedance.com,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	cohuck@redhat.com, virtualization@lists.linux-foundation.org,
	xieyongji@bytedance.com,
	Arseny Krasnov <arseny.krasnov@kaspersky.com>,
	asias@redhat.com
Subject: Re: [External] Re: [RFC v2] virtio-vsock: add description for datagram type
Date: Mon, 12 Apr 2021 15:39:36 -0700	[thread overview]
Message-ID: <CAP_N_Z-OMqQtnV04Wpynr7GhZooz1iQQ+0To-P8xPEnA0+sZgQ@mail.gmail.com> (raw)
In-Reply-To: <YHRQGSSnkN8Zipy0@stefanha-x1.localdomain>

On Mon, Apr 12, 2021 at 6:50 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang wrote:
> > Add supports for datagram type for virtio-vsock. Datagram
> > sockets are connectionless and unreliable. To avoid contention
> > with stream and other sockets, add two more virtqueues and
> > a new feature bit to identify if those two new queues exist or not.
> >
> > Also add descriptions for resource management of datagram, which
> > does not use the existing credit update mechanism associated with
> > stream sockets.
> >
> > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > ---
> > V2 addressed the comments for the previous version.
> >
> >  virtio-vsock.tex | 62 +++++++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 52 insertions(+), 10 deletions(-)
> >
> > diff --git a/virtio-vsock.tex b/virtio-vsock.tex
> > index da7e641..62c12e0 100644
> > --- a/virtio-vsock.tex
> > +++ b/virtio-vsock.tex
> > @@ -11,12 +11,25 @@ \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
> >  \begin{description}
> >  \item[0] rx
> >  \item[1] tx
> > +\item[2] datagram rx
> > +\item[3] datagram tx
> > +\item[4] event
> > +\end{description}
> > +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it
> > +only uses 3 queues, as the following. Rx and tx queues are always used for stream sockets.
> > +
> > +\begin{description}
> > +\item[0] rx
> > +\item[1] tx
> >  \item[2] event
> >  \end{description}
> >
>
> I suggest renaming "rx" and "tx" to "stream rx" and "stream tx"
> virtqueues and also adding this:

OK

>   When behavior differs between stream and datagram rx/tx virtqueues
>   their full names are used. Common behavior is simply described in
>   terms of rx/tx virtqueues and applies to both stream and datagram
>   virtqueues.

OK

> This way you won't need to duplicate portions of the spec that deal with
> populating the virtqueues, for example.
>
> It's also clearer to use a full name for stream rx/tx virtqueues instead
> of calling them rx/tx virtqueues now that we have datagram rx/tx
> virtqueues.

Sure.

> > +
> >  \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> >
> > -There are currently no feature bits defined for this device.
> > +\begin{description}
> > +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type.
> > +\end{description}
> >
> >  \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
> >
> > @@ -107,6 +120,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op
> >
> >  \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >
> > +Flow control applies to stream sockets; datagram sockets do not have
> > +flow control.
> > +
> >  The tx virtqueue carries packets initiated by applications and replies to
> >  received packets.  The rx virtqueue carries packets initiated by the device and
> >  replies to previously transmitted packets.
> > @@ -140,12 +156,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> >  consists of a (cid, port number) tuple. The header fields used for this are
> >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> >
> > -Currently only stream sockets are supported. \field{type} is 1 for stream
> > -socket types.
> > +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream
> > +socket types. \field{type} is 3 for dgram socket types.
> >
> >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> >  without message boundaries.
> >
> > +Datagram sockets provide connectionless unreliable messages of
> > +a fixed maximum length.
>
> Plus unordered (?) and with message boundaries. In other words:
>
>   Datagram sockets provide unordered, unreliable, connectionless message
>   with message boundaries and a fixed maximum length.

OK. Will do. In my implementation, the order is preserved because
there is only one virtqueue. But I think we can put unordered in the spec.

> I didn't think of the fixed maximum length aspect before. I guess the
> intention is that the rx buffer size is the message size limit? That's
> different from UDP messages, which can be fragmented into multiple IP
> packets and can be larger than 64KiB:
> https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
>
> Is it possible to support large datagram messages in vsock? I'm a little
> concerned that applications that run successfully over UDP will not be
> portable if vsock has this limitation because it would impose extra
> message boundaries that the application protocol might not tolerate.
>
OK. I think one way is to support fragmentation as suggested by Stefano.

> > +
> >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> >  stream sockets. The guest and the device publish how much buffer space is
> > @@ -162,7 +181,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> >  u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> >  \end{lstlisting}
> >
> > -If there is insufficient buffer space, the sender waits until virtqueue buffers
> > +For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers
> >  are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
> >  the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
> >  available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
> > @@ -170,16 +189,28 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> >  communicating updates any time a change in buffer space occurs.
> >
> > +Unlike stream sockets, dgram sockets do not use VIRTIO_VSOCK_OP_CREDIT_UPDATE or
> > +VIRTIO_VSOCK_OP_CREDIT_REQUEST packets. The dgram buffer management
> > +is split to two parts: tx side and rx side. For the tx side, there is
> > +additional buffer space for each socket.
>
> Plus:
>
>   ... according to the the driver and device's available memory
>   resources. The amount of tx buffer space is an implementation detail
>   of both the device and the driver. It is not visible to the other side
>   and may be controlled by the application or administrative resource
>   limits.
>
> What I'm trying to describe here is that the additional tx buffer space
> isn't part of the device interface.
>
OK. Will do.

> > +The dgram sender sends packets when the virtqueue or the additional buffer is not full.
> > +When both are full, the sender
> > +MUST return an appropriate error to the upper layer application.
>
> MUST, SHOULD, etc clauses need to go into the
> devicenormative/drivernormative sections. They cannot be in regular
> text.
>
OK.

> > +For the rx side, dgram also uses the \field{buf_alloc}. If it is full, the packet
> > +is dropped by the receiver.
>
> UDP is connectionless so any number of other sources can send messages
> to the same destination, causing buf_alloc's value to be unpredictable.
> Can you explain how buf_alloc works with datagram sockets in more
> detail?

In the linux kernel in my prototype, datagram sockets also use
virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt() to update
vvs->rx_bytes and compare it with vvs->buf_alloc. virtio_transport_inc_rx_pkt
is called when enqueuing the datagram packets.
virtio_transport_dec_rx_pkt is called when dequeuing those packets.

If multiple sources send messages to the same destination, they will share
the same buf_alloc. Do you want something with more control?
Maybe somehow allocate a buffer for each remote CID and port? But I
feel that is a little bit overkill. Any other suggestions?

> >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> >
> >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > -sufficient free buffer space for the payload.
> > +For stream sockets, VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > +sufficient free buffer space for the payload. For dgram sockets, VIRTIO_VSOCK_OP_RW data packets
> > +MAY be transmitted when the peer buffer is full. Then the packet will be dropped by the receiver.
> >
> >  All packets associated with a stream flow MUST contain valid information in
> >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > @@ -203,14 +234,14 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
> >  The \field{guest_cid} configuration field MUST be used as the source CID when
> >  sending outgoing packets.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
>
> What about datagram sockets? Please state what must happen and why.

I think datagram sockets should do the same thing as the stream to
return a VIRTIO_VSOCK_OP_RST?
Any other ideas?

> >
> >  \devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
> >
> >  The \field{guest_cid} configuration field MUST NOT contain a reserved CID as listed in \ref{sec:Device Types / Socket Device / Device configuration layout}.
> >
> > -A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> > +For stream sockets, A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
> >  unknown \field{type} value.
> >
> >  \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
> > @@ -240,6 +271,17 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device O
> >  destination) address tuple for a new connection while the other peer is still
> >  processing the old connection.
> >
> > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
>
> s/Stream Sockets/Datagram Sockets/

Will do

> > +
> > +Datagram (dgram) sockets are connectionless and unreliable. The sender just sends
> > +a message to the peer and hope it will be delivered. A VIRTIO_VSOCK_OP_RST reply is sent if
>
> s/hope/hopes/

got it.

> > +a receiving socket does not exist on the destination.
> > +If the transmission or receiving buffers are full, the packets
> > +are dropped. If the transmission buffer is full, an appropriate error message
> > +is returned to the caller.
>
> It's unclear whether the caller is the driver/device or something else.
> I think you're referring to the application interace, which is outside
> the scope of the VIRTIO spec. I suggest dropping the last sentence.

Yeah, I was thinking about the application interface. I will drop this
sentence.

Thanks for all the feedback.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  parent reply	other threads:[~2021-04-12 22:39 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01  4:36 [RFC v2] virtio-vsock: add description for datagram type jiang.wang
2021-04-12 13:50 ` Stefan Hajnoczi
2021-04-12 14:21   ` Stefano Garzarella
2021-04-12 22:42     ` Jiang Wang .
2021-04-13 12:58       ` Stefano Garzarella
2021-04-13 13:16         ` Michael S. Tsirkin
2021-04-13 13:38           ` Stefano Garzarella
2021-04-13 13:50             ` Michael S. Tsirkin
2021-04-13 14:03               ` Stefano Garzarella
2021-04-13 19:58                 ` Michael S. Tsirkin
2021-04-13 22:00                   ` Jiang Wang .
2021-04-14  7:07                     ` Stefano Garzarella
2021-04-14  6:57                   ` Stefano Garzarella
2021-04-14  7:20                     ` Michael S. Tsirkin
2021-04-14  9:38                       ` Stefano Garzarella
2021-04-15  3:15                         ` Jiang Wang .
2021-05-04  3:40                           ` Jiang Wang .
2021-05-04 16:16                             ` Stefano Garzarella
2021-05-04 17:06                               ` Jiang Wang .
2021-05-05 10:49                                 ` Stefano Garzarella
2021-05-05 16:58                                   ` Jiang Wang .
2021-05-07 16:53                                     ` Jiang Wang .
2021-05-10 14:50                                       ` Stefano Garzarella
2021-05-13 23:26                                         ` Jiang Wang .
2021-05-14 15:17                                           ` Stefano Garzarella
2021-05-14 18:55                                             ` Jiang Wang .
2021-05-17 11:02                                               ` Stefano Garzarella
2021-05-18  6:33                                                 ` Jiang Wang .
2021-05-18 13:02                                                   ` Stefano Garzarella
2021-05-19  4:59                                                     ` Jiang Wang .
2021-06-09  4:31                                                       ` Jiang Wang .
2021-06-09  7:40                                                         ` Stefano Garzarella
2021-04-12 22:39   ` Jiang Wang . [this message]
2021-05-13 14:57     ` [External] " Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP_N_Z-OMqQtnV04Wpynr7GhZooz1iQQ+0To-P8xPEnA0+sZgQ@mail.gmail.com \
    --to=jiang.wang@bytedance.com \
    --cc=arseny.krasnov@kaspersky.com \
    --cc=asias@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=cong.wang@bytedance.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=mst@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xieyongji@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.