All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jiang Wang ." <jiang.wang@bytedance.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: cong.wang@bytedance.com,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	cohuck@redhat.com, virtualization@lists.linux-foundation.org,
	xieyongji@bytedance.com, Stefan Hajnoczi <stefanha@redhat.com>,
	asias@redhat.com, Arseny Krasnov <arseny.krasnov@kaspersky.com>
Subject: Re: Re: [RFC v2] virtio-vsock: add description for datagram type
Date: Wed, 14 Apr 2021 20:15:36 -0700	[thread overview]
Message-ID: <CAP_N_Z85c+GLTmqBkMrRGQzWFj73i=FSiU-hAP7bEmaKTNnc6g@mail.gmail.com> (raw)
In-Reply-To: <20210414093841.koerx2wsmszv4nnj@steredhat>

On Wed, Apr 14, 2021 at 2:38 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Apr 14, 2021 at 03:20:07AM -0400, Michael S. Tsirkin wrote:
> >On Wed, Apr 14, 2021 at 08:57:06AM +0200, Stefano Garzarella wrote:
> >> On Tue, Apr 13, 2021 at 03:58:34PM -0400, Michael S. Tsirkin wrote:
> >> > On Tue, Apr 13, 2021 at 04:03:51PM +0200, Stefano Garzarella wrote:
> >> > > On Tue, Apr 13, 2021 at 09:50:45AM -0400, Michael S. Tsirkin wrote:
> >> > > > On Tue, Apr 13, 2021 at 03:38:52PM +0200, Stefano Garzarella wrote:
> >> > > > > On Tue, Apr 13, 2021 at 09:16:50AM -0400, Michael S. Tsirkin wrote:
> >> > > > > > On Tue, Apr 13, 2021 at 02:58:53PM +0200, Stefano Garzarella wrote:
> >> > > > > > > On Mon, Apr 12, 2021 at 03:42:23PM -0700, Jiang Wang . wrote:
> >> > > > > > > > On Mon, Apr 12, 2021 at 7:21 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > On Mon, Apr 12, 2021 at 02:50:17PM +0100, Stefan Hajnoczi wrote:
> >> > > > > > > > > >On Thu, Apr 01, 2021 at 04:36:02AM +0000, jiang.wang
> >> > > > > > > > > >wrote:
>
> [...]
>
> >> > > > > > > > > >>
> >> > > > > > > > > >> +Datagram sockets provide connectionless unreliable messages of
> >> > > > > > > > > >> +a fixed maximum length.
> >> > > > > > > > > >
> >> > > > > > > > > >Plus unordered (?) and with message boundaries. In other words:
> >> > > > > > > > > >
> >> > > > > > > > > >  Datagram sockets provide unordered, unreliable, connectionless message
> >> > > > > > > > > >  with message boundaries and a fixed maximum length.
> >> > > > > > > > > >
> >> > > > > > > > > >I didn't think of the fixed maximum length aspect before. I guess the
> >> > > > > > > > > >intention is that the rx buffer size is the message size limit? That's
> >> > > > > > > > > >different from UDP messages, which can be fragmented into multiple IP
> >> > > > > > > > > >packets and can be larger than 64KiB:
> >> > > > > > > > > >https://en.wikipedia.org/wiki/User_Datagram_Protocol#UDP_datagram_structure
> >> > > > > > > > > >
> >> > > > > > > > > >Is it possible to support large datagram messages in vsock? I'm a little
> >> > > > > > > > > >concerned that applications that run successfully over UDP will not be
> >> > > > > > > > > >portable if vsock has this limitation because it would impose extra
> >> > > > > > > > > >message boundaries that the application protocol might not tolerate.
> >> > > > > > > > >
> >> > > > > > > > > Maybe we can reuse the same approach Arseny is using for SEQPACKET.
> >> > > > > > > > > Fragment the packets according to the buffers in the virtqueue and set
> >> > > > > > > > > the EOR flag to indicate the last packet in the message.
> >> > > > > > > > >
> >> > > > > > > > Agree. Another option is to use the ones for skb since we may need to
> >> > > > > > > > use skbs for multiple transport support anyway.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > The important thing I think is to have a single flag in virtio-vsock that
> >> > > > > > > identifies pretty much the same thing: this is the last fragment of a series
> >> > > > > > > to rebuild a packet.
> >> > > > > > >
> >> > > > > > > We should reuse the same flag for DGRAM and SEQPACKET.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Stefano
> >> > > > > >
> >> > > > > > Well DGRAM can drop data so I wonder whether it can work ...
> >> > > > > >
> >> > > > >
> >> > > > > Yep, this is true, but the channel should not be losing packets, so if the
> >> > > > > receiver discards packets, it knows that it must then discard all of them
> >> > > > > until the EOR.
> >> > > >
> >> > > > That is not so easy - they can come mixed up from multiple sources.
> >> > >
> >> > > I think we can prevent mixing because virtuqueue is point to point and its
> >> > > use is not thread safe, so the access (in the same peer) is already
> >> > > serialized.
> >> > > In the end the packet would be fragmented only before copying it to the
> >> > > virtuqueue.
> >> > >
> >> > > But maybe I missed something...
> >> >
> >> > Well I ask what's the point of fragmenting then. I assume it's so we
> >> > can pass huge messages around so you can't keep locks ...
> >> >
> >>
> >> Maybe I'm wrong, but isn't this similar to what we do in virtio-net with
> >> mergeable buffers?
> >
> >The point of mergeable buffers is to use less memory: both for each
> >packet and for a full receive vq.
> >
> >> Also in this case I think the fragmentation will happen only in the device,
> >> since the driver can enqueue the entire buffer.
> >>
> >> Maybe we can reuse mergeable buffers for virtio-vsock if the EOR flag is not
> >> suitable.
> >
> >That sounds very reasonable.
>
> It should also allow us to save the header for each fragment.
>
> @Jiang Do you want to explore this?
> I'm talking about VIRTIO_NET_F_MRG_RXBUF feature.

Sure. Will do.

> >
> >> IIUC in the vsock device the fragmentation for DGRAM will happen just
> >> before
> >> to queue it in the virtqueue, and the device can check how many buffers are
> >> available in the queue and it can decide whether to queue them all up or
> >> throw them away.
> >> >
> >> > > > Sure linux net core does this but with fragmentation added in,
> >> > > > I start wondering whether you are beginning to reinvent the net stack
> >> > > > ...
> >> > >
> >> > > No, I hope not :-), in the end our advantage is that we have a channel that
> >> > > doesn't lose packets, so I guess we can make assumptions that the network
> >> > > stack can't.
> >> > >
> >> > > Thanks,
> >> > > Stefano
> >> >
> >> > I still don't know how will credit accounting work for datagram,
> >> > but proposals I saw seem to actually lose packets ...
> >> >
> >>
> >> I still don't know too, but I think it's not an issue in the RX side,
> >> since if it doesn't have space, can drop all the fragment.
> >>
> >> Another option to avoid fragmentation could be to allocate 64K buffers for
> >> the new DGRAM virtqueue.
> >
> >That's a lot of buffers ...
>
> Yep I see, and they would often be mostly unused...
>
> >
> >> In this way we will have at most 64K packets, which is similar to
> >> UDP/IP,
> >> without extra work for the fragmentation.
> >
> >IIRC default MTU is 1280 not 64K ...
>
> I was thinking that UDP at most can support 64K messages that IP should
> fragment according to MTU.
>
> Thanks,
> Stefano
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2021-04-15  3:15 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01  4:36 [RFC v2] virtio-vsock: add description for datagram type jiang.wang
2021-04-12 13:50 ` Stefan Hajnoczi
2021-04-12 14:21   ` Stefano Garzarella
2021-04-12 22:42     ` Jiang Wang .
2021-04-13 12:58       ` Stefano Garzarella
2021-04-13 13:16         ` Michael S. Tsirkin
2021-04-13 13:38           ` Stefano Garzarella
2021-04-13 13:50             ` Michael S. Tsirkin
2021-04-13 14:03               ` Stefano Garzarella
2021-04-13 19:58                 ` Michael S. Tsirkin
2021-04-13 22:00                   ` Jiang Wang .
2021-04-14  7:07                     ` Stefano Garzarella
2021-04-14  6:57                   ` Stefano Garzarella
2021-04-14  7:20                     ` Michael S. Tsirkin
2021-04-14  9:38                       ` Stefano Garzarella
2021-04-15  3:15                         ` Jiang Wang . [this message]
2021-05-04  3:40                           ` Jiang Wang .
2021-05-04 16:16                             ` Stefano Garzarella
2021-05-04 17:06                               ` Jiang Wang .
2021-05-05 10:49                                 ` Stefano Garzarella
2021-05-05 16:58                                   ` Jiang Wang .
2021-05-07 16:53                                     ` Jiang Wang .
2021-05-10 14:50                                       ` Stefano Garzarella
2021-05-13 23:26                                         ` Jiang Wang .
2021-05-14 15:17                                           ` Stefano Garzarella
2021-05-14 18:55                                             ` Jiang Wang .
2021-05-17 11:02                                               ` Stefano Garzarella
2021-05-18  6:33                                                 ` Jiang Wang .
2021-05-18 13:02                                                   ` Stefano Garzarella
2021-05-19  4:59                                                     ` Jiang Wang .
2021-06-09  4:31                                                       ` Jiang Wang .
2021-06-09  7:40                                                         ` Stefano Garzarella
2021-04-12 22:39   ` [External] " Jiang Wang .
2021-05-13 14:57     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP_N_Z85c+GLTmqBkMrRGQzWFj73i=FSiU-hAP7bEmaKTNnc6g@mail.gmail.com' \
    --to=jiang.wang@bytedance.com \
    --cc=arseny.krasnov@kaspersky.com \
    --cc=asias@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=cong.wang@bytedance.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=mst@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xieyongji@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.