From: "Jiang Wang ." <jiang.wang@bytedance.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: cong.wang@bytedance.com,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	cohuck@redhat.com, virtualization@lists.linux-foundation.org,
	xieyongji@bytedance.com, Stefan Hajnoczi <stefanha@redhat.com>,
	asias@redhat.com, Arseny Krasnov <arseny.krasnov@kaspersky.com>
Subject: Re: Re: [RFC v2] virtio-vsock: add description for datagram type
Date: Fri, 7 May 2021 09:53:19 -0700	[thread overview]
Message-ID: <CAP_N_Z9yZ4ydXaEn1e=70pDh3FyDjrrxgzis4YTfyDoZ5c8k+g@mail.gmail.com> (raw)
In-Reply-To: <CAP_N_Z_DN+SYZ3E52HGdcmSfopBoNayKVzUdH7Nc2jUK5nfmLA@mail.gmail.com>

Hi guys,

I have one question about adding two new virtqueues for dgram. A new
thought is that instead of adding two new virtqueues, we keep using the
existing virtqueues for both stream and dgram.

When I first thought about supporting dgram, I assumed that adding two
new virtqueues would be easier and would perform better. Now that the
prototype is done, I think reusing the existing virtqueues is not
complicated either and could in fact be simpler, and the performance
difference may not be very big.

The original code has about three places that assume the virtqueues are
only used by stream sockets, but we can change that code. One place is
the packet length check, which we can apply only to stream packets. The
other two are in the tx and rx code paths, where the code stops the rx
queue and resumes it later if too many reply packets are queued. We can
keep that logic as it is; dgram will be affected a little, but I think
that should be fine. Are there any other places that we should fix?
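
To show what I mean for the length check, here is a rough sketch
(untested; the helper name is made up, and since only
VIRTIO_VSOCK_TYPE_STREAM exists today I simply skip the check for any
non-stream type):

#include <linux/virtio_vsock.h>

/* Rough sketch: keep the existing length check for stream packets only;
 * dgram packets would be handled by their own drop policy instead.
 */
static bool virtio_transport_pkt_len_ok(struct virtio_vsock_pkt *pkt)
{
        /* Any non-stream type (e.g. a future dgram type) skips the check. */
        if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM)
                return true;

        return pkt->len <= VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
}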

In short, the virtqueues are at a lower level and can support multiple
flows and socket types. Using the existing virtqueues also makes the
device more compatible with old versions.

What do you guys think? I remember Stefano mentioned that we should add
two new virtqueues for dgram. Stefano, do you have specific reasons for
that? Could we just keep using the existing virtqueues? Thanks.

Regards,

Jiang

On Wed, May 5, 2021 at 9:58 AM Jiang Wang . <jiang.wang@bytedance.com> wrote:
>
> On Wed, May 5, 2021 at 3:49 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > On Tue, May 04, 2021 at 10:06:02AM -0700, Jiang Wang . wrote:
> > >On Tue, May 4, 2021 at 9:16 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >>
> > >> Hi Jiang,
> > >>
> > >> On Mon, May 03, 2021 at 08:40:46PM -0700, Jiang Wang . wrote:
> > >> >Hi Stefano,
> > >> >
> > >> >I checked the VIRTIO_NET_F_MRG_RXBUF feature bit and I think vsock
> > >> >dgram can use that feature too.
> > >>
> > >> Cool, thanks for checking!
> > >
> > >NP.
> > >
> > >> >Do we want to make this feature a must-have or optional? One idea is
> > >> >to make it optional. When not
> > >>
> > >> I think optional is fine, and we should support it for all kinds of
> > >> traffic (stream, dgram, seqpacket).
> > >
> > >Got it. I was thinking only for dgram originally, but I think it should be fine
> > >for stream and seqpacket too.
> > >
> > >Btw, I have a small implementation question. For now, vsock allocates
> > >rx buffers with two scatterlist entries: one for the header and one for
> > >the payload. After we enable the VIRTIO_NET_F_MRG_RXBUF feature, do we
> > >still want to allocate buffers like that, or could we just use one big
> > >scatterlist entry for the whole packet? Keeping the same allocation
> > >method would work, but it may not line up well with the real packets,
> > >since every buffer of a big packet except the first one skips the header.
> >
> > Good question.
> >
> > With mergeable buffers I think it is better to remove the small buffer
> > for the header from the scatterlist; this should also avoid doing two
> > allocations per packet/buffer in the guest.
>
> Got  it. Will do.
>
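
Concretely, for the rx refill path I was thinking of something like the
sketch below (the helper name is made up; the real change would
presumably go into virtio_vsock_rx_fill() in the guest driver):

#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/virtio.h>

/* Sketch: with mergeable rx buffers, queue one contiguous buffer per
 * descriptor (header at the front) instead of a separate header sg
 * entry plus a payload sg entry.
 */
static int vsock_rx_fill_one(struct virtqueue *vq, size_t buf_len)
{
        struct scatterlist sg;
        void *buf;

        buf = kmalloc(buf_len, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

        sg_init_one(&sg, buf, buf_len);
        /* 'buf' doubles as the token returned on completion */
        return virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_KERNEL);
}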
> > >
> > >> >supported, dgram rx buf is 16 KB which should be good in most cases.
> > >>
> > >> Why not 4 KB like for stream? Or we could make it configurable.
> > >
> > >OK, sure. 4 KB is fine with me. I mentioned 16 KB because I was thinking
> > >of jumbo frames in the Ethernet world, but I just found out that jumbo
> > >frames are only about 8 or 9 KB.
> > >
> > >If we make it configurable, what kind of interface should we use to
> > >configure it? In Linux, we could use something like the sysfs interface.
> > >I guess we don't
> >
> > Yes, something like that for the guest driver.
>
> Got it.
>
> > >need to specify that detail in the spec though. I will just say in the
> > >spec that the size should be configurable.
> >
> > Yeah, I remember that at some point we fixed an issue where the host
> > always expected buffers of 4 KB.
> >
> > Now it should support any buffer size less than or equal to 64 KB.
> >
> I see. I will check if there is any issue with that.
>
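
For the guest-side knob, I was thinking of something as simple as a
module parameter, roughly like this (the parameter name is just a
placeholder):

#include <linux/module.h>

/* Sketch: tunable dgram rx buffer size, defaulting to 4 KB.  It would
 * show up under /sys/module/<module>/parameters/ like any other
 * module parameter.
 */
static unsigned int dgram_rx_buf_size = 4096;
module_param(dgram_rx_buf_size, uint, 0644);
MODULE_PARM_DESC(dgram_rx_buf_size,
                 "Receive buffer size in bytes for vsock dgram");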
> > >
> > >> >When VIRTIO_NET_F_MRG_RXBUF is supported, the rx buf is 4 KB and the
> > >> >max packet size is 64 KB.
> > >> >
> > >> >Also, just to make sure we are on the same page: the current vsock
> > >> >stream code can also split a big packet into multiple buffers, and
> > >> >the receive side can reassemble them.
> > >>
> > >> Yes, sort of. Being a stream, there's no concept of a boundary.
> > >>
> > >> > But dgram cannot
> > >> >use that code, because dgram may drop a buffer in the driver code if
> > >> >there is not enough space. That means dgram may drop buffers at the
> > >> >beginning, the end, or the middle of a packet, so a packet may not be
> > >> >received in its entirety. Therefore, we need something like
> > >> >VIRTIO_NET_F_MRG_RXBUF.
> > >>
> > >> Yep.
> > >>
> > >> >
> > >> >If we want to leverage the current stream code without using
> > >> >VIRTIO_NET_F_MRG_RXBUF, we could add a total_len and an offset to the
> > >> >virtio_vsock_hdr. Then, when sending a packet, the device splits the
> > >> >big packet into multiple small ones, each with its own header. They
> > >> >will have the same total_len but different offsets. On the driver
> > >> >side, the driver can check total_len on the fragment with offset 0
> > >> >before enqueueing the big packet. If there is enough space, all the
> > >> >remaining fragments will be received; if not, they will be dropped.
> > >> >I feel this implementation might be easier than using
> > >> >VIRTIO_NET_F_MRG_RXBUF, but either one is fine with me.
> > >> >Any preference? Thanks.
> > >>
> > >> This is very similar to what we discussed with Michael. He pointed out
> > >> that it could be complicated and we could have several problems.
> > >>
> > >> For example, we would also have to provide an ID to prevent different
> > >> fragments from overlapping. We might also have problems handling
> > >> different flows at the same time.
> > >>
> > >> Mergeable buffers allow us to avoid these problems and also bring
> > >> advantages for the other types of traffic (stream, seqpacket).
> > >>
> > >> They also allow us to use a single header for the packet and all its
> > >> fragments.
> > >>
> > >> So IMHO, if there are no significant issues, the best way would be to
> > >> implement mergeable buffers in vsock; I think there are only advantages
> > >> to using this feature.
> > >
> > >Sure. Got it. I was thinking only about dgram, which is simpler than
> > >stream and seqpacket. Those two will have the issues you just mentioned.
> > >
> > >Also, just to make sure: for stream and seqpacket, supporting mergeable
> > >buffers is mainly for performance improvements, right? Or to save
> > >memory? I think functionally they will be the same with or without
> > >mergeable buffers.
> >
> > Yes, right!
> >
> > > For dgram, the maximum supported packet size is increased when using
> > >MRG_RXBUF if the rx buf size is fixed, and it can save lots of memory.
> > >
> > >I am a little bit confused about the motivation for supporting mergeable
> > >buffers for stream and seqpacket. Could you remind me again? Sorry if it
> > >was already mentioned in the old emails.
> >
> > We can save the header overhead by using a single header for the entire
> > "big" packet.
> >
> > For example, in the current implementation, if the host has a 64 KB
> > buffer to send to the guest over a stream socket, it must split it into
> > 16 packets, using a header for each fragment. With mergeable buffers, we
> > would save the extra header for each fragment by using a single initial
> > header specifying the number of descriptors used.
> >
> OK. Sure.
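
Just to double-check my understanding, the header would then look
roughly like virtio-net's mergeable rx header. A purely hypothetical
layout for the spec discussion (struct and field names are placeholders):

#include <linux/virtio_vsock.h>

/* Hypothetical layout, mirroring virtio_net_hdr_mrg_rxbuf: a single
 * header at the start of the first rx buffer, plus a count of how many
 * rx buffers the whole packet spans.
 */
struct virtio_vsock_hdr_mrg_rxbuf {
        struct virtio_vsock_hdr hdr;
        __le16 num_buffers;     /* filled in by the device */
} __packed;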
> > >
> > >We could only support it on dgram since dgram has its own virtqueues.
> >
> > Maybe that is fine for the initial implementation; then we can add
> > support for the other types as well.
> >
> > Please keep this in mind, so it will be easier to reuse for the other
> > types.
> >
> Got it. Will do. Thanks for the suggestions and comments. I will
> update the spec patch next.
>
> > >
> > >Btw, my company email system automatically adds [External] to these
> > >emails. I mean to remove it manually when I reply, but sometimes forget,
> > >so the email threading may not be very accurate. Sorry about that.
> >
> > Don't worry :-)
> >
> > Thanks,
> > Stefano
> >

Thread overview: 34+ messages
2021-04-01  4:36 [RFC v2] virtio-vsock: add description for datagram type jiang.wang
2021-04-12 13:50 ` Stefan Hajnoczi
2021-04-12 14:21   ` Stefano Garzarella
2021-04-12 22:42     ` Jiang Wang .
2021-04-13 12:58       ` Stefano Garzarella
2021-04-13 13:16         ` Michael S. Tsirkin
2021-04-13 13:38           ` Stefano Garzarella
2021-04-13 13:50             ` Michael S. Tsirkin
2021-04-13 14:03               ` Stefano Garzarella
2021-04-13 19:58                 ` Michael S. Tsirkin
2021-04-13 22:00                   ` Jiang Wang .
2021-04-14  7:07                     ` Stefano Garzarella
2021-04-14  6:57                   ` Stefano Garzarella
2021-04-14  7:20                     ` Michael S. Tsirkin
2021-04-14  9:38                       ` Stefano Garzarella
2021-04-15  3:15                         ` Jiang Wang .
2021-05-04  3:40                           ` Jiang Wang .
2021-05-04 16:16                             ` Stefano Garzarella
2021-05-04 17:06                               ` Jiang Wang .
2021-05-05 10:49                                 ` Stefano Garzarella
2021-05-05 16:58                                   ` Jiang Wang .
2021-05-07 16:53                                     ` Jiang Wang . [this message]
2021-05-10 14:50                                       ` Stefano Garzarella
2021-05-13 23:26                                         ` Jiang Wang .
2021-05-14 15:17                                           ` Stefano Garzarella
2021-05-14 18:55                                             ` Jiang Wang .
2021-05-17 11:02                                               ` Stefano Garzarella
2021-05-18  6:33                                                 ` Jiang Wang .
2021-05-18 13:02                                                   ` Stefano Garzarella
2021-05-19  4:59                                                     ` Jiang Wang .
2021-06-09  4:31                                                       ` Jiang Wang .
2021-06-09  7:40                                                         ` Stefano Garzarella
2021-04-12 22:39   ` [External] " Jiang Wang .
2021-05-13 14:57     ` Stefan Hajnoczi
