From: "Jiang Wang ." <jiang.wang@bytedance.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: cong.wang@bytedance.com,
	Xiongchun Duan <duanxiongchun@bytedance.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	cohuck@redhat.com, virtualization@lists.linux-foundation.org,
	xieyongji@bytedance.com, Stefan Hajnoczi <stefanha@redhat.com>,
	Arseny Krasnov <arseny.krasnov@kaspersky.com>
Subject: Re: [RFC v2] virtio-vsock: add description for datagram type
Date: Tue, 18 May 2021 21:59:43 -0700	[thread overview]
Message-ID: <CAP_N_Z9BccUvm=yuNd8694uhc_uF8f=0A0faTETBYF=R4=t7vQ@mail.gmail.com> (raw)
In-Reply-To: <20210518130242.plnzg5mx7ljnhxig@steredhat>

On Tue, May 18, 2021 at 6:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Mon, May 17, 2021 at 11:33:06PM -0700, Jiang Wang . wrote:
> >On Mon, May 17, 2021 at 4:02 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >>
> >> On Fri, May 14, 2021 at 11:55:29AM -0700, Jiang Wang . wrote:
> >> >On Fri, May 14, 2021 at 8:17 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >> On Thu, May 13, 2021 at 04:26:03PM -0700, Jiang Wang . wrote:
> >>
> >> [...]
> >>
> >> >> >I see. I will add a limit for dgram packets. Also, when the
> >> >> >virtqueues are shared between stream and dgram, both need to grab
> >> >> >a lock before using the virtqueue, so one will not completely
> >> >> >block the other.
> >> >>
> >> >> I'm not worried about the concurrent access that we definitely need to
> >> >> handle with a lock, but more about the uncontrolled packet sending that
> >> >> dgram might have, flooding the queues and preventing others from
> >> >> communicating.
> >> >
> >> >That is a valid concern. Let me explain how I would handle that if we
> >> >don't add two new virtqueues. For dgram, I also add a dgram_send_pkt_list,
> >> >similar to send_pkt_list for stream (and seqpacket), but with one
> >> >difference: dgram_send_pkt_list has a maximum size and keeps track of
> >> >how many packets are in the list. The counter
> >> >(dgram_send_pkt_list_size) is incremented when a packet is added to
> >> >the list and decremented when a packet is removed from the list and
> >> >added to the virtqueue. In virtio_transport_send_pkt, if the current
> >> >dgram_send_pkt_list_size equals the maximum (let's say 128), the
> >> >packet is not added to dgram_send_pkt_list and an error is returned
> >> >to the application.
> >>
> >> For stream sockets, we have the send_pkt_list and the send worker
> >> because the virtqueue can be full and the transmitter needs to wait
> >> for available slots, since we can't discard packets.
> >>
> >> For dgram I think we don't need this, so we can avoid the
> >> dgram_send_pkt_list and directly enqueue packets in the virtqueue.
> >>
> >> If there are no slots available, we can simply drop the packet.
> >> That way we don't need any limit other than the available space
> >> in the virtqueue.
> >
> >I considered this approach before. One question that is still not clear
> >to me is whether we should call virtqueue_kick() for every dgram packet.
> >If I am not mistaken, each virtqueue_kick() leads to a vm_exit to the
> >host, so calling it for every dgram packet might hurt performance when
> >there are lots of packets.
>
> Not every virtqueue_kick() will lead to a vm_exit. If the host (see
> vhost_transport_do_send_pkt()) is handling the virtqueue, it disables
> the notification through vhost_disable_notify().

Got it.
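If I read drivers/vhost/vsock.c correctly, the shape is roughly the
following (a simplified sketch with arguments elided, not the exact
code):

	/*
	 * While the handler runs, guest notifications stay disabled,
	 * so most virtqueue_kick() calls in the guest do not cause a
	 * vm_exit.
	 */
	vhost_disable_notify(&vsock->dev, vq);
	for (;;) {
		head = vhost_get_vq_desc(vq, ...);
		if (head == vq->num) {
			/* Queue drained: re-enable notifications, but
			 * re-check to close the race with a buffer
			 * added in the meantime. */
			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
				vhost_disable_notify(&vsock->dev, vq);
				continue;
			}
			break;
		}
		/* ... process the buffer ... */
	}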

> >One idea is to use a threshold and a timer for calling virtqueue_kick():
> >when there are lots of packets, we batch up to the threshold before
> >kicking; when there are few packets, the timer makes sure they are
> >still delivered without too much delay.
>
> I think it is better to have the host keep polling for a bit if we want
> to avoid some notifications (see vhost_net_busy_poll()).

Sure.

> >
> >Any other ideas for virtqueue_kick()? If the threshold-plus-timer
> >approach is fine, I can go in this direction too.
>
> I think it is better to follow the vhost_net_busy_poll() approach.

OK, I will follow the vhost_net_busy_poll() approach.
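For reference, my understanding of the vhost_net_busy_poll() shape
(names borrowed from drivers/vhost/net.c; just a sketch of the idea,
not code I'm proposing verbatim):

	/*
	 * After draining the queue, spin briefly before re-enabling
	 * guest notifications, so a burst of packets is picked up
	 * without any vm_exit. busy_poll_timeout is illustrative.
	 */
	preempt_disable();
	endtime = busy_clock() + busy_poll_timeout;
	while (vhost_can_busy_poll(endtime) &&
	       vhost_vq_avail_empty(&vsock->dev, vq))
		cpu_relax();
	preempt_enable();
	/* if nothing arrived while polling, fall back to
	 * vhost_enable_notify() and wait for a kick */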

> >
> >> >
> >> >In this way, the number of pending dgram packets to be sent is limited.
> >> >Both stream and dgram sockets will then compete to hold the lock
> >> >for the tx virtqueue. Depending on the Linux scheduler, this competition
> >> >will be somewhat fair. As a result, dgram will not block stream
> >> >completely; it will compete for and share the virtqueue with stream,
> >> >but stream can still send some packets.
> >> >
> >> >Basically, the virtqueue becomes a first-come, first-served resource
> >> >for stream and dgram. When both stream and dgram applications
> >> >have lots of data to send, dgram_send_pkt_list and send_pkt_list
> >> >will still be limited in size, and each will have some chance to send
> >> >out its data via the virtqueue. Does this address your concern?
> >> >
> >> >
> >> >> So having 2 dedicated queues could avoid a credit mechanism entirely
> >> >> for connection-less sockets; the receiver simply discards packets
> >> >> that it can't handle.
> >> >
> >> >With 2 dedicated queues, we still need some kind of accounting for
> >> >dgram, like the dgram_send_pkt_list_size I mentioned above; otherwise
> >> >it could cause OOM. It is not a real credit mechanism, but in my
> >> >current prototype the code is similar with or without 2 dedicated queues.
> >>
> >> I think in both cases we don't need any accounting; we can just drop
> >> packets if the virtqueue is full.
> >>
> >> It's still not clear to me where OOM may occur. Can you elaborate on
> >> this?
> >
> >OOM depends on the implementation details. If we keep
> >dgram_send_pkt_list and put no limit on it, it may grow indefinitely
> >and cause OOM. As long as we have some limit or drop packets quickly,
> >we should be fine.
>
> Sure, with the extra list we need some limitations.
>
> >
> >> >
> >> >Regarding the receiver discarding packets, could you explain more? I
> >> >think the code for the receiver discarding packets is also the same
> >> >with or without 2 dedicated queues.
> >>
> >> Yep.
> >>
> >> > Both will use can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> >> >to check if a packet should be discarded or not.
> >>
> >> I don't think we can use virtio_transport_inc_rx_pkt() for dgram.
> >> It is based on the credit mechanism (e.g. buf_alloc) that works when
> >> you have a connection-oriented socket.
> >>
> >> With dgram you can have multiple transmitters for a single socket, so
> >> how can we exchange that information?
> >>
> >In my view, we don't need to exchange that information. buf_alloc for
> >dgram is local to the receiving socket; the sending sockets do not know
> >about it. The receiving socket just checks whether there is still buffer
> >space available to decide whether to drop a packet. If multiple
> >transmitters send to a single socket, they share the same buf_alloc, and
> >packets are dropped randomly across those transmitters.
>
> I see, but if we reuse skbuff I think we don't need this.
>
> >
> >> If we reuse the VMCI implementation with skbuff, sk_receive_skb()
> >> already checks whether sk_rcvbuf is full and discards the packet in
> >> that case, so we don't need an internal rx_queue.
> >
> >OK.
> >
> >> Maybe someday we should convert stream to skbuff too; it would make
> >> our lives easier.
> >>
> >Yeah. I remember the original virtio vsock patch used skb, but somehow
> >the skb usage was dropped when it was merged upstream; I'm not sure why.
>
> Really? I didn't know; I'll take a look. If you have any links,
> please send them :-)

No, I take it back: the skb is only used for dgram, not for stream
sockets. Sorry for the confusion. The code I found is not in an
easy-to-read format, but here is the link if you are interested:

https://kvm.vger.kernel.narkive.com/BMvH9eEr/rfc-v2-0-7-introduce-vm-sockets-virtio-transport#post1
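
On the receive side, following the VMCI pattern would look roughly
like this (a sketch assuming dgram packets are wrapped in skbs; the
function name is made up):

	/*
	 * sk_receive_skb() already checks sk_rcvbuf and frees the skb
	 * when the socket buffer is full, so no internal rx_queue or
	 * credit accounting is needed for datagrams.
	 */
	static int virtio_transport_dgram_deliver(struct sock *sk,
						  struct sk_buff *skb)
	{
		/* drops the skb and returns an error if sk_rcvbuf
		 * is exceeded */
		return sk_receive_skb(sk, skb, 0);
	}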

btw: since the major parts of the spec are mostly settled, I will send my
current POC code to the mailing list as a reference and to see whether
other major changes are needed. I will list the remaining to-dos in the
cover letter.

Will update the spec too.
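
As a preview of the POC, the dgram send path is shaped roughly like
this (a minimal sketch assuming a dedicated dgram TX virtqueue;
VSOCK_VQ_DGRAM_TX and the function name are placeholders, not settled
spec):

	/*
	 * Drop-if-full TX path: enqueue directly into the virtqueue
	 * and drop the packet when no slot is available, instead of
	 * keeping a dgram_send_pkt_list with its own accounting.
	 */
	static int virtio_transport_send_dgram_pkt(struct virtio_vsock *vsock,
						   struct virtio_vsock_pkt *pkt)
	{
		struct virtqueue *vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
		struct scatterlist hdr, buf, *sgs[2];
		int ret;

		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
		sgs[0] = &hdr;
		sg_init_one(&buf, pkt->buf, pkt->len);
		sgs[1] = &buf;

		mutex_lock(&vsock->tx_lock);
		ret = virtqueue_add_sgs(vq, sgs, 2, 0, pkt, GFP_KERNEL);
		if (ret < 0) {
			/* no slot available: drop, dgram tolerates loss */
			mutex_unlock(&vsock->tx_lock);
			virtio_transport_free_pkt(pkt);
			return -ENOBUFS;
		}
		virtqueue_kick(vq);
		mutex_unlock(&vsock->tx_lock);
		return 0;
	}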


> Thanks,
> Stefano
>
