All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: linux-hyperv@vger.kernel.org,
	Bobby Eshleman <bobby.eshleman@bytedance.com>,
	kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	Simon Horman <simon.horman@corigine.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	virtualization@lists.linux-foundation.org,
	Eric Dumazet <edumazet@google.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Bryan Tan <bryantan@vmware.com>, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Arseniy Krasnov <oxffffaa@gmail.com>,
	Vishnu Dasa <vdasa@vmware.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, "David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams
Date: Tue, 27 Jun 2023 01:19:43 +0000	[thread overview]
Message-ID: <ZJo5L+IM1P3kFAhe@bullseye> (raw)
In-Reply-To: <d53tgo4igvz34pycgs36xikjosrncejlzuvh47bszk55milq52@whcyextsxfka>

On Mon, Jun 26, 2023 at 05:03:15PM +0200, Stefano Garzarella wrote:
> On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote:
> > On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote:
> > > On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote:
> > > > Hello Bobby!
> > > >
> > > > On 10.06.2023 03:58, Bobby Eshleman wrote:
> > > > > This commit adds support for datagrams over virtio/vsock.
> > > > >
> > > > > Message boundaries are preserved on a per-skb and per-vq entry basis.
> > > >
> > > > I'm a little bit confused about the following case: let vhost sends 4097 bytes
> > > > datagram to the guest. Guest uses 4096 RX buffers in it's virtio queue, each
> > > > buffer has attached empty skb to it. Vhost places first 4096 bytes to the first
> > > > buffer of guests RX queue, and 1 last byte to the second buffer. Now IIUC guest
> > > > has two skb in it rx queue, and user in guest wants to read data - does it read
> > > > 4097 bytes, while guest has two skb - 4096 bytes and 1 bytes? In seqpacket there is
> > > > special marker in header which shows where message ends, and how it works here?
> > > 
> > > I think the main difference is that DGRAM is not connection-oriented, so
> > > we don't have a stream and we can't split the packet into 2 (maybe we
> > > could, but we have no guarantee that the second one for example will be
> > > not discarded because there is no space).
> > > 
> > > So I think it is acceptable as a restriction to keep it simple.
> > > 
> > > My only doubt is, should we make the RX buffer size configurable,
> > > instead of always using 4k?
> > > 
> > I think that is a really good idea. What mechanism do you imagine?
> 
> Some parameter in sysfs?
> 

I comment more on this below.

> > 
> > For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS
> 
> For the guest it should be easy since it allocates the buffers, but for
> the host?
> 
> Maybe we should add a field in the configuration space that reports some
> sort of MTU.
> 
> Something in addition to what Laura had proposed here:
> https://markmail.org/message/ymhz7wllutdxji3e
> 

That sounds good to me.

IIUC vhost exposes the limit via the configuration space, and the guest
can configure the RX buffer size up to that limit via sysfs?

> > returned even though it is uncharacteristic of Linux sockets.
> > Alternatively, silently dropping is okay... but seems needlessly
> > unhelpful.
> 
> UDP takes advantage of IP fragmentation, right?
> But what happens if a fragment is lost?
> 
> We should try to behave in a similar way.
> 

AFAICT in UDP the sending socket will see EHOSTUNREACH on its error
queue and the packet will be dropped.

For more details:
- the IP defragmenter will emit an ICMP_TIME_EXCEEDED from ip_expire()
  if the fragment queue is not completed within time.
- Upon seeing ICMP_TIME_EXCEEDED, the sending stack will then add
  EHOSTUNREACH to the socket's error queue, as seen in __udp4_lib_err().

Given some updated man pages I think enqueuing EHOSTUNREACH is okay for
vsock too. This also reserves ENOBUFS/ENOMEM only for shortage on local
buffers / mem.

What do you think?

Thanks,
Bobby

  reply	other threads:[~2023-06-27 17:25 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-10  0:58 [PATCH RFC net-next v4 0/8] virtio/vsock: support datagrams Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue Bobby Eshleman
2023-06-11 20:43   ` Arseniy Krasnov
2023-06-22 14:51     ` Stefano Garzarella
2023-06-22 14:51       ` Stefano Garzarella
2023-06-22 19:23       ` Arseniy Krasnov
2023-06-22 23:34         ` Bobby Eshleman
2023-06-22 23:37       ` Bobby Eshleman
2023-06-23  8:14         ` Stefano Garzarella
2023-06-23  8:14           ` Stefano Garzarella
2023-06-22 23:25     ` Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 2/8] vsock: refactor transport lookup code Bobby Eshleman
2023-06-22 14:57   ` Stefano Garzarella
2023-06-22 14:57     ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams Bobby Eshleman
2023-06-22 15:19   ` Stefano Garzarella
2023-06-22 15:19     ` Stefano Garzarella
2023-06-23  2:50     ` Bobby Eshleman
2023-06-23  2:59       ` Bobby Eshleman
2023-06-26 14:50         ` Stefano Garzarella
2023-06-26 14:50           ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable Bobby Eshleman
2023-06-12  9:49   ` Simon Horman
2023-06-22 23:00     ` Bobby Eshleman
2023-06-22 15:25   ` Stefano Garzarella
2023-06-22 15:25     ` Stefano Garzarella
2023-06-22 23:05     ` Bobby Eshleman
2023-06-23  8:15       ` Stefano Garzarella
2023-06-23  8:15         ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 5/8] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2023-06-22 15:29   ` Stefano Garzarella
2023-06-22 15:29     ` Stefano Garzarella
2023-06-22 23:06     ` Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams Bobby Eshleman
2023-06-11 20:49   ` Arseniy Krasnov
2023-06-22 16:09     ` Stefano Garzarella
2023-06-22 16:09       ` Stefano Garzarella
2023-06-22 18:46       ` Arseniy Krasnov
2023-06-23  4:37       ` Bobby Eshleman
2023-06-26 15:03         ` Stefano Garzarella
2023-06-26 15:03           ` Stefano Garzarella
2023-06-27  1:19           ` Bobby Eshleman [this message]
2023-06-29 12:30             ` Stefano Garzarella
2023-06-29 12:30               ` Stefano Garzarella
2023-06-22 16:31   ` Stefano Garzarella
2023-06-22 16:31     ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 7/8] vsock: Add lockless sendmsg() support Bobby Eshleman
2023-06-12  9:53   ` Simon Horman
2023-06-22 22:59     ` Bobby Eshleman
2023-06-22 16:37   ` Stefano Garzarella
2023-06-22 16:37     ` Stefano Garzarella
2023-06-22 22:57     ` Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests Bobby Eshleman
2023-06-11 20:54   ` Arseniy Krasnov
2023-06-22 23:16     ` Bobby Eshleman
2023-06-23 18:34       ` Arseniy Krasnov
2023-06-23  6:33         ` Bobby Eshleman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJo5L+IM1P3kFAhe@bullseye \
    --to=bobbyeshleman@gmail.com \
    --cc=bobby.eshleman@bytedance.com \
    --cc=bpf@vger.kernel.org \
    --cc=bryantan@vmware.com \
    --cc=dan.carpenter@linaro.org \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=oxffffaa@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=pv-drivers@vmware.com \
    --cc=sgarzare@redhat.com \
    --cc=simon.horman@corigine.com \
    --cc=stefanha@redhat.com \
    --cc=vdasa@vmware.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wei.liu@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.