All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>,
	linux-hyperv@vger.kernel.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	Simon Horman <simon.horman@corigine.com>,
	virtualization@lists.linux-foundation.org,
	Eric Dumazet <edumazet@google.com>,
	Dan Carpenter <dan.carpenter@linaro.org>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
	Bryan Tan <bryantan@vmware.com>, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Krasnov Arseniy <oxffffaa@gmail.com>,
	Vishnu Dasa <vdasa@vmware.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, "David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams
Date: Fri, 23 Jun 2023 02:50:01 +0000	[thread overview]
Message-ID: <ZJUIWcgg13F7DNBm@bullseye> (raw)
In-Reply-To: <tngyeva5by3aldrhlixajjin2hqmcl6uruvuoed7hyrndlesfd@bbv7aphqye2q>

On Thu, Jun 22, 2023 at 05:19:08PM +0200, Stefano Garzarella wrote:
> On Sat, Jun 10, 2023 at 12:58:30AM +0000, Bobby Eshleman wrote:
> > This patch adds support for multi-transport datagrams.
> > 
> > This includes:
> > - Per-packet lookup of transports when using sendto(sockaddr_vm)
> > - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in
> >  sockaddr_vm
> > 
> > To preserve backwards compatibility with VMCI, some important changes
> > were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to
> > be used for dgrams iff there is not yet a g2h or h2g transport that has
> 
> s/iff/if
> 
> > been registered that can transmit the packet. If there is a g2h/h2g
> > transport for that remote address, then that transport will be used and
> > not "transport_dgram". This essentially makes "transport_dgram" a
> > fallback transport for when h2g/g2h has not yet gone online, which
> > appears to be the exact use case for VMCI.
> > 
> > This design makes sense, because there is no reason that the
> > transport_{g2h,h2g} cannot also service datagrams, which makes the role
> > of transport_dgram difficult to understand outside of the VMCI context.
> > 
> > The logic around "transport_dgram" had to be retained to prevent
> > breaking VMCI:
> > 
> > 1) VMCI datagrams appear to function outside of the h2g/g2h
> >   paradigm. When the vmci transport becomes online, it registers itself
> >   with the DGRAM feature, but not H2G/G2H. Only later when the
> >   transport has more information about its environment does it register
> >   H2G or G2H. In the case that a datagram socket becomes active
> >   after DGRAM registration but before G2H/H2G registration, the
> >   "transport_dgram" transport needs to be used.
> 
> IIRC we did this, because at that time only VMCI supported DGRAM. Now that
> there are more transports, maybe DGRAM can follow the h2g/g2h paradigm.
> 

Totally makes sense. I'll add the detail above that the prior design was
a result of chronology.

> > 
> > 2) VMCI seems to require special message be sent by the transport when a
> >   datagram socket calls bind(). Under the h2g/g2h model, the transport
> >   is selected using the remote_addr which is set by connect(). At
> >   bind time there is no remote_addr because often no connect() has been
> >   called yet: the transport is null. Therefore, with a null transport
> >   there doesn't seem to be any good way for a datagram socket a tell the
> >   VMCI transport that it has just had bind() called upon it.
> 
> @Vishnu, @Bryan do you think we can avoid this in some way?
> 
> > 
> > Only transports with a special datagram fallback use-case such as VMCI
> > need to register VSOCK_TRANSPORT_F_DGRAM.
> 
> Maybe we should rename it in VSOCK_TRANSPORT_F_DGRAM_FALLBACK or
> something like that.
> 
> In any case, we definitely need to update the comment in
> include/net/af_vsock.h on top of VSOCK_TRANSPORT_F_DGRAM mentioning
> this.
> 

Agreed. I'll rename to VSOCK_TRANSPORT_F_DGRAM_FALLBACK, unless we find
there is a better way altogether.

> > 
> > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > ---
> > drivers/vhost/vsock.c                   |  1 -
> > include/linux/virtio_vsock.h            |  2 -
> > net/vmw_vsock/af_vsock.c                | 78 +++++++++++++++++++++++++--------
> > net/vmw_vsock/hyperv_transport.c        |  6 ---
> > net/vmw_vsock/virtio_transport.c        |  1 -
> > net/vmw_vsock/virtio_transport_common.c |  7 ---
> > net/vmw_vsock/vsock_loopback.c          |  1 -
> > 7 files changed, 60 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index c8201c070b4b..8f0082da5e70 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = {
> > 		.cancel_pkt               = vhost_transport_cancel_pkt,
> > 
> > 		.dgram_enqueue            = virtio_transport_dgram_enqueue,
> > -		.dgram_bind               = virtio_transport_dgram_bind,
> > 		.dgram_allow              = virtio_transport_dgram_allow,
> > 		.dgram_get_cid		  = virtio_transport_dgram_get_cid,
> > 		.dgram_get_port		  = virtio_transport_dgram_get_port,
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 23521a318cf0..73afa09f4585 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
> > u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
> > bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
> > bool virtio_transport_stream_allow(u32 cid, u32 port);
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > -				struct sockaddr_vm *addr);
> > bool virtio_transport_dgram_allow(u32 cid, u32 port);
> > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid);
> > int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index 74358f0b47fa..ef86765f3765 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -438,6 +438,18 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags)
> > 	return transport;
> > }
> > 
> > +static const struct vsock_transport *
> > +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags)
> > +{
> > +	const struct vsock_transport *transport;
> > +
> > +	transport = vsock_connectible_lookup_transport(cid, flags);
> > +	if (transport)
> > +		return transport;
> > +
> > +	return transport_dgram;
> > +}
> > +
> > /* Assign a transport to a socket and call the .init transport callback.
> >  *
> >  * Note: for connection oriented socket this must be called when vsk->remote_addr
> > @@ -474,7 +486,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> > 
> > 	switch (sk->sk_type) {
> > 	case SOCK_DGRAM:
> > -		new_transport = transport_dgram;
> > +		new_transport = vsock_dgram_lookup_transport(remote_cid,
> > +							     remote_flags);
> > 		break;
> > 	case SOCK_STREAM:
> > 	case SOCK_SEQPACKET:
> > @@ -691,6 +704,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> > static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > 			      struct sockaddr_vm *addr)
> > {
> > +	if (!vsk->transport || !vsk->transport->dgram_bind)
> > +		return -EINVAL;
> > +
> > 	return vsk->transport->dgram_bind(vsk, addr);
> > }
> > 
> > @@ -1172,19 +1188,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > 
> > 	lock_sock(sk);
> > 
> > -	transport = vsk->transport;
> > -
> > -	err = vsock_auto_bind(vsk);
> > -	if (err)
> > -		goto out;
> > -
> > -
> > 	/* If the provided message contains an address, use that.  Otherwise
> > 	 * fall back on the socket's remote handle (if it has been connected).
> > 	 */
> > 	if (msg->msg_name &&
> > 	    vsock_addr_cast(msg->msg_name, msg->msg_namelen,
> > 			    &remote_addr) == 0) {
> > +		transport = vsock_dgram_lookup_transport(remote_addr->svm_cid,
> > +							 remote_addr->svm_flags);
> > +		if (!transport) {
> > +			err = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		if (!try_module_get(transport->module)) {
> > +			err = -ENODEV;
> > +			goto out;
> > +		}
> > +
> > 		/* Ensure this address is of the right type and is a valid
> > 		 * destination.
> > 		 */
> > @@ -1193,11 +1214,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > 			remote_addr->svm_cid = transport->get_local_cid();
> > 
> 
> From here ...
> 
> > 		if (!vsock_addr_bound(remote_addr)) {
> > +			module_put(transport->module);
> > +			err = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		if (!transport->dgram_allow(remote_addr->svm_cid,
> > +					    remote_addr->svm_port)) {
> > +			module_put(transport->module);
> > 			err = -EINVAL;
> > 			goto out;
> > 		}
> > +
> > +		err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> 
> ... to here, looks like duplicate code, can we get it out of the if block?
> 

Yes, I think using something like this:

[...]
	bool module_got = false;

[...]
		if (!try_module_get(transport->module)) {
			err = -ENODEV;
			goto out;
		}
		module_got = true;

[...]

out:
	if (likely(transport && !err && module_got))
		module_put(transport->module)

> > +		module_put(transport->module);
> > 	} else if (sock->state == SS_CONNECTED) {
> > 		remote_addr = &vsk->remote_addr;
> > +		transport = vsk->transport;
> > +
> > +		err = vsock_auto_bind(vsk);
> > +		if (err)
> > +			goto out;
> > 
> > 		if (remote_addr->svm_cid == VMADDR_CID_ANY)
> > 			remote_addr->svm_cid = transport->get_local_cid();
> > @@ -1205,23 +1242,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> > 		/* XXX Should connect() or this function ensure remote_addr is
> > 		 * bound?
> > 		 */
> > -		if (!vsock_addr_bound(&vsk->remote_addr)) {
> > +		if (!vsock_addr_bound(remote_addr)) {
> > 			err = -EINVAL;
> > 			goto out;
> > 		}
> > -	} else {
> > -		err = -EINVAL;
> > -		goto out;
> > -	}
> > 
> > -	if (!transport->dgram_allow(remote_addr->svm_cid,
> > -				    remote_addr->svm_port)) {
> > +		if (!transport->dgram_allow(remote_addr->svm_cid,
> > +					    remote_addr->svm_port)) {
> > +			err = -EINVAL;
> > +			goto out;
> > +		}
> > +
> > +		err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > +	} else {
> > 		err = -EINVAL;
> > 		goto out;
> > 	}
> > 
> > -	err = transport->dgram_enqueue(vsk, remote_addr, msg, len);
> > -
> > out:
> > 	release_sock(sk);
> > 	return err;
> > @@ -1255,13 +1292,18 @@ static int vsock_dgram_connect(struct socket *sock,
> > 	if (err)
> > 		goto out;
> > 
> > +	memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > +
> > +	err = vsock_assign_transport(vsk, NULL);
> > +	if (err)
> > +		goto out;
> > +
> > 	if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> > 					 remote_addr->svm_port)) {
> > 		err = -EINVAL;
> > 		goto out;
> > 	}
> > 
> > -	memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
> > 	sock->state = SS_CONNECTED;
> > 
> > 	/* sock map disallows redirection of non-TCP sockets with sk_state !=
> > diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> > index ff6e87e25fa0..c00bc5da769a 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk)
> > 	kfree(hvs);
> > }
> > 
> > -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
> > -{
> > -	return -EOPNOTSUPP;
> > -}
> > -
> > static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > {
> > 	return -EOPNOTSUPP;
> > @@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = {
> > 	.connect                  = hvs_connect,
> > 	.shutdown                 = hvs_shutdown,
> > 
> > -	.dgram_bind               = hvs_dgram_bind,
> > 	.dgram_get_cid		  = hvs_dgram_get_cid,
> > 	.dgram_get_port		  = hvs_dgram_get_port,
> > 	.dgram_get_length	  = hvs_dgram_get_length,
> > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 5763cdf13804..1b7843a7779a 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = {
> > 		.shutdown                 = virtio_transport_shutdown,
> > 		.cancel_pkt               = virtio_transport_cancel_pkt,
> > 
> > -		.dgram_bind               = virtio_transport_dgram_bind,
> > 		.dgram_enqueue            = virtio_transport_dgram_enqueue,
> > 		.dgram_allow              = virtio_transport_dgram_allow,
> > 		.dgram_get_cid		  = virtio_transport_dgram_get_cid,
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index e6903c719964..d5a3c8efe84b 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > 
> > -int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > -				struct sockaddr_vm *addr)
> > -{
> > -	return -EOPNOTSUPP;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > -
> > int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid)
> > {
> > 	return -EOPNOTSUPP;
> > diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> > index 2f3cabc79ee5..e9de45a26fbd 100644
> > --- a/net/vmw_vsock/vsock_loopback.c
> > +++ b/net/vmw_vsock/vsock_loopback.c
> > @@ -61,7 +61,6 @@ static struct virtio_transport loopback_transport = {
> > 		.shutdown                 = virtio_transport_shutdown,
> > 		.cancel_pkt               = vsock_loopback_cancel_pkt,
> > 
> > -		.dgram_bind               = virtio_transport_dgram_bind,
> > 		.dgram_enqueue            = virtio_transport_dgram_enqueue,
> > 		.dgram_allow              = virtio_transport_dgram_allow,
> > 		.dgram_get_cid		  = virtio_transport_dgram_get_cid,
> > 
> > -- 
> > 2.30.2
> > 
> 
> The rest LGTM!
> 
> Stefano

Thanks,
Bobby

  reply	other threads:[~2023-06-23 20:38 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-10  0:58 [PATCH RFC net-next v4 0/8] virtio/vsock: support datagrams Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue Bobby Eshleman
2023-06-11 20:43   ` Arseniy Krasnov
2023-06-22 14:51     ` Stefano Garzarella
2023-06-22 14:51       ` Stefano Garzarella
2023-06-22 19:23       ` Arseniy Krasnov
2023-06-22 23:34         ` Bobby Eshleman
2023-06-22 23:37       ` Bobby Eshleman
2023-06-23  8:14         ` Stefano Garzarella
2023-06-23  8:14           ` Stefano Garzarella
2023-06-22 23:25     ` Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 2/8] vsock: refactor transport lookup code Bobby Eshleman
2023-06-22 14:57   ` Stefano Garzarella
2023-06-22 14:57     ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 3/8] vsock: support multi-transport datagrams Bobby Eshleman
2023-06-22 15:19   ` Stefano Garzarella
2023-06-22 15:19     ` Stefano Garzarella
2023-06-23  2:50     ` Bobby Eshleman [this message]
2023-06-23  2:59       ` Bobby Eshleman
2023-06-26 14:50         ` Stefano Garzarella
2023-06-26 14:50           ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 4/8] vsock: make vsock bind reusable Bobby Eshleman
2023-06-12  9:49   ` Simon Horman
2023-06-22 23:00     ` Bobby Eshleman
2023-06-22 15:25   ` Stefano Garzarella
2023-06-22 15:25     ` Stefano Garzarella
2023-06-22 23:05     ` Bobby Eshleman
2023-06-23  8:15       ` Stefano Garzarella
2023-06-23  8:15         ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 5/8] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2023-06-22 15:29   ` Stefano Garzarella
2023-06-22 15:29     ` Stefano Garzarella
2023-06-22 23:06     ` Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams Bobby Eshleman
2023-06-11 20:49   ` Arseniy Krasnov
2023-06-22 16:09     ` Stefano Garzarella
2023-06-22 16:09       ` Stefano Garzarella
2023-06-22 18:46       ` Arseniy Krasnov
2023-06-23  4:37       ` Bobby Eshleman
2023-06-26 15:03         ` Stefano Garzarella
2023-06-26 15:03           ` Stefano Garzarella
2023-06-27  1:19           ` Bobby Eshleman
2023-06-29 12:30             ` Stefano Garzarella
2023-06-29 12:30               ` Stefano Garzarella
2023-06-22 16:31   ` Stefano Garzarella
2023-06-22 16:31     ` Stefano Garzarella
2023-06-10  0:58 ` [PATCH RFC net-next v4 7/8] vsock: Add lockless sendmsg() support Bobby Eshleman
2023-06-12  9:53   ` Simon Horman
2023-06-22 22:59     ` Bobby Eshleman
2023-06-22 16:37   ` Stefano Garzarella
2023-06-22 16:37     ` Stefano Garzarella
2023-06-22 22:57     ` Bobby Eshleman
2023-06-10  0:58 ` [PATCH RFC net-next v4 8/8] tests: add vsock dgram tests Bobby Eshleman
2023-06-11 20:54   ` Arseniy Krasnov
2023-06-22 23:16     ` Bobby Eshleman
2023-06-23 18:34       ` Arseniy Krasnov
2023-06-23  6:33         ` Bobby Eshleman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJUIWcgg13F7DNBm@bullseye \
    --to=bobbyeshleman@gmail.com \
    --cc=bobby.eshleman@bytedance.com \
    --cc=bpf@vger.kernel.org \
    --cc=bryantan@vmware.com \
    --cc=dan.carpenter@linaro.org \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=oxffffaa@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=pv-drivers@vmware.com \
    --cc=sgarzare@redhat.com \
    --cc=simon.horman@corigine.com \
    --cc=stefanha@redhat.com \
    --cc=vdasa@vmware.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wei.liu@kernel.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.