Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net-next 00/14] vsock: add multi-transports support
@ 2019-10-23  9:55 Stefano Garzarella
  2019-10-23  9:55 ` [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT Stefano Garzarella
                   ` (14 more replies)
  0 siblings, 15 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

This series adds the multi-transports support to vsock, following
this proposal: https://www.spinics.net/lists/netdev/msg575792.html

With the multi-transports support, we can use VSOCK with nested VMs
(using also different hypervisors) loading both guest->host and
host->guest transports at the same time.
Before this series, vmci-transport supported this behavior but only
using VMware hypervisor on L0, L1, etc.

RFC: https://patchwork.ozlabs.org/cover/1168442/
RFC -> v1:
- Added R-b/A-b from Dexuan and Stefan
- Fixed comments and typos in several patches (Stefan)
- Patch 7: changed .notify_buffer_size return to void (Stefan)
- Added patch 8 to simplify the API exposed to the transports (Stefan)
- Patch 11:
  + documented VSOCK_TRANSPORT_F_* flags (Stefan)
  + fixed vsock_assign_transport() when the socket is already assigned
  + moved features outside of struct vsock_transport, and used as
    parameter of vsock_core_register() as a preparation of Patch 12
- Removed "vsock: add 'transport_hg' to handle g2h\h2g transports" patch
- Added patch 12 to register vmci_transport only when VMCI guest/host
  are active

The first 9 patches are cleanups and preparations, maybe some of
these can go regardless of this series.

Patch 10 changes the hvs_remote_addr_init(). setting the
VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make
the choice of transport to be used work properly.

Patch 11 adds multi-transports support.

Patch 12 touch a little bit the vmci_transport and the vmci driver
to register the vmci_transport only when there are active host/guest.

Patch 13 prevents the transport modules unloading while sockets are
assigned to them.

Patch 14 fixes an issue in the bind() logic discoverable only with
the new multi-transport support.

I've tested this series with nested KVM (vsock-transport [L0,L1],
virtio-transport[L1,L2]) and with VMware (L0) + KVM (L1)
(vmci-transport [L0,L1], vhost-transport [L1], virtio-transport[L2]).

Dexuan successfully tested the RFC series on HyperV with a Linux guest.

Stefano Garzarella (14):
  vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT
  vsock: remove vm_sockets_get_local_cid()
  vsock: remove include/linux/vm_sockets.h file
  vsock: add 'transport' member in the struct vsock_sock
  vsock/virtio: add transport parameter to the
    virtio_transport_reset_no_sock()
  vsock: add 'struct vsock_sock *' param to vsock_core_get_transport()
  vsock: handle buffer_size sockopts in the core
  vsock: add vsock_create_connected() called by transports
  vsock: move vsock_insert_unbound() in the vsock_create()
  hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init()
  vsock: add multi-transports support
  vsock/vmci: register vmci_transport only when VMCI guest/host are
    active
  vsock: prevent transport modules unloading
  vsock: fix bind() behaviour taking care of CID

 drivers/misc/vmw_vmci/vmci_driver.c     |  50 ++++
 drivers/misc/vmw_vmci/vmci_driver.h     |   2 +
 drivers/misc/vmw_vmci/vmci_guest.c      |   2 +
 drivers/misc/vmw_vmci/vmci_host.c       |   7 +
 drivers/vhost/vsock.c                   |  96 +++---
 include/linux/virtio_vsock.h            |  18 +-
 include/linux/vm_sockets.h              |  15 -
 include/linux/vmw_vmci_api.h            |   2 +
 include/net/af_vsock.h                  |  44 +--
 include/net/vsock_addr.h                |   2 +-
 net/vmw_vsock/af_vsock.c                | 376 ++++++++++++++++++------
 net/vmw_vsock/hyperv_transport.c        |  70 ++---
 net/vmw_vsock/virtio_transport.c        | 177 ++++++-----
 net/vmw_vsock/virtio_transport_common.c | 131 +++------
 net/vmw_vsock/vmci_transport.c          | 137 +++------
 net/vmw_vsock/vmci_transport.h          |   3 -
 net/vmw_vsock/vmci_transport_notify.h   |   1 -
 17 files changed, 627 insertions(+), 506 deletions(-)
 delete mode 100644 include/linux/vm_sockets.h

-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-30 14:54   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid() Stefano Garzarella
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with
commit d021c344051af ("VSOCK: Introduce VM Sockets"), but it is
never used in the net/vmw_vsock/vmci_transport.c.

VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in
net/vmw_vsock/af_vsock.c

Cc: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/vmci_transport.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 8c9c4ed90fa7..f8e3131ac480 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -78,11 +78,6 @@ static int PROTOCOL_OVERRIDE = -1;
 #define VMCI_TRANSPORT_DEFAULT_QP_SIZE       262144
 #define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX   262144
 
-/* The default peer timeout indicates how long we will wait for a peer response
- * to a control message.
- */
-#define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
-
 /* Helper function to convert from a VMCI error code to a VSock error code. */
 
 static s32 vmci_transport_error_to_vsock_error(s32 vmci_error)
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid()
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
  2019-10-23  9:55 ` [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-30 14:55   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h file Stefano Garzarella
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
We can replace it calling the virtio_transport_get_ops() and
using the get_local_cid() callback registered by the transport.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/vm_sockets.h              |  2 --
 net/vmw_vsock/af_vsock.c                | 10 ----------
 net/vmw_vsock/virtio_transport_common.c |  2 +-
 3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/include/linux/vm_sockets.h b/include/linux/vm_sockets.h
index 33f1a2ecd905..7dd899ccb920 100644
--- a/include/linux/vm_sockets.h
+++ b/include/linux/vm_sockets.h
@@ -10,6 +10,4 @@
 
 #include <uapi/linux/vm_sockets.h>
 
-int vm_sockets_get_local_cid(void);
-
 #endif /* _VM_SOCKETS_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2ab43b2bba31..2f2582fb7fdd 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -129,16 +129,6 @@ static struct proto vsock_proto = {
 static const struct vsock_transport *transport;
 static DEFINE_MUTEX(vsock_register_mutex);
 
-/**** EXPORTS ****/
-
-/* Get the ID of the local context.  This is transport dependent. */
-
-int vm_sockets_get_local_cid(void)
-{
-	return transport->get_local_cid();
-}
-EXPORT_SYMBOL_GPL(vm_sockets_get_local_cid);
-
 /**** UTILS ****/
 
 /* Each bound VSocket is stored in the bind hash table and each connected
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index d02c9b41a768..b1cd16ed66ea 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -168,7 +168,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	struct virtio_vsock_pkt *pkt;
 	u32 pkt_len = info->pkt_len;
 
-	src_cid = vm_sockets_get_local_cid();
+	src_cid = virtio_transport_get_ops()->transport.get_local_cid();
 	src_port = vsk->local_addr.svm_port;
 	if (!info->remote_cid) {
 		dst_cid	= vsk->remote_addr.svm_cid;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h file
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
  2019-10-23  9:55 ` [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT Stefano Garzarella
  2019-10-23  9:55 ` [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid() Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-30 14:57   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 04/14] vsock: add 'transport' member in the struct vsock_sock Stefano Garzarella
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

This header file now only includes the "uapi/linux/vm_sockets.h".
We can include directly it when needed.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/linux/vm_sockets.h            | 13 -------------
 include/net/af_vsock.h                |  2 +-
 include/net/vsock_addr.h              |  2 +-
 net/vmw_vsock/vmci_transport_notify.h |  1 -
 4 files changed, 2 insertions(+), 16 deletions(-)
 delete mode 100644 include/linux/vm_sockets.h

diff --git a/include/linux/vm_sockets.h b/include/linux/vm_sockets.h
deleted file mode 100644
index 7dd899ccb920..000000000000
--- a/include/linux/vm_sockets.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * VMware vSockets Driver
- *
- * Copyright (C) 2007-2013 VMware, Inc. All rights reserved.
- */
-
-#ifndef _VM_SOCKETS_H
-#define _VM_SOCKETS_H
-
-#include <uapi/linux/vm_sockets.h>
-
-#endif /* _VM_SOCKETS_H */
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 80ea0f93d3f7..c660402b10f2 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -10,7 +10,7 @@
 
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
-#include <linux/vm_sockets.h>
+#include <uapi/linux/vm_sockets.h>
 
 #include "vsock_addr.h"
 
diff --git a/include/net/vsock_addr.h b/include/net/vsock_addr.h
index 57d2db5c4bdf..cf8cc140d68d 100644
--- a/include/net/vsock_addr.h
+++ b/include/net/vsock_addr.h
@@ -8,7 +8,7 @@
 #ifndef _VSOCK_ADDR_H_
 #define _VSOCK_ADDR_H_
 
-#include <linux/vm_sockets.h>
+#include <uapi/linux/vm_sockets.h>
 
 void vsock_addr_init(struct sockaddr_vm *addr, u32 cid, u32 port);
 int vsock_addr_validate(const struct sockaddr_vm *addr);
diff --git a/net/vmw_vsock/vmci_transport_notify.h b/net/vmw_vsock/vmci_transport_notify.h
index 7843f08d4290..a1aa5a998c0e 100644
--- a/net/vmw_vsock/vmci_transport_notify.h
+++ b/net/vmw_vsock/vmci_transport_notify.h
@@ -11,7 +11,6 @@
 #include <linux/types.h>
 #include <linux/vmw_vmci_defs.h>
 #include <linux/vmw_vmci_api.h>
-#include <linux/vm_sockets.h>
 
 #include "vmci_transport.h"
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 04/14] vsock: add 'transport' member in the struct vsock_sock
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (2 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h file Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-30 14:57   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 05/14] vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() Stefano Garzarella
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

As a preparation to support multiple transports, this patch adds
the 'transport' member at the 'struct vsock_sock'.
This new field is initialized during the creation in the
__vsock_create() function.

This patch also renames the global 'transport' pointer to
'transport_single', since for now we're only supporting a single
transport registered at run-time.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/net/af_vsock.h   |  1 +
 net/vmw_vsock/af_vsock.c | 56 +++++++++++++++++++++++++++-------------
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index c660402b10f2..a5e1e134261d 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -27,6 +27,7 @@ extern spinlock_t vsock_table_lock;
 struct vsock_sock {
 	/* sk must be the first member. */
 	struct sock sk;
+	const struct vsock_transport *transport;
 	struct sockaddr_vm local_addr;
 	struct sockaddr_vm remote_addr;
 	/* Links for the global tables of bound and connected sockets. */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2f2582fb7fdd..c3a14f853eb0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -126,7 +126,7 @@ static struct proto vsock_proto = {
  */
 #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
 
-static const struct vsock_transport *transport;
+static const struct vsock_transport *transport_single;
 static DEFINE_MUTEX(vsock_register_mutex);
 
 /**** UTILS ****/
@@ -408,7 +408,9 @@ static bool vsock_is_pending(struct sock *sk)
 
 static int vsock_send_shutdown(struct sock *sk, int mode)
 {
-	return transport->shutdown(vsock_sk(sk), mode);
+	struct vsock_sock *vsk = vsock_sk(sk);
+
+	return vsk->transport->shutdown(vsk, mode);
 }
 
 static void vsock_pending_work(struct work_struct *work)
@@ -518,7 +520,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
 			      struct sockaddr_vm *addr)
 {
-	return transport->dgram_bind(vsk, addr);
+	return vsk->transport->dgram_bind(vsk, addr);
 }
 
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
@@ -536,7 +538,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 	 * like AF_INET prevents binding to a non-local IP address (in most
 	 * cases), we only allow binding to the local CID.
 	 */
-	cid = transport->get_local_cid();
+	cid = vsk->transport->get_local_cid();
 	if (addr->svm_cid != cid && addr->svm_cid != VMADDR_CID_ANY)
 		return -EADDRNOTAVAIL;
 
@@ -586,6 +588,7 @@ struct sock *__vsock_create(struct net *net,
 		sk->sk_type = type;
 
 	vsk = vsock_sk(sk);
+	vsk->transport = transport_single;
 	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
 	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
 
@@ -616,7 +619,7 @@ struct sock *__vsock_create(struct net *net,
 		vsk->connect_timeout = VSOCK_DEFAULT_CONNECT_TIMEOUT;
 	}
 
-	if (transport->init(vsk, psk) < 0) {
+	if (vsk->transport->init(vsk, psk) < 0) {
 		sk_free(sk);
 		return NULL;
 	}
@@ -641,7 +644,7 @@ static void __vsock_release(struct sock *sk, int level)
 		/* The release call is supposed to use lock_sock_nested()
 		 * rather than lock_sock(), if a sock lock should be acquired.
 		 */
-		transport->release(vsk);
+		vsk->transport->release(vsk);
 
 		/* When "level" is SINGLE_DEPTH_NESTING, use the nested
 		 * version to avoid the warning "possible recursive locking
@@ -670,7 +673,7 @@ static void vsock_sk_destruct(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	transport->destruct(vsk);
+	vsk->transport->destruct(vsk);
 
 	/* When clearing these addresses, there's no need to set the family and
 	 * possibly register the address family with the kernel.
@@ -694,13 +697,13 @@ static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 s64 vsock_stream_has_data(struct vsock_sock *vsk)
 {
-	return transport->stream_has_data(vsk);
+	return vsk->transport->stream_has_data(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_data);
 
 s64 vsock_stream_has_space(struct vsock_sock *vsk)
 {
-	return transport->stream_has_space(vsk);
+	return vsk->transport->stream_has_space(vsk);
 }
 EXPORT_SYMBOL_GPL(vsock_stream_has_space);
 
@@ -869,6 +872,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 			mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
 
 	} else if (sock->type == SOCK_STREAM) {
+		const struct vsock_transport *transport = vsk->transport;
 		lock_sock(sk);
 
 		/* Listening sockets that have connections in their accept
@@ -944,6 +948,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	struct sock *sk;
 	struct vsock_sock *vsk;
 	struct sockaddr_vm *remote_addr;
+	const struct vsock_transport *transport;
 
 	if (msg->msg_flags & MSG_OOB)
 		return -EOPNOTSUPP;
@@ -952,6 +957,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	err = 0;
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
+	transport = vsk->transport;
 
 	lock_sock(sk);
 
@@ -1036,8 +1042,8 @@ static int vsock_dgram_connect(struct socket *sock,
 	if (err)
 		goto out;
 
-	if (!transport->dgram_allow(remote_addr->svm_cid,
-				    remote_addr->svm_port)) {
+	if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
+					 remote_addr->svm_port)) {
 		err = -EINVAL;
 		goto out;
 	}
@@ -1053,7 +1059,9 @@ static int vsock_dgram_connect(struct socket *sock,
 static int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
 			       size_t len, int flags)
 {
-	return transport->dgram_dequeue(vsock_sk(sock->sk), msg, len, flags);
+	struct vsock_sock *vsk = vsock_sk(sock->sk);
+
+	return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
 }
 
 static const struct proto_ops vsock_dgram_ops = {
@@ -1079,6 +1087,8 @@ static const struct proto_ops vsock_dgram_ops = {
 
 static int vsock_transport_cancel_pkt(struct vsock_sock *vsk)
 {
+	const struct vsock_transport *transport = vsk->transport;
+
 	if (!transport->cancel_pkt)
 		return -EOPNOTSUPP;
 
@@ -1115,6 +1125,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 	int err;
 	struct sock *sk;
 	struct vsock_sock *vsk;
+	const struct vsock_transport *transport;
 	struct sockaddr_vm *remote_addr;
 	long timeout;
 	DEFINE_WAIT(wait);
@@ -1122,6 +1133,7 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 	err = 0;
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
+	transport = vsk->transport;
 
 	lock_sock(sk);
 
@@ -1365,6 +1377,7 @@ static int vsock_stream_setsockopt(struct socket *sock,
 	int err;
 	struct sock *sk;
 	struct vsock_sock *vsk;
+	const struct vsock_transport *transport;
 	u64 val;
 
 	if (level != AF_VSOCK)
@@ -1385,6 +1398,7 @@ static int vsock_stream_setsockopt(struct socket *sock,
 	err = 0;
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
+	transport = vsk->transport;
 
 	lock_sock(sk);
 
@@ -1442,6 +1456,7 @@ static int vsock_stream_getsockopt(struct socket *sock,
 	int len;
 	struct sock *sk;
 	struct vsock_sock *vsk;
+	const struct vsock_transport *transport;
 	u64 val;
 
 	if (level != AF_VSOCK)
@@ -1465,6 +1480,7 @@ static int vsock_stream_getsockopt(struct socket *sock,
 	err = 0;
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
+	transport = vsk->transport;
 
 	switch (optname) {
 	case SO_VM_SOCKETS_BUFFER_SIZE:
@@ -1509,6 +1525,7 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 {
 	struct sock *sk;
 	struct vsock_sock *vsk;
+	const struct vsock_transport *transport;
 	ssize_t total_written;
 	long timeout;
 	int err;
@@ -1517,6 +1534,7 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
+	transport = vsk->transport;
 	total_written = 0;
 	err = 0;
 
@@ -1648,6 +1666,7 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 {
 	struct sock *sk;
 	struct vsock_sock *vsk;
+	const struct vsock_transport *transport;
 	int err;
 	size_t target;
 	ssize_t copied;
@@ -1658,6 +1677,7 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
+	transport = vsk->transport;
 	err = 0;
 
 	lock_sock(sk);
@@ -1872,7 +1892,7 @@ static long vsock_dev_do_ioctl(struct file *filp,
 
 	switch (cmd) {
 	case IOCTL_VM_SOCKETS_GET_LOCAL_CID:
-		if (put_user(transport->get_local_cid(), p) != 0)
+		if (put_user(transport_single->get_local_cid(), p) != 0)
 			retval = -EFAULT;
 		break;
 
@@ -1919,7 +1939,7 @@ int __vsock_core_init(const struct vsock_transport *t, struct module *owner)
 	if (err)
 		return err;
 
-	if (transport) {
+	if (transport_single) {
 		err = -EBUSY;
 		goto err_busy;
 	}
@@ -1928,7 +1948,7 @@ int __vsock_core_init(const struct vsock_transport *t, struct module *owner)
 	 * unload while there are open sockets.
 	 */
 	vsock_proto.owner = owner;
-	transport = t;
+	transport_single = t;
 
 	vsock_device.minor = MISC_DYNAMIC_MINOR;
 	err = misc_register(&vsock_device);
@@ -1958,7 +1978,7 @@ int __vsock_core_init(const struct vsock_transport *t, struct module *owner)
 err_deregister_misc:
 	misc_deregister(&vsock_device);
 err_reset_transport:
-	transport = NULL;
+	transport_single = NULL;
 err_busy:
 	mutex_unlock(&vsock_register_mutex);
 	return err;
@@ -1975,7 +1995,7 @@ void vsock_core_exit(void)
 
 	/* We do not want the assignment below re-ordered. */
 	mb();
-	transport = NULL;
+	transport_single = NULL;
 
 	mutex_unlock(&vsock_register_mutex);
 }
@@ -1986,7 +2006,7 @@ const struct vsock_transport *vsock_core_get_transport(void)
 	/* vsock_register_mutex not taken since only the transport uses this
 	 * function and only while registered.
 	 */
-	return transport;
+	return transport_single;
 }
 EXPORT_SYMBOL_GPL(vsock_core_get_transport);
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 05/14] vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock()
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (3 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 04/14] vsock: add 'transport' member in the struct vsock_sock Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-23  9:55 ` [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport() Stefano Garzarella
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

We are going to add 'struct vsock_sock *' parameter to
virtio_transport_get_ops().

In some cases, like in the virtio_transport_reset_no_sock(),
we don't have any socket assigned to the packet received,
so we can't use the virtio_transport_get_ops().

In order to allow virtio_transport_reset_no_sock() to use the
'.send_pkt' callback from the 'vhost_transport' or 'virtio_transport',
we add the 'struct virtio_transport *' to it and to its caller:
virtio_transport_recv_pkt().

We moved the 'vhost_transport' and 'virtio_transport' definition,
to pass their address to the virtio_transport_recv_pkt().

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/vhost/vsock.c                   |  94 +++++++-------
 include/linux/virtio_vsock.h            |   3 +-
 net/vmw_vsock/virtio_transport.c        | 160 ++++++++++++------------
 net/vmw_vsock/virtio_transport_common.c |  12 +-
 4 files changed, 135 insertions(+), 134 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 9f57736fe15e..92ab3852c954 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -384,6 +384,52 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
 	return val < vq->num;
 }
 
+static struct virtio_transport vhost_transport = {
+	.transport = {
+		.get_local_cid            = vhost_transport_get_local_cid,
+
+		.init                     = virtio_transport_do_socket_init,
+		.destruct                 = virtio_transport_destruct,
+		.release                  = virtio_transport_release,
+		.connect                  = virtio_transport_connect,
+		.shutdown                 = virtio_transport_shutdown,
+		.cancel_pkt               = vhost_transport_cancel_pkt,
+
+		.dgram_enqueue            = virtio_transport_dgram_enqueue,
+		.dgram_dequeue            = virtio_transport_dgram_dequeue,
+		.dgram_bind               = virtio_transport_dgram_bind,
+		.dgram_allow              = virtio_transport_dgram_allow,
+
+		.stream_enqueue           = virtio_transport_stream_enqueue,
+		.stream_dequeue           = virtio_transport_stream_dequeue,
+		.stream_has_data          = virtio_transport_stream_has_data,
+		.stream_has_space         = virtio_transport_stream_has_space,
+		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
+		.stream_is_active         = virtio_transport_stream_is_active,
+		.stream_allow             = virtio_transport_stream_allow,
+
+		.notify_poll_in           = virtio_transport_notify_poll_in,
+		.notify_poll_out          = virtio_transport_notify_poll_out,
+		.notify_recv_init         = virtio_transport_notify_recv_init,
+		.notify_recv_pre_block    = virtio_transport_notify_recv_pre_block,
+		.notify_recv_pre_dequeue  = virtio_transport_notify_recv_pre_dequeue,
+		.notify_recv_post_dequeue = virtio_transport_notify_recv_post_dequeue,
+		.notify_send_init         = virtio_transport_notify_send_init,
+		.notify_send_pre_block    = virtio_transport_notify_send_pre_block,
+		.notify_send_pre_enqueue  = virtio_transport_notify_send_pre_enqueue,
+		.notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
+
+		.set_buffer_size          = virtio_transport_set_buffer_size,
+		.set_min_buffer_size      = virtio_transport_set_min_buffer_size,
+		.set_max_buffer_size      = virtio_transport_set_max_buffer_size,
+		.get_buffer_size          = virtio_transport_get_buffer_size,
+		.get_min_buffer_size      = virtio_transport_get_min_buffer_size,
+		.get_max_buffer_size      = virtio_transport_get_max_buffer_size,
+	},
+
+	.send_pkt = vhost_transport_send_pkt,
+};
+
 static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 {
 	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
@@ -438,7 +484,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 
 		/* Only accept correctly addressed packets */
 		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
-			virtio_transport_recv_pkt(pkt);
+			virtio_transport_recv_pkt(&vhost_transport, pkt);
 		else
 			virtio_transport_free_pkt(pkt);
 
@@ -786,52 +832,6 @@ static struct miscdevice vhost_vsock_misc = {
 	.fops = &vhost_vsock_fops,
 };
 
-static struct virtio_transport vhost_transport = {
-	.transport = {
-		.get_local_cid            = vhost_transport_get_local_cid,
-
-		.init                     = virtio_transport_do_socket_init,
-		.destruct                 = virtio_transport_destruct,
-		.release                  = virtio_transport_release,
-		.connect                  = virtio_transport_connect,
-		.shutdown                 = virtio_transport_shutdown,
-		.cancel_pkt               = vhost_transport_cancel_pkt,
-
-		.dgram_enqueue            = virtio_transport_dgram_enqueue,
-		.dgram_dequeue            = virtio_transport_dgram_dequeue,
-		.dgram_bind               = virtio_transport_dgram_bind,
-		.dgram_allow              = virtio_transport_dgram_allow,
-
-		.stream_enqueue           = virtio_transport_stream_enqueue,
-		.stream_dequeue           = virtio_transport_stream_dequeue,
-		.stream_has_data          = virtio_transport_stream_has_data,
-		.stream_has_space         = virtio_transport_stream_has_space,
-		.stream_rcvhiwat          = virtio_transport_stream_rcvhiwat,
-		.stream_is_active         = virtio_transport_stream_is_active,
-		.stream_allow             = virtio_transport_stream_allow,
-
-		.notify_poll_in           = virtio_transport_notify_poll_in,
-		.notify_poll_out          = virtio_transport_notify_poll_out,
-		.notify_recv_init         = virtio_transport_notify_recv_init,
-		.notify_recv_pre_block    = virtio_transport_notify_recv_pre_block,
-		.notify_recv_pre_dequeue  = virtio_transport_notify_recv_pre_dequeue,
-		.notify_recv_post_dequeue = virtio_transport_notify_recv_post_dequeue,
-		.notify_send_init         = virtio_transport_notify_send_init,
-		.notify_send_pre_block    = virtio_transport_notify_send_pre_block,
-		.notify_send_pre_enqueue  = virtio_transport_notify_send_pre_enqueue,
-		.notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
-
-		.set_buffer_size          = virtio_transport_set_buffer_size,
-		.set_min_buffer_size      = virtio_transport_set_min_buffer_size,
-		.set_max_buffer_size      = virtio_transport_set_max_buffer_size,
-		.get_buffer_size          = virtio_transport_get_buffer_size,
-		.get_min_buffer_size      = virtio_transport_get_min_buffer_size,
-		.get_max_buffer_size      = virtio_transport_get_max_buffer_size,
-	},
-
-	.send_pkt = vhost_transport_send_pkt,
-};
-
 static int __init vhost_vsock_init(void)
 {
 	int ret;
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 4c7781f4b29b..96d8132acbd7 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -151,7 +151,8 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
 
 void virtio_transport_destruct(struct vsock_sock *vsk);
 
-void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt);
+void virtio_transport_recv_pkt(struct virtio_transport *t,
+			       struct virtio_vsock_pkt *pkt);
 void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt);
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt);
 u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 082a30936690..3756f0857946 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -86,33 +86,6 @@ static u32 virtio_transport_get_local_cid(void)
 	return ret;
 }
 
-static void virtio_transport_loopback_work(struct work_struct *work)
-{
-	struct virtio_vsock *vsock =
-		container_of(work, struct virtio_vsock, loopback_work);
-	LIST_HEAD(pkts);
-
-	spin_lock_bh(&vsock->loopback_list_lock);
-	list_splice_init(&vsock->loopback_list, &pkts);
-	spin_unlock_bh(&vsock->loopback_list_lock);
-
-	mutex_lock(&vsock->rx_lock);
-
-	if (!vsock->rx_run)
-		goto out;
-
-	while (!list_empty(&pkts)) {
-		struct virtio_vsock_pkt *pkt;
-
-		pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
-		list_del_init(&pkt->list);
-
-		virtio_transport_recv_pkt(pkt);
-	}
-out:
-	mutex_unlock(&vsock->rx_lock);
-}
-
 static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
 					      struct virtio_vsock_pkt *pkt)
 {
@@ -370,59 +343,6 @@ static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
 	return val < virtqueue_get_vring_size(vq);
 }
 
-static void virtio_transport_rx_work(struct work_struct *work)
-{
-	struct virtio_vsock *vsock =
-		container_of(work, struct virtio_vsock, rx_work);
-	struct virtqueue *vq;
-
-	vq = vsock->vqs[VSOCK_VQ_RX];
-
-	mutex_lock(&vsock->rx_lock);
-
-	if (!vsock->rx_run)
-		goto out;
-
-	do {
-		virtqueue_disable_cb(vq);
-		for (;;) {
-			struct virtio_vsock_pkt *pkt;
-			unsigned int len;
-
-			if (!virtio_transport_more_replies(vsock)) {
-				/* Stop rx until the device processes already
-				 * pending replies.  Leave rx virtqueue
-				 * callbacks disabled.
-				 */
-				goto out;
-			}
-
-			pkt = virtqueue_get_buf(vq, &len);
-			if (!pkt) {
-				break;
-			}
-
-			vsock->rx_buf_nr--;
-
-			/* Drop short/long packets */
-			if (unlikely(len < sizeof(pkt->hdr) ||
-				     len > sizeof(pkt->hdr) + pkt->len)) {
-				virtio_transport_free_pkt(pkt);
-				continue;
-			}
-
-			pkt->len = len - sizeof(pkt->hdr);
-			virtio_transport_deliver_tap_pkt(pkt);
-			virtio_transport_recv_pkt(pkt);
-		}
-	} while (!virtqueue_enable_cb(vq));
-
-out:
-	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
-		virtio_vsock_rx_fill(vsock);
-	mutex_unlock(&vsock->rx_lock);
-}
-
 /* event_lock must be held */
 static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,
 				       struct virtio_vsock_event *event)
@@ -586,6 +506,86 @@ static struct virtio_transport virtio_transport = {
 	.send_pkt = virtio_transport_send_pkt,
 };
 
+static void virtio_transport_loopback_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, loopback_work);
+	LIST_HEAD(pkts);
+
+	spin_lock_bh(&vsock->loopback_list_lock);
+	list_splice_init(&vsock->loopback_list, &pkts);
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
+	mutex_lock(&vsock->rx_lock);
+
+	if (!vsock->rx_run)
+		goto out;
+
+	while (!list_empty(&pkts)) {
+		struct virtio_vsock_pkt *pkt;
+
+		pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+
+		virtio_transport_recv_pkt(&virtio_transport, pkt);
+	}
+out:
+	mutex_unlock(&vsock->rx_lock);
+}
+
+static void virtio_transport_rx_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, rx_work);
+	struct virtqueue *vq;
+
+	vq = vsock->vqs[VSOCK_VQ_RX];
+
+	mutex_lock(&vsock->rx_lock);
+
+	if (!vsock->rx_run)
+		goto out;
+
+	do {
+		virtqueue_disable_cb(vq);
+		for (;;) {
+			struct virtio_vsock_pkt *pkt;
+			unsigned int len;
+
+			if (!virtio_transport_more_replies(vsock)) {
+				/* Stop rx until the device processes already
+				 * pending replies.  Leave rx virtqueue
+				 * callbacks disabled.
+				 */
+				goto out;
+			}
+
+			pkt = virtqueue_get_buf(vq, &len);
+			if (!pkt) {
+				break;
+			}
+
+			vsock->rx_buf_nr--;
+
+			/* Drop short/long packets */
+			if (unlikely(len < sizeof(pkt->hdr) ||
+				     len > sizeof(pkt->hdr) + pkt->len)) {
+				virtio_transport_free_pkt(pkt);
+				continue;
+			}
+
+			pkt->len = len - sizeof(pkt->hdr);
+			virtio_transport_deliver_tap_pkt(pkt);
+			virtio_transport_recv_pkt(&virtio_transport, pkt);
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+out:
+	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
+		virtio_vsock_rx_fill(vsock);
+	mutex_unlock(&vsock->rx_lock);
+}
+
 static int virtio_vsock_probe(struct virtio_device *vdev)
 {
 	vq_callback_t *callbacks[] = {
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b1cd16ed66ea..9763394f7a61 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -745,9 +745,9 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 /* Normally packets are associated with a socket.  There may be no socket if an
  * attempt was made to connect to a socket that does not exist.
  */
-static int virtio_transport_reset_no_sock(struct virtio_vsock_pkt *pkt)
+static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
+					  struct virtio_vsock_pkt *pkt)
 {
-	const struct virtio_transport *t;
 	struct virtio_vsock_pkt *reply;
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
@@ -767,7 +767,6 @@ static int virtio_transport_reset_no_sock(struct virtio_vsock_pkt *pkt)
 	if (!reply)
 		return -ENOMEM;
 
-	t = virtio_transport_get_ops();
 	if (!t) {
 		virtio_transport_free_pkt(reply);
 		return -ENOTCONN;
@@ -1107,7 +1106,8 @@ static bool virtio_transport_space_update(struct sock *sk,
 /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
  * lock.
  */
-void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
+void virtio_transport_recv_pkt(struct virtio_transport *t,
+			       struct virtio_vsock_pkt *pkt)
 {
 	struct sockaddr_vm src, dst;
 	struct vsock_sock *vsk;
@@ -1129,7 +1129,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
 					le32_to_cpu(pkt->hdr.fwd_cnt));
 
 	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
-		(void)virtio_transport_reset_no_sock(pkt);
+		(void)virtio_transport_reset_no_sock(t, pkt);
 		goto free_pkt;
 	}
 
@@ -1140,7 +1140,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
 	if (!sk) {
 		sk = vsock_find_bound_socket(&dst);
 		if (!sk) {
-			(void)virtio_transport_reset_no_sock(pkt);
+			(void)virtio_transport_reset_no_sock(t, pkt);
 			goto free_pkt;
 		}
 	}
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport()
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (4 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 05/14] vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-30 15:01   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core Stefano Garzarella
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

Since now the 'struct vsock_sock' object contains a pointer to
the transport, this patch adds a parameter to the
vsock_core_get_transport() to return the right transport
assigned to the socket.

This patch modifies also the virtio_transport_get_ops(), that
uses the vsock_core_get_transport(), adding the
'struct vsock_sock *' parameter.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
RFC -> v1:
- Removed comment about protecting transport_single (Stefan)
---
 include/net/af_vsock.h                  | 2 +-
 net/vmw_vsock/af_vsock.c                | 7 ++-----
 net/vmw_vsock/virtio_transport_common.c | 9 +++++----
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index a5e1e134261d..2ca67d048de4 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -166,7 +166,7 @@ static inline int vsock_core_init(const struct vsock_transport *t)
 void vsock_core_exit(void);
 
 /* The transport may downcast this to access transport-specific functions */
-const struct vsock_transport *vsock_core_get_transport(void);
+const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk);
 
 /**** UTILS ****/
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index c3a14f853eb0..eaea159006c8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2001,12 +2001,9 @@ void vsock_core_exit(void)
 }
 EXPORT_SYMBOL_GPL(vsock_core_exit);
 
-const struct vsock_transport *vsock_core_get_transport(void)
+const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
 {
-	/* vsock_register_mutex not taken since only the transport uses this
-	 * function and only while registered.
-	 */
-	return transport_single;
+	return vsk->transport;
 }
 EXPORT_SYMBOL_GPL(vsock_core_get_transport);
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 9763394f7a61..37a1c7e7c7fe 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -29,9 +29,10 @@
 /* Threshold for detecting small packets to copy */
 #define GOOD_COPY_LEN  128
 
-static const struct virtio_transport *virtio_transport_get_ops(void)
+static const struct virtio_transport *
+virtio_transport_get_ops(struct vsock_sock *vsk)
 {
-	const struct vsock_transport *t = vsock_core_get_transport();
+	const struct vsock_transport *t = vsock_core_get_transport(vsk);
 
 	return container_of(t, struct virtio_transport, transport);
 }
@@ -168,7 +169,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	struct virtio_vsock_pkt *pkt;
 	u32 pkt_len = info->pkt_len;
 
-	src_cid = virtio_transport_get_ops()->transport.get_local_cid();
+	src_cid = virtio_transport_get_ops(vsk)->transport.get_local_cid();
 	src_port = vsk->local_addr.svm_port;
 	if (!info->remote_cid) {
 		dst_cid	= vsk->remote_addr.svm_cid;
@@ -201,7 +202,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 
 	virtio_transport_inc_tx_pkt(vvs, pkt);
 
-	return virtio_transport_get_ops()->send_pkt(pkt);
+	return virtio_transport_get_ops(vsk)->send_pkt(pkt);
 }
 
 static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (5 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport() Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-27  8:08   ` Stefan Hajnoczi
  2019-10-30 15:08   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports Stefano Garzarella
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

virtio_transport and vmci_transport handle the buffer_size
sockopts in a very similar way.

In order to support multiple transports, this patch moves this
handling in the core to allow the user to change the options
also if the socket is not yet assigned to any transport.

This patch also adds the '.notify_buffer_size' callback in the
'struct virtio_transport' in order to inform the transport,
when the buffer_size is changed by the user. It is also useful
to limit the 'buffer_size' requested (e.g. virtio transports).

Acked-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
RFC -> v1:
- changed .notify_buffer_size return to void (Stefan)
- documented that .notify_buffer_size is called with sk_lock held (Stefan)
---
 drivers/vhost/vsock.c                   |  7 +-
 include/linux/virtio_vsock.h            | 15 +----
 include/net/af_vsock.h                  | 15 ++---
 net/vmw_vsock/af_vsock.c                | 43 ++++++++++---
 net/vmw_vsock/hyperv_transport.c        | 36 -----------
 net/vmw_vsock/virtio_transport.c        |  8 +--
 net/vmw_vsock/virtio_transport_common.c | 79 ++++-------------------
 net/vmw_vsock/vmci_transport.c          | 86 +++----------------------
 net/vmw_vsock/vmci_transport.h          |  3 -
 9 files changed, 65 insertions(+), 227 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 92ab3852c954..6d7e4f022748 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -418,13 +418,8 @@ static struct virtio_transport vhost_transport = {
 		.notify_send_pre_block    = virtio_transport_notify_send_pre_block,
 		.notify_send_pre_enqueue  = virtio_transport_notify_send_pre_enqueue,
 		.notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
+		.notify_buffer_size       = virtio_transport_notify_buffer_size,
 
-		.set_buffer_size          = virtio_transport_set_buffer_size,
-		.set_min_buffer_size      = virtio_transport_set_min_buffer_size,
-		.set_max_buffer_size      = virtio_transport_set_max_buffer_size,
-		.get_buffer_size          = virtio_transport_get_buffer_size,
-		.get_min_buffer_size      = virtio_transport_get_min_buffer_size,
-		.get_max_buffer_size      = virtio_transport_get_max_buffer_size,
 	},
 
 	.send_pkt = vhost_transport_send_pkt,
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 96d8132acbd7..b79befd2a5a4 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -7,9 +7,6 @@
 #include <net/sock.h>
 #include <net/af_vsock.h>
 
-#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
-#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
-#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
 #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
 #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
 #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
@@ -25,11 +22,6 @@ enum {
 struct virtio_vsock_sock {
 	struct vsock_sock *vsk;
 
-	/* Protected by lock_sock(sk_vsock(trans->vsk)) */
-	u32 buf_size;
-	u32 buf_size_min;
-	u32 buf_size_max;
-
 	spinlock_t tx_lock;
 	spinlock_t rx_lock;
 
@@ -93,12 +85,6 @@ s64 virtio_transport_stream_has_space(struct vsock_sock *vsk);
 
 int virtio_transport_do_socket_init(struct vsock_sock *vsk,
 				 struct vsock_sock *psk);
-u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk);
-u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk);
-u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk);
-void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val);
-void virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val);
-void virtio_transport_set_max_buffer_size(struct vsock_sock *vs, u64 val);
 int
 virtio_transport_notify_poll_in(struct vsock_sock *vsk,
 				size_t target,
@@ -125,6 +111,7 @@ int virtio_transport_notify_send_pre_enqueue(struct vsock_sock *vsk,
 	struct vsock_transport_send_notify_data *data);
 int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
 	ssize_t written, struct vsock_transport_send_notify_data *data);
+void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val);
 
 u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);
 bool virtio_transport_stream_is_active(struct vsock_sock *vsk);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 2ca67d048de4..4b5d16840fd4 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -65,6 +65,11 @@ struct vsock_sock {
 	bool sent_request;
 	bool ignore_connecting_rst;
 
+	/* Protected by lock_sock(sk) */
+	u64 buffer_size;
+	u64 buffer_min_size;
+	u64 buffer_max_size;
+
 	/* Private to transport. */
 	void *trans;
 };
@@ -140,18 +145,12 @@ struct vsock_transport {
 		struct vsock_transport_send_notify_data *);
 	int (*notify_send_post_enqueue)(struct vsock_sock *, ssize_t,
 		struct vsock_transport_send_notify_data *);
+	/* sk_lock held by the caller */
+	void (*notify_buffer_size)(struct vsock_sock *, u64 *);
 
 	/* Shutdown. */
 	int (*shutdown)(struct vsock_sock *, int);
 
-	/* Buffer sizes. */
-	void (*set_buffer_size)(struct vsock_sock *, u64);
-	void (*set_min_buffer_size)(struct vsock_sock *, u64);
-	void (*set_max_buffer_size)(struct vsock_sock *, u64);
-	u64 (*get_buffer_size)(struct vsock_sock *);
-	u64 (*get_min_buffer_size)(struct vsock_sock *);
-	u64 (*get_max_buffer_size)(struct vsock_sock *);
-
 	/* Addressing. */
 	u32 (*get_local_cid)(void);
 };
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index eaea159006c8..90ac46ea12ef 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -126,6 +126,10 @@ static struct proto vsock_proto = {
  */
 #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
 
+#define VSOCK_DEFAULT_BUFFER_SIZE     (1024 * 256)
+#define VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)
+#define VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
+
 static const struct vsock_transport *transport_single;
 static DEFINE_MUTEX(vsock_register_mutex);
 
@@ -613,10 +617,16 @@ struct sock *__vsock_create(struct net *net,
 		vsk->trusted = psk->trusted;
 		vsk->owner = get_cred(psk->owner);
 		vsk->connect_timeout = psk->connect_timeout;
+		vsk->buffer_size = psk->buffer_size;
+		vsk->buffer_min_size = psk->buffer_min_size;
+		vsk->buffer_max_size = psk->buffer_max_size;
 	} else {
 		vsk->trusted = capable(CAP_NET_ADMIN);
 		vsk->owner = get_current_cred();
 		vsk->connect_timeout = VSOCK_DEFAULT_CONNECT_TIMEOUT;
+		vsk->buffer_size = VSOCK_DEFAULT_BUFFER_SIZE;
+		vsk->buffer_min_size = VSOCK_DEFAULT_BUFFER_MIN_SIZE;
+		vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE;
 	}
 
 	if (vsk->transport->init(vsk, psk) < 0) {
@@ -1368,6 +1378,23 @@ static int vsock_listen(struct socket *sock, int backlog)
 	return err;
 }
 
+static void vsock_update_buffer_size(struct vsock_sock *vsk,
+				     const struct vsock_transport *transport,
+				     u64 val)
+{
+	if (val > vsk->buffer_max_size)
+		val = vsk->buffer_max_size;
+
+	if (val < vsk->buffer_min_size)
+		val = vsk->buffer_min_size;
+
+	if (val != vsk->buffer_size &&
+	    transport && transport->notify_buffer_size)
+		transport->notify_buffer_size(vsk, &val);
+
+	vsk->buffer_size = val;
+}
+
 static int vsock_stream_setsockopt(struct socket *sock,
 				   int level,
 				   int optname,
@@ -1405,17 +1432,19 @@ static int vsock_stream_setsockopt(struct socket *sock,
 	switch (optname) {
 	case SO_VM_SOCKETS_BUFFER_SIZE:
 		COPY_IN(val);
-		transport->set_buffer_size(vsk, val);
+		vsock_update_buffer_size(vsk, transport, val);
 		break;
 
 	case SO_VM_SOCKETS_BUFFER_MAX_SIZE:
 		COPY_IN(val);
-		transport->set_max_buffer_size(vsk, val);
+		vsk->buffer_max_size = val;
+		vsock_update_buffer_size(vsk, transport, vsk->buffer_size);
 		break;
 
 	case SO_VM_SOCKETS_BUFFER_MIN_SIZE:
 		COPY_IN(val);
-		transport->set_min_buffer_size(vsk, val);
+		vsk->buffer_min_size = val;
+		vsock_update_buffer_size(vsk, transport, vsk->buffer_size);
 		break;
 
 	case SO_VM_SOCKETS_CONNECT_TIMEOUT: {
@@ -1456,7 +1485,6 @@ static int vsock_stream_getsockopt(struct socket *sock,
 	int len;
 	struct sock *sk;
 	struct vsock_sock *vsk;
-	const struct vsock_transport *transport;
 	u64 val;
 
 	if (level != AF_VSOCK)
@@ -1480,21 +1508,20 @@ static int vsock_stream_getsockopt(struct socket *sock,
 	err = 0;
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
-	transport = vsk->transport;
 
 	switch (optname) {
 	case SO_VM_SOCKETS_BUFFER_SIZE:
-		val = transport->get_buffer_size(vsk);
+		val = vsk->buffer_size;
 		COPY_OUT(val);
 		break;
 
 	case SO_VM_SOCKETS_BUFFER_MAX_SIZE:
-		val = transport->get_max_buffer_size(vsk);
+		val = vsk->buffer_max_size;
 		COPY_OUT(val);
 		break;
 
 	case SO_VM_SOCKETS_BUFFER_MIN_SIZE:
-		val = transport->get_min_buffer_size(vsk);
+		val = vsk->buffer_min_size;
 		COPY_OUT(val);
 		break;
 
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index bef8772116ec..d62297a62ca6 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -845,36 +845,6 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
 	return 0;
 }
 
-static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	/* Ignored. */
-}
-
-static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	/* Ignored. */
-}
-
-static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	/* Ignored. */
-}
-
-static u64 hvs_get_buffer_size(struct vsock_sock *vsk)
-{
-	return -ENOPROTOOPT;
-}
-
-static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk)
-{
-	return -ENOPROTOOPT;
-}
-
-static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk)
-{
-	return -ENOPROTOOPT;
-}
-
 static struct vsock_transport hvs_transport = {
 	.get_local_cid            = hvs_get_local_cid,
 
@@ -908,12 +878,6 @@ static struct vsock_transport hvs_transport = {
 	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
 	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
 
-	.set_buffer_size          = hvs_set_buffer_size,
-	.set_min_buffer_size      = hvs_set_min_buffer_size,
-	.set_max_buffer_size      = hvs_set_max_buffer_size,
-	.get_buffer_size          = hvs_get_buffer_size,
-	.get_min_buffer_size      = hvs_get_min_buffer_size,
-	.get_max_buffer_size      = hvs_get_max_buffer_size,
 };
 
 static int hvs_probe(struct hv_device *hdev,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 3756f0857946..fb1fc7760e8c 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -494,13 +494,7 @@ static struct virtio_transport virtio_transport = {
 		.notify_send_pre_block    = virtio_transport_notify_send_pre_block,
 		.notify_send_pre_enqueue  = virtio_transport_notify_send_pre_enqueue,
 		.notify_send_post_enqueue = virtio_transport_notify_send_post_enqueue,
-
-		.set_buffer_size          = virtio_transport_set_buffer_size,
-		.set_min_buffer_size      = virtio_transport_set_min_buffer_size,
-		.set_max_buffer_size      = virtio_transport_set_max_buffer_size,
-		.get_buffer_size          = virtio_transport_get_buffer_size,
-		.get_min_buffer_size      = virtio_transport_get_min_buffer_size,
-		.get_max_buffer_size      = virtio_transport_get_max_buffer_size,
+		.notify_buffer_size       = virtio_transport_notify_buffer_size,
 	},
 
 	.send_pkt = virtio_transport_send_pkt,
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 37a1c7e7c7fe..b2a310dfa158 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -456,17 +456,13 @@ int virtio_transport_do_socket_init(struct vsock_sock *vsk,
 	if (psk) {
 		struct virtio_vsock_sock *ptrans = psk->trans;
 
-		vvs->buf_size	= ptrans->buf_size;
-		vvs->buf_size_min = ptrans->buf_size_min;
-		vvs->buf_size_max = ptrans->buf_size_max;
 		vvs->peer_buf_alloc = ptrans->peer_buf_alloc;
-	} else {
-		vvs->buf_size = VIRTIO_VSOCK_DEFAULT_BUF_SIZE;
-		vvs->buf_size_min = VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE;
-		vvs->buf_size_max = VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE;
 	}
 
-	vvs->buf_alloc = vvs->buf_size;
+	if (vsk->buffer_size > VIRTIO_VSOCK_MAX_BUF_SIZE)
+		vsk->buffer_size = VIRTIO_VSOCK_MAX_BUF_SIZE;
+
+	vvs->buf_alloc = vsk->buffer_size;
 
 	spin_lock_init(&vvs->rx_lock);
 	spin_lock_init(&vvs->tx_lock);
@@ -476,71 +472,20 @@ int virtio_transport_do_socket_init(struct vsock_sock *vsk,
 }
 EXPORT_SYMBOL_GPL(virtio_transport_do_socket_init);
 
-u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk)
-{
-	struct virtio_vsock_sock *vvs = vsk->trans;
-
-	return vvs->buf_size;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_get_buffer_size);
-
-u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk)
+/* sk_lock held by the caller */
+void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
 
-	return vvs->buf_size_min;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_get_min_buffer_size);
-
-u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk)
-{
-	struct virtio_vsock_sock *vvs = vsk->trans;
-
-	return vvs->buf_size_max;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_get_max_buffer_size);
-
-void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	struct virtio_vsock_sock *vvs = vsk->trans;
+	if (*val > VIRTIO_VSOCK_MAX_BUF_SIZE)
+		*val = VIRTIO_VSOCK_MAX_BUF_SIZE;
 
-	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
-		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
-	if (val < vvs->buf_size_min)
-		vvs->buf_size_min = val;
-	if (val > vvs->buf_size_max)
-		vvs->buf_size_max = val;
-	vvs->buf_size = val;
-	vvs->buf_alloc = val;
+	vvs->buf_alloc = *val;
 
 	virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
 					    NULL);
 }
-EXPORT_SYMBOL_GPL(virtio_transport_set_buffer_size);
-
-void virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	struct virtio_vsock_sock *vvs = vsk->trans;
-
-	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
-		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
-	if (val > vvs->buf_size)
-		vvs->buf_size = val;
-	vvs->buf_size_min = val;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_set_min_buffer_size);
-
-void virtio_transport_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	struct virtio_vsock_sock *vvs = vsk->trans;
-
-	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
-		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
-	if (val < vvs->buf_size)
-		vvs->buf_size = val;
-	vvs->buf_size_max = val;
-}
-EXPORT_SYMBOL_GPL(virtio_transport_set_max_buffer_size);
+EXPORT_SYMBOL_GPL(virtio_transport_notify_buffer_size);
 
 int
 virtio_transport_notify_poll_in(struct vsock_sock *vsk,
@@ -632,9 +577,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_notify_send_post_enqueue);
 
 u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk)
 {
-	struct virtio_vsock_sock *vvs = vsk->trans;
-
-	return vvs->buf_size;
+	return vsk->buffer_size;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_stream_rcvhiwat);
 
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index f8e3131ac480..8290d37b6587 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -74,10 +74,6 @@ static u32 vmci_transport_qp_resumed_sub_id = VMCI_INVALID_ID;
 
 static int PROTOCOL_OVERRIDE = -1;
 
-#define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MIN   128
-#define VMCI_TRANSPORT_DEFAULT_QP_SIZE       262144
-#define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX   262144
-
 /* Helper function to convert from a VMCI error code to a VSock error code. */
 
 static s32 vmci_transport_error_to_vsock_error(s32 vmci_error)
@@ -1025,11 +1021,11 @@ static int vmci_transport_recv_listen(struct sock *sk,
 	/* If the proposed size fits within our min/max, accept it. Otherwise
 	 * propose our own size.
 	 */
-	if (pkt->u.size >= vmci_trans(vpending)->queue_pair_min_size &&
-	    pkt->u.size <= vmci_trans(vpending)->queue_pair_max_size) {
+	if (pkt->u.size >= vpending->buffer_min_size &&
+	    pkt->u.size <= vpending->buffer_max_size) {
 		qp_size = pkt->u.size;
 	} else {
-		qp_size = vmci_trans(vpending)->queue_pair_size;
+		qp_size = vpending->buffer_size;
 	}
 
 	/* Figure out if we are using old or new requests based on the
@@ -1098,7 +1094,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
 	pending->sk_state = TCP_SYN_SENT;
 	vmci_trans(vpending)->produce_size =
 		vmci_trans(vpending)->consume_size = qp_size;
-	vmci_trans(vpending)->queue_pair_size = qp_size;
+	vpending->buffer_size = qp_size;
 
 	vmci_trans(vpending)->notify_ops->process_request(pending);
 
@@ -1392,8 +1388,8 @@ static int vmci_transport_recv_connecting_client_negotiate(
 	vsk->ignore_connecting_rst = false;
 
 	/* Verify that we're OK with the proposed queue pair size */
-	if (pkt->u.size < vmci_trans(vsk)->queue_pair_min_size ||
-	    pkt->u.size > vmci_trans(vsk)->queue_pair_max_size) {
+	if (pkt->u.size < vsk->buffer_min_size ||
+	    pkt->u.size > vsk->buffer_max_size) {
 		err = -EINVAL;
 		goto destroy;
 	}
@@ -1498,8 +1494,7 @@ vmci_transport_recv_connecting_client_invalid(struct sock *sk,
 		vsk->sent_request = false;
 		vsk->ignore_connecting_rst = true;
 
-		err = vmci_transport_send_conn_request(
-			sk, vmci_trans(vsk)->queue_pair_size);
+		err = vmci_transport_send_conn_request(sk, vsk->buffer_size);
 		if (err < 0)
 			err = vmci_transport_error_to_vsock_error(err);
 		else
@@ -1583,21 +1578,6 @@ static int vmci_transport_socket_init(struct vsock_sock *vsk,
 	INIT_LIST_HEAD(&vmci_trans(vsk)->elem);
 	vmci_trans(vsk)->sk = &vsk->sk;
 	spin_lock_init(&vmci_trans(vsk)->lock);
-	if (psk) {
-		vmci_trans(vsk)->queue_pair_size =
-			vmci_trans(psk)->queue_pair_size;
-		vmci_trans(vsk)->queue_pair_min_size =
-			vmci_trans(psk)->queue_pair_min_size;
-		vmci_trans(vsk)->queue_pair_max_size =
-			vmci_trans(psk)->queue_pair_max_size;
-	} else {
-		vmci_trans(vsk)->queue_pair_size =
-			VMCI_TRANSPORT_DEFAULT_QP_SIZE;
-		vmci_trans(vsk)->queue_pair_min_size =
-			 VMCI_TRANSPORT_DEFAULT_QP_SIZE_MIN;
-		vmci_trans(vsk)->queue_pair_max_size =
-			VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX;
-	}
 
 	return 0;
 }
@@ -1813,8 +1793,7 @@ static int vmci_transport_connect(struct vsock_sock *vsk)
 
 	if (vmci_transport_old_proto_override(&old_pkt_proto) &&
 		old_pkt_proto) {
-		err = vmci_transport_send_conn_request(
-			sk, vmci_trans(vsk)->queue_pair_size);
+		err = vmci_transport_send_conn_request(sk, vsk->buffer_size);
 		if (err < 0) {
 			sk->sk_state = TCP_CLOSE;
 			return err;
@@ -1822,8 +1801,7 @@ static int vmci_transport_connect(struct vsock_sock *vsk)
 	} else {
 		int supported_proto_versions =
 			vmci_transport_new_proto_supported_versions();
-		err = vmci_transport_send_conn_request2(
-				sk, vmci_trans(vsk)->queue_pair_size,
+		err = vmci_transport_send_conn_request2(sk, vsk->buffer_size,
 				supported_proto_versions);
 		if (err < 0) {
 			sk->sk_state = TCP_CLOSE;
@@ -1876,46 +1854,6 @@ static bool vmci_transport_stream_is_active(struct vsock_sock *vsk)
 	return !vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle);
 }
 
-static u64 vmci_transport_get_buffer_size(struct vsock_sock *vsk)
-{
-	return vmci_trans(vsk)->queue_pair_size;
-}
-
-static u64 vmci_transport_get_min_buffer_size(struct vsock_sock *vsk)
-{
-	return vmci_trans(vsk)->queue_pair_min_size;
-}
-
-static u64 vmci_transport_get_max_buffer_size(struct vsock_sock *vsk)
-{
-	return vmci_trans(vsk)->queue_pair_max_size;
-}
-
-static void vmci_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
-{
-	if (val < vmci_trans(vsk)->queue_pair_min_size)
-		vmci_trans(vsk)->queue_pair_min_size = val;
-	if (val > vmci_trans(vsk)->queue_pair_max_size)
-		vmci_trans(vsk)->queue_pair_max_size = val;
-	vmci_trans(vsk)->queue_pair_size = val;
-}
-
-static void vmci_transport_set_min_buffer_size(struct vsock_sock *vsk,
-					       u64 val)
-{
-	if (val > vmci_trans(vsk)->queue_pair_size)
-		vmci_trans(vsk)->queue_pair_size = val;
-	vmci_trans(vsk)->queue_pair_min_size = val;
-}
-
-static void vmci_transport_set_max_buffer_size(struct vsock_sock *vsk,
-					       u64 val)
-{
-	if (val < vmci_trans(vsk)->queue_pair_size)
-		vmci_trans(vsk)->queue_pair_size = val;
-	vmci_trans(vsk)->queue_pair_max_size = val;
-}
-
 static int vmci_transport_notify_poll_in(
 	struct vsock_sock *vsk,
 	size_t target,
@@ -2098,12 +2036,6 @@ static const struct vsock_transport vmci_transport = {
 	.notify_send_pre_enqueue = vmci_transport_notify_send_pre_enqueue,
 	.notify_send_post_enqueue = vmci_transport_notify_send_post_enqueue,
 	.shutdown = vmci_transport_shutdown,
-	.set_buffer_size = vmci_transport_set_buffer_size,
-	.set_min_buffer_size = vmci_transport_set_min_buffer_size,
-	.set_max_buffer_size = vmci_transport_set_max_buffer_size,
-	.get_buffer_size = vmci_transport_get_buffer_size,
-	.get_min_buffer_size = vmci_transport_get_min_buffer_size,
-	.get_max_buffer_size = vmci_transport_get_max_buffer_size,
 	.get_local_cid = vmci_transport_get_local_cid,
 };
 
diff --git a/net/vmw_vsock/vmci_transport.h b/net/vmw_vsock/vmci_transport.h
index 1ca1e8640b31..b7b072194282 100644
--- a/net/vmw_vsock/vmci_transport.h
+++ b/net/vmw_vsock/vmci_transport.h
@@ -108,9 +108,6 @@ struct vmci_transport {
 	struct vmci_qp *qpair;
 	u64 produce_size;
 	u64 consume_size;
-	u64 queue_pair_size;
-	u64 queue_pair_min_size;
-	u64 queue_pair_max_size;
 	u32 detach_sub_id;
 	union vmci_transport_notify notify;
 	const struct vmci_transport_notify_ops *notify_ops;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (6 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-27  8:12   ` Stefan Hajnoczi
  2019-10-30 15:12   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the vsock_create() Stefano Garzarella
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

All transports call __vsock_create() with the same parameters,
most of them depending on the parent socket. In order to simplify
the VSOCK core APIs exposed to the transports, this patch adds
the vsock_create_connected() callable from transports to create
a new socket when a connection request is received.
We also unexported the __vsock_create().

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 include/net/af_vsock.h                  |  5 +----
 net/vmw_vsock/af_vsock.c                | 20 +++++++++++++-------
 net/vmw_vsock/hyperv_transport.c        |  3 +--
 net/vmw_vsock/virtio_transport_common.c |  3 +--
 net/vmw_vsock/vmci_transport.c          |  3 +--
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 4b5d16840fd4..fa1570dc9f5c 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -76,10 +76,7 @@ struct vsock_sock {
 
 s64 vsock_stream_has_data(struct vsock_sock *vsk);
 s64 vsock_stream_has_space(struct vsock_sock *vsk);
-struct sock *__vsock_create(struct net *net,
-			    struct socket *sock,
-			    struct sock *parent,
-			    gfp_t priority, unsigned short type, int kern);
+struct sock *vsock_create_connected(struct sock *parent);
 
 /**** TRANSPORT ****/
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 90ac46ea12ef..95878bed2c67 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -567,12 +567,12 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 
 static void vsock_connect_timeout(struct work_struct *work);
 
-struct sock *__vsock_create(struct net *net,
-			    struct socket *sock,
-			    struct sock *parent,
-			    gfp_t priority,
-			    unsigned short type,
-			    int kern)
+static struct sock *__vsock_create(struct net *net,
+				   struct socket *sock,
+				   struct sock *parent,
+				   gfp_t priority,
+				   unsigned short type,
+				   int kern)
 {
 	struct sock *sk;
 	struct vsock_sock *psk;
@@ -639,7 +639,6 @@ struct sock *__vsock_create(struct net *net,
 
 	return sk;
 }
-EXPORT_SYMBOL_GPL(__vsock_create);
 
 static void __vsock_release(struct sock *sk, int level)
 {
@@ -705,6 +704,13 @@ static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	return err;
 }
 
+struct sock *vsock_create_connected(struct sock *parent)
+{
+	return __vsock_create(sock_net(parent), NULL, parent, GFP_KERNEL,
+			      parent->sk_type, 0);
+}
+EXPORT_SYMBOL_GPL(vsock_create_connected);
+
 s64 vsock_stream_has_data(struct vsock_sock *vsk)
 {
 	return vsk->transport->stream_has_data(vsk);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index d62297a62ca6..0ce792a1bf6c 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -360,8 +360,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
 			goto out;
 
-		new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
-				     sk->sk_type, 0);
+		new = vsock_create_connected(sk);
 		if (!new)
 			goto out;
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b2a310dfa158..f7d0ecbd8f97 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1002,8 +1002,7 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt)
 		return -ENOMEM;
 	}
 
-	child = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
-			       sk->sk_type, 0);
+	child = vsock_create_connected(sk);
 	if (!child) {
 		virtio_transport_reset(vsk, pkt);
 		return -ENOMEM;
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 8290d37b6587..5955238ffc13 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1004,8 +1004,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
 		return -ECONNREFUSED;
 	}
 
-	pending = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
-				 sk->sk_type, 0);
+	pending = vsock_create_connected(sk);
 	if (!pending) {
 		vmci_transport_send_reset(sk, pkt);
 		return -ENOMEM;
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the vsock_create()
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (7 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-30 15:12   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 10/14] hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init() Stefano Garzarella
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

vsock_insert_unbound() was called only when 'sock' parameter of
__vsock_create() was not null. This only happened when
__vsock_create() was called by vsock_create().

In order to simplify the multi-transports support, this patch
moves vsock_insert_unbound() at the end of vsock_create().

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/af_vsock.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 95878bed2c67..d89381166028 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -634,9 +634,6 @@ static struct sock *__vsock_create(struct net *net,
 		return NULL;
 	}
 
-	if (sock)
-		vsock_insert_unbound(vsk);
-
 	return sk;
 }
 
@@ -1889,6 +1886,8 @@ static const struct proto_ops vsock_stream_ops = {
 static int vsock_create(struct net *net, struct socket *sock,
 			int protocol, int kern)
 {
+	struct sock *sk;
+
 	if (!sock)
 		return -EINVAL;
 
@@ -1908,7 +1907,13 @@ static int vsock_create(struct net *net, struct socket *sock,
 
 	sock->state = SS_UNCONNECTED;
 
-	return __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern) ? 0 : -ENOMEM;
+	sk = __vsock_create(net, sock, NULL, GFP_KERNEL, 0, kern);
+	if (!sk)
+		return -ENOMEM;
+
+	vsock_insert_unbound(vsock_sk(sk));
+
+	return 0;
 }
 
 static const struct net_proto_family vsock_family_ops = {
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 10/14] hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init()
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (8 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the vsock_create() Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-23  9:55 ` [PATCH net-next 11/14] vsock: add multi-transports support Stefano Garzarella
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

Remote peer is always the host, so we set VMADDR_CID_HOST as
remote CID instead of VMADDR_CID_ANY.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/hyperv_transport.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 0ce792a1bf6c..fc7e61765a4a 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -188,7 +188,8 @@ static void hvs_remote_addr_init(struct sockaddr_vm *remote,
 	static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
 	struct sock *sk;
 
-	vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+	/* Remote peer is always the host */
+	vsock_addr_init(remote, VMADDR_CID_HOST, VMADDR_PORT_ANY);
 
 	while (1) {
 		/* Wrap around ? */
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 11/14] vsock: add multi-transports support
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (9 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 10/14] hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init() Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-23 15:08   ` Stefano Garzarella
  2019-11-11 13:53   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active Stefano Garzarella
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

This patch adds the support of multiple transports in the
VSOCK core.

With the multi-transports support, we can use vsock with nested VMs
(using also different hypervisors) loading both guest->host and
host->guest transports at the same time.

Major changes:
- vsock core module can be loaded regardless of the transports
- vsock_core_init() and vsock_core_exit() are renamed to
  vsock_core_register() and vsock_core_unregister()
- vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
  to identify which directions the transport can handle and if it's
  support DGRAM (only vmci)
- each stream socket is assigned to a transport when the remote CID
  is set (during the connect() or when we receive a connection request
  on a listener socket).
  The remote CID is used to decide which transport to use:
  - remote CID > VMADDR_CID_HOST will use host->guest transport
  - remote CID <= VMADDR_CID_HOST will use guest->host transport
- listener sockets are not bound to any transports since no transport
  operations are done on it. In this way we can create a listener
  socket, also if the transports are not loaded or with VMADDR_CID_ANY
  to listen on all transports.
- DGRAM sockets are handled as before, since only the vmci_transport
  provides this feature.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
RFC -> v1:
- documented VSOCK_TRANSPORT_F_* flags
- fixed vsock_assign_transport() when the socket is already assigned
  (e.g connection failed)
- moved features outside of struct vsock_transport, and used as
  parameter of vsock_core_register()
---
 drivers/vhost/vsock.c                   |   5 +-
 include/net/af_vsock.h                  |  17 +-
 net/vmw_vsock/af_vsock.c                | 237 ++++++++++++++++++------
 net/vmw_vsock/hyperv_transport.c        |  26 ++-
 net/vmw_vsock/virtio_transport.c        |   7 +-
 net/vmw_vsock/virtio_transport_common.c |  28 ++-
 net/vmw_vsock/vmci_transport.c          |  31 +++-
 7 files changed, 270 insertions(+), 81 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6d7e4f022748..b235f4bbe8ea 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -831,7 +831,8 @@ static int __init vhost_vsock_init(void)
 {
 	int ret;
 
-	ret = vsock_core_init(&vhost_transport.transport);
+	ret = vsock_core_register(&vhost_transport.transport,
+				  VSOCK_TRANSPORT_F_H2G);
 	if (ret < 0)
 		return ret;
 	return misc_register(&vhost_vsock_misc);
@@ -840,7 +841,7 @@ static int __init vhost_vsock_init(void)
 static void __exit vhost_vsock_exit(void)
 {
 	misc_deregister(&vhost_vsock_misc);
-	vsock_core_exit();
+	vsock_core_unregister(&vhost_transport.transport);
 };
 
 module_init(vhost_vsock_init);
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index fa1570dc9f5c..27a3463e4892 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -91,6 +91,14 @@ struct vsock_transport_send_notify_data {
 	u64 data2; /* Transport-defined. */
 };
 
+/* Transport features flags */
+/* Transport provides host->guest communication */
+#define VSOCK_TRANSPORT_F_H2G		0x00000001
+/* Transport provides guest->host communication */
+#define VSOCK_TRANSPORT_F_G2H		0x00000002
+/* Transport provides DGRAM communication */
+#define VSOCK_TRANSPORT_F_DGRAM		0x00000004
+
 struct vsock_transport {
 	/* Initialize/tear-down socket. */
 	int (*init)(struct vsock_sock *, struct vsock_sock *);
@@ -154,12 +162,8 @@ struct vsock_transport {
 
 /**** CORE ****/
 
-int __vsock_core_init(const struct vsock_transport *t, struct module *owner);
-static inline int vsock_core_init(const struct vsock_transport *t)
-{
-	return __vsock_core_init(t, THIS_MODULE);
-}
-void vsock_core_exit(void);
+int vsock_core_register(const struct vsock_transport *t, int features);
+void vsock_core_unregister(const struct vsock_transport *t);
 
 /* The transport may downcast this to access transport-specific functions */
 const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk);
@@ -190,6 +194,7 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 					 struct sockaddr_vm *dst);
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
+int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
 
 /**** TAP ****/
 
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d89381166028..dddd85d9a147 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -130,7 +130,12 @@ static struct proto vsock_proto = {
 #define VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)
 #define VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
 
-static const struct vsock_transport *transport_single;
+/* Transport used for host->guest communication */
+static const struct vsock_transport *transport_h2g;
+/* Transport used for guest->host communication */
+static const struct vsock_transport *transport_g2h;
+/* Transport used for DGRAM communication */
+static const struct vsock_transport *transport_dgram;
 static DEFINE_MUTEX(vsock_register_mutex);
 
 /**** UTILS ****/
@@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
 	return __vsock_bind(sk, &local_addr);
 }
 
-static int __init vsock_init_tables(void)
+static void vsock_init_tables(void)
 {
 	int i;
 
@@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
 
 	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
 		INIT_LIST_HEAD(&vsock_connected_table[i]);
-	return 0;
 }
 
 static void __vsock_insert_bound(struct list_head *list,
@@ -376,6 +380,62 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
 }
 EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
 
+/* Assign a transport to a socket and call the .init transport callback.
+ *
+ * Note: for stream socket this must be called when vsk->remote_addr is set
+ * (e.g. during the connect() or when a connection request on a listener
+ * socket is received).
+ * The vsk->remote_addr is used to decide which transport to use:
+ *  - remote CID > VMADDR_CID_HOST will use host->guest transport
+ *  - remote CID <= VMADDR_CID_HOST will use guest->host transport
+ */
+int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
+{
+	const struct vsock_transport *new_transport;
+	struct sock *sk = sk_vsock(vsk);
+
+	switch (sk->sk_type) {
+	case SOCK_DGRAM:
+		new_transport = transport_dgram;
+		break;
+	case SOCK_STREAM:
+		if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
+			new_transport = transport_h2g;
+		else
+			new_transport = transport_g2h;
+		break;
+	default:
+		return -ESOCKTNOSUPPORT;
+	}
+
+	if (vsk->transport) {
+		if (vsk->transport == new_transport)
+			return 0;
+
+		vsk->transport->release(vsk);
+		vsk->transport->destruct(vsk);
+	}
+
+	if (!new_transport)
+		return -ENODEV;
+
+	vsk->transport = new_transport;
+
+	return vsk->transport->init(vsk, psk);
+}
+EXPORT_SYMBOL_GPL(vsock_assign_transport);
+
+static bool vsock_find_cid(unsigned int cid)
+{
+	if (transport_g2h && cid == transport_g2h->get_local_cid())
+		return true;
+
+	if (transport_h2g && cid == VMADDR_CID_HOST)
+		return true;
+
+	return false;
+}
+
 static struct sock *vsock_dequeue_accept(struct sock *listener)
 {
 	struct vsock_sock *vlistener;
@@ -414,6 +474,9 @@ static int vsock_send_shutdown(struct sock *sk, int mode)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
+	if (!vsk->transport)
+		return -ENODEV;
+
 	return vsk->transport->shutdown(vsk, mode);
 }
 
@@ -530,7 +593,6 @@ static int __vsock_bind_dgram(struct vsock_sock *vsk,
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
-	u32 cid;
 	int retval;
 
 	/* First ensure this socket isn't already bound. */
@@ -540,10 +602,9 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
 	/* Now bind to the provided address or select appropriate values if
 	 * none are provided (VMADDR_CID_ANY and VMADDR_PORT_ANY).  Note that
 	 * like AF_INET prevents binding to a non-local IP address (in most
-	 * cases), we only allow binding to the local CID.
+	 * cases), we only allow binding to a local CID.
 	 */
-	cid = vsk->transport->get_local_cid();
-	if (addr->svm_cid != cid && addr->svm_cid != VMADDR_CID_ANY)
+	if (addr->svm_cid != VMADDR_CID_ANY && !vsock_find_cid(addr->svm_cid))
 		return -EADDRNOTAVAIL;
 
 	switch (sk->sk_socket->type) {
@@ -592,7 +653,6 @@ static struct sock *__vsock_create(struct net *net,
 		sk->sk_type = type;
 
 	vsk = vsock_sk(sk);
-	vsk->transport = transport_single;
 	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
 	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
 
@@ -629,11 +689,6 @@ static struct sock *__vsock_create(struct net *net,
 		vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE;
 	}
 
-	if (vsk->transport->init(vsk, psk) < 0) {
-		sk_free(sk);
-		return NULL;
-	}
-
 	return sk;
 }
 
@@ -650,7 +705,10 @@ static void __vsock_release(struct sock *sk, int level)
 		/* The release call is supposed to use lock_sock_nested()
 		 * rather than lock_sock(), if a sock lock should be acquired.
 		 */
-		vsk->transport->release(vsk);
+		if (vsk->transport)
+			vsk->transport->release(vsk);
+		else if (sk->sk_type == SOCK_STREAM)
+			vsock_remove_sock(vsk);
 
 		/* When "level" is SINGLE_DEPTH_NESTING, use the nested
 		 * version to avoid the warning "possible recursive locking
@@ -679,7 +737,8 @@ static void vsock_sk_destruct(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	vsk->transport->destruct(vsk);
+	if (vsk->transport)
+		vsk->transport->destruct(vsk);
 
 	/* When clearing these addresses, there's no need to set the family and
 	 * possibly register the address family with the kernel.
@@ -896,7 +955,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
 			mask |= EPOLLIN | EPOLLRDNORM;
 
 		/* If there is something in the queue then we can read. */
-		if (transport->stream_is_active(vsk) &&
+		if (transport && transport->stream_is_active(vsk) &&
 		    !(sk->sk_shutdown & RCV_SHUTDOWN)) {
 			bool data_ready_now = false;
 			int ret = transport->notify_poll_in(
@@ -1146,7 +1205,6 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 	err = 0;
 	sk = sock->sk;
 	vsk = vsock_sk(sk);
-	transport = vsk->transport;
 
 	lock_sock(sk);
 
@@ -1174,19 +1232,26 @@ static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
 			goto out;
 		}
 
+		/* Set the remote address that we are connecting to. */
+		memcpy(&vsk->remote_addr, remote_addr,
+		       sizeof(vsk->remote_addr));
+
+		err = vsock_assign_transport(vsk, NULL);
+		if (err)
+			goto out;
+
+		transport = vsk->transport;
+
 		/* The hypervisor and well-known contexts do not have socket
 		 * endpoints.
 		 */
-		if (!transport->stream_allow(remote_addr->svm_cid,
+		if (!transport ||
+		    !transport->stream_allow(remote_addr->svm_cid,
 					     remote_addr->svm_port)) {
 			err = -ENETUNREACH;
 			goto out;
 		}
 
-		/* Set the remote address that we are connecting to. */
-		memcpy(&vsk->remote_addr, remote_addr,
-		       sizeof(vsk->remote_addr));
-
 		err = vsock_auto_bind(vsk);
 		if (err)
 			goto out;
@@ -1586,7 +1651,7 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out;
 	}
 
-	if (sk->sk_state != TCP_ESTABLISHED ||
+	if (!transport || sk->sk_state != TCP_ESTABLISHED ||
 	    !vsock_addr_bound(&vsk->local_addr)) {
 		err = -ENOTCONN;
 		goto out;
@@ -1712,7 +1777,7 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	lock_sock(sk);
 
-	if (sk->sk_state != TCP_ESTABLISHED) {
+	if (!transport || sk->sk_state != TCP_ESTABLISHED) {
 		/* Recvmsg is supposed to return 0 if a peer performs an
 		 * orderly shutdown. Differentiate between that case and when a
 		 * peer has not connected or a local shutdown occured with the
@@ -1886,7 +1951,9 @@ static const struct proto_ops vsock_stream_ops = {
 static int vsock_create(struct net *net, struct socket *sock,
 			int protocol, int kern)
 {
+	struct vsock_sock *vsk;
 	struct sock *sk;
+	int ret;
 
 	if (!sock)
 		return -EINVAL;
@@ -1911,7 +1978,17 @@ static int vsock_create(struct net *net, struct socket *sock,
 	if (!sk)
 		return -ENOMEM;
 
-	vsock_insert_unbound(vsock_sk(sk));
+	vsk = vsock_sk(sk);
+
+	if (sock->type == SOCK_DGRAM) {
+		ret = vsock_assign_transport(vsk, NULL);
+		if (ret < 0) {
+			sock_put(sk);
+			return ret;
+		}
+	}
+
+	vsock_insert_unbound(vsk);
 
 	return 0;
 }
@@ -1926,11 +2003,20 @@ static long vsock_dev_do_ioctl(struct file *filp,
 			       unsigned int cmd, void __user *ptr)
 {
 	u32 __user *p = ptr;
+	u32 cid = VMADDR_CID_ANY;
 	int retval = 0;
 
 	switch (cmd) {
 	case IOCTL_VM_SOCKETS_GET_LOCAL_CID:
-		if (put_user(transport_single->get_local_cid(), p) != 0)
+		/* To be compatible with the VMCI behavior, we prioritize the
+		 * guest CID instead of well-know host CID (VMADDR_CID_HOST).
+		 */
+		if (transport_g2h)
+			cid = transport_g2h->get_local_cid();
+		else if (transport_h2g)
+			cid = transport_h2g->get_local_cid();
+
+		if (put_user(cid, p) != 0)
 			retval = -EFAULT;
 		break;
 
@@ -1970,24 +2056,13 @@ static struct miscdevice vsock_device = {
 	.fops		= &vsock_device_ops,
 };
 
-int __vsock_core_init(const struct vsock_transport *t, struct module *owner)
+static int __init vsock_init(void)
 {
-	int err = mutex_lock_interruptible(&vsock_register_mutex);
+	int err = 0;
 
-	if (err)
-		return err;
-
-	if (transport_single) {
-		err = -EBUSY;
-		goto err_busy;
-	}
-
-	/* Transport must be the owner of the protocol so that it can't
-	 * unload while there are open sockets.
-	 */
-	vsock_proto.owner = owner;
-	transport_single = t;
+	vsock_init_tables();
 
+	vsock_proto.owner = THIS_MODULE;
 	vsock_device.minor = MISC_DYNAMIC_MINOR;
 	err = misc_register(&vsock_device);
 	if (err) {
@@ -2008,7 +2083,6 @@ int __vsock_core_init(const struct vsock_transport *t, struct module *owner)
 		goto err_unregister_proto;
 	}
 
-	mutex_unlock(&vsock_register_mutex);
 	return 0;
 
 err_unregister_proto:
@@ -2016,28 +2090,15 @@ int __vsock_core_init(const struct vsock_transport *t, struct module *owner)
 err_deregister_misc:
 	misc_deregister(&vsock_device);
 err_reset_transport:
-	transport_single = NULL;
-err_busy:
-	mutex_unlock(&vsock_register_mutex);
 	return err;
 }
-EXPORT_SYMBOL_GPL(__vsock_core_init);
 
-void vsock_core_exit(void)
+static void __exit vsock_exit(void)
 {
-	mutex_lock(&vsock_register_mutex);
-
 	misc_deregister(&vsock_device);
 	sock_unregister(AF_VSOCK);
 	proto_unregister(&vsock_proto);
-
-	/* We do not want the assignment below re-ordered. */
-	mb();
-	transport_single = NULL;
-
-	mutex_unlock(&vsock_register_mutex);
 }
-EXPORT_SYMBOL_GPL(vsock_core_exit);
 
 const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
 {
@@ -2045,12 +2106,70 @@ const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(vsock_core_get_transport);
 
-static void __exit vsock_exit(void)
+int vsock_core_register(const struct vsock_transport *t, int features)
+{
+	const struct vsock_transport *t_h2g, *t_g2h, *t_dgram;
+	int err = mutex_lock_interruptible(&vsock_register_mutex);
+
+	if (err)
+		return err;
+
+	t_h2g = transport_h2g;
+	t_g2h = transport_g2h;
+	t_dgram = transport_dgram;
+
+	if (features & VSOCK_TRANSPORT_F_H2G) {
+		if (t_h2g) {
+			err = -EBUSY;
+			goto err_busy;
+		}
+		t_h2g = t;
+	}
+
+	if (features & VSOCK_TRANSPORT_F_G2H) {
+		if (t_g2h) {
+			err = -EBUSY;
+			goto err_busy;
+		}
+		t_g2h = t;
+	}
+
+	if (features & VSOCK_TRANSPORT_F_DGRAM) {
+		if (t_dgram) {
+			err = -EBUSY;
+			goto err_busy;
+		}
+		t_dgram = t;
+	}
+
+	transport_h2g = t_h2g;
+	transport_g2h = t_g2h;
+	transport_dgram = t_dgram;
+
+err_busy:
+	mutex_unlock(&vsock_register_mutex);
+	return err;
+}
+EXPORT_SYMBOL_GPL(vsock_core_register);
+
+void vsock_core_unregister(const struct vsock_transport *t)
 {
-	/* Do nothing.  This function makes this module removable. */
+	mutex_lock(&vsock_register_mutex);
+
+	if (transport_h2g == t)
+		transport_h2g = NULL;
+
+	if (transport_g2h == t)
+		transport_g2h = NULL;
+
+	if (transport_dgram == t)
+		transport_dgram = NULL;
+
+	mutex_unlock(&vsock_register_mutex);
 }
+EXPORT_SYMBOL_GPL(vsock_core_unregister);
 
-module_init(vsock_init_tables);
+module_init(vsock_init);
 module_exit(vsock_exit);
 
 MODULE_AUTHOR("VMware, Inc.");
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index fc7e61765a4a..0ea66d87af39 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -165,6 +165,8 @@ static const guid_t srv_id_template =
 	GUID_INIT(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
 		  0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
 
+static bool hvs_check_transport(struct vsock_sock *vsk);
+
 static bool is_valid_srv_id(const guid_t *id)
 {
 	return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(guid_t) - 4);
@@ -367,6 +369,18 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 
 		new->sk_state = TCP_SYN_SENT;
 		vnew = vsock_sk(new);
+
+		hvs_addr_init(&vnew->local_addr, if_type);
+		hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
+
+		ret = vsock_assign_transport(vnew, vsock_sk(sk));
+		/* Transport assigned (looking at remote_addr) must be the
+		 * same where we received the request.
+		 */
+		if (ret || !hvs_check_transport(vnew)) {
+			sock_put(new);
+			goto out;
+		}
 		hvs_new = vnew->trans;
 		hvs_new->chan = chan;
 	} else {
@@ -430,9 +444,6 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		new->sk_state = TCP_ESTABLISHED;
 		sk->sk_ack_backlog++;
 
-		hvs_addr_init(&vnew->local_addr, if_type);
-		hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
-
 		hvs_new->vm_srv_id = *if_type;
 		hvs_new->host_srv_id = *if_instance;
 
@@ -880,6 +891,11 @@ static struct vsock_transport hvs_transport = {
 
 };
 
+static bool hvs_check_transport(struct vsock_sock *vsk)
+{
+	return vsk->transport == &hvs_transport;
+}
+
 static int hvs_probe(struct hv_device *hdev,
 		     const struct hv_vmbus_device_id *dev_id)
 {
@@ -928,7 +944,7 @@ static int __init hvs_init(void)
 	if (ret != 0)
 		return ret;
 
-	ret = vsock_core_init(&hvs_transport);
+	ret = vsock_core_register(&hvs_transport, VSOCK_TRANSPORT_F_G2H);
 	if (ret) {
 		vmbus_driver_unregister(&hvs_drv);
 		return ret;
@@ -939,7 +955,7 @@ static int __init hvs_init(void)
 
 static void __exit hvs_exit(void)
 {
-	vsock_core_exit();
+	vsock_core_unregister(&hvs_transport);
 	vmbus_driver_unregister(&hvs_drv);
 }
 
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index fb1fc7760e8c..83ad85050384 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -770,7 +770,8 @@ static int __init virtio_vsock_init(void)
 	if (!virtio_vsock_workqueue)
 		return -ENOMEM;
 
-	ret = vsock_core_init(&virtio_transport.transport);
+	ret = vsock_core_register(&virtio_transport.transport,
+				  VSOCK_TRANSPORT_F_G2H);
 	if (ret)
 		goto out_wq;
 
@@ -781,7 +782,7 @@ static int __init virtio_vsock_init(void)
 	return 0;
 
 out_vci:
-	vsock_core_exit();
+	vsock_core_unregister(&virtio_transport.transport);
 out_wq:
 	destroy_workqueue(virtio_vsock_workqueue);
 	return ret;
@@ -790,7 +791,7 @@ static int __init virtio_vsock_init(void)
 static void __exit virtio_vsock_exit(void)
 {
 	unregister_virtio_driver(&virtio_vsock_driver);
-	vsock_core_exit();
+	vsock_core_unregister(&virtio_transport.transport);
 	destroy_workqueue(virtio_vsock_workqueue);
 }
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index f7d0ecbd8f97..b39917eb120e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -453,7 +453,7 @@ int virtio_transport_do_socket_init(struct vsock_sock *vsk,
 
 	vsk->trans = vvs;
 	vvs->vsk = vsk;
-	if (psk) {
+	if (psk && psk->trans) {
 		struct virtio_vsock_sock *ptrans = psk->trans;
 
 		vvs->peer_buf_alloc = ptrans->peer_buf_alloc;
@@ -986,11 +986,13 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 
 /* Handle server socket */
 static int
-virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt)
+virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
+			     struct virtio_transport *t)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 	struct vsock_sock *vchild;
 	struct sock *child;
+	int ret;
 
 	if (le16_to_cpu(pkt->hdr.op) != VIRTIO_VSOCK_OP_REQUEST) {
 		virtio_transport_reset(vsk, pkt);
@@ -1020,6 +1022,17 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt)
 	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid),
 			le32_to_cpu(pkt->hdr.src_port));
 
+	ret = vsock_assign_transport(vchild, vsk);
+	/* Transport assigned (looking at remote_addr) must be the same
+	 * where we received the request.
+	 */
+	if (ret || vchild->transport != &t->transport) {
+		release_sock(child);
+		virtio_transport_reset(vsk, pkt);
+		sock_put(child);
+		return ret;
+	}
+
 	vsock_insert_connected(vchild);
 	vsock_enqueue_accept(sk, child);
 	virtio_transport_send_response(vchild, pkt);
@@ -1037,6 +1050,14 @@ static bool virtio_transport_space_update(struct sock *sk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	bool space_available;
 
+	/* Listener sockets are not associated with any transport, so we are
+	 * not able to take the state to see if there is space available in the
+	 * remote peer, but since they are only used to receive requests, we
+	 * can assume that there is always space available in the other peer.
+	 */
+	if (!vvs)
+		return true;
+
 	/* buf_alloc and fwd_cnt is always included in the hdr */
 	spin_lock_bh(&vvs->tx_lock);
 	vvs->peer_buf_alloc = le32_to_cpu(pkt->hdr.buf_alloc);
@@ -1102,7 +1123,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	switch (sk->sk_state) {
 	case TCP_LISTEN:
-		virtio_transport_recv_listen(sk, pkt);
+		virtio_transport_recv_listen(sk, pkt, t);
 		virtio_transport_free_pkt(pkt);
 		break;
 	case TCP_SYN_SENT:
@@ -1120,6 +1141,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 		virtio_transport_free_pkt(pkt);
 		break;
 	}
+
 	release_sock(sk);
 
 	/* Release refcnt obtained when we fetched this socket out of the
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 5955238ffc13..2eb3f16d53e7 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -57,6 +57,7 @@ static bool vmci_transport_old_proto_override(bool *old_pkt_proto);
 static u16 vmci_transport_new_proto_supported_versions(void);
 static bool vmci_transport_proto_to_notify_struct(struct sock *sk, u16 *proto,
 						  bool old_pkt_proto);
+static bool vmci_check_transport(struct vsock_sock *vsk);
 
 struct vmci_transport_recv_pkt_info {
 	struct work_struct work;
@@ -1017,6 +1018,15 @@ static int vmci_transport_recv_listen(struct sock *sk,
 	vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
 			pkt->src_port);
 
+	err = vsock_assign_transport(vpending, vsock_sk(sk));
+	/* Transport assigned (looking at remote_addr) must be the same
+	 * where we received the request.
+	 */
+	if (err || !vmci_check_transport(vpending)) {
+		sock_put(pending);
+		return err;
+	}
+
 	/* If the proposed size fits within our min/max, accept it. Otherwise
 	 * propose our own size.
 	 */
@@ -2008,7 +2018,7 @@ static u32 vmci_transport_get_local_cid(void)
 	return vmci_get_context_id();
 }
 
-static const struct vsock_transport vmci_transport = {
+static struct vsock_transport vmci_transport = {
 	.init = vmci_transport_socket_init,
 	.destruct = vmci_transport_destruct,
 	.release = vmci_transport_release,
@@ -2038,10 +2048,25 @@ static const struct vsock_transport vmci_transport = {
 	.get_local_cid = vmci_transport_get_local_cid,
 };
 
+static bool vmci_check_transport(struct vsock_sock *vsk)
+{
+	return vsk->transport == &vmci_transport;
+}
+
 static int __init vmci_transport_init(void)
 {
+	int features = VSOCK_TRANSPORT_F_DGRAM | VSOCK_TRANSPORT_F_H2G;
+	int cid;
 	int err;
 
+	cid = vmci_get_context_id();
+
+	if (cid == VMCI_INVALID_ID)
+		return -EINVAL;
+
+	if (cid != VMCI_HOST_CONTEXT_ID)
+		features |= VSOCK_TRANSPORT_F_G2H;
+
 	/* Create the datagram handle that we will use to send and receive all
 	 * VSocket control messages for this context.
 	 */
@@ -2065,7 +2090,7 @@ static int __init vmci_transport_init(void)
 		goto err_destroy_stream_handle;
 	}
 
-	err = vsock_core_init(&vmci_transport);
+	err = vsock_core_register(&vmci_transport, features);
 	if (err < 0)
 		goto err_unsubscribe;
 
@@ -2096,7 +2121,7 @@ static void __exit vmci_transport_exit(void)
 		vmci_transport_qp_resumed_sub_id = VMCI_INVALID_ID;
 	}
 
-	vsock_core_exit();
+	vsock_core_unregister(&vmci_transport);
 }
 module_exit(vmci_transport_exit);
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (10 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 11/14] vsock: add multi-transports support Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-10-27  8:17   ` Stefan Hajnoczi
                     ` (2 more replies)
  2019-10-23  9:55 ` [PATCH net-next 13/14] vsock: prevent transport modules unloading Stefano Garzarella
                   ` (2 subsequent siblings)
  14 siblings, 3 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

To allow other transports to be loaded with vmci_transport,
we register the vmci_transport as G2H or H2G only when a VMCI guest
or host is active.

To do that, this patch adds a callback registered in the vmci driver
that will be called when a new host or guest become active.
This callback will register the vmci_transport in the VSOCK core.
If the transport is already registered, we ignore the error coming
from vsock_core_register().

Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 drivers/misc/vmw_vmci/vmci_driver.c | 50 +++++++++++++++++++++++++++++
 drivers/misc/vmw_vmci/vmci_driver.h |  2 ++
 drivers/misc/vmw_vmci/vmci_guest.c  |  2 ++
 drivers/misc/vmw_vmci/vmci_host.c   |  7 ++++
 include/linux/vmw_vmci_api.h        |  2 ++
 net/vmw_vsock/vmci_transport.c      | 29 +++++++++++------
 6 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
index 819e35995d32..195afbd7edc1 100644
--- a/drivers/misc/vmw_vmci/vmci_driver.c
+++ b/drivers/misc/vmw_vmci/vmci_driver.c
@@ -28,6 +28,9 @@ MODULE_PARM_DESC(disable_guest,
 static bool vmci_guest_personality_initialized;
 static bool vmci_host_personality_initialized;
 
+static DEFINE_MUTEX(vmci_vsock_mutex); /* protects vmci_vsock_transport_cb */
+static vmci_vsock_cb vmci_vsock_transport_cb;
+
 /*
  * vmci_get_context_id() - Gets the current context ID.
  *
@@ -45,6 +48,53 @@ u32 vmci_get_context_id(void)
 }
 EXPORT_SYMBOL_GPL(vmci_get_context_id);
 
+/*
+ * vmci_register_vsock_callback() - Register the VSOCK vmci_transport callback.
+ *
+ * The callback will be called every time a new host or guest become active,
+ * or if they are already active when this function is called.
+ * To unregister the callback, call this function with NULL parameter.
+ *
+ * Returns 0 on success. -EBUSY if a callback is already registered.
+ */
+int vmci_register_vsock_callback(vmci_vsock_cb callback)
+{
+	int err = 0;
+
+	mutex_lock(&vmci_vsock_mutex);
+
+	if (vmci_vsock_transport_cb && callback) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	vmci_vsock_transport_cb = callback;
+
+	if (!vmci_vsock_transport_cb)
+		goto out;
+
+	if (vmci_guest_code_active())
+		vmci_vsock_transport_cb(false);
+
+	if (vmci_host_users() > 0)
+		vmci_vsock_transport_cb(true);
+
+out:
+	mutex_unlock(&vmci_vsock_mutex);
+	return err;
+}
+EXPORT_SYMBOL_GPL(vmci_register_vsock_callback);
+
+void vmci_call_vsock_callback(bool is_host)
+{
+	mutex_lock(&vmci_vsock_mutex);
+
+	if (vmci_vsock_transport_cb)
+		vmci_vsock_transport_cb(is_host);
+
+	mutex_unlock(&vmci_vsock_mutex);
+}
+
 static int __init vmci_drv_init(void)
 {
 	int vmci_err;
diff --git a/drivers/misc/vmw_vmci/vmci_driver.h b/drivers/misc/vmw_vmci/vmci_driver.h
index aab81b67670c..990682480bf6 100644
--- a/drivers/misc/vmw_vmci/vmci_driver.h
+++ b/drivers/misc/vmw_vmci/vmci_driver.h
@@ -36,10 +36,12 @@ extern struct pci_dev *vmci_pdev;
 
 u32 vmci_get_context_id(void);
 int vmci_send_datagram(struct vmci_datagram *dg);
+void vmci_call_vsock_callback(bool is_host);
 
 int vmci_host_init(void);
 void vmci_host_exit(void);
 bool vmci_host_code_active(void);
+int vmci_host_users(void);
 
 int vmci_guest_init(void);
 void vmci_guest_exit(void);
diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
index 7a84a48c75da..cc8eeb361fcd 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -637,6 +637,8 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
 		  vmci_dev->iobase + VMCI_CONTROL_ADDR);
 
 	pci_set_drvdata(pdev, vmci_dev);
+
+	vmci_call_vsock_callback(false);
 	return 0;
 
 err_free_irq:
diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
index 833e2bd248a5..ff3c396146ff 100644
--- a/drivers/misc/vmw_vmci/vmci_host.c
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -108,6 +108,11 @@ bool vmci_host_code_active(void)
 	     atomic_read(&vmci_host_active_users) > 0);
 }
 
+int vmci_host_users(void)
+{
+	return atomic_read(&vmci_host_active_users);
+}
+
 /*
  * Called on open of /dev/vmci.
  */
@@ -338,6 +343,8 @@ static int vmci_host_do_init_context(struct vmci_host_dev *vmci_host_dev,
 	vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
 	atomic_inc(&vmci_host_active_users);
 
+	vmci_call_vsock_callback(true);
+
 	retval = 0;
 
 out:
diff --git a/include/linux/vmw_vmci_api.h b/include/linux/vmw_vmci_api.h
index acd9fafe4fc6..f28907345c80 100644
--- a/include/linux/vmw_vmci_api.h
+++ b/include/linux/vmw_vmci_api.h
@@ -19,6 +19,7 @@
 struct msghdr;
 typedef void (vmci_device_shutdown_fn) (void *device_registration,
 					void *user_data);
+typedef void (*vmci_vsock_cb) (bool is_host);
 
 int vmci_datagram_create_handle(u32 resource_id, u32 flags,
 				vmci_datagram_recv_cb recv_cb,
@@ -37,6 +38,7 @@ int vmci_doorbell_destroy(struct vmci_handle handle);
 int vmci_doorbell_notify(struct vmci_handle handle, u32 priv_flags);
 u32 vmci_get_context_id(void);
 bool vmci_is_context_owner(u32 context_id, kuid_t uid);
+int vmci_register_vsock_callback(vmci_vsock_cb callback);
 
 int vmci_event_subscribe(u32 event,
 			 vmci_event_cb callback, void *callback_data,
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 2eb3f16d53e7..04437f822d82 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -2053,19 +2053,22 @@ static bool vmci_check_transport(struct vsock_sock *vsk)
 	return vsk->transport == &vmci_transport;
 }
 
-static int __init vmci_transport_init(void)
+void vmci_vsock_transport_cb(bool is_host)
 {
-	int features = VSOCK_TRANSPORT_F_DGRAM | VSOCK_TRANSPORT_F_H2G;
-	int cid;
-	int err;
+	int features;
 
-	cid = vmci_get_context_id();
+	if (is_host)
+		features = VSOCK_TRANSPORT_F_H2G;
+	else
+		features = VSOCK_TRANSPORT_F_G2H;
 
-	if (cid == VMCI_INVALID_ID)
-		return -EINVAL;
+	vsock_core_register(&vmci_transport, features);
+}
 
-	if (cid != VMCI_HOST_CONTEXT_ID)
-		features |= VSOCK_TRANSPORT_F_G2H;
+static int __init vmci_transport_init(void)
+{
+	int features = VSOCK_TRANSPORT_F_DGRAM;
+	int err;
 
 	/* Create the datagram handle that we will use to send and receive all
 	 * VSocket control messages for this context.
@@ -2079,7 +2082,6 @@ static int __init vmci_transport_init(void)
 		pr_err("Unable to create datagram handle. (%d)\n", err);
 		return vmci_transport_error_to_vsock_error(err);
 	}
-
 	err = vmci_event_subscribe(VMCI_EVENT_QP_RESUMED,
 				   vmci_transport_qp_resumed_cb,
 				   NULL, &vmci_transport_qp_resumed_sub_id);
@@ -2094,8 +2096,14 @@ static int __init vmci_transport_init(void)
 	if (err < 0)
 		goto err_unsubscribe;
 
+	err = vmci_register_vsock_callback(vmci_vsock_transport_cb);
+	if (err < 0)
+		goto err_unregister;
+
 	return 0;
 
+err_unregister:
+	vsock_core_unregister(&vmci_transport);
 err_unsubscribe:
 	vmci_event_unsubscribe(vmci_transport_qp_resumed_sub_id);
 err_destroy_stream_handle:
@@ -2121,6 +2129,7 @@ static void __exit vmci_transport_exit(void)
 		vmci_transport_qp_resumed_sub_id = VMCI_INVALID_ID;
 	}
 
+	vmci_register_vsock_callback(NULL);
 	vsock_core_unregister(&vmci_transport);
 }
 module_exit(vmci_transport_exit);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 13/14] vsock: prevent transport modules unloading
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (11 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-11-11 16:36   ` Jorgen Hansen
  2019-10-23  9:55 ` [PATCH net-next 14/14] vsock: fix bind() behaviour taking care of CID Stefano Garzarella
  2019-10-27  8:01 ` [PATCH net-next 00/14] vsock: add multi-transports support Stefan Hajnoczi
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

This patch adds 'module' member in the 'struct vsock_transport'
in order to get/put the transport module. This prevents the
module unloading while sockets are assigned to it.

We increase the module refcnt when a socket is assigned to a
transport, and we decrease the module refcnt when the socket
is destructed.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
RFC -> v1:
- fixed typo 's/tranport/transport/' in a comment (Stefan)
---
 drivers/vhost/vsock.c            |  2 ++
 include/net/af_vsock.h           |  2 ++
 net/vmw_vsock/af_vsock.c         | 20 ++++++++++++++++----
 net/vmw_vsock/hyperv_transport.c |  2 ++
 net/vmw_vsock/virtio_transport.c |  2 ++
 net/vmw_vsock/vmci_transport.c   |  1 +
 6 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index b235f4bbe8ea..fdda9ec625ad 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -386,6 +386,8 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
 
 static struct virtio_transport vhost_transport = {
 	.transport = {
+		.module                   = THIS_MODULE,
+
 		.get_local_cid            = vhost_transport_get_local_cid,
 
 		.init                     = virtio_transport_do_socket_init,
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 27a3463e4892..269e2f034789 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -100,6 +100,8 @@ struct vsock_transport_send_notify_data {
 #define VSOCK_TRANSPORT_F_DGRAM		0x00000004
 
 struct vsock_transport {
+	struct module *module;
+
 	/* Initialize/tear-down socket. */
 	int (*init)(struct vsock_sock *, struct vsock_sock *);
 	void (*destruct)(struct vsock_sock *);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index dddd85d9a147..1f2e707cae66 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -380,6 +380,16 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
 }
 EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
 
+static void vsock_deassign_transport(struct vsock_sock *vsk)
+{
+	if (!vsk->transport)
+		return;
+
+	vsk->transport->destruct(vsk);
+	module_put(vsk->transport->module);
+	vsk->transport = NULL;
+}
+
 /* Assign a transport to a socket and call the .init transport callback.
  *
  * Note: for stream socket this must be called when vsk->remote_addr is set
@@ -413,10 +423,13 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 			return 0;
 
 		vsk->transport->release(vsk);
-		vsk->transport->destruct(vsk);
+		vsock_deassign_transport(vsk);
 	}
 
-	if (!new_transport)
+	/* We increase the module refcnt to prevent the transport unloading
+	 * while there are open sockets assigned to it.
+	 */
+	if (!new_transport || !try_module_get(new_transport->module))
 		return -ENODEV;
 
 	vsk->transport = new_transport;
@@ -737,8 +750,7 @@ static void vsock_sk_destruct(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
-	if (vsk->transport)
-		vsk->transport->destruct(vsk);
+	vsock_deassign_transport(vsk);
 
 	/* When clearing these addresses, there's no need to set the family and
 	 * possibly register the address family with the kernel.
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 0ea66d87af39..d0a349d85414 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -857,6 +857,8 @@ int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
 }
 
 static struct vsock_transport hvs_transport = {
+	.module                   = THIS_MODULE,
+
 	.get_local_cid            = hvs_get_local_cid,
 
 	.init                     = hvs_sock_init,
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 83ad85050384..1458c5c8b64d 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -462,6 +462,8 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 
 static struct virtio_transport virtio_transport = {
 	.transport = {
+		.module                   = THIS_MODULE,
+
 		.get_local_cid            = virtio_transport_get_local_cid,
 
 		.init                     = virtio_transport_do_socket_init,
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 04437f822d82..0cbf023fae11 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -2019,6 +2019,7 @@ static u32 vmci_transport_get_local_cid(void)
 }
 
 static struct vsock_transport vmci_transport = {
+	.module = THIS_MODULE,
 	.init = vmci_transport_socket_init,
 	.destruct = vmci_transport_destruct,
 	.release = vmci_transport_release,
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH net-next 14/14] vsock: fix bind() behaviour taking care of CID
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (12 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 13/14] vsock: prevent transport modules unloading Stefano Garzarella
@ 2019-10-23  9:55 ` Stefano Garzarella
  2019-11-11 16:53   ` Jorgen Hansen
  2019-10-27  8:01 ` [PATCH net-next 00/14] vsock: add multi-transports support Stefan Hajnoczi
  14 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Jorgen Hansen,
	Sasha Levin, linux-kernel, Arnd Bergmann, Stefan Hajnoczi,
	linux-hyperv, K. Y. Srinivasan, Stephen Hemminger,
	virtualization

When we are looking for a socket bound to a specific address,
we also have to take into account the CID.

This patch is useful with multi-transports support because it
allows the binding of the same port with different CID, and
it prevents a connection to a wrong socket bound to the same
port, but with different CID.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/af_vsock.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 1f2e707cae66..7183de277072 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -228,10 +228,16 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
 {
 	struct vsock_sock *vsk;
 
-	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table)
-		if (addr->svm_port == vsk->local_addr.svm_port)
+	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
+		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
 			return sk_vsock(vsk);
 
+		if (addr->svm_port == vsk->local_addr.svm_port &&
+		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
+		     addr->svm_cid == VMADDR_CID_ANY))
+			return sk_vsock(vsk);
+	}
+
 	return NULL;
 }
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-10-23  9:55 ` [PATCH net-next 11/14] vsock: add multi-transports support Stefano Garzarella
@ 2019-10-23 15:08   ` Stefano Garzarella
  2019-10-30 15:40     ` Jorgen Hansen
  2019-11-11 13:53   ` Jorgen Hansen
  1 sibling, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-23 15:08 UTC (permalink / raw)
  To: netdev, Jorgen Hansen
  Cc: Sasha Levin, linux-hyperv, Stephen Hemminger, Arnd Bergmann, kvm,
	Michael S. Tsirkin, Greg Kroah-Hartman, Dexuan Cui, linux-kernel,
	virtualization, Haiyang Zhang, Stefan Hajnoczi, David S. Miller

On Wed, Oct 23, 2019 at 11:59 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> This patch adds the support of multiple transports in the
> VSOCK core.
>
> With the multi-transports support, we can use vsock with nested VMs
> (using also different hypervisors) loading both guest->host and
> host->guest transports at the same time.
>
> Major changes:
> - vsock core module can be loaded regardless of the transports
> - vsock_core_init() and vsock_core_exit() are renamed to
>   vsock_core_register() and vsock_core_unregister()
> - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
>   to identify which directions the transport can handle and if it's
>   support DGRAM (only vmci)
> - each stream socket is assigned to a transport when the remote CID
>   is set (during the connect() or when we receive a connection request
>   on a listener socket).
>   The remote CID is used to decide which transport to use:
>   - remote CID > VMADDR_CID_HOST will use host->guest transport
>   - remote CID <= VMADDR_CID_HOST will use guest->host transport
> - listener sockets are not bound to any transports since no transport
>   operations are done on it. In this way we can create a listener
>   socket, also if the transports are not loaded or with VMADDR_CID_ANY
>   to listen on all transports.
> - DGRAM sockets are handled as before, since only the vmci_transport
>   provides this feature.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1:
> - documented VSOCK_TRANSPORT_F_* flags
> - fixed vsock_assign_transport() when the socket is already assigned
>   (e.g connection failed)
> - moved features outside of struct vsock_transport, and used as
>   parameter of vsock_core_register()
> ---
>  drivers/vhost/vsock.c                   |   5 +-
>  include/net/af_vsock.h                  |  17 +-
>  net/vmw_vsock/af_vsock.c                | 237 ++++++++++++++++++------
>  net/vmw_vsock/hyperv_transport.c        |  26 ++-
>  net/vmw_vsock/virtio_transport.c        |   7 +-
>  net/vmw_vsock/virtio_transport_common.c |  28 ++-
>  net/vmw_vsock/vmci_transport.c          |  31 +++-
>  7 files changed, 270 insertions(+), 81 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 6d7e4f022748..b235f4bbe8ea 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -831,7 +831,8 @@ static int __init vhost_vsock_init(void)
>  {
>         int ret;
>
> -       ret = vsock_core_init(&vhost_transport.transport);
> +       ret = vsock_core_register(&vhost_transport.transport,
> +                                 VSOCK_TRANSPORT_F_H2G);
>         if (ret < 0)
>                 return ret;
>         return misc_register(&vhost_vsock_misc);
> @@ -840,7 +841,7 @@ static int __init vhost_vsock_init(void)
>  static void __exit vhost_vsock_exit(void)
>  {
>         misc_deregister(&vhost_vsock_misc);
> -       vsock_core_exit();
> +       vsock_core_unregister(&vhost_transport.transport);
>  };
>
>  module_init(vhost_vsock_init);
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index fa1570dc9f5c..27a3463e4892 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -91,6 +91,14 @@ struct vsock_transport_send_notify_data {
>         u64 data2; /* Transport-defined. */
>  };
>
> +/* Transport features flags */
> +/* Transport provides host->guest communication */
> +#define VSOCK_TRANSPORT_F_H2G          0x00000001
> +/* Transport provides guest->host communication */
> +#define VSOCK_TRANSPORT_F_G2H          0x00000002
> +/* Transport provides DGRAM communication */
> +#define VSOCK_TRANSPORT_F_DGRAM                0x00000004
> +
>  struct vsock_transport {
>         /* Initialize/tear-down socket. */
>         int (*init)(struct vsock_sock *, struct vsock_sock *);
> @@ -154,12 +162,8 @@ struct vsock_transport {
>
>  /**** CORE ****/
>
> -int __vsock_core_init(const struct vsock_transport *t, struct module *owner);
> -static inline int vsock_core_init(const struct vsock_transport *t)
> -{
> -       return __vsock_core_init(t, THIS_MODULE);
> -}
> -void vsock_core_exit(void);
> +int vsock_core_register(const struct vsock_transport *t, int features);
> +void vsock_core_unregister(const struct vsock_transport *t);
>
>  /* The transport may downcast this to access transport-specific functions */
>  const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk);
> @@ -190,6 +194,7 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
>                                          struct sockaddr_vm *dst);
>  void vsock_remove_sock(struct vsock_sock *vsk);
>  void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
> +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
>
>  /**** TAP ****/
>
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index d89381166028..dddd85d9a147 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -130,7 +130,12 @@ static struct proto vsock_proto = {
>  #define VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)
>  #define VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
>
> -static const struct vsock_transport *transport_single;
> +/* Transport used for host->guest communication */
> +static const struct vsock_transport *transport_h2g;
> +/* Transport used for guest->host communication */
> +static const struct vsock_transport *transport_g2h;
> +/* Transport used for DGRAM communication */
> +static const struct vsock_transport *transport_dgram;
>  static DEFINE_MUTEX(vsock_register_mutex);
>
>  /**** UTILS ****/
> @@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
>         return __vsock_bind(sk, &local_addr);
>  }
>
> -static int __init vsock_init_tables(void)
> +static void vsock_init_tables(void)
>  {
>         int i;
>
> @@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
>
>         for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
>                 INIT_LIST_HEAD(&vsock_connected_table[i]);
> -       return 0;
>  }
>
>  static void __vsock_insert_bound(struct list_head *list,
> @@ -376,6 +380,62 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
>  }
>  EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
>
> +/* Assign a transport to a socket and call the .init transport callback.
> + *
> + * Note: for stream socket this must be called when vsk->remote_addr is set
> + * (e.g. during the connect() or when a connection request on a listener
> + * socket is received).
> + * The vsk->remote_addr is used to decide which transport to use:
> + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport
> + */
> +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
> +{
> +       const struct vsock_transport *new_transport;
> +       struct sock *sk = sk_vsock(vsk);
> +
> +       switch (sk->sk_type) {
> +       case SOCK_DGRAM:
> +               new_transport = transport_dgram;
> +               break;
> +       case SOCK_STREAM:
> +               if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> +                       new_transport = transport_h2g;
> +               else
> +                       new_transport = transport_g2h;

I just noticed that this break the loopback in the guest.
As a fix, we should use 'transport_g2h' when remote_cid <= VMADDR_CID_HOST
or remote_cid is the id of 'transport_g2h'.

To do that we also need to avoid that L2 guests can have the same CID of L1.
For vhost_vsock I can call vsock_find_cid() in vhost_vsock_set_cid()

@Jorgen: for vmci we need to do the same? or it is guaranteed, since
it's already support nested VMs, that a L2 guests cannot have the
same CID as the L1.

I'll send a v2 with this fix, but I'll wait a bit for other comments.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 00/14] vsock: add multi-transports support
  2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
                   ` (13 preceding siblings ...)
  2019-10-23  9:55 ` [PATCH net-next 14/14] vsock: fix bind() behaviour taking care of CID Stefano Garzarella
@ 2019-10-27  8:01 ` Stefan Hajnoczi
  2019-10-29 16:27   ` Stefano Garzarella
  14 siblings, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-10-27  8:01 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Sasha Levin, linux-hyperv, Stephen Hemminger,
	Arnd Bergmann, kvm, Michael S. Tsirkin, Greg Kroah-Hartman,
	Dexuan Cui, linux-kernel, virtualization, Haiyang Zhang,
	Stefan Hajnoczi, David S. Miller, Jorgen Hansen

[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]

On Wed, Oct 23, 2019 at 11:55:40AM +0200, Stefano Garzarella wrote:
> This series adds the multi-transports support to vsock, following
> this proposal: https://www.spinics.net/lists/netdev/msg575792.html
> 
> With the multi-transports support, we can use VSOCK with nested VMs
> (using also different hypervisors) loading both guest->host and
> host->guest transports at the same time.
> Before this series, vmci-transport supported this behavior but only
> using VMware hypervisor on L0, L1, etc.
> 
> RFC: https://patchwork.ozlabs.org/cover/1168442/
> RFC -> v1:
> - Added R-b/A-b from Dexuan and Stefan
> - Fixed comments and typos in several patches (Stefan)
> - Patch 7: changed .notify_buffer_size return to void (Stefan)
> - Added patch 8 to simplify the API exposed to the transports (Stefan)
> - Patch 11:
>   + documented VSOCK_TRANSPORT_F_* flags (Stefan)
>   + fixed vsock_assign_transport() when the socket is already assigned
>   + moved features outside of struct vsock_transport, and used as
>     parameter of vsock_core_register() as a preparation of Patch 12
> - Removed "vsock: add 'transport_hg' to handle g2h\h2g transports" patch
> - Added patch 12 to register vmci_transport only when VMCI guest/host
>   are active

Has there been feedback from Jorgen or someone else from VMware?  A
Reviewed-by or Acked-by would be nice since this patch series affects
VMCI AF_VSOCK.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core
  2019-10-23  9:55 ` [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core Stefano Garzarella
@ 2019-10-27  8:08   ` Stefan Hajnoczi
  2019-10-30 15:08   ` Jorgen Hansen
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-10-27  8:08 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Sasha Levin, linux-hyperv, Stephen Hemminger,
	Arnd Bergmann, kvm, Michael S. Tsirkin, Greg Kroah-Hartman,
	Dexuan Cui, linux-kernel, virtualization, Haiyang Zhang,
	Stefan Hajnoczi, David S. Miller, Jorgen Hansen

[-- Attachment #1: Type: text/plain, Size: 1545 bytes --]

On Wed, Oct 23, 2019 at 11:55:47AM +0200, Stefano Garzarella wrote:
> virtio_transport and vmci_transport handle the buffer_size
> sockopts in a very similar way.
> 
> In order to support multiple transports, this patch moves this
> handling in the core to allow the user to change the options
> also if the socket is not yet assigned to any transport.
> 
> This patch also adds the '.notify_buffer_size' callback in the
> 'struct virtio_transport' in order to inform the transport,
> when the buffer_size is changed by the user. It is also useful
> to limit the 'buffer_size' requested (e.g. virtio transports).
> 
> Acked-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1:
> - changed .notify_buffer_size return to void (Stefan)
> - documented that .notify_buffer_size is called with sk_lock held (Stefan)
> ---
>  drivers/vhost/vsock.c                   |  7 +-
>  include/linux/virtio_vsock.h            | 15 +----
>  include/net/af_vsock.h                  | 15 ++---
>  net/vmw_vsock/af_vsock.c                | 43 ++++++++++---
>  net/vmw_vsock/hyperv_transport.c        | 36 -----------
>  net/vmw_vsock/virtio_transport.c        |  8 +--
>  net/vmw_vsock/virtio_transport_common.c | 79 ++++-------------------
>  net/vmw_vsock/vmci_transport.c          | 86 +++----------------------
>  net/vmw_vsock/vmci_transport.h          |  3 -
>  9 files changed, 65 insertions(+), 227 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports
  2019-10-23  9:55 ` [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports Stefano Garzarella
@ 2019-10-27  8:12   ` Stefan Hajnoczi
  2019-10-30 15:12   ` Jorgen Hansen
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-10-27  8:12 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Sasha Levin, linux-hyperv, Stephen Hemminger,
	Arnd Bergmann, kvm, Michael S. Tsirkin, Greg Kroah-Hartman,
	Dexuan Cui, linux-kernel, virtualization, Haiyang Zhang,
	Stefan Hajnoczi, David S. Miller, Jorgen Hansen

[-- Attachment #1: Type: text/plain, Size: 954 bytes --]

On Wed, Oct 23, 2019 at 11:55:48AM +0200, Stefano Garzarella wrote:
> All transports call __vsock_create() with the same parameters,
> most of them depending on the parent socket. In order to simplify
> the VSOCK core APIs exposed to the transports, this patch adds
> the vsock_create_connected() callable from transports to create
> a new socket when a connection request is received.
> We also unexported the __vsock_create().
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/net/af_vsock.h                  |  5 +----
>  net/vmw_vsock/af_vsock.c                | 20 +++++++++++++-------
>  net/vmw_vsock/hyperv_transport.c        |  3 +--
>  net/vmw_vsock/virtio_transport_common.c |  3 +--
>  net/vmw_vsock/vmci_transport.c          |  3 +--
>  5 files changed, 17 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-10-23  9:55 ` [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active Stefano Garzarella
@ 2019-10-27  8:17   ` Stefan Hajnoczi
  2019-10-29 16:35     ` Stefano Garzarella
  2019-11-04 10:10   ` Stefano Garzarella
  2019-11-11 16:27   ` Jorgen Hansen
  2 siblings, 1 reply; 46+ messages in thread
From: Stefan Hajnoczi @ 2019-10-27  8:17 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Sasha Levin, linux-hyperv, Stephen Hemminger,
	Arnd Bergmann, kvm, Michael S. Tsirkin, Greg Kroah-Hartman,
	Dexuan Cui, linux-kernel, virtualization, Haiyang Zhang,
	Stefan Hajnoczi, David S. Miller, Jorgen Hansen

[-- Attachment #1: Type: text/plain, Size: 194 bytes --]

On Wed, Oct 23, 2019 at 11:55:52AM +0200, Stefano Garzarella wrote:
> +static int __init vmci_transport_init(void)
> +{
> +	int features = VSOCK_TRANSPORT_F_DGRAM;

Where is this variable used?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 00/14] vsock: add multi-transports support
  2019-10-27  8:01 ` [PATCH net-next 00/14] vsock: add multi-transports support Stefan Hajnoczi
@ 2019-10-29 16:27   ` Stefano Garzarella
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-29 16:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, Adit Ranadive, Vishnu Dasa, Andy king,
	Aditya Sarwade, George Zhang, Jorgen Hansen
  Cc: netdev, Sasha Levin, linux-hyperv, Stephen Hemminger,
	Arnd Bergmann, kvm, Michael S. Tsirkin, Greg Kroah-Hartman,
	Dexuan Cui, linux-kernel, virtualization, Haiyang Zhang,
	Stefan Hajnoczi, David S. Miller

On Sun, Oct 27, 2019 at 09:01:46AM +0100, Stefan Hajnoczi wrote:
> On Wed, Oct 23, 2019 at 11:55:40AM +0200, Stefano Garzarella wrote:
> > This series adds the multi-transports support to vsock, following
> > this proposal: https://www.spinics.net/lists/netdev/msg575792.html
> > 
> > With the multi-transports support, we can use VSOCK with nested VMs
> > (using also different hypervisors) loading both guest->host and
> > host->guest transports at the same time.
> > Before this series, vmci-transport supported this behavior but only
> > using VMware hypervisor on L0, L1, etc.
> > 
> > RFC: https://patchwork.ozlabs.org/cover/1168442/
> > RFC -> v1:
> > - Added R-b/A-b from Dexuan and Stefan
> > - Fixed comments and typos in several patches (Stefan)
> > - Patch 7: changed .notify_buffer_size return to void (Stefan)
> > - Added patch 8 to simplify the API exposed to the transports (Stefan)
> > - Patch 11:
> >   + documented VSOCK_TRANSPORT_F_* flags (Stefan)
> >   + fixed vsock_assign_transport() when the socket is already assigned
> >   + moved features outside of struct vsock_transport, and used as
> >     parameter of vsock_core_register() as a preparation of Patch 12
> > - Removed "vsock: add 'transport_hg' to handle g2h\h2g transports" patch
> > - Added patch 12 to register vmci_transport only when VMCI guest/host
> >   are active
> 
> Has there been feedback from Jorgen or someone else from VMware?  A
> Reviewed-by or Acked-by would be nice since this patch series affects
> VMCI AF_VSOCK.
> 

Unfortunately not for now, I'm adding to this thread some VMware guys that
reviewed latest vmci patches.

Would be nice to have your feedback for these changes.

Thanks in advance,
Stefano

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-10-27  8:17   ` Stefan Hajnoczi
@ 2019-10-29 16:35     ` Stefano Garzarella
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-29 16:35 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: netdev, Sasha Levin, linux-hyperv, Stephen Hemminger,
	Arnd Bergmann, kvm, Michael S. Tsirkin, Greg Kroah-Hartman,
	Dexuan Cui, linux-kernel, virtualization, Haiyang Zhang,
	Stefan Hajnoczi, David S. Miller, Jorgen Hansen

On Sun, Oct 27, 2019 at 09:17:52AM +0100, Stefan Hajnoczi wrote:
> On Wed, Oct 23, 2019 at 11:55:52AM +0200, Stefano Garzarella wrote:
> > +static int __init vmci_transport_init(void)
> > +{
> > +	int features = VSOCK_TRANSPORT_F_DGRAM;
> 
> Where is this variable used?

It is introduced in the previous patch "vsock: add multi-transports support",
and it is used in the vsock_core_register(), but since now the
vmci_transport_init() registers the vmci_transport only with DGRAM
feature, I can remove this variable and I can use directly the
VSOCK_TRANSPORT_F_DGRAM.

I'll fix in the v3.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT
  2019-10-23  9:55 ` [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT Stefano Garzarella
@ 2019-10-30 14:54   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 14:54 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 01/14] vsock/vmci: remove unused
> VSOCK_DEFAULT_CONNECT_TIMEOUT
> 
> The VSOCK_DEFAULT_CONNECT_TIMEOUT definition was introduced with
> commit d021c344051af ("VSOCK: Introduce VM Sockets"), but it is never used
> in the net/vmw_vsock/vmci_transport.c.
> 
> VSOCK_DEFAULT_CONNECT_TIMEOUT is used and defined in
> net/vmw_vsock/af_vsock.c
> 
> Cc: Jorgen Hansen <jhansen@vmware.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/vmci_transport.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/net/vmw_vsock/vmci_transport.c
> b/net/vmw_vsock/vmci_transport.c index 8c9c4ed90fa7..f8e3131ac480
> 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -78,11 +78,6 @@ static int PROTOCOL_OVERRIDE = -1;
>  #define VMCI_TRANSPORT_DEFAULT_QP_SIZE       262144
>  #define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX   262144
> 
> -/* The default peer timeout indicates how long we will wait for a peer
> response
> - * to a control message.
> - */
> -#define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
> -
>  /* Helper function to convert from a VMCI error code to a VSock error code.
> */
> 
>  static s32 vmci_transport_error_to_vsock_error(s32 vmci_error)
> --
> 2.21.0

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid()
  2019-10-23  9:55 ` [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid() Stefano Garzarella
@ 2019-10-30 14:55   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 14:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> -----Original Message-----
> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> To: netdev@vger.kernel.org
> Subject: [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid()
> 
> vm_sockets_get_local_cid() is only used in virtio_transport_common.c.
> We can replace it calling the virtio_transport_get_ops() and using the
> get_local_cid() callback registered by the transport.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/linux/vm_sockets.h              |  2 --
>  net/vmw_vsock/af_vsock.c                | 10 ----------
>  net/vmw_vsock/virtio_transport_common.c |  2 +-
>  3 files changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/include/linux/vm_sockets.h b/include/linux/vm_sockets.h index
> 33f1a2ecd905..7dd899ccb920 100644
> --- a/include/linux/vm_sockets.h
> +++ b/include/linux/vm_sockets.h
> @@ -10,6 +10,4 @@
> 
>  #include <uapi/linux/vm_sockets.h>
> 
> -int vm_sockets_get_local_cid(void);
> -
>  #endif /* _VM_SOCKETS_H */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> 2ab43b2bba31..2f2582fb7fdd 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -129,16 +129,6 @@ static struct proto vsock_proto = {  static const struct
> vsock_transport *transport;  static DEFINE_MUTEX(vsock_register_mutex);
> 
> -/**** EXPORTS ****/
> -
> -/* Get the ID of the local context.  This is transport dependent. */
> -
> -int vm_sockets_get_local_cid(void)
> -{
> -	return transport->get_local_cid();
> -}
> -EXPORT_SYMBOL_GPL(vm_sockets_get_local_cid);
> -
>  /**** UTILS ****/
> 
>  /* Each bound VSocket is stored in the bind hash table and each connected
> diff --git a/net/vmw_vsock/virtio_transport_common.c
> b/net/vmw_vsock/virtio_transport_common.c
> index d02c9b41a768..b1cd16ed66ea 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -168,7 +168,7 @@ static int virtio_transport_send_pkt_info(struct
> vsock_sock *vsk,
>  	struct virtio_vsock_pkt *pkt;
>  	u32 pkt_len = info->pkt_len;
> 
> -	src_cid = vm_sockets_get_local_cid();
> +	src_cid = virtio_transport_get_ops()->transport.get_local_cid();
>  	src_port = vsk->local_addr.svm_port;
>  	if (!info->remote_cid) {
>  		dst_cid	= vsk->remote_addr.svm_cid;
> --
> 2.21.0

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h file
  2019-10-23  9:55 ` [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h file Stefano Garzarella
@ 2019-10-30 14:57   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 14:57 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h
> file
> 
> This header file now only includes the "uapi/linux/vm_sockets.h".
> We can include directly it when needed.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/linux/vm_sockets.h            | 13 -------------
>  include/net/af_vsock.h                |  2 +-
>  include/net/vsock_addr.h              |  2 +-
>  net/vmw_vsock/vmci_transport_notify.h |  1 -
>  4 files changed, 2 insertions(+), 16 deletions(-)  delete mode 100644
> include/linux/vm_sockets.h
> 
> diff --git a/include/linux/vm_sockets.h b/include/linux/vm_sockets.h
> deleted file mode 100644 index 7dd899ccb920..000000000000
> --- a/include/linux/vm_sockets.h
> +++ /dev/null
> @@ -1,13 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * VMware vSockets Driver
> - *
> - * Copyright (C) 2007-2013 VMware, Inc. All rights reserved.
> - */
> -
> -#ifndef _VM_SOCKETS_H
> -#define _VM_SOCKETS_H
> -
> -#include <uapi/linux/vm_sockets.h>
> -
> -#endif /* _VM_SOCKETS_H */
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index
> 80ea0f93d3f7..c660402b10f2 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -10,7 +10,7 @@
> 
>  #include <linux/kernel.h>
>  #include <linux/workqueue.h>
> -#include <linux/vm_sockets.h>
> +#include <uapi/linux/vm_sockets.h>
> 
>  #include "vsock_addr.h"
> 
> diff --git a/include/net/vsock_addr.h b/include/net/vsock_addr.h index
> 57d2db5c4bdf..cf8cc140d68d 100644
> --- a/include/net/vsock_addr.h
> +++ b/include/net/vsock_addr.h
> @@ -8,7 +8,7 @@
>  #ifndef _VSOCK_ADDR_H_
>  #define _VSOCK_ADDR_H_
> 
> -#include <linux/vm_sockets.h>
> +#include <uapi/linux/vm_sockets.h>
> 
>  void vsock_addr_init(struct sockaddr_vm *addr, u32 cid, u32 port);  int
> vsock_addr_validate(const struct sockaddr_vm *addr); diff --git
> a/net/vmw_vsock/vmci_transport_notify.h
> b/net/vmw_vsock/vmci_transport_notify.h
> index 7843f08d4290..a1aa5a998c0e 100644
> --- a/net/vmw_vsock/vmci_transport_notify.h
> +++ b/net/vmw_vsock/vmci_transport_notify.h
> @@ -11,7 +11,6 @@
>  #include <linux/types.h>
>  #include <linux/vmw_vmci_defs.h>
>  #include <linux/vmw_vmci_api.h>
> -#include <linux/vm_sockets.h>
> 
>  #include "vmci_transport.h"
> 
> --
> 2.21.0

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 04/14] vsock: add 'transport' member in the struct vsock_sock
  2019-10-23  9:55 ` [PATCH net-next 04/14] vsock: add 'transport' member in the struct vsock_sock Stefano Garzarella
@ 2019-10-30 14:57   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 14:57 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 04/14] vsock: add 'transport' member in the struct
> vsock_sock
> 
> As a preparation to support multiple transports, this patch adds the
> 'transport' member at the 'struct vsock_sock'.
> This new field is initialized during the creation in the
> __vsock_create() function.
> 
> This patch also renames the global 'transport' pointer to 'transport_single',
> since for now we're only supporting a single transport registered at run-time.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/net/af_vsock.h   |  1 +
>  net/vmw_vsock/af_vsock.c | 56 +++++++++++++++++++++++++++----------
> ---
>  2 files changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index
> c660402b10f2..a5e1e134261d 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -27,6 +27,7 @@ extern spinlock_t vsock_table_lock;  struct vsock_sock {
>  	/* sk must be the first member. */
>  	struct sock sk;
> +	const struct vsock_transport *transport;
>  	struct sockaddr_vm local_addr;
>  	struct sockaddr_vm remote_addr;
>  	/* Links for the global tables of bound and connected sockets. */ diff
> --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> 2f2582fb7fdd..c3a14f853eb0 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -126,7 +126,7 @@ static struct proto vsock_proto = {
>   */
>  #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
> 
> -static const struct vsock_transport *transport;
> +static const struct vsock_transport *transport_single;
>  static DEFINE_MUTEX(vsock_register_mutex);
> 
>  /**** UTILS ****/
> @@ -408,7 +408,9 @@ static bool vsock_is_pending(struct sock *sk)
> 
>  static int vsock_send_shutdown(struct sock *sk, int mode)  {
> -	return transport->shutdown(vsock_sk(sk), mode);
> +	struct vsock_sock *vsk = vsock_sk(sk);
> +
> +	return vsk->transport->shutdown(vsk, mode);
>  }
> 
>  static void vsock_pending_work(struct work_struct *work) @@ -518,7
> +520,7 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,  static int
> __vsock_bind_dgram(struct vsock_sock *vsk,
>  			      struct sockaddr_vm *addr)
>  {
> -	return transport->dgram_bind(vsk, addr);
> +	return vsk->transport->dgram_bind(vsk, addr);
>  }
> 
>  static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr) @@ -
> 536,7 +538,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm
> *addr)
>  	 * like AF_INET prevents binding to a non-local IP address (in most
>  	 * cases), we only allow binding to the local CID.
>  	 */
> -	cid = transport->get_local_cid();
> +	cid = vsk->transport->get_local_cid();
>  	if (addr->svm_cid != cid && addr->svm_cid != VMADDR_CID_ANY)
>  		return -EADDRNOTAVAIL;
> 
> @@ -586,6 +588,7 @@ struct sock *__vsock_create(struct net *net,
>  		sk->sk_type = type;
> 
>  	vsk = vsock_sk(sk);
> +	vsk->transport = transport_single;
>  	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY,
> VMADDR_PORT_ANY);
>  	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY,
> VMADDR_PORT_ANY);
> 
> @@ -616,7 +619,7 @@ struct sock *__vsock_create(struct net *net,
>  		vsk->connect_timeout =
> VSOCK_DEFAULT_CONNECT_TIMEOUT;
>  	}
> 
> -	if (transport->init(vsk, psk) < 0) {
> +	if (vsk->transport->init(vsk, psk) < 0) {
>  		sk_free(sk);
>  		return NULL;
>  	}
> @@ -641,7 +644,7 @@ static void __vsock_release(struct sock *sk, int level)
>  		/* The release call is supposed to use lock_sock_nested()
>  		 * rather than lock_sock(), if a sock lock should be acquired.
>  		 */
> -		transport->release(vsk);
> +		vsk->transport->release(vsk);
> 
>  		/* When "level" is SINGLE_DEPTH_NESTING, use the nested
>  		 * version to avoid the warning "possible recursive locking
> @@ -670,7 +673,7 @@ static void vsock_sk_destruct(struct sock *sk)  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
> 
> -	transport->destruct(vsk);
> +	vsk->transport->destruct(vsk);
> 
>  	/* When clearing these addresses, there's no need to set the family
> and
>  	 * possibly register the address family with the kernel.
> @@ -694,13 +697,13 @@ static int vsock_queue_rcv_skb(struct sock *sk,
> struct sk_buff *skb)
> 
>  s64 vsock_stream_has_data(struct vsock_sock *vsk)  {
> -	return transport->stream_has_data(vsk);
> +	return vsk->transport->stream_has_data(vsk);
>  }
>  EXPORT_SYMBOL_GPL(vsock_stream_has_data);
> 
>  s64 vsock_stream_has_space(struct vsock_sock *vsk)  {
> -	return transport->stream_has_space(vsk);
> +	return vsk->transport->stream_has_space(vsk);
>  }
>  EXPORT_SYMBOL_GPL(vsock_stream_has_space);
> 
> @@ -869,6 +872,7 @@ static __poll_t vsock_poll(struct file *file, struct
> socket *sock,
>  			mask |= EPOLLOUT | EPOLLWRNORM |
> EPOLLWRBAND;
> 
>  	} else if (sock->type == SOCK_STREAM) {
> +		const struct vsock_transport *transport = vsk->transport;
>  		lock_sock(sk);
> 
>  		/* Listening sockets that have connections in their accept
> @@ -944,6 +948,7 @@ static int vsock_dgram_sendmsg(struct socket *sock,
> struct msghdr *msg,
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
>  	struct sockaddr_vm *remote_addr;
> +	const struct vsock_transport *transport;
> 
>  	if (msg->msg_flags & MSG_OOB)
>  		return -EOPNOTSUPP;
> @@ -952,6 +957,7 @@ static int vsock_dgram_sendmsg(struct socket *sock,
> struct msghdr *msg,
>  	err = 0;
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> +	transport = vsk->transport;
> 
>  	lock_sock(sk);
> 
> @@ -1036,8 +1042,8 @@ static int vsock_dgram_connect(struct socket
> *sock,
>  	if (err)
>  		goto out;
> 
> -	if (!transport->dgram_allow(remote_addr->svm_cid,
> -				    remote_addr->svm_port)) {
> +	if (!vsk->transport->dgram_allow(remote_addr->svm_cid,
> +					 remote_addr->svm_port)) {
>  		err = -EINVAL;
>  		goto out;
>  	}
> @@ -1053,7 +1059,9 @@ static int vsock_dgram_connect(struct socket
> *sock,  static int vsock_dgram_recvmsg(struct socket *sock, struct msghdr
> *msg,
>  			       size_t len, int flags)
>  {
> -	return transport->dgram_dequeue(vsock_sk(sock->sk), msg, len,
> flags);
> +	struct vsock_sock *vsk = vsock_sk(sock->sk);
> +
> +	return vsk->transport->dgram_dequeue(vsk, msg, len, flags);
>  }
> 
>  static const struct proto_ops vsock_dgram_ops = { @@ -1079,6 +1087,8 @@
> static const struct proto_ops vsock_dgram_ops = {
> 
>  static int vsock_transport_cancel_pkt(struct vsock_sock *vsk)  {
> +	const struct vsock_transport *transport = vsk->transport;
> +
>  	if (!transport->cancel_pkt)
>  		return -EOPNOTSUPP;
> 
> @@ -1115,6 +1125,7 @@ static int vsock_stream_connect(struct socket
> *sock, struct sockaddr *addr,
>  	int err;
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
> +	const struct vsock_transport *transport;
>  	struct sockaddr_vm *remote_addr;
>  	long timeout;
>  	DEFINE_WAIT(wait);
> @@ -1122,6 +1133,7 @@ static int vsock_stream_connect(struct socket
> *sock, struct sockaddr *addr,
>  	err = 0;
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> +	transport = vsk->transport;
> 
>  	lock_sock(sk);
> 
> @@ -1365,6 +1377,7 @@ static int vsock_stream_setsockopt(struct socket
> *sock,
>  	int err;
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
> +	const struct vsock_transport *transport;
>  	u64 val;
> 
>  	if (level != AF_VSOCK)
> @@ -1385,6 +1398,7 @@ static int vsock_stream_setsockopt(struct socket
> *sock,
>  	err = 0;
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> +	transport = vsk->transport;
> 
>  	lock_sock(sk);
> 
> @@ -1442,6 +1456,7 @@ static int vsock_stream_getsockopt(struct socket
> *sock,
>  	int len;
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
> +	const struct vsock_transport *transport;
>  	u64 val;
> 
>  	if (level != AF_VSOCK)
> @@ -1465,6 +1480,7 @@ static int vsock_stream_getsockopt(struct socket
> *sock,
>  	err = 0;
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> +	transport = vsk->transport;
> 
>  	switch (optname) {
>  	case SO_VM_SOCKETS_BUFFER_SIZE:
> @@ -1509,6 +1525,7 @@ static int vsock_stream_sendmsg(struct socket
> *sock, struct msghdr *msg,  {
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
> +	const struct vsock_transport *transport;
>  	ssize_t total_written;
>  	long timeout;
>  	int err;
> @@ -1517,6 +1534,7 @@ static int vsock_stream_sendmsg(struct socket
> *sock, struct msghdr *msg,
> 
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> +	transport = vsk->transport;
>  	total_written = 0;
>  	err = 0;
> 
> @@ -1648,6 +1666,7 @@ vsock_stream_recvmsg(struct socket *sock, struct
> msghdr *msg, size_t len,  {
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
> +	const struct vsock_transport *transport;
>  	int err;
>  	size_t target;
>  	ssize_t copied;
> @@ -1658,6 +1677,7 @@ vsock_stream_recvmsg(struct socket *sock, struct
> msghdr *msg, size_t len,
> 
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> +	transport = vsk->transport;
>  	err = 0;
> 
>  	lock_sock(sk);
> @@ -1872,7 +1892,7 @@ static long vsock_dev_do_ioctl(struct file *filp,
> 
>  	switch (cmd) {
>  	case IOCTL_VM_SOCKETS_GET_LOCAL_CID:
> -		if (put_user(transport->get_local_cid(), p) != 0)
> +		if (put_user(transport_single->get_local_cid(), p) != 0)
>  			retval = -EFAULT;
>  		break;
> 
> @@ -1919,7 +1939,7 @@ int __vsock_core_init(const struct vsock_transport
> *t, struct module *owner)
>  	if (err)
>  		return err;
> 
> -	if (transport) {
> +	if (transport_single) {
>  		err = -EBUSY;
>  		goto err_busy;
>  	}
> @@ -1928,7 +1948,7 @@ int __vsock_core_init(const struct vsock_transport
> *t, struct module *owner)
>  	 * unload while there are open sockets.
>  	 */
>  	vsock_proto.owner = owner;
> -	transport = t;
> +	transport_single = t;
> 
>  	vsock_device.minor = MISC_DYNAMIC_MINOR;
>  	err = misc_register(&vsock_device);
> @@ -1958,7 +1978,7 @@ int __vsock_core_init(const struct vsock_transport
> *t, struct module *owner)
>  err_deregister_misc:
>  	misc_deregister(&vsock_device);
>  err_reset_transport:
> -	transport = NULL;
> +	transport_single = NULL;
>  err_busy:
>  	mutex_unlock(&vsock_register_mutex);
>  	return err;
> @@ -1975,7 +1995,7 @@ void vsock_core_exit(void)
> 
>  	/* We do not want the assignment below re-ordered. */
>  	mb();
> -	transport = NULL;
> +	transport_single = NULL;
> 
>  	mutex_unlock(&vsock_register_mutex);
>  }
> @@ -1986,7 +2006,7 @@ const struct vsock_transport
> *vsock_core_get_transport(void)
>  	/* vsock_register_mutex not taken since only the transport uses this
>  	 * function and only while registered.
>  	 */
> -	return transport;
> +	return transport_single;
>  }
>  EXPORT_SYMBOL_GPL(vsock_core_get_transport);
> 
> --
> 2.21.0

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport()
  2019-10-23  9:55 ` [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport() Stefano Garzarella
@ 2019-10-30 15:01   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 15:01 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to
> vsock_core_get_transport()
> 
> Since now the 'struct vsock_sock' object contains a pointer to the transport,
> this patch adds a parameter to the
> vsock_core_get_transport() to return the right transport assigned to the
> socket.
> 
> This patch modifies also the virtio_transport_get_ops(), that uses the
> vsock_core_get_transport(), adding the 'struct vsock_sock *' parameter.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1:
> - Removed comment about protecting transport_single (Stefan)
> ---
>  include/net/af_vsock.h                  | 2 +-
>  net/vmw_vsock/af_vsock.c                | 7 ++-----
>  net/vmw_vsock/virtio_transport_common.c | 9 +++++----
>  3 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index
> a5e1e134261d..2ca67d048de4 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -166,7 +166,7 @@ static inline int vsock_core_init(const struct
> vsock_transport *t)  void vsock_core_exit(void);
> 
>  /* The transport may downcast this to access transport-specific functions */
> -const struct vsock_transport *vsock_core_get_transport(void);
> +const struct vsock_transport *vsock_core_get_transport(struct
> +vsock_sock *vsk);
> 
>  /**** UTILS ****/
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> c3a14f853eb0..eaea159006c8 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -2001,12 +2001,9 @@ void vsock_core_exit(void)  }
> EXPORT_SYMBOL_GPL(vsock_core_exit);
> 
> -const struct vsock_transport *vsock_core_get_transport(void)
> +const struct vsock_transport *vsock_core_get_transport(struct
> +vsock_sock *vsk)
>  {
> -	/* vsock_register_mutex not taken since only the transport uses this
> -	 * function and only while registered.
> -	 */
> -	return transport_single;
> +	return vsk->transport;
>  }
>  EXPORT_SYMBOL_GPL(vsock_core_get_transport);
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c
> b/net/vmw_vsock/virtio_transport_common.c
> index 9763394f7a61..37a1c7e7c7fe 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -29,9 +29,10 @@
>  /* Threshold for detecting small packets to copy */  #define
> GOOD_COPY_LEN  128
> 
> -static const struct virtio_transport *virtio_transport_get_ops(void)
> +static const struct virtio_transport *
> +virtio_transport_get_ops(struct vsock_sock *vsk)
>  {
> -	const struct vsock_transport *t = vsock_core_get_transport();
> +	const struct vsock_transport *t = vsock_core_get_transport(vsk);
> 
>  	return container_of(t, struct virtio_transport, transport);  } @@ -
> 168,7 +169,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock
> *vsk,
>  	struct virtio_vsock_pkt *pkt;
>  	u32 pkt_len = info->pkt_len;
> 
> -	src_cid = virtio_transport_get_ops()->transport.get_local_cid();
> +	src_cid = virtio_transport_get_ops(vsk)->transport.get_local_cid();
>  	src_port = vsk->local_addr.svm_port;
>  	if (!info->remote_cid) {
>  		dst_cid	= vsk->remote_addr.svm_cid;
> @@ -201,7 +202,7 @@ static int virtio_transport_send_pkt_info(struct
> vsock_sock *vsk,
> 
>  	virtio_transport_inc_tx_pkt(vvs, pkt);
> 
> -	return virtio_transport_get_ops()->send_pkt(pkt);
> +	return virtio_transport_get_ops(vsk)->send_pkt(pkt);
>  }
> 
>  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> --
> 2.21.0

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core
  2019-10-23  9:55 ` [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core Stefano Garzarella
  2019-10-27  8:08   ` Stefan Hajnoczi
@ 2019-10-30 15:08   ` Jorgen Hansen
  2019-10-31  8:50     ` Stefano Garzarella
  1 sibling, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 15:08 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the
> core
> 
> virtio_transport and vmci_transport handle the buffer_size sockopts in a
> very similar way.
> 
> In order to support multiple transports, this patch moves this handling in the
> core to allow the user to change the options also if the socket is not yet
> assigned to any transport.
> 
> This patch also adds the '.notify_buffer_size' callback in the 'struct
> virtio_transport' in order to inform the transport, when the buffer_size is
> changed by the user. It is also useful to limit the 'buffer_size' requested (e.g.
> virtio transports).
> 
> Acked-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1:
> - changed .notify_buffer_size return to void (Stefan)
> - documented that .notify_buffer_size is called with sk_lock held (Stefan)
> ---
>  drivers/vhost/vsock.c                   |  7 +-
>  include/linux/virtio_vsock.h            | 15 +----
>  include/net/af_vsock.h                  | 15 ++---
>  net/vmw_vsock/af_vsock.c                | 43 ++++++++++---
>  net/vmw_vsock/hyperv_transport.c        | 36 -----------
>  net/vmw_vsock/virtio_transport.c        |  8 +--
>  net/vmw_vsock/virtio_transport_common.c | 79 ++++-------------------
>  net/vmw_vsock/vmci_transport.c          | 86 +++----------------------
>  net/vmw_vsock/vmci_transport.h          |  3 -
>  9 files changed, 65 insertions(+), 227 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index
> 92ab3852c954..6d7e4f022748 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -418,13 +418,8 @@ static struct virtio_transport vhost_transport = {
>  		.notify_send_pre_block    =
> virtio_transport_notify_send_pre_block,
>  		.notify_send_pre_enqueue  =
> virtio_transport_notify_send_pre_enqueue,
>  		.notify_send_post_enqueue =
> virtio_transport_notify_send_post_enqueue,
> +		.notify_buffer_size       = virtio_transport_notify_buffer_size,
> 
> -		.set_buffer_size          = virtio_transport_set_buffer_size,
> -		.set_min_buffer_size      =
> virtio_transport_set_min_buffer_size,
> -		.set_max_buffer_size      =
> virtio_transport_set_max_buffer_size,
> -		.get_buffer_size          = virtio_transport_get_buffer_size,
> -		.get_min_buffer_size      =
> virtio_transport_get_min_buffer_size,
> -		.get_max_buffer_size      =
> virtio_transport_get_max_buffer_size,
>  	},
> 
>  	.send_pkt = vhost_transport_send_pkt,
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index
> 96d8132acbd7..b79befd2a5a4 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -7,9 +7,6 @@
>  #include <net/sock.h>
>  #include <net/af_vsock.h>
> 
> -#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
> -#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
> -#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
>  #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
>  #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>  #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> @@ -25,11 +22,6 @@ enum {
>  struct virtio_vsock_sock {
>  	struct vsock_sock *vsk;
> 
> -	/* Protected by lock_sock(sk_vsock(trans->vsk)) */
> -	u32 buf_size;
> -	u32 buf_size_min;
> -	u32 buf_size_max;
> -
>  	spinlock_t tx_lock;
>  	spinlock_t rx_lock;
> 
> @@ -93,12 +85,6 @@ s64 virtio_transport_stream_has_space(struct
> vsock_sock *vsk);
> 
>  int virtio_transport_do_socket_init(struct vsock_sock *vsk,
>  				 struct vsock_sock *psk);
> -u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk);
> -u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk);
> -u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk); -void
> virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val); -void
> virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val); -void
> virtio_transport_set_max_buffer_size(struct vsock_sock *vs, u64 val);  int
> virtio_transport_notify_poll_in(struct vsock_sock *vsk,
>  				size_t target,
> @@ -125,6 +111,7 @@ int
> virtio_transport_notify_send_pre_enqueue(struct vsock_sock *vsk,
>  	struct vsock_transport_send_notify_data *data);  int
> virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
>  	ssize_t written, struct vsock_transport_send_notify_data *data);
> +void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64
> +*val);
> 
>  u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);  bool
> virtio_transport_stream_is_active(struct vsock_sock *vsk); diff --git
> a/include/net/af_vsock.h b/include/net/af_vsock.h index
> 2ca67d048de4..4b5d16840fd4 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -65,6 +65,11 @@ struct vsock_sock {
>  	bool sent_request;
>  	bool ignore_connecting_rst;
> 
> +	/* Protected by lock_sock(sk) */
> +	u64 buffer_size;
> +	u64 buffer_min_size;
> +	u64 buffer_max_size;
> +
>  	/* Private to transport. */
>  	void *trans;
>  };
> @@ -140,18 +145,12 @@ struct vsock_transport {
>  		struct vsock_transport_send_notify_data *);
>  	int (*notify_send_post_enqueue)(struct vsock_sock *, ssize_t,
>  		struct vsock_transport_send_notify_data *);
> +	/* sk_lock held by the caller */
> +	void (*notify_buffer_size)(struct vsock_sock *, u64 *);
> 
>  	/* Shutdown. */
>  	int (*shutdown)(struct vsock_sock *, int);
> 
> -	/* Buffer sizes. */
> -	void (*set_buffer_size)(struct vsock_sock *, u64);
> -	void (*set_min_buffer_size)(struct vsock_sock *, u64);
> -	void (*set_max_buffer_size)(struct vsock_sock *, u64);
> -	u64 (*get_buffer_size)(struct vsock_sock *);
> -	u64 (*get_min_buffer_size)(struct vsock_sock *);
> -	u64 (*get_max_buffer_size)(struct vsock_sock *);
> -
>  	/* Addressing. */
>  	u32 (*get_local_cid)(void);
>  };
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> eaea159006c8..90ac46ea12ef 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -126,6 +126,10 @@ static struct proto vsock_proto = {
>   */
>  #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
> 
> +#define VSOCK_DEFAULT_BUFFER_SIZE     (1024 * 256)
> +#define VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256) #define
> +VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
> +
>  static const struct vsock_transport *transport_single;  static
> DEFINE_MUTEX(vsock_register_mutex);
> 
> @@ -613,10 +617,16 @@ struct sock *__vsock_create(struct net *net,
>  		vsk->trusted = psk->trusted;
>  		vsk->owner = get_cred(psk->owner);
>  		vsk->connect_timeout = psk->connect_timeout;
> +		vsk->buffer_size = psk->buffer_size;
> +		vsk->buffer_min_size = psk->buffer_min_size;
> +		vsk->buffer_max_size = psk->buffer_max_size;
>  	} else {
>  		vsk->trusted = capable(CAP_NET_ADMIN);
>  		vsk->owner = get_current_cred();
>  		vsk->connect_timeout =
> VSOCK_DEFAULT_CONNECT_TIMEOUT;
> +		vsk->buffer_size = VSOCK_DEFAULT_BUFFER_SIZE;
> +		vsk->buffer_min_size =
> VSOCK_DEFAULT_BUFFER_MIN_SIZE;
> +		vsk->buffer_max_size =
> VSOCK_DEFAULT_BUFFER_MAX_SIZE;
>  	}
> 
>  	if (vsk->transport->init(vsk, psk) < 0) { @@ -1368,6 +1378,23 @@
> static int vsock_listen(struct socket *sock, int backlog)
>  	return err;
>  }
> 
> +static void vsock_update_buffer_size(struct vsock_sock *vsk,
> +				     const struct vsock_transport *transport,
> +				     u64 val)
> +{
> +	if (val > vsk->buffer_max_size)
> +		val = vsk->buffer_max_size;
> +
> +	if (val < vsk->buffer_min_size)
> +		val = vsk->buffer_min_size;
> +
> +	if (val != vsk->buffer_size &&
> +	    transport && transport->notify_buffer_size)
> +		transport->notify_buffer_size(vsk, &val);
> +
> +	vsk->buffer_size = val;
> +}
> +
>  static int vsock_stream_setsockopt(struct socket *sock,
>  				   int level,
>  				   int optname,
> @@ -1405,17 +1432,19 @@ static int vsock_stream_setsockopt(struct socket
> *sock,
>  	switch (optname) {
>  	case SO_VM_SOCKETS_BUFFER_SIZE:
>  		COPY_IN(val);
> -		transport->set_buffer_size(vsk, val);
> +		vsock_update_buffer_size(vsk, transport, val);
>  		break;
> 
>  	case SO_VM_SOCKETS_BUFFER_MAX_SIZE:
>  		COPY_IN(val);
> -		transport->set_max_buffer_size(vsk, val);
> +		vsk->buffer_max_size = val;
> +		vsock_update_buffer_size(vsk, transport, vsk->buffer_size);
>  		break;
> 
>  	case SO_VM_SOCKETS_BUFFER_MIN_SIZE:
>  		COPY_IN(val);
> -		transport->set_min_buffer_size(vsk, val);
> +		vsk->buffer_min_size = val;
> +		vsock_update_buffer_size(vsk, transport, vsk->buffer_size);
>  		break;
> 
>  	case SO_VM_SOCKETS_CONNECT_TIMEOUT: {
> @@ -1456,7 +1485,6 @@ static int vsock_stream_getsockopt(struct socket
> *sock,
>  	int len;
>  	struct sock *sk;
>  	struct vsock_sock *vsk;
> -	const struct vsock_transport *transport;
>  	u64 val;
> 
>  	if (level != AF_VSOCK)
> @@ -1480,21 +1508,20 @@ static int vsock_stream_getsockopt(struct socket
> *sock,
>  	err = 0;
>  	sk = sock->sk;
>  	vsk = vsock_sk(sk);
> -	transport = vsk->transport;
> 
>  	switch (optname) {
>  	case SO_VM_SOCKETS_BUFFER_SIZE:
> -		val = transport->get_buffer_size(vsk);
> +		val = vsk->buffer_size;
>  		COPY_OUT(val);
>  		break;
> 
>  	case SO_VM_SOCKETS_BUFFER_MAX_SIZE:
> -		val = transport->get_max_buffer_size(vsk);
> +		val = vsk->buffer_max_size;
>  		COPY_OUT(val);
>  		break;
> 
>  	case SO_VM_SOCKETS_BUFFER_MIN_SIZE:
> -		val = transport->get_min_buffer_size(vsk);
> +		val = vsk->buffer_min_size;
>  		COPY_OUT(val);
>  		break;
> 
> diff --git a/net/vmw_vsock/hyperv_transport.c
> b/net/vmw_vsock/hyperv_transport.c
> index bef8772116ec..d62297a62ca6 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -845,36 +845,6 @@ int hvs_notify_send_post_enqueue(struct
> vsock_sock *vsk, ssize_t written,
>  	return 0;
>  }
> 
> -static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val) -{
> -	/* Ignored. */
> -}
> -
> -static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val) -{
> -	/* Ignored. */
> -}
> -
> -static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val) -{
> -	/* Ignored. */
> -}
> -
> -static u64 hvs_get_buffer_size(struct vsock_sock *vsk) -{
> -	return -ENOPROTOOPT;
> -}
> -
> -static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk) -{
> -	return -ENOPROTOOPT;
> -}
> -
> -static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk) -{
> -	return -ENOPROTOOPT;
> -}
> -
>  static struct vsock_transport hvs_transport = {
>  	.get_local_cid            = hvs_get_local_cid,
> 
> @@ -908,12 +878,6 @@ static struct vsock_transport hvs_transport = {
>  	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
>  	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
> 
> -	.set_buffer_size          = hvs_set_buffer_size,
> -	.set_min_buffer_size      = hvs_set_min_buffer_size,
> -	.set_max_buffer_size      = hvs_set_max_buffer_size,
> -	.get_buffer_size          = hvs_get_buffer_size,
> -	.get_min_buffer_size      = hvs_get_min_buffer_size,
> -	.get_max_buffer_size      = hvs_get_max_buffer_size,
>  };
> 
>  static int hvs_probe(struct hv_device *hdev, diff --git
> a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 3756f0857946..fb1fc7760e8c 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -494,13 +494,7 @@ static struct virtio_transport virtio_transport = {
>  		.notify_send_pre_block    =
> virtio_transport_notify_send_pre_block,
>  		.notify_send_pre_enqueue  =
> virtio_transport_notify_send_pre_enqueue,
>  		.notify_send_post_enqueue =
> virtio_transport_notify_send_post_enqueue,
> -
> -		.set_buffer_size          = virtio_transport_set_buffer_size,
> -		.set_min_buffer_size      =
> virtio_transport_set_min_buffer_size,
> -		.set_max_buffer_size      =
> virtio_transport_set_max_buffer_size,
> -		.get_buffer_size          = virtio_transport_get_buffer_size,
> -		.get_min_buffer_size      =
> virtio_transport_get_min_buffer_size,
> -		.get_max_buffer_size      =
> virtio_transport_get_max_buffer_size,
> +		.notify_buffer_size       = virtio_transport_notify_buffer_size,
>  	},
> 
>  	.send_pkt = virtio_transport_send_pkt, diff --git
> a/net/vmw_vsock/virtio_transport_common.c
> b/net/vmw_vsock/virtio_transport_common.c
> index 37a1c7e7c7fe..b2a310dfa158 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -456,17 +456,13 @@ int virtio_transport_do_socket_init(struct
> vsock_sock *vsk,
>  	if (psk) {
>  		struct virtio_vsock_sock *ptrans = psk->trans;
> 
> -		vvs->buf_size	= ptrans->buf_size;
> -		vvs->buf_size_min = ptrans->buf_size_min;
> -		vvs->buf_size_max = ptrans->buf_size_max;
>  		vvs->peer_buf_alloc = ptrans->peer_buf_alloc;
> -	} else {
> -		vvs->buf_size = VIRTIO_VSOCK_DEFAULT_BUF_SIZE;
> -		vvs->buf_size_min =
> VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE;
> -		vvs->buf_size_max =
> VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE;
>  	}
> 
> -	vvs->buf_alloc = vvs->buf_size;
> +	if (vsk->buffer_size > VIRTIO_VSOCK_MAX_BUF_SIZE)
> +		vsk->buffer_size = VIRTIO_VSOCK_MAX_BUF_SIZE;
> +
> +	vvs->buf_alloc = vsk->buffer_size;
> 
>  	spin_lock_init(&vvs->rx_lock);
>  	spin_lock_init(&vvs->tx_lock);
> @@ -476,71 +472,20 @@ int virtio_transport_do_socket_init(struct
> vsock_sock *vsk,  }  EXPORT_SYMBOL_GPL(virtio_transport_do_socket_init);
> 
> -u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk) -{
> -	struct virtio_vsock_sock *vvs = vsk->trans;
> -
> -	return vvs->buf_size;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_get_buffer_size);
> -
> -u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk)
> +/* sk_lock held by the caller */
> +void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64
> +*val)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> 
> -	return vvs->buf_size_min;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_get_min_buffer_size);
> -
> -u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk) -{
> -	struct virtio_vsock_sock *vvs = vsk->trans;
> -
> -	return vvs->buf_size_max;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_get_max_buffer_size);
> -
> -void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val) -{
> -	struct virtio_vsock_sock *vvs = vsk->trans;
> +	if (*val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> +		*val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> 
> -	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> -		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> -	if (val < vvs->buf_size_min)
> -		vvs->buf_size_min = val;
> -	if (val > vvs->buf_size_max)
> -		vvs->buf_size_max = val;
> -	vvs->buf_size = val;
> -	vvs->buf_alloc = val;
> +	vvs->buf_alloc = *val;
> 
>  	virtio_transport_send_credit_update(vsk,
> VIRTIO_VSOCK_TYPE_STREAM,
>  					    NULL);
>  }
> -EXPORT_SYMBOL_GPL(virtio_transport_set_buffer_size);
> -
> -void virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
> -{
> -	struct virtio_vsock_sock *vvs = vsk->trans;
> -
> -	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> -		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> -	if (val > vvs->buf_size)
> -		vvs->buf_size = val;
> -	vvs->buf_size_min = val;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_set_min_buffer_size);
> -
> -void virtio_transport_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
> -{
> -	struct virtio_vsock_sock *vvs = vsk->trans;
> -
> -	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> -		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> -	if (val < vvs->buf_size)
> -		vvs->buf_size = val;
> -	vvs->buf_size_max = val;
> -}
> -EXPORT_SYMBOL_GPL(virtio_transport_set_max_buffer_size);
> +EXPORT_SYMBOL_GPL(virtio_transport_notify_buffer_size);
> 
>  int
>  virtio_transport_notify_poll_in(struct vsock_sock *vsk, @@ -632,9 +577,7
> @@ EXPORT_SYMBOL_GPL(virtio_transport_notify_send_post_enqueue);
> 
>  u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk)  {
> -	struct virtio_vsock_sock *vvs = vsk->trans;
> -
> -	return vvs->buf_size;
> +	return vsk->buffer_size;
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_stream_rcvhiwat);

While the VMCI transport uses a transport local consumer_size for stream_rcvhiwat,
that consumer_size is always the same as buffer_size (a vmci queue pair allows the
producer and consumer queues to be of different sizes, but vsock doesn't use that).
So we could move the stream_rcvhiwat code to the common code as well, and just
use buffer_size, if that simplifies things.

> diff --git a/net/vmw_vsock/vmci_transport.c
> b/net/vmw_vsock/vmci_transport.c index f8e3131ac480..8290d37b6587
> 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -74,10 +74,6 @@ static u32 vmci_transport_qp_resumed_sub_id =
> VMCI_INVALID_ID;
> 
>  static int PROTOCOL_OVERRIDE = -1;
> 
> -#define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MIN   128
> -#define VMCI_TRANSPORT_DEFAULT_QP_SIZE       262144
> -#define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX   262144
> -
>  /* Helper function to convert from a VMCI error code to a VSock error code.
> */
> 
>  static s32 vmci_transport_error_to_vsock_error(s32 vmci_error) @@ -
> 1025,11 +1021,11 @@ static int vmci_transport_recv_listen(struct sock *sk,
>  	/* If the proposed size fits within our min/max, accept it. Otherwise
>  	 * propose our own size.
>  	 */
> -	if (pkt->u.size >= vmci_trans(vpending)->queue_pair_min_size &&
> -	    pkt->u.size <= vmci_trans(vpending)->queue_pair_max_size) {
> +	if (pkt->u.size >= vpending->buffer_min_size &&
> +	    pkt->u.size <= vpending->buffer_max_size) {
>  		qp_size = pkt->u.size;
>  	} else {
> -		qp_size = vmci_trans(vpending)->queue_pair_size;
> +		qp_size = vpending->buffer_size;
>  	}
> 
>  	/* Figure out if we are using old or new requests based on the @@ -
> 1098,7 +1094,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
>  	pending->sk_state = TCP_SYN_SENT;
>  	vmci_trans(vpending)->produce_size =
>  		vmci_trans(vpending)->consume_size = qp_size;
> -	vmci_trans(vpending)->queue_pair_size = qp_size;
> +	vpending->buffer_size = qp_size;
> 
>  	vmci_trans(vpending)->notify_ops->process_request(pending);
> 
> @@ -1392,8 +1388,8 @@ static int
> vmci_transport_recv_connecting_client_negotiate(
>  	vsk->ignore_connecting_rst = false;
> 
>  	/* Verify that we're OK with the proposed queue pair size */
> -	if (pkt->u.size < vmci_trans(vsk)->queue_pair_min_size ||
> -	    pkt->u.size > vmci_trans(vsk)->queue_pair_max_size) {
> +	if (pkt->u.size < vsk->buffer_min_size ||
> +	    pkt->u.size > vsk->buffer_max_size) {
>  		err = -EINVAL;
>  		goto destroy;
>  	}
> @@ -1498,8 +1494,7 @@
> vmci_transport_recv_connecting_client_invalid(struct sock *sk,
>  		vsk->sent_request = false;
>  		vsk->ignore_connecting_rst = true;
> 
> -		err = vmci_transport_send_conn_request(
> -			sk, vmci_trans(vsk)->queue_pair_size);
> +		err = vmci_transport_send_conn_request(sk, vsk-
> >buffer_size);
>  		if (err < 0)
>  			err = vmci_transport_error_to_vsock_error(err);
>  		else
> @@ -1583,21 +1578,6 @@ static int vmci_transport_socket_init(struct
> vsock_sock *vsk,
>  	INIT_LIST_HEAD(&vmci_trans(vsk)->elem);
>  	vmci_trans(vsk)->sk = &vsk->sk;
>  	spin_lock_init(&vmci_trans(vsk)->lock);
> -	if (psk) {
> -		vmci_trans(vsk)->queue_pair_size =
> -			vmci_trans(psk)->queue_pair_size;
> -		vmci_trans(vsk)->queue_pair_min_size =
> -			vmci_trans(psk)->queue_pair_min_size;
> -		vmci_trans(vsk)->queue_pair_max_size =
> -			vmci_trans(psk)->queue_pair_max_size;
> -	} else {
> -		vmci_trans(vsk)->queue_pair_size =
> -			VMCI_TRANSPORT_DEFAULT_QP_SIZE;
> -		vmci_trans(vsk)->queue_pair_min_size =
> -			 VMCI_TRANSPORT_DEFAULT_QP_SIZE_MIN;
> -		vmci_trans(vsk)->queue_pair_max_size =
> -			VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX;
> -	}
> 
>  	return 0;
>  }
> @@ -1813,8 +1793,7 @@ static int vmci_transport_connect(struct
> vsock_sock *vsk)
> 
>  	if (vmci_transport_old_proto_override(&old_pkt_proto) &&
>  		old_pkt_proto) {
> -		err = vmci_transport_send_conn_request(
> -			sk, vmci_trans(vsk)->queue_pair_size);
> +		err = vmci_transport_send_conn_request(sk, vsk-
> >buffer_size);
>  		if (err < 0) {
>  			sk->sk_state = TCP_CLOSE;
>  			return err;
> @@ -1822,8 +1801,7 @@ static int vmci_transport_connect(struct
> vsock_sock *vsk)
>  	} else {
>  		int supported_proto_versions =
>  			vmci_transport_new_proto_supported_versions();
> -		err = vmci_transport_send_conn_request2(
> -				sk, vmci_trans(vsk)->queue_pair_size,
> +		err = vmci_transport_send_conn_request2(sk, vsk-
> >buffer_size,
>  				supported_proto_versions);
>  		if (err < 0) {
>  			sk->sk_state = TCP_CLOSE;
> @@ -1876,46 +1854,6 @@ static bool
> vmci_transport_stream_is_active(struct vsock_sock *vsk)
>  	return !vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle);
>  }
> 
> -static u64 vmci_transport_get_buffer_size(struct vsock_sock *vsk) -{
> -	return vmci_trans(vsk)->queue_pair_size;
> -}
> -
> -static u64 vmci_transport_get_min_buffer_size(struct vsock_sock *vsk) -{
> -	return vmci_trans(vsk)->queue_pair_min_size;
> -}
> -
> -static u64 vmci_transport_get_max_buffer_size(struct vsock_sock *vsk) -{
> -	return vmci_trans(vsk)->queue_pair_max_size;
> -}
> -
> -static void vmci_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
> -{
> -	if (val < vmci_trans(vsk)->queue_pair_min_size)
> -		vmci_trans(vsk)->queue_pair_min_size = val;
> -	if (val > vmci_trans(vsk)->queue_pair_max_size)
> -		vmci_trans(vsk)->queue_pair_max_size = val;
> -	vmci_trans(vsk)->queue_pair_size = val;
> -}
> -
> -static void vmci_transport_set_min_buffer_size(struct vsock_sock *vsk,
> -					       u64 val)
> -{
> -	if (val > vmci_trans(vsk)->queue_pair_size)
> -		vmci_trans(vsk)->queue_pair_size = val;
> -	vmci_trans(vsk)->queue_pair_min_size = val;
> -}
> -
> -static void vmci_transport_set_max_buffer_size(struct vsock_sock *vsk,
> -					       u64 val)
> -{
> -	if (val < vmci_trans(vsk)->queue_pair_size)
> -		vmci_trans(vsk)->queue_pair_size = val;
> -	vmci_trans(vsk)->queue_pair_max_size = val;
> -}
> -
>  static int vmci_transport_notify_poll_in(
>  	struct vsock_sock *vsk,
>  	size_t target,
> @@ -2098,12 +2036,6 @@ static const struct vsock_transport vmci_transport
> = {
>  	.notify_send_pre_enqueue =
> vmci_transport_notify_send_pre_enqueue,
>  	.notify_send_post_enqueue =
> vmci_transport_notify_send_post_enqueue,
>  	.shutdown = vmci_transport_shutdown,
> -	.set_buffer_size = vmci_transport_set_buffer_size,
> -	.set_min_buffer_size = vmci_transport_set_min_buffer_size,
> -	.set_max_buffer_size = vmci_transport_set_max_buffer_size,
> -	.get_buffer_size = vmci_transport_get_buffer_size,
> -	.get_min_buffer_size = vmci_transport_get_min_buffer_size,
> -	.get_max_buffer_size = vmci_transport_get_max_buffer_size,
>  	.get_local_cid = vmci_transport_get_local_cid,  };
> 
> diff --git a/net/vmw_vsock/vmci_transport.h
> b/net/vmw_vsock/vmci_transport.h index 1ca1e8640b31..b7b072194282
> 100644
> --- a/net/vmw_vsock/vmci_transport.h
> +++ b/net/vmw_vsock/vmci_transport.h
> @@ -108,9 +108,6 @@ struct vmci_transport {
>  	struct vmci_qp *qpair;
>  	u64 produce_size;
>  	u64 consume_size;
> -	u64 queue_pair_size;
> -	u64 queue_pair_min_size;
> -	u64 queue_pair_max_size;
>  	u32 detach_sub_id;
>  	union vmci_transport_notify notify;
>  	const struct vmci_transport_notify_ops *notify_ops;
> --
> 2.21.0

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports
  2019-10-23  9:55 ` [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports Stefano Garzarella
  2019-10-27  8:12   ` Stefan Hajnoczi
@ 2019-10-30 15:12   ` Jorgen Hansen
  1 sibling, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 15:12 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 08/14] vsock: add vsock_create_connected() called
> by transports
> 
> All transports call __vsock_create() with the same parameters,
> most of them depending on the parent socket. In order to simplify
> the VSOCK core APIs exposed to the transports, this patch adds
> the vsock_create_connected() callable from transports to create
> a new socket when a connection request is received.
> We also unexported the __vsock_create().
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  include/net/af_vsock.h                  |  5 +----
>  net/vmw_vsock/af_vsock.c                | 20 +++++++++++++-------
>  net/vmw_vsock/hyperv_transport.c        |  3 +--
>  net/vmw_vsock/virtio_transport_common.c |  3 +--
>  net/vmw_vsock/vmci_transport.c          |  3 +--
>  5 files changed, 17 insertions(+), 17 deletions(-)

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the vsock_create()
  2019-10-23  9:55 ` [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the vsock_create() Stefano Garzarella
@ 2019-10-30 15:12   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 15:12 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> Subject: [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the
> vsock_create()
> 
> vsock_insert_unbound() was called only when 'sock' parameter of
> __vsock_create() was not null. This only happened when
> __vsock_create() was called by vsock_create().
> 
> In order to simplify the multi-transports support, this patch moves
> vsock_insert_unbound() at the end of vsock_create().
> 
> Reviewed-by: Dexuan Cui <decui@microsoft.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/af_vsock.c | 13 +++++++++----

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-10-23 15:08   ` Stefano Garzarella
@ 2019-10-30 15:40     ` Jorgen Hansen
  2019-10-31  8:54       ` Stefano Garzarella
  0 siblings, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-10-30 15:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Sasha Levin, linux-hyperv, Stephen Hemminger, Arnd Bergmann, kvm,
	Michael S. Tsirkin, Greg Kroah-Hartman, Dexuan Cui, linux-kernel,
	virtualization, Haiyang Zhang, Stefan Hajnoczi, David S. Miller,
	netdev

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > +/* Assign a transport to a socket and call the .init transport callback.
> > + *
> > + * Note: for stream socket this must be called when vsk->remote_addr
> > +is set
> > + * (e.g. during the connect() or when a connection request on a
> > +listener
> > + * socket is received).
> > + * The vsk->remote_addr is used to decide which transport to use:
> > + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> > + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport
> > +*/ int vsock_assign_transport(struct vsock_sock *vsk, struct
> > +vsock_sock *psk) {
> > +       const struct vsock_transport *new_transport;
> > +       struct sock *sk = sk_vsock(vsk);
> > +
> > +       switch (sk->sk_type) {
> > +       case SOCK_DGRAM:
> > +               new_transport = transport_dgram;
> > +               break;
> > +       case SOCK_STREAM:
> > +               if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> > +                       new_transport = transport_h2g;
> > +               else
> > +                       new_transport = transport_g2h;
> 
> I just noticed that this break the loopback in the guest.
> As a fix, we should use 'transport_g2h' when remote_cid <=
> VMADDR_CID_HOST or remote_cid is the id of 'transport_g2h'.
> 
> To do that we also need to avoid that L2 guests can have the same CID of L1.
> For vhost_vsock I can call vsock_find_cid() in vhost_vsock_set_cid()
> 
> @Jorgen: for vmci we need to do the same? or it is guaranteed, since it's
> already support nested VMs, that a L2 guests cannot have the same CID as
> the L1.

As far as I can tell, we have the same issue with the current support for nested VMs in
VMCI. If we have an L2 guest with the same CID as the L1 guest, we will always send to
the L2 guest, and we may assign an L2 guest the same CID as L1. It should be straight
forward to avoid this, though.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core
  2019-10-30 15:08   ` Jorgen Hansen
@ 2019-10-31  8:50     ` Stefano Garzarella
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-31  8:50 UTC (permalink / raw)
  To: Jorgen Hansen, Dexuan Cui
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Haiyang Zhang, Sasha Levin, linux-kernel,
	Arnd Bergmann, Stefan Hajnoczi, linux-hyperv, K. Y. Srinivasan,
	Stephen Hemminger, virtualization, netdev

On Wed, Oct 30, 2019 at 03:08:15PM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > Sent: Wednesday, October 23, 2019 11:56 AM
> > Subject: [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the
> > core
> > 
> > virtio_transport and vmci_transport handle the buffer_size sockopts in a
> > very similar way.
> > 
> > In order to support multiple transports, this patch moves this handling in the
> > core to allow the user to change the options also if the socket is not yet
> > assigned to any transport.
> > 
> > This patch also adds the '.notify_buffer_size' callback in the 'struct
> > virtio_transport' in order to inform the transport, when the buffer_size is
> > changed by the user. It is also useful to limit the 'buffer_size' requested (e.g.
> > virtio transports).
> > 
> > Acked-by: Dexuan Cui <decui@microsoft.com>
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> > RFC -> v1:
> > - changed .notify_buffer_size return to void (Stefan)
> > - documented that .notify_buffer_size is called with sk_lock held (Stefan)
> > ---
> >  drivers/vhost/vsock.c                   |  7 +-
> >  include/linux/virtio_vsock.h            | 15 +----
> >  include/net/af_vsock.h                  | 15 ++---
> >  net/vmw_vsock/af_vsock.c                | 43 ++++++++++---
> >  net/vmw_vsock/hyperv_transport.c        | 36 -----------
> >  net/vmw_vsock/virtio_transport.c        |  8 +--
> >  net/vmw_vsock/virtio_transport_common.c | 79 ++++-------------------
> >  net/vmw_vsock/vmci_transport.c          | 86 +++----------------------
> >  net/vmw_vsock/vmci_transport.h          |  3 -
> >  9 files changed, 65 insertions(+), 227 deletions(-)
> > 
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index
> > 92ab3852c954..6d7e4f022748 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -418,13 +418,8 @@ static struct virtio_transport vhost_transport = {
> >  		.notify_send_pre_block    =
> > virtio_transport_notify_send_pre_block,
> >  		.notify_send_pre_enqueue  =
> > virtio_transport_notify_send_pre_enqueue,
> >  		.notify_send_post_enqueue =
> > virtio_transport_notify_send_post_enqueue,
> > +		.notify_buffer_size       = virtio_transport_notify_buffer_size,
> > 
> > -		.set_buffer_size          = virtio_transport_set_buffer_size,
> > -		.set_min_buffer_size      =
> > virtio_transport_set_min_buffer_size,
> > -		.set_max_buffer_size      =
> > virtio_transport_set_max_buffer_size,
> > -		.get_buffer_size          = virtio_transport_get_buffer_size,
> > -		.get_min_buffer_size      =
> > virtio_transport_get_min_buffer_size,
> > -		.get_max_buffer_size      =
> > virtio_transport_get_max_buffer_size,
> >  	},
> > 
> >  	.send_pkt = vhost_transport_send_pkt,
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index
> > 96d8132acbd7..b79befd2a5a4 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -7,9 +7,6 @@
> >  #include <net/sock.h>
> >  #include <net/af_vsock.h>
> > 
> > -#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE	128
> > -#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE		(1024 * 256)
> > -#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE	(1024 * 256)
> >  #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
> >  #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
> >  #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> > @@ -25,11 +22,6 @@ enum {
> >  struct virtio_vsock_sock {
> >  	struct vsock_sock *vsk;
> > 
> > -	/* Protected by lock_sock(sk_vsock(trans->vsk)) */
> > -	u32 buf_size;
> > -	u32 buf_size_min;
> > -	u32 buf_size_max;
> > -
> >  	spinlock_t tx_lock;
> >  	spinlock_t rx_lock;
> > 
> > @@ -93,12 +85,6 @@ s64 virtio_transport_stream_has_space(struct
> > vsock_sock *vsk);
> > 
> >  int virtio_transport_do_socket_init(struct vsock_sock *vsk,
> >  				 struct vsock_sock *psk);
> > -u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk);
> > -u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk);
> > -u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk); -void
> > virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val); -void
> > virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val); -void
> > virtio_transport_set_max_buffer_size(struct vsock_sock *vs, u64 val);  int
> > virtio_transport_notify_poll_in(struct vsock_sock *vsk,
> >  				size_t target,
> > @@ -125,6 +111,7 @@ int
> > virtio_transport_notify_send_pre_enqueue(struct vsock_sock *vsk,
> >  	struct vsock_transport_send_notify_data *data);  int
> > virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
> >  	ssize_t written, struct vsock_transport_send_notify_data *data);
> > +void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64
> > +*val);
> > 
> >  u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk);  bool
> > virtio_transport_stream_is_active(struct vsock_sock *vsk); diff --git
> > a/include/net/af_vsock.h b/include/net/af_vsock.h index
> > 2ca67d048de4..4b5d16840fd4 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -65,6 +65,11 @@ struct vsock_sock {
> >  	bool sent_request;
> >  	bool ignore_connecting_rst;
> > 
> > +	/* Protected by lock_sock(sk) */
> > +	u64 buffer_size;
> > +	u64 buffer_min_size;
> > +	u64 buffer_max_size;
> > +
> >  	/* Private to transport. */
> >  	void *trans;
> >  };
> > @@ -140,18 +145,12 @@ struct vsock_transport {
> >  		struct vsock_transport_send_notify_data *);
> >  	int (*notify_send_post_enqueue)(struct vsock_sock *, ssize_t,
> >  		struct vsock_transport_send_notify_data *);
> > +	/* sk_lock held by the caller */
> > +	void (*notify_buffer_size)(struct vsock_sock *, u64 *);
> > 
> >  	/* Shutdown. */
> >  	int (*shutdown)(struct vsock_sock *, int);
> > 
> > -	/* Buffer sizes. */
> > -	void (*set_buffer_size)(struct vsock_sock *, u64);
> > -	void (*set_min_buffer_size)(struct vsock_sock *, u64);
> > -	void (*set_max_buffer_size)(struct vsock_sock *, u64);
> > -	u64 (*get_buffer_size)(struct vsock_sock *);
> > -	u64 (*get_min_buffer_size)(struct vsock_sock *);
> > -	u64 (*get_max_buffer_size)(struct vsock_sock *);
> > -
> >  	/* Addressing. */
> >  	u32 (*get_local_cid)(void);
> >  };
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> > eaea159006c8..90ac46ea12ef 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -126,6 +126,10 @@ static struct proto vsock_proto = {
> >   */
> >  #define VSOCK_DEFAULT_CONNECT_TIMEOUT (2 * HZ)
> > 
> > +#define VSOCK_DEFAULT_BUFFER_SIZE     (1024 * 256)
> > +#define VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256) #define
> > +VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
> > +
> >  static const struct vsock_transport *transport_single;  static
> > DEFINE_MUTEX(vsock_register_mutex);
> > 
> > @@ -613,10 +617,16 @@ struct sock *__vsock_create(struct net *net,
> >  		vsk->trusted = psk->trusted;
> >  		vsk->owner = get_cred(psk->owner);
> >  		vsk->connect_timeout = psk->connect_timeout;
> > +		vsk->buffer_size = psk->buffer_size;
> > +		vsk->buffer_min_size = psk->buffer_min_size;
> > +		vsk->buffer_max_size = psk->buffer_max_size;
> >  	} else {
> >  		vsk->trusted = capable(CAP_NET_ADMIN);
> >  		vsk->owner = get_current_cred();
> >  		vsk->connect_timeout =
> > VSOCK_DEFAULT_CONNECT_TIMEOUT;
> > +		vsk->buffer_size = VSOCK_DEFAULT_BUFFER_SIZE;
> > +		vsk->buffer_min_size =
> > VSOCK_DEFAULT_BUFFER_MIN_SIZE;
> > +		vsk->buffer_max_size =
> > VSOCK_DEFAULT_BUFFER_MAX_SIZE;
> >  	}
> > 
> >  	if (vsk->transport->init(vsk, psk) < 0) { @@ -1368,6 +1378,23 @@
> > static int vsock_listen(struct socket *sock, int backlog)
> >  	return err;
> >  }
> > 
> > +static void vsock_update_buffer_size(struct vsock_sock *vsk,
> > +				     const struct vsock_transport *transport,
> > +				     u64 val)
> > +{
> > +	if (val > vsk->buffer_max_size)
> > +		val = vsk->buffer_max_size;
> > +
> > +	if (val < vsk->buffer_min_size)
> > +		val = vsk->buffer_min_size;
> > +
> > +	if (val != vsk->buffer_size &&
> > +	    transport && transport->notify_buffer_size)
> > +		transport->notify_buffer_size(vsk, &val);
> > +
> > +	vsk->buffer_size = val;
> > +}
> > +
> >  static int vsock_stream_setsockopt(struct socket *sock,
> >  				   int level,
> >  				   int optname,
> > @@ -1405,17 +1432,19 @@ static int vsock_stream_setsockopt(struct socket
> > *sock,
> >  	switch (optname) {
> >  	case SO_VM_SOCKETS_BUFFER_SIZE:
> >  		COPY_IN(val);
> > -		transport->set_buffer_size(vsk, val);
> > +		vsock_update_buffer_size(vsk, transport, val);
> >  		break;
> > 
> >  	case SO_VM_SOCKETS_BUFFER_MAX_SIZE:
> >  		COPY_IN(val);
> > -		transport->set_max_buffer_size(vsk, val);
> > +		vsk->buffer_max_size = val;
> > +		vsock_update_buffer_size(vsk, transport, vsk->buffer_size);
> >  		break;
> > 
> >  	case SO_VM_SOCKETS_BUFFER_MIN_SIZE:
> >  		COPY_IN(val);
> > -		transport->set_min_buffer_size(vsk, val);
> > +		vsk->buffer_min_size = val;
> > +		vsock_update_buffer_size(vsk, transport, vsk->buffer_size);
> >  		break;
> > 
> >  	case SO_VM_SOCKETS_CONNECT_TIMEOUT: {
> > @@ -1456,7 +1485,6 @@ static int vsock_stream_getsockopt(struct socket
> > *sock,
> >  	int len;
> >  	struct sock *sk;
> >  	struct vsock_sock *vsk;
> > -	const struct vsock_transport *transport;
> >  	u64 val;
> > 
> >  	if (level != AF_VSOCK)
> > @@ -1480,21 +1508,20 @@ static int vsock_stream_getsockopt(struct socket
> > *sock,
> >  	err = 0;
> >  	sk = sock->sk;
> >  	vsk = vsock_sk(sk);
> > -	transport = vsk->transport;
> > 
> >  	switch (optname) {
> >  	case SO_VM_SOCKETS_BUFFER_SIZE:
> > -		val = transport->get_buffer_size(vsk);
> > +		val = vsk->buffer_size;
> >  		COPY_OUT(val);
> >  		break;
> > 
> >  	case SO_VM_SOCKETS_BUFFER_MAX_SIZE:
> > -		val = transport->get_max_buffer_size(vsk);
> > +		val = vsk->buffer_max_size;
> >  		COPY_OUT(val);
> >  		break;
> > 
> >  	case SO_VM_SOCKETS_BUFFER_MIN_SIZE:
> > -		val = transport->get_min_buffer_size(vsk);
> > +		val = vsk->buffer_min_size;
> >  		COPY_OUT(val);
> >  		break;
> > 
> > diff --git a/net/vmw_vsock/hyperv_transport.c
> > b/net/vmw_vsock/hyperv_transport.c
> > index bef8772116ec..d62297a62ca6 100644
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -845,36 +845,6 @@ int hvs_notify_send_post_enqueue(struct
> > vsock_sock *vsk, ssize_t written,
> >  	return 0;
> >  }
> > 
> > -static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val) -{
> > -	/* Ignored. */
> > -}
> > -
> > -static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val) -{
> > -	/* Ignored. */
> > -}
> > -
> > -static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val) -{
> > -	/* Ignored. */
> > -}
> > -
> > -static u64 hvs_get_buffer_size(struct vsock_sock *vsk) -{
> > -	return -ENOPROTOOPT;
> > -}
> > -
> > -static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk) -{
> > -	return -ENOPROTOOPT;
> > -}
> > -
> > -static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk) -{
> > -	return -ENOPROTOOPT;
> > -}
> > -
> >  static struct vsock_transport hvs_transport = {
> >  	.get_local_cid            = hvs_get_local_cid,
> > 
> > @@ -908,12 +878,6 @@ static struct vsock_transport hvs_transport = {
> >  	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
> >  	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
> > 
> > -	.set_buffer_size          = hvs_set_buffer_size,
> > -	.set_min_buffer_size      = hvs_set_min_buffer_size,
> > -	.set_max_buffer_size      = hvs_set_max_buffer_size,
> > -	.get_buffer_size          = hvs_get_buffer_size,
> > -	.get_min_buffer_size      = hvs_get_min_buffer_size,
> > -	.get_max_buffer_size      = hvs_get_max_buffer_size,
> >  };
> > 
> >  static int hvs_probe(struct hv_device *hdev, diff --git
> > a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > index 3756f0857946..fb1fc7760e8c 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -494,13 +494,7 @@ static struct virtio_transport virtio_transport = {
> >  		.notify_send_pre_block    =
> > virtio_transport_notify_send_pre_block,
> >  		.notify_send_pre_enqueue  =
> > virtio_transport_notify_send_pre_enqueue,
> >  		.notify_send_post_enqueue =
> > virtio_transport_notify_send_post_enqueue,
> > -
> > -		.set_buffer_size          = virtio_transport_set_buffer_size,
> > -		.set_min_buffer_size      =
> > virtio_transport_set_min_buffer_size,
> > -		.set_max_buffer_size      =
> > virtio_transport_set_max_buffer_size,
> > -		.get_buffer_size          = virtio_transport_get_buffer_size,
> > -		.get_min_buffer_size      =
> > virtio_transport_get_min_buffer_size,
> > -		.get_max_buffer_size      =
> > virtio_transport_get_max_buffer_size,
> > +		.notify_buffer_size       = virtio_transport_notify_buffer_size,
> >  	},
> > 
> >  	.send_pkt = virtio_transport_send_pkt, diff --git
> > a/net/vmw_vsock/virtio_transport_common.c
> > b/net/vmw_vsock/virtio_transport_common.c
> > index 37a1c7e7c7fe..b2a310dfa158 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -456,17 +456,13 @@ int virtio_transport_do_socket_init(struct
> > vsock_sock *vsk,
> >  	if (psk) {
> >  		struct virtio_vsock_sock *ptrans = psk->trans;
> > 
> > -		vvs->buf_size	= ptrans->buf_size;
> > -		vvs->buf_size_min = ptrans->buf_size_min;
> > -		vvs->buf_size_max = ptrans->buf_size_max;
> >  		vvs->peer_buf_alloc = ptrans->peer_buf_alloc;
> > -	} else {
> > -		vvs->buf_size = VIRTIO_VSOCK_DEFAULT_BUF_SIZE;
> > -		vvs->buf_size_min =
> > VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE;
> > -		vvs->buf_size_max =
> > VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE;
> >  	}
> > 
> > -	vvs->buf_alloc = vvs->buf_size;
> > +	if (vsk->buffer_size > VIRTIO_VSOCK_MAX_BUF_SIZE)
> > +		vsk->buffer_size = VIRTIO_VSOCK_MAX_BUF_SIZE;
> > +
> > +	vvs->buf_alloc = vsk->buffer_size;
> > 
> >  	spin_lock_init(&vvs->rx_lock);
> >  	spin_lock_init(&vvs->tx_lock);
> > @@ -476,71 +472,20 @@ int virtio_transport_do_socket_init(struct
> > vsock_sock *vsk,  }  EXPORT_SYMBOL_GPL(virtio_transport_do_socket_init);
> > 
> > -u64 virtio_transport_get_buffer_size(struct vsock_sock *vsk) -{
> > -	struct virtio_vsock_sock *vvs = vsk->trans;
> > -
> > -	return vvs->buf_size;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_get_buffer_size);
> > -
> > -u64 virtio_transport_get_min_buffer_size(struct vsock_sock *vsk)
> > +/* sk_lock held by the caller */
> > +void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64
> > +*val)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > 
> > -	return vvs->buf_size_min;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_get_min_buffer_size);
> > -
> > -u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk) -{
> > -	struct virtio_vsock_sock *vvs = vsk->trans;
> > -
> > -	return vvs->buf_size_max;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_get_max_buffer_size);
> > -
> > -void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val) -{
> > -	struct virtio_vsock_sock *vvs = vsk->trans;
> > +	if (*val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> > +		*val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> > 
> > -	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> > -		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> > -	if (val < vvs->buf_size_min)
> > -		vvs->buf_size_min = val;
> > -	if (val > vvs->buf_size_max)
> > -		vvs->buf_size_max = val;
> > -	vvs->buf_size = val;
> > -	vvs->buf_alloc = val;
> > +	vvs->buf_alloc = *val;
> > 
> >  	virtio_transport_send_credit_update(vsk,
> > VIRTIO_VSOCK_TYPE_STREAM,
> >  					    NULL);
> >  }
> > -EXPORT_SYMBOL_GPL(virtio_transport_set_buffer_size);
> > -
> > -void virtio_transport_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
> > -{
> > -	struct virtio_vsock_sock *vvs = vsk->trans;
> > -
> > -	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> > -		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> > -	if (val > vvs->buf_size)
> > -		vvs->buf_size = val;
> > -	vvs->buf_size_min = val;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_set_min_buffer_size);
> > -
> > -void virtio_transport_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
> > -{
> > -	struct virtio_vsock_sock *vvs = vsk->trans;
> > -
> > -	if (val > VIRTIO_VSOCK_MAX_BUF_SIZE)
> > -		val = VIRTIO_VSOCK_MAX_BUF_SIZE;
> > -	if (val < vvs->buf_size)
> > -		vvs->buf_size = val;
> > -	vvs->buf_size_max = val;
> > -}
> > -EXPORT_SYMBOL_GPL(virtio_transport_set_max_buffer_size);
> > +EXPORT_SYMBOL_GPL(virtio_transport_notify_buffer_size);
> > 
> >  int
> >  virtio_transport_notify_poll_in(struct vsock_sock *vsk, @@ -632,9 +577,7
> > @@ EXPORT_SYMBOL_GPL(virtio_transport_notify_send_post_enqueue);
> > 
> >  u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk)  {
> > -	struct virtio_vsock_sock *vvs = vsk->trans;
> > -
> > -	return vvs->buf_size;
> > +	return vsk->buffer_size;
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_transport_stream_rcvhiwat);
> 
> While the VMCI transport uses a transport local consumer_size for stream_rcvhiwat,
> that consumer_size is always the same as buffer_size (a vmci queue pair allows the
> producer and consumer queues to be of different sizes, but vsock doesn't use that).
> So we could move the stream_rcvhiwat code to the common code as well, and just
> use buffer_size, if that simplifies things.
> 

Thanks to let me know. It could be another step to clean up the
transports. I'm only worried about hyperv_transport, because it returns
HVS_MTU_SIZE + 1.

@Dexuan Do you have any advice?


> > diff --git a/net/vmw_vsock/vmci_transport.c
> > b/net/vmw_vsock/vmci_transport.c index f8e3131ac480..8290d37b6587
> > 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> > @@ -74,10 +74,6 @@ static u32 vmci_transport_qp_resumed_sub_id =
> > VMCI_INVALID_ID;
> > 
> >  static int PROTOCOL_OVERRIDE = -1;
> > 
> > -#define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MIN   128
> > -#define VMCI_TRANSPORT_DEFAULT_QP_SIZE       262144
> > -#define VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX   262144
> > -
> >  /* Helper function to convert from a VMCI error code to a VSock error code.
> > */
> > 
> >  static s32 vmci_transport_error_to_vsock_error(s32 vmci_error) @@ -
> > 1025,11 +1021,11 @@ static int vmci_transport_recv_listen(struct sock *sk,
> >  	/* If the proposed size fits within our min/max, accept it. Otherwise
> >  	 * propose our own size.
> >  	 */
> > -	if (pkt->u.size >= vmci_trans(vpending)->queue_pair_min_size &&
> > -	    pkt->u.size <= vmci_trans(vpending)->queue_pair_max_size) {
> > +	if (pkt->u.size >= vpending->buffer_min_size &&
> > +	    pkt->u.size <= vpending->buffer_max_size) {
> >  		qp_size = pkt->u.size;
> >  	} else {
> > -		qp_size = vmci_trans(vpending)->queue_pair_size;
> > +		qp_size = vpending->buffer_size;
> >  	}
> > 
> >  	/* Figure out if we are using old or new requests based on the @@ -
> > 1098,7 +1094,7 @@ static int vmci_transport_recv_listen(struct sock *sk,
> >  	pending->sk_state = TCP_SYN_SENT;
> >  	vmci_trans(vpending)->produce_size =
> >  		vmci_trans(vpending)->consume_size = qp_size;
> > -	vmci_trans(vpending)->queue_pair_size = qp_size;
> > +	vpending->buffer_size = qp_size;
> > 
> >  	vmci_trans(vpending)->notify_ops->process_request(pending);
> > 
> > @@ -1392,8 +1388,8 @@ static int
> > vmci_transport_recv_connecting_client_negotiate(
> >  	vsk->ignore_connecting_rst = false;
> > 
> >  	/* Verify that we're OK with the proposed queue pair size */
> > -	if (pkt->u.size < vmci_trans(vsk)->queue_pair_min_size ||
> > -	    pkt->u.size > vmci_trans(vsk)->queue_pair_max_size) {
> > +	if (pkt->u.size < vsk->buffer_min_size ||
> > +	    pkt->u.size > vsk->buffer_max_size) {
> >  		err = -EINVAL;
> >  		goto destroy;
> >  	}
> > @@ -1498,8 +1494,7 @@
> > vmci_transport_recv_connecting_client_invalid(struct sock *sk,
> >  		vsk->sent_request = false;
> >  		vsk->ignore_connecting_rst = true;
> > 
> > -		err = vmci_transport_send_conn_request(
> > -			sk, vmci_trans(vsk)->queue_pair_size);
> > +		err = vmci_transport_send_conn_request(sk, vsk-
> > >buffer_size);
> >  		if (err < 0)
> >  			err = vmci_transport_error_to_vsock_error(err);
> >  		else
> > @@ -1583,21 +1578,6 @@ static int vmci_transport_socket_init(struct
> > vsock_sock *vsk,
> >  	INIT_LIST_HEAD(&vmci_trans(vsk)->elem);
> >  	vmci_trans(vsk)->sk = &vsk->sk;
> >  	spin_lock_init(&vmci_trans(vsk)->lock);
> > -	if (psk) {
> > -		vmci_trans(vsk)->queue_pair_size =
> > -			vmci_trans(psk)->queue_pair_size;
> > -		vmci_trans(vsk)->queue_pair_min_size =
> > -			vmci_trans(psk)->queue_pair_min_size;
> > -		vmci_trans(vsk)->queue_pair_max_size =
> > -			vmci_trans(psk)->queue_pair_max_size;
> > -	} else {
> > -		vmci_trans(vsk)->queue_pair_size =
> > -			VMCI_TRANSPORT_DEFAULT_QP_SIZE;
> > -		vmci_trans(vsk)->queue_pair_min_size =
> > -			 VMCI_TRANSPORT_DEFAULT_QP_SIZE_MIN;
> > -		vmci_trans(vsk)->queue_pair_max_size =
> > -			VMCI_TRANSPORT_DEFAULT_QP_SIZE_MAX;
> > -	}
> > 
> >  	return 0;
> >  }
> > @@ -1813,8 +1793,7 @@ static int vmci_transport_connect(struct
> > vsock_sock *vsk)
> > 
> >  	if (vmci_transport_old_proto_override(&old_pkt_proto) &&
> >  		old_pkt_proto) {
> > -		err = vmci_transport_send_conn_request(
> > -			sk, vmci_trans(vsk)->queue_pair_size);
> > +		err = vmci_transport_send_conn_request(sk, vsk-
> > >buffer_size);
> >  		if (err < 0) {
> >  			sk->sk_state = TCP_CLOSE;
> >  			return err;
> > @@ -1822,8 +1801,7 @@ static int vmci_transport_connect(struct
> > vsock_sock *vsk)
> >  	} else {
> >  		int supported_proto_versions =
> >  			vmci_transport_new_proto_supported_versions();
> > -		err = vmci_transport_send_conn_request2(
> > -				sk, vmci_trans(vsk)->queue_pair_size,
> > +		err = vmci_transport_send_conn_request2(sk, vsk-
> > >buffer_size,
> >  				supported_proto_versions);
> >  		if (err < 0) {
> >  			sk->sk_state = TCP_CLOSE;
> > @@ -1876,46 +1854,6 @@ static bool
> > vmci_transport_stream_is_active(struct vsock_sock *vsk)
> >  	return !vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle);
> >  }
> > 
> > -static u64 vmci_transport_get_buffer_size(struct vsock_sock *vsk) -{
> > -	return vmci_trans(vsk)->queue_pair_size;
> > -}
> > -
> > -static u64 vmci_transport_get_min_buffer_size(struct vsock_sock *vsk) -{
> > -	return vmci_trans(vsk)->queue_pair_min_size;
> > -}
> > -
> > -static u64 vmci_transport_get_max_buffer_size(struct vsock_sock *vsk) -{
> > -	return vmci_trans(vsk)->queue_pair_max_size;
> > -}
> > -
> > -static void vmci_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
> > -{
> > -	if (val < vmci_trans(vsk)->queue_pair_min_size)
> > -		vmci_trans(vsk)->queue_pair_min_size = val;
> > -	if (val > vmci_trans(vsk)->queue_pair_max_size)
> > -		vmci_trans(vsk)->queue_pair_max_size = val;
> > -	vmci_trans(vsk)->queue_pair_size = val;
> > -}
> > -
> > -static void vmci_transport_set_min_buffer_size(struct vsock_sock *vsk,
> > -					       u64 val)
> > -{
> > -	if (val > vmci_trans(vsk)->queue_pair_size)
> > -		vmci_trans(vsk)->queue_pair_size = val;
> > -	vmci_trans(vsk)->queue_pair_min_size = val;
> > -}
> > -
> > -static void vmci_transport_set_max_buffer_size(struct vsock_sock *vsk,
> > -					       u64 val)
> > -{
> > -	if (val < vmci_trans(vsk)->queue_pair_size)
> > -		vmci_trans(vsk)->queue_pair_size = val;
> > -	vmci_trans(vsk)->queue_pair_max_size = val;
> > -}
> > -
> >  static int vmci_transport_notify_poll_in(
> >  	struct vsock_sock *vsk,
> >  	size_t target,
> > @@ -2098,12 +2036,6 @@ static const struct vsock_transport vmci_transport
> > = {
> >  	.notify_send_pre_enqueue =
> > vmci_transport_notify_send_pre_enqueue,
> >  	.notify_send_post_enqueue =
> > vmci_transport_notify_send_post_enqueue,
> >  	.shutdown = vmci_transport_shutdown,
> > -	.set_buffer_size = vmci_transport_set_buffer_size,
> > -	.set_min_buffer_size = vmci_transport_set_min_buffer_size,
> > -	.set_max_buffer_size = vmci_transport_set_max_buffer_size,
> > -	.get_buffer_size = vmci_transport_get_buffer_size,
> > -	.get_min_buffer_size = vmci_transport_get_min_buffer_size,
> > -	.get_max_buffer_size = vmci_transport_get_max_buffer_size,
> >  	.get_local_cid = vmci_transport_get_local_cid,  };
> > 
> > diff --git a/net/vmw_vsock/vmci_transport.h
> > b/net/vmw_vsock/vmci_transport.h index 1ca1e8640b31..b7b072194282
> > 100644
> > --- a/net/vmw_vsock/vmci_transport.h
> > +++ b/net/vmw_vsock/vmci_transport.h
> > @@ -108,9 +108,6 @@ struct vmci_transport {
> >  	struct vmci_qp *qpair;
> >  	u64 produce_size;
> >  	u64 consume_size;
> > -	u64 queue_pair_size;
> > -	u64 queue_pair_min_size;
> > -	u64 queue_pair_max_size;
> >  	u32 detach_sub_id;
> >  	union vmci_transport_notify notify;
> >  	const struct vmci_transport_notify_ops *notify_ops;
> > --
> > 2.21.0
> 
> Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

Thanks for the reviews,
Stefano

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-10-30 15:40     ` Jorgen Hansen
@ 2019-10-31  8:54       ` Stefano Garzarella
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-10-31  8:54 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: Sasha Levin, linux-hyperv, Stephen Hemminger, Arnd Bergmann, kvm,
	Michael S. Tsirkin, Greg Kroah-Hartman, Dexuan Cui, linux-kernel,
	virtualization, Haiyang Zhang, Stefan Hajnoczi, David S. Miller,
	netdev

On Wed, Oct 30, 2019 at 03:40:05PM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > > +/* Assign a transport to a socket and call the .init transport callback.
> > > + *
> > > + * Note: for stream socket this must be called when vsk->remote_addr
> > > +is set
> > > + * (e.g. during the connect() or when a connection request on a
> > > +listener
> > > + * socket is received).
> > > + * The vsk->remote_addr is used to decide which transport to use:
> > > + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> > > + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport
> > > +*/ int vsock_assign_transport(struct vsock_sock *vsk, struct
> > > +vsock_sock *psk) {
> > > +       const struct vsock_transport *new_transport;
> > > +       struct sock *sk = sk_vsock(vsk);
> > > +
> > > +       switch (sk->sk_type) {
> > > +       case SOCK_DGRAM:
> > > +               new_transport = transport_dgram;
> > > +               break;
> > > +       case SOCK_STREAM:
> > > +               if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> > > +                       new_transport = transport_h2g;
> > > +               else
> > > +                       new_transport = transport_g2h;
> > 
> > I just noticed that this break the loopback in the guest.
> > As a fix, we should use 'transport_g2h' when remote_cid <=
> > VMADDR_CID_HOST or remote_cid is the id of 'transport_g2h'.
> > 
> > To do that we also need to avoid that L2 guests can have the same CID of L1.
> > For vhost_vsock I can call vsock_find_cid() in vhost_vsock_set_cid()
> > 
> > @Jorgen: for vmci we need to do the same? or it is guaranteed, since it's
> > already support nested VMs, that a L2 guests cannot have the same CID as
> > the L1.
> 
> As far as I can tell, we have the same issue with the current support for nested VMs in
> VMCI. If we have an L2 guest with the same CID as the L1 guest, we will always send to
> the L2 guest, and we may assign an L2 guest the same CID as L1. It should be straight
> forward to avoid this, though.
> 

Yes, I think so.

For the v2 I'm exposing the vsock_find_cid() to the transports, in this
way I can reject requests to set the same L1 CID for L2 guests.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-10-23  9:55 ` [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active Stefano Garzarella
  2019-10-27  8:17   ` Stefan Hajnoczi
@ 2019-11-04 10:10   ` Stefano Garzarella
  2019-11-11 16:27   ` Jorgen Hansen
  2 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-11-04 10:10 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization, netdev

Hi Jorgen,
I'm preparing the v2, but first, if you have time, I'd like to have
a comment from you on this patch that modifies a bit vmci.

Thank you very much,
Stefano

On Wed, Oct 23, 2019 at 11:55:52AM +0200, Stefano Garzarella wrote:
> To allow other transports to be loaded with vmci_transport,
> we register the vmci_transport as G2H or H2G only when a VMCI guest
> or host is active.
> 
> To do that, this patch adds a callback registered in the vmci driver
> that will be called when a new host or guest become active.
> This callback will register the vmci_transport in the VSOCK core.
> If the transport is already registered, we ignore the error coming
> from vsock_core_register().
> 
> Cc: Jorgen Hansen <jhansen@vmware.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  drivers/misc/vmw_vmci/vmci_driver.c | 50 +++++++++++++++++++++++++++++
>  drivers/misc/vmw_vmci/vmci_driver.h |  2 ++
>  drivers/misc/vmw_vmci/vmci_guest.c  |  2 ++
>  drivers/misc/vmw_vmci/vmci_host.c   |  7 ++++
>  include/linux/vmw_vmci_api.h        |  2 ++
>  net/vmw_vsock/vmci_transport.c      | 29 +++++++++++------
>  6 files changed, 82 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
> index 819e35995d32..195afbd7edc1 100644
> --- a/drivers/misc/vmw_vmci/vmci_driver.c
> +++ b/drivers/misc/vmw_vmci/vmci_driver.c
> @@ -28,6 +28,9 @@ MODULE_PARM_DESC(disable_guest,
>  static bool vmci_guest_personality_initialized;
>  static bool vmci_host_personality_initialized;
>  
> +static DEFINE_MUTEX(vmci_vsock_mutex); /* protects vmci_vsock_transport_cb */
> +static vmci_vsock_cb vmci_vsock_transport_cb;
> +
>  /*
>   * vmci_get_context_id() - Gets the current context ID.
>   *
> @@ -45,6 +48,53 @@ u32 vmci_get_context_id(void)
>  }
>  EXPORT_SYMBOL_GPL(vmci_get_context_id);
>  
> +/*
> + * vmci_register_vsock_callback() - Register the VSOCK vmci_transport callback.
> + *
> + * The callback will be called every time a new host or guest become active,
> + * or if they are already active when this function is called.
> + * To unregister the callback, call this function with NULL parameter.
> + *
> + * Returns 0 on success. -EBUSY if a callback is already registered.
> + */
> +int vmci_register_vsock_callback(vmci_vsock_cb callback)
> +{
> +	int err = 0;
> +
> +	mutex_lock(&vmci_vsock_mutex);
> +
> +	if (vmci_vsock_transport_cb && callback) {
> +		err = -EBUSY;
> +		goto out;
> +	}
> +
> +	vmci_vsock_transport_cb = callback;
> +
> +	if (!vmci_vsock_transport_cb)
> +		goto out;
> +
> +	if (vmci_guest_code_active())
> +		vmci_vsock_transport_cb(false);
> +
> +	if (vmci_host_users() > 0)
> +		vmci_vsock_transport_cb(true);
> +
> +out:
> +	mutex_unlock(&vmci_vsock_mutex);
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(vmci_register_vsock_callback);
> +
> +void vmci_call_vsock_callback(bool is_host)
> +{
> +	mutex_lock(&vmci_vsock_mutex);
> +
> +	if (vmci_vsock_transport_cb)
> +		vmci_vsock_transport_cb(is_host);
> +
> +	mutex_unlock(&vmci_vsock_mutex);
> +}
> +
>  static int __init vmci_drv_init(void)
>  {
>  	int vmci_err;
> diff --git a/drivers/misc/vmw_vmci/vmci_driver.h b/drivers/misc/vmw_vmci/vmci_driver.h
> index aab81b67670c..990682480bf6 100644
> --- a/drivers/misc/vmw_vmci/vmci_driver.h
> +++ b/drivers/misc/vmw_vmci/vmci_driver.h
> @@ -36,10 +36,12 @@ extern struct pci_dev *vmci_pdev;
>  
>  u32 vmci_get_context_id(void);
>  int vmci_send_datagram(struct vmci_datagram *dg);
> +void vmci_call_vsock_callback(bool is_host);
>  
>  int vmci_host_init(void);
>  void vmci_host_exit(void);
>  bool vmci_host_code_active(void);
> +int vmci_host_users(void);
>  
>  int vmci_guest_init(void);
>  void vmci_guest_exit(void);
> diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
> index 7a84a48c75da..cc8eeb361fcd 100644
> --- a/drivers/misc/vmw_vmci/vmci_guest.c
> +++ b/drivers/misc/vmw_vmci/vmci_guest.c
> @@ -637,6 +637,8 @@ static int vmci_guest_probe_device(struct pci_dev *pdev,
>  		  vmci_dev->iobase + VMCI_CONTROL_ADDR);
>  
>  	pci_set_drvdata(pdev, vmci_dev);
> +
> +	vmci_call_vsock_callback(false);
>  	return 0;
>  
>  err_free_irq:
> diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
> index 833e2bd248a5..ff3c396146ff 100644
> --- a/drivers/misc/vmw_vmci/vmci_host.c
> +++ b/drivers/misc/vmw_vmci/vmci_host.c
> @@ -108,6 +108,11 @@ bool vmci_host_code_active(void)
>  	     atomic_read(&vmci_host_active_users) > 0);
>  }
>  
> +int vmci_host_users(void)
> +{
> +	return atomic_read(&vmci_host_active_users);
> +}
> +
>  /*
>   * Called on open of /dev/vmci.
>   */
> @@ -338,6 +343,8 @@ static int vmci_host_do_init_context(struct vmci_host_dev *vmci_host_dev,
>  	vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
>  	atomic_inc(&vmci_host_active_users);
>  
> +	vmci_call_vsock_callback(true);
> +
>  	retval = 0;
>  
>  out:
> diff --git a/include/linux/vmw_vmci_api.h b/include/linux/vmw_vmci_api.h
> index acd9fafe4fc6..f28907345c80 100644
> --- a/include/linux/vmw_vmci_api.h
> +++ b/include/linux/vmw_vmci_api.h
> @@ -19,6 +19,7 @@
>  struct msghdr;
>  typedef void (vmci_device_shutdown_fn) (void *device_registration,
>  					void *user_data);
> +typedef void (*vmci_vsock_cb) (bool is_host);
>  
>  int vmci_datagram_create_handle(u32 resource_id, u32 flags,
>  				vmci_datagram_recv_cb recv_cb,
> @@ -37,6 +38,7 @@ int vmci_doorbell_destroy(struct vmci_handle handle);
>  int vmci_doorbell_notify(struct vmci_handle handle, u32 priv_flags);
>  u32 vmci_get_context_id(void);
>  bool vmci_is_context_owner(u32 context_id, kuid_t uid);
> +int vmci_register_vsock_callback(vmci_vsock_cb callback);
>  
>  int vmci_event_subscribe(u32 event,
>  			 vmci_event_cb callback, void *callback_data,
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index 2eb3f16d53e7..04437f822d82 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -2053,19 +2053,22 @@ static bool vmci_check_transport(struct vsock_sock *vsk)
>  	return vsk->transport == &vmci_transport;
>  }
>  
> -static int __init vmci_transport_init(void)
> +void vmci_vsock_transport_cb(bool is_host)
>  {
> -	int features = VSOCK_TRANSPORT_F_DGRAM | VSOCK_TRANSPORT_F_H2G;
> -	int cid;
> -	int err;
> +	int features;
>  
> -	cid = vmci_get_context_id();
> +	if (is_host)
> +		features = VSOCK_TRANSPORT_F_H2G;
> +	else
> +		features = VSOCK_TRANSPORT_F_G2H;
>  
> -	if (cid == VMCI_INVALID_ID)
> -		return -EINVAL;
> +	vsock_core_register(&vmci_transport, features);
> +}
>  
> -	if (cid != VMCI_HOST_CONTEXT_ID)
> -		features |= VSOCK_TRANSPORT_F_G2H;
> +static int __init vmci_transport_init(void)
> +{
> +	int features = VSOCK_TRANSPORT_F_DGRAM;
> +	int err;
>  
>  	/* Create the datagram handle that we will use to send and receive all
>  	 * VSocket control messages for this context.
> @@ -2079,7 +2082,6 @@ static int __init vmci_transport_init(void)
>  		pr_err("Unable to create datagram handle. (%d)\n", err);
>  		return vmci_transport_error_to_vsock_error(err);
>  	}
> -
>  	err = vmci_event_subscribe(VMCI_EVENT_QP_RESUMED,
>  				   vmci_transport_qp_resumed_cb,
>  				   NULL, &vmci_transport_qp_resumed_sub_id);
> @@ -2094,8 +2096,14 @@ static int __init vmci_transport_init(void)
>  	if (err < 0)
>  		goto err_unsubscribe;
>  
> +	err = vmci_register_vsock_callback(vmci_vsock_transport_cb);
> +	if (err < 0)
> +		goto err_unregister;
> +
>  	return 0;
>  
> +err_unregister:
> +	vsock_core_unregister(&vmci_transport);
>  err_unsubscribe:
>  	vmci_event_unsubscribe(vmci_transport_qp_resumed_sub_id);
>  err_destroy_stream_handle:
> @@ -2121,6 +2129,7 @@ static void __exit vmci_transport_exit(void)
>  		vmci_transport_qp_resumed_sub_id = VMCI_INVALID_ID;
>  	}
>  
> +	vmci_register_vsock_callback(NULL);
>  	vsock_core_unregister(&vmci_transport);
>  }
>  module_exit(vmci_transport_exit);
> -- 
> 2.21.0
> 

-- 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-10-23  9:55 ` [PATCH net-next 11/14] vsock: add multi-transports support Stefano Garzarella
  2019-10-23 15:08   ` Stefano Garzarella
@ 2019-11-11 13:53   ` Jorgen Hansen
  2019-11-11 17:17     ` Stefano Garzarella
  1 sibling, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-11 13:53 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM

Thanks a lot for working on this!

> With the multi-transports support, we can use vsock with nested VMs (using
> also different hypervisors) loading both guest->host and
> host->guest transports at the same time.
> 
> Major changes:
> - vsock core module can be loaded regardless of the transports
> - vsock_core_init() and vsock_core_exit() are renamed to
>   vsock_core_register() and vsock_core_unregister()
> - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
>   to identify which directions the transport can handle and if it's
>   support DGRAM (only vmci)
> - each stream socket is assigned to a transport when the remote CID
>   is set (during the connect() or when we receive a connection request
>   on a listener socket).

How about allowing the transport to be set during bind as well? That
would allow an application to ensure that it is using a specific transport,
i.e., if it binds to the host CID, it will use H2G, and if it binds to something
else it will use G2H? You can still use VMADDR_CID_ANY if you want to
initially listen to both transports.


>   The remote CID is used to decide which transport to use:
>   - remote CID > VMADDR_CID_HOST will use host->guest transport
>   - remote CID <= VMADDR_CID_HOST will use guest->host transport
> - listener sockets are not bound to any transports since no transport
>   operations are done on it. In this way we can create a listener
>   socket, also if the transports are not loaded or with VMADDR_CID_ANY
>   to listen on all transports.
> - DGRAM sockets are handled as before, since only the vmci_transport
>   provides this feature.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1:
> - documented VSOCK_TRANSPORT_F_* flags
> - fixed vsock_assign_transport() when the socket is already assigned
>   (e.g connection failed)
> - moved features outside of struct vsock_transport, and used as
>   parameter of vsock_core_register()
> ---
>  drivers/vhost/vsock.c                   |   5 +-
>  include/net/af_vsock.h                  |  17 +-
>  net/vmw_vsock/af_vsock.c                | 237 ++++++++++++++++++------
>  net/vmw_vsock/hyperv_transport.c        |  26 ++-
>  net/vmw_vsock/virtio_transport.c        |   7 +-
>  net/vmw_vsock/virtio_transport_common.c |  28 ++-
>  net/vmw_vsock/vmci_transport.c          |  31 +++-
>  7 files changed, 270 insertions(+), 81 deletions(-)
> 


> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> d89381166028..dddd85d9a147 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -130,7 +130,12 @@ static struct proto vsock_proto = {  #define
> VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)  #define
> VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
> 
> -static const struct vsock_transport *transport_single;
> +/* Transport used for host->guest communication */ static const struct
> +vsock_transport *transport_h2g;
> +/* Transport used for guest->host communication */ static const struct
> +vsock_transport *transport_g2h;
> +/* Transport used for DGRAM communication */ static const struct
> +vsock_transport *transport_dgram;
>  static DEFINE_MUTEX(vsock_register_mutex);
> 
>  /**** UTILS ****/
> @@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
>  	return __vsock_bind(sk, &local_addr);
>  }
> 
> -static int __init vsock_init_tables(void)
> +static void vsock_init_tables(void)
>  {
>  	int i;
> 
> @@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
> 
>  	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
>  		INIT_LIST_HEAD(&vsock_connected_table[i]);
> -	return 0;
>  }
> 
>  static void __vsock_insert_bound(struct list_head *list, @@ -376,6 +380,62
> @@ void vsock_enqueue_accept(struct sock *listener, struct sock
> *connected)  }  EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
> 
> +/* Assign a transport to a socket and call the .init transport callback.
> + *
> + * Note: for stream socket this must be called when vsk->remote_addr is
> +set
> + * (e.g. during the connect() or when a connection request on a
> +listener
> + * socket is received).
> + * The vsk->remote_addr is used to decide which transport to use:
> + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport  */
> +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock
> +*psk) {
> +	const struct vsock_transport *new_transport;
> +	struct sock *sk = sk_vsock(vsk);
> +
> +	switch (sk->sk_type) {
> +	case SOCK_DGRAM:
> +		new_transport = transport_dgram;
> +		break;
> +	case SOCK_STREAM:
> +		if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> +			new_transport = transport_h2g;
> +		else
> +			new_transport = transport_g2h;
> +		break;

You already mentioned that you are working on a fix for loopback
here for the guest, but presumably a host could also do loopback.
If we select transport during bind to a specific CID, this comment
Isn't relevant, but otherwise, we should look at the local addr as
well, since a socket with local addr of host CID shouldn't use
the guest to host transport, and a socket with local addr > host CID
shouldn't use host to guest.


> +	default:
> +		return -ESOCKTNOSUPPORT;
> +	}
> +
> +	if (vsk->transport) {
> +		if (vsk->transport == new_transport)
> +			return 0;
> +
> +		vsk->transport->release(vsk);
> +		vsk->transport->destruct(vsk);
> +	}
> +
> +	if (!new_transport)
> +		return -ENODEV;
> +
> +	vsk->transport = new_transport;
> +
> +	return vsk->transport->init(vsk, psk); }
> +EXPORT_SYMBOL_GPL(vsock_assign_transport);
> +
> +static bool vsock_find_cid(unsigned int cid) {
> +	if (transport_g2h && cid == transport_g2h->get_local_cid())
> +		return true;
> +
> +	if (transport_h2g && cid == VMADDR_CID_HOST)
> +		return true;
> +
> +	return false;
> +}
> +
>  static struct sock *vsock_dequeue_accept(struct sock *listener)  {
>  	struct vsock_sock *vlistener;


> diff --git a/net/vmw_vsock/vmci_transport.c
> b/net/vmw_vsock/vmci_transport.c index 5955238ffc13..2eb3f16d53e7
> 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c

> @@ -1017,6 +1018,15 @@ static int vmci_transport_recv_listen(struct sock
> *sk,
>  	vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
>  			pkt->src_port);
> 
> +	err = vsock_assign_transport(vpending, vsock_sk(sk));
> +	/* Transport assigned (looking at remote_addr) must be the same
> +	 * where we received the request.
> +	 */
> +	if (err || !vmci_check_transport(vpending)) {

We need to send a reset on error, i.e.,
  vmci_transport_send_reset(sk, pkt);

> +		sock_put(pending);
> +		return err;
> +	}
> +
>  	/* If the proposed size fits within our min/max, accept it. Otherwise
>  	 * propose our own size.
>  	 */

Thanks,
Jorgen

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-10-23  9:55 ` [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active Stefano Garzarella
  2019-10-27  8:17   ` Stefan Hajnoczi
  2019-11-04 10:10   ` Stefano Garzarella
@ 2019-11-11 16:27   ` Jorgen Hansen
  2019-11-11 17:30     ` Stefano Garzarella
  2 siblings, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-11 16:27 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> 
> To allow other transports to be loaded with vmci_transport,
> we register the vmci_transport as G2H or H2G only when a VMCI guest
> or host is active.
> 
> To do that, this patch adds a callback registered in the vmci driver
> that will be called when a new host or guest become active.
> This callback will register the vmci_transport in the VSOCK core.
> If the transport is already registered, we ignore the error coming
> from vsock_core_register().

So today this is mainly an issue for the VMCI vsock transport, because
VMCI autoloads with vsock (and with this solution it can continue to
do that, so none of our old products break due to changed behavior,
which is great). Shouldn't vhost behave similar, so that any module
that registers a h2g transport only does so if it is in active use?


> --- a/drivers/misc/vmw_vmci/vmci_host.c
> +++ b/drivers/misc/vmw_vmci/vmci_host.c
> @@ -108,6 +108,11 @@ bool vmci_host_code_active(void)
>  	     atomic_read(&vmci_host_active_users) > 0);
>  }
> 
> +int vmci_host_users(void)
> +{
> +	return atomic_read(&vmci_host_active_users);
> +}
> +
>  /*
>   * Called on open of /dev/vmci.
>   */
> @@ -338,6 +343,8 @@ static int vmci_host_do_init_context(struct
> vmci_host_dev *vmci_host_dev,
>  	vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
>  	atomic_inc(&vmci_host_active_users);
> 
> +	vmci_call_vsock_callback(true);
> +

Since we don't unregister the transport if user count drops back to 0, we could
just call this the first time, a VM is powered on after the module is loaded.

>  	retval = 0;
> 
>  out:


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 13/14] vsock: prevent transport modules unloading
  2019-10-23  9:55 ` [PATCH net-next 13/14] vsock: prevent transport modules unloading Stefano Garzarella
@ 2019-11-11 16:36   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-11 16:36 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM

> This patch adds 'module' member in the 'struct vsock_transport'
> in order to get/put the transport module. This prevents the
> module unloading while sockets are assigned to it.
> 
> We increase the module refcnt when a socket is assigned to a
> transport, and we decrease the module refcnt when the socket
> is destructed.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> RFC -> v1:
> - fixed typo 's/tranport/transport/' in a comment (Stefan)
> ---
>  drivers/vhost/vsock.c            |  2 ++
>  include/net/af_vsock.h           |  2 ++
>  net/vmw_vsock/af_vsock.c         | 20 ++++++++++++++++----
>  net/vmw_vsock/hyperv_transport.c |  2 ++
>  net/vmw_vsock/virtio_transport.c |  2 ++
>  net/vmw_vsock/vmci_transport.c   |  1 +
>  6 files changed, 25 insertions(+), 4 deletions(-)

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 14/14] vsock: fix bind() behaviour taking care of CID
  2019-10-23  9:55 ` [PATCH net-next 14/14] vsock: fix bind() behaviour taking care of CID Stefano Garzarella
@ 2019-11-11 16:53   ` Jorgen Hansen
  0 siblings, 0 replies; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-11 16:53 UTC (permalink / raw)
  To: Stefano Garzarella, netdev
  Cc: Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Wednesday, October 23, 2019 11:56 AM
> When we are looking for a socket bound to a specific address,
> we also have to take into account the CID.
> 
> This patch is useful with multi-transports support because it
> allows the binding of the same port with different CID, and
> it prevents a connection to a wrong socket bound to the same
> port, but with different CID.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/af_vsock.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-11-11 13:53   ` Jorgen Hansen
@ 2019-11-11 17:17     ` Stefano Garzarella
  2019-11-12  9:59       ` Jorgen Hansen
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-11-11 17:17 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

On Mon, Nov 11, 2019 at 01:53:39PM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > Sent: Wednesday, October 23, 2019 11:56 AM
> 
> Thanks a lot for working on this!
> 

Thanks to you for the reviews!

> > With the multi-transports support, we can use vsock with nested VMs (using
> > also different hypervisors) loading both guest->host and
> > host->guest transports at the same time.
> > 
> > Major changes:
> > - vsock core module can be loaded regardless of the transports
> > - vsock_core_init() and vsock_core_exit() are renamed to
> >   vsock_core_register() and vsock_core_unregister()
> > - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
> >   to identify which directions the transport can handle and if it's
> >   support DGRAM (only vmci)
> > - each stream socket is assigned to a transport when the remote CID
> >   is set (during the connect() or when we receive a connection request
> >   on a listener socket).
> 
> How about allowing the transport to be set during bind as well? That
> would allow an application to ensure that it is using a specific transport,
> i.e., if it binds to the host CID, it will use H2G, and if it binds to something
> else it will use G2H? You can still use VMADDR_CID_ANY if you want to
> initially listen to both transports.

Do you mean for socket that will call the connect()?

For listener socket the "[PATCH net-next 14/14] vsock: fix bind() behaviour
taking care of CID" provides this behaviour.
Since the listener sockets don't use any transport specific callback
(they don't send any data to the remote peer), but they are used as placeholder,
we don't need to assign them to a transport.

> 
> 
> >   The remote CID is used to decide which transport to use:
> >   - remote CID > VMADDR_CID_HOST will use host->guest transport
> >   - remote CID <= VMADDR_CID_HOST will use guest->host transport
> > - listener sockets are not bound to any transports since no transport
> >   operations are done on it. In this way we can create a listener
> >   socket, also if the transports are not loaded or with VMADDR_CID_ANY
> >   to listen on all transports.
> > - DGRAM sockets are handled as before, since only the vmci_transport
> >   provides this feature.
> > 
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > ---
> > RFC -> v1:
> > - documented VSOCK_TRANSPORT_F_* flags
> > - fixed vsock_assign_transport() when the socket is already assigned
> >   (e.g connection failed)
> > - moved features outside of struct vsock_transport, and used as
> >   parameter of vsock_core_register()
> > ---
> >  drivers/vhost/vsock.c                   |   5 +-
> >  include/net/af_vsock.h                  |  17 +-
> >  net/vmw_vsock/af_vsock.c                | 237 ++++++++++++++++++------
> >  net/vmw_vsock/hyperv_transport.c        |  26 ++-
> >  net/vmw_vsock/virtio_transport.c        |   7 +-
> >  net/vmw_vsock/virtio_transport_common.c |  28 ++-
> >  net/vmw_vsock/vmci_transport.c          |  31 +++-
> >  7 files changed, 270 insertions(+), 81 deletions(-)
> > 
> 
> 
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index
> > d89381166028..dddd85d9a147 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -130,7 +130,12 @@ static struct proto vsock_proto = {  #define
> > VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)  #define
> > VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
> > 
> > -static const struct vsock_transport *transport_single;
> > +/* Transport used for host->guest communication */ static const struct
> > +vsock_transport *transport_h2g;
> > +/* Transport used for guest->host communication */ static const struct
> > +vsock_transport *transport_g2h;
> > +/* Transport used for DGRAM communication */ static const struct
> > +vsock_transport *transport_dgram;
> >  static DEFINE_MUTEX(vsock_register_mutex);
> > 
> >  /**** UTILS ****/
> > @@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
> >  	return __vsock_bind(sk, &local_addr);
> >  }
> > 
> > -static int __init vsock_init_tables(void)
> > +static void vsock_init_tables(void)
> >  {
> >  	int i;
> > 
> > @@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
> > 
> >  	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> >  		INIT_LIST_HEAD(&vsock_connected_table[i]);
> > -	return 0;
> >  }
> > 
> >  static void __vsock_insert_bound(struct list_head *list, @@ -376,6 +380,62
> > @@ void vsock_enqueue_accept(struct sock *listener, struct sock
> > *connected)  }  EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
> > 
> > +/* Assign a transport to a socket and call the .init transport callback.
> > + *
> > + * Note: for stream socket this must be called when vsk->remote_addr is
> > +set
> > + * (e.g. during the connect() or when a connection request on a
> > +listener
> > + * socket is received).
> > + * The vsk->remote_addr is used to decide which transport to use:
> > + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> > + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport  */
> > +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock
> > +*psk) {
> > +	const struct vsock_transport *new_transport;
> > +	struct sock *sk = sk_vsock(vsk);
> > +
> > +	switch (sk->sk_type) {
> > +	case SOCK_DGRAM:
> > +		new_transport = transport_dgram;
> > +		break;
> > +	case SOCK_STREAM:
> > +		if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> > +			new_transport = transport_h2g;
> > +		else
> > +			new_transport = transport_g2h;
> > +		break;
> 
> You already mentioned that you are working on a fix for loopback
> here for the guest, but presumably a host could also do loopback.

IIUC we don't support loopback in the host, because in this case the
application will use the CID_HOST as address, but if we are in a nested
VM environment we are in trouble.

Since several people asked about this feature at the KVM Forum, I would like
to add a new VMADDR_CID_LOCAL (i.e. using the reserved 1) and implement
loopback in the core.

What do you think?

> If we select transport during bind to a specific CID, this comment

Also in this case, are you talking about the peer that will call
connect()?

> Isn't relevant, but otherwise, we should look at the local addr as
> well, since a socket with local addr of host CID shouldn't use
> the guest to host transport, and a socket with local addr > host CID
> shouldn't use host to guest.

Yes, I agree, in my fix I'm looking at the local addr, and in L1 I'll
not allow to assign a CID to a nested L2 equal to the CID of L1 (in
vhost-vsock)

Maybe we can allow the host loopback (using CID_HOST), only if there isn't
G2H loaded, but also in this case I'd like to move the loopback in the vsock
core, since we can do that, also if there are no transports loaded.

> 
> 
> > +	default:
> > +		return -ESOCKTNOSUPPORT;
> > +	}
> > +
> > +	if (vsk->transport) {
> > +		if (vsk->transport == new_transport)
> > +			return 0;
> > +
> > +		vsk->transport->release(vsk);
> > +		vsk->transport->destruct(vsk);
> > +	}
> > +
> > +	if (!new_transport)
> > +		return -ENODEV;
> > +
> > +	vsk->transport = new_transport;
> > +
> > +	return vsk->transport->init(vsk, psk); }
> > +EXPORT_SYMBOL_GPL(vsock_assign_transport);
> > +
> > +static bool vsock_find_cid(unsigned int cid) {
> > +	if (transport_g2h && cid == transport_g2h->get_local_cid())
> > +		return true;
> > +
> > +	if (transport_h2g && cid == VMADDR_CID_HOST)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> >  static struct sock *vsock_dequeue_accept(struct sock *listener)  {
> >  	struct vsock_sock *vlistener;
> 
> 
> > diff --git a/net/vmw_vsock/vmci_transport.c
> > b/net/vmw_vsock/vmci_transport.c index 5955238ffc13..2eb3f16d53e7
> > 100644
> > --- a/net/vmw_vsock/vmci_transport.c
> > +++ b/net/vmw_vsock/vmci_transport.c
> 
> > @@ -1017,6 +1018,15 @@ static int vmci_transport_recv_listen(struct sock
> > *sk,
> >  	vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
> >  			pkt->src_port);
> > 
> > +	err = vsock_assign_transport(vpending, vsock_sk(sk));
> > +	/* Transport assigned (looking at remote_addr) must be the same
> > +	 * where we received the request.
> > +	 */
> > +	if (err || !vmci_check_transport(vpending)) {
> 
> We need to send a reset on error, i.e.,
>   vmci_transport_send_reset(sk, pkt);

Good catch, I'll fix in the v2.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-11-11 16:27   ` Jorgen Hansen
@ 2019-11-11 17:30     ` Stefano Garzarella
  2019-11-12 10:03       ` Jorgen Hansen
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-11-11 17:30 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

On Mon, Nov 11, 2019 at 04:27:28PM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > Sent: Wednesday, October 23, 2019 11:56 AM
> > 
> > To allow other transports to be loaded with vmci_transport,
> > we register the vmci_transport as G2H or H2G only when a VMCI guest
> > or host is active.
> > 
> > To do that, this patch adds a callback registered in the vmci driver
> > that will be called when a new host or guest become active.
> > This callback will register the vmci_transport in the VSOCK core.
> > If the transport is already registered, we ignore the error coming
> > from vsock_core_register().
> 
> So today this is mainly an issue for the VMCI vsock transport, because
> VMCI autoloads with vsock (and with this solution it can continue to
> do that, so none of our old products break due to changed behavior,
> which is great).

I tried to not break anything :-)

>                  Shouldn't vhost behave similar, so that any module
> that registers a h2g transport only does so if it is in active use?
> 

The vhost-vsock module will load when the first hypervisor open
/dev/vhost-vsock, so in theory, when there's at least one active user.

> 
> > --- a/drivers/misc/vmw_vmci/vmci_host.c
> > +++ b/drivers/misc/vmw_vmci/vmci_host.c
> > @@ -108,6 +108,11 @@ bool vmci_host_code_active(void)
> >  	     atomic_read(&vmci_host_active_users) > 0);
> >  }
> > 
> > +int vmci_host_users(void)
> > +{
> > +	return atomic_read(&vmci_host_active_users);
> > +}
> > +
> >  /*
> >   * Called on open of /dev/vmci.
> >   */
> > @@ -338,6 +343,8 @@ static int vmci_host_do_init_context(struct
> > vmci_host_dev *vmci_host_dev,
> >  	vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
> >  	atomic_inc(&vmci_host_active_users);
> > 
> > +	vmci_call_vsock_callback(true);
> > +
> 
> Since we don't unregister the transport if user count drops back to 0, we could
> just call this the first time, a VM is powered on after the module is loaded.

Yes, make sense. can I use the 'vmci_host_active_users' or is better to
add a new 'vmci_host_vsock_loaded'?

My doubt is that vmci_host_active_users can return to 0, so when it returns
to 1, we call vmci_call_vsock_callback() again.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-11-11 17:17     ` Stefano Garzarella
@ 2019-11-12  9:59       ` Jorgen Hansen
  2019-11-12 10:36         ` Stefano Garzarella
  0 siblings, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-12  9:59 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Monday, November 11, 2019 6:18 PM
> To: Jorgen Hansen <jhansen@vmware.com>
> Subject: Re: [PATCH net-next 11/14] vsock: add multi-transports support
> 
> On Mon, Nov 11, 2019 at 01:53:39PM +0000, Jorgen Hansen wrote:
> > > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > > Sent: Wednesday, October 23, 2019 11:56 AM
> >
> > Thanks a lot for working on this!
> >
> 
> Thanks to you for the reviews!
> 
> > > With the multi-transports support, we can use vsock with nested VMs
> (using
> > > also different hypervisors) loading both guest->host and
> > > host->guest transports at the same time.
> > >
> > > Major changes:
> > > - vsock core module can be loaded regardless of the transports
> > > - vsock_core_init() and vsock_core_exit() are renamed to
> > >   vsock_core_register() and vsock_core_unregister()
> > > - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
> > >   to identify which directions the transport can handle and if it's
> > >   support DGRAM (only vmci)
> > > - each stream socket is assigned to a transport when the remote CID
> > >   is set (during the connect() or when we receive a connection request
> > >   on a listener socket).
> >
> > How about allowing the transport to be set during bind as well? That
> > would allow an application to ensure that it is using a specific transport,
> > i.e., if it binds to the host CID, it will use H2G, and if it binds to something
> > else it will use G2H? You can still use VMADDR_CID_ANY if you want to
> > initially listen to both transports.
> 
> Do you mean for socket that will call the connect()?

I was just thinking that in general we know the transport at that point, so we
could ensure that the socket would only see traffic from the relevant transport,
but as you mention below -  the updated bind lookup, and the added checks
when selecting transport should also take care of this, so that is fine.
 
> For listener socket the "[PATCH net-next 14/14] vsock: fix bind() behaviour
> taking care of CID" provides this behaviour.
> Since the listener sockets don't use any transport specific callback
> (they don't send any data to the remote peer), but they are used as
> placeholder,
> we don't need to assign them to a transport.
> 
> >
> >
> > >   The remote CID is used to decide which transport to use:
> > >   - remote CID > VMADDR_CID_HOST will use host->guest transport
> > >   - remote CID <= VMADDR_CID_HOST will use guest->host transport
> > > - listener sockets are not bound to any transports since no transport
> > >   operations are done on it. In this way we can create a listener
> > >   socket, also if the transports are not loaded or with VMADDR_CID_ANY
> > >   to listen on all transports.
> > > - DGRAM sockets are handled as before, since only the vmci_transport
> > >   provides this feature.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > RFC -> v1:
> > > - documented VSOCK_TRANSPORT_F_* flags
> > > - fixed vsock_assign_transport() when the socket is already assigned
> > >   (e.g connection failed)
> > > - moved features outside of struct vsock_transport, and used as
> > >   parameter of vsock_core_register()
> > > ---
> > >  drivers/vhost/vsock.c                   |   5 +-
> > >  include/net/af_vsock.h                  |  17 +-
> > >  net/vmw_vsock/af_vsock.c                | 237 ++++++++++++++++++------
> > >  net/vmw_vsock/hyperv_transport.c        |  26 ++-
> > >  net/vmw_vsock/virtio_transport.c        |   7 +-
> > >  net/vmw_vsock/virtio_transport_common.c |  28 ++-
> > >  net/vmw_vsock/vmci_transport.c          |  31 +++-
> > >  7 files changed, 270 insertions(+), 81 deletions(-)
> > >
> >
> >
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index
> > > d89381166028..dddd85d9a147 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -130,7 +130,12 @@ static struct proto vsock_proto = {  #define
> > > VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)  #define
> > > VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
> > >
> > > -static const struct vsock_transport *transport_single;
> > > +/* Transport used for host->guest communication */ static const struct
> > > +vsock_transport *transport_h2g;
> > > +/* Transport used for guest->host communication */ static const struct
> > > +vsock_transport *transport_g2h;
> > > +/* Transport used for DGRAM communication */ static const struct
> > > +vsock_transport *transport_dgram;
> > >  static DEFINE_MUTEX(vsock_register_mutex);
> > >
> > >  /**** UTILS ****/
> > > @@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
> > >  	return __vsock_bind(sk, &local_addr);
> > >  }
> > >
> > > -static int __init vsock_init_tables(void)
> > > +static void vsock_init_tables(void)
> > >  {
> > >  	int i;
> > >
> > > @@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
> > >
> > >  	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> > >  		INIT_LIST_HEAD(&vsock_connected_table[i]);
> > > -	return 0;
> > >  }
> > >
> > >  static void __vsock_insert_bound(struct list_head *list, @@ -376,6
> +380,62
> > > @@ void vsock_enqueue_accept(struct sock *listener, struct sock
> > > *connected)  }  EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
> > >
> > > +/* Assign a transport to a socket and call the .init transport callback.
> > > + *
> > > + * Note: for stream socket this must be called when vsk->remote_addr
> is
> > > +set
> > > + * (e.g. during the connect() or when a connection request on a
> > > +listener
> > > + * socket is received).
> > > + * The vsk->remote_addr is used to decide which transport to use:
> > > + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> > > + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport
> */
> > > +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock
> > > +*psk) {
> > > +	const struct vsock_transport *new_transport;
> > > +	struct sock *sk = sk_vsock(vsk);
> > > +
> > > +	switch (sk->sk_type) {
> > > +	case SOCK_DGRAM:
> > > +		new_transport = transport_dgram;
> > > +		break;
> > > +	case SOCK_STREAM:
> > > +		if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> > > +			new_transport = transport_h2g;
> > > +		else
> > > +			new_transport = transport_g2h;
> > > +		break;
> >
> > You already mentioned that you are working on a fix for loopback
> > here for the guest, but presumably a host could also do loopback.
> 
> IIUC we don't support loopback in the host, because in this case the
> application will use the CID_HOST as address, but if we are in a nested
> VM environment we are in trouble.

If both src and dst CID are CID_HOST, we should be fairly sure that this
Is host loopback, no? If src is anything else, we would do G2H.

> 
> Since several people asked about this feature at the KVM Forum, I would like
> to add a new VMADDR_CID_LOCAL (i.e. using the reserved 1) and implement
> loopback in the core.
> 
> What do you think?

What kind of use cases are mentioned in the KVM forum for loopback? One concern
is that we have to maintain yet another interprocess communication mechanism,
even though other choices exist already  (and those are likely to be more efficient
given the development time and specific focus that went into those). To me, the
local connections are mainly useful as a way to sanity test the protocol and transports.
However, if loopback is compelling, it would make sense have it in the core, since it
shouldn't need a specific transport. 

> 
> > If we select transport during bind to a specific CID, this comment
> 
> Also in this case, are you talking about the peer that will call
> connect()?

The same thought as mentioned in the beginning - but as mentioned
above, I agree that your updated bind and transport selection should
handle this as well.
 
> > Isn't relevant, but otherwise, we should look at the local addr as
> > well, since a socket with local addr of host CID shouldn't use
> > the guest to host transport, and a socket with local addr > host CID
> > shouldn't use host to guest.
> 
> Yes, I agree, in my fix I'm looking at the local addr, and in L1 I'll
> not allow to assign a CID to a nested L2 equal to the CID of L1 (in
> vhost-vsock)
> 
> Maybe we can allow the host loopback (using CID_HOST), only if there isn't
> G2H loaded, but also in this case I'd like to move the loopback in the vsock
> core, since we can do that, also if there are no transports loaded.
> >
> >
> > > +	default:
> > > +		return -ESOCKTNOSUPPORT;
> > > +	}
> > > +
> > > +	if (vsk->transport) {
> > > +		if (vsk->transport == new_transport)
> > > +			return 0;
> > > +
> > > +		vsk->transport->release(vsk);
> > > +		vsk->transport->destruct(vsk);
> > > +	}
> > > +
> > > +	if (!new_transport)
> > > +		return -ENODEV;
> > > +
> > > +	vsk->transport = new_transport;
> > > +
> > > +	return vsk->transport->init(vsk, psk); }
> > > +EXPORT_SYMBOL_GPL(vsock_assign_transport);
> > > +
> > > +static bool vsock_find_cid(unsigned int cid) {
> > > +	if (transport_g2h && cid == transport_g2h->get_local_cid())
> > > +		return true;
> > > +
> > > +	if (transport_h2g && cid == VMADDR_CID_HOST)
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > >  static struct sock *vsock_dequeue_accept(struct sock *listener)  {
> > >  	struct vsock_sock *vlistener;
> >
> >
> > > diff --git a/net/vmw_vsock/vmci_transport.c
> > > b/net/vmw_vsock/vmci_transport.c index 5955238ffc13..2eb3f16d53e7
> > > 100644
> > > --- a/net/vmw_vsock/vmci_transport.c
> > > +++ b/net/vmw_vsock/vmci_transport.c
> >
> > > @@ -1017,6 +1018,15 @@ static int vmci_transport_recv_listen(struct
> sock
> > > *sk,
> > >  	vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
> > >  			pkt->src_port);
> > >
> > > +	err = vsock_assign_transport(vpending, vsock_sk(sk));
> > > +	/* Transport assigned (looking at remote_addr) must be the same
> > > +	 * where we received the request.
> > > +	 */
> > > +	if (err || !vmci_check_transport(vpending)) {
> >
> > We need to send a reset on error, i.e.,
> >   vmci_transport_send_reset(sk, pkt);
> 
> Good catch, I'll fix in the v2.
> 
> Thanks,
> Stefano

Thanks,
Jorgen

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-11-11 17:30     ` Stefano Garzarella
@ 2019-11-12 10:03       ` Jorgen Hansen
  2019-11-12 10:42         ` Stefano Garzarella
  0 siblings, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-12 10:03 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Monday, November 11, 2019 6:31 PM
> On Mon, Nov 11, 2019 at 04:27:28PM +0000, Jorgen Hansen wrote:
> > > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > > Sent: Wednesday, October 23, 2019 11:56 AM
> > >
> > > To allow other transports to be loaded with vmci_transport,
> > > we register the vmci_transport as G2H or H2G only when a VMCI guest
> > > or host is active.
> > >
> > > To do that, this patch adds a callback registered in the vmci driver
> > > that will be called when a new host or guest become active.
> > > This callback will register the vmci_transport in the VSOCK core.
> > > If the transport is already registered, we ignore the error coming
> > > from vsock_core_register().
> >
> > So today this is mainly an issue for the VMCI vsock transport, because
> > VMCI autoloads with vsock (and with this solution it can continue to
> > do that, so none of our old products break due to changed behavior,
> > which is great).
> 
> I tried to not break anything :-)
> 
> >                  Shouldn't vhost behave similar, so that any module
> > that registers a h2g transport only does so if it is in active use?
> >
> 
> The vhost-vsock module will load when the first hypervisor open
> /dev/vhost-vsock, so in theory, when there's at least one active user.

Ok, sounds good then. 

> 
> >
> > > --- a/drivers/misc/vmw_vmci/vmci_host.c
> > > +++ b/drivers/misc/vmw_vmci/vmci_host.c
> > > @@ -108,6 +108,11 @@ bool vmci_host_code_active(void)
> > >  	     atomic_read(&vmci_host_active_users) > 0);
> > >  }
> > >
> > > +int vmci_host_users(void)
> > > +{
> > > +	return atomic_read(&vmci_host_active_users);
> > > +}
> > > +
> > >  /*
> > >   * Called on open of /dev/vmci.
> > >   */
> > > @@ -338,6 +343,8 @@ static int vmci_host_do_init_context(struct
> > > vmci_host_dev *vmci_host_dev,
> > >  	vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
> > >  	atomic_inc(&vmci_host_active_users);
> > >
> > > +	vmci_call_vsock_callback(true);
> > > +
> >
> > Since we don't unregister the transport if user count drops back to 0, we
> could
> > just call this the first time, a VM is powered on after the module is loaded.
> 
> Yes, make sense. can I use the 'vmci_host_active_users' or is better to
> add a new 'vmci_host_vsock_loaded'?
> 
> My doubt is that vmci_host_active_users can return to 0, so when it returns
> to 1, we call vmci_call_vsock_callback() again.

vmci_host_active_users can drop to 0 and then increase again, so having a flag
indicating whether the callback has been invoked would ensure that it is only
called once.

Thanks,
Jorgen



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-11-12  9:59       ` Jorgen Hansen
@ 2019-11-12 10:36         ` Stefano Garzarella
  2019-11-13 14:30           ` Jorgen Hansen
  0 siblings, 1 reply; 46+ messages in thread
From: Stefano Garzarella @ 2019-11-12 10:36 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

On Tue, Nov 12, 2019 at 09:59:12AM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > Sent: Monday, November 11, 2019 6:18 PM
> > To: Jorgen Hansen <jhansen@vmware.com>
> > Subject: Re: [PATCH net-next 11/14] vsock: add multi-transports support
> > 
> > On Mon, Nov 11, 2019 at 01:53:39PM +0000, Jorgen Hansen wrote:
> > > > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > > > Sent: Wednesday, October 23, 2019 11:56 AM
> > >
> > > Thanks a lot for working on this!
> > >
> > 
> > Thanks to you for the reviews!
> > 
> > > > With the multi-transports support, we can use vsock with nested VMs
> > (using
> > > > also different hypervisors) loading both guest->host and
> > > > host->guest transports at the same time.
> > > >
> > > > Major changes:
> > > > - vsock core module can be loaded regardless of the transports
> > > > - vsock_core_init() and vsock_core_exit() are renamed to
> > > >   vsock_core_register() and vsock_core_unregister()
> > > > - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM)
> > > >   to identify which directions the transport can handle and if it's
> > > >   support DGRAM (only vmci)
> > > > - each stream socket is assigned to a transport when the remote CID
> > > >   is set (during the connect() or when we receive a connection request
> > > >   on a listener socket).
> > >
> > > How about allowing the transport to be set during bind as well? That
> > > would allow an application to ensure that it is using a specific transport,
> > > i.e., if it binds to the host CID, it will use H2G, and if it binds to something
> > > else it will use G2H? You can still use VMADDR_CID_ANY if you want to
> > > initially listen to both transports.
> > 
> > Do you mean for socket that will call the connect()?
> 
> I was just thinking that in general we know the transport at that point, so we
> could ensure that the socket would only see traffic from the relevant transport,
> but as you mention below -  the updated bind lookup, and the added checks
> when selecting transport should also take care of this, so that is fine.
>  
> > For listener socket the "[PATCH net-next 14/14] vsock: fix bind() behaviour
> > taking care of CID" provides this behaviour.
> > Since the listener sockets don't use any transport specific callback
> > (they don't send any data to the remote peer), but they are used as
> > placeholder,
> > we don't need to assign them to a transport.
> > 
> > >
> > >
> > > >   The remote CID is used to decide which transport to use:
> > > >   - remote CID > VMADDR_CID_HOST will use host->guest transport
> > > >   - remote CID <= VMADDR_CID_HOST will use guest->host transport
> > > > - listener sockets are not bound to any transports since no transport
> > > >   operations are done on it. In this way we can create a listener
> > > >   socket, also if the transports are not loaded or with VMADDR_CID_ANY
> > > >   to listen on all transports.
> > > > - DGRAM sockets are handled as before, since only the vmci_transport
> > > >   provides this feature.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > ---
> > > > RFC -> v1:
> > > > - documented VSOCK_TRANSPORT_F_* flags
> > > > - fixed vsock_assign_transport() when the socket is already assigned
> > > >   (e.g connection failed)
> > > > - moved features outside of struct vsock_transport, and used as
> > > >   parameter of vsock_core_register()
> > > > ---
> > > >  drivers/vhost/vsock.c                   |   5 +-
> > > >  include/net/af_vsock.h                  |  17 +-
> > > >  net/vmw_vsock/af_vsock.c                | 237 ++++++++++++++++++------
> > > >  net/vmw_vsock/hyperv_transport.c        |  26 ++-
> > > >  net/vmw_vsock/virtio_transport.c        |   7 +-
> > > >  net/vmw_vsock/virtio_transport_common.c |  28 ++-
> > > >  net/vmw_vsock/vmci_transport.c          |  31 +++-
> > > >  7 files changed, 270 insertions(+), 81 deletions(-)
> > > >
> > >
> > >
> > > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index
> > > > d89381166028..dddd85d9a147 100644
> > > > --- a/net/vmw_vsock/af_vsock.c
> > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > @@ -130,7 +130,12 @@ static struct proto vsock_proto = {  #define
> > > > VSOCK_DEFAULT_BUFFER_MAX_SIZE (1024 * 256)  #define
> > > > VSOCK_DEFAULT_BUFFER_MIN_SIZE 128
> > > >
> > > > -static const struct vsock_transport *transport_single;
> > > > +/* Transport used for host->guest communication */ static const struct
> > > > +vsock_transport *transport_h2g;
> > > > +/* Transport used for guest->host communication */ static const struct
> > > > +vsock_transport *transport_g2h;
> > > > +/* Transport used for DGRAM communication */ static const struct
> > > > +vsock_transport *transport_dgram;
> > > >  static DEFINE_MUTEX(vsock_register_mutex);
> > > >
> > > >  /**** UTILS ****/
> > > > @@ -182,7 +187,7 @@ static int vsock_auto_bind(struct vsock_sock *vsk)
> > > >  	return __vsock_bind(sk, &local_addr);
> > > >  }
> > > >
> > > > -static int __init vsock_init_tables(void)
> > > > +static void vsock_init_tables(void)
> > > >  {
> > > >  	int i;
> > > >
> > > > @@ -191,7 +196,6 @@ static int __init vsock_init_tables(void)
> > > >
> > > >  	for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
> > > >  		INIT_LIST_HEAD(&vsock_connected_table[i]);
> > > > -	return 0;
> > > >  }
> > > >
> > > >  static void __vsock_insert_bound(struct list_head *list, @@ -376,6
> > +380,62
> > > > @@ void vsock_enqueue_accept(struct sock *listener, struct sock
> > > > *connected)  }  EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
> > > >
> > > > +/* Assign a transport to a socket and call the .init transport callback.
> > > > + *
> > > > + * Note: for stream socket this must be called when vsk->remote_addr
> > is
> > > > +set
> > > > + * (e.g. during the connect() or when a connection request on a
> > > > +listener
> > > > + * socket is received).
> > > > + * The vsk->remote_addr is used to decide which transport to use:
> > > > + *  - remote CID > VMADDR_CID_HOST will use host->guest transport
> > > > + *  - remote CID <= VMADDR_CID_HOST will use guest->host transport
> > */
> > > > +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock
> > > > +*psk) {
> > > > +	const struct vsock_transport *new_transport;
> > > > +	struct sock *sk = sk_vsock(vsk);
> > > > +
> > > > +	switch (sk->sk_type) {
> > > > +	case SOCK_DGRAM:
> > > > +		new_transport = transport_dgram;
> > > > +		break;
> > > > +	case SOCK_STREAM:
> > > > +		if (vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
> > > > +			new_transport = transport_h2g;
> > > > +		else
> > > > +			new_transport = transport_g2h;
> > > > +		break;
> > >
> > > You already mentioned that you are working on a fix for loopback
> > > here for the guest, but presumably a host could also do loopback.
> > 
> > IIUC we don't support loopback in the host, because in this case the
> > application will use the CID_HOST as address, but if we are in a nested
> > VM environment we are in trouble.
> 
> If both src and dst CID are CID_HOST, we should be fairly sure that this
> Is host loopback, no? If src is anything else, we would do G2H.
> 

The problem is that we don't know the src until we assign a transport
looking at the dst. (or if the user bound the socket to CID_HOST before
the connect(), but it is not very common)

So if we are in a L1 and the user uses the local guest CID, it works, but if
it uses the HOST_CID, the packet will go to the L0.

If we are in L0, it could be simple, because we can simply check if G2H
is not loaded, so any packet to CID_HOST, is host loopback.

I think that if the user uses the IOCTL_VM_SOCKETS_GET_LOCAL_CID, to set
the dest CID for the loopback, it works in both cases because we return the
HOST_CID in L0, and always the guest CID in L1, also if a H2G is loaded to
handle the L2.

Maybe we should document this in the man page.

But I have a question: Does vmci support the host loopback?
I've tried, and it seems not.

Also vhost-vsock doesn't support it, but virtio-vsock does.

> > 
> > Since several people asked about this feature at the KVM Forum, I would like
> > to add a new VMADDR_CID_LOCAL (i.e. using the reserved 1) and implement
> > loopback in the core.
> > 
> > What do you think?
> 
> What kind of use cases are mentioned in the KVM forum for loopback? One concern
> is that we have to maintain yet another interprocess communication mechanism,
> even though other choices exist already  (and those are likely to be more efficient
> given the development time and specific focus that went into those). To me, the
> local connections are mainly useful as a way to sanity test the protocol and transports.
> However, if loopback is compelling, it would make sense have it in the core, since it
> shouldn't need a specific transport. 

The common use cases is for developer point of view, and to test the
protocol and transports as you said.

People that are introducing VSOCK support in their projects, would like to
test them on their own PC without starting a VM.

The idea is to move the code to handle loopback from the virtio-vsock,
in the core, but in another series :-)

> 
> > 
> > > If we select transport during bind to a specific CID, this comment
> > 
> > Also in this case, are you talking about the peer that will call
> > connect()?
> 
> The same thought as mentioned in the beginning - but as mentioned
> above, I agree that your updated bind and transport selection should
> handle this as well.

Got it.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active
  2019-11-12 10:03       ` Jorgen Hansen
@ 2019-11-12 10:42         ` Stefano Garzarella
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-11-12 10:42 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

On Tue, Nov 12, 2019 at 10:03:54AM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > Sent: Monday, November 11, 2019 6:31 PM
> > On Mon, Nov 11, 2019 at 04:27:28PM +0000, Jorgen Hansen wrote:
> > > > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > > > Sent: Wednesday, October 23, 2019 11:56 AM
> > > >
> > > > To allow other transports to be loaded with vmci_transport,
> > > > we register the vmci_transport as G2H or H2G only when a VMCI guest
> > > > or host is active.
> > > >
> > > > To do that, this patch adds a callback registered in the vmci driver
> > > > that will be called when a new host or guest become active.
> > > > This callback will register the vmci_transport in the VSOCK core.
> > > > If the transport is already registered, we ignore the error coming
> > > > from vsock_core_register().
> > >
> > > So today this is mainly an issue for the VMCI vsock transport, because
> > > VMCI autoloads with vsock (and with this solution it can continue to
> > > do that, so none of our old products break due to changed behavior,
> > > which is great).
> > 
> > I tried to not break anything :-)
> > 
> > >                  Shouldn't vhost behave similar, so that any module
> > > that registers a h2g transport only does so if it is in active use?
> > >
> > 
> > The vhost-vsock module will load when the first hypervisor open
> > /dev/vhost-vsock, so in theory, when there's at least one active user.
> 
> Ok, sounds good then. 
> 
> > 
> > >
> > > > --- a/drivers/misc/vmw_vmci/vmci_host.c
> > > > +++ b/drivers/misc/vmw_vmci/vmci_host.c
> > > > @@ -108,6 +108,11 @@ bool vmci_host_code_active(void)
> > > >  	     atomic_read(&vmci_host_active_users) > 0);
> > > >  }
> > > >
> > > > +int vmci_host_users(void)
> > > > +{
> > > > +	return atomic_read(&vmci_host_active_users);
> > > > +}
> > > > +
> > > >  /*
> > > >   * Called on open of /dev/vmci.
> > > >   */
> > > > @@ -338,6 +343,8 @@ static int vmci_host_do_init_context(struct
> > > > vmci_host_dev *vmci_host_dev,
> > > >  	vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
> > > >  	atomic_inc(&vmci_host_active_users);
> > > >
> > > > +	vmci_call_vsock_callback(true);
> > > > +
> > >
> > > Since we don't unregister the transport if user count drops back to 0, we
> > could
> > > just call this the first time, a VM is powered on after the module is loaded.
> > 
> > Yes, make sense. can I use the 'vmci_host_active_users' or is better to
> > add a new 'vmci_host_vsock_loaded'?
> > 
> > My doubt is that vmci_host_active_users can return to 0, so when it returns
> > to 1, we call vmci_call_vsock_callback() again.
> 
> vmci_host_active_users can drop to 0 and then increase again, so having a flag
> indicating whether the callback has been invoked would ensure that it is only
> called once.

I agree, I will use a dedicated flag, maybe in the
vmci_call_vsock_callback(), since it can be called or during the
vmci_host_do_init_context() or when the callback is registered.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-11-12 10:36         ` Stefano Garzarella
@ 2019-11-13 14:30           ` Jorgen Hansen
  2019-11-13 16:38             ` Stefano Garzarella
  0 siblings, 1 reply; 46+ messages in thread
From: Jorgen Hansen @ 2019-11-13 14:30 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

> From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> Sent: Tuesday, November 12, 2019 11:37 AM

> > > > You already mentioned that you are working on a fix for loopback
> > > > here for the guest, but presumably a host could also do loopback.
> > >
> > > IIUC we don't support loopback in the host, because in this case the
> > > application will use the CID_HOST as address, but if we are in a nested
> > > VM environment we are in trouble.
> >
> > If both src and dst CID are CID_HOST, we should be fairly sure that this
> > Is host loopback, no? If src is anything else, we would do G2H.
> >
> 
> The problem is that we don't know the src until we assign a transport
> looking at the dst. (or if the user bound the socket to CID_HOST before
> the connect(), but it is not very common)
> 
> So if we are in a L1 and the user uses the local guest CID, it works, but if
> it uses the HOST_CID, the packet will go to the L0.
> 
> If we are in L0, it could be simple, because we can simply check if G2H
> is not loaded, so any packet to CID_HOST, is host loopback.
> 
> I think that if the user uses the IOCTL_VM_SOCKETS_GET_LOCAL_CID, to set
> the dest CID for the loopback, it works in both cases because we return the
> HOST_CID in L0, and always the guest CID in L1, also if a H2G is loaded to
> handle the L2.
> 
> Maybe we should document this in the man page.

Yeah, it seems like a good idea to flesh out the routing behavior for nested
VMs in the man page.

> 
> But I have a question: Does vmci support the host loopback?
> I've tried, and it seems not.

Only for datagrams - not for stream sockets.
 
> Also vhost-vsock doesn't support it, but virtio-vsock does.
> 
> > >
> > > Since several people asked about this feature at the KVM Forum, I would
> like
> > > to add a new VMADDR_CID_LOCAL (i.e. using the reserved 1) and
> implement
> > > loopback in the core.
> > >
> > > What do you think?
> >
> > What kind of use cases are mentioned in the KVM forum for loopback?
> One concern
> > is that we have to maintain yet another interprocess communication
> mechanism,
> > even though other choices exist already  (and those are likely to be more
> efficient
> > given the development time and specific focus that went into those). To
> me, the
> > local connections are mainly useful as a way to sanity test the protocol and
> transports.
> > However, if loopback is compelling, it would make sense have it in the core,
> since it
> > shouldn't need a specific transport.
> 
> The common use cases is for developer point of view, and to test the
> protocol and transports as you said.
> 
> People that are introducing VSOCK support in their projects, would like to
> test them on their own PC without starting a VM.
> 
> The idea is to move the code to handle loopback from the virtio-vsock,
> in the core, but in another series :-)

OK, that makes sense.

Thanks,
Jorgen

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH net-next 11/14] vsock: add multi-transports support
  2019-11-13 14:30           ` Jorgen Hansen
@ 2019-11-13 16:38             ` Stefano Garzarella
  0 siblings, 0 replies; 46+ messages in thread
From: Stefano Garzarella @ 2019-11-13 16:38 UTC (permalink / raw)
  To: Jorgen Hansen
  Cc: netdev, Michael S. Tsirkin, kvm, Greg Kroah-Hartman, Jason Wang,
	David S. Miller, Dexuan Cui, Haiyang Zhang, Sasha Levin,
	linux-kernel, Arnd Bergmann, Stefan Hajnoczi, linux-hyperv,
	K. Y. Srinivasan, Stephen Hemminger, virtualization

On Wed, Nov 13, 2019 at 02:30:24PM +0000, Jorgen Hansen wrote:
> > From: Stefano Garzarella [mailto:sgarzare@redhat.com]
> > Sent: Tuesday, November 12, 2019 11:37 AM
> 
> > > > > You already mentioned that you are working on a fix for loopback
> > > > > here for the guest, but presumably a host could also do loopback.
> > > >
> > > > IIUC we don't support loopback in the host, because in this case the
> > > > application will use the CID_HOST as address, but if we are in a nested
> > > > VM environment we are in trouble.
> > >
> > > If both src and dst CID are CID_HOST, we should be fairly sure that this
> > > Is host loopback, no? If src is anything else, we would do G2H.
> > >
> > 
> > The problem is that we don't know the src until we assign a transport
> > looking at the dst. (or if the user bound the socket to CID_HOST before
> > the connect(), but it is not very common)
> > 
> > So if we are in a L1 and the user uses the local guest CID, it works, but if
> > it uses the HOST_CID, the packet will go to the L0.
> > 
> > If we are in L0, it could be simple, because we can simply check if G2H
> > is not loaded, so any packet to CID_HOST, is host loopback.
> > 
> > I think that if the user uses the IOCTL_VM_SOCKETS_GET_LOCAL_CID, to set
> > the dest CID for the loopback, it works in both cases because we return the
> > HOST_CID in L0, and always the guest CID in L1, also if a H2G is loaded to
> > handle the L2.
> > 
> > Maybe we should document this in the man page.
> 
> Yeah, it seems like a good idea to flesh out the routing behavior for nested
> VMs in the man page.

I'll do it.

> 
> > 
> > But I have a question: Does vmci support the host loopback?
> > I've tried, and it seems not.
> 
> Only for datagrams - not for stream sockets.
>  

Ok, I'll leave the datagram loopback as before.

> > Also vhost-vsock doesn't support it, but virtio-vsock does.
> > 
> > > >
> > > > Since several people asked about this feature at the KVM Forum, I would
> > like
> > > > to add a new VMADDR_CID_LOCAL (i.e. using the reserved 1) and
> > implement
> > > > loopback in the core.
> > > >
> > > > What do you think?
> > >
> > > What kind of use cases are mentioned in the KVM forum for loopback?
> > One concern
> > > is that we have to maintain yet another interprocess communication
> > mechanism,
> > > even though other choices exist already  (and those are likely to be more
> > efficient
> > > given the development time and specific focus that went into those). To
> > me, the
> > > local connections are mainly useful as a way to sanity test the protocol and
> > transports.
> > > However, if loopback is compelling, it would make sense have it in the core,
> > since it
> > > shouldn't need a specific transport.
> > 
> > The common use cases is for developer point of view, and to test the
> > protocol and transports as you said.
> > 
> > People that are introducing VSOCK support in their projects, would like to
> > test them on their own PC without starting a VM.
> > 
> > The idea is to move the code to handle loopback from the virtio-vsock,
> > in the core, but in another series :-)
> 
> OK, that makes sense.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, back to index

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23  9:55 [PATCH net-next 00/14] vsock: add multi-transports support Stefano Garzarella
2019-10-23  9:55 ` [PATCH net-next 01/14] vsock/vmci: remove unused VSOCK_DEFAULT_CONNECT_TIMEOUT Stefano Garzarella
2019-10-30 14:54   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 02/14] vsock: remove vm_sockets_get_local_cid() Stefano Garzarella
2019-10-30 14:55   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 03/14] vsock: remove include/linux/vm_sockets.h file Stefano Garzarella
2019-10-30 14:57   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 04/14] vsock: add 'transport' member in the struct vsock_sock Stefano Garzarella
2019-10-30 14:57   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 05/14] vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() Stefano Garzarella
2019-10-23  9:55 ` [PATCH net-next 06/14] vsock: add 'struct vsock_sock *' param to vsock_core_get_transport() Stefano Garzarella
2019-10-30 15:01   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 07/14] vsock: handle buffer_size sockopts in the core Stefano Garzarella
2019-10-27  8:08   ` Stefan Hajnoczi
2019-10-30 15:08   ` Jorgen Hansen
2019-10-31  8:50     ` Stefano Garzarella
2019-10-23  9:55 ` [PATCH net-next 08/14] vsock: add vsock_create_connected() called by transports Stefano Garzarella
2019-10-27  8:12   ` Stefan Hajnoczi
2019-10-30 15:12   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 09/14] vsock: move vsock_insert_unbound() in the vsock_create() Stefano Garzarella
2019-10-30 15:12   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 10/14] hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init() Stefano Garzarella
2019-10-23  9:55 ` [PATCH net-next 11/14] vsock: add multi-transports support Stefano Garzarella
2019-10-23 15:08   ` Stefano Garzarella
2019-10-30 15:40     ` Jorgen Hansen
2019-10-31  8:54       ` Stefano Garzarella
2019-11-11 13:53   ` Jorgen Hansen
2019-11-11 17:17     ` Stefano Garzarella
2019-11-12  9:59       ` Jorgen Hansen
2019-11-12 10:36         ` Stefano Garzarella
2019-11-13 14:30           ` Jorgen Hansen
2019-11-13 16:38             ` Stefano Garzarella
2019-10-23  9:55 ` [PATCH net-next 12/14] vsock/vmci: register vmci_transport only when VMCI guest/host are active Stefano Garzarella
2019-10-27  8:17   ` Stefan Hajnoczi
2019-10-29 16:35     ` Stefano Garzarella
2019-11-04 10:10   ` Stefano Garzarella
2019-11-11 16:27   ` Jorgen Hansen
2019-11-11 17:30     ` Stefano Garzarella
2019-11-12 10:03       ` Jorgen Hansen
2019-11-12 10:42         ` Stefano Garzarella
2019-10-23  9:55 ` [PATCH net-next 13/14] vsock: prevent transport modules unloading Stefano Garzarella
2019-11-11 16:36   ` Jorgen Hansen
2019-10-23  9:55 ` [PATCH net-next 14/14] vsock: fix bind() behaviour taking care of CID Stefano Garzarella
2019-11-11 16:53   ` Jorgen Hansen
2019-10-27  8:01 ` [PATCH net-next 00/14] vsock: add multi-transports support Stefan Hajnoczi
2019-10-29 16:27   ` Stefano Garzarella

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git