From: Bobby Eshleman <bobby.eshleman@bytedance.com>
To: Stefan Hajnoczi <stefanha@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Bryan Tan <bryantan@vmware.com>, Vishnu Dasa <vdasa@vmware.com>,
VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>,
Simon Horman <simon.horman@corigine.com>,
Krasnov Arseniy <oxffffaa@gmail.com>,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, bpf@vger.kernel.org,
Bobby Eshleman <bobby.eshleman@bytedance.com>
Subject: [PATCH RFC net-next v5 05/14] af_vsock: use a separate dgram bind table
Date: Wed, 19 Jul 2023 00:50:09 +0000 [thread overview]
Message-ID: <20230413-b4-vsock-dgram-v5-5-581bd37fdb26@bytedance.com> (raw)
In-Reply-To: <20230413-b4-vsock-dgram-v5-0-581bd37fdb26@bytedance.com>
This commit adds support for bound dgram sockets to be tracked in a
separate bind table from connectible sockets in order to avoid address
collisions. With this commit, users can simultaneously bind a dgram
socket and connectible socket to the same CID and port.
Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
---
net/vmw_vsock/af_vsock.c | 103 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 76 insertions(+), 27 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 88100154156c..0895f4c1d340 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -10,18 +10,23 @@
* - There are two kinds of sockets: those created by user action (such as
* calling socket(2)) and those created by incoming connection request packets.
*
- * - There are two "global" tables, one for bound sockets (sockets that have
- * specified an address that they are responsible for) and one for connected
- * sockets (sockets that have established a connection with another socket).
- * These tables are "global" in that all sockets on the system are placed
- * within them. - Note, though, that the bound table contains an extra entry
- * for a list of unbound sockets and SOCK_DGRAM sockets will always remain in
- * that list. The bound table is used solely for lookup of sockets when packets
- * are received and that's not necessary for SOCK_DGRAM sockets since we create
- * a datagram handle for each and need not perform a lookup. Keeping SOCK_DGRAM
- * sockets out of the bound hash buckets will reduce the chance of collisions
- * when looking for SOCK_STREAM sockets and prevents us from having to check the
- * socket type in the hash table lookups.
+ * - There are three "global" tables, one for bound connectible (stream /
+ * seqpacket) sockets, one for bound datagram sockets, and one for connected
+ * sockets. Bound sockets are sockets that have specified an address that
+ * they are responsible for. Connected sockets are sockets that have
+ * established a connection with another socket. These tables are "global" in
+ * that all sockets on the system are placed within them. - Note, though,
+ * that the bound tables contain an extra entry for a list of unbound
+ * sockets. The bound tables are used solely for lookup of sockets when packets
+ * are received.
+ *
+ * - There are separate bind tables for connectible and datagram sockets to avoid
+ * address collisions between stream/seqpacket sockets and datagram sockets.
+ *
+ * - Transports may elect to NOT use the global datagram bind table by
+ * implementing the ->dgram_bind() callback. If that callback is implemented,
+ * the global bind table is not used and the responsibility of bound datagram
+ * socket tracking is deferred to the transport.
*
* - Sockets created by user action will either be "client" sockets that
* initiate a connection or "server" sockets that listen for connections; we do
@@ -115,6 +120,7 @@
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+static bool sock_type_connectible(u16 type);
/* Protocol family. */
struct proto vsock_proto = {
@@ -151,21 +157,25 @@ static DEFINE_MUTEX(vsock_register_mutex);
* VSocket is stored in the connected hash table.
*
* Unbound sockets are all put on the same list attached to the end of the hash
- * table (vsock_unbound_sockets). Bound sockets are added to the hash table in
- * the bucket that their local address hashes to (vsock_bound_sockets(addr)
- * represents the list that addr hashes to).
+ * tables (vsock_unbound_sockets/vsock_unbound_dgram_sockets). Bound sockets
+ * are added to the hash table in the bucket that their local address hashes to
+ * (vsock_bound_sockets(addr) and vsock_bound_dgram_sockets(addr) represents
+ * the list that addr hashes to).
*
- * Specifically, we initialize the vsock_bind_table array to a size of
- * VSOCK_HASH_SIZE + 1 so that vsock_bind_table[0] through
- * vsock_bind_table[VSOCK_HASH_SIZE - 1] are for bound sockets and
- * vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets. The hash function
- * mods with VSOCK_HASH_SIZE to ensure this.
+ * Specifically, taking connectible sockets as an example we initialize the
+ * vsock_bind_table array to a size of VSOCK_HASH_SIZE + 1 so that
+ * vsock_bind_table[0] through vsock_bind_table[VSOCK_HASH_SIZE - 1] are for
+ * bound sockets and vsock_bind_table[VSOCK_HASH_SIZE] is for unbound sockets.
+ * The hash function mods with VSOCK_HASH_SIZE to ensure this.
+ * Datagrams and vsock_dgram_bind_table operate in the same way.
*/
#define MAX_PORT_RETRIES 24
#define VSOCK_HASH(addr) ((addr)->svm_port % VSOCK_HASH_SIZE)
#define vsock_bound_sockets(addr) (&vsock_bind_table[VSOCK_HASH(addr)])
+#define vsock_bound_dgram_sockets(addr) (&vsock_dgram_bind_table[VSOCK_HASH(addr)])
#define vsock_unbound_sockets (&vsock_bind_table[VSOCK_HASH_SIZE])
+#define vsock_unbound_dgram_sockets (&vsock_dgram_bind_table[VSOCK_HASH_SIZE])
/* XXX This can probably be implemented in a better way. */
#define VSOCK_CONN_HASH(src, dst) \
@@ -181,6 +191,8 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE];
EXPORT_SYMBOL_GPL(vsock_connected_table);
DEFINE_SPINLOCK(vsock_table_lock);
EXPORT_SYMBOL_GPL(vsock_table_lock);
+static struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE + 1];
+static DEFINE_SPINLOCK(vsock_dgram_table_lock);
/* Autobind this socket to the local address if necessary. */
static int vsock_auto_bind(struct vsock_sock *vsk)
@@ -203,6 +215,9 @@ static void vsock_init_tables(void)
for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++)
INIT_LIST_HEAD(&vsock_connected_table[i]);
+
+ for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++)
+ INIT_LIST_HEAD(&vsock_dgram_bind_table[i]);
}
static void __vsock_insert_bound(struct list_head *list,
@@ -270,13 +285,28 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
return NULL;
}
-static void vsock_insert_unbound(struct vsock_sock *vsk)
+static void __vsock_insert_dgram_unbound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ __vsock_insert_bound(vsock_unbound_dgram_sockets, vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
+static void __vsock_insert_connectible_unbound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
__vsock_insert_bound(vsock_unbound_sockets, vsk);
spin_unlock_bh(&vsock_table_lock);
}
+static void vsock_insert_unbound(struct vsock_sock *vsk)
+{
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ __vsock_insert_connectible_unbound(vsk);
+ else
+ __vsock_insert_dgram_unbound(vsk);
+}
+
void vsock_insert_connected(struct vsock_sock *vsk)
{
struct list_head *list = vsock_connected_sockets(
@@ -288,6 +318,14 @@ void vsock_insert_connected(struct vsock_sock *vsk)
}
EXPORT_SYMBOL_GPL(vsock_insert_connected);
+static void vsock_remove_dgram_bound(struct vsock_sock *vsk)
+{
+ spin_lock_bh(&vsock_dgram_table_lock);
+ if (__vsock_in_bound_table(vsk))
+ __vsock_remove_bound(vsk);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+}
+
void vsock_remove_bound(struct vsock_sock *vsk)
{
spin_lock_bh(&vsock_table_lock);
@@ -339,7 +377,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
void vsock_remove_sock(struct vsock_sock *vsk)
{
- vsock_remove_bound(vsk);
+ if (sock_type_connectible(sk_vsock(vsk)->sk_type))
+ vsock_remove_bound(vsk);
+ else
+ vsock_remove_dgram_bound(vsk);
vsock_remove_connected(vsk);
}
EXPORT_SYMBOL_GPL(vsock_remove_sock);
@@ -722,11 +763,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1);
}
-static int __vsock_bind_dgram(struct vsock_sock *vsk,
- struct sockaddr_vm *addr)
+static int vsock_bind_dgram(struct vsock_sock *vsk,
+ struct sockaddr_vm *addr)
{
- if (!vsk->transport || !vsk->transport->dgram_bind)
- return -EINVAL;
+ if (!vsk->transport || !vsk->transport->dgram_bind) {
+ int retval;
+
+ spin_lock_bh(&vsock_dgram_table_lock);
+ retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table,
+ VSOCK_HASH_SIZE);
+ spin_unlock_bh(&vsock_dgram_table_lock);
+
+ return retval;
+ }
return vsk->transport->dgram_bind(vsk, addr);
}
@@ -757,7 +806,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr)
break;
case SOCK_DGRAM:
- retval = __vsock_bind_dgram(vsk, addr);
+ retval = vsock_bind_dgram(vsk, addr);
break;
default:
--
2.30.2
next prev parent reply other threads:[~2023-07-19 0:50 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 0:50 [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Bobby Eshleman
2023-07-24 18:11 ` Arseniy Krasnov
2023-07-26 18:21 ` Bobby Eshleman
2023-07-27 7:53 ` Arseniy Krasnov
2023-07-19 0:50 ` [PATCH RFC net-next v5 02/14] af_vsock: refactor transport lookup code Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams Bobby Eshleman
2023-07-22 21:53 ` Arseniy Krasnov
2023-08-02 22:24 ` Bobby Eshleman
2023-08-03 0:53 ` Bobby Eshleman
2023-08-03 12:42 ` Stefano Garzarella
2023-08-03 18:58 ` Bobby Eshleman
2023-08-04 14:11 ` Stefano Garzarella
2023-08-04 17:01 ` Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 04/14] af_vsock: generalize bind table functions Bobby Eshleman
2023-07-19 0:50 ` Bobby Eshleman [this message]
2023-07-19 0:50 ` [PATCH RFC net-next v5 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 07/14] virtio/vsock: add common datagram send path Bobby Eshleman
2023-07-22 8:16 ` Arseniy Krasnov
2023-07-26 17:08 ` Bobby Eshleman
2023-07-27 7:57 ` Arseniy Krasnov
2023-08-02 20:08 ` Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 08/14] af_vsock: add vsock_find_bound_dgram_socket() Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 09/14] virtio/vsock: add common datagram recv path Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2023-07-26 18:38 ` Michael S. Tsirkin
2023-07-27 7:48 ` Stefano Garzarella
2023-08-01 4:30 ` Bobby Eshleman
2023-08-01 16:04 ` Stefano Garzarella
2023-07-19 0:50 ` [PATCH RFC net-next v5 11/14] vhost/vsock: implement datagram support Bobby Eshleman
2023-07-22 8:42 ` Arseniy Krasnov
2023-07-26 17:55 ` Bobby Eshleman
2023-07-27 8:00 ` Arseniy Krasnov
2023-08-02 21:23 ` Bobby Eshleman
2023-08-04 13:16 ` Arseniy Krasnov
2023-07-27 8:04 ` Arseniy Krasnov
2023-07-26 18:40 ` Michael S. Tsirkin
2023-08-01 4:26 ` Bobby Eshleman
2023-08-02 19:28 ` Bobby Eshleman
2023-08-12 8:01 ` Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 12/14] vsock/loopback: " Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 13/14] virtio/vsock: " Bobby Eshleman
2023-07-22 8:45 ` Arseniy Krasnov
2023-07-26 17:58 ` Bobby Eshleman
2023-07-27 8:09 ` Arseniy Krasnov
2023-08-01 4:14 ` [PATCH RFC net-next v5 13/14] virtio/vsock: implement datagram supporty Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 14/14] test/vsock: add vsock dgram tests Bobby Eshleman
2023-07-22 8:51 ` [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams Arseniy Krasnov
2023-07-26 18:02 ` Bobby Eshleman
2023-07-27 7:51 ` Michael S. Tsirkin
2023-08-01 5:30 ` Bobby Eshleman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230413-b4-vsock-dgram-v5-5-581bd37fdb26@bytedance.com \
--to=bobby.eshleman@bytedance.com \
--cc=bpf@vger.kernel.org \
--cc=bryantan@vmware.com \
--cc=dan.carpenter@linaro.org \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=oxffffaa@gmail.com \
--cc=pabeni@redhat.com \
--cc=pv-drivers@vmware.com \
--cc=sgarzare@redhat.com \
--cc=simon.horman@corigine.com \
--cc=stefanha@redhat.com \
--cc=vdasa@vmware.com \
--cc=virtualization@lists.linux-foundation.org \
--cc=wei.liu@kernel.org \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).