* [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, netdev, linux-kernel

This patchset implements SOCK_DGRAM support for the virtio transport.

Datagram sockets are connectionless and unreliable. To avoid unfair
contention with stream and other sockets, add two new virtqueues and a
feature bit that indicates whether those queues exist.

Dgram does not use the existing credit update mechanism used by stream
sockets. When sending from the guest/driver, packets are sent
synchronously, so the sender gets an error when the virtqueue is full.
When sending from the host/device, packets are sent asynchronously
because the descriptor memory belongs to the corresponding QEMU
process.
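
For context, below is a minimal sketch of the intended userspace usage
(not part of this series; the destination CID and port are arbitrary,
and the sockaddr_vm definitions come from <linux/vm_sockets.h>):

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_HOST,	/* illustrative destination */
		.svm_port = 1234,		/* arbitrary port */
	};
	const char msg[] = "hello";
	int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* On the guest side sends are synchronous, so a full virtqueue
	 * shows up as an error from sendto() instead of blocking on
	 * credit the way SOCK_STREAM does.
	 */
	if (sendto(fd, msg, sizeof(msg), 0,
		   (struct sockaddr *)&addr, sizeof(addr)) < 0)
		perror("sendto");

	close(fd);
	return 0;
}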

The virtio spec patch is here: 
https://www.spinics.net/lists/linux-virtualization/msg50027.html

For those who prefer a git repo, here is the link for the Linux kernel:
https://github.com/Jiang1155/linux/tree/vsock-dgram-v1

qemu patch link:
https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1


To do:
1. Use skb when receiving packets.
2. Support multiple transports.
3. Support mergeable rx buffers.


Jiang Wang (6):
  virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
  virtio/vsock: add support for virtio datagram
  vhost/vsock: add support for vhost dgram.
  vsock_test: add tests for vsock dgram
  vhost/vsock: add kconfig for vhost dgram support
  virtio/vsock: add sysfs for rx buf len for dgram

 drivers/vhost/Kconfig                              |   8 +
 drivers/vhost/vsock.c                              | 207 ++++++++--
 include/linux/virtio_vsock.h                       |   9 +
 include/net/af_vsock.h                             |   1 +
 .../trace/events/vsock_virtio_transport_common.h   |   5 +-
 include/uapi/linux/virtio_vsock.h                  |   4 +
 net/vmw_vsock/af_vsock.c                           |  12 +
 net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---
 net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-
 tools/testing/vsock/util.c                         | 105 +++++
 tools/testing/vsock/util.h                         |   4 +
 tools/testing/vsock/vsock_test.c                   | 195 ++++++++++
 12 files changed, 1070 insertions(+), 97 deletions(-)

-- 
2.11.0



* [RFC v1 1/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Andra Paraschiv, Norbert Slusarek, Colin Ian King,
	Alexander Popov, kvm, netdev, linux-kernel

When this feature bit is negotiated, allocate 5 virtqueues;
otherwise, allocate 3 virtqueues to stay compatible with old
QEMU versions.
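
For reference, the resulting virtqueue index layouts (this only
summarizes the enum added below, it is not new behavior):

/*
 * Without VIRTIO_VSOCK_F_DGRAM (VSOCK_VQ_MAX = 3):
 *   0 = rx, 1 = tx, 2 = event
 *
 * With VIRTIO_VSOCK_F_DGRAM (VSOCK_VQ_EX_MAX = 5):
 *   0 = rx, 1 = tx, 2 = dgram_rx, 3 = dgram_tx, 4 = event
 *
 * Note that the event queue moves from index 2 to index 4, which is
 * why the driver checks has_dgram before indexing vqs[] for events.
 */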

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 drivers/vhost/vsock.c             |  3 +-
 include/linux/virtio_vsock.h      |  9 +++++
 include/uapi/linux/virtio_vsock.h |  3 ++
 net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----
 4 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 5e78fb719602..81d064601093 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -31,7 +31,8 @@
 
 enum {
 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
-			       (1ULL << VIRTIO_F_ACCESS_PLATFORM)
+			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
+			       (1ULL << VIRTIO_VSOCK_F_DGRAM)
 };
 
 enum {
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index dc636b727179..ba3189ed9345 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -18,6 +18,15 @@ enum {
 	VSOCK_VQ_MAX    = 3,
 };
 
+enum {
+	VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */
+	VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */
+	VSOCK_VQ_DGRAM_RX       = 2,
+	VSOCK_VQ_DGRAM_TX       = 3,
+	VSOCK_VQ_EX_EVENT       = 4,
+	VSOCK_VQ_EX_MAX         = 5,
+};
+
 /* Per-socket state (accessed via vsk->trans) */
 struct virtio_vsock_sock {
 	struct vsock_sock *vsk;
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 1d57ed3d84d2..b56614dff1c9 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -38,6 +38,9 @@
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
 
+/* The feature bitmap for virtio vsock */
+#define VIRTIO_VSOCK_F_DGRAM	0	/* Host supports dgram vsock */
+
 struct virtio_vsock_config {
 	__le64 guest_cid;
 } __attribute__((packed));
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 2700a63ab095..7dcb8db23305 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
 
 struct virtio_vsock {
 	struct virtio_device *vdev;
-	struct virtqueue *vqs[VSOCK_VQ_MAX];
+	struct virtqueue **vqs;
+	bool has_dgram;
 
 	/* Virtqueue processing is deferred to a workqueue */
 	struct work_struct tx_work;
@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,
 	struct scatterlist sg;
 	struct virtqueue *vq;
 
-	vq = vsock->vqs[VSOCK_VQ_EVENT];
+	if (vsock->has_dgram)
+		vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
+	else
+		vq = vsock->vqs[VSOCK_VQ_EVENT];
 
 	sg_init_one(&sg, event, sizeof(*event));
 
@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
 		virtio_vsock_event_fill_one(vsock, event);
 	}
 
-	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
+	if (vsock->has_dgram)
+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
+	else
+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
 }
 
 static void virtio_vsock_reset_sock(struct sock *sk)
@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)
 		container_of(work, struct virtio_vsock, event_work);
 	struct virtqueue *vq;
 
-	vq = vsock->vqs[VSOCK_VQ_EVENT];
+	if (vsock->has_dgram)
+		vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
+	else
+		vq = vsock->vqs[VSOCK_VQ_EVENT];
 
 	mutex_lock(&vsock->event_lock);
 
@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)
 		}
 	} while (!virtqueue_enable_cb(vq));
 
-	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
+	if (vsock->has_dgram)
+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
+	else
+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
 out:
 	mutex_unlock(&vsock->event_lock);
 }
@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
 	queue_work(virtio_vsock_workqueue, &vsock->tx_work);
 }
 
+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
+{
+}
+
 static void virtio_vsock_rx_done(struct virtqueue *vq)
 {
 	struct virtio_vsock *vsock = vq->vdev->priv;
@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
 }
 
+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
+{
+}
+
 static struct virtio_transport virtio_transport = {
 	.transport = {
 		.module                   = THIS_MODULE,
@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 		virtio_vsock_tx_done,
 		virtio_vsock_event_done,
 	};
+	vq_callback_t *ex_callbacks[] = {
+		virtio_vsock_rx_done,
+		virtio_vsock_tx_done,
+		virtio_vsock_dgram_rx_done,
+		virtio_vsock_dgram_tx_done,
+		virtio_vsock_event_done,
+	};
+
 	static const char * const names[] = {
 		"rx",
 		"tx",
 		"event",
 	};
+	static const char * const ex_names[] = {
+		"rx",
+		"tx",
+		"dgram_rx",
+		"dgram_tx",
+		"event",
+	};
+
 	struct virtio_vsock *vsock = NULL;
-	int ret;
+	int ret, max_vq;
 
 	ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
 	if (ret)
@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 
 	vsock->vdev = vdev;
 
-	ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,
+	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
+		vsock->has_dgram = true;
+
+	if (vsock->has_dgram)
+		max_vq = VSOCK_VQ_EX_MAX;
+	else
+		max_vq = VSOCK_VQ_MAX;
+
+	vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);
+	if (!vsock->vqs) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (vsock->has_dgram) {
+		ret = virtio_find_vqs(vsock->vdev, max_vq,
+			      vsock->vqs, ex_callbacks, ex_names,
+			      NULL);
+	} else {
+		ret = virtio_find_vqs(vsock->vdev, max_vq,
 			      vsock->vqs, callbacks, names,
 			      NULL);
+	}
+
 	if (ret < 0)
 		goto out;
 
@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {
 };
 
 static unsigned int features[] = {
+	VIRTIO_VSOCK_F_DGRAM,
 };
 
 static struct virtio_driver virtio_vsock_driver = {
-- 
2.11.0


* [RFC v1 2/6] virtio/vsock: add support for virtio datagram
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Andra Paraschiv, Norbert Slusarek,
	Alexander Popov, kvm, netdev, linux-kernel

This patch adds dgram support to the virtio transport driver.
It implements the related tx and rx functions for enqueue and
dequeue. Packets are sent synchronously so that the sender gets
an indication when the virtqueue is full.
virtio_transport_send_pkt_work() is refactored slightly, with no
functional changes.

Support for the host/device side is in a separate patch.
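
As a usage-level illustration of the new dequeue path (again just a
sketch, not part of this series), a datagram larger than the receive
buffer is truncated and flagged with MSG_TRUNC, and the sender's
CID/port are reported via msg_name, mirroring
virtio_transport_dgram_do_dequeue() below:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/vm_sockets.h>

/* fd: an AF_VSOCK SOCK_DGRAM socket that has already been bound */
static void recv_one_dgram(int fd)
{
	char buf[64];
	struct sockaddr_vm peer;
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	struct msghdr msg = {
		.msg_name = &peer,
		.msg_namelen = sizeof(peer),
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};
	ssize_t n = recvmsg(fd, &msg, 0);

	if (n < 0) {
		perror("recvmsg");
		return;
	}

	/* Oversized datagrams are cut down to the buffer size and
	 * MSG_TRUNC is set, matching the dequeue implementation.
	 */
	if (msg.msg_flags & MSG_TRUNC)
		fprintf(stderr, "datagram truncated to %zd bytes\n", n);

	printf("from cid %u port %u: %zd bytes\n",
	       peer.svm_cid, peer.svm_port, n);
}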

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 include/net/af_vsock.h                             |   1 +
 .../trace/events/vsock_virtio_transport_common.h   |   5 +-
 include/uapi/linux/virtio_vsock.h                  |   1 +
 net/vmw_vsock/af_vsock.c                           |  12 +
 net/vmw_vsock/virtio_transport.c                   | 325 ++++++++++++++++++---
 net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++++--
 6 files changed, 466 insertions(+), 62 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index b1c717286993..fcae7bca9609 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -200,6 +200,7 @@ void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
 bool vsock_find_cid(unsigned int cid);
+int vsock_bind_stream(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 /**** TAP ****/
 
diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
index 6782213778be..b1be25b327a1 100644
--- a/include/trace/events/vsock_virtio_transport_common.h
+++ b/include/trace/events/vsock_virtio_transport_common.h
@@ -9,9 +9,12 @@
 #include <linux/tracepoint.h>
 
 TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_DGRAM);
 
 #define show_type(val) \
-	__print_symbolic(val, { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" })
+	 __print_symbolic(val, \
+					{ VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \
+					{ VIRTIO_VSOCK_TYPE_DGRAM, "DGRAM" })
 
 TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID);
 TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST);
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index b56614dff1c9..5503585b26e8 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -68,6 +68,7 @@ struct virtio_vsock_hdr {
 
 enum virtio_vsock_type {
 	VIRTIO_VSOCK_TYPE_STREAM = 1,
+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
 };
 
 enum virtio_vsock_op {
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 92a72f0e0d94..c1f512291b94 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -659,6 +659,18 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
 	return 0;
 }
 
+int vsock_bind_stream(struct vsock_sock *vsk,
+				       struct sockaddr_vm *addr)
+{
+	int retval;
+
+	spin_lock_bh(&vsock_table_lock);
+	retval = __vsock_bind_stream(vsk, addr);
+	spin_unlock_bh(&vsock_table_lock);
+	return retval;
+}
+EXPORT_SYMBOL(vsock_bind_stream);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
 			      struct sockaddr_vm *addr)
 {
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 7dcb8db23305..cf47aadb0c34 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -20,21 +20,29 @@
 #include <net/sock.h>
 #include <linux/mutex.h>
 #include <net/af_vsock.h>
+#include<linux/kobject.h>
+#include<linux/sysfs.h>
+#include <linux/refcount.h>
 
 static struct workqueue_struct *virtio_vsock_workqueue;
 static struct virtio_vsock __rcu *the_virtio_vsock;
+static struct virtio_vsock *the_virtio_vsock_dgram;
 static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
 
 struct virtio_vsock {
 	struct virtio_device *vdev;
 	struct virtqueue **vqs;
 	bool has_dgram;
+	refcount_t active;
 
 	/* Virtqueue processing is deferred to a workqueue */
 	struct work_struct tx_work;
 	struct work_struct rx_work;
 	struct work_struct event_work;
 
+	struct work_struct dgram_tx_work;
+	struct work_struct dgram_rx_work;
+
 	/* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
 	 * must be accessed with tx_lock held.
 	 */
@@ -55,6 +63,22 @@ struct virtio_vsock {
 	int rx_buf_nr;
 	int rx_buf_max_nr;
 
+	/* The following fields are protected by dgram_tx_lock.  vqs[VSOCK_VQ_DGRAM_TX]
+	 * must be accessed with dgram_tx_lock held.
+	 */
+	struct mutex dgram_tx_lock;
+	bool dgram_tx_run;
+
+	atomic_t dgram_queued_replies;
+
+	/* The following fields are protected by dgram_rx_lock.  vqs[VSOCK_VQ_DGRAM_RX]
+	 * must be accessed with dgram_rx_lock held.
+	 */
+	struct mutex dgram_rx_lock;
+	bool dgram_rx_run;
+	int dgram_rx_buf_nr;
+	int dgram_rx_buf_max_nr;
+
 	/* The following fields are protected by event_lock.
 	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
 	 */
@@ -83,21 +107,11 @@ static u32 virtio_transport_get_local_cid(void)
 	return ret;
 }
 
-static void
-virtio_transport_send_pkt_work(struct work_struct *work)
+static void virtio_transport_do_send_pkt(struct virtio_vsock *vsock,
+		struct virtqueue *vq,  spinlock_t *lock, struct list_head *send_pkt_list,
+		bool *restart_rx)
 {
-	struct virtio_vsock *vsock =
-		container_of(work, struct virtio_vsock, send_pkt_work);
-	struct virtqueue *vq;
 	bool added = false;
-	bool restart_rx = false;
-
-	mutex_lock(&vsock->tx_lock);
-
-	if (!vsock->tx_run)
-		goto out;
-
-	vq = vsock->vqs[VSOCK_VQ_TX];
 
 	for (;;) {
 		struct virtio_vsock_pkt *pkt;
@@ -105,16 +119,16 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 		int ret, in_sg = 0, out_sg = 0;
 		bool reply;
 
-		spin_lock_bh(&vsock->send_pkt_list_lock);
-		if (list_empty(&vsock->send_pkt_list)) {
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+		spin_lock_bh(lock);
+		if (list_empty(send_pkt_list)) {
+			spin_unlock_bh(lock);
 			break;
 		}
 
-		pkt = list_first_entry(&vsock->send_pkt_list,
+		pkt = list_first_entry(send_pkt_list,
 				       struct virtio_vsock_pkt, list);
 		list_del_init(&pkt->list);
-		spin_unlock_bh(&vsock->send_pkt_list_lock);
+		spin_unlock_bh(lock);
 
 		virtio_transport_deliver_tap_pkt(pkt);
 
@@ -132,9 +146,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 		 * the vq
 		 */
 		if (ret < 0) {
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 			break;
 		}
 
@@ -146,7 +160,7 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 
 			/* Do we now have resources to resume rx processing? */
 			if (val + 1 == virtqueue_get_vring_size(rx_vq))
-				restart_rx = true;
+				*restart_rx = true;
 		}
 
 		added = true;
@@ -154,7 +168,55 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 
 	if (added)
 		virtqueue_kick(vq);
+}
 
+static int virtio_transport_do_send_dgram_pkt(struct virtio_vsock *vsock,
+		struct virtqueue *vq, struct virtio_vsock_pkt *pkt)
+{
+	struct scatterlist hdr, buf, *sgs[2];
+	int ret, in_sg = 0, out_sg = 0;
+
+	virtio_transport_deliver_tap_pkt(pkt);
+
+	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
+	sgs[out_sg++] = &hdr;
+	if (pkt->buf) {
+		sg_init_one(&buf, pkt->buf, pkt->len);
+		sgs[out_sg++] = &buf;
+	}
+
+	ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL);
+	/* Usually this means that there is no more space available in
+	 * the vq
+	 */
+	if (ret < 0) {
+		virtio_transport_free_pkt(pkt);
+		return -ENOMEM;
+	}
+
+	virtqueue_kick(vq);
+
+	return pkt->len;
+}
+
+
+static void
+virtio_transport_send_pkt_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, send_pkt_work);
+	struct virtqueue *vq;
+	bool restart_rx = false;
+
+	mutex_lock(&vsock->tx_lock);
+
+	if (!vsock->tx_run)
+		goto out;
+
+	vq = vsock->vqs[VSOCK_VQ_TX];
+
+	virtio_transport_do_send_pkt(vsock, vq, &vsock->send_pkt_list_lock,
+							&vsock->send_pkt_list, &restart_rx);
 out:
 	mutex_unlock(&vsock->tx_lock);
 
@@ -163,11 +225,64 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 }
 
 static int
+virtio_transport_send_dgram_pkt(struct virtio_vsock_pkt *pkt)
+{
+	struct virtio_vsock *vsock;
+	int len = pkt->len;
+	struct virtqueue *vq;
+
+	vsock = the_virtio_vsock_dgram;
+
+	if (!vsock) {
+		virtio_transport_free_pkt(pkt);
+		return -ENODEV;
+	}
+
+	if (!vsock->dgram_tx_run) {
+		virtio_transport_free_pkt(pkt);
+		return -ENODEV;
+	}
+
+	if (!refcount_inc_not_zero(&vsock->active)) {
+		virtio_transport_free_pkt(pkt);
+		return -ENODEV;
+	}
+
+	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
+		virtio_transport_free_pkt(pkt);
+		len = -ENODEV;
+		goto out_ref;
+	}
+
+	/* send the pkt */
+	mutex_lock(&vsock->dgram_tx_lock);
+
+	if (!vsock->dgram_tx_run)
+		goto out_mutex;
+
+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
+
+	len = virtio_transport_do_send_dgram_pkt(vsock, vq, pkt);
+
+out_mutex:
+	mutex_unlock(&vsock->dgram_tx_lock);
+
+out_ref:
+	if (!refcount_dec_not_one(&vsock->active))
+		return -EFAULT;
+
+	return len;
+}
+
+static int
 virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 {
 	struct virtio_vsock *vsock;
 	int len = pkt->len;
 
+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
+		return virtio_transport_send_dgram_pkt(pkt);
+
 	rcu_read_lock();
 	vsock = rcu_dereference(the_virtio_vsock);
 	if (!vsock) {
@@ -243,7 +358,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
 	return ret;
 }
 
-static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
 {
 	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
 	struct virtio_vsock_pkt *pkt;
@@ -251,7 +366,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 	struct virtqueue *vq;
 	int ret;
 
-	vq = vsock->vqs[VSOCK_VQ_RX];
+	if (is_dgram)
+		vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
+	else
+		vq = vsock->vqs[VSOCK_VQ_RX];
 
 	do {
 		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
@@ -277,10 +395,19 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
 			virtio_transport_free_pkt(pkt);
 			break;
 		}
-		vsock->rx_buf_nr++;
+		if (is_dgram)
+			vsock->dgram_rx_buf_nr++;
+		else
+			vsock->rx_buf_nr++;
 	} while (vq->num_free);
-	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
-		vsock->rx_buf_max_nr = vsock->rx_buf_nr;
+	if (is_dgram) {
+		if (vsock->dgram_rx_buf_nr > vsock->dgram_rx_buf_max_nr)
+			vsock->dgram_rx_buf_max_nr = vsock->dgram_rx_buf_nr;
+	} else {
+		if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
+			vsock->rx_buf_max_nr = vsock->rx_buf_nr;
+	}
+
 	virtqueue_kick(vq);
 }
 
@@ -315,6 +442,34 @@ static void virtio_transport_tx_work(struct work_struct *work)
 		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
 }
 
+static void virtio_transport_dgram_tx_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, dgram_tx_work);
+	struct virtqueue *vq;
+	bool added = false;
+
+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
+	mutex_lock(&vsock->dgram_tx_lock);
+
+	if (!vsock->dgram_tx_run)
+		goto out;
+
+	do {
+		struct virtio_vsock_pkt *pkt;
+		unsigned int len;
+
+		virtqueue_disable_cb(vq);
+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
+			virtio_transport_free_pkt(pkt);
+			added = true;
+		}
+	} while (!virtqueue_enable_cb(vq));
+
+out:
+	mutex_unlock(&vsock->dgram_tx_lock);
+}
+
 /* Is there space left for replies to rx packets? */
 static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
 {
@@ -449,6 +604,11 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
 
 static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
 {
+	struct virtio_vsock *vsock = vq->vdev->priv;
+
+	if (!vsock)
+		return;
+	queue_work(virtio_vsock_workqueue, &vsock->dgram_tx_work);
 }
 
 static void virtio_vsock_rx_done(struct virtqueue *vq)
@@ -462,8 +622,12 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 
 static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
 {
-}
+	struct virtio_vsock *vsock = vq->vdev->priv;
 
+	if (!vsock)
+		return;
+	queue_work(virtio_vsock_workqueue, &vsock->dgram_rx_work);
+}
 static struct virtio_transport virtio_transport = {
 	.transport = {
 		.module                   = THIS_MODULE,
@@ -506,19 +670,9 @@ static struct virtio_transport virtio_transport = {
 	.send_pkt = virtio_transport_send_pkt,
 };
 
-static void virtio_transport_rx_work(struct work_struct *work)
+static void virtio_transport_do_rx_work(struct virtio_vsock *vsock,
+						struct virtqueue *vq, bool is_dgram)
 {
-	struct virtio_vsock *vsock =
-		container_of(work, struct virtio_vsock, rx_work);
-	struct virtqueue *vq;
-
-	vq = vsock->vqs[VSOCK_VQ_RX];
-
-	mutex_lock(&vsock->rx_lock);
-
-	if (!vsock->rx_run)
-		goto out;
-
 	do {
 		virtqueue_disable_cb(vq);
 		for (;;) {
@@ -538,7 +692,10 @@ static void virtio_transport_rx_work(struct work_struct *work)
 				break;
 			}
 
-			vsock->rx_buf_nr--;
+			if (is_dgram)
+				vsock->dgram_rx_buf_nr--;
+			else
+				vsock->rx_buf_nr--;
 
 			/* Drop short/long packets */
 			if (unlikely(len < sizeof(pkt->hdr) ||
@@ -554,11 +711,45 @@ static void virtio_transport_rx_work(struct work_struct *work)
 	} while (!virtqueue_enable_cb(vq));
 
 out:
+	return;
+}
+
+static void virtio_transport_rx_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, rx_work);
+	struct virtqueue *vq;
+
+	vq = vsock->vqs[VSOCK_VQ_RX];
+
+	mutex_lock(&vsock->rx_lock);
+
+	if (vsock->rx_run)
+		virtio_transport_do_rx_work(vsock, vq, false);
+
 	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
-		virtio_vsock_rx_fill(vsock);
+		virtio_vsock_rx_fill(vsock, false);
 	mutex_unlock(&vsock->rx_lock);
 }
 
+static void virtio_transport_dgram_rx_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, dgram_rx_work);
+	struct virtqueue *vq;
+
+	vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
+
+	mutex_lock(&vsock->dgram_rx_lock);
+
+	if (vsock->dgram_rx_run)
+		virtio_transport_do_rx_work(vsock, vq, true);
+
+	if (vsock->dgram_rx_buf_nr < vsock->dgram_rx_buf_max_nr / 2)
+		virtio_vsock_rx_fill(vsock, true);
+	mutex_unlock(&vsock->dgram_rx_lock);
+}
+
 static int virtio_vsock_probe(struct virtio_device *vdev)
 {
 	vq_callback_t *callbacks[] = {
@@ -642,8 +833,14 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	vsock->rx_buf_max_nr = 0;
 	atomic_set(&vsock->queued_replies, 0);
 
+	vsock->dgram_rx_buf_nr = 0;
+	vsock->dgram_rx_buf_max_nr = 0;
+	atomic_set(&vsock->dgram_queued_replies, 0);
+
 	mutex_init(&vsock->tx_lock);
 	mutex_init(&vsock->rx_lock);
+	mutex_init(&vsock->dgram_tx_lock);
+	mutex_init(&vsock->dgram_rx_lock);
 	mutex_init(&vsock->event_lock);
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
@@ -651,16 +848,27 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
+	INIT_WORK(&vsock->dgram_rx_work, virtio_transport_dgram_rx_work);
+	INIT_WORK(&vsock->dgram_tx_work, virtio_transport_dgram_tx_work);
 
 	mutex_lock(&vsock->tx_lock);
 	vsock->tx_run = true;
 	mutex_unlock(&vsock->tx_lock);
 
+	mutex_lock(&vsock->dgram_tx_lock);
+	vsock->dgram_tx_run = true;
+	mutex_unlock(&vsock->dgram_tx_lock);
+
 	mutex_lock(&vsock->rx_lock);
-	virtio_vsock_rx_fill(vsock);
+	virtio_vsock_rx_fill(vsock, false);
 	vsock->rx_run = true;
 	mutex_unlock(&vsock->rx_lock);
 
+	mutex_lock(&vsock->dgram_rx_lock);
+	virtio_vsock_rx_fill(vsock, true);
+	vsock->dgram_rx_run = true;
+	mutex_unlock(&vsock->dgram_rx_lock);
+
 	mutex_lock(&vsock->event_lock);
 	virtio_vsock_event_fill(vsock);
 	vsock->event_run = true;
@@ -669,6 +877,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	vdev->priv = vsock;
 	rcu_assign_pointer(the_virtio_vsock, vsock);
 
+	the_virtio_vsock_dgram = vsock;
+	refcount_set(&the_virtio_vsock_dgram->active, 1);
+
 	mutex_unlock(&the_virtio_vsock_mutex);
 	return 0;
 
@@ -699,14 +910,28 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	vsock->rx_run = false;
 	mutex_unlock(&vsock->rx_lock);
 
+	mutex_lock(&vsock->dgram_rx_lock);
+	vsock->dgram_rx_run = false;
+	mutex_unlock(&vsock->dgram_rx_lock);
+
 	mutex_lock(&vsock->tx_lock);
 	vsock->tx_run = false;
 	mutex_unlock(&vsock->tx_lock);
 
+	mutex_lock(&vsock->dgram_tx_lock);
+	vsock->dgram_tx_run = false;
+	mutex_unlock(&vsock->dgram_tx_lock);
+
 	mutex_lock(&vsock->event_lock);
 	vsock->event_run = false;
 	mutex_unlock(&vsock->event_lock);
 
+	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
+		if (signal_pending(current))
+			break;
+		msleep(5);
+	}
+
 	/* Flush all device writes and interrupts, device will not use any
 	 * more buffers.
 	 */
@@ -717,11 +942,21 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 		virtio_transport_free_pkt(pkt);
 	mutex_unlock(&vsock->rx_lock);
 
+	mutex_lock(&vsock->dgram_rx_lock);
+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_RX])))
+		virtio_transport_free_pkt(pkt);
+	mutex_unlock(&vsock->dgram_rx_lock);
+
 	mutex_lock(&vsock->tx_lock);
 	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
 		virtio_transport_free_pkt(pkt);
 	mutex_unlock(&vsock->tx_lock);
 
+	mutex_lock(&vsock->dgram_tx_lock);
+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_TX])))
+		virtio_transport_free_pkt(pkt);
+	mutex_unlock(&vsock->dgram_tx_lock);
+
 	spin_lock_bh(&vsock->send_pkt_list_lock);
 	while (!list_empty(&vsock->send_pkt_list)) {
 		pkt = list_first_entry(&vsock->send_pkt_list,
@@ -739,6 +974,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	 */
 	flush_work(&vsock->rx_work);
 	flush_work(&vsock->tx_work);
+	flush_work(&vsock->dgram_rx_work);
+	flush_work(&vsock->dgram_tx_work);
 	flush_work(&vsock->event_work);
 	flush_work(&vsock->send_pkt_work);
 
@@ -775,7 +1012,7 @@ static int __init virtio_vsock_init(void)
 		return -ENOMEM;
 
 	ret = vsock_core_register(&virtio_transport.transport,
-				  VSOCK_TRANSPORT_F_G2H);
+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
 	if (ret)
 		goto out_wq;
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 902cb6dd710b..9f041515b7f1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,8 @@
 /* Threshold for detecting small packets to copy */
 #define GOOD_COPY_LEN  128
 
+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk);
+
 static const struct virtio_transport *
 virtio_transport_get_ops(struct vsock_sock *vsk)
 {
@@ -196,21 +198,28 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	vvs = vsk->trans;
 
 	/* we can send less than pkt_len bytes */
-	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
-		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
+			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
+		else
+			return 0;
+	}
 
-	/* virtio_transport_get_credit might return less than pkt_len credit */
-	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
+	if (info->type == VIRTIO_VSOCK_TYPE_STREAM) {
+		/* virtio_transport_get_credit might return less than pkt_len credit */
+		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
 
-	/* Do not send zero length OP_RW pkt */
-	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
-		return pkt_len;
+		/* Do not send zero length OP_RW pkt */
+		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+			return pkt_len;
+	}
 
 	pkt = virtio_transport_alloc_pkt(info, pkt_len,
 					 src_cid, src_port,
 					 dst_cid, dst_port);
 	if (!pkt) {
-		virtio_transport_put_credit(vvs, pkt_len);
+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
+			virtio_transport_put_credit(vvs, pkt_len);
 		return -ENOMEM;
 	}
 
@@ -397,6 +406,58 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	return err;
 }
 
+static ssize_t
+virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
+						   struct msghdr *msg, size_t len)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_pkt *pkt;
+	size_t total = 0;
+	u32 free_space;
+	int err = -EFAULT;
+
+	spin_lock_bh(&vvs->rx_lock);
+	if (total < len && !list_empty(&vvs->rx_queue)) {
+		pkt = list_first_entry(&vvs->rx_queue,
+				       struct virtio_vsock_pkt, list);
+
+		total = len;
+		if (total > pkt->len - pkt->off)
+			total = pkt->len - pkt->off;
+		else if (total < pkt->len - pkt->off)
+			msg->msg_flags |= MSG_TRUNC;
+
+		/* sk_lock is held by caller so no one else can dequeue.
+		 * Unlock rx_lock since memcpy_to_msg() may sleep.
+		 */
+		spin_unlock_bh(&vvs->rx_lock);
+
+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
+		if (err)
+			return err;
+
+		spin_lock_bh(&vvs->rx_lock);
+
+		virtio_transport_dec_rx_pkt(vvs, pkt);
+		list_del(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+
+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+
+	spin_unlock_bh(&vvs->rx_lock);
+
+	if (total > 0 && msg->msg_name) {
+		/* Provide the address of the sender. */
+		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
+
+		vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
+						le32_to_cpu(pkt->hdr.src_port));
+		msg->msg_namelen = sizeof(*vm_addr);
+	}
+	return total;
+}
+
 ssize_t
 virtio_transport_stream_dequeue(struct vsock_sock *vsk,
 				struct msghdr *msg,
@@ -414,7 +475,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t len, int flags)
 {
-	return -EOPNOTSUPP;
+	struct sock *sk;
+	int err = 0;
+	long timeout;
+
+	DEFINE_WAIT(wait);
+
+	sk = &vsk->sk;
+	err = 0;
+
+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
+		return -EOPNOTSUPP;
+
+	lock_sock(sk);
+
+	if (!len)
+		goto out;
+
+	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+
+	while (1) {
+		s64 ready;
+
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		ready = virtio_transport_dgram_has_data(vsk);
+
+		if (ready == 0) {
+			if (timeout == 0) {
+				err = -EAGAIN;
+				finish_wait(sk_sleep(sk), &wait);
+				break;
+			}
+
+			release_sock(sk);
+			timeout = schedule_timeout(timeout);
+			lock_sock(sk);
+
+			if (signal_pending(current)) {
+				err = sock_intr_errno(timeout);
+				finish_wait(sk_sleep(sk), &wait);
+				break;
+			} else if (timeout == 0) {
+				err = -EAGAIN;
+				finish_wait(sk_sleep(sk), &wait);
+				break;
+			}
+		} else {
+			finish_wait(sk_sleep(sk), &wait);
+
+			if (ready < 0) {
+				err = -ENOMEM;
+				goto out;
+			}
+
+			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
+			break;
+		}
+	}
+out:
+	release_sock(sk);
+	return err;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
 
@@ -431,6 +551,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
 
+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
+{
+	return virtio_transport_stream_has_data(vsk);
+}
+
 static s64 virtio_transport_has_space(struct vsock_sock *vsk)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
@@ -610,13 +735,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
 int virtio_transport_dgram_bind(struct vsock_sock *vsk,
 				struct sockaddr_vm *addr)
 {
-	return -EOPNOTSUPP;
+	/* Reuse the stream bind for dgram sockets */
+	int ret = vsock_bind_stream(vsk, addr);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
 
 bool virtio_transport_dgram_allow(u32 cid, u32 port)
 {
-	return false;
+	return true;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
 
@@ -654,7 +781,17 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t dgram_len)
 {
-	return -EOPNOTSUPP;
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RW,
+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
+		.msg = msg,
+		.pkt_len = dgram_len,
+		.vsk = vsk,
+		.remote_cid = remote_addr->svm_cid,
+		.remote_port = remote_addr->svm_port,
+	};
+
+	return virtio_transport_send_pkt_info(vsk, &info);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
 
@@ -729,7 +866,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 		virtio_transport_free_pkt(reply);
 		return -ENOTCONN;
 	}
-
 	return t->send_pkt(reply);
 }
 
@@ -925,7 +1061,8 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
 		/* If there is space in the last packet queued, we copy the
 		 * new packet in its buffer.
 		 */
-		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
+		if (pkt->len <= last_pkt->buf_len - last_pkt->len &&
+			pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
 			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
 			       pkt->len);
 			last_pkt->len += pkt->len;
@@ -949,6 +1086,12 @@ virtio_transport_recv_connected(struct sock *sk,
 	struct vsock_sock *vsk = vsock_sk(sk);
 	int err = 0;
 
+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM) {
+		virtio_transport_recv_enqueue(vsk, pkt);
+		sk->sk_data_ready(sk);
+		return err;
+	}
+
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
 		virtio_transport_recv_enqueue(vsk, pkt);
@@ -1121,7 +1264,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 					le32_to_cpu(pkt->hdr.buf_alloc),
 					le32_to_cpu(pkt->hdr.fwd_cnt));
 
-	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM &&
+		le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_DGRAM) {
 		(void)virtio_transport_reset_no_sock(t, pkt);
 		goto free_pkt;
 	}
@@ -1150,11 +1294,16 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 		goto free_pkt;
 	}
 
-	space_available = virtio_transport_space_update(sk, pkt);
-
 	/* Update CID in case it has changed after a transport reset event */
 	vsk->local_addr.svm_cid = dst.svm_cid;
 
+	if (sk->sk_type == SOCK_DGRAM) {
+		virtio_transport_recv_connected(sk, pkt);
+		goto out;
+	}
+
+	space_available = virtio_transport_space_update(sk, pkt);
+
 	if (space_available)
 		sk->sk_write_space(sk);
 
@@ -1180,6 +1329,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 		break;
 	}
 
+out:
 	release_sock(sk);
 
 	/* Release refcnt obtained when we fetched this socket out of the
-- 
2.11.0


 }
 
+static void virtio_transport_dgram_rx_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, dgram_rx_work);
+	struct virtqueue *vq;
+
+	vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
+
+	mutex_lock(&vsock->dgram_rx_lock);
+
+	if (vsock->dgram_rx_run)
+		virtio_transport_do_rx_work(vsock, vq, true);
+
+	if (vsock->dgram_rx_buf_nr < vsock->dgram_rx_buf_max_nr / 2)
+		virtio_vsock_rx_fill(vsock, true);
+	mutex_unlock(&vsock->dgram_rx_lock);
+}
+
 static int virtio_vsock_probe(struct virtio_device *vdev)
 {
 	vq_callback_t *callbacks[] = {
@@ -642,8 +833,14 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	vsock->rx_buf_max_nr = 0;
 	atomic_set(&vsock->queued_replies, 0);
 
+	vsock->dgram_rx_buf_nr = 0;
+	vsock->dgram_rx_buf_max_nr = 0;
+	atomic_set(&vsock->dgram_queued_replies, 0);
+
 	mutex_init(&vsock->tx_lock);
 	mutex_init(&vsock->rx_lock);
+	mutex_init(&vsock->dgram_tx_lock);
+	mutex_init(&vsock->dgram_rx_lock);
 	mutex_init(&vsock->event_lock);
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
@@ -651,16 +848,27 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
+	INIT_WORK(&vsock->dgram_rx_work, virtio_transport_dgram_rx_work);
+	INIT_WORK(&vsock->dgram_tx_work, virtio_transport_dgram_tx_work);
 
 	mutex_lock(&vsock->tx_lock);
 	vsock->tx_run = true;
 	mutex_unlock(&vsock->tx_lock);
 
+	mutex_lock(&vsock->dgram_tx_lock);
+	vsock->dgram_tx_run = true;
+	mutex_unlock(&vsock->dgram_tx_lock);
+
 	mutex_lock(&vsock->rx_lock);
-	virtio_vsock_rx_fill(vsock);
+	virtio_vsock_rx_fill(vsock, false);
 	vsock->rx_run = true;
 	mutex_unlock(&vsock->rx_lock);
 
+	mutex_lock(&vsock->dgram_rx_lock);
+	virtio_vsock_rx_fill(vsock, true);
+	vsock->dgram_rx_run = true;
+	mutex_unlock(&vsock->dgram_rx_lock);
+
 	mutex_lock(&vsock->event_lock);
 	virtio_vsock_event_fill(vsock);
 	vsock->event_run = true;
@@ -669,6 +877,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	vdev->priv = vsock;
 	rcu_assign_pointer(the_virtio_vsock, vsock);
 
+	the_virtio_vsock_dgram = vsock;
+	refcount_set(&the_virtio_vsock_dgram->active, 1);
+
 	mutex_unlock(&the_virtio_vsock_mutex);
 	return 0;
 
@@ -699,14 +910,28 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	vsock->rx_run = false;
 	mutex_unlock(&vsock->rx_lock);
 
+	mutex_lock(&vsock->dgram_rx_lock);
+	vsock->dgram_rx_run = false;
+	mutex_unlock(&vsock->dgram_rx_lock);
+
 	mutex_lock(&vsock->tx_lock);
 	vsock->tx_run = false;
 	mutex_unlock(&vsock->tx_lock);
 
+	mutex_lock(&vsock->dgram_tx_lock);
+	vsock->dgram_tx_run = false;
+	mutex_unlock(&vsock->dgram_tx_lock);
+
 	mutex_lock(&vsock->event_lock);
 	vsock->event_run = false;
 	mutex_unlock(&vsock->event_lock);
 
+	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
+		if (signal_pending(current))
+			break;
+		msleep(5);
+	}
+
 	/* Flush all device writes and interrupts, device will not use any
 	 * more buffers.
 	 */
@@ -717,11 +942,21 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 		virtio_transport_free_pkt(pkt);
 	mutex_unlock(&vsock->rx_lock);
 
+	mutex_lock(&vsock->dgram_rx_lock);
+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_RX])))
+		virtio_transport_free_pkt(pkt);
+	mutex_unlock(&vsock->dgram_rx_lock);
+
 	mutex_lock(&vsock->tx_lock);
 	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
 		virtio_transport_free_pkt(pkt);
 	mutex_unlock(&vsock->tx_lock);
 
+	mutex_lock(&vsock->dgram_tx_lock);
+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_TX])))
+		virtio_transport_free_pkt(pkt);
+	mutex_unlock(&vsock->dgram_tx_lock);
+
 	spin_lock_bh(&vsock->send_pkt_list_lock);
 	while (!list_empty(&vsock->send_pkt_list)) {
 		pkt = list_first_entry(&vsock->send_pkt_list,
@@ -739,6 +974,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	 */
 	flush_work(&vsock->rx_work);
 	flush_work(&vsock->tx_work);
+	flush_work(&vsock->dgram_rx_work);
+	flush_work(&vsock->dgram_tx_work);
 	flush_work(&vsock->event_work);
 	flush_work(&vsock->send_pkt_work);
 
@@ -775,7 +1012,7 @@ static int __init virtio_vsock_init(void)
 		return -ENOMEM;
 
 	ret = vsock_core_register(&virtio_transport.transport,
-				  VSOCK_TRANSPORT_F_G2H);
+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
 	if (ret)
 		goto out_wq;
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 902cb6dd710b..9f041515b7f1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -26,6 +26,8 @@
 /* Threshold for detecting small packets to copy */
 #define GOOD_COPY_LEN  128
 
+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk);
+
 static const struct virtio_transport *
 virtio_transport_get_ops(struct vsock_sock *vsk)
 {
@@ -196,21 +198,28 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	vvs = vsk->trans;
 
 	/* we can send less than pkt_len bytes */
-	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
-		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
+			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
+		else
+			return -EMSGSIZE;
+	}
 
-	/* virtio_transport_get_credit might return less than pkt_len credit */
-	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
+	if (info->type == VIRTIO_VSOCK_TYPE_STREAM) {
+		/* virtio_transport_get_credit might return less than pkt_len credit */
+		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
 
-	/* Do not send zero length OP_RW pkt */
-	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
-		return pkt_len;
+		/* Do not send zero length OP_RW pkt */
+		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+			return pkt_len;
+	}
 
 	pkt = virtio_transport_alloc_pkt(info, pkt_len,
 					 src_cid, src_port,
 					 dst_cid, dst_port);
 	if (!pkt) {
-		virtio_transport_put_credit(vvs, pkt_len);
+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
+			virtio_transport_put_credit(vvs, pkt_len);
 		return -ENOMEM;
 	}
 
@@ -397,6 +406,58 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	return err;
 }
 
+static ssize_t
+virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
+						   struct msghdr *msg, size_t len)
+{
+	struct virtio_vsock_sock *vvs = vsk->trans;
+	struct virtio_vsock_pkt *pkt;
+	size_t total = 0;
+	int err = -EFAULT;
+
+	spin_lock_bh(&vvs->rx_lock);
+	if (total < len && !list_empty(&vvs->rx_queue)) {
+		pkt = list_first_entry(&vvs->rx_queue,
+				       struct virtio_vsock_pkt, list);
+
+		total = len;
+		if (total > pkt->len - pkt->off)
+			total = pkt->len - pkt->off;
+		else if (total < pkt->len - pkt->off)
+			msg->msg_flags |= MSG_TRUNC;
+
+		/* sk_lock is held by caller so no one else can dequeue.
+		 * Unlock rx_lock since memcpy_to_msg() may sleep.
+		 */
+		spin_unlock_bh(&vvs->rx_lock);
+
+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
+		if (err)
+			return err;
+
+		/* Provide the sender's address while the pkt is still
+		 * valid; it is freed just below.
+		 */
+		if (msg->msg_name) {
+			DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
+
+			vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
+					le32_to_cpu(pkt->hdr.src_port));
+			msg->msg_namelen = sizeof(*vm_addr);
+		}
+
+		spin_lock_bh(&vvs->rx_lock);
+
+		virtio_transport_dec_rx_pkt(vvs, pkt);
+		list_del(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+
+	spin_unlock_bh(&vvs->rx_lock);
+
+	return total;
+}
+
 ssize_t
 virtio_transport_stream_dequeue(struct vsock_sock *vsk,
 				struct msghdr *msg,
@@ -414,7 +475,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t len, int flags)
 {
-	return -EOPNOTSUPP;
+	struct sock *sk;
+	int err = 0;
+	long timeout;
+
+	DEFINE_WAIT(wait);
+
+	sk = &vsk->sk;
+	err = 0;
+
+	lock_sock(sk);
+
+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK) {
+		err = -EOPNOTSUPP;
+		goto out;
+	}
+
+	if (!len)
+		goto out;
+
+	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+
+	while (1) {
+		s64 ready;
+
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
+		ready = virtio_transport_dgram_has_data(vsk);
+
+		if (ready == 0) {
+			if (timeout == 0) {
+				err = -EAGAIN;
+				finish_wait(sk_sleep(sk), &wait);
+				break;
+			}
+
+			release_sock(sk);
+			timeout = schedule_timeout(timeout);
+			lock_sock(sk);
+
+			if (signal_pending(current)) {
+				err = sock_intr_errno(timeout);
+				finish_wait(sk_sleep(sk), &wait);
+				break;
+			} else if (timeout == 0) {
+				err = -EAGAIN;
+				finish_wait(sk_sleep(sk), &wait);
+				break;
+			}
+		} else {
+			finish_wait(sk_sleep(sk), &wait);
+
+			if (ready < 0) {
+				err = -ENOMEM;
+				goto out;
+			}
+
+			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
+			break;
+		}
+	}
+out:
+	release_sock(sk);
+	return err;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
 
@@ -431,6 +551,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
 }
 EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
 
+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
+{
+	return virtio_transport_stream_has_data(vsk);
+}
+
 static s64 virtio_transport_has_space(struct vsock_sock *vsk)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
@@ -610,13 +735,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
 int virtio_transport_dgram_bind(struct vsock_sock *vsk,
 				struct sockaddr_vm *addr)
 {
-	return -EOPNOTSUPP;
+	/* Reuse the stream bind path for dgram sockets */
+	int ret = vsock_bind_stream(vsk, addr);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
 
 bool virtio_transport_dgram_allow(u32 cid, u32 port)
 {
-	return false;
+	return true;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
 
@@ -654,7 +781,17 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t dgram_len)
 {
-	return -EOPNOTSUPP;
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RW,
+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
+		.msg = msg,
+		.pkt_len = dgram_len,
+		.vsk = vsk,
+		.remote_cid = remote_addr->svm_cid,
+		.remote_port = remote_addr->svm_port,
+	};
+
+	return virtio_transport_send_pkt_info(vsk, &info);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
 
@@ -729,7 +866,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 		virtio_transport_free_pkt(reply);
 		return -ENOTCONN;
 	}
-
 	return t->send_pkt(reply);
 }
 
@@ -925,7 +1061,8 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
 		/* If there is space in the last packet queued, we copy the
 		 * new packet in its buffer.
 		 */
-		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
+		if (pkt->len <= last_pkt->buf_len - last_pkt->len &&
+			pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
 			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
 			       pkt->len);
 			last_pkt->len += pkt->len;
@@ -949,6 +1086,12 @@ virtio_transport_recv_connected(struct sock *sk,
 	struct vsock_sock *vsk = vsock_sk(sk);
 	int err = 0;
 
+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM) {
+		virtio_transport_recv_enqueue(vsk, pkt);
+		sk->sk_data_ready(sk);
+		return err;
+	}
+
 	switch (le16_to_cpu(pkt->hdr.op)) {
 	case VIRTIO_VSOCK_OP_RW:
 		virtio_transport_recv_enqueue(vsk, pkt);
@@ -1121,7 +1264,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 					le32_to_cpu(pkt->hdr.buf_alloc),
 					le32_to_cpu(pkt->hdr.fwd_cnt));
 
-	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM &&
+		le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_DGRAM) {
 		(void)virtio_transport_reset_no_sock(t, pkt);
 		goto free_pkt;
 	}
@@ -1150,11 +1294,16 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 		goto free_pkt;
 	}
 
-	space_available = virtio_transport_space_update(sk, pkt);
-
 	/* Update CID in case it has changed after a transport reset event */
 	vsk->local_addr.svm_cid = dst.svm_cid;
 
+	if (sk->sk_type == SOCK_DGRAM) {
+		virtio_transport_recv_connected(sk, pkt);
+		goto out;
+	}
+
+	space_available = virtio_transport_space_update(sk, pkt);
+
 	if (space_available)
 		sk->sk_write_space(sk);
 
@@ -1180,6 +1329,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 		break;
 	}
 
+out:
 	release_sock(sk);
 
 	/* Release refcnt obtained when we fetched this socket out of the
-- 
2.11.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC v1 3/6] vhost/vsock: add support for vhost dgram.
  2021-06-09 23:24 ` Jiang Wang
@ 2021-06-09 23:24   ` Jiang Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Andra Paraschiv, Norbert Slusarek,
	Jeff Vander Stoep, Alexander Popov, kvm, netdev, linux-kernel

This patch adds dgram support on the vhost side, covering both
tx and rx. The vhost side sends packets asynchronously.
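
For illustration only, a minimal host-side sender that this path would
service could look roughly like the sketch below. The guest CID (3),
the port (1234) and the payload are placeholders, not part of this
patch:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = 3,		/* placeholder guest CID */
		.svm_port = 1234,	/* placeholder port */
	};
	const char msg[] = "hello";
	int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* Each sendto() is one datagram; vhost queues it on the dgram
	 * send list and a worker delivers it to the guest asynchronously.
	 */
	if (sendto(fd, msg, sizeof(msg), 0,
		   (struct sockaddr *)&addr, sizeof(addr)) < 0)
		perror("sendto");
	close(fd);
	return 0;
}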

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 173 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 81d064601093..d366463be6d4 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -28,7 +28,10 @@
  * small pkts.
  */
 #define VHOST_VSOCK_PKT_WEIGHT 256
+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128
 
+/* Max wait time in busy poll in microseconds */
+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20
 enum {
 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
 			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 
 struct vhost_vsock {
 	struct vhost_dev dev;
-	struct vhost_virtqueue vqs[2];
+	struct vhost_virtqueue vqs[4];
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -54,6 +57,11 @@ struct vhost_vsock {
 	spinlock_t send_pkt_list_lock;
 	struct list_head send_pkt_list;	/* host->guest pending packets */
 
+	spinlock_t dgram_send_pkt_list_lock;
+	struct list_head dgram_send_pkt_list;	/* host->guest pending packets */
+	struct vhost_work dgram_send_pkt_work;
+	int  dgram_used; /*pending packets to be send */
+
 	atomic_t queued_replies;
 
 	u32 guest_cid;
@@ -90,10 +98,22 @@ static void
 vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			    struct vhost_virtqueue *vq)
 {
-	struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
+	struct vhost_virtqueue *tx_vq;
 	int pkts = 0, total_len = 0;
 	bool added = false;
 	bool restart_tx = false;
+	spinlock_t *lock;
+	struct list_head *send_pkt_list;
+
+	if (vq == &vsock->vqs[VSOCK_VQ_RX]) {
+		tx_vq = &vsock->vqs[VSOCK_VQ_TX];
+		lock = &vsock->send_pkt_list_lock;
+		send_pkt_list = &vsock->send_pkt_list;
+	} else {
+		tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
+		lock = &vsock->dgram_send_pkt_list_lock;
+		send_pkt_list = &vsock->dgram_send_pkt_list;
+	}
 
 	mutex_lock(&vq->mutex);
 
@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		size_t nbytes;
 		size_t iov_len, payload_len;
 		int head;
+		bool is_dgram = false;
 
-		spin_lock_bh(&vsock->send_pkt_list_lock);
-		if (list_empty(&vsock->send_pkt_list)) {
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+		spin_lock_bh(lock);
+		if (list_empty(send_pkt_list)) {
+			spin_unlock_bh(lock);
 			vhost_enable_notify(&vsock->dev, vq);
 			break;
 		}
 
-		pkt = list_first_entry(&vsock->send_pkt_list,
+		pkt = list_first_entry(send_pkt_list,
 				       struct virtio_vsock_pkt, list);
 		list_del_init(&pkt->list);
-		spin_unlock_bh(&vsock->send_pkt_list_lock);
+		spin_unlock_bh(lock);
+
+		if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
+			is_dgram = true;
 
 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 					 &out, &in, NULL, NULL);
 		if (head < 0) {
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 			break;
 		}
 
 		if (head == vq->num) {
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			if (is_dgram) {
+				virtio_transport_free_pkt(pkt);
+				vq_err(vq, "Dgram virtqueue is full!");
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+				break;
+			}
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 
 			/* We cannot finish yet if more buffers snuck in while
-			 * re-enabling notify.
-			 */
+			* re-enabling notify.
+			*/
 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 				vhost_disable_notify(&vsock->dev, vq);
 				continue;
@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		if (out) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Expected 0 output buffers, got %u\n", out);
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
+
 			break;
 		}
 
@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		if (iov_len < sizeof(pkt->hdr)) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
+			break;
+		}
+
+		if (iov_len < pkt->len - pkt->off &&
+			vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);
 			break;
 		}
 
@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		if (nbytes != sizeof(pkt->hdr)) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Faulted on copying pkt hdr\n");
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
 			break;
 		}
 
@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		/* If we didn't send all the payload we can requeue the packet
 		 * to send it with the next available buffer.
 		 */
-		if (pkt->off < pkt->len) {
+		if ((pkt->off < pkt->len)
+			&& (vq == &vsock->vqs[VSOCK_VQ_RX])) {
 			/* We are queueing the same virtio_vsock_pkt to handle
 			 * the remaining bytes, and we want to deliver it
 			 * to monitoring devices in the next iteration.
 			 */
 			pkt->tap_delivered = false;
 
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 		} else {
 			if (pkt->reply) {
 				int val;
@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			}
 
 			virtio_transport_free_pkt(pkt);
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
 		}
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
 	if (added)
@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
 	vhost_transport_do_send_pkt(vsock, vq);
 }
 
+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq;
+	struct vhost_vsock *vsock;
+
+	vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);
+	vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
+
+	vhost_transport_do_send_pkt(vsock, vq);
+}
+
 static int
 vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 {
 	struct vhost_vsock *vsock;
 	int len = pkt->len;
+	spinlock_t *lock;
+	struct list_head *send_pkt_list;
+	struct vhost_work *work;
 
 	rcu_read_lock();
 
@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 		return -ENODEV;
 	}
 
+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
+		lock = &vsock->send_pkt_list_lock;
+		send_pkt_list = &vsock->send_pkt_list;
+		work = &vsock->send_pkt_work;
+	} else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
+		lock = &vsock->dgram_send_pkt_list_lock;
+		send_pkt_list = &vsock->dgram_send_pkt_list;
+		work = &vsock->dgram_send_pkt_work;
+	} else {
+		rcu_read_unlock();
+		virtio_transport_free_pkt(pkt);
+		return -EINVAL;
+	}
+
+
 	if (pkt->reply)
 		atomic_inc(&vsock->queued_replies);
 
-	spin_lock_bh(&vsock->send_pkt_list_lock);
-	list_add_tail(&pkt->list, &vsock->send_pkt_list);
-	spin_unlock_bh(&vsock->send_pkt_list_lock);
+	spin_lock_bh(lock);
+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
+		if (vsock->dgram_used == VHOST_VSOCK_DGRM_MAX_PENDING_PKT) {
+			/* Datagrams are best effort: drop the packet rather
+			 * than leak it when the pending list is full.
+			 */
+			virtio_transport_free_pkt(pkt);
+			len = -ENOMEM;
+		} else {
+			vsock->dgram_used++;
+			list_add_tail(&pkt->list, send_pkt_list);
+		}
+	} else {
+		list_add_tail(&pkt->list, send_pkt_list);
+	}
 
-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	spin_unlock_bh(lock);
+
+	vhost_work_queue(&vsock->dev, work);
 
 	rcu_read_unlock();
 	return len;
@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
-	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)
+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM
+		|| le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
 		pkt->len = le32_to_cpu(pkt->hdr.len);
 
 	/* No payload */
@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {
 	.send_pkt = vhost_transport_send_pkt,
 };
 
+static inline unsigned long busy_clock(void)
+{
+	return local_clock() >> 10;
+}
+
+static bool vhost_can_busy_poll(unsigned long endtime)
+{
+	return likely(!need_resched() && !time_after(busy_clock(), endtime) &&
+		      !signal_pending(current));
+}
+
+
 static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 {
 	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	int head, pkts = 0, total_len = 0;
 	unsigned int out, in;
 	bool added = false;
+	unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;
+	unsigned long endtime;
 
 	mutex_lock(&vq->mutex);
 
@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	if (!vq_meta_prefetch(vq))
 		goto out;
 
+	endtime = busy_clock() + busyloop_timeout;
 	vhost_disable_notify(&vsock->dev, vq);
+	preempt_disable();
 	do {
 		u32 len;
 
-		if (!vhost_vsock_more_replies(vsock)) {
+		if (vq == &vsock->vqs[VSOCK_VQ_TX]
+			&& !vhost_vsock_more_replies(vsock)) {
 			/* Stop tx until the device processes already
 			 * pending replies.  Leave tx virtqueue
 			 * callbacks disabled.
@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			break;
 
 		if (head == vq->num) {
+			if (vhost_can_busy_poll(endtime)) {
+				cpu_relax();
+				continue;
+			}
+
 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 				vhost_disable_notify(&vsock->dev, vq);
 				continue;
@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 		total_len += len;
 		added = true;
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
+	preempt_enable();
 
 no_more_replies:
 	if (added)
@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	 * let's kick the send worker to send them.
 	 */
 	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);
 
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;
@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 
 	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
 	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
+	vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
+	vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
+	vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =
+						vhost_vsock_handle_tx_kick;
+	vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =
+						vhost_vsock_handle_rx_kick;
 
 	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
 		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
+	spin_lock_init(&vsock->dgram_send_pkt_list_lock);
+	INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);
+	vhost_work_init(&vsock->dgram_send_pkt_work,
+			vhost_transport_dgram_send_pkt_work);
+
 	return 0;
 
 out:
@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 		if (vsock->vqs[i].handle_kick)
 			vhost_poll_flush(&vsock->vqs[i].poll);
 	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
+	vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);
 }
 
 static void vhost_vsock_reset_orphans(struct sock *sk)
@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	}
 	spin_unlock_bh(&vsock->send_pkt_list_lock);
 
+	spin_lock_bh(&vsock->dgram_send_pkt_list_lock);
+	while (!list_empty(&vsock->dgram_send_pkt_list)) {
+		struct virtio_vsock_pkt *pkt;
+
+		pkt = list_first_entry(&vsock->dgram_send_pkt_list,
+				struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+	spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);
+
 	vhost_dev_cleanup(&vsock->dev);
 	kfree(vsock->dev.vqs);
 	vhost_vsock_free(vsock);
@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)
 	int ret;
 
 	ret = vsock_core_register(&vhost_transport.transport,
-				  VSOCK_TRANSPORT_F_H2G);
+				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
 	if (ret < 0)
 		return ret;
 	return misc_register(&vhost_vsock_misc);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC v1 3/6] vhost/vsock: add support for vhost dgram.
@ 2021-06-09 23:24   ` Jiang Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: cong.wang, Andra Paraschiv, kvm, mst, virtualization,
	Norbert Slusarek, jhansen, duanxiongchun, Jeff Vander Stoep,
	xieyongji, Ingo Molnar, Jakub Kicinski, Alexander Popov,
	Steven Rostedt, chaiwen.cc, stefanha, netdev, linux-kernel,
	Colin Ian King, arseny.krasnov, David S. Miller

This patch adds dgram support on the vhost side, covering both
tx and rx. The vhost side sends packets asynchronously.

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 173 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 81d064601093..d366463be6d4 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -28,7 +28,10 @@
  * small pkts.
  */
 #define VHOST_VSOCK_PKT_WEIGHT 256
+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128
 
+/* Max wait time in busy poll in microseconds */
+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20
 enum {
 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
 			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 
 struct vhost_vsock {
 	struct vhost_dev dev;
-	struct vhost_virtqueue vqs[2];
+	struct vhost_virtqueue vqs[4];
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -54,6 +57,11 @@ struct vhost_vsock {
 	spinlock_t send_pkt_list_lock;
 	struct list_head send_pkt_list;	/* host->guest pending packets */
 
+	spinlock_t dgram_send_pkt_list_lock;
+	struct list_head dgram_send_pkt_list;	/* host->guest pending packets */
+	struct vhost_work dgram_send_pkt_work;
+	int  dgram_used; /*pending packets to be send */
+
 	atomic_t queued_replies;
 
 	u32 guest_cid;
@@ -90,10 +98,22 @@ static void
 vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			    struct vhost_virtqueue *vq)
 {
-	struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
+	struct vhost_virtqueue *tx_vq;
 	int pkts = 0, total_len = 0;
 	bool added = false;
 	bool restart_tx = false;
+	spinlock_t *lock;
+	struct list_head *send_pkt_list;
+
+	if (vq == &vsock->vqs[VSOCK_VQ_RX]) {
+		tx_vq = &vsock->vqs[VSOCK_VQ_TX];
+		lock = &vsock->send_pkt_list_lock;
+		send_pkt_list = &vsock->send_pkt_list;
+	} else {
+		tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
+		lock = &vsock->dgram_send_pkt_list_lock;
+		send_pkt_list = &vsock->dgram_send_pkt_list;
+	}
 
 	mutex_lock(&vq->mutex);
 
@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		size_t nbytes;
 		size_t iov_len, payload_len;
 		int head;
+		bool is_dgram = false;
 
-		spin_lock_bh(&vsock->send_pkt_list_lock);
-		if (list_empty(&vsock->send_pkt_list)) {
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+		spin_lock_bh(lock);
+		if (list_empty(send_pkt_list)) {
+			spin_unlock_bh(lock);
 			vhost_enable_notify(&vsock->dev, vq);
 			break;
 		}
 
-		pkt = list_first_entry(&vsock->send_pkt_list,
+		pkt = list_first_entry(send_pkt_list,
 				       struct virtio_vsock_pkt, list);
 		list_del_init(&pkt->list);
-		spin_unlock_bh(&vsock->send_pkt_list_lock);
+		spin_unlock_bh(lock);
+
+		if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
+			is_dgram = true;
 
 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 					 &out, &in, NULL, NULL);
 		if (head < 0) {
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 			break;
 		}
 
 		if (head == vq->num) {
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			if (is_dgram) {
+				virtio_transport_free_pkt(pkt);
+				vq_err(vq, "Dgram virtqueue is full!");
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+				break;
+			}
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 
 			/* We cannot finish yet if more buffers snuck in while
-			 * re-enabling notify.
-			 */
+			* re-enabling notify.
+			*/
 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 				vhost_disable_notify(&vsock->dev, vq);
 				continue;
@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		if (out) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Expected 0 output buffers, got %u\n", out);
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
+
 			break;
 		}
 
@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		if (iov_len < sizeof(pkt->hdr)) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
+			break;
+		}
+
+		if (iov_len < pkt->len - pkt->off &&
+			vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {
+			virtio_transport_free_pkt(pkt);
+			vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);
 			break;
 		}
 
@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		if (nbytes != sizeof(pkt->hdr)) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Faulted on copying pkt hdr\n");
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
 			break;
 		}
 
@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		/* If we didn't send all the payload we can requeue the packet
 		 * to send it with the next available buffer.
 		 */
-		if (pkt->off < pkt->len) {
+		if ((pkt->off < pkt->len)
+			&& (vq == &vsock->vqs[VSOCK_VQ_RX])) {
 			/* We are queueing the same virtio_vsock_pkt to handle
 			 * the remaining bytes, and we want to deliver it
 			 * to monitoring devices in the next iteration.
 			 */
 			pkt->tap_delivered = false;
 
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			spin_lock_bh(lock);
+			list_add(&pkt->list, send_pkt_list);
+			spin_unlock_bh(lock);
 		} else {
 			if (pkt->reply) {
 				int val;
@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			}
 
 			virtio_transport_free_pkt(pkt);
+			if (is_dgram) {
+				spin_lock_bh(lock);
+				vsock->dgram_used--;
+				spin_unlock_bh(lock);
+			}
 		}
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
 	if (added)
@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
 	vhost_transport_do_send_pkt(vsock, vq);
 }
 
+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)
+{
+	struct vhost_virtqueue *vq;
+	struct vhost_vsock *vsock;
+
+	vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);
+	vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
+
+	vhost_transport_do_send_pkt(vsock, vq);
+}
+
 static int
 vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 {
 	struct vhost_vsock *vsock;
 	int len = pkt->len;
+	spinlock_t *lock;
+	struct list_head *send_pkt_list;
+	struct vhost_work *work;
 
 	rcu_read_lock();
 
@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 		return -ENODEV;
 	}
 
+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
+		lock = &vsock->send_pkt_list_lock;
+		send_pkt_list = &vsock->send_pkt_list;
+		work = &vsock->send_pkt_work;
+	} else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
+		lock = &vsock->dgram_send_pkt_list_lock;
+		send_pkt_list = &vsock->dgram_send_pkt_list;
+		work = &vsock->dgram_send_pkt_work;
+	} else {
+		rcu_read_unlock();
+		virtio_transport_free_pkt(pkt);
+		return -EINVAL;
+	}
+
+
 	if (pkt->reply)
 		atomic_inc(&vsock->queued_replies);
 
-	spin_lock_bh(&vsock->send_pkt_list_lock);
-	list_add_tail(&pkt->list, &vsock->send_pkt_list);
-	spin_unlock_bh(&vsock->send_pkt_list_lock);
+	spin_lock_bh(lock);
+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
+		if (vsock->dgram_used == VHOST_VSOCK_DGRM_MAX_PENDING_PKT) {
+			/* Datagrams are best effort: drop the packet rather
+			 * than leak it when the pending list is full.
+			 */
+			virtio_transport_free_pkt(pkt);
+			len = -ENOMEM;
+		} else {
+			vsock->dgram_used++;
+			list_add_tail(&pkt->list, send_pkt_list);
+		}
+	} else {
+		list_add_tail(&pkt->list, send_pkt_list);
+	}
 
-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	spin_unlock_bh(lock);
+
+	vhost_work_queue(&vsock->dev, work);
 
 	rcu_read_unlock();
 	return len;
@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
-	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)
+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM
+		|| le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
 		pkt->len = le32_to_cpu(pkt->hdr.len);
 
 	/* No payload */
@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {
 	.send_pkt = vhost_transport_send_pkt,
 };
 
+static inline unsigned long busy_clock(void)
+{
+	return local_clock() >> 10;
+}
+
+static bool vhost_can_busy_poll(unsigned long endtime)
+{
+	return likely(!need_resched() && !time_after(busy_clock(), endtime) &&
+		      !signal_pending(current));
+}
+
+
 static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 {
 	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	int head, pkts = 0, total_len = 0;
 	unsigned int out, in;
 	bool added = false;
+	unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;
+	unsigned long endtime;
 
 	mutex_lock(&vq->mutex);
 
@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	if (!vq_meta_prefetch(vq))
 		goto out;
 
+	endtime = busy_clock() + busyloop_timeout;
 	vhost_disable_notify(&vsock->dev, vq);
+	preempt_disable();
 	do {
 		u32 len;
 
-		if (!vhost_vsock_more_replies(vsock)) {
+		if (vq == &vsock->vqs[VSOCK_VQ_TX]
+			&& !vhost_vsock_more_replies(vsock)) {
 			/* Stop tx until the device processes already
 			 * pending replies.  Leave tx virtqueue
 			 * callbacks disabled.
@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			break;
 
 		if (head == vq->num) {
+			if (vhost_can_busy_poll(endtime)) {
+				cpu_relax();
+				continue;
+			}
+
 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 				vhost_disable_notify(&vsock->dev, vq);
 				continue;
@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 		total_len += len;
 		added = true;
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
+	preempt_enable();
 
 no_more_replies:
 	if (added)
@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	 * let's kick the send worker to send them.
 	 */
 	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
+	vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);
 
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;
@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 
 	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
 	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
+	vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
+	vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
+	vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =
+						vhost_vsock_handle_tx_kick;
+	vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =
+						vhost_vsock_handle_rx_kick;
 
 	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
 		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
+	spin_lock_init(&vsock->dgram_send_pkt_list_lock);
+	INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);
+	vhost_work_init(&vsock->dgram_send_pkt_work,
+			vhost_transport_dgram_send_pkt_work);
+
 	return 0;
 
 out:
@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 		if (vsock->vqs[i].handle_kick)
 			vhost_poll_flush(&vsock->vqs[i].poll);
 	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
+	vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);
 }
 
 static void vhost_vsock_reset_orphans(struct sock *sk)
@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	}
 	spin_unlock_bh(&vsock->send_pkt_list_lock);
 
+	spin_lock_bh(&vsock->dgram_send_pkt_list_lock);
+	while (!list_empty(&vsock->dgram_send_pkt_list)) {
+		struct virtio_vsock_pkt *pkt;
+
+		pkt = list_first_entry(&vsock->dgram_send_pkt_list,
+				struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+	spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);
+
 	vhost_dev_cleanup(&vsock->dev);
 	kfree(vsock->dev.vqs);
 	vhost_vsock_free(vsock);
@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)
 	int ret;
 
 	ret = vsock_core_register(&vhost_transport.transport,
-				  VSOCK_TRANSPORT_F_H2G);
+				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
 	if (ret < 0)
 		return ret;
 	return misc_register(&vhost_vsock_misc);
-- 
2.11.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC v1 4/6] vsock_test: add tests for vsock dgram
  2021-06-09 23:24 ` Jiang Wang
@ 2021-06-09 23:24   ` Jiang Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Norbert Slusarek, Andra Paraschiv,
	Alexander Popov, kvm, netdev, linux-kernel

Add test cases for vsock SOCK_DGRAM sockets: sendto/recvfrom,
connect, and multiple concurrent client sockets.
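
For reference, the receive pattern these tests exercise maps onto a
plain bind()/recvfrom() sequence in an application; a rough sketch
(the port and the timeout value are placeholders) is:

#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_ANY,
		.svm_port = 1234,		/* placeholder port */
	};
	struct sockaddr_vm peer;
	socklen_t peer_len = sizeof(peer);
	struct timeval tv = { .tv_sec = 5 };	/* placeholder timeout */
	char buf[64];
	ssize_t n;
	int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("socket/bind");
		return 1;
	}
	/* SO_RCVTIMEO sets the receive timeout consulted by the dgram
	 * dequeue path, so a missing peer does not block forever.
	 */
	setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

	n = recvfrom(fd, buf, sizeof(buf), 0,
		     (struct sockaddr *)&peer, &peer_len);
	if (n < 0)
		perror("recvfrom");
	else
		printf("got %zd bytes from cid %u port %u\n",
		       n, peer.svm_cid, peer.svm_port);
	close(fd);
	return 0;
}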

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 tools/testing/vsock/util.c       | 105 +++++++++++++++++++++
 tools/testing/vsock/util.h       |   4 +
 tools/testing/vsock/vsock_test.c | 195 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 93cbd6f603f9..59e5301b5380 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -238,6 +238,57 @@ void send_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Transmit one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+				int flags)
+{
+	const uint8_t byte = 'A';
+	ssize_t nwritten;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+						len);
+		timeout_check("write");
+	} while (nwritten < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nwritten != -1) {
+			fprintf(stderr, "bogus sendto(2) return value %zd\n",
+				nwritten);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("write");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nwritten < 0) {
+		perror("write");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while sending byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten != sizeof(byte)) {
+		fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Receive one byte and check the return value.
  *
  * expected_ret:
@@ -291,6 +342,60 @@ void recv_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+				int expected_ret, int flags)
+{
+	uint8_t byte;
+	ssize_t nread;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+		timeout_check("read");
+	} while (nread < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nread != -1) {
+			fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+				nread);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("read");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nread < 0) {
+		perror("read");
+		exit(EXIT_FAILURE);
+	}
+	if (nread == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while receiving byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nread != sizeof(byte)) {
+		fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+		exit(EXIT_FAILURE);
+	}
+	if (byte != 'A') {
+		fprintf(stderr, "unexpected byte read %c\n", byte);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Run test cases.  The program terminates if a failure occurs. */
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index e53dd09d26d9..cea1acd094c6 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -40,7 +40,11 @@ int vsock_stream_accept(unsigned int cid, unsigned int port,
 			struct sockaddr_vm *clientaddrp);
 void vsock_wait_remote_close(int fd);
 void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+				int flags);
 void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+				int expected_ret, int flags);
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts);
 void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 5a4fb80fa832..9dd9f004b7df 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -197,6 +197,115 @@ static void test_stream_server_close_server(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+	printf("got message from cid:%d, port %u ", addr.svm.svm_cid,
+	printf("got message from cid %u, port %u\n", addr.svm.svm_cid,
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+	int ret;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = connect(fd, &addr.sa, sizeof(addr.svm));
+	if (ret < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	send_byte(fd, 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+	test_dgram_sendto_server(opts);
+}
+
 /* With the standard socket sizes, VMCI is able to support about 100
  * concurrent stream connections.
  */
@@ -250,6 +359,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
 		close(fds[i]);
 }
 
+static void test_dgram_multiconn_client(const struct test_opts *opts)
+{
+	int fds[MULTICONN_NFDS];
+	int i;
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++) {
+		fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+		if (fds[i] < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		close(fds[i]);
+}
+
+static void test_dgram_multiconn_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+	int i;
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
 static void test_stream_msg_peek_client(const struct test_opts *opts)
 {
 	int fd;
@@ -309,6 +489,21 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_msg_peek_client,
 		.run_server = test_stream_msg_peek_server,
 	},
+	{
+		.name = "SOCK_DGRAM client close",
+		.run_client = test_dgram_sendto_client,
+		.run_server = test_dgram_sendto_server,
+	},
+	{
+		.name = "SOCK_DGRAM client connect",
+		.run_client = test_dgram_connect_client,
+		.run_server = test_dgram_connect_server,
+	},
+	{
+		.name = "SOCK_DGRAM multiple connections",
+		.run_client = test_dgram_multiconn_client,
+		.run_server = test_dgram_multiconn_server,
+	},
 	{},
 };
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC v1 4/6] vsock_test: add tests for vsock dgram
@ 2021-06-09 23:24   ` Jiang Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: cong.wang, duanxiongchun, Ingo Molnar, kvm, mst, netdev,
	linux-kernel, Steven Rostedt, virtualization, xieyongji,
	chaiwen.cc, Norbert Slusarek, stefanha, Colin Ian King,
	Jakub Kicinski, arseny.krasnov, Alexander Popov, jhansen,
	David S. Miller, Andra Paraschiv

Add test cases for vsock SOCK_DGRAM sockets: sendto/recvfrom,
connect, and multiple concurrent client sockets.

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 tools/testing/vsock/util.c       | 105 +++++++++++++++++++++
 tools/testing/vsock/util.h       |   4 +
 tools/testing/vsock/vsock_test.c | 195 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 304 insertions(+)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 93cbd6f603f9..59e5301b5380 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -238,6 +238,57 @@ void send_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Transmit one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+				int flags)
+{
+	const uint8_t byte = 'A';
+	ssize_t nwritten;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+						len);
+		timeout_check("write");
+	} while (nwritten < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nwritten != -1) {
+			fprintf(stderr, "bogus sendto(2) return value %zd\n",
+				nwritten);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("write");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nwritten < 0) {
+		perror("write");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while sending byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten != sizeof(byte)) {
+		fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Receive one byte and check the return value.
  *
  * expected_ret:
@@ -291,6 +342,60 @@ void recv_byte(int fd, int expected_ret, int flags)
 	}
 }
 
+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+				int expected_ret, int flags)
+{
+	uint8_t byte;
+	ssize_t nread;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+		timeout_check("read");
+	} while (nread < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nread != -1) {
+			fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+				nread);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("read");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nread < 0) {
+		perror("read");
+		exit(EXIT_FAILURE);
+	}
+	if (nread == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while receiving byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nread != sizeof(byte)) {
+		fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+		exit(EXIT_FAILURE);
+	}
+	if (byte != 'A') {
+		fprintf(stderr, "unexpected byte read %c\n", byte);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Run test cases.  The program terminates if a failure occurs. */
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index e53dd09d26d9..cea1acd094c6 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -40,7 +40,11 @@ int vsock_stream_accept(unsigned int cid, unsigned int port,
 			struct sockaddr_vm *clientaddrp);
 void vsock_wait_remote_close(int fd);
 void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+				int flags);
 void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+				int expected_ret, int flags);
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts);
 void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 5a4fb80fa832..9dd9f004b7df 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -197,6 +197,115 @@ static void test_stream_server_close_server(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+	printf("got message from cid %u, port %u\n", addr.svm.svm_cid,
+			addr.svm.svm_port);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+	int ret;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = connect(fd, &addr.sa, sizeof(addr.svm));
+	if (ret < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	send_byte(fd, 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+	test_dgram_sendto_server(opts);
+}
+
 /* With the standard socket sizes, VMCI is able to support about 100
  * concurrent stream connections.
  */
@@ -250,6 +359,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
 		close(fds[i]);
 }
 
+static void test_dgram_multiconn_client(const struct test_opts *opts)
+{
+	int fds[MULTICONN_NFDS];
+	int i;
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++) {
+		fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+		if (fds[i] < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		close(fds[i]);
+}
+
+static void test_dgram_multiconn_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+	int i;
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
 static void test_stream_msg_peek_client(const struct test_opts *opts)
 {
 	int fd;
@@ -309,6 +489,21 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_msg_peek_client,
 		.run_server = test_stream_msg_peek_server,
 	},
+	{
+		.name = "SOCK_DGRAM client sendto",
+		.run_client = test_dgram_sendto_client,
+		.run_server = test_dgram_sendto_server,
+	},
+	{
+		.name = "SOCK_DGRAM client connect",
+		.run_client = test_dgram_connect_client,
+		.run_server = test_dgram_connect_server,
+	},
+	{
+		.name = "SOCK_DGRAM multiple connections",
+		.run_client = test_dgram_multiconn_client,
+		.run_server = test_dgram_multiconn_server,
+	},
 	{},
 };
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC v1 5/6] vhost/vsock: add kconfig for vhost dgram support
  2021-06-09 23:24 ` Jiang Wang
@ 2021-06-09 23:24   ` Jiang Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Andra Paraschiv, Norbert Slusarek, Colin Ian King, Lu Wei,
	Alexander Popov, kvm, netdev, linux-kernel

Also change the number of vqs according to the config option.

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 drivers/vhost/Kconfig |  8 ++++++++
 drivers/vhost/vsock.c | 11 ++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..d63fffee6007 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -61,6 +61,14 @@ config VHOST_VSOCK
 	To compile this driver as a module, choose M here: the module will be called
 	vhost_vsock.
 
+config VHOST_VSOCK_DGRAM
+	bool "vhost vsock datagram sockets support"
+	depends on VHOST_VSOCK
+	default n
+	help
+	Enable vhost-vsock support for datagram type vsock sockets. QEMU
+	and the guest must also support the datagram type to use it.
+
 config VHOST_VDPA
 	tristate "Vhost driver for vDPA-based backend"
 	depends on EVENTFD
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index d366463be6d4..12ca1dc0268f 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -48,7 +48,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 
 struct vhost_vsock {
 	struct vhost_dev dev;
+#ifdef CONFIG_VHOST_VSOCK_DGRAM
 	struct vhost_virtqueue vqs[4];
+#else
+	struct vhost_virtqueue vqs[2];
+#endif
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -763,15 +767,16 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 
 	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
 	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
-	vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
-	vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
+#ifdef CONFIG_VHOST_VSOCK_DGRAM
+	vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
+	vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
 	vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =
 						vhost_vsock_handle_tx_kick;
 	vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =
 						vhost_vsock_handle_rx_kick;
-
+#endif
 	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
 		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
 		       VHOST_VSOCK_WEIGHT, true, NULL);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC v1 6/6] virtio/vsock: add sysfs for rx buf len for dgram
  2021-06-09 23:24 ` Jiang Wang
@ 2021-06-09 23:24   ` Jiang Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang @ 2021-06-09 23:24 UTC (permalink / raw)
  To: sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Norbert Slusarek, Andra Paraschiv, Lu Wei,
	Alexander Popov, kvm, netdev, linux-kernel

Make the dgram rx buffer length configurable via sysfs.

Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
---
 net/vmw_vsock/virtio_transport.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index cf47aadb0c34..2e4dd9c48472 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -29,6 +29,14 @@ static struct virtio_vsock __rcu *the_virtio_vsock;
 static struct virtio_vsock *the_virtio_vsock_dgram;
 static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
 
+static int rx_buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+static struct kobject *kobj_ref;
+static ssize_t  sysfs_show(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf);
+static ssize_t  sysfs_store(struct kobject *kobj,
+			struct kobj_attribute *attr, const char *buf, size_t count);
+static struct kobj_attribute rxbuf_attr = __ATTR(rx_buf_value, 0660, sysfs_show, sysfs_store);
+
 struct virtio_vsock {
 	struct virtio_device *vdev;
 	struct virtqueue **vqs;
@@ -360,7 +368,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
 
 static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
 {
-	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+	int buf_len = rx_buf_len;
 	struct virtio_vsock_pkt *pkt;
 	struct scatterlist hdr, buf, *sgs[2];
 	struct virtqueue *vq;
@@ -1003,6 +1011,22 @@ static struct virtio_driver virtio_vsock_driver = {
 	.remove = virtio_vsock_remove,
 };
 
+static ssize_t sysfs_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", rx_buf_len);
+}
+
+static ssize_t sysfs_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	if (kstrtoint(buf, 0, &rx_buf_len) < 0)
+		return -EINVAL;
+	if (rx_buf_len < 1024)
+		rx_buf_len = 1024;
+	return count;
+}
+
 static int __init virtio_vsock_init(void)
 {
 	int ret;
@@ -1020,8 +1044,17 @@ static int __init virtio_vsock_init(void)
 	if (ret)
 		goto out_vci;
 
-	return 0;
+	kobj_ref = kobject_create_and_add("vsock", kernel_kobj);
 
+	/* Create the sysfs file for rx_buf_value */
+	ret = sysfs_create_file(kobj_ref, &rxbuf_attr.attr);
+	if (ret)
+		goto out_sysfs;
+
+	return 0;
+out_sysfs:
+	/* sysfs_create_file() failed, only the kobject needs cleanup */
+	kobject_put(kobj_ref);
 out_vci:
 	vsock_core_unregister(&virtio_transport.transport);
 out_wq:
-- 
2.11.0
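
As a usage sketch (not part of the patch): assuming the attribute ends
up at /sys/kernel/vsock/rx_buf_value, which follows from the
kobject_create_and_add("vsock", kernel_kobj) call above, it could be
exercised from user space roughly like this:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* Path follows from kobject_create_and_add("vsock", kernel_kobj) */
  #define RX_BUF_ATTR "/sys/kernel/vsock/rx_buf_value"

  int main(void)
  {
          char val[32] = { 0 };
          int fd = open(RX_BUF_ATTR, O_RDWR);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          if (read(fd, val, sizeof(val) - 1) > 0)
                  printf("current dgram rx buf len: %s\n", val);

          /* Ask for 16 KiB buffers; the store hook clamps values below 1024 */
          lseek(fd, 0, SEEK_SET);
          if (write(fd, "16384", strlen("16384")) < 0)
                  perror("write");

          close(fd);
          return 0;
  }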


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-09 23:24 ` Jiang Wang
@ 2021-06-10  1:50   ` Jason Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2021-06-10  1:50 UTC (permalink / raw)
  To: Jiang Wang, sgarzare
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, David S. Miller,
	Jakub Kicinski, Steven Rostedt, Ingo Molnar, Colin Ian King,
	Jorgen Hansen, Andra Paraschiv, Norbert Slusarek, Lu Wei,
	Alexander Popov, kvm, netdev, linux-kernel


On 2021/6/10 7:24 AM, Jiang Wang wrote:
> This patchset implements support of SOCK_DGRAM for virtio
> transport.
>
> Datagram sockets are connectionless and unreliable. To avoid unfair contention
> with stream and other sockets, add two more virtqueues and
> a new feature bit to indicate if those two new queues exist or not.
>
> Dgram does not use the existing credit update mechanism for
> stream sockets. When sending from the guest/driver, sending packets
> synchronously, so the sender will get an error when the virtqueue is full.
> When sending from the host/device, send packets asynchronously
> because the descriptor memory belongs to the corresponding QEMU
> process.


What's the use case for the datagram vsock?


>
> The virtio spec patch is here:
> https://www.spinics.net/lists/linux-virtualization/msg50027.html


Having had a quick glance, I suggest splitting the mergeable rx buffer
support into a separate patch.

But I think it's time to revisit the idea of unifying virtio-net and
virtio-vsock. Otherwise we're duplicating features and bugs.

Thanks


>
> For those who prefer git repo, here is the link for the linux kernel:
> https://github.com/Jiang1155/linux/tree/vsock-dgram-v1
>
> qemu patch link:
> https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1
>
>
> To do:
> 1. use skb when receiving packets
> 2. support multiple transport
> 3. support mergeable rx buffer
>
>
> Jiang Wang (6):
>    virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
>    virtio/vsock: add support for virtio datagram
>    vhost/vsock: add support for vhost dgram.
>    vsock_test: add tests for vsock dgram
>    vhost/vsock: add kconfig for vhost dgram support
>    virtio/vsock: add sysfs for rx buf len for dgram
>
>   drivers/vhost/Kconfig                              |   8 +
>   drivers/vhost/vsock.c                              | 207 ++++++++--
>   include/linux/virtio_vsock.h                       |   9 +
>   include/net/af_vsock.h                             |   1 +
>   .../trace/events/vsock_virtio_transport_common.h   |   5 +-
>   include/uapi/linux/virtio_vsock.h                  |   4 +
>   net/vmw_vsock/af_vsock.c                           |  12 +
>   net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---
>   net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-
>   tools/testing/vsock/util.c                         | 105 +++++
>   tools/testing/vsock/util.h                         |   4 +
>   tools/testing/vsock/vsock_test.c                   | 195 ++++++++++
>   12 files changed, 1070 insertions(+), 97 deletions(-)
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-10  1:50   ` Jason Wang
@ 2021-06-10  3:43     ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-10  3:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: Stefano Garzarella, virtualization, Stefan Hajnoczi,
	Michael S. Tsirkin, Arseny Krasnov, jhansen, cong.wang,
	Xiongchun Duan, Yongji Xie, 柴稳,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel

On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/6/10 7:24 AM, Jiang Wang wrote:
> > This patchset implements support of SOCK_DGRAM for virtio
> > transport.
> >
> > Datagram sockets are connectionless and unreliable. To avoid unfair contention
> > with stream and other sockets, add two more virtqueues and
> > a new feature bit to indicate if those two new queues exist or not.
> >
> > Dgram does not use the existing credit update mechanism for
> > stream sockets. When sending from the guest/driver, sending packets
> > synchronously, so the sender will get an error when the virtqueue is full.
> > When sending from the host/device, send packets asynchronously
> > because the descriptor memory belongs to the corresponding QEMU
> > process.
>
>
> What's the use case for the datagram vsock?
>
One use case is non-critical info logging from the guest to the host,
such as the performance data of some applications.

It can also be used to replace UDP communications between
the guest and the host.
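
To make this concrete, a minimal guest-side logger could look like the
sketch below. It is only an illustration against the standard AF_VSOCK
UAPI (nothing added by this series); the port number and payload are
made up, and the host is assumed to run a SOCK_DGRAM listener bound on
that port:

  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <linux/vm_sockets.h>

  int main(void)
  {
          /* The host is always reachable at the well-known CID 2 */
          struct sockaddr_vm addr = {
                  .svm_family = AF_VSOCK,
                  .svm_cid = VMADDR_CID_HOST,
                  .svm_port = 1234,    /* hypothetical collector port */
          };
          const char *msg = "app=foo rps=1234";
          int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);

          if (fd < 0) {
                  perror("socket");
                  return 1;
          }
          /* Fire and forget: no connection setup, loss is acceptable */
          if (sendto(fd, msg, strlen(msg), 0,
                     (struct sockaddr *)&addr, sizeof(addr)) < 0)
                  perror("sendto");
          return 0;
  }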

> >
> > The virtio spec patch is here:
> > https://www.spinics.net/lists/linux-virtualization/msg50027.html
>
>
> Have a quick glance, I suggest to split mergeable rx buffer into an
> separate patch.

Sure.

> But I think it's time to revisit the idea of unifying the virtio-net and
> virtio-vsock. Otherwise we're duplicating features and bugs.

For the mergeable rxbuf related code, I think a set of common helper
functions can be used by both virtio-net and virtio-vsock. For other
parts, that may not be very beneficial. I will think about it more.

If there is a previous email discussion about this topic, could you send me
some links? I did a quick web search but did not find any related
info. Thanks.

> Thanks
>
>
> >
> > For those who prefer git repo, here is the link for the linux kernel:
> > https://github.com/Jiang1155/linux/tree/vsock-dgram-v1
> >
> > qemu patch link:
> > https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1
> >
> >
> > To do:
> > 1. use skb when receiving packets
> > 2. support multiple transport
> > 3. support mergeable rx buffer
> >
> >
> > Jiang Wang (6):
> >    virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
> >    virtio/vsock: add support for virtio datagram
> >    vhost/vsock: add support for vhost dgram.
> >    vsock_test: add tests for vsock dgram
> >    vhost/vsock: add kconfig for vhost dgram support
> >    virtio/vsock: add sysfs for rx buf len for dgram
> >
> >   drivers/vhost/Kconfig                              |   8 +
> >   drivers/vhost/vsock.c                              | 207 ++++++++--
> >   include/linux/virtio_vsock.h                       |   9 +
> >   include/net/af_vsock.h                             |   1 +
> >   .../trace/events/vsock_virtio_transport_common.h   |   5 +-
> >   include/uapi/linux/virtio_vsock.h                  |   4 +
> >   net/vmw_vsock/af_vsock.c                           |  12 +
> >   net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---
> >   net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-
> >   tools/testing/vsock/util.c                         | 105 +++++
> >   tools/testing/vsock/util.h                         |   4 +
> >   tools/testing/vsock/vsock_test.c                   | 195 ++++++++++
> >   12 files changed, 1070 insertions(+), 97 deletions(-)
> >
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-10  3:43     ` Jiang Wang .
@ 2021-06-10  4:02       ` Jason Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2021-06-10  4:02 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: Stefano Garzarella, virtualization, Stefan Hajnoczi,
	Michael S. Tsirkin, Arseny Krasnov, jhansen, cong.wang,
	Xiongchun Duan, Yongji Xie, 柴稳,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel


On 2021/6/10 11:43 AM, Jiang Wang . wrote:
> On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/6/10 7:24 AM, Jiang Wang wrote:
>>> This patchset implements support of SOCK_DGRAM for virtio
>>> transport.
>>>
>>> Datagram sockets are connectionless and unreliable. To avoid unfair contention
>>> with stream and other sockets, add two more virtqueues and
>>> a new feature bit to indicate if those two new queues exist or not.
>>>
>>> Dgram does not use the existing credit update mechanism for
>>> stream sockets. When sending from the guest/driver, sending packets
>>> synchronously, so the sender will get an error when the virtqueue is full.
>>> When sending from the host/device, send packets asynchronously
>>> because the descriptor memory belongs to the corresponding QEMU
>>> process.
>>
>> What's the use case for the datagram vsock?
>>
> One use case is for non critical info logging from the guest
> to the host, such as the performance data of some applications.


Anything that prevents you from using the stream socket?


>
> It can also be used to replace UDP communications between
> the guest and the host.


Any advantage for VSOCK in this case? Is it for performance? (I guess
not, since I don't expect vsock to be faster.)

An obvious drawback is that it breaks migration. Using UDP you can
have very rich feature support from the kernel, which vsock can't.


>
>>> The virtio spec patch is here:
>>> https://www.spinics.net/lists/linux-virtualization/msg50027.html
>>
>> Have a quick glance, I suggest to split mergeable rx buffer into an
>> separate patch.
> Sure.
>
>> But I think it's time to revisit the idea of unifying the virtio-net and
>> virtio-vsock. Otherwise we're duplicating features and bugs.
> For mergeable rxbuf related code, I think a set of common helper
> functions can be used by both virtio-net and virtio-vsock. For other
> parts, that may not be very beneficial. I will think about more.
>
> If there is a previous email discussion about this topic, could you send me
> some links? I did a quick web search but did not find any related
> info. Thanks.


We had a lot:

[1] 
https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
[2] 
https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
[3] https://www.lkml.org/lkml/2020/1/16/2043

Thanks

>
>> Thanks
>>
>>
>>> For those who prefer git repo, here is the link for the linux kernel:
>>> https://github.com/Jiang1155/linux/tree/vsock-dgram-v1
>>>
>>> qemu patch link:
>>> https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1
>>>
>>>
>>> To do:
>>> 1. use skb when receiving packets
>>> 2. support multiple transport
>>> 3. support mergeable rx buffer
>>>
>>>
>>> Jiang Wang (6):
>>>     virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
>>>     virtio/vsock: add support for virtio datagram
>>>     vhost/vsock: add support for vhost dgram.
>>>     vsock_test: add tests for vsock dgram
>>>     vhost/vsock: add kconfig for vhost dgram support
>>>     virtio/vsock: add sysfs for rx buf len for dgram
>>>
>>>    drivers/vhost/Kconfig                              |   8 +
>>>    drivers/vhost/vsock.c                              | 207 ++++++++--
>>>    include/linux/virtio_vsock.h                       |   9 +
>>>    include/net/af_vsock.h                             |   1 +
>>>    .../trace/events/vsock_virtio_transport_common.h   |   5 +-
>>>    include/uapi/linux/virtio_vsock.h                  |   4 +
>>>    net/vmw_vsock/af_vsock.c                           |  12 +
>>>    net/vmw_vsock/virtio_transport.c                   | 433 ++++++++++++++++++---
>>>    net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++-
>>>    tools/testing/vsock/util.c                         | 105 +++++
>>>    tools/testing/vsock/util.h                         |   4 +
>>>    tools/testing/vsock/vsock_test.c                   | 195 ++++++++++
>>>    12 files changed, 1070 insertions(+), 97 deletions(-)
>>>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-10  4:02       ` Jason Wang
@ 2021-06-10  7:23         ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-10  7:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jiang Wang .,
	virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, jhansen, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel

On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
>
>On 2021/6/10 11:43 AM, Jiang Wang . wrote:
>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:
>>>>This patchset implements support of SOCK_DGRAM for virtio
>>>>transport.
>>>>
>>>>Datagram sockets are connectionless and unreliable. To avoid unfair contention
>>>>with stream and other sockets, add two more virtqueues and
>>>>a new feature bit to indicate if those two new queues exist or not.
>>>>
>>>>Dgram does not use the existing credit update mechanism for
>>>>stream sockets. When sending from the guest/driver, sending packets
>>>>synchronously, so the sender will get an error when the virtqueue is 
>>>>full.
>>>>When sending from the host/device, send packets asynchronously
>>>>because the descriptor memory belongs to the corresponding QEMU
>>>>process.
>>>
>>>What's the use case for the datagram vsock?
>>>
>>One use case is for non critical info logging from the guest
>>to the host, such as the performance data of some applications.
>
>
>Anything that prevents you from using the stream socket?
>
>
>>
>>It can also be used to replace UDP communications between
>>the guest and the host.
>
>
>Any advantage for VSOCK in this case? Is it for performance (I guess 
>not since I don't exepct vsock will be faster).

I think the general advantage of using vsock is for guest agents that 
potentially don't need any configuration.

>
>An obvious drawback is that it breaks the migration. Using UDP you can 
>have a very rich features support from the kernel where vsock can't.
>

Thanks for bringing this up!
What features does UDP support that datagram on vsock could not?

>
>>
>>>>The virtio spec patch is here:
>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html
>>>
>>>Have a quick glance, I suggest to split mergeable rx buffer into an
>>>separate patch.
>>Sure.
>>
>>>But I think it's time to revisit the idea of unifying the virtio-net 
>>>and
>>>virtio-vsock. Otherwise we're duplicating features and bugs.
>>For mergeable rxbuf related code, I think a set of common helper
>>functions can be used by both virtio-net and virtio-vsock. For other
>>parts, that may not be very beneficial. I will think about more.
>>
>>If there is a previous email discussion about this topic, could you 
>>send me
>>some links? I did a quick web search but did not find any related
>>info. Thanks.
>
>
>We had a lot:
>
>[1] 
>https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
>[2] 
>https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
>[3] https://www.lkml.org/lkml/2020/1/16/2043
>

When I tried it, the biggest problems that blocked me were all the 
features strictly related to the TCP/IP stack and ethernet devices that 
the vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 
napi, xdp, min ethernet frame size, MTU, etc.

So in my opinion unifying them is not so simple, because vsock is not 
really an ethernet device, but simply a socket.

But I fully agree that we shouldn't duplicate functionality and code, so 
maybe we could find those common parts and create helpers to be used by 
both.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-10  7:23         ` Stefano Garzarella
@ 2021-06-10  7:46           ` Jason Wang
  -1 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2021-06-10  7:46 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jiang Wang .,
	virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, jhansen, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel


On 2021/6/10 3:23 PM, Stefano Garzarella wrote:
> On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
>>
>> On 2021/6/10 11:43 AM, Jiang Wang . wrote:
>>> On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> On 2021/6/10 7:24 AM, Jiang Wang wrote:
>>>>> This patchset implements support of SOCK_DGRAM for virtio
>>>>> transport.
>>>>>
>>>>> Datagram sockets are connectionless and unreliable. To avoid 
>>>>> unfair contention
>>>>> with stream and other sockets, add two more virtqueues and
>>>>> a new feature bit to indicate if those two new queues exist or not.
>>>>>
>>>>> Dgram does not use the existing credit update mechanism for
>>>>> stream sockets. When sending from the guest/driver, sending packets
>>>>> synchronously, so the sender will get an error when the virtqueue 
>>>>> is full.
>>>>> When sending from the host/device, send packets asynchronously
>>>>> because the descriptor memory belongs to the corresponding QEMU
>>>>> process.
>>>>
>>>> What's the use case for the datagram vsock?
>>>>
>>> One use case is for non critical info logging from the guest
>>> to the host, such as the performance data of some applications.
>>
>>
>> Anything that prevents you from using the stream socket?
>>
>>
>>>
>>> It can also be used to replace UDP communications between
>>> the guest and the host.
>>
>>
>> Any advantage for VSOCK in this case? Is it for performance (I guess 
>> not since I don't exepct vsock will be faster).
>
> I think the general advantage to using vsock are for the guest agents 
> that potentially don't need any configuration.


Right, I wonder if we really need datagram, considering the host to
guest communication is reliable.

(Note that I don't object to it since vsock already supports that, I
just wonder about its use cases.)


>
>>
>> An obvious drawback is that it breaks the migration. Using UDP you 
>> can have a very rich features support from the kernel where vsock can't.
>>
>
> Thanks for bringing this up!
> What features does UDP support and datagram on vsock could not support?


E.g. sendpage() and busy polling. And using UDP means qdiscs and eBPF
can work.


>
>>
>>>
>>>>> The virtio spec patch is here:
>>>>> https://www.spinics.net/lists/linux-virtualization/msg50027.html
>>>>
>>>> Have a quick glance, I suggest to split mergeable rx buffer into an
>>>> separate patch.
>>> Sure.
>>>
>>>> But I think it's time to revisit the idea of unifying the 
>>>> virtio-net and
>>>> virtio-vsock. Otherwise we're duplicating features and bugs.
>>> For mergeable rxbuf related code, I think a set of common helper
>>> functions can be used by both virtio-net and virtio-vsock. For other
>>> parts, that may not be very beneficial. I will think about more.
>>>
>>> If there is a previous email discussion about this topic, could you 
>>> send me
>>> some links? I did a quick web search but did not find any related
>>> info. Thanks.
>>
>>
>> We had a lot:
>>
>> [1] 
>> https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
>> [2] 
>> https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
>> [3] https://www.lkml.org/lkml/2020/1/16/2043
>>
>
> When I tried it, the biggest problem that blocked me were all the 
> features strictly related to TCP/IP stack and ethernet devices that 
> vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 
> napi, xdp, min ethernet frame size, MTU, etc.


It depends on which level we want to share:

1) sharing code
2) sharing devices
3) making vsock a protocol that is understood by the network core

We can start from 1): the low level tx/rx logic can be shared by both
virtio-net and vhost-net. For 2) we probably need some work on the
spec, probably with a new feature bit to demonstrate that it's a vsock
device, not an ethernet device. Then if it is probed as a vsock device
we won't let packets be delivered to the TCP/IP stack. For 3), it would
be even harder and I'm not sure it's worth doing.


>
> So in my opinion to unify them is not so simple, because vsock is not 
> really an ethernet device, but simply a socket.


We can start by sharing code.


>
> But I fully agree that we shouldn't duplicate functionality and code, 
> so maybe we could find those common parts and create helpers to be 
> used by both.


Yes.

Thanks


>
> Thanks,
> Stefano
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
@ 2021-06-10  7:46           ` Jason Wang
  0 siblings, 0 replies; 59+ messages in thread
From: Jason Wang @ 2021-06-10  7:46 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Andra Paraschiv, kvm, Michael S. Tsirkin,
	virtualization, Norbert Slusarek, jhansen, Xiongchun Duan,
	Yongji Xie, Ingo Molnar, Jakub Kicinski, Alexander Popov,
	Steven Rostedt, 柴稳, Stefan Hajnoczi, Jiang Wang .,
	Networking, linux-kernel, Lu Wei, Colin Ian King, Arseny Krasnov,
	David S. Miller, Jorgen Hansen


在 2021/6/10 下午3:23, Stefano Garzarella 写道:
> On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
>>
>> 在 2021/6/10 上午11:43, Jiang Wang . 写道:
>>> On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> 在 2021/6/10 上午7:24, Jiang Wang 写道:
>>>>> This patchset implements support of SOCK_DGRAM for virtio
>>>>> transport.
>>>>>
>>>>> Datagram sockets are connectionless and unreliable. To avoid 
>>>>> unfair contention
>>>>> with stream and other sockets, add two more virtqueues and
>>>>> a new feature bit to indicate if those two new queues exist or not.
>>>>>
>>>>> Dgram does not use the existing credit update mechanism for
>>>>> stream sockets. When sending from the guest/driver, sending packets
>>>>> synchronously, so the sender will get an error when the virtqueue 
>>>>> is full.
>>>>> When sending from the host/device, send packets asynchronously
>>>>> because the descriptor memory belongs to the corresponding QEMU
>>>>> process.
>>>>
>>>> What's the use case for the datagram vsock?
>>>>
>>> One use case is for non critical info logging from the guest
>>> to the host, such as the performance data of some applications.
>>
>>
>> Anything that prevents you from using the stream socket?
>>
>>
>>>
>>> It can also be used to replace UDP communications between
>>> the guest and the host.
>>
>>
>> Any advantage for VSOCK in this case? Is it for performance (I guess 
>> not since I don't exepct vsock will be faster).
>
> I think the general advantage to using vsock are for the guest agents 
> that potentially don't need any configuration.


Right, I wonder if we really need datagrams, considering the host-to-guest 
communication is reliable.

(Note that I don't object to it, since vsock already supports that; I 
just wonder about its use cases.)


>
>>
>> An obvious drawback is that it breaks the migration. Using UDP you 
>> can have a very rich features support from the kernel where vsock can't.
>>
>
> Thanks for bringing this up!
> What features does UDP support and datagram on vsock could not support?


E.g. sendpage() and busy polling. And using UDP means qdiscs and eBPF 
can work.


>
>>
>>>
>>>>> The virtio spec patch is here:
>>>>> https://www.spinics.net/lists/linux-virtualization/msg50027.html
>>>>
>>>> Have a quick glance, I suggest to split mergeable rx buffer into an
>>>> separate patch.
>>> Sure.
>>>
>>>> But I think it's time to revisit the idea of unifying the 
>>>> virtio-net and
>>>> virtio-vsock. Otherwise we're duplicating features and bugs.
>>> For mergeable rxbuf related code, I think a set of common helper
>>> functions can be used by both virtio-net and virtio-vsock. For other
>>> parts, that may not be very beneficial. I will think about more.
>>>
>>> If there is a previous email discussion about this topic, could you 
>>> send me
>>> some links? I did a quick web search but did not find any related
>>> info. Thanks.
>>
>>
>> We had a lot:
>>
>> [1] 
>> https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
>> [2] 
>> https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
>> [3] https://www.lkml.org/lkml/2020/1/16/2043
>>
>
> When I tried it, the biggest problem that blocked me were all the 
> features strictly related to TCP/IP stack and ethernet devices that 
> vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 
> napi, xdp, min ethernet frame size, MTU, etc.


It depends on which level we want to share:

1) sharing code
2) sharing devices
3) make vsock a protocol that is understood by the network core

We can start from 1): the low-level tx/rx logic can be shared at both the
virtio-net and vhost-net levels. For 2) we probably need some work on the
spec, likely a new feature bit to indicate that it's a vsock device rather
than an ethernet device; then, if it is probed as a vsock device, we won't
let packets be delivered to the TCP/IP stack. For 3), it would be even
harder and I'm not sure it's worth doing.


>
> So in my opinion to unify them is not so simple, because vsock is not 
> really an ethernet device, but simply a socket.


We can start by sharing code.


>
> But I fully agree that we shouldn't duplicate functionality and code, 
> so maybe we could find those common parts and create helpers to be 
> used by both.


Yes.

Thanks


>
> Thanks,
> Stefano
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-10  7:46           ` Jason Wang
@ 2021-06-10  9:51             ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-10  9:51 UTC (permalink / raw)
  To: Jason Wang, Jiang Wang .
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, jhansen, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel

On Thu, Jun 10, 2021 at 03:46:55PM +0800, Jason Wang wrote:
>
>On 2021/6/10 3:23 PM, Stefano Garzarella wrote:
>>On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
>>>
>>>On 2021/6/10 11:43 AM, Jiang Wang . wrote:
>>>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>
>>>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:
>>>>>>This patchset implements support of SOCK_DGRAM for virtio
>>>>>>transport.
>>>>>>
>>>>>>Datagram sockets are connectionless and unreliable. To avoid 
>>>>>>unfair contention
>>>>>>with stream and other sockets, add two more virtqueues and
>>>>>>a new feature bit to indicate if those two new queues exist or not.
>>>>>>
>>>>>>Dgram does not use the existing credit update mechanism for
>>>>>>stream sockets. When sending from the guest/driver, sending packets
>>>>>>synchronously, so the sender will get an error when the 
>>>>>>virtqueue is full.
>>>>>>When sending from the host/device, send packets asynchronously
>>>>>>because the descriptor memory belongs to the corresponding QEMU
>>>>>>process.
>>>>>
>>>>>What's the use case for the datagram vsock?
>>>>>
>>>>One use case is for non critical info logging from the guest
>>>>to the host, such as the performance data of some applications.
>>>
>>>
>>>Anything that prevents you from using the stream socket?
>>>
>>>
>>>>
>>>>It can also be used to replace UDP communications between
>>>>the guest and the host.
>>>
>>>
>>>Any advantage for VSOCK in this case? Is it for performance (I 
>>>guess not since I don't exepct vsock will be faster).
>>
>>I think the general advantage to using vsock are for the guest 
>>agents that potentially don't need any configuration.
>
>
>Right, I wonder if we really need datagram consider the host to guest 
>communication is reliable.
>
>(Note that I don't object it since vsock has already supported that, 
>just wonder its use cases)

Yep, it was the same concern I had :-)
Also because we're now adding SEQPACKET, which provides reliable 
datagram support.

But IIUC the use case is logging, where you don't need reliable 
communication and you want to avoid keeping more open connections with 
different guests.

So the server in the host can be pretty simple and doesn't have to 
handle connections. It just waits for datagrams on a port.
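
Just to give an idea, a minimal host-side receiver could look roughly 
like this (untested sketch, using the AF_VSOCK datagram API; the port 
number is only an example):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
    int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid = VMADDR_CID_ANY,
        .svm_port = 1234,       /* example port */
    };
    char buf[4096];

    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return 1;

    /* No listen()/accept(): just wait for datagrams from any guest. */
    for (;;) {
        struct sockaddr_vm peer;
        socklen_t peer_len = sizeof(peer);
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&peer, &peer_len);

        if (n < 0)
            break;
        printf("%zd bytes from cid %u\n", n, peer.svm_cid);
    }

    return 0;
}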

>
>
>>
>>>
>>>An obvious drawback is that it breaks the migration. Using UDP you 
>>>can have a very rich features support from the kernel where vsock 
>>>can't.
>>>
>>
>>Thanks for bringing this up!
>>What features does UDP support and datagram on vsock could not support?
>
>
>E.g the sendpage() and busy polling. And using UDP means qdiscs and 
>eBPF can work.

Thanks, I see!

>
>
>>
>>>
>>>>
>>>>>>The virtio spec patch is here:
>>>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html
>>>>>
>>>>>Have a quick glance, I suggest to split mergeable rx buffer into an
>>>>>separate patch.
>>>>Sure.
>>>>
>>>>>But I think it's time to revisit the idea of unifying the 
>>>>>virtio-net and
>>>>>virtio-vsock. Otherwise we're duplicating features and bugs.
>>>>For mergeable rxbuf related code, I think a set of common helper
>>>>functions can be used by both virtio-net and virtio-vsock. For other
>>>>parts, that may not be very beneficial. I will think about more.
>>>>
>>>>If there is a previous email discussion about this topic, could 
>>>>you send me
>>>>some links? I did a quick web search but did not find any related
>>>>info. Thanks.
>>>
>>>
>>>We had a lot:
>>>
>>>[1] https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
>>>[2] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
>>>[3] https://www.lkml.org/lkml/2020/1/16/2043
>>>
>>
>>When I tried it, the biggest problem that blocked me were all the 
>>features strictly related to TCP/IP stack and ethernet devices that 
>>vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 
>>napi, xdp, min ethernet frame size, MTU, etc.
>
>
>It depends on which level we want to share:
>
>1) sharing codes
>2) sharing devices
>3) make vsock a protocol that is understood by the network core
>
>We can start from 1), the low level tx/rx logic can be shared at both 
>virtio-net and vhost-net. For 2) we probably need some work on the 
>spec, probably with a new feature bit to demonstrate that it's a vsock 
>device not a ethernet device. Then if it is probed as a vsock device we 
>won't let packet to be delivered in the TCP/IP stack. For 3), it would 
>be even harder and I'm not sure it's worth to do that.
>
>
>>
>>So in my opinion to unify them is not so simple, because vsock is not 
>>really an ethernet device, but simply a socket.
>
>
>We can start from sharing codes.

Yep, I agree, and maybe the mergeable buffer is a good starting point to 
share code!

@Jiang, do you want to take a look at this possibility?

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
@ 2021-06-10  9:51             ` Stefano Garzarella
  0 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-10  9:51 UTC (permalink / raw)
  To: Jason Wang, Jiang Wang .
  Cc: cong.wang, Andra Paraschiv, kvm, Michael S. Tsirkin,
	virtualization, Norbert Slusarek, jhansen, Xiongchun Duan,
	Yongji Xie, Ingo Molnar, Jakub Kicinski, Alexander Popov,
	Steven Rostedt, 柴稳,
	Stefan Hajnoczi, Networking, linux-kernel, Lu Wei,
	Colin Ian King, Arseny Krasnov, David S. Miller, Jorgen Hansen

On Thu, Jun 10, 2021 at 03:46:55PM +0800, Jason Wang wrote:
>
>On 2021/6/10 3:23 PM, Stefano Garzarella wrote:
>>On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
>>>
>>>On 2021/6/10 11:43 AM, Jiang Wang . wrote:
>>>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>
>>>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:
>>>>>>This patchset implements support of SOCK_DGRAM for virtio
>>>>>>transport.
>>>>>>
>>>>>>Datagram sockets are connectionless and unreliable. To avoid 
>>>>>>unfair contention
>>>>>>with stream and other sockets, add two more virtqueues and
>>>>>>a new feature bit to indicate if those two new queues exist or not.
>>>>>>
>>>>>>Dgram does not use the existing credit update mechanism for
>>>>>>stream sockets. When sending from the guest/driver, sending packets
>>>>>>synchronously, so the sender will get an error when the 
>>>>>>virtqueue is full.
>>>>>>When sending from the host/device, send packets asynchronously
>>>>>>because the descriptor memory belongs to the corresponding QEMU
>>>>>>process.
>>>>>
>>>>>What's the use case for the datagram vsock?
>>>>>
>>>>One use case is for non critical info logging from the guest
>>>>to the host, such as the performance data of some applications.
>>>
>>>
>>>Anything that prevents you from using the stream socket?
>>>
>>>
>>>>
>>>>It can also be used to replace UDP communications between
>>>>the guest and the host.
>>>
>>>
>>>Any advantage for VSOCK in this case? Is it for performance (I 
>>>guess not since I don't exepct vsock will be faster).
>>
>>I think the general advantage to using vsock are for the guest 
>>agents that potentially don't need any configuration.
>
>
>Right, I wonder if we really need datagram consider the host to guest 
>communication is reliable.
>
>(Note that I don't object it since vsock has already supported that, 
>just wonder its use cases)

Yep, it was the same concern I had :-)
Also because we're now adding SEQPACKET, which provides reliable 
datagram support.

But IIUC the use case is logging, where you don't need reliable 
communication and you want to avoid keeping more open connections with 
different guests.

So the server in the host can be pretty simple and doesn't have to 
handle connections. It just waits for datagrams on a port.

>
>
>>
>>>
>>>An obvious drawback is that it breaks the migration. Using UDP you 
>>>can have a very rich features support from the kernel where vsock 
>>>can't.
>>>
>>
>>Thanks for bringing this up!
>>What features does UDP support and datagram on vsock could not support?
>
>
>E.g the sendpage() and busy polling. And using UDP means qdiscs and 
>eBPF can work.

Thanks, I see!

>
>
>>
>>>
>>>>
>>>>>>The virtio spec patch is here:
>>>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html
>>>>>
>>>>>Have a quick glance, I suggest to split mergeable rx buffer into an
>>>>>separate patch.
>>>>Sure.
>>>>
>>>>>But I think it's time to revisit the idea of unifying the 
>>>>>virtio-net and
>>>>>virtio-vsock. Otherwise we're duplicating features and bugs.
>>>>For mergeable rxbuf related code, I think a set of common helper
>>>>functions can be used by both virtio-net and virtio-vsock. For other
>>>>parts, that may not be very beneficial. I will think about more.
>>>>
>>>>If there is a previous email discussion about this topic, could 
>>>>you send me
>>>>some links? I did a quick web search but did not find any related
>>>>info. Thanks.
>>>
>>>
>>>We had a lot:
>>>
>>>[1] https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
>>>[2] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
>>>[3] https://www.lkml.org/lkml/2020/1/16/2043
>>>
>>
>>When I tried it, the biggest problem that blocked me were all the 
>>features strictly related to TCP/IP stack and ethernet devices that 
>>vsock device doesn't know how to handle: TSO, GSO, checksums, MAC, 
>>napi, xdp, min ethernet frame size, MTU, etc.
>
>
>It depends on which level we want to share:
>
>1) sharing codes
>2) sharing devices
>3) make vsock a protocol that is understood by the network core
>
>We can start from 1), the low level tx/rx logic can be shared at both 
>virtio-net and vhost-net. For 2) we probably need some work on the 
>spec, probably with a new feature bit to demonstrate that it's a vsock 
>device not a ethernet device. Then if it is probed as a vsock device we 
>won't let packet to be delivered in the TCP/IP stack. For 3), it would 
>be even harder and I'm not sure it's worth to do that.
>
>
>>
>>So in my opinion to unify them is not so simple, because vsock is not 
>>really an ethernet device, but simply a socket.
>
>
>We can start from sharing codes.

Yep, I agree, and maybe the mergeable buffer is a good starting point to 
share code!

@Jiang, do you want to take a look at this possibility?

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-10  9:51             ` Stefano Garzarella
@ 2021-06-10 16:44               ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-10 16:44 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Jason Wang, virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel

On Thu, Jun 10, 2021 at 2:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, Jun 10, 2021 at 03:46:55PM +0800, Jason Wang wrote:
> >
> >On 2021/6/10 3:23 PM, Stefano Garzarella wrote:
> >>On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
> >>>
> >>>On 2021/6/10 11:43 AM, Jiang Wang . wrote:
> >>>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>
> >>>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:
> >>>>>>This patchset implements support of SOCK_DGRAM for virtio
> >>>>>>transport.
> >>>>>>
> >>>>>>Datagram sockets are connectionless and unreliable. To avoid
> >>>>>>unfair contention
> >>>>>>with stream and other sockets, add two more virtqueues and
> >>>>>>a new feature bit to indicate if those two new queues exist or not.
> >>>>>>
> >>>>>>Dgram does not use the existing credit update mechanism for
> >>>>>>stream sockets. When sending from the guest/driver, sending packets
> >>>>>>synchronously, so the sender will get an error when the
> >>>>>>virtqueue is full.
> >>>>>>When sending from the host/device, send packets asynchronously
> >>>>>>because the descriptor memory belongs to the corresponding QEMU
> >>>>>>process.
> >>>>>
> >>>>>What's the use case for the datagram vsock?
> >>>>>
> >>>>One use case is for non critical info logging from the guest
> >>>>to the host, such as the performance data of some applications.
> >>>
> >>>
> >>>Anything that prevents you from using the stream socket?
> >>>
> >>>
> >>>>
> >>>>It can also be used to replace UDP communications between
> >>>>the guest and the host.
> >>>
> >>>
> >>>Any advantage for VSOCK in this case? Is it for performance (I
> >>>guess not since I don't exepct vsock will be faster).
> >>
> >>I think the general advantage to using vsock are for the guest
> >>agents that potentially don't need any configuration.
> >
> >
> >Right, I wonder if we really need datagram consider the host to guest
> >communication is reliable.
> >
> >(Note that I don't object it since vsock has already supported that,
> >just wonder its use cases)
>
> Yep, it was the same concern I had :-)
> Also because we're now adding SEQPACKET, which provides reliable
> datagram support.
>
> But IIUC the use case is the logging where you don't need a reliable
> communication and you want to avoid to keep more open connections with
> different guests.
>
> So the server in the host can be pretty simple and doesn't have to
> handle connections. It just waits for datagrams on a port.

Yes. With datagram sockets, the application code is simpler than with
stream sockets. It will also be easier to port existing applications
written for datagrams, such as UDP or UNIX domain sockets of datagram
type, to vsock dgram sockets.

Compared to UDP, vsock dgram requires minimal configuration. When sending
data from the guest to the host, the client in the guest knows the host
CID will always be 2; for UDP, the host IP may change depending
on the configuration.
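
For example, a guest-side sender could be as small as this (untested
sketch; the port number and function name are only examples):

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int send_log(const char *msg)
{
    int fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
    struct sockaddr_vm host = {
        .svm_family = AF_VSOCK,
        .svm_cid = VMADDR_CID_HOST,     /* always 2, no discovery needed */
        .svm_port = 1234,               /* example port */
    };
    ssize_t ret;

    if (fd < 0)
        return -1;

    ret = sendto(fd, msg, strlen(msg), 0,
                 (struct sockaddr *)&host, sizeof(host));
    close(fd);

    return ret < 0 ? -1 : 0;
}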

The advantage over UNIX domain sockets is more obvious. We have some
applications talking to each other over UNIX domain sockets, but now
those applications run inside VMs, so we will need to use vsock. Since
they use the datagram type, it is natural and simpler if vsock offers a
datagram type too.

We can also run applications written for VMware's vsock dgram directly
on QEMU.

Btw, SEQPACKET also provides datagrams, but the application logic is
similar to stream sockets and the server needs to maintain
connections.

> >
> >
> >>
> >>>
> >>>An obvious drawback is that it breaks the migration. Using UDP you
> >>>can have a very rich features support from the kernel where vsock
> >>>can't.
> >>>
> >>
> >>Thanks for bringing this up!
> >>What features does UDP support and datagram on vsock could not support?
> >
> >
> >E.g the sendpage() and busy polling. And using UDP means qdiscs and
> >eBPF can work.
>
> Thanks, I see!
>
> >
> >
> >>
> >>>
> >>>>
> >>>>>>The virtio spec patch is here:
> >>>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html
> >>>>>
> >>>>>Have a quick glance, I suggest to split mergeable rx buffer into an
> >>>>>separate patch.
> >>>>Sure.
> >>>>
> >>>>>But I think it's time to revisit the idea of unifying the
> >>>>>virtio-net and
> >>>>>virtio-vsock. Otherwise we're duplicating features and bugs.
> >>>>For mergeable rxbuf related code, I think a set of common helper
> >>>>functions can be used by both virtio-net and virtio-vsock. For other
> >>>>parts, that may not be very beneficial. I will think about more.
> >>>>
> >>>>If there is a previous email discussion about this topic, could
> >>>>you send me
> >>>>some links? I did a quick web search but did not find any related
> >>>>info. Thanks.
> >>>
> >>>
> >>>We had a lot:
> >>>
> >>>[1] https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
> >>>[2] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
> >>>[3] https://www.lkml.org/lkml/2020/1/16/2043
> >>>
Got it. I will check, thanks.

> >>When I tried it, the biggest problem that blocked me were all the
> >>features strictly related to TCP/IP stack and ethernet devices that
> >>vsock device doesn't know how to handle: TSO, GSO, checksums, MAC,
> >>napi, xdp, min ethernet frame size, MTU, etc.
> >
> >
> >It depends on which level we want to share:
> >
> >1) sharing codes
> >2) sharing devices
> >3) make vsock a protocol that is understood by the network core
> >
> >We can start from 1), the low level tx/rx logic can be shared at both
> >virtio-net and vhost-net. For 2) we probably need some work on the
> >spec, probably with a new feature bit to demonstrate that it's a vsock
> >device not a ethernet device. Then if it is probed as a vsock device we
> >won't let packet to be delivered in the TCP/IP stack. For 3), it would
> >be even harder and I'm not sure it's worth to do that.
> >
> >
> >>
> >>So in my opinion to unify them is not so simple, because vsock is not
> >>really an ethernet device, but simply a socket.
> >
> >
> >We can start from sharing codes.
>
> Yep, I agree, and maybe the mergeable buffer is a good starting point to
> share code!
>
> @Jiang, do you want to take a look of this possibility?

Yes. I have already read the mergeable buffer code in virtio-net, which I
think is the only place that uses it so far. I will check how to share the code.

Thanks for all the comments.

> Thanks,
> Stefano
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
@ 2021-06-10 16:44               ` Jiang Wang .
  0 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-10 16:44 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Andra Paraschiv, kvm, Michael S. Tsirkin,
	virtualization, Norbert Slusarek, Xiongchun Duan, Yongji Xie,
	Ingo Molnar, Jakub Kicinski, Alexander Popov, Steven Rostedt,
	柴稳,
	Stefan Hajnoczi, Networking, linux-kernel, Lu Wei,
	Colin Ian King, Arseny Krasnov, David S. Miller, Jorgen Hansen

On Thu, Jun 10, 2021 at 2:52 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Thu, Jun 10, 2021 at 03:46:55PM +0800, Jason Wang wrote:
> >
> >On 2021/6/10 3:23 PM, Stefano Garzarella wrote:
> >>On Thu, Jun 10, 2021 at 12:02:35PM +0800, Jason Wang wrote:
> >>>
> >>>On 2021/6/10 11:43 AM, Jiang Wang . wrote:
> >>>>On Wed, Jun 9, 2021 at 6:51 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>
> >>>>>On 2021/6/10 7:24 AM, Jiang Wang wrote:
> >>>>>>This patchset implements support of SOCK_DGRAM for virtio
> >>>>>>transport.
> >>>>>>
> >>>>>>Datagram sockets are connectionless and unreliable. To avoid
> >>>>>>unfair contention
> >>>>>>with stream and other sockets, add two more virtqueues and
> >>>>>>a new feature bit to indicate if those two new queues exist or not.
> >>>>>>
> >>>>>>Dgram does not use the existing credit update mechanism for
> >>>>>>stream sockets. When sending from the guest/driver, sending packets
> >>>>>>synchronously, so the sender will get an error when the
> >>>>>>virtqueue is full.
> >>>>>>When sending from the host/device, send packets asynchronously
> >>>>>>because the descriptor memory belongs to the corresponding QEMU
> >>>>>>process.
> >>>>>
> >>>>>What's the use case for the datagram vsock?
> >>>>>
> >>>>One use case is for non critical info logging from the guest
> >>>>to the host, such as the performance data of some applications.
> >>>
> >>>
> >>>Anything that prevents you from using the stream socket?
> >>>
> >>>
> >>>>
> >>>>It can also be used to replace UDP communications between
> >>>>the guest and the host.
> >>>
> >>>
> >>>Any advantage for VSOCK in this case? Is it for performance (I
> >>>guess not since I don't exepct vsock will be faster).
> >>
> >>I think the general advantage to using vsock are for the guest
> >>agents that potentially don't need any configuration.
> >
> >
> >Right, I wonder if we really need datagram consider the host to guest
> >communication is reliable.
> >
> >(Note that I don't object it since vsock has already supported that,
> >just wonder its use cases)
>
> Yep, it was the same concern I had :-)
> Also because we're now adding SEQPACKET, which provides reliable
> datagram support.
>
> But IIUC the use case is the logging where you don't need a reliable
> communication and you want to avoid to keep more open connections with
> different guests.
>
> So the server in the host can be pretty simple and doesn't have to
> handle connections. It just waits for datagrams on a port.

Yes. With datagram sockets, the application code is simpler than with
stream sockets. It will also be easier to port existing applications
written for datagrams, such as UDP or UNIX domain sockets of datagram
type, to vsock dgram sockets.

Compared to UDP, vsock dgram requires minimal configuration. When sending
data from the guest to the host, the client in the guest knows the host
CID will always be 2; for UDP, the host IP may change depending
on the configuration.

The advantage over UNIX domain sockets is more obvious. We have some
applications talking to each other over UNIX domain sockets, but now
those applications run inside VMs, so we will need to use vsock. Since
they use the datagram type, it is natural and simpler if vsock offers a
datagram type too.

We can also run applications written for VMware's vsock dgram directly
on QEMU.

Btw, SEQPACKET also provides datagrams, but the application logic is
similar to stream sockets and the server needs to maintain
connections.

> >
> >
> >>
> >>>
> >>>An obvious drawback is that it breaks the migration. Using UDP you
> >>>can have a very rich features support from the kernel where vsock
> >>>can't.
> >>>
> >>
> >>Thanks for bringing this up!
> >>What features does UDP support and datagram on vsock could not support?
> >
> >
> >E.g the sendpage() and busy polling. And using UDP means qdiscs and
> >eBPF can work.
>
> Thanks, I see!
>
> >
> >
> >>
> >>>
> >>>>
> >>>>>>The virtio spec patch is here:
> >>>>>>https://www.spinics.net/lists/linux-virtualization/msg50027.html
> >>>>>
> >>>>>Have a quick glance, I suggest to split mergeable rx buffer into an
> >>>>>separate patch.
> >>>>Sure.
> >>>>
> >>>>>But I think it's time to revisit the idea of unifying the
> >>>>>virtio-net and
> >>>>>virtio-vsock. Otherwise we're duplicating features and bugs.
> >>>>For mergeable rxbuf related code, I think a set of common helper
> >>>>functions can be used by both virtio-net and virtio-vsock. For other
> >>>>parts, that may not be very beneficial. I will think about more.
> >>>>
> >>>>If there is a previous email discussion about this topic, could
> >>>>you send me
> >>>>some links? I did a quick web search but did not find any related
> >>>>info. Thanks.
> >>>
> >>>
> >>>We had a lot:
> >>>
> >>>[1] https://patchwork.kernel.org/project/kvm/patch/5BDFF537.3050806@huawei.com/
> >>>[2] https://lists.linuxfoundation.org/pipermail/virtualization/2018-November/039798.html
> >>>[3] https://www.lkml.org/lkml/2020/1/16/2043
> >>>
Got it. I will check, thanks.

> >>When I tried it, the biggest problem that blocked me were all the
> >>features strictly related to TCP/IP stack and ethernet devices that
> >>vsock device doesn't know how to handle: TSO, GSO, checksums, MAC,
> >>napi, xdp, min ethernet frame size, MTU, etc.
> >
> >
> >It depends on which level we want to share:
> >
> >1) sharing codes
> >2) sharing devices
> >3) make vsock a protocol that is understood by the network core
> >
> >We can start from 1), the low level tx/rx logic can be shared at both
> >virtio-net and vhost-net. For 2) we probably need some work on the
> >spec, probably with a new feature bit to demonstrate that it's a vsock
> >device not a ethernet device. Then if it is probed as a vsock device we
> >won't let packet to be delivered in the TCP/IP stack. For 3), it would
> >be even harder and I'm not sure it's worth to do that.
> >
> >
> >>
> >>So in my opinion to unify them is not so simple, because vsock is not
> >>really an ethernet device, but simply a socket.
> >
> >
> >We can start from sharing codes.
>
> Yep, I agree, and maybe the mergeable buffer is a good starting point to
> share code!
>
> @Jiang, do you want to take a look of this possibility?

Yes. I have already read the mergeable buffer code in virtio-net, which I
think is the only place that uses it so far. I will check how to share the code.

Thanks for all the comments.

> Thanks,
> Stefano
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
  2021-06-09 23:24   ` Jiang Wang
  (?)
@ 2021-06-16  9:06   ` kernel test robot
  -1 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2021-06-16  9:06 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2774 bytes --]

Hi Jiang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on vhost/linux-next]
[also build test WARNING on tip/perf/core linus/master v5.13-rc6]
[cannot apply to next-20210615]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: nios2-allyesconfig (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b156a0ad587c43dbfc98397f01b34fad15054bf0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
        git checkout b156a0ad587c43dbfc98397f01b34fad15054bf0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   net/vmw_vsock/virtio_transport.c: In function 'virtio_transport_dgram_tx_work':
>> net/vmw_vsock/virtio_transport.c:450:7: warning: variable 'added' set but not used [-Wunused-but-set-variable]
     450 |  bool added = false;
         |       ^~~~~


vim +/added +450 net/vmw_vsock/virtio_transport.c

   444	
   445	static void virtio_transport_dgram_tx_work(struct work_struct *work)
   446	{
   447		struct virtio_vsock *vsock =
   448			container_of(work, struct virtio_vsock, dgram_tx_work);
   449		struct virtqueue *vq;
 > 450		bool added = false;
   451	
   452		vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
   453		mutex_lock(&vsock->dgram_tx_lock);
   454	
   455		if (!vsock->dgram_tx_run)
   456			goto out;
   457	
   458		do {
   459			struct virtio_vsock_pkt *pkt;
   460			unsigned int len;
   461	
   462			virtqueue_disable_cb(vq);
   463			while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
   464				virtio_transport_free_pkt(pkt);
   465				added = true;
   466			}
   467		} while (!virtqueue_enable_cb(vq));
   468	
   469	out:
   470		mutex_unlock(&vsock->dgram_tx_lock);
   471	}
   472	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 58967 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
  2021-06-09 23:24   ` Jiang Wang
  (?)
  (?)
@ 2021-06-16  9:17   ` kernel test robot
  -1 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2021-06-16  9:17 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3126 bytes --]

Hi Jiang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on vhost/linux-next]
[also build test WARNING on tip/perf/core linus/master v5.13-rc6]
[cannot apply to next-20210615]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-s021-20210615 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/b156a0ad587c43dbfc98397f01b34fad15054bf0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
        git checkout b156a0ad587c43dbfc98397f01b34fad15054bf0
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> net/vmw_vsock/virtio_transport.c:283:21: sparse: sparse: restricted __le16 degrades to integer
--
>> net/vmw_vsock/virtio_transport_common.c:1055:33: sparse: sparse: restricted __le16 degrades to integer
   net/vmw_vsock/virtio_transport_common.c:1079:13: sparse: sparse: restricted __le16 degrades to integer
>> net/vmw_vsock/virtio_transport_common.c:1079:13: sparse: sparse: cast to restricted __le16

vim +283 net/vmw_vsock/virtio_transport.c

   276	
   277	static int
   278	virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
   279	{
   280		struct virtio_vsock *vsock;
   281		int len = pkt->len;
   282	
 > 283		if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
   284			return virtio_transport_send_dgram_pkt(pkt);
   285	
   286		rcu_read_lock();
   287		vsock = rcu_dereference(the_virtio_vsock);
   288		if (!vsock) {
   289			virtio_transport_free_pkt(pkt);
   290			len = -ENODEV;
   291			goto out_rcu;
   292		}
   293	
   294		if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
   295			virtio_transport_free_pkt(pkt);
   296			len = -ENODEV;
   297			goto out_rcu;
   298		}
   299	
   300		if (pkt->reply)
   301			atomic_inc(&vsock->queued_replies);
   302	
   303		spin_lock_bh(&vsock->send_pkt_list_lock);
   304		list_add_tail(&pkt->list, &vsock->send_pkt_list);
   305		spin_unlock_bh(&vsock->send_pkt_list_lock);
   306	
   307		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
   308	
   309	out_rcu:
   310		rcu_read_unlock();
   311		return len;
   312	}
   313	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37189 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
  2021-06-09 23:24   ` Jiang Wang
                     ` (2 preceding siblings ...)
  (?)
@ 2021-06-16 11:18   ` kernel test robot
  -1 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2021-06-16 11:18 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3861 bytes --]

Hi Jiang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on vhost/linux-next]
[also build test WARNING on tip/perf/core linus/master v5.13-rc6]
[cannot apply to next-20210615]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-a015-20210615 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/b156a0ad587c43dbfc98397f01b34fad15054bf0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
        git checkout b156a0ad587c43dbfc98397f01b34fad15054bf0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/vmw_vsock/virtio_transport_common.c:416:6: warning: variable 'free_space' set but not used [-Wunused-but-set-variable]
           u32 free_space;
               ^
   1 warning generated.


vim +/free_space +416 net/vmw_vsock/virtio_transport_common.c

   408	
   409	static ssize_t
   410	virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
   411							   struct msghdr *msg, size_t len)
   412	{
   413		struct virtio_vsock_sock *vvs = vsk->trans;
   414		struct virtio_vsock_pkt *pkt;
   415		size_t total = 0;
 > 416		u32 free_space;
   417		int err = -EFAULT;
   418	
   419		spin_lock_bh(&vvs->rx_lock);
   420		if (total < len && !list_empty(&vvs->rx_queue)) {
   421			pkt = list_first_entry(&vvs->rx_queue,
   422					       struct virtio_vsock_pkt, list);
   423	
   424			total = len;
   425			if (total > pkt->len - pkt->off)
   426				total = pkt->len - pkt->off;
   427			else if (total < pkt->len - pkt->off)
   428				msg->msg_flags |= MSG_TRUNC;
   429	
   430			/* sk_lock is held by caller so no one else can dequeue.
   431			 * Unlock rx_lock since memcpy_to_msg() may sleep.
   432			 */
   433			spin_unlock_bh(&vvs->rx_lock);
   434	
   435			err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
   436			if (err)
   437				return err;
   438	
   439			spin_lock_bh(&vvs->rx_lock);
   440	
   441			virtio_transport_dec_rx_pkt(vvs, pkt);
   442			list_del(&pkt->list);
   443			virtio_transport_free_pkt(pkt);
   444		}
   445	
   446		free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
   447	
   448		spin_unlock_bh(&vvs->rx_lock);
   449	
   450		if (total > 0 && msg->msg_name) {
   451			/* Provide the address of the sender. */
   452			DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
   453	
   454			vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
   455							le32_to_cpu(pkt->hdr.src_port));
   456			msg->msg_namelen = sizeof(*vm_addr);
   457		}
   458		return total;
   459	}
   460	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 40350 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 3/6] vhost/vsock: add support for vhost dgram.
  2021-06-09 23:24   ` Jiang Wang
  (?)
@ 2021-06-16 12:33   ` kernel test robot
  -1 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2021-06-16 12:33 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 8765 bytes --]

Hi Jiang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on vhost/linux-next]
[also build test WARNING on tip/perf/core linus/master v5.13-rc6]
[cannot apply to next-20210615]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: arm64-randconfig-s031-20210615 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/0d43b802cb4112ba50c616916364ada91c24a7bb
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
        git checkout 0d43b802cb4112ba50c616916364ada91c24a7bb
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/vhost/vsock.c:150:29: sparse: sparse: restricted __le16 degrades to integer
   drivers/vhost/vsock.c:345:21: sparse: sparse: restricted __le16 degrades to integer
   drivers/vhost/vsock.c:349:28: sparse: sparse: restricted __le16 degrades to integer
   drivers/vhost/vsock.c:364:21: sparse: sparse: restricted __le16 degrades to integer

vim +150 drivers/vhost/vsock.c

    96	
    97	static void
    98	vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
    99				    struct vhost_virtqueue *vq)
   100	{
   101		struct vhost_virtqueue *tx_vq;
   102		int pkts = 0, total_len = 0;
   103		bool added = false;
   104		bool restart_tx = false;
   105		spinlock_t *lock;
   106		struct list_head *send_pkt_list;
   107	
   108		if (vq == &vsock->vqs[VSOCK_VQ_RX]) {
   109			tx_vq = &vsock->vqs[VSOCK_VQ_TX];
   110			lock = &vsock->send_pkt_list_lock;
   111			send_pkt_list = &vsock->send_pkt_list;
   112		} else {
   113			tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
   114			lock = &vsock->dgram_send_pkt_list_lock;
   115			send_pkt_list = &vsock->dgram_send_pkt_list;
   116		}
   117	
   118		mutex_lock(&vq->mutex);
   119	
   120		if (!vhost_vq_get_backend(vq))
   121			goto out;
   122	
   123		if (!vq_meta_prefetch(vq))
   124			goto out;
   125	
   126		/* Avoid further vmexits, we're already processing the virtqueue */
   127		vhost_disable_notify(&vsock->dev, vq);
   128	
   129		do {
   130			struct virtio_vsock_pkt *pkt;
   131			struct iov_iter iov_iter;
   132			unsigned out, in;
   133			size_t nbytes;
   134			size_t iov_len, payload_len;
   135			int head;
   136			bool is_dgram = false;
   137	
   138			spin_lock_bh(lock);
   139			if (list_empty(send_pkt_list)) {
   140				spin_unlock_bh(lock);
   141				vhost_enable_notify(&vsock->dev, vq);
   142				break;
   143			}
   144	
   145			pkt = list_first_entry(send_pkt_list,
   146					       struct virtio_vsock_pkt, list);
   147			list_del_init(&pkt->list);
   148			spin_unlock_bh(lock);
   149	
 > 150			if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
   151				is_dgram = true;
   152	
   153			head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
   154						 &out, &in, NULL, NULL);
   155			if (head < 0) {
   156				spin_lock_bh(lock);
   157				list_add(&pkt->list, send_pkt_list);
   158				spin_unlock_bh(lock);
   159				break;
   160			}
   161	
   162			if (head == vq->num) {
   163				if (is_dgram) {
   164					virtio_transport_free_pkt(pkt);
   165					vq_err(vq, "Dgram virtqueue is full!");
   166					spin_lock_bh(lock);
   167					vsock->dgram_used--;
   168					spin_unlock_bh(lock);
   169					break;
   170				}
   171				spin_lock_bh(lock);
   172				list_add(&pkt->list, send_pkt_list);
   173				spin_unlock_bh(lock);
   174	
   175				/* We cannot finish yet if more buffers snuck in while
   176				* re-enabling notify.
   177				*/
   178				if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
   179					vhost_disable_notify(&vsock->dev, vq);
   180					continue;
   181				}
   182				break;
   183			}
   184	
   185			if (out) {
   186				virtio_transport_free_pkt(pkt);
   187				vq_err(vq, "Expected 0 output buffers, got %u\n", out);
   188				if (is_dgram) {
   189					spin_lock_bh(lock);
   190					vsock->dgram_used--;
   191					spin_unlock_bh(lock);
   192				}
   193	
   194				break;
   195			}
   196	
   197			iov_len = iov_length(&vq->iov[out], in);
   198			if (iov_len < sizeof(pkt->hdr)) {
   199				virtio_transport_free_pkt(pkt);
   200				vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
   201				if (is_dgram) {
   202					spin_lock_bh(lock);
   203					vsock->dgram_used--;
   204					spin_unlock_bh(lock);
   205				}
   206				break;
   207			}
   208	
   209			if (iov_len < pkt->len - pkt->off &&
   210				vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {
   211				virtio_transport_free_pkt(pkt);
   212				vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);
   213				break;
   214			}
   215	
   216			iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
   217			payload_len = pkt->len - pkt->off;
   218	
   219			/* If the packet is greater than the space available in the
   220			 * buffer, we split it using multiple buffers.
   221			 */
   222			if (payload_len > iov_len - sizeof(pkt->hdr))
   223				payload_len = iov_len - sizeof(pkt->hdr);
   224	
   225			/* Set the correct length in the header */
   226			pkt->hdr.len = cpu_to_le32(payload_len);
   227	
   228			nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
   229			if (nbytes != sizeof(pkt->hdr)) {
   230				virtio_transport_free_pkt(pkt);
   231				vq_err(vq, "Faulted on copying pkt hdr\n");
   232				if (is_dgram) {
   233					spin_lock_bh(lock);
   234					vsock->dgram_used--;
   235					spin_unlock_bh(lock);
   236				}
   237				break;
   238			}
   239	
   240			nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
   241					      &iov_iter);
   242			if (nbytes != payload_len) {
   243				virtio_transport_free_pkt(pkt);
   244				vq_err(vq, "Faulted on copying pkt buf\n");
   245				break;
   246			}
   247	
   248			/* Deliver to monitoring devices all packets that we
   249			 * will transmit.
   250			 */
   251			virtio_transport_deliver_tap_pkt(pkt);
   252	
   253			vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
   254			added = true;
   255	
   256			pkt->off += payload_len;
   257			total_len += payload_len;
   258	
   259			/* If we didn't send all the payload we can requeue the packet
   260			 * to send it with the next available buffer.
   261			 */
   262			if ((pkt->off < pkt->len)
   263				&& (vq == &vsock->vqs[VSOCK_VQ_RX])) {
   264				/* We are queueing the same virtio_vsock_pkt to handle
   265				 * the remaining bytes, and we want to deliver it
   266				 * to monitoring devices in the next iteration.
   267				 */
   268				pkt->tap_delivered = false;
   269	
   270				spin_lock_bh(lock);
   271				list_add(&pkt->list, send_pkt_list);
   272				spin_unlock_bh(lock);
   273			} else {
   274				if (pkt->reply) {
   275					int val;
   276	
   277					val = atomic_dec_return(&vsock->queued_replies);
   278	
   279					/* Do we have resources to resume tx
   280					 * processing?
   281					 */
   282					if (val + 1 == tx_vq->num)
   283						restart_tx = true;
   284				}
   285	
   286				virtio_transport_free_pkt(pkt);
   287				if (is_dgram) {
   288					spin_lock_bh(lock);
   289					vsock->dgram_used--;
   290					spin_unlock_bh(lock);
   291				}
   292			}
   293		} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
   294		if (added)
   295			vhost_signal(&vsock->dev, vq);
   296	
   297	out:
   298		mutex_unlock(&vq->mutex);
   299	
   300		if (restart_tx)
   301			vhost_poll_queue(&tx_vq->poll);
   302	}
   303	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 44461 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
  2021-06-09 23:24   ` Jiang Wang
                     ` (3 preceding siblings ...)
  (?)
@ 2021-06-16 17:54   ` kernel test robot
  -1 siblings, 0 replies; 59+ messages in thread
From: kernel test robot @ 2021-06-16 17:54 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2908 bytes --]

Hi Jiang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on vhost/linux-next]
[also build test WARNING on tip/perf/core linus/master v5.13-rc6]
[cannot apply to next-20210616]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-randconfig-a002-20210616 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/b156a0ad587c43dbfc98397f01b34fad15054bf0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jiang-Wang/virtio-vsock-introduce-SOCK_DGRAM-support/20210616-120056
        git checkout b156a0ad587c43dbfc98397f01b34fad15054bf0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/vmw_vsock/virtio_transport.c:450:7: warning: variable 'added' set but not used [-Wunused-but-set-variable]
           bool added = false;
                ^
   1 warning generated.


vim +/added +450 net/vmw_vsock/virtio_transport.c

   444	
   445	static void virtio_transport_dgram_tx_work(struct work_struct *work)
   446	{
   447		struct virtio_vsock *vsock =
   448			container_of(work, struct virtio_vsock, dgram_tx_work);
   449		struct virtqueue *vq;
 > 450		bool added = false;
   451	
   452		vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
   453		mutex_lock(&vsock->dgram_tx_lock);
   454	
   455		if (!vsock->dgram_tx_run)
   456			goto out;
   457	
   458		do {
   459			struct virtio_vsock_pkt *pkt;
   460			unsigned int len;
   461	
   462			virtqueue_disable_cb(vq);
   463			while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
   464				virtio_transport_free_pkt(pkt);
   465				added = true;
   466			}
   467		} while (!virtqueue_enable_cb(vq));
   468	
   469	out:
   470		mutex_unlock(&vsock->dgram_tx_lock);
   471	}
   472	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 38410 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-09 23:24 ` Jiang Wang
@ 2021-06-18  9:35   ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:35 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Jorgen Hansen, Andra Paraschiv, Norbert Slusarek,
	Lu Wei, Alexander Popov, kvm, netdev, linux-kernel

On Wed, Jun 09, 2021 at 11:24:52PM +0000, Jiang Wang wrote:
>This patchset implements support of SOCK_DGRAM for virtio
>transport.
>
>Datagram sockets are connectionless and unreliable. To avoid unfair contention
>with stream and other sockets, add two more virtqueues and
>a new feature bit to indicate if those two new queues exist or not.
>
>Dgram does not use the existing credit update mechanism for
>stream sockets. When sending from the guest/driver, sending packets
>synchronously, so the sender will get an error when the virtqueue is full.
>When sending from the host/device, send packets asynchronously
>because the descriptor memory belongs to the corresponding QEMU
>process.
>
>The virtio spec patch is here:
>https://www.spinics.net/lists/linux-virtualization/msg50027.html
>
>For those who prefer git repo, here is the link for the linux kernel:
>https://github.com/Jiang1155/linux/tree/vsock-dgram-v1
>
>qemu patch link:
>https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1
>
>
>To do:
>1. use skb when receiving packets
>2. support multiple transport
>3. support mergeable rx buffer

Jiang, I'll do a quick review, but I think it's better to rebase on 
net-next since SEQPACKET support is now merged.

Please also run ./scripts/checkpatch.pl, there are a lot of issues.
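
For example, something like this over the series should catch them (the 
patch file names are only an example):

    ./scripts/checkpatch.pl --strict 000*.patch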

I'll leave some simple comments in the patches, but I'd prefer to do a 
deeper review after the rebase, once the dynamic handling of DGRAM is in 
place.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
@ 2021-06-18  9:35   ` Stefano Garzarella
  0 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:35 UTC (permalink / raw)
  To: Jiang Wang
  Cc: cong.wang, Andra Paraschiv, kvm, mst, virtualization,
	Norbert Slusarek, jhansen, duanxiongchun, xieyongji, Ingo Molnar,
	Jakub Kicinski, Alexander Popov, Steven Rostedt, chaiwen.cc,
	stefanha, netdev, linux-kernel, Lu Wei, Colin Ian King,
	arseny.krasnov, David S. Miller, Jorgen Hansen

On Wed, Jun 09, 2021 at 11:24:52PM +0000, Jiang Wang wrote:
>This patchset implements support of SOCK_DGRAM for virtio
>transport.
>
>Datagram sockets are connectionless and unreliable. To avoid unfair contention
>with stream and other sockets, add two more virtqueues and
>a new feature bit to indicate if those two new queues exist or not.
>
>Dgram does not use the existing credit update mechanism for
>stream sockets. When sending from the guest/driver, sending packets
>synchronously, so the sender will get an error when the virtqueue is full.
>When sending from the host/device, send packets asynchronously
>because the descriptor memory belongs to the corresponding QEMU
>process.
>
>The virtio spec patch is here:
>https://www.spinics.net/lists/linux-virtualization/msg50027.html
>
>For those who prefer git repo, here is the link for the linux kernel:
>https://github.com/Jiang1155/linux/tree/vsock-dgram-v1
>
>qemu patch link:
>https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1
>
>
>To do:
>1. use skb when receiving packets
>2. support multiple transport
>3. support mergeable rx buffer

Jiang, I'll do a quick review, but I think it's better to rebase on 
net-next since SEQPACKET support is now merged.

Please also run ./scripts/checkpatch.pl, there are a lot of issues.

I'll leave some simple comments in the patches, but I'd prefer to do a 
deeper review after the rebase, once the dynamic handling of DGRAM is in 
place.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 1/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
  2021-06-09 23:24   ` Jiang Wang
@ 2021-06-18  9:39     ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:39 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Andra Paraschiv, Norbert Slusarek, Colin Ian King,
	Alexander Popov, kvm, netdev, linux-kernel

On Wed, Jun 09, 2021 at 11:24:53PM +0000, Jiang Wang wrote:
>When this feature is enabled, allocate 5 queues,
>otherwise, allocate 3 queues to be compatible with
>old QEMU versions.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> drivers/vhost/vsock.c             |  3 +-
> include/linux/virtio_vsock.h      |  9 +++++
> include/uapi/linux/virtio_vsock.h |  3 ++
> net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----
> 4 files changed, 80 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 5e78fb719602..81d064601093 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -31,7 +31,8 @@
>
> enum {
> 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>-			       (1ULL << VIRTIO_F_ACCESS_PLATFORM)
>+			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>+			       (1ULL << VIRTIO_VSOCK_F_DGRAM)
> };
>
> enum {
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index dc636b727179..ba3189ed9345 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -18,6 +18,15 @@ enum {
> 	VSOCK_VQ_MAX    = 3,
> };
>
>+enum {
>+	VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */
>+	VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */
>+	VSOCK_VQ_DGRAM_RX       = 2,
>+	VSOCK_VQ_DGRAM_TX       = 3,
>+	VSOCK_VQ_EX_EVENT       = 4,
>+	VSOCK_VQ_EX_MAX         = 5,
>+};
>+
> /* Per-socket state (accessed via vsk->trans) */
> struct virtio_vsock_sock {
> 	struct vsock_sock *vsk;
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index 1d57ed3d84d2..b56614dff1c9 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -38,6 +38,9 @@
> #include <linux/virtio_ids.h>
> #include <linux/virtio_config.h>
>
>+/* The feature bitmap for virtio net */
>+#define VIRTIO_VSOCK_F_DGRAM	0	/* Host support dgram vsock */
>+
> struct virtio_vsock_config {
> 	__le64 guest_cid;
> } __attribute__((packed));
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 2700a63ab095..7dcb8db23305 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
> struct virtio_vsock {
> 	struct virtio_device *vdev;
>-	struct virtqueue *vqs[VSOCK_VQ_MAX];
>+	struct virtqueue **vqs;
>+	bool has_dgram;
>
> 	/* Virtqueue processing is deferred to a workqueue */
> 	struct work_struct tx_work;
>@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,
> 	struct scatterlist sg;
> 	struct virtqueue *vq;
>
>-	vq = vsock->vqs[VSOCK_VQ_EVENT];
>+	if (vsock->has_dgram)
>+		vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
>+	else
>+		vq = vsock->vqs[VSOCK_VQ_EVENT];
>
> 	sg_init_one(&sg, event, sizeof(*event));
>
>@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
> 		virtio_vsock_event_fill_one(vsock, event);
> 	}
>
>-	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
>+	if (vsock->has_dgram)
>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
>+	else
>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
> }
>
> static void virtio_vsock_reset_sock(struct sock *sk)
>@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)
> 		container_of(work, struct virtio_vsock, event_work);
> 	struct virtqueue *vq;
>
>-	vq = vsock->vqs[VSOCK_VQ_EVENT];
>+	if (vsock->has_dgram)
>+		vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
>+	else
>+		vq = vsock->vqs[VSOCK_VQ_EVENT];
>
> 	mutex_lock(&vsock->event_lock);
>
>@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)
> 		}
> 	} while (!virtqueue_enable_cb(vq));
>
>-	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
>+	if (vsock->has_dgram)
>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
>+	else
>+		virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
> out:
> 	mutex_unlock(&vsock->event_lock);
> }
>@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
> 	queue_work(virtio_vsock_workqueue, &vsock->tx_work);
> }
>
>+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
>+{
>+}
>+
> static void virtio_vsock_rx_done(struct virtqueue *vq)
> {
> 	struct virtio_vsock *vsock = vq->vdev->priv;
>@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
>+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
>+{
>+}
>+
> static struct virtio_transport virtio_transport = {
> 	.transport = {
> 		.module                   = THIS_MODULE,
>@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 		virtio_vsock_tx_done,
> 		virtio_vsock_event_done,
> 	};
>+	vq_callback_t *ex_callbacks[] = {

'ex' is not clear; maybe 'dgram' would be better?
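
Something like this, just as a naming sketch (same contents as the array 
above, only renamed):

	vq_callback_t *dgram_callbacks[] = {
		virtio_vsock_rx_done,
		virtio_vsock_tx_done,
		virtio_vsock_dgram_rx_done,
		virtio_vsock_dgram_tx_done,
		virtio_vsock_event_done,
	};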

What happens if F_DGRAM is negotiated, but not F_STREAM?

>+		virtio_vsock_rx_done,
>+		virtio_vsock_tx_done,
>+		virtio_vsock_dgram_rx_done,
>+		virtio_vsock_dgram_tx_done,
>+		virtio_vsock_event_done,
>+	};
>+
> 	static const char * const names[] = {
> 		"rx",
> 		"tx",
> 		"event",
> 	};
>+	static const char * const ex_names[] = {
>+		"rx",
>+		"tx",
>+		"dgram_rx",
>+		"dgram_tx",
>+		"event",
>+	};
>+
> 	struct virtio_vsock *vsock = NULL;
>-	int ret;
>+	int ret, max_vq;
>
> 	ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
> 	if (ret)
>@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>
> 	vsock->vdev = vdev;
>
>-	ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,
>+	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
>+		vsock->has_dgram = true;
>+
>+	if (vsock->has_dgram)
>+		max_vq = VSOCK_VQ_EX_MAX;
>+	else
>+		max_vq = VSOCK_VQ_MAX;
>+
>+	vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);
>+	if (!vsock->vqs) {
>+		ret = -ENOMEM;
>+		goto out;
>+	}
>+
>+	if (vsock->has_dgram) {
>+		ret = virtio_find_vqs(vsock->vdev, max_vq,
>+			      vsock->vqs, ex_callbacks, ex_names,
>+			      NULL);
>+	} else {
>+		ret = virtio_find_vqs(vsock->vdev, max_vq,
> 			      vsock->vqs, callbacks, names,
> 			      NULL);
>+	}
>+
> 	if (ret < 0)
> 		goto out;
>
>@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {
> };
>
> static unsigned int features[] = {
>+	VIRTIO_VSOCK_F_DGRAM,
> };
>
> static struct virtio_driver virtio_vsock_driver = {
>-- 
>2.11.0
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
  2021-06-09 23:24   ` Jiang Wang
@ 2021-06-18  9:52     ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:52 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Andra Paraschiv, Norbert Slusarek,
	Alexander Popov, kvm, netdev, linux-kernel

On Wed, Jun 09, 2021 at 11:24:54PM +0000, Jiang Wang wrote:
>This patch add support for virtio dgram for the driver.
>Implemented related functions for tx and rx, enqueue
>and dequeue. Send packets synchronously to give sender
>indication when the virtqueue is full.
>Refactored virtio_transport_send_pkt_work() a little bit but
>no functions changes for it.
>
>Support for the host/device side is in another
>patch.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> include/net/af_vsock.h                             |   1 +
> .../trace/events/vsock_virtio_transport_common.h   |   5 +-
> include/uapi/linux/virtio_vsock.h                  |   1 +
> net/vmw_vsock/af_vsock.c                           |  12 +
> net/vmw_vsock/virtio_transport.c                   | 325 ++++++++++++++++++---
> net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++++--
> 6 files changed, 466 insertions(+), 62 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index b1c717286993..fcae7bca9609 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -200,6 +200,7 @@ void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+int vsock_bind_stream(struct vsock_sock *vsk, struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
>diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
>index 6782213778be..b1be25b327a1 100644
>--- a/include/trace/events/vsock_virtio_transport_common.h
>+++ b/include/trace/events/vsock_virtio_transport_common.h
>@@ -9,9 +9,12 @@
> #include <linux/tracepoint.h>
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
>+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_DGRAM);
>
> #define show_type(val) \
>-	__print_symbolic(val, { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" })
>+	 __print_symbolic(val, \
>+					{ VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \
>+					{ VIRTIO_VSOCK_TYPE_DGRAM, "DGRAM" })
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID);
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST);
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index b56614dff1c9..5503585b26e8 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -68,6 +68,7 @@ struct virtio_vsock_hdr {
>
> enum virtio_vsock_type {
> 	VIRTIO_VSOCK_TYPE_STREAM = 1,
>+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 92a72f0e0d94..c1f512291b94 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -659,6 +659,18 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
> 	return 0;
> }
>
>+int vsock_bind_stream(struct vsock_sock *vsk,
>+				       struct sockaddr_vm *addr)
>+{
>+	int retval;
>+
>+	spin_lock_bh(&vsock_table_lock);
>+	retval = __vsock_bind_stream(vsk, addr);
>+	spin_unlock_bh(&vsock_table_lock);
>+	return retval;
>+}
>+EXPORT_SYMBOL(vsock_bind_stream);
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> 			      struct sockaddr_vm *addr)
> {
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 7dcb8db23305..cf47aadb0c34 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -20,21 +20,29 @@
> #include <net/sock.h>
> #include <linux/mutex.h>
> #include <net/af_vsock.h>
>+#include<linux/kobject.h>
           ^
           Space needed here
>+#include<linux/sysfs.h>
           ^
           Ditto
>+#include <linux/refcount.h>
>
> static struct workqueue_struct *virtio_vsock_workqueue;
> static struct virtio_vsock __rcu *the_virtio_vsock;
>+static struct virtio_vsock *the_virtio_vsock_dgram;
> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
> struct virtio_vsock {
> 	struct virtio_device *vdev;
> 	struct virtqueue **vqs;
> 	bool has_dgram;
>+	refcount_t active;
>
> 	/* Virtqueue processing is deferred to a workqueue */
> 	struct work_struct tx_work;
> 	struct work_struct rx_work;
> 	struct work_struct event_work;
>
>+	struct work_struct dgram_tx_work;
>+	struct work_struct dgram_rx_work;
>+
> 	/* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
> 	 * must be accessed with tx_lock held.
> 	 */
>@@ -55,6 +63,22 @@ struct virtio_vsock {
> 	int rx_buf_nr;
> 	int rx_buf_max_nr;
>
>+	/* The following fields are protected by dgram_tx_lock.  vqs[VSOCK_VQ_DGRAM_TX]
>+	 * must be accessed with dgram_tx_lock held.
>+	 */
>+	struct mutex dgram_tx_lock;
>+	bool dgram_tx_run;
>+
>+	atomic_t dgram_queued_replies;
>+
>+	/* The following fields are protected by dgram_rx_lock.  vqs[VSOCK_VQ_DGRAM_RX]
>+	 * must be accessed with dgram_rx_lock held.
>+	 */
>+	struct mutex dgram_rx_lock;
>+	bool dgram_rx_run;
>+	int dgram_rx_buf_nr;
>+	int dgram_rx_buf_max_nr;
>+
> 	/* The following fields are protected by event_lock.
> 	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
> 	 */
>@@ -83,21 +107,11 @@ static u32 virtio_transport_get_local_cid(void)
> 	return ret;
> }
>
>-static void
>-virtio_transport_send_pkt_work(struct work_struct *work)
>+static void virtio_transport_do_send_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq,  spinlock_t *lock, struct list_head *send_pkt_list,
>+		bool *restart_rx)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, send_pkt_work);
>-	struct virtqueue *vq;
> 	bool added = false;
>-	bool restart_rx = false;
>-
>-	mutex_lock(&vsock->tx_lock);
>-
>-	if (!vsock->tx_run)
>-		goto out;
>-
>-	vq = vsock->vqs[VSOCK_VQ_TX];
>
> 	for (;;) {
> 		struct virtio_vsock_pkt *pkt;
>@@ -105,16 +119,16 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		int ret, in_sg = 0, out_sg = 0;
> 		bool reply;
>
>-		spin_lock_bh(&vsock->send_pkt_list_lock);
>-		if (list_empty(&vsock->send_pkt_list)) {
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_lock_bh(lock);
>+		if (list_empty(send_pkt_list)) {
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>-		pkt = list_first_entry(&vsock->send_pkt_list,
>+		pkt = list_first_entry(send_pkt_list,
> 				       struct virtio_vsock_pkt, list);
> 		list_del_init(&pkt->list);
>-		spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_unlock_bh(lock);
>
> 		virtio_transport_deliver_tap_pkt(pkt);
>
>@@ -132,9 +146,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		 * the vq
> 		 */
> 		if (ret < 0) {
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>@@ -146,7 +160,7 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 			/* Do we now have resources to resume rx processing? */
> 			if (val + 1 == virtqueue_get_vring_size(rx_vq))
>-				restart_rx = true;
>+				*restart_rx = true;
> 		}
>
> 		added = true;
>@@ -154,7 +168,55 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 	if (added)
> 		virtqueue_kick(vq);
>+}
>
>+static int virtio_transport_do_send_dgram_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq, struct virtio_vsock_pkt *pkt)
>+{
>+	struct scatterlist hdr, buf, *sgs[2];
>+	int ret, in_sg = 0, out_sg = 0;
>+
>+	virtio_transport_deliver_tap_pkt(pkt);
>+
>+	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>+	sgs[out_sg++] = &hdr;
>+	if (pkt->buf) {
>+		sg_init_one(&buf, pkt->buf, pkt->len);
>+		sgs[out_sg++] = &buf;
>+	}
>+
>+	ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL);
>+	/* Usually this means that there is no more space available in
>+	 * the vq
>+	 */
>+	if (ret < 0) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENOMEM;
>+	}
>+
>+	virtqueue_kick(vq);
>+
>+	return pkt->len;
>+}
>+
>+
>+static void
>+virtio_transport_send_pkt_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, send_pkt_work);
>+	struct virtqueue *vq;
>+	bool restart_rx = false;
>+
>+	mutex_lock(&vsock->tx_lock);
>+
>+	if (!vsock->tx_run)
>+		goto out;
>+
>+	vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+	virtio_transport_do_send_pkt(vsock, vq, &vsock->send_pkt_list_lock,
>+							&vsock->send_pkt_list, &restart_rx);
> out:
> 	mutex_unlock(&vsock->tx_lock);
>
>@@ -163,11 +225,64 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> }
>
> static int
>+virtio_transport_send_dgram_pkt(struct virtio_vsock_pkt *pkt)
>+{
>+	struct virtio_vsock *vsock;
>+	int len = pkt->len;
>+	struct virtqueue *vq;
>+
>+	vsock = the_virtio_vsock_dgram;
>+
>+	if (!vsock) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!vsock->dgram_tx_run) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!refcount_inc_not_zero(&vsock->active)) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
>+		virtio_transport_free_pkt(pkt);
>+		len = -ENODEV;
>+		goto out_ref;
>+	}
>+
>+	/* send the pkt */
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out_mutex;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+
>+	len = virtio_transport_do_send_dgram_pkt(vsock, vq, pkt);
>+
>+out_mutex:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
>+out_ref:
>+	if (!refcount_dec_not_one(&vsock->active))
>+		return -EFAULT;
>+
>+	return len;
>+}
>+
>+static int
> virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> {
> 	struct virtio_vsock *vsock;
> 	int len = pkt->len;
>
>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
>+		return virtio_transport_send_dgram_pkt(pkt);
>+
> 	rcu_read_lock();
> 	vsock = rcu_dereference(the_virtio_vsock);
> 	if (!vsock) {
>@@ -243,7 +358,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> 	return ret;
> }
>
>-static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> {
> 	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> 	struct virtio_vsock_pkt *pkt;
>@@ -251,7 +366,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 	struct virtqueue *vq;
> 	int ret;
>
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>+	if (is_dgram)
>+		vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+	else
>+		vq = vsock->vqs[VSOCK_VQ_RX];
>
> 	do {
> 		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
>@@ -277,10 +395,19 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 			virtio_transport_free_pkt(pkt);
> 			break;
> 		}
>-		vsock->rx_buf_nr++;
>+		if (is_dgram)
>+			vsock->dgram_rx_buf_nr++;
>+		else
>+			vsock->rx_buf_nr++;
> 	} while (vq->num_free);
>-	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>-		vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	if (is_dgram) {
>+		if (vsock->dgram_rx_buf_nr > vsock->dgram_rx_buf_max_nr)
>+			vsock->dgram_rx_buf_max_nr = vsock->dgram_rx_buf_nr;
>+	} else {
>+		if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>+			vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	}
>+
> 	virtqueue_kick(vq);
> }
>
>@@ -315,6 +442,34 @@ static void virtio_transport_tx_work(struct work_struct *work)
> 		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
> }
>
>+static void virtio_transport_dgram_tx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_tx_work);
>+	struct virtqueue *vq;
>+	bool added = false;

`added` is set but never read.

>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out;
>+
>+	do {
>+		struct virtio_vsock_pkt *pkt;
>+		unsigned int len;
>+
>+		virtqueue_disable_cb(vq);
>+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
>+			virtio_transport_free_pkt(pkt);
>+			added = true;
>+		}
>+	} while (!virtqueue_enable_cb(vq));

This loop seems the same as the one in virtio_transport_tx_work(); maybe 
we can create a helper.
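
Something along these lines could be shared by the stream and dgram tx 
work functions (untested sketch; the helper name is just a placeholder):

static bool virtio_vsock_free_tx_bufs(struct virtqueue *vq)
{
	struct virtio_vsock_pkt *pkt;
	unsigned int len;
	bool added = false;

	do {
		virtqueue_disable_cb(vq);
		/* Free all packets the device is done with */
		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
			virtio_transport_free_pkt(pkt);
			added = true;
		}
	} while (!virtqueue_enable_cb(vq));

	return added;
}

virtio_transport_tx_work() could then use the return value to decide 
whether to queue send_pkt_work, and the dgram variant can ignore it.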

>+
>+out:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+}
>+
> /* Is there space left for replies to rx packets? */
> static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
> {
>@@ -449,6 +604,11 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
> {
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>+
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_tx_work);
> }
>
> static void virtio_vsock_rx_done(struct virtqueue *vq)
>@@ -462,8 +622,12 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
> {
>-}
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_rx_work);
>+}
> static struct virtio_transport virtio_transport = {
> 	.transport = {
> 		.module                   = THIS_MODULE,
>@@ -506,19 +670,9 @@ static struct virtio_transport virtio_transport = {
> 	.send_pkt = virtio_transport_send_pkt,
> };
>
>-static void virtio_transport_rx_work(struct work_struct *work)
>+static void virtio_transport_do_rx_work(struct virtio_vsock *vsock,
>+						struct virtqueue *vq, bool is_dgram)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, rx_work);
>-	struct virtqueue *vq;
>-
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>-
>-	mutex_lock(&vsock->rx_lock);
>-
>-	if (!vsock->rx_run)
>-		goto out;
>-
> 	do {
> 		virtqueue_disable_cb(vq);
> 		for (;;) {
>@@ -538,7 +692,10 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 				break;
> 			}
>
>-			vsock->rx_buf_nr--;
>+			if (is_dgram)
>+				vsock->dgram_rx_buf_nr--;
>+			else
>+				vsock->rx_buf_nr--;
>
> 			/* Drop short/long packets */
> 			if (unlikely(len < sizeof(pkt->hdr) ||
>@@ -554,11 +711,45 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 	} while (!virtqueue_enable_cb(vq));
>
> out:
>+	return;
>+}
>+
>+static void virtio_transport_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_RX];
>+
>+	mutex_lock(&vsock->rx_lock);
>+
>+	if (vsock->rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, false);
>+
> 	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
>-		virtio_vsock_rx_fill(vsock);
>+		virtio_vsock_rx_fill(vsock, false);
> 	mutex_unlock(&vsock->rx_lock);
> }
>
>+static void virtio_transport_dgram_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+
>+	mutex_lock(&vsock->dgram_rx_lock);
>+
>+	if (vsock->dgram_rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, true);
>+
>+	if (vsock->dgram_rx_buf_nr < vsock->dgram_rx_buf_max_nr / 2)
>+		virtio_vsock_rx_fill(vsock, true);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+}
>+
> static int virtio_vsock_probe(struct virtio_device *vdev)
> {
> 	vq_callback_t *callbacks[] = {
>@@ -642,8 +833,14 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vsock->rx_buf_max_nr = 0;
> 	atomic_set(&vsock->queued_replies, 0);
>
>+	vsock->dgram_rx_buf_nr = 0;
>+	vsock->dgram_rx_buf_max_nr = 0;
>+	atomic_set(&vsock->dgram_queued_replies, 0);
>+
> 	mutex_init(&vsock->tx_lock);
> 	mutex_init(&vsock->rx_lock);
>+	mutex_init(&vsock->dgram_tx_lock);
>+	mutex_init(&vsock->dgram_rx_lock);
> 	mutex_init(&vsock->event_lock);
> 	spin_lock_init(&vsock->send_pkt_list_lock);
> 	INIT_LIST_HEAD(&vsock->send_pkt_list);
>@@ -651,16 +848,27 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
> 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
> 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
>+	INIT_WORK(&vsock->dgram_rx_work, virtio_transport_dgram_rx_work);
>+	INIT_WORK(&vsock->dgram_tx_work, virtio_transport_dgram_tx_work);
>
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = true;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = true;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->rx_lock);
>-	virtio_vsock_rx_fill(vsock);
>+	virtio_vsock_rx_fill(vsock, false);
> 	vsock->rx_run = true;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	virtio_vsock_rx_fill(vsock, true);
>+	vsock->dgram_rx_run = true;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	virtio_vsock_event_fill(vsock);
> 	vsock->event_run = true;
>@@ -669,6 +877,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vdev->priv = vsock;
> 	rcu_assign_pointer(the_virtio_vsock, vsock);
>
>+	the_virtio_vsock_dgram = vsock;
>+	refcount_set(&the_virtio_vsock_dgram->active, 1);
>+
> 	mutex_unlock(&the_virtio_vsock_mutex);
> 	return 0;
>
>@@ -699,14 +910,28 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	vsock->rx_run = false;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	vsock->dgram_rx_run = false;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = false;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = false;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	vsock->event_run = false;
> 	mutex_unlock(&vsock->event_lock);
>
>+	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
>+		if (signal_pending(current))
>+			break;
>+		msleep(5);

Why the sleep is needed?

If it is really needed, we should put a comment here with the reason.

>+	}
>+
> 	/* Flush all device writes and interrupts, device will not use any
> 	 * more buffers.
> 	 */
>@@ -717,11 +942,21 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_RX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_TX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	spin_lock_bh(&vsock->send_pkt_list_lock);
> 	while (!list_empty(&vsock->send_pkt_list)) {
> 		pkt = list_first_entry(&vsock->send_pkt_list,
>@@ -739,6 +974,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	 */
> 	flush_work(&vsock->rx_work);
> 	flush_work(&vsock->tx_work);
>+	flush_work(&vsock->dgram_rx_work);
>+	flush_work(&vsock->dgram_tx_work);
> 	flush_work(&vsock->event_work);
> 	flush_work(&vsock->send_pkt_work);
>
>@@ -775,7 +1012,7 @@ static int __init virtio_vsock_init(void)
> 		return -ENOMEM;
>
> 	ret = vsock_core_register(&virtio_transport.transport,
>-				  VSOCK_TRANSPORT_F_G2H);
>+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);

I saw multi-transport in the TODO list :-)

We need to find a way to handle it; let me know if you want to discuss it 
further.

> 	if (ret)
> 		goto out_wq;
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 902cb6dd710b..9f041515b7f1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -26,6 +26,8 @@
> /* Threshold for detecting small packets to copy */
> #define GOOD_COPY_LEN  128
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk);
>+
> static const struct virtio_transport *
> virtio_transport_get_ops(struct vsock_sock *vsk)
> {
>@@ -196,21 +198,28 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	vvs = vsk->trans;
>
> 	/* we can send less than pkt_len bytes */
>-	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>-		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+		else
>+			return 0;
>+	}
>
>-	/* virtio_transport_get_credit might return less than pkt_len credit */
>-	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>+	if (info->type == VIRTIO_VSOCK_TYPE_STREAM) {
>+		/* virtio_transport_get_credit might return less than pkt_len credit */
>+		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>
>-	/* Do not send zero length OP_RW pkt */
>-	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>-		return pkt_len;
>+		/* Do not send zero length OP_RW pkt */
>+		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>+			return pkt_len;
>+	}
>
> 	pkt = virtio_transport_alloc_pkt(info, pkt_len,
> 					 src_cid, src_port,
> 					 dst_cid, dst_port);
> 	if (!pkt) {
>-		virtio_transport_put_credit(vvs, pkt_len);
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			virtio_transport_put_credit(vvs, pkt_len);
> 		return -ENOMEM;
> 	}
>
>@@ -397,6 +406,58 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	return err;
> }
>
>+static ssize_t
>+virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
>+						   struct msghdr *msg, size_t len)
>+{
>+	struct virtio_vsock_sock *vvs = vsk->trans;
>+	struct virtio_vsock_pkt *pkt;
>+	size_t total = 0;
>+	u32 free_space;
>+	int err = -EFAULT;
>+
>+	spin_lock_bh(&vvs->rx_lock);
>+	if (total < len && !list_empty(&vvs->rx_queue)) {
>+		pkt = list_first_entry(&vvs->rx_queue,
>+				       struct virtio_vsock_pkt, list);
>+
>+		total = len;
>+		if (total > pkt->len - pkt->off)
>+			total = pkt->len - pkt->off;
>+		else if (total < pkt->len - pkt->off)
>+			msg->msg_flags |= MSG_TRUNC;
>+
>+		/* sk_lock is held by caller so no one else can dequeue.
>+		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>+		 */
>+		spin_unlock_bh(&vvs->rx_lock);
>+
>+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
>+		if (err)
>+			return err;
>+
>+		spin_lock_bh(&vvs->rx_lock);
>+
>+		virtio_transport_dec_rx_pkt(vvs, pkt);
>+		list_del(&pkt->list);
>+		virtio_transport_free_pkt(pkt);
>+	}
>+
>+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>+
>+	spin_unlock_bh(&vvs->rx_lock);
>+
>+	if (total > 0 && msg->msg_name) {
>+		/* Provide the address of the sender. */
>+		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>+
>+		vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
>+						le32_to_cpu(pkt->hdr.src_port));
>+		msg->msg_namelen = sizeof(*vm_addr);
>+	}
>+	return total;
>+}
>+
> ssize_t
> virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> 				struct msghdr *msg,
>@@ -414,7 +475,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t len, int flags)
> {
>-	return -EOPNOTSUPP;
>+	struct sock *sk;
>+	size_t err = 0;
>+	long timeout;
>+
>+	DEFINE_WAIT(wait);
>+
>+	sk = &vsk->sk;
>+	err = 0;
>+
>+	lock_sock(sk);
>+
>+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
>+		return -EOPNOTSUPP;
>+
>+	if (!len)
>+		goto out;
>+
>+	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>+
>+	while (1) {
>+		s64 ready;
>+
>+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>+		ready = virtio_transport_dgram_has_data(vsk);
>+
>+		if (ready == 0) {
>+			if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+
>+			release_sock(sk);
>+			timeout = schedule_timeout(timeout);
>+			lock_sock(sk);
>+
>+			if (signal_pending(current)) {
>+				err = sock_intr_errno(timeout);
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			} else if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+		} else {
>+			finish_wait(sk_sleep(sk), &wait);
>+
>+			if (ready < 0) {
>+				err = -ENOMEM;
>+				goto out;
>+			}
>+
>+			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
>+			break;
>+		}
>+	}
>+out:
>+	release_sock(sk);
>+	return err;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>
>@@ -431,6 +551,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
>+{
>+	return virtio_transport_stream_has_data(vsk);
>+}
>+
> static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> {
> 	struct virtio_vsock_sock *vvs = vsk->trans;
>@@ -610,13 +735,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> 				struct sockaddr_vm *addr)
> {
>-	return -EOPNOTSUPP;
>+	//use same stream bind for dgram
>+	int ret = vsock_bind_stream(vsk, addr);
>+	return ret;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>
> bool virtio_transport_dgram_allow(u32 cid, u32 port)
> {
>-	return false;
>+	return true;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>
>@@ -654,7 +781,17 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t dgram_len)
> {
>-	return -EOPNOTSUPP;
>+	struct virtio_vsock_pkt_info info = {
>+		.op = VIRTIO_VSOCK_OP_RW,
>+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
>+		.msg = msg,
>+		.pkt_len = dgram_len,
>+		.vsk = vsk,
>+		.remote_cid = remote_addr->svm_cid,
>+		.remote_port = remote_addr->svm_port,
>+	};
>+
>+	return virtio_transport_send_pkt_info(vsk, &info);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
>@@ -729,7 +866,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> 		virtio_transport_free_pkt(reply);
> 		return -ENOTCONN;
> 	}
>-
> 	return t->send_pkt(reply);
> }
>
>@@ -925,7 +1061,8 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> 		/* If there is space in the last packet queued, we copy the
> 		 * new packet in its buffer.
> 		 */
>-		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
>+		if (pkt->len <= last_pkt->buf_len - last_pkt->len &&
>+			pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
> 			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> 			       pkt->len);
> 			last_pkt->len += pkt->len;
>@@ -949,6 +1086,12 @@ virtio_transport_recv_connected(struct sock *sk,
> 	struct vsock_sock *vsk = vsock_sk(sk);
> 	int err = 0;
>
>+	if (le16_to_cpu(pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)) {
>+		virtio_transport_recv_enqueue(vsk, pkt);
>+		sk->sk_data_ready(sk);
>+		return err;
>+	}
>+
> 	switch (le16_to_cpu(pkt->hdr.op)) {
> 	case VIRTIO_VSOCK_OP_RW:
> 		virtio_transport_recv_enqueue(vsk, pkt);
>@@ -1121,7 +1264,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 					le32_to_cpu(pkt->hdr.buf_alloc),
> 					le32_to_cpu(pkt->hdr.fwd_cnt));
>
>-	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
>+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM &&
>+		le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_DGRAM) {
> 		(void)virtio_transport_reset_no_sock(t, pkt);
> 		goto free_pkt;
> 	}
>@@ -1150,11 +1294,16 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		goto free_pkt;
> 	}
>
>-	space_available = virtio_transport_space_update(sk, pkt);
>-
> 	/* Update CID in case it has changed after a transport reset event */
> 	vsk->local_addr.svm_cid = dst.svm_cid;
>
>+	if (sk->sk_type == SOCK_DGRAM) {
>+		virtio_transport_recv_connected(sk, pkt);
>+		goto out;
>+	}
>+
>+	space_available = virtio_transport_space_update(sk, pkt);
>+
> 	if (space_available)
> 		sk->sk_write_space(sk);
>
>@@ -1180,6 +1329,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		break;
> 	}
>
>+out:
> 	release_sock(sk);
>
> 	/* Release refcnt obtained when we fetched this socket out of the
>-- 
>2.11.0
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
@ 2021-06-18  9:52     ` Stefano Garzarella
  0 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:52 UTC (permalink / raw)
  To: Jiang Wang
  Cc: cong.wang, duanxiongchun, Ingo Molnar, kvm, mst, netdev,
	linux-kernel, Steven Rostedt, virtualization, xieyongji,
	chaiwen.cc, Norbert Slusarek, stefanha, Colin Ian King,
	Jakub Kicinski, arseny.krasnov, Alexander Popov, jhansen,
	David S. Miller, Andra Paraschiv

On Wed, Jun 09, 2021 at 11:24:54PM +0000, Jiang Wang wrote:
>This patch add support for virtio dgram for the driver.
>Implemented related functions for tx and rx, enqueue
>and dequeue. Send packets synchronously to give sender
>indication when the virtqueue is full.
>Refactored virtio_transport_send_pkt_work() a little bit but
>no functions changes for it.
>
>Support for the host/device side is in another
>patch.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> include/net/af_vsock.h                             |   1 +
> .../trace/events/vsock_virtio_transport_common.h   |   5 +-
> include/uapi/linux/virtio_vsock.h                  |   1 +
> net/vmw_vsock/af_vsock.c                           |  12 +
> net/vmw_vsock/virtio_transport.c                   | 325 ++++++++++++++++++---
> net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++++--
> 6 files changed, 466 insertions(+), 62 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index b1c717286993..fcae7bca9609 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -200,6 +200,7 @@ void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+int vsock_bind_stream(struct vsock_sock *vsk, struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
>diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
>index 6782213778be..b1be25b327a1 100644
>--- a/include/trace/events/vsock_virtio_transport_common.h
>+++ b/include/trace/events/vsock_virtio_transport_common.h
>@@ -9,9 +9,12 @@
> #include <linux/tracepoint.h>
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
>+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_DGRAM);
>
> #define show_type(val) \
>-	__print_symbolic(val, { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" })
>+	 __print_symbolic(val, \
>+					{ VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \
>+					{ VIRTIO_VSOCK_TYPE_DGRAM, "DGRAM" })
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID);
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST);
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index b56614dff1c9..5503585b26e8 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -68,6 +68,7 @@ struct virtio_vsock_hdr {
>
> enum virtio_vsock_type {
> 	VIRTIO_VSOCK_TYPE_STREAM = 1,
>+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 92a72f0e0d94..c1f512291b94 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -659,6 +659,18 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
> 	return 0;
> }
>
>+int vsock_bind_stream(struct vsock_sock *vsk,
>+				       struct sockaddr_vm *addr)
>+{
>+	int retval;
>+
>+	spin_lock_bh(&vsock_table_lock);
>+	retval = __vsock_bind_stream(vsk, addr);
>+	spin_unlock_bh(&vsock_table_lock);
>+	return retval;
>+}
>+EXPORT_SYMBOL(vsock_bind_stream);
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> 			      struct sockaddr_vm *addr)
> {
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 7dcb8db23305..cf47aadb0c34 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -20,21 +20,29 @@
> #include <net/sock.h>
> #include <linux/mutex.h>
> #include <net/af_vsock.h>
>+#include<linux/kobject.h>
           ^
           Space needed here
>+#include<linux/sysfs.h>
           ^
           Ditto
>+#include <linux/refcount.h>
>
> static struct workqueue_struct *virtio_vsock_workqueue;
> static struct virtio_vsock __rcu *the_virtio_vsock;
>+static struct virtio_vsock *the_virtio_vsock_dgram;
> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
> struct virtio_vsock {
> 	struct virtio_device *vdev;
> 	struct virtqueue **vqs;
> 	bool has_dgram;
>+	refcount_t active;
>
> 	/* Virtqueue processing is deferred to a workqueue */
> 	struct work_struct tx_work;
> 	struct work_struct rx_work;
> 	struct work_struct event_work;
>
>+	struct work_struct dgram_tx_work;
>+	struct work_struct dgram_rx_work;
>+
> 	/* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
> 	 * must be accessed with tx_lock held.
> 	 */
>@@ -55,6 +63,22 @@ struct virtio_vsock {
> 	int rx_buf_nr;
> 	int rx_buf_max_nr;
>
>+	/* The following fields are protected by dgram_tx_lock.  vqs[VSOCK_VQ_DGRAM_TX]
>+	 * must be accessed with dgram_tx_lock held.
>+	 */
>+	struct mutex dgram_tx_lock;
>+	bool dgram_tx_run;
>+
>+	atomic_t dgram_queued_replies;
>+
>+	/* The following fields are protected by dgram_rx_lock.  vqs[VSOCK_VQ_DGRAM_RX]
>+	 * must be accessed with dgram_rx_lock held.
>+	 */
>+	struct mutex dgram_rx_lock;
>+	bool dgram_rx_run;
>+	int dgram_rx_buf_nr;
>+	int dgram_rx_buf_max_nr;
>+
> 	/* The following fields are protected by event_lock.
> 	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
> 	 */
>@@ -83,21 +107,11 @@ static u32 virtio_transport_get_local_cid(void)
> 	return ret;
> }
>
>-static void
>-virtio_transport_send_pkt_work(struct work_struct *work)
>+static void virtio_transport_do_send_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq,  spinlock_t *lock, struct list_head *send_pkt_list,
>+		bool *restart_rx)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, send_pkt_work);
>-	struct virtqueue *vq;
> 	bool added = false;
>-	bool restart_rx = false;
>-
>-	mutex_lock(&vsock->tx_lock);
>-
>-	if (!vsock->tx_run)
>-		goto out;
>-
>-	vq = vsock->vqs[VSOCK_VQ_TX];
>
> 	for (;;) {
> 		struct virtio_vsock_pkt *pkt;
>@@ -105,16 +119,16 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		int ret, in_sg = 0, out_sg = 0;
> 		bool reply;
>
>-		spin_lock_bh(&vsock->send_pkt_list_lock);
>-		if (list_empty(&vsock->send_pkt_list)) {
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_lock_bh(lock);
>+		if (list_empty(send_pkt_list)) {
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>-		pkt = list_first_entry(&vsock->send_pkt_list,
>+		pkt = list_first_entry(send_pkt_list,
> 				       struct virtio_vsock_pkt, list);
> 		list_del_init(&pkt->list);
>-		spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_unlock_bh(lock);
>
> 		virtio_transport_deliver_tap_pkt(pkt);
>
>@@ -132,9 +146,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		 * the vq
> 		 */
> 		if (ret < 0) {
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>@@ -146,7 +160,7 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 			/* Do we now have resources to resume rx processing? */
> 			if (val + 1 == virtqueue_get_vring_size(rx_vq))
>-				restart_rx = true;
>+				*restart_rx = true;
> 		}
>
> 		added = true;
>@@ -154,7 +168,55 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 	if (added)
> 		virtqueue_kick(vq);
>+}
>
>+static int virtio_transport_do_send_dgram_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq, struct virtio_vsock_pkt *pkt)
>+{
>+	struct scatterlist hdr, buf, *sgs[2];
>+	int ret, in_sg = 0, out_sg = 0;
>+
>+	virtio_transport_deliver_tap_pkt(pkt);
>+
>+	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>+	sgs[out_sg++] = &hdr;
>+	if (pkt->buf) {
>+		sg_init_one(&buf, pkt->buf, pkt->len);
>+		sgs[out_sg++] = &buf;
>+	}
>+
>+	ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL);
>+	/* Usually this means that there is no more space available in
>+	 * the vq
>+	 */
>+	if (ret < 0) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENOMEM;
>+	}
>+
>+	virtqueue_kick(vq);
>+
>+	return pkt->len;
>+}
>+
>+
>+static void
>+virtio_transport_send_pkt_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, send_pkt_work);
>+	struct virtqueue *vq;
>+	bool restart_rx = false;
>+
>+	mutex_lock(&vsock->tx_lock);
>+
>+	if (!vsock->tx_run)
>+		goto out;
>+
>+	vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+	virtio_transport_do_send_pkt(vsock, vq, &vsock->send_pkt_list_lock,
>+							&vsock->send_pkt_list, &restart_rx);
> out:
> 	mutex_unlock(&vsock->tx_lock);
>
>@@ -163,11 +225,64 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> }
>
> static int
>+virtio_transport_send_dgram_pkt(struct virtio_vsock_pkt *pkt)
>+{
>+	struct virtio_vsock *vsock;
>+	int len = pkt->len;
>+	struct virtqueue *vq;
>+
>+	vsock = the_virtio_vsock_dgram;
>+
>+	if (!vsock) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!vsock->dgram_tx_run) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!refcount_inc_not_zero(&vsock->active)) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
>+		virtio_transport_free_pkt(pkt);
>+		len = -ENODEV;
>+		goto out_ref;
>+	}
>+
>+	/* send the pkt */
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out_mutex;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+
>+	len = virtio_transport_do_send_dgram_pkt(vsock, vq, pkt);
>+
>+out_mutex:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
>+out_ref:
>+	if (!refcount_dec_not_one(&vsock->active))
>+		return -EFAULT;
>+
>+	return len;
>+}
>+
>+static int
> virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> {
> 	struct virtio_vsock *vsock;
> 	int len = pkt->len;
>
>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
>+		return virtio_transport_send_dgram_pkt(pkt);
>+
> 	rcu_read_lock();
> 	vsock = rcu_dereference(the_virtio_vsock);
> 	if (!vsock) {
>@@ -243,7 +358,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> 	return ret;
> }
>
>-static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> {
> 	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> 	struct virtio_vsock_pkt *pkt;
>@@ -251,7 +366,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 	struct virtqueue *vq;
> 	int ret;
>
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>+	if (is_dgram)
>+		vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+	else
>+		vq = vsock->vqs[VSOCK_VQ_RX];
>
> 	do {
> 		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
>@@ -277,10 +395,19 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 			virtio_transport_free_pkt(pkt);
> 			break;
> 		}
>-		vsock->rx_buf_nr++;
>+		if (is_dgram)
>+			vsock->dgram_rx_buf_nr++;
>+		else
>+			vsock->rx_buf_nr++;
> 	} while (vq->num_free);
>-	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>-		vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	if (is_dgram) {
>+		if (vsock->dgram_rx_buf_nr > vsock->dgram_rx_buf_max_nr)
>+			vsock->dgram_rx_buf_max_nr = vsock->dgram_rx_buf_nr;
>+	} else {
>+		if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>+			vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	}
>+
> 	virtqueue_kick(vq);
> }
>
>@@ -315,6 +442,34 @@ static void virtio_transport_tx_work(struct work_struct *work)
> 		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
> }
>
>+static void virtio_transport_dgram_tx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_tx_work);
>+	struct virtqueue *vq;
>+	bool added = false;

`added` is set but never read.

>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out;
>+
>+	do {
>+		struct virtio_vsock_pkt *pkt;
>+		unsigned int len;
>+
>+		virtqueue_disable_cb(vq);
>+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
>+			virtio_transport_free_pkt(pkt);
>+			added = true;
>+		}
>+	} while (!virtqueue_enable_cb(vq));

This loop seems the same as the one in virtio_transport_tx_work(); maybe 
we can create a helper.

>+
>+out:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+}
>+
> /* Is there space left for replies to rx packets? */
> static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
> {
>@@ -449,6 +604,11 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
> {
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>+
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_tx_work);
> }
>
> static void virtio_vsock_rx_done(struct virtqueue *vq)
>@@ -462,8 +622,12 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
> {
>-}
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_rx_work);
>+}
> static struct virtio_transport virtio_transport = {
> 	.transport = {
> 		.module                   = THIS_MODULE,
>@@ -506,19 +670,9 @@ static struct virtio_transport virtio_transport = {
> 	.send_pkt = virtio_transport_send_pkt,
> };
>
>-static void virtio_transport_rx_work(struct work_struct *work)
>+static void virtio_transport_do_rx_work(struct virtio_vsock *vsock,
>+						struct virtqueue *vq, bool is_dgram)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, rx_work);
>-	struct virtqueue *vq;
>-
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>-
>-	mutex_lock(&vsock->rx_lock);
>-
>-	if (!vsock->rx_run)
>-		goto out;
>-
> 	do {
> 		virtqueue_disable_cb(vq);
> 		for (;;) {
>@@ -538,7 +692,10 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 				break;
> 			}
>
>-			vsock->rx_buf_nr--;
>+			if (is_dgram)
>+				vsock->dgram_rx_buf_nr--;
>+			else
>+				vsock->rx_buf_nr--;
>
> 			/* Drop short/long packets */
> 			if (unlikely(len < sizeof(pkt->hdr) ||
>@@ -554,11 +711,45 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 	} while (!virtqueue_enable_cb(vq));
>
> out:
>+	return;
>+}
>+
>+static void virtio_transport_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_RX];
>+
>+	mutex_lock(&vsock->rx_lock);
>+
>+	if (vsock->rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, false);
>+
> 	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
>-		virtio_vsock_rx_fill(vsock);
>+		virtio_vsock_rx_fill(vsock, false);
> 	mutex_unlock(&vsock->rx_lock);
> }
>
>+static void virtio_transport_dgram_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+
>+	mutex_lock(&vsock->dgram_rx_lock);
>+
>+	if (vsock->dgram_rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, true);
>+
>+	if (vsock->dgram_rx_buf_nr < vsock->dgram_rx_buf_max_nr / 2)
>+		virtio_vsock_rx_fill(vsock, true);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+}
>+
> static int virtio_vsock_probe(struct virtio_device *vdev)
> {
> 	vq_callback_t *callbacks[] = {
>@@ -642,8 +833,14 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vsock->rx_buf_max_nr = 0;
> 	atomic_set(&vsock->queued_replies, 0);
>
>+	vsock->dgram_rx_buf_nr = 0;
>+	vsock->dgram_rx_buf_max_nr = 0;
>+	atomic_set(&vsock->dgram_queued_replies, 0);
>+
> 	mutex_init(&vsock->tx_lock);
> 	mutex_init(&vsock->rx_lock);
>+	mutex_init(&vsock->dgram_tx_lock);
>+	mutex_init(&vsock->dgram_rx_lock);
> 	mutex_init(&vsock->event_lock);
> 	spin_lock_init(&vsock->send_pkt_list_lock);
> 	INIT_LIST_HEAD(&vsock->send_pkt_list);
>@@ -651,16 +848,27 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
> 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
> 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
>+	INIT_WORK(&vsock->dgram_rx_work, virtio_transport_dgram_rx_work);
>+	INIT_WORK(&vsock->dgram_tx_work, virtio_transport_dgram_tx_work);
>
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = true;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = true;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->rx_lock);
>-	virtio_vsock_rx_fill(vsock);
>+	virtio_vsock_rx_fill(vsock, false);
> 	vsock->rx_run = true;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	virtio_vsock_rx_fill(vsock, true);
>+	vsock->dgram_rx_run = true;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	virtio_vsock_event_fill(vsock);
> 	vsock->event_run = true;
>@@ -669,6 +877,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vdev->priv = vsock;
> 	rcu_assign_pointer(the_virtio_vsock, vsock);
>
>+	the_virtio_vsock_dgram = vsock;
>+	refcount_set(&the_virtio_vsock_dgram->active, 1);
>+
> 	mutex_unlock(&the_virtio_vsock_mutex);
> 	return 0;
>
>@@ -699,14 +910,28 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	vsock->rx_run = false;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	vsock->dgram_rx_run = false;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = false;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = false;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	vsock->event_run = false;
> 	mutex_unlock(&vsock->event_lock);
>
>+	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
>+		if (signal_pending(current))
>+			break;
>+		msleep(5);

Why is the sleep needed?

If it is really needed, we should put a comment here explaining the reason.
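
For example, assuming the intent is to wait until all in-flight dgram
senders that took a reference in virtio_transport_send_dgram_pkt() have
dropped it, something like:

	/* Wait for concurrent dgram senders that grabbed a reference
	 * via refcount_inc_not_zero() to release it, so the device is
	 * no longer in use before we delete the virtqueues.
	 */
	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
		if (signal_pending(current))
			break;
		msleep(5);
	}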

>+	}
>+
> 	/* Flush all device writes and interrupts, device will not use any
> 	 * more buffers.
> 	 */
>@@ -717,11 +942,21 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_RX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_TX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	spin_lock_bh(&vsock->send_pkt_list_lock);
> 	while (!list_empty(&vsock->send_pkt_list)) {
> 		pkt = list_first_entry(&vsock->send_pkt_list,
>@@ -739,6 +974,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	 */
> 	flush_work(&vsock->rx_work);
> 	flush_work(&vsock->tx_work);
>+	flush_work(&vsock->dgram_rx_work);
>+	flush_work(&vsock->dgram_tx_work);
> 	flush_work(&vsock->event_work);
> 	flush_work(&vsock->send_pkt_work);
>
>@@ -775,7 +1012,7 @@ static int __init virtio_vsock_init(void)
> 		return -ENOMEM;
>
> 	ret = vsock_core_register(&virtio_transport.transport,
>-				  VSOCK_TRANSPORT_F_G2H);
>+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);

I saw multi-transport in the TODO list :-)

We need to find a way to handle that; let me know if you want to discuss
it further.

> 	if (ret)
> 		goto out_wq;
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 902cb6dd710b..9f041515b7f1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -26,6 +26,8 @@
> /* Threshold for detecting small packets to copy */
> #define GOOD_COPY_LEN  128
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk);
>+
> static const struct virtio_transport *
> virtio_transport_get_ops(struct vsock_sock *vsk)
> {
>@@ -196,21 +198,28 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	vvs = vsk->trans;
>
> 	/* we can send less than pkt_len bytes */
>-	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>-		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+		else
>+			return 0;
>+	}
>
>-	/* virtio_transport_get_credit might return less than pkt_len credit */
>-	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>+	if (info->type == VIRTIO_VSOCK_TYPE_STREAM) {
>+		/* virtio_transport_get_credit might return less than pkt_len credit */
>+		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>
>-	/* Do not send zero length OP_RW pkt */
>-	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>-		return pkt_len;
>+		/* Do not send zero length OP_RW pkt */
>+		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>+			return pkt_len;
>+	}
>
> 	pkt = virtio_transport_alloc_pkt(info, pkt_len,
> 					 src_cid, src_port,
> 					 dst_cid, dst_port);
> 	if (!pkt) {
>-		virtio_transport_put_credit(vvs, pkt_len);
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			virtio_transport_put_credit(vvs, pkt_len);
> 		return -ENOMEM;
> 	}
>
>@@ -397,6 +406,58 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	return err;
> }
>
>+static ssize_t
>+virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
>+						   struct msghdr *msg, size_t len)
>+{
>+	struct virtio_vsock_sock *vvs = vsk->trans;
>+	struct virtio_vsock_pkt *pkt;
>+	size_t total = 0;
>+	u32 free_space;
>+	int err = -EFAULT;
>+
>+	spin_lock_bh(&vvs->rx_lock);
>+	if (total < len && !list_empty(&vvs->rx_queue)) {
>+		pkt = list_first_entry(&vvs->rx_queue,
>+				       struct virtio_vsock_pkt, list);
>+
>+		total = len;
>+		if (total > pkt->len - pkt->off)
>+			total = pkt->len - pkt->off;
>+		else if (total < pkt->len - pkt->off)
>+			msg->msg_flags |= MSG_TRUNC;
>+
>+		/* sk_lock is held by caller so no one else can dequeue.
>+		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>+		 */
>+		spin_unlock_bh(&vvs->rx_lock);
>+
>+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
>+		if (err)
>+			return err;
>+
>+		spin_lock_bh(&vvs->rx_lock);
>+
>+		virtio_transport_dec_rx_pkt(vvs, pkt);
>+		list_del(&pkt->list);
>+		virtio_transport_free_pkt(pkt);
>+	}
>+
>+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>+
>+	spin_unlock_bh(&vvs->rx_lock);
>+
>+	if (total > 0 && msg->msg_name) {
>+		/* Provide the address of the sender. */
>+		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>+
>+		vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
>+						le32_to_cpu(pkt->hdr.src_port));
>+		msg->msg_namelen = sizeof(*vm_addr);
>+	}
>+	return total;
>+}
>+
> ssize_t
> virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> 				struct msghdr *msg,
>@@ -414,7 +475,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t len, int flags)
> {
>-	return -EOPNOTSUPP;
>+	struct sock *sk;
>+	size_t err = 0;
>+	long timeout;
>+
>+	DEFINE_WAIT(wait);
>+
>+	sk = &vsk->sk;
>+	err = 0;
>+
>+	lock_sock(sk);
>+
>+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
>+		return -EOPNOTSUPP;
>+
>+	if (!len)
>+		goto out;
>+
>+	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>+
>+	while (1) {
>+		s64 ready;
>+
>+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>+		ready = virtio_transport_dgram_has_data(vsk);
>+
>+		if (ready == 0) {
>+			if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+
>+			release_sock(sk);
>+			timeout = schedule_timeout(timeout);
>+			lock_sock(sk);
>+
>+			if (signal_pending(current)) {
>+				err = sock_intr_errno(timeout);
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			} else if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+		} else {
>+			finish_wait(sk_sleep(sk), &wait);
>+
>+			if (ready < 0) {
>+				err = -ENOMEM;
>+				goto out;
>+			}
>+
>+			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
>+			break;
>+		}
>+	}
>+out:
>+	release_sock(sk);
>+	return err;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>
>@@ -431,6 +551,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
>+{
>+	return virtio_transport_stream_has_data(vsk);
>+}
>+
> static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> {
> 	struct virtio_vsock_sock *vvs = vsk->trans;
>@@ -610,13 +735,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> 				struct sockaddr_vm *addr)
> {
>-	return -EOPNOTSUPP;
>+	//use same stream bind for dgram
>+	int ret = vsock_bind_stream(vsk, addr);
>+	return ret;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>
> bool virtio_transport_dgram_allow(u32 cid, u32 port)
> {
>-	return false;
>+	return true;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>
>@@ -654,7 +781,17 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t dgram_len)
> {
>-	return -EOPNOTSUPP;
>+	struct virtio_vsock_pkt_info info = {
>+		.op = VIRTIO_VSOCK_OP_RW,
>+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
>+		.msg = msg,
>+		.pkt_len = dgram_len,
>+		.vsk = vsk,
>+		.remote_cid = remote_addr->svm_cid,
>+		.remote_port = remote_addr->svm_port,
>+	};
>+
>+	return virtio_transport_send_pkt_info(vsk, &info);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
>@@ -729,7 +866,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> 		virtio_transport_free_pkt(reply);
> 		return -ENOTCONN;
> 	}
>-
> 	return t->send_pkt(reply);
> }
>
>@@ -925,7 +1061,8 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> 		/* If there is space in the last packet queued, we copy the
> 		 * new packet in its buffer.
> 		 */
>-		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
>+		if (pkt->len <= last_pkt->buf_len - last_pkt->len &&
>+			pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
> 			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> 			       pkt->len);
> 			last_pkt->len += pkt->len;
>@@ -949,6 +1086,12 @@ virtio_transport_recv_connected(struct sock *sk,
> 	struct vsock_sock *vsk = vsock_sk(sk);
> 	int err = 0;
>
>+	if (le16_to_cpu(pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)) {
>+		virtio_transport_recv_enqueue(vsk, pkt);
>+		sk->sk_data_ready(sk);
>+		return err;
>+	}
>+
> 	switch (le16_to_cpu(pkt->hdr.op)) {
> 	case VIRTIO_VSOCK_OP_RW:
> 		virtio_transport_recv_enqueue(vsk, pkt);
>@@ -1121,7 +1264,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 					le32_to_cpu(pkt->hdr.buf_alloc),
> 					le32_to_cpu(pkt->hdr.fwd_cnt));
>
>-	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
>+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM &&
>+		le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_DGRAM) {
> 		(void)virtio_transport_reset_no_sock(t, pkt);
> 		goto free_pkt;
> 	}
>@@ -1150,11 +1294,16 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		goto free_pkt;
> 	}
>
>-	space_available = virtio_transport_space_update(sk, pkt);
>-
> 	/* Update CID in case it has changed after a transport reset event */
> 	vsk->local_addr.svm_cid = dst.svm_cid;
>
>+	if (sk->sk_type == SOCK_DGRAM) {
>+		virtio_transport_recv_connected(sk, pkt);
>+		goto out;
>+	}
>+
>+	space_available = virtio_transport_space_update(sk, pkt);
>+
> 	if (space_available)
> 		sk->sk_write_space(sk);
>
>@@ -1180,6 +1329,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		break;
> 	}
>
>+out:
> 	release_sock(sk);
>
> 	/* Release refcnt obtained when we fetched this socket out of the
>-- 
>2.11.0
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 5/6] vhost/vsock: add kconfig for vhost dgram support
  2021-06-09 23:24   ` Jiang Wang
@ 2021-06-18  9:54     ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:54 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Andra Paraschiv, Norbert Slusarek, Colin Ian King, Lu Wei,
	Alexander Popov, kvm, netdev, linux-kernel

On Wed, Jun 09, 2021 at 11:24:57PM +0000, Jiang Wang wrote:
>Also change the number of vqs according to the config.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> drivers/vhost/Kconfig |  8 ++++++++
> drivers/vhost/vsock.c | 11 ++++++++---
> 2 files changed, 16 insertions(+), 3 deletions(-)

As we already discussed, I think we don't need this patch.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 5/6] vhost/vsock: add kconfig for vhost dgram support
@ 2021-06-18  9:54     ` Stefano Garzarella
  0 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18  9:54 UTC (permalink / raw)
  To: Jiang Wang
  Cc: cong.wang, Andra Paraschiv, kvm, mst, virtualization,
	Norbert Slusarek, jhansen, duanxiongchun, xieyongji, Ingo Molnar,
	Jakub Kicinski, Alexander Popov, Steven Rostedt, chaiwen.cc,
	stefanha, netdev, linux-kernel, Lu Wei, Colin Ian King,
	arseny.krasnov, David S. Miller

On Wed, Jun 09, 2021 at 11:24:57PM +0000, Jiang Wang wrote:
>Also change the number of vqs according to the config.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> drivers/vhost/Kconfig |  8 ++++++++
> drivers/vhost/vsock.c | 11 ++++++++---
> 2 files changed, 16 insertions(+), 3 deletions(-)

As we already discussed, I think we don't need this patch.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 6/6] virtio/vsock: add sysfs for rx buf len for dgram
  2021-06-09 23:24   ` Jiang Wang
@ 2021-06-18 10:04     ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18 10:04 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Norbert Slusarek, Andra Paraschiv, Lu Wei,
	Alexander Popov, kvm, netdev, linux-kernel

On Wed, Jun 09, 2021 at 11:24:58PM +0000, Jiang Wang wrote:
>Make rx buf len configurable via sysfs
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> net/vmw_vsock/virtio_transport.c | 37 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 35 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index cf47aadb0c34..2e4dd9c48472 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -29,6 +29,14 @@ static struct virtio_vsock __rcu *the_virtio_vsock;
> static struct virtio_vsock *the_virtio_vsock_dgram;
> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
>+static int rx_buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>+static struct kobject *kobj_ref;
>+static ssize_t  sysfs_show(struct kobject *kobj,
>+			struct kobj_attribute *attr, char *buf);
>+static ssize_t  sysfs_store(struct kobject *kobj,
>+			struct kobj_attribute *attr, const char *buf, size_t count);
>+static struct kobj_attribute rxbuf_attr = __ATTR(rx_buf_value, 0660, sysfs_show, sysfs_store);

Maybe better to use a 'dgram' prefix.

>+
> struct virtio_vsock {
> 	struct virtio_device *vdev;
> 	struct virtqueue **vqs;
>@@ -360,7 +368,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>
> static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> {
>-	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>+	int buf_len = rx_buf_len;
> 	struct virtio_vsock_pkt *pkt;
> 	struct scatterlist hdr, buf, *sgs[2];
> 	struct virtqueue *vq;
>@@ -1003,6 +1011,22 @@ static struct virtio_driver virtio_vsock_driver = {
> 	.remove = virtio_vsock_remove,
> };
>
>+static ssize_t sysfs_show(struct kobject *kobj,
>+		struct kobj_attribute *attr, char *buf)
>+{
>+	return sprintf(buf, "%d", rx_buf_len);
>+}
>+
>+static ssize_t sysfs_store(struct kobject *kobj,
>+		struct kobj_attribute *attr, const char *buf, size_t count)
>+{
>+	if (kstrtou32(buf, 0, &rx_buf_len) < 0)
>+		return -EINVAL;
>+	if (rx_buf_len < 1024)
>+		rx_buf_len = 1024;
>+	return count;
>+}
>+
> static int __init virtio_vsock_init(void)
> {
> 	int ret;
>@@ -1020,8 +1044,17 @@ static int __init virtio_vsock_init(void)
> 	if (ret)
> 		goto out_vci;
>
>-	return 0;
>+	kobj_ref = kobject_create_and_add("vsock", kernel_kobj);

So, IIUC, the path will be /sys/kernel/vsock/rx_buf_value?

I'm not sure if we need to add a `virtio` subdir (e.g.
/sys/kernel/vsock/virtio/dgram_rx_buf_size).
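
Just to sketch what I mean (all names below are only placeholders, and
error handling is omitted):

	static struct kobj_attribute dgram_rxbuf_attr =
		__ATTR(dgram_rx_buf_size, 0660, sysfs_show, sysfs_store);

	/* in virtio_vsock_init() */
	vsock_kobj = kobject_create_and_add("vsock", kernel_kobj);
	virtio_kobj = kobject_create_and_add("virtio", vsock_kobj);
	ret = sysfs_create_file(virtio_kobj, &dgram_rxbuf_attr.attr);

which would end up as /sys/kernel/vsock/virtio/dgram_rx_buf_size.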

Thanks,
Stefano

>
>+	/*Creating sysfs file for etx_value*/
>+	ret = sysfs_create_file(kobj_ref, &rxbuf_attr.attr);
>+	if (ret)
>+		goto out_sysfs;
>+
>+	return 0;
>+out_sysfs:
>+	kobject_put(kobj_ref);
>+	sysfs_remove_file(kernel_kobj, &rxbuf_attr.attr);
> out_vci:
> 	vsock_core_unregister(&virtio_transport.transport);
> out_wq:
>-- 
>2.11.0
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 6/6] virtio/vsock: add sysfs for rx buf len for dgram
@ 2021-06-18 10:04     ` Stefano Garzarella
  0 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18 10:04 UTC (permalink / raw)
  To: Jiang Wang
  Cc: cong.wang, Andra Paraschiv, kvm, mst, virtualization,
	Norbert Slusarek, jhansen, duanxiongchun, xieyongji, Ingo Molnar,
	Jakub Kicinski, Alexander Popov, Steven Rostedt, chaiwen.cc,
	stefanha, netdev, linux-kernel, Lu Wei, Colin Ian King,
	arseny.krasnov, David S. Miller

On Wed, Jun 09, 2021 at 11:24:58PM +0000, Jiang Wang wrote:
>Make rx buf len configurable via sysfs
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> net/vmw_vsock/virtio_transport.c | 37 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 35 insertions(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index cf47aadb0c34..2e4dd9c48472 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -29,6 +29,14 @@ static struct virtio_vsock __rcu *the_virtio_vsock;
> static struct virtio_vsock *the_virtio_vsock_dgram;
> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
>+static int rx_buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>+static struct kobject *kobj_ref;
>+static ssize_t  sysfs_show(struct kobject *kobj,
>+			struct kobj_attribute *attr, char *buf);
>+static ssize_t  sysfs_store(struct kobject *kobj,
>+			struct kobj_attribute *attr, const char *buf, size_t count);
>+static struct kobj_attribute rxbuf_attr = __ATTR(rx_buf_value, 0660, sysfs_show, sysfs_store);

Maybe better to use a 'dgram' prefix.

>+
> struct virtio_vsock {
> 	struct virtio_device *vdev;
> 	struct virtqueue **vqs;
>@@ -360,7 +368,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>
> static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> {
>-	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>+	int buf_len = rx_buf_len;
> 	struct virtio_vsock_pkt *pkt;
> 	struct scatterlist hdr, buf, *sgs[2];
> 	struct virtqueue *vq;
>@@ -1003,6 +1011,22 @@ static struct virtio_driver virtio_vsock_driver = {
> 	.remove = virtio_vsock_remove,
> };
>
>+static ssize_t sysfs_show(struct kobject *kobj,
>+		struct kobj_attribute *attr, char *buf)
>+{
>+	return sprintf(buf, "%d", rx_buf_len);
>+}
>+
>+static ssize_t sysfs_store(struct kobject *kobj,
>+		struct kobj_attribute *attr, const char *buf, size_t count)
>+{
>+	if (kstrtou32(buf, 0, &rx_buf_len) < 0)
>+		return -EINVAL;
>+	if (rx_buf_len < 1024)
>+		rx_buf_len = 1024;
>+	return count;
>+}
>+
> static int __init virtio_vsock_init(void)
> {
> 	int ret;
>@@ -1020,8 +1044,17 @@ static int __init virtio_vsock_init(void)
> 	if (ret)
> 		goto out_vci;
>
>-	return 0;
>+	kobj_ref = kobject_create_and_add("vsock", kernel_kobj);

So, IIUC, the path will be /sys/kernel/vsock/rx_buf_value?

I'm not sure if we need to add a `virtio` subdir (e.g.
/sys/kernel/vsock/virtio/dgram_rx_buf_size).
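
Just to sketch what I mean (all names below are only placeholders, and
error handling is omitted):

	static struct kobj_attribute dgram_rxbuf_attr =
		__ATTR(dgram_rx_buf_size, 0660, sysfs_show, sysfs_store);

	/* in virtio_vsock_init() */
	vsock_kobj = kobject_create_and_add("vsock", kernel_kobj);
	virtio_kobj = kobject_create_and_add("virtio", vsock_kobj);
	ret = sysfs_create_file(virtio_kobj, &dgram_rxbuf_attr.attr);

which would end up as /sys/kernel/vsock/virtio/dgram_rx_buf_size.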

Thanks,
Stefano

>
>+	/*Creating sysfs file for etx_value*/
>+	ret = sysfs_create_file(kobj_ref, &rxbuf_attr.attr);
>+	if (ret)
>+		goto out_sysfs;
>+
>+	return 0;
>+out_sysfs:
>+	kobject_put(kobj_ref);
>+	sysfs_remove_file(kernel_kobj, &rxbuf_attr.attr);
> out_vci:
> 	vsock_core_unregister(&virtio_transport.transport);
> out_wq:
>-- 
>2.11.0
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
  2021-06-09 23:24   ` Jiang Wang
@ 2021-06-18 10:11     ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18 10:11 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Andra Paraschiv, Norbert Slusarek,
	Alexander Popov, kvm, netdev, linux-kernel

On Wed, Jun 09, 2021 at 11:24:54PM +0000, Jiang Wang wrote:
>This patch adds support for virtio dgram in the driver.
>It implements the related functions for tx and rx, enqueue
>and dequeue. Packets are sent synchronously to give the sender
>an indication when the virtqueue is full.
>Refactored virtio_transport_send_pkt_work() a little bit, but
>with no functional changes.
>
>Support for the host/device side is in another
>patch.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> include/net/af_vsock.h                             |   1 +
> .../trace/events/vsock_virtio_transport_common.h   |   5 +-
> include/uapi/linux/virtio_vsock.h                  |   1 +
> net/vmw_vsock/af_vsock.c                           |  12 +
> net/vmw_vsock/virtio_transport.c                   | 325 ++++++++++++++++++---
> net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++++--
> 6 files changed, 466 insertions(+), 62 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index b1c717286993..fcae7bca9609 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -200,6 +200,7 @@ void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+int vsock_bind_stream(struct vsock_sock *vsk, struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
>diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
>index 6782213778be..b1be25b327a1 100644
>--- a/include/trace/events/vsock_virtio_transport_common.h
>+++ b/include/trace/events/vsock_virtio_transport_common.h
>@@ -9,9 +9,12 @@
> #include <linux/tracepoint.h>
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
>+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_DGRAM);
>
> #define show_type(val) \
>-	__print_symbolic(val, { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" })
>+	 __print_symbolic(val, \
>+					{ VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \
>+					{ VIRTIO_VSOCK_TYPE_DGRAM, "DGRAM" })
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID);
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST);
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index b56614dff1c9..5503585b26e8 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -68,6 +68,7 @@ struct virtio_vsock_hdr {
>
> enum virtio_vsock_type {
> 	VIRTIO_VSOCK_TYPE_STREAM = 1,
>+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 92a72f0e0d94..c1f512291b94 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -659,6 +659,18 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
> 	return 0;
> }
>
>+int vsock_bind_stream(struct vsock_sock *vsk,
>+				       struct sockaddr_vm *addr)
>+{
>+	int retval;
>+
>+	spin_lock_bh(&vsock_table_lock);
>+	retval = __vsock_bind_stream(vsk, addr);
>+	spin_unlock_bh(&vsock_table_lock);
>+	return retval;
>+}
>+EXPORT_SYMBOL(vsock_bind_stream);
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> 			      struct sockaddr_vm *addr)
> {
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 7dcb8db23305..cf47aadb0c34 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -20,21 +20,29 @@
> #include <net/sock.h>
> #include <linux/mutex.h>
> #include <net/af_vsock.h>
>+#include<linux/kobject.h>
>+#include<linux/sysfs.h>
>+#include <linux/refcount.h>
>
> static struct workqueue_struct *virtio_vsock_workqueue;
> static struct virtio_vsock __rcu *the_virtio_vsock;
>+static struct virtio_vsock *the_virtio_vsock_dgram;
> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
> struct virtio_vsock {
> 	struct virtio_device *vdev;
> 	struct virtqueue **vqs;
> 	bool has_dgram;
>+	refcount_t active;
>
> 	/* Virtqueue processing is deferred to a workqueue */
> 	struct work_struct tx_work;
> 	struct work_struct rx_work;
> 	struct work_struct event_work;
>
>+	struct work_struct dgram_tx_work;
>+	struct work_struct dgram_rx_work;
>+
> 	/* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
> 	 * must be accessed with tx_lock held.
> 	 */
>@@ -55,6 +63,22 @@ struct virtio_vsock {
> 	int rx_buf_nr;
> 	int rx_buf_max_nr;
>
>+	/* The following fields are protected by dgram_tx_lock.  vqs[VSOCK_VQ_DGRAM_TX]
>+	 * must be accessed with dgram_tx_lock held.
>+	 */
>+	struct mutex dgram_tx_lock;
>+	bool dgram_tx_run;
>+
>+	atomic_t dgram_queued_replies;
>+
>+	/* The following fields are protected by dgram_rx_lock.  vqs[VSOCK_VQ_DGRAM_RX]
>+	 * must be accessed with dgram_rx_lock held.
>+	 */
>+	struct mutex dgram_rx_lock;
>+	bool dgram_rx_run;
>+	int dgram_rx_buf_nr;
>+	int dgram_rx_buf_max_nr;
>+
> 	/* The following fields are protected by event_lock.
> 	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
> 	 */
>@@ -83,21 +107,11 @@ static u32 virtio_transport_get_local_cid(void)
> 	return ret;
> }
>
>-static void
>-virtio_transport_send_pkt_work(struct work_struct *work)
>+static void virtio_transport_do_send_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq,  spinlock_t *lock, struct list_head *send_pkt_list,
>+		bool *restart_rx)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, send_pkt_work);
>-	struct virtqueue *vq;
> 	bool added = false;
>-	bool restart_rx = false;
>-
>-	mutex_lock(&vsock->tx_lock);
>-
>-	if (!vsock->tx_run)
>-		goto out;
>-
>-	vq = vsock->vqs[VSOCK_VQ_TX];
>
> 	for (;;) {
> 		struct virtio_vsock_pkt *pkt;
>@@ -105,16 +119,16 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		int ret, in_sg = 0, out_sg = 0;
> 		bool reply;
>
>-		spin_lock_bh(&vsock->send_pkt_list_lock);
>-		if (list_empty(&vsock->send_pkt_list)) {
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_lock_bh(lock);
>+		if (list_empty(send_pkt_list)) {
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>-		pkt = list_first_entry(&vsock->send_pkt_list,
>+		pkt = list_first_entry(send_pkt_list,
> 				       struct virtio_vsock_pkt, list);
> 		list_del_init(&pkt->list);
>-		spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_unlock_bh(lock);
>
> 		virtio_transport_deliver_tap_pkt(pkt);
>
>@@ -132,9 +146,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		 * the vq
> 		 */
> 		if (ret < 0) {
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>@@ -146,7 +160,7 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 			/* Do we now have resources to resume rx processing? */
> 			if (val + 1 == virtqueue_get_vring_size(rx_vq))
>-				restart_rx = true;
>+				*restart_rx = true;
> 		}
>
> 		added = true;
>@@ -154,7 +168,55 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 	if (added)
> 		virtqueue_kick(vq);
>+}
>
>+static int virtio_transport_do_send_dgram_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq, struct virtio_vsock_pkt *pkt)
>+{
>+	struct scatterlist hdr, buf, *sgs[2];
>+	int ret, in_sg = 0, out_sg = 0;
>+
>+	virtio_transport_deliver_tap_pkt(pkt);
>+
>+	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>+	sgs[out_sg++] = &hdr;
>+	if (pkt->buf) {
>+		sg_init_one(&buf, pkt->buf, pkt->len);
>+		sgs[out_sg++] = &buf;
>+	}
>+
>+	ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL);
>+	/* Usually this means that there is no more space available in
>+	 * the vq
>+	 */
>+	if (ret < 0) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENOMEM;
>+	}
>+
>+	virtqueue_kick(vq);
>+
>+	return pkt->len;
>+}
>+
>+
>+static void
>+virtio_transport_send_pkt_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, send_pkt_work);
>+	struct virtqueue *vq;
>+	bool restart_rx = false;
>+
>+	mutex_lock(&vsock->tx_lock);
>+
>+	if (!vsock->tx_run)
>+		goto out;
>+
>+	vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+	virtio_transport_do_send_pkt(vsock, vq, &vsock->send_pkt_list_lock,
>+							&vsock->send_pkt_list, &restart_rx);
> out:
> 	mutex_unlock(&vsock->tx_lock);
>
>@@ -163,11 +225,64 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> }
>
> static int
>+virtio_transport_send_dgram_pkt(struct virtio_vsock_pkt *pkt)
>+{
>+	struct virtio_vsock *vsock;
>+	int len = pkt->len;
>+	struct virtqueue *vq;
>+
>+	vsock = the_virtio_vsock_dgram;
>+
>+	if (!vsock) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!vsock->dgram_tx_run) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!refcount_inc_not_zero(&vsock->active)) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
>+		virtio_transport_free_pkt(pkt);
>+		len = -ENODEV;
>+		goto out_ref;
>+	}
>+
>+	/* send the pkt */
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out_mutex;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+
>+	len = virtio_transport_do_send_dgram_pkt(vsock, vq, pkt);
>+
>+out_mutex:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
>+out_ref:
>+	if (!refcount_dec_not_one(&vsock->active))
>+		return -EFAULT;
>+
>+	return len;
>+}
>+
>+static int
> virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> {
> 	struct virtio_vsock *vsock;
> 	int len = pkt->len;
>
>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
>+		return virtio_transport_send_dgram_pkt(pkt);
>+
> 	rcu_read_lock();
> 	vsock = rcu_dereference(the_virtio_vsock);
> 	if (!vsock) {
>@@ -243,7 +358,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> 	return ret;
> }
>
>-static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> {
> 	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> 	struct virtio_vsock_pkt *pkt;
>@@ -251,7 +366,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 	struct virtqueue *vq;
> 	int ret;
>
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>+	if (is_dgram)
>+		vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+	else
>+		vq = vsock->vqs[VSOCK_VQ_RX];
>
> 	do {
> 		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
>@@ -277,10 +395,19 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 			virtio_transport_free_pkt(pkt);
> 			break;
> 		}
>-		vsock->rx_buf_nr++;
>+		if (is_dgram)
>+			vsock->dgram_rx_buf_nr++;
>+		else
>+			vsock->rx_buf_nr++;
> 	} while (vq->num_free);
>-	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>-		vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	if (is_dgram) {
>+		if (vsock->dgram_rx_buf_nr > vsock->dgram_rx_buf_max_nr)
>+			vsock->dgram_rx_buf_max_nr = vsock->dgram_rx_buf_nr;
>+	} else {
>+		if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>+			vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	}
>+
> 	virtqueue_kick(vq);
> }
>
>@@ -315,6 +442,34 @@ static void virtio_transport_tx_work(struct work_struct *work)
> 		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
> }
>
>+static void virtio_transport_dgram_tx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_tx_work);
>+	struct virtqueue *vq;
>+	bool added = false;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out;
>+
>+	do {
>+		struct virtio_vsock_pkt *pkt;
>+		unsigned int len;
>+
>+		virtqueue_disable_cb(vq);
>+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
>+			virtio_transport_free_pkt(pkt);
>+			added = true;
>+		}
>+	} while (!virtqueue_enable_cb(vq));
>+
>+out:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+}
>+
> /* Is there space left for replies to rx packets? */
> static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
> {
>@@ -449,6 +604,11 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
> {
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>+
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_tx_work);
> }
>
> static void virtio_vsock_rx_done(struct virtqueue *vq)
>@@ -462,8 +622,12 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
> {
>-}
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_rx_work);
>+}
> static struct virtio_transport virtio_transport = {
> 	.transport = {
> 		.module                   = THIS_MODULE,
>@@ -506,19 +670,9 @@ static struct virtio_transport virtio_transport = {
> 	.send_pkt = virtio_transport_send_pkt,
> };
>
>-static void virtio_transport_rx_work(struct work_struct *work)
>+static void virtio_transport_do_rx_work(struct virtio_vsock *vsock,
>+						struct virtqueue *vq, bool is_dgram)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, rx_work);
>-	struct virtqueue *vq;
>-
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>-
>-	mutex_lock(&vsock->rx_lock);
>-
>-	if (!vsock->rx_run)
>-		goto out;
>-
> 	do {
> 		virtqueue_disable_cb(vq);
> 		for (;;) {
>@@ -538,7 +692,10 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 				break;
> 			}
>
>-			vsock->rx_buf_nr--;
>+			if (is_dgram)
>+				vsock->dgram_rx_buf_nr--;
>+			else
>+				vsock->rx_buf_nr--;
>
> 			/* Drop short/long packets */
> 			if (unlikely(len < sizeof(pkt->hdr) ||
>@@ -554,11 +711,45 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 	} while (!virtqueue_enable_cb(vq));
>
> out:
>+	return;
>+}
>+
>+static void virtio_transport_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_RX];
>+
>+	mutex_lock(&vsock->rx_lock);
>+
>+	if (vsock->rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, false);
>+
> 	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
>-		virtio_vsock_rx_fill(vsock);
>+		virtio_vsock_rx_fill(vsock, false);
> 	mutex_unlock(&vsock->rx_lock);
> }
>
>+static void virtio_transport_dgram_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+
>+	mutex_lock(&vsock->dgram_rx_lock);
>+
>+	if (vsock->dgram_rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, true);
>+
>+	if (vsock->dgram_rx_buf_nr < vsock->dgram_rx_buf_max_nr / 2)
>+		virtio_vsock_rx_fill(vsock, true);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+}
>+
> static int virtio_vsock_probe(struct virtio_device *vdev)
> {
> 	vq_callback_t *callbacks[] = {
>@@ -642,8 +833,14 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vsock->rx_buf_max_nr = 0;
> 	atomic_set(&vsock->queued_replies, 0);
>
>+	vsock->dgram_rx_buf_nr = 0;
>+	vsock->dgram_rx_buf_max_nr = 0;
>+	atomic_set(&vsock->dgram_queued_replies, 0);
>+
> 	mutex_init(&vsock->tx_lock);
> 	mutex_init(&vsock->rx_lock);
>+	mutex_init(&vsock->dgram_tx_lock);
>+	mutex_init(&vsock->dgram_rx_lock);
> 	mutex_init(&vsock->event_lock);
> 	spin_lock_init(&vsock->send_pkt_list_lock);
> 	INIT_LIST_HEAD(&vsock->send_pkt_list);
>@@ -651,16 +848,27 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
> 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
> 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
>+	INIT_WORK(&vsock->dgram_rx_work, virtio_transport_dgram_rx_work);
>+	INIT_WORK(&vsock->dgram_tx_work, virtio_transport_dgram_tx_work);
>
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = true;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = true;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->rx_lock);
>-	virtio_vsock_rx_fill(vsock);
>+	virtio_vsock_rx_fill(vsock, false);
> 	vsock->rx_run = true;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	virtio_vsock_rx_fill(vsock, true);
>+	vsock->dgram_rx_run = true;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	virtio_vsock_event_fill(vsock);
> 	vsock->event_run = true;
>@@ -669,6 +877,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vdev->priv = vsock;
> 	rcu_assign_pointer(the_virtio_vsock, vsock);
>
>+	the_virtio_vsock_dgram = vsock;
>+	refcount_set(&the_virtio_vsock_dgram->active, 1);
>+
> 	mutex_unlock(&the_virtio_vsock_mutex);
> 	return 0;
>
>@@ -699,14 +910,28 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	vsock->rx_run = false;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	vsock->dgram_rx_run = false;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = false;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = false;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	vsock->event_run = false;
> 	mutex_unlock(&vsock->event_lock);
>
>+	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
>+		if (signal_pending(current))
>+			break;
>+		msleep(5);
>+	}
>+
> 	/* Flush all device writes and interrupts, device will not use any
> 	 * more buffers.
> 	 */
>@@ -717,11 +942,21 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_RX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_TX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	spin_lock_bh(&vsock->send_pkt_list_lock);
> 	while (!list_empty(&vsock->send_pkt_list)) {
> 		pkt = list_first_entry(&vsock->send_pkt_list,
>@@ -739,6 +974,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	 */
> 	flush_work(&vsock->rx_work);
> 	flush_work(&vsock->tx_work);
>+	flush_work(&vsock->dgram_rx_work);
>+	flush_work(&vsock->dgram_tx_work);
> 	flush_work(&vsock->event_work);
> 	flush_work(&vsock->send_pkt_work);
>
>@@ -775,7 +1012,7 @@ static int __init virtio_vsock_init(void)
> 		return -ENOMEM;
>
> 	ret = vsock_core_register(&virtio_transport.transport,
>-				  VSOCK_TRANSPORT_F_G2H);
>+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
> 	if (ret)
> 		goto out_wq;
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 902cb6dd710b..9f041515b7f1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -26,6 +26,8 @@
> /* Threshold for detecting small packets to copy */
> #define GOOD_COPY_LEN  128
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk);
>+
> static const struct virtio_transport *
> virtio_transport_get_ops(struct vsock_sock *vsk)
> {
>@@ -196,21 +198,28 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	vvs = vsk->trans;
>
> 	/* we can send less than pkt_len bytes */
>-	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>-		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+		else
>+			return 0;
>+	}
>
>-	/* virtio_transport_get_credit might return less than pkt_len credit */
>-	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>+	if (info->type == VIRTIO_VSOCK_TYPE_STREAM) {
>+		/* virtio_transport_get_credit might return less than pkt_len credit */
>+		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>
>-	/* Do not send zero length OP_RW pkt */
>-	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>-		return pkt_len;
>+		/* Do not send zero length OP_RW pkt */
>+		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>+			return pkt_len;
>+	}
>
> 	pkt = virtio_transport_alloc_pkt(info, pkt_len,
> 					 src_cid, src_port,
> 					 dst_cid, dst_port);
> 	if (!pkt) {
>-		virtio_transport_put_credit(vvs, pkt_len);
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			virtio_transport_put_credit(vvs, pkt_len);
> 		return -ENOMEM;
> 	}
>
>@@ -397,6 +406,58 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	return err;
> }
>
>+static ssize_t
>+virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
>+						   struct msghdr *msg, size_t len)
>+{
>+	struct virtio_vsock_sock *vvs = vsk->trans;
>+	struct virtio_vsock_pkt *pkt;
>+	size_t total = 0;
>+	u32 free_space;

`free_space` seems unused.

>+	int err = -EFAULT;
>+
>+	spin_lock_bh(&vvs->rx_lock);
>+	if (total < len && !list_empty(&vvs->rx_queue)) {
>+		pkt = list_first_entry(&vvs->rx_queue,
>+				       struct virtio_vsock_pkt, list);
>+
>+		total = len;
>+		if (total > pkt->len - pkt->off)
>+			total = pkt->len - pkt->off;
>+		else if (total < pkt->len - pkt->off)
>+			msg->msg_flags |= MSG_TRUNC;
>+
>+		/* sk_lock is held by caller so no one else can dequeue.
>+		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>+		 */
>+		spin_unlock_bh(&vvs->rx_lock);
>+
>+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
>+		if (err)
>+			return err;
>+
>+		spin_lock_bh(&vvs->rx_lock);
>+
>+		virtio_transport_dec_rx_pkt(vvs, pkt);
>+		list_del(&pkt->list);
>+		virtio_transport_free_pkt(pkt);
>+	}
>+
>+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>+
>+	spin_unlock_bh(&vvs->rx_lock);
>+
>+	if (total > 0 && msg->msg_name) {
>+		/* Provide the address of the sender. */
>+		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>+
>+		vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
>+						le32_to_cpu(pkt->hdr.src_port));
>+		msg->msg_namelen = sizeof(*vm_addr);
>+	}
>+	return total;
>+}
>+
> ssize_t
> virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> 				struct msghdr *msg,
>@@ -414,7 +475,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t len, int flags)
> {
>-	return -EOPNOTSUPP;
>+	struct sock *sk;
>+	size_t err = 0;
>+	long timeout;
>+
>+	DEFINE_WAIT(wait);
>+
>+	sk = &vsk->sk;
>+	err = 0;
>+
>+	lock_sock(sk);
>+
>+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
>+		return -EOPNOTSUPP;
>+
>+	if (!len)
>+		goto out;
>+
>+	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>+
>+	while (1) {
>+		s64 ready;
>+
>+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>+		ready = virtio_transport_dgram_has_data(vsk);
>+
>+		if (ready == 0) {
>+			if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+
>+			release_sock(sk);
>+			timeout = schedule_timeout(timeout);
>+			lock_sock(sk);
>+
>+			if (signal_pending(current)) {
>+				err = sock_intr_errno(timeout);
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			} else if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+		} else {
>+			finish_wait(sk_sleep(sk), &wait);
>+
>+			if (ready < 0) {
>+				err = -ENOMEM;
>+				goto out;
>+			}
>+
>+			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
>+			break;
>+		}
>+	}
>+out:
>+	release_sock(sk);
>+	return err;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>
>@@ -431,6 +551,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
>+{
>+	return virtio_transport_stream_has_data(vsk);
>+}
>+
> static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> {
> 	struct virtio_vsock_sock *vvs = vsk->trans;
>@@ -610,13 +735,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> 				struct sockaddr_vm *addr)
> {
>-	return -EOPNOTSUPP;
>+	//use same stream bind for dgram
>+	int ret = vsock_bind_stream(vsk, addr);
>+	return ret;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>
> bool virtio_transport_dgram_allow(u32 cid, u32 port)
> {
>-	return false;
>+	return true;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>
>@@ -654,7 +781,17 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t dgram_len)
> {
>-	return -EOPNOTSUPP;
>+	struct virtio_vsock_pkt_info info = {
>+		.op = VIRTIO_VSOCK_OP_RW,
>+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
>+		.msg = msg,
>+		.pkt_len = dgram_len,
>+		.vsk = vsk,
>+		.remote_cid = remote_addr->svm_cid,
>+		.remote_port = remote_addr->svm_port,
>+	};
>+
>+	return virtio_transport_send_pkt_info(vsk, &info);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
>@@ -729,7 +866,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> 		virtio_transport_free_pkt(reply);
> 		return -ENOTCONN;
> 	}
>-
> 	return t->send_pkt(reply);
> }
>
>@@ -925,7 +1061,8 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> 		/* If there is space in the last packet queued, we copy the
> 		 * new packet in its buffer.
> 		 */
>-		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
>+		if (pkt->len <= last_pkt->buf_len - last_pkt->len &&
>+			pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {

We should use le16_to_cpu():
			le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM

> 			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> 			       pkt->len);
> 			last_pkt->len += pkt->len;
>@@ -949,6 +1086,12 @@ virtio_transport_recv_connected(struct sock *sk,
> 	struct vsock_sock *vsk = vsock_sk(sk);
> 	int err = 0;
>
>+	if (le16_to_cpu(pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)) {

We should use le16_to_cpu() before the compare:
	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM) {

>+		virtio_transport_recv_enqueue(vsk, pkt);
>+		sk->sk_data_ready(sk);
>+		return err;
>+	}
>+
> 	switch (le16_to_cpu(pkt->hdr.op)) {
> 	case VIRTIO_VSOCK_OP_RW:
> 		virtio_transport_recv_enqueue(vsk, pkt);
>@@ -1121,7 +1264,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 					le32_to_cpu(pkt->hdr.buf_alloc),
> 					le32_to_cpu(pkt->hdr.fwd_cnt));
>
>-	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
>+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM &&
>+		le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_DGRAM) {
> 		(void)virtio_transport_reset_no_sock(t, pkt);
> 		goto free_pkt;
> 	}
>@@ -1150,11 +1294,16 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		goto free_pkt;
> 	}
>
>-	space_available = virtio_transport_space_update(sk, pkt);
>-
> 	/* Update CID in case it has changed after a transport reset event */
> 	vsk->local_addr.svm_cid = dst.svm_cid;
>
>+	if (sk->sk_type == SOCK_DGRAM) {
>+		virtio_transport_recv_connected(sk, pkt);
>+		goto out;
>+	}
>+
>+	space_available = virtio_transport_space_update(sk, pkt);
>+
> 	if (space_available)
> 		sk->sk_write_space(sk);
>
>@@ -1180,6 +1329,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		break;
> 	}
>
>+out:
> 	release_sock(sk);
>
> 	/* Release refcnt obtained when we fetched this socket out of the
>-- 
>2.11.0
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 2/6] virtio/vsock: add support for virtio datagram
@ 2021-06-18 10:11     ` Stefano Garzarella
  0 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18 10:11 UTC (permalink / raw)
  To: Jiang Wang
  Cc: cong.wang, duanxiongchun, Ingo Molnar, kvm, mst, netdev,
	linux-kernel, Steven Rostedt, virtualization, xieyongji,
	chaiwen.cc, Norbert Slusarek, stefanha, Colin Ian King,
	Jakub Kicinski, arseny.krasnov, Alexander Popov, jhansen,
	David S. Miller, Andra Paraschiv

On Wed, Jun 09, 2021 at 11:24:54PM +0000, Jiang Wang wrote:
>This patch adds support for virtio dgram in the driver.
>It implements the related functions for tx and rx, enqueue
>and dequeue. Packets are sent synchronously to give the sender
>an indication when the virtqueue is full.
>Refactored virtio_transport_send_pkt_work() a little bit, but
>with no functional changes.
>
>Support for the host/device side is in another
>patch.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> include/net/af_vsock.h                             |   1 +
> .../trace/events/vsock_virtio_transport_common.h   |   5 +-
> include/uapi/linux/virtio_vsock.h                  |   1 +
> net/vmw_vsock/af_vsock.c                           |  12 +
> net/vmw_vsock/virtio_transport.c                   | 325 ++++++++++++++++++---
> net/vmw_vsock/virtio_transport_common.c            | 184 ++++++++++--
> 6 files changed, 466 insertions(+), 62 deletions(-)
>
>diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>index b1c717286993..fcae7bca9609 100644
>--- a/include/net/af_vsock.h
>+++ b/include/net/af_vsock.h
>@@ -200,6 +200,7 @@ void vsock_remove_sock(struct vsock_sock *vsk);
> void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
> int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk);
> bool vsock_find_cid(unsigned int cid);
>+int vsock_bind_stream(struct vsock_sock *vsk, struct sockaddr_vm *addr);
>
> /**** TAP ****/
>
>diff --git a/include/trace/events/vsock_virtio_transport_common.h b/include/trace/events/vsock_virtio_transport_common.h
>index 6782213778be..b1be25b327a1 100644
>--- a/include/trace/events/vsock_virtio_transport_common.h
>+++ b/include/trace/events/vsock_virtio_transport_common.h
>@@ -9,9 +9,12 @@
> #include <linux/tracepoint.h>
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_STREAM);
>+TRACE_DEFINE_ENUM(VIRTIO_VSOCK_TYPE_DGRAM);
>
> #define show_type(val) \
>-	__print_symbolic(val, { VIRTIO_VSOCK_TYPE_STREAM, "STREAM" })
>+	 __print_symbolic(val, \
>+					{ VIRTIO_VSOCK_TYPE_STREAM, "STREAM" }, \
>+					{ VIRTIO_VSOCK_TYPE_DGRAM, "DGRAM" })
>
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_INVALID);
> TRACE_DEFINE_ENUM(VIRTIO_VSOCK_OP_REQUEST);
>diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>index b56614dff1c9..5503585b26e8 100644
>--- a/include/uapi/linux/virtio_vsock.h
>+++ b/include/uapi/linux/virtio_vsock.h
>@@ -68,6 +68,7 @@ struct virtio_vsock_hdr {
>
> enum virtio_vsock_type {
> 	VIRTIO_VSOCK_TYPE_STREAM = 1,
>+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> };
>
> enum virtio_vsock_op {
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 92a72f0e0d94..c1f512291b94 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -659,6 +659,18 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
> 	return 0;
> }
>
>+int vsock_bind_stream(struct vsock_sock *vsk,
>+				       struct sockaddr_vm *addr)
>+{
>+	int retval;
>+
>+	spin_lock_bh(&vsock_table_lock);
>+	retval = __vsock_bind_stream(vsk, addr);
>+	spin_unlock_bh(&vsock_table_lock);
>+	return retval;
>+}
>+EXPORT_SYMBOL(vsock_bind_stream);
>+
> static int __vsock_bind_dgram(struct vsock_sock *vsk,
> 			      struct sockaddr_vm *addr)
> {
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 7dcb8db23305..cf47aadb0c34 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -20,21 +20,29 @@
> #include <net/sock.h>
> #include <linux/mutex.h>
> #include <net/af_vsock.h>
>+#include<linux/kobject.h>
>+#include<linux/sysfs.h>
>+#include <linux/refcount.h>
>
> static struct workqueue_struct *virtio_vsock_workqueue;
> static struct virtio_vsock __rcu *the_virtio_vsock;
>+static struct virtio_vsock *the_virtio_vsock_dgram;
> static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>
> struct virtio_vsock {
> 	struct virtio_device *vdev;
> 	struct virtqueue **vqs;
> 	bool has_dgram;
>+	refcount_t active;
>
> 	/* Virtqueue processing is deferred to a workqueue */
> 	struct work_struct tx_work;
> 	struct work_struct rx_work;
> 	struct work_struct event_work;
>
>+	struct work_struct dgram_tx_work;
>+	struct work_struct dgram_rx_work;
>+
> 	/* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
> 	 * must be accessed with tx_lock held.
> 	 */
>@@ -55,6 +63,22 @@ struct virtio_vsock {
> 	int rx_buf_nr;
> 	int rx_buf_max_nr;
>
>+	/* The following fields are protected by dgram_tx_lock.  vqs[VSOCK_VQ_DGRAM_TX]
>+	 * must be accessed with dgram_tx_lock held.
>+	 */
>+	struct mutex dgram_tx_lock;
>+	bool dgram_tx_run;
>+
>+	atomic_t dgram_queued_replies;
>+
>+	/* The following fields are protected by dgram_rx_lock.  vqs[VSOCK_VQ_DGRAM_RX]
>+	 * must be accessed with dgram_rx_lock held.
>+	 */
>+	struct mutex dgram_rx_lock;
>+	bool dgram_rx_run;
>+	int dgram_rx_buf_nr;
>+	int dgram_rx_buf_max_nr;
>+
> 	/* The following fields are protected by event_lock.
> 	 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
> 	 */
>@@ -83,21 +107,11 @@ static u32 virtio_transport_get_local_cid(void)
> 	return ret;
> }
>
>-static void
>-virtio_transport_send_pkt_work(struct work_struct *work)
>+static void virtio_transport_do_send_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq,  spinlock_t *lock, struct list_head *send_pkt_list,
>+		bool *restart_rx)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, send_pkt_work);
>-	struct virtqueue *vq;
> 	bool added = false;
>-	bool restart_rx = false;
>-
>-	mutex_lock(&vsock->tx_lock);
>-
>-	if (!vsock->tx_run)
>-		goto out;
>-
>-	vq = vsock->vqs[VSOCK_VQ_TX];
>
> 	for (;;) {
> 		struct virtio_vsock_pkt *pkt;
>@@ -105,16 +119,16 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		int ret, in_sg = 0, out_sg = 0;
> 		bool reply;
>
>-		spin_lock_bh(&vsock->send_pkt_list_lock);
>-		if (list_empty(&vsock->send_pkt_list)) {
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_lock_bh(lock);
>+		if (list_empty(send_pkt_list)) {
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>-		pkt = list_first_entry(&vsock->send_pkt_list,
>+		pkt = list_first_entry(send_pkt_list,
> 				       struct virtio_vsock_pkt, list);
> 		list_del_init(&pkt->list);
>-		spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_unlock_bh(lock);
>
> 		virtio_transport_deliver_tap_pkt(pkt);
>
>@@ -132,9 +146,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		 * the vq
> 		 */
> 		if (ret < 0) {
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
>@@ -146,7 +160,7 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 			/* Do we now have resources to resume rx processing? */
> 			if (val + 1 == virtqueue_get_vring_size(rx_vq))
>-				restart_rx = true;
>+				*restart_rx = true;
> 		}
>
> 		added = true;
>@@ -154,7 +168,55 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>
> 	if (added)
> 		virtqueue_kick(vq);
>+}
>
>+static int virtio_transport_do_send_dgram_pkt(struct virtio_vsock *vsock,
>+		struct virtqueue *vq, struct virtio_vsock_pkt *pkt)
>+{
>+	struct scatterlist hdr, buf, *sgs[2];
>+	int ret, in_sg = 0, out_sg = 0;
>+
>+	virtio_transport_deliver_tap_pkt(pkt);
>+
>+	sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>+	sgs[out_sg++] = &hdr;
>+	if (pkt->buf) {
>+		sg_init_one(&buf, pkt->buf, pkt->len);
>+		sgs[out_sg++] = &buf;
>+	}
>+
>+	ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL);
>+	/* Usually this means that there is no more space available in
>+	 * the vq
>+	 */
>+	if (ret < 0) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENOMEM;
>+	}
>+
>+	virtqueue_kick(vq);
>+
>+	return pkt->len;
>+}
>+
>+
>+static void
>+virtio_transport_send_pkt_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, send_pkt_work);
>+	struct virtqueue *vq;
>+	bool restart_rx = false;
>+
>+	mutex_lock(&vsock->tx_lock);
>+
>+	if (!vsock->tx_run)
>+		goto out;
>+
>+	vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+	virtio_transport_do_send_pkt(vsock, vq, &vsock->send_pkt_list_lock,
>+							&vsock->send_pkt_list, &restart_rx);
> out:
> 	mutex_unlock(&vsock->tx_lock);
>
>@@ -163,11 +225,64 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> }
>
> static int
>+virtio_transport_send_dgram_pkt(struct virtio_vsock_pkt *pkt)
>+{
>+	struct virtio_vsock *vsock;
>+	int len = pkt->len;
>+	struct virtqueue *vq;
>+
>+	vsock = the_virtio_vsock_dgram;
>+
>+	if (!vsock) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!vsock->dgram_tx_run) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (!refcount_inc_not_zero(&vsock->active)) {
>+		virtio_transport_free_pkt(pkt);
>+		return -ENODEV;
>+	}
>+
>+	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
>+		virtio_transport_free_pkt(pkt);
>+		len = -ENODEV;
>+		goto out_ref;
>+	}
>+
>+	/* send the pkt */
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out_mutex;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+
>+	len = virtio_transport_do_send_dgram_pkt(vsock, vq, pkt);
>+
>+out_mutex:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
>+out_ref:
>+	if (!refcount_dec_not_one(&vsock->active))
>+		return -EFAULT;
>+
>+	return len;
>+}
>+
>+static int
> virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> {
> 	struct virtio_vsock *vsock;
> 	int len = pkt->len;
>
>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
>+		return virtio_transport_send_dgram_pkt(pkt);
>+
> 	rcu_read_lock();
> 	vsock = rcu_dereference(the_virtio_vsock);
> 	if (!vsock) {
>@@ -243,7 +358,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> 	return ret;
> }
>
>-static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> {
> 	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> 	struct virtio_vsock_pkt *pkt;
>@@ -251,7 +366,10 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 	struct virtqueue *vq;
> 	int ret;
>
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>+	if (is_dgram)
>+		vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+	else
>+		vq = vsock->vqs[VSOCK_VQ_RX];
>
> 	do {
> 		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
>@@ -277,10 +395,19 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 			virtio_transport_free_pkt(pkt);
> 			break;
> 		}
>-		vsock->rx_buf_nr++;
>+		if (is_dgram)
>+			vsock->dgram_rx_buf_nr++;
>+		else
>+			vsock->rx_buf_nr++;
> 	} while (vq->num_free);
>-	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>-		vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	if (is_dgram) {
>+		if (vsock->dgram_rx_buf_nr > vsock->dgram_rx_buf_max_nr)
>+			vsock->dgram_rx_buf_max_nr = vsock->dgram_rx_buf_nr;
>+	} else {
>+		if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
>+			vsock->rx_buf_max_nr = vsock->rx_buf_nr;
>+	}
>+
> 	virtqueue_kick(vq);
> }
>
>@@ -315,6 +442,34 @@ static void virtio_transport_tx_work(struct work_struct *work)
> 		queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
> }
>
>+static void virtio_transport_dgram_tx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_tx_work);
>+	struct virtqueue *vq;
>+	bool added = false;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+	mutex_lock(&vsock->dgram_tx_lock);
>+
>+	if (!vsock->dgram_tx_run)
>+		goto out;
>+
>+	do {
>+		struct virtio_vsock_pkt *pkt;
>+		unsigned int len;
>+
>+		virtqueue_disable_cb(vq);
>+		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
>+			virtio_transport_free_pkt(pkt);
>+			added = true;
>+		}
>+	} while (!virtqueue_enable_cb(vq));
>+
>+out:
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+}
>+
> /* Is there space left for replies to rx packets? */
> static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
> {
>@@ -449,6 +604,11 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
> {
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>+
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_tx_work);
> }
>
> static void virtio_vsock_rx_done(struct virtqueue *vq)
>@@ -462,8 +622,12 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>
> static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
> {
>-}
>+	struct virtio_vsock *vsock = vq->vdev->priv;
>
>+	if (!vsock)
>+		return;
>+	queue_work(virtio_vsock_workqueue, &vsock->dgram_rx_work);
>+}
> static struct virtio_transport virtio_transport = {
> 	.transport = {
> 		.module                   = THIS_MODULE,
>@@ -506,19 +670,9 @@ static struct virtio_transport virtio_transport = {
> 	.send_pkt = virtio_transport_send_pkt,
> };
>
>-static void virtio_transport_rx_work(struct work_struct *work)
>+static void virtio_transport_do_rx_work(struct virtio_vsock *vsock,
>+						struct virtqueue *vq, bool is_dgram)
> {
>-	struct virtio_vsock *vsock =
>-		container_of(work, struct virtio_vsock, rx_work);
>-	struct virtqueue *vq;
>-
>-	vq = vsock->vqs[VSOCK_VQ_RX];
>-
>-	mutex_lock(&vsock->rx_lock);
>-
>-	if (!vsock->rx_run)
>-		goto out;
>-
> 	do {
> 		virtqueue_disable_cb(vq);
> 		for (;;) {
>@@ -538,7 +692,10 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 				break;
> 			}
>
>-			vsock->rx_buf_nr--;
>+			if (is_dgram)
>+				vsock->dgram_rx_buf_nr--;
>+			else
>+				vsock->rx_buf_nr--;
>
> 			/* Drop short/long packets */
> 			if (unlikely(len < sizeof(pkt->hdr) ||
>@@ -554,11 +711,45 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 	} while (!virtqueue_enable_cb(vq));
>
> out:
>+	return;
>+}
>+
>+static void virtio_transport_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_RX];
>+
>+	mutex_lock(&vsock->rx_lock);
>+
>+	if (vsock->rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, false);
>+
> 	if (vsock->rx_buf_nr < vsock->rx_buf_max_nr / 2)
>-		virtio_vsock_rx_fill(vsock);
>+		virtio_vsock_rx_fill(vsock, false);
> 	mutex_unlock(&vsock->rx_lock);
> }
>
>+static void virtio_transport_dgram_rx_work(struct work_struct *work)
>+{
>+	struct virtio_vsock *vsock =
>+		container_of(work, struct virtio_vsock, dgram_rx_work);
>+	struct virtqueue *vq;
>+
>+	vq = vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+
>+	mutex_lock(&vsock->dgram_rx_lock);
>+
>+	if (vsock->dgram_rx_run)
>+		virtio_transport_do_rx_work(vsock, vq, true);
>+
>+	if (vsock->dgram_rx_buf_nr < vsock->dgram_rx_buf_max_nr / 2)
>+		virtio_vsock_rx_fill(vsock, true);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+}
>+
> static int virtio_vsock_probe(struct virtio_device *vdev)
> {
> 	vq_callback_t *callbacks[] = {
>@@ -642,8 +833,14 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vsock->rx_buf_max_nr = 0;
> 	atomic_set(&vsock->queued_replies, 0);
>
>+	vsock->dgram_rx_buf_nr = 0;
>+	vsock->dgram_rx_buf_max_nr = 0;
>+	atomic_set(&vsock->dgram_queued_replies, 0);
>+
> 	mutex_init(&vsock->tx_lock);
> 	mutex_init(&vsock->rx_lock);
>+	mutex_init(&vsock->dgram_tx_lock);
>+	mutex_init(&vsock->dgram_rx_lock);
> 	mutex_init(&vsock->event_lock);
> 	spin_lock_init(&vsock->send_pkt_list_lock);
> 	INIT_LIST_HEAD(&vsock->send_pkt_list);
>@@ -651,16 +848,27 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
> 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
> 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
>+	INIT_WORK(&vsock->dgram_rx_work, virtio_transport_dgram_rx_work);
>+	INIT_WORK(&vsock->dgram_tx_work, virtio_transport_dgram_tx_work);
>
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = true;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = true;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->rx_lock);
>-	virtio_vsock_rx_fill(vsock);
>+	virtio_vsock_rx_fill(vsock, false);
> 	vsock->rx_run = true;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	virtio_vsock_rx_fill(vsock, true);
>+	vsock->dgram_rx_run = true;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	virtio_vsock_event_fill(vsock);
> 	vsock->event_run = true;
>@@ -669,6 +877,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> 	vdev->priv = vsock;
> 	rcu_assign_pointer(the_virtio_vsock, vsock);
>
>+	the_virtio_vsock_dgram = vsock;
>+	refcount_set(&the_virtio_vsock_dgram->active, 1);
>+
> 	mutex_unlock(&the_virtio_vsock_mutex);
> 	return 0;
>
>@@ -699,14 +910,28 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	vsock->rx_run = false;
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	vsock->dgram_rx_run = false;
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	vsock->tx_run = false;
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	vsock->dgram_tx_run = false;
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	mutex_lock(&vsock->event_lock);
> 	vsock->event_run = false;
> 	mutex_unlock(&vsock->event_lock);
>
>+	while (!refcount_dec_if_one(&the_virtio_vsock_dgram->active)) {
>+		if (signal_pending(current))
>+			break;
>+		msleep(5);
>+	}
>+
> 	/* Flush all device writes and interrupts, device will not use any
> 	 * more buffers.
> 	 */
>@@ -717,11 +942,21 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->rx_lock);
>
>+	mutex_lock(&vsock->dgram_rx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_RX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_rx_lock);
>+
> 	mutex_lock(&vsock->tx_lock);
> 	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
> 		virtio_transport_free_pkt(pkt);
> 	mutex_unlock(&vsock->tx_lock);
>
>+	mutex_lock(&vsock->dgram_tx_lock);
>+	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_DGRAM_TX])))
>+		virtio_transport_free_pkt(pkt);
>+	mutex_unlock(&vsock->dgram_tx_lock);
>+
> 	spin_lock_bh(&vsock->send_pkt_list_lock);
> 	while (!list_empty(&vsock->send_pkt_list)) {
> 		pkt = list_first_entry(&vsock->send_pkt_list,
>@@ -739,6 +974,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> 	 */
> 	flush_work(&vsock->rx_work);
> 	flush_work(&vsock->tx_work);
>+	flush_work(&vsock->dgram_rx_work);
>+	flush_work(&vsock->dgram_tx_work);
> 	flush_work(&vsock->event_work);
> 	flush_work(&vsock->send_pkt_work);
>
>@@ -775,7 +1012,7 @@ static int __init virtio_vsock_init(void)
> 		return -ENOMEM;
>
> 	ret = vsock_core_register(&virtio_transport.transport,
>-				  VSOCK_TRANSPORT_F_G2H);
>+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
> 	if (ret)
> 		goto out_wq;
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 902cb6dd710b..9f041515b7f1 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -26,6 +26,8 @@
> /* Threshold for detecting small packets to copy */
> #define GOOD_COPY_LEN  128
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk);
>+
> static const struct virtio_transport *
> virtio_transport_get_ops(struct vsock_sock *vsk)
> {
>@@ -196,21 +198,28 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	vvs = vsk->trans;
>
> 	/* we can send less than pkt_len bytes */
>-	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>-		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>+		else
>+			return 0;
>+	}
>
>-	/* virtio_transport_get_credit might return less than pkt_len credit */
>-	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>+	if (info->type == VIRTIO_VSOCK_TYPE_STREAM) {
>+		/* virtio_transport_get_credit might return less than pkt_len credit */
>+		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>
>-	/* Do not send zero length OP_RW pkt */
>-	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>-		return pkt_len;
>+		/* Do not send zero length OP_RW pkt */
>+		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>+			return pkt_len;
>+	}
>
> 	pkt = virtio_transport_alloc_pkt(info, pkt_len,
> 					 src_cid, src_port,
> 					 dst_cid, dst_port);
> 	if (!pkt) {
>-		virtio_transport_put_credit(vvs, pkt_len);
>+		if (info->type == VIRTIO_VSOCK_TYPE_STREAM)
>+			virtio_transport_put_credit(vvs, pkt_len);
> 		return -ENOMEM;
> 	}
>
>@@ -397,6 +406,58 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	return err;
> }
>
>+static ssize_t
>+virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
>+						   struct msghdr *msg, size_t len)
>+{
>+	struct virtio_vsock_sock *vvs = vsk->trans;
>+	struct virtio_vsock_pkt *pkt;
>+	size_t total = 0;
>+	u32 free_space;

`free_space` seems unused: it is computed below but never read, so it can probably be dropped.
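
For reference, in the stream dequeue path that value is used to decide
whether to send a credit update, roughly like this (sketch from memory of
the code this series is based on, so please double-check the exact call):

	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);

	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
		virtio_transport_send_credit_update(vsk, VIRTIO_VSOCK_TYPE_STREAM,
						    NULL);

Since dgram does not use the credit mechanism, the calculation below can
probably go away together with the variable.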

>+	int err = -EFAULT;
>+
>+	spin_lock_bh(&vvs->rx_lock);
>+	if (total < len && !list_empty(&vvs->rx_queue)) {
>+		pkt = list_first_entry(&vvs->rx_queue,
>+				       struct virtio_vsock_pkt, list);
>+
>+		total = len;
>+		if (total > pkt->len - pkt->off)
>+			total = pkt->len - pkt->off;
>+		else if (total < pkt->len - pkt->off)
>+			msg->msg_flags |= MSG_TRUNC;
>+
>+		/* sk_lock is held by caller so no one else can dequeue.
>+		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>+		 */
>+		spin_unlock_bh(&vvs->rx_lock);
>+
>+		err = memcpy_to_msg(msg, pkt->buf + pkt->off, total);
>+		if (err)
>+			return err;
>+
>+		spin_lock_bh(&vvs->rx_lock);
>+
>+		virtio_transport_dec_rx_pkt(vvs, pkt);
>+		list_del(&pkt->list);
>+		virtio_transport_free_pkt(pkt);
>+	}
>+
>+	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>+
>+	spin_unlock_bh(&vvs->rx_lock);
>+
>+	if (total > 0 && msg->msg_name) {
>+		/* Provide the address of the sender. */
>+		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>+
>+		vsock_addr_init(vm_addr, le64_to_cpu(pkt->hdr.src_cid),
>+						le32_to_cpu(pkt->hdr.src_port));
>+		msg->msg_namelen = sizeof(*vm_addr);
>+	}
>+	return total;
>+}
>+
> ssize_t
> virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> 				struct msghdr *msg,
>@@ -414,7 +475,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t len, int flags)
> {
>-	return -EOPNOTSUPP;
>+	struct sock *sk;
>+	size_t err = 0;
>+	long timeout;
>+
>+	DEFINE_WAIT(wait);
>+
>+	sk = &vsk->sk;
>+	err = 0;
>+
>+	lock_sock(sk);
>+
>+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
>+		return -EOPNOTSUPP;
>+
>+	if (!len)
>+		goto out;
>+
>+	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>+
>+	while (1) {
>+		s64 ready;
>+
>+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>+		ready = virtio_transport_dgram_has_data(vsk);
>+
>+		if (ready == 0) {
>+			if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+
>+			release_sock(sk);
>+			timeout = schedule_timeout(timeout);
>+			lock_sock(sk);
>+
>+			if (signal_pending(current)) {
>+				err = sock_intr_errno(timeout);
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			} else if (timeout == 0) {
>+				err = -EAGAIN;
>+				finish_wait(sk_sleep(sk), &wait);
>+				break;
>+			}
>+		} else {
>+			finish_wait(sk_sleep(sk), &wait);
>+
>+			if (ready < 0) {
>+				err = -ENOMEM;
>+				goto out;
>+			}
>+
>+			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
>+			break;
>+		}
>+	}
>+out:
>+	release_sock(sk);
>+	return err;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>
>@@ -431,6 +551,11 @@ s64 virtio_transport_stream_has_data(struct vsock_sock *vsk)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_stream_has_data);
>
>+static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
>+{
>+	return virtio_transport_stream_has_data(vsk);
>+}
>+
> static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> {
> 	struct virtio_vsock_sock *vvs = vsk->trans;
>@@ -610,13 +735,15 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> 				struct sockaddr_vm *addr)
> {
>-	return -EOPNOTSUPP;
>+	//use same stream bind for dgram
>+	int ret = vsock_bind_stream(vsk, addr);
>+	return ret;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>
> bool virtio_transport_dgram_allow(u32 cid, u32 port)
> {
>-	return false;
>+	return true;
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>
>@@ -654,7 +781,17 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> 			       struct msghdr *msg,
> 			       size_t dgram_len)
> {
>-	return -EOPNOTSUPP;
>+	struct virtio_vsock_pkt_info info = {
>+		.op = VIRTIO_VSOCK_OP_RW,
>+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
>+		.msg = msg,
>+		.pkt_len = dgram_len,
>+		.vsk = vsk,
>+		.remote_cid = remote_addr->svm_cid,
>+		.remote_port = remote_addr->svm_port,
>+	};
>+
>+	return virtio_transport_send_pkt_info(vsk, &info);
> }
> EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>
>@@ -729,7 +866,6 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> 		virtio_transport_free_pkt(reply);
> 		return -ENOTCONN;
> 	}
>-
> 	return t->send_pkt(reply);
> }
>
>@@ -925,7 +1061,8 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> 		/* If there is space in the last packet queued, we copy the
> 		 * new packet in its buffer.
> 		 */
>-		if (pkt->len <= last_pkt->buf_len - last_pkt->len) {
>+		if (pkt->len <= last_pkt->buf_len - last_pkt->len &&
>+			pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {

We should use le16_to_cpu():
			le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM

> 			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> 			       pkt->len);
> 			last_pkt->len += pkt->len;
>@@ -949,6 +1086,12 @@ virtio_transport_recv_connected(struct sock *sk,
> 	struct vsock_sock *vsk = vsock_sk(sk);
> 	int err = 0;
>
>+	if (le16_to_cpu(pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)) {

We should use le16_to_cpu() before the compare:
	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM) {

>+		virtio_transport_recv_enqueue(vsk, pkt);
>+		sk->sk_data_ready(sk);
>+		return err;
>+	}
>+
> 	switch (le16_to_cpu(pkt->hdr.op)) {
> 	case VIRTIO_VSOCK_OP_RW:
> 		virtio_transport_recv_enqueue(vsk, pkt);
>@@ -1121,7 +1264,8 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 					le32_to_cpu(pkt->hdr.buf_alloc),
> 					le32_to_cpu(pkt->hdr.fwd_cnt));
>
>-	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
>+	if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM &&
>+		le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_DGRAM) {
> 		(void)virtio_transport_reset_no_sock(t, pkt);
> 		goto free_pkt;
> 	}
>@@ -1150,11 +1294,16 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		goto free_pkt;
> 	}
>
>-	space_available = virtio_transport_space_update(sk, pkt);
>-
> 	/* Update CID in case it has changed after a transport reset event */
> 	vsk->local_addr.svm_cid = dst.svm_cid;
>
>+	if (sk->sk_type == SOCK_DGRAM) {
>+		virtio_transport_recv_connected(sk, pkt);
>+		goto out;
>+	}
>+
>+	space_available = virtio_transport_space_update(sk, pkt);
>+
> 	if (space_available)
> 		sk->sk_write_space(sk);
>
>@@ -1180,6 +1329,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> 		break;
> 	}
>
>+out:
> 	release_sock(sk);
>
> 	/* Release refcnt obtained when we fetched this socket out of the
>-- 
>2.11.0
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC v1 3/6] vhost/vsock: add support for vhost dgram.
  2021-06-09 23:24   ` Jiang Wang
@ 2021-06-18 10:13     ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-18 10:13 UTC (permalink / raw)
  To: Jiang Wang
  Cc: virtualization, stefanha, mst, arseny.krasnov, jhansen,
	cong.wang, duanxiongchun, xieyongji, chaiwen.cc, Jason Wang,
	David S. Miller, Jakub Kicinski, Steven Rostedt, Ingo Molnar,
	Colin Ian King, Andra Paraschiv, Norbert Slusarek,
	Jeff Vander Stoep, Alexander Popov, kvm, netdev, linux-kernel

We should use le16_to_cpu() when accessing pkt->hdr fields.
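
For example, a minimal sketch of the pattern (not taken from the patch
itself) would read the field once and compare it in CPU byte order:

	u16 type = le16_to_cpu(pkt->hdr.type);

	if (type == VIRTIO_VSOCK_TYPE_DGRAM)
		is_dgram = true;
	else if (type != VIRTIO_VSOCK_TYPE_STREAM)
		return -EINVAL;	/* unknown type */

Otherwise the comparisons below are wrong on big-endian hosts, since
pkt->hdr.type is a __le16.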

On Wed, Jun 09, 2021 at 11:24:55PM +0000, Jiang Wang wrote:
>This patch adds dgram support on the vhost side, including
>tx and rx. The vhost side sends packets asynchronously.
>
>Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>---
> drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 173 insertions(+), 26 deletions(-)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 81d064601093..d366463be6d4 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -28,7 +28,10 @@
>  * small pkts.
>  */
> #define VHOST_VSOCK_PKT_WEIGHT 256
>+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128
>
>+/* Max wait time in busy poll in microseconds */
>+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20
> enum {
> 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> 			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
>
> struct vhost_vsock {
> 	struct vhost_dev dev;
>-	struct vhost_virtqueue vqs[2];
>+	struct vhost_virtqueue vqs[4];
>
> 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
> 	struct hlist_node hash;
>@@ -54,6 +57,11 @@ struct vhost_vsock {
> 	spinlock_t send_pkt_list_lock;
> 	struct list_head send_pkt_list;	/* host->guest pending packets */
>
>+	spinlock_t dgram_send_pkt_list_lock;
>+	struct list_head dgram_send_pkt_list;	/* host->guest pending packets */
>+	struct vhost_work dgram_send_pkt_work;
>+	int  dgram_used; /*pending packets to be send */
>+
> 	atomic_t queued_replies;
>
> 	u32 guest_cid;
>@@ -90,10 +98,22 @@ static void
> vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 			    struct vhost_virtqueue *vq)
> {
>-	struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
>+	struct vhost_virtqueue *tx_vq;
> 	int pkts = 0, total_len = 0;
> 	bool added = false;
> 	bool restart_tx = false;
>+	spinlock_t *lock;
>+	struct list_head *send_pkt_list;
>+
>+	if (vq == &vsock->vqs[VSOCK_VQ_RX]) {
>+		tx_vq = &vsock->vqs[VSOCK_VQ_TX];
>+		lock = &vsock->send_pkt_list_lock;
>+		send_pkt_list = &vsock->send_pkt_list;
>+	} else {
>+		tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+		lock = &vsock->dgram_send_pkt_list_lock;
>+		send_pkt_list = &vsock->dgram_send_pkt_list;
>+	}
>
> 	mutex_lock(&vq->mutex);
>
>@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 		size_t nbytes;
> 		size_t iov_len, payload_len;
> 		int head;
>+		bool is_dgram = false;
>
>-		spin_lock_bh(&vsock->send_pkt_list_lock);
>-		if (list_empty(&vsock->send_pkt_list)) {
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_lock_bh(lock);
>+		if (list_empty(send_pkt_list)) {
>+			spin_unlock_bh(lock);
> 			vhost_enable_notify(&vsock->dev, vq);
> 			break;
> 		}
>
>-		pkt = list_first_entry(&vsock->send_pkt_list,
>+		pkt = list_first_entry(send_pkt_list,
> 				       struct virtio_vsock_pkt, list);
> 		list_del_init(&pkt->list);
>-		spin_unlock_bh(&vsock->send_pkt_list_lock);
>+		spin_unlock_bh(lock);
>+
>+		if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
                     ^
                     le16_to_cpu(pkt->hdr.type)

>+			is_dgram = true;
>
> 		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> 					 &out, &in, NULL, NULL);
> 		if (head < 0) {
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
> 			break;
> 		}
>
> 		if (head == vq->num) {
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			if (is_dgram) {
>+				virtio_transport_free_pkt(pkt);
>+				vq_err(vq, "Dgram virtqueue is full!");
>+				spin_lock_bh(lock);
>+				vsock->dgram_used--;
>+				spin_unlock_bh(lock);
>+				break;
>+			}
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
>
> 			/* We cannot finish yet if more buffers snuck in while
>-			 * re-enabling notify.
>-			 */
>+			* re-enabling notify.
>+			*/
> 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> 				vhost_disable_notify(&vsock->dev, vq);
> 				continue;
>@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 		if (out) {
> 			virtio_transport_free_pkt(pkt);
> 			vq_err(vq, "Expected 0 output buffers, got %u\n", out);
>+			if (is_dgram) {
>+				spin_lock_bh(lock);
>+				vsock->dgram_used--;
>+				spin_unlock_bh(lock);
>+			}
>+
> 			break;
> 		}
>
>@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 		if (iov_len < sizeof(pkt->hdr)) {
> 			virtio_transport_free_pkt(pkt);
> 			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
>+			if (is_dgram) {
>+				spin_lock_bh(lock);
>+				vsock->dgram_used--;
>+				spin_unlock_bh(lock);
>+			}
>+			break;
>+		}
>+
>+		if (iov_len < pkt->len - pkt->off &&
>+			vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {
>+			virtio_transport_free_pkt(pkt);
>+			vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);
> 			break;
> 		}
>
>@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 		if (nbytes != sizeof(pkt->hdr)) {
> 			virtio_transport_free_pkt(pkt);
> 			vq_err(vq, "Faulted on copying pkt hdr\n");
>+			if (is_dgram) {
>+				spin_lock_bh(lock);
>+				vsock->dgram_used--;
>+				spin_unlock_bh(lock);
>+			}
> 			break;
> 		}
>
>@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 		/* If we didn't send all the payload we can requeue the packet
> 		 * to send it with the next available buffer.
> 		 */
>-		if (pkt->off < pkt->len) {
>+		if ((pkt->off < pkt->len)
>+			&& (vq == &vsock->vqs[VSOCK_VQ_RX])) {
> 			/* We are queueing the same virtio_vsock_pkt to handle
> 			 * the remaining bytes, and we want to deliver it
> 			 * to monitoring devices in the next iteration.
> 			 */
> 			pkt->tap_delivered = false;
>
>-			spin_lock_bh(&vsock->send_pkt_list_lock);
>-			list_add(&pkt->list, &vsock->send_pkt_list);
>-			spin_unlock_bh(&vsock->send_pkt_list_lock);
>+			spin_lock_bh(lock);
>+			list_add(&pkt->list, send_pkt_list);
>+			spin_unlock_bh(lock);
> 		} else {
> 			if (pkt->reply) {
> 				int val;
>@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> 			}
>
> 			virtio_transport_free_pkt(pkt);
>+			if (is_dgram) {
>+				spin_lock_bh(lock);
>+				vsock->dgram_used--;
>+				spin_unlock_bh(lock);
>+			}
> 		}
> 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
> 	if (added)
>@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
> 	vhost_transport_do_send_pkt(vsock, vq);
> }
>
>+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)
>+{
>+	struct vhost_virtqueue *vq;
>+	struct vhost_vsock *vsock;
>+
>+	vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);
>+	vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
>+
>+	vhost_transport_do_send_pkt(vsock, vq);
>+}
>+
> static int
> vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> {
> 	struct vhost_vsock *vsock;
> 	int len = pkt->len;
>+	spinlock_t *lock;
>+	struct list_head *send_pkt_list;
>+	struct vhost_work *work;
>
> 	rcu_read_lock();
>
>@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> 		return -ENODEV;
> 	}
>
>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
             ^
             le16_to_cpu(pkt->hdr.type)
>+		lock = &vsock->send_pkt_list_lock;
>+		send_pkt_list = &vsock->send_pkt_list;
>+		work = &vsock->send_pkt_work;
>+	} else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
                    ^
                    le16_to_cpu(pkt->hdr.type)
>+		lock = &vsock->dgram_send_pkt_list_lock;
>+		send_pkt_list = &vsock->dgram_send_pkt_list;
>+		work = &vsock->dgram_send_pkt_work;
>+	} else {
>+		rcu_read_unlock();
>+		virtio_transport_free_pkt(pkt);
>+		return -EINVAL;
>+	}
>+
>+
> 	if (pkt->reply)
> 		atomic_inc(&vsock->queued_replies);
>
>-	spin_lock_bh(&vsock->send_pkt_list_lock);
>-	list_add_tail(&pkt->list, &vsock->send_pkt_list);
>-	spin_unlock_bh(&vsock->send_pkt_list_lock);
>+	spin_lock_bh(lock);
>+	if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
             ^
             le16_to_cpu(pkt->hdr.type)
>+		if (vsock->dgram_used  == VHOST_VSOCK_DGRM_MAX_PENDING_PKT)
>+			len = -ENOMEM;
>+		else {
>+			vsock->dgram_used++;
>+			list_add_tail(&pkt->list, send_pkt_list);
>+		}
>+	} else
>+		list_add_tail(&pkt->list, send_pkt_list);
>
>-	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>+	spin_unlock_bh(lock);
>+
>+	vhost_work_queue(&vsock->dev, work);
>
> 	rcu_read_unlock();
> 	return len;
>@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> 		return NULL;
> 	}
>
>-	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)
>+	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM
>+		|| le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
> 		pkt->len = le32_to_cpu(pkt->hdr.len);
>
> 	/* No payload */
>@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {
> 	.send_pkt = vhost_transport_send_pkt,
> };
>
>+static inline unsigned long busy_clock(void)
>+{
>+	return local_clock() >> 10;
>+}
>+
>+static bool vhost_can_busy_poll(unsigned long endtime)
>+{
>+	return likely(!need_resched() && !time_after(busy_clock(), endtime) &&
>+		      !signal_pending(current));
>+}
>+
>+
> static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> {
> 	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
>@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> 	int head, pkts = 0, total_len = 0;
> 	unsigned int out, in;
> 	bool added = false;
>+	unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;
>+	unsigned long endtime;
>
> 	mutex_lock(&vq->mutex);
>
>@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> 	if (!vq_meta_prefetch(vq))
> 		goto out;
>
>+	endtime = busy_clock() + busyloop_timeout;
> 	vhost_disable_notify(&vsock->dev, vq);
>+	preempt_disable();
> 	do {
> 		u32 len;
>
>-		if (!vhost_vsock_more_replies(vsock)) {
>+		if (vq == &vsock->vqs[VSOCK_VQ_TX]
>+			&& !vhost_vsock_more_replies(vsock)) {
> 			/* Stop tx until the device processes already
> 			 * pending replies.  Leave tx virtqueue
> 			 * callbacks disabled.
>@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> 			break;
>
> 		if (head == vq->num) {
>+			if (vhost_can_busy_poll(endtime)) {
>+				cpu_relax();
>+				continue;
>+			}
>+
> 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> 				vhost_disable_notify(&vsock->dev, vq);
> 				continue;
>@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> 		total_len += len;
> 		added = true;
> 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
>+	preempt_enable();
>
> no_more_replies:
> 	if (added)
>@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> 	 * let's kick the send worker to send them.
> 	 */
> 	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>+	vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);
>
> 	mutex_unlock(&vsock->dev.mutex);
> 	return 0;
>@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>
> 	vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
> 	vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
>+	vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
>+	vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
> 	vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
> 	vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
>+	vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =
>+						vhost_vsock_handle_tx_kick;
>+	vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =
>+						vhost_vsock_handle_rx_kick;
>
> 	vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
> 		       UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
>@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> 	spin_lock_init(&vsock->send_pkt_list_lock);
> 	INIT_LIST_HEAD(&vsock->send_pkt_list);
> 	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
>+	spin_lock_init(&vsock->dgram_send_pkt_list_lock);
>+	INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);
>+	vhost_work_init(&vsock->dgram_send_pkt_work,
>+			vhost_transport_dgram_send_pkt_work);
>+
> 	return 0;
>
> out:
>@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
> 		if (vsock->vqs[i].handle_kick)
> 			vhost_poll_flush(&vsock->vqs[i].poll);
> 	vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
>+	vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);
> }
>
> static void vhost_vsock_reset_orphans(struct sock *sk)
>@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
> 	}
> 	spin_unlock_bh(&vsock->send_pkt_list_lock);
>
>+	spin_lock_bh(&vsock->dgram_send_pkt_list_lock);
>+	while (!list_empty(&vsock->dgram_send_pkt_list)) {
>+		struct virtio_vsock_pkt *pkt;
>+
>+		pkt = list_first_entry(&vsock->dgram_send_pkt_list,
>+				struct virtio_vsock_pkt, list);
>+		list_del_init(&pkt->list);
>+		virtio_transport_free_pkt(pkt);
>+	}
>+	spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);
>+
> 	vhost_dev_cleanup(&vsock->dev);
> 	kfree(vsock->dev.vqs);
> 	vhost_vsock_free(vsock);
>@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)
> 	int ret;
>
> 	ret = vsock_core_register(&vhost_transport.transport,
>-				  VSOCK_TRANSPORT_F_H2G);
>+				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
> 	if (ret < 0)
> 		return ret;
> 	return misc_register(&vhost_vsock_misc);
>-- 
>2.11.0
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [External] Re: [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support
  2021-06-18  9:35   ` Stefano Garzarella
@ 2021-06-21 17:21     ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-21 17:21 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	Jason Wang, David S. Miller, Jakub Kicinski, Steven Rostedt,
	Ingo Molnar, Colin Ian King, Jorgen Hansen, Andra Paraschiv,
	Norbert Slusarek, Lu Wei, Alexander Popov, kvm, Networking,
	linux-kernel

On Fri, Jun 18, 2021 at 2:35 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jun 09, 2021 at 11:24:52PM +0000, Jiang Wang wrote:
> >This patchset implements support of SOCK_DGRAM for virtio
> >transport.
> >
> >Datagram sockets are connectionless and unreliable. To avoid unfair contention
> >with stream and other sockets, add two more virtqueues and
> >a new feature bit to indicate if those two new queues exist or not.
> >
> >Dgram does not use the existing credit update mechanism for
> >stream sockets. When sending from the guest/driver, sending packets
> >synchronously, so the sender will get an error when the virtqueue is full.
> >When sending from the host/device, send packets asynchronously
> >because the descriptor memory belongs to the corresponding QEMU
> >process.
> >
> >The virtio spec patch is here:
> >https://www.spinics.net/lists/linux-virtualization/msg50027.html
> >
> >For those who prefer git repo, here is the link for the linux kernel:
> >https://github.com/Jiang1155/linux/tree/vsock-dgram-v1
> >
> >qemu patch link:
> >https://github.com/Jiang1155/qemu/tree/vsock-dgram-v1
> >
> >
> >To do:
> >1. use skb when receiving packets
> >2. support multiple transport
> >3. support mergeable rx buffer
>
> Jiang, I'll do a fast review, but I think it is better to rebase on
> net-next since SEQPACKET support is now merged.
>
> Please also run ./scripts/checkpatch.pl; there are a lot of issues.
>
> I'll leave some simple comments in the patches, but I prefer to do a
> deep review after the rebase and the dynamic handling of DGRAM.

Hi Stefano,

Sure. I will rebase and add dynamic handling of DGRAM. I ran checkpatch.pl
at some point, but I will make sure to run it again before submitting. Thanks.

Regards,

Jiang


> Thanks,
> Stefano
>

^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [External] Re: [RFC v1 1/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
  2021-06-18  9:39     ` Stefano Garzarella
@ 2021-06-21 17:24       ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-21 17:24 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	Jason Wang, David S. Miller, Jakub Kicinski, Steven Rostedt,
	Ingo Molnar, Andra Paraschiv, Norbert Slusarek, Colin Ian King,
	Alexander Popov, kvm, Networking, linux-kernel

On Fri, Jun 18, 2021 at 2:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jun 09, 2021 at 11:24:53PM +0000, Jiang Wang wrote:
> >When this feature is enabled, allocate 5 queues,
> >otherwise, allocate 3 queues to be compatible with
> >old QEMU versions.
> >
> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >---
> > drivers/vhost/vsock.c             |  3 +-
> > include/linux/virtio_vsock.h      |  9 +++++
> > include/uapi/linux/virtio_vsock.h |  3 ++
> > net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----
> > 4 files changed, 80 insertions(+), 8 deletions(-)
> >
> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >index 5e78fb719602..81d064601093 100644
> >--- a/drivers/vhost/vsock.c
> >+++ b/drivers/vhost/vsock.c
> >@@ -31,7 +31,8 @@
> >
> > enum {
> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> >-                             (1ULL << VIRTIO_F_ACCESS_PLATFORM)
> >+                             (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> >+                             (1ULL << VIRTIO_VSOCK_F_DGRAM)
> > };
> >
> > enum {
> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> >index dc636b727179..ba3189ed9345 100644
> >--- a/include/linux/virtio_vsock.h
> >+++ b/include/linux/virtio_vsock.h
> >@@ -18,6 +18,15 @@ enum {
> >       VSOCK_VQ_MAX    = 3,
> > };
> >
> >+enum {
> >+      VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */
> >+      VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */
> >+      VSOCK_VQ_DGRAM_RX       = 2,
> >+      VSOCK_VQ_DGRAM_TX       = 3,
> >+      VSOCK_VQ_EX_EVENT       = 4,
> >+      VSOCK_VQ_EX_MAX         = 5,
> >+};
> >+
> > /* Per-socket state (accessed via vsk->trans) */
> > struct virtio_vsock_sock {
> >       struct vsock_sock *vsk;
> >diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> >index 1d57ed3d84d2..b56614dff1c9 100644
> >--- a/include/uapi/linux/virtio_vsock.h
> >+++ b/include/uapi/linux/virtio_vsock.h
> >@@ -38,6 +38,9 @@
> > #include <linux/virtio_ids.h>
> > #include <linux/virtio_config.h>
> >
> >+/* The feature bitmap for virtio net */
> >+#define VIRTIO_VSOCK_F_DGRAM  0       /* Host support dgram vsock */
> >+
> > struct virtio_vsock_config {
> >       __le64 guest_cid;
> > } __attribute__((packed));
> >diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> >index 2700a63ab095..7dcb8db23305 100644
> >--- a/net/vmw_vsock/virtio_transport.c
> >+++ b/net/vmw_vsock/virtio_transport.c
> >@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
> >
> > struct virtio_vsock {
> >       struct virtio_device *vdev;
> >-      struct virtqueue *vqs[VSOCK_VQ_MAX];
> >+      struct virtqueue **vqs;
> >+      bool has_dgram;
> >
> >       /* Virtqueue processing is deferred to a workqueue */
> >       struct work_struct tx_work;
> >@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,
> >       struct scatterlist sg;
> >       struct virtqueue *vq;
> >
> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];
> >+      if (vsock->has_dgram)
> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
> >+      else
> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];
> >
> >       sg_init_one(&sg, event, sizeof(*event));
> >
> >@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
> >               virtio_vsock_event_fill_one(vsock, event);
> >       }
> >
> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
> >+      if (vsock->has_dgram)
> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
> >+      else
> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
> > }
> >
> > static void virtio_vsock_reset_sock(struct sock *sk)
> >@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)
> >               container_of(work, struct virtio_vsock, event_work);
> >       struct virtqueue *vq;
> >
> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];
> >+      if (vsock->has_dgram)
> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
> >+      else
> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];
> >
> >       mutex_lock(&vsock->event_lock);
> >
> >@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)
> >               }
> >       } while (!virtqueue_enable_cb(vq));
> >
> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
> >+      if (vsock->has_dgram)
> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
> >+      else
> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
> > out:
> >       mutex_unlock(&vsock->event_lock);
> > }
> >@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
> >       queue_work(virtio_vsock_workqueue, &vsock->tx_work);
> > }
> >
> >+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
> >+{
> >+}
> >+
> > static void virtio_vsock_rx_done(struct virtqueue *vq)
> > {
> >       struct virtio_vsock *vsock = vq->vdev->priv;
> >@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> >       queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> > }
> >
> >+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
> >+{
> >+}
> >+
> > static struct virtio_transport virtio_transport = {
> >       .transport = {
> >               .module                   = THIS_MODULE,
> >@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> >               virtio_vsock_tx_done,
> >               virtio_vsock_event_done,
> >       };
> >+      vq_callback_t *ex_callbacks[] = {
>
> 'ex' is not clear; maybe 'dgram' would be better?
>
Sure.
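
In v2 I will rename them to something like this (same entries as in the
patch above, just with a 'dgram' prefix):

	vq_callback_t *dgram_callbacks[] = {
		virtio_vsock_rx_done,
		virtio_vsock_tx_done,
		virtio_vsock_dgram_rx_done,
		virtio_vsock_dgram_tx_done,
		virtio_vsock_event_done,
	};

	static const char * const dgram_names[] = {
		"rx", "tx", "dgram_rx", "dgram_tx", "event",
	};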

> What happens if F_DGRAM is negotiated, but not F_STREAM?
>
Hmm. In my mind, F_STREAM is always negotiated. Do we want to add
support when F_STREAM is not negotiated?
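
Right now the probe only keys off F_DGRAM, roughly like this (simplified
sketch of the logic in the patch above):

	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
		max_vq = VSOCK_VQ_EX_MAX; /* rx, tx, dgram_rx, dgram_tx, event */
	else
		max_vq = VSOCK_VQ_MAX;    /* rx, tx, event */

so the stream virtqueues are always set up either way.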

> >+              virtio_vsock_rx_done,
> >+              virtio_vsock_tx_done,
> >+              virtio_vsock_dgram_rx_done,
> >+              virtio_vsock_dgram_tx_done,
> >+              virtio_vsock_event_done,
> >+      };
> >+
> >       static const char * const names[] = {
> >               "rx",
> >               "tx",
> >               "event",
> >       };
> >+      static const char * const ex_names[] = {
> >+              "rx",
> >+              "tx",
> >+              "dgram_rx",
> >+              "dgram_tx",
> >+              "event",
> >+      };
> >+
> >       struct virtio_vsock *vsock = NULL;
> >-      int ret;
> >+      int ret, max_vq;
> >
> >       ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
> >       if (ret)
> >@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> >
> >       vsock->vdev = vdev;
> >
> >-      ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,
> >+      if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
> >+              vsock->has_dgram = true;
> >+
> >+      if (vsock->has_dgram)
> >+              max_vq = VSOCK_VQ_EX_MAX;
> >+      else
> >+              max_vq = VSOCK_VQ_MAX;
> >+
> >+      vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);
> >+      if (!vsock->vqs) {
> >+              ret = -ENOMEM;
> >+              goto out;
> >+      }
> >+
> >+      if (vsock->has_dgram) {
> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,
> >+                            vsock->vqs, ex_callbacks, ex_names,
> >+                            NULL);
> >+      } else {
> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,
> >                             vsock->vqs, callbacks, names,
> >                             NULL);
> >+      }
> >+
> >       if (ret < 0)
> >               goto out;
> >
> >@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {
> > };
> >
> > static unsigned int features[] = {
> >+      VIRTIO_VSOCK_F_DGRAM,
> > };
> >
> > static struct virtio_driver virtio_vsock_driver = {
> >--
> >2.11.0
> >
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [External] Re: [RFC v1 5/6] vhost/vsock: add kconfig for vhost dgram support
  2021-06-18  9:54     ` Stefano Garzarella
@ 2021-06-21 17:25       ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-21 17:25 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	Jason Wang, David S. Miller, Jakub Kicinski, Steven Rostedt,
	Ingo Molnar, Andra Paraschiv, Norbert Slusarek, Colin Ian King,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel

On Fri, Jun 18, 2021 at 2:54 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jun 09, 2021 at 11:24:57PM +0000, Jiang Wang wrote:
> >Also change number of vqs according to the config
> >
> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >---
> > drivers/vhost/Kconfig |  8 ++++++++
> > drivers/vhost/vsock.c | 11 ++++++++---
> > 2 files changed, 16 insertions(+), 3 deletions(-)
>
> As we already discussed, I think we don't need this patch.

Sure. Will do.

> Thanks,
> Stefano
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [External] Re: [RFC v1 6/6] virtio/vsock: add sysfs for rx buf len for dgram
  2021-06-18 10:04     ` Stefano Garzarella
@ 2021-06-21 17:27       ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-21 17:27 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	Jason Wang, David S. Miller, Jakub Kicinski, Steven Rostedt,
	Ingo Molnar, Colin Ian King, Norbert Slusarek, Andra Paraschiv,
	Lu Wei, Alexander Popov, kvm, Networking, linux-kernel

On Fri, Jun 18, 2021 at 3:04 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Wed, Jun 09, 2021 at 11:24:58PM +0000, Jiang Wang wrote:
> >Make rx buf len configurable via sysfs
> >
> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >---
> > net/vmw_vsock/virtio_transport.c | 37 +++++++++++++++++++++++++++++++++++--
> > 1 file changed, 35 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> >index cf47aadb0c34..2e4dd9c48472 100644
> >--- a/net/vmw_vsock/virtio_transport.c
> >+++ b/net/vmw_vsock/virtio_transport.c
> >@@ -29,6 +29,14 @@ static struct virtio_vsock __rcu *the_virtio_vsock;
> > static struct virtio_vsock *the_virtio_vsock_dgram;
> > static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
> >
> >+static int rx_buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> >+static struct kobject *kobj_ref;
> >+static ssize_t  sysfs_show(struct kobject *kobj,
> >+                      struct kobj_attribute *attr, char *buf);
> >+static ssize_t  sysfs_store(struct kobject *kobj,
> >+                      struct kobj_attribute *attr, const char *buf, size_t count);
> >+static struct kobj_attribute rxbuf_attr = __ATTR(rx_buf_value, 0660, sysfs_show, sysfs_store);
>
> Maybe better to use a 'dgram' prefix.

Sure.
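
Something like this in v2 (the exact attribute name is still open):

	static struct kobj_attribute dgram_rxbuf_attr =
		__ATTR(dgram_rx_buf_size, 0660, sysfs_show, sysfs_store);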

> >+
> > struct virtio_vsock {
> >       struct virtio_device *vdev;
> >       struct virtqueue **vqs;
> >@@ -360,7 +368,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> >
> > static void virtio_vsock_rx_fill(struct virtio_vsock *vsock, bool is_dgram)
> > {
> >-      int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> >+      int buf_len = rx_buf_len;
> >       struct virtio_vsock_pkt *pkt;
> >       struct scatterlist hdr, buf, *sgs[2];
> >       struct virtqueue *vq;
> >@@ -1003,6 +1011,22 @@ static struct virtio_driver virtio_vsock_driver = {
> >       .remove = virtio_vsock_remove,
> > };
> >
> >+static ssize_t sysfs_show(struct kobject *kobj,
> >+              struct kobj_attribute *attr, char *buf)
> >+{
> >+      return sprintf(buf, "%d", rx_buf_len);
> >+}
> >+
> >+static ssize_t sysfs_store(struct kobject *kobj,
> >+              struct kobj_attribute *attr, const char *buf, size_t count)
> >+{
> >+      if (kstrtou32(buf, 0, &rx_buf_len) < 0)
> >+              return -EINVAL;
> >+      if (rx_buf_len < 1024)
> >+              rx_buf_len = 1024;
> >+      return count;
> >+}
> >+
> > static int __init virtio_vsock_init(void)
> > {
> >       int ret;
> >@@ -1020,8 +1044,17 @@ static int __init virtio_vsock_init(void)
> >       if (ret)
> >               goto out_vci;
> >
> >-      return 0;
> >+      kobj_ref = kobject_create_and_add("vsock", kernel_kobj);
>
> So, IIUC, the path will be /sys/vsock/rx_buf_value?
>
> I'm not sure if we need to add a `virtio` subdir (e.g.
> /sys/vsock/virtio/dgram_rx_buf_size)

I agree adding a 'virtio' subdir is better, in case VMware or Hyper-V
also add some settings later.
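
Something along these lines, I think (rough sketch; variable names are
placeholders and error handling is omitted):

	/* expose /sys/vsock/virtio/dgram_rx_buf_size */
	vsock_kobj = kobject_create_and_add("vsock", kernel_kobj);
	virtio_kobj = kobject_create_and_add("virtio", vsock_kobj);
	ret = sysfs_create_file(virtio_kobj, &dgram_rxbuf_attr.attr);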

> Thanks,
> Stefano
>
> >
> >+      /*Creating sysfs file for etx_value*/
> >+      ret = sysfs_create_file(kobj_ref, &rxbuf_attr.attr);
> >+      if (ret)
> >+              goto out_sysfs;
> >+
> >+      return 0;
> >+out_sysfs:
> >+      kobject_put(kobj_ref);
> >+      sysfs_remove_file(kernel_kobj, &rxbuf_attr.attr);
> > out_vci:
> >       vsock_core_unregister(&virtio_transport.transport);
> > out_wq:
> >--
> >2.11.0
> >
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [External] Re: [RFC v1 3/6] vhost/vsock: add support for vhost dgram.
  2021-06-18 10:13     ` Stefano Garzarella
@ 2021-06-21 17:32       ` Jiang Wang .
  -1 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-21 17:32 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	Jason Wang, David S. Miller, Jakub Kicinski, Steven Rostedt,
	Ingo Molnar, Colin Ian King, Andra Paraschiv, Norbert Slusarek,
	Jeff Vander Stoep, Alexander Popov, kvm, Networking,
	linux-kernel

On Fri, Jun 18, 2021 at 3:14 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> We should use le16_to_cpu when accessing pkt->hdr fields.

OK. Will do.
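
I will convert every access, e.g. the first one below becomes:

	if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
		is_dgram = true;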

> On Wed, Jun 09, 2021 at 11:24:55PM +0000, Jiang Wang wrote:
> >This patch supports dgram on vhost side, including
> >tx and rx. The vhost send packets asynchronously.
> >
> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >---
> > drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------
> > 1 file changed, 173 insertions(+), 26 deletions(-)
> >
> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >index 81d064601093..d366463be6d4 100644
> >--- a/drivers/vhost/vsock.c
> >+++ b/drivers/vhost/vsock.c
> >@@ -28,7 +28,10 @@
> >  * small pkts.
> >  */
> > #define VHOST_VSOCK_PKT_WEIGHT 256
> >+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128
> >
> >+/* Max wait time in busy poll in microseconds */
> >+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20
> > enum {
> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> >                              (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> >@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
> >
> > struct vhost_vsock {
> >       struct vhost_dev dev;
> >-      struct vhost_virtqueue vqs[2];
> >+      struct vhost_virtqueue vqs[4];
> >
> >       /* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
> >       struct hlist_node hash;
> >@@ -54,6 +57,11 @@ struct vhost_vsock {
> >       spinlock_t send_pkt_list_lock;
> >       struct list_head send_pkt_list; /* host->guest pending packets */
> >
> >+      spinlock_t dgram_send_pkt_list_lock;
> >+      struct list_head dgram_send_pkt_list;   /* host->guest pending packets */
> >+      struct vhost_work dgram_send_pkt_work;
> >+      int  dgram_used; /*pending packets to be send */
> >+
> >       atomic_t queued_replies;
> >
> >       u32 guest_cid;
> >@@ -90,10 +98,22 @@ static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >                           struct vhost_virtqueue *vq)
> > {
> >-      struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
> >+      struct vhost_virtqueue *tx_vq;
> >       int pkts = 0, total_len = 0;
> >       bool added = false;
> >       bool restart_tx = false;
> >+      spinlock_t *lock;
> >+      struct list_head *send_pkt_list;
> >+
> >+      if (vq == &vsock->vqs[VSOCK_VQ_RX]) {
> >+              tx_vq = &vsock->vqs[VSOCK_VQ_TX];
> >+              lock = &vsock->send_pkt_list_lock;
> >+              send_pkt_list = &vsock->send_pkt_list;
> >+      } else {
> >+              tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
> >+              lock = &vsock->dgram_send_pkt_list_lock;
> >+              send_pkt_list = &vsock->dgram_send_pkt_list;
> >+      }
> >
> >       mutex_lock(&vq->mutex);
> >
> >@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               size_t nbytes;
> >               size_t iov_len, payload_len;
> >               int head;
> >+              bool is_dgram = false;
> >
> >-              spin_lock_bh(&vsock->send_pkt_list_lock);
> >-              if (list_empty(&vsock->send_pkt_list)) {
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+              spin_lock_bh(lock);
> >+              if (list_empty(send_pkt_list)) {
> >+                      spin_unlock_bh(lock);
> >                       vhost_enable_notify(&vsock->dev, vq);
> >                       break;
> >               }
> >
> >-              pkt = list_first_entry(&vsock->send_pkt_list,
> >+              pkt = list_first_entry(send_pkt_list,
> >                                      struct virtio_vsock_pkt, list);
> >               list_del_init(&pkt->list);
> >-              spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+              spin_unlock_bh(lock);
> >+
> >+              if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
>                      ^
>                      le16_to_cpu(pkt->hdr.type)
>
> >+                      is_dgram = true;
> >
> >               head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> >                                        &out, &in, NULL, NULL);
> >               if (head < 0) {
> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-                      list_add(&pkt->list, &vsock->send_pkt_list);
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+                      spin_lock_bh(lock);
> >+                      list_add(&pkt->list, send_pkt_list);
> >+                      spin_unlock_bh(lock);
> >                       break;
> >               }
> >
> >               if (head == vq->num) {
> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-                      list_add(&pkt->list, &vsock->send_pkt_list);
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+                      if (is_dgram) {
> >+                              virtio_transport_free_pkt(pkt);
> >+                              vq_err(vq, "Dgram virtqueue is full!");
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                              break;
> >+                      }
> >+                      spin_lock_bh(lock);
> >+                      list_add(&pkt->list, send_pkt_list);
> >+                      spin_unlock_bh(lock);
> >
> >                       /* We cannot finish yet if more buffers snuck in while
> >-                       * re-enabling notify.
> >-                       */
> >+                      * re-enabling notify.
> >+                      */
> >                       if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> >                               vhost_disable_notify(&vsock->dev, vq);
> >                               continue;
> >@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               if (out) {
> >                       virtio_transport_free_pkt(pkt);
> >                       vq_err(vq, "Expected 0 output buffers, got %u\n", out);
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >+
> >                       break;
> >               }
> >
> >@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               if (iov_len < sizeof(pkt->hdr)) {
> >                       virtio_transport_free_pkt(pkt);
> >                       vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >+                      break;
> >+              }
> >+
> >+              if (iov_len < pkt->len - pkt->off &&
> >+                      vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {
> >+                      virtio_transport_free_pkt(pkt);
> >+                      vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);
> >                       break;
> >               }
> >
> >@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               if (nbytes != sizeof(pkt->hdr)) {
> >                       virtio_transport_free_pkt(pkt);
> >                       vq_err(vq, "Faulted on copying pkt hdr\n");
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >                       break;
> >               }
> >
> >@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               /* If we didn't send all the payload we can requeue the packet
> >                * to send it with the next available buffer.
> >                */
> >-              if (pkt->off < pkt->len) {
> >+              if ((pkt->off < pkt->len)
> >+                      && (vq == &vsock->vqs[VSOCK_VQ_RX])) {
> >                       /* We are queueing the same virtio_vsock_pkt to handle
> >                        * the remaining bytes, and we want to deliver it
> >                        * to monitoring devices in the next iteration.
> >                        */
> >                       pkt->tap_delivered = false;
> >
> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-                      list_add(&pkt->list, &vsock->send_pkt_list);
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+                      spin_lock_bh(lock);
> >+                      list_add(&pkt->list, send_pkt_list);
> >+                      spin_unlock_bh(lock);
> >               } else {
> >                       if (pkt->reply) {
> >                               int val;
> >@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >                       }
> >
> >                       virtio_transport_free_pkt(pkt);
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >               }
> >       } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
> >       if (added)
> >@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
> >       vhost_transport_do_send_pkt(vsock, vq);
> > }
> >
> >+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)
> >+{
> >+      struct vhost_virtqueue *vq;
> >+      struct vhost_vsock *vsock;
> >+
> >+      vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);
> >+      vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
> >+
> >+      vhost_transport_do_send_pkt(vsock, vq);
> >+}
> >+
> > static int
> > vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> > {
> >       struct vhost_vsock *vsock;
> >       int len = pkt->len;
> >+      spinlock_t *lock;
> >+      struct list_head *send_pkt_list;
> >+      struct vhost_work *work;
> >
> >       rcu_read_lock();
> >
> >@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> >               return -ENODEV;
> >       }
> >
> >+      if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
>              ^
>              le16_to_cpu(pkt->hdr.type)
> >+              lock = &vsock->send_pkt_list_lock;
> >+              send_pkt_list = &vsock->send_pkt_list;
> >+              work = &vsock->send_pkt_work;
> >+      } else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
>                     ^
>                     le16_to_cpu(pkt->hdr.type)
> >+              lock = &vsock->dgram_send_pkt_list_lock;
> >+              send_pkt_list = &vsock->dgram_send_pkt_list;
> >+              work = &vsock->dgram_send_pkt_work;
> >+      } else {
> >+              rcu_read_unlock();
> >+              virtio_transport_free_pkt(pkt);
> >+              return -EINVAL;
> >+      }
> >+
> >+
> >       if (pkt->reply)
> >               atomic_inc(&vsock->queued_replies);
> >
> >-      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-      list_add_tail(&pkt->list, &vsock->send_pkt_list);
> >-      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+      spin_lock_bh(lock);
> >+      if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
>              ^
>              le16_to_cpu(pkt->hdr.type)
> >+              if (vsock->dgram_used  == VHOST_VSOCK_DGRM_MAX_PENDING_PKT)
> >+                      len = -ENOMEM;
> >+              else {
> >+                      vsock->dgram_used++;
> >+                      list_add_tail(&pkt->list, send_pkt_list);
> >+              }
> >+      } else
> >+              list_add_tail(&pkt->list, send_pkt_list);
> >
> >-      vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> >+      spin_unlock_bh(lock);
> >+
> >+      vhost_work_queue(&vsock->dev, work);
> >
> >       rcu_read_unlock();
> >       return len;
> >@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> >               return NULL;
> >       }
> >
> >-      if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)
> >+      if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM
> >+              || le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
> >               pkt->len = le32_to_cpu(pkt->hdr.len);
> >
> >       /* No payload */
> >@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {
> >       .send_pkt = vhost_transport_send_pkt,
> > };
> >
> >+static inline unsigned long busy_clock(void)
> >+{
> >+      return local_clock() >> 10;
> >+}
> >+
> >+static bool vhost_can_busy_poll(unsigned long endtime)
> >+{
> >+      return likely(!need_resched() && !time_after(busy_clock(), endtime) &&
> >+                    !signal_pending(current));
> >+}
> >+
> >+
> > static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> > {
> >       struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> >@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >       int head, pkts = 0, total_len = 0;
> >       unsigned int out, in;
> >       bool added = false;
> >+      unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;
> >+      unsigned long endtime;
> >
> >       mutex_lock(&vq->mutex);
> >
> >@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >       if (!vq_meta_prefetch(vq))
> >               goto out;
> >
> >+      endtime = busy_clock() + busyloop_timeout;
> >       vhost_disable_notify(&vsock->dev, vq);
> >+      preempt_disable();
> >       do {
> >               u32 len;
> >
> >-              if (!vhost_vsock_more_replies(vsock)) {
> >+              if (vq == &vsock->vqs[VSOCK_VQ_TX]
> >+                      && !vhost_vsock_more_replies(vsock)) {
> >                       /* Stop tx until the device processes already
> >                        * pending replies.  Leave tx virtqueue
> >                        * callbacks disabled.
> >@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >                       break;
> >
> >               if (head == vq->num) {
> >+                      if (vhost_can_busy_poll(endtime)) {
> >+                              cpu_relax();
> >+                              continue;
> >+                      }
> >+
> >                       if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> >                               vhost_disable_notify(&vsock->dev, vq);
> >                               continue;
> >@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >               total_len += len;
> >               added = true;
> >       } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
> >+      preempt_enable();
> >
> > no_more_replies:
> >       if (added)
> >@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> >        * let's kick the send worker to send them.
> >        */
> >       vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> >+      vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);
> >
> >       mutex_unlock(&vsock->dev.mutex);
> >       return 0;
> >@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> >
> >       vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
> >       vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
> >+      vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
> >+      vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
> >       vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
> >       vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
> >+      vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =
> >+                                              vhost_vsock_handle_tx_kick;
> >+      vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =
> >+                                              vhost_vsock_handle_rx_kick;
> >
> >       vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
> >                      UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
> >@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> >       spin_lock_init(&vsock->send_pkt_list_lock);
> >       INIT_LIST_HEAD(&vsock->send_pkt_list);
> >       vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
> >+      spin_lock_init(&vsock->dgram_send_pkt_list_lock);
> >+      INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);
> >+      vhost_work_init(&vsock->dgram_send_pkt_work,
> >+                      vhost_transport_dgram_send_pkt_work);
> >+
> >       return 0;
> >
> > out:
> >@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
> >               if (vsock->vqs[i].handle_kick)
> >                       vhost_poll_flush(&vsock->vqs[i].poll);
> >       vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
> >+      vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);
> > }
> >
> > static void vhost_vsock_reset_orphans(struct sock *sk)
> >@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
> >       }
> >       spin_unlock_bh(&vsock->send_pkt_list_lock);
> >
> >+      spin_lock_bh(&vsock->dgram_send_pkt_list_lock);
> >+      while (!list_empty(&vsock->dgram_send_pkt_list)) {
> >+              struct virtio_vsock_pkt *pkt;
> >+
> >+              pkt = list_first_entry(&vsock->dgram_send_pkt_list,
> >+                              struct virtio_vsock_pkt, list);
> >+              list_del_init(&pkt->list);
> >+              virtio_transport_free_pkt(pkt);
> >+      }
> >+      spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);
> >+
> >       vhost_dev_cleanup(&vsock->dev);
> >       kfree(vsock->dev.vqs);
> >       vhost_vsock_free(vsock);
> >@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)
> >       int ret;
> >
> >       ret = vsock_core_register(&vhost_transport.transport,
> >-                                VSOCK_TRANSPORT_F_H2G);
> >+                                VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
> >       if (ret < 0)
> >               return ret;
> >       return misc_register(&vhost_vsock_misc);
> >--
> >2.11.0
> >
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [External] Re: [RFC v1 3/6] vhost/vsock: add support for vhost dgram.
@ 2021-06-21 17:32       ` Jiang Wang .
  0 siblings, 0 replies; 59+ messages in thread
From: Jiang Wang . @ 2021-06-21 17:32 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: cong.wang, Xiongchun Duan, Andra Paraschiv, kvm,
	Michael S. Tsirkin, Jeff Vander Stoep, Networking, linux-kernel,
	Steven Rostedt, virtualization, Yongji Xie, 柴稳,
	Norbert Slusarek, Stefan Hajnoczi, Colin Ian King,
	Jakub Kicinski, Arseny Krasnov, Ingo Molnar, David S. Miller,
	Alexander Popov

On Fri, Jun 18, 2021 at 3:14 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> We should use le16_to_cpu when accessing pkt->hdr fields.

OK. Will do.

> On Wed, Jun 09, 2021 at 11:24:55PM +0000, Jiang Wang wrote:
> >This patch supports dgram on vhost side, including
> >tx and rx. The vhost send packets asynchronously.
> >
> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >---
> > drivers/vhost/vsock.c | 199 +++++++++++++++++++++++++++++++++++++++++++-------
> > 1 file changed, 173 insertions(+), 26 deletions(-)
> >
> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >index 81d064601093..d366463be6d4 100644
> >--- a/drivers/vhost/vsock.c
> >+++ b/drivers/vhost/vsock.c
> >@@ -28,7 +28,10 @@
> >  * small pkts.
> >  */
> > #define VHOST_VSOCK_PKT_WEIGHT 256
> >+#define VHOST_VSOCK_DGRM_MAX_PENDING_PKT 128
> >
> >+/* Max wait time in busy poll in microseconds */
> >+#define VHOST_VSOCK_BUSY_POLL_TIMEOUT 20
> > enum {
> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> >                              (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> >@@ -45,7 +48,7 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
> >
> > struct vhost_vsock {
> >       struct vhost_dev dev;
> >-      struct vhost_virtqueue vqs[2];
> >+      struct vhost_virtqueue vqs[4];
> >
> >       /* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
> >       struct hlist_node hash;
> >@@ -54,6 +57,11 @@ struct vhost_vsock {
> >       spinlock_t send_pkt_list_lock;
> >       struct list_head send_pkt_list; /* host->guest pending packets */
> >
> >+      spinlock_t dgram_send_pkt_list_lock;
> >+      struct list_head dgram_send_pkt_list;   /* host->guest pending packets */
> >+      struct vhost_work dgram_send_pkt_work;
> >+      int  dgram_used; /*pending packets to be send */
> >+
> >       atomic_t queued_replies;
> >
> >       u32 guest_cid;
> >@@ -90,10 +98,22 @@ static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >                           struct vhost_virtqueue *vq)
> > {
> >-      struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
> >+      struct vhost_virtqueue *tx_vq;
> >       int pkts = 0, total_len = 0;
> >       bool added = false;
> >       bool restart_tx = false;
> >+      spinlock_t *lock;
> >+      struct list_head *send_pkt_list;
> >+
> >+      if (vq == &vsock->vqs[VSOCK_VQ_RX]) {
> >+              tx_vq = &vsock->vqs[VSOCK_VQ_TX];
> >+              lock = &vsock->send_pkt_list_lock;
> >+              send_pkt_list = &vsock->send_pkt_list;
> >+      } else {
> >+              tx_vq = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
> >+              lock = &vsock->dgram_send_pkt_list_lock;
> >+              send_pkt_list = &vsock->dgram_send_pkt_list;
> >+      }
> >
> >       mutex_lock(&vq->mutex);
> >
> >@@ -113,36 +133,48 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               size_t nbytes;
> >               size_t iov_len, payload_len;
> >               int head;
> >+              bool is_dgram = false;
> >
> >-              spin_lock_bh(&vsock->send_pkt_list_lock);
> >-              if (list_empty(&vsock->send_pkt_list)) {
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+              spin_lock_bh(lock);
> >+              if (list_empty(send_pkt_list)) {
> >+                      spin_unlock_bh(lock);
> >                       vhost_enable_notify(&vsock->dev, vq);
> >                       break;
> >               }
> >
> >-              pkt = list_first_entry(&vsock->send_pkt_list,
> >+              pkt = list_first_entry(send_pkt_list,
> >                                      struct virtio_vsock_pkt, list);
> >               list_del_init(&pkt->list);
> >-              spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+              spin_unlock_bh(lock);
> >+
> >+              if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM)
>                      ^
>                      le16_to_cpu(pkt->hdr.type)
>
> >+                      is_dgram = true;
> >
> >               head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> >                                        &out, &in, NULL, NULL);
> >               if (head < 0) {
> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-                      list_add(&pkt->list, &vsock->send_pkt_list);
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+                      spin_lock_bh(lock);
> >+                      list_add(&pkt->list, send_pkt_list);
> >+                      spin_unlock_bh(lock);
> >                       break;
> >               }
> >
> >               if (head == vq->num) {
> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-                      list_add(&pkt->list, &vsock->send_pkt_list);
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+                      if (is_dgram) {
> >+                              virtio_transport_free_pkt(pkt);
> >+                              vq_err(vq, "Dgram virtqueue is full!");
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                              break;
> >+                      }
> >+                      spin_lock_bh(lock);
> >+                      list_add(&pkt->list, send_pkt_list);
> >+                      spin_unlock_bh(lock);
> >
> >                       /* We cannot finish yet if more buffers snuck in while
> >-                       * re-enabling notify.
> >-                       */
> >+                      * re-enabling notify.
> >+                      */
> >                       if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> >                               vhost_disable_notify(&vsock->dev, vq);
> >                               continue;
> >@@ -153,6 +185,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               if (out) {
> >                       virtio_transport_free_pkt(pkt);
> >                       vq_err(vq, "Expected 0 output buffers, got %u\n", out);
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >+
> >                       break;
> >               }
> >
> >@@ -160,6 +198,18 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               if (iov_len < sizeof(pkt->hdr)) {
> >                       virtio_transport_free_pkt(pkt);
> >                       vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >+                      break;
> >+              }
> >+
> >+              if (iov_len < pkt->len - pkt->off &&
> >+                      vq == &vsock->vqs[VSOCK_VQ_DGRAM_RX]) {
> >+                      virtio_transport_free_pkt(pkt);
> >+                      vq_err(vq, "Buffer len [%zu] too small for dgram\n", iov_len);
> >                       break;
> >               }
> >
> >@@ -179,6 +229,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               if (nbytes != sizeof(pkt->hdr)) {
> >                       virtio_transport_free_pkt(pkt);
> >                       vq_err(vq, "Faulted on copying pkt hdr\n");
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >                       break;
> >               }
> >
> >@@ -204,16 +259,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >               /* If we didn't send all the payload we can requeue the packet
> >                * to send it with the next available buffer.
> >                */
> >-              if (pkt->off < pkt->len) {
> >+              if ((pkt->off < pkt->len)
> >+                      && (vq == &vsock->vqs[VSOCK_VQ_RX])) {
> >                       /* We are queueing the same virtio_vsock_pkt to handle
> >                        * the remaining bytes, and we want to deliver it
> >                        * to monitoring devices in the next iteration.
> >                        */
> >                       pkt->tap_delivered = false;
> >
> >-                      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-                      list_add(&pkt->list, &vsock->send_pkt_list);
> >-                      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+                      spin_lock_bh(lock);
> >+                      list_add(&pkt->list, send_pkt_list);
> >+                      spin_unlock_bh(lock);
> >               } else {
> >                       if (pkt->reply) {
> >                               int val;
> >@@ -228,6 +284,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> >                       }
> >
> >                       virtio_transport_free_pkt(pkt);
> >+                      if (is_dgram) {
> >+                              spin_lock_bh(lock);
> >+                              vsock->dgram_used--;
> >+                              spin_unlock_bh(lock);
> >+                      }
> >               }
> >       } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
> >       if (added)
> >@@ -251,11 +312,25 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
> >       vhost_transport_do_send_pkt(vsock, vq);
> > }
> >
> >+static void vhost_transport_dgram_send_pkt_work(struct vhost_work *work)
> >+{
> >+      struct vhost_virtqueue *vq;
> >+      struct vhost_vsock *vsock;
> >+
> >+      vsock = container_of(work, struct vhost_vsock, dgram_send_pkt_work);
> >+      vq = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
> >+
> >+      vhost_transport_do_send_pkt(vsock, vq);
> >+}
> >+
> > static int
> > vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> > {
> >       struct vhost_vsock *vsock;
> >       int len = pkt->len;
> >+      spinlock_t *lock;
> >+      struct list_head *send_pkt_list;
> >+      struct vhost_work *work;
> >
> >       rcu_read_lock();
> >
> >@@ -267,14 +342,38 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> >               return -ENODEV;
> >       }
> >
> >+      if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_STREAM) {
>              ^
>              le16_to_cpu(pkt->hdr.type)
> >+              lock = &vsock->send_pkt_list_lock;
> >+              send_pkt_list = &vsock->send_pkt_list;
> >+              work = &vsock->send_pkt_work;
> >+      } else if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
>                     ^
>                     le16_to_cpu(pkt->hdr.type)
> >+              lock = &vsock->dgram_send_pkt_list_lock;
> >+              send_pkt_list = &vsock->dgram_send_pkt_list;
> >+              work = &vsock->dgram_send_pkt_work;
> >+      } else {
> >+              rcu_read_unlock();
> >+              virtio_transport_free_pkt(pkt);
> >+              return -EINVAL;
> >+      }
> >+
> >+
> >       if (pkt->reply)
> >               atomic_inc(&vsock->queued_replies);
> >
> >-      spin_lock_bh(&vsock->send_pkt_list_lock);
> >-      list_add_tail(&pkt->list, &vsock->send_pkt_list);
> >-      spin_unlock_bh(&vsock->send_pkt_list_lock);
> >+      spin_lock_bh(lock);
> >+      if (pkt->hdr.type == VIRTIO_VSOCK_TYPE_DGRAM) {
>              ^
>              le16_to_cpu(pkt->hdr.type)
> >+              if (vsock->dgram_used  == VHOST_VSOCK_DGRM_MAX_PENDING_PKT)
> >+                      len = -ENOMEM;
> >+              else {
> >+                      vsock->dgram_used++;
> >+                      list_add_tail(&pkt->list, send_pkt_list);
> >+              }
> >+      } else
> >+              list_add_tail(&pkt->list, send_pkt_list);
> >
> >-      vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> >+      spin_unlock_bh(lock);
> >+
> >+      vhost_work_queue(&vsock->dev, work);
> >
> >       rcu_read_unlock();
> >       return len;
> >@@ -355,7 +454,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> >               return NULL;
> >       }
> >
> >-      if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM)
> >+      if (le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_STREAM
> >+              || le16_to_cpu(pkt->hdr.type) == VIRTIO_VSOCK_TYPE_DGRAM)
> >               pkt->len = le32_to_cpu(pkt->hdr.len);
> >
> >       /* No payload */
> >@@ -442,6 +542,18 @@ static struct virtio_transport vhost_transport = {
> >       .send_pkt = vhost_transport_send_pkt,
> > };
> >
> >+static inline unsigned long busy_clock(void)
> >+{
> >+      return local_clock() >> 10;
> >+}
> >+
> >+static bool vhost_can_busy_poll(unsigned long endtime)
> >+{
> >+      return likely(!need_resched() && !time_after(busy_clock(), endtime) &&
> >+                    !signal_pending(current));
> >+}
> >+
> >+
> > static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> > {
> >       struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
> >@@ -452,6 +564,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >       int head, pkts = 0, total_len = 0;
> >       unsigned int out, in;
> >       bool added = false;
> >+      unsigned long busyloop_timeout = VHOST_VSOCK_BUSY_POLL_TIMEOUT;
> >+      unsigned long endtime;
> >
> >       mutex_lock(&vq->mutex);
> >
> >@@ -461,11 +575,14 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >       if (!vq_meta_prefetch(vq))
> >               goto out;
> >
> >+      endtime = busy_clock() + busyloop_timeout;
> >       vhost_disable_notify(&vsock->dev, vq);
> >+      preempt_disable();
> >       do {
> >               u32 len;
> >
> >-              if (!vhost_vsock_more_replies(vsock)) {
> >+              if (vq == &vsock->vqs[VSOCK_VQ_TX]
> >+                      && !vhost_vsock_more_replies(vsock)) {
> >                       /* Stop tx until the device processes already
> >                        * pending replies.  Leave tx virtqueue
> >                        * callbacks disabled.
> >@@ -479,6 +596,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >                       break;
> >
> >               if (head == vq->num) {
> >+                      if (vhost_can_busy_poll(endtime)) {
> >+                              cpu_relax();
> >+                              continue;
> >+                      }
> >+
> >                       if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> >                               vhost_disable_notify(&vsock->dev, vq);
> >                               continue;
> >@@ -510,6 +632,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
> >               total_len += len;
> >               added = true;
> >       } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
> >+      preempt_enable();
> >
> > no_more_replies:
> >       if (added)
> >@@ -565,6 +688,7 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
> >        * let's kick the send worker to send them.
> >        */
> >       vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
> >+      vhost_work_queue(&vsock->dev, &vsock->dgram_send_pkt_work);
> >
> >       mutex_unlock(&vsock->dev.mutex);
> >       return 0;
> >@@ -639,8 +763,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> >
> >       vqs[VSOCK_VQ_TX] = &vsock->vqs[VSOCK_VQ_TX];
> >       vqs[VSOCK_VQ_RX] = &vsock->vqs[VSOCK_VQ_RX];
> >+      vqs[VSOCK_VQ_DGRAM_TX] = &vsock->vqs[VSOCK_VQ_DGRAM_TX];
> >+      vqs[VSOCK_VQ_DGRAM_RX] = &vsock->vqs[VSOCK_VQ_DGRAM_RX];
> >       vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick;
> >       vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
> >+      vsock->vqs[VSOCK_VQ_DGRAM_TX].handle_kick =
> >+                                              vhost_vsock_handle_tx_kick;
> >+      vsock->vqs[VSOCK_VQ_DGRAM_RX].handle_kick =
> >+                                              vhost_vsock_handle_rx_kick;
> >
> >       vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs),
> >                      UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT,
> >@@ -650,6 +780,11 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> >       spin_lock_init(&vsock->send_pkt_list_lock);
> >       INIT_LIST_HEAD(&vsock->send_pkt_list);
> >       vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
> >+      spin_lock_init(&vsock->dgram_send_pkt_list_lock);
> >+      INIT_LIST_HEAD(&vsock->dgram_send_pkt_list);
> >+      vhost_work_init(&vsock->dgram_send_pkt_work,
> >+                      vhost_transport_dgram_send_pkt_work);
> >+
> >       return 0;
> >
> > out:
> >@@ -665,6 +800,7 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
> >               if (vsock->vqs[i].handle_kick)
> >                       vhost_poll_flush(&vsock->vqs[i].poll);
> >       vhost_work_flush(&vsock->dev, &vsock->send_pkt_work);
> >+      vhost_work_flush(&vsock->dev, &vsock->dgram_send_pkt_work);
> > }
> >
> > static void vhost_vsock_reset_orphans(struct sock *sk)
> >@@ -724,6 +860,17 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
> >       }
> >       spin_unlock_bh(&vsock->send_pkt_list_lock);
> >
> >+      spin_lock_bh(&vsock->dgram_send_pkt_list_lock);
> >+      while (!list_empty(&vsock->dgram_send_pkt_list)) {
> >+              struct virtio_vsock_pkt *pkt;
> >+
> >+              pkt = list_first_entry(&vsock->dgram_send_pkt_list,
> >+                              struct virtio_vsock_pkt, list);
> >+              list_del_init(&pkt->list);
> >+              virtio_transport_free_pkt(pkt);
> >+      }
> >+      spin_unlock_bh(&vsock->dgram_send_pkt_list_lock);
> >+
> >       vhost_dev_cleanup(&vsock->dev);
> >       kfree(vsock->dev.vqs);
> >       vhost_vsock_free(vsock);
> >@@ -906,7 +1053,7 @@ static int __init vhost_vsock_init(void)
> >       int ret;
> >
> >       ret = vsock_core_register(&vhost_transport.transport,
> >-                                VSOCK_TRANSPORT_F_H2G);
> >+                                VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
> >       if (ret < 0)
> >               return ret;
> >       return misc_register(&vhost_vsock_misc);
> >--
> >2.11.0
> >
>
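
To make the le16_to_cpu() point above concrete, the dispatch in
vhost_transport_send_pkt() would read roughly as follows once the header
type is converted up front (an untested sketch against the quoted patch,
using only names that already appear in it):

        u16 type = le16_to_cpu(pkt->hdr.type);

        /* Pick the per-type list, lock and work item once. */
        if (type == VIRTIO_VSOCK_TYPE_STREAM) {
                lock = &vsock->send_pkt_list_lock;
                send_pkt_list = &vsock->send_pkt_list;
                work = &vsock->send_pkt_work;
        } else if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
                lock = &vsock->dgram_send_pkt_list_lock;
                send_pkt_list = &vsock->dgram_send_pkt_list;
                work = &vsock->dgram_send_pkt_work;
        } else {
                rcu_read_unlock();
                virtio_transport_free_pkt(pkt);
                return -EINVAL;
        }

        ...

        spin_lock_bh(lock);
        /* Reuse the converted value instead of re-reading the LE field. */
        if (type == VIRTIO_VSOCK_TYPE_DGRAM) {
                if (vsock->dgram_used == VHOST_VSOCK_DGRM_MAX_PENDING_PKT)
                        len = -ENOMEM;
                else {
                        vsock->dgram_used++;
                        list_add_tail(&pkt->list, send_pkt_list);
                }
        } else {
                list_add_tail(&pkt->list, send_pkt_list);
        }
        spin_unlock_bh(lock);
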
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [External] Re: [RFC v1 1/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
  2021-06-21 17:24       ` Jiang Wang .
@ 2021-06-22 10:50         ` Stefano Garzarella
  -1 siblings, 0 replies; 59+ messages in thread
From: Stefano Garzarella @ 2021-06-22 10:50 UTC (permalink / raw)
  To: Jiang Wang .
  Cc: virtualization, Stefan Hajnoczi, Michael S. Tsirkin,
	Arseny Krasnov, cong.wang, Xiongchun Duan, Yongji Xie,
	柴稳,
	Jason Wang, David S. Miller, Jakub Kicinski, Steven Rostedt,
	Ingo Molnar, Andra Paraschiv, Norbert Slusarek, Colin Ian King,
	Alexander Popov, kvm, Networking, linux-kernel

On Mon, Jun 21, 2021 at 10:24:20AM -0700, Jiang Wang . wrote:
>On Fri, Jun 18, 2021 at 2:40 AM Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>> On Wed, Jun 09, 2021 at 11:24:53PM +0000, Jiang Wang wrote:
>> >When this feature is enabled, allocate 5 queues,
>> >otherwise, allocate 3 queues to be compatible with
>> >old QEMU versions.
>> >
>> >Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> >---
>> > drivers/vhost/vsock.c             |  3 +-
>> > include/linux/virtio_vsock.h      |  9 +++++
>> > include/uapi/linux/virtio_vsock.h |  3 ++
>> > net/vmw_vsock/virtio_transport.c  | 73 +++++++++++++++++++++++++++++++++++----
>> > 4 files changed, 80 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> >index 5e78fb719602..81d064601093 100644
>> >--- a/drivers/vhost/vsock.c
>> >+++ b/drivers/vhost/vsock.c
>> >@@ -31,7 +31,8 @@
>> >
>> > enum {
>> >       VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>> >-                             (1ULL << VIRTIO_F_ACCESS_PLATFORM)
>> >+                             (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
>> >+                             (1ULL << VIRTIO_VSOCK_F_DGRAM)
>> > };
>> >
>> > enum {
>> >diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> >index dc636b727179..ba3189ed9345 100644
>> >--- a/include/linux/virtio_vsock.h
>> >+++ b/include/linux/virtio_vsock.h
>> >@@ -18,6 +18,15 @@ enum {
>> >       VSOCK_VQ_MAX    = 3,
>> > };
>> >
>> >+enum {
>> >+      VSOCK_VQ_STREAM_RX     = 0, /* for host to guest data */
>> >+      VSOCK_VQ_STREAM_TX     = 1, /* for guest to host data */
>> >+      VSOCK_VQ_DGRAM_RX       = 2,
>> >+      VSOCK_VQ_DGRAM_TX       = 3,
>> >+      VSOCK_VQ_EX_EVENT       = 4,
>> >+      VSOCK_VQ_EX_MAX         = 5,
>> >+};
>> >+
>> > /* Per-socket state (accessed via vsk->trans) */
>> > struct virtio_vsock_sock {
>> >       struct vsock_sock *vsk;
>> >diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>> >index 1d57ed3d84d2..b56614dff1c9 100644
>> >--- a/include/uapi/linux/virtio_vsock.h
>> >+++ b/include/uapi/linux/virtio_vsock.h
>> >@@ -38,6 +38,9 @@
>> > #include <linux/virtio_ids.h>
>> > #include <linux/virtio_config.h>
>> >
>> >+/* The feature bitmap for virtio net */
>> >+#define VIRTIO_VSOCK_F_DGRAM  0       /* Host support dgram vsock */
>> >+
>> > struct virtio_vsock_config {
>> >       __le64 guest_cid;
>> > } __attribute__((packed));
>> >diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> >index 2700a63ab095..7dcb8db23305 100644
>> >--- a/net/vmw_vsock/virtio_transport.c
>> >+++ b/net/vmw_vsock/virtio_transport.c
>> >@@ -27,7 +27,8 @@ static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
>> >
>> > struct virtio_vsock {
>> >       struct virtio_device *vdev;
>> >-      struct virtqueue *vqs[VSOCK_VQ_MAX];
>> >+      struct virtqueue **vqs;
>> >+      bool has_dgram;
>> >
>> >       /* Virtqueue processing is deferred to a workqueue */
>> >       struct work_struct tx_work;
>> >@@ -333,7 +334,10 @@ static int virtio_vsock_event_fill_one(struct virtio_vsock *vsock,
>> >       struct scatterlist sg;
>> >       struct virtqueue *vq;
>> >
>> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];
>> >+      if (vsock->has_dgram)
>> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
>> >+      else
>> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];
>> >
>> >       sg_init_one(&sg, event, sizeof(*event));
>> >
>> >@@ -351,7 +355,10 @@ static void virtio_vsock_event_fill(struct virtio_vsock *vsock)
>> >               virtio_vsock_event_fill_one(vsock, event);
>> >       }
>> >
>> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
>> >+      if (vsock->has_dgram)
>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
>> >+      else
>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
>> > }
>> >
>> > static void virtio_vsock_reset_sock(struct sock *sk)
>> >@@ -391,7 +398,10 @@ static void virtio_transport_event_work(struct work_struct *work)
>> >               container_of(work, struct virtio_vsock, event_work);
>> >       struct virtqueue *vq;
>> >
>> >-      vq = vsock->vqs[VSOCK_VQ_EVENT];
>> >+      if (vsock->has_dgram)
>> >+              vq = vsock->vqs[VSOCK_VQ_EX_EVENT];
>> >+      else
>> >+              vq = vsock->vqs[VSOCK_VQ_EVENT];
>> >
>> >       mutex_lock(&vsock->event_lock);
>> >
>> >@@ -411,7 +421,10 @@ static void virtio_transport_event_work(struct work_struct *work)
>> >               }
>> >       } while (!virtqueue_enable_cb(vq));
>> >
>> >-      virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
>> >+      if (vsock->has_dgram)
>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EX_EVENT]);
>> >+      else
>> >+              virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);
>> > out:
>> >       mutex_unlock(&vsock->event_lock);
>> > }
>> >@@ -434,6 +447,10 @@ static void virtio_vsock_tx_done(struct virtqueue *vq)
>> >       queue_work(virtio_vsock_workqueue, &vsock->tx_work);
>> > }
>> >
>> >+static void virtio_vsock_dgram_tx_done(struct virtqueue *vq)
>> >+{
>> >+}
>> >+
>> > static void virtio_vsock_rx_done(struct virtqueue *vq)
>> > {
>> >       struct virtio_vsock *vsock = vq->vdev->priv;
>> >@@ -443,6 +460,10 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>> >       queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>> > }
>> >
>> >+static void virtio_vsock_dgram_rx_done(struct virtqueue *vq)
>> >+{
>> >+}
>> >+
>> > static struct virtio_transport virtio_transport = {
>> >       .transport = {
>> >               .module                   = THIS_MODULE,
>> >@@ -545,13 +566,29 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>> >               virtio_vsock_tx_done,
>> >               virtio_vsock_event_done,
>> >       };
>> >+      vq_callback_t *ex_callbacks[] = {
>>
>> 'ex' is not clear, maybe better 'dgram'?
>>
>sure.
>
>> What happens if F_DGRAM is negotiated, but not F_STREAM?
>>
>Hmm. In my mind, F_STREAM is always negotiated. Do we want to add
>support when F_STREAM is not negotiated?
>

Yep, I think we should support this case.

The main purpose of the feature bits is to enable/disable the 
functionality after the negotiation.
Initially we didn't want to introduce it, but then we decided it was
better to have it, because there could be, for example, a device that
wants to support only datagrams.

Since you're touching this part of the code, it would be very helpful to 
fix the problem now.

But if you think it's too complex, we can do it in a second step.
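
As a rough illustration (not a request to do it exactly this way), the
probe path could key everything off the negotiated bits instead of
assuming stream support is always present. The VIRTIO_VSOCK_F_STREAM bit
below is purely hypothetical at this point; it is not defined by the
current spec:

        bool has_dgram  = virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM);
        /* Hypothetical bit; today stream support is implicit. */
        bool has_stream = virtio_has_feature(vdev, VIRTIO_VSOCK_F_STREAM);

        if (!has_dgram && !has_stream) {
                ret = -EINVAL;          /* nothing usable was negotiated */
                goto out;
        }

        /* The dgram vqs sit after the stream vqs in the layout above, so
         * dgram needs the 5-queue layout even if stream ends up unused.
         */
        max_vq = has_dgram ? VSOCK_VQ_EX_MAX : VSOCK_VQ_MAX;

The transport could then register stream and/or dgram ops with the vsock
core depending on which bits were actually negotiated.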

Thanks,
Stefano

>> >+              virtio_vsock_rx_done,
>> >+              virtio_vsock_tx_done,
>> >+              virtio_vsock_dgram_rx_done,
>> >+              virtio_vsock_dgram_tx_done,
>> >+              virtio_vsock_event_done,
>> >+      };
>> >+
>> >       static const char * const names[] = {
>> >               "rx",
>> >               "tx",
>> >               "event",
>> >       };
>> >+      static const char * const ex_names[] = {
>> >+              "rx",
>> >+              "tx",
>> >+              "dgram_rx",
>> >+              "dgram_tx",
>> >+              "event",
>> >+      };
>> >+
>> >       struct virtio_vsock *vsock = NULL;
>> >-      int ret;
>> >+      int ret, max_vq;
>> >
>> >       ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
>> >       if (ret)
>> >@@ -572,9 +609,30 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>> >
>> >       vsock->vdev = vdev;
>> >
>> >-      ret = virtio_find_vqs(vsock->vdev, VSOCK_VQ_MAX,
>> >+      if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
>> >+              vsock->has_dgram = true;
>> >+
>> >+      if (vsock->has_dgram)
>> >+              max_vq = VSOCK_VQ_EX_MAX;
>> >+      else
>> >+              max_vq = VSOCK_VQ_MAX;
>> >+
>> >+      vsock->vqs = kmalloc_array(max_vq, sizeof(struct virtqueue *), GFP_KERNEL);
>> >+      if (!vsock->vqs) {
>> >+              ret = -ENOMEM;
>> >+              goto out;
>> >+      }
>> >+
>> >+      if (vsock->has_dgram) {
>> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,
>> >+                            vsock->vqs, ex_callbacks, ex_names,
>> >+                            NULL);
>> >+      } else {
>> >+              ret = virtio_find_vqs(vsock->vdev, max_vq,
>> >                             vsock->vqs, callbacks, names,
>> >                             NULL);
>> >+      }
>> >+
>> >       if (ret < 0)
>> >               goto out;
>> >
>> >@@ -695,6 +753,7 @@ static struct virtio_device_id id_table[] = {
>> > };
>> >
>> > static unsigned int features[] = {
>> >+      VIRTIO_VSOCK_F_DGRAM,
>> > };
>> >
>> > static struct virtio_driver virtio_vsock_driver = {
>> >--
>> >2.11.0
>> >
>>
>


^ permalink raw reply	[flat|nested] 59+ messages in thread


end of thread, other threads:[~2021-06-22 10:51 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 23:24 [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support Jiang Wang
2021-06-09 23:24 ` Jiang Wang
2021-06-09 23:24 ` [RFC v1 1/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Jiang Wang
2021-06-09 23:24   ` Jiang Wang
2021-06-18  9:39   ` Stefano Garzarella
2021-06-18  9:39     ` Stefano Garzarella
2021-06-21 17:24     ` [External] " Jiang Wang .
2021-06-21 17:24       ` Jiang Wang .
2021-06-22 10:50       ` Stefano Garzarella
2021-06-22 10:50         ` Stefano Garzarella
2021-06-09 23:24 ` [RFC v1 2/6] virtio/vsock: add support for virtio datagram Jiang Wang
2021-06-09 23:24   ` Jiang Wang
2021-06-16  9:06   ` kernel test robot
2021-06-16  9:17   ` kernel test robot
2021-06-16 11:18   ` kernel test robot
2021-06-16 17:54   ` kernel test robot
2021-06-18  9:52   ` Stefano Garzarella
2021-06-18  9:52     ` Stefano Garzarella
2021-06-18 10:11   ` Stefano Garzarella
2021-06-18 10:11     ` Stefano Garzarella
2021-06-09 23:24 ` [RFC v1 3/6] vhost/vsock: add support for vhost dgram Jiang Wang
2021-06-09 23:24   ` Jiang Wang
2021-06-16 12:33   ` kernel test robot
2021-06-18 10:13   ` Stefano Garzarella
2021-06-18 10:13     ` Stefano Garzarella
2021-06-21 17:32     ` [External] " Jiang Wang .
2021-06-21 17:32       ` Jiang Wang .
2021-06-09 23:24 ` [RFC v1 4/6] vsock_test: add tests for vsock dgram Jiang Wang
2021-06-09 23:24   ` Jiang Wang
2021-06-09 23:24 ` [RFC v1 5/6] vhost/vsock: add kconfig for vhost dgram support Jiang Wang
2021-06-09 23:24   ` Jiang Wang
2021-06-18  9:54   ` Stefano Garzarella
2021-06-18  9:54     ` Stefano Garzarella
2021-06-21 17:25     ` [External] " Jiang Wang .
2021-06-21 17:25       ` Jiang Wang .
2021-06-09 23:24 ` [RFC v1 6/6] virtio/vsock: add sysfs for rx buf len for dgram Jiang Wang
2021-06-09 23:24   ` Jiang Wang
2021-06-18 10:04   ` Stefano Garzarella
2021-06-18 10:04     ` Stefano Garzarella
2021-06-21 17:27     ` [External] " Jiang Wang .
2021-06-21 17:27       ` Jiang Wang .
2021-06-10  1:50 ` [RFC v1 0/6] virtio/vsock: introduce SOCK_DGRAM support Jason Wang
2021-06-10  1:50   ` Jason Wang
2021-06-10  3:43   ` Jiang Wang .
2021-06-10  3:43     ` Jiang Wang .
2021-06-10  4:02     ` Jason Wang
2021-06-10  4:02       ` Jason Wang
2021-06-10  7:23       ` Stefano Garzarella
2021-06-10  7:23         ` Stefano Garzarella
2021-06-10  7:46         ` Jason Wang
2021-06-10  7:46           ` Jason Wang
2021-06-10  9:51           ` Stefano Garzarella
2021-06-10  9:51             ` Stefano Garzarella
2021-06-10 16:44             ` Jiang Wang .
2021-06-10 16:44               ` Jiang Wang .
2021-06-18  9:35 ` Stefano Garzarella
2021-06-18  9:35   ` Stefano Garzarella
2021-06-21 17:21   ` [External] " Jiang Wang .
2021-06-21 17:21     ` Jiang Wang .
