All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
@ 2023-09-11 20:22 Arseniy Krasnov
  2023-09-11 20:22 ` [PATCH net-next v8 1/4] vsock/virtio/vhost: read data from non-linear skb Arseniy Krasnov
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-11 20:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman
  Cc: kvm, virtualization, netdev, linux-kernel, kernel, oxffffaa, avkrasnov

Hello,

this patchset is first of three parts of another big patchset for
MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/

During review of this series, Stefano Garzarella <sgarzare@redhat.com>
suggested to split it for three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) <--- this patchset
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
   tx completions) and update for Documentation/.
3) Updates for tests and utils.

This series enables handling of fragged skbs in virtio and vhost parts.
Newly logic won't be triggered, because SO_ZEROCOPY options is still
impossible to enable at this moment (next bunch of patches from big
set above will enable it).

I've included changelog to some patches anyway, because there were some
comments during review of last big patchset from the link above.

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=73be7fb14e83d24383f840a22f24d3ed222ca319

Link to v1:
https://lore.kernel.org/netdev/20230717210051.856388-1-AVKrasnov@sberdevices.ru/
Link to v2:
https://lore.kernel.org/netdev/20230718180237.3248179-1-AVKrasnov@sberdevices.ru/
Link to v3:
https://lore.kernel.org/netdev/20230720214245.457298-1-AVKrasnov@sberdevices.ru/
Link to v4:
https://lore.kernel.org/netdev/20230727222627.1895355-1-AVKrasnov@sberdevices.ru/
Link to v5:
https://lore.kernel.org/netdev/20230730085905.3420811-1-AVKrasnov@sberdevices.ru/
Link to v6:
https://lore.kernel.org/netdev/20230814212720.3679058-1-AVKrasnov@sberdevices.ru/
Link to v7:
https://lore.kernel.org/netdev/20230827085436.941183-1-avkrasnov@salutedevices.com/

Changelog:
 v3 -> v4:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 v4 -> v5:
 * See per-patch changelog after ---.
 v5 -> v6:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.
 v6 -> v7:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.
 v7 -> v8:
 * Patchset rebased and tested on new HEAD of net-next (see hash above).
 * See per-patch changelog after ---.

Arseniy Krasnov (4):
  vsock/virtio/vhost: read data from non-linear skb
  vsock/virtio: support to send non-linear skb
  vsock/virtio: non-linear skb handling for tap
  vsock/virtio: MSG_ZEROCOPY flag support

 drivers/vhost/vsock.c                   |  14 +-
 include/linux/virtio_vsock.h            |  10 +
 net/vmw_vsock/virtio_transport.c        |  92 ++++++-
 net/vmw_vsock/virtio_transport_common.c | 309 ++++++++++++++++++------
 4 files changed, 344 insertions(+), 81 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next v8 1/4] vsock/virtio/vhost: read data from non-linear skb
  2023-09-11 20:22 [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Arseniy Krasnov
@ 2023-09-11 20:22 ` Arseniy Krasnov
  2023-09-11 20:22 ` [PATCH net-next v8 2/4] vsock/virtio: support to send " Arseniy Krasnov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-11 20:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman
  Cc: kvm, virtualization, netdev, linux-kernel, kernel, oxffffaa, avkrasnov

This is preparation patch for MSG_ZEROCOPY support. It adds handling of
non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
'skb_copy_datagram_iter()'. Main advantage of the second one is that it
can handle paged part of the skb by using 'kmap()' on each page, but if
there are no pages in the skb, it behaves like simple copying to iov
iterator. This patch also adds new field to the control block of skb -
this value shows current offset in the skb to read next portion of data
(it doesn't matter linear it or not). Idea behind this field is that
'skb_copy_datagram_iter()' handles both types of skb internally - it
just needs an offset from which to copy data from the given skb. This
offset is incremented on each read from skb. This approach allows to
simplify handling of both linear and non-linear skbs, because for
linear skb we need to call 'skb_pull()' after reading data from it,
while in non-linear case we need to update 'data_len'.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
---
 Changelog:
 v5(big patchset) -> v1:
  * Merge 'virtio_transport_common.c' and 'vhost/vsock.c' patches into
    this single patch.
  * Commit message update: grammar fix and remark that this patch is
    MSG_ZEROCOPY preparation.
  * Use 'min_t()' instead of comparison using '<>' operators.
 v1 -> v2:
  * R-b tag added.
 v3 -> v4:
  * R-b tag removed due to rebase:
    * Part for 'virtio_transport_stream_do_peek()' is changed.
    * Part for 'virtio_transport_seqpacket_do_peek()' is added.
  * Comments about sleep in 'memcpy_to_msg()' now describe sleep in
    'skb_copy_datagram_iter()'.
 v5 -> v6:
  * Commit message update.
  * Rename 'frag_off' to 'offset' in 'virtio_vsock_skb_cb'.

 drivers/vhost/vsock.c                   | 14 +++++++----
 include/linux/virtio_vsock.h            |  1 +
 net/vmw_vsock/virtio_transport_common.c | 32 +++++++++++++++----------
 3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 817d377a3f36..83711aad855c 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -114,6 +114,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		struct sk_buff *skb;
 		unsigned out, in;
 		size_t nbytes;
+		u32 offset;
 		int head;
 
 		skb = virtio_vsock_skb_dequeue(&vsock->send_pkt_queue);
@@ -156,7 +157,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		}
 
 		iov_iter_init(&iov_iter, ITER_DEST, &vq->iov[out], in, iov_len);
-		payload_len = skb->len;
+		offset = VIRTIO_VSOCK_SKB_CB(skb)->offset;
+		payload_len = skb->len - offset;
 		hdr = virtio_vsock_hdr(skb);
 
 		/* If the packet is greater than the space available in the
@@ -197,8 +199,10 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		nbytes = copy_to_iter(skb->data, payload_len, &iov_iter);
-		if (nbytes != payload_len) {
+		if (skb_copy_datagram_iter(skb,
+					   offset,
+					   &iov_iter,
+					   payload_len)) {
 			kfree_skb(skb);
 			vq_err(vq, "Faulted on copying pkt buf\n");
 			break;
@@ -212,13 +216,13 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		vhost_add_used(vq, head, sizeof(*hdr) + payload_len);
 		added = true;
 
-		skb_pull(skb, payload_len);
+		VIRTIO_VSOCK_SKB_CB(skb)->offset += payload_len;
 		total_len += payload_len;
 
 		/* If we didn't send all the payload we can requeue the packet
 		 * to send it with the next available buffer.
 		 */
-		if (skb->len > 0) {
+		if (VIRTIO_VSOCK_SKB_CB(skb)->offset < skb->len) {
 			hdr->flags |= cpu_to_le32(flags_to_restore);
 
 			/* We are queueing the same skb to handle
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c58453699ee9..a91fbdf233e4 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -12,6 +12,7 @@
 struct virtio_vsock_skb_cb {
 	bool reply;
 	bool tap_delivered;
+	u32 offset;
 };
 
 #define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 352d042b130b..3e08d52a9355 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -364,9 +364,10 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
 		spin_unlock_bh(&vvs->rx_lock);
 
 		/* sk_lock is held by caller so no one else can dequeue.
-		 * Unlock rx_lock since memcpy_to_msg() may sleep.
+		 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
 		 */
-		err = memcpy_to_msg(msg, skb->data, bytes);
+		err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset,
+					     &msg->msg_iter, bytes);
 		if (err)
 			goto out;
 
@@ -410,25 +411,27 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	while (total < len && !skb_queue_empty(&vvs->rx_queue)) {
 		skb = skb_peek(&vvs->rx_queue);
 
-		bytes = len - total;
-		if (bytes > skb->len)
-			bytes = skb->len;
+		bytes = min_t(size_t, len - total,
+			      skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset);
 
 		/* sk_lock is held by caller so no one else can dequeue.
-		 * Unlock rx_lock since memcpy_to_msg() may sleep.
+		 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
 		 */
 		spin_unlock_bh(&vvs->rx_lock);
 
-		err = memcpy_to_msg(msg, skb->data, bytes);
+		err = skb_copy_datagram_iter(skb,
+					     VIRTIO_VSOCK_SKB_CB(skb)->offset,
+					     &msg->msg_iter, bytes);
 		if (err)
 			goto out;
 
 		spin_lock_bh(&vvs->rx_lock);
 
 		total += bytes;
-		skb_pull(skb, bytes);
 
-		if (skb->len == 0) {
+		VIRTIO_VSOCK_SKB_CB(skb)->offset += bytes;
+
+		if (skb->len == VIRTIO_VSOCK_SKB_CB(skb)->offset) {
 			u32 pkt_len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
 
 			virtio_transport_dec_rx_pkt(vvs, pkt_len);
@@ -492,9 +495,10 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk,
 			spin_unlock_bh(&vvs->rx_lock);
 
 			/* sk_lock is held by caller so no one else can dequeue.
-			 * Unlock rx_lock since memcpy_to_msg() may sleep.
+			 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
 			 */
-			err = memcpy_to_msg(msg, skb->data, bytes);
+			err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset,
+						     &msg->msg_iter, bytes);
 			if (err)
 				return err;
 
@@ -553,11 +557,13 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 				int err;
 
 				/* sk_lock is held by caller so no one else can dequeue.
-				 * Unlock rx_lock since memcpy_to_msg() may sleep.
+				 * Unlock rx_lock since skb_copy_datagram_iter() may sleep.
 				 */
 				spin_unlock_bh(&vvs->rx_lock);
 
-				err = memcpy_to_msg(msg, skb->data, bytes_to_copy);
+				err = skb_copy_datagram_iter(skb, 0,
+							     &msg->msg_iter,
+							     bytes_to_copy);
 				if (err) {
 					/* Copy of message failed. Rest of
 					 * fragments will be freed without copy.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v8 2/4] vsock/virtio: support to send non-linear skb
  2023-09-11 20:22 [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Arseniy Krasnov
  2023-09-11 20:22 ` [PATCH net-next v8 1/4] vsock/virtio/vhost: read data from non-linear skb Arseniy Krasnov
@ 2023-09-11 20:22 ` Arseniy Krasnov
  2023-09-14 13:06     ` Stefano Garzarella
  2023-09-11 20:22 ` [PATCH net-next v8 3/4] vsock/virtio: non-linear skb handling for tap Arseniy Krasnov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-11 20:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman
  Cc: kvm, virtualization, netdev, linux-kernel, kernel, oxffffaa, avkrasnov

For non-linear skb use its pages from fragment array as buffers in
virtio tx queue. These pages are already pinned by 'get_user_pages()'
during such skb creation.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
---
 Changelog:
 v2 -> v3:
  * Comment about 'page_to_virt()' is updated. I don't remove R-b,
    as this change is quiet small I guess.
 v6 -> v7:
  * Move arrays '*sgs' and 'bufs' to 'virtio_vsock' instead of being
    local variables. This allows to save stack space in cases of too
    big MAX_SKB_FRAGS.
  * Add 'WARN_ON_ONCE()' for handling nonlinear skbs - it checks that
    linear part of such skb contains only header.
  * R-b tag removed due to updates above.
 v7 -> v8:
  * Add comment in 'struct virtio_vsock' for both 'struct scatterlist'
    fields.
  * Rename '*sgs' and 'bufs' to '*out_sgs' and 'out_bufs'.
  * Initialize '*out_sgs' in 'virtio_vsock_probe()' by always pointing
    to the corresponding element of 'out_bufs'.

 net/vmw_vsock/virtio_transport.c | 60 ++++++++++++++++++++++++++++----
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e95df847176b..73d730156349 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -63,6 +63,17 @@ struct virtio_vsock {
 
 	u32 guest_cid;
 	bool seqpacket_allow;
+
+	/* These fields are used only in tx path in function
+	 * 'virtio_transport_send_pkt_work()', so to save
+	 * stack space in it, place both of them here. Each
+	 * pointer from 'out_sgs' points to the corresponding
+	 * element in 'out_bufs' - this is initialized in
+	 * 'virtio_vsock_probe()'. Both fields are protected
+	 * by 'tx_lock'. +1 is needed for packet header.
+	 */
+	struct scatterlist *out_sgs[MAX_SKB_FRAGS + 1];
+	struct scatterlist out_bufs[MAX_SKB_FRAGS + 1];
 };
 
 static u32 virtio_transport_get_local_cid(void)
@@ -100,8 +111,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 	vq = vsock->vqs[VSOCK_VQ_TX];
 
 	for (;;) {
-		struct scatterlist hdr, buf, *sgs[2];
 		int ret, in_sg = 0, out_sg = 0;
+		struct scatterlist **sgs;
 		struct sk_buff *skb;
 		bool reply;
 
@@ -111,12 +122,43 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 
 		virtio_transport_deliver_tap_pkt(skb);
 		reply = virtio_vsock_skb_reply(skb);
-
-		sg_init_one(&hdr, virtio_vsock_hdr(skb), sizeof(*virtio_vsock_hdr(skb)));
-		sgs[out_sg++] = &hdr;
-		if (skb->len > 0) {
-			sg_init_one(&buf, skb->data, skb->len);
-			sgs[out_sg++] = &buf;
+		sgs = vsock->out_sgs;
+		sg_init_one(sgs[out_sg], virtio_vsock_hdr(skb),
+			    sizeof(*virtio_vsock_hdr(skb)));
+		out_sg++;
+
+		if (!skb_is_nonlinear(skb)) {
+			if (skb->len > 0) {
+				sg_init_one(sgs[out_sg], skb->data, skb->len);
+				out_sg++;
+			}
+		} else {
+			struct skb_shared_info *si;
+			int i;
+
+			/* If skb is nonlinear, then its buffer must contain
+			 * only header and nothing more. Data is stored in
+			 * the fragged part.
+			 */
+			WARN_ON_ONCE(skb_headroom(skb) != sizeof(*virtio_vsock_hdr(skb)));
+
+			si = skb_shinfo(skb);
+
+			for (i = 0; i < si->nr_frags; i++) {
+				skb_frag_t *skb_frag = &si->frags[i];
+				void *va;
+
+				/* We will use 'page_to_virt()' for the userspace page
+				 * here, because virtio or dma-mapping layers will call
+				 * 'virt_to_phys()' later to fill the buffer descriptor.
+				 * We don't touch memory at "virtual" address of this page.
+				 */
+				va = page_to_virt(skb_frag->bv_page);
+				sg_init_one(sgs[out_sg],
+					    va + skb_frag->bv_offset,
+					    skb_frag->bv_len);
+				out_sg++;
+			}
 		}
 
 		ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, GFP_KERNEL);
@@ -621,6 +663,7 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 {
 	struct virtio_vsock *vsock = NULL;
 	int ret;
+	int i;
 
 	ret = mutex_lock_interruptible(&the_virtio_vsock_mutex);
 	if (ret)
@@ -663,6 +706,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	if (ret < 0)
 		goto out;
 
+	for (i = 0; i < ARRAY_SIZE(vsock->out_sgs); i++)
+		vsock->out_sgs[i] = &vsock->out_bufs[i];
+
 	rcu_assign_pointer(the_virtio_vsock, vsock);
 
 	mutex_unlock(&the_virtio_vsock_mutex);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v8 3/4] vsock/virtio: non-linear skb handling for tap
  2023-09-11 20:22 [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Arseniy Krasnov
  2023-09-11 20:22 ` [PATCH net-next v8 1/4] vsock/virtio/vhost: read data from non-linear skb Arseniy Krasnov
  2023-09-11 20:22 ` [PATCH net-next v8 2/4] vsock/virtio: support to send " Arseniy Krasnov
@ 2023-09-11 20:22 ` Arseniy Krasnov
  2023-09-11 20:22 ` [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support Arseniy Krasnov
  2023-09-14 14:07   ` Stefano Garzarella
  4 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-11 20:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman
  Cc: kvm, virtualization, netdev, linux-kernel, kernel, oxffffaa, avkrasnov

For tap device new skb is created and data from the current skb is
copied to it. This adds copying data from non-linear skb to new
the skb.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 31 ++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3e08d52a9355..3a48e48a99ac 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -106,6 +106,27 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
 	return NULL;
 }
 
+static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
+						void *dst,
+						size_t len)
+{
+	struct iov_iter iov_iter = { 0 };
+	struct kvec kvec;
+	size_t to_copy;
+
+	kvec.iov_base = dst;
+	kvec.iov_len = len;
+
+	iov_iter.iter_type = ITER_KVEC;
+	iov_iter.kvec = &kvec;
+	iov_iter.nr_segs = 1;
+
+	to_copy = min_t(size_t, len, skb->len);
+
+	skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->offset,
+			       &iov_iter, to_copy);
+}
+
 /* Packet capture */
 static struct sk_buff *virtio_transport_build_skb(void *opaque)
 {
@@ -114,7 +135,6 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 	struct af_vsockmon_hdr *hdr;
 	struct sk_buff *skb;
 	size_t payload_len;
-	void *payload_buf;
 
 	/* A packet could be split to fit the RX buffer, so we can retrieve
 	 * the payload length from the header and the buffer pointer taking
@@ -122,7 +142,6 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 	 */
 	pkt_hdr = virtio_vsock_hdr(pkt);
 	payload_len = pkt->len;
-	payload_buf = pkt->data;
 
 	skb = alloc_skb(sizeof(*hdr) + sizeof(*pkt_hdr) + payload_len,
 			GFP_ATOMIC);
@@ -165,7 +184,13 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque)
 	skb_put_data(skb, pkt_hdr, sizeof(*pkt_hdr));
 
 	if (payload_len) {
-		skb_put_data(skb, payload_buf, payload_len);
+		if (skb_is_nonlinear(pkt)) {
+			void *data = skb_put(skb, payload_len);
+
+			virtio_transport_copy_nonlinear_skb(pkt, data, payload_len);
+		} else {
+			skb_put_data(skb, pkt->data, payload_len);
+		}
 	}
 
 	return skb;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support
  2023-09-11 20:22 [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Arseniy Krasnov
                   ` (2 preceding siblings ...)
  2023-09-11 20:22 ` [PATCH net-next v8 3/4] vsock/virtio: non-linear skb handling for tap Arseniy Krasnov
@ 2023-09-11 20:22 ` Arseniy Krasnov
  2023-09-14 13:57     ` Stefano Garzarella
  2023-09-14 14:07   ` Stefano Garzarella
  4 siblings, 1 reply; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-11 20:22 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Michael S. Tsirkin,
	Jason Wang, Bobby Eshleman
  Cc: kvm, virtualization, netdev, linux-kernel, kernel, oxffffaa, avkrasnov

This adds handling of MSG_ZEROCOPY flag on transmission path:

1) If this flag is set and zerocopy transmission is possible (enabled
   in socket options and transport allows zerocopy), then non-linear
   skb will be created and filled with the pages of user's buffer.
   Pages of user's buffer are locked in memory by 'get_user_pages()'.
2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it
   calls 'skb_set_owner_w()'. Reason of this change is that
   '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so
   to decrease this field correctly, proper skb destructor is needed:
   'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'.
   If this callback is set, then transport needs extra check to be able
   to send provided number of buffers in zerocopy mode. Currently, the
   only transport that needs this callback set is virtio, because this
   transport adds new buffers to the virtio queue and we need to check,
   that number of these buffers is less than size of the queue (it is
   required by virtio spec). vhost and loopback transports don't need
   this check.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
---
 Changelog:
 v5(big patchset) -> v1:
  * Refactorings of 'if' conditions.
  * Remove extra blank line.
  * Remove 'frag_off' field unneeded init.
  * Add function 'virtio_transport_fill_skb()' which fills both linear
    and non-linear skb with provided data.
 v1 -> v2:
  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
 v2 -> v3:
  * Add new transport callback: 'msgzerocopy_check_iov'. It checks that
    provided 'iov_iter' with data could be sent in a zerocopy mode.
    If this callback is not set in transport - transport allows to send
    any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true'
    then zerocopy is allowed. Reason of this callback is that in case of
    G2H transmission we insert whole skb to the tx virtio queue and such
    skb must fit to the size of the virtio queue to be sent in a single
    iteration (may be tx logic in 'virtio_transport.c' could be reworked
    as in vhost to support partial send of current skb). This callback
    will be enabled only for G2H path. For details pls see comment 
    'Check that tx queue...' below.
 v3 -> v4:
  * 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to
    'struct virtio_transport' as it is virtio specific callback and
    never needed in other transports.
 v4 -> v5:
  * 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy' and now it
    uses number of buffers to send as input argument. I think there is
    no need to pass iov to this callback (at least today, it is used only
    by guest side of virtio transport), because the only thing that this
    callback does is comparison of number of buffers to be inserted to
    the tx queue and size of this queue.
  * Remove any checks for type of current 'iov_iter' with payload (is it
    'iovec' or 'ubuf'). These checks left from the earlier versions where I
    didn't use already implemented kernel API which handles every type of
    'iov_iter'.
 v5 -> v6:
  * Refactor 'virtio_transport_fill_skb()'.
  * Add 'WARN_ON_ONCE()' and comment on invalid combination of destination
    socket and payload in 'virtio_transport_alloc_skb()'. 
 v7 -> v8:
  * Move '+1' addition from 'can_msgzerocopy' callback body to the caller.
    This addition means packet header.
  * In 'virtio_transport_can_zcopy()' rename 'max_to_send' argument to
    'pkt_len'.
  * Update commit message by adding details about new 'can_msgzerocopy'
    callback.
  * In 'virtio_transport_init_hdr()' move 'len' argument directly after
    'info'.
  * Add comment about processing last skb in tx loop.
  * Update comment for 'can_msgzerocopy' callback for more details.

 include/linux/virtio_vsock.h            |   9 +
 net/vmw_vsock/virtio_transport.c        |  32 +++
 net/vmw_vsock/virtio_transport_common.c | 256 ++++++++++++++++++------
 3 files changed, 239 insertions(+), 58 deletions(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index a91fbdf233e4..ebb3ce63d64d 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -160,6 +160,15 @@ struct virtio_transport {
 
 	/* Takes ownership of the packet */
 	int (*send_pkt)(struct sk_buff *skb);
+
+	/* Used in MSG_ZEROCOPY mode. Checks, that provided data
+	 * (number of buffers) could be transmitted with zerocopy
+	 * mode. If this callback is not implemented for the current
+	 * transport - this means that this transport doesn't need
+	 * extra checks and can perform zerocopy transmission by
+	 * default.
+	 */
+	bool (*can_msgzerocopy)(int bufs_num);
 };
 
 ssize_t
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 73d730156349..09ba3128e759 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -455,6 +455,37 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
 }
 
+static bool virtio_transport_can_msgzerocopy(int bufs_num)
+{
+	struct virtio_vsock *vsock;
+	bool res = false;
+
+	rcu_read_lock();
+
+	vsock = rcu_dereference(the_virtio_vsock);
+	if (vsock) {
+		struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
+
+		/* Check that tx queue is large enough to keep whole
+		 * data to send. This is needed, because when there is
+		 * not enough free space in the queue, current skb to
+		 * send will be reinserted to the head of tx list of
+		 * the socket to retry transmission later, so if skb
+		 * is bigger than whole queue, it will be reinserted
+		 * again and again, thus blocking other skbs to be sent.
+		 * Each page of the user provided buffer will be added
+		 * as a single buffer to the tx virtqueue, so compare
+		 * number of pages against maximum capacity of the queue.
+		 */
+		if (bufs_num <= vq->num_max)
+			res = true;
+	}
+
+	rcu_read_unlock();
+
+	return res;
+}
+
 static bool virtio_transport_seqpacket_allow(u32 remote_cid);
 
 static struct virtio_transport virtio_transport = {
@@ -504,6 +535,7 @@ static struct virtio_transport virtio_transport = {
 	},
 
 	.send_pkt = virtio_transport_send_pkt,
+	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
 };
 
 static bool virtio_transport_seqpacket_allow(u32 remote_cid)
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 3a48e48a99ac..e358f118b07e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -37,73 +37,110 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
 	return container_of(t, struct virtio_transport, transport);
 }
 
-/* Returns a new packet on success, otherwise returns NULL.
- *
- * If NULL is returned, errp is set to a negative errno.
- */
-static struct sk_buff *
-virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
-			   size_t len,
-			   u32 src_cid,
-			   u32 src_port,
-			   u32 dst_cid,
-			   u32 dst_port)
-{
-	const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
-	struct virtio_vsock_hdr *hdr;
-	struct sk_buff *skb;
-	void *payload;
-	int err;
+static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info *info,
+				       size_t pkt_len)
+{
+	const struct virtio_transport *t_ops;
+	struct iov_iter *iov_iter;
 
-	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
-	if (!skb)
-		return NULL;
+	if (!info->msg)
+		return false;
 
-	hdr = virtio_vsock_hdr(skb);
-	hdr->type	= cpu_to_le16(info->type);
-	hdr->op		= cpu_to_le16(info->op);
-	hdr->src_cid	= cpu_to_le64(src_cid);
-	hdr->dst_cid	= cpu_to_le64(dst_cid);
-	hdr->src_port	= cpu_to_le32(src_port);
-	hdr->dst_port	= cpu_to_le32(dst_port);
-	hdr->flags	= cpu_to_le32(info->flags);
-	hdr->len	= cpu_to_le32(len);
+	iov_iter = &info->msg->msg_iter;
 
-	if (info->msg && len > 0) {
-		payload = skb_put(skb, len);
-		err = memcpy_from_msg(payload, info->msg, len);
-		if (err)
-			goto out;
+	if (iov_iter->iov_offset)
+		return false;
 
-		if (msg_data_left(info->msg) == 0 &&
-		    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
-			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
+	/* We can't send whole iov. */
+	if (iov_iter->count > pkt_len)
+		return false;
 
-			if (info->msg->msg_flags & MSG_EOR)
-				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
-		}
+	/* Check that transport can send data in zerocopy mode. */
+	t_ops = virtio_transport_get_ops(info->vsk);
+
+	if (t_ops->can_msgzerocopy) {
+		int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
+		int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
+
+		/* +1 is for packet header. */
+		return t_ops->can_msgzerocopy(pages_to_send + 1);
 	}
 
-	if (info->reply)
-		virtio_vsock_skb_set_reply(skb);
+	return true;
+}
 
-	trace_virtio_transport_alloc_pkt(src_cid, src_port,
-					 dst_cid, dst_port,
-					 len,
-					 info->type,
-					 info->op,
-					 info->flags);
+static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
+					   struct sk_buff *skb,
+					   struct msghdr *msg,
+					   bool zerocopy)
+{
+	struct ubuf_info *uarg;
 
-	if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
-		WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
-		goto out;
+	if (msg->msg_ubuf) {
+		uarg = msg->msg_ubuf;
+		net_zcopy_get(uarg);
+	} else {
+		struct iov_iter *iter = &msg->msg_iter;
+		struct ubuf_info_msgzc *uarg_zc;
+
+		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
+					    iter->count,
+					    NULL);
+		if (!uarg)
+			return -1;
+
+		uarg_zc = uarg_to_msgzc(uarg);
+		uarg_zc->zerocopy = zerocopy ? 1 : 0;
 	}
 
-	return skb;
+	skb_zcopy_init(skb, uarg);
 
-out:
-	kfree_skb(skb);
-	return NULL;
+	return 0;
+}
+
+static int virtio_transport_fill_skb(struct sk_buff *skb,
+				     struct virtio_vsock_pkt_info *info,
+				     size_t len,
+				     bool zcopy)
+{
+	void *payload;
+	int err;
+
+	if (zcopy)
+		return __zerocopy_sg_from_iter(info->msg, NULL, skb,
+					       &info->msg->msg_iter,
+					       len);
+
+	payload = skb_put(skb, len);
+	err = memcpy_from_msg(payload, info->msg, len);
+	if (err)
+		return -1;
+
+	if (msg_data_left(info->msg))
+		return 0;
+
+	return 0;
+}
+
+static void virtio_transport_init_hdr(struct sk_buff *skb,
+				      struct virtio_vsock_pkt_info *info,
+				      size_t payload_len,
+				      u32 src_cid,
+				      u32 src_port,
+				      u32 dst_cid,
+				      u32 dst_port)
+{
+	struct virtio_vsock_hdr *hdr;
+
+	hdr = virtio_vsock_hdr(skb);
+	hdr->type	= cpu_to_le16(info->type);
+	hdr->op		= cpu_to_le16(info->op);
+	hdr->src_cid	= cpu_to_le64(src_cid);
+	hdr->dst_cid	= cpu_to_le64(dst_cid);
+	hdr->src_port	= cpu_to_le32(src_port);
+	hdr->dst_port	= cpu_to_le32(dst_port);
+	hdr->flags	= cpu_to_le32(info->flags);
+	hdr->len	= cpu_to_le32(payload_len);
 }
 
 static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
@@ -214,6 +251,77 @@ static u16 virtio_transport_get_type(struct sock *sk)
 		return VIRTIO_VSOCK_TYPE_SEQPACKET;
 }
 
+static struct sk_buff *virtio_transport_alloc_skb(struct vsock_sock *vsk,
+						  struct virtio_vsock_pkt_info *info,
+						  size_t payload_len,
+						  bool zcopy,
+						  u32 src_cid,
+						  u32 src_port,
+						  u32 dst_cid,
+						  u32 dst_port)
+{
+	struct sk_buff *skb;
+	size_t skb_len;
+
+	skb_len = VIRTIO_VSOCK_SKB_HEADROOM;
+
+	if (!zcopy)
+		skb_len += payload_len;
+
+	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+
+	virtio_transport_init_hdr(skb, info, payload_len, src_cid, src_port,
+				  dst_cid, dst_port);
+
+	/* If 'vsk' != NULL then payload is always present, so we
+	 * will never call '__zerocopy_sg_from_iter()' below without
+	 * setting skb owner in 'skb_set_owner_w()'. The only case
+	 * when 'vsk' == NULL is VIRTIO_VSOCK_OP_RST control message
+	 * without payload.
+	 */
+	WARN_ON_ONCE(!(vsk && (info->msg && payload_len)) && zcopy);
+
+	/* Set owner here, because '__zerocopy_sg_from_iter()' uses
+	 * owner of skb without check to update 'sk_wmem_alloc'.
+	 */
+	if (vsk)
+		skb_set_owner_w(skb, sk_vsock(vsk));
+
+	if (info->msg && payload_len > 0) {
+		int err;
+
+		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
+		if (err)
+			goto out;
+
+		if (info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
+			struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
+
+			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
+
+			if (info->msg->msg_flags & MSG_EOR)
+				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
+		}
+	}
+
+	if (info->reply)
+		virtio_vsock_skb_set_reply(skb);
+
+	trace_virtio_transport_alloc_pkt(src_cid, src_port,
+					 dst_cid, dst_port,
+					 payload_len,
+					 info->type,
+					 info->op,
+					 info->flags);
+
+	return skb;
+out:
+	kfree_skb(skb);
+	return NULL;
+}
+
 /* This function can only be used on connecting/connected sockets,
  * since a socket assigned to a transport is required.
  *
@@ -222,10 +330,12 @@ static u16 virtio_transport_get_type(struct sock *sk)
 static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 					  struct virtio_vsock_pkt_info *info)
 {
+	u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
 	u32 src_cid, src_port, dst_cid, dst_port;
 	const struct virtio_transport *t_ops;
 	struct virtio_vsock_sock *vvs;
 	u32 pkt_len = info->pkt_len;
+	bool can_zcopy = false;
 	u32 rest_len;
 	int ret;
 
@@ -254,15 +364,30 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
 		return pkt_len;
 
+	if (info->msg) {
+		/* If zerocopy is not enabled by 'setsockopt()', we behave as
+		 * there is no MSG_ZEROCOPY flag set.
+		 */
+		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
+			info->msg->msg_flags &= ~MSG_ZEROCOPY;
+
+		if (info->msg->msg_flags & MSG_ZEROCOPY)
+			can_zcopy = virtio_transport_can_zcopy(info, pkt_len);
+
+		if (can_zcopy)
+			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
+					    (MAX_SKB_FRAGS * PAGE_SIZE));
+	}
+
 	rest_len = pkt_len;
 
 	do {
 		struct sk_buff *skb;
 		size_t skb_len;
 
-		skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
+		skb_len = min(max_skb_len, rest_len);
 
-		skb = virtio_transport_alloc_skb(info, skb_len,
+		skb = virtio_transport_alloc_skb(vsk, info, skb_len, can_zcopy,
 						 src_cid, src_port,
 						 dst_cid, dst_port);
 		if (!skb) {
@@ -270,6 +395,21 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 			break;
 		}
 
+		/* We process buffer part by part, allocating skb on
+		 * each iteration. If this is last skb for this buffer
+		 * and MSG_ZEROCOPY mode is in use - we must allocate
+		 * completion for the current syscall.
+		 */
+		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
+		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
+			if (virtio_transport_init_zcopy_skb(vsk, skb,
+							    info->msg,
+							    can_zcopy)) {
+				ret = -ENOMEM;
+				break;
+			}
+		}
+
 		virtio_transport_inc_tx_pkt(vvs, skb);
 
 		ret = t_ops->send_pkt(skb);
@@ -985,7 +1125,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!t)
 		return -ENOTCONN;
 
-	reply = virtio_transport_alloc_skb(&info, 0,
+	reply = virtio_transport_alloc_skb(NULL, &info, 0, false,
 					   le64_to_cpu(hdr->dst_cid),
 					   le32_to_cpu(hdr->dst_port),
 					   le64_to_cpu(hdr->src_cid),
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 2/4] vsock/virtio: support to send non-linear skb
  2023-09-11 20:22 ` [PATCH net-next v8 2/4] vsock/virtio: support to send " Arseniy Krasnov
@ 2023-09-14 13:06     ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 13:06 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa

On Mon, Sep 11, 2023 at 11:22:32PM +0300, Arseniy Krasnov wrote:
>For non-linear skb use its pages from fragment array as buffers in
>virtio tx queue. These pages are already pinned by 'get_user_pages()'
>during such skb creation.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> Changelog:
> v2 -> v3:
>  * Comment about 'page_to_virt()' is updated. I don't remove R-b,
>    as this change is quiet small I guess.
> v6 -> v7:
>  * Move arrays '*sgs' and 'bufs' to 'virtio_vsock' instead of being
>    local variables. This allows to save stack space in cases of too
>    big MAX_SKB_FRAGS.
>  * Add 'WARN_ON_ONCE()' for handling nonlinear skbs - it checks that
>    linear part of such skb contains only header.
>  * R-b tag removed due to updates above.
> v7 -> v8:
>  * Add comment in 'struct virtio_vsock' for both 'struct scatterlist'
>    fields.
>  * Rename '*sgs' and 'bufs' to '*out_sgs' and 'out_bufs'.
>  * Initialize '*out_sgs' in 'virtio_vsock_probe()' by always pointing
>    to the corresponding element of 'out_bufs'.

LGTM, thanks for addressing that comments!

>
> net/vmw_vsock/virtio_transport.c | 60 ++++++++++++++++++++++++++++----
> 1 file changed, 53 insertions(+), 7 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 2/4] vsock/virtio: support to send non-linear skb
@ 2023-09-14 13:06     ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 13:06 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, kvm, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization, oxffffaa, Eric Dumazet, Stefan Hajnoczi, kernel,
	Jakub Kicinski, Paolo Abeni, David S. Miller

On Mon, Sep 11, 2023 at 11:22:32PM +0300, Arseniy Krasnov wrote:
>For non-linear skb use its pages from fragment array as buffers in
>virtio tx queue. These pages are already pinned by 'get_user_pages()'
>during such skb creation.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> Changelog:
> v2 -> v3:
>  * Comment about 'page_to_virt()' is updated. I don't remove R-b,
>    as this change is quiet small I guess.
> v6 -> v7:
>  * Move arrays '*sgs' and 'bufs' to 'virtio_vsock' instead of being
>    local variables. This allows to save stack space in cases of too
>    big MAX_SKB_FRAGS.
>  * Add 'WARN_ON_ONCE()' for handling nonlinear skbs - it checks that
>    linear part of such skb contains only header.
>  * R-b tag removed due to updates above.
> v7 -> v8:
>  * Add comment in 'struct virtio_vsock' for both 'struct scatterlist'
>    fields.
>  * Rename '*sgs' and 'bufs' to '*out_sgs' and 'out_bufs'.
>  * Initialize '*out_sgs' in 'virtio_vsock_probe()' by always pointing
>    to the corresponding element of 'out_bufs'.

LGTM, thanks for addressing that comments!

>
> net/vmw_vsock/virtio_transport.c | 60 ++++++++++++++++++++++++++++----
> 1 file changed, 53 insertions(+), 7 deletions(-)

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support
  2023-09-11 20:22 ` [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support Arseniy Krasnov
@ 2023-09-14 13:57     ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 13:57 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, kvm, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization, oxffffaa, Eric Dumazet, Stefan Hajnoczi, kernel,
	Jakub Kicinski, Paolo Abeni, David S. Miller

On Mon, Sep 11, 2023 at 11:22:34PM +0300, Arseniy Krasnov wrote:
>This adds handling of MSG_ZEROCOPY flag on transmission path:
>
>1) If this flag is set and zerocopy transmission is possible (enabled
>   in socket options and transport allows zerocopy), then non-linear
>   skb will be created and filled with the pages of user's buffer.
>   Pages of user's buffer are locked in memory by 'get_user_pages()'.
>2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it
>   calls 'skb_set_owner_w()'. Reason of this change is that
>   '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so
>   to decrease this field correctly, proper skb destructor is needed:
>   'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
>3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'.
>   If this callback is set, then transport needs extra check to be able
>   to send provided number of buffers in zerocopy mode. Currently, the
>   only transport that needs this callback set is virtio, because this
>   transport adds new buffers to the virtio queue and we need to check,
>   that number of these buffers is less than size of the queue (it is
>   required by virtio spec). vhost and loopback transports don't need
>   this check.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> Changelog:
> v5(big patchset) -> v1:
>  * Refactorings of 'if' conditions.
>  * Remove extra blank line.
>  * Remove 'frag_off' field unneeded init.
>  * Add function 'virtio_transport_fill_skb()' which fills both linear
>    and non-linear skb with provided data.
> v1 -> v2:
>  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
> v2 -> v3:
>  * Add new transport callback: 'msgzerocopy_check_iov'. It checks that
>    provided 'iov_iter' with data could be sent in a zerocopy mode.
>    If this callback is not set in transport - transport allows to send
>    any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true'
>    then zerocopy is allowed. Reason of this callback is that in case of
>    G2H transmission we insert whole skb to the tx virtio queue and such
>    skb must fit to the size of the virtio queue to be sent in a single
>    iteration (may be tx logic in 'virtio_transport.c' could be reworked
>    as in vhost to support partial send of current skb). This callback
>    will be enabled only for G2H path. For details pls see comment
>    'Check that tx queue...' below.
> v3 -> v4:
>  * 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to
>    'struct virtio_transport' as it is virtio specific callback and
>    never needed in other transports.
> v4 -> v5:
>  * 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy' and now it
>    uses number of buffers to send as input argument. I think there is
>    no need to pass iov to this callback (at least today, it is used only
>    by guest side of virtio transport), because the only thing that this
>    callback does is comparison of number of buffers to be inserted to
>    the tx queue and size of this queue.
>  * Remove any checks for type of current 'iov_iter' with payload (is it
>    'iovec' or 'ubuf'). These checks left from the earlier versions where I
>    didn't use already implemented kernel API which handles every type of
>    'iov_iter'.
> v5 -> v6:
>  * Refactor 'virtio_transport_fill_skb()'.
>  * Add 'WARN_ON_ONCE()' and comment on invalid combination of destination
>    socket and payload in 'virtio_transport_alloc_skb()'.
> v7 -> v8:
>  * Move '+1' addition from 'can_msgzerocopy' callback body to the caller.
>    This addition means packet header.
>  * In 'virtio_transport_can_zcopy()' rename 'max_to_send' argument to
>    'pkt_len'.
>  * Update commit message by adding details about new 'can_msgzerocopy'
>    callback.
>  * In 'virtio_transport_init_hdr()' move 'len' argument directly after
>    'info'.
>  * Add comment about processing last skb in tx loop.
>  * Update comment for 'can_msgzerocopy' callback for more details.
>
> include/linux/virtio_vsock.h            |   9 +
> net/vmw_vsock/virtio_transport.c        |  32 +++
> net/vmw_vsock/virtio_transport_common.c | 256 ++++++++++++++++++------
> 3 files changed, 239 insertions(+), 58 deletions(-)
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index a91fbdf233e4..ebb3ce63d64d 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -160,6 +160,15 @@ struct virtio_transport {
>
> 	/* Takes ownership of the packet */
> 	int (*send_pkt)(struct sk_buff *skb);
>+
>+	/* Used in MSG_ZEROCOPY mode. Checks, that provided data
>+	 * (number of buffers) could be transmitted with zerocopy
>+	 * mode. If this callback is not implemented for the current
>+	 * transport - this means that this transport doesn't need
>+	 * extra checks and can perform zerocopy transmission by
>+	 * default.
>+	 */
>+	bool (*can_msgzerocopy)(int bufs_num);
> };
>
> ssize_t
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 73d730156349..09ba3128e759 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -455,6 +455,37 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
>+static bool virtio_transport_can_msgzerocopy(int bufs_num)
>+{
>+	struct virtio_vsock *vsock;
>+	bool res = false;
>+
>+	rcu_read_lock();
>+
>+	vsock = rcu_dereference(the_virtio_vsock);
>+	if (vsock) {
>+		struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+		/* Check that tx queue is large enough to keep whole
>+		 * data to send. This is needed, because when there is
>+		 * not enough free space in the queue, current skb to
>+		 * send will be reinserted to the head of tx list of
>+		 * the socket to retry transmission later, so if skb
>+		 * is bigger than whole queue, it will be reinserted
>+		 * again and again, thus blocking other skbs to be sent.
>+		 * Each page of the user provided buffer will be added
>+		 * as a single buffer to the tx virtqueue, so compare
>+		 * number of pages against maximum capacity of the queue.
>+		 */
>+		if (bufs_num <= vq->num_max)
>+			res = true;
>+	}
>+
>+	rcu_read_unlock();
>+
>+	return res;
>+}
>+
> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
>@@ -504,6 +535,7 @@ static struct virtio_transport virtio_transport = {
> 	},
>
> 	.send_pkt = virtio_transport_send_pkt,
>+	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
> };
>
> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 3a48e48a99ac..e358f118b07e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -37,73 +37,110 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
> 	return container_of(t, struct virtio_transport, transport);
> }
>
>-/* Returns a new packet on success, otherwise returns NULL.
>- *
>- * If NULL is returned, errp is set to a negative errno.
>- */

Why we are removing this comment?

>-static struct sk_buff *
>-virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>-			   size_t len,
>-			   u32 src_cid,
>-			   u32 src_port,
>-			   u32 dst_cid,
>-			   u32 dst_port)
>-{
>-	const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
>-	struct virtio_vsock_hdr *hdr;
>-	struct sk_buff *skb;
>-	void *payload;
>-	int err;
>+static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info 
>*info,
>+				       size_t pkt_len)
>+{
>+	const struct virtio_transport *t_ops;
>+	struct iov_iter *iov_iter;
>
>-	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>-	if (!skb)
>-		return NULL;
>+	if (!info->msg)
>+		return false;
>
>-	hdr = virtio_vsock_hdr(skb);
>-	hdr->type	= cpu_to_le16(info->type);
>-	hdr->op		= cpu_to_le16(info->op);
>-	hdr->src_cid	= cpu_to_le64(src_cid);
>-	hdr->dst_cid	= cpu_to_le64(dst_cid);
>-	hdr->src_port	= cpu_to_le32(src_port);
>-	hdr->dst_port	= cpu_to_le32(dst_port);
>-	hdr->flags	= cpu_to_le32(info->flags);
>-	hdr->len	= cpu_to_le32(len);
>+	iov_iter = &info->msg->msg_iter;
>
>-	if (info->msg && len > 0) {
>-		payload = skb)put(skb, len);
>-		err = memcpy_from_msg(payload, info->msg, len);
>-		if (err)
>-			goto out;
>+	if (iov_iter->iov_offset)
>+		return false;
>
>-		if (msg_data_left(info->msg) == 0 &&
>-		    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
>-			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>+	/* We can't send whole iov. */
>+	if (iov_iter->count > pkt_len)
>+		return false;
>
>-			if (info->msg->msg_flags & MSG_EOR)
>-				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>-		}
>+	/* Check that transport can send data in zerocopy mode. */
>+	t_ops = virtio_transport_get_ops(info->vsk);

While reviewing I was wondering why here we don't check if `t_ops` is
NULL.

Then I realized that the only caller of this function
(virtio_transport_send_pkt_info()) already get the vsk ops calling
virtio_transport_get_ops() and also checks if it can be null.

So what about passing the ops as function parameter to avoid to call
virtio_transport_get_ops() again?

>+
>+	if (t_ops->can_msgzerocopy) {
>+		int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
>+		int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
>+
>+		/* +1 is for packet header. */
>+		return t_ops->can_msgzerocopy(pages_to_send + 1);
> 	}
>
>-	if (info->reply)
>-		virtio_vsock_skb_set_reply(skb);
>+	return true;
>+}
>
>-	trace_virtio_transport_alloc_pkt(src_cid, src_port,
>-					 dst_cid, dst_port,
>-					 len,
>-					 info->type,
>-					 info->op,
>-					 info->flags);
>+static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>+					   struct sk_buff *skb,
>+					   struct msghdr *msg,
>+					   bool zerocopy)
>+{
>+	struct ubuf_info *uarg;
>
>-	if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
>-		WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
>-		goto out;
>+	if (msg->msg_ubuf) {
>+		uarg = msg->msg_ubuf;
>+		net_zcopy_get(uarg);
>+	} else {
>+		struct iov_iter *iter = &msg->msg_iter;
>+		struct ubuf_info_msgzc *uarg_zc;
>+
>+		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>+					    iter->count,
>+					    NULL);
>+		if (!uarg)
>+			return -1;
>+
>+		uarg_zc = uarg_to_msgzc(uarg);
>+		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> 	}
>
>-	return skb;
>+	skb_zcopy_init(skb, uarg);
>
>-out:
>-	kfree_skb(skb);
>-	return NULL;
>+	return 0;
>+}
>+
>+static int virtio_transport_fill_skb(struct sk_buff *skb,
>+				     struct virtio_vsock_pkt_info *info,
>+				     size_t len,
>+				     bool zcopy)
>+{
>+	void *payload;
>+	int err;
>+
>+	if (zcopy)
>+		return __zerocopy_sg_from_iter(info->msg, NULL, skb,
>+					       &info->msg->msg_iter,
>+					       len);
>+
>+	payload = skb_put(skb, len);
>+	err = memcpy_from_msg(payload, info->msg, len);
>+	if (err)
>+		return -1;
>+
>+	if (msg_data_left(info->msg))
>+		return 0;

We are returning 0 in any case, what is the purpose of this check?

>+
>+	return 0;
>+}
>+
>+static void virtio_transport_init_hdr(struct sk_buff *skb,
>+				      struct virtio_vsock_pkt_info *info,
>+				      size_t payload_len,
>+				      u32 src_cid,
>+				      u32 src_port,
>+				      u32 dst_cid,
>+				      u32 dst_port)
>+{
>+	struct virtio_vsock_hdr *hdr;
>+
>+	hdr = virtio_vsock_hdr(skb);
>+	hdr->type	= cpu_to_le16(info->type);
>+	hdr->op		= cpu_to_le16(info->op);
>+	hdr->src_cid	= cpu_to_le64(src_cid);
>+	hdr->dst_cid	= cpu_to_le64(dst_cid);
>+	hdr->src_port	= cpu_to_le32(src_port);
>+	hdr->dst_port	= cpu_to_le32(dst_port);
>+	hdr->flags	= cpu_to_le32(info->flags);
>+	hdr->len	= cpu_to_le32(payload_len);
> }
>
> static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
>@@ -214,6 +251,77 @@ static u16 virtio_transport_get_type(struct sock *sk)
> 		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> }
>
>+static struct sk_buff *virtio_transport_alloc_skb(struct vsock_sock *vsk,

Before this patch we used `info->vsk` in virtio_transport_alloc_skb().
Is it now really necessary to add vsk as a parameter? If so, why?

>+						  struct virtio_vsock_pkt_info *info,
>+						  size_t payload_len,
>+						  bool zcopy,
>+						  u32 src_cid,
>+						  u32 src_port,
>+						  u32 dst_cid,
>+						  u32 dst_port)
>+{
>+	struct sk_buff *skb;
>+	size_t skb_len;
>+
>+	skb_len = VIRTIO_VSOCK_SKB_HEADROOM;
>+
>+	if (!zcopy)
>+		skb_len += payload_len;
>+
>+	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>+	if (!skb)
>+		return NULL;
>+
>+	virtio_transport_init_hdr(skb, info, payload_len, src_cid, src_port,
>+				  dst_cid, dst_port);
>+
>+	/* If 'vsk' != NULL then payload is always present, so we
>+	 * will never call '__zerocopy_sg_from_iter()' below without
>+	 * setting skb owner in 'skb_set_owner_w()'. The only case
>+	 * when 'vsk' == NULL is VIRTIO_VSOCK_OP_RST control message
>+	 * without payload.
>+	 */
>+	WARN_ON_ONCE(!(vsk && (info->msg && payload_len)) && zcopy);
>+
>+	/* Set owner here, because '__zerocopy_sg_from_iter()' uses
>+	 * owner of skb without check to update 'sk_wmem_alloc'.
>+	 */
>+	if (vsk)
>+		skb_set_owner_w(skb, sk_vsock(vsk));
>+
>+	if (info->msg && payload_len > 0) {
>+		int err;
>+
>+		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
>+		if (err)
>+			goto out;
>+
>+		if (info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {

Before this patch, we did these steps only if
`msg_data_left(info->msg) == 0`, why now we do it in any case?

>+			struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>+
>+			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>+
>+			if (info->msg->msg_flags & MSG_EOR)
>+				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>+		}
>+	}
>+
>+	if (info->reply)
>+		virtio_vsock_skb_set_reply(skb);
>+
>+	trace_virtio_transport_alloc_pkt(src_cid, src_port,
>+					 dst_cid, dst_port,
>+					 payload_len,
>+					 info->type,
>+					 info->op,
>+					 info->flags);

Maybe now we should trace also `zcopy`.

>+
>+	return skb;
>+out:
>+	kfree_skb(skb);
>+	return NULL;
>+}
>+
> /* This function can only be used on connecting/connected sockets,
>  * since a socket assigned to a transport is required.
>  *
>@@ -222,10 +330,12 @@ static u16 virtio_transport_get_type(struct sock *sk)
> static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 					  struct virtio_vsock_pkt_info *info)
> {
>+	u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> 	u32 src_cid, src_port, dst_cid, dst_port;
> 	const struct virtio_transport *t_ops;
> 	struct virtio_vsock_sock *vvs;
> 	u32 pkt_len = info->pkt_len;
>+	bool can_zcopy = false;
> 	u32 rest_len;
> 	int ret;
>
>@@ -254,15 +364,30 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> 		return pkt_len;
>
>+	if (info->msg) {
>+		/* If zerocopy is not enabled by 'setsockopt()', we behave as
>+		 * there is no MSG_ZEROCOPY flag set.
>+		 */
>+		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
>+			info->msg->msg_flags &= ~MSG_ZEROCOPY;
>+
>+		if (info->msg->msg_flags & MSG_ZEROCOPY)
>+			can_zcopy = virtio_transport_can_zcopy(info, pkt_len);
>+
>+		if (can_zcopy)
>+			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>+					    (MAX_SKB_FRAGS * PAGE_SIZE));
>+	}
>+
> 	rest_len = pkt_len;
>
> 	do {
> 		struct sk_buff *skb;
> 		size_t skb_len;
>
>-		skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
>+		skb_len = min(max_skb_len, rest_len);
>
>-		skb = virtio_transport_alloc_skb(info, skb_len,
>+		skb = virtio_transport_alloc_skb(vsk, info, skb_len, can_zcopy,
> 						 src_cid, src_port,
> 						 dst_cid, dst_port);
> 		if (!skb) {
>@@ -270,6 +395,21 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 			break;
> 		}
>
>+		/* We process buffer part by part, allocating skb on
>+		 * each iteration. If this is last skb for this buffer
>+		 * and MSG_ZEROCOPY mode is in use - we must allocate
>+		 * completion for the current syscall.
>+		 */
>+		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>+		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>+			if (virtio_transport_init_zcopy_skb(vsk, skb,
>+							    info->msg,
>+							    can_zcopy)) {
>+				ret = -ENOMEM;
>+				break;
>+			}
>+		}
>+
> 		virtio_transport_inc_tx_pkt(vvs, skb);
>
> 		ret = t_ops->send_pkt(skb);
>@@ -985,7 +1125,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> 	if (!t)
> 		return -ENOTCONN;
>
>-	reply = virtio_transport_alloc_skb(&info, 0,
>+	reply = virtio_transport_alloc_skb(NULL, &info, 0, false,
> 					   le64_to_cpu(hdr->dst_cid),
> 					   le32_to_cpu(hdr->dst_port),
> 					   le64_to_cpu(hdr->src_cid),
>-- 
>2.25.1
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support
@ 2023-09-14 13:57     ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 13:57 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa

On Mon, Sep 11, 2023 at 11:22:34PM +0300, Arseniy Krasnov wrote:
>This adds handling of MSG_ZEROCOPY flag on transmission path:
>
>1) If this flag is set and zerocopy transmission is possible (enabled
>   in socket options and transport allows zerocopy), then non-linear
>   skb will be created and filled with the pages of user's buffer.
>   Pages of user's buffer are locked in memory by 'get_user_pages()'.
>2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it
>   calls 'skb_set_owner_w()'. Reason of this change is that
>   '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so
>   to decrease this field correctly, proper skb destructor is needed:
>   'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
>3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'.
>   If this callback is set, then transport needs extra check to be able
>   to send provided number of buffers in zerocopy mode. Currently, the
>   only transport that needs this callback set is virtio, because this
>   transport adds new buffers to the virtio queue and we need to check,
>   that number of these buffers is less than size of the queue (it is
>   required by virtio spec). vhost and loopback transports don't need
>   this check.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> Changelog:
> v5(big patchset) -> v1:
>  * Refactorings of 'if' conditions.
>  * Remove extra blank line.
>  * Remove 'frag_off' field unneeded init.
>  * Add function 'virtio_transport_fill_skb()' which fills both linear
>    and non-linear skb with provided data.
> v1 -> v2:
>  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
> v2 -> v3:
>  * Add new transport callback: 'msgzerocopy_check_iov'. It checks that
>    provided 'iov_iter' with data could be sent in a zerocopy mode.
>    If this callback is not set in transport - transport allows to send
>    any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true'
>    then zerocopy is allowed. Reason of this callback is that in case of
>    G2H transmission we insert whole skb to the tx virtio queue and such
>    skb must fit to the size of the virtio queue to be sent in a single
>    iteration (may be tx logic in 'virtio_transport.c' could be reworked
>    as in vhost to support partial send of current skb). This callback
>    will be enabled only for G2H path. For details pls see comment
>    'Check that tx queue...' below.
> v3 -> v4:
>  * 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to
>    'struct virtio_transport' as it is virtio specific callback and
>    never needed in other transports.
> v4 -> v5:
>  * 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy' and now it
>    uses number of buffers to send as input argument. I think there is
>    no need to pass iov to this callback (at least today, it is used only
>    by guest side of virtio transport), because the only thing that this
>    callback does is comparison of number of buffers to be inserted to
>    the tx queue and size of this queue.
>  * Remove any checks for type of current 'iov_iter' with payload (is it
>    'iovec' or 'ubuf'). These checks left from the earlier versions where I
>    didn't use already implemented kernel API which handles every type of
>    'iov_iter'.
> v5 -> v6:
>  * Refactor 'virtio_transport_fill_skb()'.
>  * Add 'WARN_ON_ONCE()' and comment on invalid combination of destination
>    socket and payload in 'virtio_transport_alloc_skb()'.
> v7 -> v8:
>  * Move '+1' addition from 'can_msgzerocopy' callback body to the caller.
>    This addition means packet header.
>  * In 'virtio_transport_can_zcopy()' rename 'max_to_send' argument to
>    'pkt_len'.
>  * Update commit message by adding details about new 'can_msgzerocopy'
>    callback.
>  * In 'virtio_transport_init_hdr()' move 'len' argument directly after
>    'info'.
>  * Add comment about processing last skb in tx loop.
>  * Update comment for 'can_msgzerocopy' callback for more details.
>
> include/linux/virtio_vsock.h            |   9 +
> net/vmw_vsock/virtio_transport.c        |  32 +++
> net/vmw_vsock/virtio_transport_common.c | 256 ++++++++++++++++++------
> 3 files changed, 239 insertions(+), 58 deletions(-)
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index a91fbdf233e4..ebb3ce63d64d 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -160,6 +160,15 @@ struct virtio_transport {
>
> 	/* Takes ownership of the packet */
> 	int (*send_pkt)(struct sk_buff *skb);
>+
>+	/* Used in MSG_ZEROCOPY mode. Checks, that provided data
>+	 * (number of buffers) could be transmitted with zerocopy
>+	 * mode. If this callback is not implemented for the current
>+	 * transport - this means that this transport doesn't need
>+	 * extra checks and can perform zerocopy transmission by
>+	 * default.
>+	 */
>+	bool (*can_msgzerocopy)(int bufs_num);
> };
>
> ssize_t
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 73d730156349..09ba3128e759 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -455,6 +455,37 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> }
>
>+static bool virtio_transport_can_msgzerocopy(int bufs_num)
>+{
>+	struct virtio_vsock *vsock;
>+	bool res = false;
>+
>+	rcu_read_lock();
>+
>+	vsock = rcu_dereference(the_virtio_vsock);
>+	if (vsock) {
>+		struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
>+
>+		/* Check that tx queue is large enough to keep whole
>+		 * data to send. This is needed, because when there is
>+		 * not enough free space in the queue, current skb to
>+		 * send will be reinserted to the head of tx list of
>+		 * the socket to retry transmission later, so if skb
>+		 * is bigger than whole queue, it will be reinserted
>+		 * again and again, thus blocking other skbs to be sent.
>+		 * Each page of the user provided buffer will be added
>+		 * as a single buffer to the tx virtqueue, so compare
>+		 * number of pages against maximum capacity of the queue.
>+		 */
>+		if (bufs_num <= vq->num_max)
>+			res = true;
>+	}
>+
>+	rcu_read_unlock();
>+
>+	return res;
>+}
>+
> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>
> static struct virtio_transport virtio_transport = {
>@@ -504,6 +535,7 @@ static struct virtio_transport virtio_transport = {
> 	},
>
> 	.send_pkt = virtio_transport_send_pkt,
>+	.can_msgzerocopy = virtio_transport_can_msgzerocopy,
> };
>
> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 3a48e48a99ac..e358f118b07e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -37,73 +37,110 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
> 	return container_of(t, struct virtio_transport, transport);
> }
>
>-/* Returns a new packet on success, otherwise returns NULL.
>- *
>- * If NULL is returned, errp is set to a negative errno.
>- */

Why we are removing this comment?

>-static struct sk_buff *
>-virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>-			   size_t len,
>-			   u32 src_cid,
>-			   u32 src_port,
>-			   u32 dst_cid,
>-			   u32 dst_port)
>-{
>-	const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
>-	struct virtio_vsock_hdr *hdr;
>-	struct sk_buff *skb;
>-	void *payload;
>-	int err;
>+static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info 
>*info,
>+				       size_t pkt_len)
>+{
>+	const struct virtio_transport *t_ops;
>+	struct iov_iter *iov_iter;
>
>-	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>-	if (!skb)
>-		return NULL;
>+	if (!info->msg)
>+		return false;
>
>-	hdr = virtio_vsock_hdr(skb);
>-	hdr->type	= cpu_to_le16(info->type);
>-	hdr->op		= cpu_to_le16(info->op);
>-	hdr->src_cid	= cpu_to_le64(src_cid);
>-	hdr->dst_cid	= cpu_to_le64(dst_cid);
>-	hdr->src_port	= cpu_to_le32(src_port);
>-	hdr->dst_port	= cpu_to_le32(dst_port);
>-	hdr->flags	= cpu_to_le32(info->flags);
>-	hdr->len	= cpu_to_le32(len);
>+	iov_iter = &info->msg->msg_iter;
>
>-	if (info->msg && len > 0) {
>-		payload = skb)put(skb, len);
>-		err = memcpy_from_msg(payload, info->msg, len);
>-		if (err)
>-			goto out;
>+	if (iov_iter->iov_offset)
>+		return false;
>
>-		if (msg_data_left(info->msg) == 0 &&
>-		    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
>-			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>+	/* We can't send whole iov. */
>+	if (iov_iter->count > pkt_len)
>+		return false;
>
>-			if (info->msg->msg_flags & MSG_EOR)
>-				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>-		}
>+	/* Check that transport can send data in zerocopy mode. */
>+	t_ops = virtio_transport_get_ops(info->vsk);

While reviewing I was wondering why here we don't check if `t_ops` is
NULL.

Then I realized that the only caller of this function
(virtio_transport_send_pkt_info()) already get the vsk ops calling
virtio_transport_get_ops() and also checks if it can be null.

So what about passing the ops as function parameter to avoid to call
virtio_transport_get_ops() again?

>+
>+	if (t_ops->can_msgzerocopy) {
>+		int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
>+		int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
>+
>+		/* +1 is for packet header. */
>+		return t_ops->can_msgzerocopy(pages_to_send + 1);
> 	}
>
>-	if (info->reply)
>-		virtio_vsock_skb_set_reply(skb);
>+	return true;
>+}
>
>-	trace_virtio_transport_alloc_pkt(src_cid, src_port,
>-					 dst_cid, dst_port,
>-					 len,
>-					 info->type,
>-					 info->op,
>-					 info->flags);
>+static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>+					   struct sk_buff *skb,
>+					   struct msghdr *msg,
>+					   bool zerocopy)
>+{
>+	struct ubuf_info *uarg;
>
>-	if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
>-		WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
>-		goto out;
>+	if (msg->msg_ubuf) {
>+		uarg = msg->msg_ubuf;
>+		net_zcopy_get(uarg);
>+	} else {
>+		struct iov_iter *iter = &msg->msg_iter;
>+		struct ubuf_info_msgzc *uarg_zc;
>+
>+		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>+					    iter->count,
>+					    NULL);
>+		if (!uarg)
>+			return -1;
>+
>+		uarg_zc = uarg_to_msgzc(uarg);
>+		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> 	}
>
>-	return skb;
>+	skb_zcopy_init(skb, uarg);
>
>-out:
>-	kfree_skb(skb);
>-	return NULL;
>+	return 0;
>+}
>+
>+static int virtio_transport_fill_skb(struct sk_buff *skb,
>+				     struct virtio_vsock_pkt_info *info,
>+				     size_t len,
>+				     bool zcopy)
>+{
>+	void *payload;
>+	int err;
>+
>+	if (zcopy)
>+		return __zerocopy_sg_from_iter(info->msg, NULL, skb,
>+					       &info->msg->msg_iter,
>+					       len);
>+
>+	payload = skb_put(skb, len);
>+	err = memcpy_from_msg(payload, info->msg, len);
>+	if (err)
>+		return -1;
>+
>+	if (msg_data_left(info->msg))
>+		return 0;

We are returning 0 in any case, what is the purpose of this check?

>+
>+	return 0;
>+}
>+
>+static void virtio_transport_init_hdr(struct sk_buff *skb,
>+				      struct virtio_vsock_pkt_info *info,
>+				      size_t payload_len,
>+				      u32 src_cid,
>+				      u32 src_port,
>+				      u32 dst_cid,
>+				      u32 dst_port)
>+{
>+	struct virtio_vsock_hdr *hdr;
>+
>+	hdr = virtio_vsock_hdr(skb);
>+	hdr->type	= cpu_to_le16(info->type);
>+	hdr->op		= cpu_to_le16(info->op);
>+	hdr->src_cid	= cpu_to_le64(src_cid);
>+	hdr->dst_cid	= cpu_to_le64(dst_cid);
>+	hdr->src_port	= cpu_to_le32(src_port);
>+	hdr->dst_port	= cpu_to_le32(dst_port);
>+	hdr->flags	= cpu_to_le32(info->flags);
>+	hdr->len	= cpu_to_le32(payload_len);
> }
>
> static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
>@@ -214,6 +251,77 @@ static u16 virtio_transport_get_type(struct sock *sk)
> 		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> }
>
>+static struct sk_buff *virtio_transport_alloc_skb(struct vsock_sock *vsk,

Before this patch we used `info->vsk` in virtio_transport_alloc_skb().
Is it now really necessary to add vsk as a parameter? If so, why?

>+						  struct virtio_vsock_pkt_info *info,
>+						  size_t payload_len,
>+						  bool zcopy,
>+						  u32 src_cid,
>+						  u32 src_port,
>+						  u32 dst_cid,
>+						  u32 dst_port)
>+{
>+	struct sk_buff *skb;
>+	size_t skb_len;
>+
>+	skb_len = VIRTIO_VSOCK_SKB_HEADROOM;
>+
>+	if (!zcopy)
>+		skb_len += payload_len;
>+
>+	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>+	if (!skb)
>+		return NULL;
>+
>+	virtio_transport_init_hdr(skb, info, payload_len, src_cid, src_port,
>+				  dst_cid, dst_port);
>+
>+	/* If 'vsk' != NULL then payload is always present, so we
>+	 * will never call '__zerocopy_sg_from_iter()' below without
>+	 * setting skb owner in 'skb_set_owner_w()'. The only case
>+	 * when 'vsk' == NULL is VIRTIO_VSOCK_OP_RST control message
>+	 * without payload.
>+	 */
>+	WARN_ON_ONCE(!(vsk && (info->msg && payload_len)) && zcopy);
>+
>+	/* Set owner here, because '__zerocopy_sg_from_iter()' uses
>+	 * owner of skb without check to update 'sk_wmem_alloc'.
>+	 */
>+	if (vsk)
>+		skb_set_owner_w(skb, sk_vsock(vsk));
>+
>+	if (info->msg && payload_len > 0) {
>+		int err;
>+
>+		err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
>+		if (err)
>+			goto out;
>+
>+		if (info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {

Before this patch, we did these steps only if
`msg_data_left(info->msg) == 0`, why now we do it in any case?

>+			struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>+
>+			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>+
>+			if (info->msg->msg_flags & MSG_EOR)
>+				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>+		}
>+	}
>+
>+	if (info->reply)
>+		virtio_vsock_skb_set_reply(skb);
>+
>+	trace_virtio_transport_alloc_pkt(src_cid, src_port,
>+					 dst_cid, dst_port,
>+					 payload_len,
>+					 info->type,
>+					 info->op,
>+					 info->flags);

Maybe now we should trace also `zcopy`.

>+
>+	return skb;
>+out:
>+	kfree_skb(skb);
>+	return NULL;
>+}
>+
> /* This function can only be used on connecting/connected sockets,
>  * since a socket assigned to a transport is required.
>  *
>@@ -222,10 +330,12 @@ static u16 virtio_transport_get_type(struct sock *sk)
> static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 					  struct virtio_vsock_pkt_info *info)
> {
>+	u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> 	u32 src_cid, src_port, dst_cid, dst_port;
> 	const struct virtio_transport *t_ops;
> 	struct virtio_vsock_sock *vvs;
> 	u32 pkt_len = info->pkt_len;
>+	bool can_zcopy = false;
> 	u32 rest_len;
> 	int ret;
>
>@@ -254,15 +364,30 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> 		return pkt_len;
>
>+	if (info->msg) {
>+		/* If zerocopy is not enabled by 'setsockopt()', we behave as
>+		 * there is no MSG_ZEROCOPY flag set.
>+		 */
>+		if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
>+			info->msg->msg_flags &= ~MSG_ZEROCOPY;
>+
>+		if (info->msg->msg_flags & MSG_ZEROCOPY)
>+			can_zcopy = virtio_transport_can_zcopy(info, pkt_len);
>+
>+		if (can_zcopy)
>+			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>+					    (MAX_SKB_FRAGS * PAGE_SIZE));
>+	}
>+
> 	rest_len = pkt_len;
>
> 	do {
> 		struct sk_buff *skb;
> 		size_t skb_len;
>
>-		skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
>+		skb_len = min(max_skb_len, rest_len);
>
>-		skb = virtio_transport_alloc_skb(info, skb_len,
>+		skb = virtio_transport_alloc_skb(vsk, info, skb_len, can_zcopy,
> 						 src_cid, src_port,
> 						 dst_cid, dst_port);
> 		if (!skb) {
>@@ -270,6 +395,21 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 			break;
> 		}
>
>+		/* We process buffer part by part, allocating skb on
>+		 * each iteration. If this is last skb for this buffer
>+		 * and MSG_ZEROCOPY mode is in use - we must allocate
>+		 * completion for the current syscall.
>+		 */
>+		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>+		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>+			if (virtio_transport_init_zcopy_skb(vsk, skb,
>+							    info->msg,
>+							    can_zcopy)) {
>+				ret = -ENOMEM;
>+				break;
>+			}
>+		}
>+
> 		virtio_transport_inc_tx_pkt(vvs, skb);
>
> 		ret = t_ops->send_pkt(skb);
>@@ -985,7 +1125,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> 	if (!t)
> 		return -ENOTCONN;
>
>-	reply = virtio_transport_alloc_skb(&info, 0,
>+	reply = virtio_transport_alloc_skb(NULL, &info, 0, false,
> 					   le64_to_cpu(hdr->dst_cid),
> 					   le32_to_cpu(hdr->dst_port),
> 					   le64_to_cpu(hdr->src_cid),
>-- 
>2.25.1
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
  2023-09-14 14:07   ` Stefano Garzarella
  (?)
@ 2023-09-14 14:05   ` Arseniy Krasnov
  2023-09-14 15:34       ` Stefano Garzarella
  -1 siblings, 1 reply; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-14 14:05 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa

Hello Stefano,

On 14.09.2023 17:07, Stefano Garzarella wrote:
> Hi Arseniy,
> 
> On Mon, Sep 11, 2023 at 11:22:30PM +0300, Arseniy Krasnov wrote:
>> Hello,
>>
>> this patchset is first of three parts of another big patchset for
>> MSG_ZEROCOPY flag support:
>> https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
>>
>> During review of this series, Stefano Garzarella <sgarzare@redhat.com>
>> suggested to split it for three parts to simplify review and merging:
>>
>> 1) virtio and vhost updates (for fragged skbs) <--- this patchset
>> 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>>   tx completions) and update for Documentation/.
>> 3) Updates for tests and utils.
>>
>> This series enables handling of fragged skbs in virtio and vhost parts.
>> Newly logic won't be triggered, because SO_ZEROCOPY options is still
>> impossible to enable at this moment (next bunch of patches from big
>> set above will enable it).
>>
>> I've included changelog to some patches anyway, because there were some
>> comments during review of last big patchset from the link above.
> 
> Thanks, I left some comments on patch 4, the others LGTM.
> Sorry to not having spotted them before, but moving
> virtio_transport_alloc_skb() around the file, made the patch a little
> confusing and difficult to review.

Sure, no problem, I'll fix them! Thanks for review.

> 
> In addition, I started having failures of test 14 (server: host,
> client: guest), so I looked better to see if there was anything wrong,
> but it fails me even without this series applied.
> 
> It happens to me intermittently (~30%), does it happen to you?
> Can you take a look at it?

Yes! sometime ago I also started to get fails of this test, not ~30%,
significantly rare, but it depends on environment I guess, anyway I'm going to
look at this on the next few days

Thanks, Arseniy

> 
> host$ ./vsock_test --mode=server --control-port=12345 --peer-cid=4
> ...
> 14 - SOCK_STREAM virtio skb merge...expected recv(2) returns 8 bytes, got 3
> 
> guest$ ./vsock_test --mode=client --control-host=192.168.133.2 --control-port=12345 --peer-cid=2
> 
> Thanks,
> Stefano
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
  2023-09-11 20:22 [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Arseniy Krasnov
@ 2023-09-14 14:07   ` Stefano Garzarella
  2023-09-11 20:22 ` [PATCH net-next v8 2/4] vsock/virtio: support to send " Arseniy Krasnov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 14:07 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa

Hi Arseniy,

On Mon, Sep 11, 2023 at 11:22:30PM +0300, Arseniy Krasnov wrote:
>Hello,
>
>this patchset is first of three parts of another big patchset for
>MSG_ZEROCOPY flag support:
>https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
>
>During review of this series, Stefano Garzarella <sgarzare@redhat.com>
>suggested to split it for three parts to simplify review and merging:
>
>1) virtio and vhost updates (for fragged skbs) <--- this patchset
>2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>   tx completions) and update for Documentation/.
>3) Updates for tests and utils.
>
>This series enables handling of fragged skbs in virtio and vhost parts.
>Newly logic won't be triggered, because SO_ZEROCOPY options is still
>impossible to enable at this moment (next bunch of patches from big
>set above will enable it).
>
>I've included changelog to some patches anyway, because there were some
>comments during review of last big patchset from the link above.

Thanks, I left some comments on patch 4, the others LGTM.
Sorry to not having spotted them before, but moving
virtio_transport_alloc_skb() around the file, made the patch a little
confusing and difficult to review.

In addition, I started having failures of test 14 (server: host,
client: guest), so I looked better to see if there was anything wrong,
but it fails me even without this series applied.

It happens to me intermittently (~30%), does it happen to you?
Can you take a look at it?

host$ ./vsock_test --mode=server --control-port=12345 --peer-cid=4
...
14 - SOCK_STREAM virtio skb merge...expected recv(2) returns 8 bytes, got 3

guest$ ./vsock_test --mode=client --control-host=192.168.133.2 --control-port=12345 --peer-cid=2

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
@ 2023-09-14 14:07   ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 14:07 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, kvm, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization, oxffffaa, Eric Dumazet, Stefan Hajnoczi, kernel,
	Jakub Kicinski, Paolo Abeni, David S. Miller

Hi Arseniy,

On Mon, Sep 11, 2023 at 11:22:30PM +0300, Arseniy Krasnov wrote:
>Hello,
>
>this patchset is first of three parts of another big patchset for
>MSG_ZEROCOPY flag support:
>https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
>
>During review of this series, Stefano Garzarella <sgarzare@redhat.com>
>suggested to split it for three parts to simplify review and merging:
>
>1) virtio and vhost updates (for fragged skbs) <--- this patchset
>2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>   tx completions) and update for Documentation/.
>3) Updates for tests and utils.
>
>This series enables handling of fragged skbs in virtio and vhost parts.
>Newly logic won't be triggered, because SO_ZEROCOPY options is still
>impossible to enable at this moment (next bunch of patches from big
>set above will enable it).
>
>I've included changelog to some patches anyway, because there were some
>comments during review of last big patchset from the link above.

Thanks, I left some comments on patch 4, the others LGTM.
Sorry to not having spotted them before, but moving
virtio_transport_alloc_skb() around the file, made the patch a little
confusing and difficult to review.

In addition, I started having failures of test 14 (server: host,
client: guest), so I looked better to see if there was anything wrong,
but it fails me even without this series applied.

It happens to me intermittently (~30%), does it happen to you?
Can you take a look at it?

host$ ./vsock_test --mode=server --control-port=12345 --peer-cid=4
...
14 - SOCK_STREAM virtio skb merge...expected recv(2) returns 8 bytes, got 3

guest$ ./vsock_test --mode=client --control-host=192.168.133.2 --control-port=12345 --peer-cid=2

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support
  2023-09-14 13:57     ` Stefano Garzarella
  (?)
@ 2023-09-14 14:39     ` Arseniy Krasnov
  -1 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-14 14:39 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa



On 14.09.2023 16:57, Stefano Garzarella wrote:
> On Mon, Sep 11, 2023 at 11:22:34PM +0300, Arseniy Krasnov wrote:
>> This adds handling of MSG_ZEROCOPY flag on transmission path:
>>
>> 1) If this flag is set and zerocopy transmission is possible (enabled
>>   in socket options and transport allows zerocopy), then non-linear
>>   skb will be created and filled with the pages of user's buffer.
>>   Pages of user's buffer are locked in memory by 'get_user_pages()'.
>> 2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it
>>   calls 'skb_set_owner_w()'. Reason of this change is that
>>   '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so
>>   to decrease this field correctly, proper skb destructor is needed:
>>   'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
>> 3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'.
>>   If this callback is set, then transport needs extra check to be able
>>   to send provided number of buffers in zerocopy mode. Currently, the
>>   only transport that needs this callback set is virtio, because this
>>   transport adds new buffers to the virtio queue and we need to check,
>>   that number of these buffers is less than size of the queue (it is
>>   required by virtio spec). vhost and loopback transports don't need
>>   this check.
>>
>> Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> ---
>> Changelog:
>> v5(big patchset) -> v1:
>>  * Refactorings of 'if' conditions.
>>  * Remove extra blank line.
>>  * Remove 'frag_off' field unneeded init.
>>  * Add function 'virtio_transport_fill_skb()' which fills both linear
>>    and non-linear skb with provided data.
>> v1 -> v2:
>>  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
>> v2 -> v3:
>>  * Add new transport callback: 'msgzerocopy_check_iov'. It checks that
>>    provided 'iov_iter' with data could be sent in a zerocopy mode.
>>    If this callback is not set in transport - transport allows to send
>>    any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true'
>>    then zerocopy is allowed. Reason of this callback is that in case of
>>    G2H transmission we insert whole skb to the tx virtio queue and such
>>    skb must fit to the size of the virtio queue to be sent in a single
>>    iteration (may be tx logic in 'virtio_transport.c' could be reworked
>>    as in vhost to support partial send of current skb). This callback
>>    will be enabled only for G2H path. For details pls see comment
>>    'Check that tx queue...' below.
>> v3 -> v4:
>>  * 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to
>>    'struct virtio_transport' as it is virtio specific callback and
>>    never needed in other transports.
>> v4 -> v5:
>>  * 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy' and now it
>>    uses number of buffers to send as input argument. I think there is
>>    no need to pass iov to this callback (at least today, it is used only
>>    by guest side of virtio transport), because the only thing that this
>>    callback does is comparison of number of buffers to be inserted to
>>    the tx queue and size of this queue.
>>  * Remove any checks for type of current 'iov_iter' with payload (is it
>>    'iovec' or 'ubuf'). These checks left from the earlier versions where I
>>    didn't use already implemented kernel API which handles every type of
>>    'iov_iter'.
>> v5 -> v6:
>>  * Refactor 'virtio_transport_fill_skb()'.
>>  * Add 'WARN_ON_ONCE()' and comment on invalid combination of destination
>>    socket and payload in 'virtio_transport_alloc_skb()'.
>> v7 -> v8:
>>  * Move '+1' addition from 'can_msgzerocopy' callback body to the caller.
>>    This addition means packet header.
>>  * In 'virtio_transport_can_zcopy()' rename 'max_to_send' argument to
>>    'pkt_len'.
>>  * Update commit message by adding details about new 'can_msgzerocopy'
>>    callback.
>>  * In 'virtio_transport_init_hdr()' move 'len' argument directly after
>>    'info'.
>>  * Add comment about processing last skb in tx loop.
>>  * Update comment for 'can_msgzerocopy' callback for more details.
>>
>> include/linux/virtio_vsock.h            |   9 +
>> net/vmw_vsock/virtio_transport.c        |  32 +++
>> net/vmw_vsock/virtio_transport_common.c | 256 ++++++++++++++++++------
>> 3 files changed, 239 insertions(+), 58 deletions(-)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index a91fbdf233e4..ebb3ce63d64d 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -160,6 +160,15 @@ struct virtio_transport {
>>
>>     /* Takes ownership of the packet */
>>     int (*send_pkt)(struct sk_buff *skb);
>> +
>> +    /* Used in MSG_ZEROCOPY mode. Checks, that provided data
>> +     * (number of buffers) could be transmitted with zerocopy
>> +     * mode. If this callback is not implemented for the current
>> +     * transport - this means that this transport doesn't need
>> +     * extra checks and can perform zerocopy transmission by
>> +     * default.
>> +     */
>> +    bool (*can_msgzerocopy)(int bufs_num);
>> };
>>
>> ssize_t
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index 73d730156349..09ba3128e759 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -455,6 +455,37 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>>     queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>> }
>>
>> +static bool virtio_transport_can_msgzerocopy(int bufs_num)
>> +{
>> +    struct virtio_vsock *vsock;
>> +    bool res = false;
>> +
>> +    rcu_read_lock();
>> +
>> +    vsock = rcu_dereference(the_virtio_vsock);
>> +    if (vsock) {
>> +        struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX];
>> +
>> +        /* Check that tx queue is large enough to keep whole
>> +         * data to send. This is needed, because when there is
>> +         * not enough free space in the queue, current skb to
>> +         * send will be reinserted to the head of tx list of
>> +         * the socket to retry transmission later, so if skb
>> +         * is bigger than whole queue, it will be reinserted
>> +         * again and again, thus blocking other skbs to be sent.
>> +         * Each page of the user provided buffer will be added
>> +         * as a single buffer to the tx virtqueue, so compare
>> +         * number of pages against maximum capacity of the queue.
>> +         */
>> +        if (bufs_num <= vq->num_max)
>> +            res = true;
>> +    }
>> +
>> +    rcu_read_unlock();
>> +
>> +    return res;
>> +}
>> +
>> static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>>
>> static struct virtio_transport virtio_transport = {
>> @@ -504,6 +535,7 @@ static struct virtio_transport virtio_transport = {
>>     },
>>
>>     .send_pkt = virtio_transport_send_pkt,
>> +    .can_msgzerocopy = virtio_transport_can_msgzerocopy,
>> };
>>
>> static bool virtio_transport_seqpacket_allow(u32 remote_cid)
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 3a48e48a99ac..e358f118b07e 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -37,73 +37,110 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
>>     return container_of(t, struct virtio_transport, transport);
>> }
>>
>> -/* Returns a new packet on success, otherwise returns NULL.
>> - *
>> - * If NULL is returned, errp is set to a negative errno.
>> - */
> 
> Why we are removing this comment?

I think I can return it like:
/* Returns a new sk_buff on success, otherwise returns NULL. */

Line related to 'errp' is not needed i guess.

> 
>> -static struct sk_buff *
>> -virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>> -               size_t len,
>> -               u32 src_cid,
>> -               u32 src_port,
>> -               u32 dst_cid,
>> -               u32 dst_port)
>> -{
>> -    const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
>> -    struct virtio_vsock_hdr *hdr;
>> -    struct sk_buff *skb;
>> -    void *payload;
>> -    int err;
>> +static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info *info,
>> +                       size_t pkt_len)
>> +{
>> +    const struct virtio_transport *t_ops;
>> +    struct iov_iter *iov_iter;
>>
>> -    skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>> -    if (!skb)
>> -        return NULL;
>> +    if (!info->msg)
>> +        return false;
>>
>> -    hdr = virtio_vsock_hdr(skb);
>> -    hdr->type    = cpu_to_le16(info->type);
>> -    hdr->op        = cpu_to_le16(info->op);
>> -    hdr->src_cid    = cpu_to_le64(src_cid);
>> -    hdr->dst_cid    = cpu_to_le64(dst_cid);
>> -    hdr->src_port    = cpu_to_le32(src_port);
>> -    hdr->dst_port    = cpu_to_le32(dst_port);
>> -    hdr->flags    = cpu_to_le32(info->flags);
>> -    hdr->len    = cpu_to_le32(len);
>> +    iov_iter = &info->msg->msg_iter;
>>
>> -    if (info->msg && len > 0) {
>> -        payload = skb)put(skb, len);
>> -        err = memcpy_from_msg(payload, info->msg, len);
>> -        if (err)
>> -            goto out;
>> +    if (iov_iter->iov_offset)
>> +        return false;
>>
>> -        if (msg_data_left(info->msg) == 0 &&
>> -            info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
>> -            hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>> +    /* We can't send whole iov. */
>> +    if (iov_iter->count > pkt_len)
>> +        return false;
>>
>> -            if (info->msg->msg_flags & MSG_EOR)
>> -                hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>> -        }
>> +    /* Check that transport can send data in zerocopy mode. */
>> +    t_ops = virtio_transport_get_ops(info->vsk);
> 
> While reviewing I was wondering why here we don't check if `t_ops` is
> NULL.
> 
> Then I realized that the only caller of this function
> (virtio_transport_send_pkt_info()) already get the vsk ops calling
> virtio_transport_get_ops() and also checks if it can be null.
> 
> So what about passing the ops as function parameter to avoid to call
> virtio_transport_get_ops() again?

Ack

> 
>> +
>> +    if (t_ops->can_msgzerocopy) {
>> +        int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS);
>> +        int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
>> +
>> +        /* +1 is for packet header. */
>> +        return t_ops->can_msgzerocopy(pages_to_send + 1);
>>     }
>>
>> -    if (info->reply)
>> -        virtio_vsock_skb_set_reply(skb);
>> +    return true;
>> +}
>>
>> -    trace_virtio_transport_alloc_pkt(src_cid, src_port,
>> -                     dst_cid, dst_port,
>> -                     len,
>> -                     info->type,
>> -                     info->op,
>> -                     info->flags);
>> +static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> +                       struct sk_buff *skb,
>> +                       struct msghdr *msg,
>> +                       bool zerocopy)
>> +{
>> +    struct ubuf_info *uarg;
>>
>> -    if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) {
>> -        WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n");
>> -        goto out;
>> +    if (msg->msg_ubuf) {
>> +        uarg = msg->msg_ubuf;
>> +        net_zcopy_get(uarg);
>> +    } else {
>> +        struct iov_iter *iter = &msg->msg_iter;
>> +        struct ubuf_info_msgzc *uarg_zc;
>> +
>> +        uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> +                        iter->count,
>> +                        NULL);
>> +        if (!uarg)
>> +            return -1;
>> +
>> +        uarg_zc = uarg_to_msgzc(uarg);
>> +        uarg_zc->zerocopy = zerocopy ? 1 : 0;
>>     }
>>
>> -    return skb;
>> +    skb_zcopy_init(skb, uarg);
>>
>> -out:
>> -    kfree_skb(skb);
>> -    return NULL;
>> +    return 0;
>> +}
>> +
>> +static int virtio_transport_fill_skb(struct sk_buff *skb,
>> +                     struct virtio_vsock_pkt_info *info,
>> +                     size_t len,
>> +                     bool zcopy)
>> +{
>> +    void *payload;
>> +    int err;
>> +
>> +    if (zcopy)
>> +        return __zerocopy_sg_from_iter(info->msg, NULL, skb,
>> +                           &info->msg->msg_iter,
>> +                           len);
>> +
>> +    payload = skb_put(skb, len);
>> +    err = memcpy_from_msg(payload, info->msg, len);
>> +    if (err)
>> +        return -1;
>> +
>> +    if (msg_data_left(info->msg))
>> +        return 0;
> 
> We are returning 0 in any case, what is the purpose of this check?

Right, just forget to remove it.

> 
>> +
>> +    return 0;
>> +}
>> +
>> +static void virtio_transport_init_hdr(struct sk_buff *skb,
>> +                      struct virtio_vsock_pkt_info *info,
>> +                      size_t payload_len,
>> +                      u32 src_cid,
>> +                      u32 src_port,
>> +                      u32 dst_cid,
>> +                      u32 dst_port)
>> +{
>> +    struct virtio_vsock_hdr *hdr;
>> +
>> +    hdr = virtio_vsock_hdr(skb);
>> +    hdr->type    = cpu_to_le16(info->type);
>> +    hdr->op        = cpu_to_le16(info->op);
>> +    hdr->src_cid    = cpu_to_le64(src_cid);
>> +    hdr->dst_cid    = cpu_to_le64(dst_cid);
>> +    hdr->src_port    = cpu_to_le32(src_port);
>> +    hdr->dst_port    = cpu_to_le32(dst_port);
>> +    hdr->flags    = cpu_to_le32(info->flags);
>> +    hdr->len    = cpu_to_le32(payload_len);
>> }
>>
>> static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb,
>> @@ -214,6 +251,77 @@ static u16 virtio_transport_get_type(struct sock *sk)
>>         return VIRTIO_VSOCK_TYPE_SEQPACKET;
>> }
>>
>> +static struct sk_buff *virtio_transport_alloc_skb(struct vsock_sock *vsk,
> 
> Before this patch we used `info->vsk` in virtio_transport_alloc_skb().
> Is it now really necessary to add vsk as a parameter? If so, why?

Agree, I think previous version of virtio_transport_alloc_skb() could be
used.

> 
>> +                          struct virtio_vsock_pkt_info *info,
>> +                          size_t payload_len,
>> +                          bool zcopy,
>> +                          u32 src_cid,
>> +                          u32 src_port,
>> +                          u32 dst_cid,
>> +                          u32 dst_port)
>> +{
>> +    struct sk_buff *skb;
>> +    size_t skb_len;
>> +
>> +    skb_len = VIRTIO_VSOCK_SKB_HEADROOM;
>> +
>> +    if (!zcopy)
>> +        skb_len += payload_len;
>> +
>> +    skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>> +    if (!skb)
>> +        return NULL;
>> +
>> +    virtio_transport_init_hdr(skb, info, payload_len, src_cid, src_port,
>> +                  dst_cid, dst_port);
>> +
>> +    /* If 'vsk' != NULL then payload is always present, so we
>> +     * will never call '__zerocopy_sg_from_iter()' below without
>> +     * setting skb owner in 'skb_set_owner_w()'. The only case
>> +     * when 'vsk' == NULL is VIRTIO_VSOCK_OP_RST control message
>> +     * without payload.
>> +     */
>> +    WARN_ON_ONCE(!(vsk && (info->msg && payload_len)) && zcopy);
>> +
>> +    /* Set owner here, because '__zerocopy_sg_from_iter()' uses
>> +     * owner of skb without check to update 'sk_wmem_alloc'.
>> +     */
>> +    if (vsk)
>> +        skb_set_owner_w(skb, sk_vsock(vsk));
>> +
>> +    if (info->msg && payload_len > 0) {
>> +        int err;
>> +
>> +        err = virtio_transport_fill_skb(skb, info, payload_len, zcopy);
>> +        if (err)
>> +            goto out;
>> +
>> +        if (info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
> 
> Before this patch, we did these steps only if
> `msg_data_left(info->msg) == 0`, why now we do it in any case?

Ack, it is bug

> 
>> +            struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
>> +
>> +            hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>> +
>> +            if (info->msg->msg_flags & MSG_EOR)
>> +                hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>> +        }
>> +    }
>> +
>> +    if (info->reply)
>> +        virtio_vsock_skb_set_reply(skb);
>> +
>> +    trace_virtio_transport_alloc_pkt(src_cid, src_port,
>> +                     dst_cid, dst_port,
>> +                     payload_len,
>> +                     info->type,
>> +                     info->op,
>> +                     info->flags);
> 
> Maybe now we should trace also `zcopy`.

Ack

> 
>> +
>> +    return skb;
>> +out:
>> +    kfree_skb(skb);
>> +    return NULL;
>> +}
>> +
>> /* This function can only be used on connecting/connected sockets,
>>  * since a socket assigned to a transport is required.
>>  *
>> @@ -222,10 +330,12 @@ static u16 virtio_transport_get_type(struct sock *sk)
>> static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>                       struct virtio_vsock_pkt_info *info)
>> {
>> +    u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>>     u32 src_cid, src_port, dst_cid, dst_port;
>>     const struct virtio_transport *t_ops;
>>     struct virtio_vsock_sock *vvs;
>>     u32 pkt_len = info->pkt_len;
>> +    bool can_zcopy = false;
>>     u32 rest_len;
>>     int ret;
>>
>> @@ -254,15 +364,30 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>     if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>>         return pkt_len;
>>
>> +    if (info->msg) {
>> +        /* If zerocopy is not enabled by 'setsockopt()', we behave as
>> +         * there is no MSG_ZEROCOPY flag set.
>> +         */
>> +        if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY))
>> +            info->msg->msg_flags &= ~MSG_ZEROCOPY;
>> +
>> +        if (info->msg->msg_flags & MSG_ZEROCOPY)
>> +            can_zcopy = virtio_transport_can_zcopy(info, pkt_len);
>> +
>> +        if (can_zcopy)
>> +            max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> +                        (MAX_SKB_FRAGS * PAGE_SIZE));
>> +    }
>> +
>>     rest_len = pkt_len;
>>
>>     do {
>>         struct sk_buff *skb;
>>         size_t skb_len;
>>
>> -        skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len);
>> +        skb_len = min(max_skb_len, rest_len);
>>
>> -        skb = virtio_transport_alloc_skb(info, skb_len,
>> +        skb = virtio_transport_alloc_skb(vsk, info, skb_len, can_zcopy,
>>                          src_cid, src_port,
>>                          dst_cid, dst_port);
>>         if (!skb) {
>> @@ -270,6 +395,21 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>             break;
>>         }
>>
>> +        /* We process buffer part by part, allocating skb on
>> +         * each iteration. If this is last skb for this buffer
>> +         * and MSG_ZEROCOPY mode is in use - we must allocate
>> +         * completion for the current syscall.
>> +         */
>> +        if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>> +            skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>> +            if (virtio_transport_init_zcopy_skb(vsk, skb,
>> +                                info->msg,
>> +                                can_zcopy)) {
>> +                ret = -ENOMEM;
>> +                break;
>> +            }
>> +        }
>> +
>>         virtio_transport_inc_tx_pkt(vvs, skb);
>>
>>         ret = t_ops->send_pkt(skb);
>> @@ -985,7 +1125,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
>>     if (!t)
>>         return -ENOTCONN;
>>
>> -    reply = virtio_transport_alloc_skb(&info, 0,
>> +    reply = virtio_transport_alloc_skb(NULL, &info, 0, false,
>>                        le64_to_cpu(hdr->dst_cid),
>>                        le32_to_cpu(hdr->dst_port),
>>                        le64_to_cpu(hdr->src_cid),
>> -- 
>> 2.25.1
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
  2023-09-14 14:05   ` Arseniy Krasnov
@ 2023-09-14 15:34       ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 15:34 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa

On Thu, Sep 14, 2023 at 05:05:17PM +0300, Arseniy Krasnov wrote:
>Hello Stefano,
>
>On 14.09.2023 17:07, Stefano Garzarella wrote:
>> Hi Arseniy,
>>
>> On Mon, Sep 11, 2023 at 11:22:30PM +0300, Arseniy Krasnov wrote:
>>> Hello,
>>>
>>> this patchset is first of three parts of another big patchset for
>>> MSG_ZEROCOPY flag support:
>>> https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
>>>
>>> During review of this series, Stefano Garzarella <sgarzare@redhat.com>
>>> suggested to split it for three parts to simplify review and merging:
>>>
>>> 1) virtio and vhost updates (for fragged skbs) <--- this patchset
>>> 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>>>   tx completions) and update for Documentation/.
>>> 3) Updates for tests and utils.
>>>
>>> This series enables handling of fragged skbs in virtio and vhost parts.
>>> Newly logic won't be triggered, because SO_ZEROCOPY options is still
>>> impossible to enable at this moment (next bunch of patches from big
>>> set above will enable it).
>>>
>>> I've included changelog to some patches anyway, because there were some
>>> comments during review of last big patchset from the link above.
>>
>> Thanks, I left some comments on patch 4, the others LGTM.
>> Sorry to not having spotted them before, but moving
>> virtio_transport_alloc_skb() around the file, made the patch a little
>> confusing and difficult to review.
>
>Sure, no problem, I'll fix them! Thanks for review.
>
>>
>> In addition, I started having failures of test 14 (server: host,
>> client: guest), so I looked better to see if there was anything wrong,
>> but it fails me even without this series applied.
>>
>> It happens to me intermittently (~30%), does it happen to you?
>> Can you take a look at it?
>
>Yes! sometime ago I also started to get fails of this test, not ~30%,
>significantly rare, but it depends on environment I guess, anyway I'm going to
>look at this on the next few days

Maybe it's just a timing issue in the test, indeed we are expecting 8
bytes but we received only 3 plus the 2 bytes we received before it
seems exactly the same bytes we send with the first
`send(fd, HELLO_STR, strlen(HELLO_STR), 0);`

Since it is a stream socket, it could happen, so we should retry
the recv() or just use MSG_WAITALL.

Applying the following patch fixed the issue for me (15 mins without
errors for now):

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 90718c2fd4ea..7b0fed9fc58d 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1129,7 +1129,7 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
         control_expectln("SEND0");

         /* Read skbuff partially. */
-       res = recv(fd, buf, 2, 0);
+       res = recv(fd, buf, 2, MSG_WAITALL);
         if (res != 2) {
                 fprintf(stderr, "expected recv(2) returns 2 bytes, got %zi\n", res);
                 exit(EXIT_FAILURE);
@@ -1138,7 +1138,7 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
         control_writeln("REPLY0");
         control_expectln("SEND1");

-       res = recv(fd, buf + 2, sizeof(buf) - 2, 0);
+       res = recv(fd, buf + 2, 8, MSG_WAITALL);
         if (res != 8) {
                 fprintf(stderr, "expected recv(2) returns 8 bytes, got %zi\n", res);
                 exit(EXIT_FAILURE);

I will check better all the cases and send a patch upstream.

Anyway it looks just an issue in our test suite :-)

Stefano

>
>Thanks, Arseniy
>
>>
>> host$ ./vsock_test --mode=server --control-port=12345 --peer-cid=4
>> ...
>> 14 - SOCK_STREAM virtio skb merge...expected recv(2) returns 8 bytes, got 3
>>
>> guest$ ./vsock_test --mode=client --control-host=192.168.133.2 --control-port=12345 --peer-cid=2
>>
>> Thanks,
>> Stefano
>>
>


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
@ 2023-09-14 15:34       ` Stefano Garzarella
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Garzarella @ 2023-09-14 15:34 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, kvm, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization, oxffffaa, Eric Dumazet, Stefan Hajnoczi, kernel,
	Jakub Kicinski, Paolo Abeni, David S. Miller

On Thu, Sep 14, 2023 at 05:05:17PM +0300, Arseniy Krasnov wrote:
>Hello Stefano,
>
>On 14.09.2023 17:07, Stefano Garzarella wrote:
>> Hi Arseniy,
>>
>> On Mon, Sep 11, 2023 at 11:22:30PM +0300, Arseniy Krasnov wrote:
>>> Hello,
>>>
>>> this patchset is first of three parts of another big patchset for
>>> MSG_ZEROCOPY flag support:
>>> https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
>>>
>>> During review of this series, Stefano Garzarella <sgarzare@redhat.com>
>>> suggested to split it for three parts to simplify review and merging:
>>>
>>> 1) virtio and vhost updates (for fragged skbs) <--- this patchset
>>> 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>>>   tx completions) and update for Documentation/.
>>> 3) Updates for tests and utils.
>>>
>>> This series enables handling of fragged skbs in virtio and vhost parts.
>>> Newly logic won't be triggered, because SO_ZEROCOPY options is still
>>> impossible to enable at this moment (next bunch of patches from big
>>> set above will enable it).
>>>
>>> I've included changelog to some patches anyway, because there were some
>>> comments during review of last big patchset from the link above.
>>
>> Thanks, I left some comments on patch 4, the others LGTM.
>> Sorry to not having spotted them before, but moving
>> virtio_transport_alloc_skb() around the file, made the patch a little
>> confusing and difficult to review.
>
>Sure, no problem, I'll fix them! Thanks for review.
>
>>
>> In addition, I started having failures of test 14 (server: host,
>> client: guest), so I looked better to see if there was anything wrong,
>> but it fails me even without this series applied.
>>
>> It happens to me intermittently (~30%), does it happen to you?
>> Can you take a look at it?
>
>Yes! sometime ago I also started to get fails of this test, not ~30%,
>significantly rare, but it depends on environment I guess, anyway I'm going to
>look at this on the next few days

Maybe it's just a timing issue in the test, indeed we are expecting 8
bytes but we received only 3 plus the 2 bytes we received before it
seems exactly the same bytes we send with the first
`send(fd, HELLO_STR, strlen(HELLO_STR), 0);`

Since it is a stream socket, it could happen, so we should retry
the recv() or just use MSG_WAITALL.

Applying the following patch fixed the issue for me (15 mins without
errors for now):

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 90718c2fd4ea..7b0fed9fc58d 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1129,7 +1129,7 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
         control_expectln("SEND0");

         /* Read skbuff partially. */
-       res = recv(fd, buf, 2, 0);
+       res = recv(fd, buf, 2, MSG_WAITALL);
         if (res != 2) {
                 fprintf(stderr, "expected recv(2) returns 2 bytes, got %zi\n", res);
                 exit(EXIT_FAILURE);
@@ -1138,7 +1138,7 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
         control_writeln("REPLY0");
         control_expectln("SEND1");

-       res = recv(fd, buf + 2, sizeof(buf) - 2, 0);
+       res = recv(fd, buf + 2, 8, MSG_WAITALL);
         if (res != 8) {
                 fprintf(stderr, "expected recv(2) returns 8 bytes, got %zi\n", res);
                 exit(EXIT_FAILURE);

I will check better all the cases and send a patch upstream.

Anyway it looks just an issue in our test suite :-)

Stefano

>
>Thanks, Arseniy
>
>>
>> host$ ./vsock_test --mode=server --control-port=12345 --peer-cid=4
>> ...
>> 14 - SOCK_STREAM virtio skb merge...expected recv(2) returns 8 bytes, got 3
>>
>> guest$ ./vsock_test --mode=client --control-host=192.168.133.2 --control-port=12345 --peer-cid=2
>>
>> Thanks,
>> Stefano
>>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations
  2023-09-14 15:34       ` Stefano Garzarella
  (?)
@ 2023-09-14 15:43       ` Arseniy Krasnov
  -1 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2023-09-14 15:43 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Stefan Hajnoczi, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michael S. Tsirkin, Jason Wang, Bobby Eshleman, kvm,
	virtualization, netdev, linux-kernel, kernel, oxffffaa



On 14.09.2023 18:34, Stefano Garzarella wrote:
> On Thu, Sep 14, 2023 at 05:05:17PM +0300, Arseniy Krasnov wrote:
>> Hello Stefano,
>>
>> On 14.09.2023 17:07, Stefano Garzarella wrote:
>>> Hi Arseniy,
>>>
>>> On Mon, Sep 11, 2023 at 11:22:30PM +0300, Arseniy Krasnov wrote:
>>>> Hello,
>>>>
>>>> this patchset is first of three parts of another big patchset for
>>>> MSG_ZEROCOPY flag support:
>>>> https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/
>>>>
>>>> During review of this series, Stefano Garzarella <sgarzare@redhat.com>
>>>> suggested to split it for three parts to simplify review and merging:
>>>>
>>>> 1) virtio and vhost updates (for fragged skbs) <--- this patchset
>>>> 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
>>>>   tx completions) and update for Documentation/.
>>>> 3) Updates for tests and utils.
>>>>
>>>> This series enables handling of fragged skbs in virtio and vhost parts.
>>>> Newly logic won't be triggered, because SO_ZEROCOPY options is still
>>>> impossible to enable at this moment (next bunch of patches from big
>>>> set above will enable it).
>>>>
>>>> I've included changelog to some patches anyway, because there were some
>>>> comments during review of last big patchset from the link above.
>>>
>>> Thanks, I left some comments on patch 4, the others LGTM.
>>> Sorry to not having spotted them before, but moving
>>> virtio_transport_alloc_skb() around the file, made the patch a little
>>> confusing and difficult to review.
>>
>> Sure, no problem, I'll fix them! Thanks for review.
>>
>>>
>>> In addition, I started having failures of test 14 (server: host,
>>> client: guest), so I looked better to see if there was anything wrong,
>>> but it fails me even without this series applied.
>>>
>>> It happens to me intermittently (~30%), does it happen to you?
>>> Can you take a look at it?
>>
>> Yes! sometime ago I also started to get fails of this test, not ~30%,
>> significantly rare, but it depends on environment I guess, anyway I'm going to
>> look at this on the next few days
> 
> Maybe it's just a timing issue in the test, indeed we are expecting 8
> bytes but we received only 3 plus the 2 bytes we received before it
> seems exactly the same bytes we send with the first
> `send(fd, HELLO_STR, strlen(HELLO_STR), 0);`
> 
> Since it is a stream socket, it could happen, so we should retry
> the recv() or just use MSG_WAITALL.
> 
> Applying the following patch fixed the issue for me (15 mins without
> errors for now):
> 
> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
> index 90718c2fd4ea..7b0fed9fc58d 100644
> --- a/tools/testing/vsock/vsock_test.c
> +++ b/tools/testing/vsock/vsock_test.c
> @@ -1129,7 +1129,7 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
>         control_expectln("SEND0");
> 
>         /* Read skbuff partially. */
> -       res = recv(fd, buf, 2, 0);
> +       res = recv(fd, buf, 2, MSG_WAITALL);
>         if (res != 2) {
>                 fprintf(stderr, "expected recv(2) returns 2 bytes, got %zi\n", res);
>                 exit(EXIT_FAILURE);
> @@ -1138,7 +1138,7 @@ static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
>         control_writeln("REPLY0");
>         control_expectln("SEND1");
> 
> -       res = recv(fd, buf + 2, sizeof(buf) - 2, 0);
> +       res = recv(fd, buf + 2, 8, MSG_WAITALL);
>         if (res != 8) {
>                 fprintf(stderr, "expected recv(2) returns 8 bytes, got %zi\n", res);
>                 exit(EXIT_FAILURE);
> 
> I will check better all the cases and send a patch upstream.

Agree, I think this will fix it!

Thanks, Arseniy

> 
> Anyway it looks just an issue in our test suite :-)
> 
> Stefano
> 
>>
>> Thanks, Arseniy
>>
>>>
>>> host$ ./vsock_test --mode=server --control-port=12345 --peer-cid=4
>>> ...
>>> 14 - SOCK_STREAM virtio skb merge...expected recv(2) returns 8 bytes, got 3
>>>
>>> guest$ ./vsock_test --mode=client --control-host=192.168.133.2 --control-port=12345 --peer-cid=2
>>>
>>> Thanks,
>>> Stefano
>>>
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-09-14 15:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11 20:22 [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Arseniy Krasnov
2023-09-11 20:22 ` [PATCH net-next v8 1/4] vsock/virtio/vhost: read data from non-linear skb Arseniy Krasnov
2023-09-11 20:22 ` [PATCH net-next v8 2/4] vsock/virtio: support to send " Arseniy Krasnov
2023-09-14 13:06   ` Stefano Garzarella
2023-09-14 13:06     ` Stefano Garzarella
2023-09-11 20:22 ` [PATCH net-next v8 3/4] vsock/virtio: non-linear skb handling for tap Arseniy Krasnov
2023-09-11 20:22 ` [PATCH net-next v8 4/4] vsock/virtio: MSG_ZEROCOPY flag support Arseniy Krasnov
2023-09-14 13:57   ` Stefano Garzarella
2023-09-14 13:57     ` Stefano Garzarella
2023-09-14 14:39     ` Arseniy Krasnov
2023-09-14 14:07 ` [PATCH net-next v8 0/4] vsock/virtio/vhost: MSG_ZEROCOPY preparations Stefano Garzarella
2023-09-14 14:07   ` Stefano Garzarella
2023-09-14 14:05   ` Arseniy Krasnov
2023-09-14 15:34     ` Stefano Garzarella
2023-09-14 15:34       ` Stefano Garzarella
2023-09-14 15:43       ` Arseniy Krasnov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.