* [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit
@ 2021-01-05  9:11 Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 1/5] xsk: support get page for drv Xuan Zhuo
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-05  9:11 UTC (permalink / raw)
  To: netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

The first patch makes some adjustments to xsk.

The second patch can also be used as an independent patch to solve the problem
that XDP may fail to load when the number of queues is insufficient.

Patches 3 to 5 implement xsk support in virtio-net.

A practical problem with virtio is that tx interrupts are not very reliable:
there will always be some missing or delayed tx interrupts. So I added a timer
to work around this. For performance reasons, the timer only fires when the
ring of the network card is full.

I am also developing xsk zero copy rx support for virtio-net, but that change
looks relatively large, so I kept this patch set separate from the zero copy
rx code.
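
For reference, this is roughly how a userspace application would drive the
zero copy tx path once this series is applied, using libbpf's xsk.h helpers.
It is only a sketch: the interface name, queue id and sizes are placeholders,
and all error handling is omitted.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>
#include <bpf/xsk.h>

#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

int main(void)
{
	struct xsk_ring_prod fq, tx;
	struct xsk_ring_cons cq;
	struct xsk_umem *umem;
	struct xsk_socket *xsk;
	struct xsk_socket_config cfg = {
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.bind_flags = XDP_ZEROCOPY | XDP_USE_NEED_WAKEUP,
	};
	__u32 idx;
	void *bufs;

	/* umem backing the tx descriptors */
	bufs = mmap(NULL, NUM_FRAMES * FRAME_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (xsk_umem__create(&umem, bufs, NUM_FRAMES * FRAME_SIZE, &fq, &cq, NULL))
		return 1;

	/* tx-only socket, bound in zero copy mode to queue 0 of "eth0" */
	if (xsk_socket__create(&xsk, "eth0", 0, umem, NULL, &tx, &cfg))
		return 1;

	/* queue one 64-byte frame starting at umem offset 0 */
	if (xsk_ring_prod__reserve(&tx, 1, &idx) == 1) {
		xsk_ring_prod__tx_desc(&tx, idx)->addr = 0;
		xsk_ring_prod__tx_desc(&tx, idx)->len = 64;
		xsk_ring_prod__submit(&tx, 1);
	}

	/* the kick ends up in the driver's ndo_xsk_wakeup() */
	if (xsk_ring_prod__needs_wakeup(&tx))
		sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

	return 0;
}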

Xuan Zhuo (5):
  xsk: support get page for drv
  virtio-net: support XDP_TX when not more queues
  virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
  xsk, virtio-net: prepare for support xsk
  virtio-net, xsk: virtio-net support xsk zero copy tx

 drivers/net/virtio_net.c    | 643 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/netdevice.h   |   1 +
 include/net/xdp_sock_drv.h  |  10 +
 include/net/xsk_buff_pool.h |   1 +
 net/xdp/xsk_buff_pool.c     |  10 +-
 5 files changed, 597 insertions(+), 68 deletions(-)

--
1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH netdev 1/5] xsk: support get page for drv
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
@ 2021-01-05  9:11 ` Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 2/5] virtio-net: support XDP_TX when not more queues Xuan Zhuo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-05  9:11 UTC (permalink / raw)
  To: netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

Some drivers, such as virtio-net, do not set up DMA mappings when an xsk
is bound; they look the page up at transmit time instead.

This patch adds a need_dma field that is passed to the driver during pool
setup. If the device does not use DMA, the driver should set it to false.

A helper, xsk_buff_raw_get_page(), is also added so that a driver can get
the page for a given addr.
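
To illustrate the intended usage (the real code is in the virtio-net patches
later in this series; the drv_* names below are made up for the example), a
driver that does not DMA map the umem would do roughly:

static int drv_xdp(struct net_device *dev, struct netdev_bpf *xdp)
{
	switch (xdp->command) {
	case XDP_SETUP_XSK_POOL:
		/* no DMA mapping here, so skip the dma_pages check in xsk */
		xdp->xsk.need_dma = false;
		return drv_xsk_pool_setup(dev, xdp->xsk.pool, xdp->xsk.queue_id);
	default:
		return -EINVAL;
	}
}

static int drv_xmit_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
{
	/* look the backing page up by umem address instead of using a DMA addr */
	struct page *page = xsk_buff_raw_get_page(pool, desc->addr);

	/* ... hand the page over to the device ring ... */
	return 0;
}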

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 include/linux/netdevice.h   |  1 +
 include/net/xdp_sock_drv.h  | 10 ++++++++++
 include/net/xsk_buff_pool.h |  1 +
 net/xdp/xsk_buff_pool.c     | 10 +++++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bf1679..b8baef9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -915,6 +915,7 @@ struct netdev_bpf {
 		struct {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
+			bool need_dma;
 		} xsk;
 	};
 };
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4e295541..e9c7e25 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -100,6 +100,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return xp_raw_get_data(pool, addr);
 }
 
+static inline struct page *xsk_buff_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return xp_raw_get_page(pool, addr);
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
@@ -232,6 +237,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return NULL;
 }
 
+static inline struct page *xsk_buff_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return NULL;
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 01755b8..54e461d 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -103,6 +103,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count);
 void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr);
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr);
+struct page *xp_raw_get_page(struct xsk_buff_pool *pool, u64 addr);
 static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb)
 {
 	return xskb->dma;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 67a4494..9bb058f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -167,12 +167,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 	bpf.command = XDP_SETUP_XSK_POOL;
 	bpf.xsk.pool = pool;
 	bpf.xsk.queue_id = queue_id;
+	bpf.xsk.need_dma = true;
 
 	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (bpf.xsk.need_dma && !pool->dma_pages) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;
@@ -536,6 +537,13 @@ void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 }
 EXPORT_SYMBOL(xp_raw_get_data);
 
+struct page *xp_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
+	return pool->umem->pgs[addr >> PAGE_SHIFT];
+}
+EXPORT_SYMBOL(xp_raw_get_page);
+
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
 {
 	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH netdev 2/5] virtio-net: support XDP_TX when not more queues
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 1/5] xsk: support get page for drv Xuan Zhuo
@ 2021-01-05  9:11 ` Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 3/5] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-05  9:11 UTC (permalink / raw)
  To: netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

Many virtio backends implement only a limited number of queues, while some
machines have a large number of CPUs. In this case it is often impossible
to allocate a separate queue for XDP_TX.

This patch allows XDP_TX to run under the tx queue lock when there are not
enough queues to dedicate one per CPU.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f65eea6..f2349b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -194,6 +194,7 @@ struct virtnet_info {
 
 	/* # of XDP queue pairs currently used by the driver */
 	u16 xdp_queue_pairs;
+	bool xdp_enable;
 
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;
@@ -481,14 +482,34 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 	return 0;
 }
 
-static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
+static struct send_queue *virtnet_get_xdp_sq(struct virtnet_info *vi)
 {
 	unsigned int qp;
+	struct netdev_queue *txq;
+
+	if (vi->curr_queue_pairs > nr_cpu_ids) {
+		qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	} else {
+		qp = smp_processor_id() % vi->curr_queue_pairs;
+		txq = netdev_get_tx_queue(vi->dev, qp);
+		__netif_tx_lock(txq, raw_smp_processor_id());
+	}
 
-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
 	return &vi->sq[qp];
 }
 
+static void virtnet_put_xdp_sq(struct virtnet_info *vi)
+{
+	unsigned int qp;
+	struct netdev_queue *txq;
+
+	if (vi->curr_queue_pairs <= nr_cpu_ids) {
+		qp = smp_processor_id() % vi->curr_queue_pairs;
+		txq = netdev_get_tx_queue(vi->dev, qp);
+		__netif_tx_unlock(txq);
+	}
+}
+
 static int virtnet_xdp_xmit(struct net_device *dev,
 			    int n, struct xdp_frame **frames, u32 flags)
 {
@@ -512,7 +533,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	if (!xdp_prog)
 		return -ENXIO;
 
-	sq = virtnet_xdp_sq(vi);
+	sq = virtnet_get_xdp_sq(vi);
 
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
 		ret = -EINVAL;
@@ -560,12 +581,13 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	sq->stats.kicks += kicks;
 	u64_stats_update_end(&sq->stats.syncp);
 
+	virtnet_put_xdp_sq(vi);
 	return ret;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 {
-	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
+	return vi->xdp_enable ? VIRTIO_XDP_HEADROOM : 0;
 }
 
 /* We copy the packet for XDP in the following cases:
@@ -1457,12 +1479,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 		xdp_do_flush();
 
 	if (xdp_xmit & VIRTIO_XDP_TX) {
-		sq = virtnet_xdp_sq(vi);
+		sq = virtnet_get_xdp_sq(vi);
 		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
 			u64_stats_update_begin(&sq->stats.syncp);
 			sq->stats.kicks++;
 			u64_stats_update_end(&sq->stats.syncp);
 		}
+		virtnet_put_xdp_sq(vi);
 	}
 
 	return received;
@@ -2415,10 +2438,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 
 	/* XDP requires extra queues for XDP_TX */
 	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
-		NL_SET_ERR_MSG_MOD(extack, "Too few free TX rings available");
-		netdev_warn(dev, "request %i queues but max is %i\n",
-			    curr_qp + xdp_qp, vi->max_queue_pairs);
-		return -ENOMEM;
+		xdp_qp = 0;
 	}
 
 	old_prog = rtnl_dereference(vi->rq[0].xdp_prog);
@@ -2451,12 +2471,14 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
 	vi->xdp_queue_pairs = xdp_qp;
 
+	vi->xdp_enable = false;
 	if (prog) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
 			if (i == 0 && !old_prog)
 				virtnet_clear_guest_offloads(vi);
 		}
+		vi->xdp_enable = true;
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -2524,7 +2546,7 @@ static int virtnet_set_features(struct net_device *dev,
 	int err;
 
 	if ((dev->features ^ features) & NETIF_F_LRO) {
-		if (vi->xdp_queue_pairs)
+		if (vi->xdp_enable)
 			return -EBUSY;
 
 		if (features & NETIF_F_LRO)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH netdev 3/5] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 1/5] xsk: support get page for drv Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 2/5] virtio-net: support XDP_TX when not more queues Xuan Zhuo
@ 2021-01-05  9:11 ` Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 4/5] xsk, virtio-net: prepare for support xsk Xuan Zhuo
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-05  9:11 UTC (permalink / raw)
  To: netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

Once xsk is supported, a new kind of ptr can turn up while freeing the old
ptrs. To distinguish a ctx queued by XDP_TX from a ctx queued by xsk xmit,
a small struct is added: virtnet_xdp_type.type records which kind of ctx
this is, and virtnet_xdp_type.offset records the offset between the "true
ctx" (the xdp_frame) and the virtnet_xdp_type itself, as illustrated below.

The newly added virtnet_xsk_hdr will be used by xsk.
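
The pointer/offset relationship can be illustrated with a small standalone
program (this only demonstrates the technique; the real structs are in the
patch below):

#include <stdio.h>

struct frame { char payload[64]; };	/* stands in for struct xdp_frame */

struct tag {				/* stands in for virtnet_xdp_type */
	int offset:24;			/* distance from the tag back to the frame */
	unsigned int type:8;		/* XDP_TYPE_TX or XDP_TYPE_XSK */
};

int main(void)
{
	struct { struct frame f; struct tag t; } buf;
	struct tag *tag = &buf.t;
	struct frame *recovered;

	/* the tag sits right after the frame and remembers how to get back */
	tag->offset = (char *)&buf.f - (char *)tag;	/* negative */
	tag->type = 0;

	recovered = (struct frame *)((char *)tag + tag->offset);
	printf("recovered == &buf.f: %d\n", recovered == &buf.f);
	return 0;
}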

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 77 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 62 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f2349b8..df38a9f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -94,6 +94,22 @@ struct virtnet_rq_stats {
 	u64 kicks;
 };
 
+enum {
+	XDP_TYPE_XSK,
+	XDP_TYPE_TX,
+};
+
+struct virtnet_xdp_type {
+	int offset:24;
+	unsigned type:8;
+};
+
+struct virtnet_xsk_hdr {
+	struct virtnet_xdp_type type;
+	struct virtio_net_hdr_mrg_rxbuf hdr;
+	u32 len;
+};
+
 #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
 
@@ -252,14 +268,19 @@ static bool is_xdp_frame(void *ptr)
 	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
 }
 
-static void *xdp_to_ptr(struct xdp_frame *ptr)
+static void *xdp_to_ptr(struct virtnet_xdp_type *ptr)
 {
 	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
 }
 
-static struct xdp_frame *ptr_to_xdp(void *ptr)
+static struct virtnet_xdp_type *ptr_to_xtype(void *ptr)
 {
-	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
+	return (struct virtnet_xdp_type *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
+}
+
+static void *xtype_got_ptr(struct virtnet_xdp_type *xdptype)
+{
+	return (char *)xdptype + xdptype->offset;
 }
 
 /* Converting between virtqueue no. and kernel tx/rx queue no.
@@ -460,11 +481,16 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
+	struct virtnet_xdp_type *xdptype;
 	int err;
 
-	if (unlikely(xdpf->headroom < vi->hdr_len))
+	if (unlikely(xdpf->headroom < vi->hdr_len + sizeof(*xdptype)))
 		return -EOVERFLOW;
 
+	xdptype = (struct virtnet_xdp_type *)(xdpf + 1);
+	xdptype->offset = (char *)xdpf - (char *)xdptype;
+	xdptype->type = XDP_TYPE_TX;
+
 	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
 	xdpf->data -= vi->hdr_len;
 	/* Zero header and leave csum up to XDP layers */
@@ -474,7 +500,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 
 	sg_init_one(sq->sg, xdpf->data, xdpf->len);
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdptype),
 				   GFP_ATOMIC);
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
@@ -544,8 +570,11 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	/* Free up any pending old buffers before queueing new ones. */
 	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (likely(is_xdp_frame(ptr))) {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
+			struct virtnet_xdp_type *xtype;
+			struct xdp_frame *frame;
 
+			xtype = ptr_to_xtype(ptr);
+			frame = xtype_got_ptr(xtype);
 			bytes += frame->len;
 			xdp_return_frame(frame);
 		} else {
@@ -1395,24 +1424,34 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
-	unsigned int len;
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
-	void *ptr;
+	unsigned int len;
+	struct virtnet_xdp_type *xtype;
+	struct xdp_frame        *frame;
+	struct virtnet_xsk_hdr  *xskhdr;
+	struct sk_buff          *skb;
+	void                    *ptr;
 
 	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (likely(!is_xdp_frame(ptr))) {
-			struct sk_buff *skb = ptr;
+			skb = ptr;
 
 			pr_debug("Sent skb %p\n", skb);
 
 			bytes += skb->len;
 			napi_consume_skb(skb, in_napi);
 		} else {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
+			xtype = ptr_to_xtype(ptr);
 
-			bytes += frame->len;
-			xdp_return_frame(frame);
+			if (xtype->type == XDP_TYPE_XSK) {
+				xskhdr = (struct virtnet_xsk_hdr *)xtype;
+				bytes += xskhdr->len;
+			} else {
+				frame = xtype_got_ptr(xtype);
+				xdp_return_frame(frame);
+				bytes += frame->len;
+			}
 		}
 		packets++;
 	}
@@ -2675,14 +2714,22 @@ static void free_unused_bufs(struct virtnet_info *vi)
 {
 	void *buf;
 	int i;
+	struct send_queue *sq;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
+		sq = vi->sq + i;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_frame(buf))
+			if (!is_xdp_frame(buf)) {
 				dev_kfree_skb(buf);
-			else
-				xdp_return_frame(ptr_to_xdp(buf));
+			} else {
+				struct virtnet_xdp_type *xtype;
+
+				xtype = ptr_to_xtype(buf);
+
+				if (xtype->type != XDP_TYPE_XSK)
+					xdp_return_frame(xtype_got_ptr(xtype));
+			}
 		}
 	}
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH netdev 4/5] xsk, virtio-net: prepare for support xsk
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (2 preceding siblings ...)
  2021-01-05  9:11 ` [PATCH netdev 3/5] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
@ 2021-01-05  9:11 ` Xuan Zhuo
  2021-01-05  9:11 ` [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx Xuan Zhuo
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-05  9:11 UTC (permalink / raw)
  To: netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

Split free_old_xmit_skbs() and add the helper __free_old_xmit_ptr(), which
returns the packet and byte counts to the caller and takes the 'xsk_wakeup'
parameter required for processing xsk.

Also factor the netif stop check out into virtnet_sq_stop_check(), which
will be reused when xsk support is added.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 95 ++++++++++++++++++++++++++----------------------
 1 file changed, 52 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index df38a9f..e744dce 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -263,6 +263,11 @@ struct padded_vnet_hdr {
 	char padding[4];
 };
 
+static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
+				bool xsk_wakeup,
+				unsigned int *_packets, unsigned int *_bytes);
+static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
+
 static bool is_xdp_frame(void *ptr)
 {
 	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
@@ -376,6 +381,37 @@ static void skb_xmit_done(struct virtqueue *vq)
 		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
+static void virtnet_sq_stop_check(struct send_queue *sq, bool in_napi)
+{
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct net_device *dev = vi->dev;
+	int qnum = sq - vi->sq;
+
+	/* If running out of space, stop queue to avoid getting packets that we
+	 * are then unable to transmit.
+	 * An alternative would be to force queuing layer to requeue the skb by
+	 * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be
+	 * returned in a normal path of operation: it means that driver is not
+	 * maintaining the TX queue stop/start state properly, and causes
+	 * the stack to do a non-trivial amount of useless work.
+	 * Since most packets only take 1 or 2 ring slots, stopping the queue
+	 * early means 16 slots are typically wasted.
+	 */
+
+	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
+		netif_stop_subqueue(dev, qnum);
+		if (!sq->napi.weight &&
+		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
+			/* More just got used, free them then recheck. */
+			free_old_xmit_skbs(sq, in_napi);
+			if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
+				netif_start_subqueue(dev, qnum);
+				virtqueue_disable_cb(sq->vq);
+			}
+		}
+	}
+}
+
 #define MRG_CTX_HEADER_SHIFT 22
 static void *mergeable_len_to_ctx(unsigned int truesize,
 				  unsigned int headroom)
@@ -543,13 +579,11 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	struct receive_queue *rq = vi->rq;
 	struct bpf_prog *xdp_prog;
 	struct send_queue *sq;
-	unsigned int len;
 	int packets = 0;
 	int bytes = 0;
 	int drops = 0;
 	int kicks = 0;
 	int ret, err;
-	void *ptr;
 	int i;
 
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
@@ -567,24 +601,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 		goto out;
 	}
 
-	/* Free up any pending old buffers before queueing new ones. */
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(is_xdp_frame(ptr))) {
-			struct virtnet_xdp_type *xtype;
-			struct xdp_frame *frame;
-
-			xtype = ptr_to_xtype(ptr);
-			frame = xtype_got_ptr(xtype);
-			bytes += frame->len;
-			xdp_return_frame(frame);
-		} else {
-			struct sk_buff *skb = ptr;
-
-			bytes += skb->len;
-			napi_consume_skb(skb, false);
-		}
-		packets++;
-	}
+	__free_old_xmit_ptr(sq, false, true, &packets, &bytes);
 
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
@@ -1422,7 +1439,9 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
+				bool xsk_wakeup,
+				unsigned int *_packets, unsigned int *_bytes)
 {
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
@@ -1456,6 +1475,17 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 		packets++;
 	}
 
+	*_packets = packets;
+	*_bytes = bytes;
+}
+
+static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+{
+	unsigned int packets = 0;
+	unsigned int bytes = 0;
+
+	__free_old_xmit_ptr(sq, in_napi, true, &packets, &bytes);
+
 	/* Avoid overhead when no packets have been processed
 	 * happens when called speculatively from start_xmit.
 	 */
@@ -1672,28 +1702,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		nf_reset_ct(skb);
 	}
 
-	/* If running out of space, stop queue to avoid getting packets that we
-	 * are then unable to transmit.
-	 * An alternative would be to force queuing layer to requeue the skb by
-	 * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be
-	 * returned in a normal path of operation: it means that driver is not
-	 * maintaining the TX queue stop/start state properly, and causes
-	 * the stack to do a non-trivial amount of useless work.
-	 * Since most packets only take 1 or 2 ring slots, stopping the queue
-	 * early means 16 slots are typically wasted.
-	 */
-	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
-		netif_stop_subqueue(dev, qnum);
-		if (!use_napi &&
-		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq, false);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
-				netif_start_subqueue(dev, qnum);
-				virtqueue_disable_cb(sq->vq);
-			}
-		}
-	}
+	virtnet_sq_stop_check(sq, false);
 
 	if (kick || netif_xmit_stopped(txq)) {
 		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (3 preceding siblings ...)
  2021-01-05  9:11 ` [PATCH netdev 4/5] xsk, virtio-net: prepare for support xsk Xuan Zhuo
@ 2021-01-05  9:11 ` Xuan Zhuo
  2021-01-05 13:21   ` Michael S. Tsirkin
  2021-01-05  9:32 ` [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Jason Wang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-05  9:11 UTC (permalink / raw)
  To: netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

Add AF_XDP (xdp socket) zero copy tx support to virtio-net.

The module param "napi_tx" must be enabled to use this feature.

In practice, various virtio implementations have some problems:
1. the tx interrupt may be lost
2. the tx interrupt may arrive with a relatively large delay

This leads to two problems:

1. If the wakeup handler merely kicks a tx interrupt or just schedules a napi
   on the current cpu, packets are sent with extra delay.
2. When the tx ring is full, the tx interrupt may be lost or delayed, so
   used entries are not recovered in time.

So I chose to send part of the data directly in the wakeup path. If the
sending could not be completed there, a napi is started to finish the
remaining work.

Since the possible delay or loss of the tx interrupt matters when the tx ring
is full, I added a timer to handle that case.

UDP transmit performance with virtio-net + xsk is 6 times that of an
ordinary kernel UDP send.

* xsk_check_timeout: when the device ring is full or all xsk.hdr entries are
  in use, start a timer to check whether xsk.hdr entries are available again;
  the unit is us.
* xsk_num_max: the maximum number of xsk.hdr entries
* xsk_num_percent: the maximum number of hdr entries expressed as a percentage
  of the virtio ring size. The real number of xsk hdr entries is the minimum
  of xsk_num_max and this percentage of the virtio ring size (see the sketch
  below).
* xsk_budget: the budget for one xsk run
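
As a sketch of how these parameters interact (taken from
virtnet_xsk_pool_enable() in the patch below), the per-queue hdr array is
sized as:

	n = virtqueue_get_vring_size(sq->vq);
	n = min(xsk_num_max, n * (xsk_num_percent % 100) / 100);
	hdr = kcalloc(n, sizeof(struct virtnet_xsk_hdr), GFP_ATOMIC);

so with a 256 entry tx ring and the default values this gives
min(1024, 256 * 80 / 100) = 204 hdr entries.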

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 434 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e744dce..76319e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -22,10 +22,21 @@
 #include <net/route.h>
 #include <net/xdp.h>
 #include <net/net_failover.h>
+#include <net/xdp_sock_drv.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
+static int xsk_check_timeout = 100;
+static int xsk_num_max       = 1024;
+static int xsk_num_percent   = 80;
+static int xsk_budget        = 128;
+
+module_param(xsk_check_timeout, int, 0644);
+module_param(xsk_num_max,       int, 0644);
+module_param(xsk_num_percent,   int, 0644);
+module_param(xsk_budget,        int, 0644);
+
 static bool csum = true, gso = true, napi_tx = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
@@ -110,6 +121,9 @@ struct virtnet_xsk_hdr {
 	u32 len;
 };
 
+#define VIRTNET_STATE_XSK_WAKEUP BIT(0)
+#define VIRTNET_STATE_XSK_TIMER BIT(1)
+
 #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
 
@@ -149,6 +163,32 @@ struct send_queue {
 	struct virtnet_sq_stats stats;
 
 	struct napi_struct napi;
+
+	struct {
+		struct xsk_buff_pool   __rcu *pool;
+		struct virtnet_xsk_hdr __rcu *hdr;
+
+		unsigned long          state;
+		u64                    hdr_con;
+		u64                    hdr_pro;
+		u64                    hdr_n;
+		struct xdp_desc        last_desc;
+		bool                   wait_slot;
+		/* tx interrupt issues
+		 *   1. that may be lost
+		 *   2. that too slow, 200/s or delay 10ms
+		 *
+		 * timer for:
+		 * 1. recycle the desc.(no check for performance, see below)
+		 * 2. check the nic ring is avali. when nic ring is full
+		 *
+		 * Here, the regular check is performed for dev full. The
+		 * application layer must ensure that the number of cq is
+		 * sufficient, otherwise there may be insufficient cq in use.
+		 *
+		 */
+		struct hrtimer          timer;
+	} xsk;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -267,6 +307,8 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 				bool xsk_wakeup,
 				unsigned int *_packets, unsigned int *_bytes);
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
+static int virtnet_xsk_run(struct send_queue *sq,
+			   struct xsk_buff_pool *pool, int budget);
 
 static bool is_xdp_frame(void *ptr)
 {
@@ -1439,6 +1481,40 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
+static void virt_xsk_complete(struct send_queue *sq, u32 num, bool xsk_wakeup)
+{
+	struct xsk_buff_pool *pool;
+	int n;
+
+	rcu_read_lock();
+
+	WRITE_ONCE(sq->xsk.hdr_pro, sq->xsk.hdr_pro + num);
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool) {
+		if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n) {
+			kfree(sq->xsk.hdr);
+			rcu_assign_pointer(sq->xsk.hdr, NULL);
+		}
+		rcu_read_unlock();
+		return;
+	}
+
+	xsk_tx_completed(pool, num);
+
+	rcu_read_unlock();
+
+	if (!xsk_wakeup || !sq->xsk.wait_slot)
+		return;
+
+	n = sq->xsk.hdr_pro - sq->xsk.hdr_con;
+
+	if (n > sq->xsk.hdr_n / 2) {
+		sq->xsk.wait_slot = false;
+		virtqueue_napi_schedule(&sq->napi, sq->vq);
+	}
+}
+
 static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 				bool xsk_wakeup,
 				unsigned int *_packets, unsigned int *_bytes)
@@ -1446,6 +1522,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
 	unsigned int len;
+	u64 xsknum = 0;
 	struct virtnet_xdp_type *xtype;
 	struct xdp_frame        *frame;
 	struct virtnet_xsk_hdr  *xskhdr;
@@ -1466,6 +1543,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 			if (xtype->type == XDP_TYPE_XSK) {
 				xskhdr = (struct virtnet_xsk_hdr *)xtype;
 				bytes += xskhdr->len;
+				xsknum += 1;
 			} else {
 				frame = xtype_got_ptr(xtype);
 				xdp_return_frame(frame);
@@ -1475,6 +1553,9 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 		packets++;
 	}
 
+	if (xsknum)
+		virt_xsk_complete(sq, xsknum, xsk_wakeup);
+
 	*_packets = packets;
 	*_bytes = bytes;
 }
@@ -1595,6 +1676,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	unsigned int index = vq2txq(sq->vq);
 	struct netdev_queue *txq;
+	struct xsk_buff_pool *pool;
+	int work = 0;
 
 	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
 		/* We don't need to enable cb for XDP */
@@ -1604,15 +1687,26 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 
 	txq = netdev_get_tx_queue(vi->dev, index);
 	__netif_tx_lock(txq, raw_smp_processor_id());
-	free_old_xmit_skbs(sq, true);
+
+	rcu_read_lock();
+	pool = rcu_dereference(sq->xsk.pool);
+	if (pool) {
+		work = virtnet_xsk_run(sq, pool, budget);
+		rcu_read_unlock();
+	} else {
+		rcu_read_unlock();
+		free_old_xmit_skbs(sq, true);
+	}
+
 	__netif_tx_unlock(txq);
 
-	virtqueue_napi_complete(napi, sq->vq, 0);
+	if (work < budget)
+		virtqueue_napi_complete(napi, sq->vq, 0);
 
 	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 		netif_tx_wake_queue(txq);
 
-	return 0;
+	return work;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
@@ -2560,16 +2654,346 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return err;
 }
 
+static enum hrtimer_restart virtnet_xsk_timeout(struct hrtimer *timer)
+{
+	struct send_queue *sq;
+
+	sq = container_of(timer, struct send_queue, xsk.timer);
+
+	clear_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state);
+
+	virtqueue_napi_schedule(&sq->napi, sq->vq);
+
+	return HRTIMER_NORESTART;
+}
+
+static int virtnet_xsk_pool_enable(struct net_device *dev,
+				   struct xsk_buff_pool *pool,
+				   u16 qid)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq = &vi->sq[qid];
+	struct virtnet_xsk_hdr *hdr;
+	int n, ret = 0;
+
+	if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	rcu_read_lock();
+
+	ret = -EBUSY;
+	if (rcu_dereference(sq->xsk.pool))
+		goto end;
+
+	/* check last xsk wait for hdr been free */
+	if (rcu_dereference(sq->xsk.hdr))
+		goto end;
+
+	n = virtqueue_get_vring_size(sq->vq);
+	n = min(xsk_num_max, n * (xsk_num_percent % 100) / 100);
+
+	ret = -ENOMEM;
+	hdr = kcalloc(n, sizeof(struct virtnet_xsk_hdr), GFP_ATOMIC);
+	if (!hdr)
+		goto end;
+
+	memset(&sq->xsk, 0, sizeof(sq->xsk));
+
+	sq->xsk.hdr_pro = n;
+	sq->xsk.hdr_n   = n;
+
+	hrtimer_init(&sq->xsk.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
+	sq->xsk.timer.function = virtnet_xsk_timeout;
+
+	rcu_assign_pointer(sq->xsk.pool, pool);
+	rcu_assign_pointer(sq->xsk.hdr, hdr);
+
+	ret = 0;
+end:
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq = &vi->sq[qid];
+
+	if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	rcu_assign_pointer(sq->xsk.pool, NULL);
+
+	hrtimer_cancel(&sq->xsk.timer);
+
+	synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
+
+	if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n) {
+		kfree(sq->xsk.hdr);
+		rcu_assign_pointer(sq->xsk.hdr, NULL);
+		synchronize_rcu();
+	}
+
+	return 0;
+}
+
 static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_SETUP_XSK_POOL:
+		xdp->xsk.need_dma = false;
+		if (xdp->xsk.pool)
+			return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
+						       xdp->xsk.queue_id);
+		else
+			return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
 	}
 }
 
+static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
+			    struct xdp_desc *desc)
+{
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	void *data, *ptr;
+	struct page *page;
+	struct virtnet_xsk_hdr *xskhdr;
+	u32 idx, offset, n, i, copy, copied;
+	u64 addr;
+	int err, m;
+
+	addr = desc->addr;
+
+	data = xsk_buff_raw_get_data(pool, addr);
+	offset = offset_in_page(data);
+
+	/* one for hdr, one for the first page */
+	n = 2;
+	m = desc->len - (PAGE_SIZE - offset);
+	if (m > 0) {
+		n += m >> PAGE_SHIFT;
+		if (m & PAGE_MASK)
+			++n;
+
+		n = min_t(u32, n, ARRAY_SIZE(sq->sg));
+	}
+
+	idx = sq->xsk.hdr_con % sq->xsk.hdr_n;
+	xskhdr = &sq->xsk.hdr[idx];
+
+	/* xskhdr->hdr has been memset to zero, so not need to clear again */
+
+	sg_init_table(sq->sg, n);
+	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);
+
+	copied = 0;
+	for (i = 1; i < n; ++i) {
+		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
+
+		page = xsk_buff_raw_get_page(pool, addr + copied);
+
+		sg_set_page(sq->sg + i, page, copy, offset);
+		copied += copy;
+		if (offset)
+			offset = 0;
+	}
+
+	xskhdr->len = desc->len;
+	ptr = xdp_to_ptr(&xskhdr->type);
+
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, ptr, GFP_ATOMIC);
+	if (unlikely(err))
+		sq->xsk.last_desc = *desc;
+	else
+		sq->xsk.hdr_con++;
+
+	return err;
+}
+
+static bool virtnet_xsk_dev_is_full(struct send_queue *sq)
+{
+	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
+		return true;
+
+	if (sq->xsk.hdr_con == sq->xsk.hdr_pro)
+		return true;
+
+	return false;
+}
+
+static int virtnet_xsk_xmit_zc(struct send_queue *sq,
+			       struct xsk_buff_pool *pool, unsigned int budget)
+{
+	struct xdp_desc desc;
+	int err, packet = 0;
+	int ret = -EAGAIN;
+
+	if (sq->xsk.last_desc.addr) {
+		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
+		if (unlikely(err))
+			return -EBUSY;
+
+		++packet;
+		sq->xsk.last_desc.addr = 0;
+	}
+
+	while (budget-- > 0) {
+		if (virtnet_xsk_dev_is_full(sq)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		if (!xsk_tx_peek_desc(pool, &desc)) {
+			/* done */
+			ret = 0;
+			break;
+		}
+
+		err = virtnet_xsk_xmit(sq, pool, &desc);
+		if (unlikely(err)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		++packet;
+	}
+
+	if (packet) {
+		xsk_tx_release(pool);
+
+		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
+			u64_stats_update_begin(&sq->stats.syncp);
+			sq->stats.kicks++;
+			u64_stats_update_end(&sq->stats.syncp);
+		}
+	}
+
+	return ret;
+}
+
+static int virtnet_xsk_run(struct send_queue *sq,
+			   struct xsk_buff_pool *pool, int budget)
+{
+	int err, ret = 0;
+	unsigned int _packets = 0;
+	unsigned int _bytes = 0;
+
+	sq->xsk.wait_slot = false;
+
+	if (test_and_clear_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state))
+		hrtimer_try_to_cancel(&sq->xsk.timer);
+
+	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
+
+	err = virtnet_xsk_xmit_zc(sq, pool, xsk_budget);
+	if (!err) {
+		struct xdp_desc desc;
+
+		clear_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
+		xsk_set_tx_need_wakeup(pool);
+
+		/* Race breaker. If new is coming after last xmit
+		 * but before flag change
+		 */
+
+		if (!xsk_tx_peek_desc(pool, &desc))
+			goto end;
+
+		set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
+		xsk_clear_tx_need_wakeup(pool);
+
+		sq->xsk.last_desc = desc;
+		ret = budget;
+		goto end;
+	}
+
+	xsk_clear_tx_need_wakeup(pool);
+
+	if (err == -EAGAIN) {
+		ret = budget;
+		goto end;
+	}
+
+	/* -EBUSY: wait tx ring avali.
+	 *	by tx interrupt or rx interrupt or start_xmit or timer
+	 */
+
+	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
+
+	if (!virtnet_xsk_dev_is_full(sq)) {
+		ret = budget;
+		goto end;
+	}
+
+	sq->xsk.wait_slot = true;
+
+	if (xsk_check_timeout) {
+		hrtimer_start(&sq->xsk.timer,
+			      ns_to_ktime(xsk_check_timeout * 1000),
+			      HRTIMER_MODE_REL_PINNED);
+
+		set_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state);
+	}
+
+	virtnet_sq_stop_check(sq, true);
+
+end:
+	return ret;
+}
+
+static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq;
+	struct xsk_buff_pool *pool;
+	struct netdev_queue *txq;
+	int work = 0;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	sq = &vi->sq[qid];
+
+	rcu_read_lock();
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool)
+		goto end;
+
+	if (test_and_set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state))
+		goto end;
+
+	txq = netdev_get_tx_queue(dev, qid);
+
+	local_bh_disable();
+	__netif_tx_lock(txq, raw_smp_processor_id());
+
+	work = virtnet_xsk_run(sq, pool, xsk_budget);
+
+	__netif_tx_unlock(txq);
+	local_bh_enable();
+
+	if (work == xsk_budget)
+		virtqueue_napi_schedule(&sq->napi, sq->vq);
+
+end:
+	rcu_read_unlock();
+	return 0;
+}
+
 static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
 				      size_t len)
 {
@@ -2624,6 +3048,7 @@ static int virtnet_set_features(struct net_device *dev,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
 	.ndo_bpf		= virtnet_xdp,
 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
+	.ndo_xsk_wakeup		= virtnet_xsk_wakeup,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
 	.ndo_set_features	= virtnet_set_features,
@@ -2722,6 +3147,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 static void free_unused_bufs(struct virtnet_info *vi)
 {
 	void *buf;
+	u32 n;
 	int i;
 	struct send_queue *sq;
 
@@ -2740,6 +3166,11 @@ static void free_unused_bufs(struct virtnet_info *vi)
 					xdp_return_frame(xtype_got_ptr(xtype));
 			}
 		}
+
+		n = sq->xsk.hdr_con + sq->xsk.hdr_n;
+		n -= sq->xsk.hdr_pro;
+		if (n)
+			virt_xsk_complete(sq, n, false);
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (4 preceding siblings ...)
  2021-01-05  9:11 ` [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx Xuan Zhuo
@ 2021-01-05  9:32 ` Jason Wang
  2021-01-05 12:25 ` Michael S. Tsirkin
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
  7 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2021-01-05  9:32 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: dust.li, tonylu, Michael S. Tsirkin, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)


On 2021/1/5 5:11 PM, Xuan Zhuo wrote:
> The first patch makes some adjustments to xsk.


Thanks a lot for the work. It's rather interesting.


>
> The second patch can also be used as an independent patch to solve the problem
> that XDP may fail to load when the number of queues is insufficient.


It would be better to send this as a separate patch. Several people have
asked for this before.


>
> Patches 3 to 5 implement xsk support in virtio-net.
>
> A practical problem with virtio is that tx interrupts are not very reliable:
> there will always be some missing or delayed tx interrupts. So I added a timer
> to work around this. For performance reasons, the timer only fires when the
> ring of the network card is full.


This is sub-optimal. We need to figure out the root cause. We haven't met
such an issue before.

Several questions:

- is tx interrupt enabled?
- can you still see the issue if you disable event index?
- what backend did you use? qemu or vhost(user)?


>
> I am also developing xsk zero copy rx support for virtio-net, but that change
> looks relatively large, so I kept this patch set separate from the zero copy
> rx code.


That's fine, but a question here.

How is multiqueue being handled here? I'm asking since there's no
programmable filter/director support in the virtio spec now.

Thanks


>
> Xuan Zhuo (5):
>    xsk: support get page for drv
>    virtio-net: support XDP_TX when not more queues
>    virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
>    xsk, virtio-net: prepare for support xsk
>    virtio-net, xsk: virtio-net support xsk zero copy tx
>
>   drivers/net/virtio_net.c    | 643 +++++++++++++++++++++++++++++++++++++++-----
>   include/linux/netdevice.h   |   1 +
>   include/net/xdp_sock_drv.h  |  10 +
>   include/net/xsk_buff_pool.h |   1 +
>   net/xdp/xsk_buff_pool.c     |  10 +-
>   5 files changed, 597 insertions(+), 68 deletions(-)
>
> --
> 1.8.3.1
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (5 preceding siblings ...)
  2021-01-05  9:32 ` [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Jason Wang
@ 2021-01-05 12:25 ` Michael S. Tsirkin
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
  7 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2021-01-05 12:25 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, dust.li, tonylu, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

On Tue, Jan 05, 2021 at 05:11:38PM +0800, Xuan Zhuo wrote:
> The first patch makes some adjustments to xsk.
> 
> The second patch can also be used as an independent patch to solve the problem
> that XDP may fail to load when the number of queues is insufficient.
> 
> Patches 3 to 5 implement xsk support in virtio-net.
> 
> A practical problem with virtio is that tx interrupts are not very reliable:
> there will always be some missing or delayed tx interrupts.

Would appreciate a bit more data on this one. Is this a host bug? Device bug?
Can we limit the workaround somehow?

> So I added a timer
> to work around this. For performance reasons, the timer only fires when the
> ring of the network card is full.
> 
> I am also developing xsk zero copy rx support for virtio-net, but that change
> looks relatively large, so I kept this patch set separate from the zero copy
> rx code.
> 
> Xuan Zhuo (5):
>   xsk: support get page for drv
>   virtio-net: support XDP_TX when not more queues
>   virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
>   xsk, virtio-net: prepare for support xsk
>   virtio-net, xsk: virtio-net support xsk zero copy tx
> 
>  drivers/net/virtio_net.c    | 643 +++++++++++++++++++++++++++++++++++++++-----
>  include/linux/netdevice.h   |   1 +
>  include/net/xdp_sock_drv.h  |  10 +
>  include/net/xsk_buff_pool.h |   1 +
>  net/xdp/xsk_buff_pool.c     |  10 +-
>  5 files changed, 597 insertions(+), 68 deletions(-)
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx
  2021-01-05  9:11 ` [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx Xuan Zhuo
@ 2021-01-05 13:21   ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2021-01-05 13:21 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, dust.li, tonylu, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	open list:VIRTIO CORE AND NET DRIVERS, open list,
	open list:XDP SOCKETS (AF_XDP)

On Tue, Jan 05, 2021 at 05:11:43PM +0800, Xuan Zhuo wrote:
> Add AF_XDP (xdp socket) zero copy tx support to virtio-net.
> 
> The module param "napi_tx" must be enabled to use this feature.

what does this imply exactly?

> In practice, various virtio implementations have some problems:
> 1. the tx interrupt may be lost
> 2. the tx interrupt may arrive with a relatively large delay
> 
> This leads to two problems:
> 
> 1. If the wakeup handler merely kicks a tx interrupt or just schedules a napi
>    on the current cpu, packets are sent with extra delay.
> 2. When the tx ring is full, the tx interrupt may be lost or delayed, so
>    used entries are not recovered in time.
> 
> So I chose to send part of the data directly in the wakeup path. If the
> sending could not be completed there, a napi is started to finish the
> remaining work.
> 
> Since the possible delay or loss of the tx interrupt matters when the tx ring
> is full, I added a timer to handle that case.

A lost interrupt sounds like a bug somewhere.
Why isn't this device already broken then, even without zero copy?
Won't a full ring stall forever? And won't a significantly delayed
tx interrupt block userspace?

How about putting the workarounds in a separate patch for now,
so it's easier to figure out whether anything in the patch itself
is causing issues?

> UDP transmit performance with virtio-net + xsk is 6 times that of an
> ordinary kernel UDP send.
> 
> * xsk_check_timeout: when the device ring is full or all xsk.hdr entries are
>   in use, start a timer to check whether xsk.hdr entries are available again;
>   the unit is us.
> * xsk_num_max: the maximum number of xsk.hdr entries
> * xsk_num_percent: the maximum number of hdr entries expressed as a percentage
>   of the virtio ring size. The real number of xsk hdr entries is the minimum
>   of xsk_num_max and this percentage of the virtio ring size.
> * xsk_budget: the budget for one xsk run
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 434 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e744dce..76319e7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -22,10 +22,21 @@
>  #include <net/route.h>
>  #include <net/xdp.h>
>  #include <net/net_failover.h>
> +#include <net/xdp_sock_drv.h>
>  
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
>  
> +static int xsk_check_timeout = 100;
> +static int xsk_num_max       = 1024;
> +static int xsk_num_percent   = 80;
> +static int xsk_budget        = 128;
> +
> +module_param(xsk_check_timeout, int, 0644);
> +module_param(xsk_num_max,       int, 0644);
> +module_param(xsk_num_percent,   int, 0644);
> +module_param(xsk_budget,        int, 0644);
> +
>  static bool csum = true, gso = true, napi_tx = true;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
> @@ -110,6 +121,9 @@ struct virtnet_xsk_hdr {
>  	u32 len;
>  };
>  
> +#define VIRTNET_STATE_XSK_WAKEUP BIT(0)
> +#define VIRTNET_STATE_XSK_TIMER BIT(1)
> +
>  #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
>  #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
>  
> @@ -149,6 +163,32 @@ struct send_queue {
>  	struct virtnet_sq_stats stats;
>  
>  	struct napi_struct napi;
> +
> +	struct {
> +		struct xsk_buff_pool   __rcu *pool;
> +		struct virtnet_xsk_hdr __rcu *hdr;
> +
> +		unsigned long          state;
> +		u64                    hdr_con;
> +		u64                    hdr_pro;
> +		u64                    hdr_n;
> +		struct xdp_desc        last_desc;
> +		bool                   wait_slot;
> +		/* tx interrupt issues
> +		 *   1. that may be lost
> +		 *   2. that too slow, 200/s or delay 10ms

I mean, we call virtqueue_enable_cb_delayed on each start_xmit.
The point is explicitly to reduce the # of tx interrupts,
is this the issue?

> +		 *
> +		 * timer for:
> +		 * 1. recycle the desc.(no check for performance, see below)
> +		 * 2. check the nic ring is avali. when nic ring is full
> +		 *
> +		 * Here, the regular check is performed for dev full. The
> +		 * application layer must ensure that the number of cq is
> +		 * sufficient, otherwise there may be insufficient cq in use.

Can't really parse this.  cq as in control vq?

> +		 *
> +		 */
> +		struct hrtimer          timer;
> +	} xsk;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -267,6 +307,8 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
>  				bool xsk_wakeup,
>  				unsigned int *_packets, unsigned int *_bytes);
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
> +static int virtnet_xsk_run(struct send_queue *sq,
> +			   struct xsk_buff_pool *pool, int budget);
>  
>  static bool is_xdp_frame(void *ptr)
>  {
> @@ -1439,6 +1481,40 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  	return stats.packets;
>  }
>  
> +static void virt_xsk_complete(struct send_queue *sq, u32 num, bool xsk_wakeup)
> +{
> +	struct xsk_buff_pool *pool;
> +	int n;
> +
> +	rcu_read_lock();
> +
> +	WRITE_ONCE(sq->xsk.hdr_pro, sq->xsk.hdr_pro + num);

WRITE_ONCE without READ_ONCE anywhere looks a bit weird.
Add a comment explaining why this is right?

> +
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (!pool) {
> +		if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n) {
> +			kfree(sq->xsk.hdr);
> +			rcu_assign_pointer(sq->xsk.hdr, NULL);
> +		}
> +		rcu_read_unlock();
> +		return;
> +	}
> +
> +	xsk_tx_completed(pool, num);
> +
> +	rcu_read_unlock();
> +
> +	if (!xsk_wakeup || !sq->xsk.wait_slot)
> +		return;
> +
> +	n = sq->xsk.hdr_pro - sq->xsk.hdr_con;
> +
> +	if (n > sq->xsk.hdr_n / 2) {
> +		sq->xsk.wait_slot = false;
> +		virtqueue_napi_schedule(&sq->napi, sq->vq);
> +	}
> +}
> +
>  static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
>  				bool xsk_wakeup,
>  				unsigned int *_packets, unsigned int *_bytes)
> @@ -1446,6 +1522,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
>  	unsigned int packets = 0;
>  	unsigned int bytes = 0;
>  	unsigned int len;
> +	u64 xsknum = 0;
>  	struct virtnet_xdp_type *xtype;
>  	struct xdp_frame        *frame;
>  	struct virtnet_xsk_hdr  *xskhdr;
> @@ -1466,6 +1543,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
>  			if (xtype->type == XDP_TYPE_XSK) {
>  				xskhdr = (struct virtnet_xsk_hdr *)xtype;
>  				bytes += xskhdr->len;
> +				xsknum += 1;
>  			} else {
>  				frame = xtype_got_ptr(xtype);
>  				xdp_return_frame(frame);
> @@ -1475,6 +1553,9 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
>  		packets++;
>  	}
>  
> +	if (xsknum)
> +		virt_xsk_complete(sq, xsknum, xsk_wakeup);
> +
>  	*_packets = packets;
>  	*_bytes = bytes;
>  }
> @@ -1595,6 +1676,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
>  	unsigned int index = vq2txq(sq->vq);
>  	struct netdev_queue *txq;
> +	struct xsk_buff_pool *pool;
> +	int work = 0;
>  
>  	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
>  		/* We don't need to enable cb for XDP */
> @@ -1604,15 +1687,26 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>  
>  	txq = netdev_get_tx_queue(vi->dev, index);
>  	__netif_tx_lock(txq, raw_smp_processor_id());
> -	free_old_xmit_skbs(sq, true);
> +
> +	rcu_read_lock();
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (pool) {
> +		work = virtnet_xsk_run(sq, pool, budget);
> +		rcu_read_unlock();
> +	} else {
> +		rcu_read_unlock();
> +		free_old_xmit_skbs(sq, true);
> +	}
> +
>  	__netif_tx_unlock(txq);
>  
> -	virtqueue_napi_complete(napi, sq->vq, 0);
> +	if (work < budget)
> +		virtqueue_napi_complete(napi, sq->vq, 0);
>  
>  	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>  		netif_tx_wake_queue(txq);
>  
> -	return 0;
> +	return work;
>  }
>  
>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> @@ -2560,16 +2654,346 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  	return err;
>  }
>  
> +static enum hrtimer_restart virtnet_xsk_timeout(struct hrtimer *timer)
> +{
> +	struct send_queue *sq;
> +
> +	sq = container_of(timer, struct send_queue, xsk.timer);
> +
> +	clear_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state);
> +
> +	virtqueue_napi_schedule(&sq->napi, sq->vq);
> +
> +	return HRTIMER_NORESTART;
> +}
> +
> +static int virtnet_xsk_pool_enable(struct net_device *dev,
> +				   struct xsk_buff_pool *pool,
> +				   u16 qid)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq = &vi->sq[qid];
> +	struct virtnet_xsk_hdr *hdr;
> +	int n, ret = 0;
> +
> +	if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues)
> +		return -EINVAL;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +
> +	ret = -EBUSY;
> +	if (rcu_dereference(sq->xsk.pool))
> +		goto end;
> +
> +	/* check last xsk wait for hdr been free */
> +	if (rcu_dereference(sq->xsk.hdr))
> +		goto end;
> +
> +	n = virtqueue_get_vring_size(sq->vq);
> +	n = min(xsk_num_max, n * (xsk_num_percent % 100) / 100);
> +
> +	ret = -ENOMEM;
> +	hdr = kcalloc(n, sizeof(struct virtnet_xsk_hdr), GFP_ATOMIC);
> +	if (!hdr)
> +		goto end;
> +
> +	memset(&sq->xsk, 0, sizeof(sq->xsk));
> +
> +	sq->xsk.hdr_pro = n;
> +	sq->xsk.hdr_n   = n;
> +
> +	hrtimer_init(&sq->xsk.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
> +	sq->xsk.timer.function = virtnet_xsk_timeout;
> +
> +	rcu_assign_pointer(sq->xsk.pool, pool);
> +	rcu_assign_pointer(sq->xsk.hdr, hdr);
> +
> +	ret = 0;
> +end:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq = &vi->sq[qid];
> +
> +	if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues)
> +		return -EINVAL;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	rcu_assign_pointer(sq->xsk.pool, NULL);
> +
> +	hrtimer_cancel(&sq->xsk.timer);
> +
> +	synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
> +
> +	if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n) {
> +		kfree(sq->xsk.hdr);
> +		rcu_assign_pointer(sq->xsk.hdr, NULL);
> +		synchronize_rcu();
> +	}
> +
> +	return 0;
> +}
> +
>  static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>  {
>  	switch (xdp->command) {
>  	case XDP_SETUP_PROG:
>  		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
> +	case XDP_SETUP_XSK_POOL:
> +		xdp->xsk.need_dma = false;
> +		if (xdp->xsk.pool)
> +			return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
> +						       xdp->xsk.queue_id);
> +		else
> +			return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
>  	default:
>  		return -EINVAL;
>  	}
>  }
>  
> +static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> +			    struct xdp_desc *desc)
> +{
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	void *data, *ptr;
> +	struct page *page;
> +	struct virtnet_xsk_hdr *xskhdr;
> +	u32 idx, offset, n, i, copy, copied;
> +	u64 addr;
> +	int err, m;
> +
> +	addr = desc->addr;
> +
> +	data = xsk_buff_raw_get_data(pool, addr);
> +	offset = offset_in_page(data);
> +
> +	/* one for hdr, one for the first page */
> +	n = 2;
> +	m = desc->len - (PAGE_SIZE - offset);
> +	if (m > 0) {
> +		n += m >> PAGE_SHIFT;
> +		if (m & PAGE_MASK)
> +			++n;
> +
> +		n = min_t(u32, n, ARRAY_SIZE(sq->sg));
> +	}
> +
> +	idx = sq->xsk.hdr_con % sq->xsk.hdr_n;
> +	xskhdr = &sq->xsk.hdr[idx];
> +
> +	/* xskhdr->hdr has been memset to zero, so not need to clear again */
> +
> +	sg_init_table(sq->sg, n);
> +	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);
> +
> +	copied = 0;
> +	for (i = 1; i < n; ++i) {
> +		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
> +
> +		page = xsk_buff_raw_get_page(pool, addr + copied);
> +
> +		sg_set_page(sq->sg + i, page, copy, offset);
> +		copied += copy;
> +		if (offset)
> +			offset = 0;
> +	}
> +
> +	xskhdr->len = desc->len;
> +	ptr = xdp_to_ptr(&xskhdr->type);
> +
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, ptr, GFP_ATOMIC);
> +	if (unlikely(err))
> +		sq->xsk.last_desc = *desc;
> +	else
> +		sq->xsk.hdr_con++;
> +
> +	return err;
> +}
> +
> +static bool virtnet_xsk_dev_is_full(struct send_queue *sq)
> +{
> +	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
> +		return true;
> +
> +	if (sq->xsk.hdr_con == sq->xsk.hdr_pro)
> +		return true;
> +
> +	return false;
> +}
> +
> +static int virtnet_xsk_xmit_zc(struct send_queue *sq,
> +			       struct xsk_buff_pool *pool, unsigned int budget)
> +{
> +	struct xdp_desc desc;
> +	int err, packet = 0;
> +	int ret = -EAGAIN;
> +
> +	if (sq->xsk.last_desc.addr) {
> +		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> +		if (unlikely(err))
> +			return -EBUSY;
> +
> +		++packet;
> +		sq->xsk.last_desc.addr = 0;
> +	}
> +
> +	while (budget-- > 0) {
> +		if (virtnet_xsk_dev_is_full(sq)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +
> +		if (!xsk_tx_peek_desc(pool, &desc)) {
> +			/* done */
> +			ret = 0;
> +			break;
> +		}
> +
> +		err = virtnet_xsk_xmit(sq, pool, &desc);
> +		if (unlikely(err)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +
> +		++packet;
> +	}
> +
> +	if (packet) {
> +		xsk_tx_release(pool);
> +
> +		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> +			u64_stats_update_begin(&sq->stats.syncp);
> +			sq->stats.kicks++;
> +			u64_stats_update_end(&sq->stats.syncp);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static int virtnet_xsk_run(struct send_queue *sq,
> +			   struct xsk_buff_pool *pool, int budget)
> +{
> +	int err, ret = 0;
> +	unsigned int _packets = 0;
> +	unsigned int _bytes = 0;
> +
> +	sq->xsk.wait_slot = false;
> +
> +	if (test_and_clear_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state))
> +		hrtimer_try_to_cancel(&sq->xsk.timer);
> +
> +	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
> +
> +	err = virtnet_xsk_xmit_zc(sq, pool, xsk_budget);
> +	if (!err) {
> +		struct xdp_desc desc;
> +
> +		clear_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
> +		xsk_set_tx_need_wakeup(pool);
> +
> +		/* Race breaker. If new is coming after last xmit
> +		 * but before flag change
> +		 */


A bit more text explaining the rules for the two bits please.

> +
> +		if (!xsk_tx_peek_desc(pool, &desc))
> +			goto end;
> +
> +		set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
> +		xsk_clear_tx_need_wakeup(pool);
> +
> +		sq->xsk.last_desc = desc;
> +		ret = budget;
> +		goto end;
> +	}
> +
> +	xsk_clear_tx_need_wakeup(pool);
> +
> +	if (err == -EAGAIN) {
> +		ret = budget;
> +		goto end;
> +	}
> +
> +	/* -EBUSY: wait tx ring avali.
> +	 *	by tx interrupt or rx interrupt or start_xmit or timer


can't parse this either ...

> +	 */
> +
> +	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
> +
> +	if (!virtnet_xsk_dev_is_full(sq)) {
> +		ret = budget;
> +		goto end;
> +	}
> +
> +	sq->xsk.wait_slot = true;
> +
> +	if (xsk_check_timeout) {
> +		hrtimer_start(&sq->xsk.timer,
> +			      ns_to_ktime(xsk_check_timeout * 1000),
> +			      HRTIMER_MODE_REL_PINNED);
> +
> +		set_bit(VIRTNET_STATE_XSK_TIMER, &sq->xsk.state);
> +	}
> +
> +	virtnet_sq_stop_check(sq, true);
> +
> +end:
> +	return ret;
> +}
> +
> +static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq;
> +	struct xsk_buff_pool *pool;
> +	struct netdev_queue *txq;
> +	int work = 0;
> +
> +	if (!netif_running(dev))
> +		return -ENETDOWN;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	rcu_read_lock();
> +
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (!pool)
> +		goto end;
> +
> +	if (test_and_set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state))
> +		goto end;
> +
> +	txq = netdev_get_tx_queue(dev, qid);
> +
> +	local_bh_disable();
> +	__netif_tx_lock(txq, raw_smp_processor_id());
> +
> +	work = virtnet_xsk_run(sq, pool, xsk_budget);
> +
> +	__netif_tx_unlock(txq);
> +	local_bh_enable();
> +
> +	if (work == xsk_budget)
> +		virtqueue_napi_schedule(&sq->napi, sq->vq);
> +
> +end:
> +	rcu_read_unlock();
> +	return 0;
> +}
> +
>  static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>  				      size_t len)
>  {
> @@ -2624,6 +3048,7 @@ static int virtnet_set_features(struct net_device *dev,
>  	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
>  	.ndo_bpf		= virtnet_xdp,
>  	.ndo_xdp_xmit		= virtnet_xdp_xmit,
> +	.ndo_xsk_wakeup		= virtnet_xsk_wakeup,
>  	.ndo_features_check	= passthru_features_check,
>  	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
>  	.ndo_set_features	= virtnet_set_features,
> @@ -2722,6 +3147,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>  static void free_unused_bufs(struct virtnet_info *vi)
>  {
>  	void *buf;
> +	u32 n;
>  	int i;
>  	struct send_queue *sq;
>  
> @@ -2740,6 +3166,11 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  					xdp_return_frame(xtype_got_ptr(xtype));
>  			}
>  		}
> +
> +		n = sq->xsk.hdr_con + sq->xsk.hdr_n;
> +		n -= sq->xsk.hdr_pro;
> +		if (n)
> +			virt_xsk_complete(sq, n, false);
>  	}
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 0/7] virtio-net support xdp socket zero copy xmit
  2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (6 preceding siblings ...)
  2021-01-05 12:25 ` Michael S. Tsirkin
@ 2021-01-16  2:59 ` Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 1/7] xsk: support get page for drv Xuan Zhuo
                     ` (7 more replies)
  7 siblings, 8 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

XDP socket (xsk) is an excellent kernel-bypass transmission framework. Its zero
copy mode needs driver support, and the performance gain is large; mlx5 and
Intel ixgbe already support it. This patch set adds xsk zerocopy xmit support
to virtio-net.

xsk zerocopy rx requires much larger changes to virtio-net, so I plan to submit
it after this patch set is accepted.

Unlike other drivers, virtio-net does not work with dma addresses directly, so
I first look up the xsk page for each descriptor and then pass that page to
virtio.

When recycling sent packets we previously only had to distinguish skb from
xdp buffers; with xsk there are three kinds (skb, xdp and xsk), so the
second patch addresses this first.

The last four patches are used to support xsk zerocopy in virtio-net:

 1. support xsk enable/disable
 2. realize the function of xsk packet sending
 3. implement xsk wakeup callback
 4. set xsk completed when packet sent done


---------------- Performance Testing ------------

The udp package tool implemented by the interface of xsk vs sockperf(kernel udp)
for performance testing:

xsk zero copy in virtio-net:
CPU        PPS         MSGSIZE
28.7%      3833857     64
38.5%      3689491     512
38.9%      2787096     1456

xsk without zero copy in virtio-net:
CPU        PPS         MSGSIZE
100%       1916747     64
100%       1775988     512
100%       1440054     1456

sockperf:
CPU        PPS         MSGSIZE
100%       713274      64
100%       701024      512
100%       695832      1456

Xuan Zhuo (7):
  xsk: support get page for drv
  virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
  xsk, virtio-net: prepare for support xsk zerocopy xmit
  virtio-net, xsk: support xsk enable/disable
  virtio-net, xsk: realize the function of xsk packet sending
  virtio-net, xsk: implement xsk wakeup callback
  virtio-net, xsk: set xsk completed when packet sent done

 drivers/net/virtio_net.c    | 559 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/netdevice.h   |   1 +
 include/net/xdp_sock_drv.h  |  10 +
 include/net/xsk_buff_pool.h |   1 +
 net/xdp/xsk_buff_pool.c     |  10 +-
 5 files changed, 523 insertions(+), 58 deletions(-)

--
1.8.3.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 1/7] xsk: support get page for drv
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 2/7] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

Some drivers, such as virtio-net, do not set up dma mappings when an xsk
pool is bound; they look up the page at transmit time instead.

This patch adds a field, need_dma, to the pool setup command. A device
that does not use dma should set it to false.

It also adds a helper, xsk_buff_raw_get_page, so a driver can get the
page backing a given addr.
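
For a driver that skips dma mapping, the intended usage looks roughly like
this (a condensed sketch based on the virtio-net patches later in this
series; error handling omitted):

	/* ndo_bpf, XDP_SETUP_XSK_POOL: opt out of dma mapping */
	xdp->xsk.need_dma = false;

	/* xmit path: resolve an xsk addr to a page + offset */
	data   = xsk_buff_raw_get_data(pool, desc->addr);
	offset = offset_in_page(data);
	page   = xsk_buff_raw_get_page(pool, desc->addr);
	sg_set_page(sg, page, desc->len, offset);	/* single-page case */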

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 include/linux/netdevice.h   |  1 +
 include/net/xdp_sock_drv.h  | 10 ++++++++++
 include/net/xsk_buff_pool.h |  1 +
 net/xdp/xsk_buff_pool.c     | 10 +++++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b94907..b452ade 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -914,6 +914,7 @@ struct netdev_bpf {
 		struct {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
+			bool need_dma;
 		} xsk;
 	};
 };
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4e295541..e9c7e25 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -100,6 +100,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return xp_raw_get_data(pool, addr);
 }
 
+static inline struct page *xsk_buff_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return xp_raw_get_page(pool, addr);
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
@@ -232,6 +237,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return NULL;
 }
 
+static inline struct page *xsk_buff_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return NULL;
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index eaa8386..2dcfa54 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -108,6 +108,7 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
 bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count);
 void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr);
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr);
+struct page *xp_raw_get_page(struct xsk_buff_pool *pool, u64 addr);
 static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb)
 {
 	return xskb->dma;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 20598ee..6d0cc9f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -166,12 +166,13 @@ static int __xp_assign_dev(struct xsk_buff_pool *pool,
 	bpf.command = XDP_SETUP_XSK_POOL;
 	bpf.xsk.pool = pool;
 	bpf.xsk.queue_id = queue_id;
+	bpf.xsk.need_dma = true;
 
 	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (bpf.xsk.need_dma && !pool->dma_pages) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;
@@ -535,6 +536,13 @@ void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 }
 EXPORT_SYMBOL(xp_raw_get_data);
 
+struct page *xp_raw_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
+	return pool->umem->pgs[addr >> PAGE_SHIFT];
+}
+EXPORT_SYMBOL(xp_raw_get_page);
+
 dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
 {
 	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 2/7] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 1/7] xsk: support get page for drv Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-18  6:45     ` Jason Wang
  2021-01-16  2:59   ` [PATCH net-next v2 3/7] xsk, virtio-net: prepare for support xsk zerocopy xmit Xuan Zhuo
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

Once xsk is supported, the completion path that frees old ptrs will see a
new kind of ptr. To tell a ctx queued by XDP_TX apart from one queued by
xsk, this patch adds a struct: virtnet_xdp_type.type identifies the kind
of ctx, and virtnet_xdp_type.offset records the offset between the "true
ctx" and the virtnet_xdp_type itself.

The newly added virtnet_xsk_hdr will be used for xsk.
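
In rough form, the resulting layout and the decode in the completion path
(condensed from the hunks below):

	/* XDP_TX: the type tag sits right behind the xdp_frame, inside its
	 * headroom, and offset points back to the frame:
	 *
	 *   [ struct xdp_frame ][ struct virtnet_xdp_type ] ...headroom...
	 *                         ^-- tagged pointer stored in the vq
	 *
	 * XSK: struct virtnet_xsk_hdr has the type tag as its first member,
	 * so the tagged pointer is the hdr itself and no offset is needed.
	 */
	xtype = ptr_to_xtype(ptr);		/* strip VIRTIO_XDP_FLAG  */
	if (xtype->type == XDP_TYPE_XSK)
		xskhdr = (struct virtnet_xsk_hdr *)xtype;
	else
		frame = xtype_get_ptr(xtype);	/* (char *)xtype + offset */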

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 75 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 60 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba8e637..e707c31 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -94,6 +94,22 @@ struct virtnet_rq_stats {
 	u64 kicks;
 };
 
+enum {
+	XDP_TYPE_XSK,
+	XDP_TYPE_TX,
+};
+
+struct virtnet_xdp_type {
+	int offset:24;
+	unsigned type:8;
+};
+
+struct virtnet_xsk_hdr {
+	struct virtnet_xdp_type type;
+	struct virtio_net_hdr_mrg_rxbuf hdr;
+	u32 len;
+};
+
 #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
 
@@ -251,14 +267,19 @@ static bool is_xdp_frame(void *ptr)
 	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
 }
 
-static void *xdp_to_ptr(struct xdp_frame *ptr)
+static void *xdp_to_ptr(struct virtnet_xdp_type *ptr)
 {
 	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
 }
 
-static struct xdp_frame *ptr_to_xdp(void *ptr)
+static struct virtnet_xdp_type *ptr_to_xtype(void *ptr)
+{
+	return (struct virtnet_xdp_type *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
+}
+
+static void *xtype_get_ptr(struct virtnet_xdp_type *xdptype)
 {
-	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
+	return (char *)xdptype + xdptype->offset;
 }
 
 /* Converting between virtqueue no. and kernel tx/rx queue no.
@@ -459,11 +480,16 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
+	struct virtnet_xdp_type *xdptype;
 	int err;
 
-	if (unlikely(xdpf->headroom < vi->hdr_len))
+	if (unlikely(xdpf->headroom < vi->hdr_len + sizeof(*xdptype)))
 		return -EOVERFLOW;
 
+	xdptype = (struct virtnet_xdp_type *)(xdpf + 1);
+	xdptype->offset = (char *)xdpf - (char *)xdptype;
+	xdptype->type = XDP_TYPE_TX;
+
 	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
 	xdpf->data -= vi->hdr_len;
 	/* Zero header and leave csum up to XDP layers */
@@ -473,7 +499,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 
 	sg_init_one(sq->sg, xdpf->data, xdpf->len);
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdptype),
 				   GFP_ATOMIC);
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
@@ -523,8 +549,11 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	/* Free up any pending old buffers before queueing new ones. */
 	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (likely(is_xdp_frame(ptr))) {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
+			struct virtnet_xdp_type *xtype;
+			struct xdp_frame *frame;
 
+			xtype = ptr_to_xtype(ptr);
+			frame = xtype_get_ptr(xtype);
 			bytes += frame->len;
 			xdp_return_frame(frame);
 		} else {
@@ -1373,24 +1402,34 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
-	unsigned int len;
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
-	void *ptr;
+	unsigned int len;
+	struct virtnet_xdp_type *xtype;
+	struct xdp_frame        *frame;
+	struct virtnet_xsk_hdr  *xskhdr;
+	struct sk_buff          *skb;
+	void                    *ptr;
 
 	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (likely(!is_xdp_frame(ptr))) {
-			struct sk_buff *skb = ptr;
+			skb = ptr;
 
 			pr_debug("Sent skb %p\n", skb);
 
 			bytes += skb->len;
 			napi_consume_skb(skb, in_napi);
 		} else {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
+			xtype = ptr_to_xtype(ptr);
 
-			bytes += frame->len;
-			xdp_return_frame(frame);
+			if (xtype->type == XDP_TYPE_XSK) {
+				xskhdr = (struct virtnet_xsk_hdr *)xtype;
+				bytes += xskhdr->len;
+			} else {
+				frame = xtype_get_ptr(xtype);
+				xdp_return_frame(frame);
+				bytes += frame->len;
+			}
 		}
 		packets++;
 	}
@@ -2659,10 +2698,16 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_frame(buf))
+			if (!is_xdp_frame(buf)) {
 				dev_kfree_skb(buf);
-			else
-				xdp_return_frame(ptr_to_xdp(buf));
+			} else {
+				struct virtnet_xdp_type *xtype;
+
+				xtype = ptr_to_xtype(buf);
+
+				if (xtype->type != XDP_TYPE_XSK)
+					xdp_return_frame(xtype_get_ptr(xtype));
+			}
 		}
 	}
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 3/7] xsk, virtio-net: prepare for support xsk zerocopy xmit
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 1/7] xsk: support get page for drv Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 2/7] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 4/7] virtio-net, xsk: support xsk enable/disable Xuan Zhuo
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

Split free_old_xmit_skbs: the new helper __free_old_xmit_ptr returns the
packet and byte counts to its caller, so it can be combined with other
statistics, and takes the 'xsk_wakeup' parameter required by the xsk path.

Also turn the netif stop check into a helper, virtnet_sq_stop_check, which
will be reused when xsk support is added.
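
After the split, the call surface looks like this (signatures taken from
the hunks below):

	static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
					bool xsk_wakeup,
					unsigned int *_packets, unsigned int *_bytes);
	static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
	static void virtnet_sq_stop_check(struct send_queue *sq, bool in_napi);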

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 95 ++++++++++++++++++++++++++----------------------
 1 file changed, 52 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e707c31..9013328 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -262,6 +262,11 @@ struct padded_vnet_hdr {
 	char padding[4];
 };
 
+static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
+				bool xsk_wakeup,
+				unsigned int *_packets, unsigned int *_bytes);
+static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
+
 static bool is_xdp_frame(void *ptr)
 {
 	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
@@ -375,6 +380,37 @@ static void skb_xmit_done(struct virtqueue *vq)
 		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
+static void virtnet_sq_stop_check(struct send_queue *sq, bool in_napi)
+{
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct net_device *dev = vi->dev;
+	int qnum = sq - vi->sq;
+
+	/* If running out of space, stop queue to avoid getting packets that we
+	 * are then unable to transmit.
+	 * An alternative would be to force queuing layer to requeue the skb by
+	 * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be
+	 * returned in a normal path of operation: it means that driver is not
+	 * maintaining the TX queue stop/start state properly, and causes
+	 * the stack to do a non-trivial amount of useless work.
+	 * Since most packets only take 1 or 2 ring slots, stopping the queue
+	 * early means 16 slots are typically wasted.
+	 */
+
+	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
+		netif_stop_subqueue(dev, qnum);
+		if (!sq->napi.weight &&
+		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
+			/* More just got used, free them then recheck. */
+			free_old_xmit_skbs(sq, in_napi);
+			if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
+				netif_start_subqueue(dev, qnum);
+				virtqueue_disable_cb(sq->vq);
+			}
+		}
+	}
+}
+
 #define MRG_CTX_HEADER_SHIFT 22
 static void *mergeable_len_to_ctx(unsigned int truesize,
 				  unsigned int headroom)
@@ -522,13 +558,11 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	struct receive_queue *rq = vi->rq;
 	struct bpf_prog *xdp_prog;
 	struct send_queue *sq;
-	unsigned int len;
 	int packets = 0;
 	int bytes = 0;
 	int drops = 0;
 	int kicks = 0;
 	int ret, err;
-	void *ptr;
 	int i;
 
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
@@ -546,24 +580,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 		goto out;
 	}
 
-	/* Free up any pending old buffers before queueing new ones. */
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(is_xdp_frame(ptr))) {
-			struct virtnet_xdp_type *xtype;
-			struct xdp_frame *frame;
-
-			xtype = ptr_to_xtype(ptr);
-			frame = xtype_get_ptr(xtype);
-			bytes += frame->len;
-			xdp_return_frame(frame);
-		} else {
-			struct sk_buff *skb = ptr;
-
-			bytes += skb->len;
-			napi_consume_skb(skb, false);
-		}
-		packets++;
-	}
+	__free_old_xmit_ptr(sq, false, true, &packets, &bytes);
 
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
@@ -1400,7 +1417,9 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
+				bool xsk_wakeup,
+				unsigned int *_packets, unsigned int *_bytes)
 {
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
@@ -1434,6 +1453,17 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 		packets++;
 	}
 
+	*_packets = packets;
+	*_bytes = bytes;
+}
+
+static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+{
+	unsigned int packets = 0;
+	unsigned int bytes = 0;
+
+	__free_old_xmit_ptr(sq, in_napi, true, &packets, &bytes);
+
 	/* Avoid overhead when no packets have been processed
 	 * happens when called speculatively from start_xmit.
 	 */
@@ -1649,28 +1679,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		nf_reset_ct(skb);
 	}
 
-	/* If running out of space, stop queue to avoid getting packets that we
-	 * are then unable to transmit.
-	 * An alternative would be to force queuing layer to requeue the skb by
-	 * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be
-	 * returned in a normal path of operation: it means that driver is not
-	 * maintaining the TX queue stop/start state properly, and causes
-	 * the stack to do a non-trivial amount of useless work.
-	 * Since most packets only take 1 or 2 ring slots, stopping the queue
-	 * early means 16 slots are typically wasted.
-	 */
-	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
-		netif_stop_subqueue(dev, qnum);
-		if (!use_napi &&
-		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq, false);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
-				netif_start_subqueue(dev, qnum);
-				virtqueue_disable_cb(sq->vq);
-			}
-		}
-	}
+	virtnet_sq_stop_check(sq, false);
 
 	if (kick || netif_xmit_stopped(txq)) {
 		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 4/7] virtio-net, xsk: support xsk enable/disable
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
                     ` (2 preceding siblings ...)
  2021-01-16  2:59   ` [PATCH net-next v2 3/7] xsk, virtio-net: prepare for support xsk zerocopy xmit Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-16  2:59   ` [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending Xuan Zhuo
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

On enable, an array of struct virtnet_xsk_hdr is allocated to hold the
per-packet information and the virtio hdr. Its size is bounded by the new
module parameters.

When all virtnet_xsk_hdr entries are in use, or sq->vq->num_free is too
small, the device is considered busy.

* xsk_num_max: upper limit on the number of xsk.hdr entries
* xsk_num_percent: limit on the number of hdr entries as a percentage of
  the virtio ring size. The actual number of hdr entries is the minimum of
  xsk_num_max and that percentage of the ring size (see the example below).
* xsk_budget: the budget for one xsk run
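
For example, with the default parameters and a 256-entry tx vring, the
calculation in the hunk below works out as:

	n = virtqueue_get_vring_size(sq->vq);			/* 256 */
	n = min(xsk_num_max, n * (xsk_num_percent % 100) / 100);
	/* 256 * 80 / 100 = 204; min(1024, 204) -> 204 hdr entries */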

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9013328..a62d456 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -22,10 +22,19 @@
 #include <net/route.h>
 #include <net/xdp.h>
 #include <net/net_failover.h>
+#include <net/xdp_sock_drv.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
+static int xsk_num_max     = 1024;
+static int xsk_num_percent = 80;
+static int xsk_budget      = 128;
+
+module_param(xsk_num_max,     int, 0644);
+module_param(xsk_num_percent, int, 0644);
+module_param(xsk_budget,      int, 0644);
+
 static bool csum = true, gso = true, napi_tx = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
@@ -149,6 +158,15 @@ struct send_queue {
 	struct virtnet_sq_stats stats;
 
 	struct napi_struct napi;
+
+	struct {
+		struct xsk_buff_pool   __rcu *pool;
+		struct virtnet_xsk_hdr __rcu *hdr;
+
+		u64                    hdr_con;
+		u64                    hdr_pro;
+		u64                    hdr_n;
+	} xsk;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -2540,11 +2558,90 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return err;
 }
 
+static int virtnet_xsk_pool_enable(struct net_device *dev,
+				   struct xsk_buff_pool *pool,
+				   u16 qid)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq = &vi->sq[qid];
+	struct virtnet_xsk_hdr *hdr;
+	int n, ret = 0;
+
+	if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	rcu_read_lock();
+
+	ret = -EBUSY;
+	if (rcu_dereference(sq->xsk.pool))
+		goto end;
+
+	/* check last xsk wait for hdr been free */
+	if (rcu_dereference(sq->xsk.hdr))
+		goto end;
+
+	n = virtqueue_get_vring_size(sq->vq);
+	n = min(xsk_num_max, n * (xsk_num_percent % 100) / 100);
+
+	ret = -ENOMEM;
+	hdr = kcalloc(n, sizeof(struct virtnet_xsk_hdr), GFP_ATOMIC);
+	if (!hdr)
+		goto end;
+
+	memset(&sq->xsk, 0, sizeof(sq->xsk));
+
+	sq->xsk.hdr_pro = n;
+	sq->xsk.hdr_n   = n;
+
+	rcu_assign_pointer(sq->xsk.pool, pool);
+	rcu_assign_pointer(sq->xsk.hdr, hdr);
+
+	ret = 0;
+end:
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq = &vi->sq[qid];
+	struct virtnet_xsk_hdr *hdr = NULL;
+
+	if (qid >= dev->real_num_rx_queues || qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	rcu_assign_pointer(sq->xsk.pool, NULL);
+
+	if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n)
+		hdr = rcu_replace_pointer(sq->xsk.hdr, hdr, true);
+
+	synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
+
+	kfree(hdr);
+
+	return 0;
+}
+
 static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_SETUP_XSK_POOL:
+		xdp->xsk.need_dma = false;
+		if (xdp->xsk.pool)
+			return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
+						       xdp->xsk.queue_id);
+		else
+			return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
                     ` (3 preceding siblings ...)
  2021-01-16  2:59   ` [PATCH net-next v2 4/7] virtio-net, xsk: support xsk enable/disable Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-16  4:47     ` Jakub Kicinski
  2021-01-18  9:10     ` Jason Wang
  2021-01-16  2:59   ` [PATCH net-next v2 6/7] virtio-net, xsk: implement xsk wakeup callback Xuan Zhuo
                     ` (2 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

virtnet_xsk_run is called from the tx interrupt handler, virtnet_poll_tx.

The send path takes descriptors from the xsk tx queue and assembles them
into virtio buffers.

One difference from other drivers is that the xsk data is referenced by
page rather than by dma address, because the virtio interface does not
take dma addresses.
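
Condensed, the per-descriptor xmit path below does roughly the following
(error handling and the hdr ring bookkeeping omitted):

	data   = xsk_buff_raw_get_data(pool, desc->addr);
	offset = offset_in_page(data);

	sg_init_table(sq->sg, n);			/* n = 1 hdr + data pages */
	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);	/* virtio-net header      */

	for (i = 1, copied = 0; i < n; i++) {		/* one sg entry per page  */
		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
		page = xsk_buff_raw_get_page(pool, desc->addr + copied);
		sg_set_page(sq->sg + i, page, copy, offset);
		copied += copy;
		offset = 0;
	}

	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, xdp_to_ptr(&xskhdr->type),
				   GFP_ATOMIC);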

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 200 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 197 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a62d456..42aa9ad 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -119,6 +119,8 @@ struct virtnet_xsk_hdr {
 	u32 len;
 };
 
+#define VIRTNET_STATE_XSK_WAKEUP 1
+
 #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
 
@@ -163,9 +165,12 @@ struct send_queue {
 		struct xsk_buff_pool   __rcu *pool;
 		struct virtnet_xsk_hdr __rcu *hdr;
 
+		unsigned long          state;
 		u64                    hdr_con;
 		u64                    hdr_pro;
 		u64                    hdr_n;
+		struct xdp_desc        last_desc;
+		bool                   wait_slot;
 	} xsk;
 };
 
@@ -284,6 +289,8 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 				bool xsk_wakeup,
 				unsigned int *_packets, unsigned int *_bytes);
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
+static int virtnet_xsk_run(struct send_queue *sq,
+			   struct xsk_buff_pool *pool, int budget);
 
 static bool is_xdp_frame(void *ptr)
 {
@@ -1590,6 +1597,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	unsigned int index = vq2txq(sq->vq);
 	struct netdev_queue *txq;
+	struct xsk_buff_pool *pool;
+	int work = 0;
 
 	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
 		/* We don't need to enable cb for XDP */
@@ -1599,15 +1608,26 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 
 	txq = netdev_get_tx_queue(vi->dev, index);
 	__netif_tx_lock(txq, raw_smp_processor_id());
-	free_old_xmit_skbs(sq, true);
+
+	rcu_read_lock();
+	pool = rcu_dereference(sq->xsk.pool);
+	if (pool) {
+		work = virtnet_xsk_run(sq, pool, budget);
+		rcu_read_unlock();
+	} else {
+		rcu_read_unlock();
+		free_old_xmit_skbs(sq, true);
+	}
+
 	__netif_tx_unlock(txq);
 
-	virtqueue_napi_complete(napi, sq->vq, 0);
+	if (work < budget)
+		virtqueue_napi_complete(napi, sq->vq, 0);
 
 	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 		netif_tx_wake_queue(txq);
 
-	return 0;
+	return work;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
@@ -2647,6 +2667,180 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
+static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
+			    struct xdp_desc *desc)
+{
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	void *data, *ptr;
+	struct page *page;
+	struct virtnet_xsk_hdr *xskhdr;
+	u32 idx, offset, n, i, copy, copied;
+	u64 addr;
+	int err, m;
+
+	addr = desc->addr;
+
+	data = xsk_buff_raw_get_data(pool, addr);
+	offset = offset_in_page(data);
+
+	/* one for hdr, one for the first page */
+	n = 2;
+	m = desc->len - (PAGE_SIZE - offset);
+	if (m > 0) {
+		n += m >> PAGE_SHIFT;
+		if (m & PAGE_MASK)
+			++n;
+
+		n = min_t(u32, n, ARRAY_SIZE(sq->sg));
+	}
+
+	idx = sq->xsk.hdr_con % sq->xsk.hdr_n;
+	xskhdr = &sq->xsk.hdr[idx];
+
+	/* xskhdr->hdr has been memset to zero, so not need to clear again */
+
+	sg_init_table(sq->sg, n);
+	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);
+
+	copied = 0;
+	for (i = 1; i < n; ++i) {
+		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
+
+		page = xsk_buff_raw_get_page(pool, addr + copied);
+
+		sg_set_page(sq->sg + i, page, copy, offset);
+		copied += copy;
+		if (offset)
+			offset = 0;
+	}
+
+	xskhdr->len = desc->len;
+	ptr = xdp_to_ptr(&xskhdr->type);
+
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, ptr, GFP_ATOMIC);
+	if (unlikely(err))
+		sq->xsk.last_desc = *desc;
+	else
+		sq->xsk.hdr_con++;
+
+	return err;
+}
+
+static bool virtnet_xsk_dev_is_full(struct send_queue *sq)
+{
+	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
+		return true;
+
+	if (sq->xsk.hdr_con == sq->xsk.hdr_pro)
+		return true;
+
+	return false;
+}
+
+static int virtnet_xsk_xmit_zc(struct send_queue *sq,
+			       struct xsk_buff_pool *pool, unsigned int budget)
+{
+	struct xdp_desc desc;
+	int err, packet = 0;
+	int ret = -EAGAIN;
+
+	if (sq->xsk.last_desc.addr) {
+		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
+		if (unlikely(err))
+			return -EBUSY;
+
+		++packet;
+		sq->xsk.last_desc.addr = 0;
+	}
+
+	while (budget-- > 0) {
+		if (virtnet_xsk_dev_is_full(sq)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		if (!xsk_tx_peek_desc(pool, &desc)) {
+			/* done */
+			ret = 0;
+			break;
+		}
+
+		err = virtnet_xsk_xmit(sq, pool, &desc);
+		if (unlikely(err)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		++packet;
+	}
+
+	if (packet) {
+		xsk_tx_release(pool);
+
+		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
+			u64_stats_update_begin(&sq->stats.syncp);
+			sq->stats.kicks++;
+			u64_stats_update_end(&sq->stats.syncp);
+		}
+	}
+
+	return ret;
+}
+
+static int virtnet_xsk_run(struct send_queue *sq,
+			   struct xsk_buff_pool *pool, int budget)
+{
+	int err, ret = 0;
+	unsigned int _packets = 0;
+	unsigned int _bytes = 0;
+
+	sq->xsk.wait_slot = false;
+
+	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
+
+	err = virtnet_xsk_xmit_zc(sq, pool, xsk_budget);
+	if (!err) {
+		struct xdp_desc desc;
+
+		clear_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
+		xsk_set_tx_need_wakeup(pool);
+
+		/* Race breaker. If new is coming after last xmit
+		 * but before flag change
+		 */
+
+		if (!xsk_tx_peek_desc(pool, &desc))
+			goto end;
+
+		set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
+		xsk_clear_tx_need_wakeup(pool);
+
+		sq->xsk.last_desc = desc;
+		ret = budget;
+		goto end;
+	}
+
+	xsk_clear_tx_need_wakeup(pool);
+
+	if (err == -EAGAIN) {
+		ret = budget;
+		goto end;
+	}
+
+	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
+
+	if (!virtnet_xsk_dev_is_full(sq)) {
+		ret = budget;
+		goto end;
+	}
+
+	sq->xsk.wait_slot = true;
+
+	virtnet_sq_stop_check(sq, true);
+end:
+	return ret;
+}
+
 static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
 				      size_t len)
 {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 6/7] virtio-net, xsk: implement xsk wakeup callback
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
                     ` (4 preceding siblings ...)
  2021-01-16  2:59   ` [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-19  4:50     ` Jason Wang
  2021-01-16  2:59   ` [PATCH net-next v2 7/7] virtio-net, xsk: set xsk completed when packet sent done Xuan Zhuo
  2021-01-18  6:28   ` [PATCH net-next v2 0/7] virtio-net support xdp socket zero copy xmit Jason Wang
  7 siblings, 1 reply; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

Since there is no interface to ask virtio to raise a tx interrupt directly,
the wakeup callback sends some of the pending data itself, which in turn
triggers a new tx interrupt.

A side benefit is lower transmission latency: sending does not have to wait
for the next tx interrupt to start the softirq.
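
The core of the callback, condensed from the hunk below:

	if (test_and_set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state))
		goto end;			/* napi or another wakeup already active */

	__netif_tx_lock(txq, raw_smp_processor_id());
	virtnet_xsk_run(sq, pool, xsk_budget);	/* send now; the kick raises a tx irq */
	__netif_tx_unlock(txq);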

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 42aa9ad..e552c2d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2841,6 +2841,56 @@ static int virtnet_xsk_run(struct send_queue *sq,
 	return ret;
 }
 
+static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq;
+	struct xsk_buff_pool *pool;
+	struct netdev_queue *txq;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	sq = &vi->sq[qid];
+
+	rcu_read_lock();
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool)
+		goto end;
+
+	if (test_and_set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state))
+		goto end;
+
+	txq = netdev_get_tx_queue(dev, qid);
+
+	local_bh_disable();
+	__netif_tx_lock(txq, raw_smp_processor_id());
+
+	/* Send part of the package directly to reduce the delay in sending the
+	 * package, and this can actively trigger the tx interrupts.
+	 *
+	 * If the package is not processed, then continue processing in the
+	 * subsequent tx interrupt(virtnet_poll_tx).
+	 *
+	 * If no packet is sent out, the ring of the device is full. In this
+	 * case, we will still get a tx interrupt response. Then we will deal
+	 * with the subsequent packet sending work.
+	 */
+
+	virtnet_xsk_run(sq, pool, xsk_budget);
+
+	__netif_tx_unlock(txq);
+	local_bh_enable();
+
+end:
+	rcu_read_unlock();
+	return 0;
+}
+
 static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
 				      size_t len)
 {
@@ -2895,6 +2945,7 @@ static int virtnet_set_features(struct net_device *dev,
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
 	.ndo_bpf		= virtnet_xdp,
 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
+	.ndo_xsk_wakeup		= virtnet_xsk_wakeup,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
 	.ndo_set_features	= virtnet_set_features,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v2 7/7] virtio-net, xsk: set xsk completed when packet sent done
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
                     ` (5 preceding siblings ...)
  2021-01-16  2:59   ` [PATCH net-next v2 6/7] virtio-net, xsk: implement xsk wakeup callback Xuan Zhuo
@ 2021-01-16  2:59   ` Xuan Zhuo
  2021-01-18  6:28   ` [PATCH net-next v2 0/7] virtio-net support xdp socket zero copy xmit Jason Wang
  7 siblings, 0 replies; 25+ messages in thread
From: Xuan Zhuo @ 2021-01-16  2:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

When recycling packets that have been transmitted, call xsk_tx_completed
to tell xsk how many packets have completed.

If necessary, schedule napi to process the remaining packets in the xsk
tx queue.
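
The accounting used below, in short (hdr_con counts consumed hdr entries,
hdr_pro counts entries given back):

	/* in __free_old_xmit_ptr(): count xsk buffers returned by the vq */
	if (xtype->type == XDP_TYPE_XSK)
		xsknum++;

	/* in virt_xsk_complete(): return hdr entries and tell userspace */
	sq->xsk.hdr_pro += xsknum;
	xsk_tx_completed(pool, xsknum);

	/* at teardown, entries still owned by the device: */
	n = sq->xsk.hdr_con + sq->xsk.hdr_n - sq->xsk.hdr_pro;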

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e552c2d..d0d620b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1442,6 +1442,42 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
+static void virt_xsk_complete(struct send_queue *sq, u32 num, bool xsk_wakeup)
+{
+	struct xsk_buff_pool *pool;
+	struct virtnet_xsk_hdr *hdr = NULL;
+	int n;
+
+	rcu_read_lock();
+
+	sq->xsk.hdr_pro += num;
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool) {
+		if (sq->xsk.hdr_pro - sq->xsk.hdr_con == sq->xsk.hdr_n)
+			hdr = rcu_replace_pointer(sq->xsk.hdr, hdr, true);
+
+		rcu_read_unlock();
+
+		kfree(hdr);
+		return;
+	}
+
+	xsk_tx_completed(pool, num);
+
+	rcu_read_unlock();
+
+	if (!xsk_wakeup || !sq->xsk.wait_slot)
+		return;
+
+	n = sq->xsk.hdr_pro - sq->xsk.hdr_con;
+
+	if (n > sq->xsk.hdr_n / 2) {
+		sq->xsk.wait_slot = false;
+		virtqueue_napi_schedule(&sq->napi, sq->vq);
+	}
+}
+
 static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 				bool xsk_wakeup,
 				unsigned int *_packets, unsigned int *_bytes)
@@ -1449,6 +1485,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
 	unsigned int len;
+	u64 xsknum = 0;
 	struct virtnet_xdp_type *xtype;
 	struct xdp_frame        *frame;
 	struct virtnet_xsk_hdr  *xskhdr;
@@ -1469,6 +1506,7 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 			if (xtype->type == XDP_TYPE_XSK) {
 				xskhdr = (struct virtnet_xsk_hdr *)xtype;
 				bytes += xskhdr->len;
+				xsknum += 1;
 			} else {
 				frame = xtype_get_ptr(xtype);
 				xdp_return_frame(frame);
@@ -1478,6 +1516,9 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
 		packets++;
 	}
 
+	if (xsknum)
+		virt_xsk_complete(sq, xsknum, xsk_wakeup);
+
 	*_packets = packets;
 	*_bytes = bytes;
 }
@@ -3044,10 +3085,13 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 static void free_unused_bufs(struct virtnet_info *vi)
 {
 	void *buf;
+	u32 n;
 	int i;
+	struct send_queue *sq;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
+		sq = vi->sq + i;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
 			if (!is_xdp_frame(buf)) {
 				dev_kfree_skb(buf);
@@ -3060,6 +3104,11 @@ static void free_unused_bufs(struct virtnet_info *vi)
 					xdp_return_frame(xtype_get_ptr(xtype));
 			}
 		}
+
+		n = sq->xsk.hdr_con + sq->xsk.hdr_n;
+		n -= sq->xsk.hdr_pro;
+		if (n)
+			virt_xsk_complete(sq, n, false);
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending
  2021-01-16  2:59   ` [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending Xuan Zhuo
@ 2021-01-16  4:47     ` Jakub Kicinski
  2021-01-18  9:10     ` Jason Wang
  1 sibling, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2021-01-16  4:47 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

On Sat, 16 Jan 2021 10:59:26 +0800 Xuan Zhuo wrote:
> +	idx = sq->xsk.hdr_con % sq->xsk.hdr_n;

The arguments here are 64 bit, so this code will not build on 32-bit
machines:

ERROR: modpost: "__umoddi3" [drivers/net/virtio_net.ko] undefined!

There's also a sparse warning in this patch:

drivers/net/virtio_net.c:2704:16: warning: incorrect type in assignment (different address spaces)
drivers/net/virtio_net.c:2704:16:    expected struct virtnet_xsk_hdr *xskhdr
drivers/net/virtio_net.c:2704:16:    got struct virtnet_xsk_hdr [noderef] __rcu *
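
FWIW the usual ways around __umoddi3 are to keep the counters 32 bit, or
to round hdr_n up to a power of two and mask instead of taking a modulo,
e.g.:

	idx = sq->xsk.hdr_con & (sq->xsk.hdr_n - 1);	/* hdr_n must be a power of two */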

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 0/7] virtio-net support xdp socket zero copy xmit
  2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
                     ` (6 preceding siblings ...)
  2021-01-16  2:59   ` [PATCH net-next v2 7/7] virtio-net, xsk: set xsk completed when packet sent done Xuan Zhuo
@ 2021-01-18  6:28   ` Jason Wang
  7 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2021-01-18  6:28 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf


On 2021/1/16 上午10:59, Xuan Zhuo wrote:
> XDP socket (xsk) is an excellent kernel-bypass transmission framework. Its zero
> copy mode needs driver support, and the performance gain is large; mlx5 and
> Intel ixgbe already support it. This patch set adds xsk zerocopy xmit support
> to virtio-net.
>
> xsk zerocopy rx requires much larger changes to virtio-net, so I plan to submit
> it after this patch set is accepted.
>
> Unlike other drivers, virtio-net does not work with dma addresses directly, so
> I first look up the xsk page for each descriptor and then pass that page to
> virtio.
>
> When recycling sent packets we previously only had to distinguish skb from
> xdp buffers; with xsk there are three kinds (skb, xdp and xsk), so the
> second patch addresses this first.
>
> The last four patches are used to support xsk zerocopy in virtio-net:
>
>   1. support xsk enable/disable
>   2. realize the function of xsk packet sending
>   3. implement xsk wakeup callback
>   4. set xsk completed when packet sent done
>
>
> ---------------- Performance Testing ------------
>
> The udp package tool implemented by the interface of xsk vs sockperf(kernel udp)
> for performance testing:
>
> xsk zero copy in virtio-net:
> CPU        PPS         MSGSIZE
> 28.7%      3833857     64
> 38.5%      3689491     512
> 38.9%      2787096     1456


Some questions on the results:

1) What's the setup on the vhost?
2) What's the setup of the mitigation in both host and guest?
3) Any analysis of the possible bottleneck via perf or other tools?

Thanks


>
> xsk without zero copy in virtio-net:
> CPU        PPS         MSGSIZE
> 100%       1916747     64
> 100%       1775988     512
> 100%       1440054     1456
>
> sockperf:
> CPU        PPS         MSGSIZE
> 100%       713274      64
> 100%       701024      512
> 100%       695832      1456
>
> Xuan Zhuo (7):
>    xsk: support get page for drv
>    virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
>    xsk, virtio-net: prepare for support xsk zerocopy xmit
>    virtio-net, xsk: support xsk enable/disable
>    virtio-net, xsk: realize the function of xsk packet sending
>    virtio-net, xsk: implement xsk wakeup callback
>    virtio-net, xsk: set xsk completed when packet sent done
>
>   drivers/net/virtio_net.c    | 559 +++++++++++++++++++++++++++++++++++++++-----
>   include/linux/netdevice.h   |   1 +
>   include/net/xdp_sock_drv.h  |  10 +
>   include/net/xsk_buff_pool.h |   1 +
>   net/xdp/xsk_buff_pool.c     |  10 +-
>   5 files changed, 523 insertions(+), 58 deletions(-)
>
> --
> 1.8.3.1
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 2/7] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
  2021-01-16  2:59   ` [PATCH net-next v2 2/7] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
@ 2021-01-18  6:45     ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2021-01-18  6:45 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf


On 2021/1/16 上午10:59, Xuan Zhuo wrote:
> Once xsk is supported, the completion path that frees old ptrs will see a
> new kind of ptr. To tell a ctx queued by XDP_TX apart from one queued by
> xsk, this patch adds a struct: virtnet_xdp_type.type identifies the kind
> of ctx, and virtnet_xdp_type.offset records the offset between the "true
> ctx" and the virtnet_xdp_type itself.
>
> The newly added virtnet_xsk_hdr will be used for xsk.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>


Any reason that you can't simply encode the type in the pointer itself 
as we used to do?

#define VIRTIO_XSK_FLAG    BIT(1)

?
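
A minimal sketch of what that could look like, assuming the ctx pointers
are at least 4-byte aligned so both low bits are free:

#define VIRTIO_XDP_FLAG	BIT(0)
#define VIRTIO_XSK_FLAG	BIT(1)

static void *xsk_to_ptr(struct virtnet_xsk_hdr *hdr)
{
	return (void *)((unsigned long)hdr | VIRTIO_XSK_FLAG);
}

static bool is_xsk_ptr(void *ptr)
{
	return (unsigned long)ptr & VIRTIO_XSK_FLAG;
}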


> ---
>   drivers/net/virtio_net.c | 75 ++++++++++++++++++++++++++++++++++++++----------
>   1 file changed, 60 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ba8e637..e707c31 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -94,6 +94,22 @@ struct virtnet_rq_stats {
>   	u64 kicks;
>   };
>   
> +enum {
> +	XDP_TYPE_XSK,
> +	XDP_TYPE_TX,
> +};
> +
> +struct virtnet_xdp_type {
> +	int offset:24;
> +	unsigned type:8;
> +};
> +
> +struct virtnet_xsk_hdr {
> +	struct virtnet_xdp_type type;
> +	struct virtio_net_hdr_mrg_rxbuf hdr;
> +	u32 len;
> +};
> +
>   #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
>   #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
>   
> @@ -251,14 +267,19 @@ static bool is_xdp_frame(void *ptr)
>   	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
>   }
>   
> -static void *xdp_to_ptr(struct xdp_frame *ptr)
> +static void *xdp_to_ptr(struct virtnet_xdp_type *ptr)
>   {
>   	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
>   }
>   
> -static struct xdp_frame *ptr_to_xdp(void *ptr)
> +static struct virtnet_xdp_type *ptr_to_xtype(void *ptr)
> +{
> +	return (struct virtnet_xdp_type *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> +}
> +
> +static void *xtype_get_ptr(struct virtnet_xdp_type *xdptype)
>   {
> -	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> +	return (char *)xdptype + xdptype->offset;
>   }
>   
>   /* Converting between virtqueue no. and kernel tx/rx queue no.
> @@ -459,11 +480,16 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>   				   struct xdp_frame *xdpf)
>   {
>   	struct virtio_net_hdr_mrg_rxbuf *hdr;
> +	struct virtnet_xdp_type *xdptype;
>   	int err;
>   
> -	if (unlikely(xdpf->headroom < vi->hdr_len))
> +	if (unlikely(xdpf->headroom < vi->hdr_len + sizeof(*xdptype)))
>   		return -EOVERFLOW;
>   
> +	xdptype = (struct virtnet_xdp_type *)(xdpf + 1);
> +	xdptype->offset = (char *)xdpf - (char *)xdptype;
> +	xdptype->type = XDP_TYPE_TX;
> +
>   	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
>   	xdpf->data -= vi->hdr_len;
>   	/* Zero header and leave csum up to XDP layers */
> @@ -473,7 +499,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>   
>   	sg_init_one(sq->sg, xdpf->data, xdpf->len);
>   
> -	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdptype),
>   				   GFP_ATOMIC);
>   	if (unlikely(err))
>   		return -ENOSPC; /* Caller handle free/refcnt */
> @@ -523,8 +549,11 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   	/* Free up any pending old buffers before queueing new ones. */
>   	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>   		if (likely(is_xdp_frame(ptr))) {
> -			struct xdp_frame *frame = ptr_to_xdp(ptr);
> +			struct virtnet_xdp_type *xtype;
> +			struct xdp_frame *frame;
>   
> +			xtype = ptr_to_xtype(ptr);
> +			frame = xtype_get_ptr(xtype);
>   			bytes += frame->len;
>   			xdp_return_frame(frame);
>   		} else {
> @@ -1373,24 +1402,34 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>   
>   static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>   {
> -	unsigned int len;
>   	unsigned int packets = 0;
>   	unsigned int bytes = 0;
> -	void *ptr;
> +	unsigned int len;
> +	struct virtnet_xdp_type *xtype;
> +	struct xdp_frame        *frame;
> +	struct virtnet_xsk_hdr  *xskhdr;
> +	struct sk_buff          *skb;
> +	void                    *ptr;
>   
>   	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>   		if (likely(!is_xdp_frame(ptr))) {
> -			struct sk_buff *skb = ptr;
> +			skb = ptr;
>   
>   			pr_debug("Sent skb %p\n", skb);
>   
>   			bytes += skb->len;
>   			napi_consume_skb(skb, in_napi);
>   		} else {
> -			struct xdp_frame *frame = ptr_to_xdp(ptr);
> +			xtype = ptr_to_xtype(ptr);
>   
> -			bytes += frame->len;
> -			xdp_return_frame(frame);
> +			if (xtype->type == XDP_TYPE_XSK) {
> +				xskhdr = (struct virtnet_xsk_hdr *)xtype;
> +				bytes += xskhdr->len;
> +			} else {
> +				frame = xtype_get_ptr(xtype);
> +				bytes += frame->len;
> +				xdp_return_frame(frame);
> +			}
>   		}
>   		packets++;
>   	}
> @@ -2659,10 +2698,16 @@ static void free_unused_bufs(struct virtnet_info *vi)
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		struct virtqueue *vq = vi->sq[i].vq;
>   		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> -			if (!is_xdp_frame(buf))
> +			if (!is_xdp_frame(buf)) {
>   				dev_kfree_skb(buf);
> -			else
> -				xdp_return_frame(ptr_to_xdp(buf));
> +			} else {
> +				struct virtnet_xdp_type *xtype;
> +
> +				xtype = ptr_to_xtype(buf);
> +
> +				if (xtype->type != XDP_TYPE_XSK)
> +					xdp_return_frame(xtype_get_ptr(xtype));
> +			}
>   		}
>   	}
>   


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending
  2021-01-16  2:59   ` [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending Xuan Zhuo
  2021-01-16  4:47     ` Jakub Kicinski
@ 2021-01-18  9:10     ` Jason Wang
  2021-01-18 12:27       ` Michael S. Tsirkin
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Wang @ 2021-01-18  9:10 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf


On 2021/1/16 10:59 AM, Xuan Zhuo wrote:
> virtnet_xsk_run will be called in the tx interrupt handling function
> virtnet_poll_tx.
>
> The sending process gets desc from the xsk tx queue, and assembles it to
> send the data.
>
> Compared with other drivers, what is special here is that the page of the
> data in xsk is used instead of the dma address, because the virtio
> interface does not use dma addresses.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 200 ++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 197 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a62d456..42aa9ad 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -119,6 +119,8 @@ struct virtnet_xsk_hdr {
>   	u32 len;
>   };
>   
> +#define VIRTNET_STATE_XSK_WAKEUP 1
> +
>   #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
>   #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
>   
> @@ -163,9 +165,12 @@ struct send_queue {
>   		struct xsk_buff_pool   __rcu *pool;
>   		struct virtnet_xsk_hdr __rcu *hdr;
>   
> +		unsigned long          state;
>   		u64                    hdr_con;
>   		u64                    hdr_pro;
>   		u64                    hdr_n;
> +		struct xdp_desc        last_desc;
> +		bool                   wait_slot;
>   	} xsk;
>   };
>   
> @@ -284,6 +289,8 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
>   				bool xsk_wakeup,
>   				unsigned int *_packets, unsigned int *_bytes);
>   static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
> +static int virtnet_xsk_run(struct send_queue *sq,
> +			   struct xsk_buff_pool *pool, int budget);
>   
>   static bool is_xdp_frame(void *ptr)
>   {
> @@ -1590,6 +1597,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	struct virtnet_info *vi = sq->vq->vdev->priv;
>   	unsigned int index = vq2txq(sq->vq);
>   	struct netdev_queue *txq;
> +	struct xsk_buff_pool *pool;
> +	int work = 0;
>   
>   	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
>   		/* We don't need to enable cb for XDP */
> @@ -1599,15 +1608,26 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   
>   	txq = netdev_get_tx_queue(vi->dev, index);
>   	__netif_tx_lock(txq, raw_smp_processor_id());
> -	free_old_xmit_skbs(sq, true);
> +
> +	rcu_read_lock();
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (pool) {
> +		work = virtnet_xsk_run(sq, pool, budget);
> +		rcu_read_unlock();
> +	} else {
> +		rcu_read_unlock();
> +		free_old_xmit_skbs(sq, true);
> +	}
> +
>   	__netif_tx_unlock(txq);
>   
> -	virtqueue_napi_complete(napi, sq->vq, 0);
> +	if (work < budget)
> +		virtqueue_napi_complete(napi, sq->vq, 0);
>   
>   	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   		netif_tx_wake_queue(txq);
>   
> -	return 0;
> +	return work;
>   }
>   
>   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> @@ -2647,6 +2667,180 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   	}
>   }
>   
> +static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> +			    struct xdp_desc *desc)
> +{
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	void *data, *ptr;
> +	struct page *page;
> +	struct virtnet_xsk_hdr *xskhdr;
> +	u32 idx, offset, n, i, copy, copied;
> +	u64 addr;
> +	int err, m;
> +
> +	addr = desc->addr;
> +
> +	data = xsk_buff_raw_get_data(pool, addr);
> +	offset = offset_in_page(data);
> +
> +	/* one for hdr, one for the first page */
> +	n = 2;
> +	m = desc->len - (PAGE_SIZE - offset);
> +	if (m > 0) {
> +		n += m >> PAGE_SHIFT;
> +		if (m & PAGE_MASK)
> +			++n;
> +
> +		n = min_t(u32, n, ARRAY_SIZE(sq->sg));
> +	}
> +
> +	idx = sq->xsk.hdr_con % sq->xsk.hdr_n;


I don't understand the reason for the hdr array. It looks to me like all of
them are zero and read-only from the device.

Any reason for not reusing a single hdr for all xdp descriptors? Or 
maybe it's time to introduce VIRTIO_NET_F_NO_HDR.
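
To illustrate (untested, and the field name is invented for the example):
since the header is all zeros and only ever read by the device, a single
per-queue copy could back every descriptor, with the per-packet length
tracked in the completion token instead:

	/* assumed: a "struct virtio_net_hdr_mrg_rxbuf xsk_hdr" member in
	 * sq->xsk, memset to zero when xsk is enabled; the device never
	 * writes it, so every descriptor can point at the same copy
	 */
	sg_init_table(sq->sg, n);
	sg_set_buf(sq->sg, &sq->xsk.xsk_hdr, vi->hdr_len);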


> +	xskhdr = &sq->xsk.hdr[idx];
> +
> +	/* xskhdr->hdr has been memset to zero, so no need to clear it again */
> +
> +	sg_init_table(sq->sg, n);
> +	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);
> +
> +	copied = 0;
> +	for (i = 1; i < n; ++i) {
> +		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
> +
> +		page = xsk_buff_raw_get_page(pool, addr + copied);
> +
> +		sg_set_page(sq->sg + i, page, copy, offset);
> +		copied += copy;
> +		if (offset)
> +			offset = 0;
> +	}


It looks to me we need to terminate the sg:

**
  * virtqueue_add_outbuf - expose output buffers to other end
  * @vq: the struct virtqueue we're talking about.
  * @sg: scatterlist (must be well-formed and terminated!)
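
Untested, but I'd expect something like this, so the chain is explicitly
terminated at the last entry actually used before the add:

	sg_init_table(sq->sg, n);
	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);
	/* ... fill entries 1..i-1 with the umem pages ... */
	sg_mark_end(&sq->sg[i - 1]);	/* terminate at the last used entry */
	err = virtqueue_add_outbuf(sq->vq, sq->sg, i, ptr, GFP_ATOMIC);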


> +
> +	xskhdr->len = desc->len;
> +	ptr = xdp_to_ptr(&xskhdr->type);
> +
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, ptr, GFP_ATOMIC);
> +	if (unlikely(err))
> +		sq->xsk.last_desc = *desc;
> +	else
> +		sq->xsk.hdr_con++;
> +
> +	return err;
> +}
> +
> +static bool virtnet_xsk_dev_is_full(struct send_queue *sq)
> +{
> +	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
> +		return true;
> +
> +	if (sq->xsk.hdr_con == sq->xsk.hdr_pro)
> +		return true;


Can we really reach here?


> +
> +	return false;
> +}
> +
> +static int virtnet_xsk_xmit_zc(struct send_queue *sq,
> +			       struct xsk_buff_pool *pool, unsigned int budget)
> +{
> +	struct xdp_desc desc;
> +	int err, packet = 0;
> +	int ret = -EAGAIN;
> +
> +	if (sq->xsk.last_desc.addr) {
> +		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> +		if (unlikely(err))
> +			return -EBUSY;
> +
> +		++packet;
> +		sq->xsk.last_desc.addr = 0;
> +	}
> +
> +	while (budget-- > 0) {
> +		if (virtnet_xsk_dev_is_full(sq)) {
> +			ret = -EBUSY;
> +			break;
> +		}


It looks to me we will always hit this if userspace is fast, e.g. we
don't kick until the virtqueue is full ...


> +
> +		if (!xsk_tx_peek_desc(pool, &desc)) {
> +			/* done */
> +			ret = 0;
> +			break;
> +		}
> +
> +		err = virtnet_xsk_xmit(sq, pool, &desc);
> +		if (unlikely(err)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +
> +		++packet;
> +	}
> +
> +	if (packet) {
> +		xsk_tx_release(pool);
> +
> +		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> +			u64_stats_update_begin(&sq->stats.syncp);
> +			sq->stats.kicks++;
> +			u64_stats_update_end(&sq->stats.syncp);
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +static int virtnet_xsk_run(struct send_queue *sq,
> +			   struct xsk_buff_pool *pool, int budget)
> +{
> +	int err, ret = 0;
> +	unsigned int _packets = 0;
> +	unsigned int _bytes = 0;
> +
> +	sq->xsk.wait_slot = false;
> +
> +	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
> +
> +	err = virtnet_xsk_xmit_zc(sq, pool, xsk_budget);
> +	if (!err) {
> +		struct xdp_desc desc;
> +
> +		clear_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
> +		xsk_set_tx_need_wakeup(pool);
> +
> +		/* Race breaker. If new is coming after last xmit
> +		 * but before flag change
> +		 */
> +
> +		if (!xsk_tx_peek_desc(pool, &desc))
> +			goto end;
> +
> +		set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
> +		xsk_clear_tx_need_wakeup(pool);


How is memory ordering going to work here? Or do we not need to care
about that?
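
For reference, the usual shape of this race breaker has a full barrier
between publishing need_wakeup and re-reading the tx ring (untested, just
to show the ordering I'd expect):

	xsk_set_tx_need_wakeup(pool);
	/* pair the need_wakeup store with the descriptor read: either the
	 * producer sees the flag and calls sendto(), or we see its new
	 * descriptor here
	 */
	smp_mb();
	if (xsk_tx_peek_desc(pool, &desc)) {
		xsk_clear_tx_need_wakeup(pool);
		sq->xsk.last_desc = desc;
	}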


> +
> +		sq->xsk.last_desc = desc;
> +		ret = budget;
> +		goto end;
> +	}
> +
> +	xsk_clear_tx_need_wakeup(pool);
> +
> +	if (err == -EAGAIN) {
> +		ret = budget;
> +		goto end;
> +	}
> +
> +	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
> +
> +	if (!virtnet_xsk_dev_is_full(sq)) {
> +		ret = budget;
> +		goto end;
> +	}
> +
> +	sq->xsk.wait_slot = true;
> +
> +	virtnet_sq_stop_check(sq, true);
> +end:
> +	return ret;
> +}
> +
>   static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>   				      size_t len)
>   {


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending
  2021-01-18  9:10     ` Jason Wang
@ 2021-01-18 12:27       ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2021-01-18 12:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, netdev, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

On Mon, Jan 18, 2021 at 05:10:24PM +0800, Jason Wang wrote:
> 
On 2021/1/16 10:59 AM, Xuan Zhuo wrote:
> > virtnet_xsk_run will be called in the tx interrupt handling function
> > virtnet_poll_tx.
> > 
> > The sending process gets desc from the xsk tx queue, and assembles it to
> > send the data.
> > 
> > Compared with other drivers, what is special here is that the page of the
> > data in xsk is used instead of the dma address, because the virtio
> > interface does not use dma addresses.
> > 
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 200 ++++++++++++++++++++++++++++++++++++++++++++++-
> >   1 file changed, 197 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index a62d456..42aa9ad 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -119,6 +119,8 @@ struct virtnet_xsk_hdr {
> >   	u32 len;
> >   };
> > +#define VIRTNET_STATE_XSK_WAKEUP 1
> > +
> >   #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
> >   #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
> > @@ -163,9 +165,12 @@ struct send_queue {
> >   		struct xsk_buff_pool   __rcu *pool;
> >   		struct virtnet_xsk_hdr __rcu *hdr;
> > +		unsigned long          state;
> >   		u64                    hdr_con;
> >   		u64                    hdr_pro;
> >   		u64                    hdr_n;
> > +		struct xdp_desc        last_desc;
> > +		bool                   wait_slot;
> >   	} xsk;
> >   };



Please add documentation for the new fields/defines: how they are
accessed and what locking/ordering is in place.
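
Something in this direction, for instance (the comments below are only
guesses at the intended semantics, just to show the kind of documentation
meant):

	struct {
		struct xsk_buff_pool   __rcu *pool; /* set at bind, cleared at unbind, RCU */
		struct virtnet_xsk_hdr __rcu *hdr;  /* hdr ring, same lifetime as pool */

		unsigned long          state;     /* VIRTNET_STATE_XSK_WAKEUP, atomic bitops */
		u64                    hdr_con;   /* hdrs handed to the device, under tx lock */
		u64                    hdr_pro;   /* hdrs completed by the device, under tx lock */
		u64                    hdr_n;     /* size of the hdr ring */
		struct xdp_desc        last_desc; /* desc that failed to queue, retried first */
		bool                   wait_slot; /* waiting for the device to free slots */
	} xsk;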

> > @@ -284,6 +289,8 @@ static void __free_old_xmit_ptr(struct send_queue *sq, bool in_napi,
> >   				bool xsk_wakeup,
> >   				unsigned int *_packets, unsigned int *_bytes);
> >   static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi);
> > +static int virtnet_xsk_run(struct send_queue *sq,
> > +			   struct xsk_buff_pool *pool, int budget);
> >   static bool is_xdp_frame(void *ptr)
> >   {
> > @@ -1590,6 +1597,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >   	struct virtnet_info *vi = sq->vq->vdev->priv;
> >   	unsigned int index = vq2txq(sq->vq);
> >   	struct netdev_queue *txq;
> > +	struct xsk_buff_pool *pool;
> > +	int work = 0;
> >   	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
> >   		/* We don't need to enable cb for XDP */
> > @@ -1599,15 +1608,26 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >   	txq = netdev_get_tx_queue(vi->dev, index);
> >   	__netif_tx_lock(txq, raw_smp_processor_id());
> > -	free_old_xmit_skbs(sq, true);
> > +
> > +	rcu_read_lock();
> > +	pool = rcu_dereference(sq->xsk.pool);
> > +	if (pool) {
> > +		work = virtnet_xsk_run(sq, pool, budget);
> > +		rcu_read_unlock();
> > +	} else {
> > +		rcu_read_unlock();
> > +		free_old_xmit_skbs(sq, true);
> > +	}
> > +
> >   	__netif_tx_unlock(txq);
> > -	virtqueue_napi_complete(napi, sq->vq, 0);
> > +	if (work < budget)
> > +		virtqueue_napi_complete(napi, sq->vq, 0);
> >   	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> >   		netif_tx_wake_queue(txq);
> > -	return 0;
> > +	return work;
> >   }
> >   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > @@ -2647,6 +2667,180 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >   	}
> >   }
> > +static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> > +			    struct xdp_desc *desc)
> > +{
> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> > +	void *data, *ptr;
> > +	struct page *page;
> > +	struct virtnet_xsk_hdr *xskhdr;
> > +	u32 idx, offset, n, i, copy, copied;
> > +	u64 addr;
> > +	int err, m;
> > +
> > +	addr = desc->addr;
> > +
> > +	data = xsk_buff_raw_get_data(pool, addr);
> > +	offset = offset_in_page(data);
> > +
> > +	/* one for hdr, one for the first page */
> > +	n = 2;
> > +	m = desc->len - (PAGE_SIZE - offset);
> > +	if (m > 0) {
> > +		n += m >> PAGE_SHIFT;
> > +		if (m & PAGE_MASK)
> > +			++n;
> > +
> > +		n = min_t(u32, n, ARRAY_SIZE(sq->sg));
> > +	}
> > +
> > +	idx = sq->xsk.hdr_con % sq->xsk.hdr_n;
> 
> 
> I don't understand the reason for the hdr array. It looks to me like all of them
> are zero and read-only from the device.
> 
> Any reason for not reusing a single hdr for all xdp descriptors? Or maybe
> it's time to introduce VIRTIO_NET_F_NO_HDR.

I'm not sure it's worth it, since
- xdp can be enabled/disabled dynamically
- there's intent to add offload support to xdp

> 
> > +	xskhdr = &sq->xsk.hdr[idx];
> > +
> > +	/* xskhdr->hdr has been memset to zero, so no need to clear it again */
> > +
> > +	sg_init_table(sq->sg, n);
> > +	sg_set_buf(sq->sg, &xskhdr->hdr, vi->hdr_len);
> > +
> > +	copied = 0;
> > +	for (i = 1; i < n; ++i) {
> > +		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
> > +
> > +		page = xsk_buff_raw_get_page(pool, addr + copied);
> > +
> > +		sg_set_page(sq->sg + i, page, copy, offset);
> > +		copied += copy;
> > +		if (offset)
> > +			offset = 0;
> > +	}
> 
> 
> It looks to me we need to terminate the sg:
> 
> **
>  * virtqueue_add_outbuf - expose output buffers to other end
>  * @vq: the struct virtqueue we're talking about.
>  * @sg: scatterlist (must be well-formed and terminated!)
> 
> 
> > +
> > +	xskhdr->len = desc->len;
> > +	ptr = xdp_to_ptr(&xskhdr->type);
> > +
> > +	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, ptr, GFP_ATOMIC);
> > +	if (unlikely(err))
> > +		sq->xsk.last_desc = *desc;
> > +	else
> > +		sq->xsk.hdr_con++;
> > +
> > +	return err;
> > +}
> > +
> > +static bool virtnet_xsk_dev_is_full(struct send_queue *sq)
> > +{
> > +	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
> > +		return true;
> > +
> > +	if (sq->xsk.hdr_con == sq->xsk.hdr_pro)
> > +		return true;
> 
> 
> Can we really reach here?
> 
> 
> > +
> > +	return false;
> > +}
> > +
> > +static int virtnet_xsk_xmit_zc(struct send_queue *sq,
> > +			       struct xsk_buff_pool *pool, unsigned int budget)
> > +{
> > +	struct xdp_desc desc;
> > +	int err, packet = 0;
> > +	int ret = -EAGAIN;
> > +
> > +	if (sq->xsk.last_desc.addr) {
> > +		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> > +		if (unlikely(err))
> > +			return -EBUSY;
> > +
> > +		++packet;
> > +		sq->xsk.last_desc.addr = 0;
> > +	}
> > +
> > +	while (budget-- > 0) {
> > +		if (virtnet_xsk_dev_is_full(sq)) {
> > +			ret = -EBUSY;
> > +			break;
> > +		}
> 
> 
> It looks to me we will always hit this if userspace is fast, e.g. we don't
> kick until the virtqueue is full ...
> 
> 
> > +
> > +		if (!xsk_tx_peek_desc(pool, &desc)) {
> > +			/* done */
> > +			ret = 0;
> > +			break;
> > +		}
> > +
> > +		err = virtnet_xsk_xmit(sq, pool, &desc);
> > +		if (unlikely(err)) {
> > +			ret = -EBUSY;
> > +			break;
> > +		}
> > +
> > +		++packet;
> > +	}
> > +
> > +	if (packet) {
> > +		xsk_tx_release(pool);
> > +
> > +		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
> > +			u64_stats_update_begin(&sq->stats.syncp);
> > +			sq->stats.kicks++;
> > +			u64_stats_update_end(&sq->stats.syncp);
> > +		}
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static int virtnet_xsk_run(struct send_queue *sq,
> > +			   struct xsk_buff_pool *pool, int budget)
> > +{
> > +	int err, ret = 0;
> > +	unsigned int _packets = 0;
> > +	unsigned int _bytes = 0;
> > +
> > +	sq->xsk.wait_slot = false;
> > +
> > +	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
> > +
> > +	err = virtnet_xsk_xmit_zc(sq, pool, xsk_budget);
> > +	if (!err) {
> > +		struct xdp_desc desc;
> > +
> > +		clear_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
> > +		xsk_set_tx_need_wakeup(pool);
> > +
> > +		/* Race breaker. If new is coming after last xmit
> > +		 * but before flag change
> > +		 */
> > +
> > +		if (!xsk_tx_peek_desc(pool, &desc))
> > +			goto end;
> > +
> > +		set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state);
> > +		xsk_clear_tx_need_wakeup(pool);
> 
> 
> How is memory ordering going to work here? Or do we not need to care about
> that?
> 
> 
> > +
> > +		sq->xsk.last_desc = desc;
> > +		ret = budget;
> > +		goto end;
> > +	}
> > +
> > +	xsk_clear_tx_need_wakeup(pool);
> > +
> > +	if (err == -EAGAIN) {
> > +		ret = budget;
> > +		goto end;
> > +	}
> > +
> > +	__free_old_xmit_ptr(sq, true, false, &_packets, &_bytes);
> > +
> > +	if (!virtnet_xsk_dev_is_full(sq)) {
> > +		ret = budget;
> > +		goto end;
> > +	}
> > +
> > +	sq->xsk.wait_slot = true;
> > +
> > +	virtnet_sq_stop_check(sq, true);
> > +end:
> > +	return ret;
> > +}
> > +
> >   static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
> >   				      size_t len)
> >   {


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v2 6/7] virtio-net, xsk: implement xsk wakeup callback
  2021-01-16  2:59   ` [PATCH net-next v2 6/7] virtio-net, xsk: implement xsk wakeup callback Xuan Zhuo
@ 2021-01-19  4:50     ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2021-01-19  4:50 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf


On 2021/1/16 10:59 AM, Xuan Zhuo wrote:
> Since I did not find an interface to directly ask virtio to generate
> a tx interrupt, I send some data to trigger a new tx interrupt.
>
> Another advantage of this is that the transmission delay will be
> relatively small, and there is no need to wait for the tx interrupt to
> start the softirq.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 51 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 42aa9ad..e552c2d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2841,6 +2841,56 @@ static int virtnet_xsk_run(struct send_queue *sq,
>   	return ret;
>   }
>   
> +static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq;
> +	struct xsk_buff_pool *pool;
> +	struct netdev_queue *txq;
> +
> +	if (!netif_running(dev))
> +		return -ENETDOWN;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	rcu_read_lock();
> +
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (!pool)
> +		goto end;
> +
> +	if (test_and_set_bit(VIRTNET_STATE_XSK_WAKEUP, &sq->xsk.state))
> +		goto end;
> +
> +	txq = netdev_get_tx_queue(dev, qid);
> +
> +	local_bh_disable();
> +	__netif_tx_lock(txq, raw_smp_processor_id());


You can use __netif_tx_lock_bh().
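
I.e. something like this (untested):

	txq = netdev_get_tx_queue(dev, qid);

	/* __netif_tx_lock_bh() disables bottom halves itself, so the
	 * explicit local_bh_disable()/local_bh_enable() pair goes away
	 */
	__netif_tx_lock_bh(txq);
	virtnet_xsk_run(sq, pool, xsk_budget);
	__netif_tx_unlock_bh(txq);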

Thanks


> +
> +	/* Send part of the packets directly to reduce the delay in sending the
> +	 * packets, and this can actively trigger the tx interrupts.
> +	 *
> +	 * If the packets are not all processed, then continue processing in the
> +	 * subsequent tx interrupt (virtnet_poll_tx).
> +	 *
> +	 * If no packet is sent out, the ring of the device is full. In this
> +	 * case, we will still get a tx interrupt response. Then we will deal
> +	 * with the subsequent packet sending work.
> +	 */
> +
> +	virtnet_xsk_run(sq, pool, xsk_budget);
> +
> +	__netif_tx_unlock(txq);
> +	local_bh_enable();
> +
> +end:
> +	rcu_read_unlock();
> +	return 0;
> +}
> +
>   static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>   				      size_t len)
>   {
> @@ -2895,6 +2945,7 @@ static int virtnet_set_features(struct net_device *dev,
>   	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
>   	.ndo_bpf		= virtnet_xdp,
>   	.ndo_xdp_xmit		= virtnet_xdp_xmit,
> +	.ndo_xsk_wakeup		= virtnet_xsk_wakeup,
>   	.ndo_features_check	= passthru_features_check,
>   	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
>   	.ndo_set_features	= virtnet_set_features,


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit
       [not found] <1609901717.683732-2-xuanzhuo@linux.alibaba.com>
@ 2021-01-06  3:54 ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2021-01-06  3:54 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: dust.li, tonylu, Michael S. Tsirkin, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	VIRTIO CORE AND NET DRIVERS, open list, XDP SOCKETS (AF_XDP),
	netdev


On 2021/1/6 10:55 AM, Xuan Zhuo wrote:
> On Wed, 6 Jan 2021 10:46:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> On 2021/1/5 8:42 PM, Xuan Zhuo wrote:
>>> On Tue, 5 Jan 2021 17:32:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2021/1/5 5:11 PM, Xuan Zhuo wrote:
>>>>> The first patch made some adjustments to xsk.
>>>> Thanks a lot for the work. It's rather interesting.
>>>>
>>>>
>>>>> The second patch itself can be used as an independent patch to solve the problem
>>>>> that XDP may fail to load when the number of queues is insufficient.
>>>> It would be better to send this as a separate patch. Several people
>>>> asked for this before.
>>>>
>>>>
>>>>> The third to last patch implements support for xsk in virtio-net.
>>>>>
>>>>> A practical problem with virtio is that tx interrupts are not very reliable.
>>>>> There will always be some missing or delayed tx interrupts. So I specially added
>>>>> a point timer to solve this problem. Of course, considering performance issues,
>>>>> The timer only triggers when the ring of the network card is full.
>>>> This is sub-optimal. We need to figure out the root cause. We haven't met
>>>> such an issue before.
>>>>
>>>> Several questions:
>>>>
>>>> - is tx interrupt enabled?
>>>> - can you still see the issue if you disable event index?
>>>> - what backend did you use? qemu or vhost(user)?
>>> Sorry, it may just be a problem with the backend I used here. I just tested the
>>> latest qemu and it did not have this problem. I think I should delete the
>>> timer-related code?
>>
>> Yes, please.
>>
>>
>>>>> Regarding the issue of virtio-net supporting xsk's zero copy rx, I am also
>>>>> developing it, but I found that the modification may be relatively large, so I
>>>>> consider this patch set to be separated from the code related to xsk zero copy
>>>>> rx.
>>>> That's fine, but a question here.
>>>>
>>>> How is multiqueue being handled here? I'm asking since there's no
>>>> programmable filter/director support in the virtio spec now.
>>>>
>>>> Thanks
>>> I don't really understand what you mean. In the case of multiple queues,
>>> there is no problem.
>>
>> So consider we bind an xsk to queue 4: how can you make sure the traffic is
>> directed to queue 4? One possible solution is to use filters as
>> suggested in af_xdp.rst:
>>
>>         ethtool -N p3p2 rx-flow-hash udp4 fn
>>         ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
>>             action 16
>> ...
>>
>> But virtio-net doesn't have any filters that could be programmed from
>> the driver.
>>
>> Anything I missed here?
>>
>> Thanks
> I understand what you mean; this problem does exist, and I encountered it when I
> tested qemu.
>
> First of all, this problem only affects recv. This patch is for
> xmit. Of course, our normal business must also have recv scenarios.
>
> My solution when developing the upper-level application is to bind to all the queues
> to ensure that we can receive the packets we want.


I'm not sure I get you here. Note that one advantage of AF_XDP is that
it allows an XSK to be bound to a specific queue while the rest can still
be used by the kernel.
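
For example, from user space (illustrative only; assumes the usual
<linux/if_xdp.h> setup with a umem already registered, error handling
omitted):

	/* bind an AF_XDP socket to queue 4 only; the other queues keep
	 * using the normal kernel datapath
	 */
	struct sockaddr_xdp sxdp = {
		.sxdp_family   = AF_XDP,
		.sxdp_ifindex  = ifindex,
		.sxdp_queue_id = 4,
		.sxdp_flags    = XDP_ZEROCOPY,
	};

	bind(xsk_fd, (struct sockaddr *)&sxdp, sizeof(sxdp));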


>   And I think that in
> practice, even if the network card supports filters, we should
> also bind to all the queues, because we don't know which queue the traffic we care
> about will arrive on.


With the help of filters, the card can select a specific queue based on a
hash or n-tuple, so it should work?


>
> Regarding the problem with virtio-net, I think our core question is whether we
> need to deal with this problem in the virtio-net driver. I personally think
> that we should extend the virtio specification to define this scenario.


Yes, so do you want to do that? It would make virtio-net more user
friendly to AF_XDP. (Or if you wish I can post a patch to extend the spec.)


>
> When I tested it, I found that some cloud vendors' implementations guarantee
> this queue selection algorithm.


Right, though the spec suggests an automatic steering algorithm, it's not
mandatory. Vendors can implement their own.

But a hash or n-tuple filter should still be useful.

Thanks


>
> Thanks!!
>
>>
>>>>> Xuan Zhuo (5):
>>>>>      xsk: support get page for drv
>>>>>      virtio-net: support XDP_TX when not more queues
>>>>>      virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
>>>>>      xsk, virtio-net: prepare for support xsk
>>>>>      virtio-net, xsk: virtio-net support xsk zero copy tx
>>>>>
>>>>>     drivers/net/virtio_net.c    | 643 +++++++++++++++++++++++++++++++++++++++-----
>>>>>     include/linux/netdevice.h   |   1 +
>>>>>     include/net/xdp_sock_drv.h  |  10 +
>>>>>     include/net/xsk_buff_pool.h |   1 +
>>>>>     net/xdp/xsk_buff_pool.c     |  10 +-
>>>>>     5 files changed, 597 insertions(+), 68 deletions(-)
>>>>>
>>>>> --
>>>>> 1.8.3.1
>>>>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit
       [not found] <1609850555.8687568-1-xuanzhuo@linux.alibaba.com>
@ 2021-01-06  2:46 ` Jason Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Wang @ 2021-01-06  2:46 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: dust.li, tonylu, Michael S. Tsirkin, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	VIRTIO CORE AND NET DRIVERS, open list, XDP SOCKETS (AF_XDP),
	netdev


On 2021/1/5 8:42 PM, Xuan Zhuo wrote:
> On Tue, 5 Jan 2021 17:32:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> On 2021/1/5 5:11 PM, Xuan Zhuo wrote:
>>> The first patch made some adjustments to xsk.
>>
>> Thanks a lot for the work. It's rather interesting.
>>
>>
>>> The second patch itself can be used as an independent patch to solve the problem
>>> that XDP may fail to load when the number of queues is insufficient.
>>
>> It would be better to send this as a separate patch. Several people
>> asked for this before.
>>
>>
>>> The third to last patch implements support for xsk in virtio-net.
>>>
>>> A practical problem with virtio is that tx interrupts are not very reliable.
>>> There will always be some missing or delayed tx interrupts. So I specially added
>>> a point timer to solve this problem. Of course, considering performance issues,
>>> The timer only triggers when the ring of the network card is full.
>>
>> This is sub-optimal. We need to figure out the root cause. We haven't met
>> such an issue before.
>>
>> Several questions:
>>
>> - is tx interrupt enabled?
>> - can you still see the issue if you disable event index?
>> - what backend did you use? qemu or vhost(user)?
> Sorry, it may just be a problem with the backend I used here. I just tested the
> latest qemu and it did not have this problem. I think I should delete the
> timer-related code?


Yes, please.


>
>>
>>> Regarding the issue of virtio-net supporting xsk's zero copy rx, I am also
>>> developing it, but I found that the modification may be relatively large, so I
>>> consider this patch set to be separated from the code related to xsk zero copy
>>> rx.
>>
>> That's fine, but a question here.
>>
>> How is multiqueue being handled here? I'm asking since there's no
>> programmable filter/director support in the virtio spec now.
>>
>> Thanks
> I don't really understand what you mean. In the case of multiple queues,
> there is no problem.


So consider we bind an xsk to queue 4: how can you make sure the traffic is
directed to queue 4? One possible solution is to use filters as
suggested in af_xdp.rst:

       ethtool -N p3p2 rx-flow-hash udp4 fn
       ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
           action 16
...

But virtio-net doesn't have any filters that could be programmed from 
the driver.

Anything I missed here?

Thanks


>
>>
>>> Xuan Zhuo (5):
>>>     xsk: support get page for drv
>>>     virtio-net: support XDP_TX when not more queues
>>>     virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx
>>>     xsk, virtio-net: prepare for support xsk
>>>     virtio-net, xsk: virtio-net support xsk zero copy tx
>>>
>>>    drivers/net/virtio_net.c    | 643 +++++++++++++++++++++++++++++++++++++++-----
>>>    include/linux/netdevice.h   |   1 +
>>>    include/net/xdp_sock_drv.h  |  10 +
>>>    include/net/xsk_buff_pool.h |   1 +
>>>    net/xdp/xsk_buff_pool.c     |  10 +-
>>>    5 files changed, 597 insertions(+), 68 deletions(-)
>>>
>>> --
>>> 1.8.3.1
>>>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-01-19  5:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-05  9:11 [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Xuan Zhuo
2021-01-05  9:11 ` [PATCH netdev 1/5] xsk: support get page for drv Xuan Zhuo
2021-01-05  9:11 ` [PATCH netdev 2/5] virtio-net: support XDP_TX when not more queues Xuan Zhuo
2021-01-05  9:11 ` [PATCH netdev 3/5] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
2021-01-05  9:11 ` [PATCH netdev 4/5] xsk, virtio-net: prepare for support xsk Xuan Zhuo
2021-01-05  9:11 ` [PATCH netdev 5/5] virtio-net, xsk: virtio-net support xsk zero copy tx Xuan Zhuo
2021-01-05 13:21   ` Michael S. Tsirkin
2021-01-05  9:32 ` [PATCH netdev 0/5] virtio-net support xdp socket zero copy xmit Jason Wang
2021-01-05 12:25 ` Michael S. Tsirkin
2021-01-16  2:59 ` [PATCH net-next v2 0/7] " Xuan Zhuo
2021-01-16  2:59   ` [PATCH net-next v2 1/7] xsk: support get page for drv Xuan Zhuo
2021-01-16  2:59   ` [PATCH net-next v2 2/7] virtio-net, xsk: distinguish XDP_TX and XSK XMIT ctx Xuan Zhuo
2021-01-18  6:45     ` Jason Wang
2021-01-16  2:59   ` [PATCH net-next v2 3/7] xsk, virtio-net: prepare for support xsk zerocopy xmit Xuan Zhuo
2021-01-16  2:59   ` [PATCH net-next v2 4/7] virtio-net, xsk: support xsk enable/disable Xuan Zhuo
2021-01-16  2:59   ` [PATCH net-next v2 5/7] virtio-net, xsk: realize the function of xsk packet sending Xuan Zhuo
2021-01-16  4:47     ` Jakub Kicinski
2021-01-18  9:10     ` Jason Wang
2021-01-18 12:27       ` Michael S. Tsirkin
2021-01-16  2:59   ` [PATCH net-next v2 6/7] virtio-net, xsk: implement xsk wakeup callback Xuan Zhuo
2021-01-19  4:50     ` Jason Wang
2021-01-16  2:59   ` [PATCH net-next v2 7/7] virtio-net, xsk: set xsk completed when packet sent done Xuan Zhuo
2021-01-18  6:28   ` [PATCH net-next v2 0/7] virtio-net support xdp socket zero copy xmit Jason Wang
     [not found] <1609850555.8687568-1-xuanzhuo@linux.alibaba.com>
2021-01-06  2:46 ` [PATCH netdev 0/5] " Jason Wang
     [not found] <1609901717.683732-2-xuanzhuo@linux.alibaba.com>
2021-01-06  3:54 ` Jason Wang
