BPF Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit
@ 2021-03-31  7:11 Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 1/8] xsk: XDP_SETUP_XSK_POOL support option check_dma Xuan Zhuo
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

XDP socket is an excellent by pass kernel network transmission framework. The
zero copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good. mlx5 and intel ixgbe already support this
feature, This patch set allows virtio-net to support xsk's zerocopy xmit
feature.

And xsk's zerocopy rx has made major changes to virtio-net, and I hope to submit
it after this patch set are received.

Compared with other drivers, virtio-net does not directly obtain the dma
address, so I first obtain the xsk page, and then pass the page to virtio.

When recycling the sent packets, we have to distinguish between skb and xdp.
Now we have to distinguish between skb, xdp, xsk.

#0 #1 made some adjustments to xsk.

---------------- Performance Testing ------------

The udp package tool implemented by the interface of xsk vs sockperf(kernel udp)
for performance testing:

xsk zero copy xmit in virtio-net:
CPU        PPS         MSGSIZE    vhost-cpu
7.9%       511804      64         100%
13.3%%     484373      1500       100%

sockperf:
CPU        PPS         MSGSIZE    vhost-cpu
100%       375227      64         89.1%
100%       307322      1500       81.5%

Xuan Zhuo (8):
  xsk: XDP_SETUP_XSK_POOL support option check_dma
  xsk: support get page by addr
  virtio-net: xsk zero copy xmit setup
  virtio-net: xsk zero copy xmit implement wakeup and xmit
  virtio-net: xsk zero copy xmit support xsk unaligned mode
  virtio-net: xsk zero copy xmit kick by threshold
  virtio-net: poll tx call xsk zerocopy xmit
  virtio-net: free old xmit handle xsk

 drivers/net/virtio_net.c   | 449 +++++++++++++++++++++++++++++++++----
 include/linux/netdevice.h  |   1 +
 include/net/xdp_sock_drv.h |  11 +
 net/xdp/xsk_buff_pool.c    |   3 +-
 4 files changed, 415 insertions(+), 49 deletions(-)

--
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 1/8] xsk: XDP_SETUP_XSK_POOL support option check_dma
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 2/8] xsk: support get page by addr Xuan Zhuo
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

The check_dma option is added to the bpf command. Because virtio-net
does not complete the dma initialization in advance. Instead, vring does
the dma operation every time data is sent.

Of course, in theory, it would be better to complete the dma
initialization in advance. But the modification vring may be more
troublesome, so here is an option to notify xsk dma whether the
initialization is complete. In this way, xsk will not report an error
because dma has not been initialized.

Of course, I still hope that virtio-net can support the completion of
dma operations in advance.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 include/linux/netdevice.h | 1 +
 net/xdp/xsk_buff_pool.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f57b70fc251f..47666b5d2dff 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -979,6 +979,7 @@ struct netdev_bpf {
 		struct {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
+			bool check_dma;
 		} xsk;
 	};
 };
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 8de01aaac4a0..4d3aed73ee3e 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -166,12 +166,13 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 	bpf.command = XDP_SETUP_XSK_POOL;
 	bpf.xsk.pool = pool;
 	bpf.xsk.queue_id = queue_id;
+	bpf.xsk.check_dma = true;
 
 	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (bpf.xsk.check_dma && !pool->dma_pages) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 2/8] xsk: support get page by addr
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 1/8] xsk: XDP_SETUP_XSK_POOL support option check_dma Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup Xuan Zhuo
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

xsk adds an interface and returns the page corresponding to
data. virtio-net does not initialize dma, so it needs page to construct
scatterlist to pass to vring.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 include/net/xdp_sock_drv.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 4e295541e396..1d08b5d8d15f 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -72,6 +72,12 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp)
 	return xp_get_frame_dma(xskb);
 }
 
+static inline struct page *xsk_buff_xdp_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
+	return pool->umem->pgs[addr >> PAGE_SHIFT];
+}
+
 static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
 {
 	return xp_alloc(pool);
@@ -207,6 +213,11 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp)
 	return 0;
 }
 
+static inline struct page *xsk_buff_xdp_get_page(struct xsk_buff_pool *pool, u64 addr)
+{
+	return NULL;
+}
+
 static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
 {
 	return NULL;
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 1/8] xsk: XDP_SETUP_XSK_POOL support option check_dma Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 2/8] xsk: support get page by addr Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-04-06  4:27   ` Jason Wang
  2021-03-31  7:11 ` [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit Xuan Zhuo
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

xsk is a high-performance packet receiving and sending technology.

This patch implements the binding and unbinding operations of xsk and
the virtio-net queue for xsk zero copy xmit.

The xsk zero copy xmit depends on tx napi. So if tx napi is not opened,
an error will be reported. And the entire operation is under the
protection of rtnl_lock

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 66 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bb4ea9dbc16b..4e25408a2b37 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -22,6 +22,7 @@
 #include <net/route.h>
 #include <net/xdp.h>
 #include <net/net_failover.h>
+#include <net/xdp_sock_drv.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -133,6 +134,11 @@ struct send_queue {
 	struct virtnet_sq_stats stats;
 
 	struct napi_struct napi;
+
+	struct {
+		/* xsk pool */
+		struct xsk_buff_pool __rcu *pool;
+	} xsk;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -2526,11 +2532,71 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return err;
 }
 
+static int virtnet_xsk_pool_enable(struct net_device *dev,
+				   struct xsk_buff_pool *pool,
+				   u16 qid)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq;
+	int ret = -EBUSY;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	sq = &vi->sq[qid];
+
+	/* xsk zerocopy depend on the tx napi */
+	if (!sq->napi.weight)
+		return -EPERM;
+
+	rcu_read_lock();
+	if (rcu_dereference(sq->xsk.pool))
+		goto end;
+
+	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
+	 * safe.
+	 */
+	rcu_assign_pointer(sq->xsk.pool, pool);
+	ret = 0;
+end:
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct send_queue *sq;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	sq = &vi->sq[qid];
+
+	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
+	 * safe.
+	 */
+	rcu_assign_pointer(sq->xsk.pool, NULL);
+
+	synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
+
+	return 0;
+}
+
 static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_SETUP_XSK_POOL:
+		/* virtio net not use dma before call vring api */
+		xdp->xsk.check_dma = false;
+		if (xdp->xsk.pool)
+			return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
+						       xdp->xsk.queue_id);
+		else
+			return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
 	default:
 		return -EINVAL;
 	}
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (2 preceding siblings ...)
  2021-03-31  7:11 ` [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-04-06  6:19   ` Jason Wang
  2021-03-31  7:11 ` [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode Xuan Zhuo
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

When the user calls sendto to consume the data in the xsk tx queue,
virtnet_xsk_wakeup will be called.

In wakeup, it will try to send a part of the data directly, the quantity
is operated by the module parameter xsk_budget. There are two purposes
for this realization:

1. Send part of the data quickly to reduce the transmission delay of the
   first packet
2. Trigger tx interrupt, start napi to consume xsk tx data

All sent xsk packets share the virtio-net header of xsk_hdr. If xsk
needs to support csum and other functions later, consider assigning xsk
hdr separately for each sent packet.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 183 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4e25408a2b37..c8a317a93ef7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -28,9 +28,11 @@ static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
 static bool csum = true, gso = true, napi_tx = true;
+static int xsk_budget = 32;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(xsk_budget, int, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -47,6 +49,8 @@ module_param(napi_tx, bool, 0644);
 
 #define VIRTIO_XDP_FLAG	BIT(0)
 
+static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -138,6 +142,9 @@ struct send_queue {
 	struct {
 		/* xsk pool */
 		struct xsk_buff_pool __rcu *pool;
+
+		/* save the desc for next xmit, when xmit fail. */
+		struct xdp_desc last_desc;
 	} xsk;
 };
 
@@ -2532,6 +2539,179 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return err;
 }
 
+static void virtnet_xsk_check_space(struct send_queue *sq)
+{
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct net_device *dev = vi->dev;
+	int qnum = sq - vi->sq;
+
+	/* If this sq is not the exclusive queue of the current cpu,
+	 * then it may be called by start_xmit, so check it running out
+	 * of space.
+	 */
+	if (is_xdp_raw_buffer_queue(vi, qnum))
+		return;
+
+	/* Stop the queue to avoid getting packets that we are
+	 * then unable to transmit. Then wait the tx interrupt.
+	 */
+	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
+		netif_stop_subqueue(dev, qnum);
+}
+
+static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
+			    struct xdp_desc *desc)
+{
+	struct virtnet_info *vi;
+	struct page *page;
+	void *data;
+	u32 offset;
+	u64 addr;
+	int err;
+
+	vi = sq->vq->vdev->priv;
+	addr = desc->addr;
+	data = xsk_buff_raw_get_data(pool, addr);
+	offset = offset_in_page(data);
+
+	sg_init_table(sq->sg, 2);
+	sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
+	page = xsk_buff_xdp_get_page(pool, addr);
+	sg_set_page(sq->sg + 1, page, desc->len, offset);
+
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
+	if (unlikely(err))
+		sq->xsk.last_desc = *desc;
+
+	return err;
+}
+
+static int virtnet_xsk_xmit_batch(struct send_queue *sq,
+				  struct xsk_buff_pool *pool,
+				  unsigned int budget,
+				  bool in_napi, int *done)
+{
+	struct xdp_desc desc;
+	int err, packet = 0;
+	int ret = -EAGAIN;
+
+	if (sq->xsk.last_desc.addr) {
+		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
+		if (unlikely(err))
+			return -EBUSY;
+
+		++packet;
+		--budget;
+		sq->xsk.last_desc.addr = 0;
+	}
+
+	while (budget-- > 0) {
+		if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
+			ret = -EBUSY;
+			break;
+		}
+
+		if (!xsk_tx_peek_desc(pool, &desc)) {
+			/* done */
+			ret = 0;
+			break;
+		}
+
+		err = virtnet_xsk_xmit(sq, pool, &desc);
+		if (unlikely(err)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		++packet;
+	}
+
+	if (packet) {
+		if (virtqueue_kick_prepare(sq->vq) &&
+		    virtqueue_notify(sq->vq)) {
+			u64_stats_update_begin(&sq->stats.syncp);
+			sq->stats.kicks += 1;
+			u64_stats_update_end(&sq->stats.syncp);
+		}
+
+		*done = packet;
+
+		xsk_tx_release(pool);
+	}
+
+	return ret;
+}
+
+static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
+			   int budget, bool in_napi)
+{
+	int done = 0;
+	int err;
+
+	free_old_xmit_skbs(sq, in_napi);
+
+	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
+	/* -EAGAIN: done == budget
+	 * -EBUSY: done < budget
+	 *  0    : done < budget
+	 */
+	if (err == -EBUSY) {
+		free_old_xmit_skbs(sq, in_napi);
+
+		/* If the space is enough, let napi run again. */
+		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+			done = budget;
+	}
+
+	virtnet_xsk_check_space(sq);
+
+	return done;
+}
+
+static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct xsk_buff_pool *pool;
+	struct netdev_queue *txq;
+	struct send_queue *sq;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	sq = &vi->sq[qid];
+
+	rcu_read_lock();
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool)
+		goto end;
+
+	if (napi_if_scheduled_mark_missed(&sq->napi))
+		goto end;
+
+	txq = netdev_get_tx_queue(dev, qid);
+
+	__netif_tx_lock_bh(txq);
+
+	/* Send part of the packet directly to reduce the delay in sending the
+	 * packet, and this can actively trigger the tx interrupts.
+	 *
+	 * If no packet is sent out, the ring of the device is full. In this
+	 * case, we will still get a tx interrupt response. Then we will deal
+	 * with the subsequent packet sending work.
+	 */
+	virtnet_xsk_run(sq, pool, xsk_budget, false);
+
+	__netif_tx_unlock_bh(txq);
+
+end:
+	rcu_read_unlock();
+	return 0;
+}
+
 static int virtnet_xsk_pool_enable(struct net_device *dev,
 				   struct xsk_buff_pool *pool,
 				   u16 qid)
@@ -2553,6 +2733,8 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
 	if (rcu_dereference(sq->xsk.pool))
 		goto end;
 
+	memset(&sq->xsk, 0, sizeof(sq->xsk));
+
 	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
 	 * safe.
 	 */
@@ -2656,6 +2838,7 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
 	.ndo_bpf		= virtnet_xdp,
 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
+	.ndo_xsk_wakeup         = virtnet_xsk_wakeup,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
 	.ndo_set_features	= virtnet_set_features,
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (3 preceding siblings ...)
  2021-03-31  7:11 ` [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-04-06  6:55   ` Jason Wang
  2021-03-31  7:11 ` [PATCH net-next v3 6/8] virtio-net: xsk zero copy xmit kick by threshold Xuan Zhuo
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

In xsk unaligned mode, the frame pointed to by desc may span two
consecutive pages, but not more than two pages.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c8a317a93ef7..259fafcf6028 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2562,24 +2562,42 @@ static void virtnet_xsk_check_space(struct send_queue *sq)
 static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 			    struct xdp_desc *desc)
 {
+	u32 offset, n, i, copy, copied;
 	struct virtnet_info *vi;
 	struct page *page;
 	void *data;
-	u32 offset;
+	int err, m;
 	u64 addr;
-	int err;
 
 	vi = sq->vq->vdev->priv;
 	addr = desc->addr;
+
 	data = xsk_buff_raw_get_data(pool, addr);
+
 	offset = offset_in_page(data);
+	m = desc->len - (PAGE_SIZE - offset);
+	/* xsk unaligned mode, desc will use two page */
+	if (m > 0)
+		n = 3;
+	else
+		n = 2;
 
-	sg_init_table(sq->sg, 2);
+	sg_init_table(sq->sg, n);
 	sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
-	page = xsk_buff_xdp_get_page(pool, addr);
-	sg_set_page(sq->sg + 1, page, desc->len, offset);
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
+	copied = 0;
+	for (i = 1; i < n; ++i) {
+		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
+
+		page = xsk_buff_xdp_get_page(pool, addr + copied);
+
+		sg_set_page(sq->sg + i, page, copy, offset);
+		copied += copy;
+		if (offset)
+			offset = 0;
+	}
+
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, NULL, GFP_ATOMIC);
 	if (unlikely(err))
 		sq->xsk.last_desc = *desc;
 
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 6/8] virtio-net: xsk zero copy xmit kick by threshold
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (4 preceding siblings ...)
  2021-03-31  7:11 ` [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-04-06  6:59   ` Jason Wang
  2021-03-31  7:11 ` [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit Xuan Zhuo
  2021-03-31  7:11 ` [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk Xuan Zhuo
  7 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

After testing, the performance of calling kick every time is not stable.
And if all the packets are sent and kicked again, the performance is not
good. So add a module parameter to specify how many packets are sent to
call a kick.

8 is a relatively stable value with the best performance.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 259fafcf6028..d7e95f55478d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -29,10 +29,12 @@ module_param(napi_weight, int, 0444);
 
 static bool csum = true, gso = true, napi_tx = true;
 static int xsk_budget = 32;
+static int xsk_kick_thr = 8;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
 module_param(xsk_budget, int, 0644);
+module_param(xsk_kick_thr, int, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -2612,6 +2614,8 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 	struct xdp_desc desc;
 	int err, packet = 0;
 	int ret = -EAGAIN;
+	int need_kick = 0;
+	int kicks = 0;
 
 	if (sq->xsk.last_desc.addr) {
 		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
@@ -2619,6 +2623,7 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 			return -EBUSY;
 
 		++packet;
+		++need_kick;
 		--budget;
 		sq->xsk.last_desc.addr = 0;
 	}
@@ -2642,13 +2647,25 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 		}
 
 		++packet;
+		++need_kick;
+		if (need_kick > xsk_kick_thr) {
+			if (virtqueue_kick_prepare(sq->vq) &&
+			    virtqueue_notify(sq->vq))
+				++kicks;
+
+			need_kick = 0;
+		}
 	}
 
 	if (packet) {
-		if (virtqueue_kick_prepare(sq->vq) &&
-		    virtqueue_notify(sq->vq)) {
+		if (need_kick) {
+			if (virtqueue_kick_prepare(sq->vq) &&
+			    virtqueue_notify(sq->vq))
+				++kicks;
+		}
+		if (kicks) {
 			u64_stats_update_begin(&sq->stats.syncp);
-			sq->stats.kicks += 1;
+			sq->stats.kicks += kicks;
 			u64_stats_update_end(&sq->stats.syncp);
 		}
 
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (5 preceding siblings ...)
  2021-03-31  7:11 ` [PATCH net-next v3 6/8] virtio-net: xsk zero copy xmit kick by threshold Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-04-06  7:03   ` Jason Wang
  2021-03-31  7:11 ` [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk Xuan Zhuo
  7 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

poll tx call virtnet_xsk_run, then the data in the xsk tx queue will be
continuously consumed by napi.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d7e95f55478d..fac7d0020013 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -264,6 +264,9 @@ struct padded_vnet_hdr {
 	char padding[4];
 };
 
+static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
+			   int budget, bool in_napi);
+
 static bool is_xdp_frame(void *ptr)
 {
 	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
@@ -1553,7 +1556,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct send_queue *sq = container_of(napi, struct send_queue, napi);
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	unsigned int index = vq2txq(sq->vq);
+	struct xsk_buff_pool *pool;
 	struct netdev_queue *txq;
+	int work = 0;
 
 	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
 		/* We don't need to enable cb for XDP */
@@ -1563,15 +1568,24 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 
 	txq = netdev_get_tx_queue(vi->dev, index);
 	__netif_tx_lock(txq, raw_smp_processor_id());
-	free_old_xmit_skbs(sq, true);
+	rcu_read_lock();
+	pool = rcu_dereference(sq->xsk.pool);
+	if (pool) {
+		work = virtnet_xsk_run(sq, pool, budget, true);
+		rcu_read_unlock();
+	} else {
+		rcu_read_unlock();
+		free_old_xmit_skbs(sq, true);
+	}
 	__netif_tx_unlock(txq);
 
-	virtqueue_napi_complete(napi, sq->vq, 0);
+	if (work < budget)
+		virtqueue_napi_complete(napi, sq->vq, 0);
 
 	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 		netif_tx_wake_queue(txq);
 
-	return 0;
+	return work;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk
  2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
                   ` (6 preceding siblings ...)
  2021-03-31  7:11 ` [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit Xuan Zhuo
@ 2021-03-31  7:11 ` Xuan Zhuo
  2021-04-06  7:16   ` Jason Wang
  7 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-03-31  7:11 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li

Based on the last two bit of ptr returned by virtqueue_get_buf, 01
represents the packet sent by xdp, 10 is the packet sent by xsk, and 00
is skb by default.

If the xmit work of xsk has not been completed, but the ring is full,
napi must first exit and wait for the ring to be available, so
need_wakeup is set. If __free_old_xmit is called first by start_xmit, we
can quickly wake up napi to execute xsk xmit work.

When recycling, we need to count the number of bytes sent, so put xsk
desc->len into the ptr pointer. Because ptr does not point to meaningful
objects in xsk.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 171 ++++++++++++++++++++++++++-------------
 1 file changed, 113 insertions(+), 58 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fac7d0020013..8318b89b2971 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -50,6 +50,9 @@ module_param(xsk_kick_thr, int, 0644);
 #define VIRTIO_XDP_REDIR	BIT(1)
 
 #define VIRTIO_XDP_FLAG	BIT(0)
+#define VIRTIO_XSK_FLAG	BIT(1)
+
+#define VIRTIO_XSK_PTR_SHIFT       4
 
 static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
 
@@ -147,6 +150,9 @@ struct send_queue {
 
 		/* save the desc for next xmit, when xmit fail. */
 		struct xdp_desc last_desc;
+
+		/* xsk wait for tx inter or softirq */
+		bool need_wakeup;
 	} xsk;
 };
 
@@ -266,6 +272,12 @@ struct padded_vnet_hdr {
 
 static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
 			   int budget, bool in_napi);
+static void virtnet_xsk_complete(struct send_queue *sq, u32 num);
+
+static bool is_skb_ptr(void *ptr)
+{
+	return !((unsigned long)ptr & (VIRTIO_XDP_FLAG | VIRTIO_XSK_FLAG));
+}
 
 static bool is_xdp_frame(void *ptr)
 {
@@ -277,11 +289,58 @@ static void *xdp_to_ptr(struct xdp_frame *ptr)
 	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
 }
 
+static void *xsk_to_ptr(struct xdp_desc *desc)
+{
+	/* save the desc len to ptr */
+	u64 p = desc->len << VIRTIO_XSK_PTR_SHIFT;
+
+	return (void *)(p | VIRTIO_XSK_FLAG);
+}
+
+static void ptr_to_xsk(void *ptr, struct xdp_desc *desc)
+{
+	desc->len = ((u64)ptr) >> VIRTIO_XSK_PTR_SHIFT;
+}
+
 static struct xdp_frame *ptr_to_xdp(void *ptr)
 {
 	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
 }
 
+static void __free_old_xmit(struct send_queue *sq, bool in_napi,
+			    struct virtnet_sq_stats *stats)
+{
+	unsigned int xsknum = 0;
+	unsigned int len;
+	void *ptr;
+
+	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		if (is_skb_ptr(ptr)) {
+			struct sk_buff *skb = ptr;
+
+			pr_debug("Sent skb %p\n", skb);
+
+			stats->bytes += skb->len;
+			napi_consume_skb(skb, in_napi);
+		} else if (is_xdp_frame(ptr)) {
+			struct xdp_frame *frame = ptr_to_xdp(ptr);
+
+			stats->bytes += frame->len;
+			xdp_return_frame(frame);
+		} else {
+			struct xdp_desc desc;
+
+			ptr_to_xsk(ptr, &desc);
+			stats->bytes += desc.len;
+			++xsknum;
+		}
+		stats->packets++;
+	}
+
+	if (xsknum)
+		virtnet_xsk_complete(sq, xsknum);
+}
+
 /* Converting between virtqueue no. and kernel tx/rx queue no.
  * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
  */
@@ -543,15 +602,12 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 			    int n, struct xdp_frame **frames, u32 flags)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtnet_sq_stats stats = {};
 	struct receive_queue *rq = vi->rq;
 	struct bpf_prog *xdp_prog;
 	struct send_queue *sq;
-	unsigned int len;
-	int packets = 0;
-	int bytes = 0;
 	int nxmit = 0;
 	int kicks = 0;
-	void *ptr;
 	int ret;
 	int i;
 
@@ -570,20 +626,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	}
 
 	/* Free up any pending old buffers before queueing new ones. */
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(is_xdp_frame(ptr))) {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
-
-			bytes += frame->len;
-			xdp_return_frame(frame);
-		} else {
-			struct sk_buff *skb = ptr;
-
-			bytes += skb->len;
-			napi_consume_skb(skb, false);
-		}
-		packets++;
-	}
+	__free_old_xmit(sq, false, &stats);
 
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
@@ -600,8 +643,8 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	}
 out:
 	u64_stats_update_begin(&sq->stats.syncp);
-	sq->stats.bytes += bytes;
-	sq->stats.packets += packets;
+	sq->stats.bytes += stats.bytes;
+	sq->stats.packets += stats.packets;
 	sq->stats.xdp_tx += n;
 	sq->stats.xdp_tx_drops += n - nxmit;
 	sq->stats.kicks += kicks;
@@ -1426,37 +1469,19 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
-	unsigned int len;
-	unsigned int packets = 0;
-	unsigned int bytes = 0;
-	void *ptr;
-
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(!is_xdp_frame(ptr))) {
-			struct sk_buff *skb = ptr;
+	struct virtnet_sq_stats stats = {};
 
-			pr_debug("Sent skb %p\n", skb);
-
-			bytes += skb->len;
-			napi_consume_skb(skb, in_napi);
-		} else {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
-
-			bytes += frame->len;
-			xdp_return_frame(frame);
-		}
-		packets++;
-	}
+	__free_old_xmit(sq, in_napi, &stats);
 
 	/* Avoid overhead when no packets have been processed
 	 * happens when called speculatively from start_xmit.
 	 */
-	if (!packets)
+	if (!stats.packets)
 		return;
 
 	u64_stats_update_begin(&sq->stats.syncp);
-	sq->stats.bytes += bytes;
-	sq->stats.packets += packets;
+	sq->stats.bytes += stats.bytes;
+	sq->stats.packets += stats.packets;
 	u64_stats_update_end(&sq->stats.syncp);
 }
 
@@ -2575,6 +2600,28 @@ static void virtnet_xsk_check_space(struct send_queue *sq)
 		netif_stop_subqueue(dev, qnum);
 }
 
+static void virtnet_xsk_complete(struct send_queue *sq, u32 num)
+{
+	struct xsk_buff_pool *pool;
+
+	rcu_read_lock();
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool) {
+		rcu_read_unlock();
+		return;
+	}
+
+	xsk_tx_completed(pool, num);
+
+	rcu_read_unlock();
+
+	if (sq->xsk.need_wakeup) {
+		sq->xsk.need_wakeup = false;
+		virtqueue_napi_schedule(&sq->napi, sq->vq);
+	}
+}
+
 static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 			    struct xdp_desc *desc)
 {
@@ -2613,7 +2660,8 @@ static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 			offset = 0;
 	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, NULL, GFP_ATOMIC);
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, xsk_to_ptr(desc),
+				   GFP_ATOMIC);
 	if (unlikely(err))
 		sq->xsk.last_desc = *desc;
 
@@ -2623,13 +2671,13 @@ static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 				  struct xsk_buff_pool *pool,
 				  unsigned int budget,
-				  bool in_napi, int *done)
+				  bool in_napi, int *done,
+				  struct virtnet_sq_stats *stats)
 {
 	struct xdp_desc desc;
 	int err, packet = 0;
 	int ret = -EAGAIN;
 	int need_kick = 0;
-	int kicks = 0;
 
 	if (sq->xsk.last_desc.addr) {
 		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
@@ -2665,7 +2713,7 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 		if (need_kick > xsk_kick_thr) {
 			if (virtqueue_kick_prepare(sq->vq) &&
 			    virtqueue_notify(sq->vq))
-				++kicks;
+				++stats->kicks;
 
 			need_kick = 0;
 		}
@@ -2675,15 +2723,11 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 		if (need_kick) {
 			if (virtqueue_kick_prepare(sq->vq) &&
 			    virtqueue_notify(sq->vq))
-				++kicks;
-		}
-		if (kicks) {
-			u64_stats_update_begin(&sq->stats.syncp);
-			sq->stats.kicks += kicks;
-			u64_stats_update_end(&sq->stats.syncp);
+				++stats->kicks;
 		}
 
 		*done = packet;
+		stats->xdp_tx += packet;
 
 		xsk_tx_release(pool);
 	}
@@ -2694,26 +2738,37 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
 			   int budget, bool in_napi)
 {
+	struct virtnet_sq_stats stats = {};
 	int done = 0;
 	int err;
 
-	free_old_xmit_skbs(sq, in_napi);
+	sq->xsk.need_wakeup = false;
+	__free_old_xmit(sq, in_napi, &stats);
 
-	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
+	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done, &stats);
 	/* -EAGAIN: done == budget
 	 * -EBUSY: done < budget
 	 *  0    : done < budget
 	 */
 	if (err == -EBUSY) {
-		free_old_xmit_skbs(sq, in_napi);
+		__free_old_xmit(sq, in_napi, &stats);
 
 		/* If the space is enough, let napi run again. */
 		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 			done = budget;
+		else
+			sq->xsk.need_wakeup = true;
 	}
 
 	virtnet_xsk_check_space(sq);
 
+	u64_stats_update_begin(&sq->stats.syncp);
+	sq->stats.packets += stats.packets;
+	sq->stats.bytes += stats.bytes;
+	sq->stats.kicks += stats.kicks;
+	sq->stats.xdp_tx += stats.xdp_tx;
+	u64_stats_update_end(&sq->stats.syncp);
+
 	return done;
 }
 
@@ -2991,9 +3046,9 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_frame(buf))
+			if (is_skb_ptr(buf))
 				dev_kfree_skb(buf);
-			else
+			else if (is_xdp_frame(buf))
 				xdp_return_frame(ptr_to_xdp(buf));
 		}
 	}
-- 
2.31.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup
  2021-03-31  7:11 ` [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup Xuan Zhuo
@ 2021-04-06  4:27   ` Jason Wang
  2021-04-07 10:02     ` Magnus Karlsson
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2021-04-06  4:27 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li


在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> xsk is a high-performance packet receiving and sending technology.
>
> This patch implements the binding and unbinding operations of xsk and
> the virtio-net queue for xsk zero copy xmit.
>
> The xsk zero copy xmit depends on tx napi. So if tx napi is not opened,
> an error will be reported. And the entire operation is under the
> protection of rtnl_lock
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 66 ++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index bb4ea9dbc16b..4e25408a2b37 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -22,6 +22,7 @@
>   #include <net/route.h>
>   #include <net/xdp.h>
>   #include <net/net_failover.h>
> +#include <net/xdp_sock_drv.h>
>   
>   static int napi_weight = NAPI_POLL_WEIGHT;
>   module_param(napi_weight, int, 0444);
> @@ -133,6 +134,11 @@ struct send_queue {
>   	struct virtnet_sq_stats stats;
>   
>   	struct napi_struct napi;
> +
> +	struct {
> +		/* xsk pool */
> +		struct xsk_buff_pool __rcu *pool;
> +	} xsk;
>   };
>   
>   /* Internal representation of a receive virtqueue */
> @@ -2526,11 +2532,71 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   	return err;
>   }
>   
> +static int virtnet_xsk_pool_enable(struct net_device *dev,
> +				   struct xsk_buff_pool *pool,
> +				   u16 qid)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq;
> +	int ret = -EBUSY;


I'd rather move this under the check of xsk.pool.


> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	/* xsk zerocopy depend on the tx napi */


Need more comments to explain why tx NAPI is required here.

And what's more important, tx NAPI could be enabled/disable via ethtool, 
what if the NAPI is disabled after xsk is enabled?


> +	if (!sq->napi.weight)
> +		return -EPERM;
> +
> +	rcu_read_lock();
> +	if (rcu_dereference(sq->xsk.pool))
> +		goto end;


Under what condition can we reach here?


> +
> +	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> +	 * safe.
> +	 */
> +	rcu_assign_pointer(sq->xsk.pool, pool);
> +	ret = 0;
> +end:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct send_queue *sq;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> +	 * safe.
> +	 */
> +	rcu_assign_pointer(sq->xsk.pool, NULL);
> +
> +	synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */


Since rtnl is held here, I guess it's better to use synchornize_net().


> +
> +	return 0;
> +}
> +
>   static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   {
>   	switch (xdp->command) {
>   	case XDP_SETUP_PROG:
>   		return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
> +	case XDP_SETUP_XSK_POOL:
> +		/* virtio net not use dma before call vring api */
> +		xdp->xsk.check_dma = false;


I think it's better not open code things like this. How about introduce 
new parameters in xp_assign_dev()?


> +		if (xdp->xsk.pool)
> +			return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
> +						       xdp->xsk.queue_id);
> +		else
> +			return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
>   	default:
>   		return -EINVAL;
>   	}


Thanks


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit
  2021-03-31  7:11 ` [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit Xuan Zhuo
@ 2021-04-06  6:19   ` Jason Wang
  2021-04-07  9:56     ` Magnus Karlsson
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2021-04-06  6:19 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li


在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> When the user calls sendto to consume the data in the xsk tx queue,
> virtnet_xsk_wakeup will be called.
>
> In wakeup, it will try to send a part of the data directly, the quantity
> is operated by the module parameter xsk_budget.


Any reason that we can't use NAPI budget?


>   There are two purposes
> for this realization:
>
> 1. Send part of the data quickly to reduce the transmission delay of the
>     first packet
> 2. Trigger tx interrupt, start napi to consume xsk tx data
>
> All sent xsk packets share the virtio-net header of xsk_hdr. If xsk
> needs to support csum and other functions later, consider assigning xsk
> hdr separately for each sent packet.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 183 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 183 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4e25408a2b37..c8a317a93ef7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -28,9 +28,11 @@ static int napi_weight = NAPI_POLL_WEIGHT;
>   module_param(napi_weight, int, 0444);
>   
>   static bool csum = true, gso = true, napi_tx = true;
> +static int xsk_budget = 32;
>   module_param(csum, bool, 0444);
>   module_param(gso, bool, 0444);
>   module_param(napi_tx, bool, 0644);
> +module_param(xsk_budget, int, 0644);
>   
>   /* FIXME: MTU in config. */
>   #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -47,6 +49,8 @@ module_param(napi_tx, bool, 0644);
>   
>   #define VIRTIO_XDP_FLAG	BIT(0)
>   
> +static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
> +
>   /* RX packet size EWMA. The average packet size is used to determine the packet
>    * buffer size when refilling RX rings. As the entire RX ring may be refilled
>    * at once, the weight is chosen so that the EWMA will be insensitive to short-
> @@ -138,6 +142,9 @@ struct send_queue {
>   	struct {
>   		/* xsk pool */
>   		struct xsk_buff_pool __rcu *pool;
> +
> +		/* save the desc for next xmit, when xmit fail. */
> +		struct xdp_desc last_desc;
>   	} xsk;
>   };
>   
> @@ -2532,6 +2539,179 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   	return err;
>   }
>   
> +static void virtnet_xsk_check_space(struct send_queue *sq)
> +{


The name is confusing, the function does more than just checking the 
space, it may stop the queue as well.


> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct net_device *dev = vi->dev;
> +	int qnum = sq - vi->sq;
> +
> +	/* If this sq is not the exclusive queue of the current cpu,
> +	 * then it may be called by start_xmit, so check it running out
> +	 * of space.
> +	 */


So the code can explain itself. We need a better comment to explain why 
we need to differ the case of the raw buffer queue.


> +	if (is_xdp_raw_buffer_queue(vi, qnum))
> +		return;
> +
> +	/* Stop the queue to avoid getting packets that we are
> +	 * then unable to transmit. Then wait the tx interrupt.
> +	 */
> +	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
> +		netif_stop_subqueue(dev, qnum);
> +}
> +
> +static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> +			    struct xdp_desc *desc)
> +{
> +	struct virtnet_info *vi;
> +	struct page *page;
> +	void *data;
> +	u32 offset;
> +	u64 addr;
> +	int err;
> +
> +	vi = sq->vq->vdev->priv;
> +	addr = desc->addr;
> +	data = xsk_buff_raw_get_data(pool, addr);
> +	offset = offset_in_page(data);
> +
> +	sg_init_table(sq->sg, 2);
> +	sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
> +	page = xsk_buff_xdp_get_page(pool, addr);
> +	sg_set_page(sq->sg + 1, page, desc->len, offset);
> +
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
> +	if (unlikely(err))
> +		sq->xsk.last_desc = *desc;


So I think it's better to make sure we had at least 2 slots to avoid 
handling errors like this? (Especially consider the queue size is not 
necessarily the power of 2 when packed virtqueue is used).


> +
> +	return err;
> +}
> +
> +static int virtnet_xsk_xmit_batch(struct send_queue *sq,
> +				  struct xsk_buff_pool *pool,
> +				  unsigned int budget,
> +				  bool in_napi, int *done)
> +{
> +	struct xdp_desc desc;
> +	int err, packet = 0;
> +	int ret = -EAGAIN;
> +
> +	if (sq->xsk.last_desc.addr) {


Any reason that num_free is not checked here?


> +		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> +		if (unlikely(err))
> +			return -EBUSY;
> +
> +		++packet;
> +		--budget;
> +		sq->xsk.last_desc.addr = 0;
> +	}
> +
> +	while (budget-- > 0) {
> +		if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +
> +		if (!xsk_tx_peek_desc(pool, &desc)) {
> +			/* done */
> +			ret = 0;
> +			break;
> +		}
> +
> +		err = virtnet_xsk_xmit(sq, pool, &desc);
> +		if (unlikely(err)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +
> +		++packet;
> +	}
> +
> +	if (packet) {
> +		if (virtqueue_kick_prepare(sq->vq) &&
> +		    virtqueue_notify(sq->vq)) {
> +			u64_stats_update_begin(&sq->stats.syncp);
> +			sq->stats.kicks += 1;
> +			u64_stats_update_end(&sq->stats.syncp);
> +		}
> +
> +		*done = packet;
> +
> +		xsk_tx_release(pool);
> +	}
> +
> +	return ret;
> +}
> +
> +static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
> +			   int budget, bool in_napi)
> +{
> +	int done = 0;
> +	int err;
> +
> +	free_old_xmit_skbs(sq, in_napi);
> +
> +	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
> +	/* -EAGAIN: done == budget
> +	 * -EBUSY: done < budget
> +	 *  0    : done < budget
> +	 */


Please move them to the comment above virtnet_xsk_xmit_batch().

And it looks to me there's no care for -EAGAIN, any reason for sticking 
a dedicated variable like that?


> +	if (err == -EBUSY) {
> +		free_old_xmit_skbs(sq, in_napi);
> +
> +		/* If the space is enough, let napi run again. */
> +		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> +			done = budget;


So I don't see how this can work in the case of event index where the 
notification needs to be enabled explicitly.


> +	}
> +
> +	virtnet_xsk_check_space(sq);
> +
> +	return done;
> +}
> +
> +static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct xsk_buff_pool *pool;
> +	struct netdev_queue *txq;
> +	struct send_queue *sq;
> +
> +	if (!netif_running(dev))
> +		return -ENETDOWN;
> +
> +	if (qid >= vi->curr_queue_pairs)
> +		return -EINVAL;
> +
> +	sq = &vi->sq[qid];
> +
> +	rcu_read_lock();
> +
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (!pool)
> +		goto end;
> +
> +	if (napi_if_scheduled_mark_missed(&sq->napi))
> +		goto end;
> +
> +	txq = netdev_get_tx_queue(dev, qid);
> +
> +	__netif_tx_lock_bh(txq);
> +
> +	/* Send part of the packet directly to reduce the delay in sending the
> +	 * packet, and this can actively trigger the tx interrupts.
> +	 *
> +	 * If no packet is sent out, the ring of the device is full. In this
> +	 * case, we will still get a tx interrupt response. Then we will deal
> +	 * with the subsequent packet sending work.
> +	 */
> +	virtnet_xsk_run(sq, pool, xsk_budget, false);


So the return value is ignored, this means there's no way to report we 
exhaust the budget. Is this intended?

Thanks


> +
> +	__netif_tx_unlock_bh(txq);
> +
> +end:
> +	rcu_read_unlock();
> +	return 0;
> +}
> +
>   static int virtnet_xsk_pool_enable(struct net_device *dev,
>   				   struct xsk_buff_pool *pool,
>   				   u16 qid)
> @@ -2553,6 +2733,8 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
>   	if (rcu_dereference(sq->xsk.pool))
>   		goto end;
>   
> +	memset(&sq->xsk, 0, sizeof(sq->xsk));
> +
>   	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
>   	 * safe.
>   	 */
> @@ -2656,6 +2838,7 @@ static const struct net_device_ops virtnet_netdev = {
>   	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
>   	.ndo_bpf		= virtnet_xdp,
>   	.ndo_xdp_xmit		= virtnet_xdp_xmit,
> +	.ndo_xsk_wakeup         = virtnet_xsk_wakeup,
>   	.ndo_features_check	= passthru_features_check,
>   	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
>   	.ndo_set_features	= virtnet_set_features,


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode
  2021-03-31  7:11 ` [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode Xuan Zhuo
@ 2021-04-06  6:55   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2021-04-06  6:55 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li


在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> In xsk unaligned mode, the frame pointed to by desc may span two
> consecutive pages, but not more than two pages.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>


I'd squash this patch into patch 4.


> ---
>   drivers/net/virtio_net.c | 30 ++++++++++++++++++++++++------
>   1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c8a317a93ef7..259fafcf6028 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2562,24 +2562,42 @@ static void virtnet_xsk_check_space(struct send_queue *sq)
>   static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
>   			    struct xdp_desc *desc)
>   {
> +	u32 offset, n, i, copy, copied;


Let's use a better name since we don't actually copy anything here.


>   	struct virtnet_info *vi;
>   	struct page *page;
>   	void *data;
> -	u32 offset;
> +	int err, m;
>   	u64 addr;
> -	int err;
>   
>   	vi = sq->vq->vdev->priv;
>   	addr = desc->addr;
> +
>   	data = xsk_buff_raw_get_data(pool, addr);
> +
>   	offset = offset_in_page(data);
> +	m = desc->len - (PAGE_SIZE - offset);
> +	/* xsk unaligned mode, desc will use two page */
> +	if (m > 0)
> +		n = 3;
> +	else
> +		n = 2;
>   
> -	sg_init_table(sq->sg, 2);
> +	sg_init_table(sq->sg, n);
>   	sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
> -	page = xsk_buff_xdp_get_page(pool, addr);
> -	sg_set_page(sq->sg + 1, page, desc->len, offset);
>   
> -	err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
> +	copied = 0;
> +	for (i = 1; i < n; ++i) {
> +		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
> +
> +		page = xsk_buff_xdp_get_page(pool, addr + copied);
> +
> +		sg_set_page(sq->sg + i, page, copy, offset);
> +		copied += copy;
> +		if (offset)
> +			offset = 0;
> +	}


Can we simplify the codes by using while here? Then I think we don't 
need to determine the value of n.

Thanks


> +
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, NULL, GFP_ATOMIC);
>   	if (unlikely(err))
>   		sq->xsk.last_desc = *desc;
>   


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 6/8] virtio-net: xsk zero copy xmit kick by threshold
  2021-03-31  7:11 ` [PATCH net-next v3 6/8] virtio-net: xsk zero copy xmit kick by threshold Xuan Zhuo
@ 2021-04-06  6:59   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2021-04-06  6:59 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li


在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> After testing, the performance of calling kick every time is not stable.
> And if all the packets are sent and kicked again, the performance is not
> good. So add a module parameter to specify how many packets are sent to
> call a kick.
>
> 8 is a relatively stable value with the best performance.


Please add some perf numbers here.

Thanks


>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 23 ++++++++++++++++++++---
>   1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 259fafcf6028..d7e95f55478d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -29,10 +29,12 @@ module_param(napi_weight, int, 0444);
>   
>   static bool csum = true, gso = true, napi_tx = true;
>   static int xsk_budget = 32;
> +static int xsk_kick_thr = 8;
>   module_param(csum, bool, 0444);
>   module_param(gso, bool, 0444);
>   module_param(napi_tx, bool, 0644);
>   module_param(xsk_budget, int, 0644);
> +module_param(xsk_kick_thr, int, 0644);
>   
>   /* FIXME: MTU in config. */
>   #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -2612,6 +2614,8 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   	struct xdp_desc desc;
>   	int err, packet = 0;
>   	int ret = -EAGAIN;
> +	int need_kick = 0;
> +	int kicks = 0;
>   
>   	if (sq->xsk.last_desc.addr) {
>   		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> @@ -2619,6 +2623,7 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   			return -EBUSY;
>   
>   		++packet;
> +		++need_kick;
>   		--budget;
>   		sq->xsk.last_desc.addr = 0;
>   	}
> @@ -2642,13 +2647,25 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   		}
>   
>   		++packet;
> +		++need_kick;
> +		if (need_kick > xsk_kick_thr) {
> +			if (virtqueue_kick_prepare(sq->vq) &&
> +			    virtqueue_notify(sq->vq))
> +				++kicks;
> +
> +			need_kick = 0;
> +		}
>   	}
>   
>   	if (packet) {
> -		if (virtqueue_kick_prepare(sq->vq) &&
> -		    virtqueue_notify(sq->vq)) {
> +		if (need_kick) {
> +			if (virtqueue_kick_prepare(sq->vq) &&
> +			    virtqueue_notify(sq->vq))
> +				++kicks;
> +		}
> +		if (kicks) {
>   			u64_stats_update_begin(&sq->stats.syncp);
> -			sq->stats.kicks += 1;
> +			sq->stats.kicks += kicks;
>   			u64_stats_update_end(&sq->stats.syncp);
>   		}
>   


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit
  2021-03-31  7:11 ` [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit Xuan Zhuo
@ 2021-04-06  7:03   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2021-04-06  7:03 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li


在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> poll tx call virtnet_xsk_run, then the data in the xsk tx queue will be
> continuously consumed by napi.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>


I think we need squash this into patch 4, it looks more like a bug fix 
to me.


> ---
>   drivers/net/virtio_net.c | 20 +++++++++++++++++---
>   1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d7e95f55478d..fac7d0020013 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -264,6 +264,9 @@ struct padded_vnet_hdr {
>   	char padding[4];
>   };
>   
> +static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
> +			   int budget, bool in_napi);
> +
>   static bool is_xdp_frame(void *ptr)
>   {
>   	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
> @@ -1553,7 +1556,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	struct send_queue *sq = container_of(napi, struct send_queue, napi);
>   	struct virtnet_info *vi = sq->vq->vdev->priv;
>   	unsigned int index = vq2txq(sq->vq);
> +	struct xsk_buff_pool *pool;
>   	struct netdev_queue *txq;
> +	int work = 0;
>   
>   	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
>   		/* We don't need to enable cb for XDP */
> @@ -1563,15 +1568,24 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   
>   	txq = netdev_get_tx_queue(vi->dev, index);
>   	__netif_tx_lock(txq, raw_smp_processor_id());
> -	free_old_xmit_skbs(sq, true);
> +	rcu_read_lock();
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (pool) {
> +		work = virtnet_xsk_run(sq, pool, budget, true);
> +		rcu_read_unlock();
> +	} else {
> +		rcu_read_unlock();
> +		free_old_xmit_skbs(sq, true);
> +	}
>   	__netif_tx_unlock(txq);
>   
> -	virtqueue_napi_complete(napi, sq->vq, 0);
> +	if (work < budget)
> +		virtqueue_napi_complete(napi, sq->vq, 0);
>   
>   	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   		netif_tx_wake_queue(txq);
>   
> -	return 0;
> +	return work;


Need a separate patch to "fix" the budget returned by poll_tx here.

Thanks


>   }
>   
>   static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk
  2021-03-31  7:11 ` [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk Xuan Zhuo
@ 2021-04-06  7:16   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2021-04-06  7:16 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li


在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> Based on the last two bit of ptr returned by virtqueue_get_buf, 01
> represents the packet sent by xdp, 10 is the packet sent by xsk, and 00
> is skb by default.
>
> If the xmit work of xsk has not been completed, but the ring is full,
> napi must first exit and wait for the ring to be available, so
> need_wakeup is set. If __free_old_xmit is called first by start_xmit, we
> can quickly wake up napi to execute xsk xmit work.
>
> When recycling, we need to count the number of bytes sent, so put xsk
> desc->len into the ptr pointer. Because ptr does not point to meaningful
> objects in xsk.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>


This needs to be squashed into patch 4.


> ---
>   drivers/net/virtio_net.c | 171 ++++++++++++++++++++++++++-------------
>   1 file changed, 113 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index fac7d0020013..8318b89b2971 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -50,6 +50,9 @@ module_param(xsk_kick_thr, int, 0644);
>   #define VIRTIO_XDP_REDIR	BIT(1)
>   
>   #define VIRTIO_XDP_FLAG	BIT(0)
> +#define VIRTIO_XSK_FLAG	BIT(1)
> +
> +#define VIRTIO_XSK_PTR_SHIFT       4
>   
>   static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
>   
> @@ -147,6 +150,9 @@ struct send_queue {
>   
>   		/* save the desc for next xmit, when xmit fail. */
>   		struct xdp_desc last_desc;
> +
> +		/* xsk wait for tx inter or softirq */
> +		bool need_wakeup;
>   	} xsk;
>   };
>   
> @@ -266,6 +272,12 @@ struct padded_vnet_hdr {
>   
>   static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
>   			   int budget, bool in_napi);
> +static void virtnet_xsk_complete(struct send_queue *sq, u32 num);
> +
> +static bool is_skb_ptr(void *ptr)
> +{
> +	return !((unsigned long)ptr & (VIRTIO_XDP_FLAG | VIRTIO_XSK_FLAG));
> +}
>   
>   static bool is_xdp_frame(void *ptr)
>   {
> @@ -277,11 +289,58 @@ static void *xdp_to_ptr(struct xdp_frame *ptr)
>   	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
>   }
>   
> +static void *xsk_to_ptr(struct xdp_desc *desc)
> +{
> +	/* save the desc len to ptr */
> +	u64 p = desc->len << VIRTIO_XSK_PTR_SHIFT;
> +
> +	return (void *)(p | VIRTIO_XSK_FLAG);
> +}
> +
> +static void ptr_to_xsk(void *ptr, struct xdp_desc *desc)
> +{
> +	desc->len = ((u64)ptr) >> VIRTIO_XSK_PTR_SHIFT;
> +}
> +
>   static struct xdp_frame *ptr_to_xdp(void *ptr)
>   {
>   	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
>   }
>   
> +static void __free_old_xmit(struct send_queue *sq, bool in_napi,
> +			    struct virtnet_sq_stats *stats)
> +{
> +	unsigned int xsknum = 0;
> +	unsigned int len;
> +	void *ptr;
> +
> +	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		if (is_skb_ptr(ptr)) {
> +			struct sk_buff *skb = ptr;
> +
> +			pr_debug("Sent skb %p\n", skb);
> +
> +			stats->bytes += skb->len;
> +			napi_consume_skb(skb, in_napi);
> +		} else if (is_xdp_frame(ptr)) {
> +			struct xdp_frame *frame = ptr_to_xdp(ptr);
> +
> +			stats->bytes += frame->len;
> +			xdp_return_frame(frame);
> +		} else {
> +			struct xdp_desc desc;
> +
> +			ptr_to_xsk(ptr, &desc);
> +			stats->bytes += desc.len;
> +			++xsknum;
> +		}
> +		stats->packets++;
> +	}
> +
> +	if (xsknum)
> +		virtnet_xsk_complete(sq, xsknum);
> +}
> +
>   /* Converting between virtqueue no. and kernel tx/rx queue no.
>    * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
>    */
> @@ -543,15 +602,12 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   			    int n, struct xdp_frame **frames, u32 flags)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_sq_stats stats = {};
>   	struct receive_queue *rq = vi->rq;
>   	struct bpf_prog *xdp_prog;
>   	struct send_queue *sq;
> -	unsigned int len;
> -	int packets = 0;
> -	int bytes = 0;
>   	int nxmit = 0;
>   	int kicks = 0;
> -	void *ptr;
>   	int ret;
>   	int i;
>   
> @@ -570,20 +626,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   	}
>   
>   	/* Free up any pending old buffers before queueing new ones. */
> -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> -		if (likely(is_xdp_frame(ptr))) {
> -			struct xdp_frame *frame = ptr_to_xdp(ptr);
> -
> -			bytes += frame->len;
> -			xdp_return_frame(frame);
> -		} else {
> -			struct sk_buff *skb = ptr;
> -
> -			bytes += skb->len;
> -			napi_consume_skb(skb, false);
> -		}
> -		packets++;
> -	}
> +	__free_old_xmit(sq, false, &stats);


Let's use a separate patch for this kind of factoring.


>   
>   	for (i = 0; i < n; i++) {
>   		struct xdp_frame *xdpf = frames[i];
> @@ -600,8 +643,8 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   	}
>   out:
>   	u64_stats_update_begin(&sq->stats.syncp);
> -	sq->stats.bytes += bytes;
> -	sq->stats.packets += packets;
> +	sq->stats.bytes += stats.bytes;
> +	sq->stats.packets += stats.packets;
>   	sq->stats.xdp_tx += n;
>   	sq->stats.xdp_tx_drops += n - nxmit;
>   	sq->stats.kicks += kicks;
> @@ -1426,37 +1469,19 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>   
>   static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>   {
> -	unsigned int len;
> -	unsigned int packets = 0;
> -	unsigned int bytes = 0;
> -	void *ptr;
> -
> -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> -		if (likely(!is_xdp_frame(ptr))) {
> -			struct sk_buff *skb = ptr;
> +	struct virtnet_sq_stats stats = {};
>   
> -			pr_debug("Sent skb %p\n", skb);
> -
> -			bytes += skb->len;
> -			napi_consume_skb(skb, in_napi);
> -		} else {
> -			struct xdp_frame *frame = ptr_to_xdp(ptr);
> -
> -			bytes += frame->len;
> -			xdp_return_frame(frame);
> -		}
> -		packets++;
> -	}
> +	__free_old_xmit(sq, in_napi, &stats);
>   
>   	/* Avoid overhead when no packets have been processed
>   	 * happens when called speculatively from start_xmit.
>   	 */
> -	if (!packets)
> +	if (!stats.packets)
>   		return;
>   
>   	u64_stats_update_begin(&sq->stats.syncp);
> -	sq->stats.bytes += bytes;
> -	sq->stats.packets += packets;
> +	sq->stats.bytes += stats.bytes;
> +	sq->stats.packets += stats.packets;
>   	u64_stats_update_end(&sq->stats.syncp);
>   }
>   
> @@ -2575,6 +2600,28 @@ static void virtnet_xsk_check_space(struct send_queue *sq)
>   		netif_stop_subqueue(dev, qnum);
>   }
>   
> +static void virtnet_xsk_complete(struct send_queue *sq, u32 num)
> +{
> +	struct xsk_buff_pool *pool;
> +
> +	rcu_read_lock();
> +
> +	pool = rcu_dereference(sq->xsk.pool);
> +	if (!pool) {
> +		rcu_read_unlock();
> +		return;
> +	}
> +
> +	xsk_tx_completed(pool, num);
> +
> +	rcu_read_unlock();
> +
> +	if (sq->xsk.need_wakeup) {
> +		sq->xsk.need_wakeup = false;
> +		virtqueue_napi_schedule(&sq->napi, sq->vq);
> +	}
> +}
> +
>   static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
>   			    struct xdp_desc *desc)
>   {
> @@ -2613,7 +2660,8 @@ static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
>   			offset = 0;
>   	}
>   
> -	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, NULL, GFP_ATOMIC);
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, xsk_to_ptr(desc),
> +				   GFP_ATOMIC);
>   	if (unlikely(err))
>   		sq->xsk.last_desc = *desc;
>   
> @@ -2623,13 +2671,13 @@ static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
>   static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   				  struct xsk_buff_pool *pool,
>   				  unsigned int budget,
> -				  bool in_napi, int *done)
> +				  bool in_napi, int *done,
> +				  struct virtnet_sq_stats *stats)
>   {
>   	struct xdp_desc desc;
>   	int err, packet = 0;
>   	int ret = -EAGAIN;
>   	int need_kick = 0;
> -	int kicks = 0;
>   
>   	if (sq->xsk.last_desc.addr) {
>   		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> @@ -2665,7 +2713,7 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   		if (need_kick > xsk_kick_thr) {
>   			if (virtqueue_kick_prepare(sq->vq) &&
>   			    virtqueue_notify(sq->vq))
> -				++kicks;
> +				++stats->kicks;
>   
>   			need_kick = 0;
>   		}
> @@ -2675,15 +2723,11 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   		if (need_kick) {
>   			if (virtqueue_kick_prepare(sq->vq) &&
>   			    virtqueue_notify(sq->vq))
> -				++kicks;
> -		}
> -		if (kicks) {
> -			u64_stats_update_begin(&sq->stats.syncp);
> -			sq->stats.kicks += kicks;
> -			u64_stats_update_end(&sq->stats.syncp);
> +				++stats->kicks;
>   		}
>   
>   		*done = packet;
> +		stats->xdp_tx += packet;
>   
>   		xsk_tx_release(pool);
>   	}
> @@ -2694,26 +2738,37 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
>   static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
>   			   int budget, bool in_napi)
>   {
> +	struct virtnet_sq_stats stats = {};
>   	int done = 0;
>   	int err;
>   
> -	free_old_xmit_skbs(sq, in_napi);
> +	sq->xsk.need_wakeup = false;
> +	__free_old_xmit(sq, in_napi, &stats);
>   
> -	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
> +	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done, &stats);
>   	/* -EAGAIN: done == budget
>   	 * -EBUSY: done < budget
>   	 *  0    : done < budget
>   	 */
>   	if (err == -EBUSY) {
> -		free_old_xmit_skbs(sq, in_napi);
> +		__free_old_xmit(sq, in_napi, &stats);
>   
>   		/* If the space is enough, let napi run again. */
>   		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   			done = budget;


So I don't see how NAPI can be rescheduled here.

Thanks


> +		else
> +			sq->xsk.need_wakeup = true;
>   	}
>   
>   	virtnet_xsk_check_space(sq);
>   
> +	u64_stats_update_begin(&sq->stats.syncp);
> +	sq->stats.packets += stats.packets;
> +	sq->stats.bytes += stats.bytes;
> +	sq->stats.kicks += stats.kicks;
> +	sq->stats.xdp_tx += stats.xdp_tx;
> +	u64_stats_update_end(&sq->stats.syncp);
> +
>   	return done;
>   }
>   
> @@ -2991,9 +3046,9 @@ static void free_unused_bufs(struct virtnet_info *vi)
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		struct virtqueue *vq = vi->sq[i].vq;
>   		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> -			if (!is_xdp_frame(buf))
> +			if (is_skb_ptr(buf))
>   				dev_kfree_skb(buf);
> -			else
> +			else if (is_xdp_frame(buf))
>   				xdp_return_frame(ptr_to_xdp(buf));
>   		}
>   	}


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit
  2021-04-06  6:19   ` Jason Wang
@ 2021-04-07  9:56     ` Magnus Karlsson
  0 siblings, 0 replies; 18+ messages in thread
From: Magnus Karlsson @ 2021-04-07  9:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Network Development, Michael S. Tsirkin,
	David S. Miller, Jakub Kicinski, Björn Töpel,
	Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, Dust Li

On Tue, Apr 6, 2021 at 3:33 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> > When the user calls sendto to consume the data in the xsk tx queue,
> > virtnet_xsk_wakeup will be called.
> >
> > In wakeup, it will try to send a part of the data directly, the quantity
> > is operated by the module parameter xsk_budget.
>
>
> Any reason that we can't use NAPI budget?

I think we should use NAPI budget here and skip this module parameter.
If a user would like to control the number of packets processed, then
the preferred busy_poll mode [1] can be used. In that patch series, a
new setsockopt option was introduced called SO_BUSY_POLL_BUDGET. With
it you can set the napi budget value without the need of a module
parameter.

[1]: https://lore.kernel.org/bpf/20201119083024.119566-1-bjorn.topel@gmail.com/T/

>
> >   There are two purposes
> > for this realization:
> >
> > 1. Send part of the data quickly to reduce the transmission delay of the
> >     first packet
> > 2. Trigger tx interrupt, start napi to consume xsk tx data
> >
> > All sent xsk packets share the virtio-net header of xsk_hdr. If xsk
> > needs to support csum and other functions later, consider assigning xsk
> > hdr separately for each sent packet.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 183 +++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 183 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 4e25408a2b37..c8a317a93ef7 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -28,9 +28,11 @@ static int napi_weight = NAPI_POLL_WEIGHT;
> >   module_param(napi_weight, int, 0444);
> >
> >   static bool csum = true, gso = true, napi_tx = true;
> > +static int xsk_budget = 32;
> >   module_param(csum, bool, 0444);
> >   module_param(gso, bool, 0444);
> >   module_param(napi_tx, bool, 0644);
> > +module_param(xsk_budget, int, 0644);
> >
> >   /* FIXME: MTU in config. */
> >   #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > @@ -47,6 +49,8 @@ module_param(napi_tx, bool, 0644);
> >
> >   #define VIRTIO_XDP_FLAG     BIT(0)
> >
> > +static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
> > +
> >   /* RX packet size EWMA. The average packet size is used to determine the packet
> >    * buffer size when refilling RX rings. As the entire RX ring may be refilled
> >    * at once, the weight is chosen so that the EWMA will be insensitive to short-
> > @@ -138,6 +142,9 @@ struct send_queue {
> >       struct {
> >               /* xsk pool */
> >               struct xsk_buff_pool __rcu *pool;
> > +
> > +             /* save the desc for next xmit, when xmit fail. */
> > +             struct xdp_desc last_desc;
> >       } xsk;
> >   };
> >
> > @@ -2532,6 +2539,179 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> >       return err;
> >   }
> >
> > +static void virtnet_xsk_check_space(struct send_queue *sq)
> > +{
>
>
> The name is confusing, the function does more than just checking the
> space, it may stop the queue as well.
>
>
> > +     struct virtnet_info *vi = sq->vq->vdev->priv;
> > +     struct net_device *dev = vi->dev;
> > +     int qnum = sq - vi->sq;
> > +
> > +     /* If this sq is not the exclusive queue of the current cpu,
> > +      * then it may be called by start_xmit, so check it running out
> > +      * of space.
> > +      */
>
>
> So the code can explain itself. We need a better comment to explain why
> we need to differ the case of the raw buffer queue.
>
>
> > +     if (is_xdp_raw_buffer_queue(vi, qnum))
> > +             return;
> > +
> > +     /* Stop the queue to avoid getting packets that we are
> > +      * then unable to transmit. Then wait the tx interrupt.
> > +      */
> > +     if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
> > +             netif_stop_subqueue(dev, qnum);
> > +}
> > +
> > +static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> > +                         struct xdp_desc *desc)
> > +{
> > +     struct virtnet_info *vi;
> > +     struct page *page;
> > +     void *data;
> > +     u32 offset;
> > +     u64 addr;
> > +     int err;
> > +
> > +     vi = sq->vq->vdev->priv;
> > +     addr = desc->addr;
> > +     data = xsk_buff_raw_get_data(pool, addr);
> > +     offset = offset_in_page(data);
> > +
> > +     sg_init_table(sq->sg, 2);
> > +     sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
> > +     page = xsk_buff_xdp_get_page(pool, addr);
> > +     sg_set_page(sq->sg + 1, page, desc->len, offset);
> > +
> > +     err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
> > +     if (unlikely(err))
> > +             sq->xsk.last_desc = *desc;
>
>
> So I think it's better to make sure we had at least 2 slots to avoid
> handling errors like this? (Especially consider the queue size is not
> necessarily the power of 2 when packed virtqueue is used).
>
>
> > +
> > +     return err;
> > +}
> > +
> > +static int virtnet_xsk_xmit_batch(struct send_queue *sq,
> > +                               struct xsk_buff_pool *pool,
> > +                               unsigned int budget,
> > +                               bool in_napi, int *done)
> > +{
> > +     struct xdp_desc desc;
> > +     int err, packet = 0;
> > +     int ret = -EAGAIN;
> > +
> > +     if (sq->xsk.last_desc.addr) {
>
>
> Any reason that num_free is not checked here?
>
>
> > +             err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
> > +             if (unlikely(err))
> > +                     return -EBUSY;
> > +
> > +             ++packet;
> > +             --budget;
> > +             sq->xsk.last_desc.addr = 0;
> > +     }
> > +
> > +     while (budget-- > 0) {
> > +             if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
> > +                     ret = -EBUSY;
> > +                     break;
> > +             }
> > +
> > +             if (!xsk_tx_peek_desc(pool, &desc)) {
> > +                     /* done */
> > +                     ret = 0;
> > +                     break;
> > +             }
> > +
> > +             err = virtnet_xsk_xmit(sq, pool, &desc);
> > +             if (unlikely(err)) {
> > +                     ret = -EBUSY;
> > +                     break;
> > +             }
> > +
> > +             ++packet;
> > +     }
> > +
> > +     if (packet) {
> > +             if (virtqueue_kick_prepare(sq->vq) &&
> > +                 virtqueue_notify(sq->vq)) {
> > +                     u64_stats_update_begin(&sq->stats.syncp);
> > +                     sq->stats.kicks += 1;
> > +                     u64_stats_update_end(&sq->stats.syncp);
> > +             }
> > +
> > +             *done = packet;
> > +
> > +             xsk_tx_release(pool);
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> > +static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
> > +                        int budget, bool in_napi)
> > +{
> > +     int done = 0;
> > +     int err;
> > +
> > +     free_old_xmit_skbs(sq, in_napi);
> > +
> > +     err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
> > +     /* -EAGAIN: done == budget
> > +      * -EBUSY: done < budget
> > +      *  0    : done < budget
> > +      */
>
>
> Please move them to the comment above virtnet_xsk_xmit_batch().
>
> And it looks to me there's no care for -EAGAIN, any reason for sticking
> a dedicated variable like that?
>
>
> > +     if (err == -EBUSY) {
> > +             free_old_xmit_skbs(sq, in_napi);
> > +
> > +             /* If the space is enough, let napi run again. */
> > +             if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > +                     done = budget;
>
>
> So I don't see how this can work in the case of event index where the
> notification needs to be enabled explicitly.
>
>
> > +     }
> > +
> > +     virtnet_xsk_check_space(sq);
> > +
> > +     return done;
> > +}
> > +
> > +static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
> > +{
> > +     struct virtnet_info *vi = netdev_priv(dev);
> > +     struct xsk_buff_pool *pool;
> > +     struct netdev_queue *txq;
> > +     struct send_queue *sq;
> > +
> > +     if (!netif_running(dev))
> > +             return -ENETDOWN;
> > +
> > +     if (qid >= vi->curr_queue_pairs)
> > +             return -EINVAL;
> > +
> > +     sq = &vi->sq[qid];
> > +
> > +     rcu_read_lock();
> > +
> > +     pool = rcu_dereference(sq->xsk.pool);
> > +     if (!pool)
> > +             goto end;
> > +
> > +     if (napi_if_scheduled_mark_missed(&sq->napi))
> > +             goto end;
> > +
> > +     txq = netdev_get_tx_queue(dev, qid);
> > +
> > +     __netif_tx_lock_bh(txq);
> > +
> > +     /* Send part of the packet directly to reduce the delay in sending the
> > +      * packet, and this can actively trigger the tx interrupts.
> > +      *
> > +      * If no packet is sent out, the ring of the device is full. In this
> > +      * case, we will still get a tx interrupt response. Then we will deal
> > +      * with the subsequent packet sending work.
> > +      */
> > +     virtnet_xsk_run(sq, pool, xsk_budget, false);
>
>
> So the return value is ignored, this means there's no way to report we
> exhaust the budget. Is this intended?
>
> Thanks
>
>
> > +
> > +     __netif_tx_unlock_bh(txq);
> > +
> > +end:
> > +     rcu_read_unlock();
> > +     return 0;
> > +}
> > +
> >   static int virtnet_xsk_pool_enable(struct net_device *dev,
> >                                  struct xsk_buff_pool *pool,
> >                                  u16 qid)
> > @@ -2553,6 +2733,8 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
> >       if (rcu_dereference(sq->xsk.pool))
> >               goto end;
> >
> > +     memset(&sq->xsk, 0, sizeof(sq->xsk));
> > +
> >       /* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> >        * safe.
> >        */
> > @@ -2656,6 +2838,7 @@ static const struct net_device_ops virtnet_netdev = {
> >       .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> >       .ndo_bpf                = virtnet_xdp,
> >       .ndo_xdp_xmit           = virtnet_xdp_xmit,
> > +     .ndo_xsk_wakeup         = virtnet_xsk_wakeup,
> >       .ndo_features_check     = passthru_features_check,
> >       .ndo_get_phys_port_name = virtnet_get_phys_port_name,
> >       .ndo_set_features       = virtnet_set_features,
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup
  2021-04-06  4:27   ` Jason Wang
@ 2021-04-07 10:02     ` Magnus Karlsson
  0 siblings, 0 replies; 18+ messages in thread
From: Magnus Karlsson @ 2021-04-07 10:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Network Development, Michael S. Tsirkin,
	David S. Miller, Jakub Kicinski, Björn Töpel,
	Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, Dust Li

On Tue, Apr 6, 2021 at 3:13 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/3/31 下午3:11, Xuan Zhuo 写道:
> > xsk is a high-performance packet receiving and sending technology.
> >
> > This patch implements the binding and unbinding operations of xsk and
> > the virtio-net queue for xsk zero copy xmit.
> >
> > The xsk zero copy xmit depends on tx napi. So if tx napi is not opened,
> > an error will be reported. And the entire operation is under the
> > protection of rtnl_lock
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 66 ++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 66 insertions(+)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index bb4ea9dbc16b..4e25408a2b37 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -22,6 +22,7 @@
> >   #include <net/route.h>
> >   #include <net/xdp.h>
> >   #include <net/net_failover.h>
> > +#include <net/xdp_sock_drv.h>
> >
> >   static int napi_weight = NAPI_POLL_WEIGHT;
> >   module_param(napi_weight, int, 0444);
> > @@ -133,6 +134,11 @@ struct send_queue {
> >       struct virtnet_sq_stats stats;
> >
> >       struct napi_struct napi;
> > +
> > +     struct {
> > +             /* xsk pool */
> > +             struct xsk_buff_pool __rcu *pool;
> > +     } xsk;
> >   };
> >
> >   /* Internal representation of a receive virtqueue */
> > @@ -2526,11 +2532,71 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> >       return err;
> >   }
> >
> > +static int virtnet_xsk_pool_enable(struct net_device *dev,
> > +                                struct xsk_buff_pool *pool,
> > +                                u16 qid)
> > +{
> > +     struct virtnet_info *vi = netdev_priv(dev);
> > +     struct send_queue *sq;
> > +     int ret = -EBUSY;
>
>
> I'd rather move this under the check of xsk.pool.
>
>
> > +
> > +     if (qid >= vi->curr_queue_pairs)
> > +             return -EINVAL;
> > +
> > +     sq = &vi->sq[qid];
> > +
> > +     /* xsk zerocopy depend on the tx napi */
>
>
> Need more comments to explain why tx NAPI is required here.
>
> And what's more important, tx NAPI could be enabled/disable via ethtool,
> what if the NAPI is disabled after xsk is enabled?
>
>
> > +     if (!sq->napi.weight)
> > +             return -EPERM;
> > +
> > +     rcu_read_lock();
> > +     if (rcu_dereference(sq->xsk.pool))
> > +             goto end;
>
>
> Under what condition can we reach here?

There is already code in the common xsk part that tests for binding
multiple sockets to the same netdev and queue id (in an illegal way
that is). Does this code not work for you? If so, we should fix that
and not introduce a separate check down in the driver. Or maybe I
misunderstand your problem.

The code is here:

struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
                                            u16 queue_id)
{
        if (queue_id < dev->real_num_rx_queues)
                return dev->_rx[queue_id].pool;
        if (queue_id < dev->real_num_tx_queues)
                return dev->_tx[queue_id].pool;

        return NULL;
}

int xp_assign_dev(struct xsk_buff_pool *pool,
                  struct net_device *netdev, u16 queue_id, u16 flags)
{
        :
        :
        if (xsk_get_pool_from_qid(netdev, queue_id))
                return -EBUSY;


>
> > +
> > +     /* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> > +      * safe.
> > +      */
> > +     rcu_assign_pointer(sq->xsk.pool, pool);
> > +     ret = 0;
> > +end:
> > +     rcu_read_unlock();
> > +
> > +     return ret;
> > +}
> > +
> > +static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
> > +{
> > +     struct virtnet_info *vi = netdev_priv(dev);
> > +     struct send_queue *sq;
> > +
> > +     if (qid >= vi->curr_queue_pairs)
> > +             return -EINVAL;
> > +
> > +     sq = &vi->sq[qid];
> > +
> > +     /* Here is already protected by rtnl_lock, so rcu_assign_pointer is
> > +      * safe.
> > +      */
> > +     rcu_assign_pointer(sq->xsk.pool, NULL);
> > +
> > +     synchronize_rcu(); /* Sync with the XSK wakeup and with NAPI. */
>
>
> Since rtnl is held here, I guess it's better to use synchornize_net().
>
>
> > +
> > +     return 0;
> > +}
> > +
> >   static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >   {
> >       switch (xdp->command) {
> >       case XDP_SETUP_PROG:
> >               return virtnet_xdp_set(dev, xdp->prog, xdp->extack);
> > +     case XDP_SETUP_XSK_POOL:
> > +             /* virtio net not use dma before call vring api */
> > +             xdp->xsk.check_dma = false;
>
>
> I think it's better not open code things like this. How about introduce
> new parameters in xp_assign_dev()?
>
>
> > +             if (xdp->xsk.pool)
> > +                     return virtnet_xsk_pool_enable(dev, xdp->xsk.pool,
> > +                                                    xdp->xsk.queue_id);
> > +             else
> > +                     return virtnet_xsk_pool_disable(dev, xdp->xsk.queue_id);
> >       default:
> >               return -EINVAL;
> >       }
>
>
> Thanks
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit
       [not found] <1617786614.454336-5-xuanzhuo@linux.alibaba.com>
@ 2021-04-07  9:16 ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2021-04-07  9:16 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf, Dust Li, netdev


在 2021/4/7 下午5:10, Xuan Zhuo 写道:
> On Tue, 6 Apr 2021 15:03:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> 在 2021/3/31 下午3:11, Xuan Zhuo 写道:
>>> poll tx call virtnet_xsk_run, then the data in the xsk tx queue will be
>>> continuously consumed by napi.
>>>
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
>>
>> I think we need squash this into patch 4, it looks more like a bug fix
>> to me.
>>
>>
>>> ---
>>>    drivers/net/virtio_net.c | 20 +++++++++++++++++---
>>>    1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index d7e95f55478d..fac7d0020013 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -264,6 +264,9 @@ struct padded_vnet_hdr {
>>>    	char padding[4];
>>>    };
>>>
>>> +static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
>>> +			   int budget, bool in_napi);
>>> +
>>>    static bool is_xdp_frame(void *ptr)
>>>    {
>>>    	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
>>> @@ -1553,7 +1556,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>    	struct send_queue *sq = container_of(napi, struct send_queue, napi);
>>>    	struct virtnet_info *vi = sq->vq->vdev->priv;
>>>    	unsigned int index = vq2txq(sq->vq);
>>> +	struct xsk_buff_pool *pool;
>>>    	struct netdev_queue *txq;
>>> +	int work = 0;
>>>
>>>    	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
>>>    		/* We don't need to enable cb for XDP */
>>> @@ -1563,15 +1568,24 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>
>>>    	txq = netdev_get_tx_queue(vi->dev, index);
>>>    	__netif_tx_lock(txq, raw_smp_processor_id());
>>> -	free_old_xmit_skbs(sq, true);
>>> +	rcu_read_lock();
>>> +	pool = rcu_dereference(sq->xsk.pool);
>>> +	if (pool) {
>>> +		work = virtnet_xsk_run(sq, pool, budget, true);
>>> +		rcu_read_unlock();
>>> +	} else {
>>> +		rcu_read_unlock();
>>> +		free_old_xmit_skbs(sq, true);
>>> +	}
>>>    	__netif_tx_unlock(txq);
>>>
>>> -	virtqueue_napi_complete(napi, sq->vq, 0);
>>> +	if (work < budget)
>>> +		virtqueue_napi_complete(napi, sq->vq, 0);
>>>
>>>    	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>>>    		netif_tx_wake_queue(txq);
>>>
>>> -	return 0;
>>> +	return work;
>>
>> Need a separate patch to "fix" the budget returned by poll_tx here.
> I will merge #5 #7 #8 into #4, which is indeed more reasonable, but maybe this
> patch will be too big.
>
> But I don't understand, what you are talking about here, what is the separate
> patch, when this is squashed into patch 4?


So you modify the behaviour of NAPI poll to return the number of work 
now (0 is returned previously). Do we need to do that for non XSK part 
as well (which seems to be the behaviour of other nic driver)? If yes, 
this part should be a separated patch to be more bisect friendly.

Thanks


>
>> Thanks
>>
>>
>>>    }
>>>
>>>    static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, back to index

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31  7:11 [PATCH net-next v3 0/8] virtio-net support xdp socket zero copy xmit Xuan Zhuo
2021-03-31  7:11 ` [PATCH net-next v3 1/8] xsk: XDP_SETUP_XSK_POOL support option check_dma Xuan Zhuo
2021-03-31  7:11 ` [PATCH net-next v3 2/8] xsk: support get page by addr Xuan Zhuo
2021-03-31  7:11 ` [PATCH net-next v3 3/8] virtio-net: xsk zero copy xmit setup Xuan Zhuo
2021-04-06  4:27   ` Jason Wang
2021-04-07 10:02     ` Magnus Karlsson
2021-03-31  7:11 ` [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit Xuan Zhuo
2021-04-06  6:19   ` Jason Wang
2021-04-07  9:56     ` Magnus Karlsson
2021-03-31  7:11 ` [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode Xuan Zhuo
2021-04-06  6:55   ` Jason Wang
2021-03-31  7:11 ` [PATCH net-next v3 6/8] virtio-net: xsk zero copy xmit kick by threshold Xuan Zhuo
2021-04-06  6:59   ` Jason Wang
2021-03-31  7:11 ` [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit Xuan Zhuo
2021-04-06  7:03   ` Jason Wang
2021-03-31  7:11 ` [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk Xuan Zhuo
2021-04-06  7:16   ` Jason Wang
     [not found] <1617786614.454336-5-xuanzhuo@linux.alibaba.com>
2021-04-07  9:16 ` [PATCH net-next v3 7/8] virtio-net: poll tx call xsk zerocopy xmit Jason Wang

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git