bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/3] xsk: build skb by page
@ 2021-01-19  9:45 Xuan Zhuo
  2021-01-19  9:45 ` [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear Xuan Zhuo
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-19  9:45 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

v2:
    1. add priv_flags IFF_TX_SKB_NO_LINEAR instead of netdev_feature
    2. split the patch to three:
        a. add priv_flags IFF_TX_SKB_NO_LINEAR
        b. virtio net add priv_flags IFF_TX_SKB_NO_LINEAR
        c. When there is support this flag, construct skb without linear space
    3. use ERR_PTR() and PTR_ERR() to handle the err


v1 message log:
---------------

This patch is used to construct skb based on page to save memory copy
overhead.

This has one problem:

We construct the skb by fill the data page as a frag into the skb. In
this way, the linear space is empty, and the header information is also
in the frag, not in the linear space, which is not allowed for some
network cards. For example, Mellanox Technologies MT27710 Family
[ConnectX-4 Lx] will get the following error message:

    mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
    00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
    WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
    00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
    00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb

I also tried to use build_skb to construct skb, but because of the
existence of skb_shinfo, it must be behind the linear space, so this
method is not working. We can't put skb_shinfo on desc->addr, it will be
exposed to users, this is not safe.

Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
network card supports the header information of the packet in the frag
and not in the linear space.

---------------- Performance Testing ------------

The test environment is Aliyun ECS server.
Test cmd:
```
xdpsock -i eth0 -t  -S -s <msg size>
```

Test result data:

size    64      512     1024    1500
copy    1916747 1775988 1600203 1440054
page    1974058 1953655 1945463 1904478
percent 3.0%    10.0%   21.58%  32.3%


Xuan Zhuo (3):
  net: add priv_flags for allow tx skb without linear
  virtio-net: support IFF_TX_SKB_NO_LINEAR
  xsk: build skb by page

 drivers/net/virtio_net.c  |   3 +-
 include/linux/netdevice.h |   3 ++
 net/xdp/xsk.c             | 112 ++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 99 insertions(+), 19 deletions(-)

--
1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear
  2021-01-19  9:45 [PATCH bpf-next v2 0/3] xsk: build skb by page Xuan Zhuo
@ 2021-01-19  9:45 ` Xuan Zhuo
  2021-01-20  3:01   ` Jason Wang
  2021-01-19  9:45 ` [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR Xuan Zhuo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-19  9:45 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

In some cases, we hope to construct skb directly based on the existing
memory without copying data. In this case, the page will be placed
directly in the skb, and the linear space of skb is empty. But
unfortunately, many the network card does not support this operation.
For example Mellanox Technologies MT27710 Family [ConnectX-4 Lx] will
get the following error message:

    mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
    00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
    WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
    00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
    00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb

So a priv_flag is added here to indicate whether the network card
supports this feature.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Suggested-by: Alexander Lobakin <alobakin@pm.me>
---
 include/linux/netdevice.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 02dcef4..54501eb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1525,6 +1525,7 @@ struct net_device_ops {
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
  * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
  * @IFF_LIVE_RENAME_OK: rename is allowed while device is up and running
+ * @IFF_TX_SKB_NO_LINEAR: allow tx skb linear is empty
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1558,6 +1559,7 @@ enum netdev_priv_flags {
 	IFF_FAILOVER_SLAVE		= 1<<28,
 	IFF_L3MDEV_RX_HANDLER		= 1<<29,
 	IFF_LIVE_RENAME_OK		= 1<<30,
+	IFF_TX_SKB_NO_LINEAR		= 1<<31,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1590,6 +1592,7 @@ enum netdev_priv_flags {
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
 #define IFF_L3MDEV_RX_HANDLER		IFF_L3MDEV_RX_HANDLER
 #define IFF_LIVE_RENAME_OK		IFF_LIVE_RENAME_OK
+#define IFF_TX_SKB_NO_LINEAR		IFF_TX_SKB_NO_LINEAR
 
 /**
  *	struct net_device - The DEVICE structure.
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR
  2021-01-19  9:45 [PATCH bpf-next v2 0/3] xsk: build skb by page Xuan Zhuo
  2021-01-19  9:45 ` [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear Xuan Zhuo
@ 2021-01-19  9:45 ` Xuan Zhuo
  2021-01-19 14:28   ` Alexander Lobakin
  2021-01-19  9:45 ` [PATCH bpf-next " Xuan Zhuo
  2021-01-19  9:50 ` [PATCH bpf-next v2 0/3] " Michael S. Tsirkin
  3 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-19  9:45 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

Virtio net supports the case where the skb linear space is empty, so add
priv_flags.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba8e637..80d637f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		return -ENOMEM;
 
 	/* Set up network device as normal. */
-	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
+		IFF_TX_SKB_NO_LINEAR;
 	dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next v2 3/3] xsk: build skb by page
  2021-01-19  9:45 [PATCH bpf-next v2 0/3] xsk: build skb by page Xuan Zhuo
  2021-01-19  9:45 ` [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear Xuan Zhuo
  2021-01-19  9:45 ` [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR Xuan Zhuo
@ 2021-01-19  9:45 ` Xuan Zhuo
  2021-01-19 14:43   ` Alexander Lobakin
  2021-01-19  9:50 ` [PATCH bpf-next v2 0/3] " Michael S. Tsirkin
  3 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-19  9:45 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

This patch is used to construct skb based on page to save memory copy
overhead.

This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
directly construct skb. If this feature is not supported, it is still
necessary to copy data to construct skb.

---------------- Performance Testing ------------

The test environment is Aliyun ECS server.
Test cmd:
```
xdpsock -i eth0 -t  -S -s <msg size>
```

Test result data:

size    64      512     1024    1500
copy    1916747 1775988 1600203 1440054
page    1974058 1953655 1945463 1904478
percent 3.0%    10.0%   21.58%  32.3%

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 net/xdp/xsk.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 94 insertions(+), 18 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8037b04..8c291f8 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -430,6 +430,95 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
+					      struct xdp_desc *desc)
+{
+	u32 len, offset, copy, copied;
+	struct sk_buff *skb;
+	struct page *page;
+	char *buffer;
+	int err = 0, i;
+	u64 addr;
+
+	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
+	if (unlikely(!skb))
+		return NULL;
+
+	addr = desc->addr;
+	len = desc->len;
+
+	buffer = xsk_buff_raw_get_data(xs->pool, addr);
+	offset = offset_in_page(buffer);
+	addr = buffer - (char *)xs->pool->addrs;
+
+	for (copied = 0, i = 0; copied < len; ++i) {
+		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
+
+		get_page(page);
+
+		copy = min((u32)(PAGE_SIZE - offset), len - copied);
+
+		skb_fill_page_desc(skb, i, page, offset, copy);
+
+		copied += copy;
+		addr += copy;
+		offset = 0;
+	}
+
+	skb->len += len;
+	skb->data_len += len;
+	skb->truesize += len;
+
+	refcount_add(len, &xs->sk.sk_wmem_alloc);
+
+	return skb;
+}
+
+static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
+				     struct xdp_desc *desc)
+{
+	struct sk_buff *skb = NULL;
+	int err = -ENOMEM;
+
+	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+		skb = xsk_build_skb_zerocopy(xs, desc);
+		if (unlikely(!skb))
+			goto err;
+	} else {
+		char *buffer;
+		u64 addr;
+		u32 len;
+		int err;
+
+		len = desc->len;
+		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
+		if (unlikely(!skb))
+			goto err;
+
+		skb_put(skb, len);
+		addr = desc->addr;
+		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
+		err = skb_store_bits(skb, 0, buffer, len);
+
+		if (unlikely(err)) {
+			err = -EINVAL;
+			goto err;
+		}
+	}
+
+	skb->dev = xs->dev;
+	skb->priority = xs->sk.sk_priority;
+	skb->mark = xs->sk.sk_mark;
+	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
+	skb->destructor = xsk_destruct_skb;
+
+	return skb;
+
+err:
+	kfree_skb(skb);
+	return ERR_PTR(err);
+}
+
 static int xsk_generic_xmit(struct sock *sk)
 {
 	struct xdp_sock *xs = xdp_sk(sk);
@@ -446,43 +535,30 @@ static int xsk_generic_xmit(struct sock *sk)
 		goto out;
 
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
-		char *buffer;
-		u64 addr;
-		u32 len;
-
 		if (max_batch-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
 
-		len = desc.len;
-		skb = sock_alloc_send_skb(sk, len, 1, &err);
-		if (unlikely(!skb))
+		skb = xsk_build_skb(xs, &desc);
+		if (IS_ERR(skb)) {
+			err = PTR_ERR(skb);
 			goto out;
+		}
 
-		skb_put(skb, len);
-		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->pool, addr);
-		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
 		spin_lock_irqsave(&xs->pool->cq_lock, flags);
-		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
+		if (xskq_prod_reserve(xs->pool->cq)) {
 			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 			kfree_skb(skb);
 			goto out;
 		}
 		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 
-		skb->dev = xs->dev;
-		skb->priority = sk->sk_priority;
-		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
-		skb->destructor = xsk_destruct_skb;
-
 		err = __dev_direct_xmit(skb, xs->queue_id);
 		if  (err == NETDEV_TX_BUSY) {
 			/* Tell user-space to retry the send */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v2 0/3] xsk: build skb by page
  2021-01-19  9:45 [PATCH bpf-next v2 0/3] xsk: build skb by page Xuan Zhuo
                   ` (2 preceding siblings ...)
  2021-01-19  9:45 ` [PATCH bpf-next " Xuan Zhuo
@ 2021-01-19  9:50 ` Michael S. Tsirkin
       [not found]   ` <1611053609.502882-1-xuanzhuo@linux.alibaba.com>
  3 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2021-01-19  9:50 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

On Tue, Jan 19, 2021 at 05:45:09PM +0800, Xuan Zhuo wrote:
> v2:
>     1. add priv_flags IFF_TX_SKB_NO_LINEAR instead of netdev_feature
>     2. split the patch to three:
>         a. add priv_flags IFF_TX_SKB_NO_LINEAR
>         b. virtio net add priv_flags IFF_TX_SKB_NO_LINEAR
>         c. When there is support this flag, construct skb without linear space
>     3. use ERR_PTR() and PTR_ERR() to handle the err
> 
> 
> v1 message log:
> ---------------
> 
> This patch is used to construct skb based on page to save memory copy
> overhead.
> 
> This has one problem:
> 
> We construct the skb by fill the data page as a frag into the skb. In
> this way, the linear space is empty, and the header information is also
> in the frag, not in the linear space, which is not allowed for some
> network cards. For example, Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx] will get the following error message:
> 
>     mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
>     00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
>     WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
>     00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
>     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
>     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> 
> I also tried to use build_skb to construct skb, but because of the
> existence of skb_shinfo, it must be behind the linear space, so this
> method is not working. We can't put skb_shinfo on desc->addr, it will be
> exposed to users, this is not safe.
> 
> Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> network card supports the header information of the packet in the frag
> and not in the linear space.
> 
> ---------------- Performance Testing ------------
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
> 
> Test result data:
> 
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%

Just making sure, are these test results with v2?

> 
> Xuan Zhuo (3):
>   net: add priv_flags for allow tx skb without linear
>   virtio-net: support IFF_TX_SKB_NO_LINEAR
>   xsk: build skb by page
> 
>  drivers/net/virtio_net.c  |   3 +-
>  include/linux/netdevice.h |   3 ++
>  net/xdp/xsk.c             | 112 ++++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 99 insertions(+), 19 deletions(-)
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v2 0/3] xsk: build skb by page
       [not found]   ` <1611053609.502882-1-xuanzhuo@linux.alibaba.com>
@ 2021-01-19 11:02     ` Michael S. Tsirkin
  0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2021-01-19 11:02 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Jason Wang, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf

On Tue, Jan 19, 2021 at 06:53:29PM +0800, Xuan Zhuo wrote:
> On Tue, 19 Jan 2021 04:50:30 -0500, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Jan 19, 2021 at 05:45:09PM +0800, Xuan Zhuo wrote:
> > > v2:
> > >     1. add priv_flags IFF_TX_SKB_NO_LINEAR instead of netdev_feature
> > >     2. split the patch to three:
> > >         a. add priv_flags IFF_TX_SKB_NO_LINEAR
> > >         b. virtio net add priv_flags IFF_TX_SKB_NO_LINEAR
> > >         c. When there is support this flag, construct skb without linear space
> > >     3. use ERR_PTR() and PTR_ERR() to handle the err
> > >
> > >
> > > v1 message log:
> > > ---------------
> > >
> > > This patch is used to construct skb based on page to save memory copy
> > > overhead.
> > >
> > > This has one problem:
> > >
> > > We construct the skb by fill the data page as a frag into the skb. In
> > > this way, the linear space is empty, and the header information is also
> > > in the frag, not in the linear space, which is not allowed for some
> > > network cards. For example, Mellanox Technologies MT27710 Family
> > > [ConnectX-4 Lx] will get the following error message:
> > >
> > >     mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> > >     00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> > >     WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> > >     00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> > >     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> > >     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >     mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> > >
> > > I also tried to use build_skb to construct skb, but because of the
> > > existence of skb_shinfo, it must be behind the linear space, so this
> > > method is not working. We can't put skb_shinfo on desc->addr, it will be
> > > exposed to users, this is not safe.
> > >
> > > Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> > > network card supports the header information of the packet in the frag
> > > and not in the linear space.
> > >
> > > ---------------- Performance Testing ------------
> > >
> > > The test environment is Aliyun ECS server.
> > > Test cmd:
> > > ```
> > > xdpsock -i eth0 -t  -S -s <msg size>
> > > ```
> > >
> > > Test result data:
> > >
> > > size    64      512     1024    1500
> > > copy    1916747 1775988 1600203 1440054
> > > page    1974058 1953655 1945463 1904478
> > > percent 3.0%    10.0%   21.58%  32.3%
> >
> > Just making sure, are these test results with v2?
> 
> The data was tested at v1,
> but v2 did not modify the performance-related code.
> 
> Thanks.

Looks like v1 wouldn't even build, or did I miss anything?
It would be nicer if you retested it ...

> 
> >
> > >
> > > Xuan Zhuo (3):
> > >   net: add priv_flags for allow tx skb without linear
> > >   virtio-net: support IFF_TX_SKB_NO_LINEAR
> > >   xsk: build skb by page
> > >
> > >  drivers/net/virtio_net.c  |   3 +-
> > >  include/linux/netdevice.h |   3 ++
> > >  net/xdp/xsk.c             | 112 ++++++++++++++++++++++++++++++++++++++--------
> > >  3 files changed, 99 insertions(+), 19 deletions(-)
> > >
> > > --
> > > 1.8.3.1
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR
  2021-01-19  9:45 ` [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR Xuan Zhuo
@ 2021-01-19 14:28   ` Alexander Lobakin
       [not found]     ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Lobakin @ 2021-01-19 14:28 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Alexander Lobakin, Michael S. Tsirkin, Jason Wang,
	David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	virtualization, bpf, netdev, linux-kernel

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Tue, 19 Jan 2021 17:45:11 +0800

> Virtio net supports the case where the skb linear space is empty, so add
> priv_flags.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ba8e637..80d637f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		return -ENOMEM;
>  
>  	/* Set up network device as normal. */
> -	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
> +	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
> +		IFF_TX_SKB_NO_LINEAR;

Please align IFF_TX_SKB_NO_LINEAR to IFF_UNICAST_FLT:

	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
			   IFF_TX_SKB_NO_LINEAR;

>  	dev->netdev_ops = &virtnet_netdev;
>  	dev->features = NETIF_F_HIGHDMA;

Also, the series you sent is showed up incorrectly on lore.kernel.org
and patchwork.kernel.org. Seems like you used different To and Cc for
its parts.
Please use scripts/get_maintainer.pl to the whole series:

scripts/get_maintainer.pl ../patch-build-skb-by-page/*

And use one list of addresses for every message, so they wouldn't
lost.

Al


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v2 3/3] xsk: build skb by page
  2021-01-19  9:45 ` [PATCH bpf-next " Xuan Zhuo
@ 2021-01-19 14:43   ` Alexander Lobakin
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Lobakin @ 2021-01-19 14:43 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Alexander Lobakin, Michael S. Tsirkin, Jason Wang,
	David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	virtualization, bpf, netdev, linux-kernel

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Tue, 19 Jan 2021 17:45:12 +0800

> This patch is used to construct skb based on page to save memory copy
> overhead.
> 
> This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> directly construct skb. If this feature is not supported, it is still
> necessary to copy data to construct skb.
> 
> ---------------- Performance Testing ------------
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
> 
> Test result data:
> 
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>  net/xdp/xsk.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 94 insertions(+), 18 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 8037b04..8c291f8 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -430,6 +430,95 @@ static void xsk_destruct_skb(struct sk_buff *skb)
>  	sock_wfree(skb);
>  }
>  
> +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> +					      struct xdp_desc *desc)
> +{
> +	u32 len, offset, copy, copied;
> +	struct sk_buff *skb;
> +	struct page *page;
> +	char *buffer;
> +	int err = 0, i;
> +	u64 addr;
> +
> +	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> +	if (unlikely(!skb))
> +		return NULL;

You can propagate err from here to the outer function:

	if (unlikely(!skb))
		return ERR_PTR(err);

> +	addr = desc->addr;
> +	len = desc->len;
> +
> +	buffer = xsk_buff_raw_get_data(xs->pool, addr);
> +	offset = offset_in_page(buffer);
> +	addr = buffer - (char *)xs->pool->addrs;
> +
> +	for (copied = 0, i = 0; copied < len; ++i) {

i++ would be less confusing here. You build skb frags from frag 0
anyway.

> +		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> +
> +		get_page(page);
> +
> +		copy = min((u32)(PAGE_SIZE - offset), len - copied);

It's better to use min_t(u32, ...) instead of manual casting.

> +
> +		skb_fill_page_desc(skb, i, page, offset, copy);
> +
> +		copied += copy;
> +		addr += copy;
> +		offset = 0;
> +	}
> +
> +	skb->len += len;
> +	skb->data_len += len;
> +	skb->truesize += len;
> +
> +	refcount_add(len, &xs->sk.sk_wmem_alloc);
> +
> +	return skb;
> +}
> +
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> +				     struct xdp_desc *desc)
> +{
> +	struct sk_buff *skb = NULL;
> +	int err = -ENOMEM;
> +
> +	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> +		skb = xsk_build_skb_zerocopy(xs, desc);
> +		if (unlikely(!skb))
> +			goto err;

1. You should'n use goto err here, as skb == NULL, so kfree_skb(skb)
   is redundant.
2. If you would use ERR_PTR() in xsk_build_skb_zerocopy(),
   the condition should look like:

		if (IS_ERR(skb))
			return PTR_ERR(skb);

> +	} else {
> +		char *buffer;
> +		u64 addr;
> +		u32 len;
> +		int err;
> +
> +		len = desc->len;
> +		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> +		if (unlikely(!skb))
> +			goto err;

Same here, if skb == NULL, just return without calling kfree_skb().

> +		skb_put(skb, len);
> +		addr = desc->addr;
> +		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> +		err = skb_store_bits(skb, 0, buffer, len);
> +
> +		if (unlikely(err)) {
> +			err = -EINVAL;

You already have errno in err, no need to override it.

> +			goto err;
> +		}
> +	}
> +
> +	skb->dev = xs->dev;
> +	skb->priority = xs->sk.sk_priority;
> +	skb->mark = xs->sk.sk_mark;
> +	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> +	skb->destructor = xsk_destruct_skb;
> +
> +	return skb;
> +
> +err:
> +	kfree_skb(skb);
> +	return ERR_PTR(err);
> +}
> +
>  static int xsk_generic_xmit(struct sock *sk)
>  {
>  	struct xdp_sock *xs = xdp_sk(sk);
> @@ -446,43 +535,30 @@ static int xsk_generic_xmit(struct sock *sk)
>  		goto out;
>  
>  	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> -		char *buffer;
> -		u64 addr;
> -		u32 len;
> -
>  		if (max_batch-- == 0) {
>  			err = -EAGAIN;
>  			goto out;
>  		}
>  
> -		len = desc.len;
> -		skb = sock_alloc_send_skb(sk, len, 1, &err);
> -		if (unlikely(!skb))
> +		skb = xsk_build_skb(xs, &desc);
> +		if (IS_ERR(skb)) {
> +			err = PTR_ERR(skb);
>  			goto out;
> +		}
>  
> -		skb_put(skb, len);
> -		addr = desc.addr;
> -		buffer = xsk_buff_raw_get_data(xs->pool, addr);
> -		err = skb_store_bits(skb, 0, buffer, len);
>  		/* This is the backpressure mechanism for the Tx path.
>  		 * Reserve space in the completion queue and only proceed
>  		 * if there is space in it. This avoids having to implement
>  		 * any buffering in the Tx path.
>  		 */
>  		spin_lock_irqsave(&xs->pool->cq_lock, flags);
> -		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> +		if (xskq_prod_reserve(xs->pool->cq)) {
>  			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>  			kfree_skb(skb);
>  			goto out;
>  		}
>  		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>  
> -		skb->dev = xs->dev;
> -		skb->priority = sk->sk_priority;
> -		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> -		skb->destructor = xsk_destruct_skb;
> -
>  		err = __dev_direct_xmit(skb, xs->queue_id);
>  		if  (err == NETDEV_TX_BUSY) {
>  			/* Tell user-space to retry the send */

So please recheck the code and then retest it, especially error
paths (you can inject errors manually here to ensure they work).

Thanks,
Al


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear
  2021-01-19  9:45 ` [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear Xuan Zhuo
@ 2021-01-20  3:01   ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2021-01-20  3:01 UTC (permalink / raw)
  To: Xuan Zhuo, netdev
  Cc: Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, virtualization, bpf


On 2021/1/19 下午5:45, Xuan Zhuo wrote:
> In some cases, we hope to construct skb directly based on the existing
> memory without copying data. In this case, the page will be placed
> directly in the skb, and the linear space of skb is empty. But
> unfortunately, many the network card does not support this operation.
> For example Mellanox Technologies MT27710 Family [ConnectX-4 Lx] will
> get the following error message:
>
>      mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
>      00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>      00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>      00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
>      WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
>      00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
>      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>      00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
>      00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>      mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
>
> So a priv_flag is added here to indicate whether the network card
> supports this feature.


I don't see Mellanox engineers are copied. I wonder if we need their 
confirmation on whether it's a bug or hardware limitation.

Thanks


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR
       [not found]     ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com>
@ 2021-01-20  7:49       ` Xuan Zhuo
  2021-01-20  7:57         ` Michael S. Tsirkin
  2021-01-20  7:50       ` [PATCH net-next v2 3/3] xsk: build skb by page Xuan Zhuo
  1 sibling, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-20  7:49 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, netdev, linux-kernel

Virtio net supports the case where the skb linear space is empty, so add
priv_flags.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba8e637..f2ff6c3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		return -ENOMEM;
 
 	/* Set up network device as normal. */
-	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
+			   IFF_TX_SKB_NO_LINEAR;
 	dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next v2 3/3] xsk: build skb by page
       [not found]     ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com>
  2021-01-20  7:49       ` [PATCH net-next " Xuan Zhuo
@ 2021-01-20  7:50       ` Xuan Zhuo
  2021-01-20  8:10         ` Michael S. Tsirkin
  1 sibling, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-20  7:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, netdev, linux-kernel

This patch is used to construct skb based on page to save memory copy
overhead.

This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
directly construct skb. If this feature is not supported, it is still
necessary to copy data to construct skb.

---------------- Performance Testing ------------

The test environment is Aliyun ECS server.
Test cmd:
```
xdpsock -i eth0 -t  -S -s <msg size>
```

Test result data:

size    64      512     1024    1500
copy    1916747 1775988 1600203 1440054
page    1974058 1953655 1945463 1904478
percent 3.0%    10.0%   21.58%  32.3%

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 86 insertions(+), 18 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8037b04..817a3a5 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
+					      struct xdp_desc *desc)
+{
+	u32 len, offset, copy, copied;
+	struct sk_buff *skb;
+	struct page *page;
+	char *buffer;
+	int err, i;
+	u64 addr;
+
+	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
+	if (unlikely(!skb))
+		return ERR_PTR(err);
+
+	addr = desc->addr;
+	len = desc->len;
+
+	buffer = xsk_buff_raw_get_data(xs->pool, addr);
+	offset = offset_in_page(buffer);
+	addr = buffer - (char *)xs->pool->addrs;
+
+	for (copied = 0, i = 0; copied < len; i++) {
+		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
+
+		get_page(page);
+
+		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
+
+		skb_fill_page_desc(skb, i, page, offset, copy);
+
+		copied += copy;
+		addr += copy;
+		offset = 0;
+	}
+
+	skb->len += len;
+	skb->data_len += len;
+	skb->truesize += len;
+
+	refcount_add(len, &xs->sk.sk_wmem_alloc);
+
+	return skb;
+}
+
+static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
+				     struct xdp_desc *desc)
+{
+	struct sk_buff *skb = NULL;
+
+	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+		skb = xsk_build_skb_zerocopy(xs, desc);
+		if (IS_ERR(skb))
+			return skb;
+	} else {
+		char *buffer;
+		u32 len;
+		int err;
+
+		len = desc->len;
+		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
+		if (unlikely(!skb))
+			return ERR_PTR(err);
+
+		skb_put(skb, len);
+		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
+		err = skb_store_bits(skb, 0, buffer, len);
+		if (unlikely(err)) {
+			kfree_skb(skb);
+			return ERR_PTR(err);
+		}
+	}
+
+	skb->dev = xs->dev;
+	skb->priority = xs->sk.sk_priority;
+	skb->mark = xs->sk.sk_mark;
+	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
+	skb->destructor = xsk_destruct_skb;
+
+	return skb;
+}
+
 static int xsk_generic_xmit(struct sock *sk)
 {
 	struct xdp_sock *xs = xdp_sk(sk);
@@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
 		goto out;
 
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
-		char *buffer;
-		u64 addr;
-		u32 len;
-
 		if (max_batch-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
 
-		len = desc.len;
-		skb = sock_alloc_send_skb(sk, len, 1, &err);
-		if (unlikely(!skb))
+		skb = xsk_build_skb(xs, &desc);
+		if (IS_ERR(skb)) {
+			err = PTR_ERR(skb);
 			goto out;
+		}
 
-		skb_put(skb, len);
-		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->pool, addr);
-		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
 		spin_lock_irqsave(&xs->pool->cq_lock, flags);
-		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
+		if (xskq_prod_reserve(xs->pool->cq)) {
 			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 			kfree_skb(skb);
 			goto out;
 		}
 		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 
-		skb->dev = xs->dev;
-		skb->priority = sk->sk_priority;
-		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
-		skb->destructor = xsk_destruct_skb;
-
 		err = __dev_direct_xmit(skb, xs->queue_id);
 		if  (err == NETDEV_TX_BUSY) {
 			/* Tell user-space to retry the send */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR
  2021-01-20  7:49       ` [PATCH net-next " Xuan Zhuo
@ 2021-01-20  7:57         ` Michael S. Tsirkin
  0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2021-01-20  7:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski,
	bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, netdev, linux-kernel

On Wed, Jan 20, 2021 at 03:49:10PM +0800, Xuan Zhuo wrote:
> Virtio net supports the case where the skb linear space is empty, so add
> priv_flags.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ba8e637..f2ff6c3 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		return -ENOMEM;
>  
>  	/* Set up network device as normal. */
> -	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE;
> +	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
> +			   IFF_TX_SKB_NO_LINEAR;
>  	dev->netdev_ops = &virtnet_netdev;
>  	dev->features = NETIF_F_HIGHDMA;
>  
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v2 3/3] xsk: build skb by page
  2021-01-20  7:50       ` [PATCH net-next v2 3/3] xsk: build skb by page Xuan Zhuo
@ 2021-01-20  8:10         ` Michael S. Tsirkin
  2021-01-20  8:11           ` Michael S. Tsirkin
       [not found]           ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2021-01-20  8:10 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski,
	bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, netdev, linux-kernel

On Wed, Jan 20, 2021 at 03:50:01PM +0800, Xuan Zhuo wrote:
> This patch is used to construct skb based on page to save memory copy
> overhead.
> 
> This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> directly construct skb. If this feature is not supported, it is still
> necessary to copy data to construct skb.
> 
> ---------------- Performance Testing ------------
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
> 
> Test result data:
> 
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>

I can't see the cover letter or 1/3 in this series - was probably
threaded incorrectly?


> ---
>  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 86 insertions(+), 18 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 8037b04..817a3a5 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
>  	sock_wfree(skb);
>  }
>  
> +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> +					      struct xdp_desc *desc)
> +{
> +	u32 len, offset, copy, copied;
> +	struct sk_buff *skb;
> +	struct page *page;
> +	char *buffer;

Actually, make this void *, this way you will not need
casts down the road. I know this is from xsk_generic_xmit -
I don't know why it's char * there, either.

> +	int err, i;
> +	u64 addr;
> +
> +	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> +	if (unlikely(!skb))
> +		return ERR_PTR(err);
> +
> +	addr = desc->addr;
> +	len = desc->len;
> +
> +	buffer = xsk_buff_raw_get_data(xs->pool, addr);
> +	offset = offset_in_page(buffer);
> +	addr = buffer - (char *)xs->pool->addrs;
> +
> +	for (copied = 0, i = 0; copied < len; i++) {
> +		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> +
> +		get_page(page);
> +
> +		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> +
> +		skb_fill_page_desc(skb, i, page, offset, copy);
> +
> +		copied += copy;
> +		addr += copy;
> +		offset = 0;
> +	}
> +
> +	skb->len += len;
> +	skb->data_len += len;
> +	skb->truesize += len;
> +
> +	refcount_add(len, &xs->sk.sk_wmem_alloc);
> +
> +	return skb;
> +}
> +
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> +				     struct xdp_desc *desc)
> +{
> +	struct sk_buff *skb = NULL;
> +
> +	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> +		skb = xsk_build_skb_zerocopy(xs, desc);
> +		if (IS_ERR(skb))
> +			return skb;
> +	} else {
> +		char *buffer;
> +		u32 len;
> +		int err;
> +
> +		len = desc->len;
> +		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> +		if (unlikely(!skb))
> +			return ERR_PTR(err);
> +
> +		skb_put(skb, len);
> +		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> +		err = skb_store_bits(skb, 0, buffer, len);
> +		if (unlikely(err)) {
> +			kfree_skb(skb);
> +			return ERR_PTR(err);
> +		}
> +	}
> +
> +	skb->dev = xs->dev;
> +	skb->priority = xs->sk.sk_priority;
> +	skb->mark = xs->sk.sk_mark;
> +	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> +	skb->destructor = xsk_destruct_skb;
> +
> +	return skb;
> +}
> +
>  static int xsk_generic_xmit(struct sock *sk)
>  {
>  	struct xdp_sock *xs = xdp_sk(sk);
> @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
>  		goto out;
>  
>  	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> -		char *buffer;
> -		u64 addr;
> -		u32 len;
> -
>  		if (max_batch-- == 0) {
>  			err = -EAGAIN;
>  			goto out;
>  		}
>  
> -		len = desc.len;
> -		skb = sock_alloc_send_skb(sk, len, 1, &err);
> -		if (unlikely(!skb))
> +		skb = xsk_build_skb(xs, &desc);
> +		if (IS_ERR(skb)) {
> +			err = PTR_ERR(skb);
>  			goto out;
> +		}
>  
> -		skb_put(skb, len);
> -		addr = desc.addr;
> -		buffer = xsk_buff_raw_get_data(xs->pool, addr);
> -		err = skb_store_bits(skb, 0, buffer, len);
>  		/* This is the backpressure mechanism for the Tx path.
>  		 * Reserve space in the completion queue and only proceed
>  		 * if there is space in it. This avoids having to implement
>  		 * any buffering in the Tx path.
>  		 */
>  		spin_lock_irqsave(&xs->pool->cq_lock, flags);
> -		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> +		if (xskq_prod_reserve(xs->pool->cq)) {
>  			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>  			kfree_skb(skb);
>  			goto out;
>  		}
>  		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>  
> -		skb->dev = xs->dev;
> -		skb->priority = sk->sk_priority;
> -		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> -		skb->destructor = xsk_destruct_skb;
> -
>  		err = __dev_direct_xmit(skb, xs->queue_id);
>  		if  (err == NETDEV_TX_BUSY) {
>  			/* Tell user-space to retry the send */
> -- 
> 1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v2 3/3] xsk: build skb by page
  2021-01-20  8:10         ` Michael S. Tsirkin
@ 2021-01-20  8:11           ` Michael S. Tsirkin
       [not found]           ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com>
  1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2021-01-20  8:11 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski,
	bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, netdev, linux-kernel

On Wed, Jan 20, 2021 at 03:11:04AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 03:50:01PM +0800, Xuan Zhuo wrote:
> > This patch is used to construct skb based on page to save memory copy
> > overhead.
> > 
> > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> > directly construct skb. If this feature is not supported, it is still
> > necessary to copy data to construct skb.
> > 
> > ---------------- Performance Testing ------------
> > 
> > The test environment is Aliyun ECS server.
> > Test cmd:
> > ```
> > xdpsock -i eth0 -t  -S -s <msg size>
> > ```
> > 
> > Test result data:
> > 
> > size    64      512     1024    1500
> > copy    1916747 1775988 1600203 1440054
> > page    1974058 1953655 1945463 1904478
> > percent 3.0%    10.0%   21.58%  32.3%
> > 
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> 
> I can't see the cover letter or 1/3 in this series - was probably
> threaded incorrectly?


Hmm looked again and now I do see them. My mistake pls ignore.

> 
> > ---
> >  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 86 insertions(+), 18 deletions(-)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 8037b04..817a3a5 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> >  	sock_wfree(skb);
> >  }
> >  
> > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > +					      struct xdp_desc *desc)
> > +{
> > +	u32 len, offset, copy, copied;
> > +	struct sk_buff *skb;
> > +	struct page *page;
> > +	char *buffer;
> 
> Actually, make this void *, this way you will not need
> casts down the road. I know this is from xsk_generic_xmit -
> I don't know why it's char * there, either.
> 
> > +	int err, i;
> > +	u64 addr;
> > +
> > +	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > +	if (unlikely(!skb))
> > +		return ERR_PTR(err);
> > +
> > +	addr = desc->addr;
> > +	len = desc->len;
> > +
> > +	buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > +	offset = offset_in_page(buffer);
> > +	addr = buffer - (char *)xs->pool->addrs;
> > +
> > +	for (copied = 0, i = 0; copied < len; i++) {
> > +		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > +
> > +		get_page(page);
> > +
> > +		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > +
> > +		skb_fill_page_desc(skb, i, page, offset, copy);
> > +
> > +		copied += copy;
> > +		addr += copy;
> > +		offset = 0;
> > +	}
> > +
> > +	skb->len += len;
> > +	skb->data_len += len;
> > +	skb->truesize += len;
> > +
> > +	refcount_add(len, &xs->sk.sk_wmem_alloc);
> > +
> > +	return skb;
> > +}
> > +
> > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > +				     struct xdp_desc *desc)
> > +{
> > +	struct sk_buff *skb = NULL;
> > +
> > +	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> > +		skb = xsk_build_skb_zerocopy(xs, desc);
> > +		if (IS_ERR(skb))
> > +			return skb;
> > +	} else {
> > +		char *buffer;
> > +		u32 len;
> > +		int err;
> > +
> > +		len = desc->len;
> > +		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> > +		if (unlikely(!skb))
> > +			return ERR_PTR(err);
> > +
> > +		skb_put(skb, len);
> > +		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> > +		err = skb_store_bits(skb, 0, buffer, len);
> > +		if (unlikely(err)) {
> > +			kfree_skb(skb);
> > +			return ERR_PTR(err);
> > +		}
> > +	}
> > +
> > +	skb->dev = xs->dev;
> > +	skb->priority = xs->sk.sk_priority;
> > +	skb->mark = xs->sk.sk_mark;
> > +	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> > +	skb->destructor = xsk_destruct_skb;
> > +
> > +	return skb;
> > +}
> > +
> >  static int xsk_generic_xmit(struct sock *sk)
> >  {
> >  	struct xdp_sock *xs = xdp_sk(sk);
> > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
> >  		goto out;
> >  
> >  	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > -		char *buffer;
> > -		u64 addr;
> > -		u32 len;
> > -
> >  		if (max_batch-- == 0) {
> >  			err = -EAGAIN;
> >  			goto out;
> >  		}
> >  
> > -		len = desc.len;
> > -		skb = sock_alloc_send_skb(sk, len, 1, &err);
> > -		if (unlikely(!skb))
> > +		skb = xsk_build_skb(xs, &desc);
> > +		if (IS_ERR(skb)) {
> > +			err = PTR_ERR(skb);
> >  			goto out;
> > +		}
> >  
> > -		skb_put(skb, len);
> > -		addr = desc.addr;
> > -		buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > -		err = skb_store_bits(skb, 0, buffer, len);
> >  		/* This is the backpressure mechanism for the Tx path.
> >  		 * Reserve space in the completion queue and only proceed
> >  		 * if there is space in it. This avoids having to implement
> >  		 * any buffering in the Tx path.
> >  		 */
> >  		spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > -		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > +		if (xskq_prod_reserve(xs->pool->cq)) {
> >  			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >  			kfree_skb(skb);
> >  			goto out;
> >  		}
> >  		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >  
> > -		skb->dev = xs->dev;
> > -		skb->priority = sk->sk_priority;
> > -		skb->mark = sk->sk_mark;
> > -		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> > -		skb->destructor = xsk_destruct_skb;
> > -
> >  		err = __dev_direct_xmit(skb, xs->queue_id);
> >  		if  (err == NETDEV_TX_BUSY) {
> >  			/* Tell user-space to retry the send */
> > -- 
> > 1.8.3.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next v2 3/3] xsk: build skb by page
       [not found]           ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com>
@ 2021-01-20  8:30             ` Xuan Zhuo
  2021-01-20 13:56               ` Alexander Lobakin
  0 siblings, 1 reply; 18+ messages in thread
From: Xuan Zhuo @ 2021-01-20  8:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski,
	bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, virtualization, bpf, netdev, linux-kernel

This patch is used to construct skb based on page to save memory copy
overhead.

This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
directly construct skb. If this feature is not supported, it is still
necessary to copy data to construct skb.

---------------- Performance Testing ------------

The test environment is Aliyun ECS server.
Test cmd:
```
xdpsock -i eth0 -t  -S -s <msg size>
```

Test result data:

size    64      512     1024    1500
copy    1916747 1775988 1600203 1440054
page    1974058 1953655 1945463 1904478
percent 3.0%    10.0%   21.58%  32.3%

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 86 insertions(+), 18 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8037b04..40bac11 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
+					      struct xdp_desc *desc)
+{
+	u32 len, offset, copy, copied;
+	struct sk_buff *skb;
+	struct page *page;
+	void *buffer;
+	int err, i;
+	u64 addr;
+
+	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
+	if (unlikely(!skb))
+		return ERR_PTR(err);
+
+	addr = desc->addr;
+	len = desc->len;
+
+	buffer = xsk_buff_raw_get_data(xs->pool, addr);
+	offset = offset_in_page(buffer);
+	addr = buffer - xs->pool->addrs;
+
+	for (copied = 0, i = 0; copied < len; i++) {
+		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
+
+		get_page(page);
+
+		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
+
+		skb_fill_page_desc(skb, i, page, offset, copy);
+
+		copied += copy;
+		addr += copy;
+		offset = 0;
+	}
+
+	skb->len += len;
+	skb->data_len += len;
+	skb->truesize += len;
+
+	refcount_add(len, &xs->sk.sk_wmem_alloc);
+
+	return skb;
+}
+
+static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
+				     struct xdp_desc *desc)
+{
+	struct sk_buff *skb = NULL;
+
+	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+		skb = xsk_build_skb_zerocopy(xs, desc);
+		if (IS_ERR(skb))
+			return skb;
+	} else {
+		void *buffer;
+		u32 len;
+		int err;
+
+		len = desc->len;
+		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
+		if (unlikely(!skb))
+			return ERR_PTR(err);
+
+		skb_put(skb, len);
+		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
+		err = skb_store_bits(skb, 0, buffer, len);
+		if (unlikely(err)) {
+			kfree_skb(skb);
+			return ERR_PTR(err);
+		}
+	}
+
+	skb->dev = xs->dev;
+	skb->priority = xs->sk.sk_priority;
+	skb->mark = xs->sk.sk_mark;
+	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
+	skb->destructor = xsk_destruct_skb;
+
+	return skb;
+}
+
 static int xsk_generic_xmit(struct sock *sk)
 {
 	struct xdp_sock *xs = xdp_sk(sk);
@@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
 		goto out;
 
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
-		char *buffer;
-		u64 addr;
-		u32 len;
-
 		if (max_batch-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
 
-		len = desc.len;
-		skb = sock_alloc_send_skb(sk, len, 1, &err);
-		if (unlikely(!skb))
+		skb = xsk_build_skb(xs, &desc);
+		if (IS_ERR(skb)) {
+			err = PTR_ERR(skb);
 			goto out;
+		}
 
-		skb_put(skb, len);
-		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->pool, addr);
-		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
 		spin_lock_irqsave(&xs->pool->cq_lock, flags);
-		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
+		if (xskq_prod_reserve(xs->pool->cq)) {
 			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 			kfree_skb(skb);
 			goto out;
 		}
 		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
 
-		skb->dev = xs->dev;
-		skb->priority = sk->sk_priority;
-		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
-		skb->destructor = xsk_destruct_skb;
-
 		err = __dev_direct_xmit(skb, xs->queue_id);
 		if  (err == NETDEV_TX_BUSY) {
 			/* Tell user-space to retry the send */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v2 3/3] xsk: build skb by page
  2021-01-20  8:30             ` Xuan Zhuo
@ 2021-01-20 13:56               ` Alexander Lobakin
  2021-01-21  7:41                 ` Magnus Karlsson
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Lobakin @ 2021-01-20 13:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Alexander Lobakin, Michael S. Tsirkin, Jason Wang,
	David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	virtualization, bpf, netdev, linux-kernel

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: Wed, 20 Jan 2021 16:30:56 +0800

> This patch is used to construct skb based on page to save memory copy
> overhead.
> 
> This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> directly construct skb. If this feature is not supported, it is still
> necessary to copy data to construct skb.
> 
> ---------------- Performance Testing ------------
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
> 
> Test result data:
> 
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> ---
>  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 86 insertions(+), 18 deletions(-)

Now I like the result, thanks!

But Patchwork still display your series incorrectly (messages 0 and 1
are missing). I'm concerning maintainers may not take this in such
form. Try to pass the folder's name, not folder/*.patch to
git send-email when sending, and don't use --in-reply-to when sending
a new iteration.

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 8037b04..40bac11 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
>  	sock_wfree(skb);
>  }
> 
> +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> +					      struct xdp_desc *desc)
> +{
> +	u32 len, offset, copy, copied;
> +	struct sk_buff *skb;
> +	struct page *page;
> +	void *buffer;
> +	int err, i;
> +	u64 addr;
> +
> +	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> +	if (unlikely(!skb))
> +		return ERR_PTR(err);
> +
> +	addr = desc->addr;
> +	len = desc->len;
> +
> +	buffer = xsk_buff_raw_get_data(xs->pool, addr);
> +	offset = offset_in_page(buffer);
> +	addr = buffer - xs->pool->addrs;
> +
> +	for (copied = 0, i = 0; copied < len; i++) {
> +		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> +
> +		get_page(page);
> +
> +		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> +
> +		skb_fill_page_desc(skb, i, page, offset, copy);
> +
> +		copied += copy;
> +		addr += copy;
> +		offset = 0;
> +	}
> +
> +	skb->len += len;
> +	skb->data_len += len;
> +	skb->truesize += len;
> +
> +	refcount_add(len, &xs->sk.sk_wmem_alloc);
> +
> +	return skb;
> +}
> +
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> +				     struct xdp_desc *desc)
> +{
> +	struct sk_buff *skb = NULL;
> +
> +	if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> +		skb = xsk_build_skb_zerocopy(xs, desc);
> +		if (IS_ERR(skb))
> +			return skb;
> +	} else {
> +		void *buffer;
> +		u32 len;
> +		int err;
> +
> +		len = desc->len;
> +		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> +		if (unlikely(!skb))
> +			return ERR_PTR(err);
> +
> +		skb_put(skb, len);
> +		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> +		err = skb_store_bits(skb, 0, buffer, len);
> +		if (unlikely(err)) {
> +			kfree_skb(skb);
> +			return ERR_PTR(err);
> +		}
> +	}
> +
> +	skb->dev = xs->dev;
> +	skb->priority = xs->sk.sk_priority;
> +	skb->mark = xs->sk.sk_mark;
> +	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> +	skb->destructor = xsk_destruct_skb;
> +
> +	return skb;
> +}
> +
>  static int xsk_generic_xmit(struct sock *sk)
>  {
>  	struct xdp_sock *xs = xdp_sk(sk);
> @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
>  		goto out;
> 
>  	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> -		char *buffer;
> -		u64 addr;
> -		u32 len;
> -
>  		if (max_batch-- == 0) {
>  			err = -EAGAIN;
>  			goto out;
>  		}
> 
> -		len = desc.len;
> -		skb = sock_alloc_send_skb(sk, len, 1, &err);
> -		if (unlikely(!skb))
> +		skb = xsk_build_skb(xs, &desc);
> +		if (IS_ERR(skb)) {
> +			err = PTR_ERR(skb);
>  			goto out;
> +		}
> 
> -		skb_put(skb, len);
> -		addr = desc.addr;
> -		buffer = xsk_buff_raw_get_data(xs->pool, addr);
> -		err = skb_store_bits(skb, 0, buffer, len);
>  		/* This is the backpressure mechanism for the Tx path.
>  		 * Reserve space in the completion queue and only proceed
>  		 * if there is space in it. This avoids having to implement
>  		 * any buffering in the Tx path.
>  		 */
>  		spin_lock_irqsave(&xs->pool->cq_lock, flags);
> -		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> +		if (xskq_prod_reserve(xs->pool->cq)) {
>  			spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>  			kfree_skb(skb);
>  			goto out;
>  		}
>  		spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> 
> -		skb->dev = xs->dev;
> -		skb->priority = sk->sk_priority;
> -		skb->mark = sk->sk_mark;
> -		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> -		skb->destructor = xsk_destruct_skb;
> -
>  		err = __dev_direct_xmit(skb, xs->queue_id);
>  		if  (err == NETDEV_TX_BUSY) {
>  			/* Tell user-space to retry the send */
> --
> 1.8.3.1

Al


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v2 3/3] xsk: build skb by page
  2021-01-20 13:56               ` Alexander Lobakin
@ 2021-01-21  7:41                 ` Magnus Karlsson
  2021-01-21 10:54                   ` Yunsheng Lin
  0 siblings, 1 reply; 18+ messages in thread
From: Magnus Karlsson @ 2021-01-21  7:41 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Xuan Zhuo, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	virtualization, bpf, Network Development, open list

On Wed, Jan 20, 2021 at 9:29 PM Alexander Lobakin <alobakin@pm.me> wrote:
>
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Date: Wed, 20 Jan 2021 16:30:56 +0800
>
> > This patch is used to construct skb based on page to save memory copy
> > overhead.
> >
> > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
> > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
> > directly construct skb. If this feature is not supported, it is still
> > necessary to copy data to construct skb.
> >
> > ---------------- Performance Testing ------------
> >
> > The test environment is Aliyun ECS server.
> > Test cmd:
> > ```
> > xdpsock -i eth0 -t  -S -s <msg size>
> > ```
> >
> > Test result data:
> >
> > size    64      512     1024    1500
> > copy    1916747 1775988 1600203 1440054
> > page    1974058 1953655 1945463 1904478
> > percent 3.0%    10.0%   21.58%  32.3%
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
> > ---
> >  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 86 insertions(+), 18 deletions(-)
>
> Now I like the result, thanks!
>
> But Patchwork still display your series incorrectly (messages 0 and 1
> are missing). I'm concerning maintainers may not take this in such
> form. Try to pass the folder's name, not folder/*.patch to
> git send-email when sending, and don't use --in-reply-to when sending
> a new iteration.

Xuan,

Please make the new submission of the patch set a v3 even though you
did not change the code. Just so we can clearly see it is the new
submission.

> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 8037b04..40bac11 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> >       sock_wfree(skb);
> >  }
> >
> > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > +                                           struct xdp_desc *desc)
> > +{
> > +     u32 len, offset, copy, copied;
> > +     struct sk_buff *skb;
> > +     struct page *page;
> > +     void *buffer;
> > +     int err, i;
> > +     u64 addr;
> > +
> > +     skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > +     if (unlikely(!skb))
> > +             return ERR_PTR(err);
> > +
> > +     addr = desc->addr;
> > +     len = desc->len;
> > +
> > +     buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > +     offset = offset_in_page(buffer);
> > +     addr = buffer - xs->pool->addrs;
> > +
> > +     for (copied = 0, i = 0; copied < len; i++) {
> > +             page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > +
> > +             get_page(page);
> > +
> > +             copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > +
> > +             skb_fill_page_desc(skb, i, page, offset, copy);
> > +
> > +             copied += copy;
> > +             addr += copy;
> > +             offset = 0;
> > +     }
> > +
> > +     skb->len += len;
> > +     skb->data_len += len;
> > +     skb->truesize += len;
> > +
> > +     refcount_add(len, &xs->sk.sk_wmem_alloc);
> > +
> > +     return skb;
> > +}
> > +
> > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > +                                  struct xdp_desc *desc)
> > +{
> > +     struct sk_buff *skb = NULL;
> > +
> > +     if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> > +             skb = xsk_build_skb_zerocopy(xs, desc);
> > +             if (IS_ERR(skb))
> > +                     return skb;
> > +     } else {
> > +             void *buffer;
> > +             u32 len;
> > +             int err;
> > +
> > +             len = desc->len;
> > +             skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> > +             if (unlikely(!skb))
> > +                     return ERR_PTR(err);
> > +
> > +             skb_put(skb, len);
> > +             buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> > +             err = skb_store_bits(skb, 0, buffer, len);
> > +             if (unlikely(err)) {
> > +                     kfree_skb(skb);
> > +                     return ERR_PTR(err);
> > +             }
> > +     }
> > +
> > +     skb->dev = xs->dev;
> > +     skb->priority = xs->sk.sk_priority;
> > +     skb->mark = xs->sk.sk_mark;
> > +     skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> > +     skb->destructor = xsk_destruct_skb;
> > +
> > +     return skb;
> > +}
> > +
> >  static int xsk_generic_xmit(struct sock *sk)
> >  {
> >       struct xdp_sock *xs = xdp_sk(sk);
> > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
> >               goto out;
> >
> >       while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > -             char *buffer;
> > -             u64 addr;
> > -             u32 len;
> > -
> >               if (max_batch-- == 0) {
> >                       err = -EAGAIN;
> >                       goto out;
> >               }
> >
> > -             len = desc.len;
> > -             skb = sock_alloc_send_skb(sk, len, 1, &err);
> > -             if (unlikely(!skb))
> > +             skb = xsk_build_skb(xs, &desc);
> > +             if (IS_ERR(skb)) {
> > +                     err = PTR_ERR(skb);
> >                       goto out;
> > +             }
> >
> > -             skb_put(skb, len);
> > -             addr = desc.addr;
> > -             buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > -             err = skb_store_bits(skb, 0, buffer, len);
> >               /* This is the backpressure mechanism for the Tx path.
> >                * Reserve space in the completion queue and only proceed
> >                * if there is space in it. This avoids having to implement
> >                * any buffering in the Tx path.
> >                */
> >               spin_lock_irqsave(&xs->pool->cq_lock, flags);
> > -             if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
> > +             if (xskq_prod_reserve(xs->pool->cq)) {
> >                       spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >                       kfree_skb(skb);
> >                       goto out;
> >               }
> >               spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
> >
> > -             skb->dev = xs->dev;
> > -             skb->priority = sk->sk_priority;
> > -             skb->mark = sk->sk_mark;
> > -             skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
> > -             skb->destructor = xsk_destruct_skb;
> > -
> >               err = __dev_direct_xmit(skb, xs->queue_id);
> >               if  (err == NETDEV_TX_BUSY) {
> >                       /* Tell user-space to retry the send */
> > --
> > 1.8.3.1
>
> Al
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next v2 3/3] xsk: build skb by page
  2021-01-21  7:41                 ` Magnus Karlsson
@ 2021-01-21 10:54                   ` Yunsheng Lin
  0 siblings, 0 replies; 18+ messages in thread
From: Yunsheng Lin @ 2021-01-21 10:54 UTC (permalink / raw)
  To: Magnus Karlsson, Alexander Lobakin
  Cc: Xuan Zhuo, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	virtualization, bpf, Network Development, open list

On 2021/1/21 15:41, Magnus Karlsson wrote:
> On Wed, Jan 20, 2021 at 9:29 PM Alexander Lobakin <alobakin@pm.me> wrote:
>>
>> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> Date: Wed, 20 Jan 2021 16:30:56 +0800
>>
>>> This patch is used to construct skb based on page to save memory copy
>>> overhead.
>>>
>>> This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the
>>> network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to
>>> directly construct skb. If this feature is not supported, it is still
>>> necessary to copy data to construct skb.
>>>
>>> ---------------- Performance Testing ------------
>>>
>>> The test environment is Aliyun ECS server.
>>> Test cmd:
>>> ```
>>> xdpsock -i eth0 -t  -S -s <msg size>
>>> ```
>>>
>>> Test result data:
>>>
>>> size    64      512     1024    1500
>>> copy    1916747 1775988 1600203 1440054
>>> page    1974058 1953655 1945463 1904478
>>> percent 3.0%    10.0%   21.58%  32.3%
>>>
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
>>> ---
>>>  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
>>>  1 file changed, 86 insertions(+), 18 deletions(-)
>>
>> Now I like the result, thanks!
>>
>> But Patchwork still display your series incorrectly (messages 0 and 1
>> are missing). I'm concerning maintainers may not take this in such
>> form. Try to pass the folder's name, not folder/*.patch to
>> git send-email when sending, and don't use --in-reply-to when sending
>> a new iteration.
> 
> Xuan,
> 
> Please make the new submission of the patch set a v3 even though you
> did not change the code. Just so we can clearly see it is the new
> submission.
> 
>>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>>> index 8037b04..40bac11 100644
>>> --- a/net/xdp/xsk.c
>>> +++ b/net/xdp/xsk.c
>>> @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
>>>       sock_wfree(skb);
>>>  }
>>>
>>> +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
>>> +                                           struct xdp_desc *desc)
>>> +{
>>> +     u32 len, offset, copy, copied;
>>> +     struct sk_buff *skb;
>>> +     struct page *page;
>>> +     void *buffer;
>>> +     int err, i;
>>> +     u64 addr;
>>> +
>>> +     skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
>>> +     if (unlikely(!skb))
>>> +             return ERR_PTR(err);
>>> +
>>> +     addr = desc->addr;
>>> +     len = desc->len;
>>> +
>>> +     buffer = xsk_buff_raw_get_data(xs->pool, addr);
>>> +     offset = offset_in_page(buffer);
>>> +     addr = buffer - xs->pool->addrs;
>>> +
>>> +     for (copied = 0, i = 0; copied < len; i++) {
>>> +             page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
>>> +
>>> +             get_page(page);
>>> +
>>> +             copy = min_t(u32, PAGE_SIZE - offset, len - copied);
>>> +
>>> +             skb_fill_page_desc(skb, i, page, offset, copy);
>>> +
>>> +             copied += copy;
>>> +             addr += copy;
>>> +             offset = 0;
>>> +     }
>>> +
>>> +     skb->len += len;
>>> +     skb->data_len += len;
>>> +     skb->truesize += len;
>>> +
>>> +     refcount_add(len, &xs->sk.sk_wmem_alloc);
>>> +
>>> +     return skb;
>>> +}
>>> +
>>> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
>>> +                                  struct xdp_desc *desc)
>>> +{
>>> +     struct sk_buff *skb = NULL;

It seems the above init is unnecessary, for the skb is always
set before being used.

>>> +
>>> +     if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
>>> +             skb = xsk_build_skb_zerocopy(xs, desc);
>>> +             if (IS_ERR(skb))
>>> +                     return skb;
>>> +     } else {
>>> +             void *buffer;
>>> +             u32 len;
>>> +             int err;
>>> +
>>> +             len = desc->len;
>>> +             skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
>>> +             if (unlikely(!skb))
>>> +                     return ERR_PTR(err);
>>> +
>>> +             skb_put(skb, len);
>>> +             buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
>>> +             err = skb_store_bits(skb, 0, buffer, len);
>>> +             if (unlikely(err)) {
>>> +                     kfree_skb(skb);
>>> +                     return ERR_PTR(err);
>>> +             }
>>> +     }
>>> +
>>> +     skb->dev = xs->dev;
>>> +     skb->priority = xs->sk.sk_priority;
>>> +     skb->mark = xs->sk.sk_mark;
>>> +     skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
>>> +     skb->destructor = xsk_destruct_skb;
>>> +
>>> +     return skb;
>>> +}
>>> +
>>>  static int xsk_generic_xmit(struct sock *sk)
>>>  {
>>>       struct xdp_sock *xs = xdp_sk(sk);
>>> @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
>>>               goto out;
>>>
>>>       while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
>>> -             char *buffer;
>>> -             u64 addr;
>>> -             u32 len;
>>> -
>>>               if (max_batch-- == 0) {
>>>                       err = -EAGAIN;
>>>                       goto out;
>>>               }
>>>
>>> -             len = desc.len;
>>> -             skb = sock_alloc_send_skb(sk, len, 1, &err);
>>> -             if (unlikely(!skb))
>>> +             skb = xsk_build_skb(xs, &desc);
>>> +             if (IS_ERR(skb)) {
>>> +                     err = PTR_ERR(skb);
>>>                       goto out;
>>> +             }
>>>
>>> -             skb_put(skb, len);
>>> -             addr = desc.addr;
>>> -             buffer = xsk_buff_raw_get_data(xs->pool, addr);
>>> -             err = skb_store_bits(skb, 0, buffer, len);
>>>               /* This is the backpressure mechanism for the Tx path.
>>>                * Reserve space in the completion queue and only proceed
>>>                * if there is space in it. This avoids having to implement
>>>                * any buffering in the Tx path.
>>>                */
>>>               spin_lock_irqsave(&xs->pool->cq_lock, flags);
>>> -             if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
>>> +             if (xskq_prod_reserve(xs->pool->cq)) {
>>>                       spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>>>                       kfree_skb(skb);
>>>                       goto out;
>>>               }
>>>               spin_unlock_irqrestore(&xs->pool->cq_lock, flags);
>>>
>>> -             skb->dev = xs->dev;
>>> -             skb->priority = sk->sk_priority;
>>> -             skb->mark = sk->sk_mark;
>>> -             skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
>>> -             skb->destructor = xsk_destruct_skb;
>>> -
>>>               err = __dev_direct_xmit(skb, xs->queue_id);
>>>               if  (err == NETDEV_TX_BUSY) {
>>>                       /* Tell user-space to retry the send */
>>> --
>>> 1.8.3.1
>>
>> Al
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-01-21 20:38 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-19  9:45 [PATCH bpf-next v2 0/3] xsk: build skb by page Xuan Zhuo
2021-01-19  9:45 ` [PATCH bpf-next v2 1/3] net: add priv_flags for allow tx skb without linear Xuan Zhuo
2021-01-20  3:01   ` Jason Wang
2021-01-19  9:45 ` [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR Xuan Zhuo
2021-01-19 14:28   ` Alexander Lobakin
     [not found]     ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com>
2021-01-20  7:49       ` [PATCH net-next " Xuan Zhuo
2021-01-20  7:57         ` Michael S. Tsirkin
2021-01-20  7:50       ` [PATCH net-next v2 3/3] xsk: build skb by page Xuan Zhuo
2021-01-20  8:10         ` Michael S. Tsirkin
2021-01-20  8:11           ` Michael S. Tsirkin
     [not found]           ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com>
2021-01-20  8:30             ` Xuan Zhuo
2021-01-20 13:56               ` Alexander Lobakin
2021-01-21  7:41                 ` Magnus Karlsson
2021-01-21 10:54                   ` Yunsheng Lin
2021-01-19  9:45 ` [PATCH bpf-next " Xuan Zhuo
2021-01-19 14:43   ` Alexander Lobakin
2021-01-19  9:50 ` [PATCH bpf-next v2 0/3] " Michael S. Tsirkin
     [not found]   ` <1611053609.502882-1-xuanzhuo@linux.alibaba.com>
2021-01-19 11:02     ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).