* Re: [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR [not found] ` <21d2f709140470eb143e3c6c69e2a5dbd20bf2e7.1611048724.git.xuanzhuo@linux.alibaba.com> @ 2021-01-19 14:28 ` Alexander Lobakin [not found] ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com> 0 siblings, 1 reply; 11+ messages in thread From: Alexander Lobakin @ 2021-01-19 14:28 UTC (permalink / raw) To: Xuan Zhuo Cc: Alexander Lobakin, Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Date: Tue, 19 Jan 2021 17:45:11 +0800 > Virtio net supports the case where the skb linear space is empty, so add > priv_flags. > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > --- > drivers/net/virtio_net.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index ba8e637..80d637f 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev) > return -ENOMEM; > > /* Set up network device as normal. */ > - dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE; > + dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE | > + IFF_TX_SKB_NO_LINEAR; Please align IFF_TX_SKB_NO_LINEAR to IFF_UNICAST_FLT: dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE | IFF_TX_SKB_NO_LINEAR; > dev->netdev_ops = &virtnet_netdev; > dev->features = NETIF_F_HIGHDMA; Also, the series you sent showed up incorrectly on lore.kernel.org and patchwork.kernel.org. Seems like you used different To and Cc for its parts.
Please run scripts/get_maintainer.pl on the whole series: scripts/get_maintainer.pl ../patch-build-skb-by-page/* And use one list of addresses for every message, so they won't get lost. Al ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <cover.1611128806.git.xuanzhuo@linux.alibaba.com>]
* [PATCH net-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR [not found] ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com> @ 2021-01-20 7:49 ` Xuan Zhuo 2021-01-20 7:57 ` Michael S. Tsirkin 2021-01-20 7:50 ` [PATCH net-next v2 3/3] xsk: build skb by page Xuan Zhuo 1 sibling, 1 reply; 11+ messages in thread From: Xuan Zhuo @ 2021-01-20 7:49 UTC (permalink / raw) To: Alexander Lobakin Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel Virtio net supports the case where the skb linear space is empty, so add priv_flags. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> --- drivers/net/virtio_net.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ba8e637..f2ff6c3 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev) return -ENOMEM; /* Set up network device as normal. */ - dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE; + dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE | + IFF_TX_SKB_NO_LINEAR; dev->netdev_ops = &virtnet_netdev; dev->features = NETIF_F_HIGHDMA; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR 2021-01-20 7:49 ` [PATCH net-next " Xuan Zhuo @ 2021-01-20 7:57 ` Michael S. Tsirkin 0 siblings, 0 replies; 11+ messages in thread From: Michael S. Tsirkin @ 2021-01-20 7:57 UTC (permalink / raw) To: Xuan Zhuo Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel On Wed, Jan 20, 2021 at 03:49:10PM +0800, Xuan Zhuo wrote: > Virtio net supports the case where the skb linear space is empty, so add > priv_flags. > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> > --- > drivers/net/virtio_net.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index ba8e637..f2ff6c3 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -2972,7 +2972,8 @@ static int virtnet_probe(struct virtio_device *vdev) > return -ENOMEM; > > /* Set up network device as normal. */ > - dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE; > + dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE | > + IFF_TX_SKB_NO_LINEAR; > dev->netdev_ops = &virtnet_netdev; > dev->features = NETIF_F_HIGHDMA; > > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net-next v2 3/3] xsk: build skb by page [not found] ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com> 2021-01-20 7:49 ` [PATCH net-next " Xuan Zhuo @ 2021-01-20 7:50 ` Xuan Zhuo 2021-01-20 8:10 ` Michael S. Tsirkin 1 sibling, 1 reply; 11+ messages in thread From: Xuan Zhuo @ 2021-01-20 7:50 UTC (permalink / raw) To: Alexander Lobakin Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel This patch is used to construct the skb based on pages, to save memory-copy overhead. This function is implemented based on IFF_TX_SKB_NO_LINEAR: only when the network card's priv_flags supports IFF_TX_SKB_NO_LINEAR will pages be used to construct the skb directly. If this feature is not supported, it is still necessary to copy the data to construct the skb. ---------------- Performance Testing ------------ The test environment is an Aliyun ECS server.
Test cmd: ``` xdpsock -i eth0 -t -S -s <msg size> ``` Test result data: size 64 512 1024 1500 copy 1916747 1775988 1600203 1440054 page 1974058 1953655 1945463 1904478 percent 3.0% 10.0% 21.58% 32.3% Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> --- net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 86 insertions(+), 18 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 8037b04..817a3a5 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) sock_wfree(skb); } +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, + struct xdp_desc *desc) +{ + u32 len, offset, copy, copied; + struct sk_buff *skb; + struct page *page; + char *buffer; + int err, i; + u64 addr; + + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); + + addr = desc->addr; + len = desc->len; + + buffer = xsk_buff_raw_get_data(xs->pool, addr); + offset = offset_in_page(buffer); + addr = buffer - (char *)xs->pool->addrs; + + for (copied = 0, i = 0; copied < len; i++) { + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; + + get_page(page); + + copy = min_t(u32, PAGE_SIZE - offset, len - copied); + + skb_fill_page_desc(skb, i, page, offset, copy); + + copied += copy; + addr += copy; + offset = 0; + } + + skb->len += len; + skb->data_len += len; + skb->truesize += len; + + refcount_add(len, &xs->sk.sk_wmem_alloc); + + return skb; +} + +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, + struct xdp_desc *desc) +{ + struct sk_buff *skb = NULL; + + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { + skb = xsk_build_skb_zerocopy(xs, desc); + if (IS_ERR(skb)) + return skb; + } else { + char *buffer; + u32 len; + int err; + + len = desc->len; + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); + + skb_put(skb, len); + buffer = 
xsk_buff_raw_get_data(xs->pool, desc->addr); + err = skb_store_bits(skb, 0, buffer, len); + if (unlikely(err)) { + kfree_skb(skb); + return ERR_PTR(err); + } + } + + skb->dev = xs->dev; + skb->priority = xs->sk.sk_priority; + skb->mark = xs->sk.sk_mark; + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; + skb->destructor = xsk_destruct_skb; + + return skb; +} + static int xsk_generic_xmit(struct sock *sk) { struct xdp_sock *xs = xdp_sk(sk); @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) goto out; while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { - char *buffer; - u64 addr; - u32 len; - if (max_batch-- == 0) { err = -EAGAIN; goto out; } - len = desc.len; - skb = sock_alloc_send_skb(sk, len, 1, &err); - if (unlikely(!skb)) + skb = xsk_build_skb(xs, &desc); + if (IS_ERR(skb)) { + err = PTR_ERR(skb); goto out; + } - skb_put(skb, len); - addr = desc.addr; - buffer = xsk_buff_raw_get_data(xs->pool, addr); - err = skb_store_bits(skb, 0, buffer, len); /* This is the backpressure mechanism for the Tx path. * Reserve space in the completion queue and only proceed * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ spin_lock_irqsave(&xs->pool->cq_lock, flags); - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { + if (xskq_prod_reserve(xs->pool->cq)) { spin_unlock_irqrestore(&xs->pool->cq_lock, flags); kfree_skb(skb); goto out; } spin_unlock_irqrestore(&xs->pool->cq_lock, flags); - skb->dev = xs->dev; - skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; - skb->destructor = xsk_destruct_skb; - err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */ -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 3/3] xsk: build skb by page 2021-01-20 7:50 ` [PATCH net-next v2 3/3] xsk: build skb by page Xuan Zhuo @ 2021-01-20 8:10 ` Michael S. Tsirkin 2021-01-20 8:11 ` Michael S. Tsirkin [not found] ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com> 0 siblings, 2 replies; 11+ messages in thread From: Michael S. Tsirkin @ 2021-01-20 8:10 UTC (permalink / raw) To: Xuan Zhuo Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel On Wed, Jan 20, 2021 at 03:50:01PM +0800, Xuan Zhuo wrote: > This patch is used to construct skb based on page to save memory copy > overhead. > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > directly construct skb. If this feature is not supported, it is still > necessary to copy data to construct skb. > > ---------------- Performance Testing ------------ > > The test environment is Aliyun ECS server. > Test cmd: > ``` > xdpsock -i eth0 -t -S -s <msg size> > ``` > > Test result data: > > size 64 512 1024 1500 > copy 1916747 1775988 1600203 1440054 > page 1974058 1953655 1945463 1904478 > percent 3.0% 10.0% 21.58% 32.3% > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com> I can't see the cover letter or 1/3 in this series - was probably threaded incorrectly? 
> --- > net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 86 insertions(+), 18 deletions(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 8037b04..817a3a5 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) > sock_wfree(skb); > } > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > + struct xdp_desc *desc) > +{ > + u32 len, offset, copy, copied; > + struct sk_buff *skb; > + struct page *page; > + char *buffer; Actually, make this void *, this way you will not need casts down the road. I know this is from xsk_generic_xmit - I don't know why it's char * there, either. > + int err, i; > + u64 addr; > + > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); > + if (unlikely(!skb)) > + return ERR_PTR(err); > + > + addr = desc->addr; > + len = desc->len; > + > + buffer = xsk_buff_raw_get_data(xs->pool, addr); > + offset = offset_in_page(buffer); > + addr = buffer - (char *)xs->pool->addrs; > + > + for (copied = 0, i = 0; copied < len; i++) { > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; > + > + get_page(page); > + > + copy = min_t(u32, PAGE_SIZE - offset, len - copied); > + > + skb_fill_page_desc(skb, i, page, offset, copy); > + > + copied += copy; > + addr += copy; > + offset = 0; > + } > + > + skb->len += len; > + skb->data_len += len; > + skb->truesize += len; > + > + refcount_add(len, &xs->sk.sk_wmem_alloc); > + > + return skb; > +} > + > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > + struct xdp_desc *desc) > +{ > + struct sk_buff *skb = NULL; > + > + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > + skb = xsk_build_skb_zerocopy(xs, desc); > + if (IS_ERR(skb)) > + return skb; > + } else { > + char *buffer; > + u32 len; > + int err; > + > + len = desc->len; > + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); > + if (unlikely(!skb)) > + return ERR_PTR(err); > + > + skb_put(skb, len); > + buffer = 
xsk_buff_raw_get_data(xs->pool, desc->addr); > + err = skb_store_bits(skb, 0, buffer, len); > + if (unlikely(err)) { > + kfree_skb(skb); > + return ERR_PTR(err); > + } > + } > + > + skb->dev = xs->dev; > + skb->priority = xs->sk.sk_priority; > + skb->mark = xs->sk.sk_mark; > + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; > + skb->destructor = xsk_destruct_skb; > + > + return skb; > +} > + > static int xsk_generic_xmit(struct sock *sk) > { > struct xdp_sock *xs = xdp_sk(sk); > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) > goto out; > > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { > - char *buffer; > - u64 addr; > - u32 len; > - > if (max_batch-- == 0) { > err = -EAGAIN; > goto out; > } > > - len = desc.len; > - skb = sock_alloc_send_skb(sk, len, 1, &err); > - if (unlikely(!skb)) > + skb = xsk_build_skb(xs, &desc); > + if (IS_ERR(skb)) { > + err = PTR_ERR(skb); > goto out; > + } > > - skb_put(skb, len); > - addr = desc.addr; > - buffer = xsk_buff_raw_get_data(xs->pool, addr); > - err = skb_store_bits(skb, 0, buffer, len); > /* This is the backpressure mechanism for the Tx path. > * Reserve space in the completion queue and only proceed > * if there is space in it. This avoids having to implement > * any buffering in the Tx path. > */ > spin_lock_irqsave(&xs->pool->cq_lock, flags); > - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { > + if (xskq_prod_reserve(xs->pool->cq)) { > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > kfree_skb(skb); > goto out; > } > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > - skb->dev = xs->dev; > - skb->priority = sk->sk_priority; > - skb->mark = sk->sk_mark; > - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > - skb->destructor = xsk_destruct_skb; > - > err = __dev_direct_xmit(skb, xs->queue_id); > if (err == NETDEV_TX_BUSY) { > /* Tell user-space to retry the send */ > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 3/3] xsk: build skb by page 2021-01-20 8:10 ` Michael S. Tsirkin @ 2021-01-20 8:11 ` Michael S. Tsirkin [not found] ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com> 1 sibling, 0 replies; 11+ messages in thread From: Michael S. Tsirkin @ 2021-01-20 8:11 UTC (permalink / raw) To: Xuan Zhuo Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel On Wed, Jan 20, 2021 at 03:11:04AM -0500, Michael S. Tsirkin wrote: > On Wed, Jan 20, 2021 at 03:50:01PM +0800, Xuan Zhuo wrote: > > This patch is used to construct skb based on page to save memory copy > > overhead. > > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > > directly construct skb. If this feature is not supported, it is still > > necessary to copy data to construct skb. > > > > ---------------- Performance Testing ------------ > > > > The test environment is Aliyun ECS server. > > Test cmd: > > ``` > > xdpsock -i eth0 -t -S -s <msg size> > > ``` > > > > Test result data: > > > > size 64 512 1024 1500 > > copy 1916747 1775988 1600203 1440054 > > page 1974058 1953655 1945463 1904478 > > percent 3.0% 10.0% 21.58% 32.3% > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > Reviewed-by: Dust Li <dust.li@linux.alibaba.com> > > I can't see the cover letter or 1/3 in this series - was probably > threaded incorrectly? Hmm looked again and now I do see them. My mistake pls ignore. 
> > > --- > > net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 86 insertions(+), 18 deletions(-) > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index 8037b04..817a3a5 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) > > sock_wfree(skb); > > } > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > > + struct xdp_desc *desc) > > +{ > > + u32 len, offset, copy, copied; > > + struct sk_buff *skb; > > + struct page *page; > > + char *buffer; > > Actually, make this void *, this way you will not need > casts down the road. I know this is from xsk_generic_xmit - > I don't know why it's char * there, either. > > > + int err, i; > > + u64 addr; > > + > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); > > + if (unlikely(!skb)) > > + return ERR_PTR(err); > > + > > + addr = desc->addr; > > + len = desc->len; > > + > > + buffer = xsk_buff_raw_get_data(xs->pool, addr); > > + offset = offset_in_page(buffer); > > + addr = buffer - (char *)xs->pool->addrs; > > + > > + for (copied = 0, i = 0; copied < len; i++) { > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; > > + > > + get_page(page); > > + > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied); > > + > > + skb_fill_page_desc(skb, i, page, offset, copy); > > + > > + copied += copy; > > + addr += copy; > > + offset = 0; > > + } > > + > > + skb->len += len; > > + skb->data_len += len; > > + skb->truesize += len; > > + > > + refcount_add(len, &xs->sk.sk_wmem_alloc); > > + > > + return skb; > > +} > > + > > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > > + struct xdp_desc *desc) > > +{ > > + struct sk_buff *skb = NULL; > > + > > + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > > + skb = xsk_build_skb_zerocopy(xs, desc); > > + if (IS_ERR(skb)) > > + return skb; > > + } else { > > + char *buffer; > > + u32 len; > > + int err; > > + > > + len = 
desc->len; > > + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); > > + if (unlikely(!skb)) > > + return ERR_PTR(err); > > + > > + skb_put(skb, len); > > + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); > > + err = skb_store_bits(skb, 0, buffer, len); > > + if (unlikely(err)) { > > + kfree_skb(skb); > > + return ERR_PTR(err); > > + } > > + } > > + > > + skb->dev = xs->dev; > > + skb->priority = xs->sk.sk_priority; > > + skb->mark = xs->sk.sk_mark; > > + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; > > + skb->destructor = xsk_destruct_skb; > > + > > + return skb; > > +} > > + > > static int xsk_generic_xmit(struct sock *sk) > > { > > struct xdp_sock *xs = xdp_sk(sk); > > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) > > goto out; > > > > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { > > - char *buffer; > > - u64 addr; > > - u32 len; > > - > > if (max_batch-- == 0) { > > err = -EAGAIN; > > goto out; > > } > > > > - len = desc.len; > > - skb = sock_alloc_send_skb(sk, len, 1, &err); > > - if (unlikely(!skb)) > > + skb = xsk_build_skb(xs, &desc); > > + if (IS_ERR(skb)) { > > + err = PTR_ERR(skb); > > goto out; > > + } > > > > - skb_put(skb, len); > > - addr = desc.addr; > > - buffer = xsk_buff_raw_get_data(xs->pool, addr); > > - err = skb_store_bits(skb, 0, buffer, len); > > /* This is the backpressure mechanism for the Tx path. > > * Reserve space in the completion queue and only proceed > > * if there is space in it. This avoids having to implement > > * any buffering in the Tx path. 
> > */ > > spin_lock_irqsave(&xs->pool->cq_lock, flags); > > - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { > > + if (xskq_prod_reserve(xs->pool->cq)) { > > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > kfree_skb(skb); > > goto out; > > } > > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > > > - skb->dev = xs->dev; > > - skb->priority = sk->sk_priority; > > - skb->mark = sk->sk_mark; > > - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > > - skb->destructor = xsk_destruct_skb; > > - > > err = __dev_direct_xmit(skb, xs->queue_id); > > if (err == NETDEV_TX_BUSY) { > > /* Tell user-space to retry the send */ > > -- > > 1.8.3.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <cover.1611131344.git.xuanzhuo@linux.alibaba.com>]
* [PATCH net-next v2 3/3] xsk: build skb by page [not found] ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com> @ 2021-01-20 8:30 ` Xuan Zhuo 2021-01-20 13:56 ` Alexander Lobakin 0 siblings, 1 reply; 11+ messages in thread From: Xuan Zhuo @ 2021-01-20 8:30 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Alexander Lobakin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel This patch is used to construct the skb based on pages, to save memory-copy overhead. This function is implemented based on IFF_TX_SKB_NO_LINEAR: only when the network card's priv_flags supports IFF_TX_SKB_NO_LINEAR will pages be used to construct the skb directly. If this feature is not supported, it is still necessary to copy the data to construct the skb. ---------------- Performance Testing ------------ The test environment is an Aliyun ECS server.
Test cmd: ``` xdpsock -i eth0 -t -S -s <msg size> ``` Test result data: size 64 512 1024 1500 copy 1916747 1775988 1600203 1440054 page 1974058 1953655 1945463 1904478 percent 3.0% 10.0% 21.58% 32.3% Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> --- net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 86 insertions(+), 18 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 8037b04..40bac11 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) sock_wfree(skb); } +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, + struct xdp_desc *desc) +{ + u32 len, offset, copy, copied; + struct sk_buff *skb; + struct page *page; + void *buffer; + int err, i; + u64 addr; + + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); + + addr = desc->addr; + len = desc->len; + + buffer = xsk_buff_raw_get_data(xs->pool, addr); + offset = offset_in_page(buffer); + addr = buffer - xs->pool->addrs; + + for (copied = 0, i = 0; copied < len; i++) { + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; + + get_page(page); + + copy = min_t(u32, PAGE_SIZE - offset, len - copied); + + skb_fill_page_desc(skb, i, page, offset, copy); + + copied += copy; + addr += copy; + offset = 0; + } + + skb->len += len; + skb->data_len += len; + skb->truesize += len; + + refcount_add(len, &xs->sk.sk_wmem_alloc); + + return skb; +} + +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, + struct xdp_desc *desc) +{ + struct sk_buff *skb = NULL; + + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { + skb = xsk_build_skb_zerocopy(xs, desc); + if (IS_ERR(skb)) + return skb; + } else { + void *buffer; + u32 len; + int err; + + len = desc->len; + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); + if (unlikely(!skb)) + return ERR_PTR(err); + + skb_put(skb, len); + buffer = 
xsk_buff_raw_get_data(xs->pool, desc->addr); + err = skb_store_bits(skb, 0, buffer, len); + if (unlikely(err)) { + kfree_skb(skb); + return ERR_PTR(err); + } + } + + skb->dev = xs->dev; + skb->priority = xs->sk.sk_priority; + skb->mark = xs->sk.sk_mark; + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; + skb->destructor = xsk_destruct_skb; + + return skb; +} + static int xsk_generic_xmit(struct sock *sk) { struct xdp_sock *xs = xdp_sk(sk); @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) goto out; while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { - char *buffer; - u64 addr; - u32 len; - if (max_batch-- == 0) { err = -EAGAIN; goto out; } - len = desc.len; - skb = sock_alloc_send_skb(sk, len, 1, &err); - if (unlikely(!skb)) + skb = xsk_build_skb(xs, &desc); + if (IS_ERR(skb)) { + err = PTR_ERR(skb); goto out; + } - skb_put(skb, len); - addr = desc.addr; - buffer = xsk_buff_raw_get_data(xs->pool, addr); - err = skb_store_bits(skb, 0, buffer, len); /* This is the backpressure mechanism for the Tx path. * Reserve space in the completion queue and only proceed * if there is space in it. This avoids having to implement * any buffering in the Tx path. */ spin_lock_irqsave(&xs->pool->cq_lock, flags); - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { + if (xskq_prod_reserve(xs->pool->cq)) { spin_unlock_irqrestore(&xs->pool->cq_lock, flags); kfree_skb(skb); goto out; } spin_unlock_irqrestore(&xs->pool->cq_lock, flags); - skb->dev = xs->dev; - skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; - skb->destructor = xsk_destruct_skb; - err = __dev_direct_xmit(skb, xs->queue_id); if (err == NETDEV_TX_BUSY) { /* Tell user-space to retry the send */ -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 3/3] xsk: build skb by page 2021-01-20 8:30 ` Xuan Zhuo @ 2021-01-20 13:56 ` Alexander Lobakin 2021-01-21 7:41 ` Magnus Karlsson 0 siblings, 1 reply; 11+ messages in thread From: Alexander Lobakin @ 2021-01-20 13:56 UTC (permalink / raw) To: Xuan Zhuo Cc: Alexander Lobakin, Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Date: Wed, 20 Jan 2021 16:30:56 +0800 > This patch is used to construct skb based on page to save memory copy > overhead. > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > directly construct skb. If this feature is not supported, it is still > necessary to copy data to construct skb. > > ---------------- Performance Testing ------------ > > The test environment is Aliyun ECS server. > Test cmd: > ``` > xdpsock -i eth0 -t -S -s <msg size> > ``` > > Test result data: > > size 64 512 1024 1500 > copy 1916747 1775988 1600203 1440054 > page 1974058 1953655 1945463 1904478 > percent 3.0% 10.0% 21.58% 32.3% > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com> > --- > net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 86 insertions(+), 18 deletions(-) Now I like the result, thanks! But Patchwork still displays your series incorrectly (messages 0 and 1 are missing). I'm concerned maintainers may not take it in this form. Try passing the folder's name, not folder/*.patch, to git send-email when sending, and don't use --in-reply-to when sending a new iteration.
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 8037b04..40bac11 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) > sock_wfree(skb); > } > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > + struct xdp_desc *desc) > +{ > + u32 len, offset, copy, copied; > + struct sk_buff *skb; > + struct page *page; > + void *buffer; > + int err, i; > + u64 addr; > + > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); > + if (unlikely(!skb)) > + return ERR_PTR(err); > + > + addr = desc->addr; > + len = desc->len; > + > + buffer = xsk_buff_raw_get_data(xs->pool, addr); > + offset = offset_in_page(buffer); > + addr = buffer - xs->pool->addrs; > + > + for (copied = 0, i = 0; copied < len; i++) { > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; > + > + get_page(page); > + > + copy = min_t(u32, PAGE_SIZE - offset, len - copied); > + > + skb_fill_page_desc(skb, i, page, offset, copy); > + > + copied += copy; > + addr += copy; > + offset = 0; > + } > + > + skb->len += len; > + skb->data_len += len; > + skb->truesize += len; > + > + refcount_add(len, &xs->sk.sk_wmem_alloc); > + > + return skb; > +} > + > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > + struct xdp_desc *desc) > +{ > + struct sk_buff *skb = NULL; > + > + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > + skb = xsk_build_skb_zerocopy(xs, desc); > + if (IS_ERR(skb)) > + return skb; > + } else { > + void *buffer; > + u32 len; > + int err; > + > + len = desc->len; > + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); > + if (unlikely(!skb)) > + return ERR_PTR(err); > + > + skb_put(skb, len); > + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); > + err = skb_store_bits(skb, 0, buffer, len); > + if (unlikely(err)) { > + kfree_skb(skb); > + return ERR_PTR(err); > + } > + } > + > + skb->dev = xs->dev; > + skb->priority = xs->sk.sk_priority; > + skb->mark = xs->sk.sk_mark; > + skb_shinfo(skb)->destructor_arg 
= (void *)(long)desc->addr; > + skb->destructor = xsk_destruct_skb; > + > + return skb; > +} > + > static int xsk_generic_xmit(struct sock *sk) > { > struct xdp_sock *xs = xdp_sk(sk); > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) > goto out; > > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { > - char *buffer; > - u64 addr; > - u32 len; > - > if (max_batch-- == 0) { > err = -EAGAIN; > goto out; > } > > - len = desc.len; > - skb = sock_alloc_send_skb(sk, len, 1, &err); > - if (unlikely(!skb)) > + skb = xsk_build_skb(xs, &desc); > + if (IS_ERR(skb)) { > + err = PTR_ERR(skb); > goto out; > + } > > - skb_put(skb, len); > - addr = desc.addr; > - buffer = xsk_buff_raw_get_data(xs->pool, addr); > - err = skb_store_bits(skb, 0, buffer, len); > /* This is the backpressure mechanism for the Tx path. > * Reserve space in the completion queue and only proceed > * if there is space in it. This avoids having to implement > * any buffering in the Tx path. > */ > spin_lock_irqsave(&xs->pool->cq_lock, flags); > - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { > + if (xskq_prod_reserve(xs->pool->cq)) { > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > kfree_skb(skb); > goto out; > } > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > - skb->dev = xs->dev; > - skb->priority = sk->sk_priority; > - skb->mark = sk->sk_mark; > - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > - skb->destructor = xsk_destruct_skb; > - > err = __dev_direct_xmit(skb, xs->queue_id); > if (err == NETDEV_TX_BUSY) { > /* Tell user-space to retry the send */ > -- > 1.8.3.1 Al ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 3/3] xsk: build skb by page 2021-01-20 13:56 ` Alexander Lobakin @ 2021-01-21 7:41 ` Magnus Karlsson 2021-01-21 10:54 ` Yunsheng Lin 0 siblings, 1 reply; 11+ messages in thread From: Magnus Karlsson @ 2021-01-21 7:41 UTC (permalink / raw) To: Alexander Lobakin Cc: Xuan Zhuo, Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, Network Development, open list On Wed, Jan 20, 2021 at 9:29 PM Alexander Lobakin <alobakin@pm.me> wrote: > > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Date: Wed, 20 Jan 2021 16:30:56 +0800 > > > This patch is used to construct skb based on page to save memory copy > > overhead. > > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > > directly construct skb. If this feature is not supported, it is still > > necessary to copy data to construct skb. > > > > ---------------- Performance Testing ------------ > > > > The test environment is Aliyun ECS server. > > Test cmd: > > ``` > > xdpsock -i eth0 -t -S -s <msg size> > > ``` > > > > Test result data: > > > > size 64 512 1024 1500 > > copy 1916747 1775988 1600203 1440054 > > page 1974058 1953655 1945463 1904478 > > percent 3.0% 10.0% 21.58% 32.3% > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > > Reviewed-by: Dust Li <dust.li@linux.alibaba.com> > > --- > > net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- > > 1 file changed, 86 insertions(+), 18 deletions(-) > > Now I like the result, thanks! > > But Patchwork still display your series incorrectly (messages 0 and 1 > are missing). I'm concerning maintainers may not take this in such > form. 
Try to pass the folder's name, not folder/*.patch to > git send-email when sending, and don't use --in-reply-to when sending > a new iteration. Xuan, Please make the new submission of the patch set a v3 even though you did not change the code. Just so we can clearly see it is the new submission. > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index 8037b04..40bac11 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) > > sock_wfree(skb); > > } > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > > + struct xdp_desc *desc) > > +{ > > + u32 len, offset, copy, copied; > > + struct sk_buff *skb; > > + struct page *page; > > + void *buffer; > > + int err, i; > > + u64 addr; > > + > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); > > + if (unlikely(!skb)) > > + return ERR_PTR(err); > > + > > + addr = desc->addr; > > + len = desc->len; > > + > > + buffer = xsk_buff_raw_get_data(xs->pool, addr); > > + offset = offset_in_page(buffer); > > + addr = buffer - xs->pool->addrs; > > + > > + for (copied = 0, i = 0; copied < len; i++) { > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; > > + > > + get_page(page); > > + > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied); > > + > > + skb_fill_page_desc(skb, i, page, offset, copy); > > + > > + copied += copy; > > + addr += copy; > > + offset = 0; > > + } > > + > > + skb->len += len; > > + skb->data_len += len; > > + skb->truesize += len; > > + > > + refcount_add(len, &xs->sk.sk_wmem_alloc); > > + > > + return skb; > > +} > > + > > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > > + struct xdp_desc *desc) > > +{ > > + struct sk_buff *skb = NULL; > > + > > + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > > + skb = xsk_build_skb_zerocopy(xs, desc); > > + if (IS_ERR(skb)) > > + return skb; > > + } else { > > + void *buffer; > > + u32 len; > > + int err; > > + > > + len = desc->len; > > + skb = 
sock_alloc_send_skb(&xs->sk, len, 1, &err); > > + if (unlikely(!skb)) > > + return ERR_PTR(err); > > + > > + skb_put(skb, len); > > + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); > > + err = skb_store_bits(skb, 0, buffer, len); > > + if (unlikely(err)) { > > + kfree_skb(skb); > > + return ERR_PTR(err); > > + } > > + } > > + > > + skb->dev = xs->dev; > > + skb->priority = xs->sk.sk_priority; > > + skb->mark = xs->sk.sk_mark; > > + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; > > + skb->destructor = xsk_destruct_skb; > > + > > + return skb; > > +} > > + > > static int xsk_generic_xmit(struct sock *sk) > > { > > struct xdp_sock *xs = xdp_sk(sk); > > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) > > goto out; > > > > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { > > - char *buffer; > > - u64 addr; > > - u32 len; > > - > > if (max_batch-- == 0) { > > err = -EAGAIN; > > goto out; > > } > > > > - len = desc.len; > > - skb = sock_alloc_send_skb(sk, len, 1, &err); > > - if (unlikely(!skb)) > > + skb = xsk_build_skb(xs, &desc); > > + if (IS_ERR(skb)) { > > + err = PTR_ERR(skb); > > goto out; > > + } > > > > - skb_put(skb, len); > > - addr = desc.addr; > > - buffer = xsk_buff_raw_get_data(xs->pool, addr); > > - err = skb_store_bits(skb, 0, buffer, len); > > /* This is the backpressure mechanism for the Tx path. > > * Reserve space in the completion queue and only proceed > > * if there is space in it. This avoids having to implement > > * any buffering in the Tx path. 
> > */ > > spin_lock_irqsave(&xs->pool->cq_lock, flags); > > - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { > > + if (xskq_prod_reserve(xs->pool->cq)) { > > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > kfree_skb(skb); > > goto out; > > } > > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > > > - skb->dev = xs->dev; > > - skb->priority = sk->sk_priority; > > - skb->mark = sk->sk_mark; > > - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > > - skb->destructor = xsk_destruct_skb; > > - > > err = __dev_direct_xmit(skb, xs->queue_id); > > if (err == NETDEV_TX_BUSY) { > > /* Tell user-space to retry the send */ > > -- > > 1.8.3.1 > > Al > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net-next v2 3/3] xsk: build skb by page 2021-01-21 7:41 ` Magnus Karlsson @ 2021-01-21 10:54 ` Yunsheng Lin 0 siblings, 0 replies; 11+ messages in thread From: Yunsheng Lin @ 2021-01-21 10:54 UTC (permalink / raw) To: Magnus Karlsson, Alexander Lobakin Cc: Xuan Zhuo, Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, Network Development, open list On 2021/1/21 15:41, Magnus Karlsson wrote: > On Wed, Jan 20, 2021 at 9:29 PM Alexander Lobakin <alobakin@pm.me> wrote: >> >> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >> Date: Wed, 20 Jan 2021 16:30:56 +0800 >> >>> This patch is used to construct skb based on page to save memory copy >>> overhead. >>> >>> This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the >>> network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to >>> directly construct skb. If this feature is not supported, it is still >>> necessary to copy data to construct skb. >>> >>> ---------------- Performance Testing ------------ >>> >>> The test environment is Aliyun ECS server. >>> Test cmd: >>> ``` >>> xdpsock -i eth0 -t -S -s <msg size> >>> ``` >>> >>> Test result data: >>> >>> size 64 512 1024 1500 >>> copy 1916747 1775988 1600203 1440054 >>> page 1974058 1953655 1945463 1904478 >>> percent 3.0% 10.0% 21.58% 32.3% >>> >>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> >>> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> >>> --- >>> net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++---------- >>> 1 file changed, 86 insertions(+), 18 deletions(-) >> >> Now I like the result, thanks! >> >> But Patchwork still display your series incorrectly (messages 0 and 1 >> are missing). I'm concerning maintainers may not take this in such >> form. 
Try to pass the folder's name, not folder/*.patch to >> git send-email when sending, and don't use --in-reply-to when sending >> a new iteration. > > Xuan, > > Please make the new submission of the patch set a v3 even though you > did not change the code. Just so we can clearly see it is the new > submission. > >>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c >>> index 8037b04..40bac11 100644 >>> --- a/net/xdp/xsk.c >>> +++ b/net/xdp/xsk.c >>> @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb) >>> sock_wfree(skb); >>> } >>> >>> +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, >>> + struct xdp_desc *desc) >>> +{ >>> + u32 len, offset, copy, copied; >>> + struct sk_buff *skb; >>> + struct page *page; >>> + void *buffer; >>> + int err, i; >>> + u64 addr; >>> + >>> + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); >>> + if (unlikely(!skb)) >>> + return ERR_PTR(err); >>> + >>> + addr = desc->addr; >>> + len = desc->len; >>> + >>> + buffer = xsk_buff_raw_get_data(xs->pool, addr); >>> + offset = offset_in_page(buffer); >>> + addr = buffer - xs->pool->addrs; >>> + >>> + for (copied = 0, i = 0; copied < len; i++) { >>> + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; >>> + >>> + get_page(page); >>> + >>> + copy = min_t(u32, PAGE_SIZE - offset, len - copied); >>> + >>> + skb_fill_page_desc(skb, i, page, offset, copy); >>> + >>> + copied += copy; >>> + addr += copy; >>> + offset = 0; >>> + } >>> + >>> + skb->len += len; >>> + skb->data_len += len; >>> + skb->truesize += len; >>> + >>> + refcount_add(len, &xs->sk.sk_wmem_alloc); >>> + >>> + return skb; >>> +} >>> + >>> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, >>> + struct xdp_desc *desc) >>> +{ >>> + struct sk_buff *skb = NULL; It seems the above init is unnecessary, for the skb is always set before being used. 
>>> + >>> + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { >>> + skb = xsk_build_skb_zerocopy(xs, desc); >>> + if (IS_ERR(skb)) >>> + return skb; >>> + } else { >>> + void *buffer; >>> + u32 len; >>> + int err; >>> + >>> + len = desc->len; >>> + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); >>> + if (unlikely(!skb)) >>> + return ERR_PTR(err); >>> + >>> + skb_put(skb, len); >>> + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); >>> + err = skb_store_bits(skb, 0, buffer, len); >>> + if (unlikely(err)) { >>> + kfree_skb(skb); >>> + return ERR_PTR(err); >>> + } >>> + } >>> + >>> + skb->dev = xs->dev; >>> + skb->priority = xs->sk.sk_priority; >>> + skb->mark = xs->sk.sk_mark; >>> + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; >>> + skb->destructor = xsk_destruct_skb; >>> + >>> + return skb; >>> +} >>> + >>> static int xsk_generic_xmit(struct sock *sk) >>> { >>> struct xdp_sock *xs = xdp_sk(sk); >>> @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk) >>> goto out; >>> >>> while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { >>> - char *buffer; >>> - u64 addr; >>> - u32 len; >>> - >>> if (max_batch-- == 0) { >>> err = -EAGAIN; >>> goto out; >>> } >>> >>> - len = desc.len; >>> - skb = sock_alloc_send_skb(sk, len, 1, &err); >>> - if (unlikely(!skb)) >>> + skb = xsk_build_skb(xs, &desc); >>> + if (IS_ERR(skb)) { >>> + err = PTR_ERR(skb); >>> goto out; >>> + } >>> >>> - skb_put(skb, len); >>> - addr = desc.addr; >>> - buffer = xsk_buff_raw_get_data(xs->pool, addr); >>> - err = skb_store_bits(skb, 0, buffer, len); >>> /* This is the backpressure mechanism for the Tx path. >>> * Reserve space in the completion queue and only proceed >>> * if there is space in it. This avoids having to implement >>> * any buffering in the Tx path. 
>>> */ >>> spin_lock_irqsave(&xs->pool->cq_lock, flags); >>> - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { >>> + if (xskq_prod_reserve(xs->pool->cq)) { >>> spin_unlock_irqrestore(&xs->pool->cq_lock, flags); >>> kfree_skb(skb); >>> goto out; >>> } >>> spin_unlock_irqrestore(&xs->pool->cq_lock, flags); >>> >>> - skb->dev = xs->dev; >>> - skb->priority = sk->sk_priority; >>> - skb->mark = sk->sk_mark; >>> - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; >>> - skb->destructor = xsk_destruct_skb; >>> - >>> err = __dev_direct_xmit(skb, xs->queue_id); >>> if (err == NETDEV_TX_BUSY) { >>> /* Tell user-space to retry the send */ >>> -- >>> 1.8.3.1 >> >> Al >> > > . > ^ permalink raw reply [flat|nested] 11+ messages in thread

* Re: [PATCH bpf-next v2 3/3] xsk: build skb by page [not found] ` <017fdff4e061a7e0e779b7bc96ed3b45e07aa006.1611048724.git.xuanzhuo@linux.alibaba.com> @ 2021-01-19 14:43 ` Alexander Lobakin 0 siblings, 0 replies; 11+ messages in thread From: Alexander Lobakin @ 2021-01-19 14:43 UTC (permalink / raw) To: Xuan Zhuo Cc: Alexander Lobakin, Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski, bjorn.topel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, virtualization, bpf, netdev, linux-kernel From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Date: Tue, 19 Jan 2021 17:45:12 +0800 > This patch is used to construct skb based on page to save memory copy > overhead. > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only the > network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use page to > directly construct skb. If this feature is not supported, it is still > necessary to copy data to construct skb. > > ---------------- Performance Testing ------------ > > The test environment is Aliyun ECS server. 
> Test cmd: > ``` > xdpsock -i eth0 -t -S -s <msg size> > ``` > > Test result data: > > size 64 512 1024 1500 > copy 1916747 1775988 1600203 1440054 > page 1974058 1953655 1945463 1904478 > percent 3.0% 10.0% 21.58% 32.3% > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> > Reviewed-by: Dust Li <dust.li@linux.alibaba.com> > --- > net/xdp/xsk.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 94 insertions(+), 18 deletions(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 8037b04..8c291f8 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -430,6 +430,95 @@ static void xsk_destruct_skb(struct sk_buff *skb) > sock_wfree(skb); > } > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > + struct xdp_desc *desc) > +{ > + u32 len, offset, copy, copied; > + struct sk_buff *skb; > + struct page *page; > + char *buffer; > + int err = 0, i; > + u64 addr; > + > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err); > + if (unlikely(!skb)) > + return NULL; You can propagate err from here to the outer function: if (unlikely(!skb)) return ERR_PTR(err); > + addr = desc->addr; > + len = desc->len; > + > + buffer = xsk_buff_raw_get_data(xs->pool, addr); > + offset = offset_in_page(buffer); > + addr = buffer - (char *)xs->pool->addrs; > + > + for (copied = 0, i = 0; copied < len; ++i) { i++ would be less confusing here. You build skb frags from frag 0 anyway. > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT]; > + > + get_page(page); > + > + copy = min((u32)(PAGE_SIZE - offset), len - copied); It's better to use min_t(u32, ...) instead of manual casting. 
> + > + skb_fill_page_desc(skb, i, page, offset, copy); > + > + copied += copy; > + addr += copy; > + offset = 0; > + } > + > + skb->len += len; > + skb->data_len += len; > + skb->truesize += len; > + > + refcount_add(len, &xs->sk.sk_wmem_alloc); > + > + return skb; > +} > + > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > + struct xdp_desc *desc) > +{ > + struct sk_buff *skb = NULL; > + int err = -ENOMEM; > + > + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { > + skb = xsk_build_skb_zerocopy(xs, desc); > + if (unlikely(!skb)) > + goto err; 1. You should'n use goto err here, as skb == NULL, so kfree_skb(skb) is redundant. 2. If you would use ERR_PTR() in xsk_build_skb_zerocopy(), the condition should look like: if (IS_ERR(skb)) return PTR_ERR(skb); > + } else { > + char *buffer; > + u64 addr; > + u32 len; > + int err; > + > + len = desc->len; > + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err); > + if (unlikely(!skb)) > + goto err; Same here, if skb == NULL, just return without calling kfree_skb(). > + skb_put(skb, len); > + addr = desc->addr; > + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr); > + err = skb_store_bits(skb, 0, buffer, len); > + > + if (unlikely(err)) { > + err = -EINVAL; You already have errno in err, no need to override it. 
> + goto err; > + } > + } > + > + skb->dev = xs->dev; > + skb->priority = xs->sk.sk_priority; > + skb->mark = xs->sk.sk_mark; > + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; > + skb->destructor = xsk_destruct_skb; > + > + return skb; > + > +err: > + kfree_skb(skb); > + return ERR_PTR(err); > +} > + > static int xsk_generic_xmit(struct sock *sk) > { > struct xdp_sock *xs = xdp_sk(sk); > @@ -446,43 +535,30 @@ static int xsk_generic_xmit(struct sock *sk) > goto out; > > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { > - char *buffer; > - u64 addr; > - u32 len; > - > if (max_batch-- == 0) { > err = -EAGAIN; > goto out; > } > > - len = desc.len; > - skb = sock_alloc_send_skb(sk, len, 1, &err); > - if (unlikely(!skb)) > + skb = xsk_build_skb(xs, &desc); > + if (IS_ERR(skb)) { > + err = PTR_ERR(skb); > goto out; > + } > > - skb_put(skb, len); > - addr = desc.addr; > - buffer = xsk_buff_raw_get_data(xs->pool, addr); > - err = skb_store_bits(skb, 0, buffer, len); > /* This is the backpressure mechanism for the Tx path. > * Reserve space in the completion queue and only proceed > * if there is space in it. This avoids having to implement > * any buffering in the Tx path. > */ > spin_lock_irqsave(&xs->pool->cq_lock, flags); > - if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) { > + if (xskq_prod_reserve(xs->pool->cq)) { > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > kfree_skb(skb); > goto out; > } > spin_unlock_irqrestore(&xs->pool->cq_lock, flags); > > - skb->dev = xs->dev; > - skb->priority = sk->sk_priority; > - skb->mark = sk->sk_mark; > - skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr; > - skb->destructor = xsk_destruct_skb; > - > err = __dev_direct_xmit(skb, xs->queue_id); > if (err == NETDEV_TX_BUSY) { > /* Tell user-space to retry the send */ So please recheck the code and then retest it, especially error paths (you can inject errors manually here to ensure they work). 
Thanks,
Al
end of thread, other threads:[~2021-01-21 10:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <cover.1611048724.git.xuanzhuo@linux.alibaba.com>
[not found] ` <21d2f709140470eb143e3c6c69e2a5dbd20bf2e7.1611048724.git.xuanzhuo@linux.alibaba.com>
2021-01-19 14:28   ` [PATCH bpf-next v2 2/3] virtio-net: support IFF_TX_SKB_NO_LINEAR Alexander Lobakin
[not found]     ` <cover.1611128806.git.xuanzhuo@linux.alibaba.com>
2021-01-20  7:49       ` [PATCH net-next " Xuan Zhuo
2021-01-20  7:57         ` Michael S. Tsirkin
2021-01-20  7:50       ` [PATCH net-next v2 3/3] xsk: build skb by page Xuan Zhuo
2021-01-20  8:10         ` Michael S. Tsirkin
2021-01-20  8:11         ` Michael S. Tsirkin
[not found]         ` <cover.1611131344.git.xuanzhuo@linux.alibaba.com>
2021-01-20  8:30           ` Xuan Zhuo
2021-01-20 13:56             ` Alexander Lobakin
2021-01-21  7:41               ` Magnus Karlsson
2021-01-21 10:54                 ` Yunsheng Lin
[not found] ` <017fdff4e061a7e0e779b7bc96ed3b45e07aa006.1611048724.git.xuanzhuo@linux.alibaba.com>
2021-01-19 14:43   ` [PATCH bpf-next " Alexander Lobakin