netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/2] xsk: introduce generic almost-zerocopy xmit
@ 2021-03-30 23:15 Alexander Lobakin
  2021-03-30 23:15 ` [PATCH bpf-next 1/2] xsk: speed-up generic full-copy xmit Alexander Lobakin
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Alexander Lobakin @ 2021-03-30 23:15 UTC
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Xuan Zhuo, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Alexander Lobakin, netdev, bpf, linux-kernel

This series is based on the exceptional generic zerocopy xmit logic
initially introduced by Xuan Zhuo. It extends that logic so that it
covers all the sane drivers, not only the ones that are capable
of xmitting skbs with no linear space.

The first patch is a random while-we-are-here improvement to the
full-copy path, and the second is the main course. See the individual
commit messages for the details.

The original (full-zerocopy) path is still here and still generally
faster, but for now it seems like virtio_net will remain the only
user of it, at least for a considerable period of time.

Alexander Lobakin (2):
  xsk: speed-up generic full-copy xmit
  xsk: introduce generic almost-zerocopy xmit

 net/xdp/xsk.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

--
Well, this is untested. I currently don't have access to my setup
and am tied up with moving to another country, but as I don't know for
sure at the moment when I'll next get back to work on the kernel,
I found it worthwhile to publish this now -- if any further changes are
required when I'm already out of sight, maybe someone could
carry on and make another revision and so on (I'm still here for any
questions, comments, reviews and improvements till the end of this
week).
But this *should* work with all the sane drivers. If a particular
one can't handle this, it's likely ill.
--
2.31.1




* [PATCH bpf-next 1/2] xsk: speed-up generic full-copy xmit
  2021-03-30 23:15 [PATCH bpf-next 0/2] xsk: introduce generic almost-zerocopy xmit Alexander Lobakin
@ 2021-03-30 23:15 ` Alexander Lobakin
  2021-03-30 23:15 ` [PATCH bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit Alexander Lobakin
  2021-03-31  9:44 ` [PATCH bpf-next 0/2] " Magnus Karlsson
  2 siblings, 0 replies; 5+ messages in thread
From: Alexander Lobakin @ 2021-03-30 23:15 UTC
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Xuan Zhuo, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Alexander Lobakin, netdev, bpf, linux-kernel

A few things are known for certain at the time of copying:
 - the allocated skb is fully linear;
 - its linear space is long enough to hold the full buffer data.

So, the out-of-line skb_put(), skb_store_bits() and the retcode
check can be replaced with a plain memcpy(__skb_put()) with no loss.
Also align memcpy()'s len to sizeof(long) to improve its performance.
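
To make the rounding concrete, here is a minimal userspace sketch of the
length arithmetic only. ALIGN_UP() is a stand-in for the kernel's ALIGN(),
and xsk_copy_len() / xsk_copy_slack() are hypothetical helper names used
for illustration; they are not part of the patch. The sketch shows that the
rounded-up copy touches up to sizeof(long) - 1 bytes past len, so both the
source buffer and the skb tail need that much slack. Whether that slack is
always available is not established in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a)	(((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Bytes the memcpy() would move for a frame of 'len' bytes (hypothetical helper). */
static size_t xsk_copy_len(size_t len)
{
	return ALIGN_UP(len, sizeof(long));
}

/* Bytes read and written past the end of the frame (hypothetical helper). */
static size_t xsk_copy_slack(size_t len)
{
	size_t slack = xsk_copy_len(len) - len;

	/* Rounding up to a power of two adds strictly less than that power. */
	assert(slack < sizeof(long));
	return slack;
}
```

A frame whose length is already a multiple of sizeof(long) has no slack;
any other length is copied with a few trailing bytes that do not belong to
the frame.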

Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/xdp/xsk.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a71ed664da0a..41f8f21b3348 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -517,14 +517,9 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 			return ERR_PTR(err);

 		skb_reserve(skb, hr);
-		skb_put(skb, len);

 		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
-		err = skb_store_bits(skb, 0, buffer, len);
-		if (unlikely(err)) {
-			kfree_skb(skb);
-			return ERR_PTR(err);
-		}
+		memcpy(__skb_put(skb, len), buffer, ALIGN(len, sizeof(long)));
 	}

 	skb->dev = dev;
--
2.31.1




* [PATCH bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit
  2021-03-30 23:15 [PATCH bpf-next 0/2] xsk: introduce generic almost-zerocopy xmit Alexander Lobakin
  2021-03-30 23:15 ` [PATCH bpf-next 1/2] xsk: speed-up generic full-copy xmit Alexander Lobakin
@ 2021-03-30 23:15 ` Alexander Lobakin
  2021-03-31  9:44 ` [PATCH bpf-next 0/2] " Magnus Karlsson
  2 siblings, 0 replies; 5+ messages in thread
From: Alexander Lobakin @ 2021-03-30 23:15 UTC
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: Xuan Zhuo, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Alexander Lobakin, netdev, bpf, linux-kernel

The reasons behind IFF_TX_SKB_NO_LINEAR are:
 - most drivers expect skb with the linear space;
 - most drivers expect hard header in the linear space;
 - many drivers need some headroom to insert custom headers
   and/or pull headers from frags (pskb_may_pull() etc.).

With some bits of overhead, we can satisfy all of this without
inducing full buffer data copy.

Now frames that are no shorter than 128 bytes (to mitigate allocation
overhead) are also built using the zerocopy path (if the device and
driver support S/G xmit, which is almost always true).
We allocate 256* additional bytes for skb linear space and pull the hard
header there (aligning its end to 16 bytes on platforms with
NET_IP_ALIGN). The rest of the buffer data is just pinned as frags.
At least 242 bytes of room are left for any driver needs.
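
The numbers above can be checked with a small userspace sketch. It only
models the offsets; struct xsk_linear_layout and xsk_layout() are
hypothetical names for illustration, and base_hr stands in for
max(NET_SKB_PAD, L1_CACHE_ALIGN(needed_headroom)), assumed here to be a
multiple of 16.

```c
#include <assert.h>
#include <stddef.h>

#define XSK_SKB_HEADLEN	256	/* value introduced by this patch */
#define ETH_HLEN	14	/* Ethernet hard header length */

/* Offsets within the skb linear buffer (hypothetical, for illustration). */
struct xsk_linear_layout {
	size_t alloc;	/* size requested from sock_alloc_send_skb() */
	size_t reserve;	/* headroom: base_hr + NET_IP_ALIGN */
	size_t hdr_end;	/* offset where the pulled hard header ends */
	size_t room;	/* linear bytes left after the hard header */
};

static struct xsk_linear_layout xsk_layout(size_t base_hr, size_t ip_align)
{
	struct xsk_linear_layout l;

	assert(base_hr % 16 == 0);	/* assumption stated in the lead-in */

	l.alloc = base_hr + XSK_SKB_HEADLEN + ip_align;
	l.reserve = base_hr + ip_align;
	l.hdr_end = l.reserve + ETH_HLEN;
	l.room = l.alloc - l.hdr_end;
	return l;
}
```

With ip_align = 2 the header ends at base_hr + 16, which is where the
16-byte alignment comes from, and room is 256 - 14 = 242 regardless of
ip_align.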

We could just pass the buffer to eth_get_headlen() to minimize
allocation overhead and be able to copy all the headers into the
linear space, but the flow dissection procedure tends to be more
expensive than the current approach.

The IFF_TX_SKB_NO_LINEAR path remains unchanged; it is still relevant
and generally faster.
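
The resulting three-way choice can be sketched in userspace as below. This
mirrors the condition in xsk_build_skb() from the diff; the enum and
xsk_pick_path() are hypothetical names for illustration, with the two
booleans standing in for the IFF_TX_SKB_NO_LINEAR and NETIF_F_SG checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSK_SKB_HEADLEN		256
#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)	/* 128 bytes */

/* Hypothetical labels for the three ways a frame can be built. */
enum xsk_xmit_path {
	XSK_PATH_FULL_ZEROCOPY,		/* no linear space at all */
	XSK_PATH_ALMOST_ZEROCOPY,	/* hard header pulled, rest in frags */
	XSK_PATH_FULL_COPY,		/* whole frame copied into linear space */
};

static enum xsk_xmit_path xsk_pick_path(bool no_linear, bool sg, uint32_t len)
{
	/* Drivers that accept skbs with no linear space keep the original path. */
	if (no_linear)
		return XSK_PATH_FULL_ZEROCOPY;

	/* Large enough frames on S/G-capable devices only get the header pulled. */
	if (len >= XSK_COPY_THRESHOLD && sg)
		return XSK_PATH_ALMOST_ZEROCOPY;

	/* Short frames, or devices without S/G, are copied in full. */
	return XSK_PATH_FULL_COPY;
}
```

A 127-byte frame on an S/G-capable device is therefore still copied, while
a 128-byte one is not.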

* The value of 256 bytes is kinda "magic", it can be found in lots
  of drivers and places of core code and it is believed that 256
  bytes are enough to store any headers of any frame.

Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/xdp/xsk.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 41f8f21b3348..090ff9c096a3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -445,6 +445,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }

+#define XSK_SKB_HEADLEN		256
+#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)
+
 static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 					      struct xdp_desc *desc)
 {
@@ -452,13 +455,22 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	u32 hr, len, ts, offset, copy, copied;
 	struct sk_buff *skb;
 	struct page *page;
+	bool need_pull;
 	void *buffer;
 	int err, i;
 	u64 addr;

 	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+	len = hr;
+
+	need_pull = !(xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR);
+	if (need_pull) {
+		len += XSK_SKB_HEADLEN;
+		len += NET_IP_ALIGN;
+		hr += NET_IP_ALIGN;
+	}

-	skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err);
+	skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
 	if (unlikely(!skb))
 		return ERR_PTR(err);

@@ -488,6 +500,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	skb->data_len += len;
 	skb->truesize += ts;

+	if (need_pull && unlikely(!__pskb_pull_tail(skb, ETH_HLEN))) {
+		kfree_skb(skb);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	refcount_add(ts, &xs->sk.sk_wmem_alloc);

 	return skb;
@@ -498,19 +515,20 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 {
 	struct net_device *dev = xs->dev;
 	struct sk_buff *skb;
+	u32 len = desc->len;

-	if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+	if ((dev->priv_flags & IFF_TX_SKB_NO_LINEAR) ||
+	    (len >= XSK_COPY_THRESHOLD && likely(dev->features & NETIF_F_SG))) {
 		skb = xsk_build_skb_zerocopy(xs, desc);
 		if (IS_ERR(skb))
 			return skb;
 	} else {
-		u32 hr, tr, len;
 		void *buffer;
+		u32 hr, tr;
 		int err;

 		hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
 		tr = dev->needed_tailroom;
-		len = desc->len;

 		skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err);
 		if (unlikely(!skb))
--
2.31.1




* Re: [PATCH bpf-next 0/2] xsk: introduce generic almost-zerocopy xmit
  2021-03-30 23:15 [PATCH bpf-next 0/2] xsk: introduce generic almost-zerocopy xmit Alexander Lobakin
  2021-03-30 23:15 ` [PATCH bpf-next 1/2] xsk: speed-up generic full-copy xmit Alexander Lobakin
  2021-03-30 23:15 ` [PATCH bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit Alexander Lobakin
@ 2021-03-31  9:44 ` Magnus Karlsson
  2021-03-31 12:01   ` Alexander Lobakin
  2 siblings, 1 reply; 5+ messages in thread
From: Magnus Karlsson @ 2021-03-31  9:44 UTC
  To: Alexander Lobakin
  Cc: Alexei Starovoitov, Daniel Borkmann, Xuan Zhuo,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Network Development, bpf, open list

On Wed, Mar 31, 2021 at 1:17 AM Alexander Lobakin <alobakin@pm.me> wrote:
>
> This series is based on the exceptional generic zerocopy xmit logic
> initially introduced by Xuan Zhuo. It extends that logic so that it
> covers all the sane drivers, not only the ones that are capable
> of xmitting skbs with no linear space.
>
> The first patch is a random while-we-are-here improvement to the
> full-copy path, and the second is the main course. See the individual
> commit messages for the details.
>
> The original (full-zerocopy) path is still here and still generally
> faster, but for now it seems like virtio_net will remain the only
> user of it, at least for a considerable period of time.
>
> Alexander Lobakin (2):
>   xsk: speed-up generic full-copy xmit
>   xsk: introduce generic almost-zerocopy xmit
>
>  net/xdp/xsk.c | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
>
> --
> Well, this is untested. I currently don't have access to my setup
> and am tied up with moving to another country, but as I don't know for
> sure at the moment when I'll next get back to work on the kernel,
> I found it worthwhile to publish this now -- if any further changes are
> required when I'm already out of sight, maybe someone could
> carry on and make another revision and so on (I'm still here for any
> questions, comments, reviews and improvements till the end of this
> week).
> But this *should* work with all the sane drivers. If a particular
> one can't handle this, it's likely ill.

Thanks Alexander. I will take your patches for a spin on a couple of
NICs and get back to you, though it will be next week due to holidays
where I am based.

> --
> 2.31.1
>
>


* Re: [PATCH bpf-next 0/2] xsk: introduce generic almost-zerocopy xmit
  2021-03-31  9:44 ` [PATCH bpf-next 0/2] " Magnus Karlsson
@ 2021-03-31 12:01   ` Alexander Lobakin
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Lobakin @ 2021-03-31 12:01 UTC
  To: Magnus Karlsson
  Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
	Xuan Zhuo, Björn Töpel, Magnus Karlsson,
	Jonathan Lemon, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Network Development, bpf, open list

From: Magnus Karlsson <magnus.karlsson@gmail.com>
Date: Wed, 31 Mar 2021 11:44:45 +0200

> On Wed, Mar 31, 2021 at 1:17 AM Alexander Lobakin <alobakin@pm.me> wrote:
> >
> > This series is based on the exceptional generic zerocopy xmit logic
> > initially introduced by Xuan Zhuo. It extends that logic so that it
> > covers all the sane drivers, not only the ones that are capable
> > of xmitting skbs with no linear space.
> >
> > The first patch is a random while-we-are-here improvement to the
> > full-copy path, and the second is the main course. See the individual
> > commit messages for the details.
> >
> > The original (full-zerocopy) path is still here and still generally
> > faster, but for now it seems like virtio_net will remain the only
> > user of it, at least for a considerable period of time.
> >
> > Alexander Lobakin (2):
> >   xsk: speed-up generic full-copy xmit
> >   xsk: introduce generic almost-zerocopy xmit
> >
> >  net/xdp/xsk.c | 33 +++++++++++++++++++++++----------
> >  1 file changed, 23 insertions(+), 10 deletions(-)
> >
> > --
> > Well, this is untested. I currently don't have access to my setup
> > and am tied up with moving to another country, but as I don't know for
> > sure at the moment when I'll next get back to work on the kernel,
> > I found it worthwhile to publish this now -- if any further changes are
> > required when I'm already out of sight, maybe someone could
> > carry on and make another revision and so on (I'm still here for any
> > questions, comments, reviews and improvements till the end of this
> > week).
> > But this *should* work with all the sane drivers. If a particular
> > one can't handle this, it's likely ill.
>
> Thanks Alexander. I will take your patches for a spin on a couple of
> NICs and get back to you, though it will be next week due to holidays
> where I am based.

Thanks a lot! Any tests will be much appreciated.
I'll publish v2 in a moment though, want to drop a couple of
micro-optimizations.

> > --
> > 2.31.1

Al


