From: Alexander Lobakin <alobakin@pm.me>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Alexander Lobakin" <alobakin@pm.me>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 bpf-next 2/2] xsk: introduce generic almost-zerocopy xmit
Date: Wed, 31 Mar 2021 12:28:40 +0000
Message-ID: <20210331122820.6356-2-alobakin@pm.me>
In-Reply-To: <20210331122820.6356-1-alobakin@pm.me>

The generic zerocopy path is gated behind IFF_TX_SKB_NO_LINEAR because:
 - most drivers expect an skb with linear space;
 - most drivers expect the hard header in the linear space;
 - many drivers need some headroom to insert custom headers
   and/or pull headers from frags (pskb_may_pull() etc.).

With a small amount of overhead, we can satisfy all of these without
inducing a full copy of the buffer data.

Frames bigger than 128 bytes are now also built using the zerocopy
path, provided the device and driver support S/G xmit (which is almost
always true). The threshold is there to mitigate allocation overhead.
We allocate 256* additional bytes of skb linear space and pull the
hard header there (aligning its end to 16 bytes on platforms with
non-zero NET_IP_ALIGN). The rest of the buffer data is just pinned as
frags. At least 240 bytes of room are left for any driver needs.
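
For illustration only, the arithmetic behind the "at least 240 bytes"
figure can be sketched in userspace C. This is not part of the patch:
the helper names xsk_linear_room() and xsk_hard_header_end() are
hypothetical, and the constants are restated here on the assumption
that ETH_HLEN is 14 and NET_IP_ALIGN is either 0 or 2.

```c
#include <assert.h>

/* Restated from the patch / common kernel values so that the sketch
 * builds outside the kernel tree (assumptions, not kernel headers). */
#define XSK_SKB_HEADLEN	256	/* extra linear bytes the patch allocates */
#define ETH_HLEN	14	/* Ethernet hard header length */

/* Hypothetical helper: linear bytes still free after the hard header
 * has been pulled in. ip_align stands for the platform's NET_IP_ALIGN. */
static unsigned int xsk_linear_room(unsigned int ip_align)
{
	assert(ip_align + ETH_HLEN <= XSK_SKB_HEADLEN);
	return XSK_SKB_HEADLEN - ip_align - ETH_HLEN;
}

/* Hypothetical helper: offset of the end of the hard header, counted
 * from the start of the buffer, for a base headroom of hr bytes. */
static unsigned int xsk_hard_header_end(unsigned int hr,
					unsigned int ip_align)
{
	return hr + ip_align + ETH_HLEN;
}
```

With ip_align == 2 the first helper yields 240, and with a base
headroom that is a multiple of 16 the header end stays 16-byte aligned
(2 + 14 == 16). The real allocation may be rounded up by the allocator,
hence "at least".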

We could just pass the buffer to eth_get_headlen() to minimize
allocation overhead and be able to copy all the headers into the
linear space, but the flow dissection procedure tends to be more
expensive than the current approach.

The IFF_TX_SKB_NO_LINEAR path remains unchanged; it is still relevant
and generally faster.
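
As a rough userspace model of which build path a frame takes after
this change (a sketch, not the kernel code): the enum and
xsk_pick_path() are hypothetical names, and the two booleans stand in
for the IFF_TX_SKB_NO_LINEAR bit in dev->priv_flags and the NETIF_F_SG
bit in dev->features.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as in the patch, restated for a standalone build. */
#define XSK_SKB_HEADLEN		256
#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)

/* Hypothetical labels for the three ways an skb may be built. */
enum xsk_xmit_path {
	XSK_PATH_FULL_COPY,		/* whole frame copied to linear space */
	XSK_PATH_ALMOST_ZEROCOPY,	/* hard header pulled, rest in frags */
	XSK_PATH_NO_LINEAR,		/* everything in frags, no linear data */
};

/* Mirrors the condition in xsk_build_skb(): the flag wins first, then
 * the size threshold combined with S/G support. */
static enum xsk_xmit_path xsk_pick_path(bool no_linear, bool sg,
					uint32_t len)
{
	if (no_linear)
		return XSK_PATH_NO_LINEAR;
	if (len > XSK_COPY_THRESHOLD && sg)
		return XSK_PATH_ALMOST_ZEROCOPY;
	return XSK_PATH_FULL_COPY;
}
```

Note that the comparison is strict, so a frame of exactly 128 bytes is
still copied in full, and a device without S/G always falls back to
the full-copy path regardless of frame size.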

* The value of 256 bytes is somewhat "magic": it can be found in lots
  of drivers and places in the core code, and it is believed that 256
  bytes are enough to store all the headers of any frame.

Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/xdp/xsk.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 41f8f21b3348..1d241f87422c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -445,6 +445,9 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }

+#define XSK_SKB_HEADLEN		256
+#define XSK_COPY_THRESHOLD	(XSK_SKB_HEADLEN / 2)
+
 static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 					      struct xdp_desc *desc)
 {
@@ -452,13 +455,21 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	u32 hr, len, ts, offset, copy, copied;
 	struct sk_buff *skb;
 	struct page *page;
+	bool need_pull;
 	void *buffer;
 	int err, i;
 	u64 addr;

 	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(xs->dev->needed_headroom));
+	len = hr;
+
+	need_pull = !(xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR);
+	if (need_pull) {
+		len += XSK_SKB_HEADLEN;
+		hr += NET_IP_ALIGN;
+	}

-	skb = sock_alloc_send_skb(&xs->sk, hr, 1, &err);
+	skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
 	if (unlikely(!skb))
 		return ERR_PTR(err);

@@ -488,6 +499,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	skb->data_len += len;
 	skb->truesize += ts;

+	if (need_pull && unlikely(!__pskb_pull_tail(skb, ETH_HLEN))) {
+		kfree_skb(skb);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	refcount_add(ts, &xs->sk.sk_wmem_alloc);

 	return skb;
@@ -498,19 +514,20 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 {
 	struct net_device *dev = xs->dev;
 	struct sk_buff *skb;
+	u32 len = desc->len;

-	if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
+	if ((dev->priv_flags & IFF_TX_SKB_NO_LINEAR) ||
+	    (len > XSK_COPY_THRESHOLD && likely(dev->features & NETIF_F_SG))) {
 		skb = xsk_build_skb_zerocopy(xs, desc);
 		if (IS_ERR(skb))
 			return skb;
 	} else {
-		u32 hr, tr, len;
 		void *buffer;
+		u32 hr, tr;
 		int err;

 		hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
 		tr = dev->needed_tailroom;
-		len = desc->len;

 		skb = sock_alloc_send_skb(&xs->sk, hr + len + tr, 1, &err);
 		if (unlikely(!skb))
--
2.31.1


