From: John Fastabend <john.fastabend@gmail.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>, netdev@vger.kernel.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Björn Töpel" <bjorn.topel@intel.com>,
"Magnus Karlsson" <magnus.karlsson@intel.com>,
"Jonathan Lemon" <jonathan.lemon@gmail.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <kafai@fb.com>,
"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
"KP Singh" <kpsingh@kernel.org>,
"Willem de Bruijn" <willemb@google.com>,
"Steffen Klassert" <steffen.klassert@secunet.com>,
"Alexander Lobakin" <alobakin@pm.me>,
"Miaohe Lin" <linmiaohe@huawei.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Mauro Carvalho Chehab" <mchehab+huawei@kernel.org>,
"Antoine Tenart" <atenart@kernel.org>,
"Michal Kubecek" <mkubecek@suse.cz>,
"Andrew Lunn" <andrew@lunn.ch>,
"Florian Fainelli" <f.fainelli@gmail.com>,
"Meir Lichtinger" <meirl@mellanox.com>,
virtualization@lists.linux-foundation.org, bpf@vger.kernel.org
Subject: RE: [PATCH bpf-next] xsk: build skb by page
Date: Sun, 17 Jan 2021 13:55:32 -0800
Message-ID: <6004b254ce7_2664208d0@john-XPS-13-9370.notmuch>
In-Reply-To: <579fa463bba42ac71591540a1811dca41d725350.1610764948.git.xuanzhuo@linux.alibaba.com>
Xuan Zhuo wrote:
> This patch is used to construct skb based on page to save memory copy
> overhead.
>
> This has one problem:
>
> We construct the skb by fill the data page as a frag into the skb. In
> this way, the linear space is empty, and the header information is also
> in the frag, not in the linear space, which is not allowed for some
> network cards. For example, Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx] will get the following error message:
>
> mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
>
> I also tried to use build_skb to construct skb, but because of the
> existence of skb_shinfo, it must be behind the linear space, so this
> method is not working. We can't put skb_shinfo on desc->addr, it will be
> exposed to users, this is not safe.
>
> Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> network card supports the header information of the packet in the frag
> and not in the linear space.
>
> ---------------- Performance Testing ------------
>
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t -S -s <msg size>
> ```
>
> Test result data:
>
> size 64 512 1024 1500
> copy 1916747 1775988 1600203 1440054
> page 1974058 1953655 1945463 1904478
> percent 3.0% 10.0% 21.58% 32.3%
Looks like a good perf bump. Some easy suggestions below.
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> + struct xdp_desc *desc, int *err)
> +{
Passing an 'int *err' here is ugly IMO. Use the ERR_PTR/PTR_ERR macros
and roll the error into the return value.

Or maybe use the out: pattern used in the kernel, but keeping the direct
returns you have now, just with ERR_PTR(), would also be fine.
> + struct sk_buff *skb ;
struct sk_buff *skb = NULL;
int err = -ENOMEM;
> +
> + if (xs->dev->features & NETIF_F_SKB_NO_LINEAR) {
> + skb = xsk_build_skb_zerocopy(xs, desc);
> + if (unlikely(!skb)) {
goto out;
> + *err = -ENOMEM;
> + return NULL;
> + }
> + } else {
> + char *buffer;
> + u64 addr;
> + u32 len;
> + int err;
> +
> + len = desc->len;
> + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> + if (unlikely(!skb)) {
goto out;
> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + skb_put(skb, len);
> + addr = desc->addr;
> + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> + err = skb_store_bits(skb, 0, buffer, len);
> +
> + if (unlikely(err)) {
> + kfree_skb(skb);
err = -EINVAL;
goto out; /* out: frees the skb, so drop the kfree_skb() above */
> + *err = -EINVAL;
> + return NULL;
> + }
> + }
> +
> + skb->dev = xs->dev;
> + skb->priority = xs->sk.sk_priority;
> + skb->mark = xs->sk.sk_mark;
> + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> + skb->destructor = xsk_destruct_skb;
> +
> + return skb;
out:
kfree_skb(skb);
return ERR_PTR(err);
> +}
> +
Otherwise looks good, thanks.