netdev.vger.kernel.org archive mirror
From: John Fastabend <john.fastabend@gmail.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>, netdev@vger.kernel.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"Jason Wang" <jasowang@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Willem de Bruijn" <willemb@google.com>,
	"Steffen Klassert" <steffen.klassert@secunet.com>,
	"Alexander Lobakin" <alobakin@pm.me>,
	"Miaohe Lin" <linmiaohe@huawei.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	"Mauro Carvalho Chehab" <mchehab+huawei@kernel.org>,
	"Antoine Tenart" <atenart@kernel.org>,
	"Michal Kubecek" <mkubecek@suse.cz>,
	"Andrew Lunn" <andrew@lunn.ch>,
	"Florian Fainelli" <f.fainelli@gmail.com>,
	"Meir Lichtinger" <meirl@mellanox.com>,
	virtualization@lists.linux-foundation.org, bpf@vger.kernel.org
Subject: RE: [PATCH bpf-next] xsk: build skb by page
Date: Sun, 17 Jan 2021 13:55:32 -0800
Message-ID: <6004b254ce7_2664208d0@john-XPS-13-9370.notmuch>
In-Reply-To: <579fa463bba42ac71591540a1811dca41d725350.1610764948.git.xuanzhuo@linux.alibaba.com>

Xuan Zhuo wrote:
> This patch is used to construct skb based on page to save memory copy
> overhead.
> 
> This has one problem:
> 
> We construct the skb by fill the data page as a frag into the skb. In
> this way, the linear space is empty, and the header information is also
> in the frag, not in the linear space, which is not allowed for some
> network cards. For example, Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx] will get the following error message:
> 
>     mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb, opcode 0xd, syndrome 0x1, vendor syndrome 0x68
>     00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
>     WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
>     00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
>     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
>     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
> 
> I also tried to use build_skb to construct skb, but because of the
> existence of skb_shinfo, it must be behind the linear space, so this
> method is not working. We can't put skb_shinfo on desc->addr, it will be
> exposed to users, this is not safe.
> 
> Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> network card supports the header information of the packet in the frag
> and not in the linear space.
> 
> ---------------- Performance Testing ------------
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s <msg size>
> ```
> 
> Test result data:
> 
> size    64      512     1024    1500
> copy    1916747 1775988 1600203 1440054
> page    1974058 1953655 1945463 1904478
> percent 3.0%    10.0%   21.58%  32.3%

Looks like a good perf bump. Some easy suggestions below.

> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> +				     struct xdp_desc *desc, int *err)
> +{

Passing an 'int *err' here is ugly IMO. Use the ERR_PTR/PTR_ERR macros
and roll the error into the return value.

Or maybe use the out: pattern used elsewhere in the kernel, but just doing
direct returns like now, only with ERR_PTR(), would also be fine.
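
For readers who have not used these macros, here is a minimal userspace
sketch of the idea. The ERR_PTR/PTR_ERR/IS_ERR definitions below are
stand-ins modelled on the kernel's include/linux/err.h, not the kernel
code itself, and struct buf, build_buf() and use_buf() are hypothetical
names made up for illustration. The sketch assumes a platform where a
long is as wide as a pointer, as on Linux.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the helpers in the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object type and builder, for illustration only. */
struct buf {
	size_t len;
};

static struct buf *build_buf(size_t len)
{
	struct buf *b;

	if (len == 0)
		return ERR_PTR(-EINVAL);

	b = malloc(sizeof(*b));
	if (!b)
		return ERR_PTR(-ENOMEM);

	b->len = len;
	return b;
}

/* Caller side: one return value carries either the object or the errno. */
static int use_buf(size_t len)
{
	struct buf *b = build_buf(len);

	if (IS_ERR(b))
		return (int)PTR_ERR(b);

	free(b);
	return 0;
}
```

With this shape the caller needs no extra out-parameter: it checks
IS_ERR() on the returned pointer and recovers the error with PTR_ERR().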

> +	struct sk_buff *skb ;
	struct sk_buff *skb = NULL;
	int err = -ENOMEM;	/* local now, replaces the 'int *err' parameter */
> +
> +	if (xs->dev->features & NETIF_F_SKB_NO_LINEAR) {
> +		skb = xsk_build_skb_zerocopy(xs, desc);
> +		if (unlikely(!skb)) {
			goto out;

> +			*err = -ENOMEM;
> +			return NULL;
> +		}
> +	} else {
> +		char *buffer;
> +		u64 addr;
> +		u32 len;
> +		int err;
> +
> +		len = desc->len;
> +		skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> +		if (unlikely(!skb)) {
			goto out;
> +			*err = -ENOMEM;
> +			return NULL;
> +		}
> +
> +		skb_put(skb, len);
> +		addr = desc->addr;
> +		buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> +		err = skb_store_bits(skb, 0, buffer, len);
> +
> +		if (unlikely(err)) {
> +			kfree_skb(skb);

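			/* drop the kfree_skb() above (out: frees the skb) and the
			 * block-local 'int err', which would shadow this one
			 */
			err = -EINVAL;
			goto out;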

> +			*err = -EINVAL;
> +			return NULL;
> +		}
> +	}
> +
> +	skb->dev = xs->dev;
> +	skb->priority = xs->sk.sk_priority;
> +	skb->mark = xs->sk.sk_mark;
> +	skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> +	skb->destructor = xsk_destruct_skb;
> +
> +	return skb;

out:
	kfree_skb(skb);
	return ERR_PTR(err);

> +}
> +

Otherwise looks good, thanks.
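
To make the out: variant concrete, here is a rough userspace sketch of
the single-exit cleanup shape. It is not the xsk code: struct pkt and
build_pkt() are hypothetical stand-ins for struct sk_buff and
xsk_build_skb(), free() stands in for kfree_skb(), and ERR_PTR() is a
local stand-in for the kernel macro (assuming a long is as wide as a
pointer).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's ERR_PTR(); assumes long is pointer-sized. */
static inline void *ERR_PTR(long error) { return (void *)error; }

/* Hypothetical packet type standing in for struct sk_buff. */
struct pkt {
	size_t len;
	unsigned char data[64];
};

static struct pkt *build_pkt(const void *src, size_t len)
{
	struct pkt *p = NULL;
	int err = -ENOMEM;	/* default error, overridden where needed */

	p = calloc(1, sizeof(*p));
	if (!p)
		goto out;

	if (!src || len > sizeof(p->data)) {
		err = -EINVAL;
		goto out;	/* no free here: out: is the only place that frees */
	}

	memcpy(p->data, src, len);
	p->len = len;
	return p;

out:
	free(p);		/* free(NULL) is a no-op, like kfree_skb(NULL) */
	return ERR_PTR(err);
}
```

The point of the single label is that every failure path sets err and
jumps, so the object is freed in exactly one place and cannot be freed
twice.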
