bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Ahern <dsahern@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>,
	davem@davemloft.net, kuba@kernel.org
Cc: alexander.duyck@gmail.com, linux@armlinux.org.uk,
	mw@semihalf.com, linuxarm@openeuler.org, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, thomas.petazzoni@bootlin.com,
	hawk@kernel.org, ilias.apalodimas@linaro.org, ast@kernel.org,
	daniel@iogearbox.net, john.fastabend@gmail.com,
	akpm@linux-foundation.org, peterz@infradead.org, will@kernel.org,
	willy@infradead.org, vbabka@suse.cz, fenghua.yu@intel.com,
	guro@fb.com, peterx@redhat.com, feng.tang@intel.com,
	jgg@ziepe.ca, mcroce@microsoft.com, hughd@google.com,
	jonathan.lemon@gmail.com, alobakin@pm.me, willemb@google.com,
	wenxu@ucloud.cn, cong.wang@bytedance.com, haokexin@gmail.com,
	nogikh@google.com, elver@google.com, yhs@fb.com,
	kpsingh@kernel.org, andrii@kernel.org, kafai@fb.com,
	songliubraving@fb.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	chenhao288@hisilicon.com, edumazet@google.com,
	yoshfuji@linux-ipv6.org, dsahern@kernel.org, memxor@gmail.com,
	linux@rempel-privat.de, atenart@kernel.org, weiwan@google.com,
	ap420073@gmail.com, arnd@arndb.de,
	mathew.j.martineau@linux.intel.com, aahringo@redhat.com,
	ceggers@arri.de, yangbo.lu@nxp.com, fw@strlen.de,
	xiangxia.m.yue@gmail.com, linmiaohe@huawei.com
Subject: Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
Date: Mon, 23 Aug 2021 20:34:11 -0700	[thread overview]
Message-ID: <80701f7a-e7c6-eb86-4018-67033f0823bf@gmail.com> (raw)
In-Reply-To: <744e88b6-7cb4-ea99-0523-4bfa5a23e15c@huawei.com>

On 8/22/21 9:32 PM, Yunsheng Lin wrote:
> I assumed the "either Rx or Tx is cpu bound" meant either Rx or Tx is the
> bottleneck?


> It seems iperf3 support the Tx ZC, I retested using the iperf3, Rx settings
> is not changed when testing, MTU is 1500:

-Z == sendfile API. That works fine to a point and that point is well
below 100G.


> IOMMU in strict mode:
> 1. Tx ZC case:
>    22Gbit with Tx being bottleneck(cpu bound)
> 2. Tx non-ZC case with pfrag pool enabled:
>    40Git with Rx being bottleneck(cpu bound)
> 3. Tx non-ZC case with pfrag pool disabled:
>    30Git, the bottleneck seems not to be cpu bound, as the Rx and Tx does
>    not have a single CPU reaching about 100% usage.
>> At 1500 MTU lowering CPU usage on the Tx side does not accomplish much
>> on throughput since the Rx is 100% cpu.
> As above performance data, enabling ZC does not seems to help when IOMMU
> is involved, which has about 30% performance degrade when pfrag pool is
> disabled and 50% performance degrade when pfrag pool is enabled.

In a past response you should numbers for Tx ZC API with a custom
program. That program showed the dramatic reduction in CPU cycles for Tx
with the ZC API.

>> At 3300 MTU you have ~47% the pps for the same throughput. Lower pps
>> reduces Rx processing and lower CPU to process the incoming stream. Then
>> using the Tx ZC API you lower the Tx overehad allowing a single stream
>> to faster - sending more data which in the end results in much higher
>> pps and throughput. At the limit you are CPU bound (both ends in my
>> testing as Rx side approaches the max pps, and Tx side as it continually
>> tries to send data).
>> Lowering CPU usage on Tx the side is a win regardless of whether there
>> is a big increase on the throughput at 1500 MTU since that configuration
>> is an Rx CPU bound problem. Hence, my point that we have a good start
>> point for lowering CPU usage on the Tx side; we should improve it rather
>> than add per-socket page pools.
> Acctually it is not a per-socket page pools, the page pool is still per
> NAPI, this patchset adds multi allocation context to the page pool, so that
> the tx can reuse the same page pool with rx, which is quite usefully if the
> ARFS is enabled.
>> You can stress the Tx side and emphasize its overhead by modifying the
>> receiver to drop the data on Rx rather than copy to userspace which is a
>> huge bottleneck (e.g., MSG_TRUNC on recv). This allows the single flow
> As the frag page is supported in page pool for Rx, the Rx probably is not
> a bottleneck any more, at least not for IOMMU in strict mode.
> It seems iperf3 does not support MSG_TRUNC yet, any testing tool supporting
> MSG_TRUNC? Or do I have to hack the kernel or iperf3 tool to do that?

https://github.com/dsahern/iperf, mods branch

--zc_api is the Tx ZC API; --rx_drop adds MSG_TRUNC to recv.

>> stream to go faster and emphasize Tx bottlenecks as the pps at 3300
>> approaches the top pps at 1500. e.g., doing this with iperf3 shows the
>> spinlock overhead with tcp_sendmsg, overhead related to 'select' and
>> then gup_pgd_range.
> When IOMMU is in strict mode, the overhead with IOMMU seems to be much
> bigger than spinlock(23% to 10%).
> Anyway, I still think ZC mostly benefit to packet which is bigger than a
> specific size and IOMMU disabling case.
>> .

  reply	other threads:[~2021-08-24  3:34 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-18  3:32 [PATCH RFC 0/7] add socket to netdev page frag recycling support Yunsheng Lin
2021-08-18  3:32 ` [PATCH RFC 1/7] page_pool: refactor the page pool to support multi alloc context Yunsheng Lin
2021-08-18  3:32 ` [PATCH RFC 2/7] skbuff: add interface to manipulate frag count for tx recycling Yunsheng Lin
2021-08-18  3:32 ` [PATCH RFC 3/7] net: add NAPI api to register and retrieve the page pool ptr Yunsheng Lin
2021-08-18  3:32 ` [PATCH RFC 4/7] net: pfrag_pool: add pfrag pool support based on page pool Yunsheng Lin
2021-08-18  3:32 ` [PATCH RFC 5/7] sock: support refilling pfrag from pfrag_pool Yunsheng Lin
2021-08-18  3:32 ` [PATCH RFC 6/7] net: hns3: support tx recycling in the hns3 driver Yunsheng Lin
2021-08-18  8:57 ` [PATCH RFC 0/7] add socket to netdev page frag recycling support Eric Dumazet
2021-08-18  9:36   ` Yunsheng Lin
2021-08-23  9:25     ` [Linuxarm] " Yunsheng Lin
2021-08-23 15:04       ` Eric Dumazet
2021-08-24  8:03         ` Yunsheng Lin
2021-08-25 16:29         ` David Ahern
2021-08-25 16:32           ` Eric Dumazet
2021-08-25 16:38             ` David Ahern
2021-08-25 17:24               ` Eric Dumazet
2021-08-26  4:05                 ` David Ahern
2021-08-18 22:05 ` David Ahern
2021-08-19  8:18   ` Yunsheng Lin
2021-08-20 14:35     ` David Ahern
2021-08-23  3:32       ` Yunsheng Lin
2021-08-24  3:34         ` David Ahern [this message]
2021-08-24  8:41           ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=80701f7a-e7c6-eb86-4018-67033f0823bf@gmail.com \
    --to=dsahern@gmail.com \
    --cc=aahringo@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=alobakin@pm.me \
    --cc=andrii@kernel.org \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=ast@kernel.org \
    --cc=atenart@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=ceggers@arri.de \
    --cc=chenhao288@hisilicon.com \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=elver@google.com \
    --cc=feng.tang@intel.com \
    --cc=fenghua.yu@intel.com \
    --cc=fw@strlen.de \
    --cc=guro@fb.com \
    --cc=haokexin@gmail.com \
    --cc=hawk@kernel.org \
    --cc=hughd@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jgg@ziepe.ca \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linux@rempel-privat.de \
    --cc=linuxarm@openeuler.org \
    --cc=linyunsheng@huawei.com \
    --cc=mathew.j.martineau@linux.intel.com \
    --cc=mcroce@microsoft.com \
    --cc=memxor@gmail.com \
    --cc=mw@semihalf.com \
    --cc=netdev@vger.kernel.org \
    --cc=nogikh@google.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=salil.mehta@huawei.com \
    --cc=songliubraving@fb.com \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=vbabka@suse.cz \
    --cc=weiwan@google.com \
    --cc=wenxu@ucloud.cn \
    --cc=will@kernel.org \
    --cc=willemb@google.com \
    --cc=willy@infradead.org \
    --cc=xiangxia.m.yue@gmail.com \
    --cc=yangbo.lu@nxp.com \
    --cc=yhs@fb.com \
    --cc=yisen.zhuang@huawei.com \
    --cc=yoshfuji@linux-ipv6.org \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).