From: Eric Dumazet <edumazet@google.com>
To: David Ahern <dsahern@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Marcin Wojtas <mw@semihalf.com>,
	linuxarm@openeuler.org, Yisen Zhuang <yisen.zhuang@huawei.com>,
	Salil Mehta <salil.mehta@huawei.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Fenghua Yu <fenghua.yu@intel.com>, Roman Gushchin <guro@fb.com>,
	Peter Xu <peterx@redhat.com>, "Tang, Feng" <feng.tang@intel.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	mcroce@microsoft.com, Hugh Dickins <hughd@google.com>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Alexander Lobakin <alobakin@pm.me>,
	Willem de Bruijn <willemb@google.com>, wenxu <wenxu@ucloud.cn>,
	Cong Wang <cong.wang@bytedance.com>,
	Kevin Hao <haokexin@gmail.com>,
	Aleksandr Nogikh <nogikh@google.com>,
	Marco Elver <elver@google.com>, Yonghong Song <yhs@fb.com>,
	kpsingh@kernel.org, Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	chenhao288@hisilicon.com,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>,
	memxor@gmail.com, linux@rempel-privat.de,
	Antoine Tenart <atenart@kernel.org>, Wei Wang <weiwan@google.com>,
	Taehee Yoo <ap420073@gmail.com>, Arnd Bergmann <arnd@arndb.de>,
	Mat Martineau <mathew.j.martineau@linux.intel.com>,
	aahringo@redhat.com, ceggers@arri.de, yangbo.lu@nxp.com,
	Florian Westphal <fw@strlen.de>,
	xiangxia.m.yue@gmail.com, linmiaohe <linmiaohe@huawei.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [Linuxarm] Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
Date: Wed, 25 Aug 2021 09:32:41 -0700
Message-ID: <CANn89iKqijGU_0dQMeyMJ2h2MJE3=fLm8qb456G3ZD_7TrLt_A@mail.gmail.com>
In-Reply-To: <5fdc5223-7d67-fed7-f691-185dcb2e3d80@gmail.com>

On Wed, Aug 25, 2021 at 9:29 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 8/23/21 8:04 AM, Eric Dumazet wrote:
> >>
> >>
> >> It seems PAGE_ALLOC_COSTLY_ORDER is mostly related to pcp page, OOM, memory
> >> compact and memory isolation, as the test system has a lot of memory installed
> >> (about 500G, only 3-4G is used), so I used the below patch to test the max
> >> possible performance improvement when making TCP frags twice bigger, and
> >> the performance improvement went from about 30Gbit to 32Gbit for one thread
> >> iperf tcp flow in IOMMU strict mode,
> >
> > This is encouraging, and means we can do much better.
> >
> > Even with SKB_FRAG_PAGE_ORDER  set to 4, typical skbs will need 3 mappings
> >
> > 1) One for the headers (in skb->head)
> > 2) Two page frags, because one TSO packet payload is not a nice power-of-two.
>
> interesting observation. I have noticed 17 with the ZC API. That might
> explain the less than expected performance bump with iommu strict mode.
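
For illustration, the fragment counts discussed above can be reproduced
with simple arithmetic. The helper below is a userspace sketch, not
kernel code: pages_spanned() is a hypothetical name, and the 65160-byte
payload used in the usage note (45 segments of 1448 bytes) is an assumed
example of a TSO payload that is not a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of pages of size page_size touched by the byte range
 * [addr, addr + len). Each touched page needs its own fragment
 * (and so its own DMA mapping) unless pages are combined.
 */
static size_t pages_spanned(uint64_t addr, size_t len, size_t page_size)
{
	uint64_t first, last;

	assert(page_size != 0 && len != 0);
	first = addr / page_size;
	last = (addr + len - 1) / page_size;
	return (size_t)(last - first + 1);
}
```

With 4KB pages, a 64KB buffer touches 16 pages when it is page aligned
and 17 when it is not, which may be consistent with the 17 fragments
seen with the ZC API. With 64KB (order-4) pages, a 65160-byte payload
that starts where the previous one ended touches two pages.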

Note that if the application is using huge pages, things get better after

commit 394fcd8a813456b3306c423ec4227ed874dfc08b
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Aug 20 08:43:59 2020 -0700

    net: zerocopy: combine pages in zerocopy_sg_from_iter()

    Currently, tcp sendmsg(MSG_ZEROCOPY) is building skbs with order-0
    fragments. Compared to standard sendmsg(), these skbs usually contain
    up to 16 fragments on arches with 4KB page sizes, instead of two.

    This adds considerable costs on various ndo_start_xmit() handlers,
    especially when IOMMU is in the picture.

    As high performance applications are often using huge pages,
    we can try to combine adjacent pages belonging to same
    compound page.

    Tested on AMD Rome platform, with IOMMU, nominal single TCP flow speed
    is roughly doubled (~55Gbit -> ~100Gbit), when user application
    is using hugepages.

    For reference, nominal single TCP flow speed on this platform
    without MSG_ZEROCOPY is ~65Gbit.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
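
The idea of combining adjacent pages of the same compound page can be
sketched in userspace as follows. This is only an illustration of the
technique, not the code from the commit: struct upage, struct frag and
build_frags() are hypothetical names, and a page is modelled by its
page frame number plus the frame number of its compound head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Hypothetical model of a pinned user page. */
struct upage {
	unsigned long pfn;      /* page frame number */
	unsigned long head_pfn; /* pfn of the compound head (own pfn if order-0) */
};

struct frag {
	unsigned long pfn; /* first page of the fragment */
	size_t off;        /* offset of the data in that page */
	size_t len;        /* bytes covered, may exceed PAGE_SZ after merging */
};

/*
 * Describe len bytes starting at offset off in pages[0] as fragments.
 * A page extends the previous fragment only if it directly follows the
 * previous page and shares its compound head. Returns the fragment count.
 */
static size_t build_frags(const struct upage *pages, size_t npages,
			  size_t off, size_t len,
			  struct frag *out, size_t max_frags)
{
	size_t nfrags = 0;

	assert(off < PAGE_SZ);
	for (size_t i = 0; i < npages && len > 0; i++) {
		size_t chunk = PAGE_SZ - off;
		bool mergeable;

		if (chunk > len)
			chunk = len;

		mergeable = i > 0 && nfrags > 0 &&
			    pages[i].head_pfn == pages[i - 1].head_pfn &&
			    pages[i].pfn == pages[i - 1].pfn + 1;
		if (mergeable) {
			out[nfrags - 1].len += chunk;
		} else {
			if (nfrags == max_frags)
				break;
			out[nfrags].pfn = pages[i].pfn;
			out[nfrags].off = off;
			out[nfrags].len = chunk;
			nfrags++;
		}
		len -= chunk;
		off = 0; /* only the first page can start mid-page */
	}
	return nfrags;
}
```

In this model, 64KB backed by sixteen consecutive pages of one huge
page collapses into a single fragment, while the same 64KB backed by
unrelated order-0 pages still needs sixteen.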

Ideally the gup stuff should really directly deal with hugepages, so
that we avoid all these crazy refcounting games on the per-huge-page
central refcount.

>
> >
> > The first issue can be addressed using a piece of coherent memory (128
> > or 256 bytes per entry in TX ring).
> > Copying the headers can avoid one IOMMU mapping, and improve IOTLB
> > hits, because all slots of the TX ring buffer will use one single
> > IOTLB slot.
> >
> > The second issue can be solved by tweaking a bit
> > skb_page_frag_refill() to accept an additional parameter
> > so that the whole skb payload fits in a single order-4 page.
> >
> >
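
The second suggestion quoted above (letting the refill step know how
much payload is coming, so that it fits in one order-4 page) can be
modelled with a toy allocator. This is a userspace sketch under stated
assumptions, not a patch against skb_page_frag_refill(): struct
toy_pfrag, toy_carve() and the fit_whole flag are hypothetical, and the
65160-byte payload in the usage note is an assumed example.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_PAGE_SZ (4096u << 4) /* order-4 page with 4KB base pages: 64KB */

/* Toy model of a per-socket page frag: current page and fill offset. */
struct toy_pfrag {
	size_t page_id;
	size_t offset;
};

/*
 * Carve len bytes of payload out of the page frag and return how many
 * pages (fragments) the payload ends up in. With fit_whole set, a fresh
 * page is started whenever the remaining room is smaller than len, at
 * the cost of leaving the tail of the old page unused.
 */
static size_t toy_carve(struct toy_pfrag *pf, size_t len, int fit_whole)
{
	size_t frags = 0;

	assert(len > 0 && len <= FRAG_PAGE_SZ);
	if (fit_whole && FRAG_PAGE_SZ - pf->offset < len) {
		pf->page_id++;
		pf->offset = 0;
	}
	while (len > 0) {
		size_t chunk;

		if (pf->offset == FRAG_PAGE_SZ) {
			pf->page_id++;
			pf->offset = 0;
		}
		chunk = FRAG_PAGE_SZ - pf->offset;
		if (chunk > len)
			chunk = len;
		pf->offset += chunk;
		len -= chunk;
		frags++;
	}
	return frags;
}
```

Carving two back-to-back 65160-byte payloads without fit_whole puts the
second one in two fragments; with fit_whole each payload gets exactly
one, and 376 bytes at the end of the first page are left unused.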
