From: Eric Dumazet
Date: Mon, 23 Aug 2021 08:04:16 -0700
Subject: Re: [Linuxarm] Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
To: Yunsheng Lin
Cc: David Miller, Jakub Kicinski, Alexander Duyck, Russell King,
    Marcin Wojtas, linuxarm@openeuler.org, Yisen Zhuang, Salil Mehta,
    Thomas Petazzoni, Jesper Dangaard Brouer, Ilias Apalodimas,
    Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrew Morton,
    Peter Zijlstra, Will Deacon, Matthew Wilcox, Vlastimil Babka,
    Fenghua Yu, Roman Gushchin, Peter Xu, "Tang, Feng", Jason Gunthorpe,
    mcroce@microsoft.com, Hugh Dickins, Jonathan Lemon, Alexander Lobakin,
    Willem de Bruijn, wenxu, Cong Wang, Kevin Hao, Aleksandr Nogikh,
    Marco Elver, Yonghong Song, kpsingh@kernel.org, Andrii Nakryiko,
    Martin KaFai Lau, Song Liu, netdev, LKML, bpf,
    chenhao288@hisilicon.com, Hideaki YOSHIFUJI, David Ahern,
    memxor@gmail.com, linux@rempel-privat.de, Antoine Tenart, Wei Wang,
    Taehee Yoo, Arnd Bergmann, Mat Martineau, aahringo@redhat.com,
    ceggers@arri.de, yangbo.lu@nxp.com, Florian Westphal,
    xiangxia.m.yue@gmail.com, linmiaohe, Christoph Hellwig
References: <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>
    <2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.com>
    <4b2ad6d4-8e3f-fea9-766e-2e7330750f84@huawei.com>
In-Reply-To: <4b2ad6d4-8e3f-fea9-766e-2e7330750f84@huawei.com>
X-Mailing-List: bpf@vger.kernel.org

On Mon, Aug 23, 2021 at 2:25 AM Yunsheng Lin wrote:
>
> On 2021/8/18 17:36, Yunsheng Lin wrote:
> > On 2021/8/18 16:57, Eric Dumazet wrote:
> >> On Wed, Aug 18, 2021 at 5:33 AM Yunsheng Lin wrote:
> >>>
> >>> This patchset adds the socket to netdev page frag recycling
> >>> support based on the busy polling and page pool infrastructure.
> >>
> >> I really do not see how this can scale to thousands of sockets.
> >>
> >> tcp_mem[] defaults to ~ 9 % of physical memory.
> >>
> >> If you now run tests with thousands of sockets, their skbs will
> >> consume Gigabytes of memory on typical servers, now backed by
> >> order-0 pages (instead of current order-3 pages)
> >> So IOMMU costs will actually be much bigger.
> >
> > As the page allocator support bulk allocating now, see:
> > https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L252
> >
> > if the DMA also support batch mapping/unmapping, maybe having a
> > small-sized page pool for thousands of sockets may not be a problem?
> > Christoph Hellwig mentioned the batch DMA operation support in below
> > thread:
> > https://www.spinics.net/lists/netdev/msg666715.html
> >
> > if the batched DMA operation is supported, maybe having the
> > page pool is mainly benefit the case of small number of socket?
> >
> >>
> >> Are we planning to use Gigabyte sized page pools for NIC ?
> >>
> >> Have you tried instead to make TCP frags twice bigger ?
> >
> > Not yet.
> >
> >> This would require less IOMMU mappings.
> >> (Note: This could require some mm help, since PAGE_ALLOC_COSTLY_ORDER
> >> is currently 3, not 4)
> >
> > I am not familiar with mm yet, but I will take a look about that:)
>
>
> It seems PAGE_ALLOC_COSTLY_ORDER is mostly related to pcp page, OOM, memory
> compact and memory isolation, as the test system has a lot of memory installed
> (about 500G, only 3-4G is used), so I used the below patch to test the max
> possible performance improvement when making TCP frags twice bigger, and
> the performance improvement went from about 30Gbit to 32Gbit for one thread
> iperf tcp flow in IOMMU strict mode,

This is encouraging, and means we can do much better.

Even with SKB_FRAG_PAGE_ORDER set to 4, typical skbs will need 3 mappings:

1) One for the headers (in skb->head)
2) Two page frags, because one TSO packet payload is not a nice power-of-two.

The first issue can be addressed using a piece of coherent memory (128
or 256 bytes per entry in the TX ring).
Copying the headers can avoid one IOMMU mapping, and improve IOTLB
hits, because all slots of the TX ring buffer will use one single
IOTLB slot.

The second issue can be solved by tweaking skb_page_frag_refill() a bit
to accept an additional parameter, so that the whole skb payload fits
in a single order-4 page.

> and using the pfrag pool, the improvement
> went from about 30Gbit to 40Gbit for the same testing configuration:

Yes, but you have not provided performance numbers when 200 (or 1000+)
concurrent flows are running.

Optimizing single-flow TCP performance while killing performance for
the more common case is not an option.

>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fcb5355..dda20f9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -37,7 +37,7 @@
>   * coalesce naturally under reasonable reclaim pressure and those which
>   * will not.
>   */
> -#define PAGE_ALLOC_COSTLY_ORDER 3
> +#define PAGE_ALLOC_COSTLY_ORDER 4
>
>  enum migratetype {
>  	MIGRATE_UNMOVABLE,
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 870a3b7..b1e0dfc 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2580,7 +2580,7 @@ static void sk_leave_memory_pressure(struct sock *sk)
>  	}
>  }
>
> -#define SKB_FRAG_PAGE_ORDER get_order(32768)
> +#define SKB_FRAG_PAGE_ORDER get_order(65536)
>  DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>
>  /**
>
> >
> >>
> >> diff --git a/net/core/sock.c b/net/core/sock.c
> >> index a3eea6e0b30a7d43793f567ffa526092c03e3546..6b66b51b61be9f198f6f1c4a3d81b57fa327986a 100644
> >> --- a/net/core/sock.c
> >> +++ b/net/core/sock.c
> >> @@ -2560,7 +2560,7 @@ static void sk_leave_memory_pressure(struct sock *sk)
> >>  	}
> >>  }
> >>
> >> -#define SKB_FRAG_PAGE_ORDER get_order(32768)
> >> +#define SKB_FRAG_PAGE_ORDER get_order(65536)
> >>  DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
> >>
> >>  /**
> >>
> >>
> >>
> >>>
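
To make the mapping count discussed above concrete, here is a minimal
user-space sketch (not kernel code). It assumes 4 KiB base pages and an
illustrative TSO payload of 45 * 1448 = 65160 bytes. Every sketch_* name
is hypothetical: sketch_get_order() only mimics what get_order() computes,
and the whole_fit flag only models the idea of an extra parameter to
skb_page_frag_refill(); neither is the real kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, as on most x86-64/arm64 configs. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for get_order(): smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Toy model of a per-socket page frag: one high-order page and a cursor. */
struct sketch_pfrag {
	size_t size;	/* bytes in the current page */
	size_t offset;	/* next free byte in that page */
};

/* Place one skb payload of len bytes and return how many page frags
 * (hence DMA mappings) it touches.
 * whole_fit == 0: fill the tail of the current page first, then spill
 *                 into a fresh page.
 * whole_fit != 0: if the tail cannot hold the whole payload, abandon it
 *                 and start a fresh page, so the payload stays in one frag
 *                 at the cost of the wasted tail. */
static unsigned int sketch_place_payload(struct sketch_pfrag *pf,
					 size_t len, int whole_fit)
{
	unsigned int frags = 0;

	assert(len > 0 && len <= pf->size);
	if (whole_fit && pf->size - pf->offset < len)
		pf->offset = pf->size;	/* give up the tail of this page */

	while (len > 0) {
		size_t room, chunk;

		if (pf->offset == pf->size)
			pf->offset = 0;	/* "refill": switch to a new page */
		room = pf->size - pf->offset;
		chunk = len < room ? len : room;
		pf->offset += chunk;
		len -= chunk;
		frags++;
	}
	return frags;
}
```

With a 65536-byte page, the first 65160-byte payload fits in one frag, but
the next one starts 376 bytes before the end of the page and straddles two
pages, which together with skb->head gives the 3 mappings mentioned above.
With whole_fit set, every payload lands in a single frag, and copying the
headers into coherent memory would then leave one mapping per skb.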