Subject: Re: [PATCH RFC 0/7] add socket to netdev page frag recycling support
To: Eric Dumazet
CC: David Miller, Jakub Kicinski, Alexander Duyck, Russell King,
 Marcin Wojtas, Yisen Zhuang, Salil Mehta, Thomas Petazzoni,
 Jesper Dangaard Brouer, Ilias Apalodimas, Alexei Starovoitov,
 Daniel Borkmann, John Fastabend, Andrew Morton, Peter Zijlstra,
 Will Deacon, Matthew Wilcox, Vlastimil Babka, Fenghua Yu, Roman Gushchin,
 Peter Xu, "Tang, Feng", Jason Gunthorpe, Hugh Dickins, Jonathan Lemon,
 Alexander Lobakin, Willem de Bruijn, wenxu, Cong Wang, Kevin Hao,
 Aleksandr Nogikh, Marco Elver, Yonghong Song, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, netdev, LKML, bpf, Hideaki YOSHIFUJI,
 David Ahern, Antoine Tenart, Wei Wang, Taehee Yoo, Arnd Bergmann,
 Mat Martineau, Florian Westphal, linmiaohe
References: <1629257542-36145-1-git-send-email-linyunsheng@huawei.com>
From: Yunsheng Lin
Message-ID: <2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.com>
Date: Wed, 18 Aug 2021 17:36:06 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/8/18 16:57, Eric Dumazet wrote:
> On Wed, Aug 18, 2021 at 5:33 AM Yunsheng Lin wrote:
>>
>> This patchset adds the socket to netdev page frag recycling
>> support based on the busy polling and page pool infrastructure.
>
> I really do not see how this can scale to thousands of sockets.
>
> tcp_mem[] defaults to ~ 9 % of physical memory.
>
> If you now run tests with thousands of sockets, their skbs will
> consume Gigabytes of memory on typical servers, now backed by
> order-0 pages (instead of current order-3 pages)
> So IOMMU costs will actually be much bigger.

As the page allocator supports bulk allocation now, see:
https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L252

if the DMA layer also supports batched mapping/unmapping, maybe having a
small-sized page pool for thousands of sockets may not be a problem?

Christoph Hellwig mentioned batched DMA operation support in the thread
below:
https://www.spinics.net/lists/netdev/msg666715.html

If batched DMA operations are supported, maybe the page pool would mainly
benefit the case of a small number of sockets?

>
> Are we planning to use Gigabyte sized page pools for NIC ?
>
> Have you tried instead to make TCP frags twice bigger ?

Not yet.

> This would require less IOMMU mappings.
> (Note: This could require some mm help, since PAGE_ALLOC_COSTLY_ORDER
> is currently 3, not 4)

I am not familiar with mm yet, but I will take a look at that :)

>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index a3eea6e0b30a7d43793f567ffa526092c03e3546..6b66b51b61be9f198f6f1c4a3d81b57fa327986a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2560,7 +2560,7 @@ static void sk_leave_memory_pressure(struct sock *sk)
>         }
>  }
>
> -#define SKB_FRAG_PAGE_ORDER get_order(32768)
> +#define SKB_FRAG_PAGE_ORDER get_order(65536)
>  DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>
>  /**
>
>
>
>>