From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03E3FC4332F for ; Wed, 8 Nov 2023 10:57:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344416AbjKHK5G (ORCPT ); Wed, 8 Nov 2023 05:57:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235663AbjKHK4a (ORCPT ); Wed, 8 Nov 2023 05:56:30 -0500 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E5A019A5; Wed, 8 Nov 2023 02:56:16 -0800 (PST) Received: from dggpemm500005.china.huawei.com (unknown [172.30.72.53]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4SQMSz45TJzfb2h; Wed, 8 Nov 2023 18:56:03 +0800 (CST) Received: from [10.69.30.204] (10.69.30.204) by dggpemm500005.china.huawei.com (7.185.36.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.31; Wed, 8 Nov 2023 18:56:12 +0800 Subject: Re: [RFC PATCH v3 07/12] page-pool: device memory support To: Mina Almasry CC: , , , , , , , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Jesper Dangaard Brouer , Ilias Apalodimas , Arnd Bergmann , David Ahern , Willem de Bruijn , Shuah Khan , Sumit Semwal , =?UTF-8?Q?Christian_K=c3=b6nig?= , Shakeel Butt , Jeroen de Borst , Praveen Kaligineedi References: <20231106024413.2801438-1-almasrymina@google.com> <20231106024413.2801438-8-almasrymina@google.com> <4a0e9d53-324d-e19b-2a30-ba86f9e5569e@huawei.com> From: Yunsheng Lin Message-ID: Date: Wed, 8 Nov 2023 18:56:11 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.69.30.204] X-ClientProxiedBy: dggems706-chm.china.huawei.com (10.3.19.183) To dggpemm500005.china.huawei.com (7.185.36.74) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2023/11/8 5:56, Mina Almasry wrote: > On Tue, Nov 7, 2023 at 12:00 AM Yunsheng Lin wrote: >> >> On 2023/11/6 10:44, Mina Almasry wrote: >>> Overload the LSB of struct page* to indicate that it's a page_pool_iov. >>> >>> Refactor mm calls on struct page* into helpers, and add page_pool_iov >>> handling on those helpers. Modify callers of these mm APIs with calls to >>> these helpers instead. >>> >>> In areas where struct page* is dereferenced, add a check for special >>> handling of page_pool_iov. >>> >>> Signed-off-by: Mina Almasry >>> >>> --- >>> include/net/page_pool/helpers.h | 74 ++++++++++++++++++++++++++++++++- >>> net/core/page_pool.c | 63 ++++++++++++++++++++-------- >>> 2 files changed, 118 insertions(+), 19 deletions(-) >>> >>> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h >>> index b93243c2a640..08f1a2cc70d2 100644 >>> --- a/include/net/page_pool/helpers.h >>> +++ b/include/net/page_pool/helpers.h >>> @@ -151,6 +151,64 @@ static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page) >>> return NULL; >>> } >>> >>> +static inline int page_pool_page_ref_count(struct page *page) >>> +{ >>> + if (page_is_page_pool_iov(page)) >>> + return page_pool_iov_refcount(page_to_page_pool_iov(page)); >> >> We have added a lot of 'if' for the devmem case, it would be better to >> make it more generic so that we can have more unified metadata handling >> for normal page and devmem. If we add another memory type here, do we >> need another 'if' here? > > Maybe, not sure. I'm guessing new memory types will either be pages or > iovs, so maybe no new if statements needed. > >> That is part of the reason I suggested using a more unified metadata for >> all the types of memory chunks used by page_pool. > > I think your suggestion was to use struct pages for devmem. That was > thoroughly considered and intensely argued about in the initial > conversations regarding devmem and the initial RFC, and from the > conclusions there it's extremely clear to me that devmem struct pages > are categorically a no-go. Not exactly, I was wondering if adding a more abstract structure specificly for page pool makes any sense, and each mem type can add its own specific fields, net stack only see and handle the common fields so that it does not care about specific mem type, and each provider only see the and handle the specific fields belonging to it most of the time. Ideally something like beleow: struct netmem { /* common fields */ refcount_t refcount; struct page_pool *pp; ...... union { struct devmem{ struct dmabuf_genpool_chunk_owner *owner; }; struct other_mem{ ... ... }; }; }; But untill we completely decouple the 'struct page' from the net stack, the above seems undoable in the near term. But we might be able to do something as folio is doing now, mm subsystem is still seeing 'struct folio/page', but other subsystem like slab is using 'struct slab', and there is still some common fields shared between 'struct folio' and 'struct slab'. As the netmem patchset, is devmem able to reuse the below 'struct netmem' and rename it to 'struct page_pool_iov'? So that 'struct page' for normal memory and 'struct page_pool_iov' for devmem share the common fields used by page pool and net stack? And we might be able to reuse the 'flags', '_pp_mapping_pad' and '_mapcount' for specific mem provider, which is enough for the devmem only requiring a single pointer to point to it's owner? https://lkml.kernel.org/netdev/20230105214631.3939268-2-willy@infradead.org/ +/** + * struct netmem - A memory allocation from a &struct page_pool. + * @flags: The same as the page flags. Do not use directly. + * @pp_magic: Magic value to avoid recycling non page_pool allocated pages. + * @pp: The page pool this netmem was allocated from. + * @dma_addr: Call netmem_get_dma_addr() to read this value. + * @dma_addr_upper: Might need to be 64-bit on 32-bit architectures. + * @pp_frag_count: For frag page support, not supported in 32-bit + * architectures with 64-bit DMA. + * @_mapcount: Do not access this member directly. + * @_refcount: Do not access this member directly. Read it using + * netmem_ref_count() and manipulate it with netmem_get() and netmem_put(). + * + * This struct overlays struct page for now. Do not modify without a + * good understanding of the issues. + */ +struct netmem { + unsigned long flags; + unsigned long pp_magic; + struct page_pool *pp; + /* private: no need to document this padding */ + unsigned long _pp_mapping_pad; /* aliases with folio->mapping */ + /* public: */ + unsigned long dma_addr; + union { + unsigned long dma_addr_upper; + atomic_long_t pp_frag_count; + }; + atomic_t _mapcount; + atomic_t _refcount; +}; If we do that, it seems we might be able to allow net stack and page pool to see the metadata for devmem chunk as 'struct page', and may be able to aovid most of the 'if' checking in net stack and page pool? > > -- > Thanks, > Mina > > . >