From: Mina Almasry <almasrymina@google.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: "Shailend Chand" <shailend@google.com>, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	bpf@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org, "David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>, "Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>, "Jonathan Corbet" <corbet@lwn.net>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>, "David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>, "Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeelb@google.com>
Subject: Re: [net-next v1 09/16] page_pool: device memory support
Date: Sun, 10 Dec 2023 18:26:29 -0800
Message-ID: <CAHS8izPEFsqw50qgM+sPot6XVvOExpd+DrwrmPSR3zsWGLysRw@mail.gmail.com>
In-Reply-To: <92e30bd9-6df4-b72f-7bcd-f4fe5670eba2@huawei.com>

On Sun, Dec 10, 2023 at 6:04 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/12/9 0:05, Mina Almasry wrote:
> > On Fri, Dec 8, 2023 at 1:30 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >>
> >> As mentioned before, it seems we need to have the above checking every
> >> time we need to do some per-page handling in page_pool core, is there
> >> a plan in your mind how to remove those kind of checking in the future?
> >>
> >
> > I see 2 ways to remove the checking, both infeasible:
> >
> > 1. Allocate a wrapper struct that pulls out all the fields the page pool needs:
> >
> > struct netmem {
> >         /* common fields */
> >         refcount_t refcount;
> >         bool is_pfmemalloc;
> >         int nid;
> >         ...
> >         union {
> >                 struct dmabuf_genpool_chunk_owner *owner;
> >                 struct page *page;
> >         };
> > };
> >
> > The page pool can then not care if the underlying memory is iov or
> > page. However, this introduces significant memory bloat, as this struct
> > needs to be allocated for each page or ppiov, which I imagine is not
> > acceptable for the upside of removing a few static_branch'd if
> > statements with no performance cost.
> >
> > 2. Create a unified struct for page and dmabuf memory, which the mm
> > folks have repeatedly nacked, and I imagine will repeatedly nack in
> > the future.
> >
> > So I imagine the special handling of ppiov in some form is critical
> > and the checking may not be removable.
>
> If the above is true, perhaps devmem is not really supposed to be integrated
> into page_pool.
>
> Adding a check for every per-page handling in page_pool core is just too
> hacky to be really considered a long-term solution.
>

The only other option is to implement another page_pool for ppiov and
have the driver create a page_pool or ppiov_pool depending on the state
of the netdev_rx_queue (or have some helper in the net stack do that for
the driver). This introduces some code duplication: the ppiov_pool and
page_pool would look similar in implementation.
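To make that duplication concrete, here is a rough sketch of what the
driver-side selection could look like. All of the ppiov_* names, the
rx-queue binding check, and the priv fields below are made up purely for
illustration; they are not an API I'm proposing:

/* Illustrative sketch only: ppiov_pool_create(),
 * netdev_rx_queue_has_dmabuf_binding(), and the priv->*_pool fields are
 * hypothetical names. Only page_pool_create() is the existing API.
 */
static int drv_setup_rx_pool(struct drv_priv *priv,
			     struct netdev_rx_queue *rxq,
			     const struct page_pool_params *pp_params)
{
	if (netdev_rx_queue_has_dmabuf_binding(rxq)) {
		/* rx queue is bound to a dmabuf: use the ppiov pool. */
		priv->ppiov_pool = ppiov_pool_create(pp_params);
		if (IS_ERR(priv->ppiov_pool))
			return PTR_ERR(priv->ppiov_pool);
		return 0;
	}

	/* Normal host memory: keep using the existing page_pool. */
	priv->page_pool = page_pool_create(pp_params);
	if (IS_ERR(priv->page_pool))
		return PTR_ERR(priv->page_pool);
	return 0;
}

Everything behind ppiov_pool_create() would end up mirroring large parts
of the existing page_pool implementation (alloc/recycle/destroy), which
is the duplication I mean above.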
But this was all discussed in detail in RFC v2, and the last response I
heard from Jesper was in favor of this approach, if I understand
correctly:

https://lore.kernel.org/netdev/7aedc5d5-0daf-63be-21bc-3b724cc1cab9@redhat.com/

Would love to have the maintainer weigh in here.

> It is somewhat ironic that devmem is using static_branch to alleviate the
> performance impact for normal memory at the possible cost of performance
> degradation for devmem; does it not defeat some purpose of integrating devmem
> into page_pool?
>

I don't see the issue. The static branch sets the non-ppiov path as the
default when no memory providers are in use and flips it when they are,
making the default branch prediction ideal in both cases.

> >
> >> Even though a static_branch check is added in page_is_page_pool_iov(), it
> >> does not make much sense that a core has two different structs for its
> >> most basic data.
> >>
> >> IMHO, the ppiov for dmabuf is force-fitted into page_pool without much
> >> design consideration at this point.
> >>
...
> >>
> >> For now, the above may work for the rx part, as it seems that you are
> >> only enabling rx for dmabuf for now.
> >>
> >> What is the plan to enable tx for dmabuf? Is it also integrated into
> >> page_pool? There was an attempt to enable page_pool for tx; Eric seemed to
> >> have some comment about this:
> >> https://lkml.kernel.org/netdev/2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.com/T/#mb6ab62dc22f38ec621d516259c56dd66353e24a2
> >>
> >> If tx is not integrated into page_pool, do we need to create a new layer for
> >> the tx dmabuf?
> >>
> >
> > I imagine the TX path will reuse page_pool_iov, page_pool_iov_*()
> > helpers, and page_pool_page_*() helpers, but will not need any core
> > page_pool changes. This is because the TX path will have to piggyback
>
> We may need another bit/flags checking to demux between page_pool owned
> devmem and non-page_pool owned devmem.
>

The way I'm imagining the support, I don't see the need for such flags.
We'd be reusing generic helpers like page_pool_iov_get_dma_address()
that don't need that checking.

> Also calling page_pool_*() on non-page_pool owned devmem is confusing
> enough that we may need a thin layer handling non-page_pool owned devmem
> in the end.
>

The page_pool_page_*() and page_pool_iov_*() functions can be renamed if
that is confusing. I would think that's no issue (note that the
page_pool_*() functions need not be called on the TX path).

> > on MSG_ZEROCOPY (devmem is not copyable), so no memory allocation from
> > the page_pool (or otherwise) is needed or possible. RFCv1 had a TX
> > implementation based on dmabuf pages without page_pool involvement; I
> > imagine I'll do something similar.
>
> It would be good to have a tx implementation for the next version, so
> that we can have a whole picture of devmem.
>

--
Thanks,
Mina