From: Mina Almasry <almasrymina@google.com>
To: Shailend Chand <shailend@google.com>, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	bpf@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Cc: "Mina Almasry" <almasrymina@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Yunsheng Lin" <linyunsheng@huawei.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeelb@google.com>
Subject: [net-next v1 10/16] page_pool: don't release iov on elevanted refcount
Date: Thu, 7 Dec 2023 16:52:41 -0800
Message-ID: <20231208005250.2910004-11-almasrymina@google.com>
In-Reply-To: <20231208005250.2910004-1-almasrymina@google.com>

Currently the page_pool considers a page for recycling only once, the
first time __page_pool_put_page() is called on it. This works because
in practice the net stack holds only 1 reference to the skb frags. In
that case page_pool recycling works as expected: the skb frags have 1
reference on the pages from the net stack when __page_pool_put_page()
is called (if the driver is not holding extra references for
recycling), and so the page is recycled.

However, this is not compatible with devmem TCP. For devmem TCP, the
net stack holds 2 references for each frag: 1 reference is part of the
skb, and the second reference is for the user holding the frag until
they call SO_DEVMEM_DONTNEED.

This causes a bug in page_pool recycling: when the skb is freed, the
reference count goes from 2->1, the page_pool sees a pending reference
and releases the page, and so no devmem iovs get recycled.

To fix this, don't release iovs on elevated refcount.

Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 net/core/page_pool.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f0148d66371b..dc2a148f5b06 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -731,6 +731,29 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
 		/* Page found as candidate for recycling */
 		return page;
 	}
+
+	if (page_is_page_pool_iov(page)) {
+		/* With devmem TCP and ppiovs, we can't release pages if the
+		 * refcount is > 1. This is because the net stack holds
+		 * 2 references:
+		 * - 1 for the skb, and
+		 * - 1 for the user until they call SO_DEVMEM_DONTNEED.
+		 * Releasing pages for elevated refcounts completely disables
+		 * page_pool recycling. Instead, simply don't release pages and
+		 * the next call to napi_pp_put_page() via SO_DEVMEM_DONTNEED
+		 * will consider the page again for recycling. As a result,
+		 * devmem TCP is incompatible with drivers doing refcnt based
+		 * recycling unless those drivers:
+		 *
+		 * - don't call skb_mark_for_recycle()
+		 * - are sure to release the last reference with
+		 *   page_pool_put_full_page() to consider the page for
+		 *   page_pool recycling.
+		 */
+		page_pool_page_put_many(page, 1);
+		return NULL;
+	}
+
 	/* Fallback/non-XDP mode: API user have elevated refcnt.
 	 *
 	 * Many drivers split up the page into fragments, and some
-- 
2.43.0.472.g3155946c3a-goog
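
The refcount behavior described in the commit message can be modeled in userspace as a rough sketch. This is not the kernel code: `struct model_buf`, `enum put_result`, and `model_put()` are hypothetical names made up for illustration, and the model ignores everything else __page_pool_put_page() checks (pfmemalloc, DMA sync, caches). It only shows the decision the patch changes: a buffer with refcount 1 is a recycling candidate, an iov with an elevated refcount drops one reference and stays with the pool, and a regular page with an elevated refcount is released from the pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one put operation in this model. */
enum put_result {
	PUT_RECYCLED,	/* last reference: candidate for recycling */
	PUT_KEPT,	/* iov with extra refs: one ref dropped, stays in pool */
	PUT_RELEASED,	/* regular page with extra refs: leaves the pool */
};

/* Hypothetical stand-in for a page or page_pool_iov. */
struct model_buf {
	int refcount;
	bool is_iov;	/* plays the role of page_is_page_pool_iov() */
	bool in_pool;	/* false once the pool has released the buffer */
};

/* Simplified model of the put path after this patch. */
static enum put_result model_put(struct model_buf *b)
{
	if (b->refcount == 1)
		return PUT_RECYCLED;

	if (b->is_iov) {
		/* Models page_pool_page_put_many(page, 1) plus return NULL:
		 * drop our reference, but do not release from the pool.
		 */
		b->refcount--;
		return PUT_KEPT;
	}

	/* Fallback path: release the page from the pool, then drop a ref. */
	b->in_pool = false;
	b->refcount--;
	return PUT_RELEASED;
}
```

In this model, an iov that starts with refcount 2 (skb plus user) returns PUT_KEPT when the skb is freed and PUT_RECYCLED on the later put that stands in for SO_DEVMEM_DONTNEED. A regular page starting at refcount 2 returns PUT_RELEASED on the first put and is never recycled, which is the pre-patch behavior the commit message describes for iovs.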