All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Mina Almasry <almasrymina@google.com>,
	Shailend Chand <shailend@google.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
	<bpf@vger.kernel.org>, <linux-media@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeelb@google.com>
Subject: Re: [net-next v1 09/16] page_pool: device memory support
Date: Fri, 8 Dec 2023 17:30:36 +0800	[thread overview]
Message-ID: <32211cbf-3a4e-8a86-6214-4304ddb18a98@huawei.com> (raw)
In-Reply-To: <20231208005250.2910004-10-almasrymina@google.com>

On 2023/12/8 8:52, Mina Almasry wrote:
> Overload the LSB of struct page* to indicate that it's a page_pool_iov.
> 
> Refactor mm calls on struct page* into helpers, and add page_pool_iov
> handling on those helpers. Modify callers of these mm APIs with calls to
> these helpers instead.
> 
> In areas where struct page* is dereferenced, add a check for special
> handling of page_pool_iov.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> 
> ---
> 
> v1:
> - Disable fragmentation support for iov properly.
> - fix napi_pp_put_page() path (Yunsheng).
> 
> ---
>  include/net/page_pool/helpers.h | 78 ++++++++++++++++++++++++++++++++-
>  net/core/page_pool.c            | 67 ++++++++++++++++++++--------
>  net/core/skbuff.c               | 28 +++++++-----
>  3 files changed, 141 insertions(+), 32 deletions(-)
> 
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 00197f14aa87..2d4e0a2c5620 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -154,6 +154,64 @@ static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page)
>  	return NULL;
>  }
>  
> +static inline int page_pool_page_ref_count(struct page *page)
> +{
> +	if (page_is_page_pool_iov(page))

As mentioned before, it seems we need to have the above checking every
time we need to do some per-page handling in page_pool core, is there
a plan in your mind how to remove those kind of checking in the future?

Even though a static_branch check is added in page_is_page_pool_iov(), it
does not make much sense that a core has tow different 'struct' for its
most basic data.

IMHO, the ppiov for dmabuf is forced fitting into page_pool without much
design consideration at this point.

> +		return page_pool_iov_refcount(page_to_page_pool_iov(page));
> +
> +	return page_ref_count(page);
> +}
> +

...

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b157efea5dea..07f802f1adf1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -896,19 +896,23 @@ bool napi_pp_put_page(struct page *page, bool napi_safe)
>  	bool allow_direct = false;
>  	struct page_pool *pp;
>  
> -	page = compound_head(page);
> -
> -	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> -	 * in order to preserve any existing bits, such as bit 0 for the
> -	 * head page of compound page and bit 1 for pfmemalloc page, so
> -	 * mask those bits for freeing side when doing below checking,
> -	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> -	 * to avoid recycling the pfmemalloc page.
> -	 */
> -	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> -		return false;
> +	if (!page_is_page_pool_iov(page)) {

For now, the above may work for the the rx part as it seems that you are
only enabling rx for dmabuf for now.

What is the plan to enable tx for dmabuf? If it is also intergrated into
page_pool? There was a attempt to enable page_pool for tx, Eric seemed to
have some comment about this:
https://lkml.kernel.org/netdev/2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.com/T/#mb6ab62dc22f38ec621d516259c56dd66353e24a2

If tx is not intergrated into page_pool, do we need to create a new layer for
the tx dmabuf?

> +		page = compound_head(page);
> +
> +		/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> +		 * in order to preserve any existing bits, such as bit 0 for the
> +		 * head page of compound page and bit 1 for pfmemalloc page, so
> +		 * mask those bits for freeing side when doing below checking,
> +		 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> +		 * to avoid recycling the pfmemalloc page.
> +		 */
> +		if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> +			return false;
>  
> -	pp = page->pp;
> +		pp = page->pp;
> +	} else {
> +		pp = page_to_page_pool_iov(page)->pp;
> +	}
>  
>  	/* Allow direct recycle if we have reasons to believe that we are
>  	 * in the same context as the consumer would run, so there's
> 

WARNING: multiple messages have this Message-ID (diff)
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Mina Almasry <almasrymina@google.com>,
	Shailend Chand <shailend@google.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
	<bpf@vger.kernel.org>, <linux-media@vger.kernel.org>,
	<dri-devel@lists.freedesktop.org>
Cc: "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"David Ahern" <dsahern@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Eric Dumazet" <edumazet@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [net-next v1 09/16] page_pool: device memory support
Date: Fri, 8 Dec 2023 17:30:36 +0800	[thread overview]
Message-ID: <32211cbf-3a4e-8a86-6214-4304ddb18a98@huawei.com> (raw)
In-Reply-To: <20231208005250.2910004-10-almasrymina@google.com>

On 2023/12/8 8:52, Mina Almasry wrote:
> Overload the LSB of struct page* to indicate that it's a page_pool_iov.
> 
> Refactor mm calls on struct page* into helpers, and add page_pool_iov
> handling on those helpers. Modify callers of these mm APIs with calls to
> these helpers instead.
> 
> In areas where struct page* is dereferenced, add a check for special
> handling of page_pool_iov.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> 
> ---
> 
> v1:
> - Disable fragmentation support for iov properly.
> - fix napi_pp_put_page() path (Yunsheng).
> 
> ---
>  include/net/page_pool/helpers.h | 78 ++++++++++++++++++++++++++++++++-
>  net/core/page_pool.c            | 67 ++++++++++++++++++++--------
>  net/core/skbuff.c               | 28 +++++++-----
>  3 files changed, 141 insertions(+), 32 deletions(-)
> 
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 00197f14aa87..2d4e0a2c5620 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -154,6 +154,64 @@ static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page)
>  	return NULL;
>  }
>  
> +static inline int page_pool_page_ref_count(struct page *page)
> +{
> +	if (page_is_page_pool_iov(page))

As mentioned before, it seems we need to have the above checking every
time we need to do some per-page handling in page_pool core, is there
a plan in your mind how to remove those kind of checking in the future?

Even though a static_branch check is added in page_is_page_pool_iov(), it
does not make much sense that a core has tow different 'struct' for its
most basic data.

IMHO, the ppiov for dmabuf is forced fitting into page_pool without much
design consideration at this point.

> +		return page_pool_iov_refcount(page_to_page_pool_iov(page));
> +
> +	return page_ref_count(page);
> +}
> +

...

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b157efea5dea..07f802f1adf1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -896,19 +896,23 @@ bool napi_pp_put_page(struct page *page, bool napi_safe)
>  	bool allow_direct = false;
>  	struct page_pool *pp;
>  
> -	page = compound_head(page);
> -
> -	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> -	 * in order to preserve any existing bits, such as bit 0 for the
> -	 * head page of compound page and bit 1 for pfmemalloc page, so
> -	 * mask those bits for freeing side when doing below checking,
> -	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> -	 * to avoid recycling the pfmemalloc page.
> -	 */
> -	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> -		return false;
> +	if (!page_is_page_pool_iov(page)) {

For now, the above may work for the the rx part as it seems that you are
only enabling rx for dmabuf for now.

What is the plan to enable tx for dmabuf? If it is also intergrated into
page_pool? There was a attempt to enable page_pool for tx, Eric seemed to
have some comment about this:
https://lkml.kernel.org/netdev/2cf4b672-d7dc-db3d-ce90-15b4e91c4005@huawei.com/T/#mb6ab62dc22f38ec621d516259c56dd66353e24a2

If tx is not intergrated into page_pool, do we need to create a new layer for
the tx dmabuf?

> +		page = compound_head(page);
> +
> +		/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> +		 * in order to preserve any existing bits, such as bit 0 for the
> +		 * head page of compound page and bit 1 for pfmemalloc page, so
> +		 * mask those bits for freeing side when doing below checking,
> +		 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> +		 * to avoid recycling the pfmemalloc page.
> +		 */
> +		if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> +			return false;
>  
> -	pp = page->pp;
> +		pp = page->pp;
> +	} else {
> +		pp = page_to_page_pool_iov(page)->pp;
> +	}
>  
>  	/* Allow direct recycle if we have reasons to believe that we are
>  	 * in the same context as the consumer would run, so there's
> 

  reply	other threads:[~2023-12-08  9:30 UTC|newest]

Thread overview: 145+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-08  0:52 [net-next v1 00/16] Device Memory TCP Mina Almasry
2023-12-08  0:52 ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 01/16] net: page_pool: factor out releasing DMA from releasing the page Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-10  3:49   ` Shakeel Butt
2023-12-10  3:49     ` Shakeel Butt
2023-12-12  8:11   ` Ilias Apalodimas
2023-12-12  8:11     ` Ilias Apalodimas
2023-12-08  0:52 ` [net-next v1 02/16] net: page_pool: create hooks for custom page providers Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-12  8:07   ` Ilias Apalodimas
2023-12-12  8:07     ` Ilias Apalodimas
2023-12-12 14:47     ` Mina Almasry
2023-12-12 14:47       ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 03/16] queue_api: define queue api Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-14  1:15   ` Jakub Kicinski
2023-12-14  1:15     ` Jakub Kicinski
2023-12-08  0:52 ` [net-next v1 04/16] gve: implement " Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2024-03-05 11:45   ` Arnd Bergmann
2023-12-08  0:52 ` [net-next v1 05/16] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-14  1:17   ` Jakub Kicinski
2023-12-14  1:17     ` Jakub Kicinski
2023-12-08  0:52 ` [net-next v1 06/16] netdev: support binding dma-buf to netdevice Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08 15:40   ` kernel test robot
2023-12-08 15:40     ` kernel test robot
2023-12-08 16:02   ` kernel test robot
2023-12-08 16:02     ` kernel test robot
2023-12-08 17:48   ` David Ahern
2023-12-08 17:48     ` David Ahern
2023-12-08 19:22     ` Mina Almasry
2023-12-08 19:22       ` Mina Almasry
2023-12-08 20:32       ` Mina Almasry
2023-12-08 20:32         ` Mina Almasry
2023-12-09 23:29       ` David Ahern
2023-12-09 23:29         ` David Ahern
2023-12-11  2:19         ` Mina Almasry
2023-12-11  2:19           ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 07/16] netdev: netdevice devmem allocator Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08 17:56   ` David Ahern
2023-12-08 17:56     ` David Ahern
2023-12-08 19:27     ` Mina Almasry
2023-12-08 19:27       ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 08/16] memory-provider: dmabuf devmem memory provider Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08 22:48   ` Pavel Begunkov
2023-12-08 22:48     ` Pavel Begunkov
2023-12-08 23:25     ` Mina Almasry
2023-12-08 23:25       ` Mina Almasry
2023-12-10  3:03       ` Pavel Begunkov
2023-12-10  3:03         ` Pavel Begunkov
2023-12-11  2:30         ` Mina Almasry
2023-12-11  2:30           ` Mina Almasry
2023-12-11 20:35           ` Pavel Begunkov
2023-12-11 20:35             ` Pavel Begunkov
2023-12-14 20:03             ` Mina Almasry
2023-12-14 20:03               ` Mina Almasry
2023-12-19 23:55               ` Pavel Begunkov
2023-12-19 23:55                 ` Pavel Begunkov
2023-12-08 23:05   ` Pavel Begunkov
2023-12-08 23:05     ` Pavel Begunkov
2023-12-12 12:25   ` Jason Gunthorpe
2023-12-12 12:25     ` Jason Gunthorpe
2023-12-12 13:07     ` Christoph Hellwig
2023-12-12 14:26     ` Mina Almasry
2023-12-12 14:26       ` Mina Almasry
2023-12-12 14:39       ` Jason Gunthorpe
2023-12-12 14:39         ` Jason Gunthorpe
2023-12-12 14:58         ` Mina Almasry
2023-12-12 14:58           ` Mina Almasry
2023-12-12 15:08           ` Jason Gunthorpe
2023-12-12 15:08             ` Jason Gunthorpe
2023-12-13  1:09             ` Mina Almasry
2023-12-13  1:09               ` Mina Almasry
2023-12-13  2:19               ` David Ahern
2023-12-13  2:19                 ` David Ahern
2023-12-13  7:49   ` Yinjun Zhang
2023-12-13  7:49     ` Yinjun Zhang
2023-12-08  0:52 ` [net-next v1 09/16] page_pool: device memory support Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08  9:30   ` Yunsheng Lin [this message]
2023-12-08  9:30     ` Yunsheng Lin
2023-12-08 16:05     ` Mina Almasry
2023-12-08 16:05       ` Mina Almasry
2023-12-11  2:04       ` Yunsheng Lin
2023-12-11  2:04         ` Yunsheng Lin
2023-12-11  2:26         ` Mina Almasry
2023-12-11  2:26           ` Mina Almasry
2023-12-11  4:04           ` Mina Almasry
2023-12-11  4:04             ` Mina Almasry
2023-12-11 11:51             ` Yunsheng Lin
2023-12-11 11:51               ` Yunsheng Lin
2023-12-11 18:14               ` Mina Almasry
2023-12-11 18:14                 ` Mina Almasry
2023-12-12 11:17                 ` Yunsheng Lin
2023-12-12 11:17                   ` Yunsheng Lin
2023-12-12 14:28                   ` Mina Almasry
2023-12-12 14:28                     ` Mina Almasry
2023-12-13 11:48                     ` Yunsheng Lin
2023-12-13 11:48                       ` Yunsheng Lin
2023-12-13  7:52             ` Mina Almasry
2023-12-13  7:52               ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 10/16] page_pool: don't release iov on elevanted refcount Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 11/16] net: support non paged skb frags Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 12/16] net: add support for skbs with unreadable frags Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 13/16] tcp: RX path for devmem TCP Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08 15:40   ` kernel test robot
2023-12-08 15:40     ` kernel test robot
2023-12-08 17:55   ` David Ahern
2023-12-08 17:55     ` David Ahern
2023-12-08 19:23     ` Mina Almasry
2023-12-08 19:23       ` Mina Almasry
2023-12-08  0:52 ` [net-next v1 14/16] net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-12 19:08   ` Simon Horman
2023-12-12 19:08     ` Simon Horman
2023-12-08  0:52 ` [net-next v1 15/16] net: add devmem TCP documentation Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-12 19:14   ` Simon Horman
2023-12-12 19:14     ` Simon Horman
2023-12-08  0:52 ` [net-next v1 16/16] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2023-12-08  0:52   ` Mina Almasry
2023-12-08  1:47 ` [net-next v1 00/16] Device Memory TCP Mina Almasry
2023-12-08  1:47   ` Mina Almasry
2023-12-08 17:57 ` David Ahern
2023-12-08 17:57   ` David Ahern
2023-12-08 19:31   ` Mina Almasry
2023-12-08 19:31     ` Mina Almasry
2023-12-10  3:48 ` Shakeel Butt
2023-12-10  3:48   ` Shakeel Butt
2023-12-12  5:58 ` Christoph Hellwig
2023-12-14  6:20 ` patchwork-bot+netdevbpf
2023-12-14  6:20   ` patchwork-bot+netdevbpf
2023-12-14  6:48   ` Christoph Hellwig
2023-12-14  6:51     ` Mina Almasry
2023-12-14  6:51       ` Mina Almasry
2023-12-14  6:59       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=32211cbf-3a4e-8a86-6214-4304ddb18a98@huawei.com \
    --to=linyunsheng@huawei.com \
    --cc=almasrymina@google.com \
    --cc=arnd@arndb.de \
    --cc=bpf@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=hramamurthy@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jeroendb@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=shailend@google.com \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=sumit.semwal@linaro.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.