All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] support non-frag page for page_pool_alloc_frag()
@ 2023-05-26  9:26 Yunsheng Lin
  2023-05-26  9:26 ` [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling Yunsheng Lin
  2023-05-26  9:26 ` [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag() Yunsheng Lin
  0 siblings, 2 replies; 9+ messages in thread
From: Yunsheng Lin @ 2023-05-26  9:26 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, bpf

In [1], there is a use case to use frag support in page
pool to reduce memory usage, and it may request different
frag size depending on the head/tail room space for
xdp_frame/shinfo and mtu/packet size. When the requested
frag size is large enough that a single page can not be
split into more than one frag, using frag support only
have performance penalty because of the extra frag count
handling for frag support.

So this patchset provides a way for user to fail back to
non-frag page when a page is not able to hold two frags.

PP_FLAG_PAGE_FRAG may be removed after this patchset, and
the extra benefit is that driver does not need to handle
the case for arch with PAGE_POOL_DMA_USE_PP_FRAG_COUNT when
using page_pool_alloc_frag() API.

1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/

V1: Drop RFC tag and page_pool_frag patch

Yunsheng Lin (2):
  page_pool: unify frag page and non-frag page handling
  page_pool: support non-frag page for page_pool_alloc_frag()

 include/net/page_pool.h | 38 ++++++++++++++++++++++++++------
 net/core/page_pool.c    | 48 +++++++++++++++++++++++++----------------
 2 files changed, 61 insertions(+), 25 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling
  2023-05-26  9:26 [PATCH net-next 0/2] support non-frag page for page_pool_alloc_frag() Yunsheng Lin
@ 2023-05-26  9:26 ` Yunsheng Lin
  2023-05-26 12:03   ` Ilias Apalodimas
  2023-05-26  9:26 ` [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag() Yunsheng Lin
  1 sibling, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2023-05-26  9:26 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	Eric Dumazet

Currently page_pool_dev_alloc_pages() can not be called
when PP_FLAG_PAGE_FRAG is set, because it does not use
the frag reference counting.

As we are already doing a optimization by not updating
page->pp_frag_count in page_pool_defrag_page() for the
last frag user, and non-frag page only have one user,
so we utilize that to unify frag page and non-frag page
handling, so that page_pool_dev_alloc_pages() can also
be called with PP_FLAG_PAGE_FRAG set.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
 include/net/page_pool.h | 38 +++++++++++++++++++++++++++++++-------
 net/core/page_pool.c    |  1 +
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c8ec2f34722b..ea7a0c0592a5 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -50,6 +50,9 @@
 				 PP_FLAG_DMA_SYNC_DEV |\
 				 PP_FLAG_PAGE_FRAG)
 
+#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \
+		(sizeof(dma_addr_t) > sizeof(unsigned long))
+
 /*
  * Fast allocation side cache array/stack
  *
@@ -295,13 +298,20 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
  */
 static inline void page_pool_fragment_page(struct page *page, long nr)
 {
-	atomic_long_set(&page->pp_frag_count, nr);
+	if (!PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
+		atomic_long_set(&page->pp_frag_count, nr);
 }
 
+/* We need to reset frag_count back to 1 for the last user to allow
+ * only one user in case the page is recycled and allocated as non-frag
+ * page.
+ */
 static inline long page_pool_defrag_page(struct page *page, long nr)
 {
 	long ret;
 
+	BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
+
 	/* If nr == pp_frag_count then we have cleared all remaining
 	 * references to the page. No need to actually overwrite it, instead
 	 * we can leave this to be overwritten by the calling function.
@@ -311,19 +321,36 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 	 * especially when dealing with a page that may be partitioned
 	 * into only 2 or 3 pieces.
 	 */
-	if (atomic_long_read(&page->pp_frag_count) == nr)
+	if (atomic_long_read(&page->pp_frag_count) == nr) {
+		/* As we have ensured nr is always one for constant case
+		 * using the BUILD_BUG_ON() as above, only need to handle
+		 * the non-constant case here for frag count draining.
+		 */
+		if (!__builtin_constant_p(nr))
+			atomic_long_set(&page->pp_frag_count, 1);
+
 		return 0;
+	}
 
 	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
 	WARN_ON(ret < 0);
+
+	/* Reset frag count back to 1, this should be the rare case when
+	 * two users call page_pool_defrag_page() currently.
+	 */
+	if (!ret)
+		atomic_long_set(&page->pp_frag_count, 1);
+
 	return ret;
 }
 
 static inline bool page_pool_is_last_frag(struct page_pool *pool,
 					  struct page *page)
 {
-	/* If fragments aren't enabled or count is 0 we were the last user */
-	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+	/* When dma_addr_upper is overlapped with pp_frag_count
+	 * or we were the last page frag user.
+	 */
+	return PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
 	       (page_pool_defrag_page(page, 1) == 0);
 }
 
@@ -357,9 +384,6 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
-		(sizeof(dma_addr_t) > sizeof(unsigned long))
-
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
 	dma_addr_t ret = page->dma_addr;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e212e9d7edcb..0868aa8f6323 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -334,6 +334,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+	page_pool_fragment_page(page, 1);
 	if (pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()
  2023-05-26  9:26 [PATCH net-next 0/2] support non-frag page for page_pool_alloc_frag() Yunsheng Lin
  2023-05-26  9:26 ` [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling Yunsheng Lin
@ 2023-05-26  9:26 ` Yunsheng Lin
  2023-05-26 15:16   ` Alexander Duyck
  1 sibling, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2023-05-26  9:26 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	Eric Dumazet

There is performance penalty with using page frag support when
user requests a larger frag size and a page only supports one
frag user, see [1].

It seems like user may request different frag size depending
on the mtu and packet size, provide an option to allocate
non-frag page when a whole page is not able to hold two frags,
so that user has a unified interface for the memory allocation
with least memory utilization and performance penalty.

1. https://lore.kernel.org/netdev/ZEU+vospFdm08IeE@localhost.localdomain/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
 net/core/page_pool.c | 47 +++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 0868aa8f6323..e84ec6eabefd 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -699,14 +699,27 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
+	if (unlikely(size > max_size))
 		return NULL;
 
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
 	size = ALIGN(size, dma_get_cache_alignment());
-	*offset = pool->frag_offset;
 
-	if (page && *offset + size > max_size) {
+	if (page) {
+		*offset = pool->frag_offset;
+
+		if (*offset + size <= max_size) {
+			pool->frag_users++;
+			pool->frag_offset = *offset + size;
+			alloc_stat_inc(pool, fast);
+			return page;
+		}
+
+		pool->frag_page = NULL;
 		page = page_pool_drain_frag(pool, page);
 		if (page) {
 			alloc_stat_inc(pool, fast);
@@ -714,26 +727,24 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 		}
 	}
 
-	if (!page) {
-		page = page_pool_alloc_pages(pool, gfp);
-		if (unlikely(!page)) {
-			pool->frag_page = NULL;
-			return NULL;
-		}
-
-		pool->frag_page = page;
+	page = page_pool_alloc_pages(pool, gfp);
+	if (unlikely(!page))
+		return NULL;
 
 frag_reset:
-		pool->frag_users = 1;
+	/* return page as non-frag page if a page is not able to
+	 * hold two frags for the current requested size.
+	 */
+	if (unlikely(size << 1 > max_size)) {
 		*offset = 0;
-		pool->frag_offset = size;
-		page_pool_fragment_page(page, BIAS_MAX);
 		return page;
 	}
 
-	pool->frag_users++;
-	pool->frag_offset = *offset + size;
-	alloc_stat_inc(pool, fast);
+	pool->frag_page = page;
+	pool->frag_users = 1;
+	*offset = 0;
+	pool->frag_offset = size;
+	page_pool_fragment_page(page, BIAS_MAX);
 	return page;
 }
 EXPORT_SYMBOL(page_pool_alloc_frag);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling
  2023-05-26  9:26 ` [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling Yunsheng Lin
@ 2023-05-26 12:03   ` Ilias Apalodimas
  2023-05-26 12:35     ` Yunsheng Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Ilias Apalodimas @ 2023-05-26 12:03 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Jesper Dangaard Brouer, Eric Dumazet

Hi Yunsheng

Apologies for not replying to the RFC,  I was pretty busy with internal
stuff

On Fri, May 26, 2023 at 05:26:14PM +0800, Yunsheng Lin wrote:
> Currently page_pool_dev_alloc_pages() can not be called
> when PP_FLAG_PAGE_FRAG is set, because it does not use
> the frag reference counting.
>
> As we are already doing a optimization by not updating
> page->pp_frag_count in page_pool_defrag_page() for the
> last frag user, and non-frag page only have one user,
> so we utilize that to unify frag page and non-frag page
> handling, so that page_pool_dev_alloc_pages() can also
> be called with PP_FLAG_PAGE_FRAG set.

What happens here is clear.  But why do we need this?  Do you have a
specific use case in mind where a driver will call
page_pool_dev_alloc_pages() and the PP_FLAG_PAGE_FRAG will be set?
If that's the case isn't it a better idea to unify the functions entirely?

Thanks
/Ilias
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> ---
>  include/net/page_pool.h | 38 +++++++++++++++++++++++++++++++-------
>  net/core/page_pool.c    |  1 +
>  2 files changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index c8ec2f34722b..ea7a0c0592a5 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -50,6 +50,9 @@
>  				 PP_FLAG_DMA_SYNC_DEV |\
>  				 PP_FLAG_PAGE_FRAG)
>
> +#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \
> +		(sizeof(dma_addr_t) > sizeof(unsigned long))
> +
>  /*
>   * Fast allocation side cache array/stack
>   *
> @@ -295,13 +298,20 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
>   */
>  static inline void page_pool_fragment_page(struct page *page, long nr)
>  {
> -	atomic_long_set(&page->pp_frag_count, nr);
> +	if (!PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> +		atomic_long_set(&page->pp_frag_count, nr);
>  }
>
> +/* We need to reset frag_count back to 1 for the last user to allow
> + * only one user in case the page is recycled and allocated as non-frag
> + * page.
> + */
>  static inline long page_pool_defrag_page(struct page *page, long nr)
>  {
>  	long ret;
>
> +	BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
> +
>  	/* If nr == pp_frag_count then we have cleared all remaining
>  	 * references to the page. No need to actually overwrite it, instead
>  	 * we can leave this to be overwritten by the calling function.
> @@ -311,19 +321,36 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
>  	 * especially when dealing with a page that may be partitioned
>  	 * into only 2 or 3 pieces.
>  	 */
> -	if (atomic_long_read(&page->pp_frag_count) == nr)
> +	if (atomic_long_read(&page->pp_frag_count) == nr) {
> +		/* As we have ensured nr is always one for constant case
> +		 * using the BUILD_BUG_ON() as above, only need to handle
> +		 * the non-constant case here for frag count draining.
> +		 */
> +		if (!__builtin_constant_p(nr))
> +			atomic_long_set(&page->pp_frag_count, 1);
> +
>  		return 0;
> +	}
>
>  	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
>  	WARN_ON(ret < 0);
> +
> +	/* Reset frag count back to 1, this should be the rare case when
> +	 * two users call page_pool_defrag_page() currently.
> +	 */
> +	if (!ret)
> +		atomic_long_set(&page->pp_frag_count, 1);
> +
>  	return ret;
>  }
>
>  static inline bool page_pool_is_last_frag(struct page_pool *pool,
>  					  struct page *page)
>  {
> -	/* If fragments aren't enabled or count is 0 we were the last user */
> -	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
> +	/* When dma_addr_upper is overlapped with pp_frag_count
> +	 * or we were the last page frag user.
> +	 */
> +	return PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
>  	       (page_pool_defrag_page(page, 1) == 0);
>  }
>
> @@ -357,9 +384,6 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>  	page_pool_put_full_page(pool, page, true);
>  }
>
> -#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
> -		(sizeof(dma_addr_t) > sizeof(unsigned long))
> -
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
>  	dma_addr_t ret = page->dma_addr;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index e212e9d7edcb..0868aa8f6323 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -334,6 +334,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>  {
>  	page->pp = pool;
>  	page->pp_magic |= PP_SIGNATURE;
> +	page_pool_fragment_page(page, 1);
>  	if (pool->p.init_callback)
>  		pool->p.init_callback(page, pool->p.init_arg);
>  }
> --
> 2.33.0
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling
  2023-05-26 12:03   ` Ilias Apalodimas
@ 2023-05-26 12:35     ` Yunsheng Lin
  2023-05-26 15:38       ` Ilias Apalodimas
  0 siblings, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2023-05-26 12:35 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Jesper Dangaard Brouer, Eric Dumazet

On 2023/5/26 20:03, Ilias Apalodimas wrote:
> Hi Yunsheng
> 
> Apologies for not replying to the RFC,  I was pretty busy with internal
> stuff
> 
> On Fri, May 26, 2023 at 05:26:14PM +0800, Yunsheng Lin wrote:
>> Currently page_pool_dev_alloc_pages() can not be called
>> when PP_FLAG_PAGE_FRAG is set, because it does not use
>> the frag reference counting.
>>
>> As we are already doing a optimization by not updating
>> page->pp_frag_count in page_pool_defrag_page() for the
>> last frag user, and non-frag page only have one user,
>> so we utilize that to unify frag page and non-frag page
>> handling, so that page_pool_dev_alloc_pages() can also
>> be called with PP_FLAG_PAGE_FRAG set.
> 
> What happens here is clear.  But why do we need this?  Do you have a
> specific use case in mind where a driver will call
> page_pool_dev_alloc_pages() and the PP_FLAG_PAGE_FRAG will be set?

Actually it is about calling page_pool_alloc_pages() in
page_pool_alloc_frag() in patch 2, the use case is the
veth using page frag support. see:

https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/

> If that's the case isn't it a better idea to unify the functions entirely?

As about, page_pool_alloc_frag() does seems to be a superset of
page_pool_alloc_pages() after this patchset as my understanding.
If the page_pool_alloc_frag() API turns out to be a good API for
the driver, maybe we can phase out *page_pool_alloc_pages() as
time goes by?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()
  2023-05-26  9:26 ` [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag() Yunsheng Lin
@ 2023-05-26 15:16   ` Alexander Duyck
  2023-05-27  9:51     ` Yunsheng Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2023-05-26 15:16 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

On Fri, May 26, 2023 at 2:28 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> There is performance penalty with using page frag support when
> user requests a larger frag size and a page only supports one
> frag user, see [1].
>
> It seems like user may request different frag size depending
> on the mtu and packet size, provide an option to allocate
> non-frag page when a whole page is not able to hold two frags,
> so that user has a unified interface for the memory allocation
> with least memory utilization and performance penalty.
>
> 1. https://lore.kernel.org/netdev/ZEU+vospFdm08IeE@localhost.localdomain/
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> CC: Alexander Duyck <alexander.duyck@gmail.com>

The way I see it there are several problems with this approach.

First, why not just increase the page order rather than trying to
essentially make page_pool_alloc_frag into an analog for
page_pool_alloc_pages? I know for the skb allocator we are working
with an order 3 page. You could likely do something similar here to
achieve the better performance you are looking for.

Second, I am not a fan of these changes as they seem to be wasteful
for drivers that might make use of a mix of large and small
allocations. If we aren't going to use fragments then we should
probably just wrap the call to this function in an inline wrapper that
checks the size and just automatically pulls the larger sizes off into
the non-frag allocation path. Look at something such as
__netdev_alloc_skb as an example.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling
  2023-05-26 12:35     ` Yunsheng Lin
@ 2023-05-26 15:38       ` Ilias Apalodimas
  2023-05-27  8:18         ` Yunsheng Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Ilias Apalodimas @ 2023-05-26 15:38 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Jesper Dangaard Brouer, Eric Dumazet

On Fri, May 26, 2023 at 08:35:24PM +0800, Yunsheng Lin wrote:
> On 2023/5/26 20:03, Ilias Apalodimas wrote:
> > Hi Yunsheng
> >
> > Apologies for not replying to the RFC,  I was pretty busy with internal
> > stuff
> >
> > On Fri, May 26, 2023 at 05:26:14PM +0800, Yunsheng Lin wrote:
> >> Currently page_pool_dev_alloc_pages() can not be called
> >> when PP_FLAG_PAGE_FRAG is set, because it does not use
> >> the frag reference counting.
> >>
> >> As we are already doing a optimization by not updating
> >> page->pp_frag_count in page_pool_defrag_page() for the
> >> last frag user, and non-frag page only have one user,
> >> so we utilize that to unify frag page and non-frag page
> >> handling, so that page_pool_dev_alloc_pages() can also
> >> be called with PP_FLAG_PAGE_FRAG set.
> >
> > What happens here is clear.  But why do we need this?  Do you have a
> > specific use case in mind where a driver will call
> > page_pool_dev_alloc_pages() and the PP_FLAG_PAGE_FRAG will be set?
>
> Actually it is about calling page_pool_alloc_pages() in
> page_pool_alloc_frag() in patch 2, the use case is the
> veth using page frag support. see:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/

Ok I missed that patch.

>
> > If that's the case isn't it a better idea to unify the functions entirely?
>
> As about, page_pool_alloc_frag() does seems to be a superset of
> page_pool_alloc_pages() after this patchset as my understanding.
> If the page_pool_alloc_frag() API turns out to be a good API for
> the driver, maybe we can phase out *page_pool_alloc_pages() as
> time goes by?

Looking at patch 2/2 it seems a bit wasteful.  At the moment only hns3 uses
fragments and the length of the allocation seems static.  But if someone
else chooses to allocate a > 2048 packet why should it allocate a page?

I just think it's a bit confusing since we have a flag on the pool for page
fragments, but then we violate it when it suits us.

Thanks
/Ilias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling
  2023-05-26 15:38       ` Ilias Apalodimas
@ 2023-05-27  8:18         ` Yunsheng Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Yunsheng Lin @ 2023-05-27  8:18 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Jesper Dangaard Brouer, Eric Dumazet

On 2023/5/26 23:38, Ilias Apalodimas wrote:
>>
>>> If that's the case isn't it a better idea to unify the functions entirely?
>>
>> As about, page_pool_alloc_frag() does seems to be a superset of
>> page_pool_alloc_pages() after this patchset as my understanding.
>> If the page_pool_alloc_frag() API turns out to be a good API for
>> the driver, maybe we can phase out *page_pool_alloc_pages() as
>> time goes by?
> 
> Looking at patch 2/2 it seems a bit wasteful.  At the moment only hns3 uses
> fragments and the length of the allocation seems static.  But if someone
> else chooses to allocate a > 2048 packet why should it allocate a page?

It is based on the fact that if user requests a > 2048 frag, then it will
most likely requests > 2048 frag again, for example, when mtu is changed
or xdp is enabled/disabble, at least for veth case, the frag size is likely
changed.

Allocating a page for the above case avoid the frag count draining overhead,
and unify the interface for the driver so that driver don't need to choose
which API to use.

> 
> I just think it's a bit confusing since we have a flag on the pool for page
> fragments, but then we violate it when it suits us.

Yes, we can remove it as mentioned in the cover letter:
"PP_FLAG_PAGE_FRAG may be removed after this patchset, and the
extra benefit is that driver does not need to handle the case
for arch with PAGE_POOL_DMA_USE_PP_FRAG_COUNT when using
page_pool_alloc_frag() API."

> 
> Thanks
> /Ilias
> .
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()
  2023-05-26 15:16   ` Alexander Duyck
@ 2023-05-27  9:51     ` Yunsheng Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Yunsheng Lin @ 2023-05-27  9:51 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

On 2023/5/26 23:16, Alexander Duyck wrote:
> On Fri, May 26, 2023 at 2:28 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> There is performance penalty with using page frag support when
>> user requests a larger frag size and a page only supports one
>> frag user, see [1].
>>
>> It seems like user may request different frag size depending
>> on the mtu and packet size, provide an option to allocate
>> non-frag page when a whole page is not able to hold two frags,
>> so that user has a unified interface for the memory allocation
>> with least memory utilization and performance penalty.
>>
>> 1. https://lore.kernel.org/netdev/ZEU+vospFdm08IeE@localhost.localdomain/
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> CC: Lorenzo Bianconi <lorenzo@kernel.org>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
> 
> The way I see it there are several problems with this approach.
> 
> First, why not just increase the page order rather than trying to
> essentially make page_pool_alloc_frag into an analog for
> page_pool_alloc_pages? I know for the skb allocator we are working
> with an order 3 page. You could likely do something similar here to
> achieve the better performance you are looking for.

I suppose the "an order 3 page" refers to __page_frag_cache_refill()
trying to allocate an order 3 page, if it fails, then fail back to
order 0 page?

As page pool alloc and recycle page in order based on pool->alloc
and pool->ring, so we can not do the above failback trick for page
pool.

We could add different pool->alloc/pool->ring for different order
page, for example, pool->alloc_order_0/pool->ring_order_0 for order
0 page, and pool->alloc_order_3/pool->ring_order_3 for order 3 page,
but then it will be complicated and possibly more memory wasteful.

We would also create page pool with pool->p.order being 3, then
the optimization in this patch also apply.

> 
> Second, I am not a fan of these changes as they seem to be wasteful
> for drivers that might make use of a mix of large and small

I suppose 'wasteful' refer to memory usage, right?
I am not sure how drivers that might make use a mix of large and
small yet, like mlx5 using page_pool_defrag_page() directly.
but if the PAGE_SIZE << pool->p.order is integral multiple of the frag,
then some waste seems to be unavoidable, does handling the frag count
in the driver save more memory than handling the frag count in page pool?
If not, handling the frag count in page pool seems more reasonable?

> allocations. If we aren't going to use fragments then we should
> probably just wrap the call to this function in an inline wrapper that
> checks the size and just automatically pulls the larger sizes off into
> the non-frag allocation path. Look at something such as
> __netdev_alloc_skb as an example.

Do you mean adding an new inline wrapper like below?

if (len > xxx)
	return page_pool_alloc_pages(pool, gfp);
else
	return page_pool_alloc_frag();

It seems the above is essentially the same as this patch does,
this patch does it by not introducing another interface and doing
some optimization. For example, for the above case with 'len > xxx'
in this patch, we still allow frag allocate if the remaining size of
the page is bigger than 'len', which I think it is a small optimization
for 64K page size or order 3 page.

Also, do you have some thing in mind above xxx here? As this patch
chooses the ((PAGE_SIZE << pool->p.order) / 2) as xxx.

> .
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-05-27  9:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26  9:26 [PATCH net-next 0/2] support non-frag page for page_pool_alloc_frag() Yunsheng Lin
2023-05-26  9:26 ` [PATCH net-next 1/2] page_pool: unify frag page and non-frag page handling Yunsheng Lin
2023-05-26 12:03   ` Ilias Apalodimas
2023-05-26 12:35     ` Yunsheng Lin
2023-05-26 15:38       ` Ilias Apalodimas
2023-05-27  8:18         ` Yunsheng Lin
2023-05-26  9:26 ` [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag() Yunsheng Lin
2023-05-26 15:16   ` Alexander Duyck
2023-05-27  9:51     ` Yunsheng Lin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.