* [PATCH net-next 0/7] some optimization for page pool
@ 2021-09-22  9:41 Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

Patch 1: disable dma mapping support for 32-bit arch with 64-bit
         DMA.
Patch 2: support non-split page when PP_FLAG_PAGE_FRAG is set.
Patch 3: avoid calling compound_head() for skb frag page.
Patch 4-7: use pp_magic to identify pp page uniquely.

V3:
    1. add patch 1/4/6/7.
    2. use pp_magic to identify pp page uniquely too.
    3. avoid unnecessary compound_head() calling.

V2: add patch 2, adjust the commit log according to the discussion
    in V1, and fix a compiler error reported by the kernel test robot.

Yunsheng Lin (7):
  page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  page_pool: support non-split page with PP_FLAG_PAGE_FRAG
  page_pool: avoid calling compound_head() for skb frag page
  page_pool: change BIAS_MAX to support incrementing
  skbuff: keep track of pp page when __skb_frag_ref() is called
  skbuff: only use pp_magic identifier for a skb's head page
  skbuff: remove unused skb->pp_recycle

 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  6 ---
 drivers/net/ethernet/marvell/mvneta.c         |  2 -
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 +-
 drivers/net/ethernet/marvell/sky2.c           |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
 drivers/net/ethernet/ti/cpsw.c                |  2 -
 drivers/net/ethernet/ti/cpsw_new.c            |  2 -
 include/linux/mm_types.h                      | 13 +-----
 include/linux/skbuff.h                        | 39 ++++++++----------
 include/net/page_pool.h                       | 31 ++++++++------
 net/core/page_pool.c                          | 40 +++++++------------
 net/core/skbuff.c                             | 36 ++++++-----------
 net/tls/tls_device.c                          |  2 +-
 13 files changed, 67 insertions(+), 114 deletions(-)

-- 
2.33.0


* [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-23  9:10   ` Ilias Apalodimas
  2021-09-23  9:33   ` Jesper Dangaard Brouer
  2021-09-22  9:41 ` [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG Yunsheng Lin
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

As 32-bit arches with 64-bit DMA seem to be rare these days,
page pool is carrying a lot of code and complexity for systems
that possibly don't exist.

So disable dma mapping support for such systems; if drivers
really want to work on such systems, they have to implement
their own DMA-mapping fallback tracking outside page_pool.
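
As a rough illustration of such a driver-side fallback (hypothetical
code, not part of this patch): the driver keeps the full dma_addr_t
itself instead of asking page_pool to store it:

	struct my_rx_slot {
		struct page *page;
		dma_addr_t dma;	/* full 64-bit address, driver-owned */
	};

	static int my_rx_refill(struct device *dev, struct page_pool *pool,
				struct my_rx_slot *slot)
	{
		slot->page = page_pool_dev_alloc_pages(pool);
		if (!slot->page)
			return -ENOMEM;

		/* do the mapping that page_pool now refuses to track */
		slot->dma = dma_map_page(dev, slot->page, 0, PAGE_SIZE,
					 DMA_FROM_DEVICE);
		if (dma_mapping_error(dev, slot->dma)) {
			page_pool_put_full_page(pool, slot->page, false);
			return -ENOMEM;
		}

		return 0;
	}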

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/mm_types.h | 13 +------------
 include/net/page_pool.h  | 12 +-----------
 net/core/page_pool.c     | 10 ++++++----
 3 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7f8ee09c711f..436e0946d691 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -104,18 +104,7 @@ struct page {
 			struct page_pool *pp;
 			unsigned long _pp_mapping_pad;
 			unsigned long dma_addr;
-			union {
-				/**
-				 * dma_addr_upper: might require a 64-bit
-				 * value on 32-bit architectures.
-				 */
-				unsigned long dma_addr_upper;
-				/**
-				 * For frag page support, not supported in
-				 * 32-bit architectures with 64-bit DMA.
-				 */
-				atomic_long_t pp_frag_count;
-			};
+			atomic_long_t pp_frag_count;
 		};
 		struct {	/* slab, slob and slub */
 			union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index a4082406a003..3855f069627f 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -216,24 +216,14 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
-		(sizeof(dma_addr_t) > sizeof(unsigned long))
-
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-	dma_addr_t ret = page->dma_addr;
-
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
-
-	return ret;
+	return page->dma_addr;
 }
 
 static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
 	page->dma_addr = addr;
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		page->dma_addr_upper = upper_32_bits(addr);
 }
 
 static inline void page_pool_set_frag_count(struct page *page, long nr)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a6978427d6c..a65bd7972e37 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
 	 * which is the XDP_TX use-case.
 	 */
 	if (pool->p.flags & PP_FLAG_DMA_MAP) {
+		/* DMA-mapping is not supported on 32-bit systems with
+		 * 64-bit DMA mapping.
+		 */
+		if (sizeof(dma_addr_t) > sizeof(unsigned long))
+			return -EINVAL;
+
 		if ((pool->p.dma_dir != DMA_FROM_DEVICE) &&
 		    (pool->p.dma_dir != DMA_BIDIRECTIONAL))
 			return -EINVAL;
@@ -69,10 +75,6 @@ static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
-		return -EINVAL;
-
 	if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0)
 		return -ENOMEM;
 
-- 
2.33.0


* [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-23 12:08   ` Jesper Dangaard Brouer
  2021-09-22  9:41 ` [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page Yunsheng Lin
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

Currently when PP_FLAG_PAGE_FRAG is set, the caller is not
expected to call page_pool_alloc_pages() directly because of
the PP_FLAG_PAGE_FRAG check in __page_pool_put_page().

This patch removes the above check to enable non-split page
support when PP_FLAG_PAGE_FRAG is set.
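
A minimal usage sketch of what this enables (assuming a pool created
with PP_FLAG_PAGE_FRAG; page_pool_dev_alloc_frag() is the existing
frag API):

	unsigned int offset;

	/* both calls are now valid on the same PP_FLAG_PAGE_FRAG pool:
	 * every page starts with pp_frag_count == 1 and the put path
	 * always decrements it
	 */
	struct page *whole = page_pool_dev_alloc_pages(pool); /* non-split */
	struct page *split = page_pool_dev_alloc_frag(pool, &offset, 2048);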

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 net/core/page_pool.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a65bd7972e37..f7e71dcb6a2e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -315,11 +315,14 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
 
 	/* Fast-path: Get a page from cache */
 	page = __page_pool_get_cached(pool);
-	if (page)
-		return page;
 
 	/* Slow-path: cache empty, do real allocation */
-	page = __page_pool_alloc_pages_slow(pool, gfp);
+	if (!page)
+		page = __page_pool_alloc_pages_slow(pool, gfp);
+
+	if (likely(page))
+		page_pool_set_frag_count(page, 1);
+
 	return page;
 }
 EXPORT_SYMBOL(page_pool_alloc_pages);
@@ -428,8 +431,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
 		     unsigned int dma_sync_size, bool allow_direct)
 {
 	/* It is not the last user for the page frag case */
-	if (pool->p.flags & PP_FLAG_PAGE_FRAG &&
-	    page_pool_atomic_sub_frag_count_return(page, 1))
+	if (page_pool_atomic_sub_frag_count_return(page, 1))
 		return NULL;
 
 	/* This allocator is optimized for the XDP mode that uses
-- 
2.33.0


* [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-23  8:33   ` Ilias Apalodimas
  2021-09-22  9:41 ` [PATCH net-next 4/7] page_pool: change BIAS_MAX to support incrementing Yunsheng Lin
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

As the pp page for a skb frag is always a head page, make
sure skb_pp_recycle() passes a head page to avoid calling
compound_head() for the skb frag page case.
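
For reference, a sketch of the equivalence this relies on
(virt_to_head_page() already folds in the compound_head() step):

	/* data is the skb head/frag buffer pointer */
	struct page *p1 = compound_head(virt_to_page(data)); /* old path */
	struct page *p2 = virt_to_head_page(data); /* new path, same page */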

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/skbuff.h | 2 +-
 net/core/page_pool.c   | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6bdb0db3e825..35eebc2310a5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4722,7 +4722,7 @@ static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
 {
 	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
 		return false;
-	return page_pool_return_skb_page(virt_to_page(data));
+	return page_pool_return_skb_page(virt_to_head_page(data));
 }
 
 #endif	/* __KERNEL__ */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f7e71dcb6a2e..357fb53343a0 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -742,8 +742,6 @@ bool page_pool_return_skb_page(struct page *page)
 {
 	struct page_pool *pp;
 
-	page = compound_head(page);
-
 	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
 	 * in order to preserve any existing bits, such as bit 0 for the
 	 * head page of compound page and bit 1 for pfmemalloc page, so
-- 
2.33.0


* [PATCH net-next 4/7] page_pool: change BIAS_MAX to support incrementing
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
                   ` (2 preceding siblings ...)
  2021-09-22  9:41 ` [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 5/7] skbuff: keep track of pp page when __skb_frag_ref() is called Yunsheng Lin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

As page->pp_frag_count needs incrementing for pp page
tracking support, change BIAS_MAX to (LONG_MAX / 2) to
avoid overflowing it.
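
A sketch of the overflow reasoning (illustrative only;
page_pool_atomic_inc_frag_count() is the helper patch 5 adds):

	/* the frag path primes the count with BIAS_MAX when a page is
	 * split, so headroom for later increments must come out of
	 * LONG_MAX
	 */
	page_pool_set_frag_count(page, BIAS_MAX); /* now LONG_MAX / 2 */

	/* with BIAS_MAX halved, about LONG_MAX / 2 increments can
	 * happen before pp_frag_count could wrap
	 */
	page_pool_atomic_inc_frag_count(page);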

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 357fb53343a0..e9516477f9d2 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -24,7 +24,7 @@
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
 
-#define BIAS_MAX	LONG_MAX
+#define BIAS_MAX	(LONG_MAX / 2)
 
 static int page_pool_init(struct page_pool *pool,
 			  const struct page_pool_params *params)
-- 
2.33.0


* [PATCH net-next 5/7] skbuff: keep track of pp page when __skb_frag_ref() is called
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
                   ` (3 preceding siblings ...)
  2021-09-22  9:41 ` [PATCH net-next 4/7] page_pool: change BIAS_MAX to support incrementing Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 6/7] skbuff: only use pp_magic identifier for a skb's head page Yunsheng Lin
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

skb->pp_recycle and page->pp_magic may not be enough to track
whether a frag page is from the page pool after __skb_frag_ref()
is called, mostly because of a data race; see commit 2cc3aeb5eccc
("skbuff: Fix a potential race while recycling page_pool
packets").

There are also clone-and-expand-head cases that might lose track
of whether a frag page is from the page pool.

Not being able to keep track of pp pages may cause a problem for
the skb_split() case in tso_fragment() too:
Suppose a skb has 3 frag pages, all coming from a page pool, and
it is split into skb1 and skb2:
skb1: first frag page + first half of second frag page
skb2: second half of second frag page + third frag page

How do we set the skb->pp_recycle of skb1 and skb2?
1. If we set both of them to 1, then we may have a similar
   race as in the above commit for the second frag page.
2. If we set only one of them to 1, then we may leak resources,
   as both the first frag page and the third frag page are
   indeed from a page pool.

So increment the frag count when __skb_frag_ref() is called, and
use only page->pp_magic to indicate whether a frag page is from a
page pool, to avoid the above data race.
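
A condensed sketch of the reference rule this patch establishes
(mirroring the __skb_frag_ref() hunk below; the release side does the
mirror image via page_pool_return_skb_page()):

	struct page *page = skb_frag_page(frag);

	if (page_pool_is_pp_page(page))
		page_pool_atomic_inc_frag_count(page); /* pp-tracked ref */
	else
		get_page(page);                        /* plain page ref */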

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 drivers/net/ethernet/marvell/sky2.c        |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  2 +-
 include/linux/skbuff.h                     | 30 +++++++++++++++++-----
 include/net/page_pool.h                    | 19 +++++++++++++-
 net/core/page_pool.c                       | 14 +---------
 net/core/skbuff.c                          |  4 +--
 net/tls/tls_device.c                       |  2 +-
 7 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index e9fc74e54b22..91b62f468a26 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2503,7 +2503,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,
 
 		if (length == 0) {
 			/* don't need this page */
-			__skb_frag_unref(frag, false);
+			__skb_frag_unref(frag);
 			--skb_shinfo(skb)->nr_frags;
 		} else {
 			size = min(length, (unsigned) PAGE_SIZE);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 7f6d3b82c29b..ce31b1fd554f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -526,7 +526,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 fail:
 	while (nr > 0) {
 		nr--;
-		__skb_frag_unref(skb_shinfo(skb)->frags + nr, false);
+		__skb_frag_unref(skb_shinfo(skb)->frags + nr);
 	}
 	return 0;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 35eebc2310a5..a2d3b6fe0c32 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3073,7 +3073,16 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	struct page *page = skb_frag_page(frag);
+
+#ifdef CONFIG_PAGE_POOL
+	if (page_pool_is_pp_page(page)) {
+		page_pool_atomic_inc_frag_count(page);
+		return;
+	}
+#endif
+
+	get_page(page);
 }
 
 /**
@@ -3091,18 +3100,19 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
 /**
  * __skb_frag_unref - release a reference on a paged fragment.
  * @frag: the paged fragment
- * @recycle: recycle the page if allocated via page_pool
  *
  * Releases a reference on the paged fragment @frag
  * or recycles the page via the page_pool API.
  */
-static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
+static inline void __skb_frag_unref(skb_frag_t *frag)
 {
 	struct page *page = skb_frag_page(frag);
 
 #ifdef CONFIG_PAGE_POOL
-	if (recycle && page_pool_return_skb_page(page))
+	if (page_pool_is_pp_page(page)) {
+		page_pool_return_skb_page(page);
 		return;
+	}
 #endif
 	put_page(page);
 }
@@ -3116,7 +3126,7 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
  */
 static inline void skb_frag_unref(struct sk_buff *skb, int f)
 {
-	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
+	__skb_frag_unref(&skb_shinfo(skb)->frags[f]);
 }
 
 /**
@@ -4720,9 +4730,17 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
 
 static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
 {
+	struct page *page;
+
 	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
 		return false;
-	return page_pool_return_skb_page(virt_to_head_page(data));
+
+	page = virt_to_head_page(data);
+	if (!page_pool_is_pp_page(page))
+		return false;
+
+	page_pool_return_skb_page(page);
+	return true;
 }
 
 #endif	/* __KERNEL__ */
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 3855f069627f..f9738da37d9f 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -164,7 +164,7 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
 	return pool->p.dma_dir;
 }
 
-bool page_pool_return_skb_page(struct page *page);
+void page_pool_return_skb_page(struct page *page);
 
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
@@ -231,6 +231,23 @@ static inline void page_pool_set_frag_count(struct page *page, long nr)
 	atomic_long_set(&page->pp_frag_count, nr);
 }
 
+static inline void page_pool_atomic_inc_frag_count(struct page *page)
+{
+	atomic_long_inc(&page->pp_frag_count);
+}
+
+static inline bool page_pool_is_pp_page(struct page *page)
+{
+	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
+	 * in order to preserve any existing bits, such as bit 0 for the
+	 * head page of compound page and bit 1 for pfmemalloc page, so
+	 * mask those bits for freeing side when doing below checking,
+	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
+	 * to avoid recycling the pfmemalloc page.
+	 */
+	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
+}
+
 static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
 							  long nr)
 {
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e9516477f9d2..96a28accce0e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -738,20 +738,10 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 }
 EXPORT_SYMBOL(page_pool_update_nid);
 
-bool page_pool_return_skb_page(struct page *page)
+void page_pool_return_skb_page(struct page *page)
 {
 	struct page_pool *pp;
 
-	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-	 * in order to preserve any existing bits, such as bit 0 for the
-	 * head page of compound page and bit 1 for pfmemalloc page, so
-	 * mask those bits for freeing side when doing below checking,
-	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-	 * to avoid recycling the pfmemalloc page.
-	 */
-	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
-		return false;
-
 	pp = page->pp;
 
 	/* Driver set this to memory recycling info. Reset it on recycle.
@@ -760,7 +750,5 @@ bool page_pool_return_skb_page(struct page *page)
 	 * 'flipped' fragment being in use or not.
 	 */
 	page_pool_put_full_page(pp, page, false);
-
-	return true;
 }
 EXPORT_SYMBOL(page_pool_return_skb_page);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2170bea2c7de..db8af3eff255 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -668,7 +668,7 @@ static void skb_release_data(struct sk_buff *skb)
 	skb_zcopy_clear(skb, true);
 
 	for (i = 0; i < shinfo->nr_frags; i++)
-		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+		__skb_frag_unref(&shinfo->frags[i]);
 
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);
@@ -3563,7 +3563,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 		fragto = &skb_shinfo(tgt)->frags[merge];
 
 		skb_frag_size_add(fragto, skb_frag_size(fragfrom));
-		__skb_frag_unref(fragfrom, skb->pp_recycle);
+		__skb_frag_unref(fragfrom);
 	}
 
 	/* Reposition in the original skb */
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index b932469ee69c..bd9f1567aa39 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -128,7 +128,7 @@ static void destroy_record(struct tls_record_info *record)
 	int i;
 
 	for (i = 0; i < record->num_frags; i++)
-		__skb_frag_unref(&record->frags[i], false);
+		__skb_frag_unref(&record->frags[i]);
 	kfree(record);
 }
 
-- 
2.33.0


* [PATCH net-next 6/7] skbuff: only use pp_magic identifier for a skb's head page
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
                   ` (4 preceding siblings ...)
  2021-09-22  9:41 ` [PATCH net-next 5/7] skbuff: keep track of pp page when __skb_frag_ref() is called Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-22  9:41 ` [PATCH net-next 7/7] skbuff: remove unused skb->pp_recycle Yunsheng Lin
  2021-09-23  7:07 ` [PATCH net-next 0/7] some optimization for page pool Ilias Apalodimas
  7 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

As pp_magic is used to identify the pp page for a skb's frag
page in the previous patch, do similar handling for a skb's
head page too.

Head-to-frag conversion of a skb during GRO and GSO processing
seems to need no extra handling when pp_magic is used to
identify the pp page for both a skb's head page and frag pages;
see NAPI_GRO_FREE_STOLEN_HEAD for GRO in skb_gro_receive() and
skb_head_frag_to_page_desc() for GSO in skb_segment().

As pp_magic only exists in the head page of a compound page,
and freeing a skb's head page eventually operates on the head
page of a compound page for both pp and non-pp pages, use
virt_to_head_page() and __page_frag_cache_drain() in
skb_free_head() to avoid an unnecessary virt_to_head_page()
call in page_frag_free().
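
A sketch of the equivalence relied on here (free_head_frag() is a
made-up name for illustration):

	static void free_head_frag(void *head)
	{
		struct page *page = virt_to_head_page(head);

		/* skb_free_frag(head) would call page_frag_free(),
		 * which redoes virt_to_head_page(); draining one
		 * reference on the page we already looked up skips
		 * that second lookup
		 */
		__page_frag_cache_drain(page, 1);
	}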

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/skbuff.h | 15 ---------------
 net/core/skbuff.c      | 11 +++++++++--
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a2d3b6fe0c32..b77ee060b64d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4728,20 +4728,5 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
 }
 #endif
 
-static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
-{
-	struct page *page;
-
-	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
-		return false;
-
-	page = virt_to_head_page(data);
-	if (!page_pool_is_pp_page(page))
-		return false;
-
-	page_pool_return_skb_page(page);
-	return true;
-}
-
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index db8af3eff255..3718898da499 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -647,9 +647,16 @@ static void skb_free_head(struct sk_buff *skb)
 	unsigned char *head = skb->head;
 
 	if (skb->head_frag) {
-		if (skb_pp_recycle(skb, head))
+		struct page *page = virt_to_head_page(head);
+
+#ifdef CONFIG_PAGE_POOL
+		if (page_pool_is_pp_page(page)) {
+			page_pool_return_skb_page(page);
 			return;
-		skb_free_frag(head);
+		}
+#endif
+
+		__page_frag_cache_drain(page, 1);
 	} else {
 		kfree(head);
 	}
-- 
2.33.0


* [PATCH net-next 7/7] skbuff: remove unused skb->pp_recycle
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
                   ` (5 preceding siblings ...)
  2021-09-22  9:41 ` [PATCH net-next 6/7] skbuff: only use pp_magic identifier for a skb's head page Yunsheng Lin
@ 2021-09-22  9:41 ` Yunsheng Lin
  2021-09-23  7:07 ` [PATCH net-next 0/7] some optimization for page pool Ilias Apalodimas
  7 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-22  9:41 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

As pp_magic is now used to identify pp pages for both the head
page and the frag pages of a skb, skb->pp_recycle is unused, so
remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  6 ------
 drivers/net/ethernet/marvell/mvneta.c         |  2 --
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 +---
 drivers/net/ethernet/ti/cpsw.c                |  2 --
 drivers/net/ethernet/ti/cpsw_new.c            |  2 --
 include/linux/skbuff.h                        | 12 +----------
 net/core/skbuff.c                             | 21 +------------------
 7 files changed, 3 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 22af3d6ce178..5331e0f2cee4 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3864,9 +3864,6 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length,
 		return 0;
 	}
 
-	if (ring->page_pool)
-		skb_mark_for_recycle(skb);
-
 	u64_stats_update_begin(&ring->syncp);
 	ring->stats.seg_pkt_cnt++;
 	u64_stats_update_end(&ring->syncp);
@@ -3906,9 +3903,6 @@ static int hns3_add_frag(struct hns3_enet_ring *ring)
 				return -ENXIO;
 			}
 
-			if (ring->page_pool)
-				skb_mark_for_recycle(new_skb);
-
 			ring->frag_num = 0;
 
 			if (ring->tail_skb) {
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 9d460a270601..c852e0dd6d38 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2327,8 +2327,6 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	skb_mark_for_recycle(skb);
-
 	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	skb_put(skb, xdp->data_end - xdp->data);
 	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index d5c92e43f89e..bacae115c6c6 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3994,9 +3994,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
 			goto err_drop_frame;
 		}
 
-		if (pp)
-			skb_mark_for_recycle(skb);
-		else
+		if (!pp)
 			dma_unmap_single_attrs(dev->dev.parent, dma_addr,
 					       bm_pool->buf_size, DMA_FROM_DEVICE,
 					       DMA_ATTR_SKIP_CPU_SYNC);
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 66f7ddd9b1f9..2fb5a4545b8b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -430,8 +430,6 @@ static void cpsw_rx_handler(void *token, int len, int status)
 		cpts_rx_timestamp(cpsw->cpts, skb);
 	skb->protocol = eth_type_trans(skb, ndev);
 
-	/* mark skb for recycling */
-	skb_mark_for_recycle(skb);
 	netif_receive_skb(skb);
 
 	ndev->stats.rx_bytes += len;
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 7968f24d99c8..1e74d484852d 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -374,8 +374,6 @@ static void cpsw_rx_handler(void *token, int len, int status)
 		cpts_rx_timestamp(cpsw->cpts, skb);
 	skb->protocol = eth_type_trans(skb, ndev);
 
-	/* mark skb for recycling */
-	skb_mark_for_recycle(skb);
 	netif_receive_skb(skb);
 
 	ndev->stats.rx_bytes += len;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b77ee060b64d..d4bb0e160fef 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -668,8 +668,6 @@ typedef unsigned char *sk_buff_data_t;
  *	@head_frag: skb was allocated from page fragments,
  *		not allocated by kmalloc() or vmalloc().
  *	@pfmemalloc: skbuff was allocated from PFMEMALLOC reserves
- *	@pp_recycle: mark the packet for recycling instead of freeing (implies
- *		page_pool support on driver)
  *	@active_extensions: active extensions (skb_ext_id types)
  *	@ndisc_nodetype: router type (from link layer)
  *	@ooo_okay: allow the mapping of a socket to a queue to be changed
@@ -795,8 +793,7 @@ struct sk_buff {
 				fclone:2,
 				peeked:1,
 				head_frag:1,
-				pfmemalloc:1,
-				pp_recycle:1; /* page_pool recycle indicator */
+				pfmemalloc:1;
 #ifdef CONFIG_SKB_EXTENSIONS
 	__u8			active_extensions;
 #endif
@@ -4721,12 +4718,5 @@ static inline u64 skb_get_kcov_handle(struct sk_buff *skb)
 #endif
 }
 
-#ifdef CONFIG_PAGE_POOL
-static inline void skb_mark_for_recycle(struct sk_buff *skb)
-{
-	skb->pp_recycle = 1;
-}
-#endif
-
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3718898da499..85ae59f4349a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -670,7 +670,7 @@ static void skb_release_data(struct sk_buff *skb)
 	if (skb->cloned &&
 	    atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
 			      &shinfo->dataref))
-		goto exit;
+		return;
 
 	skb_zcopy_clear(skb, true);
 
@@ -681,17 +681,6 @@ static void skb_release_data(struct sk_buff *skb)
 		kfree_skb_list(shinfo->frag_list);
 
 	skb_free_head(skb);
-exit:
-	/* When we clone an SKB we copy the reycling bit. The pp_recycle
-	 * bit is only set on the head though, so in order to avoid races
-	 * while trying to recycle fragments on __skb_frag_unref() we need
-	 * to make one SKB responsible for triggering the recycle path.
-	 * So disable the recycling bit if an SKB is cloned and we have
-	 * additional references to to the fragmented part of the SKB.
-	 * Eventually the last SKB will have the recycling bit set and it's
-	 * dataref set to 0, which will trigger the recycling
-	 */
-	skb->pp_recycle = 0;
 }
 
 /*
@@ -1073,7 +1062,6 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	n->nohdr = 0;
 	n->peeked = 0;
 	C(pfmemalloc);
-	C(pp_recycle);
 	n->destructor = NULL;
 	C(tail);
 	C(end);
@@ -5368,13 +5356,6 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	if (skb_cloned(to))
 		return false;
 
-	/* The page pool signature of struct page will eventually figure out
-	 * which pages can be recycled or not but for now let's prohibit slab
-	 * allocated and page_pool allocated SKBs from being coalesced.
-	 */
-	if (to->pp_recycle != from->pp_recycle)
-		return false;
-
 	if (len <= skb_tailroom(to)) {
 		if (len)
 			BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
-- 
2.33.0


* Re: [PATCH net-next 0/7] some optimization for page pool
  2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
                   ` (6 preceding siblings ...)
  2021-09-22  9:41 ` [PATCH net-next 7/7] skbuff: remove unused skb->pp_recycle Yunsheng Lin
@ 2021-09-23  7:07 ` Ilias Apalodimas
  2021-09-23 11:12   ` Yunsheng Lin
  7 siblings, 1 reply; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-23  7:07 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, netdev, linux-kernel, linuxarm, hawk,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

Hi Yunsheng, 

On Wed, Sep 22, 2021 at 05:41:24PM +0800, Yunsheng Lin wrote:
> Patch 1: disable dma mapping support for 32-bit arch with 64-bit
>          DMA.
> Patch 2: support non-split page when PP_FLAG_PAGE_FRAG is set.
> Patch 3: avoid calling compound_head() for skb frag page.
> Patch 4-7: use pp_magic to identify pp page uniquely.

There are some subtle changes in this patchset that might affect XDP.

What I forgot when I proposed removing the recycling bit is that it also
serves as an 'opt-in' mechanism for drivers that want to use page_pool but
do the recycling internally.  With that removed we need to make sure
nothing bad happens to them.  In theory the page refcnt for mlx5
specifically will be elevated, so we'll just end up unmapping the buffer.
Arguably we could add a similar mechanism internally into page pool,
which would allow us to enable and disable recycling, but that's an
extra if per packet allocation, and I don't know if we want that in the
XDP case.
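(For concreteness, a rough sketch of such an internal opt-out; this is
entirely hypothetical, page_pool has no recycle_disabled bit today:)

	/* in __page_pool_put_page(): the extra per-packet test */
	if (unlikely(pool->recycle_disabled)) {
		page_pool_release_page(pool, page);
		put_page(page);
		return NULL;
	}
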
A few numbers pre/post patch for XDP would help, but iirc hns3 doesn't have
XDP support yet?

It's Plumbers week, so I'll do some testing starting Monday.

Thanks
/Ilias

> 
> V3:
>     1. add patch 1/4/6/7.
>     2. use pp_magic to identify pp page uniquely too.
>     3. avoid unnecessary compound_head() calling.
> 
> V2: add patch 2, adjust the commit log according to the discussion
>     in V1, and fix a compiler error reported by the kernel test robot.
> 
> Yunsheng Lin (7):
>   page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
>   page_pool: support non-split page with PP_FLAG_PAGE_FRAG
>   page_pool: avoid calling compound_head() for skb frag page
>   page_pool: change BIAS_MAX to support incrementing
>   skbuff: keep track of pp page when __skb_frag_ref() is called
>   skbuff: only use pp_magic identifier for a skb's head page
>   skbuff: remove unused skb->pp_recycle
> 
>  .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  6 ---
>  drivers/net/ethernet/marvell/mvneta.c         |  2 -
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 +-
>  drivers/net/ethernet/marvell/sky2.c           |  2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
>  drivers/net/ethernet/ti/cpsw.c                |  2 -
>  drivers/net/ethernet/ti/cpsw_new.c            |  2 -
>  include/linux/mm_types.h                      | 13 +-----
>  include/linux/skbuff.h                        | 39 ++++++++----------
>  include/net/page_pool.h                       | 31 ++++++++------
>  net/core/page_pool.c                          | 40 +++++++------------
>  net/core/skbuff.c                             | 36 ++++++-----------
>  net/tls/tls_device.c                          |  2 +-
>  13 files changed, 67 insertions(+), 114 deletions(-)
> 
> -- 
> 2.33.0
> 

* Re: [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page
  2021-09-22  9:41 ` [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page Yunsheng Lin
@ 2021-09-23  8:33   ` Ilias Apalodimas
  2021-09-23 11:24     ` Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-23  8:33 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, netdev, linux-kernel, linuxarm, hawk,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

On Wed, Sep 22, 2021 at 05:41:27PM +0800, Yunsheng Lin wrote:
> As the pp page for a skb frag is always a head page, make
> sure skb_pp_recycle() passes a head page to avoid calling
> compound_head() for the skb frag page case.

Doesn't that rely mostly on the driver (i.e. what's passed to skb_frag_set_page())?
None of the current netstack code assumes bv_page is the head page of a
compound page.  Since our page_pool allocator will allocate compound
pages for order > 0, why should we rely on it?

Thanks
/Ilias
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/skbuff.h | 2 +-
>  net/core/page_pool.c   | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 6bdb0db3e825..35eebc2310a5 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4722,7 +4722,7 @@ static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
>  {
>  	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
>  		return false;
> -	return page_pool_return_skb_page(virt_to_page(data));
> +	return page_pool_return_skb_page(virt_to_head_page(data));
>  }
>  
>  #endif	/* __KERNEL__ */
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index f7e71dcb6a2e..357fb53343a0 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -742,8 +742,6 @@ bool page_pool_return_skb_page(struct page *page)
>  {
>  	struct page_pool *pp;
>  
> -	page = compound_head(page);
> -
>  	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
>  	 * in order to preserve any existing bits, such as bit 0 for the
>  	 * head page of compound page and bit 1 for pfmemalloc page, so
> -- 
> 2.33.0
> 

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-22  9:41 ` [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2021-09-23  9:10   ` Ilias Apalodimas
  2021-09-23  9:33   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-23  9:10 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: David S. Miller, Jakub Kicinski, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern, Matthew Wilcox

(+cc Matthew) but this looks safe to me.

On Wed, 22 Sept 2021 at 12:43, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> As 32-bit arches with 64-bit DMA seem to be rare these days,
> page pool is carrying a lot of code and complexity for systems
> that possibly don't exist.
>
> So disable dma mapping support for such systems; if drivers
> really want to work on such systems, they have to implement
> their own DMA-mapping fallback tracking outside page_pool.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/linux/mm_types.h | 13 +------------
>  include/net/page_pool.h  | 12 +-----------
>  net/core/page_pool.c     | 10 ++++++----
>  3 files changed, 8 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 7f8ee09c711f..436e0946d691 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -104,18 +104,7 @@ struct page {
>                         struct page_pool *pp;
>                         unsigned long _pp_mapping_pad;
>                         unsigned long dma_addr;
> -                       union {
> -                               /**
> -                                * dma_addr_upper: might require a 64-bit
> -                                * value on 32-bit architectures.
> -                                */
> -                               unsigned long dma_addr_upper;
> -                               /**
> -                                * For frag page support, not supported in
> -                                * 32-bit architectures with 64-bit DMA.
> -                                */
> -                               atomic_long_t pp_frag_count;
> -                       };
> +                       atomic_long_t pp_frag_count;
>                 };
>                 struct {        /* slab, slob and slub */
>                         union {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index a4082406a003..3855f069627f 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -216,24 +216,14 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>         page_pool_put_full_page(pool, page, true);
>  }
>
> -#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT        \
> -               (sizeof(dma_addr_t) > sizeof(unsigned long))
> -
>  static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  {
> -       dma_addr_t ret = page->dma_addr;
> -
> -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> -               ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
> -
> -       return ret;
> +       return page->dma_addr;
>  }
>
>  static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>  {
>         page->dma_addr = addr;
> -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> -               page->dma_addr_upper = upper_32_bits(addr);
>  }
>
>  static inline void page_pool_set_frag_count(struct page *page, long nr)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 1a6978427d6c..a65bd7972e37 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
>          * which is the XDP_TX use-case.
>          */
>         if (pool->p.flags & PP_FLAG_DMA_MAP) {
> +               /* DMA-mapping is not supported on 32-bit systems with
> +                * 64-bit DMA mapping.
> +                */
> +               if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +                       return -EINVAL;
> +
>                 if ((pool->p.dma_dir != DMA_FROM_DEVICE) &&
>                     (pool->p.dma_dir != DMA_BIDIRECTIONAL))
>                         return -EINVAL;
> @@ -69,10 +75,6 @@ static int page_pool_init(struct page_pool *pool,
>                  */
>         }
>
> -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> -           pool->p.flags & PP_FLAG_PAGE_FRAG)
> -               return -EINVAL;
> -
>         if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0)
>                 return -ENOMEM;
>
> --
> 2.33.0
>

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-22  9:41 ` [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
  2021-09-23  9:10   ` Ilias Apalodimas
@ 2021-09-23  9:33   ` Jesper Dangaard Brouer
  2021-09-23 10:02     ` Ilias Apalodimas
  1 sibling, 1 reply; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-23  9:33 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: brouer, netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern


On 22/09/2021 11.41, Yunsheng Lin wrote:
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 1a6978427d6c..a65bd7972e37 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
>   	 * which is the XDP_TX use-case.
>   	 */
>   	if (pool->p.flags & PP_FLAG_DMA_MAP) {
> +		/* DMA-mapping is not supported on 32-bit systems with
> +		 * 64-bit DMA mapping.
> +		 */
> +		if (sizeof(dma_addr_t) > sizeof(unsigned long))
> +			return -EINVAL;

As I said before, can we please use an error other than EINVAL.
We should give drivers a chance/ability to detect this error, and e.g.
fall back to doing DMA mappings inside the driver instead.

I suggest using EOPNOTSUPP 95 (Operation not supported).
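
A rough sketch of the driver-side detection this would enable
(hypothetical driver code; pp_params and map_in_driver are made-up
names):

	pool = page_pool_create(&pp_params);
	if (IS_ERR(pool) && PTR_ERR(pool) == -EOPNOTSUPP) {
		/* page_pool can't store a 64-bit DMA address on this
		 * arch; retry without DMA_MAP and map in the driver
		 */
		pp_params.flags &= ~PP_FLAG_DMA_MAP;
		pool = page_pool_create(&pp_params);
		priv->map_in_driver = true;
	}
	if (IS_ERR(pool))
		return PTR_ERR(pool);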

-Jesper


* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-23  9:33   ` Jesper Dangaard Brouer
@ 2021-09-23 10:02     ` Ilias Apalodimas
  2021-09-23 11:13       ` Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-23 10:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Yunsheng Lin, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

Hi Jesper,

On Thu, 23 Sept 2021 at 12:33, Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
> On 22/09/2021 11.41, Yunsheng Lin wrote:
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 1a6978427d6c..a65bd7972e37 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
> >        * which is the XDP_TX use-case.
> >        */
> >       if (pool->p.flags & PP_FLAG_DMA_MAP) {
> > +             /* DMA-mapping is not supported on 32-bit systems with
> > +              * 64-bit DMA mapping.
> > +              */
> > +             if (sizeof(dma_addr_t) > sizeof(unsigned long))
> > +                     return -EINVAL;
>
> As I said before, can we please use an error other than EINVAL.
> We should give drivers a chance/ability to detect this error, and e.g.
> fall back to doing DMA mappings inside the driver instead.
>
> I suggest using EOPNOTSUPP 95 (Operation not supported).

I am fine with both.  In any case though, the aforementioned driver can
just remove PP_FLAG_DMA_MAP and do its own mappings.

Regards
/Ilias
>
> -Jesper
>

* Re: [PATCH net-next 0/7] some optimization for page pool
  2021-09-23  7:07 ` [PATCH net-next 0/7] some optimization for page pool Ilias Apalodimas
@ 2021-09-23 11:12   ` Yunsheng Lin
  0 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-23 11:12 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: davem, kuba, netdev, linux-kernel, linuxarm, hawk,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

On 2021/9/23 15:07, Ilias Apalodimas wrote:
> Hi Yunsheng, 
> 
> On Wed, Sep 22, 2021 at 05:41:24PM +0800, Yunsheng Lin wrote:
>> Patch 1: disable dma mapping support for 32-bit arch with 64-bit
>>          DMA.
>> Patch 2: support non-split page when PP_FLAG_PAGE_FRAG is set.
>> Patch 3: avoid calling compound_head() for skb frag page.
>> Patch 4-7: use pp_magic to identify pp page uniquely.
> 
> There are some subtle changes in this patchset that might affect XDP.
> 
> What I forgot when I proposed removing the recycling bit is that it also
> serves as an 'opt-in' mechanism for drivers that want to use page_pool but 
> do the recycling internally.  With that removed we need to make sure
> nothing bad happens to them.  In theory the page refcnt for mlx5

It seems odd that mlx5 is adding its own page cache on top of page pool;
is it about supporting both "struct sk_buff" and "struct xdp_buff" for
the same queue?

> specifically will be elevated, so we'll just end up unmapping the buffer.
> Arguably we could add a similar mechanism internally into page pool,  
> which would allow us to enable and disable recycling,  but that's
> an extra if per packet allocation, and I don't know if we want that in the XDP
> case.

Or we could change mlx5e_rx_cache_get() to check for "page->pp_frag_count
== 1" too, and adjust mlx5e_page_release() accordingly?

> A few numbers pre/post patch for XDP would help, but iirc hns3 doesn't have
> XDP support yet?

You are right, hns3 doesn't have XDP support yet.

> 
> It's plumbers week so I'll do some testing starting Monday.
> 
> Thanks
> /Ilias
> 
>>
>> V3:
>>     1. add patch 1/4/6/7.
>>     2. use pp_magic to identify pp page uniquely too.
>>     3. avoid unnecessary compound_head() calling.
>>
>> V2: add patch 2, adjust the commit log according to the discussion
>>     in V1, and fix a compiler error reported by the kernel test robot.
>>
>> Yunsheng Lin (7):
>>   page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
>>   page_pool: support non-split page with PP_FLAG_PAGE_FRAG
>>   page_pool: avoid calling compound_head() for skb frag page
>>   page_pool: change BIAS_MAX to support incrementing
>>   skbuff: keep track of pp page when __skb_frag_ref() is called
>>   skbuff: only use pp_magic identifier for a skb's head page
>>   skbuff: remove unused skb->pp_recycle
>>
>>  .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  6 ---
>>  drivers/net/ethernet/marvell/mvneta.c         |  2 -
>>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 +-
>>  drivers/net/ethernet/marvell/sky2.c           |  2 +-
>>  drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
>>  drivers/net/ethernet/ti/cpsw.c                |  2 -
>>  drivers/net/ethernet/ti/cpsw_new.c            |  2 -
>>  include/linux/mm_types.h                      | 13 +-----
>>  include/linux/skbuff.h                        | 39 ++++++++----------
>>  include/net/page_pool.h                       | 31 ++++++++------
>>  net/core/page_pool.c                          | 40 +++++++------------
>>  net/core/skbuff.c                             | 36 ++++++-----------
>>  net/tls/tls_device.c                          |  2 +-
>>  13 files changed, 67 insertions(+), 114 deletions(-)
>>
>> -- 
>> 2.33.0
>>
> .
> 

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-23 10:02     ` Ilias Apalodimas
@ 2021-09-23 11:13       ` Yunsheng Lin
  2021-09-23 13:07         ` Ilias Apalodimas
  0 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-23 11:13 UTC (permalink / raw)
  To: Ilias Apalodimas, Jesper Dangaard Brouer
  Cc: David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	Networking, open list, linuxarm, Jesper Dangaard Brouer,
	Jonathan Lemon, Alexander Lobakin, Willem de Bruijn, Cong Wang,
	Paolo Abeni, Kevin Hao, Aleksandr Nogikh, Marco Elver, memxor,
	Eric Dumazet, Alexander Duyck, David Ahern

On 2021/9/23 18:02, Ilias Apalodimas wrote:
> Hi Jesper,
> 
> On Thu, 23 Sept 2021 at 12:33, Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>>
>> On 22/09/2021 11.41, Yunsheng Lin wrote:
>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>> index 1a6978427d6c..a65bd7972e37 100644
>>> --- a/net/core/page_pool.c
>>> +++ b/net/core/page_pool.c
>>> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
>>>        * which is the XDP_TX use-case.
>>>        */
>>>       if (pool->p.flags & PP_FLAG_DMA_MAP) {
>>> +             /* DMA-mapping is not supported on 32-bit systems with
>>> +              * 64-bit DMA mapping.
>>> +              */
>>> +             if (sizeof(dma_addr_t) > sizeof(unsigned long))
>>> +                     return -EINVAL;
>>
>> As I said before, can we please use an error other than EINVAL.
>> We should give drivers a chance/ability to detect this error, and e.g.
>> fall back to doing DMA mappings inside the driver instead.
>>
>> I suggest using EOPNOTSUPP 95 (Operation not supported).

Will change it to EOPNOTSUPP, thanks.

> 
> I am fine with both.  In any case though, the aforementioned driver can
> just remove PP_FLAG_DMA_MAP and do its own mappings.
> 
> Regards
> /Ilias
>>
>> -Jesper
>>
> .
> 

* Re: [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page
  2021-09-23  8:33   ` Ilias Apalodimas
@ 2021-09-23 11:24     ` Yunsheng Lin
  2021-09-23 11:47       ` Ilias Apalodimas
  0 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-23 11:24 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: davem, kuba, netdev, linux-kernel, linuxarm, hawk,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

On 2021/9/23 16:33, Ilias Apalodimas wrote:
> On Wed, Sep 22, 2021 at 05:41:27PM +0800, Yunsheng Lin wrote:
>> As the pp page for a skb frag is always a head page, make
>> sure skb_pp_recycle() passes a head page to avoid calling
>> compound_head() for the skb frag page case.
> 
> Doesn't that rely mostly on the driver (i.e. what's passed to skb_frag_set_page())?
> None of the current netstack code assumes bv_page is the head page of a
> compound page.  Since our page_pool allocator will allocate compound
> pages for order > 0, why should we rely on it?

As the page pool alloc function returns a 'struct page *' to the caller,
which is the head page of a compound page for order > 0, I assume the
caller will pass that to skb_frag_set_page().

For a non-pp page, I assume it is ok whether the page is a head page or
a tail page, as pp_magic for both of them is not set to PP_SIGNATURE.

Or should we play safe here, and do the same trick as skb_free_head()
does in patch 6?

> 
> Thanks
> /Ilias
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>  include/linux/skbuff.h | 2 +-
>>  net/core/page_pool.c   | 2 --
>>  2 files changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 6bdb0db3e825..35eebc2310a5 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -4722,7 +4722,7 @@ static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
>>  {
>>  	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
>>  		return false;
>> -	return page_pool_return_skb_page(virt_to_page(data));
>> +	return page_pool_return_skb_page(virt_to_head_page(data));
>>  }
>>  
>>  #endif	/* __KERNEL__ */
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index f7e71dcb6a2e..357fb53343a0 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -742,8 +742,6 @@ bool page_pool_return_skb_page(struct page *page)
>>  {
>>  	struct page_pool *pp;
>>  
>> -	page = compound_head(page);
>> -
>>  	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
>>  	 * in order to preserve any existing bits, such as bit 0 for the
>>  	 * head page of compound page and bit 1 for pfmemalloc page, so
>> -- 
>> 2.33.0
>>
> .
> 

* Re: [PATCH net-next 3/7] page_pool: avoid calling compound_head() for skb frag page
  2021-09-23 11:24     ` Yunsheng Lin
@ 2021-09-23 11:47       ` Ilias Apalodimas
  2021-09-24  7:33         ` Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-23 11:47 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: David S. Miller, Jakub Kicinski, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On Thu, 23 Sept 2021 at 14:24, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2021/9/23 16:33, Ilias Apalodimas wrote:
> > On Wed, Sep 22, 2021 at 05:41:27PM +0800, Yunsheng Lin wrote:
> >> As the pp page for a skb frag is always a head page, make
> >> sure skb_pp_recycle() passes a head page to avoid calling
> >> compound_head() for the skb frag page case.
> >
> > Doesn't that rely mostly on the driver (i.e. what's passed to skb_frag_set_page())?
> > None of the current netstack code assumes bv_page is the head page of a
> > compound page.  Since our page_pool allocator will allocate compound
> > pages for order > 0, why should we rely on it?
>
> As the page pool alloc function returns a 'struct page *' to the caller,
> which is the head page of a compound page for order > 0, I assume the
> caller will pass that to skb_frag_set_page().

Yeah, that's exactly the assumption I was afraid of.
Sure, not passing the head page might seem weird atm and the assumption
stands, but the point is we shouldn't blow up the entire network stack
if someone does that eventually.

>
> For a non-pp page, I assume it is ok whether the page is a head page or a tail
> page, as the pp_magic for both of them is not set with PP_SIGNATURE.

Yea, that's true. Although we removed the checking for coalescing
recyclable and non-recyclable SKBs, the next patch first checks the
signature before trying to do anything with the skb.

>
> Or should we play it safe here, and do the trick as skb_free_head() does in
> patch 6?

I don't think the &1 will even be measurable, so I'd suggest just
dropping this and playing it safe?

Cheers
/Ilias
>
> >
> > Thanks
> > /Ilias
> >>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >> ---
> >>  include/linux/skbuff.h | 2 +-
> >>  net/core/page_pool.c   | 2 --
> >>  2 files changed, 1 insertion(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 6bdb0db3e825..35eebc2310a5 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -4722,7 +4722,7 @@ static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
> >>  {
> >>      if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
> >>              return false;
> >> -    return page_pool_return_skb_page(virt_to_page(data));
> >> +    return page_pool_return_skb_page(virt_to_head_page(data));
> >>  }
> >>
> >>  #endif      /* __KERNEL__ */
> >> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> >> index f7e71dcb6a2e..357fb53343a0 100644
> >> --- a/net/core/page_pool.c
> >> +++ b/net/core/page_pool.c
> >> @@ -742,8 +742,6 @@ bool page_pool_return_skb_page(struct page *page)
> >>  {
> >>      struct page_pool *pp;
> >>
> >> -    page = compound_head(page);
> >> -
> >>      /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> >>       * in order to preserve any existing bits, such as bit 0 for the
> >>       * head page of compound page and bit 1 for pfmemalloc page, so
> >> --
> >> 2.33.0
> >>
> > .
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG
  2021-09-22  9:41 ` [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG Yunsheng Lin
@ 2021-09-23 12:08   ` Jesper Dangaard Brouer
  2021-09-24  7:23     ` Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-23 12:08 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: brouer, netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern



On 22/09/2021 11.41, Yunsheng Lin wrote:
> Currently when PP_FLAG_PAGE_FRAG is set, the caller is not
> expected to call page_pool_alloc_pages() directly because of
> the PP_FLAG_PAGE_FRAG checking in __page_pool_put_page().
> 
> The patch removes the above checking to enable non-split page
> support when PP_FLAG_PAGE_FRAG is set.
> 
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>   net/core/page_pool.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a65bd7972e37..f7e71dcb6a2e 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -315,11 +315,14 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
>   
>   	/* Fast-path: Get a page from cache */
>   	page = __page_pool_get_cached(pool);
> -	if (page)
> -		return page;
>   
>   	/* Slow-path: cache empty, do real allocation */
> -	page = __page_pool_alloc_pages_slow(pool, gfp);
> +	if (!page)
> +		page = __page_pool_alloc_pages_slow(pool, gfp);
> +
> +	if (likely(page))
> +		page_pool_set_frag_count(page, 1);
> +

I really don't like that you add one atomic_long_set operation per page 
alloc call.
This is a fast-path for XDP use-cases, which you are ignoring as your
drivers don't implement XDP.
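
For reference, a sketch of the helper in question; at this point in
net-next it is essentially one plain atomic store per allocated page:

	static inline void page_pool_set_frag_count(struct page *page, long nr)
	{
		atomic_long_set(&page->pp_frag_count, nr);
	}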

As I cannot ask you to run XDP benchmarks, I fortunately have some 
page_pool specific microbenchmarks you can run instead.

I will ask you to provide before and after results from running these 
benchmarks [1] and [2].

  [1] 
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c

  [2] 
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c

How to use these modules is documented here[3]:
  [3] 
https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html

>   	return page;
>   }
>   EXPORT_SYMBOL(page_pool_alloc_pages);
> @@ -428,8 +431,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
>   		     unsigned int dma_sync_size, bool allow_direct)
>   {
>   	/* It is not the last user for the page frag case */
> -	if (pool->p.flags & PP_FLAG_PAGE_FRAG &&
> -	    page_pool_atomic_sub_frag_count_return(page, 1))
> +	if (page_pool_atomic_sub_frag_count_return(page, 1))
>   		return NULL;

This adds an atomic_long_read, even when PP_FLAG_PAGE_FRAG is not set.

>   
>   	/* This allocator is optimized for the XDP mode that uses
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-23 11:13       ` Yunsheng Lin
@ 2021-09-23 13:07         ` Ilias Apalodimas
  2021-09-24  7:04           ` Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-23 13:07 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On Thu, Sep 23, 2021 at 07:13:11PM +0800, Yunsheng Lin wrote:
> On 2021/9/23 18:02, Ilias Apalodimas wrote:
> > Hi Jesper,
> > 
> > On Thu, 23 Sept 2021 at 12:33, Jesper Dangaard Brouer
> > <jbrouer@redhat.com> wrote:
> >>
> >>
> >> On 22/09/2021 11.41, Yunsheng Lin wrote:
> >>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> >>> index 1a6978427d6c..a65bd7972e37 100644
> >>> --- a/net/core/page_pool.c
> >>> +++ b/net/core/page_pool.c
> >>> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
> >>>        * which is the XDP_TX use-case.
> >>>        */
> >>>       if (pool->p.flags & PP_FLAG_DMA_MAP) {
> >>> +             /* DMA-mapping is not supported on 32-bit systems with
> >>> +              * 64-bit DMA mapping.
> >>> +              */
> >>> +             if (sizeof(dma_addr_t) > sizeof(unsigned long))
> >>> +                     return -EINVAL;
> >>
> >> As I said before, can we please use another error than EINVAL.
> >> We should give drivers a chance/ability to detect this error, and e.g.
> >> fallback to doing DMA mappings inside driver instead.
> >>
> >> I suggest using EOPNOTSUPP 95 (Operation not supported).
> 
> Will change it to EOPNOTSUPP, thanks.
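
A minimal sketch of the suggested fallback (hypothetical driver code, not
from this series; page_pool_create() reports the error via ERR_PTR()):

	struct page_pool_params pp_params = {
		.order		= 0,
		.pool_size	= 1024,			/* hypothetical sizing */
		.flags		= PP_FLAG_DMA_MAP,	/* let page_pool map for us */
		.dev		= dev,			/* the driver's struct device */
		.dma_dir	= DMA_FROM_DEVICE,
	};
	struct page_pool *pool;

	pool = page_pool_create(&pp_params);
	if (IS_ERR(pool) && PTR_ERR(pool) == -EOPNOTSUPP) {
		/* 32-bit arch with 64-bit DMA: map in the driver instead */
		pp_params.flags &= ~PP_FLAG_DMA_MAP;
		pool = page_pool_create(&pp_params);
	}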

Mind sending this one separately (and you can keep my reviewed-by)?  It
fits nicely on its own, and since I am not sure about the rest of the
changes yet, it would be nice to get this one in.

Cheers
/Ilias
> 
> > 
> > I am fine with both.  In any case though the aforementioned driver can
> > just remove PP_FLAG_DMA_MAP and do it's own mappings.
> > 
> > Regards
> > /Ilias
> >>
> >> -Jesper
> >>
> > .
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-23 13:07         ` Ilias Apalodimas
@ 2021-09-24  7:04           ` Yunsheng Lin
  2021-09-24  7:25             ` Ilias Apalodimas
  0 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-24  7:04 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On 2021/9/23 21:07, Ilias Apalodimas wrote:
> On Thu, Sep 23, 2021 at 07:13:11PM +0800, Yunsheng Lin wrote:
>> On 2021/9/23 18:02, Ilias Apalodimas wrote:
>>> Hi Jesper,
>>>
>>> On Thu, 23 Sept 2021 at 12:33, Jesper Dangaard Brouer
>>> <jbrouer@redhat.com> wrote:
>>>>
>>>>
>>>> On 22/09/2021 11.41, Yunsheng Lin wrote:
>>>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>>>> index 1a6978427d6c..a65bd7972e37 100644
>>>>> --- a/net/core/page_pool.c
>>>>> +++ b/net/core/page_pool.c
>>>>> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
>>>>>        * which is the XDP_TX use-case.
>>>>>        */
>>>>>       if (pool->p.flags & PP_FLAG_DMA_MAP) {
>>>>> +             /* DMA-mapping is not supported on 32-bit systems with
>>>>> +              * 64-bit DMA mapping.
>>>>> +              */
>>>>> +             if (sizeof(dma_addr_t) > sizeof(unsigned long))
>>>>> +                     return -EINVAL;
>>>>
>>>> As I said before, can we please use another error than EINVAL.
>>>> We should give drivers a chance/ability to detect this error, and e.g.
>>>> fallback to doing DMA mappings inside driver instead.
>>>>
>>>> I suggest using EOPNOTSUPP 95 (Operation not supported).
>>
>> Will change it to EOPNOTSUPP, thanks.
> 
> Mind sending this one separately (and you can keep my reviewed-by)?  It
> fits nicely on its own, and since I am not sure about the rest of the
> changes yet, it would be nice to get this one in.

I am not sure sending this one separately really makes sense, as it is
mainly used to make supporting the "keep track of pp page when __skb_frag_ref()
is called" in patch 5 easier.

> 
> Cheers
> /Ilias
>>
>>>
>>> I am fine with both.  In any case though the aforementioned driver can
>>> just remove PP_FLAG_DMA_MAP and do it's own mappings.
>>>
>>> Regards
>>> /Ilias
>>>>
>>>> -Jesper
>>>>
>>> .
>>>
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG
  2021-09-23 12:08   ` Jesper Dangaard Brouer
@ 2021-09-24  7:23     ` Yunsheng Lin
  2021-09-30  7:28       ` [Linuxarm] " Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-24  7:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba
  Cc: brouer, netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

On 2021/9/23 20:08, Jesper Dangaard Brouer wrote:
> 
> 
> On 22/09/2021 11.41, Yunsheng Lin wrote:
>> Currently when PP_FLAG_PAGE_FRAG is set, the caller is not
>> expected to call page_pool_alloc_pages() directly because of
>> the PP_FLAG_PAGE_FRAG checking in __page_pool_put_page().
>>
>> The patch removes the above checking to enable non-split page
>> support when PP_FLAG_PAGE_FRAG is set.
>>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>   net/core/page_pool.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index a65bd7972e37..f7e71dcb6a2e 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -315,11 +315,14 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
>>         /* Fast-path: Get a page from cache */
>>       page = __page_pool_get_cached(pool);
>> -    if (page)
>> -        return page;
>>         /* Slow-path: cache empty, do real allocation */
>> -    page = __page_pool_alloc_pages_slow(pool, gfp);
>> +    if (!page)
>> +        page = __page_pool_alloc_pages_slow(pool, gfp);
>> +
>> +    if (likely(page))
>> +        page_pool_set_frag_count(page, 1);
>> +
> 
> I really don't like that you add one atomic_long_set operation per page alloc call.
> This is a fast-path for XDP use-cases, which you are ignoring as your drivers don't implement XDP.
> 
> As I cannot ask you to run XDP benchmarks, I fortunately have some page_pool specific microbenchmarks you can run instead.
> 
> I will ask you to provide before and after results from running these benchmarks [1] and [2].
> 
>  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> 
>  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c
> 
> How to use these modules is documented here[3]:
>  [3] https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html

Will run these benchmarks to see if there is any noticeable performance
overhead here, thanks for the benchmarks.

> 
>>       return page;
>>   }
>>   EXPORT_SYMBOL(page_pool_alloc_pages);
>> @@ -428,8 +431,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
>>                unsigned int dma_sync_size, bool allow_direct)
>>   {
>>       /* It is not the last user for the page frag case */
>> -    if (pool->p.flags & PP_FLAG_PAGE_FRAG &&
>> -        page_pool_atomic_sub_frag_count_return(page, 1))
>> +    if (page_pool_atomic_sub_frag_count_return(page, 1))
>>           return NULL;
> 
> This adds an atomic_long_read, even when PP_FLAG_PAGE_FRAG is not set.

The point here is to have consistent handling for both the PP_FLAG_PAGE_FRAG
and non-PP_FLAG_PAGE_FRAG cases in the following patch.

As page->_refcount is accessed by the "page_ref_count(page) == 1" check
in __page_pool_put_page(), and page->pp_frag_count is most likely in the
same cache line as page->_refcount, I am not expecting a noticeable
overhead here.

Anyway, will use the above benchmarks as an example to verify it.
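
For reference, a simplified sketch of the helper whose extra
atomic_long_read() is being discussed (the in-tree version also guards
the read with __builtin_constant_p(nr)):

	static inline long
	page_pool_atomic_sub_frag_count_return(struct page *page, long nr)
	{
		long ret;

		/* Skip the atomic sub/return when the caller holds all the
		 * remaining references, e.g. when freeing the page.
		 */
		if (atomic_long_read(&page->pp_frag_count) == nr)
			return 0;

		ret = atomic_long_sub_return(nr, &page->pp_frag_count);
		WARN_ON(ret < 0);
		return ret;
	}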


> 
>>         /* This allocator is optimized for the XDP mode that uses
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-24  7:04           ` Yunsheng Lin
@ 2021-09-24  7:25             ` Ilias Apalodimas
  2021-09-24  8:01               ` Yunsheng Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-24  7:25 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On Fri, 24 Sept 2021 at 10:04, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2021/9/23 21:07, Ilias Apalodimas wrote:
> > On Thu, Sep 23, 2021 at 07:13:11PM +0800, Yunsheng Lin wrote:
> >> On 2021/9/23 18:02, Ilias Apalodimas wrote:
> >>> Hi Jesper,
> >>>
> >>> On Thu, 23 Sept 2021 at 12:33, Jesper Dangaard Brouer
> >>> <jbrouer@redhat.com> wrote:
> >>>>
> >>>>
> >>>> On 22/09/2021 11.41, Yunsheng Lin wrote:
> >>>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> >>>>> index 1a6978427d6c..a65bd7972e37 100644
> >>>>> --- a/net/core/page_pool.c
> >>>>> +++ b/net/core/page_pool.c
> >>>>> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
> >>>>>        * which is the XDP_TX use-case.
> >>>>>        */
> >>>>>       if (pool->p.flags & PP_FLAG_DMA_MAP) {
> >>>>> +             /* DMA-mapping is not supported on 32-bit systems with
> >>>>> +              * 64-bit DMA mapping.
> >>>>> +              */
> >>>>> +             if (sizeof(dma_addr_t) > sizeof(unsigned long))
> >>>>> +                     return -EINVAL;
> >>>>
> >>>> As I said before, can we please use another error than EINVAL.
> >>>> We should give drivers a chance/ability to detect this error, and e.g.
> >>>> fallback to doing DMA mappings inside driver instead.
> >>>>
> >>>> I suggest using EOPNOTSUPP 95 (Operation not supported).
> >>
> >> Will change it to EOPNOTSUPP, thanks.
> >
> > Mind sending this one separately (and you can keep my reviewed-by)?  It
> > fits nicely on its own, and since I am not sure about the rest of the
> > changes yet, it would be nice to get this one in.
>
> I am not sure sending this one separately really makes sense, as it is
> mainly used to make supporting the "keep track of pp page when __skb_frag_ref()
> is called" in patch 5 easier.

It rips out support for devices that are 32-bit and have 64-bit DMA, and
makes the whole code easier to follow.  I thought we agreed on removing
the support for those devices regardless, didn't we?

Regards
/Ilias

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 3/7] pool_pool: avoid calling compound_head() for skb frag page
  2021-09-23 11:47       ` Ilias Apalodimas
@ 2021-09-24  7:33         ` Yunsheng Lin
  2021-09-24  7:44           ` Ilias Apalodimas
  0 siblings, 1 reply; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-24  7:33 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: David S. Miller, Jakub Kicinski, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On 2021/9/23 19:47, Ilias Apalodimas wrote:
> On Thu, 23 Sept 2021 at 14:24, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2021/9/23 16:33, Ilias Apalodimas wrote:
>>> On Wed, Sep 22, 2021 at 05:41:27PM +0800, Yunsheng Lin wrote:
>>>> As the pp page for a skb frag is always a head page, make
>>>> sure skb_pp_recycle() passes a head page to avoid calling
>>>> compound_head() for the skb frag page case.
>>>
>>> Doesn't that rely on the driver mostly (i.e. what's passed in skb_frag_set_page())?
>>> None of the current netstack code assumes bv_page is the head page of a
>>> compound page.  Since our page_pool allocator will allocate compound
>>> pages for order > 0, why should we rely on it?
>>
>> As the page pool alloc function returns a 'struct page *' to the caller, which
>> is the head page of a compound page for order > 0, I assume the caller
>> will pass that to skb_frag_set_page().
> 
> Yea, that's exactly the assumption I was afraid of.
> Sure, not passing the head page might seem weird atm and the assumption
> stands, but the point is we shouldn't blow up the entire network stack
> if someone eventually does that.
> 
>>
>> For a non-pp page, I assume it is ok whether the page is a head page or a tail
>> page, as the pp_magic for both of them is not set with PP_SIGNATURE.
> 
> Yea, that's true. Although we removed the checking for coalescing
> recyclable and non-recyclable SKBs, the next patch first checks the
> signature before trying to do anything with the skb.
> 
>>
>> Or should we play it safe here, and do the trick as skb_free_head() does in
>> patch 6?
> 
> I don't think the &1 will even be measurable, so I'd suggest just
> dropping this and playing it safe?

I am not sure what the '&1' above means.

The one thing I am not sure about with the trick done in patch 6 is
whether __page_frag_cache_drain() is the right API to use here; I used it
because it is the only API that is expecting a head page.

> 
> Cheers
> /Ilias
>>
>>>
>>> Thanks
>>> /Ilias
>>>>
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>>> ---
>>>>  include/linux/skbuff.h | 2 +-
>>>>  net/core/page_pool.c   | 2 --
>>>>  2 files changed, 1 insertion(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 6bdb0db3e825..35eebc2310a5 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -4722,7 +4722,7 @@ static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
>>>>  {
>>>>      if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
>>>>              return false;
>>>> -    return page_pool_return_skb_page(virt_to_page(data));
>>>> +    return page_pool_return_skb_page(virt_to_head_page(data));
>>>>  }
>>>>
>>>>  #endif      /* __KERNEL__ */
>>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>>> index f7e71dcb6a2e..357fb53343a0 100644
>>>> --- a/net/core/page_pool.c
>>>> +++ b/net/core/page_pool.c
>>>> @@ -742,8 +742,6 @@ bool page_pool_return_skb_page(struct page *page)
>>>>  {
>>>>      struct page_pool *pp;
>>>>
>>>> -    page = compound_head(page);
>>>> -
>>>>      /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
>>>>       * in order to preserve any existing bits, such as bit 0 for the
>>>>       * head page of compound page and bit 1 for pfmemalloc page, so
>>>> --
>>>> 2.33.0
>>>>
>>> .
>>>
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 3/7] pool_pool: avoid calling compound_head() for skb frag page
  2021-09-24  7:33         ` Yunsheng Lin
@ 2021-09-24  7:44           ` Ilias Apalodimas
  0 siblings, 0 replies; 26+ messages in thread
From: Ilias Apalodimas @ 2021-09-24  7:44 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: David S. Miller, Jakub Kicinski, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On Fri, 24 Sept 2021 at 10:33, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2021/9/23 19:47, Ilias Apalodimas wrote:
> > On Thu, 23 Sept 2021 at 14:24, Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> On 2021/9/23 16:33, Ilias Apalodimas wrote:
> >>> On Wed, Sep 22, 2021 at 05:41:27PM +0800, Yunsheng Lin wrote:
> >>>> As the pp page for a skb frag is always a head page, make
> >>>> sure skb_pp_recycle() passes a head page to avoid calling
> >>>> compound_head() for the skb frag page case.
> >>>
> >>> Doesn't that rely on the driver mostly (i.e. what's passed in skb_frag_set_page())?
> >>> None of the current netstack code assumes bv_page is the head page of a
> >>> compound page.  Since our page_pool allocator will allocate compound
> >>> pages for order > 0, why should we rely on it?
> >>
> >> As the page pool alloc function returns a 'struct page *' to the caller, which
> >> is the head page of a compound page for order > 0, I assume the caller
> >> will pass that to skb_frag_set_page().
> >
> > Yea, that's exactly the assumption I was afraid of.
> > Sure, not passing the head page might seem weird atm and the assumption
> > stands, but the point is we shouldn't blow up the entire network stack
> > if someone eventually does that.
> >
> >>
> >> For a non-pp page, I assume it is ok whether the page is a head page or a tail
> >> page, as the pp_magic for both of them is not set with PP_SIGNATURE.
> >
> > Yea, that's true. Although we removed the checking for coalescing
> > recyclable and non-recyclable SKBs, the next patch first checks the
> > signature before trying to do anything with the skb.
> >
> >>
> >> Or should we play it safe here, and do the trick as skb_free_head() does in
> >> patch 6?
> >
> > I don't think the &1 will even be measurable, so I'd suggest just
> > dropping this and playing it safe?
>
> I am not sure what the '&1' above means.

I meant the check compound_head() is doing before deciding on the head page.

>
> The one thing I am not sure about with the trick done in patch 6 is
> whether __page_frag_cache_drain() is the right API to use here; I used it
> because it is the only API that is expecting a head page.

Yea, it seemed a bit funny to me in the first place, until I figured out
exactly what it was doing.
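
For reference, a simplified sketch of what __page_frag_cache_drain() boils
down to in mm/page_alloc.c: it drops 'count' references on the page and
frees the head page once they reach zero (free_the_page() being
page_alloc's internal free helper):

	void __page_frag_cache_drain(struct page *page, unsigned int count)
	{
		if (page_ref_sub_and_test(page, count))
			free_the_page(page, compound_order(page));
	}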

Regards
/Ilias
>
> >
> > Cheers
> > /Ilias
> >>
> >>>
> >>> Thanks
> >>> /Ilias
> >>>>
> >>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >>>> ---
> >>>>  include/linux/skbuff.h | 2 +-
> >>>>  net/core/page_pool.c   | 2 --
> >>>>  2 files changed, 1 insertion(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >>>> index 6bdb0db3e825..35eebc2310a5 100644
> >>>> --- a/include/linux/skbuff.h
> >>>> +++ b/include/linux/skbuff.h
> >>>> @@ -4722,7 +4722,7 @@ static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
> >>>>  {
> >>>>      if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
> >>>>              return false;
> >>>> -    return page_pool_return_skb_page(virt_to_page(data));
> >>>> +    return page_pool_return_skb_page(virt_to_head_page(data));
> >>>>  }
> >>>>
> >>>>  #endif      /* __KERNEL__ */
> >>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> >>>> index f7e71dcb6a2e..357fb53343a0 100644
> >>>> --- a/net/core/page_pool.c
> >>>> +++ b/net/core/page_pool.c
> >>>> @@ -742,8 +742,6 @@ bool page_pool_return_skb_page(struct page *page)
> >>>>  {
> >>>>      struct page_pool *pp;
> >>>>
> >>>> -    page = compound_head(page);
> >>>> -
> >>>>      /* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> >>>>       * in order to preserve any existing bits, such as bit 0 for the
> >>>>       * head page of compound page and bit 1 for pfmemalloc page, so
> >>>> --
> >>>> 2.33.0
> >>>>
> >>> .
> >>>
> > .
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-09-24  7:25             ` Ilias Apalodimas
@ 2021-09-24  8:01               ` Yunsheng Lin
  0 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-24  8:01 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Networking, open list, linuxarm,
	Jesper Dangaard Brouer, Jonathan Lemon, Alexander Lobakin,
	Willem de Bruijn, Cong Wang, Paolo Abeni, Kevin Hao,
	Aleksandr Nogikh, Marco Elver, memxor, Eric Dumazet,
	Alexander Duyck, David Ahern

On 2021/9/24 15:25, Ilias Apalodimas wrote:
> On Fri, 24 Sept 2021 at 10:04, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2021/9/23 21:07, Ilias Apalodimas wrote:
>>> On Thu, Sep 23, 2021 at 07:13:11PM +0800, Yunsheng Lin wrote:
>>>> On 2021/9/23 18:02, Ilias Apalodimas wrote:
>>>>> Hi Jesper,
>>>>>
>>>>> On Thu, 23 Sept 2021 at 12:33, Jesper Dangaard Brouer
>>>>> <jbrouer@redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 22/09/2021 11.41, Yunsheng Lin wrote:
>>>>>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>>>>>> index 1a6978427d6c..a65bd7972e37 100644
>>>>>>> --- a/net/core/page_pool.c
>>>>>>> +++ b/net/core/page_pool.c
>>>>>>> @@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
>>>>>>>        * which is the XDP_TX use-case.
>>>>>>>        */
>>>>>>>       if (pool->p.flags & PP_FLAG_DMA_MAP) {
>>>>>>> +             /* DMA-mapping is not supported on 32-bit systems with
>>>>>>> +              * 64-bit DMA mapping.
>>>>>>> +              */
>>>>>>> +             if (sizeof(dma_addr_t) > sizeof(unsigned long))
>>>>>>> +                     return -EINVAL;
>>>>>>
>>>>>> As I said before, can we please use another error than EINVAL.
>>>>>> We should give drivers a chance/ability to detect this error, and e.g.
>>>>>> fallback to doing DMA mappings inside driver instead.
>>>>>>
>>>>>> I suggest using EOPNOTSUPP 95 (Operation not supported).
>>>>
>>>> Will change it to EOPNOTSUPP, thanks.
>>>
>>> Mind sending this one separately (and you can keep my reviewed-by)?  It
>>> fits nicely on its own, and since I am not sure about the rest of the
>>> changes yet, it would be nice to get this one in.
>>
>> I am not sure sending this one separately really makes sense, as it is
>> mainly used to make supporting the "keep track of pp page when __skb_frag_ref()
>> is called" in patch 5 easier.
> 
> It rips out support for devices that are 32-bit and have 64-bit DMA, and
> makes the whole code easier to follow.  I thought we agreed on removing
> the support for those devices regardless, didn't we?

I am actually not convinced by the code around PAGE_POOL_DMA_USE_PP_FRAG_COUNT
(maybe the name is somewhat confusing) as it is now, but that is what it becomes
after adding patch 5, and it seems we are also not handling the skb_split() case
in tso_fragment() for 32-bit arches with 64-bit DMA if we still keep the
PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro.
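
For context, a sketch of what that macro boils down to: a compile-time
test for the 32-bit-arch-with-64-bit-DMA combination that patch 1
rejects at init time:

	#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
		(sizeof(dma_addr_t) > sizeof(unsigned long))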


> 
> Regards
> /Ilias
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Linuxarm] Re: [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG
  2021-09-24  7:23     ` Yunsheng Lin
@ 2021-09-30  7:28       ` Yunsheng Lin
  0 siblings, 0 replies; 26+ messages in thread
From: Yunsheng Lin @ 2021-09-30  7:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba
  Cc: brouer, netdev, linux-kernel, linuxarm, hawk, ilias.apalodimas,
	jonathan.lemon, alobakin, willemb, cong.wang, pabeni, haokexin,
	nogikh, elver, memxor, edumazet, alexander.duyck, dsahern

On 2021/9/24 15:23, Yunsheng Lin wrote:
> On 2021/9/23 20:08, Jesper Dangaard Brouer wrote:
>>
>>
>> On 22/09/2021 11.41, Yunsheng Lin wrote:
>>> Currently when PP_FLAG_PAGE_FRAG is set, the caller is not
>>> expected to call page_pool_alloc_pages() directly because of
>>> the PP_FLAG_PAGE_FRAG checking in __page_pool_put_page().
>>>
>>> The patch removes the above checking to enable non-split page
>>> support when PP_FLAG_PAGE_FRAG is set.
>>>
>>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>> ---
>>>   net/core/page_pool.c | 12 +++++++-----
>>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>>> index a65bd7972e37..f7e71dcb6a2e 100644
>>> --- a/net/core/page_pool.c
>>> +++ b/net/core/page_pool.c
>>> @@ -315,11 +315,14 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
>>>         /* Fast-path: Get a page from cache */
>>>       page = __page_pool_get_cached(pool);
>>> -    if (page)
>>> -        return page;
>>>         /* Slow-path: cache empty, do real allocation */
>>> -    page = __page_pool_alloc_pages_slow(pool, gfp);
>>> +    if (!page)
>>> +        page = __page_pool_alloc_pages_slow(pool, gfp);
>>> +
>>> +    if (likely(page))
>>> +        page_pool_set_frag_count(page, 1);
>>> +
>>
>> I really don't like that you add one atomic_long_set operation per page alloc call.
>> This is a fast-path for XDP use-cases, which you are ignoring as your drivers don't implement XDP.
>>
>> As I cannot ask you to run XDP benchmarks, I fortunately have some page_pool specific microbenchmarks you can run instead.
>>
>> I will ask you to provide before and after results from running these benchmarks [1] and [2].
>>
>>  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
>>
>>  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_cross_cpu.c
>>
>> How to use these modules is documented here[3]:
>>  [3] https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html
> 
> Will run these benchmarks to see if there is any noticeable performance
> overhead here, thanks for the benchmarks.

You are right, there is noticeable overhead for the bench_page_pool_cross_cpu test
case above, possibly due to the cache bouncing caused by page_pool_set_frag_count().

As mentioned by Ilias, mlx5 uses page pool and also does the recycling
internally, so handling the page frag tracking consistently for both the
PP_FLAG_PAGE_FRAG and non-PP_FLAG_PAGE_FRAG cases would break the mlx5 driver.

So I will drop this patch for now.

> 
>>
>>>       return page;
>>>   }
>>>   EXPORT_SYMBOL(page_pool_alloc_pages);
>>> @@ -428,8 +431,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
>>>                unsigned int dma_sync_size, bool allow_direct)
>>>   {
>>>       /* It is not the last user for the page frag case */
>>> -    if (pool->p.flags & PP_FLAG_PAGE_FRAG &&
>>> -        page_pool_atomic_sub_frag_count_return(page, 1))
>>> +    if (page_pool_atomic_sub_frag_count_return(page, 1))
>>>           return NULL;
>>
>> This adds an atomic_long_read, even when PP_FLAG_PAGE_FRAG is not set.
> 
> The point here is to have consistent handling for both the PP_FLAG_PAGE_FRAG
> and non-PP_FLAG_PAGE_FRAG cases in the following patch.
> 
> As page->_refcount is accessed by the "page_ref_count(page) == 1" check
> in __page_pool_put_page(), and page->pp_frag_count is most likely in the
> same cache line as page->_refcount, I am not expecting a noticeable
> overhead here.
> 
> Anyway, will use the above benchmarks as an example to verify it.
> 
> 
>>
>>>         /* This allocator is optimized for the XDP mode that uses
>>>
>>
>> .
>>
> _______________________________________________
> Linuxarm mailing list -- linuxarm@openeuler.org
> To unsubscribe send an email to linuxarm-leave@openeuler.org
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-09-30  7:28 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22  9:41 [PATCH net-next 0/7] some optimization for page pool Yunsheng Lin
2021-09-22  9:41 ` [PATCH net-next 1/7] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
2021-09-23  9:10   ` Ilias Apalodimas
2021-09-23  9:33   ` Jesper Dangaard Brouer
2021-09-23 10:02     ` Ilias Apalodimas
2021-09-23 11:13       ` Yunsheng Lin
2021-09-23 13:07         ` Ilias Apalodimas
2021-09-24  7:04           ` Yunsheng Lin
2021-09-24  7:25             ` Ilias Apalodimas
2021-09-24  8:01               ` Yunsheng Lin
2021-09-22  9:41 ` [PATCH net-next 2/7] page_pool: support non-split page with PP_FLAG_PAGE_FRAG Yunsheng Lin
2021-09-23 12:08   ` Jesper Dangaard Brouer
2021-09-24  7:23     ` Yunsheng Lin
2021-09-30  7:28       ` [Linuxarm] " Yunsheng Lin
2021-09-22  9:41 ` [PATCH net-next 3/7] pool_pool: avoid calling compound_head() for skb frag page Yunsheng Lin
2021-09-23  8:33   ` Ilias Apalodimas
2021-09-23 11:24     ` Yunsheng Lin
2021-09-23 11:47       ` Ilias Apalodimas
2021-09-24  7:33         ` Yunsheng Lin
2021-09-24  7:44           ` Ilias Apalodimas
2021-09-22  9:41 ` [PATCH net-next 4/7] page_pool: change BIAS_MAX to support incrementing Yunsheng Lin
2021-09-22  9:41 ` [PATCH net-next 5/7] skbuff: keep track of pp page when __skb_frag_ref() is called Yunsheng Lin
2021-09-22  9:41 ` [PATCH net-next 6/7] skbuff: only use pp_magic identifier for a skb' head page Yunsheng Lin
2021-09-22  9:41 ` [PATCH net-next 7/7] skbuff: remove unused skb->pp_recycle Yunsheng Lin
2021-09-23  7:07 ` [PATCH net-next 0/7] some optimization for page pool Ilias Apalodimas
2021-09-23 11:12   ` Yunsheng Lin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.