All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next -v5 0/4] some optimization for page pool
@ 2021-10-09  9:37 Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 1/4] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-09  9:37 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, akpm, hawk, ilias.apalodimas,
	peterz, yuzhao, jhubbard, will, willy, jgg, mcroce, willemb,
	cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

Patch 1: disable dma mapping support for 32-bit arch with 64-bit
         DMA.
Patch 2 - 4: pp page frag tracking support

The small packet drop test show no notiable performance degradation
when page pool is disabled.

V5: Keep the put_page()/get_page() semantics.

V4:
    1. Change error code to EOPNOTSUPP in patch 1.
    2. Drop patch 2.
    3. Use pp_frag_count to indicate if a pp page can be tracked,
       to avoid breaking the mlx5 driver.

V3:
    1. add patch 1/4/6/7.
    2. use pp_magic to identify pp page uniquely too.
    3. avoid unnecessary compound_head() calling.

V2: add patch 2, adjust the commit log accroding to the discussion
    in V1, and fix a compiler error reported by kernel test robot.

Yunsheng Lin (4):
  page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  page_pool: change BIAS_MAX to support incrementing
  mm: introduce __get_page() and __put_page()
  skbuff: keep track of pp page when pp_frag_count is used

 include/linux/mm.h       | 21 ++++++++++++++-------
 include/linux/mm_types.h | 13 +------------
 include/linux/skbuff.h   | 30 ++++++++++++++++++++----------
 include/net/page_pool.h  | 36 ++++++++++++++++++++++++------------
 mm/swap.c                |  6 +++---
 net/core/page_pool.c     | 29 +++++++++--------------------
 net/core/skbuff.c        | 10 ++++++++--
 7 files changed, 79 insertions(+), 66 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next -v5 1/4] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA
  2021-10-09  9:37 [PATCH net-next -v5 0/4] some optimization for page pool Yunsheng Lin
@ 2021-10-09  9:37 ` Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 2/4] page_pool: change BIAS_MAX to support incrementing Yunsheng Lin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-09  9:37 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, akpm, hawk, ilias.apalodimas,
	peterz, yuzhao, jhubbard, will, willy, jgg, mcroce, willemb,
	cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

As the 32-bit arch with 64-bit DMA seems to rare those days,
and page pool is carrying a lot of code and complexity for
systems that possibly don't exist when the pp page frag
tracking support is added.

So disable dma mapping support for such systems, if drivers
really want to work on such systems, they have to implement
their own DMA-mapping fallback tracking outside page_pool.

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/mm_types.h | 13 +------------
 include/net/page_pool.h  | 12 +-----------
 net/core/page_pool.c     | 10 ++++++----
 3 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7f8ee09c711f..436e0946d691 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -104,18 +104,7 @@ struct page {
 			struct page_pool *pp;
 			unsigned long _pp_mapping_pad;
 			unsigned long dma_addr;
-			union {
-				/**
-				 * dma_addr_upper: might require a 64-bit
-				 * value on 32-bit architectures.
-				 */
-				unsigned long dma_addr_upper;
-				/**
-				 * For frag page support, not supported in
-				 * 32-bit architectures with 64-bit DMA.
-				 */
-				atomic_long_t pp_frag_count;
-			};
+			atomic_long_t pp_frag_count;
 		};
 		struct {	/* slab, slob and slub */
 			union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index a4082406a003..3855f069627f 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -216,24 +216,14 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
-		(sizeof(dma_addr_t) > sizeof(unsigned long))
-
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-	dma_addr_t ret = page->dma_addr;
-
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
-
-	return ret;
+	return page->dma_addr;
 }
 
 static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
 	page->dma_addr = addr;
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		page->dma_addr_upper = upper_32_bits(addr);
 }
 
 static inline void page_pool_set_frag_count(struct page *page, long nr)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a6978427d6c..9b60e4301a44 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -49,6 +49,12 @@ static int page_pool_init(struct page_pool *pool,
 	 * which is the XDP_TX use-case.
 	 */
 	if (pool->p.flags & PP_FLAG_DMA_MAP) {
+		/* DMA-mapping is not supported on 32-bit systems with
+		 * 64-bit DMA mapping.
+		 */
+		if (sizeof(dma_addr_t) > sizeof(unsigned long))
+			return -EOPNOTSUPP;
+
 		if ((pool->p.dma_dir != DMA_FROM_DEVICE) &&
 		    (pool->p.dma_dir != DMA_BIDIRECTIONAL))
 			return -EINVAL;
@@ -69,10 +75,6 @@ static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
-		return -EINVAL;
-
 	if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0)
 		return -ENOMEM;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next -v5 2/4] page_pool: change BIAS_MAX to support incrementing
  2021-10-09  9:37 [PATCH net-next -v5 0/4] some optimization for page pool Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 1/4] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2021-10-09  9:37 ` Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page() Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used Yunsheng Lin
  3 siblings, 0 replies; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-09  9:37 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, akpm, hawk, ilias.apalodimas,
	peterz, yuzhao, jhubbard, will, willy, jgg, mcroce, willemb,
	cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

As the page->pp_frag_count need incrementing for pp page
frag tracking support, so change BIAS_MAX to (LONG_MAX / 2)
to avoid overflowing.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9b60e4301a44..2c643b72ce16 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -24,7 +24,7 @@
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
 
-#define BIAS_MAX	LONG_MAX
+#define BIAS_MAX	(LONG_MAX / 2)
 
 static int page_pool_init(struct page_pool *pool,
 			  const struct page_pool_params *params)
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-09  9:37 [PATCH net-next -v5 0/4] some optimization for page pool Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 1/4] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
  2021-10-09  9:37 ` [PATCH net-next -v5 2/4] page_pool: change BIAS_MAX to support incrementing Yunsheng Lin
@ 2021-10-09  9:37 ` Yunsheng Lin
  2021-10-09 19:49   ` John Hubbard
  2021-10-09  9:37 ` [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used Yunsheng Lin
  3 siblings, 1 reply; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-09  9:37 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, akpm, hawk, ilias.apalodimas,
	peterz, yuzhao, jhubbard, will, willy, jgg, mcroce, willemb,
	cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

Introduce __get_page() and __put_page() to operate on the
base page or head of a compound page for the cases when a
page is known to be a base page or head of a compound page.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/mm.h | 21 ++++++++++++++-------
 mm/swap.c          |  6 +++---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..5683313c3e9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -902,7 +902,7 @@ static inline struct page *virt_to_head_page(const void *x)
 	return compound_head(page);
 }
 
-void __put_page(struct page *page);
+void __put_single_or_compound_page(struct page *page);
 
 void put_pages_list(struct list_head *pages);
 
@@ -1203,9 +1203,8 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
 #define page_ref_zero_or_close_to_overflow(page) \
 	((unsigned int) page_ref_count(page) + 127u <= 127u)
 
-static inline void get_page(struct page *page)
+static inline void __get_page(struct page *page)
 {
-	page = compound_head(page);
 	/*
 	 * Getting a normal page or the head of a compound page
 	 * requires to already have an elevated page->_refcount.
@@ -1214,6 +1213,11 @@ static inline void get_page(struct page *page)
 	page_ref_inc(page);
 }
 
+static inline void get_page(struct page *page)
+{
+	__get_page(compound_head(page));
+}
+
 bool __must_check try_grab_page(struct page *page, unsigned int flags);
 struct page *try_grab_compound_head(struct page *page, int refs,
 				    unsigned int flags);
@@ -1228,10 +1232,8 @@ static inline __must_check bool try_get_page(struct page *page)
 	return true;
 }
 
-static inline void put_page(struct page *page)
+static inline void __put_page(struct page *page)
 {
-	page = compound_head(page);
-
 	/*
 	 * For devmap managed pages we need to catch refcount transition from
 	 * 2 to 1, when refcount reach one it means the page is free and we
@@ -1244,7 +1246,12 @@ static inline void put_page(struct page *page)
 	}
 
 	if (put_page_testzero(page))
-		__put_page(page);
+		__put_single_or_compound_page(page);
+}
+
+static inline void put_page(struct page *page)
+{
+	__put_page(compound_head(page));
 }
 
 /*
diff --git a/mm/swap.c b/mm/swap.c
index af3cad4e5378..565cbde1caea 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -111,7 +111,7 @@ static void __put_compound_page(struct page *page)
 	destroy_compound_page(page);
 }
 
-void __put_page(struct page *page)
+void __put_single_or_compound_page(struct page *page)
 {
 	if (is_zone_device_page(page)) {
 		put_dev_pagemap(page->pgmap);
@@ -128,7 +128,7 @@ void __put_page(struct page *page)
 	else
 		__put_single_page(page);
 }
-EXPORT_SYMBOL(__put_page);
+EXPORT_SYMBOL(__put_single_or_compound_page);
 
 /**
  * put_pages_list() - release a list of pages
@@ -1153,7 +1153,7 @@ void put_devmap_managed_page(struct page *page)
 	if (count == 1)
 		free_devmap_managed_page(page);
 	else if (!count)
-		__put_page(page);
+		__put_single_or_compound_page(page);
 }
 EXPORT_SYMBOL(put_devmap_managed_page);
 #endif
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used
  2021-10-09  9:37 [PATCH net-next -v5 0/4] some optimization for page pool Yunsheng Lin
                   ` (2 preceding siblings ...)
  2021-10-09  9:37 ` [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page() Yunsheng Lin
@ 2021-10-09  9:37 ` Yunsheng Lin
  2021-10-09 12:11     ` kernel test robot
  2021-10-09 12:12     ` kernel test robot
  3 siblings, 2 replies; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-09  9:37 UTC (permalink / raw)
  To: davem, kuba
  Cc: netdev, linux-kernel, linuxarm, akpm, hawk, ilias.apalodimas,
	peterz, yuzhao, jhubbard, will, willy, jgg, mcroce, willemb,
	cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

As the skb->pp_recycle and page->pp_magic may not be enough
to track if a frag page is from page pool after the calling
of __skb_frag_ref(), mostly because of a data race, see:
commit 2cc3aeb5eccc ("skbuff: Fix a potential race while
recycling page_pool packets").

There may be clone and expand head case that might lose the
track if a frag page is from page pool or not.

And not being able to keep track of pp page may cause problem
for the skb_split() case in tso_fragment() too:
Supposing a skb has 3 frag pages, all coming from a page pool,
and is split to skb1 and skb2:
skb1: first frag page + first half of second frag page
skb2: second half of second frag page + third frag page

How do we set the skb->pp_recycle of skb1 and skb2?
1. If we set both of them to 1, then we may have a similar
   race as the above commit for second frag page.
2. If we set only one of them to 1, then we may have resource
   leaking problem as both first frag page and third frag page
   are indeed from page pool.

Increment the pp_frag_count of pp page frag in __skb_frag_ref(),
and only use page->pp_magic to indicate a pp page frag in
__skb_frag_unref() to keep track of pp page frag.

Similar handling is done for the head page of a skb too.

As we need the head page of a compound page to decide if it is
from page pool at first, so __page_frag_cache_drain() and
page_ref_inc() is used to avoid unnecessary compound_head()
calling.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/skbuff.h  | 30 ++++++++++++++++++++----------
 include/net/page_pool.h | 24 +++++++++++++++++++++++-
 net/core/page_pool.c    | 17 ++---------------
 net/core/skbuff.c       | 10 ++++++++--
 4 files changed, 53 insertions(+), 28 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 841e2f0f5240..c4f8b04a694c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3073,7 +3073,19 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	struct page *page = skb_frag_page(frag);
+
+	page = compound_head(page);
+
+#ifdef CONFIG_PAGE_POOL
+	if (page_pool_is_pp_page(page) &&
+	    page_pool_is_pp_page_frag(page)) {
+		page_pool_atomic_inc_frag_count(page);
+		return;
+	}
+#endif
+
+	__get_page(page);
 }
 
 /**
@@ -3100,11 +3112,16 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
 {
 	struct page *page = skb_frag_page(frag);
 
+	page = compound_head(page);
+
 #ifdef CONFIG_PAGE_POOL
-	if (recycle && page_pool_return_skb_page(page))
+	if (page_pool_is_pp_page(page) &&
+	    (recycle || page_pool_is_pp_page_frag(page))) {
+		page_pool_return_skb_page(page);
 		return;
+	}
 #endif
-	put_page(page);
+	__put_page(page);
 }
 
 /**
@@ -4718,12 +4735,5 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
 }
 #endif
 
-static inline bool skb_pp_recycle(struct sk_buff *skb, void *data)
-{
-	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
-		return false;
-	return page_pool_return_skb_page(virt_to_page(data));
-}
-
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 3855f069627f..740a8ca7f4a6 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -164,7 +164,7 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
 	return pool->p.dma_dir;
 }
 
-bool page_pool_return_skb_page(struct page *page);
+void page_pool_return_skb_page(struct page *page);
 
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
@@ -231,6 +231,28 @@ static inline void page_pool_set_frag_count(struct page *page, long nr)
 	atomic_long_set(&page->pp_frag_count, nr);
 }
 
+static inline void page_pool_atomic_inc_frag_count(struct page *page)
+{
+	atomic_long_inc(&page->pp_frag_count);
+}
+
+static inline bool page_pool_is_pp_page(struct page *page)
+{
+	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
+	 * in order to preserve any existing bits, such as bit 0 for the
+	 * head page of compound page and bit 1 for pfmemalloc page, so
+	 * mask those bits for freeing side when doing below checking,
+	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
+	 * to avoid recycling the pfmemalloc page.
+	 */
+	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
+}
+
+static inline bool page_pool_is_pp_page_frag(struct page *page)
+{
+	return !!atomic_long_read(&page->pp_frag_count);
+}
+
 static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
 							  long nr)
 {
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2c643b72ce16..d141e00459c9 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -219,6 +219,7 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+	page_pool_set_frag_count(page, 0);
 }
 
 static void page_pool_clear_pp_info(struct page *page)
@@ -736,22 +737,10 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 }
 EXPORT_SYMBOL(page_pool_update_nid);
 
-bool page_pool_return_skb_page(struct page *page)
+void page_pool_return_skb_page(struct page *page)
 {
 	struct page_pool *pp;
 
-	page = compound_head(page);
-
-	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-	 * in order to preserve any existing bits, such as bit 0 for the
-	 * head page of compound page and bit 1 for pfmemalloc page, so
-	 * mask those bits for freeing side when doing below checking,
-	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-	 * to avoid recycling the pfmemalloc page.
-	 */
-	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
-		return false;
-
 	pp = page->pp;
 
 	/* Driver set this to memory recycling info. Reset it on recycle.
@@ -760,7 +749,5 @@ bool page_pool_return_skb_page(struct page *page)
 	 * 'flipped' fragment being in use or not.
 	 */
 	page_pool_put_full_page(pp, page, false);
-
-	return true;
 }
 EXPORT_SYMBOL(page_pool_return_skb_page);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 74601bbc56ac..e3691b025d30 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -646,9 +646,15 @@ static void skb_free_head(struct sk_buff *skb)
 	unsigned char *head = skb->head;
 
 	if (skb->head_frag) {
-		if (skb_pp_recycle(skb, head))
+		struct page *page = virt_to_head_page(head);
+
+		if (page_pool_is_pp_page(page) &&
+		    (skb->pp_recycle || page_pool_is_pp_page_frag(page))) {
+			page_pool_return_skb_page(page);
 			return;
-		skb_free_frag(head);
+		}
+
+		__page_frag_cache_drain(page, 1);
 	} else {
 		kfree(head);
 	}
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used
  2021-10-09  9:37 ` [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used Yunsheng Lin
@ 2021-10-09 12:11     ` kernel test robot
  2021-10-09 12:12     ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2021-10-09 12:11 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: kbuild-all, netdev, linux-kernel, linuxarm, akpm, hawk,
	ilias.apalodimas, peterz, yuzhao

[-- Attachment #1: Type: text/plain, Size: 1515 bytes --]

Hi Yunsheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git f12e658c620a925aba4caead54a05eb157728863
config: ia64-defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
        git checkout d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ia64-linux-ld: net/core/skbuff.o: in function `skb_free_head':
>> skbuff.c:(.text+0x6c92): undefined reference to `page_pool_return_skb_page'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19974 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used
@ 2021-10-09 12:11     ` kernel test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2021-10-09 12:11 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1549 bytes --]

Hi Yunsheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git f12e658c620a925aba4caead54a05eb157728863
config: ia64-defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
        git checkout d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=ia64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ia64-linux-ld: net/core/skbuff.o: in function `skb_free_head':
>> skbuff.c:(.text+0x6c92): undefined reference to `page_pool_return_skb_page'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 19974 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used
  2021-10-09  9:37 ` [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used Yunsheng Lin
@ 2021-10-09 12:12     ` kernel test robot
  2021-10-09 12:12     ` kernel test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2021-10-09 12:12 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: kbuild-all, netdev, linux-kernel, linuxarm, akpm, hawk,
	ilias.apalodimas, peterz, yuzhao

[-- Attachment #1: Type: text/plain, Size: 1867 bytes --]

Hi Yunsheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git f12e658c620a925aba4caead54a05eb157728863
config: i386-randconfig-a001-20211009 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
        git checkout d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: net/core/skbuff.o: in function `skb_free_head':
>> net/core/skbuff.c:653: undefined reference to `page_pool_return_skb_page'


vim +653 net/core/skbuff.c

   643	
   644	static void skb_free_head(struct sk_buff *skb)
   645	{
   646		unsigned char *head = skb->head;
   647	
   648		if (skb->head_frag) {
   649			struct page *page = virt_to_head_page(head);
   650	
   651			if (page_pool_is_pp_page(page) &&
   652			    (skb->pp_recycle || page_pool_is_pp_page_frag(page))) {
 > 653				page_pool_return_skb_page(page);
   654				return;
   655			}
   656	
   657			__page_frag_cache_drain(page, 1);
   658		} else {
   659			kfree(head);
   660		}
   661	}
   662	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34063 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used
@ 2021-10-09 12:12     ` kernel test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2021-10-09 12:12 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

Hi Yunsheng,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git f12e658c620a925aba4caead54a05eb157728863
config: i386-randconfig-a001-20211009 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yunsheng-Lin/some-optimization-for-page-pool/20211009-174023
        git checkout d58490b58ef2cfa172d5e1ff2d13b713a6d38969
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: net/core/skbuff.o: in function `skb_free_head':
>> net/core/skbuff.c:653: undefined reference to `page_pool_return_skb_page'


vim +653 net/core/skbuff.c

   643	
   644	static void skb_free_head(struct sk_buff *skb)
   645	{
   646		unsigned char *head = skb->head;
   647	
   648		if (skb->head_frag) {
   649			struct page *page = virt_to_head_page(head);
   650	
   651			if (page_pool_is_pp_page(page) &&
   652			    (skb->pp_recycle || page_pool_is_pp_page_frag(page))) {
 > 653				page_pool_return_skb_page(page);
   654				return;
   655			}
   656	
   657			__page_frag_cache_drain(page, 1);
   658		} else {
   659			kfree(head);
   660		}
   661	}
   662	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 34063 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-09  9:37 ` [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page() Yunsheng Lin
@ 2021-10-09 19:49   ` John Hubbard
  2021-10-09 20:15     ` Matthew Wilcox
  2021-10-11 12:25     ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 16+ messages in thread
From: John Hubbard @ 2021-10-09 19:49 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: netdev, linux-kernel, linuxarm, akpm, hawk, ilias.apalodimas,
	peterz, yuzhao, will, willy, jgg, mcroce, willemb, cong.wang,
	pabeni, haokexin, nogikh, elver, memxor, vvs, linux-mm, edumazet,
	alexander.duyck, dsahern

On 10/9/21 02:37, Yunsheng Lin wrote:
> Introduce __get_page() and __put_page() to operate on the
> base page or head of a compound page for the cases when a
> page is known to be a base page or head of a compound page.

Hi,

I wonder if you are aware of a much larger, 137-patch seriesto do that:
folio/pageset [1]?

The naming you are proposing here does not really improve clarity. There
is nothing about __get_page() that makes it clear that it's meant only
for head/base pages, while get_page() tail pages as well. And the
well-known and widely used get_page() and put_page() get their meaning
shifted.

This area is hard to get right, and that's why there have been 15
versions, and a lot of contention associated with [1]. If you have an
alternate approach, I think it would be better in its own separate
series, with a cover letter that, at a minimum, explains how it compares
to folios/pagesets.

So in case it's not clear, I'd like to request that you drop this one
patch from your series.


[1] https://lore.kernel.org/r/YSPwmNNuuQhXNToQ@casper.infradead.org

thanks,
-- 
John Hubbard
NVIDIA

> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>   include/linux/mm.h | 21 ++++++++++++++-------
>   mm/swap.c          |  6 +++---
>   2 files changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..5683313c3e9d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -902,7 +902,7 @@ static inline struct page *virt_to_head_page(const void *x)
>   	return compound_head(page);
>   }
>   
> -void __put_page(struct page *page);
> +void __put_single_or_compound_page(struct page *page);
>   
>   void put_pages_list(struct list_head *pages);
>   
> @@ -1203,9 +1203,8 @@ static inline bool is_pci_p2pdma_page(const struct page *page)
>   #define page_ref_zero_or_close_to_overflow(page) \
>   	((unsigned int) page_ref_count(page) + 127u <= 127u)
>   
> -static inline void get_page(struct page *page)
> +static inline void __get_page(struct page *page)
>   {
> -	page = compound_head(page);
>   	/*
>   	 * Getting a normal page or the head of a compound page
>   	 * requires to already have an elevated page->_refcount.
> @@ -1214,6 +1213,11 @@ static inline void get_page(struct page *page)
>   	page_ref_inc(page);
>   }
>   
> +static inline void get_page(struct page *page)
> +{
> +	__get_page(compound_head(page));
> +}
> +
>   bool __must_check try_grab_page(struct page *page, unsigned int flags);
>   struct page *try_grab_compound_head(struct page *page, int refs,
>   				    unsigned int flags);
> @@ -1228,10 +1232,8 @@ static inline __must_check bool try_get_page(struct page *page)
>   	return true;
>   }
>   
> -static inline void put_page(struct page *page)
> +static inline void __put_page(struct page *page)
>   {
> -	page = compound_head(page);
> -
>   	/*
>   	 * For devmap managed pages we need to catch refcount transition from
>   	 * 2 to 1, when refcount reach one it means the page is free and we
> @@ -1244,7 +1246,12 @@ static inline void put_page(struct page *page)
>   	}
>   
>   	if (put_page_testzero(page))
> -		__put_page(page);
> +		__put_single_or_compound_page(page);
> +}
> +
> +static inline void put_page(struct page *page)
> +{
> +	__put_page(compound_head(page));
>   }
>   
>   /*
> diff --git a/mm/swap.c b/mm/swap.c
> index af3cad4e5378..565cbde1caea 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -111,7 +111,7 @@ static void __put_compound_page(struct page *page)
>   	destroy_compound_page(page);
>   }
>   
> -void __put_page(struct page *page)
> +void __put_single_or_compound_page(struct page *page)
>   {
>   	if (is_zone_device_page(page)) {
>   		put_dev_pagemap(page->pgmap);
> @@ -128,7 +128,7 @@ void __put_page(struct page *page)
>   	else
>   		__put_single_page(page);
>   }
> -EXPORT_SYMBOL(__put_page);
> +EXPORT_SYMBOL(__put_single_or_compound_page);
>   
>   /**
>    * put_pages_list() - release a list of pages
> @@ -1153,7 +1153,7 @@ void put_devmap_managed_page(struct page *page)
>   	if (count == 1)
>   		free_devmap_managed_page(page);
>   	else if (!count)
> -		__put_page(page);
> +		__put_single_or_compound_page(page);
>   }
>   EXPORT_SYMBOL(put_devmap_managed_page);
>   #endif
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-09 19:49   ` John Hubbard
@ 2021-10-09 20:15     ` Matthew Wilcox
  2021-10-11  6:37       ` Yunsheng Lin
  2021-10-11 12:25     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2021-10-09 20:15 UTC (permalink / raw)
  To: John Hubbard
  Cc: Yunsheng Lin, davem, kuba, netdev, linux-kernel, linuxarm, akpm,
	hawk, ilias.apalodimas, peterz, yuzhao, will, jgg, mcroce,
	willemb, cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

On Sat, Oct 09, 2021 at 12:49:29PM -0700, John Hubbard wrote:
> On 10/9/21 02:37, Yunsheng Lin wrote:
> > Introduce __get_page() and __put_page() to operate on the
> > base page or head of a compound page for the cases when a
> > page is known to be a base page or head of a compound page.
> 
> Hi,
> 
> I wonder if you are aware of a much larger, 137-patch seriesto do that:
> folio/pageset [1]?
> 
> The naming you are proposing here does not really improve clarity. There
> is nothing about __get_page() that makes it clear that it's meant only
> for head/base pages, while get_page() tail pages as well. And the
> well-known and widely used get_page() and put_page() get their meaning
> shifted.
> 
> This area is hard to get right, and that's why there have been 15
> versions, and a lot of contention associated with [1]. If you have an
> alternate approach, I think it would be better in its own separate
> series, with a cover letter that, at a minimum, explains how it compares
> to folios/pagesets.

I wasn't initially sure whether network pagepools should be part of
struct folio or should be their own separate type.  At this point, I
think they should be a folio.  But that's all kind of irrelevant until
Linus decides whether he's going to take the folio patchset or not.
Feel free to let him know your opinion when the inevitable argument
blows up again around the next pull request.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-09 20:15     ` Matthew Wilcox
@ 2021-10-11  6:37       ` Yunsheng Lin
  0 siblings, 0 replies; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-11  6:37 UTC (permalink / raw)
  To: Matthew Wilcox, John Hubbard
  Cc: davem, kuba, netdev, linux-kernel, linuxarm, akpm, hawk,
	ilias.apalodimas, peterz, yuzhao, will, jgg, mcroce, willemb,
	cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

On 2021/10/10 4:15, Matthew Wilcox wrote:
> On Sat, Oct 09, 2021 at 12:49:29PM -0700, John Hubbard wrote:
>> On 10/9/21 02:37, Yunsheng Lin wrote:
>>> Introduce __get_page() and __put_page() to operate on the
>>> base page or head of a compound page for the cases when a
>>> page is known to be a base page or head of a compound page.
>>
>> Hi,
>>
>> I wonder if you are aware of a much larger, 137-patch seriesto do that:
>> folio/pageset [1]?
>>
>> The naming you are proposing here does not really improve clarity. There
>> is nothing about __get_page() that makes it clear that it's meant only
>> for head/base pages, while get_page() tail pages as well. And the
>> well-known and widely used get_page() and put_page() get their meaning
>> shifted.
>>
>> This area is hard to get right, and that's why there have been 15
>> versions, and a lot of contention associated with [1]. If you have an
>> alternate approach, I think it would be better in its own separate
>> series, with a cover letter that, at a minimum, explains how it compares
>> to folios/pagesets.

As I was not familiar enough with mm, so I tried the semantic of
__page_frag_cache_drain(), which expects a base page or the head
page of a compound page too.

I suppose we may need to put a BUG_ON() to catch the case of
user passing a tail page accidentally, which is a run time error
and has run time overhead?
And adding a new type like folio will allow the compiler to
catch the error without any overhead?

> 
> I wasn't initially sure whether network pagepools should be part of
> struct folio or should be their own separate type.  At this point, I

Actually only a few driver are using page pool now, and others are mostly
using page allocator directly, see page_frag_alloc_align() and
skb_page_frag_refill(), only changing the page pool does not seems helpful
here, maybe the whole network stack should be using a new type like folio,
as the netstask does not need to deal with tail page directly? And it seems
virt_to_page() is one of the things need handling when netstack is changed
to use a new type like folio?

> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-09 19:49   ` John Hubbard
  2021-10-09 20:15     ` Matthew Wilcox
@ 2021-10-11 12:25     ` Jesper Dangaard Brouer
  2021-10-11 12:29       ` Ilias Apalodimas
  1 sibling, 1 reply; 16+ messages in thread
From: Jesper Dangaard Brouer @ 2021-10-11 12:25 UTC (permalink / raw)
  To: John Hubbard, Yunsheng Lin, davem, kuba
  Cc: brouer, netdev, linux-kernel, linuxarm, akpm, hawk,
	ilias.apalodimas, peterz, yuzhao, will, willy, jgg, mcroce,
	willemb, cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern



On 09/10/2021 21.49, John Hubbard wrote:
> So in case it's not clear, I'd like to request that you drop this one
> patch from your series.

In my opinion as page_pool maintainer, you should also drop patch 4/4 
from this series.

I like the first two patches, and they should be resend and can be 
applied without too much further discussion.

--Jesper


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-11 12:25     ` Jesper Dangaard Brouer
@ 2021-10-11 12:29       ` Ilias Apalodimas
  2021-10-12  7:38         ` Yunsheng Lin
  0 siblings, 1 reply; 16+ messages in thread
From: Ilias Apalodimas @ 2021-10-11 12:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: John Hubbard, Yunsheng Lin, davem, kuba, brouer, netdev,
	linux-kernel, linuxarm, akpm, hawk, peterz, yuzhao, will, willy,
	jgg, mcroce, willemb, cong.wang, pabeni, haokexin, nogikh, elver,
	memxor, vvs, linux-mm, edumazet, alexander.duyck, dsahern

On Mon, Oct 11, 2021 at 02:25:08PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 09/10/2021 21.49, John Hubbard wrote:
> > So in case it's not clear, I'd like to request that you drop this one
> > patch from your series.
> 
> In my opinion as page_pool maintainer, you should also drop patch 4/4 from
> this series.
> 
> I like the first two patches, and they should be resend and can be applied
> without too much further discussion.

+1
That's what I hinted on the previous version. The patches right now go way
beyond the spec of page pool.  We are starting to change core networking
functions and imho we need a lot more people involved in this discussion,
than the ones participating already.

As a general note and the reason I am so hesitant,  is that we are starting
to violate layers here (at least in my opinion).  When the recycling was
added,  my main concern was to keep the network stack unaware (apart from
the skb bit).  Now suddenly we need to teach frag_ref/unref internal page
pool counters and that doesn't feel right.  We first need to prove the race
can actually happen, before starting to change things.

Regards
/Ilias
> 
> --Jesper
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-11 12:29       ` Ilias Apalodimas
@ 2021-10-12  7:38         ` Yunsheng Lin
  2021-10-12  7:49           ` Ilias Apalodimas
  0 siblings, 1 reply; 16+ messages in thread
From: Yunsheng Lin @ 2021-10-12  7:38 UTC (permalink / raw)
  To: Ilias Apalodimas, Jesper Dangaard Brouer
  Cc: John Hubbard, davem, kuba, brouer, netdev, linux-kernel,
	linuxarm, akpm, hawk, peterz, yuzhao, will, willy, jgg, mcroce,
	willemb, cong.wang, pabeni, haokexin, nogikh, elver, memxor, vvs,
	linux-mm, edumazet, alexander.duyck, dsahern

On 2021/10/11 20:29, Ilias Apalodimas wrote:
> On Mon, Oct 11, 2021 at 02:25:08PM +0200, Jesper Dangaard Brouer wrote:
>>
>>
>> On 09/10/2021 21.49, John Hubbard wrote:
>>> So in case it's not clear, I'd like to request that you drop this one
>>> patch from your series.
>>
>> In my opinion as page_pool maintainer, you should also drop patch 4/4 from
>> this series.
>>
>> I like the first two patches, and they should be resend and can be applied
>> without too much further discussion.
> 
> +1

Ok, it seems there is a lot of contention about how to avoid calling
compound_head() now.

Will send out the uncontroversial one first.

> That's what I hinted on the previous version. The patches right now go way
> beyond the spec of page pool.  We are starting to change core networking
> functions and imho we need a lot more people involved in this discussion,
> than the ones participating already.
> 
> As a general note and the reason I am so hesitant,  is that we are starting
> to violate layers here (at least in my opinion).  When the recycling was
> added,  my main concern was to keep the network stack unaware (apart from
> the skb bit).  Now suddenly we need to teach frag_ref/unref internal page

Maybe the skb recycle bit is a clever way to avoid dealing with the network
stack directly.

But that bit might also introduce or hide some problem, like the data race
as pointed out by Alexander, and the odd using of page pool in mlx5 driver.

> pool counters and that doesn't feel right.  We first need to prove the race
> can actually happen, before starting to change things.

As the network stack is adding a lot of performance improvement, such as
sockmap for BPF, which may cause problem for them, will dig more to prove
that.

> 
> Regards
> /Ilias
>>
>> --Jesper
>>
> .
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page()
  2021-10-12  7:38         ` Yunsheng Lin
@ 2021-10-12  7:49           ` Ilias Apalodimas
  0 siblings, 0 replies; 16+ messages in thread
From: Ilias Apalodimas @ 2021-10-12  7:49 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jesper Dangaard Brouer, John Hubbard, davem, kuba, brouer,
	netdev, linux-kernel, linuxarm, akpm, hawk, peterz, yuzhao, will,
	willy, jgg, mcroce, willemb, cong.wang, pabeni, haokexin, nogikh,
	elver, memxor, vvs, linux-mm, edumazet, alexander.duyck, dsahern

On Tue, Oct 12, 2021 at 03:38:15PM +0800, Yunsheng Lin wrote:
> On 2021/10/11 20:29, Ilias Apalodimas wrote:
> > On Mon, Oct 11, 2021 at 02:25:08PM +0200, Jesper Dangaard Brouer wrote:
> >>
> >>
> >> On 09/10/2021 21.49, John Hubbard wrote:
> >>> So in case it's not clear, I'd like to request that you drop this one
> >>> patch from your series.
> >>
> >> In my opinion as page_pool maintainer, you should also drop patch 4/4 from
> >> this series.
> >>
> >> I like the first two patches, and they should be resend and can be applied
> >> without too much further discussion.
> > 
> > +1
> 
> Ok, it seems there is a lot of contention about how to avoid calling
> compound_head() now.
> 

IMHO compound head is not that heavy.  So you could keep the get/put page
calls as-is and worry about micro optimizations later,  especially since
it's intersecting with folio changes atm.

> Will send out the uncontroversial one first.
> 

Thanks!

> > That's what I hinted on the previous version. The patches right now go way
> > beyond the spec of page pool.  We are starting to change core networking
> > functions and imho we need a lot more people involved in this discussion,
> > than the ones participating already.
> > 
> > As a general note and the reason I am so hesitant,  is that we are starting
> > to violate layers here (at least in my opinion).  When the recycling was
> > added,  my main concern was to keep the network stack unaware (apart from
> > the skb bit).  Now suddenly we need to teach frag_ref/unref internal page
> 
> Maybe the skb recycle bit is a clever way to avoid dealing with the network
> stack directly.
> 
> But that bit might also introduce or hide some problem, like the data race
> as pointed out by Alexander, and the odd using of page pool in mlx5 driver.

Yea.  I was always wondering if unmaping the buffers and let the network stack
deal with them eventually would be a good idea (on those special cases).
There's an obvious disadvantage (which imho is terrible) in this approach.
Any future functions that we add in the core networking code, will need to
keep that in mindxi,  and unmap some random driver memory  if they start
playing tricks with the skb and their fragments. IOW I think this is very
fragile.

> 
> > pool counters and that doesn't feel right.  We first need to prove the race
> > can actually happen, before starting to change things.
> 
> As the network stack is adding a lot of performance improvement, such as
> sockmap for BPF, which may cause problem for them, will dig more to prove
> that.
> 

Ok that's something we need to look at.  Are those buffers freed eventually
by skb_free_head() etc?

Regards
/Ilias

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-10-12  7:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-09  9:37 [PATCH net-next -v5 0/4] some optimization for page pool Yunsheng Lin
2021-10-09  9:37 ` [PATCH net-next -v5 1/4] page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA Yunsheng Lin
2021-10-09  9:37 ` [PATCH net-next -v5 2/4] page_pool: change BIAS_MAX to support incrementing Yunsheng Lin
2021-10-09  9:37 ` [PATCH net-next -v5 3/4] mm: introduce __get_page() and __put_page() Yunsheng Lin
2021-10-09 19:49   ` John Hubbard
2021-10-09 20:15     ` Matthew Wilcox
2021-10-11  6:37       ` Yunsheng Lin
2021-10-11 12:25     ` Jesper Dangaard Brouer
2021-10-11 12:29       ` Ilias Apalodimas
2021-10-12  7:38         ` Yunsheng Lin
2021-10-12  7:49           ` Ilias Apalodimas
2021-10-09  9:37 ` [PATCH net-next -v5 4/4] skbuff: keep track of pp page when pp_frag_count is used Yunsheng Lin
2021-10-09 12:11   ` kernel test robot
2021-10-09 12:11     ` kernel test robot
2021-10-09 12:12   ` kernel test robot
2021-10-09 12:12     ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.