* [RFC net-next v2 0/3] page_pool: allow caching from safely localized NAPI
@ 2023-04-05 23:20 Jakub Kicinski
  2023-04-05 23:20 ` [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths Jakub Kicinski
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jakub Kicinski @ 2023-04-05 23:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, pabeni, hawk, ilias.apalodimas, Jakub Kicinski

I went back to the explicit "are we in NAPI" method, mostly
because I don't like having both around :( (even though I maintain
that in_softirq() && !in_hardirq() is just as safe, since softirqs
do not nest).
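
(For reference, a minimal sketch of that alternative check, using the
in_softirq()/in_hardirq() helpers from linux/preempt.h; note it is only
a heuristic, since in_softirq() is also true when BH is merely disabled:)

  /* Alternative considered above: softirqs do not nest, so
   * "in softirq but not in hard IRQ" approximates NAPI-like context.
   */
  static bool maybe_napi_context(void)
  {
          return in_softirq() && !in_hardirq();
  }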

Still returning the skbs to a CPU, though, not to the NAPI instance.
I reckon we could create a small refcounted struct per NAPI instance
which would allow sockets and other users to hold a persistent
and safe reference. But that's a bigger change, and I get 90+%
recycling thru the cache with just these patches (for RR and
streaming tests with 100% CPU use it's almost 100%).

Some numbers for the streaming test with 100% CPU use (from the
previous version, but they really perform the same):

                          HW-GRO                  page=page
                       before      after         before      after
recycle:
  cached:                   0  138669686              0  150197505
  cache_full:               0     223391              0      74582
  ring:             138551933    9997191      149299454          0
  ring_full:                0        488           3154     127590
  released_refcnt:          0          0              0          0

alloc:
  fast:             136491361  148615710      146969587  150322859
  slow:                  1772       1799            144        105
  slow_high_order:          0          0              0          0
  empty:                 1772       1799            144        105
  refill:             2165245     156302        2332880       2128
  waive:                    0          0              0          0

Jakub Kicinski (3):
  net: skb: plumb napi state thru skb freeing paths
  page_pool: allow caching from safely localized NAPI
  bnxt: hook NAPIs to page pools

 Documentation/networking/page_pool.rst    |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  1 +
 include/linux/netdevice.h                 |  3 ++
 include/linux/skbuff.h                    | 20 +++++++----
 include/net/page_pool.h                   |  3 +-
 net/core/dev.c                            |  3 ++
 net/core/page_pool.c                      | 15 ++++++--
 net/core/skbuff.c                         | 43 ++++++++++++-----------
 8 files changed, 59 insertions(+), 30 deletions(-)

-- 
2.39.2


* [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths
  2023-04-05 23:20 [RFC net-next v2 0/3] page_pool: allow caching from safely localized NAPI Jakub Kicinski
@ 2023-04-05 23:20 ` Jakub Kicinski
  2023-04-07  9:15   ` Yunsheng Lin
  2023-04-05 23:20 ` [RFC net-next v2 2/3] page_pool: allow caching from safely localized NAPI Jakub Kicinski
  2023-04-05 23:21 ` [RFC net-next v2 3/3] bnxt: hook NAPIs to page pools Jakub Kicinski
  2 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-04-05 23:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, pabeni, hawk, ilias.apalodimas, Jakub Kicinski

We maintain a NAPI-local cache of skbs which is fed by napi_consume_skb().
Going forward we will also try to cache head and data pages.
Plumb the "are we in a normal NAPI context" information
deeper into the freeing path, down to skb_release_data() and
skb_free_head()/skb_pp_recycle().

Use "bool in_normal_napi" rather than a bare "int budget";
the further we get from NAPI, the more confusing the budget
argument becomes (particularly whether 0 or MAX is the
correct value to pass in when not in NAPI).
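
(For illustration only, a sketch of where the bool comes from at the
driver boundary; the poll function and ring helper below are
hypothetical, napi_consume_skb() is the real API:)

  /* Hypothetical Tx-cleanup loop: the NAPI budget is passed through to
   * napi_consume_skb(); budget == 0 (e.g. netpoll) means we are not in
   * a normal NAPI context, hence the !!budget translation in this patch.
   */
  static int example_napi_poll(struct napi_struct *napi, int budget)
  {
          struct sk_buff *skb;

          while ((skb = example_ring_next_completed(napi)) != NULL)
                  napi_consume_skb(skb, budget); /* -> in_normal_napi = !!budget */

          return 0;
  }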

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/skbuff.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 050a875d09c5..8d5377b4112f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -839,7 +839,7 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
-static bool skb_pp_recycle(struct sk_buff *skb, void *data)
+static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)
 {
 	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
 		return false;
@@ -856,12 +856,12 @@ static void skb_kfree_head(void *head, unsigned int end_offset)
 		kfree(head);
 }
 
-static void skb_free_head(struct sk_buff *skb)
+static void skb_free_head(struct sk_buff *skb, bool in_normal_napi)
 {
 	unsigned char *head = skb->head;
 
 	if (skb->head_frag) {
-		if (skb_pp_recycle(skb, head))
+		if (skb_pp_recycle(skb, head, in_normal_napi))
 			return;
 		skb_free_frag(head);
 	} else {
@@ -869,7 +869,8 @@ static void skb_free_head(struct sk_buff *skb)
 	}
 }
 
-static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
+static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason,
+			     bool in_normal_napi)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int i;
@@ -894,7 +895,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
 	if (shinfo->frag_list)
 		kfree_skb_list_reason(shinfo->frag_list, reason);
 
-	skb_free_head(skb);
+	skb_free_head(skb, in_normal_napi);
 exit:
 	/* When we clone an SKB we copy the reycling bit. The pp_recycle
 	 * bit is only set on the head though, so in order to avoid races
@@ -955,11 +956,12 @@ void skb_release_head_state(struct sk_buff *skb)
 }
 
 /* Free everything but the sk_buff shell. */
-static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason)
+static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason,
+			    bool in_normal_napi)
 {
 	skb_release_head_state(skb);
 	if (likely(skb->head))
-		skb_release_data(skb, reason);
+		skb_release_data(skb, reason, in_normal_napi);
 }
 
 /**
@@ -973,7 +975,7 @@ static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason)
 
 void __kfree_skb(struct sk_buff *skb)
 {
-	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED);
+	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
 	kfree_skbmem(skb);
 }
 EXPORT_SYMBOL(__kfree_skb);
@@ -1027,7 +1029,7 @@ static void kfree_skb_add_bulk(struct sk_buff *skb,
 		return;
 	}
 
-	skb_release_all(skb, reason);
+	skb_release_all(skb, reason, false);
 	sa->skb_array[sa->skb_count++] = skb;
 
 	if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) {
@@ -1201,7 +1203,7 @@ EXPORT_SYMBOL(consume_skb);
 void __consume_stateless_skb(struct sk_buff *skb)
 {
 	trace_consume_skb(skb, __builtin_return_address(0));
-	skb_release_data(skb, SKB_CONSUMED);
+	skb_release_data(skb, SKB_CONSUMED, false);
 	kfree_skbmem(skb);
 }
 
@@ -1226,7 +1228,7 @@ static void napi_skb_cache_put(struct sk_buff *skb)
 
 void __kfree_skb_defer(struct sk_buff *skb)
 {
-	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED);
+	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
 	napi_skb_cache_put(skb);
 }
 
@@ -1264,7 +1266,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	skb_release_all(skb, SKB_CONSUMED);
+	skb_release_all(skb, SKB_CONSUMED, !!budget);
 	napi_skb_cache_put(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);
@@ -1395,7 +1397,7 @@ EXPORT_SYMBOL_GPL(alloc_skb_for_msg);
  */
 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 {
-	skb_release_all(dst, SKB_CONSUMED);
+	skb_release_all(dst, SKB_CONSUMED, false);
 	return __skb_clone(dst, src);
 }
 EXPORT_SYMBOL_GPL(skb_morph);
@@ -2018,9 +2020,9 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		if (skb_has_frag_list(skb))
 			skb_clone_fraglist(skb);
 
-		skb_release_data(skb, SKB_CONSUMED);
+		skb_release_data(skb, SKB_CONSUMED, false);
 	} else {
-		skb_free_head(skb);
+		skb_free_head(skb, false);
 	}
 	off = (data + nhead) - skb->head;
 
@@ -6389,12 +6391,12 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 			skb_frag_ref(skb, i);
 		if (skb_has_frag_list(skb))
 			skb_clone_fraglist(skb);
-		skb_release_data(skb, SKB_CONSUMED);
+		skb_release_data(skb, SKB_CONSUMED, false);
 	} else {
 		/* we can reuse existing recount- all we did was
 		 * relocate values
 		 */
-		skb_free_head(skb);
+		skb_free_head(skb, false);
 	}
 
 	skb->head = data;
@@ -6529,7 +6531,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
-	skb_release_data(skb, SKB_CONSUMED);
+	skb_release_data(skb, SKB_CONSUMED, false);
 
 	skb->head = data;
 	skb->head_frag = 0;
-- 
2.39.2


* [RFC net-next v2 2/3] page_pool: allow caching from safely localized NAPI
  2023-04-05 23:20 [RFC net-next v2 0/3] page_pool: allow caching from safely localized NAPI Jakub Kicinski
  2023-04-05 23:20 ` [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths Jakub Kicinski
@ 2023-04-05 23:20 ` Jakub Kicinski
  2023-04-05 23:21 ` [RFC net-next v2 3/3] bnxt: hook NAPIs to page pools Jakub Kicinski
  2 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2023-04-05 23:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, edumazet, pabeni, hawk, ilias.apalodimas, Jakub Kicinski

Recent patches to mlx5 mentioned a regression when moving from
a driver-local page pool to only using the generic page pool code.
Page pool has two recycling paths: (1) the direct one, which runs
in the safe NAPI context (basically the consumer context, so
producing can be lockless); and (2) via a ptr_ring, which takes a
spin lock because the freeing can happen from any CPU; producer
and consumer may run concurrently.
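
(Schematically, the two paths are selected by the allow_direct argument
of the existing return helper; a minimal sketch:)

  /* allow_direct picks the recycling path: */
  page_pool_put_full_page(pool, page, true);  /* (1) direct: lockless, pool's NAPI context only */
  page_pool_put_full_page(pool, page, false); /* (2) ptr_ring: any context, takes the spin lock */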

Since the page pool code was added, Eric introduced a revised version
of deferred skb freeing. TCP skbs are now usually returned to the CPU
which allocated them, and freed in softirq context. This places the
freeing (producing pages back to the pool) enticingly close to
the allocation (consumer).

If we can prove that we're freeing in the same softirq context in which
the consumer NAPI will run, then lockless use of the cache is perfectly
fine and there is no need for the lock.

Let drivers link the page pool to a NAPI instance. If the NAPI instance
is scheduled on the same CPU on which we're freeing, place the pages
in the direct cache.

With that and patched bnxt (XDP enabled to engage the page pool, sigh,
bnxt really needs page pool work :() I see a 2.6% perf boost with
a TCP stream test (app on a different physical core than softirq).

The CPU use of relevant functions decreases as expected:

  page_pool_refill_alloc_cache   1.17% -> 0%
  _raw_spin_lock                 2.41% -> 0.98%

Only consider the lockless path to be safe when NAPI is scheduled;
in practice this should cover the majority, if not all, of steady
state workloads. It's usually the NAPI kicking in that causes the
skb flush.

The main case we'll miss out on is when the application runs on the
same CPU as NAPI. In that case we don't use the deferred skb free path.
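
(Condensed, the safety check added to net/core/page_pool.c below boils
down to:)

  /* pp->p.napi is only set by drivers which guarantee the pool has
   * a single NAPI consumer; list_owner is the CPU the NAPI instance
   * is currently scheduled on (-1 when not scheduled).
   */
  allow_direct = in_napi && pp->p.napi &&
                 READ_ONCE(pp->p.napi->list_owner) == smp_processor_id();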

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
 - plumb thru "are we in NAPI" bool rather than guessing based
   on softirq && !hardirq

CC: hawk@kernel.org
CC: ilias.apalodimas@linaro.org
---
 Documentation/networking/page_pool.rst |  1 +
 include/linux/netdevice.h              |  3 +++
 include/linux/skbuff.h                 | 20 +++++++++++++-------
 include/net/page_pool.h                |  3 ++-
 net/core/dev.c                         |  3 +++
 net/core/page_pool.c                   | 15 +++++++++++++--
 net/core/skbuff.c                      |  5 +++--
 7 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 30f1344e7cca..873efd97f822 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -165,6 +165,7 @@ Registration
     pp_params.pool_size = DESC_NUM;
     pp_params.nid = NUMA_NO_NODE;
     pp_params.dev = priv->dev;
+    pp_params.napi = napi; /* only if locking is tied to NAPI */
     pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
     page_pool = page_pool_create(&pp_params);
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a740be3bb911..64a64dc23fa0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -360,8 +360,11 @@ struct napi_struct {
 	unsigned long		gro_bitmask;
 	int			(*poll)(struct napi_struct *, int);
 #ifdef CONFIG_NETPOLL
+	/* CPU actively polling if netpoll is configured */
 	int			poll_owner;
 #endif
+	/* CPU on which NAPI has been scheduled for processing */
+	int			list_owner;
 	struct net_device	*dev;
 	struct gro_list		gro_hash[GRO_HASH_BUCKETS];
 	struct sk_buff		*skb;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 82511b2f61ea..8e7e5f9bdb77 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3386,6 +3386,18 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
 	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
 }
 
+static inline void
+napi_frag_unref(skb_frag_t *frag, bool recycle, bool in_normal_napi)
+{
+	struct page *page = skb_frag_page(frag);
+
+#ifdef CONFIG_PAGE_POOL
+	if (recycle && page_pool_return_skb_page(page, in_normal_napi))
+		return;
+#endif
+	put_page(page);
+}
+
 /**
  * __skb_frag_unref - release a reference on a paged fragment.
  * @frag: the paged fragment
@@ -3396,13 +3408,7 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
  */
 static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
 {
-	struct page *page = skb_frag_page(frag);
-
-#ifdef CONFIG_PAGE_POOL
-	if (recycle && page_pool_return_skb_page(page))
-		return;
-#endif
-	put_page(page);
+	napi_frag_unref(frag, recycle, false);
 }
 
 /**
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index ddfa0b328677..d8f2b29c99f8 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -77,6 +77,7 @@ struct page_pool_params {
 	unsigned int	pool_size;
 	int		nid;  /* Numa node id to allocate from pages from */
 	struct device	*dev; /* device, for DMA pre-mapping purposes */
+	struct napi_struct *napi; /* Sole consumer of pages, otherwise NULL */
 	enum dma_data_direction dma_dir; /* DMA mapping direction */
 	unsigned int	max_len; /* max DMA sync memory size */
 	unsigned int	offset;  /* DMA addr offset */
@@ -239,7 +240,7 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
 	return pool->p.dma_dir;
 }
 
-bool page_pool_return_skb_page(struct page *page);
+bool page_pool_return_skb_page(struct page *page, bool in_napi);
 
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 7ce5985be84b..075e607551cb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4358,6 +4358,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 	}
 
 	list_add_tail(&napi->poll_list, &sd->poll_list);
+	WRITE_ONCE(napi->list_owner, smp_processor_id());
 	/* If not called from net_rx_action()
 	 * we have to raise NET_RX_SOFTIRQ.
 	 */
@@ -6068,6 +6069,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		list_del_init(&n->poll_list);
 		local_irq_restore(flags);
 	}
+	WRITE_ONCE(n->list_owner, -1);
 
 	val = READ_ONCE(n->state);
 	do {
@@ -6383,6 +6385,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 #ifdef CONFIG_NETPOLL
 	napi->poll_owner = -1;
 #endif
+	napi->list_owner = -1;
 	set_bit(NAPI_STATE_SCHED, &napi->state);
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 193c18799865..bbad7a0a320b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -19,6 +19,7 @@
 #include <linux/mm.h> /* for put_page() */
 #include <linux/poison.h>
 #include <linux/ethtool.h>
+#include <linux/netdevice.h>
 
 #include <trace/events/page_pool.h>
 
@@ -874,9 +875,11 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 }
 EXPORT_SYMBOL(page_pool_update_nid);
 
-bool page_pool_return_skb_page(struct page *page)
+bool page_pool_return_skb_page(struct page *page, bool in_napi)
 {
+	struct napi_struct *napi;
 	struct page_pool *pp;
+	bool allow_direct;
 
 	page = compound_head(page);
 
@@ -892,12 +895,20 @@ bool page_pool_return_skb_page(struct page *page)
 
 	pp = page->pp;
 
+	/* Allow direct recycle if we have reasons to believe that we are
+	 * in the same context as the consumer would run, so there's
+	 * no possible race.
+	 */
+	napi = pp->p.napi;
+	allow_direct = in_napi && napi &&
+		READ_ONCE(napi->list_owner) == smp_processor_id();
+
 	/* Driver set this to memory recycling info. Reset it on recycle.
 	 * This will *not* work for NIC using a split-page memory model.
 	 * The page will be returned to the pool here regardless of the
 	 * 'flipped' fragment being in use or not.
 	 */
-	page_pool_put_full_page(pp, page, false);
+	page_pool_put_full_page(pp, page, allow_direct);
 
 	return true;
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8d5377b4112f..8e084db3b5a3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -843,7 +843,7 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)
 {
 	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
 		return false;
-	return page_pool_return_skb_page(virt_to_page(data));
+	return page_pool_return_skb_page(virt_to_page(data), in_normal_napi);
 }
 
 static void skb_kfree_head(void *head, unsigned int end_offset)
@@ -889,7 +889,8 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason,
 	}
 
 	for (i = 0; i < shinfo->nr_frags; i++)
-		__skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+		napi_frag_unref(&shinfo->frags[i], skb->pp_recycle,
+				in_normal_napi);
 
 free_head:
 	if (shinfo->frag_list)
-- 
2.39.2


* [RFC net-next v2 3/3] bnxt: hook NAPIs to page pools
  2023-04-05 23:20 [RFC net-next v2 0/3] page_pool: allow caching from safely localized NAPI Jakub Kicinski
  2023-04-05 23:20 ` [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths Jakub Kicinski
  2023-04-05 23:20 ` [RFC net-next v2 2/3] page_pool: allow caching from safely localized NAPI Jakub Kicinski
@ 2023-04-05 23:21 ` Jakub Kicinski
  2 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2023-04-05 23:21 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, hawk, ilias.apalodimas, Jakub Kicinski,
	michael.chan

bnxt has a 1:1 mapping of page pools and NAPIs, so it's safe
to hook them up together.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: michael.chan@broadcom.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8ff5a4f98d6f..fffd84d11faf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3237,6 +3237,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 
 	pp.pool_size = bp->rx_ring_size;
 	pp.nid = dev_to_node(&bp->pdev->dev);
+	pp.napi = &rxr->bnapi->napi;
 	pp.dev = &bp->pdev->dev;
 	pp.dma_dir = DMA_BIDIRECTIONAL;
 
-- 
2.39.2


* Re: [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths
  2023-04-05 23:20 ` [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths Jakub Kicinski
@ 2023-04-07  9:15   ` Yunsheng Lin
  2023-04-07 14:14     ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2023-04-07  9:15 UTC (permalink / raw)
  To: Jakub Kicinski, davem; +Cc: netdev, edumazet, pabeni, hawk, ilias.apalodimas

On 2023/4/6 7:20, Jakub Kicinski wrote:
> We maintain a NAPI-local cache of skbs which is fed by napi_consume_skb().
> Going forward we will also try to cache head and data pages.
> Plumb the "are we in a normal NAPI context" information
> deeper into the freeing path, down to skb_release_data() and
> skb_free_head()/skb_pp_recycle().
> 
> Use "bool in_normal_napi" rather than a bare "int budget";
> the further we get from NAPI, the more confusing the budget
> argument becomes (particularly whether 0 or MAX is the
> correct value to pass in when not in NAPI).
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/skbuff.c | 38 ++++++++++++++++++++------------------
>  1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 050a875d09c5..8d5377b4112f 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -839,7 +839,7 @@ static void skb_clone_fraglist(struct sk_buff *skb)
>  		skb_get(list);
>  }
>  
> -static bool skb_pp_recycle(struct sk_buff *skb, void *data)
> +static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)

What does *normal* mean in 'in_normal_napi'?
Can we just use in_napi?

>  {
>  	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
>  		return false;
> @@ -856,12 +856,12 @@ static void skb_kfree_head(void *head, unsigned int end_offset)
>  		kfree(head);
>  }
>  
> -static void skb_free_head(struct sk_buff *skb)
> +static void skb_free_head(struct sk_buff *skb, bool in_normal_napi)
>  {
>  	unsigned char *head = skb->head;
>  
>  	if (skb->head_frag) {
> -		if (skb_pp_recycle(skb, head))
> +		if (skb_pp_recycle(skb, head, in_normal_napi))
>  			return;
>  		skb_free_frag(head);
>  	} else {
> @@ -869,7 +869,8 @@ static void skb_free_head(struct sk_buff *skb)
>  	}
>  }
>  
> -static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
> +static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason,
> +			     bool in_normal_napi)
>  {
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
>  	int i;
> @@ -894,7 +895,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
>  	if (shinfo->frag_list)
>  		kfree_skb_list_reason(shinfo->frag_list, reason);
>  
> -	skb_free_head(skb);
> +	skb_free_head(skb, in_normal_napi);
>  exit:
>  	/* When we clone an SKB we copy the reycling bit. The pp_recycle
>  	 * bit is only set on the head though, so in order to avoid races
> @@ -955,11 +956,12 @@ void skb_release_head_state(struct sk_buff *skb)
>  }
>  
>  /* Free everything but the sk_buff shell. */
> -static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason)
> +static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason,
> +			    bool in_normal_napi)
>  {
>  	skb_release_head_state(skb);
>  	if (likely(skb->head))
> -		skb_release_data(skb, reason);
> +		skb_release_data(skb, reason, in_normal_napi);
>  }
>  
>  /**
> @@ -973,7 +975,7 @@ static void skb_release_all(struct sk_buff *skb, enum skb_drop_reason reason)
>  
>  void __kfree_skb(struct sk_buff *skb)
>  {
> -	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED);
> +	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
>  	kfree_skbmem(skb);
>  }
>  EXPORT_SYMBOL(__kfree_skb);
> @@ -1027,7 +1029,7 @@ static void kfree_skb_add_bulk(struct sk_buff *skb,
>  		return;
>  	}
>  
> -	skb_release_all(skb, reason);
> +	skb_release_all(skb, reason, false);
>  	sa->skb_array[sa->skb_count++] = skb;
>  
>  	if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) {
> @@ -1201,7 +1203,7 @@ EXPORT_SYMBOL(consume_skb);
>  void __consume_stateless_skb(struct sk_buff *skb)
>  {
>  	trace_consume_skb(skb, __builtin_return_address(0));
> -	skb_release_data(skb, SKB_CONSUMED);
> +	skb_release_data(skb, SKB_CONSUMED, false);
>  	kfree_skbmem(skb);
>  }
>  
> @@ -1226,7 +1228,7 @@ static void napi_skb_cache_put(struct sk_buff *skb)
>  
>  void __kfree_skb_defer(struct sk_buff *skb)
>  {
> -	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED);
> +	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
>  	napi_skb_cache_put(skb);

Is there any reason not to call skb_release_all() with in_normal_napi
being true when napi_skb_cache_put() is called here?

>  }
>  
> @@ -1264,7 +1266,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
>  		return;
>  	}
>  
> -	skb_release_all(skb, SKB_CONSUMED);
> +	skb_release_all(skb, SKB_CONSUMED, !!budget);
>  	napi_skb_cache_put(skb);
>  }
>  EXPORT_SYMBOL(napi_consume_skb);
> @@ -1395,7 +1397,7 @@ EXPORT_SYMBOL_GPL(alloc_skb_for_msg);
>   */
>  struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
>  {
> -	skb_release_all(dst, SKB_CONSUMED);
> +	skb_release_all(dst, SKB_CONSUMED, false);
>  	return __skb_clone(dst, src);
>  }
>  EXPORT_SYMBOL_GPL(skb_morph);
> @@ -2018,9 +2020,9 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		if (skb_has_frag_list(skb))
>  			skb_clone_fraglist(skb);
>  
> -		skb_release_data(skb, SKB_CONSUMED);
> +		skb_release_data(skb, SKB_CONSUMED, false);
>  	} else {
> -		skb_free_head(skb);
> +		skb_free_head(skb, false);
>  	}
>  	off = (data + nhead) - skb->head;
>  
> @@ -6389,12 +6391,12 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
>  			skb_frag_ref(skb, i);
>  		if (skb_has_frag_list(skb))
>  			skb_clone_fraglist(skb);
> -		skb_release_data(skb, SKB_CONSUMED);
> +		skb_release_data(skb, SKB_CONSUMED, false);
>  	} else {
>  		/* we can reuse existing recount- all we did was
>  		 * relocate values
>  		 */
> -		skb_free_head(skb);
> +		skb_free_head(skb, false);
>  	}
>  
>  	skb->head = data;
> @@ -6529,7 +6531,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
>  		skb_kfree_head(data, size);
>  		return -ENOMEM;
>  	}
> -	skb_release_data(skb, SKB_CONSUMED);
> +	skb_release_data(skb, SKB_CONSUMED, false);
>  
>  	skb->head = data;
>  	skb->head_frag = 0;
> 

* Re: [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths
  2023-04-07  9:15   ` Yunsheng Lin
@ 2023-04-07 14:14     ` Jakub Kicinski
  2023-04-07 15:28       ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-04-07 14:14 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas

On Fri, 7 Apr 2023 17:15:15 +0800 Yunsheng Lin wrote:
> On 2023/4/6 7:20, Jakub Kicinski wrote:
> > We maintain a NAPI-local cache of skbs which is fed by napi_consume_skb().
> > Going forward we will also try to cache head and data pages.
> > Plumb the "are we in a normal NAPI context" information
> > deeper into the freeing path, down to skb_release_data() and
> > skb_free_head()/skb_pp_recycle().
> > 
> > Use "bool in_normal_napi" rather than a bare "int budget";
> > the further we get from NAPI, the more confusing the budget
> > argument becomes (particularly whether 0 or MAX is the
> > correct value to pass in when not in NAPI).

> > @@ -839,7 +839,7 @@ static void skb_clone_fraglist(struct sk_buff *skb)
> >  		skb_get(list);
> >  }
> >  
> > -static bool skb_pp_recycle(struct sk_buff *skb, void *data)
> > +static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)  
> 
> What does *normal* mean in 'in_normal_napi'?
> Can we just use in_napi?

Technically netpoll also calls NAPI, that's why I threw in the
"normal". If folks prefer in_napi or some other name I'm more 
than happy to change. Naming is hard.

> > @@ -1226,7 +1228,7 @@ static void napi_skb_cache_put(struct sk_buff *skb)
> >  
> >  void __kfree_skb_defer(struct sk_buff *skb)
> >  {
> > -	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED);
> > +	skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
> >  	napi_skb_cache_put(skb);  
> 
> Is there any reason not to call skb_release_all() with in_normal_napi
> being true when napi_skb_cache_put() is called here?

True, __kfree_skb_defer() needs more work too. I'll set it to true
in the PATCH posting and clean up the function in a follow-up.

* Re: [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths
  2023-04-07 14:14     ` Jakub Kicinski
@ 2023-04-07 15:28       ` Jakub Kicinski
  2023-04-09 17:28         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-04-07 15:28 UTC (permalink / raw)
  To: Yunsheng Lin; +Cc: davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas

On Fri, 7 Apr 2023 07:14:02 -0700 Jakub Kicinski wrote:
> > > -static bool skb_pp_recycle(struct sk_buff *skb, void *data)
> > > +static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)    
> > 
> > What does *normal* mean in 'in_normal_napi'?
> > Can we just use in_napi?
> 
> Technically netpoll also calls NAPI, that's why I threw in the
> "normal". If folks prefer in_napi or some other name I'm more 
> than happy to change. Naming is hard.

Maybe I should rename it to in_softirq? Or napi_safe?
Because __kfree_skb_defer() gets called from the Tx side.
And even the Rx deferred free isn't really *in* NAPI.

* Re: [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths
  2023-04-07 15:28       ` Jakub Kicinski
@ 2023-04-09 17:28         ` Jesper Dangaard Brouer
  2023-04-10  9:20           ` Yunsheng Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-04-09 17:28 UTC (permalink / raw)
  To: Jakub Kicinski, Yunsheng Lin
  Cc: brouer, davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas



On 07/04/2023 17.28, Jakub Kicinski wrote:
> On Fri, 7 Apr 2023 07:14:02 -0700 Jakub Kicinski wrote:
>>>> -static bool skb_pp_recycle(struct sk_buff *skb, void *data)
>>>> +static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)
>>>
>>> What does *normal* mean in 'in_normal_napi'?
>>> Can we just use in_napi?
>>
>> Technically netpoll also calls NAPI, that's why I threw in the
>> "normal". If folks prefer in_napi or some other name I'm more
>> than happy to change. Naming is hard.
> 
> Maybe I should rename it to in_softirq? Or napi_safe?
> Because __kfree_skb_defer() gets called from the Tx side.
> And even the Rx deferred free isn't really *in* NAPI.
> 

I like the name "napi_safe".

--Jesper


* Re: [RFC net-next v2 1/3] net: skb: plumb napi state thru skb freeing paths
  2023-04-09 17:28         ` Jesper Dangaard Brouer
@ 2023-04-10  9:20           ` Yunsheng Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Yunsheng Lin @ 2023-04-10  9:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Jakub Kicinski
  Cc: brouer, davem, netdev, edumazet, pabeni, hawk, ilias.apalodimas

On 2023/4/10 1:28, Jesper Dangaard Brouer wrote:
> 
> 
> On 07/04/2023 17.28, Jakub Kicinski wrote:
>> On Fri, 7 Apr 2023 07:14:02 -0700 Jakub Kicinski wrote:
>>>>> -static bool skb_pp_recycle(struct sk_buff *skb, void *data)
>>>>> +static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool in_normal_napi)
>>>>
>>>> What does *normal* mean in 'in_normal_napi'?
>>>> Can we just use in_napi?
>>>
>>> Technically netpoll also calls NAPI, that's why I threw in the
>>> "normal". If folks prefer in_napi or some other name I'm more
>>> than happy to change. Naming is hard.
>>
>> Maybe I should rename it to in_softirq? Or napi_safe?
>> Because __kfree_skb_defer() gets called from the Tx side.
>> And even the Rx deferred free isn't really *in* NAPI.
>>
> 
> I like the name "napi_safe".

+1.

> 
> --Jesper
> 
> .
> 
