* [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
@ 2022-01-26 22:48 Joe Damato
  2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Greetings:

This series adds some stat counters for the page_pool allocation path which
help to track:

	- fast path allocations
	- slow path order-0 allocations
	- slow path high order allocations
	- refills which failed due to an empty ptr ring, forcing a slow
	  path allocation
	- allocations fulfilled via successful refill
	- pages which cannot be added to the cache because of numa mismatch
	  (i.e. waived)

Some static inline wrappers are provided for accessing these stats. The
intention is that drivers which use the page_pool API can, if they choose,
use this stats API.

It is assumed that the API consumer will ensure the page_pool is not destroyed
during calls to the stats API.

If this series is accepted, I'll submit a follow up patch which will export
these stats per RX-ring via ethtool in a driver which uses the page_pool
API.
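
As a rough illustration of that usage (not part of this series; the ring
structure and function names below are made up, and only the
page_pool_stats_get_*() accessors come from these patches), a driver could
fill its ethtool stats along these lines:

#include <linux/ethtool.h>
#include <net/page_pool.h>

/* Hypothetical per-ring structure; a real driver would use its own. */
struct my_rx_ring {
	struct page_pool *page_pool;
};

/* Names reported via the driver's .get_strings handler. */
static const char my_pp_stat_strings[][ETH_GSTRING_LEN] = {
	"pp_alloc_fast",
	"pp_alloc_slow",
	"pp_alloc_slow_high_order",
	"pp_alloc_empty",
	"pp_alloc_refill",
	"pp_alloc_waive",
};

/* Called from the driver's .get_ethtool_stats handler.  The driver must
 * guarantee the pool is not destroyed while the stats are being read.
 */
static u64 *my_rx_ring_fill_pp_stats(struct my_rx_ring *ring, u64 *data)
{
	struct page_pool *pool = ring->page_pool;

	*data++ = page_pool_stats_get_fast(pool);
	*data++ = page_pool_stats_get_slow(pool);
	*data++ = page_pool_stats_get_slow_high_order(pool);
	*data++ = page_pool_stats_get_empty(pool);
	*data++ = page_pool_stats_get_refill(pool);
	*data++ = page_pool_stats_get_waive(pool);
	return data;
}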

Joe Damato (6):
  net: page_pool: Add alloc stats and fast path stat
  net: page_pool: Add a stat for the slow alloc path
  net: page_pool: Add a high order alloc stat
  net: page_pool: Add stat tracking empty ring
  net: page_pool: Add stat tracking cache refills.
  net: page_pool: Add a stat tracking waived pages.

 include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/core/page_pool.c    | 15 +++++++--
 2 files changed, 94 insertions(+), 3 deletions(-)

-- 
2.7.4



* [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
@ 2022-01-26 22:48 ` Joe Damato
  2022-01-27 16:32   ` Jakub Kicinski
  2022-02-02 14:14   ` Jesper Dangaard Brouer
  2022-01-26 22:48 ` [PATCH net-next 2/6] net: page_pool: Add a stat for the slow alloc path Joe Damato
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Add a stats structure with an internal alloc structure for holding
allocation path related stats.

The alloc structure contains the stat 'fast'. This stat tracks fast
path allocations.

A static inline accessor function is exposed for accessing this stat.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/page_pool.h | 26 ++++++++++++++++++++++++++
 net/core/page_pool.c    |  1 +
 2 files changed, 27 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 79a8055..3ae3dc4 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -71,6 +71,20 @@ struct pp_alloc_cache {
 	struct page *cache[PP_ALLOC_CACHE_SIZE];
 };
 
+/**
+ * stats for tracking page_pool events.
+ *
+ * accessor functions for these stats provided below.
+ *
+ * Note that it is the responsibility of the API consumer to ensure that
+ * the page_pool has not been destroyed while accessing stats fields.
+ */
+struct page_pool_stats {
+	struct {
+		u64 fast; /* fast path allocations */
+	} alloc;
+};
+
 struct page_pool_params {
 	unsigned int	flags;
 	unsigned int	order;
@@ -86,6 +100,7 @@ struct page_pool_params {
 
 struct page_pool {
 	struct page_pool_params p;
+	struct page_pool_stats ps;
 
 	struct delayed_work release_dw;
 	void (*disconnect)(void *);
@@ -180,6 +195,12 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
 void page_pool_release_page(struct page_pool *pool, struct page *page);
 void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 			     int count);
+
+static inline u64 page_pool_stats_get_fast(struct page_pool *pool)
+{
+	return pool->ps.alloc.fast;
+}
+
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -199,6 +220,11 @@ static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 					   int count)
 {
 }
+
+static inline u64 page_pool_stats_get_fast(struct page_pool *pool)
+{
+	return 0;
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index bd62c01..84c9566 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -166,6 +166,7 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
 	if (likely(pool->alloc.count)) {
 		/* Fast-path */
 		page = pool->alloc.cache[--pool->alloc.count];
+		pool->ps.alloc.fast++;
 	} else {
 		page = page_pool_refill_alloc_cache(pool);
 	}
-- 
2.7.4



* [PATCH net-next 2/6] net: page_pool: Add a stat for the slow alloc path
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
  2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
@ 2022-01-26 22:48 ` Joe Damato
  2022-01-26 22:48 ` [PATCH net-next 3/6] net: page_pool: Add a high order alloc stat Joe Damato
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Add a stat, 'slow', for the slow allocation path. A static inline accessor
function is exposed for accessing this stat.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/page_pool.h | 10 ++++++++++
 net/core/page_pool.c    |  6 ++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 3ae3dc4..b5691ee 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -82,6 +82,7 @@ struct pp_alloc_cache {
 struct page_pool_stats {
 	struct {
 		u64 fast; /* fast path allocations */
+		u64 slow; /* slow-path order-0 allocations */
 	} alloc;
 };
 
@@ -201,6 +202,10 @@ static inline u64 page_pool_stats_get_fast(struct page_pool *pool)
 	return pool->ps.alloc.fast;
 }
 
+static inline u64 page_pool_stats_get_slow(struct page_pool *pool)
+{
+	return pool->ps.alloc.slow;
+}
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -225,6 +230,11 @@ static inline u64 page_pool_stats_get_fast(struct page_pool *pool)
 {
 	return 0;
 }
+
+static inline u64 page_pool_stats_get_slow(struct page_pool *pool)
+{
+	return 0;
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 84c9566..9dbe721 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -294,10 +294,12 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	}
 
 	/* Return last page */
-	if (likely(pool->alloc.count > 0))
+	if (likely(pool->alloc.count > 0)) {
 		page = pool->alloc.cache[--pool->alloc.count];
-	else
+		pool->ps.alloc.slow++;
+	} else {
 		page = NULL;
+	}
 
 	/* When page just alloc'ed is should/must have refcnt 1. */
 	return page;
-- 
2.7.4



* [PATCH net-next 3/6] net: page_pool: Add a high order alloc stat
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
  2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
  2022-01-26 22:48 ` [PATCH net-next 2/6] net: page_pool: Add a stat for the slow alloc path Joe Damato
@ 2022-01-26 22:48 ` Joe Damato
  2022-01-26 22:48 ` [PATCH net-next 4/6] net: page_pool: Add stat tracking empty ring Joe Damato
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Add a stat to track high order allocations in the slow path. A static
inline function is exposed for accessing this stat.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/page_pool.h | 11 +++++++++++
 net/core/page_pool.c    |  1 +
 2 files changed, 12 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index b5691ee..b024197 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -83,6 +83,7 @@ struct page_pool_stats {
 	struct {
 		u64 fast; /* fast path allocations */
 		u64 slow; /* slow-path order-0 allocations */
+		u64 slow_high_order; /* slow-path high order allocations */
 	} alloc;
 };
 
@@ -206,6 +207,11 @@ static inline u64 page_pool_stats_get_slow(struct page_pool *pool)
 {
 	return pool->ps.alloc.slow;
 }
+
+static inline u64 page_pool_stats_get_slow_high_order(struct page_pool *pool)
+{
+	return pool->ps.alloc.slow_high_order;
+}
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -235,6 +241,11 @@ static inline u64 page_pool_stats_get_slow(struct page_pool *pool)
 {
 	return 0;
 }
+
+static inline u64 page_pool_stats_get_slow_high_order(struct page_pool *pool)
+{
+	return 0;
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9dbe721..3a4b912 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -240,6 +240,7 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 		return NULL;
 	}
 
+	pool->ps.alloc.slow_high_order++;
 	page_pool_set_pp_info(pool, page);
 
 	/* Track how many pages are held 'in-flight' */
-- 
2.7.4



* [PATCH net-next 4/6] net: page_pool: Add stat tracking empty ring
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
                   ` (2 preceding siblings ...)
  2022-01-26 22:48 ` [PATCH net-next 3/6] net: page_pool: Add a high order alloc stat Joe Damato
@ 2022-01-26 22:48 ` Joe Damato
  2022-01-26 22:48 ` [PATCH net-next 5/6] net: page_pool: Add stat tracking cache refills Joe Damato
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Add a stat tracking when the ptr ring is empty. A static inline wrapper
function is exposed to access this stat.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/page_pool.h | 15 +++++++++++++++
 net/core/page_pool.c    |  4 +++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index b024197..92edf8e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -84,6 +84,10 @@ struct page_pool_stats {
 		u64 fast; /* fast path allocations */
 		u64 slow; /* slow-path order-0 allocations */
 		u64 slow_high_order; /* slow-path high order allocations */
+		u64 empty; /* failed refills due to empty ptr ring, forcing
+			    * slow path allocation
+			    */
+
 	} alloc;
 };
 
@@ -212,6 +216,12 @@ static inline u64 page_pool_stats_get_slow_high_order(struct page_pool *pool)
 {
 	return pool->ps.alloc.slow_high_order;
 }
+
+static inline u64 page_pool_stats_get_empty(struct page_pool *pool)
+{
+	return pool->ps.alloc.empty;
+}
+
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -246,6 +256,11 @@ static inline u64 page_pool_stats_get_slow_high_order(struct page_pool *pool)
 {
 	return 0;
 }
+
+static inline u64 page_pool_stats_get_empty(struct page_pool *pool)
+{
+	return 0;
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 3a4b912..0de9d58 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -117,8 +117,10 @@ static struct page *page_pool_refill_alloc_cache(struct page_pool *pool)
 	int pref_nid; /* preferred NUMA node */
 
 	/* Quicker fallback, avoid locks when ring is empty */
-	if (__ptr_ring_empty(r))
+	if (__ptr_ring_empty(r)) {
+		pool->ps.alloc.empty++;
 		return NULL;
+	}
 
 	/* Softirq guarantee CPU and thus NUMA node is stable. This,
 	 * assumes CPU refilling driver RX-ring will also run RX-NAPI.
-- 
2.7.4



* [PATCH net-next 5/6] net: page_pool: Add stat tracking cache refills.
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
                   ` (3 preceding siblings ...)
  2022-01-26 22:48 ` [PATCH net-next 4/6] net: page_pool: Add stat tracking empty ring Joe Damato
@ 2022-01-26 22:48 ` Joe Damato
  2022-01-26 22:48 ` [PATCH 6/6] net: page_pool: Add a stat tracking waived pages Joe Damato
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Add a stat tracking successful allocations which triggered a refill. A
static inline wrapper is exposed for accessing this stat.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/page_pool.h | 11 ++++++++++-
 net/core/page_pool.c    |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 92edf8e..a68d05f 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -87,7 +87,7 @@ struct page_pool_stats {
 		u64 empty; /* failed refills due to empty ptr ring, forcing
 			    * slow path allocation
 			    */
-
+		u64 refill; /* allocations via successful refill */
 	} alloc;
 };
 
@@ -222,6 +222,10 @@ static inline u64 page_pool_stats_get_empty(struct page_pool *pool)
 	return pool->ps.alloc.empty;
 }
 
+static inline u64 page_pool_stats_get_refill(struct page_pool *pool)
+{
+	return pool->ps.alloc.refill;
+}
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -261,6 +265,11 @@ static inline u64 page_pool_stats_get_empty(struct page_pool *pool)
 {
 	return 0;
 }
+
+static inline u64 page_pool_stats_get_refill(struct page_pool *pool)
+{
+	return 0;
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 0de9d58..15f4e73 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -171,6 +171,8 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
 		pool->ps.alloc.fast++;
 	} else {
 		page = page_pool_refill_alloc_cache(pool);
+		if (likely(page))
+			pool->ps.alloc.refill++;
 	}
 
 	return page;
-- 
2.7.4



* [PATCH 6/6] net: page_pool: Add a stat tracking waived pages.
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
                   ` (4 preceding siblings ...)
  2022-01-26 22:48 ` [PATCH net-next 5/6] net: page_pool: Add stat tracking cache refills Joe Damato
@ 2022-01-26 22:48 ` Joe Damato
  2022-01-27  8:51 ` [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Jesper Dangaard Brouer
  2022-01-27  9:08 ` Ilias Apalodimas
  7 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-26 22:48 UTC (permalink / raw)
  To: netdev; +Cc: kuba, davem, ilias.apalodimas, hawk, Joe Damato

Track how often pages obtained from the ring cannot be added to the cache
because of a NUMA mismatch. A static inline wrapper is added for accessing
this stat.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 include/net/page_pool.h | 11 +++++++++++
 net/core/page_pool.c    |  1 +
 2 files changed, 12 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index a68d05f..cf65d78 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -88,6 +88,7 @@ struct page_pool_stats {
 			    * slow path allocation
 			    */
 		u64 refill; /* allocations via successful refill */
+		u64 waive;  /* failed refills due to numa zone mismatch */
 	} alloc;
 };
 
@@ -226,6 +227,11 @@ static inline u64 page_pool_stats_get_refill(struct page_pool *pool)
 {
 	return pool->ps.alloc.refill;
 }
+
+static inline u64 page_pool_stats_get_waive(struct page_pool *pool)
+{
+	return pool->ps.alloc.waive;
+}
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -270,6 +276,11 @@ static inline u64 page_pool_stats_get_refill(struct page_pool *pool)
 {
 	return 0;
 }
+
+static inline u64 page_pool_stats_get_waive(struct page_pool *pool)
+{
+	return 0;
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 15f4e73..7c4ae2e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -147,6 +147,7 @@ static struct page *page_pool_refill_alloc_cache(struct page_pool *pool)
 			 * This limit stress on page buddy alloactor.
 			 */
 			page_pool_return_page(pool, page);
+			pool->ps.alloc.waive++;
 			page = NULL;
 			break;
 		}
-- 
2.7.4



* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
                   ` (5 preceding siblings ...)
  2022-01-26 22:48 ` [PATCH 6/6] net: page_pool: Add a stat tracking waived pages Joe Damato
@ 2022-01-27  8:51 ` Jesper Dangaard Brouer
  2022-01-27 21:08   ` Joe Damato
  2022-01-27  9:08 ` Ilias Apalodimas
  7 siblings, 1 reply; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2022-01-27  8:51 UTC (permalink / raw)
  To: Joe Damato, netdev; +Cc: brouer, kuba, davem, ilias.apalodimas, hawk



On 26/01/2022 23.48, Joe Damato wrote:
> Greetings:
> 
> This series adds some stat counters for the page_pool allocation path which
> help to track:
> 
> 	- fast path allocations
> 	- slow path order-0 allocations
> 	- slow path high order allocations
> 	- refills which failed due to an empty ptr ring, forcing a slow
> 	  path allocation
> 	- allocations fulfilled via successful refill
> 	- pages which cannot be added to the cache because of numa mismatch
> 	  (i.e. waived)
> 
> Some static inline wrappers are provided for accessing these stats. The
> intention is that drivers which use the page_pool API can, if they choose,
> use this stats API.

You are adding (always-on) counters to a critical fast path that
drivers use for the XDP_DROP use-case.

I want to see performance measurements as documentation, showing this is 
not causing a slow-down.

I have some performance tests here[1]:
  [1] 
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/lib

Look at:
  - bench_page_pool_simple.c and
  - bench_page_pool_cross_cpu.c

How to use + build this[2]:
  [2] 
https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html

> It assumed that the API consumer will ensure the page_pool is not destroyed
> during calls to the stats API.
> 
> If this series is accepted, I'll submit a follow up patch which will export
> these stats per RX-ring via ethtool in a driver which uses the page_pool
> API.
> 
> Joe Damato (6):
>    net: page_pool: Add alloc stats and fast path stat
>    net: page_pool: Add a stat for the slow alloc path
>    net: page_pool: Add a high order alloc stat
>    net: page_pool: Add stat tracking empty ring
>    net: page_pool: Add stat tracking cache refills.
>    net: page_pool: Add a stat tracking waived pages.
> 
>   include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
>   net/core/page_pool.c    | 15 +++++++--
>   2 files changed, 94 insertions(+), 3 deletions(-)
> 



* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
                   ` (6 preceding siblings ...)
  2022-01-27  8:51 ` [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Jesper Dangaard Brouer
@ 2022-01-27  9:08 ` Ilias Apalodimas
  2022-01-27 23:55   ` Joe Damato
  7 siblings, 1 reply; 17+ messages in thread
From: Ilias Apalodimas @ 2022-01-27  9:08 UTC (permalink / raw)
  To: Joe Damato; +Cc: netdev, kuba, davem, hawk

Hi Joe, 

On Wed, Jan 26, 2022 at 02:48:14PM -0800, Joe Damato wrote:
> Greetings:
> 
> This series adds some stat counters for the page_pool allocation path which
> help to track:
> 
> 	- fast path allocations
> 	- slow path order-0 allocations
> 	- slow path high order allocations
> 	- refills which failed due to an empty ptr ring, forcing a slow
> 	  path allocation
> 	- allocations fulfilled via successful refill
> 	- pages which cannot be added to the cache because of numa mismatch
> 	  (i.e. waived)
> 

Thanks for the patch.  Stats are something that's indeed missing from the
API.  The patch  should work for Rx based allocations (which is what you
currently cover),  since the RX side is usually protected by NAPI.  However
we've added a few features recently,  which we would like to have stats on. 

commit 6a5bcd84e886("page_pool: Allow drivers to hint on SKB recycling"),
introduces recycling capabilities on the API.  I think it would be far more
interesting to be able to extend the statistics to recycled/non-recycled
packets as well in the future.  But the recycling is asynchronous and we
can't add locks just for the sake of accurate statistics.  Can we instead
convert that to a per-cpu structure for producers?

Thanks!
/Ilias

> Some static inline wrappers are provided for accessing these stats. The
> intention is that drivers which use the page_pool API can, if they choose,
> use this stats API.
> 
> It assumed that the API consumer will ensure the page_pool is not destroyed
> during calls to the stats API.
> 
> If this series is accepted, I'll submit a follow up patch which will export
> these stats per RX-ring via ethtool in a driver which uses the page_pool
> API.
> 
> Joe Damato (6):
>   net: page_pool: Add alloc stats and fast path stat
>   net: page_pool: Add a stat for the slow alloc path
>   net: page_pool: Add a high order alloc stat
>   net: page_pool: Add stat tracking empty ring
>   net: page_pool: Add stat tracking cache refills.
>   net: page_pool: Add a stat tracking waived pages.
> 
>  include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
>  net/core/page_pool.c    | 15 +++++++--
>  2 files changed, 94 insertions(+), 3 deletions(-)
> 
> -- 
> 2.7.4
> 


* Re: [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat
  2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
@ 2022-01-27 16:32   ` Jakub Kicinski
  2022-01-27 21:11     ` Joe Damato
  2022-02-02 14:14   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2022-01-27 16:32 UTC (permalink / raw)
  To: Joe Damato; +Cc: netdev, davem, ilias.apalodimas, hawk

On Wed, 26 Jan 2022 14:48:15 -0800 Joe Damato wrote:
> Add a stats structure with a an internal alloc structure for holding
> allocation path related stats.
> 
> The alloc structure contains the stat 'fast'. This stat tracks fast
> path allocations.
> 
> A static inline accessor function is exposed for accessing this stat.

> +/**
> + * stats for tracking page_pool events.
> + *
> + * accessor functions for these stats provided below.
> + *
> + * Note that it is the responsibility of the API consumer to ensure that
> + * the page_pool has not been destroyed while accessing stats fields.
> + */
> +struct page_pool_stats {
> +	struct {
> +		u64 fast; /* fast path allocations */
> +	} alloc;
> +};

scripts/kernel-doc says:

include/net/page_pool.h:75: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * stats for tracking page_pool events.
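
For reference, a kernel-doc-conforming version of that comment could look
roughly like this (a sketch only; the exact wording chosen for v2 may differ):

/**
 * struct page_pool_stats - page_pool allocation statistics
 * @alloc: allocation-path counters
 * @alloc.fast: fast-path allocations served from the pool cache
 *
 * Accessor functions for these stats are provided below.  It is the
 * responsibility of the API consumer to ensure that the page_pool has
 * not been destroyed while the stats fields are being accessed.
 */
struct page_pool_stats {
	struct {
		u64 fast;
	} alloc;
};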


* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-27  8:51 ` [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Jesper Dangaard Brouer
@ 2022-01-27 21:08   ` Joe Damato
  2022-02-02 14:04     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 17+ messages in thread
From: Joe Damato @ 2022-01-27 21:08 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, brouer, kuba, davem, ilias.apalodimas, hawk

On Thu, Jan 27, 2022 at 12:51 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
>
>
> On 26/01/2022 23.48, Joe Damato wrote:
> > Greetings:
> >
> > This series adds some stat counters for the page_pool allocation path which
> > help to track:
> >
> >       - fast path allocations
> >       - slow path order-0 allocations
> >       - slow path high order allocations
> >       - refills which failed due to an empty ptr ring, forcing a slow
> >         path allocation
> >       - allocations fulfilled via successful refill
> >       - pages which cannot be added to the cache because of numa mismatch
> >         (i.e. waived)
> >
> > Some static inline wrappers are provided for accessing these stats. The
> > intention is that drivers which use the page_pool API can, if they choose,
> > use this stats API.
>
> You are adding (always on) counters to a critical fast-path, that
> drivers uses for the XDP_DROP use-case.

If you prefer requiring users to explicitly enable these stats, I am
happy to add a kernel config option (e.g. CONFIG_PAGE_POOL_DEBUG or
similar) in a v2.

> I want to see performance measurements as documentation, showing this is
> not causing a slow-down.
>
> I have some performance tests here[1]:
>   [1]
> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/lib
>
> Look at:
>   - bench_page_pool_simple.c and
>   - bench_page_pool_cross_cpu.c
>
> How to use + build this[2]:
>   [2]
> https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html

Thanks for the pointers to the benchmarks.

In general, I noted that the benchmark results varied fairly
substantially between repeated runs on the same system.

Results below suggest that:
   - bench_page_pool_simple is faster on the test kernel, and
   - bench_page_pool_cross_cpu is faster on the control

Subsequent runs of bench_page_pool_cross_cpu on the control, however,
reveal *much* slower results than shown below.

Test system:
  - 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - 2 numa zones, with 18 cores per zone and 2 threads per core

Control kernel: built from net-next at commit e2cf07654efb ("ptp:
replace snprintf with sysfs_emit").
Test kernel: This series applied on top of control kernel mentioned above.

Raw output from dmesg for control [1] and test [2] kernel summarized below:

bench_page_pool_simple
  - run with default options (i.e. "sudo modprobe bench_page_pool_simple").

Control:

Type:for_loop Per elem: 0 cycles(tsc) 0.334 ns (step:0)
Type:atomic_inc Per elem: 13 cycles(tsc) 6.021 ns (step:0)
Type:lock Per elem: 31 cycles(tsc) 13.514 ns (step:0)

Type:no-softirq-page_pool01 Per elem: 44 cycles(tsc) 19.549 ns (step:0)
Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 19.658 ns (step:0)
Type:no-softirq-page_pool03 Per elem: 118 cycles(tsc) 51.638 ns (step:0)

Type:tasklet_page_pool01_fast_path Per elem: 17 cycles(tsc) 7.472 ns (step:0)
Type:tasklet_page_pool02_ptr_ring Per elem: 42 cycles(tsc) 18.585 ns (step:0)
Type:tasklet_page_pool03_slow Per elem: 109 cycles(tsc) 47.807 ns (step:0)

Test:

Type:for_loop Per elem: 0 cycles(tsc) 0.334 ns (step:0)
Type:atomic_inc Per elem: 14 cycles(tsc) 6.195 ns (step:0)
Type:lock Per elem: 31 cycles(tsc) 13.827 ns (step:0)

Type:no-softirq-page_pool01 Per elem: 44 cycles(tsc) 19.561 ns (step:0)
Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 19.700 ns (step:0)
Type:no-softirq-page_pool03 Per elem: 108 cycles(tsc) 47.186 ns (step:0)

Type:tasklet_page_pool01_fast_path Per elem: 12 cycles(tsc) 5.447 ns (step:0)
Type:tasklet_page_pool02_ptr_ring Per elem: 42 cycles(tsc) 18.501 ns (step:0)
Type:tasklet_page_pool03_slow Per elem: 106 cycles(tsc) 46.313 ns (step:0)

bench_page_pool_cross_cpu
  - run with default options (i.e. "sudo modprobe bench_page_pool_cross_cpu").

Control:
Type:page_pool_cross_cpu CPU(0) 1795 cycles(tsc) 782.567 ns (step:2)
Type:page_pool_cross_cpu CPU(1) 1921 cycles(tsc) 837.435 ns (step:2)
Type:page_pool_cross_cpu CPU(2) 960 cycles(tsc) 418.758 ns (step:2)
Sum Type:page_pool_cross_cpu Average: 1558 cycles(tsc) CPUs:3 step:2

Test:
Type:page_pool_cross_cpu CPU(0) 2411 cycles(tsc) 1051.037 ns (step:2)
Type:page_pool_cross_cpu CPU(1) 2467 cycles(tsc) 1075.204 ns (step:2)
Type:page_pool_cross_cpu CPU(2) 1233 cycles(tsc) 537.629 ns (step:2)
Type:page_pool_cross_cpu Average: 2037 cycles(tsc) CPUs:3 step:2


[1]: https://gist.githubusercontent.com/jdamato-fsly/385806f06cb95c61ff8cecf7a3645e75/raw/886e3208f5b9c47abdd59bdaa7ecf27994f477b1/page_pool_bench_control
[2]: https://gist.githubusercontent.com/jdamato-fsly/385806f06cb95c61ff8cecf7a3645e75/raw/886e3208f5b9c47abdd59bdaa7ecf27994f477b1/page_pool_bench_TESTKERNEL


> > It assumed that the API consumer will ensure the page_pool is not destroyed
> > during calls to the stats API.
> >
> > If this series is accepted, I'll submit a follow up patch which will export
> > these stats per RX-ring via ethtool in a driver which uses the page_pool
> > API.
> >
> > Joe Damato (6):
> >    net: page_pool: Add alloc stats and fast path stat
> >    net: page_pool: Add a stat for the slow alloc path
> >    net: page_pool: Add a high order alloc stat
> >    net: page_pool: Add stat tracking empty ring
> >    net: page_pool: Add stat tracking cache refills.
> >    net: page_pool: Add a stat tracking waived pages.
> >
> >   include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
> >   net/core/page_pool.c    | 15 +++++++--
> >   2 files changed, 94 insertions(+), 3 deletions(-)
> >
>


* Re: [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat
  2022-01-27 16:32   ` Jakub Kicinski
@ 2022-01-27 21:11     ` Joe Damato
  0 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-27 21:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, davem, ilias.apalodimas, hawk

On Thu, Jan 27, 2022 at 8:32 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 26 Jan 2022 14:48:15 -0800 Joe Damato wrote:
> > Add a stats structure with a an internal alloc structure for holding
> > allocation path related stats.
> >
> > The alloc structure contains the stat 'fast'. This stat tracks fast
> > path allocations.
> >
> > A static inline accessor function is exposed for accessing this stat.
>
> > +/**
> > + * stats for tracking page_pool events.
> > + *
> > + * accessor functions for these stats provided below.
> > + *
> > + * Note that it is the responsibility of the API consumer to ensure that
> > + * the page_pool has not been destroyed while accessing stats fields.
> > + */
> > +struct page_pool_stats {
> > +     struct {
> > +             u64 fast; /* fast path allocations */
> > +     } alloc;
> > +};
>
> scripts/kernel-doc says:
>
> include/net/page_pool.h:75: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
>  * stats for tracking page_pool events.

Thank you. I had only been running scripts/checkpatch, but will
remember to also run kernel-doc in the future. I will correct the
comments in the v2.


* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-27  9:08 ` Ilias Apalodimas
@ 2022-01-27 23:55   ` Joe Damato
  2022-01-29 14:07     ` Ilias Apalodimas
  0 siblings, 1 reply; 17+ messages in thread
From: Joe Damato @ 2022-01-27 23:55 UTC (permalink / raw)
  To: Ilias Apalodimas; +Cc: netdev, kuba, davem, hawk

On Thu, Jan 27, 2022 at 1:08 AM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Hi Joe,
>
> On Wed, Jan 26, 2022 at 02:48:14PM -0800, Joe Damato wrote:
> > Greetings:
> >
> > This series adds some stat counters for the page_pool allocation path which
> > help to track:
> >
> >       - fast path allocations
> >       - slow path order-0 allocations
> >       - slow path high order allocations
> >       - refills which failed due to an empty ptr ring, forcing a slow
> >         path allocation
> >       - allocations fulfilled via successful refill
> >       - pages which cannot be added to the cache because of numa mismatch
> >         (i.e. waived)
> >
>
> Thanks for the patch.  Stats are something that's indeed missing from the
> API.  The patch  should work for Rx based allocations (which is what you
> currently cover),  since the RX side is usually protected by NAPI.  However
> we've added a few features recently,  which we would like to have stats on.

Thanks for taking a look at the patch.

> commit 6a5bcd84e886("page_pool: Allow drivers to hint on SKB recycling"),
> introduces recycling capabilities on the API.  I think it would be far more
> interesting to be able to extend the statistics to recycled/non-recycled
> packets as well in the future.

I agree. Tracking recycling events would be both helpful and
interesting, indeed.

> But the recycling is asynchronous and we
> can't add locks just for the sake of accurate statistics.

Agreed.

> Can we instead
> convert that to a per-cpu structure for producers?

If my understanding of your proposal is accurate, moving the stats
structure to a per-cpu structure (instead of per-pool) would add
ambiguity as to the performance of a specific driver's page pool. In
exchange for the ambiguity, though, we'd get stats for additional
events, which could be interesting.

It seems like under load it might be very useful to know that a
particular driver's page pool is adding pressure to the buddy
allocator in the slow path. I suppose that a user could move softirqs
around on their system to alleviate some of the ambiguity and perhaps
that is good enough.

> Thanks!
> /Ilias
>
> > Some static inline wrappers are provided for accessing these stats. The
> > intention is that drivers which use the page_pool API can, if they choose,
> > use this stats API.
> >
> > It assumed that the API consumer will ensure the page_pool is not destroyed
> > during calls to the stats API.
> >
> > If this series is accepted, I'll submit a follow up patch which will export
> > these stats per RX-ring via ethtool in a driver which uses the page_pool
> > API.
> >
> > Joe Damato (6):
> >   net: page_pool: Add alloc stats and fast path stat
> >   net: page_pool: Add a stat for the slow alloc path
> >   net: page_pool: Add a high order alloc stat
> >   net: page_pool: Add stat tracking empty ring
> >   net: page_pool: Add stat tracking cache refills.
> >   net: page_pool: Add a stat tracking waived pages.
> >
> >  include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  net/core/page_pool.c    | 15 +++++++--
> >  2 files changed, 94 insertions(+), 3 deletions(-)
> >
> > --
> > 2.7.4
> >


* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-27 23:55   ` Joe Damato
@ 2022-01-29 14:07     ` Ilias Apalodimas
  2022-01-29 18:07       ` Joe Damato
  0 siblings, 1 reply; 17+ messages in thread
From: Ilias Apalodimas @ 2022-01-29 14:07 UTC (permalink / raw)
  To: Joe Damato; +Cc: netdev, kuba, davem, hawk

Hi Joe!

On Thu, Jan 27, 2022 at 03:55:03PM -0800, Joe Damato wrote:
> On Thu, Jan 27, 2022 at 1:08 AM Ilias Apalodimas
> <ilias.apalodimas@linaro.org> wrote:
> >
> > Hi Joe,
> >
> > On Wed, Jan 26, 2022 at 02:48:14PM -0800, Joe Damato wrote:
> > > Greetings:
> > >
> > > This series adds some stat counters for the page_pool allocation path which
> > > help to track:
> > >
> > >       - fast path allocations
> > >       - slow path order-0 allocations
> > >       - slow path high order allocations
> > >       - refills which failed due to an empty ptr ring, forcing a slow
> > >         path allocation
> > >       - allocations fulfilled via successful refill
> > >       - pages which cannot be added to the cache because of numa mismatch
> > >         (i.e. waived)
> > >
> >
> > Thanks for the patch.  Stats are something that's indeed missing from the
> > API.  The patch  should work for Rx based allocations (which is what you
> > currently cover),  since the RX side is usually protected by NAPI.  However
> > we've added a few features recently,  which we would like to have stats on.
> 
> Thanks for taking a look at the patch.
> 

yw

> > commit 6a5bcd84e886("page_pool: Allow drivers to hint on SKB recycling"),
> > introduces recycling capabilities on the API.  I think it would be far more
> > interesting to be able to extend the statistics to recycled/non-recycled
> > packets as well in the future.
> 
> I agree. Tracking recycling events would be both helpful and
> interesting, indeed.
> 
> > But the recycling is asynchronous and we
> > can't add locks just for the sake of accurate statistics.
> 
> Agreed.
> 
> > Can we instead
> > convert that to a per-cpu structure for producers?
> 
> If my understanding of your proposal is accurate, moving the stats
> structure to a per-cpu structure (instead of per-pool) would add
> ambiguity as to the performance of a specific driver's page pool. In
> exchange for the ambiguity, though, we'd get stats for additional
> events, which could be interesting.

I was mostly thinking per pool, using 'struct percpu_counter' or
allocating __percpu variables, but I haven't really checked if that's doable or
which of those is better suited for our case.
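
As a rough sketch of what the __percpu option could look like (the
structure, field and helper names below are made up for illustration and
are not from this series):

#include <linux/percpu.h>

/* Illustrative recycle counters, kept per CPU and per pool so that the
 * asynchronous recycling producers never need a lock.
 */
struct page_pool_recycle_stats {
	u64 recycled;
	u64 released;
};

/* At pool creation time (error handling omitted):
 *	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
 *
 * Producer side, e.g. in the recycle path, lock-free:
 *	this_cpu_inc(pool->recycle_stats->recycled);
 */

/* Reader side: fold the per-CPU values into a single total. */
static u64 page_pool_recycled_total(struct page_pool_recycle_stats __percpu *stats)
{
	u64 total = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		total += per_cpu_ptr(stats, cpu)->recycled;

	return total;
}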

> 
> It seems like under load it might be very useful to know that a
> particular driver's page pool is adding pressure to the buddy
> allocator in the slow path. I suppose that a user could move softirqs
> around on their system to alleviate some of the ambiguity and perhaps
> that is good enough.
> 

[...]

Cheers
/Ilias


* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-29 14:07     ` Ilias Apalodimas
@ 2022-01-29 18:07       ` Joe Damato
  0 siblings, 0 replies; 17+ messages in thread
From: Joe Damato @ 2022-01-29 18:07 UTC (permalink / raw)
  To: Ilias Apalodimas; +Cc: netdev, kuba, davem, hawk

On Sat, Jan 29, 2022 at 6:07 AM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Hi Joe!
>
> On Thu, Jan 27, 2022 at 03:55:03PM -0800, Joe Damato wrote:
> > On Thu, Jan 27, 2022 at 1:08 AM Ilias Apalodimas
> > <ilias.apalodimas@linaro.org> wrote:
> > >
> > > Hi Joe,
> > >
> > > On Wed, Jan 26, 2022 at 02:48:14PM -0800, Joe Damato wrote:
> > > > Greetings:
> > > >
> > > > This series adds some stat counters for the page_pool allocation path which
> > > > help to track:
> > > >
> > > >       - fast path allocations
> > > >       - slow path order-0 allocations
> > > >       - slow path high order allocations
> > > >       - refills which failed due to an empty ptr ring, forcing a slow
> > > >         path allocation
> > > >       - allocations fulfilled via successful refill
> > > >       - pages which cannot be added to the cache because of numa mismatch
> > > >         (i.e. waived)
> > > >
> > >
> > > Thanks for the patch.  Stats are something that's indeed missing from the
> > > API.  The patch  should work for Rx based allocations (which is what you
> > > currently cover),  since the RX side is usually protected by NAPI.  However
> > > we've added a few features recently,  which we would like to have stats on.
> >
> > Thanks for taking a look at the patch.
> >
>
> yw
>
> > > commit 6a5bcd84e886("page_pool: Allow drivers to hint on SKB recycling"),
> > > introduces recycling capabilities on the API.  I think it would be far more
> > > interesting to be able to extend the statistics to recycled/non-recycled
> > > packets as well in the future.
> >
> > I agree. Tracking recycling events would be both helpful and
> > interesting, indeed.
> >
> > > But the recycling is asynchronous and we
> > > can't add locks just for the sake of accurate statistics.
> >
> > Agreed.
> >
> > > Can we instead
> > > convert that to a per-cpu structure for producers?
> >
> > If my understanding of your proposal is accurate, moving the stats
> > structure to a per-cpu structure (instead of per-pool) would add
> > ambiguity as to the performance of a specific driver's page pool. In
> > exchange for the ambiguity, though, we'd get stats for additional
> > events, which could be interesting.
>
> I was mostly thinking per pool using with 'struct percpu_counter' or
> allocate __percpu variables,  but I haven't really checked if that's doable or
> which of those is better suited for our case.

I wrote up a v2 last night that allocates and exports a
page_pool_stats structure per cpu (but not per pool). The data can be
accessed by users in the file /proc/net/page_pool_stats. The approach
is similar to the way softnet_stat is implemented.
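
A softnet_stat-style reader for such a file could be sketched roughly as
follows (the per-CPU variable, the exported fields and the registration
details here are assumptions for illustration, not the actual v2 code):

#include <linux/percpu.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <net/net_namespace.h>
#include <net/page_pool.h>

/* Assumed global per-CPU stats block, analogous to softnet_data. */
static DEFINE_PER_CPU(struct page_pool_stats, pp_stats);

static int page_pool_stats_show(struct seq_file *seq, void *v)
{
	int cpu;

	/* One line per CPU, aggregated across all pools on that CPU. */
	for_each_possible_cpu(cpu) {
		struct page_pool_stats *s = &per_cpu(pp_stats, cpu);

		seq_printf(seq, "cpu%d: fast=%llu slow=%llu empty=%llu\n",
			   cpu,
			   (unsigned long long)s->alloc.fast,
			   (unsigned long long)s->alloc.slow,
			   (unsigned long long)s->alloc.empty);
	}
	return 0;
}

static int __init page_pool_stats_proc_init(void)
{
	/* Register /proc/net/page_pool_stats in the initial netns. */
	if (!proc_create_single("page_pool_stats", 0444,
				init_net.proc_net, page_pool_stats_show))
		return -ENOMEM;
	return 0;
}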

The main advantage with this approach is that no driver modifications
are needed and no additional APIs are exposed that will need to be
maintained. Adding new stats in the future would be much simpler with
this approach. I've also moved all the code behind a kernel config
flag so users can opt-in to get these stats.

I'll send the v2 shortly.

Thanks,
Joe


* Re: [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters
  2022-01-27 21:08   ` Joe Damato
@ 2022-02-02 14:04     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2022-02-02 14:04 UTC (permalink / raw)
  To: Joe Damato, Jesper Dangaard Brouer
  Cc: brouer, netdev, kuba, davem, ilias.apalodimas, hawk

On 27/01/2022 22.08, Joe Damato wrote:
> On Thu, Jan 27, 2022 at 12:51 AM Jesper Dangaard Brouer
> <jbrouer@redhat.com> wrote:
>>
>> On 26/01/2022 23.48, Joe Damato wrote:
>>> Greetings:
>>>
>>> This series adds some stat counters for the page_pool allocation path which
>>> help to track:
>>>
>>>        - fast path allocations
>>>        - slow path order-0 allocations
>>>        - slow path high order allocations
>>>        - refills which failed due to an empty ptr ring, forcing a slow
>>>          path allocation
>>>        - allocations fulfilled via successful refill
>>>        - pages which cannot be added to the cache because of numa mismatch
>>>          (i.e. waived)
>>>
>>> Some static inline wrappers are provided for accessing these stats. The
>>> intention is that drivers which use the page_pool API can, if they choose,
>>> use this stats API.
>>
>> You are adding (always on) counters to a critical fast-path, that
>> drivers uses for the XDP_DROP use-case.
> 
> If you prefer requiring users explicitly enable these stats, I am
> happy to add a kernel config option (e.g. CONFIG_PAGE_POOL_DEBUG or
> similar) in a v2.
> 
>> I want to see performance measurements as documentation, showing this is
>> not causing a slow-down.
>>
>> I have some performance tests here[1]:
>>    [1]
>> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/lib
>>
>> Look at:
>>    - bench_page_pool_simple.c and
>>    - bench_page_pool_cross_cpu.c
>>
>> How to use + build this[2]:
>>    [2]
>> https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html
> 
> Thanks for the pointers to the benchmarks.
> 
> In general, I noted that the benchmark results varied fairly
> substantially between repeated runs on the same system.
> 
> Results below suggest that:
>     -  bench_page_pool_simple is faster on the test kernel, and
>     - bench_page_pool_cross_cpu faster on the control
> 
> Subsequent runs of bench_page_pool_cross_cpu on the control, however,
> reveal *much* slower results than shown below.
> 
> Test system:
>    - 2 x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>    - 2 numa zones, with 18 cores per zone and 2 threads per core
> 
> Control kernel: built from net-next at commit e2cf07654efb ("ptp:
> replace snprintf with sysfs_emit").
> Test kernel: This series applied on top of control kernel mentioned above.
> 
> Raw output from dmesg for control [1] and test [2] kernel summarized below:
> 
> bench_page_pool_simple
>    - run with default options (i.e. "sudo mod probe bench_page_pool_simple").
> 
> Control:
> 
> Type:for_loop Per elem: 0 cycles(tsc) 0.334 ns (step:0)
> Type:atomic_inc Per elem: 13 cycles(tsc) 6.021 ns (step:0)
> Type:lock Per elem: 31 cycles(tsc) 13.514 ns (step:0)
> 
> Type:no-softirq-page_pool01 Per elem: 44 cycles(tsc) 19.549 ns (step:0)
> Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 19.658 ns (step:0)
> Type:no-softirq-page_pool03 Per elem: 118 cycles(tsc) 51.638 ns (step:0)
> 
> Type:tasklet_page_pool01_fast_path Per elem: 17 cycles(tsc) 7.472 ns (step:0)
> Type:tasklet_page_pool02_ptr_ring Per elem: 42 cycles(tsc) 18.585 ns (step:0)
> Type:tasklet_page_pool03_slow Per elem: 109 cycles(tsc) 47.807 ns (step:0)
> 
> Test:
> 
> Type:for_loop Per elem: 0 cycles(tsc) 0.334 ns (step:0)
> Type:atomic_inc Per elem: 14 cycles(tsc) 6.195 ns (step:0)
> Type:lock Per elem: 31 cycles(tsc) 13.827 ns (step:0)
> 
> Type:no-softirq-page_pool01 Per elem: 44 cycles(tsc) 19.561 ns (step:0)
> Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 19.700 ns (step:0)
> Type:no-softirq-page_pool03 Per elem: 108 cycles(tsc) 47.186 ns (step:0)
> 
> Type:tasklet_page_pool01_fast_path Per elem: 12 cycles(tsc) 5.447 ns (step:0)

Watch out for the "measurement period time:0.054478253" (taken from [2]).

If the measurement period becomes too small, you/we cannot use the
results.  Perhaps I've set the default 'loops' variable too low for
these modern systems.  Hint: it is adjustable via the module parameter
'loops'.



> Type:tasklet_page_pool02_ptr_ring Per elem: 42 cycles(tsc) 18.501 ns (step:0)
> Type:tasklet_page_pool03_slow Per elem: 106 cycles(tsc) 46.313 ns (step:0)
> 
> bench_page_pool_cross_cpu
>    - run with default options (i.e. "sudo mod probe bench_page_pool_cross_cpu").
> 
> Control:
> Type:page_pool_cross_cpu CPU(0) 1795 cycles(tsc) 782.567 ns (step:2)
> Type:page_pool_cross_cpu CPU(1) 1921 cycles(tsc) 837.435 ns (step:2)
> Type:page_pool_cross_cpu CPU(2) 960 cycles(tsc) 418.758 ns (step:2)
> Sum Type:page_pool_cross_cpu Average: 1558 cycles(tsc) CPUs:3 step:2
> 
> Test:
> Type:page_pool_cross_cpu CPU(0) 2411 cycles(tsc) 1051.037 ns (step:2)
> Type:page_pool_cross_cpu CPU(1) 2467 cycles(tsc) 1075.204 ns (step:2)
> Type:page_pool_cross_cpu CPU(2) 1233 cycles(tsc) 537.629 ns (step:2)
> Type:page_pool_cross_cpu Average: 2037 cycles(tsc) CPUs:3 step:2

I think the effect you are seeing here is because you placed your stats
struct on a cache-line that is also used by remote CPUs
freeing/recycling pages back to the page_pool.


> 
> [1]: https://gist.githubusercontent.com/jdamato-fsly/385806f06cb95c61ff8cecf7a3645e75/raw/886e3208f5b9c47abdd59bdaa7ecf27994f477b1/page_pool_bench_control
> [2]: https://gist.githubusercontent.com/jdamato-fsly/385806f06cb95c61ff8cecf7a3645e75/raw/886e3208f5b9c47abdd59bdaa7ecf27994f477b1/page_pool_bench_TESTKERNEL
> 
> 
>>> It assumed that the API consumer will ensure the page_pool is not destroyed
>>> during calls to the stats API.
>>>
>>> If this series is accepted, I'll submit a follow up patch which will export
>>> these stats per RX-ring via ethtool in a driver which uses the page_pool
>>> API.
>>>
>>> Joe Damato (6):
>>>     net: page_pool: Add alloc stats and fast path stat
>>>     net: page_pool: Add a stat for the slow alloc path
>>>     net: page_pool: Add a high order alloc stat
>>>     net: page_pool: Add stat tracking empty ring
>>>     net: page_pool: Add stat tracking cache refills.
>>>     net: page_pool: Add a stat tracking waived pages.
>>>
>>>    include/net/page_pool.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>    net/core/page_pool.c    | 15 +++++++--
>>>    2 files changed, 94 insertions(+), 3 deletions(-)
>>>
>>
> 



* Re: [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat
  2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
  2022-01-27 16:32   ` Jakub Kicinski
@ 2022-02-02 14:14   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 17+ messages in thread
From: Jesper Dangaard Brouer @ 2022-02-02 14:14 UTC (permalink / raw)
  To: Joe Damato, netdev
  Cc: brouer, kuba, davem, ilias.apalodimas, hawk, Tariq Toukan,
	Saeed Mahameed


On 26/01/2022 23.48, Joe Damato wrote:
 > @@ -86,6 +100,7 @@ struct page_pool_params {
 >
 >   struct page_pool {
 >   	struct page_pool_params p;
 > +	struct page_pool_stats ps;
 >
 >   	struct delayed_work release_dw;
The placement of page_pool_stats is problematic because of cache-line
sharing.

It will be sharing a cache-line with page_pool_params, which is
read-mostly.  As I said in the benchmark email, this is likely the
performance regression you saw.

I *do* know you have changed the location (to percpu) in newer patch
versions, but I think it is important to point out that we have to be
very careful about cache-line placement, and about which parts of
'struct page_pool' are accessed by the RX code versus the
DMA-TX-completion "return" code.
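
For illustration of the idiom only (a generic sketch, not something
proposed in this thread), writer-hot counters are usually pushed onto
their own cache-line with an explicit alignment annotation:

struct page_pool {
	struct page_pool_params p;	/* read-mostly after init */

	/* Start the writer-hot counters on their own cache line so that
	 * updating them does not invalidate the line holding the
	 * read-mostly parameters above.
	 */
	struct page_pool_stats ps ____cacheline_aligned_in_smp;

	struct delayed_work release_dw;
	void (*disconnect)(void *);
	/* ... rest of the struct unchanged ... */
};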

--Jesper



end of thread, other threads:[~2022-02-02 14:14 UTC | newest]

Thread overview: 17+ messages
2022-01-26 22:48 [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Joe Damato
2022-01-26 22:48 ` [PATCH 1/6] net: page_pool: Add alloc stats and fast path stat Joe Damato
2022-01-27 16:32   ` Jakub Kicinski
2022-01-27 21:11     ` Joe Damato
2022-02-02 14:14   ` Jesper Dangaard Brouer
2022-01-26 22:48 ` [PATCH net-next 2/6] net: page_pool: Add a stat for the slow alloc path Joe Damato
2022-01-26 22:48 ` [PATCH net-next 3/6] net: page_pool: Add a high order alloc stat Joe Damato
2022-01-26 22:48 ` [PATCH net-next 4/6] net: page_pool: Add stat tracking empty ring Joe Damato
2022-01-26 22:48 ` [PATCH net-next 5/6] net: page_pool: Add stat tracking cache refills Joe Damato
2022-01-26 22:48 ` [PATCH 6/6] net: page_pool: Add a stat tracking waived pages Joe Damato
2022-01-27  8:51 ` [PATCH net-next 0/6] net: page_pool: Add page_pool stat counters Jesper Dangaard Brouer
2022-01-27 21:08   ` Joe Damato
2022-02-02 14:04     ` Jesper Dangaard Brouer
2022-01-27  9:08 ` Ilias Apalodimas
2022-01-27 23:55   ` Joe Damato
2022-01-29 14:07     ` Ilias Apalodimas
2022-01-29 18:07       ` Joe Damato
