* [PATCH v3 net-next 0/5] xdp: introduce bulking for page_pool tx return path
@ 2020-11-04 10:22 Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp " Lorenzo Bianconi
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 10:22 UTC (permalink / raw)
  To: netdev; +Cc: bpf, lorenzo.bianconi, davem, kuba, brouer, ilias.apalodimas

The XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk; this improves
I-cache and D-cache usage, since xdp_return_frame is usually
run inside the driver NAPI tx completion loop.
Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs.
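
A minimal sketch of the intended caller-side pattern (illustrative
only: the function and variable names below are hypothetical, the real
conversions are in patches 3-5):

	/* hypothetical NAPI tx completion helper */
	static void drv_xdp_tx_complete(struct xdp_frame **frames, int num)
	{
		struct xdp_frame_bulk bq;
		int i;

		bq.xa = NULL;	/* lazily initialized on first return */

		for (i = 0; i < num; i++)
			xdp_return_frame_bulk(frames[i], &bq);

		/* return any frames still queued in the bulk struct */
		xdp_flush_frame_bulk(&bq);
	}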

Changes since v2:
- move the mvneta changes into a dedicated patch

Changes since v1:
- improve comments
- rework xdp_return_frame_bulk routine logic
- move the count and xa fields to the beginning of the xdp_frame_bulk struct
- invert the logic in the page_pool_put_page_bulk for-loop

Lorenzo Bianconi (5):
  net: xdp: introduce bulking for xdp tx return path
  net: page_pool: add bulk support for ptr_ring
  net: mvneta: add xdp tx return bulking support
  net: mvpp2: add xdp tx return bulking support
  net: mlx5: add xdp tx return bulking support

 drivers/net/ethernet/marvell/mvneta.c         |  5 +-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  5 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  5 +-
 include/net/page_pool.h                       | 26 +++++++++
 include/net/xdp.h                             |  9 +++
 net/core/page_pool.c                          | 35 ++++++++++++
 net/core/xdp.c                                | 56 +++++++++++++++++++
 7 files changed, 138 insertions(+), 3 deletions(-)

-- 
2.26.2


* [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp tx return path
  2020-11-04 10:22 [PATCH v3 net-next 0/5] xdp: introduce bulking for page_pool tx return path Lorenzo Bianconi
@ 2020-11-04 10:22 ` Lorenzo Bianconi
  2020-11-04 11:11   ` Jesper Dangaard Brouer
  2020-11-04 10:22 ` [PATCH v3 net-next 2/5] net: page_pool: add bulk support for ptr_ring Lorenzo Bianconi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 10:22 UTC (permalink / raw)
  To: netdev; +Cc: bpf, lorenzo.bianconi, davem, kuba, brouer, ilias.apalodimas

XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h |  9 +++++++
 net/core/xdp.c    | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3814fb631d52..a1f48a73e6df 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -104,6 +104,12 @@ struct xdp_frame {
 	struct net_device *dev_rx; /* used by cpumap */
 };
 
+#define XDP_BULK_QUEUE_SIZE	16
+struct xdp_frame_bulk {
+	int count;
+	void *xa;
+	void *q[XDP_BULK_QUEUE_SIZE];
+};
 
 static inline struct skb_shared_info *
 xdp_get_shared_info_from_frame(struct xdp_frame *frame)
@@ -194,6 +200,9 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
 void xdp_return_frame(struct xdp_frame *xdpf);
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
+void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
+void xdp_return_frame_bulk(struct xdp_frame *xdpf,
+			   struct xdp_frame_bulk *bq);
 
 /* When sending xdp_frame into the network stack, then there is no
  * return point callback, which is needed to release e.g. DMA-mapping
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 48aba933a5a8..66ac275a0360 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -380,6 +380,67 @@ void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
 
+/* XDP bulk APIs introduce a defer/flush mechanism to return
+ * pages belonging to the same xdp_mem_allocator object
+ * (identified via the mem.id field) in bulk to optimize
+ * I-cache and D-cache.
+ * The bulk queue size is set to 16 to be aligned to how
+ * XDP_REDIRECT bulking works. The bulk is flushed when
+ * it is full or when mem.id changes.
+ * xdp_frame_bulk is usually stored/allocated on the function
+ * call-stack to avoid locking penalties.
+ */
+void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
+{
+	struct xdp_mem_allocator *xa = bq->xa;
+	int i;
+
+	if (unlikely(!xa))
+		return;
+
+	for (i = 0; i < bq->count; i++) {
+		struct page *page = virt_to_head_page(bq->q[i]);
+
+		page_pool_put_full_page(xa->page_pool, page, false);
+	}
+	bq->count = 0;
+}
+EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
+
+void xdp_return_frame_bulk(struct xdp_frame *xdpf,
+			   struct xdp_frame_bulk *bq)
+{
+	struct xdp_mem_info *mem = &xdpf->mem;
+	struct xdp_mem_allocator *xa;
+
+	if (mem->type != MEM_TYPE_PAGE_POOL) {
+		__xdp_return(xdpf->data, &xdpf->mem, false);
+		return;
+	}
+
+	rcu_read_lock();
+
+	xa = bq->xa;
+	if (unlikely(!xa)) {
+		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
+		bq->count = 0;
+		bq->xa = xa;
+	}
+
+	if (bq->count == XDP_BULK_QUEUE_SIZE)
+		xdp_flush_frame_bulk(bq);
+
+	if (mem->id != xa->mem.id) {
+		xdp_flush_frame_bulk(bq);
+		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
+	}
+
+	bq->q[bq->count++] = xdpf->data;
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
+
 void xdp_return_buff(struct xdp_buff *xdp)
 {
 	__xdp_return(xdp->data, &xdp->rxq->mem, true);
-- 
2.26.2


* [PATCH v3 net-next 2/5] net: page_pool: add bulk support for ptr_ring
  2020-11-04 10:22 [PATCH v3 net-next 0/5] xdp: introduce bulking for page_pool tx return path Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp " Lorenzo Bianconi
@ 2020-11-04 10:22 ` Lorenzo Bianconi
  2020-11-04 12:24   ` Jesper Dangaard Brouer
  2020-11-04 10:22 ` [PATCH v3 net-next 3/5] net: mvneta: add xdp tx return bulking support Lorenzo Bianconi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 10:22 UTC (permalink / raw)
  To: netdev; +Cc: bpf, lorenzo.bianconi, davem, kuba, brouer, ilias.apalodimas

Introduce the capability to batch page_pool ptr_ring refills, since the
return path is usually run inside the driver NAPI tx completion loop.
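
To illustrate what the bulk helper changes for a caller, here is a
hedged sketch (not taken from the patch below; pool, pages[], data[]
and n are hypothetical):

	/* per-page return: the ptr_ring producer lock may be taken once
	 * per page
	 */
	for (i = 0; i < n; i++)
		page_pool_put_full_page(pool, pages[i], false);

	/* bulk return: pages are first filtered for recyclability, then
	 * the producer lock is taken once for the whole batch; data[]
	 * holds page virtual addresses (as in xdp_frame_bulk->q)
	 */
	page_pool_put_page_bulk(pool, data, n);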

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/page_pool.h | 26 ++++++++++++++++++++++++++
 net/core/page_pool.c    | 35 +++++++++++++++++++++++++++++++++++
 net/core/xdp.c          |  9 ++-------
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 81d7773f96cd..b5b195305346 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -152,6 +152,8 @@ struct page_pool *page_pool_create(const struct page_pool_params *params);
 void page_pool_destroy(struct page_pool *pool);
 void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *));
 void page_pool_release_page(struct page_pool *pool, struct page *page);
+void page_pool_put_page_bulk(struct page_pool *pool, void **data,
+			     int count);
 #else
 static inline void page_pool_destroy(struct page_pool *pool)
 {
@@ -165,6 +167,11 @@ static inline void page_pool_release_page(struct page_pool *pool,
 					  struct page *page)
 {
 }
+
+static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data,
+					   int count)
+{
+}
 #endif
 
 void page_pool_put_page(struct page_pool *pool, struct page *page,
@@ -215,4 +222,23 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
 	if (unlikely(pool->p.nid != new_nid))
 		page_pool_update_nid(pool, new_nid);
 }
+
+static inline void page_pool_ring_lock(struct page_pool *pool)
+	__acquires(&pool->ring.producer_lock)
+{
+	if (in_serving_softirq())
+		spin_lock(&pool->ring.producer_lock);
+	else
+		spin_lock_bh(&pool->ring.producer_lock);
+}
+
+static inline void page_pool_ring_unlock(struct page_pool *pool)
+	__releases(&pool->ring.producer_lock)
+{
+	if (in_serving_softirq())
+		spin_unlock(&pool->ring.producer_lock);
+	else
+		spin_unlock_bh(&pool->ring.producer_lock);
+}
+
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ef98372facf6..236c5ed3aa66 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -11,6 +11,8 @@
 #include <linux/device.h>
 
 #include <net/page_pool.h>
+#include <net/xdp.h>
+
 #include <linux/dma-direction.h>
 #include <linux/dma-mapping.h>
 #include <linux/page-flags.h>
@@ -408,6 +410,39 @@ void page_pool_put_page(struct page_pool *pool, struct page *page,
 }
 EXPORT_SYMBOL(page_pool_put_page);
 
+void page_pool_put_page_bulk(struct page_pool *pool, void **data,
+			     int count)
+{
+	int i, len = 0;
+
+	for (i = 0; i < count; i++) {
+		struct page *page = virt_to_head_page(data[i]);
+
+		if (likely(page_ref_count(page) == 1 &&
+			   pool_page_reusable(pool, page))) {
+			if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+				page_pool_dma_sync_for_device(pool, page, -1);
+
+			/* bulk pages for ptr_ring cache */
+			data[len++] = page;
+		} else {
+			page_pool_release_page(pool, page);
+			put_page(page);
+		}
+	}
+
+	/* Grab the producer spinlock for concurrent access to
+	 * ptr_ring page_pool cache
+	 */
+	page_pool_ring_lock(pool);
+	for (i = 0; i < len; i++) {
+		if (__ptr_ring_produce(&pool->ring, data[i]))
+			page_pool_return_page(pool, data[i]);
+	}
+	page_pool_ring_unlock(pool);
+}
+EXPORT_SYMBOL(page_pool_put_page_bulk);
+
 static void page_pool_empty_ring(struct page_pool *pool)
 {
 	struct page *page;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 66ac275a0360..ff7c801bd40c 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -393,16 +393,11 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
 void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
 {
 	struct xdp_mem_allocator *xa = bq->xa;
-	int i;
 
-	if (unlikely(!xa))
+	if (unlikely(!xa || !bq->count))
 		return;
 
-	for (i = 0; i < bq->count; i++) {
-		struct page *page = virt_to_head_page(bq->q[i]);
-
-		page_pool_put_full_page(xa->page_pool, page, false);
-	}
+	page_pool_put_page_bulk(xa->page_pool, bq->q, bq->count);
 	bq->count = 0;
 }
 EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
-- 
2.26.2


* [PATCH v3 net-next 3/5] net: mvneta: add xdp tx return bulking support
  2020-11-04 10:22 [PATCH v3 net-next 0/5] xdp: introduce bulking for page_pool tx return path Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp " Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 2/5] net: page_pool: add bulk support for ptr_ring Lorenzo Bianconi
@ 2020-11-04 10:22 ` Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 4/5] net: mvpp2: " Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 5/5] net: mlx5: " Lorenzo Bianconi
  4 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 10:22 UTC (permalink / raw)
  To: netdev; +Cc: bpf, lorenzo.bianconi, davem, kuba, brouer, ilias.apalodimas

Convert mvneta driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 275Kpps
XDP_REDIRECT (upstream codepath + bulking APIs): 284Kpps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 54b0bf574c05..43ab8a73900e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1834,8 +1834,10 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
 				 struct netdev_queue *nq, bool napi)
 {
 	unsigned int bytes_compl = 0, pkts_compl = 0;
+	struct xdp_frame_bulk bq;
 	int i;
 
+	bq.xa = NULL;
 	for (i = 0; i < num; i++) {
 		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_get_index];
 		struct mvneta_tx_desc *tx_desc = txq->descs +
@@ -1857,9 +1859,10 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
 			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
 				xdp_return_frame_rx_napi(buf->xdpf);
 			else
-				xdp_return_frame(buf->xdpf);
+				xdp_return_frame_bulk(buf->xdpf, &bq);
 		}
 	}
+	xdp_flush_frame_bulk(&bq);
 
 	netdev_tx_completed_queue(nq, pkts_compl, bytes_compl);
 }
-- 
2.26.2


* [PATCH v3 net-next 4/5] net: mvpp2: add xdp tx return bulking support
  2020-11-04 10:22 [PATCH v3 net-next 0/5] xdp: introduce bulking for page_pool tx return path Lorenzo Bianconi
                   ` (2 preceding siblings ...)
  2020-11-04 10:22 ` [PATCH v3 net-next 3/5] net: mvneta: add xdp tx return bulking support Lorenzo Bianconi
@ 2020-11-04 10:22 ` Lorenzo Bianconi
  2020-11-04 10:22 ` [PATCH v3 net-next 5/5] net: mlx5: " Lorenzo Bianconi
  4 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 10:22 UTC (permalink / raw)
  To: netdev; +Cc: bpf, lorenzo.bianconi, davem, kuba, brouer, ilias.apalodimas

Convert mvpp2 driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 1.79Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps

Tested-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index f6616c8933ca..04f24d1d72ab 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2440,8 +2440,10 @@ static void mvpp2_txq_bufs_free(struct mvpp2_port *port,
 				struct mvpp2_tx_queue *txq,
 				struct mvpp2_txq_pcpu *txq_pcpu, int num)
 {
+	struct xdp_frame_bulk bq;
 	int i;
 
+	bq.xa = NULL;
 	for (i = 0; i < num; i++) {
 		struct mvpp2_txq_pcpu_buf *tx_buf =
 			txq_pcpu->buffs + txq_pcpu->txq_get_index;
@@ -2454,10 +2456,11 @@ static void mvpp2_txq_bufs_free(struct mvpp2_port *port,
 			dev_kfree_skb_any(tx_buf->skb);
 		else if (tx_buf->type == MVPP2_TYPE_XDP_TX ||
 			 tx_buf->type == MVPP2_TYPE_XDP_NDO)
-			xdp_return_frame(tx_buf->xdpf);
+			xdp_return_frame_bulk(tx_buf->xdpf, &bq);
 
 		mvpp2_txq_inc_get(txq_pcpu);
 	}
+	xdp_flush_frame_bulk(&bq);
 }
 
 static inline struct mvpp2_rx_queue *mvpp2_get_rx_queue(struct mvpp2_port *port,
-- 
2.26.2


* [PATCH v3 net-next 5/5] net: mlx5: add xdp tx return bulking support
  2020-11-04 10:22 [PATCH v3 net-next 0/5] xdp: introduce bulking for page_pool tx return path Lorenzo Bianconi
                   ` (3 preceding siblings ...)
  2020-11-04 10:22 ` [PATCH v3 net-next 4/5] net: mvpp2: " Lorenzo Bianconi
@ 2020-11-04 10:22 ` Lorenzo Bianconi
  4 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 10:22 UTC (permalink / raw)
  To: netdev; +Cc: bpf, lorenzo.bianconi, davem, kuba, brouer, ilias.apalodimas

Convert mlx5 driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 8.5Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 10.1Mpps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index ae90d533a350..5fdfbf390d5c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -369,8 +369,10 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				  bool recycle)
 {
 	struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo;
+	struct xdp_frame_bulk bq;
 	u16 i;
 
+	bq.xa = NULL;
 	for (i = 0; i < wi->num_pkts; i++) {
 		struct mlx5e_xdp_info xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
 
@@ -379,7 +381,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 			/* XDP_TX from the XSK RQ and XDP_REDIRECT */
 			dma_unmap_single(sq->pdev, xdpi.frame.dma_addr,
 					 xdpi.frame.xdpf->len, DMA_TO_DEVICE);
-			xdp_return_frame(xdpi.frame.xdpf);
+			xdp_return_frame_bulk(xdpi.frame.xdpf, &bq);
 			break;
 		case MLX5E_XDP_XMIT_MODE_PAGE:
 			/* XDP_TX from the regular RQ */
@@ -393,6 +395,7 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 			WARN_ON_ONCE(true);
 		}
 	}
+	xdp_flush_frame_bulk(&bq);
 }
 
 bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
-- 
2.26.2


* Re: [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp tx return path
  2020-11-04 10:22 ` [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp " Lorenzo Bianconi
@ 2020-11-04 11:11   ` Jesper Dangaard Brouer
  2020-11-04 11:19     ` Lorenzo Bianconi
  0 siblings, 1 reply; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2020-11-04 11:11 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, bpf, lorenzo.bianconi, davem, kuba, ilias.apalodimas,
	brouer, Ioana Ciornei, Ioana Radulescu

On Wed,  4 Nov 2020 11:22:54 +0100
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> XDP bulk APIs introduce a defer/flush mechanism to return
> pages belonging to the same xdp_mem_allocator object
> (identified via the mem.id field) in bulk to optimize
> I-cache and D-cache since xdp_return_frame is usually run
> inside the driver NAPI tx completion loop.
> The bulk queue size is set to 16 to be aligned to how
> XDP_REDIRECT bulking works. The bulk is flushed when
> it is full or when mem.id changes.
> xdp_frame_bulk is usually stored/allocated on the function
> call-stack to avoid locking penalties.
> Current implementation considers only page_pool memory model.
> 
> Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h |  9 +++++++
>  net/core/xdp.c    | 61 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 3814fb631d52..a1f48a73e6df 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -104,6 +104,12 @@ struct xdp_frame {
>  	struct net_device *dev_rx; /* used by cpumap */
>  };
>  
> +#define XDP_BULK_QUEUE_SIZE	16
> +struct xdp_frame_bulk {
> +	int count;
> +	void *xa;
> +	void *q[XDP_BULK_QUEUE_SIZE];
> +};
>  
>  static inline struct skb_shared_info *
>  xdp_get_shared_info_from_frame(struct xdp_frame *frame)
> @@ -194,6 +200,9 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
>  void xdp_return_frame(struct xdp_frame *xdpf);
>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
>  void xdp_return_buff(struct xdp_buff *xdp);
> +void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
> +void xdp_return_frame_bulk(struct xdp_frame *xdpf,
> +			   struct xdp_frame_bulk *bq);
>  
>  /* When sending xdp_frame into the network stack, then there is no
>   * return point callback, which is needed to release e.g. DMA-mapping
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 48aba933a5a8..66ac275a0360 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -380,6 +380,67 @@ void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
>  
> +/* XDP bulk APIs introduce a defer/flush mechanism to return
> + * pages belonging to the same xdp_mem_allocator object
> + * (identified via the mem.id field) in bulk to optimize
> + * I-cache and D-cache.
> + * The bulk queue size is set to 16 to be aligned to how
> + * XDP_REDIRECT bulking works. The bulk is flushed when

If this is connected, then why have you not redefined DEV_MAP_BULK_SIZE?

Cc. DPAA2 maintainers as they use this define in their drivers.
You want to make sure this driver is flexible enough for future changes.

Like:

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3814fb631d52..44440a36f96f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -245,6 +245,6 @@ bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
 void xdp_attachment_setup(struct xdp_attachment_info *info,
                          struct netdev_bpf *bpf);
 
-#define DEV_MAP_BULK_SIZE 16
+#define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
 
 #endif /* __LINUX_NET_XDP_H__ */


> + * it is full or when mem.id changes.
> + * xdp_frame_bulk is usually stored/allocated on the function
> + * call-stack to avoid locking penalties.
> + */
> +void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
> +{
> +	struct xdp_mem_allocator *xa = bq->xa;
> +	int i;
> +
> +	if (unlikely(!xa))
> +		return;
> +
> +	for (i = 0; i < bq->count; i++) {
> +		struct page *page = virt_to_head_page(bq->q[i]);
> +
> +		page_pool_put_full_page(xa->page_pool, page, false);
> +	}
> +	bq->count = 0;
> +}
> +EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
> +
> +void xdp_return_frame_bulk(struct xdp_frame *xdpf,
> +			   struct xdp_frame_bulk *bq)
> +{
> +	struct xdp_mem_info *mem = &xdpf->mem;
> +	struct xdp_mem_allocator *xa;
> +
> +	if (mem->type != MEM_TYPE_PAGE_POOL) {
> +		__xdp_return(xdpf->data, &xdpf->mem, false);
> +		return;
> +	}
>

I cannot make up my mind: it would be a micro-optimization to move
this if-statement to include/net/xdp.h, but it would make the code
harder to read/follow, and the call you replace, xdp_return_frame(),
is also in xdp.c with the same call to __xdp_return().  Let's keep it
as-is (we can follow up with micro-optimizations).
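
For reference, the inline variant being weighed would look roughly like
the sketch below (hypothetical: __xdp_return_frame_bulk is an invented
name for the out-of-line bulking part, and this is not what the series
ends up doing):

	/* hypothetical include/net/xdp.h variant of the discussed
	 * micro-optimization: keep the non-page_pool path inline and only
	 * call into xdp.c for the bulking path
	 */
	static inline void xdp_return_frame_bulk(struct xdp_frame *xdpf,
						 struct xdp_frame_bulk *bq)
	{
		if (xdpf->mem.type != MEM_TYPE_PAGE_POOL) {
			xdp_return_frame(xdpf);
			return;
		}
		__xdp_return_frame_bulk(xdpf, bq); /* invented helper */
	}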


> +	rcu_read_lock();
> +
> +	xa = bq->xa;
> +	if (unlikely(!xa)) {
> +		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> +		bq->count = 0;
> +		bq->xa = xa;
> +	}
> +
> +	if (bq->count == XDP_BULK_QUEUE_SIZE)
> +		xdp_flush_frame_bulk(bq);
> +
> +	if (mem->id != xa->mem.id) {
> +		xdp_flush_frame_bulk(bq);
> +		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> +	}
> +
> +	bq->q[bq->count++] = xdpf->data;
> +
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
> +
>  void xdp_return_buff(struct xdp_buff *xdp)
>  {
>  	__xdp_return(xdp->data, &xdp->rxq->mem, true);



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp tx return path
  2020-11-04 11:11   ` Jesper Dangaard Brouer
@ 2020-11-04 11:19     ` Lorenzo Bianconi
  2020-11-04 12:28       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 11:19 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, netdev, bpf, davem, kuba, ilias.apalodimas,
	Ioana Ciornei, Ioana Radulescu

> On Wed,  4 Nov 2020 11:22:54 +0100
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 

[...]

> > +/* XDP bulk APIs introduce a defer/flush mechanism to return
> > + * pages belonging to the same xdp_mem_allocator object
> > + * (identified via the mem.id field) in bulk to optimize
> > + * I-cache and D-cache.
> > + * The bulk queue size is set to 16 to be aligned to how
> > + * XDP_REDIRECT bulking works. The bulk is flushed when
> 
> If this is connected, then why have you not redefined DEV_MAP_BULK_SIZE?
> 
> Cc. DPAA2 maintainers as they use this define in their drivers.
> You want to make sure this driver is flexible enough for future changes.
> 
> Like:
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 3814fb631d52..44440a36f96f 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -245,6 +245,6 @@ bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
>  void xdp_attachment_setup(struct xdp_attachment_info *info,
>                           struct netdev_bpf *bpf);
>  
> -#define DEV_MAP_BULK_SIZE 16
> +#define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE

my idea was to address it in a separate patch, but if you prefer I can
merge this change in v4

>  
>  #endif /* __LINUX_NET_XDP_H__ */
> 
> 
> > + * it is full or when mem.id changes.
> > + * xdp_frame_bulk is usually stored/allocated on the function
> > + * call-stack to avoid locking penalties.
> > + */
> > +void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
> > +{
> > +	struct xdp_mem_allocator *xa = bq->xa;
> > +	int i;
> > +
> > +	if (unlikely(!xa))
> > +		return;
> > +
> > +	for (i = 0; i < bq->count; i++) {
> > +		struct page *page = virt_to_head_page(bq->q[i]);
> > +
> > +		page_pool_put_full_page(xa->page_pool, page, false);
> > +	}
> > +	bq->count = 0;
> > +}
> > +EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
> > +
> > +void xdp_return_frame_bulk(struct xdp_frame *xdpf,
> > +			   struct xdp_frame_bulk *bq)
> > +{
> > +	struct xdp_mem_info *mem = &xdpf->mem;
> > +	struct xdp_mem_allocator *xa;
> > +
> > +	if (mem->type != MEM_TYPE_PAGE_POOL) {
> > +		__xdp_return(xdpf->data, &xdpf->mem, false);
> > +		return;
> > +	}
> >
> 
> I cannot make up my mind: it would be a micro-optimization to move
> this if-statement to include/net/xdp.h, but it would make the code
> harder to read/follow, and the call you replace, xdp_return_frame(),
> is also in xdp.c with the same call to __xdp_return().  Let's keep it
> as-is (we can follow up with micro-optimizations).

ack

Regards,
Lorenzo

> 
> 
> > +	rcu_read_lock();
> > +
> > +	xa = bq->xa;
> > +	if (unlikely(!xa)) {
> > +		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> > +		bq->count = 0;
> > +		bq->xa = xa;
> > +	}
> > +
> > +	if (bq->count == XDP_BULK_QUEUE_SIZE)
> > +		xdp_flush_frame_bulk(bq);
> > +
> > +	if (mem->id != xa->mem.id) {
> > +		xdp_flush_frame_bulk(bq);
> > +		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> > +	}
> > +
> > +	bq->q[bq->count++] = xdpf->data;
> > +
> > +	rcu_read_unlock();
> > +}
> > +EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
> > +
> >  void xdp_return_buff(struct xdp_buff *xdp)
> >  {
> >  	__xdp_return(xdp->data, &xdp->rxq->mem, true);
> 
> 
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 


* Re: [PATCH v3 net-next 2/5] net: page_pool: add bulk support for ptr_ring
  2020-11-04 10:22 ` [PATCH v3 net-next 2/5] net: page_pool: add bulk support for ptr_ring Lorenzo Bianconi
@ 2020-11-04 12:24   ` Jesper Dangaard Brouer
  2020-11-04 14:49     ` Lorenzo Bianconi
  0 siblings, 1 reply; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2020-11-04 12:24 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, bpf, lorenzo.bianconi, davem, kuba, ilias.apalodimas, brouer

On Wed,  4 Nov 2020 11:22:55 +0100 Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ef98372facf6..236c5ed3aa66 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
[...]
> @@ -408,6 +410,39 @@ void page_pool_put_page(struct page_pool *pool, struct page *page,
>  }
>  EXPORT_SYMBOL(page_pool_put_page);
>  
> +void page_pool_put_page_bulk(struct page_pool *pool, void **data,
> +			     int count)
> +{
> +	int i, len = 0;
> +
> +	for (i = 0; i < count; i++) {
> +		struct page *page = virt_to_head_page(data[i]);
> +
> +		if (likely(page_ref_count(page) == 1 &&
> +			   pool_page_reusable(pool, page))) {
> +			if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> +				page_pool_dma_sync_for_device(pool, page, -1);
> +
> +			/* bulk pages for ptr_ring cache */
> +			data[len++] = page;
> +		} else {
> +			page_pool_release_page(pool, page);
> +			put_page(page);
> +		}
> +	}
> +
> +	/* Grab the producer spinlock for concurrent access to
> +	 * ptr_ring page_pool cache
> +	 */
> +	page_pool_ring_lock(pool);
> +	for (i = 0; i < len; i++) {
> +		if (__ptr_ring_produce(&pool->ring, data[i]))
> +			page_pool_return_page(pool, data[i]);
> +	}
> +	page_pool_ring_unlock(pool);
> +}
> +EXPORT_SYMBOL(page_pool_put_page_bulk);

I don't like that you are replicating the core logic from
page_pool_put_page() in this function.  This means that we as
maintainers need to keep both of these places up-to-date.

Let me try to re-implement this, while sharing the refcnt logic:
(completely untested, not even compiled)

---
 net/core/page_pool.c |   58 +++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index ef98372facf6..c785e9825a0d 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -362,8 +362,9 @@ static bool pool_page_reusable(struct page_pool *pool, struct page *page)
  * If the page refcnt != 1, then the page will be returned to memory
  * subsystem.
  */
-void page_pool_put_page(struct page_pool *pool, struct page *page,
-			unsigned int dma_sync_size, bool allow_direct)
+static struct page*
+__page_pool_put_page(struct page_pool *pool, struct page *page,
+		     unsigned int dma_sync_size, bool allow_direct)
 {
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
@@ -381,13 +382,10 @@ void page_pool_put_page(struct page_pool *pool, struct page *page,
 
 		if (allow_direct && in_serving_softirq())
 			if (page_pool_recycle_in_cache(page, pool))
-				return;
+				return NULL;
 
-		if (!page_pool_recycle_in_ring(pool, page)) {
-			/* Cache full, fallback to free pages */
-			page_pool_return_page(pool, page);
-		}
-		return;
+		/* Page found as candidate for recycling */
+		return page;
 	}
 	/* Fallback/non-XDP mode: API user have elevated refcnt.
 	 *
@@ -405,9 +403,53 @@ void page_pool_put_page(struct page_pool *pool, struct page *page,
 	/* Do not replace this with page_pool_return_page() */
 	page_pool_release_page(pool, page);
 	put_page(page);
+	return NULL;
+}
+
+void page_pool_put_page(struct page_pool *pool, struct page *page,
+			unsigned int dma_sync_size, bool allow_direct)
+{
+	page = __page_pool_put_page(pool, page, dma_sync_size, allow_direct);
+
+	if (page && !page_pool_recycle_in_ring(pool, page)) {
+		/* Cache full, fallback to free pages */
+		page_pool_return_page(pool, page);
+	}
 }
 EXPORT_SYMBOL(page_pool_put_page);
 
+/* Caller must not use data area after call, as this function overwrites it */
+void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count)
+{
+	int i, len = 0, len2 = 0;
+
+	for (i = 0; i < count; i++) {
+		struct page *page = virt_to_head_page(data[i]);
+
+		page = __page_pool_put_page(pool, page, -1 , false);
+
+		/* Approved for recycling for ptr_ring cache */
+		if (page)
+			data[len++] = page;
+	}
+
+	/* Bulk producer into ptr_ring page_pool cache */
+	page_pool_ring_lock(pool);
+	for (i = 0; i < len; i++) {
+		if (__ptr_ring_produce(&pool->ring, data[i]))
+			data[len2++] = data[i];
+	}
+	page_pool_ring_unlock(pool);
+
+	/* Unlikely case of ptr_ring cache full, free pages outside producer
+	 * lock, given put_page() with refcnt==1 can be an expensive operation.
+	 */
+	for (i = 0; i < len2; i++) {
+		page_pool_return_page(pool, data[i]);
+	}
+}
+EXPORT_SYMBOL(page_pool_put_page_bulk);
+
 static void page_pool_empty_ring(struct page_pool *pool)
 {
 	struct page *page;


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp tx return path
  2020-11-04 11:19     ` Lorenzo Bianconi
@ 2020-11-04 12:28       ` Jesper Dangaard Brouer
  2020-11-04 12:53         ` Lorenzo Bianconi
  0 siblings, 1 reply; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2020-11-04 12:28 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, netdev, bpf, davem, kuba, ilias.apalodimas,
	Ioana Ciornei, Ioana Radulescu, brouer

On Wed, 4 Nov 2020 12:19:02 +0100
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:

> > On Wed,  4 Nov 2020 11:22:54 +0100
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >   
> 
> [...]
> 
> > > +/* XDP bulk APIs introduce a defer/flush mechanism to return
> > > + * pages belonging to the same xdp_mem_allocator object
> > > + * (identified via the mem.id field) in bulk to optimize
> > > + * I-cache and D-cache.
> > > + * The bulk queue size is set to 16 to be aligned to how
> > > + * XDP_REDIRECT bulking works. The bulk is flushed when  
> > 
> > If this is connected, then why have you not redefined DEV_MAP_BULK_SIZE?
> > 
> > Cc. DPAA2 maintainers as they use this define in their drivers.
> > You want to make sure this driver is flexible enough for future changes.
> > 
> > Like:
> > 
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 3814fb631d52..44440a36f96f 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -245,6 +245,6 @@ bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
> >  void xdp_attachment_setup(struct xdp_attachment_info *info,
> >                           struct netdev_bpf *bpf);
> >  
> > -#define DEV_MAP_BULK_SIZE 16
> > +#define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE  
> 
> my idea was to address it in a separate patch, but if you prefer I can
> merge this change in v4

Please merge in V4.  As this patch contains the explanation, and we
want to avoid too much churn (remember our colleagues need to backport
and review this).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v3 net-next 1/5] net: xdp: introduce bulking for xdp tx return path
  2020-11-04 12:28       ` Jesper Dangaard Brouer
@ 2020-11-04 12:53         ` Lorenzo Bianconi
  0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 12:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, netdev, bpf, davem, kuba, ilias.apalodimas,
	Ioana Ciornei, Ioana Radulescu

> On Wed, 4 Nov 2020 12:19:02 +0100
> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> 
> > > On Wed,  4 Nov 2020 11:22:54 +0100
> > > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > >   
> > 
> > [...]
> > 
> > > > +/* XDP bulk APIs introduce a defer/flush mechanism to return
> > > > + * pages belonging to the same xdp_mem_allocator object
> > > > + * (identified via the mem.id field) in bulk to optimize
> > > > + * I-cache and D-cache.
> > > > + * The bulk queue size is set to 16 to be aligned to how
> > > > + * XDP_REDIRECT bulking works. The bulk is flushed when  
> > > 
> > > If this is connected, then why have you not redefined DEV_MAP_BULK_SIZE?
> > > 
> > > Cc. DPAA2 maintainers as they use this define in their drivers.
> > > You want to make sure this driver is flexible enough for future changes.
> > > 
> > > Like:
> > > 
> > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > index 3814fb631d52..44440a36f96f 100644
> > > --- a/include/net/xdp.h
> > > +++ b/include/net/xdp.h
> > > @@ -245,6 +245,6 @@ bool xdp_attachment_flags_ok(struct xdp_attachment_info *info,
> > >  void xdp_attachment_setup(struct xdp_attachment_info *info,
> > >                           struct netdev_bpf *bpf);
> > >  
> > > -#define DEV_MAP_BULK_SIZE 16
> > > +#define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE  
> > 
> > my idea was to address it in a separate patch, but if you prefer I can
> > merge this change in v4

sure, will do in v4.

Regards,
Lorenzo

> 
> Please merge in V4.  As this patch contains the explanation, and we
> want to avoid too much churn (remember our colleagues need to backport
> and review this).
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 


* Re: [PATCH v3 net-next 2/5] net: page_pool: add bulk support for ptr_ring
  2020-11-04 12:24   ` Jesper Dangaard Brouer
@ 2020-11-04 14:49     ` Lorenzo Bianconi
  0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Bianconi @ 2020-11-04 14:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, netdev, bpf, davem, kuba, ilias.apalodimas

> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index ef98372facf6..236c5ed3aa66 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> [...]
> > @@ -408,6 +410,39 @@ void page_pool_put_page(struct page_pool *pool, struct page *page,
> >  }
> >  EXPORT_SYMBOL(page_pool_put_page);
> >  
> > +void page_pool_put_page_bulk(struct page_pool *pool, void **data,
> > +			     int count)
> > +{
> > +	int i, len = 0;
> > +
> > +	for (i = 0; i < count; i++) {
> > +		struct page *page = virt_to_head_page(data[i]);
> > +
> > +		if (likely(page_ref_count(page) == 1 &&
> > +			   pool_page_reusable(pool, page))) {
> > +			if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> > +				page_pool_dma_sync_for_device(pool, page, -1);
> > +
> > +			/* bulk pages for ptr_ring cache */
> > +			data[len++] = page;
> > +		} else {
> > +			page_pool_release_page(pool, page);
> > +			put_page(page);
> > +		}
> > +	}
> > +
> > +	/* Grab the producer spinlock for concurrent access to
> > +	 * ptr_ring page_pool cache
> > +	 */
> > +	page_pool_ring_lock(pool);
> > +	for (i = 0; i < len; i++) {
> > +		if (__ptr_ring_produce(&pool->ring, data[i]))
> > +			page_pool_return_page(pool, data[i]);
> > +	}
> > +	page_pool_ring_unlock(pool);
> > +}
> > +EXPORT_SYMBOL(page_pool_put_page_bulk);
> 
> I don't like that you are replicating the core logic from
> page_pool_put_page() in this function.  This means that we as
> maintainers need to keep both of these places up-to-date.
> 
> Let me try to re-implement this, while sharing the refcnt logic:
> (completely untested, not even compiled)

ack, I like the approach below, I will integrate it in v4

> 
> ---
>  net/core/page_pool.c |   58 +++++++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 51 insertions(+), 9 deletions(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ef98372facf6..c785e9825a0d 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -362,8 +362,9 @@ static bool pool_page_reusable(struct page_pool *pool, struct page *page)
>   * If the page refcnt != 1, then the page will be returned to memory
>   * subsystem.

[...]

>  
> +/* Caller must not use data area after call, as this function overwrites it */
> +void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count)
> +{
> +	int i, len = 0, len2 = 0;
> +
> +	for (i = 0; i < count; i++) {
> +		struct page *page = virt_to_head_page(data[i]);
> +
> +		page = __page_pool_put_page(pool, page, -1 , false);
> +
> +		/* Approved for recycling for ptr_ring cache */
> +		if (page)
> +			data[len++] = page;
> +	}

I guess we should just return here if len is 0, so we avoid grabbing
the ptr_ring lock, agree?
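
A minimal sketch of that early exit, against the untested draft quoted
above (same variable names assumed):

	/* nothing survived the recyclability check: skip the producer lock */
	if (!len)
		return;

	/* Bulk producer into ptr_ring page_pool cache */
	page_pool_ring_lock(pool);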

Regards,
Lorenzo

> +
> +	/* Bulk producer into ptr_ring page_pool cache */
> +	page_pool_ring_lock(pool);
> +	for (i = 0; i < len; i++) {
> +		if (__ptr_ring_produce(&pool->ring, data[i]))
> +			data[len2++] = data[i];
> +	}
> +	page_pool_ring_unlock(pool);
> +
> +	/* Unlikely case of ptr_ring cache full, free pages outside producer
> +	 * lock, given put_page() with refcnt==1 can be an expensive operation.
> +	 */
> +	for (i = 0; i < len2; i++) {
> +		page_pool_return_page(pool, data[i]);
> +	}
> +}
> +EXPORT_SYMBOL(page_pool_put_page_bulk);
> +
>  static void page_pool_empty_ring(struct page_pool *pool)
>  {
>  	struct page *page;
> 
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 

