* [PATCH bpf-next v2 0/4] xsk: Tx improvements
@ 2021-12-16 13:59 Maciej Fijalkowski
  2021-12-16 13:59 ` [PATCH bpf-next v2 1/4] i40e: xsk: move tmp desc array from driver to pool Maciej Fijalkowski
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2021-12-16 13:59 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, Maciej Fijalkowski

Hi,
this time we're on the Tx side of AF_XDP and we touch the i40e and ice
drivers. Unfortunately, scalability issues similar to the ones that were
addressed for XDP processing in ice also exist on the AF_XDP side. Let's
resolve them in mostly the same way as we did in [0] and utilize the Tx
batching API from the xsk buffer pool.

Magnus moves the array of Tx descriptors that is used with the batching
approach into the xsk buffer pool. This means that future users of this
API will not have to carry the array on their own; they can simply refer
to the pool's tx_descs array, as can be seen in the patch from Magnus.
That patch is based on i40e, as it is currently the only user of this
API. Tx batching for the ice driver is then introduced in patch 3.
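
For reference, here is a rough sketch of what a driver's ZC xmit path
looks like once the array lives in the pool. The drv_* helpers and the
surrounding ring type are placeholders; only the xsk_* call and the
tx_descs field come from this series:

	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
	u32 i, nb_pkts;

	/* the pool now fills its own tx_descs array, no per-driver copy */
	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
	if (!nb_pkts)
		return true;

	/* turn each AF_XDP descriptor into a HW Tx descriptor */
	for (i = 0; i < nb_pkts; i++)
		drv_fill_one_hw_desc(xdp_ring, &descs[i]);	/* placeholder */

	drv_bump_tail(xdp_ring);	/* placeholder: notify HW */

	return nb_pkts < budget;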

v2:
* introduce new patch that resets @next_dd and @next_rs fields
* use batching API for AF_XDP Tx on ice side

Thanks,
Magnus & Maciej

[0]: https://lore.kernel.org/bpf/20211015162908.145341-8-anthony.l.nguyen@intel.com/

Maciej Fijalkowski (3):
  ice: xsk: avoid potential dead AF_XDP Tx processing
  ice: xsk: improve AF_XDP ZC Tx and use batching API
  ice: xsk: borrow xdp_tx_active logic from i40e

Magnus Karlsson (1):
  i40e: xsk: move tmp desc array from driver to pool

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  11 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   1 -
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |   4 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |   4 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   5 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |   1 +
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 255 ++++++++++++------
 drivers/net/ethernet/intel/ice/ice_xsk.h      |  26 +-
 include/net/xdp_sock_drv.h                    |   5 +-
 include/net/xsk_buff_pool.h                   |   1 +
 net/xdp/xsk.c                                 |  13 +-
 net/xdp/xsk_buff_pool.c                       |   7 +
 net/xdp/xsk_queue.h                           |  12 +-
 13 files changed, 216 insertions(+), 129 deletions(-)

-- 
2.33.1



* [PATCH bpf-next v2 1/4] i40e: xsk: move tmp desc array from driver to pool
  2021-12-16 13:59 [PATCH bpf-next v2 0/4] xsk: Tx improvements Maciej Fijalkowski
@ 2021-12-16 13:59 ` Maciej Fijalkowski
  2021-12-16 13:59 ` [PATCH bpf-next v2 2/4] ice: xsk: avoid potential dead AF_XDP Tx processing Maciej Fijalkowski
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2021-12-16 13:59 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson

From: Magnus Karlsson <magnus.karlsson@intel.com>

Move desc_array from the driver to the pool. The reason behind this is
that we can then reuse this array as temporary storage for descriptors
in both the zero-copy case and the SKB case. There is no need to have
two arrays, but the one we keep needs to live in the pool, as the SKB
case assumes unmodified drivers.

i40e is the only driver that has a batched Tx zero-copy
implementation, so no need to touch any other driver.
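
The driver-visible change boils down to the call signature; roughly (a
sketch based on the i40e hunk below, not a literal diff):

	/* before: the driver owned the scratch array */
	nb_pkts = xsk_tx_peek_release_desc_batch(pool, xdp_ring->xsk_descs, budget);

	/* after: the pool owns it, the driver just passes a budget */
	nb_pkts = xsk_tx_peek_release_desc_batch(pool, budget);
	descs = pool->tx_descs;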

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 11 -----------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 -
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |  4 ++--
 include/net/xdp_sock_drv.h                  |  5 ++---
 include/net/xsk_buff_pool.h                 |  1 +
 net/xdp/xsk.c                               | 13 ++++++-------
 net/xdp/xsk_buff_pool.c                     |  7 +++++++
 net/xdp/xsk_queue.h                         | 12 ++++++------
 8 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 10a83e5385c7..d3a4a33977ee 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -830,8 +830,6 @@ void i40e_free_tx_resources(struct i40e_ring *tx_ring)
 	i40e_clean_tx_ring(tx_ring);
 	kfree(tx_ring->tx_bi);
 	tx_ring->tx_bi = NULL;
-	kfree(tx_ring->xsk_descs);
-	tx_ring->xsk_descs = NULL;
 
 	if (tx_ring->desc) {
 		dma_free_coherent(tx_ring->dev, tx_ring->size,
@@ -1433,13 +1431,6 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
 	if (!tx_ring->tx_bi)
 		goto err;
 
-	if (ring_is_xdp(tx_ring)) {
-		tx_ring->xsk_descs = kcalloc(I40E_MAX_NUM_DESCRIPTORS, sizeof(*tx_ring->xsk_descs),
-					     GFP_KERNEL);
-		if (!tx_ring->xsk_descs)
-			goto err;
-	}
-
 	u64_stats_init(&tx_ring->syncp);
 
 	/* round up to nearest 4K */
@@ -1463,8 +1454,6 @@ int i40e_setup_tx_descriptors(struct i40e_ring *tx_ring)
 	return 0;
 
 err:
-	kfree(tx_ring->xsk_descs);
-	tx_ring->xsk_descs = NULL;
 	kfree(tx_ring->tx_bi);
 	tx_ring->tx_bi = NULL;
 	return -ENOMEM;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index bfc2845c99d1..f6d91fa1562e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -390,7 +390,6 @@ struct i40e_ring {
 	u16 rx_offset;
 	struct xdp_rxq_info xdp_rxq;
 	struct xsk_buff_pool *xsk_pool;
-	struct xdp_desc *xsk_descs;      /* For storing descriptors in the AF_XDP ZC path */
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ring_uses_build_skb(struct i40e_ring *ring)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index ea06e957393e..1a505cf8a6ad 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -467,11 +467,11 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
  **/
 static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
 {
-	struct xdp_desc *descs = xdp_ring->xsk_descs;
+	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
 	u32 nb_pkts, nb_processed = 0;
 	unsigned int total_bytes = 0;
 
-	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, descs, budget);
+	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
 	if (!nb_pkts)
 		return true;
 
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 443d45951564..4aa031849668 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -13,7 +13,7 @@
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
 bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
-u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *desc, u32 max);
+u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max);
 void xsk_tx_release(struct xsk_buff_pool *pool);
 struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev,
 					    u16 queue_id);
@@ -142,8 +142,7 @@ static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool,
 	return false;
 }
 
-static inline u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *desc,
-						 u32 max)
+static inline u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max)
 {
 	return 0;
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index ddeefc4a1040..5554ee75e7da 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -60,6 +60,7 @@ struct xsk_buff_pool {
 	 */
 	dma_addr_t *dma_pages;
 	struct xdp_buff_xsk *heads;
+	struct xdp_desc *tx_descs;
 	u64 chunk_mask;
 	u64 addrs_cnt;
 	u32 free_list_cnt;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f16074eb53c7..e86a8fd0c7b3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -343,9 +343,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
 }
 EXPORT_SYMBOL(xsk_tx_peek_desc);
 
-static u32 xsk_tx_peek_release_fallback(struct xsk_buff_pool *pool, struct xdp_desc *descs,
-					u32 max_entries)
+static u32 xsk_tx_peek_release_fallback(struct xsk_buff_pool *pool, u32 max_entries)
 {
+	struct xdp_desc *descs = pool->tx_descs;
 	u32 nb_pkts = 0;
 
 	while (nb_pkts < max_entries && xsk_tx_peek_desc(pool, &descs[nb_pkts]))
@@ -355,8 +355,7 @@ static u32 xsk_tx_peek_release_fallback(struct xsk_buff_pool *pool, struct xdp_d
 	return nb_pkts;
 }
 
-u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *descs,
-				   u32 max_entries)
+u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max_entries)
 {
 	struct xdp_sock *xs;
 	u32 nb_pkts;
@@ -365,7 +364,7 @@ u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *
 	if (!list_is_singular(&pool->xsk_tx_list)) {
 		/* Fallback to the non-batched version */
 		rcu_read_unlock();
-		return xsk_tx_peek_release_fallback(pool, descs, max_entries);
+		return xsk_tx_peek_release_fallback(pool, max_entries);
 	}
 
 	xs = list_first_or_null_rcu(&pool->xsk_tx_list, struct xdp_sock, tx_list);
@@ -374,7 +373,7 @@ u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *
 		goto out;
 	}
 
-	nb_pkts = xskq_cons_peek_desc_batch(xs->tx, descs, pool, max_entries);
+	nb_pkts = xskq_cons_peek_desc_batch(xs->tx, pool, max_entries);
 	if (!nb_pkts) {
 		xs->tx->queue_empty_descs++;
 		goto out;
@@ -386,7 +385,7 @@ u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, struct xdp_desc *
 	 * packets. This avoids having to implement any buffering in
 	 * the Tx path.
 	 */
-	nb_pkts = xskq_prod_reserve_addr_batch(pool->cq, descs, nb_pkts);
+	nb_pkts = xskq_prod_reserve_addr_batch(pool->cq, pool->tx_descs, nb_pkts);
 	if (!nb_pkts)
 		goto out;
 
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 90c4e1e819d3..15f02f02a743 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -37,6 +37,7 @@ void xp_destroy(struct xsk_buff_pool *pool)
 	if (!pool)
 		return;
 
+	kvfree(pool->tx_descs);
 	kvfree(pool->heads);
 	kvfree(pool);
 }
@@ -58,6 +59,12 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 	if (!pool->heads)
 		goto out;
 
+	if (xs->tx) {
+		pool->tx_descs = kcalloc(xs->tx->nentries, sizeof(*pool->tx_descs), GFP_KERNEL);
+		if (!pool->tx_descs)
+			goto out;
+	}
+
 	pool->chunk_mask = ~((u64)umem->chunk_size - 1);
 	pool->addrs_cnt = umem->size;
 	pool->heads_cnt = umem->chunks;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index e9aa2c236356..638138fbe475 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -205,11 +205,11 @@ static inline bool xskq_cons_read_desc(struct xsk_queue *q,
 	return false;
 }
 
-static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q,
-					    struct xdp_desc *descs,
-					    struct xsk_buff_pool *pool, u32 max)
+static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool,
+					    u32 max)
 {
 	u32 cached_cons = q->cached_cons, nb_entries = 0;
+	struct xdp_desc *descs = pool->tx_descs;
 
 	while (cached_cons != q->cached_prod && nb_entries < max) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
@@ -282,12 +282,12 @@ static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
 	return xskq_cons_read_desc(q, desc, pool);
 }
 
-static inline u32 xskq_cons_peek_desc_batch(struct xsk_queue *q, struct xdp_desc *descs,
-					    struct xsk_buff_pool *pool, u32 max)
+static inline u32 xskq_cons_peek_desc_batch(struct xsk_queue *q, struct xsk_buff_pool *pool,
+					    u32 max)
 {
 	u32 entries = xskq_cons_nb_entries(q, max);
 
-	return xskq_cons_read_desc_batch(q, descs, pool, entries);
+	return xskq_cons_read_desc_batch(q, pool, entries);
 }
 
 /* To improve performance in the xskq_cons_release functions, only update local state here.
-- 
2.33.1



* [PATCH bpf-next v2 2/4] ice: xsk: avoid potential dead AF_XDP Tx processing
  2021-12-16 13:59 [PATCH bpf-next v2 0/4] xsk: Tx improvements Maciej Fijalkowski
  2021-12-16 13:59 ` [PATCH bpf-next v2 1/4] i40e: xsk: move tmp desc array from driver to pool Maciej Fijalkowski
@ 2021-12-16 13:59 ` Maciej Fijalkowski
  2021-12-21  7:38   ` Magnus Karlsson
  2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
  2021-12-16 13:59 ` [PATCH bpf-next v2 4/4] ice: xsk: borrow xdp_tx_active logic from i40e Maciej Fijalkowski
  3 siblings, 1 reply; 14+ messages in thread
From: Maciej Fijalkowski @ 2021-12-16 13:59 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, Maciej Fijalkowski

Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced
@next_dd and @next_rs to the ice_tx_ring struct. Currently, their state
is not restored in ice_clean_tx_ring(), which has not caused any trouble
so far, as the XDP rings are gone after we're done with the XDP prog on
the interface.

For the upcoming usage of these fields in AF_XDP, this might expose us
to a potentially dead Tx side. The scenario would look like the
following (based on xdpsock):

- two xdpsock instances are spawned in Tx mode
- one of them is killed
- the XDP prog is kept on the interface because the other xdpsock is
  still running
  * this means that the XDP rings stayed in place
- xdpsock is launched again on the same queue id that the killed
  instance was using
- the @next_dd and @next_rs settings are bogus, therefore the transmit
  side is broken

To protect us from the above, restore the default @next_rs and @next_dd
values when cleaning the Tx ring.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index bc3ba19dc88f..0f3f92ce8a95 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -172,6 +172,8 @@ void ice_clean_tx_ring(struct ice_tx_ring *tx_ring)
 
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;
+	tx_ring->next_dd = ICE_TX_THRESH - 1;
+	tx_ring->next_rs = ICE_TX_THRESH - 1;
 
 	if (!tx_ring->netdev)
 		return;
-- 
2.33.1



* [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-16 13:59 [PATCH bpf-next v2 0/4] xsk: Tx improvements Maciej Fijalkowski
  2021-12-16 13:59 ` [PATCH bpf-next v2 1/4] i40e: xsk: move tmp desc array from driver to pool Maciej Fijalkowski
  2021-12-16 13:59 ` [PATCH bpf-next v2 2/4] ice: xsk: avoid potential dead AF_XDP Tx processing Maciej Fijalkowski
@ 2021-12-16 13:59 ` Maciej Fijalkowski
  2021-12-17  3:02   ` kernel test robot
                     ` (3 more replies)
  2021-12-16 13:59 ` [PATCH bpf-next v2 4/4] ice: xsk: borrow xdp_tx_active logic from i40e Maciej Fijalkowski
  3 siblings, 4 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2021-12-16 13:59 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, Maciej Fijalkowski

Mostly follow the logic from commit 9610bd988df9 ("ice: optimize XDP_TX
workloads"), which was done in order to address the massive tx_busy
statistic bump and to improve performance.

Increase ICE_TX_THRESH to 64, as it seems to work out better for both
XDP and AF_XDP. Also, separating the stats structs onto different cache
lines seemed to improve performance. The batching approach is inspired
by i40e's implementation, with adjustments to the cleaning logic.

One difference from 'xdpdrv' XDP_TX is that when the ring has fewer than
ICE_TX_THRESH free entries, the cleaning routine does not stop after
cleaning a single batch of ICE_TX_THRESH descs; instead, it forwards the
next_dd pointer, checks the DD bit, and repeats the cleaning for as long
as that bit is set. IOW, clean until there are no more descs that can be
cleaned.

With this change, it takes three separate xdpsock instances in txonly
mode to achieve line rate, which was not possible previously.
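
A condensed sketch of the new cleaning loop (this mirrors the
ice_clean_xdp_irq_zc() hunk below; the helper names in the sketch are
made up for brevity):

	/* keep releasing ICE_TX_THRESH-sized chunks for as long as the
	 * descriptor at next_dd has its DD bit set
	 */
	while (dd_bit_set(ICE_TX_DESC(xdp_ring, xdp_ring->next_dd))) {
		release_one_thresh_chunk(xdp_ring);	/* xsk_tx_completed() etc. */
		advance_next_dd(xdp_ring);		/* += ICE_TX_THRESH, with wrap */
		cleared_dds++;
	}

	return cleared_dds * ICE_TX_THRESH;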

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h |   4 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 249 ++++++++++++++--------
 drivers/net/ethernet/intel/ice/ice_xsk.h  |  26 ++-
 4 files changed, 182 insertions(+), 99 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 0f3f92ce8a95..5f1a9a1ac877 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1447,7 +1447,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 		bool wd;
 
 		if (tx_ring->xsk_pool)
-			wd = ice_clean_tx_irq_zc(tx_ring, budget);
+			wd = ice_xmit_zc(tx_ring, ICE_DESC_UNUSED(tx_ring));
 		else if (ice_ring_is_xdp(tx_ring))
 			wd = true;
 		else
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index c56dd1749903..9e8b4337d131 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -13,7 +13,7 @@
 #define ICE_MAX_CHAINED_RX_BUFS	5
 #define ICE_MAX_BUF_TXD		8
 #define ICE_MIN_TX_LEN		17
-#define ICE_TX_THRESH		32
+#define ICE_TX_THRESH		64
 
 /* The size limit for a transmit buffer in a descriptor is (16K - 1).
  * In order to align with the read requests we will align the value to
@@ -322,9 +322,9 @@ struct ice_tx_ring {
 	u16 count;			/* Number of descriptors */
 	u16 q_index;			/* Queue number of ring */
 	/* stats structs */
+	struct ice_txq_stats tx_stats;
 	struct ice_q_stats	stats;
 	struct u64_stats_sync syncp;
-	struct ice_txq_stats tx_stats;
 
 	/* CL3 - 3rd cacheline starts here */
 	struct rcu_head rcu;		/* to avoid race on free */
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index ff55cb415b11..563ea7e7e0b1 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -611,58 +611,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 	return failure ? budget : (int)total_rx_packets;
 }
 
-/**
- * ice_xmit_zc - Completes AF_XDP entries, and cleans XDP entries
- * @xdp_ring: XDP Tx ring
- * @budget: max number of frames to xmit
- *
- * Returns true if cleanup/transmission is done.
- */
-static bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, int budget)
-{
-	struct ice_tx_desc *tx_desc = NULL;
-	bool work_done = true;
-	struct xdp_desc desc;
-	dma_addr_t dma;
-
-	while (likely(budget-- > 0)) {
-		struct ice_tx_buf *tx_buf;
-
-		if (unlikely(!ICE_DESC_UNUSED(xdp_ring))) {
-			xdp_ring->tx_stats.tx_busy++;
-			work_done = false;
-			break;
-		}
-
-		tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];
-
-		if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc))
-			break;
-
-		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr);
-		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma,
-						 desc.len);
-
-		tx_buf->bytecount = desc.len;
-
-		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use);
-		tx_desc->buf_addr = cpu_to_le64(dma);
-		tx_desc->cmd_type_offset_bsz =
-			ice_build_ctob(ICE_TXD_LAST_DESC_CMD, 0, desc.len, 0);
-
-		xdp_ring->next_to_use++;
-		if (xdp_ring->next_to_use == xdp_ring->count)
-			xdp_ring->next_to_use = 0;
-	}
-
-	if (tx_desc) {
-		ice_xdp_ring_update_tail(xdp_ring);
-		xsk_tx_release(xdp_ring->xsk_pool);
-	}
-
-	return budget > 0 && work_done;
-}
-
 /**
  * ice_clean_xdp_tx_buf - Free and unmap XDP Tx buffer
  * @xdp_ring: XDP Tx ring
@@ -678,32 +626,33 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 }
 
 /**
- * ice_clean_tx_irq_zc - Completes AF_XDP entries, and cleans XDP entries
- * @xdp_ring: XDP Tx ring
- * @budget: NAPI budget
+ * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
+ * @xdp_ring: XDP ring to clean
  *
- * Returns true if cleanup/tranmission is done.
+ * Returns count of cleaned descriptors
  */
-bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
+static u16 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 {
-	int total_packets = 0, total_bytes = 0;
-	s16 ntc = xdp_ring->next_to_clean;
-	struct ice_tx_desc *tx_desc;
+	struct ice_tx_desc *next_dd_desc;
+	u16 next_dd = xdp_ring->next_dd;
+	u16 desc_cnt = xdp_ring->count;
 	struct ice_tx_buf *tx_buf;
+	u16 ntc, cleared_dds = 0;
 	u32 xsk_frames = 0;
-	bool xmit_done;
+	u16 i;
 
-	tx_desc = ICE_TX_DESC(xdp_ring, ntc);
-	tx_buf = &xdp_ring->tx_buf[ntc];
-	ntc -= xdp_ring->count;
+	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
+	if (!(next_dd_desc->cmd_type_offset_bsz &
+	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
+		return 0;
 
-	do {
-		if (!(tx_desc->cmd_type_offset_bsz &
-		      cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
-			break;
+again:
+	cleared_dds++;
 
-		total_bytes += tx_buf->bytecount;
-		total_packets++;
+	ntc = xdp_ring->next_to_clean;
+
+	for (i = 0; i < ICE_TX_THRESH; i++) {
+		tx_buf = &xdp_ring->tx_buf[ntc];
 
 		if (tx_buf->raw_buf) {
 			ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
@@ -712,34 +661,158 @@ bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
 			xsk_frames++;
 		}
 
-		tx_desc->cmd_type_offset_bsz = 0;
-		tx_buf++;
-		tx_desc++;
 		ntc++;
+		if (ntc >= xdp_ring->count)
+			ntc = 0;
+	}
 
-		if (unlikely(!ntc)) {
-			ntc -= xdp_ring->count;
-			tx_buf = xdp_ring->tx_buf;
-			tx_desc = ICE_TX_DESC(xdp_ring, 0);
-		}
+	xdp_ring->next_to_clean += ICE_TX_THRESH;
+	if (xdp_ring->next_to_clean >= desc_cnt)
+		xdp_ring->next_to_clean -= desc_cnt;
+	if (xsk_frames)
+		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
+	next_dd_desc->cmd_type_offset_bsz = 0;
+	xdp_ring->next_dd = xdp_ring->next_dd + ICE_TX_THRESH;
+	if (xdp_ring->next_dd >= desc_cnt)
+		xdp_ring->next_dd = ICE_TX_THRESH - 1;
 
-		prefetch(tx_desc);
+	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
+	if ((next_dd_desc->cmd_type_offset_bsz &
+	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
+		goto again;
 
-	} while (likely(--budget));
+	return cleared_dds * ICE_TX_THRESH;
+}
 
-	ntc += xdp_ring->count;
-	xdp_ring->next_to_clean = ntc;
+/**
+ * ice_xmit_pkt - produce a single HW Tx descriptor out of AF_XDP descriptor
+ * @xdp_ring: XDP ring to produce the HW Tx descriptor on
+ * @desc: AF_XDP descriptor to pull the DMA address and length from
+ * @total_bytes: bytes accumulator that will be used for stats update
+ */
+static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
+			 unsigned int *total_bytes)
+{
+	struct ice_tx_desc *tx_desc;
+	dma_addr_t dma;
 
-	if (xsk_frames)
-		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
+	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc->addr);
+	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc->len);
+
+	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
+	tx_desc->buf_addr = cpu_to_le64(dma);
+	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
+						      0, desc->len, 0);
+
+	*total_bytes += desc->len;
+}
+
+/**
+ * ice_xmit_pkt - produce a batch of HW Tx descriptors out of AF_XDP descriptors
+ * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
+ * @total_bytes: bytes accumulator that will be used for stats update
+ */
+static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
+			       unsigned int *total_bytes)
+{
+	u16 ntu = xdp_ring->next_to_use;
+	struct ice_tx_desc *tx_desc;
+	dma_addr_t dma;
+	u32 i;
+
+	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++) {
+		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, descs[i].addr);
+		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, descs[i].len);
+
+		tx_desc = ICE_TX_DESC(xdp_ring, ntu++);
+		tx_desc->buf_addr = cpu_to_le64(dma);
+		tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
+							      0, descs[i].len, 0);
+
+		*total_bytes += descs[i].len;
+	}
+
+	xdp_ring->next_to_use = ntu;
+
+	if (xdp_ring->next_to_use > xdp_ring->next_rs) {
+		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
+		tx_desc->cmd_type_offset_bsz |=
+			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+		xdp_ring->next_rs += ICE_TX_THRESH;
+	}
+}
+
+/**
+ * ice_fill_tx_hw_ring - produce the number of Tx descriptors onto ring
+ * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
+ * @nb_pkts: count of packets to be send
+ * @total_bytes: bytes accumulator that will be used for stats update
+ *
+ */
+static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
+				u32 nb_pkts, unsigned int *total_bytes)
+{
+	struct ice_tx_desc *tx_desc;
+	u32 batched, leftover, i;
+
+	batched = nb_pkts & ~(PKTS_PER_BATCH - 1);
+	leftover = nb_pkts & (PKTS_PER_BATCH - 1);
+	for (i = 0; i < batched; i += PKTS_PER_BATCH)
+		ice_xmit_pkt_batch(xdp_ring, &descs[i], total_bytes);
+	for (i = batched; i < batched + leftover; i++)
+		ice_xmit_pkt(xdp_ring, &descs[i], total_bytes);
+
+	if (xdp_ring->next_to_use > xdp_ring->next_rs) {
+		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
+		tx_desc->cmd_type_offset_bsz |=
+			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+		xdp_ring->next_rs += ICE_TX_THRESH;
+	}
+}
+
+/**
+ * ice_xmit_zc - take entries from XSK Tx ring and place them onto HW Tx ring
+ * @xdp_ring: XDP ring to produce the HW Tx descriptors on
+ * @budget: number of free descriptors on HW Tx ring that can be used
+ *
+ * Returns true if there is no more work that needs to be done, false otherwise
+ */
+bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget)
+{
+	struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
+	u32 nb_pkts, nb_processed = 0;
+	unsigned int total_bytes = 0;
+	struct ice_tx_desc *tx_desc;
+
+	if (budget < ICE_TX_THRESH)
+		budget += ice_clean_xdp_irq_zc(xdp_ring);
+
+	nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
+	if (!nb_pkts)
+		return true;
+
+	if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
+		nb_processed = xdp_ring->count - xdp_ring->next_to_use;
+		ice_fill_tx_hw_ring(xdp_ring, descs, nb_processed, &total_bytes);
+		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
+		tx_desc->cmd_type_offset_bsz |=
+			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+		xdp_ring->next_rs = ICE_TX_THRESH - 1;
+		xdp_ring->next_to_use = 0;
+	}
+
+	ice_fill_tx_hw_ring(xdp_ring, &descs[nb_processed], nb_pkts - nb_processed,
+			    &total_bytes);
+
+	ice_xdp_ring_update_tail(xdp_ring);
+	ice_update_tx_ring_stats(xdp_ring, nb_pkts, total_bytes);
 
 	if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
 		xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);
 
-	ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
-	xmit_done = ice_xmit_zc(xdp_ring, ICE_DFLT_IRQ_WORK);
-
-	return budget > 0 && xmit_done;
+	return nb_pkts < budget;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index 4c7bd8e9dfc4..f2eb99063c1f 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -6,19 +6,36 @@
 #include "ice_txrx.h"
 #include "ice.h"
 
+#define PKTS_PER_BATCH 8
+
+#ifdef __clang__
+#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for
+#elif __GNUC__ >= 4
+#define loop_unrolled_for _Pragma("GCC unroll 8") for
+#else
+#define loop_unrolled_for for
+#endif
+
 struct ice_vsi;
 
 #ifdef CONFIG_XDP_SOCKETS
 int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
-bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
 bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
 void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
 void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
+bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget);
 #else
+static inline bool
+ice_xmit_zc(struct ice_tx_ring __always_unused *xdp_ring,
+	    u32 __always_unused budget)
+{
+	return false;
+}
+
 static inline int
 ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
 		   struct xsk_buff_pool __always_unused *pool,
@@ -34,13 +51,6 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
 	return 0;
 }
 
-static inline bool
-ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
-		    int __always_unused budget)
-{
-	return false;
-}
-
 static inline bool
 ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
 		     u16 __always_unused count)
-- 
2.33.1



* [PATCH bpf-next v2 4/4] ice: xsk: borrow xdp_tx_active logic from i40e
  2021-12-16 13:59 [PATCH bpf-next v2 0/4] xsk: Tx improvements Maciej Fijalkowski
                   ` (2 preceding siblings ...)
  2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
@ 2021-12-16 13:59 ` Maciej Fijalkowski
  3 siblings, 0 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2021-12-16 13:59 UTC (permalink / raw)
  To: bpf, ast, daniel; +Cc: netdev, magnus.karlsson, Maciej Fijalkowski

One of the things that commit 5574ff7b7b3d ("i40e: optimize AF_XDP Tx
completion path") introduced was the @xdp_tx_active field. Its usage
from i40e can be adapted to the ice driver and gives us positive
performance results.

If the descriptor that @next_dd points to has been sent by HW (its DD
bit is set), then we are sure that there are ICE_TX_THRESH descriptors
ready to be cleaned. If @xdp_tx_active is 0, which means that the
related xdp_ring is not used for XDP_{TX, REDIRECT} workloads, then we
also know how many XSK entries should be placed in the completion
queue; IOW, walking through the ring can be skipped.
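
In code, the fast path this gives us is roughly (a sketch of the
ice_clean_xdp_irq_zc() hunk below):

	/* no XDP_TX/REDIRECT frames in flight on this ring, so every
	 * descriptor in the batch belongs to XSK and the per-buffer walk
	 * can be skipped entirely
	 */
	if (likely(!xdp_ring->xdp_tx_active)) {
		xsk_frames = ICE_TX_THRESH;
		goto skip;
	}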

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.h     | 1 +
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 1 +
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 8 +++++++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 9e8b4337d131..5e37d4f57bfa 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -333,6 +333,7 @@ struct ice_tx_ring {
 	struct ice_ptp_tx *tx_tstamps;
 	spinlock_t tx_lock;
 	u32 txq_teid;			/* Added Tx queue TEID */
+	u16 xdp_tx_active;
 #define ICE_TX_FLAGS_RING_XDP		BIT(0)
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 1dd7e84f41f8..f15c215c973c 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -299,6 +299,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
 	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, 0,
 						      size, 0);
 
+	xdp_ring->xdp_tx_active++;
 	i++;
 	if (i == xdp_ring->count) {
 		i = 0;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 563ea7e7e0b1..a81ade2b7600 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -620,6 +620,7 @@ static void
 ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 {
 	xdp_return_frame((struct xdp_frame *)tx_buf->raw_buf);
+	xdp_ring->xdp_tx_active--;
 	dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
 			 dma_unmap_len(tx_buf, len), DMA_TO_DEVICE);
 	dma_unmap_len_set(tx_buf, len, 0);
@@ -648,6 +649,11 @@ static u16 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 
 again:
 	cleared_dds++;
+	xsk_frames = 0;
+	if (likely(!xdp_ring->xdp_tx_active)) {
+		xsk_frames = ICE_TX_THRESH;
+		goto skip;
+	}
 
 	ntc = xdp_ring->next_to_clean;
 
@@ -665,7 +671,7 @@ static u16 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 		if (ntc >= xdp_ring->count)
 			ntc = 0;
 	}
-
+skip:
 	xdp_ring->next_to_clean += ICE_TX_THRESH;
 	if (xdp_ring->next_to_clean >= desc_cnt)
 		xdp_ring->next_to_clean -= desc_cnt;
-- 
2.33.1



* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
@ 2021-12-17  3:02   ` kernel test robot
  2021-12-29  3:02   ` Alexei Starovoitov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2021-12-17  3:02 UTC (permalink / raw)
  To: kbuild-all


Hi Maciej,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Maciej-Fijalkowski/xsk-Tx-improvements/20211216-220139
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arc-allmodconfig (https://download.01.org/0day-ci/archive/20211217/202112171058.ExFmKObL-lkp(a)intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/64cb650145881d3738a05befb3773e16b1a5de56
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Maciej-Fijalkowski/xsk-Tx-improvements/20211216-220139
        git checkout 64cb650145881d3738a05befb3773e16b1a5de56
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/intel/ice/ice_xsk.c:636: warning: expecting prototype for ice_clean_xdp_irq(). Prototype was for ice_clean_xdp_irq_zc() instead
>> drivers/net/ethernet/intel/ice/ice_xsk.c:719: warning: expecting prototype for ice_xmit_pkt(). Prototype was for ice_xmit_pkt_batch() instead


vim +636 drivers/net/ethernet/intel/ice/ice_xsk.c

2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  628  
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  629  /**
64cb650145881d3 Maciej Fijalkowski     2021-12-16  630   * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
64cb650145881d3 Maciej Fijalkowski     2021-12-16  631   * @xdp_ring: XDP ring to clean
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  632   *
64cb650145881d3 Maciej Fijalkowski     2021-12-16  633   * Returns count of cleaned descriptors
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  634   */
64cb650145881d3 Maciej Fijalkowski     2021-12-16  635  static u16 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04 @636  {
64cb650145881d3 Maciej Fijalkowski     2021-12-16  637  	struct ice_tx_desc *next_dd_desc;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  638  	u16 next_dd = xdp_ring->next_dd;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  639  	u16 desc_cnt = xdp_ring->count;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  640  	struct ice_tx_buf *tx_buf;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  641  	u16 ntc, cleared_dds = 0;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  642  	u32 xsk_frames = 0;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  643  	u16 i;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  644  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  645  	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  646  	if (!(next_dd_desc->cmd_type_offset_bsz &
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  647  	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
64cb650145881d3 Maciej Fijalkowski     2021-12-16  648  		return 0;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  649  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  650  again:
64cb650145881d3 Maciej Fijalkowski     2021-12-16  651  	cleared_dds++;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  652  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  653  	ntc = xdp_ring->next_to_clean;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  654  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  655  	for (i = 0; i < ICE_TX_THRESH; i++) {
64cb650145881d3 Maciej Fijalkowski     2021-12-16  656  		tx_buf = &xdp_ring->tx_buf[ntc];
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  657  
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  658  		if (tx_buf->raw_buf) {
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  659  			ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  660  			tx_buf->raw_buf = NULL;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  661  		} else {
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  662  			xsk_frames++;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  663  		}
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  664  
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  665  		ntc++;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  666  		if (ntc >= xdp_ring->count)
64cb650145881d3 Maciej Fijalkowski     2021-12-16  667  			ntc = 0;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  668  	}
64cb650145881d3 Maciej Fijalkowski     2021-12-16  669  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  670  	xdp_ring->next_to_clean += ICE_TX_THRESH;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  671  	if (xdp_ring->next_to_clean >= desc_cnt)
64cb650145881d3 Maciej Fijalkowski     2021-12-16  672  		xdp_ring->next_to_clean -= desc_cnt;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  673  	if (xsk_frames)
64cb650145881d3 Maciej Fijalkowski     2021-12-16  674  		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  675  	next_dd_desc->cmd_type_offset_bsz = 0;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  676  	xdp_ring->next_dd = xdp_ring->next_dd + ICE_TX_THRESH;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  677  	if (xdp_ring->next_dd >= desc_cnt)
64cb650145881d3 Maciej Fijalkowski     2021-12-16  678  		xdp_ring->next_dd = ICE_TX_THRESH - 1;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  679  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  680  	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  681  	if ((next_dd_desc->cmd_type_offset_bsz &
64cb650145881d3 Maciej Fijalkowski     2021-12-16  682  	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
64cb650145881d3 Maciej Fijalkowski     2021-12-16  683  		goto again;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  684  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  685  	return cleared_dds * ICE_TX_THRESH;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  686  }
64cb650145881d3 Maciej Fijalkowski     2021-12-16  687  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  688  /**
64cb650145881d3 Maciej Fijalkowski     2021-12-16  689   * ice_xmit_pkt - produce a single HW Tx descriptor out of AF_XDP descriptor
64cb650145881d3 Maciej Fijalkowski     2021-12-16  690   * @xdp_ring: XDP ring to produce the HW Tx descriptor on
64cb650145881d3 Maciej Fijalkowski     2021-12-16  691   * @desc: AF_XDP descriptor to pull the DMA address and length from
64cb650145881d3 Maciej Fijalkowski     2021-12-16  692   * @total_bytes: bytes accumulator that will be used for stats update
64cb650145881d3 Maciej Fijalkowski     2021-12-16  693   */
64cb650145881d3 Maciej Fijalkowski     2021-12-16  694  static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
64cb650145881d3 Maciej Fijalkowski     2021-12-16  695  			 unsigned int *total_bytes)
64cb650145881d3 Maciej Fijalkowski     2021-12-16  696  {
64cb650145881d3 Maciej Fijalkowski     2021-12-16  697  	struct ice_tx_desc *tx_desc;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  698  	dma_addr_t dma;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  699  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  700  	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc->addr);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  701  	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc->len);
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  702  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  703  	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  704  	tx_desc->buf_addr = cpu_to_le64(dma);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  705  	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
64cb650145881d3 Maciej Fijalkowski     2021-12-16  706  						      0, desc->len, 0);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  707  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  708  	*total_bytes += desc->len;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  709  }
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  710  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  711  /**
64cb650145881d3 Maciej Fijalkowski     2021-12-16  712   * ice_xmit_pkt - produce a batch of HW Tx descriptors out of AF_XDP descriptors
64cb650145881d3 Maciej Fijalkowski     2021-12-16  713   * @xdp_ring: XDP ring to produce the HW Tx descriptors on
64cb650145881d3 Maciej Fijalkowski     2021-12-16  714   * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
64cb650145881d3 Maciej Fijalkowski     2021-12-16  715   * @total_bytes: bytes accumulator that will be used for stats update
64cb650145881d3 Maciej Fijalkowski     2021-12-16  716   */
64cb650145881d3 Maciej Fijalkowski     2021-12-16  717  static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
64cb650145881d3 Maciej Fijalkowski     2021-12-16  718  			       unsigned int *total_bytes)
64cb650145881d3 Maciej Fijalkowski     2021-12-16 @719  {
64cb650145881d3 Maciej Fijalkowski     2021-12-16  720  	u16 ntu = xdp_ring->next_to_use;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  721  	struct ice_tx_desc *tx_desc;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  722  	dma_addr_t dma;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  723  	u32 i;
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  724  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  725  	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++) {
64cb650145881d3 Maciej Fijalkowski     2021-12-16  726  		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, descs[i].addr);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  727  		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, descs[i].len);
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  728  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  729  		tx_desc = ICE_TX_DESC(xdp_ring, ntu++);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  730  		tx_desc->buf_addr = cpu_to_le64(dma);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  731  		tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
64cb650145881d3 Maciej Fijalkowski     2021-12-16  732  							      0, descs[i].len, 0);
2d4238f55697221 Krzysztof Kazimierczak 2019-11-04  733  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  734  		*total_bytes += descs[i].len;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  735  	}
64cb650145881d3 Maciej Fijalkowski     2021-12-16  736  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  737  	xdp_ring->next_to_use = ntu;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  738  
64cb650145881d3 Maciej Fijalkowski     2021-12-16  739  	if (xdp_ring->next_to_use > xdp_ring->next_rs) {
64cb650145881d3 Maciej Fijalkowski     2021-12-16  740  		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  741  		tx_desc->cmd_type_offset_bsz |=
64cb650145881d3 Maciej Fijalkowski     2021-12-16  742  			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
64cb650145881d3 Maciej Fijalkowski     2021-12-16  743  		xdp_ring->next_rs += ICE_TX_THRESH;
64cb650145881d3 Maciej Fijalkowski     2021-12-16  744  	}
64cb650145881d3 Maciej Fijalkowski     2021-12-16  745  }
64cb650145881d3 Maciej Fijalkowski     2021-12-16  746  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org


* Re: [PATCH bpf-next v2 2/4] ice: xsk: avoid potential dead AF_XDP Tx processing
  2021-12-16 13:59 ` [PATCH bpf-next v2 2/4] ice: xsk: avoid potential dead AF_XDP Tx processing Maciej Fijalkowski
@ 2021-12-21  7:38   ` Magnus Karlsson
  0 siblings, 0 replies; 14+ messages in thread
From: Magnus Karlsson @ 2021-12-21  7:38 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Network Development,
	Karlsson, Magnus

On Fri, Dec 17, 2021 at 12:38 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> Commit 9610bd988df9 ("ice: optimize XDP_TX workloads") introduced
> @next_dd and @next_rs to ice_tx_ring struct. Currently, their state is
> not restored in ice_clean_tx_ring(), which was not causing any troubles
> as the XDP rings are gone after we're done with XDP prog on interface.
>
> For upcoming usage of mentioned fields in AF_XDP, this might expose us
> to a potential dead Tx side. Scenario would look like following (based
> on xdpsock):
>
> - two xdpsock instances are spawned in Tx mode
> - one of them is killed
> - XDP prog is kept on interface due to the other xdpsock still running
>   * this means that XDP rings stayed in place
> - xdpsock is launched again on same queue id that was terminated on
> - @next_dd and @next_rs setting is bogus, therefore transmit side is
>   broken
>
> To protect us from the above, restore the default @next_rs and @next_dd
> values when cleaning the Tx ring.

Thank you Maciej.

Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>

> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index bc3ba19dc88f..0f3f92ce8a95 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -172,6 +172,8 @@ void ice_clean_tx_ring(struct ice_tx_ring *tx_ring)
>
>         tx_ring->next_to_use = 0;
>         tx_ring->next_to_clean = 0;
> +       tx_ring->next_dd = ICE_TX_THRESH - 1;
> +       tx_ring->next_rs = ICE_TX_THRESH - 1;
>
>         if (!tx_ring->netdev)
>                 return;
> --
> 2.33.1
>


* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
  2021-12-17  3:02   ` kernel test robot
@ 2021-12-29  3:02   ` Alexei Starovoitov
  2021-12-29 10:10     ` kernel test robot
  2021-12-29 13:11   ` Alexander Lobakin
  3 siblings, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2021-12-29  3:02 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Network Development,
	Karlsson, Magnus

On Thu, Dec 16, 2021 at 6:00 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>  }
>
>  /**
> - * ice_clean_tx_irq_zc - Completes AF_XDP entries, and cleans XDP entries
> - * @xdp_ring: XDP Tx ring
> - * @budget: NAPI budget
> + * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
> + * @xdp_ring: XDP ring to clean
>   *
> - * Returns true if cleanup/tranmission is done.
> + * Returns count of cleaned descriptors
>   */
> -bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
> +static u16 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)

The patches look good, but please fix the warnings:

../drivers/net/ethernet/intel/ice/ice_xsk.c:636: warning: expecting
prototype for ice_clean_xdp_irq(). Prototype was for
ice_clean_xdp_irq_zc() instead
../drivers/net/ethernet/intel/ice/ice_xsk.c:719: warning: expecting
prototype for ice_xmit_pkt(). Prototype was for ice_xmit_pkt_batch()
instead
../drivers/net/ethernet/intel/ice/ice_xsk.c:636: warning: expecting
prototype for ice_clean_xdp_irq(). Prototype was for
ice_clean_xdp_irq_zc() instead
../drivers/net/ethernet/intel/ice/ice_xsk.c:719: warning: expecting
prototype for ice_xmit_pkt(). Prototype was for ice_xmit_pkt_batch()
instead
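
For context, both warnings come from the kernel-doc comments still
carrying the old function names; a follow-up along these lines (not part
of the series as posted) would silence them:

	- * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
	+ * ice_clean_xdp_irq_zc - Reclaim resources after transmit completes on XDP ring

	- * ice_xmit_pkt - produce a batch of HW Tx descriptors out of AF_XDP descriptors
	+ * ice_xmit_pkt_batch - produce a batch of HW Tx descriptors out of AF_XDP descriptors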


* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
@ 2021-12-29 10:10     ` kernel test robot
  2021-12-29  3:02   ` Alexei Starovoitov
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2021-12-29 10:10 UTC (permalink / raw)
  To: Maciej Fijalkowski; +Cc: llvm, kbuild-all

Hi Maciej,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Maciej-Fijalkowski/xsk-Tx-improvements/20211216-220139
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: riscv-randconfig-r042-20211228 (https://download.01.org/0day-ci/archive/20211229/202112291846.LmSTdxbq-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project cd284b7ac0615afc6e0f1a30da2777e361de27a3)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/64cb650145881d3738a05befb3773e16b1a5de56
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Maciej-Fijalkowski/xsk-Tx-improvements/20211216-220139
        git checkout 64cb650145881d3738a05befb3773e16b1a5de56
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash drivers/net/ethernet/intel/ice/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/intel/ice/ice_xsk.c:636: warning: expecting prototype for ice_clean_xdp_irq(). Prototype was for ice_clean_xdp_irq_zc() instead
>> drivers/net/ethernet/intel/ice/ice_xsk.c:719: warning: expecting prototype for ice_xmit_pkt(). Prototype was for ice_xmit_pkt_batch() instead


vim +636 drivers/net/ethernet/intel/ice/ice_xsk.c

2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  628  
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  629  /**
64cb650145881d Maciej Fijalkowski     2021-12-16  630   * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
64cb650145881d Maciej Fijalkowski     2021-12-16  631   * @xdp_ring: XDP ring to clean
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  632   *
64cb650145881d Maciej Fijalkowski     2021-12-16  633   * Returns count of cleaned descriptors
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  634   */
64cb650145881d Maciej Fijalkowski     2021-12-16  635  static u16 ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04 @636  {
64cb650145881d Maciej Fijalkowski     2021-12-16  637  	struct ice_tx_desc *next_dd_desc;
64cb650145881d Maciej Fijalkowski     2021-12-16  638  	u16 next_dd = xdp_ring->next_dd;
64cb650145881d Maciej Fijalkowski     2021-12-16  639  	u16 desc_cnt = xdp_ring->count;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  640  	struct ice_tx_buf *tx_buf;
64cb650145881d Maciej Fijalkowski     2021-12-16  641  	u16 ntc, cleared_dds = 0;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  642  	u32 xsk_frames = 0;
64cb650145881d Maciej Fijalkowski     2021-12-16  643  	u16 i;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  644  
64cb650145881d Maciej Fijalkowski     2021-12-16  645  	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
64cb650145881d Maciej Fijalkowski     2021-12-16  646  	if (!(next_dd_desc->cmd_type_offset_bsz &
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  647  	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
64cb650145881d Maciej Fijalkowski     2021-12-16  648  		return 0;
64cb650145881d Maciej Fijalkowski     2021-12-16  649  
64cb650145881d Maciej Fijalkowski     2021-12-16  650  again:
64cb650145881d Maciej Fijalkowski     2021-12-16  651  	cleared_dds++;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  652  
64cb650145881d Maciej Fijalkowski     2021-12-16  653  	ntc = xdp_ring->next_to_clean;
64cb650145881d Maciej Fijalkowski     2021-12-16  654  
64cb650145881d Maciej Fijalkowski     2021-12-16  655  	for (i = 0; i < ICE_TX_THRESH; i++) {
64cb650145881d Maciej Fijalkowski     2021-12-16  656  		tx_buf = &xdp_ring->tx_buf[ntc];
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  657  
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  658  		if (tx_buf->raw_buf) {
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  659  			ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  660  			tx_buf->raw_buf = NULL;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  661  		} else {
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  662  			xsk_frames++;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  663  		}
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  664  
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  665  		ntc++;
64cb650145881d Maciej Fijalkowski     2021-12-16  666  		if (ntc >= xdp_ring->count)
64cb650145881d Maciej Fijalkowski     2021-12-16  667  			ntc = 0;
64cb650145881d Maciej Fijalkowski     2021-12-16  668  	}
64cb650145881d Maciej Fijalkowski     2021-12-16  669  
64cb650145881d Maciej Fijalkowski     2021-12-16  670  	xdp_ring->next_to_clean += ICE_TX_THRESH;
64cb650145881d Maciej Fijalkowski     2021-12-16  671  	if (xdp_ring->next_to_clean >= desc_cnt)
64cb650145881d Maciej Fijalkowski     2021-12-16  672  		xdp_ring->next_to_clean -= desc_cnt;
64cb650145881d Maciej Fijalkowski     2021-12-16  673  	if (xsk_frames)
64cb650145881d Maciej Fijalkowski     2021-12-16  674  		xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames);
64cb650145881d Maciej Fijalkowski     2021-12-16  675  	next_dd_desc->cmd_type_offset_bsz = 0;
64cb650145881d Maciej Fijalkowski     2021-12-16  676  	xdp_ring->next_dd = xdp_ring->next_dd + ICE_TX_THRESH;
64cb650145881d Maciej Fijalkowski     2021-12-16  677  	if (xdp_ring->next_dd >= desc_cnt)
64cb650145881d Maciej Fijalkowski     2021-12-16  678  		xdp_ring->next_dd = ICE_TX_THRESH - 1;
64cb650145881d Maciej Fijalkowski     2021-12-16  679  
64cb650145881d Maciej Fijalkowski     2021-12-16  680  	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
64cb650145881d Maciej Fijalkowski     2021-12-16  681  	if ((next_dd_desc->cmd_type_offset_bsz &
64cb650145881d Maciej Fijalkowski     2021-12-16  682  	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
64cb650145881d Maciej Fijalkowski     2021-12-16  683  		goto again;
64cb650145881d Maciej Fijalkowski     2021-12-16  684  
64cb650145881d Maciej Fijalkowski     2021-12-16  685  	return cleared_dds * ICE_TX_THRESH;
64cb650145881d Maciej Fijalkowski     2021-12-16  686  }
64cb650145881d Maciej Fijalkowski     2021-12-16  687  
64cb650145881d Maciej Fijalkowski     2021-12-16  688  /**
64cb650145881d Maciej Fijalkowski     2021-12-16  689   * ice_xmit_pkt - produce a single HW Tx descriptor out of AF_XDP descriptor
64cb650145881d Maciej Fijalkowski     2021-12-16  690   * @xdp_ring: XDP ring to produce the HW Tx descriptor on
64cb650145881d Maciej Fijalkowski     2021-12-16  691   * @desc: AF_XDP descriptor to pull the DMA address and length from
64cb650145881d Maciej Fijalkowski     2021-12-16  692   * @total_bytes: bytes accumulator that will be used for stats update
64cb650145881d Maciej Fijalkowski     2021-12-16  693   */
64cb650145881d Maciej Fijalkowski     2021-12-16  694  static void ice_xmit_pkt(struct ice_tx_ring *xdp_ring, struct xdp_desc *desc,
64cb650145881d Maciej Fijalkowski     2021-12-16  695  			 unsigned int *total_bytes)
64cb650145881d Maciej Fijalkowski     2021-12-16  696  {
64cb650145881d Maciej Fijalkowski     2021-12-16  697  	struct ice_tx_desc *tx_desc;
64cb650145881d Maciej Fijalkowski     2021-12-16  698  	dma_addr_t dma;
64cb650145881d Maciej Fijalkowski     2021-12-16  699  
64cb650145881d Maciej Fijalkowski     2021-12-16  700  	dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc->addr);
64cb650145881d Maciej Fijalkowski     2021-12-16  701  	xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc->len);
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  702  
64cb650145881d Maciej Fijalkowski     2021-12-16  703  	tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_to_use++);
64cb650145881d Maciej Fijalkowski     2021-12-16  704  	tx_desc->buf_addr = cpu_to_le64(dma);
64cb650145881d Maciej Fijalkowski     2021-12-16  705  	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
64cb650145881d Maciej Fijalkowski     2021-12-16  706  						      0, desc->len, 0);
64cb650145881d Maciej Fijalkowski     2021-12-16  707  
64cb650145881d Maciej Fijalkowski     2021-12-16  708  	*total_bytes += desc->len;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  709  }
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  710  
64cb650145881d Maciej Fijalkowski     2021-12-16  711  /**
64cb650145881d Maciej Fijalkowski     2021-12-16  712   * ice_xmit_pkt - produce a batch of HW Tx descriptors out of AF_XDP descriptors
64cb650145881d Maciej Fijalkowski     2021-12-16  713   * @xdp_ring: XDP ring to produce the HW Tx descriptors on
64cb650145881d Maciej Fijalkowski     2021-12-16  714   * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
64cb650145881d Maciej Fijalkowski     2021-12-16  715   * @total_bytes: bytes accumulator that will be used for stats update
64cb650145881d Maciej Fijalkowski     2021-12-16  716   */
64cb650145881d Maciej Fijalkowski     2021-12-16  717  static void ice_xmit_pkt_batch(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
64cb650145881d Maciej Fijalkowski     2021-12-16  718  			       unsigned int *total_bytes)
64cb650145881d Maciej Fijalkowski     2021-12-16 @719  {
64cb650145881d Maciej Fijalkowski     2021-12-16  720  	u16 ntu = xdp_ring->next_to_use;
64cb650145881d Maciej Fijalkowski     2021-12-16  721  	struct ice_tx_desc *tx_desc;
64cb650145881d Maciej Fijalkowski     2021-12-16  722  	dma_addr_t dma;
64cb650145881d Maciej Fijalkowski     2021-12-16  723  	u32 i;
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  724  
64cb650145881d Maciej Fijalkowski     2021-12-16  725  	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++) {
64cb650145881d Maciej Fijalkowski     2021-12-16  726  		dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, descs[i].addr);
64cb650145881d Maciej Fijalkowski     2021-12-16  727  		xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, descs[i].len);
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  728  
64cb650145881d Maciej Fijalkowski     2021-12-16  729  		tx_desc = ICE_TX_DESC(xdp_ring, ntu++);
64cb650145881d Maciej Fijalkowski     2021-12-16  730  		tx_desc->buf_addr = cpu_to_le64(dma);
64cb650145881d Maciej Fijalkowski     2021-12-16  731  		tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
64cb650145881d Maciej Fijalkowski     2021-12-16  732  							      0, descs[i].len, 0);
2d4238f5569722 Krzysztof Kazimierczak 2019-11-04  733  
64cb650145881d Maciej Fijalkowski     2021-12-16  734  		*total_bytes += descs[i].len;
64cb650145881d Maciej Fijalkowski     2021-12-16  735  	}
64cb650145881d Maciej Fijalkowski     2021-12-16  736  
64cb650145881d Maciej Fijalkowski     2021-12-16  737  	xdp_ring->next_to_use = ntu;
64cb650145881d Maciej Fijalkowski     2021-12-16  738  
64cb650145881d Maciej Fijalkowski     2021-12-16  739  	if (xdp_ring->next_to_use > xdp_ring->next_rs) {
64cb650145881d Maciej Fijalkowski     2021-12-16  740  		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
64cb650145881d Maciej Fijalkowski     2021-12-16  741  		tx_desc->cmd_type_offset_bsz |=
64cb650145881d Maciej Fijalkowski     2021-12-16  742  			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
64cb650145881d Maciej Fijalkowski     2021-12-16  743  		xdp_ring->next_rs += ICE_TX_THRESH;
64cb650145881d Maciej Fijalkowski     2021-12-16  744  	}
64cb650145881d Maciej Fijalkowski     2021-12-16  745  }
64cb650145881d Maciej Fijalkowski     2021-12-16  746  
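
The kernel-doc header above ice_xmit_pkt_batch() still carries the
ice_xmit_pkt() name copied from the single-descriptor helper. A minimal
fixup (shown here only as an illustration, not taken from any posted
revision of the patch) is to make the comment name match the function:

/**
 * ice_xmit_pkt_batch - produce a batch of HW Tx descriptors out of AF_XDP descriptors
 * @xdp_ring: XDP ring to produce the HW Tx descriptors on
 * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
 * @total_bytes: bytes accumulator that will be used for stats update
 */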

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
                     ` (2 preceding siblings ...)
  2021-12-29 10:10     ` kernel test robot
@ 2021-12-29 13:11   ` Alexander Lobakin
  2021-12-30 13:13     ` Maciej Fijalkowski
  3 siblings, 1 reply; 14+ messages in thread
From: Alexander Lobakin @ 2021-12-29 13:11 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Alexander Lobakin, bpf, ast, daniel, netdev, magnus.karlsson

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Thu, 16 Dec 2021 14:59:57 +0100

> Follow mostly the logic from commit 9610bd988df9 ("ice: optimize XDP_TX
> workloads") that has been done in order to address the massive tx_busy
> statistic bump and improve the performance as well.
> 
> Increase the ICE_TX_THRESH to 64 as it seems to work out better for both
> XDP and AF_XDP. Also, separating the stats structs onto separate cache
> lines seemed to improve the performance. Batching approach is inspired
> by i40e's implementation with adjustments to the cleaning logic.
> 
> One difference from 'xdpdrv' XDP_TX is when ring has less than
> ICE_TX_THRESH free entries, the cleaning routine will not stop after
> cleaning a single ICE_TX_THRESH amount of descs but rather will forward
> the next_dd pointer and check the DD bit and for this bit being set the
> cleaning will be repeated. IOW clean until there are descs that can be
> cleaned.
> 
> It takes three separate xdpsock instances in txonly mode to achieve the
> line rate and this was not previously possible.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.h |   4 +-
>  drivers/net/ethernet/intel/ice/ice_xsk.c  | 249 ++++++++++++++--------
>  drivers/net/ethernet/intel/ice/ice_xsk.h  |  26 ++-
>  4 files changed, 182 insertions(+), 99 deletions(-)
> 
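
As context for the cache-line remark in the commit message quoted above,
the usual pattern is to push the stats into their own cacheline so that
updating them on the Tx path does not dirty the line holding the hot
ring indices. A sketch with placeholder names (this is not the actual
ice_tx_ring layout from the patch):

#include <linux/cache.h>
#include <linux/types.h>

struct example_xdp_ring {
	/* hot fields touched for every descriptor */
	u16 next_to_use;
	u16 next_to_clean;
	u16 next_dd;
	u16 next_rs;

	/* stats kept on a separate cacheline; writing them no longer
	 * shares a line with the ring indices above
	 */
	struct {
		u64 packets;
		u64 bytes;
	} ring_stats ____cacheline_aligned;
};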

-- 8< --

> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
> index 4c7bd8e9dfc4..f2eb99063c1f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.h
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
> @@ -6,19 +6,36 @@
>  #include "ice_txrx.h"
>  #include "ice.h"
>  
> +#define PKTS_PER_BATCH 8
> +
> +#ifdef __clang__
> +#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for
> +#elif __GNUC__ >= 4
> +#define loop_unrolled_for _Pragma("GCC unroll 8") for
> +#else
> +#define loop_unrolled_for for
> +#endif

It's used in a bunch more places across the tree, what about
defining that in linux/compiler{,_clang,_gcc}.h?
Is it possible to pass '8' as an argument? Like

	loop_unrolled_for(PKTS_PER_BATCH) ( ; ; ) { }

Could be quite handy.
If it is not, I'd maybe try to define a couple of precoded macros
for 8, 16 and 32, like

#define loop_unrolled_for_8 ...
#define loop_unrolled_for_16 ...
...

So they could be used as generic. I don't think I've seen them with
values other than 8-32.
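
As an illustration of the parameterized form suggested above (a sketch
only, not an existing kernel header; the GCC version gate is simply
carried over from the patch), the count can be forwarded into _Pragma()
with one extra stringification step:

#define __unroll_pragma(x)	_Pragma(#x)

#ifdef __clang__
#define loop_unrolled_for(n)	__unroll_pragma(clang loop unroll_count(n)) for
#elif __GNUC__ >= 4
#define loop_unrolled_for(n)	__unroll_pragma(GCC unroll n) for
#else
#define loop_unrolled_for(n)	for
#endif

	/* usage keeps the unroll count next to the loop it applies to */
	loop_unrolled_for(PKTS_PER_BATCH) (i = 0; i < PKTS_PER_BATCH; i++) {
		/* ... */
	}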

> +
>  struct ice_vsi;
>  
>  #ifdef CONFIG_XDP_SOCKETS
>  int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
>  		       u16 qid);
>  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
> -bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
>  int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
>  bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
>  bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
>  void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
>  void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
> +bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget);
>  #else
> +static inline bool
> +ice_xmit_zc(struct ice_tx_ring __always_unused *xdp_ring,
> +	    u32 __always_unused budget)
> +{
> +	return false;
> +}
> +
>  static inline int
>  ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
>  		   struct xsk_buff_pool __always_unused *pool,
> @@ -34,13 +51,6 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
>  	return 0;
>  }
>  
> -static inline bool
> -ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
> -		    int __always_unused budget)
> -{
> -	return false;
> -}
> -
>  static inline bool
>  ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
>  		     u16 __always_unused count)
> -- 
> 2.33.1

Thanks,
Al

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-29 13:11   ` Alexander Lobakin
@ 2021-12-30 13:13     ` Maciej Fijalkowski
  2021-12-30 16:07       ` Alexander Lobakin
  0 siblings, 1 reply; 14+ messages in thread
From: Maciej Fijalkowski @ 2021-12-30 13:13 UTC (permalink / raw)
  To: Alexander Lobakin; +Cc: bpf, ast, daniel, netdev, magnus.karlsson

On Wed, Dec 29, 2021 at 02:11:31PM +0100, Alexander Lobakin wrote:
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Date: Thu, 16 Dec 2021 14:59:57 +0100
> 
> > Follow mostly the logic from commit 9610bd988df9 ("ice: optimize XDP_TX
> > workloads") that has been done in order to address the massive tx_busy
> > statistic bump and improve the performance as well.
> > 
> > Increase the ICE_TX_THRESH to 64 as it seems to work out better for both
> > XDP and AF_XDP. Also, separating the stats structs onto separate cache
> > lines seemed to improve the performance. Batching approach is inspired
> > by i40e's implementation with adjustments to the cleaning logic.
> > 
> > One difference from 'xdpdrv' XDP_TX is when ring has less than
> > ICE_TX_THRESH free entries, the cleaning routine will not stop after
> > cleaning a single ICE_TX_THRESH amount of descs but rather will forward
> > the next_dd pointer and check the DD bit and for this bit being set the
> > cleaning will be repeated. IOW clean until there are descs that can be
> > cleaned.
> > 
> > It takes three separate xdpsock instances in txonly mode to achieve the
> > line rate and this was not previously possible.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
> >  drivers/net/ethernet/intel/ice/ice_txrx.h |   4 +-
> >  drivers/net/ethernet/intel/ice/ice_xsk.c  | 249 ++++++++++++++--------
> >  drivers/net/ethernet/intel/ice/ice_xsk.h  |  26 ++-
> >  4 files changed, 182 insertions(+), 99 deletions(-)
> > 
> 
> -- 8< --
> 
> > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
> > index 4c7bd8e9dfc4..f2eb99063c1f 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_xsk.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
> > @@ -6,19 +6,36 @@
> >  #include "ice_txrx.h"
> >  #include "ice.h"
> >  
> > +#define PKTS_PER_BATCH 8
> > +
> > +#ifdef __clang__
> > +#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for
> > +#elif __GNUC__ >= 4
> > +#define loop_unrolled_for _Pragma("GCC unroll 8") for
> > +#else
> > +#define loop_unrolled_for for
> > +#endif
> 
> It's used in a bunch more places across the tree, what about
> defining that in linux/compiler{,_clang,_gcc}.h?
> Is it possible to pass '8' as an argument? Like

Like where besides i40e? I might currently suck at grepping, let's blame
Christmas break for that.

If there are actually other callsites besides i40e then this is a good
idea to me, maybe as a follow-up?

> 
> 	loop_unrolled_for(PKTS_PER_BATCH) ( ; ; ) { }
> 
> Could be quite handy.
> If it is not, I'd maybe try to define a couple of precoded macros
> for 8, 16 and 32, like
> 
> #define loop_unrolled_for_8 ...
> #define loop_unrolled_for_16 ...
> ...
> 
> So they could be used as generic. I don't think I've seen them with
> values other than 8-32.
> 
> > +
> >  struct ice_vsi;
> >  
> >  #ifdef CONFIG_XDP_SOCKETS
> >  int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
> >  		       u16 qid);
> >  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
> > -bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
> >  int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
> >  bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
> >  bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
> >  void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
> >  void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
> > +bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget);
> >  #else
> > +static inline bool
> > +ice_xmit_zc(struct ice_tx_ring __always_unused *xdp_ring,
> > +	    u32 __always_unused budget)
> > +{
> > +	return false;
> > +}
> > +
> >  static inline int
> >  ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
> >  		   struct xsk_buff_pool __always_unused *pool,
> > @@ -34,13 +51,6 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
> >  	return 0;
> >  }
> >  
> > -static inline bool
> > -ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
> > -		    int __always_unused budget)
> > -{
> > -	return false;
> > -}
> > -
> >  static inline bool
> >  ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
> >  		     u16 __always_unused count)
> > -- 
> > 2.33.1
> 
> Thanks,
> Al

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-30 13:13     ` Maciej Fijalkowski
@ 2021-12-30 16:07       ` Alexander Lobakin
  2022-01-05 20:55         ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Lobakin @ 2021-12-30 16:07 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Alexander Lobakin, bpf, ast, daniel, netdev, magnus.karlsson

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Thu, 30 Dec 2021 14:13:10 +0100

> On Wed, Dec 29, 2021 at 02:11:31PM +0100, Alexander Lobakin wrote:
> > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Date: Thu, 16 Dec 2021 14:59:57 +0100
> > 
> > > Follow mostly the logic from commit 9610bd988df9 ("ice: optimize XDP_TX
> > > workloads") that has been done in order to address the massive tx_busy
> > > statistic bump and improve the performance as well.
> > > 
> > > Increase the ICE_TX_THRESH to 64 as it seems to work out better for both
> > > XDP and AF_XDP. Also, separating the stats structs onto separate cache
> > > lines seemed to improve the performance. Batching approach is inspired
> > > by i40e's implementation with adjustments to the cleaning logic.
> > > 
> > > One difference from 'xdpdrv' XDP_TX is when ring has less than
> > > ICE_TX_THRESH free entries, the cleaning routine will not stop after
> > > cleaning a single ICE_TX_THRESH amount of descs but rather will forward
> > > the next_dd pointer and check the DD bit and for this bit being set the
> > > cleaning will be repeated. IOW clean until there are descs that can be
> > > cleaned.
> > > 
> > > It takes three separate xdpsock instances in txonly mode to achieve the
> > > line rate and this was not previously possible.
> > > 
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
> > >  drivers/net/ethernet/intel/ice/ice_txrx.h |   4 +-
> > >  drivers/net/ethernet/intel/ice/ice_xsk.c  | 249 ++++++++++++++--------
> > >  drivers/net/ethernet/intel/ice/ice_xsk.h  |  26 ++-
> > >  4 files changed, 182 insertions(+), 99 deletions(-)
> > > 
> > 
> > -- 8< --
> > 
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
> > > index 4c7bd8e9dfc4..f2eb99063c1f 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_xsk.h
> > > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
> > > @@ -6,19 +6,36 @@
> > >  #include "ice_txrx.h"
> > >  #include "ice.h"
> > >  
> > > +#define PKTS_PER_BATCH 8
> > > +
> > > +#ifdef __clang__
> > > +#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for
> > > +#elif __GNUC__ >= 4
> > > +#define loop_unrolled_for _Pragma("GCC unroll 8") for
> > > +#else
> > > +#define loop_unrolled_for for
> > > +#endif
> > 
> > It's used in a bunch more places across the tree, what about
> > defining that in linux/compiler{,_clang,_gcc}.h?
> > Is it possible to pass '8' as an argument? Like
> 
> Like where besides i40e? I might currently suck at grepping, let's blame
> > Christmas break for that.

Ah okay, I confused it with a work around this pragma here: [0]

> 
> If there are actually other callsites besides i40e then this is a good
> idea to me, maybe as a follow-up?

I think there are more potential call sites for that to come, I'd
make linux/unroll.h in the future I guess. But not as a part of
this series, right.

> 
> > 
> > 	loop_unrolled_for(PKTS_PER_BATCH) ( ; ; ) { }
> > 
> > Could be quite handy.
> > If it is not, I'd maybe try to define a couple of precoded macros
> > for 8, 16 and 32, like
> > 
> > #define loop_unrolled_for_8 ...
> > #define loop_unrolled_for_16 ...
> > ...
> > 
> > So they could be used as generic. I don't think I've seen them with
> > values other than 8-32.
> > 
> > > +
> > >  struct ice_vsi;
> > >  
> > >  #ifdef CONFIG_XDP_SOCKETS
> > >  int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
> > >  		       u16 qid);
> > >  int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
> > > -bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
> > >  int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
> > >  bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
> > >  bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
> > >  void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
> > >  void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
> > > +bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget);
> > >  #else
> > > +static inline bool
> > > +ice_xmit_zc(struct ice_tx_ring __always_unused *xdp_ring,
> > > +	    u32 __always_unused budget)
> > > +{
> > > +	return false;
> > > +}
> > > +
> > >  static inline int
> > >  ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
> > >  		   struct xsk_buff_pool __always_unused *pool,
> > > @@ -34,13 +51,6 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
> > >  	return 0;
> > >  }
> > >  
> > > -static inline bool
> > > -ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
> > > -		    int __always_unused budget)
> > > -{
> > > -	return false;
> > > -}
> > > -
> > >  static inline bool
> > >  ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
> > >  		     u16 __always_unused count)
> > > -- 
> > > 2.33.1
> > 
> > Thanks,
> > Al

[0] https://elixir.bootlin.com/linux/v5.16-rc7/source/arch/mips/include/asm/unroll.h#L16

Al

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API
  2021-12-30 16:07       ` Alexander Lobakin
@ 2022-01-05 20:55         ` Alexei Starovoitov
  0 siblings, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2022-01-05 20:55 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Maciej Fijalkowski, bpf, Alexei Starovoitov, Daniel Borkmann,
	Network Development, Karlsson, Magnus, Jakub Kicinski,
	David S. Miller

On Thu, Dec 30, 2021 at 8:09 AM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
>
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Date: Thu, 30 Dec 2021 14:13:10 +0100
>
> > On Wed, Dec 29, 2021 at 02:11:31PM +0100, Alexander Lobakin wrote:
> > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > Date: Thu, 16 Dec 2021 14:59:57 +0100
> > >
> > > > Follow mostly the logic from commit 9610bd988df9 ("ice: optimize XDP_TX
> > > > workloads") that has been done in order to address the massive tx_busy
> > > > statistic bump and improve the performance as well.
> > > >
> > > > Increase the ICE_TX_THRESH to 64 as it seems to work out better for both
> > > > XDP and AF_XDP. Also, separating the stats structs onto separate cache
> > > > lines seemed to improve the performance. Batching approach is inspired
> > > > by i40e's implementation with adjustments to the cleaning logic.
> > > >
> > > > One difference from 'xdpdrv' XDP_TX is when ring has less than
> > > > ICE_TX_THRESH free entries, the cleaning routine will not stop after
> > > > cleaning a single ICE_TX_THRESH amount of descs but rather will forward
> > > > the next_dd pointer and check the DD bit and for this bit being set the
> > > > cleaning will be repeated. IOW clean until there are descs that can be
> > > > cleaned.
> > > >
> > > > It takes three separate xdpsock instances in txonly mode to achieve the
> > > > line rate and this was not previously possible.
> > > >
> > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > ---
> > > >  drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
> > > >  drivers/net/ethernet/intel/ice/ice_txrx.h |   4 +-
> > > >  drivers/net/ethernet/intel/ice/ice_xsk.c  | 249 ++++++++++++++--------
> > > >  drivers/net/ethernet/intel/ice/ice_xsk.h  |  26 ++-
> > > >  4 files changed, 182 insertions(+), 99 deletions(-)
> > > >
> > >
> > > -- 8< --
> > >
> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
> > > > index 4c7bd8e9dfc4..f2eb99063c1f 100644
> > > > --- a/drivers/net/ethernet/intel/ice/ice_xsk.h
> > > > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
> > > > @@ -6,19 +6,36 @@
> > > >  #include "ice_txrx.h"
> > > >  #include "ice.h"
> > > >
> > > > +#define PKTS_PER_BATCH 8
> > > > +
> > > > +#ifdef __clang__
> > > > +#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for
> > > > +#elif __GNUC__ >= 4
> > > > +#define loop_unrolled_for _Pragma("GCC unroll 8") for
> > > > +#else
> > > > +#define loop_unrolled_for for
> > > > +#endif
> > >
> > > It's used in a bunch more places across the tree, what about
> > > defining that in linux/compiler{,_clang,_gcc}.h?
> > > Is it possible to pass '8' as an argument? Like
> >
> > Like where besides i40e? I might currently suck at grepping, let's blame
> > > Christmas break for that.
>
> Ah okay, I confused it with a work around this pragma here: [0]
>
> >
> > If there are actually other callsites besides i40e then this is a good
> > idea to me, maybe as a follow-up?
>
> I think there are more potential call sites for that to come, I'd
> make linux/unroll.h in the future I guess. But not as a part of
> this series, right.

Please don't, since the loop unroll pragma is only a hint.
The compilers don't have to actually do the unroll.
Both gcc and clang try to do it when it looks ok-ish from the compiler's
perspective, but they don't have to.
Try large unroll values and check the generated code.
Ideally add compiler debug flags, so the compiler can tell you what it is
actually doing; it's hard to figure out the loop unroll factor just by
looking at the assembly.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-01-05 20:55 UTC | newest]

Thread overview: 14+ messages
2021-12-16 13:59 [PATCH bpf-next v2 0/4] xsk: Tx improvements Maciej Fijalkowski
2021-12-16 13:59 ` [PATCH bpf-next v2 1/4] i40e: xsk: move tmp desc array from driver to pool Maciej Fijalkowski
2021-12-16 13:59 ` [PATCH bpf-next v2 2/4] ice: xsk: avoid potential dead AF_XDP Tx processing Maciej Fijalkowski
2021-12-21  7:38   ` Magnus Karlsson
2021-12-16 13:59 ` [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API Maciej Fijalkowski
2021-12-17  3:02   ` kernel test robot
2021-12-29  3:02   ` Alexei Starovoitov
2021-12-29 10:10   ` kernel test robot
2021-12-29 13:11   ` Alexander Lobakin
2021-12-30 13:13     ` Maciej Fijalkowski
2021-12-30 16:07       ` Alexander Lobakin
2022-01-05 20:55         ` Alexei Starovoitov
2021-12-16 13:59 ` [PATCH bpf-next v2 4/4] ice: xsk: borrow xdp_tx_active logic from i40e Maciej Fijalkowski
