* [PATCH net-next 0/3] net_prefetch API
@ 2020-08-26 12:54 Tariq Toukan
  2020-08-26 12:54 ` [PATCH net-next 1/3] net: Take common prefetch code structure into a function Tariq Toukan
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Tariq Toukan @ 2020-08-26 12:54 UTC
  To: David S. Miller
  Cc: netdev, Moshe Shemesh, Saeed Mahameed, Jakub Kicinski, Tariq Toukan

Hi,

This patchset adds a common net API for L1 cacheline size-aware prefetch.

Patch 1 introduces the common API in net and aligns the drivers to use it.
Patches 2 and 3 add usage in the mlx5 and mlx4 Eth drivers, respectively.

Series generated against net-next commit:
079f921e9f4d Merge tag 'batadv-next-for-davem-20200824' of git://git.open-mesh.org/linux-merge

Thanks,
Tariq.


Tariq Toukan (3):
  net: Take common prefetch code structure into a function
  net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES
  net/mlx4_en: RX, Add a prefetch command for small L1_CACHE_BYTES

 drivers/net/ethernet/chelsio/cxgb3/sge.c         |  5 +----
 drivers/net/ethernet/hisilicon/hns/hns_enet.c    |  5 +----
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c  |  5 +----
 drivers/net/ethernet/intel/fm10k/fm10k_main.c    |  5 +----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c      | 12 ++++--------
 drivers/net/ethernet/intel/iavf/iavf_txrx.c      | 11 +++--------
 drivers/net/ethernet/intel/ice/ice_txrx.c        | 10 ++--------
 drivers/net/ethernet/intel/igb/igb_main.c        | 10 ++--------
 drivers/net/ethernet/intel/igc/igc_main.c        | 10 ++--------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 11 +++--------
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c    | 11 +++--------
 drivers/net/ethernet/mellanox/mlx4/en_rx.c       |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 ++--
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c  |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c  | 13 ++++++-------
 .../ethernet/mellanox/mlx5/core/en_selftest.c    |  3 +--
 include/linux/netdevice.h                        | 16 ++++++++++++++++
 17 files changed, 51 insertions(+), 86 deletions(-)

-- 
2.21.0



* [PATCH net-next 1/3] net: Take common prefetch code structure into a function
  2020-08-26 12:54 [PATCH net-next 0/3] net_prefetch API Tariq Toukan
@ 2020-08-26 12:54 ` Tariq Toukan
  2020-08-26 12:54 ` [PATCH net-next 2/3] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES Tariq Toukan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Tariq Toukan @ 2020-08-26 12:54 UTC
  To: David S. Miller
  Cc: netdev, Moshe Shemesh, Saeed Mahameed, Jakub Kicinski,
	Tariq Toukan, Jakub Kicinski

Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/chelsio/cxgb3/sge.c         |  5 +----
 drivers/net/ethernet/hisilicon/hns/hns_enet.c    |  5 +----
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c  |  5 +----
 drivers/net/ethernet/intel/fm10k/fm10k_main.c    |  5 +----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c      | 12 ++++--------
 drivers/net/ethernet/intel/iavf/iavf_txrx.c      | 11 +++--------
 drivers/net/ethernet/intel/ice/ice_txrx.c        | 10 ++--------
 drivers/net/ethernet/intel/igb/igb_main.c        | 10 ++--------
 drivers/net/ethernet/intel/igc/igc_main.c        | 10 ++--------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 11 +++--------
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c    | 11 +++--------
 include/linux/netdevice.h                        | 16 ++++++++++++++++
 12 files changed, 39 insertions(+), 72 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c b/drivers/net/ethernet/chelsio/cxgb3/sge.c
index 6dabbf1502c7..ee6188dea705 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
@@ -2372,10 +2372,7 @@ static int process_responses(struct adapter *adap, struct sge_qset *qs,
 			if (fl->use_pages) {
 				void *addr = fl->sdesc[fl->cidx].pg_chunk.va;
 
-				prefetch(addr);
-#if L1_CACHE_BYTES < 128
-				prefetch(addr + L1_CACHE_BYTES);
-#endif
+				net_prefetch(addr);
 				__refill_fl(adap, fl);
 				if (lro > 0) {
 					lro_add_page(adap, qs, fl,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 23f278e46975..3af33ade7b60 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -557,10 +557,7 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data *ring_data,
 	va = (unsigned char *)desc_cb->buf + desc_cb->page_offset;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
 
 	skb = *out_skb = napi_alloc_skb(&ring_data->napi,
 					HNS_RX_HEAD_SIZE);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 87776ce3539b..1a1ba6a41bfe 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3091,10 +3091,7 @@ static int hns3_handle_rx_bd(struct hns3_enet_ring *ring)
 	 * lines. In such a case, single fetch would suffice to cache in the
 	 * relevant part of the header.
 	 */
-	prefetch(ring->va);
-#if L1_CACHE_BYTES < 128
-	prefetch(ring->va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(ring->va);
 
 	if (!skb) {
 		ret = hns3_alloc_skb(ring, length, ring->va);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index d88dd41a9442..99b8252eb969 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -310,10 +310,7 @@ static struct sk_buff *fm10k_fetch_rx_buffer(struct fm10k_ring *rx_ring,
 				  rx_buffer->page_offset;
 
 		/* prefetch first cache line of first page */
-		prefetch(page_addr);
-#if L1_CACHE_BYTES < 128
-		prefetch((void *)((u8 *)page_addr + L1_CACHE_BYTES));
-#endif
+		net_prefetch(page_addr);
 
 		/* allocate a skb to store the frags */
 		skb = napi_alloc_skb(&rx_ring->q_vector->napi,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3e5c566ceb01..432a984ac335 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1992,10 +1992,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
-	prefetch(xdp->data + L1_CACHE_BYTES);
-#endif
+	net_prefetch(xdp->data);
+
 	/* Note, we get here by enabling legacy-rx via:
 	 *
 	 *    ethtool --set-priv-flags <dev> legacy-rx on
@@ -2078,10 +2076,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 	 * likely have a consumer accessing first few bytes of meta
 	 * data, and then actual data.
 	 */
-	prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
-	prefetch(xdp->data_meta + L1_CACHE_BYTES);
-#endif
+	net_prefetch(xdp->data_meta);
+
 	/* build an skb around the page buffer */
 	skb = build_skb(xdp->data_hard_start, truesize);
 	if (unlikely(!skb))
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index ca041b39ffda..256fa07d54d5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1309,10 +1309,7 @@ static struct sk_buff *iavf_construct_skb(struct iavf_ring *rx_ring,
 		return NULL;
 	/* prefetch first cache line of first page */
 	va = page_address(rx_buffer->page) + rx_buffer->page_offset;
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
 
 	/* allocate a skb to store the frags */
 	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
@@ -1376,10 +1373,8 @@ static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
 		return NULL;
 	/* prefetch first cache line of first page */
 	va = page_address(rx_buffer->page) + rx_buffer->page_offset;
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
+
 	/* build an skb around the page buffer */
 	skb = build_skb(va - IAVF_SKB_PAD, truesize);
 	if (unlikely(!skb))
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 9d0d6b0025cf..d2fca4a52f51 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -919,10 +919,7 @@ ice_build_skb(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf,
 	 * likely have a consumer accessing first few bytes of meta
 	 * data, and then actual data.
 	 */
-	prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
-	prefetch((void *)(xdp->data + L1_CACHE_BYTES));
-#endif
+	net_prefetch(xdp->data_meta);
 	/* build an skb around the page buffer */
 	skb = build_skb(xdp->data_hard_start, truesize);
 	if (unlikely(!skb))
@@ -964,10 +961,7 @@ ice_construct_skb(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
-	prefetch((void *)(xdp->data + L1_CACHE_BYTES));
-#endif /* L1_CACHE_BYTES */
+	net_prefetch(xdp->data);
 
 	/* allocate a skb to store the frags */
 	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, ICE_RX_HDR_SIZE,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4f05f6efe6af..698bb6a4b088 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8047,10 +8047,7 @@ static struct sk_buff *igb_construct_skb(struct igb_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
 
 	/* allocate a skb to store the frags */
 	skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGB_RX_HDR_LEN);
@@ -8104,10 +8101,7 @@ static struct sk_buff *igb_build_skb(struct igb_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
 
 	/* build an skb around the page buffer */
 	skb = build_skb(va - IGB_SKB_PAD, truesize);
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 9593aa4eea36..c6968fdb6caa 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1550,10 +1550,7 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
 
 	/* build an skb around the page buffer */
 	skb = build_skb(va - IGC_SKB_PAD, truesize);
@@ -1589,10 +1586,7 @@ static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
-#if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
-#endif
+	net_prefetch(va);
 
 	/* allocate a skb to store the frags */
 	skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGC_RX_HDR_LEN);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2f8a4cfc5fa1..f4f2198f388b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2095,10 +2095,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
-	prefetch(xdp->data + L1_CACHE_BYTES);
-#endif
+	net_prefetch(xdp->data);
+
 	/* Note, we get here by enabling legacy-rx via:
 	 *
 	 *    ethtool --set-priv-flags <dev> legacy-rx on
@@ -2161,10 +2159,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	 * likely have a consumer accessing first few bytes of meta
 	 * data, and then actual data.
 	 */
-	prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
-	prefetch(xdp->data_meta + L1_CACHE_BYTES);
-#endif
+	net_prefetch(xdp->data_meta);
 
 	/* build an skb to around the page buffer */
 	skb = build_skb(xdp->data_hard_start, truesize);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a428113e6d54..50afec43e001 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -866,10 +866,8 @@ struct sk_buff *ixgbevf_construct_skb(struct ixgbevf_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
-	prefetch(xdp->data + L1_CACHE_BYTES);
-#endif
+	net_prefetch(xdp->data);
+
 	/* Note, we get here by enabling legacy-rx via:
 	 *
 	 *    ethtool --set-priv-flags <dev> legacy-rx on
@@ -947,10 +945,7 @@ static struct sk_buff *ixgbevf_build_skb(struct ixgbevf_ring *rx_ring,
 	 * have a consumer accessing first few bytes of meta data,
 	 * and then actual data.
 	 */
-	prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
-	prefetch(xdp->data_meta + L1_CACHE_BYTES);
-#endif
+	net_prefetch(xdp->data_meta);
 
 	/* build an skb around the page buffer */
 	skb = build_skb(xdp->data_hard_start, truesize);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0e303f6603f..b8abe1d7aa0b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2193,6 +2193,22 @@ int netdev_get_num_tc(struct net_device *dev)
 	return dev->num_tc;
 }
 
+static inline void net_prefetch(void *p)
+{
+	prefetch(p);
+#if L1_CACHE_BYTES < 128
+	prefetch((u8 *)p + L1_CACHE_BYTES);
+#endif
+}
+
+static inline void net_prefetchw(void *p)
+{
+	prefetchw(p);
+#if L1_CACHE_BYTES < 128
+	prefetchw((u8 *)p + L1_CACHE_BYTES);
+#endif
+}
+
 void netdev_unbind_sb_channel(struct net_device *dev,
 			      struct net_device *sb_dev);
 int netdev_bind_sb_channel_queue(struct net_device *dev,
-- 
2.21.0
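
For readers outside the kernel tree, the helper introduced in this patch can be approximated in userspace. The following is only a sketch under stated assumptions: `L1_CACHE_BYTES` is hard-coded to 64 here (the kernel defines it per architecture), `prefetch()` and `prefetchw()` are mapped onto the GCC/Clang `__builtin_prefetch` builtin, and `net_prefetch_span()` and `rx_peek_first_byte()` are hypothetical helpers added purely for illustration. They are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's per-architecture L1_CACHE_BYTES. */
#ifndef L1_CACHE_BYTES
#define L1_CACHE_BYTES 64
#endif

/* Userspace stand-ins for the kernel's prefetch() and prefetchw(). */
#define prefetch(p)  __builtin_prefetch((p), 0, 3)	/* read hint */
#define prefetchw(p) __builtin_prefetch((p), 1, 3)	/* write hint */

/* Same shape as the helper added to include/linux/netdevice.h. */
static inline void net_prefetch(void *p)
{
	prefetch(p);
#if L1_CACHE_BYTES < 128
	prefetch((uint8_t *)p + L1_CACHE_BYTES);
#endif
}

static inline void net_prefetchw(void *p)
{
	prefetchw(p);
#if L1_CACHE_BYTES < 128
	prefetchw((uint8_t *)p + L1_CACHE_BYTES);
#endif
}

/* Hypothetical helper: bytes hinted by one call when p is line-aligned. */
static inline size_t net_prefetch_span(void)
{
#if L1_CACHE_BYTES < 128
	return 2 * (size_t)L1_CACHE_BYTES;
#else
	return (size_t)L1_CACHE_BYTES;
#endif
}

/* Hypothetical RX-path usage: hint the buffer, then read its first byte. */
static inline uint8_t rx_peek_first_byte(uint8_t *buf)
{
	assert(buf != NULL);
	net_prefetch(buf);
	return buf[0];
}
```

With the assumed 64-byte line, one `net_prefetch()` call hints two lines (128 bytes from an aligned address). On a configuration where `L1_CACHE_BYTES` is 128 or more, the second hint compiles away.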



* [PATCH net-next 2/3] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES
  2020-08-26 12:54 [PATCH net-next 0/3] net_prefetch API Tariq Toukan
  2020-08-26 12:54 ` [PATCH net-next 1/3] net: Take common prefetch code structure into a function Tariq Toukan
@ 2020-08-26 12:54 ` Tariq Toukan
  2020-08-26 12:54 ` [PATCH net-next 3/3] net/mlx4_en: " Tariq Toukan
  2020-08-26 22:56 ` [PATCH net-next 0/3] net_prefetch API David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: Tariq Toukan @ 2020-08-26 12:54 UTC
  To: David S. Miller
  Cc: netdev, Moshe Shemesh, Saeed Mahameed, Jakub Kicinski, Tariq Toukan

A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Use net_prefetch() as it issues an additional prefetch
in this case.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 13 ++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_selftest.c   |  3 +--
 4 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 0e6946fc121f..4bcb73a5522f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -201,7 +201,7 @@ static void mlx5e_xdp_mpwqe_session_start(struct mlx5e_xdpsq *sq)
 	pi = mlx5e_xdpsq_get_next_pi(sq, MLX5_SEND_WQE_MAX_WQEBBS);
 	session->wqe = MLX5E_TX_FETCH_WQE(sq, pi);
 
-	prefetchw(session->wqe->data);
+	net_prefetchw(session->wqe->data);
 	session->ds_count  = MLX5E_XDP_TX_EMPTY_DS_COUNT;
 	session->pkt_count = 0;
 
@@ -322,7 +322,7 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_xmit_data *xdptxd,
 
 	struct mlx5e_xdpsq_stats *stats = sq->stats;
 
-	prefetchw(wqe);
+	net_prefetchw(wqe);
 
 	if (unlikely(dma_len < MLX5E_XDP_MIN_INLINE || sq->hw_mtu < dma_len)) {
 		stats->err++;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index a33a1f762c70..786fedf52436 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -49,7 +49,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq,
 	xdp->data_end = xdp->data + cqe_bcnt32;
 	xdp_set_data_meta_invalid(xdp);
 	xsk_buff_dma_sync_for_cpu(xdp);
-	prefetch(xdp->data);
+	net_prefetch(xdp->data);
 
 	rcu_read_lock();
 	consumed = mlx5e_xdp_handle(rq, NULL, &cqe_bcnt32, xdp);
@@ -100,7 +100,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq,
 	xdp->data_end = xdp->data + cqe_bcnt;
 	xdp_set_data_meta_invalid(xdp);
 	xsk_buff_dma_sync_for_cpu(xdp);
-	prefetch(xdp->data);
+	net_prefetch(xdp->data);
 
 	if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {
 		rq->stats->wqe_err++;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 65828af120b7..228fd775bcd5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -30,7 +30,6 @@
  * SOFTWARE.
  */
 
-#include <linux/prefetch.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
@@ -1141,8 +1140,8 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
 
 	dma_sync_single_range_for_cpu(rq->pdev, di->addr, wi->offset,
 				      frag_size, DMA_FROM_DEVICE);
-	prefetchw(va); /* xdp_frame data area */
-	prefetch(data);
+	net_prefetchw(va); /* xdp_frame data area */
+	net_prefetch(data);
 
 	rcu_read_lock();
 	mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt, &xdp);
@@ -1184,7 +1183,7 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
 		return NULL;
 	}
 
-	prefetchw(skb->data);
+	net_prefetchw(skb->data);
 
 	while (byte_cnt) {
 		u16 frag_consumed_bytes =
@@ -1399,7 +1398,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 		return NULL;
 	}
 
-	prefetchw(skb->data);
+	net_prefetchw(skb->data);
 
 	if (unlikely(frag_offset >= PAGE_SIZE)) {
 		di++;
@@ -1452,8 +1451,8 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 
 	dma_sync_single_range_for_cpu(rq->pdev, di->addr, head_offset,
 				      frag_size, DMA_FROM_DEVICE);
-	prefetchw(va); /* xdp_frame data area */
-	prefetch(data);
+	net_prefetchw(va); /* xdp_frame data area */
+	net_prefetch(data);
 
 	rcu_read_lock();
 	mlx5e_fill_xdp_buff(rq, va, rx_headroom, cqe_bcnt32, &xdp);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 46790216ce86..ce8ab1f01876 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -30,7 +30,6 @@
  * SOFTWARE.
  */
 
-#include <linux/prefetch.h>
 #include <linux/ip.h>
 #include <linux/udp.h>
 #include <net/udp.h>
@@ -115,7 +114,7 @@ static struct sk_buff *mlx5e_test_get_udp_skb(struct mlx5e_priv *priv)
 		return NULL;
 	}
 
-	prefetchw(skb->data);
+	net_prefetchw(skb->data);
 	skb_reserve(skb, NET_IP_ALIGN);
 
 	/*  Reserve for ethernet and IP header  */
-- 
2.21.0
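
The claim in the commit message above, that a single cacheline might not hold the packet header, can be checked with simple arithmetic. The sketch below is illustrative only: `cache_lines_spanned()` is a hypothetical helper that is not part of the patch, and the header sizes are the minimum Ethernet (14), IPv4 (20), and TCP (20) lengths, 54 bytes in total.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN  14	/* Ethernet header, no VLAN tag */
#define IPV4_HLEN 20	/* IPv4 header without options */
#define TCP_HLEN  20	/* TCP header without options */

/*
 * Hypothetical helper: number of cache lines touched by `len` bytes
 * starting `offset` bytes past a line-aligned base address.
 */
static size_t cache_lines_spanned(size_t offset, size_t len, size_t line_size)
{
	assert(len > 0 && line_size > 0);
	return (offset + len - 1) / line_size - offset / line_size + 1;
}
```

For the 54-byte header, `cache_lines_spanned(0, 54, 32)` is 2 and `cache_lines_spanned(0, 54, 64)` is 1, while a header starting 16 bytes into a 64-byte line gives `cache_lines_spanned(16, 54, 64)`, which is 2. Those two-line cases are the ones the extra prefetch is meant to cover.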



* [PATCH net-next 3/3] net/mlx4_en: RX, Add a prefetch command for small L1_CACHE_BYTES
  2020-08-26 12:54 [PATCH net-next 0/3] net_prefetch API Tariq Toukan
  2020-08-26 12:54 ` [PATCH net-next 1/3] net: Take common prefetch code structure into a function Tariq Toukan
  2020-08-26 12:54 ` [PATCH net-next 2/3] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES Tariq Toukan
@ 2020-08-26 12:54 ` Tariq Toukan
  2020-08-26 14:55   ` Eric Dumazet
  2020-08-26 22:56 ` [PATCH net-next 0/3] net_prefetch API David Miller
  3 siblings, 1 reply; 7+ messages in thread
From: Tariq Toukan @ 2020-08-26 12:54 UTC
  To: David S. Miller
  Cc: netdev, Moshe Shemesh, Saeed Mahameed, Jakub Kicinski, Tariq Toukan

A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Use net_prefetch() as it issues an additional prefetch
in this case.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b50c567ef508..99d7737e8ad6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -705,7 +705,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
 		frags = ring->rx_info + (index << priv->log_rx_info);
 		va = page_address(frags[0].page) + frags[0].page_offset;
-		prefetchw(va);
+		net_prefetchw(va);
 		/*
 		 * make sure we read the CQE after we read the ownership bit
 		 */
-- 
2.21.0



* Re: [PATCH net-next 3/3] net/mlx4_en: RX, Add a prefetch command for small L1_CACHE_BYTES
  2020-08-26 12:54 ` [PATCH net-next 3/3] net/mlx4_en: " Tariq Toukan
@ 2020-08-26 14:55   ` Eric Dumazet
  2020-08-26 17:09     ` Saeed Mahameed
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2020-08-26 14:55 UTC
  To: Tariq Toukan, David S. Miller
  Cc: netdev, Moshe Shemesh, Saeed Mahameed, Jakub Kicinski



On 8/26/20 5:54 AM, Tariq Toukan wrote:
> A single cacheline might not contain the packet header for
> small L1_CACHE_BYTES values.
> Use net_prefetch() as it issues an additional prefetch
> in this case.
> 
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index b50c567ef508..99d7737e8ad6 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -705,7 +705,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>  
>  		frags = ring->rx_info + (index << priv->log_rx_info);
>  		va = page_address(frags[0].page) + frags[0].page_offset;
> -		prefetchw(va);
> +		net_prefetchw(va);
>  		/*
>  		 * make sure we read the CQE after we read the ownership bit
>  		 */
> 

Why would these cache lines be written next? Presumably we read the headers (pulled into skb->head).

Really, using prefetch() for the about-to-be-read packet is too late anyway for current CPUs.



* Re: [PATCH net-next 3/3] net/mlx4_en: RX, Add a prefetch command for small L1_CACHE_BYTES
  2020-08-26 14:55   ` Eric Dumazet
@ 2020-08-26 17:09     ` Saeed Mahameed
  0 siblings, 0 replies; 7+ messages in thread
From: Saeed Mahameed @ 2020-08-26 17:09 UTC
  To: eric.dumazet, tariqt, davem; +Cc: netdev, kuba, moshe

On Wed, 2020-08-26 at 07:55 -0700, Eric Dumazet wrote:
> 
> On 8/26/20 5:54 AM, Tariq Toukan wrote:
> > A single cacheline might not contain the packet header for
> > small L1_CACHE_BYTES values.
> > Use net_prefetch() as it issues an additional prefetch
> > in this case.
> > 
> > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index b50c567ef508..99d7737e8ad6 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -705,7 +705,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >  
> >  		frags = ring->rx_info + (index << priv->log_rx_info);
> >  		va = page_address(frags[0].page) + frags[0].page_offset;
> > -		prefetchw(va);
> > +		net_prefetchw(va);
> >  		/*
> >  		 * make sure we read the CQE after we read the ownership bit
> >  		 */
> > 
> 
> Why would these cache lines be written next? Presumably we read the
> headers (pulled into skb->head).
> 

XDP

> Really, using prefetch() for the about-to-be-read packet is too late
> anyway for current CPUs.
> 
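
One plausible reading of the terse "XDP" reply above is that an XDP program may rewrite packet headers in place, so hinting write intent on the buffer can make sense. The sketch below illustrates that idea in userspace and makes no claim about what the mlx4 driver actually does: `swap_macs()` is a hypothetical XDP-style action, and `prefetchw()` is mapped onto `__builtin_prefetch` with its write flag set as an assumed stand-in for the kernel macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Userspace stand-in for the kernel's prefetchw(): rw = 1 hints a write. */
#define prefetchw(p) __builtin_prefetch((p), 1, 3)

/* Hypothetical XDP-style action: swap destination and source MAC in place. */
static void swap_macs(uint8_t *frame)
{
	uint8_t tmp[ETH_ALEN];

	assert(frame != NULL);
	prefetchw(frame);	/* the header is about to be modified */
	memcpy(tmp, frame, ETH_ALEN);			/* save destination */
	memcpy(frame, frame + ETH_ALEN, ETH_ALEN);	/* dst <- src */
	memcpy(frame + ETH_ALEN, tmp, ETH_ALEN);	/* src <- old dst */
}
```

A read-only consumer, such as a stack that only parses the headers, would have no reason to prefer the write hint over a plain `prefetch()`, which is the point the question raises.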


* Re: [PATCH net-next 0/3] net_prefetch API
  2020-08-26 12:54 [PATCH net-next 0/3] net_prefetch API Tariq Toukan
                   ` (2 preceding siblings ...)
  2020-08-26 12:54 ` [PATCH net-next 3/3] net/mlx4_en: " Tariq Toukan
@ 2020-08-26 22:56 ` David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2020-08-26 22:56 UTC
  To: tariqt; +Cc: netdev, moshe, saeedm, kuba

From: Tariq Toukan <tariqt@mellanox.com>
Date: Wed, 26 Aug 2020 15:54:15 +0300

> This patchset adds a common net API for L1 cacheline size-aware prefetch.
> 
> Patch 1 introduces the common API in net and aligns the drivers to use it.
> Patches 2 and 3 add usage in the mlx5 and mlx4 Eth drivers, respectively.
> 
> Series generated against net-next commit:
> 079f921e9f4d Merge tag 'batadv-next-for-davem-20200824' of git://git.open-mesh.org/linux-merge

Effectively a nice little cleanup, series applied, thanks.

