* [PATCH net-next 0/2] Introduce net_prefetch
@ 2019-05-05 10:36 Tariq Toukan
2019-05-05 10:36 ` [PATCH net-next 1/2] net: Take common prefetch code structure into a function Tariq Toukan
2019-05-05 10:36 ` [PATCH net-next 2/2] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES Tariq Toukan
0 siblings, 2 replies; 6+ messages in thread
From: Tariq Toukan @ 2019-05-05 10:36 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan
Hi Dave,
In this series, I first take the repeated code structure from
net device drivers into a function. Then, I call it from mlx5e driver.
Series generated against net-next commit:
ca96534630e2 openvswitch: check for null pointer return from nla_nest_start_noflag
Thanks,
Tariq
Tariq Toukan (2):
net: Take common prefetch code structure into a function
net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES
drivers/net/ethernet/chelsio/cxgb3/sge.c | 5 +----
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 +----
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 5 +----
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 5 +----
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 ++++--------
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 11 +++--------
drivers/net/ethernet/intel/ice/ice_txrx.c | 5 +----
drivers/net/ethernet/intel/igb/igb_main.c | 10 ++--------
drivers/net/ethernet/intel/igc/igc_main.c | 10 ++--------
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 11 +++--------
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 +++--------
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 ++--
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 13 ++++++-------
drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 +--
include/linux/netdevice.h | 16 ++++++++++++++++
15 files changed, 47 insertions(+), 79 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net-next 1/2] net: Take common prefetch code structure into a function
2019-05-05 10:36 [PATCH net-next 0/2] Introduce net_prefetch Tariq Toukan
@ 2019-05-05 10:36 ` Tariq Toukan
2019-05-06 23:51 ` Jakub Kicinski
2019-05-05 10:36 ` [PATCH net-next 2/2] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES Tariq Toukan
1 sibling, 1 reply; 6+ messages in thread
From: Tariq Toukan @ 2019-05-05 10:36 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Eran Ben Elisha, Tariq Toukan, Jesper Dangaard Brouer
Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
---
drivers/net/ethernet/chelsio/cxgb3/sge.c | 5 +----
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 5 +----
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 5 +----
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 5 +----
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 ++++--------
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 11 +++--------
drivers/net/ethernet/intel/ice/ice_txrx.c | 5 +----
drivers/net/ethernet/intel/igb/igb_main.c | 10 ++--------
drivers/net/ethernet/intel/igc/igc_main.c | 10 ++--------
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 11 +++--------
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 11 +++--------
include/linux/netdevice.h | 16 ++++++++++++++++
12 files changed, 38 insertions(+), 68 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c b/drivers/net/ethernet/chelsio/cxgb3/sge.c
index 89db739b7819..be5cba86d294 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
@@ -2372,10 +2372,7 @@ static int process_responses(struct adapter *adap, struct sge_qset *qs,
if (fl->use_pages) {
void *addr = fl->sdesc[fl->cidx].pg_chunk.va;
- prefetch(addr);
-#if L1_CACHE_BYTES < 128
- prefetch(addr + L1_CACHE_BYTES);
-#endif
+ net_prefetch(addr);
__refill_fl(adap, fl);
if (lro > 0) {
lro_add_page(adap, qs, fl,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 65b985acae38..0a05f5279c3a 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -561,10 +561,7 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data *ring_data,
va = (unsigned char *)desc_cb->buf + desc_cb->page_offset;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
skb = *out_skb = napi_alloc_skb(&ring_data->napi,
HNS_RX_HEAD_SIZE);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 96272e632afc..1bfa9397ce68 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2825,10 +2825,7 @@ static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
* lines. In such a case, single fetch would suffice to cache in the
* relevant part of the header.
*/
- prefetch(ring->va);
-#if L1_CACHE_BYTES < 128
- prefetch(ring->va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(ring->va);
if (!skb) {
ret = hns3_alloc_skb(ring, length, ring->va);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index b4d970e44163..ba8636d720ac 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -313,10 +313,7 @@ static struct sk_buff *fm10k_fetch_rx_buffer(struct fm10k_ring *rx_ring,
rx_buffer->page_offset;
/* prefetch first cache line of first page */
- prefetch(page_addr);
-#if L1_CACHE_BYTES < 128
- prefetch(page_addr + L1_CACHE_BYTES);
-#endif
+ net_prefetch(page_addr);
/* allocate a skb to store the frags */
skb = napi_alloc_skb(&rx_ring->q_vector->napi,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e1931701cd7e..7fda0d569000 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2005,10 +2005,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
- prefetch(xdp->data + L1_CACHE_BYTES);
-#endif
+ net_prefetch(xdp->data);
+
/* Note, we get here by enabling legacy-rx via:
*
* ethtool --set-priv-flags <dev> legacy-rx on
@@ -2091,10 +2089,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
* likely have a consumer accessing first few bytes of meta
* data, and then actual data.
*/
- prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
- prefetch(xdp->data_meta + L1_CACHE_BYTES);
-#endif
+ net_prefetch(xdp->data_meta);
+
/* build an skb around the page buffer */
skb = build_skb(xdp->data_hard_start, truesize);
if (unlikely(!skb))
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index cf8be63a8a4f..f72521aa03cd 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1300,10 +1300,7 @@ static struct sk_buff *iavf_construct_skb(struct iavf_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
/* allocate a skb to store the frags */
skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
@@ -1364,10 +1361,8 @@ static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
+
/* build an skb around the page buffer */
skb = build_skb(va - IAVF_SKB_PAD, truesize);
if (unlikely(!skb))
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 259f118c7d8b..ff013abad100 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -684,10 +684,7 @@ static bool ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf)
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch((u8 *)va + L1_CACHE_BYTES);
-#endif /* L1_CACHE_BYTES */
+ net_prefetch(va);
/* allocate a skb to store the frags */
skb = __napi_alloc_skb(&rx_ring->q_vector->napi, ICE_RX_HDR_SIZE,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9b8a4bb25327..fc736547cf08 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8032,10 +8032,7 @@ static struct sk_buff *igb_construct_skb(struct igb_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
/* allocate a skb to store the frags */
skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGB_RX_HDR_LEN);
@@ -8089,10 +8086,7 @@ static struct sk_buff *igb_build_skb(struct igb_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
/* build an skb around the page buffer */
skb = build_skb(va - IGB_SKB_PAD, truesize);
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index e58a6e0dc4d9..7497ff24cd58 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1147,10 +1147,7 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
/* build an skb around the page buffer */
skb = build_skb(va - IGC_SKB_PAD, truesize);
@@ -1186,10 +1183,7 @@ static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(va);
-#if L1_CACHE_BYTES < 128
- prefetch(va + L1_CACHE_BYTES);
-#endif
+ net_prefetch(va);
/* allocate a skb to store the frags */
skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGC_RX_HDR_LEN);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7b903206b534..5f37ed33b46c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2094,10 +2094,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
- prefetch(xdp->data + L1_CACHE_BYTES);
-#endif
+ net_prefetch(xdp->data);
+
/* Note, we get here by enabling legacy-rx via:
*
* ethtool --set-priv-flags <dev> legacy-rx on
@@ -2160,10 +2158,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
* likely have a consumer accessing first few bytes of meta
* data, and then actual data.
*/
- prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
- prefetch(xdp->data_meta + L1_CACHE_BYTES);
-#endif
+ net_prefetch(xdp->data_meta);
/* build an skb to around the page buffer */
skb = build_skb(xdp->data_hard_start, truesize);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index d189ed247665..23a6cb74da84 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -868,10 +868,8 @@ struct sk_buff *ixgbevf_construct_skb(struct ixgbevf_ring *rx_ring,
struct sk_buff *skb;
/* prefetch first cache line of first page */
- prefetch(xdp->data);
-#if L1_CACHE_BYTES < 128
- prefetch(xdp->data + L1_CACHE_BYTES);
-#endif
+ net_prefetch(xdp->data);
+
/* Note, we get here by enabling legacy-rx via:
*
* ethtool --set-priv-flags <dev> legacy-rx on
@@ -949,10 +947,7 @@ static struct sk_buff *ixgbevf_build_skb(struct ixgbevf_ring *rx_ring,
* have a consumer accessing first few bytes of meta data,
* and then actual data.
*/
- prefetch(xdp->data_meta);
-#if L1_CACHE_BYTES < 128
- prefetch(xdp->data_meta + L1_CACHE_BYTES);
-#endif
+ net_prefetch(xdp->data_meta);
/* build an skb around the page buffer */
skb = build_skb(xdp->data_hard_start, truesize);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 44b47e9df94a..a3269153ac5e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2079,6 +2079,22 @@ int netdev_get_num_tc(struct net_device *dev)
return dev->num_tc;
}
+static inline void net_prefetch(void *p)
+{
+ prefetch(p);
+#if L1_CACHE_BYTES < 128
+ prefetch(p + L1_CACHE_BYTES);
+#endif
+}
+
+static inline void net_prefetchw(void *p)
+{
+ prefetchw(p);
+#if L1_CACHE_BYTES < 128
+ prefetchw(p + L1_CACHE_BYTES);
+#endif
+}
+
void netdev_unbind_sb_channel(struct net_device *dev,
struct net_device *sb_dev);
int netdev_bind_sb_channel_queue(struct net_device *dev,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH net-next 2/2] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES
2019-05-05 10:36 [PATCH net-next 0/2] Introduce net_prefetch Tariq Toukan
2019-05-05 10:36 ` [PATCH net-next 1/2] net: Take common prefetch code structure into a function Tariq Toukan
@ 2019-05-05 10:36 ` Tariq Toukan
1 sibling, 0 replies; 6+ messages in thread
From: Tariq Toukan @ 2019-05-05 10:36 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Tariq Toukan
A single cacheline might not contain the packet header for
small L1_CACHE_BYTES values.
Issue an additional prefetch in this case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 ++--
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 13 ++++++-------
drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 +--
3 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 399957104f9d..548a4563e70a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -132,7 +132,7 @@ static void mlx5e_xdp_mpwqe_session_start(struct mlx5e_xdpsq *sq)
mlx5e_xdpsq_fetch_wqe(sq, &session->wqe);
- prefetchw(session->wqe->data);
+ net_prefetchw(session->wqe->data);
session->ds_count = MLX5E_XDP_TX_EMPTY_DS_COUNT;
session->pkt_count = 0;
session->complete = 0;
@@ -235,7 +235,7 @@ static bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_info *
struct mlx5e_xdpsq_stats *stats = sq->stats;
- prefetchw(wqe);
+ net_prefetchw(wqe);
if (unlikely(dma_len < MLX5E_XDP_MIN_INLINE || sq->hw_mtu < dma_len)) {
stats->err++;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 13133e7f088e..9d43ac31e2c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -30,7 +30,6 @@
* SOFTWARE.
*/
-#include <linux/prefetch.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/tcp.h>
@@ -1007,8 +1006,8 @@ struct sk_buff *
dma_sync_single_range_for_cpu(rq->pdev, di->addr, wi->offset,
frag_size, DMA_FROM_DEVICE);
- prefetchw(va); /* xdp_frame data area */
- prefetch(data);
+ net_prefetchw(va); /* xdp_frame data area */
+ net_prefetch(data);
if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) {
rq->stats->wqe_err++;
@@ -1057,7 +1056,7 @@ struct sk_buff *
return NULL;
}
- prefetchw(skb->data);
+ net_prefetchw(skb->data);
while (byte_cnt) {
u16 frag_consumed_bytes =
@@ -1174,7 +1173,7 @@ struct sk_buff *
return NULL;
}
- prefetchw(skb->data);
+ net_prefetchw(skb->data);
if (unlikely(frag_offset >= PAGE_SIZE)) {
di++;
@@ -1226,8 +1225,8 @@ struct sk_buff *
dma_sync_single_range_for_cpu(rq->pdev, di->addr, head_offset,
frag_size, DMA_FROM_DEVICE);
- prefetchw(va); /* xdp_frame data area */
- prefetch(data);
+ net_prefetchw(va); /* xdp_frame data area */
+ net_prefetch(data);
rcu_read_lock();
consumed = mlx5e_xdp_handle(rq, di, va, &rx_headroom, &cqe_bcnt32);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 4382ef85488c..c3a5b4d7bbec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -30,7 +30,6 @@
* SOFTWARE.
*/
-#include <linux/prefetch.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <net/udp.h>
@@ -124,7 +123,7 @@ static struct sk_buff *mlx5e_test_get_udp_skb(struct mlx5e_priv *priv)
return NULL;
}
- prefetchw(skb->data);
+ net_prefetchw(skb->data);
skb_reserve(skb, NET_IP_ALIGN);
/* Reserve for ethernet and IP header */
--
1.8.3.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next 1/2] net: Take common prefetch code structure into a function
2019-05-05 10:36 ` [PATCH net-next 1/2] net: Take common prefetch code structure into a function Tariq Toukan
@ 2019-05-06 23:51 ` Jakub Kicinski
2019-05-07 10:08 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2019-05-06 23:51 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, netdev, Eran Ben Elisha, Jesper Dangaard Brouer
On Sun, 5 May 2019 13:36:06 +0300, Tariq Toukan wrote:
> Many device drivers use the same prefetch code structure to
> deal with small L1 cacheline size.
> Take this code into a function and call it from the drivers.
>
> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
We could bike shed on the name a little - net_prefetch_headers() ?
but at least a short kdoc explanation for the purpose of this helper
would be good IMHO.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next 1/2] net: Take common prefetch code structure into a function
2019-05-06 23:51 ` Jakub Kicinski
@ 2019-05-07 10:08 ` Jesper Dangaard Brouer
2019-05-08 7:48 ` Tariq Toukan
0 siblings, 1 reply; 6+ messages in thread
From: Jesper Dangaard Brouer @ 2019-05-07 10:08 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, David S. Miller, netdev, Eran Ben Elisha, brouer,
Alexander Duyck
On Mon, 6 May 2019 16:51:57 -0700
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Sun, 5 May 2019 13:36:06 +0300, Tariq Toukan wrote:
> > Many device drivers use the same prefetch code structure to
> > deal with small L1 cacheline size.
> > Take this code into a function and call it from the drivers.
> >
> > Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
> > Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>
> We could bike shed on the name a little - net_prefetch_headers() ?
> but at least a short kdoc explanation for the purpose of this helper
> would be good IMHO.
I would at least improve the commit message. As Alexander so nicely
explained[1], this prefetch purpose: "the 2 prefetches are needed for x86
if you want a full TCP or IPv6 header pulled into the L1 cache for
instance." Although, this is not true for a minimum TCP-packet
Eth(14)+IP(20)+TCP(20)=54 bytes. An I missing an alignment in my calc?
[1] https://lore.kernel.org/netdev/CAKgT0UeEL3W42eDqSt97xnn3tXDtWMf4sdPByAtvbx=Z7Sx7hQ@mail.gmail.com/
The name net_prefetch_headers() suggested by Jakub makes sense, as this
indicate that this should be used for prefetching packet headers.
As Alexander also explained, I was wrong in thinking the HW DCU (Data
Cache Unit) prefetcher will fetch two cache-lines automatically. As
the DCU prefetcher is a streaming prefetcher, and doesn't see our
access pattern, which is why we need this.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next 1/2] net: Take common prefetch code structure into a function
2019-05-07 10:08 ` Jesper Dangaard Brouer
@ 2019-05-08 7:48 ` Tariq Toukan
0 siblings, 0 replies; 6+ messages in thread
From: Tariq Toukan @ 2019-05-08 7:48 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Jakub Kicinski
Cc: David S. Miller, netdev, Eran Ben Elisha, Alexander Duyck
On 5/7/2019 1:08 PM, Jesper Dangaard Brouer wrote:
> On Mon, 6 May 2019 16:51:57 -0700
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>
>> On Sun, 5 May 2019 13:36:06 +0300, Tariq Toukan wrote:
>>> Many device drivers use the same prefetch code structure to
>>> deal with small L1 cacheline size.
>>> Take this code into a function and call it from the drivers.
>>>
>>> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
>>> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
>>> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
>>
>> We could bike shed on the name a little - net_prefetch_headers() ?
>> but at least a short kdoc explanation for the purpose of this helper
>> would be good IMHO.
>
> I would at least improve the commit message. As Alexander so nicely
> explained[1], this prefetch purpose: "the 2 prefetches are needed for x86
> if you want a full TCP or IPv6 header pulled into the L1 cache for
> instance." Although, this is not true for a minimum TCP-packet
> Eth(14)+IP(20)+TCP(20)=54 bytes. An I missing an alignment in my calc?
>
> [1] https://lore.kernel.org/netdev/CAKgT0UeEL3W42eDqSt97xnn3tXDtWMf4sdPByAtvbx=Z7Sx7hQ@mail.gmail.com/
>
> The name net_prefetch_headers() suggested by Jakub makes sense, as this
> indicate that this should be used for prefetching packet headers.
>
> As Alexander also explained, I was wrong in thinking the HW DCU (Data
> Cache Unit) prefetcher will fetch two cache-lines automatically. As
> the DCU prefetcher is a streaming prefetcher, and doesn't see our
> access pattern, which is why we need this.
>
Thanks all for your comments.
I will fix and re-spin once the window re-opens.
Tariq
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-05-08 7:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-05 10:36 [PATCH net-next 0/2] Introduce net_prefetch Tariq Toukan
2019-05-05 10:36 ` [PATCH net-next 1/2] net: Take common prefetch code structure into a function Tariq Toukan
2019-05-06 23:51 ` Jakub Kicinski
2019-05-07 10:08 ` Jesper Dangaard Brouer
2019-05-08 7:48 ` Tariq Toukan
2019-05-05 10:36 ` [PATCH net-next 2/2] net/mlx5e: RX, Add a prefetch command for small L1_CACHE_BYTES Tariq Toukan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).