* [PATCH] net/i40e: add packet prefetch
@ 2017-03-01 10:56 Vladyslav Buslov
  2017-03-07 10:37 ` Ferruh Yigit
  2017-04-01  2:01 ` Pei, Yulong
  0 siblings, 2 replies; 8+ messages in thread
From: Vladyslav Buslov @ 2017-03-01 10:56 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu, ferruh.yigit; +Cc: dev

Prefetch both cache lines of mbuf and first cache line of payload
if CONFIG_RTE_PMD_PACKET_PREFETCH is set.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
---
 drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 48429cc..2b4e5c9 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -100,6 +100,12 @@
 #define I40E_TX_OFFLOAD_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
 
+#ifdef RTE_PMD_PACKET_PREFETCH
+#define rte_packet_prefetch(p)	rte_prefetch0(p)
+#else
+#define rte_packet_prefetch(p)	do {} while (0)
+#endif
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 		/* Translate descriptor info to mbuf parameters */
 		for (j = 0; j < nb_dd; j++) {
 			mb = rxep[j].mbuf;
+			rte_packet_prefetch(
+				RTE_PTR_ADD(mb->buf_addr,
+						RTE_PKTMBUF_HEADROOM));
 			qword1 = rte_le_to_cpu_64(\
 				rxdp[j].wb.qword1.status_error_len);
 			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
@@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
 
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
-		if (likely(i < (rxq->rx_free_thresh - 1)))
+		if (likely(i < (rxq->rx_free_thresh - 1))) {
 			/* Prefetch next mbuf */
-			rte_prefetch0(rxep[i + 1].mbuf);
+			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
+			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);
+		}
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
@@ -752,7 +763,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
 
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
-		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
+		rte_packet_prefetch(RTE_PTR_ADD(rxm->buf_addr,
+						RTE_PKTMBUF_HEADROOM));
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
 		rxm->pkt_len = rx_packet_len;
@@ -939,7 +951,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		first_seg->ol_flags |= pkt_flags;
 
 		/* Prefetch data of first segment, if configured to do so. */
-		rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
+		rte_packet_prefetch(RTE_PTR_ADD(first_seg->buf_addr,
 			first_seg->data_off));
 		rx_pkts[nb_rx++] = first_seg;
 		first_seg = NULL;
-- 
2.1.4
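
For reference, a minimal sketch (not part of the patch) of what the added
prefetches target for each received mbuf, assuming the standard rte_mbuf
layout with cacheline0/cacheline1 markers and packet data starting at
buf_addr + RTE_PKTMBUF_HEADROOM:

#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Warm both cache lines of the mbuf header plus the first cache line of
 * the packet data of a freshly allocated buffer. */
static inline void
prefetch_mbuf_and_data(struct rte_mbuf *m)
{
	rte_prefetch0(&m->cacheline0);	/* first cache line of the header */
	rte_prefetch0(&m->cacheline1);	/* second cache line of the header */
	rte_prefetch0(RTE_PTR_ADD(m->buf_addr, RTE_PKTMBUF_HEADROOM));
}

rte_mbuf.h also provides rte_mbuf_prefetch_part1()/rte_mbuf_prefetch_part2()
helpers covering the two header cache lines.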


* Re: [PATCH] net/i40e: add packet prefetch
  2017-03-01 10:56 [PATCH] net/i40e: add packet prefetch Vladyslav Buslov
@ 2017-03-07 10:37 ` Ferruh Yigit
  2017-04-01  2:01 ` Pei, Yulong
  1 sibling, 0 replies; 8+ messages in thread
From: Ferruh Yigit @ 2017-03-07 10:37 UTC (permalink / raw)
  To: Vladyslav Buslov, helin.zhang, jingjing.wu
  Cc: dev, Ananyev, Konstantin, Bruce Richardson

On 3/1/2017 10:56 AM, Vladyslav Buslov wrote:
> Prefetch both cache lines of mbuf and first cache line of payload
> if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> 
> Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>

Cc'ing Bruce and Konstantin.


* Re: [PATCH] net/i40e: add packet prefetch
  2017-03-01 10:56 [PATCH] net/i40e: add packet prefetch Vladyslav Buslov
  2017-03-07 10:37 ` Ferruh Yigit
@ 2017-04-01  2:01 ` Pei, Yulong
  2017-04-03 10:31   ` Ferruh Yigit
  2017-04-03 10:47   ` Ananyev, Konstantin
  1 sibling, 2 replies; 8+ messages in thread
From: Pei, Yulong @ 2017-04-01  2:01 UTC (permalink / raw)
  To: Vladyslav Buslov, Zhang, Helin, Wu, Jingjing, Yigit, Ferruh; +Cc: dev

Hi All

In non-vector mode, without this patch, single-core performance can reach 37.576 Mpps with 64-byte packets,
but after applying this patch, single-core performance drops to 34.343 Mpps with 64-byte packets.

Best Regards
Yulong Pei

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
Sent: Wednesday, March 1, 2017 6:57 PM
To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Cc: dev@dpdk.org
Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch

Prefetch both cache lines of mbuf and first cache line of payload if CONFIG_RTE_PMD_PACKET_PREFETCH is set.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
---
 drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 48429cc..2b4e5c9 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -100,6 +100,12 @@
 #define I40E_TX_OFFLOAD_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
 
+#ifdef RTE_PMD_PACKET_PREFETCH
+#define rte_packet_prefetch(p)	rte_prefetch0(p)
+#else
+#define rte_packet_prefetch(p)	do {} while (0)
+#endif
+
 static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
 				      struct rte_mbuf **tx_pkts,
 				      uint16_t nb_pkts);
@@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 		/* Translate descriptor info to mbuf parameters */
 		for (j = 0; j < nb_dd; j++) {
 			mb = rxep[j].mbuf;
+			rte_packet_prefetch(
+				RTE_PTR_ADD(mb->buf_addr,
+						RTE_PKTMBUF_HEADROOM));
 			qword1 = rte_le_to_cpu_64(\
 				rxdp[j].wb.qword1.status_error_len);
 			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
@@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
 
 	rxdp = &rxq->rx_ring[alloc_idx];
 	for (i = 0; i < rxq->rx_free_thresh; i++) {
-		if (likely(i < (rxq->rx_free_thresh - 1)))
+		if (likely(i < (rxq->rx_free_thresh - 1))) {
 			/* Prefetch next mbuf */
-			rte_prefetch0(rxep[i + 1].mbuf);
+			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
+			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);
+		}
 
 		mb = rxep[i].mbuf;
 		rte_mbuf_refcnt_set(mb, 1);
@@ -752,7 +763,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
 
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
-		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
+		rte_packet_prefetch(RTE_PTR_ADD(rxm->buf_addr,
+						RTE_PKTMBUF_HEADROOM));
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
 		rxm->pkt_len = rx_packet_len;
@@ -939,7 +951,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
 		first_seg->ol_flags |= pkt_flags;
 
 		/* Prefetch data of first segment, if configured to do so. */
-		rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
+		rte_packet_prefetch(RTE_PTR_ADD(first_seg->buf_addr,
 			first_seg->data_off));
 		rx_pkts[nb_rx++] = first_seg;
 		first_seg = NULL;
--
2.1.4


* Re: [PATCH] net/i40e: add packet prefetch
  2017-04-01  2:01 ` Pei, Yulong
@ 2017-04-03 10:31   ` Ferruh Yigit
  2017-04-06  9:29     ` Vladyslav Buslov
  2017-04-03 10:47   ` Ananyev, Konstantin
  1 sibling, 1 reply; 8+ messages in thread
From: Ferruh Yigit @ 2017-04-03 10:31 UTC (permalink / raw)
  To: Pei, Yulong, Vladyslav Buslov, Zhang, Helin, Wu, Jingjing; +Cc: dev

On 4/1/2017 3:01 AM, Pei, Yulong wrote:
> Hi All
> 
> In non-vector mode, without this patch, single-core performance can reach 37.576 Mpps with 64-byte packets,
> but after applying this patch, single-core performance drops to 34.343 Mpps with 64-byte packets.

Thanks Yulong for the test.

Vladyslav,

Do the above test results match your results?

If there is a degradation, this patch should not be merged.

Thanks,
ferruh

> 
> Best Regards
> Yulong Pei
> 
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> Sent: Wednesday, March 1, 2017 6:57 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> 
> Prefetch both cache lines of mbuf and first cache line of payload if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> 
> Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
<..>


* Re: [PATCH] net/i40e: add packet prefetch
  2017-04-01  2:01 ` Pei, Yulong
  2017-04-03 10:31   ` Ferruh Yigit
@ 2017-04-03 10:47   ` Ananyev, Konstantin
  2017-04-06  9:54     ` Bruce Richardson
  1 sibling, 1 reply; 8+ messages in thread
From: Ananyev, Konstantin @ 2017-04-03 10:47 UTC (permalink / raw)
  To: Pei, Yulong, Vladyslav Buslov, Zhang, Helin, Wu, Jingjing, Yigit, Ferruh
  Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pei, Yulong
> Sent: Saturday, April 1, 2017 3:02 AM
> To: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> 
> Hi All
> 
> > In non-vector mode, without this patch, single-core performance can reach 37.576 Mpps with 64-byte packets,
> > but after applying this patch, single-core performance drops to 34.343 Mpps with 64-byte packets.
> 
> Best Regards
> Yulong Pei
> 
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> Sent: Wednesday, March 1, 2017 6:57 PM
> To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> 
> Prefetch both cache lines of mbuf and first cache line of payload if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> 
> Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> ---
>  drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 48429cc..2b4e5c9 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -100,6 +100,12 @@
>  #define I40E_TX_OFFLOAD_NOTSUP_MASK \
>  		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
> 
> +#ifdef RTE_PMD_PACKET_PREFETCH
> +#define rte_packet_prefetch(p)	rte_prefetch0(p)
> +#else
> +#define rte_packet_prefetch(p)	do {} while (0)
> +#endif
> +
>  static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
>  				      struct rte_mbuf **tx_pkts,
>  				      uint16_t nb_pkts);
> @@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
>  		/* Translate descriptor info to mbuf parameters */
>  		for (j = 0; j < nb_dd; j++) {
>  			mb = rxep[j].mbuf;
> +			rte_packet_prefetch(
> +				RTE_PTR_ADD(mb->buf_addr,
> +						RTE_PKTMBUF_HEADROOM));
>  			qword1 = rte_le_to_cpu_64(\
>  				rxdp[j].wb.qword1.status_error_len);
>  			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> @@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> 
>  	rxdp = &rxq->rx_ring[alloc_idx];
>  	for (i = 0; i < rxq->rx_free_thresh; i++) {
> -		if (likely(i < (rxq->rx_free_thresh - 1)))
> +		if (likely(i < (rxq->rx_free_thresh - 1))) {
>  			/* Prefetch next mbuf */
> -			rte_prefetch0(rxep[i + 1].mbuf);
> +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
> +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);

As far as I can see, the line above is the only real difference in that patch.
If so, it might be worth re-running the perf tests without that line?
Konstantin

> +		}
> 
>  		mb = rxep[i].mbuf;
>  		rte_mbuf_refcnt_set(mb, 1);
> @@ -752,7 +763,8 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>  				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
> 
>  		rxm->data_off = RTE_PKTMBUF_HEADROOM;
> -		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
> +		rte_packet_prefetch(RTE_PTR_ADD(rxm->buf_addr,
> +						RTE_PKTMBUF_HEADROOM));
>  		rxm->nb_segs = 1;
>  		rxm->next = NULL;
>  		rxm->pkt_len = rx_packet_len;
> @@ -939,7 +951,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
>  		first_seg->ol_flags |= pkt_flags;
> 
>  		/* Prefetch data of first segment, if configured to do so. */
> -		rte_prefetch0(RTE_PTR_ADD(first_seg->buf_addr,
> +		rte_packet_prefetch(RTE_PTR_ADD(first_seg->buf_addr,
>  			first_seg->data_off));
>  		rx_pkts[nb_rx++] = first_seg;
>  		first_seg = NULL;
> --
> 2.1.4


* Re: [PATCH] net/i40e: add packet prefetch
  2017-04-03 10:31   ` Ferruh Yigit
@ 2017-04-06  9:29     ` Vladyslav Buslov
  2017-04-06  9:54       ` Bruce Richardson
  0 siblings, 1 reply; 8+ messages in thread
From: Vladyslav Buslov @ 2017-04-06  9:29 UTC (permalink / raw)
  To: Ferruh Yigit, Pei, Yulong, Zhang, Helin, Wu, Jingjing; +Cc: dev, Nitin Kumar

Ferruh,

In our case the patch significantly improves application performance (~40% more PPS on the load balancer core).
Using the DPDK examples I can only reproduce the perf improvement with similarly designed apps like load_balancer.
With applications that send or free packets on the same core where they are received, performance is not improved.

It seems that this patch doesn't do anything (or even slightly degrades performance) for run-to-completion apps, so we will have to continue maintaining it as part of our internal branch.

Regards,
Vlad

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Monday, April 03, 2017 1:31 PM
> To: Pei, Yulong; Vladyslav Buslov; Zhang, Helin; Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> 
> On 4/1/2017 3:01 AM, Pei, Yulong wrote:
> > Hi All
> >
> > In non-vector mode, without this patch, single-core performance can
> > reach 37.576 Mpps with 64-byte packets, but after applying this patch,
> > single-core performance drops to 34.343 Mpps with 64-byte packets.
> 
> Thanks Yulong for the test.
> 
> Vladyslav,
> 
> Do the above test results match your results?
> 
> If there is a degradation, this patch should not be merged.
> 
> Thanks,
> ferruh
> 
> >
> > Best Regards
> > Yulong Pei
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> > Sent: Wednesday, March 1, 2017 6:57 PM
> > To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> >
> > Prefetch both cache lines of mbuf and first cache line of payload if
> CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> >
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> <..>
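
A rough sketch of the pipelined (load_balancer-style) pattern Vladyslav
describes, where the PMD's data prefetch has time to complete before the
RX core reads the packet headers; the ring layout and select_worker()
hash are illustrative only, not code from his application:

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST 32

/* Hypothetical worker selection: pick a worker from a header byte. */
static unsigned int
select_worker(const uint8_t *hdr, unsigned int nb_workers)
{
	return hdr[5] % nb_workers;	/* e.g. low byte of the dst MAC */
}

static void
rx_core_loop(uint8_t port, uint16_t queue,
	     struct rte_ring **worker_rings, unsigned int nb_workers)
{
	struct rte_mbuf *pkts[BURST];
	uint16_t nb_rx, i;

	for (;;) {
		nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST);
		for (i = 0; i < nb_rx; i++) {
			/* The header read below hits a warm cache line when
			 * the PMD prefetched buf_addr + headroom; the payload
			 * itself is consumed later on a worker core. */
			uint8_t *hdr = rte_pktmbuf_mtod(pkts[i], uint8_t *);
			unsigned int w = select_worker(hdr, nb_workers);

			if (rte_ring_enqueue(worker_rings[w], pkts[i]) != 0)
				rte_pktmbuf_free(pkts[i]);	/* ring full */
		}
	}
}

This may explain why a run-to-completion app, where the same core parses
and transmits the packet immediately after receive, sees little or no
benefit from the added prefetches.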


* Re: [PATCH] net/i40e: add packet prefetch
  2017-04-03 10:47   ` Ananyev, Konstantin
@ 2017-04-06  9:54     ` Bruce Richardson
  0 siblings, 0 replies; 8+ messages in thread
From: Bruce Richardson @ 2017-04-06  9:54 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Pei, Yulong, Vladyslav Buslov, Zhang, Helin, Wu, Jingjing, Yigit,
	Ferruh, dev

On Mon, Apr 03, 2017 at 10:47:20AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Pei, Yulong
> > Sent: Saturday, April 1, 2017 3:02 AM
> > To: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> > 
> > Hi All
> > 
> > In non-vector mode, without this patch, single-core performance can reach 37.576 Mpps with 64-byte packets,
> > but after applying this patch, single-core performance drops to 34.343 Mpps with 64-byte packets.
> > 
> > Best Regards
> > Yulong Pei
> > 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladyslav Buslov
> > Sent: Wednesday, March 1, 2017 6:57 PM
> > To: Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> > Cc: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] net/i40e: add packet prefetch
> > 
> > Prefetch both cache lines of mbuf and first cache line of payload if CONFIG_RTE_PMD_PACKET_PREFETCH is set.
> > 
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > index 48429cc..2b4e5c9 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -100,6 +100,12 @@
> >  #define I40E_TX_OFFLOAD_NOTSUP_MASK \
> >  		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK)
> > 
> > +#ifdef RTE_PMD_PACKET_PREFETCH
> > +#define rte_packet_prefetch(p)	rte_prefetch0(p)
> > +#else
> > +#define rte_packet_prefetch(p)	do {} while (0)
> > +#endif
> > +
> >  static uint16_t i40e_xmit_pkts_simple(void *tx_queue,
> >  				      struct rte_mbuf **tx_pkts,
> >  				      uint16_t nb_pkts);
> > @@ -495,6 +501,9 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> >  		/* Translate descriptor info to mbuf parameters */
> >  		for (j = 0; j < nb_dd; j++) {
> >  			mb = rxep[j].mbuf;
> > +			rte_packet_prefetch(
> > +				RTE_PTR_ADD(mb->buf_addr,
> > +						RTE_PKTMBUF_HEADROOM));
> >  			qword1 = rte_le_to_cpu_64(\
> >  				rxdp[j].wb.qword1.status_error_len);
> >  			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > @@ -578,9 +587,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> > 
> >  	rxdp = &rxq->rx_ring[alloc_idx];
> >  	for (i = 0; i < rxq->rx_free_thresh; i++) {
> > -		if (likely(i < (rxq->rx_free_thresh - 1)))
> > +		if (likely(i < (rxq->rx_free_thresh - 1))) {
> >  			/* Prefetch next mbuf */
> > -			rte_prefetch0(rxep[i + 1].mbuf);
> > +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline0);
> > +			rte_packet_prefetch(rxep[i + 1].mbuf->cacheline1);
> 
> As far as I can see, the line above is the only real difference in that patch.
> If so, it might be worth re-running the perf tests without that line?
> Konstantin
>
The prefetch for the packet buf_addr+headroom above also looks new.
Are both needed to get the performance boost you see?
We should also investigate whether the same effect can be achieved using a
runtime option rather than a compile-time setting. That would give us
the best of both worlds.

/Bruce
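
To make the runtime-option idea concrete, one possible shape (the field
and function names below are hypothetical, not existing i40e PMD code):
a per-queue flag, set once at startup (e.g. from a devarg), checked in
the RX path instead of the compile-time macro:

#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Hypothetical per-queue configuration bit. */
struct rxq_prefetch_cfg {
	uint8_t data_prefetch_enable;	/* set from a devarg or app config */
};

static inline void
maybe_prefetch_pkt_data(const struct rxq_prefetch_cfg *cfg, struct rte_mbuf *m)
{
	/* The branch is cheap and well-predicted; run-to-completion apps
	 * that see no benefit can leave the flag disabled. */
	if (cfg->data_prefetch_enable)
		rte_prefetch0(RTE_PTR_ADD(m->buf_addr, RTE_PKTMBUF_HEADROOM));
}

Whether the extra branch in the hot RX loop is acceptable would itself
need measuring.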


* Re: [PATCH] net/i40e: add packet prefetch
  2017-04-06  9:29     ` Vladyslav Buslov
@ 2017-04-06  9:54       ` Bruce Richardson
  0 siblings, 0 replies; 8+ messages in thread
From: Bruce Richardson @ 2017-04-06  9:54 UTC (permalink / raw)
  To: Vladyslav Buslov
  Cc: Ferruh Yigit, Pei, Yulong, Zhang, Helin, Wu, Jingjing, dev, Nitin Kumar

On Thu, Apr 06, 2017 at 09:29:12AM +0000, Vladyslav Buslov wrote:
> Ferruh,
> 
> In our case the patch significantly improves application performance (~40% more PPS on the load balancer core).
> Using the DPDK examples I can only reproduce the perf improvement with similarly designed apps like load_balancer.
> With applications that send or free packets on the same core where they are received, performance is not improved.
> 
> It seems that this patch doesn't do anything (or even slightly degrades performance) for run-to-completion apps, so we will have to continue maintaining it as part of our internal branch.
> 
> Regards,
> Vlad
>
Is it possible to get the same performance boost in your app, while
avoiding the perf hit in run-to-completion apps, by using a runtime,
rather than compile-time option?

/Bruce


