* [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp
       [not found] <CGME20190822123045eucas1p125b6e106f0310bdb50e759ef41993a91@eucas1p1.samsung.com>
@ 2019-08-22 12:30 ` Ilya Maximets
  2019-08-22 16:23   ` William Tu
  2019-08-22 16:38   ` Alexander Duyck
  0 siblings, 2 replies; 5+ messages in thread
From: Ilya Maximets @ 2019-08-22 12:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu, Alexander Duyck, Ilya Maximets

Tx code doesn't clear the descriptors' status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the comletion queue ring.

Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the tx ring.

'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
complications implemented in the regular 'ixgbe_clean_tx_irq()'
and we're allowed to directly use 'next_to_clean' and 'next_to_use'
indexes.

Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

Version 2:
  * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
    'ixgbe_xsk_clean_tx_ring()'.
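
For illustration only (not part of the patch): a rough sketch of how the
pre-fix loop over-counts once the budget exceeds the number of used
descriptors, since the DD status is never cleared after cleaning.  Details
such as the xdpf handling are omitted here:

	budget = q_vector->tx.work_limit;
	do {
		/* DD is still set from the previous clean, so this check
		 * passes even for descriptors already completed. */
		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
			break;

		xsk_frames++;	/* already-completed frame counted again */
		...
		budget--;
	} while (likely(budget));

	/* xsk_frames is now too large, pushing prod_tail past prod_head
	 * in the completion queue. */
	xsk_umem_complete_tx(umem, xsk_frames);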

 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
 1 file changed, 13 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 6b609553329f..d1297660e14a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -633,22 +633,23 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
 bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 			    struct ixgbe_ring *tx_ring, int napi_budget)
 {
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
 	unsigned int total_packets = 0, total_bytes = 0;
-	u32 i = tx_ring->next_to_clean, xsk_frames = 0;
 	unsigned int budget = q_vector->tx.work_limit;
 	struct xdp_umem *umem = tx_ring->xsk_umem;
-	union ixgbe_adv_tx_desc *tx_desc;
-	struct ixgbe_tx_buffer *tx_bi;
+	u32 xsk_frames = 0;
 	bool xmit_done;
 
-	tx_bi = &tx_ring->tx_buffer_info[i];
-	tx_desc = IXGBE_TX_DESC(tx_ring, i);
-	i -= tx_ring->count;
+	while (likely(ntc != ntu && budget)) {
+		union ixgbe_adv_tx_desc *tx_desc;
+		struct ixgbe_tx_buffer *tx_bi;
+
+		tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
 
-	do {
 		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
 			break;
 
+		tx_bi = &tx_ring->tx_buffer_info[ntc];
 		total_bytes += tx_bi->bytecount;
 		total_packets += tx_bi->gso_segs;
 
@@ -659,24 +660,15 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
 
 		tx_bi->xdpf = NULL;
 
-		tx_bi++;
-		tx_desc++;
-		i++;
-		if (unlikely(!i)) {
-			i -= tx_ring->count;
-			tx_bi = tx_ring->tx_buffer_info;
-			tx_desc = IXGBE_TX_DESC(tx_ring, 0);
-		}
-
-		/* issue prefetch for next Tx descriptor */
-		prefetch(tx_desc);
+		ntc++;
+		if (unlikely(ntc == tx_ring->count))
+			ntc = 0;
 
 		/* update budget accounting */
 		budget--;
-	} while (likely(budget));
+	}
 
-	i += tx_ring->count;
-	tx_ring->next_to_clean = i;
+	tx_ring->next_to_clean = ntc;
 
 	u64_stats_update_begin(&tx_ring->syncp);
 	tx_ring->stats.bytes += total_bytes;
-- 
2.17.1



* Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp
  2019-08-22 12:30 ` [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp Ilya Maximets
@ 2019-08-22 16:23   ` William Tu
  2019-08-22 16:38   ` Alexander Duyck
  1 sibling, 0 replies; 5+ messages in thread
From: William Tu @ 2019-08-22 16:23 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Linux Kernel Network Developers, LKML, bpf, David S. Miller,
	Björn Töpel, Magnus Karlsson, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher,
	intel-wired-lan, Eelco Chaudron, Alexander Duyck

On Thu, Aug 22, 2019 at 5:30 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>
> Tx code doesn't clear the descriptors' status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the comletion queue ring.

s/comletion/completion/

>
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
>
> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> 'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
> complications implemented in the regular 'ixgbe_clean_tx_irq()'
> and we're allowed to directly use 'next_to_clean' and 'next_to_use'
> indexes.
>
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---

Tested-by: William Tu <u9012063@gmail.com>

Instead of measuring Tx performance on the Tx machine, I measured it at
the other side (the traffic-generating machine).  This time the results
are more consistent, showing little difference with this patch (5.9 Mpps)
and without it (6.1 Mpps).

>
> Version 2:
>   * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
>     'ixgbe_xsk_clean_tx_ring()'.
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
>  1 file changed, 13 insertions(+), 21 deletions(-)
>

<snip>


* Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp
  2019-08-22 12:30 ` [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp Ilya Maximets
  2019-08-22 16:23   ` William Tu
@ 2019-08-22 16:38   ` Alexander Duyck
  2019-08-22 16:58     ` Ilya Maximets
  1 sibling, 1 reply; 5+ messages in thread
From: Alexander Duyck @ 2019-08-22 16:38 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu

On Thu, Aug 22, 2019 at 5:30 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>
> Tx code doesn't clear the descriptors' status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the comletion queue ring.
>
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
>
> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> 'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
> complications implemented in the regular 'ixgbe_clean_tx_irq()'
> and we're allowed to directly use 'next_to_clean' and 'next_to_use'
> indexes.
>
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>
> Version 2:
>   * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
>     'ixgbe_xsk_clean_tx_ring()'.
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
>  1 file changed, 13 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 6b609553329f..d1297660e14a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -633,22 +633,23 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
>  bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>                             struct ixgbe_ring *tx_ring, int napi_budget)
>  {
> +       u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
>         unsigned int total_packets = 0, total_bytes = 0;
> -       u32 i = tx_ring->next_to_clean, xsk_frames = 0;
>         unsigned int budget = q_vector->tx.work_limit;
>         struct xdp_umem *umem = tx_ring->xsk_umem;
> -       union ixgbe_adv_tx_desc *tx_desc;
> -       struct ixgbe_tx_buffer *tx_bi;
> +       u32 xsk_frames = 0;
>         bool xmit_done;
>
> -       tx_bi = &tx_ring->tx_buffer_info[i];
> -       tx_desc = IXGBE_TX_DESC(tx_ring, i);
> -       i -= tx_ring->count;
> +       while (likely(ntc != ntu && budget)) {

I would say you can get rid of budget entirely. It was only really
needed for the regular Tx case where you can have multiple CPUs
feeding a single Tx queue and causing a stall. Since we have a 1:1
mapping we should never have more than the Rx budget worth of packets
to really process. In addition we can only make one pass through the
ring since the ntu value is not updated while running the loop.
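
In other words, roughly the v2 loop with the budget accounting dropped
(sketch only, untested, nothing else changed):

	while (ntc != ntu) {
		union ixgbe_adv_tx_desc *tx_desc = IXGBE_TX_DESC(tx_ring, ntc);

		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
			break;
		...
		ntc++;
		if (unlikely(ntc == tx_ring->count))
			ntc = 0;
	}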

> +               union ixgbe_adv_tx_desc *tx_desc;
> +               struct ixgbe_tx_buffer *tx_bi;
> +
> +               tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
>
> -       do {
>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>                         break;
>
> +               tx_bi = &tx_ring->tx_buffer_info[ntc];

Please don't move this logic into the loop. We were intentionally
processing this outside of the loop once and then just doing the
increments because it is faster that way. It takes several operations
to compute tx_bi based on ntc, whereas just incrementing is a single
operation.
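
I.e. keep the lookup before the loop and just advance the pointer,
something like (sketch only):

	tx_bi = &tx_ring->tx_buffer_info[ntc];

	while (ntc != ntu) {
		...
		tx_bi++;
		ntc++;
		if (unlikely(ntc == tx_ring->count)) {
			ntc = 0;
			tx_bi = tx_ring->tx_buffer_info;
		}
	}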

>                 total_bytes += tx_bi->bytecount;
>                 total_packets += tx_bi->gso_segs;
>
> @@ -659,24 +660,15 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>
>                 tx_bi->xdpf = NULL;
>
> -               tx_bi++;
> -               tx_desc++;
> -               i++;
> -               if (unlikely(!i)) {
> -                       i -= tx_ring->count;

So these two lines can probably just be replaced by:
if (unlikely(ntc == tx_ring->count)) {
        ntc = 0;

> -                       tx_bi = tx_ring->tx_buffer_info;
> -                       tx_desc = IXGBE_TX_DESC(tx_ring, 0);
> -               }
> -
> -               /* issue prefetch for next Tx descriptor */
> -               prefetch(tx_desc);

Did you just drop the prefetch? You are changing way too much with
this patch. All you should need to do is replace i with ntc, replace
the "do {" with "while (ntc != ntu) {", and remove the while at the
end.

> +               ntc++;
> +               if (unlikely(ntc == tx_ring->count))
> +                       ntc = 0;
>
>                 /* update budget accounting */
>                 budget--;
> -       } while (likely(budget));

As I stated earlier, budget can be removed entirely.

> +       }
>
> -       i += tx_ring->count;
> -       tx_ring->next_to_clean = i;
> +       tx_ring->next_to_clean = ntc;
>
>         u64_stats_update_begin(&tx_ring->syncp);
>         tx_ring->stats.bytes += total_bytes;
> --
> 2.17.1
>


* Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp
  2019-08-22 16:38   ` Alexander Duyck
@ 2019-08-22 16:58     ` Ilya Maximets
  2019-08-22 17:10       ` Alexander Duyck
  0 siblings, 1 reply; 5+ messages in thread
From: Ilya Maximets @ 2019-08-22 16:58 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu

On 22.08.2019 19:38, Alexander Duyck wrote:
> On Thu, Aug 22, 2019 at 5:30 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>>
>> Tx code doesn't clear the descriptors' status after cleaning.
>> So, if the budget is larger than number of used elems in a ring, some
>> descriptors will be accounted twice and xsk_umem_complete_tx will move
>> prod_tail far beyond the prod_head breaking the comletion queue ring.
>>
>> Fix that by limiting the number of descriptors to clean by the number
>> of used descriptors in the tx ring.
>>
>> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
>> 'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
>> complications implemented in the regular 'ixgbe_clean_tx_irq()'
>> and we're allowed to directly use 'next_to_clean' and 'next_to_use'
>> indexes.
>>
>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>
>> Version 2:
>>   * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
>>     'ixgbe_xsk_clean_tx_ring()'.
>>
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
>>  1 file changed, 13 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index 6b609553329f..d1297660e14a 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -633,22 +633,23 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
>>  bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>                             struct ixgbe_ring *tx_ring, int napi_budget)
>>  {
>> +       u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
>>         unsigned int total_packets = 0, total_bytes = 0;
>> -       u32 i = tx_ring->next_to_clean, xsk_frames = 0;
>>         unsigned int budget = q_vector->tx.work_limit;
>>         struct xdp_umem *umem = tx_ring->xsk_umem;
>> -       union ixgbe_adv_tx_desc *tx_desc;
>> -       struct ixgbe_tx_buffer *tx_bi;
>> +       u32 xsk_frames = 0;
>>         bool xmit_done;
>>
>> -       tx_bi = &tx_ring->tx_buffer_info[i];
>> -       tx_desc = IXGBE_TX_DESC(tx_ring, i);
>> -       i -= tx_ring->count;
>> +       while (likely(ntc != ntu && budget)) {
> 
> I would say you can get rid of budget entirely. It was only really
> needed for the regular Tx case where you can have multiple CPUs
> feeding a single Tx queue and causing a stall. Since we have a 1:1
> mapping we should never have more than the Rx budget worth of packets
> to really process. In addition we can only make one pass through the
> ring since the ntu value is not updated while running the loop.

OK. Will remove.

> 
>> +               union ixgbe_adv_tx_desc *tx_desc;
>> +               struct ixgbe_tx_buffer *tx_bi;
>> +
>> +               tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
>>
>> -       do {
>>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>>                         break;
>>
>> +               tx_bi = &tx_ring->tx_buffer_info[ntc];
> 
> Please don't move this logic into the loop. We were intentionally
> processing this outside of the loop once and then just doing the
> increments because it is faster that way. It takes several operations
> to compute tx_bi based on ntc, whereas just incrementing is a single
> operation.

OK.

> 
>>                 total_bytes += tx_bi->bytecount;
>>                 total_packets += tx_bi->gso_segs;
>>
>> @@ -659,24 +660,15 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
>>
>>                 tx_bi->xdpf = NULL;
>>
>> -               tx_bi++;
>> -               tx_desc++;
>> -               i++;
>> -               if (unlikely(!i)) {
>> -                       i -= tx_ring->count;
> 
> So these two lines can probably just be replaced by:
> if (unlikely(ntc == tx_ring->count)) {
>         ntc = 0;

Sure.

> 
>> -                       tx_bi = tx_ring->tx_buffer_info;
>> -                       tx_desc = IXGBE_TX_DESC(tx_ring, 0);
>> -               }
>> -
>> -               /* issue prefetch for next Tx descriptor */
>> -               prefetch(tx_desc);
> 
> Did you just drop the prefetch?

I'll keep the prefetch in v3 because, as you rightly point out, dropping
it is not related to this patch. However, I'm not sure the prefetch buys
much here, because there is only one comparison between the prefetch and
the use of the data:

 while (ntc != ntu) {
     if (!(tx_desc->wb.status ...
     <...>
     prefetch(tx_desc);
 }


> You are changing way too much with
> this patch. All you should need to do is replace i with ntc, replace
> the "do {" with "while (ntc != ntu) {", and remove the while at the
> end.
> 
>> +               ntc++;
>> +               if (unlikely(ntc == tx_ring->count))
>> +                       ntc = 0;
>>
>>                 /* update budget accounting */
>>                 budget--;
>> -       } while (likely(budget));
> 
> As I stated earlier, budget can be removed entirely.

Sure.

> 
>> +       }
>>
>> -       i += tx_ring->count;
>> -       tx_ring->next_to_clean = i;
>> +       tx_ring->next_to_clean = ntc;
>>
>>         u64_stats_update_begin(&tx_ring->syncp);
>>         tx_ring->stats.bytes += total_bytes;
>> --
>> 2.17.1


* Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp
  2019-08-22 16:58     ` Ilya Maximets
@ 2019-08-22 17:10       ` Alexander Duyck
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Duyck @ 2019-08-22 17:10 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel,
	Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron,
	William Tu

On Thu, Aug 22, 2019 at 9:58 AM Ilya Maximets <i.maximets@samsung.com> wrote:
>
> On 22.08.2019 19:38, Alexander Duyck wrote:
> > On Thu, Aug 22, 2019 at 5:30 AM Ilya Maximets <i.maximets@samsung.com> wrote:
> >>
> >> Tx code doesn't clear the descriptors' status after cleaning.
> >> So, if the budget is larger than number of used elems in a ring, some
> >> descriptors will be accounted twice and xsk_umem_complete_tx will move
> >> prod_tail far beyond the prod_head breaking the comletion queue ring.
> >>
> >> Fix that by limiting the number of descriptors to clean by the number
> >> of used descriptors in the tx ring.
> >>
> >> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> >> 'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
> >> complications implemented in the regular 'ixgbe_clean_tx_irq()'
> >> and we're allowed to directly use 'next_to_clean' and 'next_to_use'
> >> indexes.
> >>
> >> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> >> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> >> ---
> >>
> >> Version 2:
> >>   * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
> >>     'ixgbe_xsk_clean_tx_ring()'.
> >>
> >>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
> >>  1 file changed, 13 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> index 6b609553329f..d1297660e14a 100644
> >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> @@ -633,22 +633,23 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
> >>  bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
> >>                             struct ixgbe_ring *tx_ring, int napi_budget)
> >>  {
> >> +       u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
> >>         unsigned int total_packets = 0, total_bytes = 0;
> >> -       u32 i = tx_ring->next_to_clean, xsk_frames = 0;
> >>         unsigned int budget = q_vector->tx.work_limit;
> >>         struct xdp_umem *umem = tx_ring->xsk_umem;
> >> -       union ixgbe_adv_tx_desc *tx_desc;
> >> -       struct ixgbe_tx_buffer *tx_bi;
> >> +       u32 xsk_frames = 0;
> >>         bool xmit_done;
> >>
> >> -       tx_bi = &tx_ring->tx_buffer_info[i];
> >> -       tx_desc = IXGBE_TX_DESC(tx_ring, i);
> >> -       i -= tx_ring->count;
> >> +       while (likely(ntc != ntu && budget)) {
> >
> > I would say you can get rid of budget entirely. It was only really
> > needed for the regular Tx case where you can have multiple CPUs
> > feeding a single Tx queue and causing a stall. Since we have a 1:1
> > mapping we should never have more than the Rx budget worth of packets
> > to really process. In addition we can only make one pass through the
> > ring since the ntu value is not updated while running the loop.
>
> OK. Will remove.
>
> >
> >> +               union ixgbe_adv_tx_desc *tx_desc;
> >> +               struct ixgbe_tx_buffer *tx_bi;
> >> +
> >> +               tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
> >>
> >> -       do {
> >>                 if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
> >>                         break;
> >>
> >> +               tx_bi = &tx_ring->tx_buffer_info[ntc];
> >
> > Please don't move this logic into the loop. We were intentionally
> > processing this outside of the loop once and then just doing the
> > increments because it is faster that way. It takes several operations
> > to compute tx_bi based on ntc, whereas just incrementing is a single
> > operation.
>
> OK.
>
> >
> >>                 total_bytes += tx_bi->bytecount;
> >>                 total_packets += tx_bi->gso_segs;
> >>
> >> @@ -659,24 +660,15 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
> >>
> >>                 tx_bi->xdpf = NULL;
> >>
> >> -               tx_bi++;
> >> -               tx_desc++;
> >> -               i++;
> >> -               if (unlikely(!i)) {
> >> -                       i -= tx_ring->count;
> >
> > So these two lines can probably just be replaced by:
> > if (unlikely(ntc == tx_ring->count)) {
> >         ntc = 0;
>
> Sure.
>
> >
> >> -                       tx_bi = tx_ring->tx_buffer_info;
> >> -                       tx_desc = IXGBE_TX_DESC(tx_ring, 0);
> >> -               }
> >> -
> >> -               /* issue prefetch for next Tx descriptor */
> >> -               prefetch(tx_desc);
> >
> > Did you just drop the prefetch?
>
> I'll keep the prefetch in v3 because, as you fairly mentioned, it's not
> related to this patch. However, I'm not sure if this prefetch makes any
> sense here, because there is only one comparison operation between the
> prefetch and the data usage:
>
>  while (ntc != ntu) {
>      if (!(tx_desc->wb.status ...
>      <...>
>      prefetch(tx_desc);
>  }

I'm not opposed to dropping the prefetch, but if you are going to do
it you should do it in a separate patch.
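
For what it's worth, combining all of the above (budget gone, tx_bi and
tx_desc kept outside the loop, prefetch retained), the cleanup loop would
presumably end up along these lines (just a sketch of where the discussion
points, not an actual tested patch):

	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
	struct ixgbe_tx_buffer *tx_bi = &tx_ring->tx_buffer_info[ntc];
	union ixgbe_adv_tx_desc *tx_desc = IXGBE_TX_DESC(tx_ring, ntc);

	while (ntc != ntu) {
		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
			break;

		total_bytes += tx_bi->bytecount;
		total_packets += tx_bi->gso_segs;

		if (tx_bi->xdpf)
			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
		else
			xsk_frames++;

		tx_bi->xdpf = NULL;

		tx_bi++;
		tx_desc++;
		ntc++;
		if (unlikely(ntc == tx_ring->count)) {
			ntc = 0;
			tx_bi = tx_ring->tx_buffer_info;
			tx_desc = IXGBE_TX_DESC(tx_ring, 0);
		}

		/* issue prefetch for next Tx descriptor */
		prefetch(tx_desc);
	}

	tx_ring->next_to_clean = ntc;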

