* [PATCH net] ixgbe: fix double clean of tx descriptors with xdp [not found] <CGME20190820151644eucas1p179d6d1da42bb6be0aad8f58ac46624ce@eucas1p1.samsung.com> @ 2019-08-20 15:16 ` Ilya Maximets 2019-08-20 15:35 ` Alexander Duyck 2019-08-21 10:09 ` Eelco Chaudron 0 siblings, 2 replies; 14+ messages in thread From: Ilya Maximets @ 2019-08-20 15:16 UTC (permalink / raw) To: netdev Cc: linux-kernel, bpf, David S. Miller, Björn Töpel, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron, William Tu, Ilya Maximets Tx code doesn't clear the descriptor status after cleaning. So, if the budget is larger than number of used elems in a ring, some descriptors will be accounted twice and xsk_umem_complete_tx will move prod_tail far beyond the prod_head breaking the comletion queue ring. Fix that by limiting the number of descriptors to clean by the number of used descriptors in the tx ring. Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> --- Not tested yet because of lack of available hardware. So, testing is very welcome. drivers/net/ethernet/intel/ixgbe/ixgbe.h | 10 ++++++++++ drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +----------- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 6 ++++-- 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index 39e73ad60352..0befcef46e80 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring) return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1; } +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring) +{ + unsigned int head, tail; + + head = ring->next_to_clean; + tail = ring->next_to_use; + + return ((head <= tail) ? tail : tail + ring->count) - head; +} + #define IXGBE_RX_DESC(R, i) \ (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i])) #define IXGBE_TX_DESC(R, i) \ diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 7882148abb43..d417237857d8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring) return ring->stats.packets; } -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring) -{ - unsigned int head, tail; - - head = ring->next_to_clean; - tail = ring->next_to_use; - - return ((head <= tail) ? tail : tail + ring->count) - head; -} - static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring) { u32 tx_done = ixgbe_get_tx_completed(tx_ring); u32 tx_done_old = tx_ring->tx_stats.tx_done_old; - u32 tx_pending = ixgbe_get_tx_pending(tx_ring); + u32 tx_pending = ixgbe_desc_used(tx_ring); clear_check_for_tx_hang(tx_ring); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 6b609553329f..7702efed356a 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, u32 i = tx_ring->next_to_clean, xsk_frames = 0; unsigned int budget = q_vector->tx.work_limit; struct xdp_umem *umem = tx_ring->xsk_umem; + u32 used_descs = ixgbe_desc_used(tx_ring); union ixgbe_adv_tx_desc *tx_desc; struct ixgbe_tx_buffer *tx_bi; bool xmit_done; @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, tx_desc = IXGBE_TX_DESC(tx_ring, i); i -= tx_ring->count; - do { + while (likely(budget && used_descs)) { if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD))) break; @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, /* update budget accounting */ budget--; - } while (likely(budget)); + used_descs--; + } i += tx_ring->count; tx_ring->next_to_clean = i; -- 2.17.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-20 15:16 ` [PATCH net] ixgbe: fix double clean of tx descriptors with xdp Ilya Maximets @ 2019-08-20 15:35 ` Alexander Duyck 2019-08-20 15:58 ` Ilya Maximets 2019-08-21 10:09 ` Eelco Chaudron 1 sibling, 1 reply; 14+ messages in thread From: Alexander Duyck @ 2019-08-20 15:35 UTC (permalink / raw) To: Ilya Maximets Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron, William Tu On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > Tx code doesn't clear the descriptor status after cleaning. > So, if the budget is larger than number of used elems in a ring, some > descriptors will be accounted twice and xsk_umem_complete_tx will move > prod_tail far beyond the prod_head breaking the comletion queue ring. > > Fix that by limiting the number of descriptors to clean by the number > of used descriptors in the tx ring. > > Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > Signed-off-by: Ilya Maximets <i.maximets@samsung.com> I'm not sure this is the best way to go. My preference would be to have something in the ring that would prevent us from racing which I don't think this really addresses. I am pretty sure this code is safe on x86 but I would be worried about weak ordered systems such as PowerPC. It might make sense to look at adding the eop_desc logic like we have in the regular path with a proper barrier before we write it and after we read it. So for example we could hold of on writing the bytecount value until the end of an iteration and call smp_wmb before we write it. Then on the cleanup we could read it and if it is non-zero we take an smp_rmb before proceeding further to process the Tx descriptor and clearing the value. Otherwise this code is going to just keep popping up with issues. > --- > > Not tested yet because of lack of available hardware. > So, testing is very welcome. > > drivers/net/ethernet/intel/ixgbe/ixgbe.h | 10 ++++++++++ > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +----------- > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 6 ++++-- > 3 files changed, 15 insertions(+), 13 deletions(-) > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h > index 39e73ad60352..0befcef46e80 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h > @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring) > return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1; > } > > +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring) > +{ > + unsigned int head, tail; > + > + head = ring->next_to_clean; > + tail = ring->next_to_use; > + > + return ((head <= tail) ? tail : tail + ring->count) - head; > +} > + > #define IXGBE_RX_DESC(R, i) \ > (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i])) > #define IXGBE_TX_DESC(R, i) \ > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > index 7882148abb43..d417237857d8 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c > @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring) > return ring->stats.packets; > } > > -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring) > -{ > - unsigned int head, tail; > - > - head = ring->next_to_clean; > - tail = ring->next_to_use; > - > - return ((head <= tail) ? tail : tail + ring->count) - head; > -} > - > static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring) > { > u32 tx_done = ixgbe_get_tx_completed(tx_ring); > u32 tx_done_old = tx_ring->tx_stats.tx_done_old; > - u32 tx_pending = ixgbe_get_tx_pending(tx_ring); > + u32 tx_pending = ixgbe_desc_used(tx_ring); > > clear_check_for_tx_hang(tx_ring); > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > index 6b609553329f..7702efed356a 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c > @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, > u32 i = tx_ring->next_to_clean, xsk_frames = 0; > unsigned int budget = q_vector->tx.work_limit; > struct xdp_umem *umem = tx_ring->xsk_umem; > + u32 used_descs = ixgbe_desc_used(tx_ring); > union ixgbe_adv_tx_desc *tx_desc; > struct ixgbe_tx_buffer *tx_bi; > bool xmit_done; > @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, > tx_desc = IXGBE_TX_DESC(tx_ring, i); > i -= tx_ring->count; > > - do { > + while (likely(budget && used_descs)) { > if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD))) > break; > > @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, > > /* update budget accounting */ > budget--; > - } while (likely(budget)); > + used_descs--; > + } > > i += tx_ring->count; > tx_ring->next_to_clean = i; > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-20 15:35 ` Alexander Duyck @ 2019-08-20 15:58 ` Ilya Maximets 2019-08-21 1:17 ` Alexander Duyck 0 siblings, 1 reply; 14+ messages in thread From: Ilya Maximets @ 2019-08-20 15:58 UTC (permalink / raw) To: Alexander Duyck Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron, William Tu On 20.08.2019 18:35, Alexander Duyck wrote: > On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: >> >> Tx code doesn't clear the descriptor status after cleaning. >> So, if the budget is larger than number of used elems in a ring, some >> descriptors will be accounted twice and xsk_umem_complete_tx will move >> prod_tail far beyond the prod_head breaking the comletion queue ring. >> >> Fix that by limiting the number of descriptors to clean by the number >> of used descriptors in the tx ring. >> >> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") >> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > > I'm not sure this is the best way to go. My preference would be to > have something in the ring that would prevent us from racing which I > don't think this really addresses. I am pretty sure this code is safe > on x86 but I would be worried about weak ordered systems such as > PowerPC. > > It might make sense to look at adding the eop_desc logic like we have > in the regular path with a proper barrier before we write it and after > we read it. So for example we could hold of on writing the bytecount > value until the end of an iteration and call smp_wmb before we write > it. Then on the cleanup we could read it and if it is non-zero we take > an smp_rmb before proceeding further to process the Tx descriptor and > clearing the value. Otherwise this code is going to just keep popping > up with issues. But, unlike regular case, xdp zero-copy xmit and clean for particular tx ring always happens in the same NAPI context and even on the same CPU core. I saw the 'eop_desc' manipulations in regular case and yes, we could use 'next_to_watch' field just as a flag of descriptor existence, but it seems unnecessarily complicated. Am I missing something? > >> --- >> >> Not tested yet because of lack of available hardware. >> So, testing is very welcome. >> >> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 10 ++++++++++ >> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +----------- >> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 6 ++++-- >> 3 files changed, 15 insertions(+), 13 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h >> index 39e73ad60352..0befcef46e80 100644 >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h >> @@ -512,6 +512,16 @@ static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring) >> return ((ntc > ntu) ? 0 : ring->count) + ntc - ntu - 1; >> } >> >> +static inline u64 ixgbe_desc_used(struct ixgbe_ring *ring) >> +{ >> + unsigned int head, tail; >> + >> + head = ring->next_to_clean; >> + tail = ring->next_to_use; >> + >> + return ((head <= tail) ? tail : tail + ring->count) - head; >> +} >> + >> #define IXGBE_RX_DESC(R, i) \ >> (&(((union ixgbe_adv_rx_desc *)((R)->desc))[i])) >> #define IXGBE_TX_DESC(R, i) \ >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >> index 7882148abb43..d417237857d8 100644 >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >> @@ -1012,21 +1012,11 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring) >> return ring->stats.packets; >> } >> >> -static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring) >> -{ >> - unsigned int head, tail; >> - >> - head = ring->next_to_clean; >> - tail = ring->next_to_use; >> - >> - return ((head <= tail) ? tail : tail + ring->count) - head; >> -} >> - >> static inline bool ixgbe_check_tx_hang(struct ixgbe_ring *tx_ring) >> { >> u32 tx_done = ixgbe_get_tx_completed(tx_ring); >> u32 tx_done_old = tx_ring->tx_stats.tx_done_old; >> - u32 tx_pending = ixgbe_get_tx_pending(tx_ring); >> + u32 tx_pending = ixgbe_desc_used(tx_ring); >> >> clear_check_for_tx_hang(tx_ring); >> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> index 6b609553329f..7702efed356a 100644 >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c >> @@ -637,6 +637,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, >> u32 i = tx_ring->next_to_clean, xsk_frames = 0; >> unsigned int budget = q_vector->tx.work_limit; >> struct xdp_umem *umem = tx_ring->xsk_umem; >> + u32 used_descs = ixgbe_desc_used(tx_ring); >> union ixgbe_adv_tx_desc *tx_desc; >> struct ixgbe_tx_buffer *tx_bi; >> bool xmit_done; >> @@ -645,7 +646,7 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, >> tx_desc = IXGBE_TX_DESC(tx_ring, i); >> i -= tx_ring->count; >> >> - do { >> + while (likely(budget && used_descs)) { >> if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD))) >> break; >> >> @@ -673,7 +674,8 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, >> >> /* update budget accounting */ >> budget--; >> - } while (likely(budget)); >> + used_descs--; >> + } >> >> i += tx_ring->count; >> tx_ring->next_to_clean = i; >> -- >> 2.17.1 >> > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-20 15:58 ` Ilya Maximets @ 2019-08-21 1:17 ` Alexander Duyck 2019-08-21 16:21 ` Ilya Maximets 0 siblings, 1 reply; 14+ messages in thread From: Alexander Duyck @ 2019-08-21 1:17 UTC (permalink / raw) To: Ilya Maximets Cc: Netdev, LKML, bpf, David S. Miller, Björn Töpel, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron, William Tu On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > On 20.08.2019 18:35, Alexander Duyck wrote: > > On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >> > >> Tx code doesn't clear the descriptor status after cleaning. > >> So, if the budget is larger than number of used elems in a ring, some > >> descriptors will be accounted twice and xsk_umem_complete_tx will move > >> prod_tail far beyond the prod_head breaking the comletion queue ring. > >> > >> Fix that by limiting the number of descriptors to clean by the number > >> of used descriptors in the tx ring. > >> > >> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > >> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > > > > I'm not sure this is the best way to go. My preference would be to > > have something in the ring that would prevent us from racing which I > > don't think this really addresses. I am pretty sure this code is safe > > on x86 but I would be worried about weak ordered systems such as > > PowerPC. > > > > It might make sense to look at adding the eop_desc logic like we have > > in the regular path with a proper barrier before we write it and after > > we read it. So for example we could hold of on writing the bytecount > > value until the end of an iteration and call smp_wmb before we write > > it. Then on the cleanup we could read it and if it is non-zero we take > > an smp_rmb before proceeding further to process the Tx descriptor and > > clearing the value. Otherwise this code is going to just keep popping > > up with issues. > > But, unlike regular case, xdp zero-copy xmit and clean for particular > tx ring always happens in the same NAPI context and even on the same > CPU core. > > I saw the 'eop_desc' manipulations in regular case and yes, we could > use 'next_to_watch' field just as a flag of descriptor existence, > but it seems unnecessarily complicated. Am I missing something? > So is it always in the same NAPI context?. I forgot, I was thinking that somehow the socket could possibly make use of XDP for transmit. As far as the logic to use I would be good with just using a value you are already setting such as the bytecount value. All that would need to happen is to guarantee that the value is cleared in the Tx path. So if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could theoretically just use that as well to flag that a descriptor has been populated and is ready to be cleaned. Assuming the logic about this all being in the same NAPI context anyway you wouldn't need to mess with the barrier stuff I mentioned before. - Alex ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-21 1:17 ` Alexander Duyck @ 2019-08-21 16:21 ` Ilya Maximets 2019-08-21 16:57 ` Alexander Duyck 2019-08-22 7:10 ` Björn Töpel 0 siblings, 2 replies; 14+ messages in thread From: Ilya Maximets @ 2019-08-21 16:21 UTC (permalink / raw) To: Alexander Duyck, Björn Töpel Cc: Netdev, LKML, bpf, David S. Miller, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron, William Tu On 21.08.2019 4:17, Alexander Duyck wrote: > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: >> >> On 20.08.2019 18:35, Alexander Duyck wrote: >>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>> >>>> Tx code doesn't clear the descriptor status after cleaning. >>>> So, if the budget is larger than number of used elems in a ring, some >>>> descriptors will be accounted twice and xsk_umem_complete_tx will move >>>> prod_tail far beyond the prod_head breaking the comletion queue ring. >>>> >>>> Fix that by limiting the number of descriptors to clean by the number >>>> of used descriptors in the tx ring. >>>> >>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") >>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> >>> >>> I'm not sure this is the best way to go. My preference would be to >>> have something in the ring that would prevent us from racing which I >>> don't think this really addresses. I am pretty sure this code is safe >>> on x86 but I would be worried about weak ordered systems such as >>> PowerPC. >>> >>> It might make sense to look at adding the eop_desc logic like we have >>> in the regular path with a proper barrier before we write it and after >>> we read it. So for example we could hold of on writing the bytecount >>> value until the end of an iteration and call smp_wmb before we write >>> it. Then on the cleanup we could read it and if it is non-zero we take >>> an smp_rmb before proceeding further to process the Tx descriptor and >>> clearing the value. Otherwise this code is going to just keep popping >>> up with issues. >> >> But, unlike regular case, xdp zero-copy xmit and clean for particular >> tx ring always happens in the same NAPI context and even on the same >> CPU core. >> >> I saw the 'eop_desc' manipulations in regular case and yes, we could >> use 'next_to_watch' field just as a flag of descriptor existence, >> but it seems unnecessarily complicated. Am I missing something? >> > > So is it always in the same NAPI context?. I forgot, I was thinking > that somehow the socket could possibly make use of XDP for transmit. AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which is used in zero-copy mode. Real xmit happens inside ixgbe_poll() -> ixgbe_clean_xdp_tx_irq() -> ixgbe_xmit_zc() This should be not possible to bound another XDP socket to the same netdev queue. It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT actions. REDIRECT could happen from different netdev with different NAPI context, but this operation is bound to specific CPU core and each core has its own xdp_ring. However, I'm not an expert here. Björn, maybe you could comment on this? > > As far as the logic to use I would be good with just using a value you > are already setting such as the bytecount value. All that would need > to happen is to guarantee that the value is cleared in the Tx path. So > if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could > theoretically just use that as well to flag that a descriptor has been > populated and is ready to be cleaned. Assuming the logic about this > all being in the same NAPI context anyway you wouldn't need to mess > with the barrier stuff I mentioned before. Checking the number of used descs, i.e. next_to_use - next_to_clean, makes iteration in this function logically equal to the iteration inside 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later function too to follow same 'bytecount' approach? I don't like having two different ways to determine number of used descriptors in the same file. Best regards, Ilya Maximets. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-21 16:21 ` Ilya Maximets @ 2019-08-21 16:57 ` Alexander Duyck 2019-08-21 21:38 ` William Tu 2019-08-22 7:12 ` [Intel-wired-lan] " Björn Töpel 2019-08-22 7:10 ` Björn Töpel 1 sibling, 2 replies; 14+ messages in thread From: Alexander Duyck @ 2019-08-21 16:57 UTC (permalink / raw) To: Ilya Maximets Cc: Björn Töpel, Netdev, LKML, bpf, David S. Miller, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron, William Tu On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > On 21.08.2019 4:17, Alexander Duyck wrote: > > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >> > >> On 20.08.2019 18:35, Alexander Duyck wrote: > >>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >>>> > >>>> Tx code doesn't clear the descriptor status after cleaning. > >>>> So, if the budget is larger than number of used elems in a ring, some > >>>> descriptors will be accounted twice and xsk_umem_complete_tx will move > >>>> prod_tail far beyond the prod_head breaking the comletion queue ring. > >>>> > >>>> Fix that by limiting the number of descriptors to clean by the number > >>>> of used descriptors in the tx ring. > >>>> > >>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > >>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > >>> > >>> I'm not sure this is the best way to go. My preference would be to > >>> have something in the ring that would prevent us from racing which I > >>> don't think this really addresses. I am pretty sure this code is safe > >>> on x86 but I would be worried about weak ordered systems such as > >>> PowerPC. > >>> > >>> It might make sense to look at adding the eop_desc logic like we have > >>> in the regular path with a proper barrier before we write it and after > >>> we read it. So for example we could hold of on writing the bytecount > >>> value until the end of an iteration and call smp_wmb before we write > >>> it. Then on the cleanup we could read it and if it is non-zero we take > >>> an smp_rmb before proceeding further to process the Tx descriptor and > >>> clearing the value. Otherwise this code is going to just keep popping > >>> up with issues. > >> > >> But, unlike regular case, xdp zero-copy xmit and clean for particular > >> tx ring always happens in the same NAPI context and even on the same > >> CPU core. > >> > >> I saw the 'eop_desc' manipulations in regular case and yes, we could > >> use 'next_to_watch' field just as a flag of descriptor existence, > >> but it seems unnecessarily complicated. Am I missing something? > >> > > > > So is it always in the same NAPI context?. I forgot, I was thinking > > that somehow the socket could possibly make use of XDP for transmit. > > AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which > is used in zero-copy mode. Real xmit happens inside > ixgbe_poll() > -> ixgbe_clean_xdp_tx_irq() > -> ixgbe_xmit_zc() > > This should be not possible to bound another XDP socket to the same netdev > queue. > > It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT > actions. REDIRECT could happen from different netdev with different NAPI > context, but this operation is bound to specific CPU core and each core has > its own xdp_ring. > > However, I'm not an expert here. > Björn, maybe you could comment on this? > > > > > As far as the logic to use I would be good with just using a value you > > are already setting such as the bytecount value. All that would need > > to happen is to guarantee that the value is cleared in the Tx path. So > > if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could > > theoretically just use that as well to flag that a descriptor has been > > populated and is ready to be cleaned. Assuming the logic about this > > all being in the same NAPI context anyway you wouldn't need to mess > > with the barrier stuff I mentioned before. > > Checking the number of used descs, i.e. next_to_use - next_to_clean, > makes iteration in this function logically equal to the iteration inside > 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later > function too to follow same 'bytecount' approach? I don't like having > two different ways to determine number of used descriptors in the same file. > > Best regards, Ilya Maximets. As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I would say that if you got rid of budget and framed things more like how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being obvious I would prefer to see us go that route. Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you are going to be working with a static ntu value since you will only ever process one iteration through the ring anyway. It might make more sense if you just went through and got rid of budget and i, and instead used ntc and ntu like what was done in ixgbe_xsk_clean_tx_ring(). Thanks. - Alex ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-21 16:57 ` Alexander Duyck @ 2019-08-21 21:38 ` William Tu 2019-08-22 8:17 ` Ilya Maximets 2019-08-22 7:12 ` [Intel-wired-lan] " Björn Töpel 1 sibling, 1 reply; 14+ messages in thread From: William Tu @ 2019-08-21 21:38 UTC (permalink / raw) To: Alexander Duyck Cc: Ilya Maximets, Björn Töpel, Netdev, LKML, bpf, David S. Miller, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron On Wed, Aug 21, 2019 at 9:57 AM Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > > > On 21.08.2019 4:17, Alexander Duyck wrote: > > > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > >> > > >> On 20.08.2019 18:35, Alexander Duyck wrote: > > >>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > >>>> > > >>>> Tx code doesn't clear the descriptor status after cleaning. > > >>>> So, if the budget is larger than number of used elems in a ring, some > > >>>> descriptors will be accounted twice and xsk_umem_complete_tx will move > > >>>> prod_tail far beyond the prod_head breaking the comletion queue ring. > > >>>> > > >>>> Fix that by limiting the number of descriptors to clean by the number > > >>>> of used descriptors in the tx ring. > > >>>> > > >>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > > >>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > > >>> > > >>> I'm not sure this is the best way to go. My preference would be to > > >>> have something in the ring that would prevent us from racing which I > > >>> don't think this really addresses. I am pretty sure this code is safe > > >>> on x86 but I would be worried about weak ordered systems such as > > >>> PowerPC. > > >>> > > >>> It might make sense to look at adding the eop_desc logic like we have > > >>> in the regular path with a proper barrier before we write it and after > > >>> we read it. So for example we could hold of on writing the bytecount > > >>> value until the end of an iteration and call smp_wmb before we write > > >>> it. Then on the cleanup we could read it and if it is non-zero we take > > >>> an smp_rmb before proceeding further to process the Tx descriptor and > > >>> clearing the value. Otherwise this code is going to just keep popping > > >>> up with issues. > > >> > > >> But, unlike regular case, xdp zero-copy xmit and clean for particular > > >> tx ring always happens in the same NAPI context and even on the same > > >> CPU core. > > >> > > >> I saw the 'eop_desc' manipulations in regular case and yes, we could > > >> use 'next_to_watch' field just as a flag of descriptor existence, > > >> but it seems unnecessarily complicated. Am I missing something? > > >> > > > > > > So is it always in the same NAPI context?. I forgot, I was thinking > > > that somehow the socket could possibly make use of XDP for transmit. > > > > AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which > > is used in zero-copy mode. Real xmit happens inside > > ixgbe_poll() > > -> ixgbe_clean_xdp_tx_irq() > > -> ixgbe_xmit_zc() > > > > This should be not possible to bound another XDP socket to the same netdev > > queue. > > > > It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT > > actions. REDIRECT could happen from different netdev with different NAPI > > context, but this operation is bound to specific CPU core and each core has > > its own xdp_ring. > > > > However, I'm not an expert here. > > Björn, maybe you could comment on this? > > > > > > > > As far as the logic to use I would be good with just using a value you > > > are already setting such as the bytecount value. All that would need > > > to happen is to guarantee that the value is cleared in the Tx path. So > > > if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could > > > theoretically just use that as well to flag that a descriptor has been > > > populated and is ready to be cleaned. Assuming the logic about this > > > all being in the same NAPI context anyway you wouldn't need to mess > > > with the barrier stuff I mentioned before. > > > > Checking the number of used descs, i.e. next_to_use - next_to_clean, > > makes iteration in this function logically equal to the iteration inside > > 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later > > function too to follow same 'bytecount' approach? I don't like having > > two different ways to determine number of used descriptors in the same file. > > > > Best regards, Ilya Maximets. > > As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I > would say that if you got rid of budget and framed things more like > how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being > obvious I would prefer to see us go that route. > > Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you > are going to be working with a static ntu value since you will only > ever process one iteration through the ring anyway. It might make more > sense if you just went through and got rid of budget and i, and > instead used ntc and ntu like what was done in > ixgbe_xsk_clean_tx_ring(). > > Thanks. > > - Alex Not familiar with the driver details. I tested this patch and the issue mentioned in OVS mailing list. https://www.mail-archive.com/ovs-dev@openvswitch.org/msg35362.html and indeed the problem goes away. But I saw a huge performance drop, my AF_XDP tx performance drops from >9Mpps to <5Mpps. Tested using kernel 5.3.0-rc3+ 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01) Subsystem: Intel Corporation Ethernet 10G 2P X540-t Adapter Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Regards, William ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-21 21:38 ` William Tu @ 2019-08-22 8:17 ` Ilya Maximets 2019-08-22 16:07 ` William Tu 0 siblings, 1 reply; 14+ messages in thread From: Ilya Maximets @ 2019-08-22 8:17 UTC (permalink / raw) To: William Tu, Alexander Duyck Cc: Björn Töpel, Netdev, LKML, bpf, David S. Miller, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron On 22.08.2019 0:38, William Tu wrote: > On Wed, Aug 21, 2019 at 9:57 AM Alexander Duyck > <alexander.duyck@gmail.com> wrote: >> >> On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>> >>> On 21.08.2019 4:17, Alexander Duyck wrote: >>>> On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>> >>>>> On 20.08.2019 18:35, Alexander Duyck wrote: >>>>>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>>>> >>>>>>> Tx code doesn't clear the descriptor status after cleaning. >>>>>>> So, if the budget is larger than number of used elems in a ring, some >>>>>>> descriptors will be accounted twice and xsk_umem_complete_tx will move >>>>>>> prod_tail far beyond the prod_head breaking the comletion queue ring. >>>>>>> >>>>>>> Fix that by limiting the number of descriptors to clean by the number >>>>>>> of used descriptors in the tx ring. >>>>>>> >>>>>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") >>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> >>>>>> >>>>>> I'm not sure this is the best way to go. My preference would be to >>>>>> have something in the ring that would prevent us from racing which I >>>>>> don't think this really addresses. I am pretty sure this code is safe >>>>>> on x86 but I would be worried about weak ordered systems such as >>>>>> PowerPC. >>>>>> >>>>>> It might make sense to look at adding the eop_desc logic like we have >>>>>> in the regular path with a proper barrier before we write it and after >>>>>> we read it. So for example we could hold of on writing the bytecount >>>>>> value until the end of an iteration and call smp_wmb before we write >>>>>> it. Then on the cleanup we could read it and if it is non-zero we take >>>>>> an smp_rmb before proceeding further to process the Tx descriptor and >>>>>> clearing the value. Otherwise this code is going to just keep popping >>>>>> up with issues. >>>>> >>>>> But, unlike regular case, xdp zero-copy xmit and clean for particular >>>>> tx ring always happens in the same NAPI context and even on the same >>>>> CPU core. >>>>> >>>>> I saw the 'eop_desc' manipulations in regular case and yes, we could >>>>> use 'next_to_watch' field just as a flag of descriptor existence, >>>>> but it seems unnecessarily complicated. Am I missing something? >>>>> >>>> >>>> So is it always in the same NAPI context?. I forgot, I was thinking >>>> that somehow the socket could possibly make use of XDP for transmit. >>> >>> AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which >>> is used in zero-copy mode. Real xmit happens inside >>> ixgbe_poll() >>> -> ixgbe_clean_xdp_tx_irq() >>> -> ixgbe_xmit_zc() >>> >>> This should be not possible to bound another XDP socket to the same netdev >>> queue. >>> >>> It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT >>> actions. REDIRECT could happen from different netdev with different NAPI >>> context, but this operation is bound to specific CPU core and each core has >>> its own xdp_ring. >>> >>> However, I'm not an expert here. >>> Björn, maybe you could comment on this? >>> >>>> >>>> As far as the logic to use I would be good with just using a value you >>>> are already setting such as the bytecount value. All that would need >>>> to happen is to guarantee that the value is cleared in the Tx path. So >>>> if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could >>>> theoretically just use that as well to flag that a descriptor has been >>>> populated and is ready to be cleaned. Assuming the logic about this >>>> all being in the same NAPI context anyway you wouldn't need to mess >>>> with the barrier stuff I mentioned before. >>> >>> Checking the number of used descs, i.e. next_to_use - next_to_clean, >>> makes iteration in this function logically equal to the iteration inside >>> 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later >>> function too to follow same 'bytecount' approach? I don't like having >>> two different ways to determine number of used descriptors in the same file. >>> >>> Best regards, Ilya Maximets. >> >> As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I >> would say that if you got rid of budget and framed things more like >> how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being >> obvious I would prefer to see us go that route. >> >> Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you >> are going to be working with a static ntu value since you will only >> ever process one iteration through the ring anyway. It might make more >> sense if you just went through and got rid of budget and i, and >> instead used ntc and ntu like what was done in >> ixgbe_xsk_clean_tx_ring(). >> >> Thanks. >> >> - Alex > > Not familiar with the driver details. > I tested this patch and the issue mentioned in OVS mailing list. > https://www.mail-archive.com/ovs-dev@openvswitch.org/msg35362.html > and indeed the problem goes away. Good. Thanks for testing! > But I saw a huge performance drop, > my AF_XDP tx performance drops from >9Mpps to <5Mpps. I didn't expect so big performance difference with this change. What is your test scenario? Is it possible that you're accounting same packet several times due to broken completion queue? Looking at samples/bpf/xdpsock_user.c:complete_tx_only(), it accounts sent packets (tx_npkts) by accumulating results of xsk_ring_cons__peek() for completion queue, so it's not a trusted source of pps information. Best regards, Ilya Maximets. > > Tested using kernel 5.3.0-rc3+ > 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller > 10-Gigabit X540-AT2 (rev 01) > Subsystem: Intel Corporation Ethernet 10G 2P X540-t Adapter > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- DisINTx+ > > Regards, > William ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-22 8:17 ` Ilya Maximets @ 2019-08-22 16:07 ` William Tu 2019-08-22 16:30 ` Ilya Maximets 0 siblings, 1 reply; 14+ messages in thread From: William Tu @ 2019-08-22 16:07 UTC (permalink / raw) To: Ilya Maximets Cc: Alexander Duyck, Björn Töpel, Netdev, LKML, bpf, David S. Miller, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron On Thu, Aug 22, 2019 at 1:17 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > On 22.08.2019 0:38, William Tu wrote: > > On Wed, Aug 21, 2019 at 9:57 AM Alexander Duyck > > <alexander.duyck@gmail.com> wrote: > >> > >> On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >>> > >>> On 21.08.2019 4:17, Alexander Duyck wrote: > >>>> On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >>>>> > >>>>> On 20.08.2019 18:35, Alexander Duyck wrote: > >>>>>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >>>>>>> > >>>>>>> Tx code doesn't clear the descriptor status after cleaning. > >>>>>>> So, if the budget is larger than number of used elems in a ring, some > >>>>>>> descriptors will be accounted twice and xsk_umem_complete_tx will move > >>>>>>> prod_tail far beyond the prod_head breaking the comletion queue ring. > >>>>>>> > >>>>>>> Fix that by limiting the number of descriptors to clean by the number > >>>>>>> of used descriptors in the tx ring. > >>>>>>> > >>>>>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > >>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > >>>>>> > >>>>>> I'm not sure this is the best way to go. My preference would be to > >>>>>> have something in the ring that would prevent us from racing which I > >>>>>> don't think this really addresses. I am pretty sure this code is safe > >>>>>> on x86 but I would be worried about weak ordered systems such as > >>>>>> PowerPC. > >>>>>> > >>>>>> It might make sense to look at adding the eop_desc logic like we have > >>>>>> in the regular path with a proper barrier before we write it and after > >>>>>> we read it. So for example we could hold of on writing the bytecount > >>>>>> value until the end of an iteration and call smp_wmb before we write > >>>>>> it. Then on the cleanup we could read it and if it is non-zero we take > >>>>>> an smp_rmb before proceeding further to process the Tx descriptor and > >>>>>> clearing the value. Otherwise this code is going to just keep popping > >>>>>> up with issues. > >>>>> > >>>>> But, unlike regular case, xdp zero-copy xmit and clean for particular > >>>>> tx ring always happens in the same NAPI context and even on the same > >>>>> CPU core. > >>>>> > >>>>> I saw the 'eop_desc' manipulations in regular case and yes, we could > >>>>> use 'next_to_watch' field just as a flag of descriptor existence, > >>>>> but it seems unnecessarily complicated. Am I missing something? > >>>>> > >>>> > >>>> So is it always in the same NAPI context?. I forgot, I was thinking > >>>> that somehow the socket could possibly make use of XDP for transmit. > >>> > >>> AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which > >>> is used in zero-copy mode. Real xmit happens inside > >>> ixgbe_poll() > >>> -> ixgbe_clean_xdp_tx_irq() > >>> -> ixgbe_xmit_zc() > >>> > >>> This should be not possible to bound another XDP socket to the same netdev > >>> queue. > >>> > >>> It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT > >>> actions. REDIRECT could happen from different netdev with different NAPI > >>> context, but this operation is bound to specific CPU core and each core has > >>> its own xdp_ring. > >>> > >>> However, I'm not an expert here. > >>> Björn, maybe you could comment on this? > >>> > >>>> > >>>> As far as the logic to use I would be good with just using a value you > >>>> are already setting such as the bytecount value. All that would need > >>>> to happen is to guarantee that the value is cleared in the Tx path. So > >>>> if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could > >>>> theoretically just use that as well to flag that a descriptor has been > >>>> populated and is ready to be cleaned. Assuming the logic about this > >>>> all being in the same NAPI context anyway you wouldn't need to mess > >>>> with the barrier stuff I mentioned before. > >>> > >>> Checking the number of used descs, i.e. next_to_use - next_to_clean, > >>> makes iteration in this function logically equal to the iteration inside > >>> 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later > >>> function too to follow same 'bytecount' approach? I don't like having > >>> two different ways to determine number of used descriptors in the same file. > >>> > >>> Best regards, Ilya Maximets. > >> > >> As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I > >> would say that if you got rid of budget and framed things more like > >> how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being > >> obvious I would prefer to see us go that route. > >> > >> Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you > >> are going to be working with a static ntu value since you will only > >> ever process one iteration through the ring anyway. It might make more > >> sense if you just went through and got rid of budget and i, and > >> instead used ntc and ntu like what was done in > >> ixgbe_xsk_clean_tx_ring(). > >> > >> Thanks. > >> > >> - Alex > > > > Not familiar with the driver details. > > I tested this patch and the issue mentioned in OVS mailing list. > > https://www.mail-archive.com/ovs-dev@openvswitch.org/msg35362.html > > and indeed the problem goes away. > > Good. Thanks for testing! > > > But I saw a huge performance drop, > > my AF_XDP tx performance drops from >9Mpps to <5Mpps. > > I didn't expect so big performance difference with this change. > What is your test scenario? I was using OVS with dual port NIC, setting one OpenFlow rule in_port=eth2 actions=output:eth3 and eth2 for rx and measure eth3 tx 'sar -n DEV 1' shows pretty huge drop on eth3 tx. > Is it possible that you're accounting same > packet several times due to broken completion queue? That's possible. Let me double check on your v2 patch. @Eelco: do you also see some performance difference? Regards, William > > Looking at samples/bpf/xdpsock_user.c:complete_tx_only(), it accounts > sent packets (tx_npkts) by accumulating results of xsk_ring_cons__peek() > for completion queue, so it's not a trusted source of pps information. > > Best regards, Ilya Maximets. > > > > > Tested using kernel 5.3.0-rc3+ > > 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller > > 10-Gigabit X540-AT2 (rev 01) > > Subsystem: Intel Corporation Ethernet 10G 2P X540-t Adapter > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- > > Stepping- SERR- FastB2B- DisINTx+ > > > > Regards, > > William ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-22 16:07 ` William Tu @ 2019-08-22 16:30 ` Ilya Maximets 0 siblings, 0 replies; 14+ messages in thread From: Ilya Maximets @ 2019-08-22 16:30 UTC (permalink / raw) To: William Tu Cc: Alexander Duyck, Björn Töpel, Netdev, LKML, bpf, David S. Miller, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, Eelco Chaudron On 22.08.2019 19:07, William Tu wrote: > On Thu, Aug 22, 2019 at 1:17 AM Ilya Maximets <i.maximets@samsung.com> wrote: >> >> On 22.08.2019 0:38, William Tu wrote: >>> On Wed, Aug 21, 2019 at 9:57 AM Alexander Duyck >>> <alexander.duyck@gmail.com> wrote: >>>> >>>> On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>> >>>>> On 21.08.2019 4:17, Alexander Duyck wrote: >>>>>> On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>>>> >>>>>>> On 20.08.2019 18:35, Alexander Duyck wrote: >>>>>>>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>>>>>> >>>>>>>>> Tx code doesn't clear the descriptor status after cleaning. >>>>>>>>> So, if the budget is larger than number of used elems in a ring, some >>>>>>>>> descriptors will be accounted twice and xsk_umem_complete_tx will move >>>>>>>>> prod_tail far beyond the prod_head breaking the comletion queue ring. >>>>>>>>> >>>>>>>>> Fix that by limiting the number of descriptors to clean by the number >>>>>>>>> of used descriptors in the tx ring. >>>>>>>>> >>>>>>>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") >>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> >>>>>>>> >>>>>>>> I'm not sure this is the best way to go. My preference would be to >>>>>>>> have something in the ring that would prevent us from racing which I >>>>>>>> don't think this really addresses. I am pretty sure this code is safe >>>>>>>> on x86 but I would be worried about weak ordered systems such as >>>>>>>> PowerPC. >>>>>>>> >>>>>>>> It might make sense to look at adding the eop_desc logic like we have >>>>>>>> in the regular path with a proper barrier before we write it and after >>>>>>>> we read it. So for example we could hold of on writing the bytecount >>>>>>>> value until the end of an iteration and call smp_wmb before we write >>>>>>>> it. Then on the cleanup we could read it and if it is non-zero we take >>>>>>>> an smp_rmb before proceeding further to process the Tx descriptor and >>>>>>>> clearing the value. Otherwise this code is going to just keep popping >>>>>>>> up with issues. >>>>>>> >>>>>>> But, unlike regular case, xdp zero-copy xmit and clean for particular >>>>>>> tx ring always happens in the same NAPI context and even on the same >>>>>>> CPU core. >>>>>>> >>>>>>> I saw the 'eop_desc' manipulations in regular case and yes, we could >>>>>>> use 'next_to_watch' field just as a flag of descriptor existence, >>>>>>> but it seems unnecessarily complicated. Am I missing something? >>>>>>> >>>>>> >>>>>> So is it always in the same NAPI context?. I forgot, I was thinking >>>>>> that somehow the socket could possibly make use of XDP for transmit. >>>>> >>>>> AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which >>>>> is used in zero-copy mode. Real xmit happens inside >>>>> ixgbe_poll() >>>>> -> ixgbe_clean_xdp_tx_irq() >>>>> -> ixgbe_xmit_zc() >>>>> >>>>> This should be not possible to bound another XDP socket to the same netdev >>>>> queue. >>>>> >>>>> It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT >>>>> actions. REDIRECT could happen from different netdev with different NAPI >>>>> context, but this operation is bound to specific CPU core and each core has >>>>> its own xdp_ring. >>>>> >>>>> However, I'm not an expert here. >>>>> Björn, maybe you could comment on this? >>>>> >>>>>> >>>>>> As far as the logic to use I would be good with just using a value you >>>>>> are already setting such as the bytecount value. All that would need >>>>>> to happen is to guarantee that the value is cleared in the Tx path. So >>>>>> if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could >>>>>> theoretically just use that as well to flag that a descriptor has been >>>>>> populated and is ready to be cleaned. Assuming the logic about this >>>>>> all being in the same NAPI context anyway you wouldn't need to mess >>>>>> with the barrier stuff I mentioned before. >>>>> >>>>> Checking the number of used descs, i.e. next_to_use - next_to_clean, >>>>> makes iteration in this function logically equal to the iteration inside >>>>> 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later >>>>> function too to follow same 'bytecount' approach? I don't like having >>>>> two different ways to determine number of used descriptors in the same file. >>>>> >>>>> Best regards, Ilya Maximets. >>>> >>>> As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I >>>> would say that if you got rid of budget and framed things more like >>>> how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being >>>> obvious I would prefer to see us go that route. >>>> >>>> Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you >>>> are going to be working with a static ntu value since you will only >>>> ever process one iteration through the ring anyway. It might make more >>>> sense if you just went through and got rid of budget and i, and >>>> instead used ntc and ntu like what was done in >>>> ixgbe_xsk_clean_tx_ring(). >>>> >>>> Thanks. >>>> >>>> - Alex >>> >>> Not familiar with the driver details. >>> I tested this patch and the issue mentioned in OVS mailing list. >>> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg35362.html >>> and indeed the problem goes away. >> >> Good. Thanks for testing! >> >>> But I saw a huge performance drop, >>> my AF_XDP tx performance drops from >9Mpps to <5Mpps. >> >> I didn't expect so big performance difference with this change. >> What is your test scenario? > > I was using OVS with dual port NIC, setting one OpenFlow rule > in_port=eth2 actions=output:eth3 > and eth2 for rx and measure eth3 tx > 'sar -n DEV 1' shows pretty huge drop on eth3 tx. 'sar' just parses net procfs to obtain interface stats, but interface stats are not correct due to this bug (same packets accounted several times). > >> Is it possible that you're accounting same >> packet several times due to broken completion queue? > > That's possible. > Let me double check on your v2 patch. > > @Eelco: do you also see some performance difference? > > Regards, > William > >> >> Looking at samples/bpf/xdpsock_user.c:complete_tx_only(), it accounts >> sent packets (tx_npkts) by accumulating results of xsk_ring_cons__peek() >> for completion queue, so it's not a trusted source of pps information. >> >> Best regards, Ilya Maximets. >> >>> >>> Tested using kernel 5.3.0-rc3+ >>> 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller >>> 10-Gigabit X540-AT2 (rev 01) >>> Subsystem: Intel Corporation Ethernet 10G 2P X540-t Adapter >>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- >>> Stepping- SERR- FastB2B- DisINTx+ >>> >>> Regards, >>> William > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-21 16:57 ` Alexander Duyck 2019-08-21 21:38 ` William Tu @ 2019-08-22 7:12 ` Björn Töpel 2019-08-22 8:05 ` Ilya Maximets 1 sibling, 1 reply; 14+ messages in thread From: Björn Töpel @ 2019-08-22 7:12 UTC (permalink / raw) To: Alexander Duyck Cc: Ilya Maximets, Jakub Kicinski, Daniel Borkmann, Netdev, William Tu, LKML, Alexei Starovoitov, intel-wired-lan, bpf, Björn Töpel, David S. Miller, Magnus Karlsson, Eelco Chaudron On Wed, 21 Aug 2019 at 18:57, Alexander Duyck <alexander.duyck@gmail.com> wrote: > > On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > > > On 21.08.2019 4:17, Alexander Duyck wrote: > > > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > >> > > >> On 20.08.2019 18:35, Alexander Duyck wrote: > > >>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: > > >>>> > > >>>> Tx code doesn't clear the descriptor status after cleaning. > > >>>> So, if the budget is larger than number of used elems in a ring, some > > >>>> descriptors will be accounted twice and xsk_umem_complete_tx will move > > >>>> prod_tail far beyond the prod_head breaking the comletion queue ring. > > >>>> > > >>>> Fix that by limiting the number of descriptors to clean by the number > > >>>> of used descriptors in the tx ring. > > >>>> > > >>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > > >>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > > >>> > > >>> I'm not sure this is the best way to go. My preference would be to > > >>> have something in the ring that would prevent us from racing which I > > >>> don't think this really addresses. I am pretty sure this code is safe > > >>> on x86 but I would be worried about weak ordered systems such as > > >>> PowerPC. > > >>> > > >>> It might make sense to look at adding the eop_desc logic like we have > > >>> in the regular path with a proper barrier before we write it and after > > >>> we read it. So for example we could hold of on writing the bytecount > > >>> value until the end of an iteration and call smp_wmb before we write > > >>> it. Then on the cleanup we could read it and if it is non-zero we take > > >>> an smp_rmb before proceeding further to process the Tx descriptor and > > >>> clearing the value. Otherwise this code is going to just keep popping > > >>> up with issues. > > >> > > >> But, unlike regular case, xdp zero-copy xmit and clean for particular > > >> tx ring always happens in the same NAPI context and even on the same > > >> CPU core. > > >> > > >> I saw the 'eop_desc' manipulations in regular case and yes, we could > > >> use 'next_to_watch' field just as a flag of descriptor existence, > > >> but it seems unnecessarily complicated. Am I missing something? > > >> > > > > > > So is it always in the same NAPI context?. I forgot, I was thinking > > > that somehow the socket could possibly make use of XDP for transmit. > > > > AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which > > is used in zero-copy mode. Real xmit happens inside > > ixgbe_poll() > > -> ixgbe_clean_xdp_tx_irq() > > -> ixgbe_xmit_zc() > > > > This should be not possible to bound another XDP socket to the same netdev > > queue. > > > > It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT > > actions. REDIRECT could happen from different netdev with different NAPI > > context, but this operation is bound to specific CPU core and each core has > > its own xdp_ring. > > > > However, I'm not an expert here. > > Björn, maybe you could comment on this? > > > > > > > > As far as the logic to use I would be good with just using a value you > > > are already setting such as the bytecount value. All that would need > > > to happen is to guarantee that the value is cleared in the Tx path. So > > > if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could > > > theoretically just use that as well to flag that a descriptor has been > > > populated and is ready to be cleaned. Assuming the logic about this > > > all being in the same NAPI context anyway you wouldn't need to mess > > > with the barrier stuff I mentioned before. > > > > Checking the number of used descs, i.e. next_to_use - next_to_clean, > > makes iteration in this function logically equal to the iteration inside > > 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later > > function too to follow same 'bytecount' approach? I don't like having > > two different ways to determine number of used descriptors in the same file. > > > > Best regards, Ilya Maximets. > > As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I > would say that if you got rid of budget and framed things more like > how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being > obvious I would prefer to see us go that route. > > Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you > are going to be working with a static ntu value since you will only > ever process one iteration through the ring anyway. It might make more > sense if you just went through and got rid of budget and i, and > instead used ntc and ntu like what was done in > ixgbe_xsk_clean_tx_ring(). > +1. I'd prefer this as well! Cheers, Björn > Thanks. > > - Alex > _______________________________________________ > Intel-wired-lan mailing list > Intel-wired-lan@osuosl.org > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-22 7:12 ` [Intel-wired-lan] " Björn Töpel @ 2019-08-22 8:05 ` Ilya Maximets 0 siblings, 0 replies; 14+ messages in thread From: Ilya Maximets @ 2019-08-22 8:05 UTC (permalink / raw) To: Björn Töpel, Alexander Duyck Cc: Jakub Kicinski, Daniel Borkmann, Netdev, William Tu, LKML, Alexei Starovoitov, intel-wired-lan, bpf, Björn Töpel, David S. Miller, Magnus Karlsson, Eelco Chaudron On 22.08.2019 10:12, Björn Töpel wrote: > On Wed, 21 Aug 2019 at 18:57, Alexander Duyck <alexander.duyck@gmail.com> wrote: >> >> On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>> >>> On 21.08.2019 4:17, Alexander Duyck wrote: >>>> On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>> >>>>> On 20.08.2019 18:35, Alexander Duyck wrote: >>>>>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote: >>>>>>> >>>>>>> Tx code doesn't clear the descriptor status after cleaning. >>>>>>> So, if the budget is larger than number of used elems in a ring, some >>>>>>> descriptors will be accounted twice and xsk_umem_complete_tx will move >>>>>>> prod_tail far beyond the prod_head breaking the comletion queue ring. >>>>>>> >>>>>>> Fix that by limiting the number of descriptors to clean by the number >>>>>>> of used descriptors in the tx ring. >>>>>>> >>>>>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") >>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> >>>>>> >>>>>> I'm not sure this is the best way to go. My preference would be to >>>>>> have something in the ring that would prevent us from racing which I >>>>>> don't think this really addresses. I am pretty sure this code is safe >>>>>> on x86 but I would be worried about weak ordered systems such as >>>>>> PowerPC. >>>>>> >>>>>> It might make sense to look at adding the eop_desc logic like we have >>>>>> in the regular path with a proper barrier before we write it and after >>>>>> we read it. So for example we could hold of on writing the bytecount >>>>>> value until the end of an iteration and call smp_wmb before we write >>>>>> it. Then on the cleanup we could read it and if it is non-zero we take >>>>>> an smp_rmb before proceeding further to process the Tx descriptor and >>>>>> clearing the value. Otherwise this code is going to just keep popping >>>>>> up with issues. >>>>> >>>>> But, unlike regular case, xdp zero-copy xmit and clean for particular >>>>> tx ring always happens in the same NAPI context and even on the same >>>>> CPU core. >>>>> >>>>> I saw the 'eop_desc' manipulations in regular case and yes, we could >>>>> use 'next_to_watch' field just as a flag of descriptor existence, >>>>> but it seems unnecessarily complicated. Am I missing something? >>>>> >>>> >>>> So is it always in the same NAPI context?. I forgot, I was thinking >>>> that somehow the socket could possibly make use of XDP for transmit. >>> >>> AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which >>> is used in zero-copy mode. Real xmit happens inside >>> ixgbe_poll() >>> -> ixgbe_clean_xdp_tx_irq() >>> -> ixgbe_xmit_zc() >>> >>> This should be not possible to bound another XDP socket to the same netdev >>> queue. >>> >>> It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT >>> actions. REDIRECT could happen from different netdev with different NAPI >>> context, but this operation is bound to specific CPU core and each core has >>> its own xdp_ring. >>> >>> However, I'm not an expert here. >>> Björn, maybe you could comment on this? >>> >>>> >>>> As far as the logic to use I would be good with just using a value you >>>> are already setting such as the bytecount value. All that would need >>>> to happen is to guarantee that the value is cleared in the Tx path. So >>>> if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could >>>> theoretically just use that as well to flag that a descriptor has been >>>> populated and is ready to be cleaned. Assuming the logic about this >>>> all being in the same NAPI context anyway you wouldn't need to mess >>>> with the barrier stuff I mentioned before. >>> >>> Checking the number of used descs, i.e. next_to_use - next_to_clean, >>> makes iteration in this function logically equal to the iteration inside >>> 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the later >>> function too to follow same 'bytecount' approach? I don't like having >>> two different ways to determine number of used descriptors in the same file. >>> >>> Best regards, Ilya Maximets. >> >> As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I >> would say that if you got rid of budget and framed things more like >> how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being >> obvious I would prefer to see us go that route. >> >> Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you >> are going to be working with a static ntu value since you will only >> ever process one iteration through the ring anyway. It might make more >> sense if you just went through and got rid of budget and i, and >> instead used ntc and ntu like what was done in >> ixgbe_xsk_clean_tx_ring(). >> > > +1. I'd prefer this as well! Sounds good. I'll look in this direction. But I'm not completely sure about 'budget' elimination. Wouldn't it cause issues if we'll clean the whole ring at once? I mean maybe it'll be too long to hold the CPU core for this amount of work. We could re-work the code keeping the loop break on budget exhaustion. What do you think? > > > Cheers, > Björn > >> Thanks. >> >> - Alex >> _______________________________________________ >> Intel-wired-lan mailing list >> Intel-wired-lan@osuosl.org >> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-21 16:21 ` Ilya Maximets 2019-08-21 16:57 ` Alexander Duyck @ 2019-08-22 7:10 ` Björn Töpel 1 sibling, 0 replies; 14+ messages in thread From: Björn Töpel @ 2019-08-22 7:10 UTC (permalink / raw) To: Ilya Maximets Cc: Alexander Duyck, Björn Töpel, Jakub Kicinski, Daniel Borkmann, Netdev, William Tu, LKML, Alexei Starovoitov, intel-wired-lan, bpf, David S. Miller, Magnus Karlsson, Eelco Chaudron On Wed, 21 Aug 2019 at 18:22, Ilya Maximets <i.maximets@samsung.com> wrote: > > On 21.08.2019 4:17, Alexander Duyck wrote: > > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote: > >> > >> On 20.08.2019 18:35, Alexander Duyck wrote: [...] > > > > So is it always in the same NAPI context?. I forgot, I was thinking > > that somehow the socket could possibly make use of XDP for transmit. > > AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which > is used in zero-copy mode. Real xmit happens inside > ixgbe_poll() > -> ixgbe_clean_xdp_tx_irq() > -> ixgbe_xmit_zc() > > This should be not possible to bound another XDP socket to the same netdev > queue. > > It also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT > actions. REDIRECT could happen from different netdev with different NAPI > context, but this operation is bound to specific CPU core and each core has > its own xdp_ring. > > However, I'm not an expert here. > Björn, maybe you could comment on this? > Yes, you're correct Ilya. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp 2019-08-20 15:16 ` [PATCH net] ixgbe: fix double clean of tx descriptors with xdp Ilya Maximets 2019-08-20 15:35 ` Alexander Duyck @ 2019-08-21 10:09 ` Eelco Chaudron 1 sibling, 0 replies; 14+ messages in thread From: Eelco Chaudron @ 2019-08-21 10:09 UTC (permalink / raw) To: Ilya Maximets Cc: netdev, linux-kernel, bpf, David S. Miller, Björn Töpel, Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher, intel-wired-lan, William Tu On 20 Aug 2019, at 17:16, Ilya Maximets wrote: > Tx code doesn't clear the descriptor status after cleaning. > So, if the budget is larger than number of used elems in a ring, some > descriptors will be accounted twice and xsk_umem_complete_tx will move > prod_tail far beyond the prod_head breaking the comletion queue ring. > > Fix that by limiting the number of descriptors to clean by the number > of used descriptors in the tx ring. > > Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") > Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > --- > > Not tested yet because of lack of available hardware. > So, testing is very welcome. > Hi Ilya, this patch fixes the issue I reported earlier on the Open vSwitch mailing list regarding complete queue overrun. Tested-by: Eelco Chaudron <echaudro@redhat.com> <SNIP> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-08-22 16:30 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CGME20190820151644eucas1p179d6d1da42bb6be0aad8f58ac46624ce@eucas1p1.samsung.com> 2019-08-20 15:16 ` [PATCH net] ixgbe: fix double clean of tx descriptors with xdp Ilya Maximets 2019-08-20 15:35 ` Alexander Duyck 2019-08-20 15:58 ` Ilya Maximets 2019-08-21 1:17 ` Alexander Duyck 2019-08-21 16:21 ` Ilya Maximets 2019-08-21 16:57 ` Alexander Duyck 2019-08-21 21:38 ` William Tu 2019-08-22 8:17 ` Ilya Maximets 2019-08-22 16:07 ` William Tu 2019-08-22 16:30 ` Ilya Maximets 2019-08-22 7:12 ` [Intel-wired-lan] " Björn Töpel 2019-08-22 8:05 ` Ilya Maximets 2019-08-22 7:10 ` Björn Töpel 2019-08-21 10:09 ` Eelco Chaudron
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).