* [PATCH 0/2] intel/xdp fixes for fliping rx buffer
From: Li RongQing @ 2020-07-17  6:24 UTC
To: netdev, intel-wired-lan, magnus.karlsson, bjorn.topel

This series fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in copy mode xdp,
which can lead to data corruption.

The fix is split into two patches: i40e/ixgbe/ixgbevf have supported xsk
receiving since 4.18, so their fixes are put in one patch.

Li RongQing (2):
  xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
  ice/xdp: not adjust rx buffer for copy mode xdp

 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 5 ++++-
 drivers/net/ethernet/intel/ice/ice_txrx.c         | 5 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 5 ++++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 ++++-
 include/net/xdp.h                                 | 3 +++
 net/xdp/xsk.c                                     | 4 +++-
 6 files changed, 22 insertions(+), 5 deletions(-)

-- 
2.16.2
* [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
From: Li RongQing @ 2020-07-17  6:24 UTC
To: netdev, intel-wired-lan, magnus.karlsson, bjorn.topel

i40e/ixgbe/ixgbevf_rx_buffer_flip in copy mode xdp can lead to
data corruption, like the following flow:

   1. first skb is not for xsk, and forwarded to another device
      or socket queue
   2. second skb is for xsk, copy data to xsk memory, and page
      of skb->data is released
   3. rx_buff is reusable since only first skb is in it, but
      *_rx_buffer_flip will make that page_offset is set to
      first skb data
   4. then reuse rx buffer, first skb which still is living
      will be corrupted.

so add flags in xdp struct, to report xdp's data status, then
driver has knowledge whether to flip rx buffer

Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Dongsheng Rong <rongdongsheng@baidu.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 5 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 5 ++++-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 ++++-
 include/net/xdp.h                                 | 3 +++
 net/xdp/xsk.c                                     | 4 +++-
 5 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b3836092c327..51fa6f86f917 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2376,6 +2376,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 
 		/* retrieve a buffer from the ring */
 		if (!skb) {
+			xdp.flags = 0;
 			xdp.data = page_address(rx_buffer->page) +
 				   rx_buffer->page_offset;
 			xdp.data_meta = xdp.data;
@@ -2394,7 +2395,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 
 			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
 				xdp_xmit |= xdp_res;
-				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
+
+				if (!(xdp.flags & XDP_DATA_RELEASED))
+					i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
 			}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a8bf941c5c29..9e44a7e1d91c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2333,6 +2333,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		/* retrieve a buffer from the ring */
 		if (!skb) {
+			xdp.flags = 0;
 			xdp.data = page_address(rx_buffer->page) +
 				   rx_buffer->page_offset;
 			xdp.data_meta = xdp.data;
@@ -2351,7 +2352,9 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
 				xdp_xmit |= xdp_res;
-				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
+
+				if (!(xdp.flags & XDP_DATA_RELEASED))
+					ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
 			}
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a39e2cb384dd..1c1a8b6a5dcf 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1168,6 +1168,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 
 		/* retrieve a buffer from the ring */
 		if (!skb) {
+			xdp.flags = 0;
 			xdp.data = page_address(rx_buffer->page) +
 				   rx_buffer->page_offset;
 			xdp.data_meta = xdp.data;
@@ -1184,7 +1185,9 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		if (IS_ERR(skb)) {
 			if (PTR_ERR(skb) == -IXGBEVF_XDP_TX) {
 				xdp_xmit = true;
-				ixgbevf_rx_buffer_flip(rx_ring, rx_buffer,
+
+				if (!(xdp.flags & XDP_DATA_RELEASED))
+					ixgbevf_rx_buffer_flip(rx_ring, rx_buffer,
 						       size);
 			} else {
 				rx_buffer->pagecnt_bias++;
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 609f819ed08b..6b32a01ade19 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -47,6 +47,8 @@ enum xdp_mem_type {
 #define XDP_XMIT_FLUSH		(1U << 0)	/* doorbell signal consumer */
 #define XDP_XMIT_FLAGS_MASK	XDP_XMIT_FLUSH
 
+#define XDP_DATA_RELEASED	(1U << 0)
+
 struct xdp_mem_info {
 	u32 type; /* enum xdp_mem_type, but known size type */
 	u32 id;
@@ -73,6 +75,7 @@ struct xdp_buff {
 	struct xdp_rxq_info *rxq;
 	struct xdp_txq_info *txq;
 	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
+	u32 flags;
 };
 
 /* Reserve memory area at end-of data area.
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b6c0f08bd80d..2c4c5c16660b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -172,8 +172,10 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len,
 		xsk_buff_free(xsk_xdp);
 		return err;
 	}
-	if (explicit_free)
+	if (explicit_free) {
 		xdp_return_buff(xdp);
+		xdp->flags |= XDP_DATA_RELEASED;
+	}
 	return 0;
 }
 
-- 
2.16.2
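
The four-step flow in the commit message above can be replayed in a small user-space model. This is only a sketch and not driver code: `model_rx_buffer`, `model_flip`, `model_receive`, and `model_first_frame_survives` are hypothetical names, the page is assumed to be 4096 bytes split into two halves, and the page is assumed to be reusable while at most one frame in it is still held by a consumer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_HALF_PAGE (MODEL_PAGE_SIZE / 2)

/* Hypothetical, simplified stand-in for a driver rx buffer. */
struct model_rx_buffer {
	unsigned char page[MODEL_PAGE_SIZE];
	unsigned int page_offset;	/* half of the page written next */
	unsigned int outstanding;	/* frames in this page still held elsewhere */
};

/* Models *_rx_buffer_flip(): switch to the other half of the page. */
static void model_flip(struct model_rx_buffer *b)
{
	b->page_offset ^= MODEL_HALF_PAGE;
}

/* Models the NIC writing one frame into the current half. */
static unsigned int model_receive(struct model_rx_buffer *b, unsigned char fill)
{
	unsigned int off = b->page_offset;

	assert(off == 0 || off == MODEL_HALF_PAGE);
	memset(b->page + off, fill, MODEL_HALF_PAGE);
	return off;
}

/* Replays steps 1-4; returns true if the first frame is still intact. */
static bool model_first_frame_survives(bool skip_flip_when_released)
{
	struct model_rx_buffer b = { .page_offset = 0, .outstanding = 0 };
	unsigned int first;
	bool released;

	/* Step 1: frame 1 is forwarded and keeps using its half. */
	first = model_receive(&b, 0xAA);
	b.outstanding++;
	model_flip(&b);

	/* Step 2: frame 2 is copied to xsk memory and its data is released. */
	model_receive(&b, 0xBB);
	released = true;

	/* Step 3: the driver flips unless it knows the data was released. */
	if (!(skip_flip_when_released && released))
		model_flip(&b);

	/* Step 4: only frame 1 is outstanding, so the page is reused. */
	if (b.outstanding <= 1)
		model_receive(&b, 0xCC);

	return b.page[first] == 0xAA;
}
```

Calling `model_first_frame_survives(false)` replays the unconditional flip, where the third write lands on the half still owned by the first frame. Calling it with `true` skips the flip after the release, as the patch intends, so the third write reuses the half that was just freed.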
* Re: [Intel-wired-lan] [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
From: Magnus Karlsson @ 2020-07-20  7:21 UTC
To: Li RongQing
Cc: Network Development, intel-wired-lan, Karlsson, Magnus, Björn Töpel

On Fri, Jul 17, 2020 at 8:24 AM Li RongQing <lirongqing@baidu.com> wrote:
>
> i40e/ixgbe/ixgbevf_rx_buffer_flip in copy mode xdp can lead to
> data corruption, like the following flow:
>
>    1. first skb is not for xsk, and forwarded to another device
>       or socket queue
>    2. second skb is for xsk, copy data to xsk memory, and page
>       of skb->data is released
>    3. rx_buff is reusable since only first skb is in it, but
>       *_rx_buffer_flip will make that page_offset is set to
>       first skb data
>    4. then reuse rx buffer, first skb which still is living
>       will be corrupted.
>
> so add flags in xdp struct, to report xdp's data status, then
> driver has knowledge whether to flip rx buffer
>
> Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support")
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Signed-off-by: Dongsheng Rong <rongdongsheng@baidu.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 5 ++++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 5 ++++-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 ++++-
>  include/net/xdp.h                                 | 3 +++
>  net/xdp/xsk.c                                     | 4 +++-
>  5 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index b3836092c327..51fa6f86f917 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2376,6 +2376,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>
>  		/* retrieve a buffer from the ring */
>  		if (!skb) {
> +			xdp.flags = 0;
>  			xdp.data = page_address(rx_buffer->page) +
>  				   rx_buffer->page_offset;
>  			xdp.data_meta = xdp.data;
> @@ -2394,7 +2395,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>
>  			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
>  				xdp_xmit |= xdp_res;
> -				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
> +
> +				if (!(xdp.flags & XDP_DATA_RELEASED))
> +					i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
>  			} else {
>  				rx_buffer->pagecnt_bias++;
>  			}
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index a8bf941c5c29..9e44a7e1d91c 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -2333,6 +2333,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>
>  		/* retrieve a buffer from the ring */
>  		if (!skb) {
> +			xdp.flags = 0;
>  			xdp.data = page_address(rx_buffer->page) +
>  				   rx_buffer->page_offset;
>  			xdp.data_meta = xdp.data;
> @@ -2351,7 +2352,9 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>
>  			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
>  				xdp_xmit |= xdp_res;
> -				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
> +
> +				if (!(xdp.flags & XDP_DATA_RELEASED))
> +					ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
>  			} else {
>  				rx_buffer->pagecnt_bias++;
>  			}
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index a39e2cb384dd..1c1a8b6a5dcf 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -1168,6 +1168,7 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
>
>  		/* retrieve a buffer from the ring */
>  		if (!skb) {
> +			xdp.flags = 0;
>  			xdp.data = page_address(rx_buffer->page) +
>  				   rx_buffer->page_offset;
>  			xdp.data_meta = xdp.data;
> @@ -1184,7 +1185,9 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
>  		if (IS_ERR(skb)) {
>  			if (PTR_ERR(skb) == -IXGBEVF_XDP_TX) {
>  				xdp_xmit = true;
> -				ixgbevf_rx_buffer_flip(rx_ring, rx_buffer,
> +
> +				if (!(xdp.flags & XDP_DATA_RELEASED))
> +					ixgbevf_rx_buffer_flip(rx_ring, rx_buffer,
>  						       size);
>  			} else {
>  				rx_buffer->pagecnt_bias++;
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 609f819ed08b..6b32a01ade19 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -47,6 +47,8 @@ enum xdp_mem_type {
>  #define XDP_XMIT_FLUSH		(1U << 0)	/* doorbell signal consumer */
>  #define XDP_XMIT_FLAGS_MASK	XDP_XMIT_FLUSH
>
> +#define XDP_DATA_RELEASED	(1U << 0)
> +
>  struct xdp_mem_info {
>  	u32 type; /* enum xdp_mem_type, but known size type */
>  	u32 id;
> @@ -73,6 +75,7 @@ struct xdp_buff {
>  	struct xdp_rxq_info *rxq;
>  	struct xdp_txq_info *txq;
>  	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
> +	u32 flags;

RongQing,

Sorry that I was not clear enough. Could you please submit the simple
patch you had, the one that only tests for the memory type.

	if (xdp->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL)
		i40e_rx_buffer_flip(rx_ring, rx_buffer, size);

I do not think that adding a flags field in the xdp_mem_info to fix an
Intel driver problem will be hugely popular. The struct is also meant
to contain long lived information, not things that will frequently
change.

Thank you: Magnus

>  };
>
>  /* Reserve memory area at end-of data area.
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index b6c0f08bd80d..2c4c5c16660b 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -172,8 +172,10 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len,
>  		xsk_buff_free(xsk_xdp);
>  		return err;
>  	}
> -	if (explicit_free)
> +	if (explicit_free) {
>  		xdp_return_buff(xdp);
> +		xdp->flags |= XDP_DATA_RELEASED;
> +	}
>  	return 0;
>  }
>
> --
> 2.16.2
* Re: [Intel-wired-lan] [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
From: Li,Rongqing @ 2020-07-21  1:42 UTC
To: Magnus Karlsson
Cc: Network Development, intel-wired-lan, Karlsson, Magnus, Björn Töpel

> -----Original Message-----
> From: Magnus Karlsson [mailto:magnus.karlsson@gmail.com]
> Sent: 2020-07-20 15:21
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: Network Development <netdev@vger.kernel.org>; intel-wired-lan
> <intel-wired-lan@lists.osuosl.org>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; Björn Töpel <bjorn.topel@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx
> buffer for copy mode xdp
>
> On Fri, Jul 17, 2020 at 8:24 AM Li RongQing <lirongqing@baidu.com> wrote:
> >
> > i40e/ixgbe/ixgbevf_rx_buffer_flip in copy mode xdp can lead to data
> > corruption, like the following flow:
> >
> >    1. first skb is not for xsk, and forwarded to another device
> >       or socket queue
> >    2. second skb is for xsk, copy data to xsk memory, and page
> >       of skb->data is released
> >    3. rx_buff is reusable since only first skb is in it, but
> >       *_rx_buffer_flip will make that page_offset is set to
> >       first skb data
> >    4. then reuse rx buffer, first skb which still is living
> >       will be corrupted.
> > [...] e, but known size type */
> >  	u32 id;
> > @@ -73,6 +75,7 @@ struct xdp_buff {
> >  	struct xdp_rxq_info *rxq;
> >  	struct xdp_txq_info *txq;
> >  	u32 frame_sz; /* frame size to deduce data_hard_end/reserved
> > tailroom*/
> > +	u32 flags;
>
> RongQing,
>
> Sorry that I was not clear enough. Could you please submit the simple patch
> you had, the one that only tests for the memory type.
>
> 	if (xdp->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL)
> 		i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
>
> I do not think that adding a flags field in the xdp_mem_info to fix an Intel driver
> problem will be hugely popular. The struct is also meant to contain long lived
> information, not things that will frequently change.
>

Thank you, Magnus.

My original suggestion was wrong; it should be the following:

	if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL)
		i40e_rx_buffer_flip(rx_ring, rx_buffer, size);

But I feel that checking only mem.type is not enough. The check must
also ensure that map_type is BPF_MAP_TYPE_XSKMAP, but the map type is
not exposed.

With another map type, such as BPF_MAP_TYPE_DEVMAP, and mem.type being
MEM_TYPE_PAGE_SHARED, not flipping the rx buffer will cause data
corruption.

-Li
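
The concern in the reply above is that two redirect cases share the same memory type but need opposite answers. A minimal sketch of that argument, under stated assumptions: the enums and `model_case` table below are hypothetical user-space stand-ins rather than kernel definitions, and the "must flip" column encodes the claims made in this thread (copy-mode XSKMAP redirect releases the data, DEVMAP redirect does not).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel enums carry more values. */
enum model_mem_type { MODEL_MEM_PAGE_SHARED, MODEL_MEM_XSK_BUFF_POOL };
enum model_map_type { MODEL_MAP_DEVMAP, MODEL_MAP_XSKMAP };

struct model_case {
	enum model_mem_type mem;
	enum model_map_type map;
	bool must_flip;		/* answer the driver needs, per the thread */
};

/* Both cases use ordinary page memory but need different answers. */
static const struct model_case model_cases[] = {
	{ MODEL_MEM_PAGE_SHARED, MODEL_MAP_DEVMAP, true },
	{ MODEL_MEM_PAGE_SHARED, MODEL_MAP_XSKMAP, false },
};

/*
 * Returns true if some predicate that looks only at the memory type
 * could give the required answer for every case in the table.
 */
static bool model_mem_type_alone_suffices(const struct model_case *c, size_t n)
{
	assert(c != NULL);

	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (c[i].mem == c[j].mem &&
			    c[i].must_flip != c[j].must_flip)
				return false;
	return true;
}
```

With both rows in the table the function reports that no memory-type-only check can be right, which is why the reply asks for the map type as a second input.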
* Re: [Intel-wired-lan] [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
From: Li,Rongqing @ 2020-07-21  7:49 UTC
To: Magnus Karlsson
Cc: Network Development, intel-wired-lan, Karlsson, Magnus, Björn Töpel

> -----Original Message-----
> From: Li,Rongqing
> Sent: 2020-07-21 9:43
> To: 'Magnus Karlsson' <magnus.karlsson@gmail.com>
> Cc: Network Development <netdev@vger.kernel.org>; intel-wired-lan
> <intel-wired-lan@lists.osuosl.org>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; Björn Töpel <bjorn.topel@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx
> buffer for copy mode xdp
>
> > -----Original Message-----
> > From: Magnus Karlsson [mailto:magnus.karlsson@gmail.com]
> > Sent: 2020-07-20 15:21
> > To: Li,Rongqing <lirongqing@baidu.com>
> > Cc: Network Development <netdev@vger.kernel.org>; intel-wired-lan
> > <intel-wired-lan@lists.osuosl.org>; Karlsson, Magnus
> > <magnus.karlsson@intel.com>; Björn Töpel <bjorn.topel@intel.com>
> > Subject: Re: [Intel-wired-lan] [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not
> > flip rx buffer for copy mode xdp
> >
> > On Fri, Jul 17, 2020 at 8:24 AM Li RongQing <lirongqing@baidu.com> wrote:
> > >
> > > i40e/ixgbe/ixgbevf_rx_buffer_flip in copy mode xdp can lead to data
> > > corruption, like the following flow:
> > >
> > >    1. first skb is not for xsk, and forwarded to another device
> > >       or socket queue
> > >    2. second skb is for xsk, copy data to xsk memory, and page
> > >       of skb->data is released
> > >    3. rx_buff is reusable since only first skb is in it, but
> > >       *_rx_buffer_flip will make that page_offset is set to
> > >       first skb data
> > >    4. then reuse rx buffer, first skb which still is living
> > >       will be corrupted.
> > > [...] e, but known size type */
> > >  	u32 id;
> > > @@ -73,6 +75,7 @@ struct xdp_buff {
> > >  	struct xdp_rxq_info *rxq;
> > >  	struct xdp_txq_info *txq;
> > >  	u32 frame_sz; /* frame size to deduce data_hard_end/reserved
> > > tailroom*/
> > > +	u32 flags;
> >
> > RongQing,
> >
> > Sorry that I was not clear enough. Could you please submit the simple
> > patch you had, the one that only tests for the memory type.
> >
> > 	if (xdp->rxq->mem.type != MEM_TYPE_XSK_BUFF_POOL)
> > 		i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
> >
> > I do not think that adding a flags field in the xdp_mem_info to fix an
> > Intel driver problem will be hugely popular. The struct is also meant
> > to contain long lived information, not things that will frequently change.
> >
>
> Thank you Magnus
>
> My original suggestion is wrong , it should be following
>
> 	if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL)
> 		i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
>
> but I feel it is not enough to only check mem.type, it must ensure that
> map_type is BPF_MAP_TYPE_XSKMAP ? but it is not expose.
>
> other maptype, like BPF_MAP_TYPE_DEVMAP, and if mem.type is
> MEM_TYPE_PAGE_SHARED, not flip the rx buffer, will cause data corruption.
>
> -Li

How about this?

--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2394,7 +2394,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 
 			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
 				xdp_xmit |= xdp_res;
-				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
+
+				if (xdp.rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL ||
+				    xdp_get_map_type() != BPF_MAP_TYPE_XSKMAP)
+					i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
 			}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 259377723603..94f4435a77f3 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -919,6 +919,17 @@ static inline void xdp_clear_return_frame_no_direct(void)
 	ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT;
 }
 
+static enum bpf_map_type xdp_get_map_type(void)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct bpf_map *map = READ_ONCE(ri->map);
+
+	if (map)
+		return map->map_type;
+	else
+		return BPF_MAP_TYPE_UNSPEC;
+}
+
 static inline int xdp_ok_fwd_dev(const struct net_device *fwd,
 				 unsigned int pktlen)
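
The condition proposed in the diff above can be read as a truth table over two inputs. The sketch below restates it in user space so the cases are easy to enumerate. It is an illustration only: the `model_*` enums and functions are hypothetical stand-ins for the kernel's `enum xdp_mem_type` and `enum bpf_map_type`, and `MODEL_MAP_UNSPEC` is assumed to represent "no redirect map set", as `xdp_get_map_type()` in the proposal returns `BPF_MAP_TYPE_UNSPEC` in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel enums carry more values. */
enum model_mem_type { MODEL_MEM_PAGE_SHARED, MODEL_MEM_XSK_BUFF_POOL };
enum model_map_type { MODEL_MAP_UNSPEC, MODEL_MAP_DEVMAP, MODEL_MAP_XSKMAP };

/*
 * Mirrors the proposed condition: flip unless ordinary page memory
 * was redirected through an XSKMAP (the copy-mode case).
 */
static bool model_should_flip(enum model_mem_type mem, enum model_map_type map)
{
	return mem == MODEL_MEM_XSK_BUFF_POOL || map != MODEL_MAP_XSKMAP;
}

/* Enumerates the cases discussed in the thread. */
static void model_self_check(void)
{
	/* Copy-mode xsk receive: data already released, so no flip. */
	assert(!model_should_flip(MODEL_MEM_PAGE_SHARED, MODEL_MAP_XSKMAP));
	/* Redirect to a device map: the frame still lives in the page. */
	assert(model_should_flip(MODEL_MEM_PAGE_SHARED, MODEL_MAP_DEVMAP));
	/* No map set: treated like any other non-XSKMAP case. */
	assert(model_should_flip(MODEL_MEM_PAGE_SHARED, MODEL_MAP_UNSPEC));
	/* XSK buffer pool memory always flips under this condition. */
	assert(model_should_flip(MODEL_MEM_XSK_BUFF_POOL, MODEL_MAP_XSKMAP));
}
```

Only one of the enumerated combinations skips the flip, which matches the copy-mode flow described in the original commit message.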
* [PATCH 2/2] ice/xdp: not adjust rx buffer for copy mode xdp
From: Li RongQing @ 2020-07-17 6:24 UTC
To: netdev, intel-wired-lan, magnus.karlsson, bjorn.topel

ice_rx_buf_adjust_pg_offset in copy mode xdp can lead to data
corruption, as in the following flow:

   1. The first skb is not for xsk, and is forwarded to another device
      or socket queue.
   2. The second skb is for xsk; its data is copied to xsk memory, and
      the page of skb->data is released.
   3. rx_buff is reusable since only the first skb is in it, but
      ice_rx_buf_adjust_pg_offset sets page_offset to the first skb's
      data.
   4. The rx buffer is then reused, and the first skb, which is still
      alive, is corrupted.

So adjust the rx buffer page offset only when the xdp data has not been
released.

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index abdb137c8bb7..2c58daf4d0d1 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1147,6 +1147,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 			goto construct_skb;
 		}
 
+		xdp.flags = 0;
 		xdp.data = page_address(rx_buf->page) + rx_buf->page_offset;
 		xdp.data_hard_start = xdp.data - ice_rx_offset(rx_ring);
 		xdp.data_meta = xdp.data;
@@ -1169,7 +1170,9 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
 			xdp_xmit |= xdp_res;
-			ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
+
+			if (!(xdp.flags & XDP_DATA_RELEASED))
+				ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
 		} else {
 			rx_buf->pagecnt_bias++;
 		}
-- 
2.16.2
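The flag handshake this patch relies on can be sketched in userspace as follows. This is a minimal model, not driver code: the struct and function names are hypothetical, the bit value chosen for XDP_DATA_RELEASED is an assumption (the patch excerpt does not show its definition), and the half-page flip is assumed to be an XOR of the offset with the buffer size.

```c
#include <assert.h>
#include <stdint.h>

#define XDP_DATA_RELEASED (1u << 0)	/* assumed value, not shown in the patch */

/* Hypothetical, trimmed-down stand-ins for xdp_buff and the ice Rx buffer. */
struct xdp_buff_model {
	uint32_t flags;
};

struct rx_buf_model {
	unsigned int page_offset;
};

/* Copy-mode receive: the data was copied out and the page reference dropped. */
static void xsk_copy_rcv_model(struct xdp_buff_model *xdp)
{
	xdp->flags |= XDP_DATA_RELEASED;
}

/*
 * Driver side after XDP_TX or XDP_REDIRECT: move to the other half of the
 * page only if the frame still occupies the current half.
 */
static void rx_buf_adjust_model(struct rx_buf_model *buf,
				const struct xdp_buff_model *xdp,
				unsigned int truesize)
{
	assert(truesize > 0);

	if (!(xdp->flags & XDP_DATA_RELEASED))
		buf->page_offset ^= truesize;
}
```

In this model a frame that is still in flight moves the offset to the other half, while a frame already released by the copy path leaves the offset where it was, which is the behaviour the commit message asks for.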
* Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
From: Björn Töpel @ 2020-08-18 14:04 UTC
To: Li RongQing
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, Björn Töpel, bpf, Maciej Fijalkowski, Piotr, Maciej

On Fri, 17 Jul 2020 at 08:24, Li RongQing <lirongqing@baidu.com> wrote:
>
> This fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in
> copy mode xdp that can lead to data corruption.
>
> I split two patches, since i40e/xgbe/ixgbevf supports xsk
> receiving from 4.18, put their fixes in a patch
>

Li, sorry for the looong latency. I took a looong vacation. :-P

Thanks for taking a look at this, but I believe this is not a bug.

The Intel Ethernet drivers (obviously non-zerocopy AF_XDP -- "good ol'
XDP") use a page reuse algorithm.

The basic idea is that a page is allocated from the page allocator
(i40e_alloc_mapped_page()). The refcount is increased to USHRT_MAX. The
page is split into two chunks (simplified). If there's one user of the
page, the page can be reused (flipped). If not, a new page needs to be
allocated (with the large refcount).

So, the idea is that usually the page can be reused (flipped), and the
page only needs to be "put" not "get" since the refcount was initially
bumped to a large value.

All frames (except XDP_DROP which can be reused directly) "die" via
page_frag_free() which decreases the page refcount, and frees the page
if the refcount is zero.

Let's take some scenarios as examples:

1. A frame is received in "vanilla" XDP (MEM_TYPE_PAGE_SHARED), and
   the XDP program verdict is XDP_TX. The frame will be placed on the
   HW Tx ring, and freed* (async) in i40e_clean_tx_irq:
        /* free the skb/XDP data */
        if (ring_is_xdp(tx_ring))
                xdp_return_frame(tx_buf->xdpf); // calls page_frag_free()

2. A frame is passed to the stack, eventually it's freed* via
   skb_free_frag().

3. A frame is passed to an AF_XDP socket. The data is copied to the
   socket data area, and the frame is directly freed*.

Note the * by the freed. Actually freeing here means calling
page_frag_free(), which means decreasing the refcount. The page reuse
algorithm makes sure that the buffers are not stale.

The only difference from XDP_TX and XDP_REDIRECT to dev/cpumaps,
compared to AF_XDP sockets is that the latter calls page_frag_free()
directly, whereas the other does it asynchronously from the Tx clean up
phase.

Let me know if it's still not clear, but the bottom line is that none
of these patches are needed.

Thanks!
Björn

> Li RongQing (2):
>   xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
>   ice/xdp: not adjust rx buffer for copy mode xdp
>
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 5 ++++-
>  drivers/net/ethernet/intel/ice/ice_txrx.c         | 5 ++++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 5 ++++-
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 ++++-
>  include/net/xdp.h                                 | 3 +++
>  net/xdp/xsk.c                                     | 4 +++-
>  6 files changed, 22 insertions(+), 5 deletions(-)
>
> --
> 2.16.2
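The page reuse scheme described in this message can be modelled in a few lines of userspace C. This is a simplified sketch, not the driver code: all names are hypothetical, a 4 KiB page split into two 2 KiB halves is assumed, and the "bias" counter (the number of references the driver still owns, decremented each time a half is handed out) is my assumption about how the "put, not get" bookkeeping is tracked.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define HALF_PAGE 2048u	/* assumes a 4 KiB page split into two halves */

struct page_model {
	unsigned int refcount;
};

struct rx_buffer_model {
	struct page_model *page;
	unsigned int page_offset;	/* 0 or HALF_PAGE */
	unsigned int pagecnt_bias;	/* references still owned by the driver */
};

/* Allocate: the refcount is bumped to a large value once, up front. */
static void rx_buffer_init(struct rx_buffer_model *buf, struct page_model *page)
{
	page->refcount = USHRT_MAX;
	buf->page = page;
	buf->page_offset = 0;
	buf->pagecnt_bias = USHRT_MAX;
}

/* Hand the current half to a consumer; the driver gives up one reference. */
static unsigned int rx_buffer_take(struct rx_buffer_model *buf)
{
	assert(buf->pagecnt_bias > 1);
	buf->pagecnt_bias--;
	return buf->page_offset;
}

/* Point the buffer at the other half of the page. */
static void rx_buffer_flip(struct rx_buffer_model *buf)
{
	buf->page_offset ^= HALF_PAGE;
}

/* A consumer is done with its half: the page_frag_free() step. */
static void frame_free(struct page_model *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Reuse is allowed only while at most one reference is held outside the driver. */
static bool rx_buffer_can_reuse(const struct rx_buffer_model *buf)
{
	assert(buf->page->refcount >= buf->pagecnt_bias);
	return buf->page->refcount - buf->pagecnt_bias <= 1;
}
```

In this model, taking one half and flipping leaves the buffer reusable, taking the second half while the first frame is still alive does not, and the buffer becomes reusable again once the frames have been freed.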
* Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
From: Li,Rongqing @ 2020-08-19 1:37 UTC
To: Björn Töpel
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, Björn Töpel, bpf, Maciej Fijalkowski, Piotr, Maciej

> -----Original Message-----
> From: Björn Töpel [mailto:bjorn.topel@gmail.com]
> Sent: August 18, 2020 22:05
> To: Li,Rongqing <lirongqing@baidu.com>
> Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan
> <intel-wired-lan@lists.osuosl.org>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; Björn Töpel <bjorn.topel@intel.com>; bpf
> <bpf@vger.kernel.org>; Maciej Fijalkowski <maciej.fijalkowski@intel.com>;
> Piotr <piotr.raczynski@intel.com>; Maciej <maciej.machnikowski@intel.com>
> Subject: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
>
> On Fri, 17 Jul 2020 at 08:24, Li RongQing <lirongqing@baidu.com> wrote:
> >
> > This fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in copy mode xdp that
> > can lead to data corruption.
> >
> > I split two patches, since i40e/xgbe/ixgbevf supports xsk receiving
> > from 4.18, put their fixes in a patch
> >
>
> Li, sorry for the looong latency. I took a looong vacation. :-P
>
> Thanks for taking a look at this, but I believe this is not a bug.
>
> The Intel Ethernet drivers (obviously non-zerocopy AF_XDP -- "good ol'
> XDP") use a page reuse algorithm.
>
> The basic idea is that a page is allocated from the page allocator
> (i40e_alloc_mapped_page()). The refcount is increased to USHRT_MAX. The
> page is split into two chunks (simplified). If there's one user of the page, the
> page can be reused (flipped). If not, a new page needs to be allocated (with the
> large refcount).
>
> So, the idea is that usually the page can be reused (flipped), and the page only
> needs to be "put" not "get" since the refcount was initially bumped to a large
> value.
>
> All frames (except XDP_DROP which can be reused directly) "die" via
> page_frag_free() which decreases the page refcount, and frees the page if the
> refcount is zero.
>
> Let's take some scenarios as examples:
>
> 1. A frame is received in "vanilla" XDP (MEM_TYPE_PAGE_SHARED), and
>    the XDP program verdict is XDP_TX. The frame will be placed on the
>    HW Tx ring, and freed* (async) in i40e_clean_tx_irq:
>         /* free the skb/XDP data */
>         if (ring_is_xdp(tx_ring))
>                 xdp_return_frame(tx_buf->xdpf); // calls page_frag_free()
>
> 2. A frame is passed to the stack, eventually it's freed* via
>    skb_free_frag().
>
> 3. A frame is passed to an AF_XDP socket. The data is copied to the
>    socket data area, and the frame is directly freed*.
>
> Note the * by the freed. Actually freeing here means calling page_frag_free(),
> which means decreasing the refcount. The page reuse algorithm makes sure
> that the buffers are not stale.
>
> The only difference from XDP_TX and XDP_REDIRECT to dev/cpumaps, compared
> to AF_XDP sockets is that the latter calls page_frag_free() directly, whereas
> the other does it asynchronously from the Tx clean up phase.
>

Hi,

Thanks for your explanation, but we can reproduce this bug.

We use eBPF to redirect only VXLAN packets to non-zerocopy AF_XDP. First
we saw a panic in the TCP stack, in tcp_collapse: BUG_ON(offset < 0); it
is very hard to reproduce.

Then we tested with scp while a large number of VXLAN packets were
arriving at the same time, and scp broke frequently.

With these fixes, scp has not broken again and the kernel has not
panicked again.

Your explanation does not seem to address my analysis:

   1. The first skb is not for xsk, and is forwarded to another device
      or socket queue.
   2. The second skb is for xsk; its data is copied to xsk memory, and
      the page of skb->data is released.
   3. rx_buff is reusable since only the first skb is in it, but
      *_rx_buffer_flip sets page_offset to the first skb's data.
   4. The rx buffer is then reused, and the first skb, which is still
      alive, is corrupted.

The root cause is the difference you described above, so my fix only
covers non-zerocopy AF_XDP.

-Li

> Let me know if it's still not clear, but the bottom line is that none of these
> patches are needed.
>
>
> Thanks!
> Björn
>
>
> > Li RongQing (2):
> >   xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp
> >   ice/xdp: not adjust rx buffer for copy mode xdp
> >
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 5 ++++-
> >  drivers/net/ethernet/intel/ice/ice_txrx.c         | 5 ++++-
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 5 ++++-
> >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 ++++-
> >  include/net/xdp.h                                 | 3 +++
> >  net/xdp/xsk.c                                     | 4 +++-
> >  6 files changed, 22 insertions(+), 5 deletions(-)
> >
> > --
> > 2.16.2
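The four-step sequence in this message can be traced with a small userspace model. It is only a sketch under stated assumptions: the names are hypothetical, a page is assumed to be split into two 2 KiB halves, the driver is assumed to give up one reference per received frame, and the reuse check is assumed to run after the copy-mode path has already dropped its reference. Whether the real drivers behave this way is exactly what the thread is debating; the model only shows what follows if they do.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define HALF 2048u	/* assumes a 4 KiB page split into two halves */

struct rx_state {
	unsigned int refcount;	/* page reference count */
	unsigned int bias;	/* references the driver still holds */
	unsigned int offset;	/* half the NIC will write into next */
	bool live[2];		/* is a frame still alive in half 0 / half 1? */
};

/*
 * Receive one frame into the current half, then flip to the other half.
 * release_now models copy mode: the data is copied out and the reference
 * is dropped before the reuse check runs.
 * Returns true if the buffer is judged reusable.
 */
static bool rx_one(struct rx_state *s, bool release_now)
{
	unsigned int half = s->offset / HALF;

	assert(s->bias > 1);
	s->bias--;
	s->live[half] = true;

	if (release_now) {
		s->refcount--;
		s->live[half] = false;
	}

	s->offset ^= HALF;
	return s->refcount - s->bias <= 1;
}

/* True if the half the NIC would write next still holds a live frame. */
static bool next_write_hits_live_frame(const struct rx_state *s)
{
	return s->live[s->offset / HALF];
}
```

Calling rx_one() once with release_now false and once with it true leaves the buffer judged reusable while the next write targets the half still occupied by the first frame. Calling it twice with release_now false makes the second check fail, so no reuse happens in that case.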
* Re: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
From: Björn Töpel @ 2020-08-19 6:44 UTC
To: Li,Rongqing, Björn Töpel
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, bpf, Maciej Fijalkowski, Piotr, Maciej

On 2020-08-19 03:37, Li,Rongqing wrote:
[...]
> Hi:
>
> Thanks for your explanation.
>
> But we can reproduce this bug
>
> We use ebpf to redirect only-Vxlan packets to non-zerocopy AF_XDP, First we see panic on tcp stack, in tcp_collapse: BUG_ON(offset < 0); it is very hard to reproduce.
>
> Then we use the scp to do test, and has lots of vxlan packet at the same time, scp will be broken frequently.
>

Ok! Just so that I'm certain of your setup. You receive packets to an
i40e netdev where there's an XDP program. The program does XDP_PASS or
XDP_REDIRECT to e.g. devmap for non-vxlan packets. However, vxlan
packets are redirected to AF_XDP socket(s) in *copy-mode*. Am I
understanding that correctly?

I'm assuming this is an x86-64 with 4k page size, right? :-) The page
flipping is a bit different if the PAGE_SIZE is not 4k.

> With this fixes, scp has not been broken again, and kernel is not panic again
>

Let's dig into your scenario.

Are you saying the following:

Page A:
+------------
| "first skb" ----> Rx HW ring entry X
+------------
| "second skb"----> Rx HW ring entry X+1 (or X+n)
+------------

This is a scenario that shouldn't be allowed, because there are now
two users of the page. If that's the case, the refcounting is
broken. Is that the case?

Check out i40e_can_reuse_rx_page(). The idea with page flipping/reuse
is that the page is only reused if there is only one user.

> Seem your explanation is unable to solve my analysis:
>
>         1. first skb is not for xsk, and forwarded to another device
>            or socket queue

The data for the "first skb" resides on a page:
A:
+------------
| "first skb"
+------------
| to be reused
+------------
refcount >>1

>         2. seconds skb is for xsk, copy data to xsk memory, and page
>            of skb->data is released

Note that page B != page A.

B:
+------------
| to be reused/or used by the stack
+------------
| "second skb for xsk"
+------------
refcount >>1

The data is copied to the socket, page_frag_free() is called, and the
page count is decreased. The driver will then check if the page can be
reused. If not, it's freed to the page allocator.

>         3. rx_buff is reusable since only first skb is in it, but
>            *_rx_buffer_flip will make that page_offset is set to
>            first skb data

I'm having trouble grasping how this is possible. More than one user
implies that it won't be reused. If this is possible, the
refcounting/reuse mechanism is broken, and that is what should be
fixed.

The AF_XDP redirect should not have semantics different from, say,
devmap redirect. It's just that the page_frag_free() is called earlier
for AF_XDP, instead of from i40e_clean_tx_irq() as is the case for
devmap/XDP_TX.

>         4. then reuse rx buffer, first skb which still is living
>            will be corrupted.
>
>
> The root cause is difference you said upper, so I only fixes for non-zerocopy AF_XDP
>

I have only addressed non-zerocopy, so we're on the same page (pun
intended) here!


Björn

> -Li
* Re: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
  2020-08-19  6:44 ` Björn Töpel
@ 2020-08-19  8:17 ` Li,Rongqing
From: Li,Rongqing @ 2020-08-19 8:17 UTC
To: Björn Töpel, Björn Töpel
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, bpf, Maciej Fijalkowski, Piotr, Maciej

> -----Original Message-----
> From: Björn Töpel [mailto:bjorn.topel@intel.com]
> Sent: 19 August 2020 14:45
> To: Li,Rongqing <lirongqing@baidu.com>; Björn Töpel
> <bjorn.topel@gmail.com>
> Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan
> <intel-wired-lan@lists.osuosl.org>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; bpf <bpf@vger.kernel.org>; Maciej Fijalkowski
> <maciej.fijalkowski@intel.com>; Piotr <piotr.raczynski@intel.com>; Maciej
> <maciej.machnikowski@intel.com>
> Subject: Re: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
>
> On 2020-08-19 03:37, Li,Rongqing wrote:
> [...]
> > Hi:
> >
> > Thanks for your explanation.
> >
> > But we can reproduce this bug
> >
> > We use ebpf to redirect only-Vxlan packets to non-zerocopy AF_XDP, First we
> > see panic on tcp stack, in tcp_collapse: BUG_ON(offset < 0); it is very hard to
> > reproduce.
> >
> > Then we use the scp to do test, and has lots of vxlan packet at the same
> > time, scp will be broken frequently.
> >
>
> Ok! Just so that I'm certain of your setup. You receive packets to an i40e netdev
> where there's an XDP program. The program does XDP_PASS or XDP_REDIRECT
> to e.g. devmap for non-vxlan packets. However, vxlan packets are redirected to
> AF_XDP socket(s) in *copy-mode*. Am I understanding that correct?
>

Similar to your description, but the XDP program only redirects vxlan
packets to the AF_XDP socket. All other packets, such as scp/ssh
traffic, go to the Linux kernel networking stack.

> I'm assuming this is an x86-64 with 4k page size, right? :-) The page flipping is a
> bit different if the PAGE_SIZE is not 4k.
>

We use 4k page size, page flipping is 4k. We did not change the i40e
driver; the kernel is 4.19 stable.

> > With this fixes, scp has not been broken again, and kernel is not panic
> > again
> >
>
> Let's dig into your scenario.
>
> Are you saying the following:
>
> Page A:
> +------------
> | "first skb" ----> Rx HW ring entry X
> +------------
> | "second skb"----> Rx HW ring entry X+1 (or X+n)
> +------------
>

Like this:

The first skb goes into the TCP socket rx queue.

The second skb is a vxlan packet; it is copied to the AF_XDP socket and
released.

> This is a scenario that shouldn't be allowed, because there are now two users
> of the page. If that's the case, the refcounting is broken. Is that the case?
>

True, it is broken for copy mode xsk

-Li

> Check out i40e_can_reuse_rx_page(). The idea with page flipping/reuse is that
> the page is only reused if there is only one user.
>
> > Seem your explanation is unable to solve my analysis:
> >
> > 1. first skb is not for xsk, and forwarded to another device
> >    or socket queue
>
> The data for the "first skb" resides on a page:
> A:
> +------------
> | "first skb"
> +------------
> | to be reused
> +------------
> refcount >>1
>
> > 2. seconds skb is for xsk, copy data to xsk memory, and page
> >    of skb->data is released
>
> Note that page B != page A.
>
> B:
> +------------
> | to be reused/or used by the stack
> +------------
> | "second skb for xsk"
> +------------
> refcount >>1
>
> data is copied to socket, page_frag_free() is called, and the page count is
> decreased. The driver will then check if the page can be reused. If not, it's freed
> to the page allocator.
>
> > 3. rx_buff is reusable since only first skb is in it, but
> >    *_rx_buffer_flip will make that page_offset is set to
> >    first skb data
>
> I'm having trouble grasping how this is possible. More than one user implies
> that it won't be reused. If this is possible, the refcounting/reuse mechanism is
> broken, and that is what should be fixed.
>
> The AF_XDP redirect should not have semantics different from, say, devmap
> redirect. It's just that the page_frag_free() is called earlier for AF_XDP, instead
> of from i40e_clean_tx_irq() as is the case for devmap/XDP_TX.
>
> > 4. then reuse rx buffer, first skb which still is living
> >    will be corrupted.
> >
> >
> > The root cause is difference you said upper, so I only fixes for non-zerocopy
> > AF_XDP
> >
> I have only addressed non-zerocopy, so we're on the same page (pun
> intended) here!
>
>
> Björn
>
> > -Li
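The disagreement above turns on when a packet's page reference is dropped relative to the driver's reuse check. The following is a toy userspace model of that accounting, not the i40e code: the struct, the function names, the refill size and the reuse rule (at most one reference outside the driver's reserve) are all assumptions made for illustration. It only shows that, under this assumed rule, freeing the second packet before the check makes the check pass while the first skb is still alive. Whether the real driver behaves this way is exactly what the thread is trying to establish.

```c
#include <assert.h>
#include <stdbool.h>

#define HALF_PAGE  2048u /* assumed: a 4k page split into two halves */
#define BIAS_BATCH   64u /* hypothetical refill size for the driver's reserve */

/* Toy stand-in for one Rx buffer; not the driver's struct. */
struct rx_buffer_model {
	unsigned int page_count;   /* all references on the page */
	unsigned int pagecnt_bias; /* references the driver keeps in reserve */
	unsigned int page_offset;  /* half the hardware writes to next */
};

void model_init(struct rx_buffer_model *b)
{
	b->page_count = 1;
	b->pagecnt_bias = 1;
	b->page_offset = 0;
}

/* Driver picks up the buffer: one reserved reference now belongs to the packet. */
void model_get(struct rx_buffer_model *b)
{
	assert(b->pagecnt_bias > 0);
	b->pagecnt_bias--;
}

/* The packet's consumer drops its reference (think page_frag_free()). */
void model_packet_free(struct rx_buffer_model *b)
{
	assert(b->page_count > 0);
	b->page_count--;
}

/* Assumed reuse rule: at most one reference outside the reserve. */
bool model_can_reuse(const struct rx_buffer_model *b)
{
	return b->page_count - b->pagecnt_bias <= 1;
}

/* Point at the other half and refill the reserve once it is exhausted. */
void model_flip(struct rx_buffer_model *b)
{
	b->page_offset ^= HALF_PAGE;
	if (b->pagecnt_bias == 0) {
		b->page_count += BIAS_BATCH;
		b->pagecnt_bias = BIAS_BATCH;
	}
}

/*
 * Returns true if, after the second packet, the buffer is recycled onto
 * the half that the still-live first skb occupies.
 */
bool model_first_skb_overwritten(bool second_freed_early)
{
	struct rx_buffer_model b;
	unsigned int first_skb_offset;

	model_init(&b);

	/* Packet 1: handed to the stack and kept alive. */
	model_get(&b);
	first_skb_offset = b.page_offset;
	if (!model_can_reuse(&b))
		return false;
	model_flip(&b);

	/* Packet 2: lands in the other half of the same page. */
	model_get(&b);
	if (second_freed_early)
		model_packet_free(&b); /* copy mode: data copied, reference dropped */
	if (!model_can_reuse(&b))
		return false;          /* page is given up, nothing is reused */
	model_flip(&b);

	return b.page_offset == first_skb_offset;
}
```

In this model, calling model_first_skb_overwritten() with false corresponds to a consumer that still holds its reference when the check runs, and with true to the copy-mode case in steps 1 to 4 of the patch description.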
* Re: Re: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
  2020-08-19  8:17 ` Li,Rongqing
@ 2020-08-19  8:31 ` Björn Töpel
From: Björn Töpel @ 2020-08-19 8:31 UTC
To: Li,Rongqing, Björn Töpel
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, bpf, Maciej Fijalkowski, Piotr, Maciej

On 2020-08-19 10:17, Li,Rongqing wrote:
>
>
>> -----Original Message-----
>> From: Björn Töpel [mailto:bjorn.topel@intel.com]
>> Sent: 19 August 2020 14:45
>> To: Li,Rongqing <lirongqing@baidu.com>; Björn Töpel
>> <bjorn.topel@gmail.com>
>> Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan
>> <intel-wired-lan@lists.osuosl.org>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>; bpf <bpf@vger.kernel.org>; Maciej Fijalkowski
>> <maciej.fijalkowski@intel.com>; Piotr <piotr.raczynski@intel.com>; Maciej
>> <maciej.machnikowski@intel.com>
>> Subject: Re: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
>>
>> On 2020-08-19 03:37, Li,Rongqing wrote:
>> [...]
>>  > Hi:
>>  >
>>  > Thanks for your explanation.
>>  >
>>  > But we can reproduce this bug
>>  >
>>  > We use ebpf to redirect only-Vxlan packets to non-zerocopy AF_XDP, First we
>>  > see panic on tcp stack, in tcp_collapse: BUG_ON(offset < 0); it is very hard to
>>  > reproduce.
>>  >
>>  > Then we use the scp to do test, and has lots of vxlan packet at the same
>>  > time, scp will be broken frequently.
>>  >
>>
>> Ok! Just so that I'm certain of your setup. You receive packets to an i40e netdev
>> where there's an XDP program. The program does XDP_PASS or XDP_REDIRECT
>> to e.g. devmap for non-vxlan packets. However, vxlan packets are redirected to
>> AF_XDP socket(s) in *copy-mode*. Am I understanding that correct?
>>
> Similar to your description, but the XDP program only redirects vxlan
> packets to the AF_XDP socket. All other packets, such as scp/ssh
> traffic, go to the Linux kernel networking stack.
>
>
>> I'm assuming this is an x86-64 with 4k page size, right? :-) The page flipping is a
>> bit different if the PAGE_SIZE is not 4k.
>>
>
> We use 4k page size, page flipping is 4k. We did not change the i40e
> driver; the kernel is 4.19 stable.
>

Would you mind testing on a newer kernel? Say the latest stable 5.8.2?

>>  > With this fixes, scp has not been broken again, and kernel is not panic
>>  > again
>>  >
>>
>> Let's dig into your scenario.
>>
>> Are you saying the following:
>>
>> Page A:
>> +------------
>> | "first skb" ----> Rx HW ring entry X
>> +------------
>> | "second skb"----> Rx HW ring entry X+1 (or X+n)
>> +------------
>>
>
> Like this:
>
> The first skb goes into the TCP socket rx queue.
>
> The second skb is a vxlan packet; it is copied to the AF_XDP socket and
> released.
>
>> This is a scenario that shouldn't be allowed, because there are now two users
>> of the page. If that's the case, the refcounting is broken. Is that the case?
>>
>
> True, it is broken for copy mode xsk
>

Ok. However, the fix is not avoiding the page_frag_free, but finding and
fixing the refcount bug.

I'll have a deeper look. But please, try to reproduce with a newer
kernel.


Thanks,
Björn
* Re: Re: Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
  2020-08-19  8:31 ` Björn Töpel
@ 2020-08-19  8:52 ` Björn Töpel
From: Björn Töpel @ 2020-08-19 8:52 UTC
To: Li,Rongqing, Björn Töpel
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, bpf, Maciej Fijalkowski, Piotr, Maciej

On 2020-08-19 10:31, Björn Töpel wrote:
[...]
>
> But please, try to reproduce with a newer kernel.
>

Also, you are *sure* that you're touching stale data? Have you tried
running with CONFIG_DEBUG_PAGEALLOC and CONFIG_PAGE_POISONING?


Björn
* Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
  2020-08-18 14:04 ` Björn Töpel
@ 2020-08-20 15:13 ` Björn Töpel
From: Björn Töpel @ 2020-08-20 15:13 UTC
To: Li RongQing
Cc: Netdev, intel-wired-lan, Karlsson, Magnus, Björn Töpel, bpf, Maciej Fijalkowski, Piotr, Maciej

On Tue, 18 Aug 2020 at 16:04, Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> On Fri, 17 Jul 2020 at 08:24, Li RongQing <lirongqing@baidu.com> wrote:
> >
> > This fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in
> > copy mode xdp that can lead to data corruption.
> >
> > I split two patches, since i40e/xgbe/ixgbevf supports xsk
> > receiving from 4.18, put their fixes in a patch
> >
>
> Li, sorry for the looong latency. I took a looong vacation. :-P
>
> Thanks for taking a look at this, but I believe this is not a bug.
>

Ok, dug a bit more into this. I had an offlist discussion with Li, and
there are two places (AFAIK) where Li experiences a BUG() in
tcp_collapse():

    BUG_ON(offset < 0);
and
    if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
        BUG();

(Li, please correct me if I'm wrong.)

I still claim that the page-flipping mechanism is correct, but I found
some weirdness in the build_skb() call.

In drivers/net/ethernet/intel/i40e/i40e_txrx.c, build_skb() is invoked as:
    skb = build_skb(xdp->data_hard_start, truesize);

For the setup Li has, truesize is 2048 (half a page), but the
rx_buf_len is 1536. In the driver a packet is laid out as:

| padding 192 | packet data 1536 | skb shared info 320 |

build_skb() assumes that the second argument (frag_size) is max packet
size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). In other words,
frag_size should not include the padding (192 above). In build_skb(),
frag_size is used to compute the skb truesize and skb end. i40e passes
a too large buffer, and can therefore potentially corrupt the skb, and
maybe this is the reason for tcp_collapse() splatting.

Li, could you test if you get the splat with this patch:

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3e5c566ceb01..acfb4ad9b506 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2065,7 +2065,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 {
 	unsigned int metasize = xdp->data - xdp->data_meta;
 #if (PAGE_SIZE < 8192)
-	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = rx_ring->rx_buf_len +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 #else
 	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
 				SKB_DATA_ALIGN(xdp->data_end -

I'll have a look in the other Intel drivers, and see if there are
similar issues. I'll cook a patch.


Björn
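The sizes quoted in the message above can be checked with a few lines of C. This is only arithmetic on the figures given in the message (192, 1536, 320, and a half page of 2048); the macro and function names are made up for the sketch and are not kernel identifiers.

```c
#include <assert.h>

/* Figures quoted in the message; treated here as plain inputs. */
#define HALF_PAGE      2048u /* truesize currently passed: half of a 4k page */
#define RX_PADDING      192u /* padding in front of the packet data */
#define RX_BUF_LEN     1536u /* rx_buf_len */
#define SHINFO_ALIGNED  320u /* "skb shared info" figure from the layout */

/* Sum of the three regions in the layout diagram. */
unsigned int layout_total(void)
{
	return RX_PADDING + RX_BUF_LEN + SHINFO_ALIGNED;
}

/* truesize as the proposed test patch would compute it. */
unsigned int truesize_proposed(void)
{
	return RX_BUF_LEN + SHINFO_ALIGNED;
}

/* Bytes of the half page that the proposed value no longer covers. */
unsigned int truesize_shortfall(void)
{
	assert(HALF_PAGE >= truesize_proposed());
	return HALF_PAGE - truesize_proposed();
}
```

With these inputs the three regions add up to exactly the half page, and the proposed value is smaller than the half page by the size of the padding.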
* Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
  2020-08-20 15:13 ` Björn Töpel
@ 2020-08-20 16:51 ` Maciej Fijalkowski
From: Maciej Fijalkowski @ 2020-08-20 16:51 UTC
To: Björn Töpel
Cc: Li RongQing, Netdev, intel-wired-lan, Karlsson, Magnus, Björn Töpel, bpf, Piotr, Maciej

On Thu, Aug 20, 2020 at 05:13:16PM +0200, Björn Töpel wrote:
> On Tue, 18 Aug 2020 at 16:04, Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
> > On Fri, 17 Jul 2020 at 08:24, Li RongQing <lirongqing@baidu.com> wrote:
> > >
> > > This fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in
> > > copy mode xdp that can lead to data corruption.
> > >
> > > I split two patches, since i40e/xgbe/ixgbevf supports xsk
> > > receiving from 4.18, put their fixes in a patch
> > >
> >
> > Li, sorry for the looong latency. I took a looong vacation. :-P
> >
> > Thanks for taking a look at this, but I believe this is not a bug.
> >
>
> Ok, dug a bit more into this. I had an offlist discussion with Li, and
> there are two places (AFAIK) where Li experiences a BUG() in
> tcp_collapse():
>
>     BUG_ON(offset < 0);
> and
>     if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
>         BUG();
>
> (Li, please correct me if I'm wrong.)
>
> I still claim that the page-flipping mechanism is correct, but I found
> some weirdness in the build_skb() call.
>
> In drivers/net/ethernet/intel/i40e/i40e_txrx.c, build_skb() is invoked as:
>     skb = build_skb(xdp->data_hard_start, truesize);
>
> For the setup Li has, truesize is 2048 (half a page), but the
> rx_buf_len is 1536. In the driver a packet is laid out as:
>
> | padding 192 | packet data 1536 | skb shared info 320 |
>
> build_skb() assumes that the second argument (frag_size) is max packet
> size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). In other words,
> frag_size should not include the padding (192 above). In build_skb(),

Not sure I am buying that reasoning. It assumes the padding + packet_data
and we use skb_reserve() to tell the skb about the padding.

__build_skb_around() subtracts sizeof(struct skb_shared_info) from size
that we are providing, so now we are left with padding + packet_data.
Then it is used to calculate the skb->end.

Back to i40e_build_skb(), we use the skb_reserve() to advance the
skb->data and skb->tail so that they point to packet_data. Finally
__skb_put() will move the skb->tail to the end of packet_data.

Wouldn't your approach disallow having the headroom at all in the linear
part of skb?

> frag_size is used to compute the skb truesize and skb end. i40e passes

IMHO skb->end is correct. For skb->truesize I would assume that the
headroom should also be taken into account for tracking how many bytes a
particular skb consumes, no?

> a too large buffer, and can therefore potentially corrupt the skb, and
> maybe this is the reason for tcp_collapse() splatting.
>
> Li, could you test if you get the splat with this patch:
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 3e5c566ceb01..acfb4ad9b506 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2065,7 +2065,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>  {
>  	unsigned int metasize = xdp->data - xdp->data_meta;
>  #if (PAGE_SIZE < 8192)
> -	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
> +	unsigned int truesize = rx_ring->rx_buf_len +
> +		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

This will actually break the page flipping scheme. We need a separate
variable for that and use the old truesize to bump the page_offset.

>  #else
>  	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
>  				SKB_DATA_ALIGN(xdp->data_end -
>
> I'll have a look in the other Intel drivers, and see if there are
> similar issues. I'll cook a patch.
>
>
> Björn
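The pointer bookkeeping described in the reply above (build, reserve, put) can be followed with offsets instead of real pointers. This is a simplified sketch, not kernel code: the struct and the model_* helpers are hypothetical, offsets are relative to the buffer start, and the 320-byte shared-info figure is taken from the earlier message.

```c
#include <assert.h>

#define SHINFO_ALIGNED 320u /* shared-info size quoted earlier in the thread */

/* Offsets from the start of the buffer, standing in for skb pointers. */
struct skb_offsets {
	unsigned int head;
	unsigned int data;
	unsigned int tail;
	unsigned int end;
};

/* Model of the build step: the end offset is frag_size minus the shared info. */
void model_build_skb(struct skb_offsets *skb, unsigned int frag_size)
{
	assert(frag_size > SHINFO_ALIGNED);
	skb->head = 0;
	skb->data = 0;
	skb->tail = 0;
	skb->end = frag_size - SHINFO_ALIGNED;
}

/* Model of the reserve step: data and tail both move past the headroom. */
void model_skb_reserve(struct skb_offsets *skb, unsigned int len)
{
	skb->data += len;
	skb->tail += len;
}

/* Model of the put step: tail moves to the end of the packet data. */
void model_skb_put(struct skb_offsets *skb, unsigned int len)
{
	assert(skb->tail + len <= skb->end);
	skb->tail += len;
}

/* Room left for packet data in the linear area after reserving headroom. */
unsigned int linear_room(unsigned int frag_size, unsigned int headroom)
{
	struct skb_offsets skb;

	model_build_skb(&skb, frag_size);
	model_skb_reserve(&skb, headroom);
	return skb.end - skb.tail;
}

/* Tail offset after build, reserve and put for a packet of pkt_len bytes. */
unsigned int tail_after_packet(unsigned int frag_size, unsigned int headroom,
			       unsigned int pkt_len)
{
	struct skb_offsets skb;

	model_build_skb(&skb, frag_size);
	model_skb_reserve(&skb, headroom);
	model_skb_put(&skb, pkt_len);
	return skb.tail;
}
```

With frag_size 2048 and 192 bytes of headroom, the room that remains equals the 1536-byte rx_buf_len. With the 1856 value from the test patch, the same headroom leaves less than a full buffer, which is the concern the reply raises.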
* Re: [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
  2020-08-20 16:51 ` Maciej Fijalkowski
@ 2020-08-20 18:04 ` Björn Töpel
From: Björn Töpel @ 2020-08-20 18:04 UTC
To: Maciej Fijalkowski, Björn Töpel
Cc: Li RongQing, Netdev, intel-wired-lan, Karlsson, Magnus, bpf, Piotr, Maciej

On 2020-08-20 18:51, Maciej Fijalkowski wrote:
> On Thu, Aug 20, 2020 at 05:13:16PM +0200, Björn Töpel wrote:
>> On Tue, 18 Aug 2020 at 16:04, Björn Töpel <bjorn.topel@gmail.com> wrote:
>>>
>>> On Fri, 17 Jul 2020 at 08:24, Li RongQing <lirongqing@baidu.com> wrote:
>>>>
>>>> This fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in
>>>> copy mode xdp that can lead to data corruption.
>>>>
>>>> I split two patches, since i40e/xgbe/ixgbevf supports xsk
>>>> receiving from 4.18, put their fixes in a patch
>>>>
>>>
>>> Li, sorry for the looong latency. I took a looong vacation. :-P
>>>
>>> Thanks for taking a look at this, but I believe this is not a bug.
>>>
>>
>> Ok, dug a bit more into this. I had an offlist discussion with Li, and
>> there are two places (AFAIK) where Li experiences a BUG() in
>> tcp_collapse():
>>
>>      BUG_ON(offset < 0);
>> and
>>      if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
>>          BUG();
>>
>> (Li, please correct me if I'm wrong.)
>>
>> I still claim that the page-flipping mechanism is correct, but I found
>> some weirdness in the build_skb() call.
>>
>> In drivers/net/ethernet/intel/i40e/i40e_txrx.c, build_skb() is invoked as:
>>      skb = build_skb(xdp->data_hard_start, truesize);
>>
>> For the setup Li has, truesize is 2048 (half a page), but the
>> rx_buf_len is 1536. In the driver a packet is laid out as:
>>
>> | padding 192 | packet data 1536 | skb shared info 320 |
>>
>> build_skb() assumes that the second argument (frag_size) is max packet
>> size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). In other words,
>> frag_size should not include the padding (192 above). In build_skb(),
>
> Not sure I am buying that reasoning. It assumes the padding + packet_data
> and we use skb_reserve() to tell the skb about the padding.
>
> __build_skb_around() subtracts sizeof(struct skb_shared_info) from size
> that we are providing, so now we are left with padding + packet_data.
> Then it is used to calculate the skb->end.
>
> Back to i40e_build_skb(), we use the skb_reserve() to advance the
> skb->data and skb->tail so that they point to packet_data. Finally
> __skb_put() will move the skb->tail to the end of packet_data.
>
> Wouldn't your approach disallow having the headroom at all in the linear
> part of skb?
>

Mea culpa. You're perfectly right, and I'm all wrong. Thanks for
sorting that out.

xdp->data_hard_start messed up my neurons (if anyone should ask).

*climbing back into the cave*


Sorry for the mail noise,
Björn
* [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer
@ 2020-08-20 18:04 ` Björn Töpel
From: Björn Töpel @ 2020-08-20 18:04 UTC
To: intel-wired-lan

On 2020-08-20 18:51, Maciej Fijalkowski wrote:
> On Thu, Aug 20, 2020 at 05:13:16PM +0200, Björn Töpel wrote:
>> On Tue, 18 Aug 2020 at 16:04, Björn Töpel <bjorn.topel@gmail.com> wrote:
>>>
>>> On Fri, 17 Jul 2020 at 08:24, Li RongQing <lirongqing@baidu.com> wrote:
>>>>
>>>> This fixes ice/i40e/ixgbe/ixgbevf_rx_buffer_flip in
>>>> copy mode xdp that can lead to data corruption.
>>>>
>>>> I split two patches, since i40e/ixgbe/ixgbevf supports xsk
>>>> receiving from 4.18, put their fixes in a patch
>>>>
>>>
>>> Li, sorry for the looong latency. I took a looong vacation. :-P
>>>
>>> Thanks for taking a look at this, but I believe this is not a bug.
>>>
>>
>> Ok, dug a bit more into this. I had an offlist discussion with Li, and
>> there are two places (AFAIK) where Li experiences a BUG() in
>> tcp_collapse():
>>
>>     BUG_ON(offset < 0);
>> and
>>     if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
>>         BUG();
>>
>> (Li, please correct me if I'm wrong.)
>>
>> I still claim that the page-flipping mechanism is correct, but I found
>> some weirdness in the build_skb() call.
>>
>> In drivers/net/ethernet/intel/i40e/i40e_txrx.c, build_skb() is invoked as:
>>     skb = build_skb(xdp->data_hard_start, truesize);
>>
>> For the setup Li has, truesize is 2048 (half a page), but the
>> rx_buf_len is 1536. In the driver a packet is laid out as:
>>
>> | padding 192 | packet data 1536 | skb shared info 320 |
>>
>> build_skb() assumes that the second argument (frag_size) is max packet
>> size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). In other words,
>> frag_size should not include the padding (192 above). In build_skb(),
>
> Not sure I am buying that reasoning. It assumes the padding + packet_data
> and we use skb_reserve() to tell the skb about the padding.
>
> __build_skb_around() subtracts sizeof(struct skb_shared_info) from size
> that we are providing, so now we are with padding + packet_data.
> Then it is used to calculate the skb->end.
>
> Back to i40e_build_skb(), we use the skb_reserve() to advance the
> skb->data and skb->tail so that they point to packet_data. Finally
> __skb_put() will move the skb->tail to the end of packet_data.
>
> Wouldn't your approach disallow having the headroom at all in the linear
> part of skb?
>

Mea culpa. You're perfectly right, and I'm all wrong. Thanks for
sorting that out.

xdp->data_hard_start messed up my neurons (if anyone should ask).

*climbing back into the cave*

Sorry for the mail noise,
Björn
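The pointer arithmetic Maciej walks through can be checked with a small user-space sketch. This is not kernel code: `toy_skb`, `toy_build_skb()`, `toy_skb_reserve()`, `toy_skb_put()` and `toy_rx_layout()` are hypothetical stand-ins that only mimic what the thread says build_skb(), skb_reserve() and __skb_put() do, with pointers kept as byte offsets from the buffer start. The sizes (192, 1536, 320, 2048) are the ones quoted above for Li's setup and are assumptions about that configuration, not constants of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in the thread for Li's setup, in bytes (assumed, not universal). */
enum {
    PAD_LEN    = 192,   /* headroom ("padding") in front of the packet */
    RX_BUF_LEN = 1536,  /* maximum packet data */
    SHINFO_LEN = 320,   /* aligned size of skb shared info, as quoted */
    TRUESIZE   = 2048   /* half a 4 KiB page */
};

/* Toy model of the skb pointers, stored as offsets from the buffer start. */
struct toy_skb {
    size_t data;  /* start of valid packet data */
    size_t tail;  /* end of valid packet data */
    size_t end;   /* end of the linear area, where shared info begins */
};

/* Mimics build_skb(): frag_size minus shared info becomes the linear area. */
static struct toy_skb toy_build_skb(size_t frag_size)
{
    struct toy_skb skb = { 0, 0, frag_size - SHINFO_LEN };
    return skb;
}

/* Mimics skb_reserve(): move data and tail past the headroom. */
static void toy_skb_reserve(struct toy_skb *skb, size_t len)
{
    skb->data += len;
    skb->tail += len;
}

/* Mimics __skb_put(): grow the data area by len bytes. */
static void toy_skb_put(struct toy_skb *skb, size_t len)
{
    skb->tail += len;
}

/* Lay out one received packet of pkt_len bytes the way the thread describes. */
static struct toy_skb toy_rx_layout(size_t pkt_len)
{
    struct toy_skb skb = toy_build_skb(TRUESIZE);

    toy_skb_reserve(&skb, PAD_LEN);
    toy_skb_put(&skb, pkt_len);
    assert(skb.tail <= skb.end);  /* packet must fit in the linear area */
    return skb;
}
```

With a full 1536-byte packet, `tail` lands exactly on `end` (192 + 1536 = 2048 - 320), which matches the conclusion of the thread: passing the whole truesize as frag_size leaves room for both the headroom and the packet data.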
Thread overview: 30+ messages
2020-07-17 6:24 [PATCH 0/2] intel/xdp fixes for fliping rx buffer Li RongQing
2020-07-17 6:24 ` [Intel-wired-lan] " Li RongQing
2020-07-17 6:24 ` [PATCH 1/2] xdp: i40e: ixgbe: ixgbevf: not flip rx buffer for copy mode xdp Li RongQing
2020-07-17 6:24 ` [Intel-wired-lan] " Li RongQing
2020-07-20 7:21 ` Magnus Karlsson
2020-07-20 7:21 ` Magnus Karlsson
2020-07-21 1:42 ` Re: " Li,Rongqing
2020-07-21 1:42 ` [Intel-wired-lan] Re: " Li, Rongqing
2020-07-21 7:49 ` Re: [Intel-wired-lan] " Li,Rongqing
2020-07-21 7:49 ` [Intel-wired-lan] Re: " Li, Rongqing
2020-07-17 6:24 ` [PATCH 2/2] ice/xdp: not adjust " Li RongQing
2020-07-17 6:24 ` [Intel-wired-lan] " Li RongQing
2020-08-18 14:04 ` [Intel-wired-lan] [PATCH 0/2] intel/xdp fixes for fliping rx buffer Björn Töpel
2020-08-18 14:04 ` Björn Töpel
2020-08-19 1:37 ` Re: " Li,Rongqing
2020-08-19 1:37 ` [Intel-wired-lan] Re: " Li, Rongqing
2020-08-19 6:44 ` Re: [Intel-wired-lan] " Björn Töpel
2020-08-19 6:44 ` [Intel-wired-lan] Re: " Björn Töpel
2020-08-19 8:17 ` Re: Re: [Intel-wired-lan] " Li,Rongqing
2020-08-19 8:17 ` [Intel-wired-lan] Re: Re: " Li, Rongqing
2020-08-19 8:31 ` Re: Re: [Intel-wired-lan] " Björn Töpel
2020-08-19 8:31 ` [Intel-wired-lan] Re: Re: " Björn Töpel
2020-08-19 8:52 ` Re: Re: [Intel-wired-lan] " Björn Töpel
2020-08-19 8:52 ` [Intel-wired-lan] Re: Re: " Björn Töpel
2020-08-20 15:13 ` [Intel-wired-lan] " Björn Töpel
2020-08-20 15:13 ` Björn Töpel
2020-08-20 16:51 ` Maciej Fijalkowski
2020-08-20 16:51 ` Maciej Fijalkowski
2020-08-20 18:04 ` Björn Töpel
2020-08-20 18:04 ` Björn Töpel