netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v1 00/15] XDP extend with knowledge of frame size
@ 2020-03-17 17:29 Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: thomas.petazzoni, Andy Gospodarek, Jeff Kirsher, Michael Chan,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

Early RFC *before* I finish converting all drivers... so I can get
feedback that might affect the design...

XDP have evolved to support several frame sizes, but xdp_buff was not
updated with this information. This have caused the side-effect that XDP
frame data hard end is not known. It also limited the BPF-helper
bpf_xdp_adjust_tail to only shrink the packet.

This patchset tries to address this and add packet tail extend/grow.

The purpose of the patchset is ALSO to reserve a memory area that can be
used for storing extra information, specifically for extending XDP with
multi-buffer support. One proposal is to use same layout as
skb_shared_info, which is why this area is currently 320 bytes.

---

Jesper Dangaard Brouer (15):
      xdp: add frame size to xdp_buff
      mvneta: add XDP frame size to driver
      bnxt: add XDP frame size to driver
      ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
      ixgbe: add XDP frame size to driver
      sfc: fix XDP-redirect in this driver
      sfc: add XDP frame size
      xdp: allow bpf_xdp_adjust_tail() to grow packet size
      xdp: clear grow memory in bpf_xdp_adjust_tail()
      net: XDP-generic determining XDP frame size
      xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
      xdp: cpumap redirect use frame_sz and increase skb_tailroom
      tun: add XDP frame size
      veth: xdp using frame_sz in veth driver
      dpaa2-eth: add XDP frame size


 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c    |    1 +
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |    1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   17 ++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   17 +++++++-----
 drivers/net/ethernet/marvell/mvneta.c            |    1 +
 drivers/net/ethernet/sfc/efx_common.c            |    9 ++++--
 drivers/net/ethernet/sfc/net_driver.h            |    6 ++++
 drivers/net/ethernet/sfc/rx.c                    |    3 +-
 drivers/net/ethernet/sfc/rx_common.c             |    6 ++--
 drivers/net/tun.c                                |    2 +
 drivers/net/veth.c                               |   15 +++++++++--
 include/net/xdp.h                                |   31 +++++++++++++++++++++-
 include/uapi/linux/bpf.h                         |    4 +--
 kernel/bpf/cpumap.c                              |   21 ++-------------
 net/core/dev.c                                   |   14 ++++++----
 net/core/filter.c                                |   24 ++++++++++++++++-
 net/core/xdp.c                                   |    7 +++++
 17 files changed, 133 insertions(+), 46 deletions(-)

--


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 20:42   ` Jakub Kicinski
  2020-03-17 17:29 ` [PATCH RFC v1 02/15] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

XDP have evolved to support several frame sizes, but xdp_buff was not
updated with this information. The frame size (frame_sz) member of
xdp_buff is introduced to know the real size of the memory the frame is
delivered in.

When introducing this also make it clear that some tailroom is
reserved/required when creating SKBs using build_skb().

It would also have been an option to introduce a pointer to
data_hard_end (with reserved offset). The advantage with frame_sz is
that (like rxq) drivers only need to setup/assign this value once per
NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
store frame_sz inside xdp_rxq_info, because it's varies per packet as it
can be based/depend on packet length.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 40c6d3398458..99f4374f6214 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -6,6 +6,8 @@
 #ifndef __LINUX_NET_XDP_H__
 #define __LINUX_NET_XDP_H__
 
+#include <linux/skbuff.h> /* skb_shared_info */
+
 /**
  * DOC: XDP RX-queue information
  *
@@ -70,8 +72,23 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/
 };
 
+/* Reserve memory area at end-of data area.
+ *
+ * This macro reserves tailroom in the XDP buffer by limiting the
+ * XDP/BPF data access to data_hard_end.  Notice same area (and size)
+ * is used for XDP_PASS, when constructing the SKB via build_skb().
+ */
+#define xdp_data_hard_end(xdp)				\
+	((xdp)->data_hard_start + (xdp)->frame_sz -	\
+	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+/* Like skb_shinfo */
+#define xdp_shinfo(xdp)	((struct skb_shared_info *)(xdp_data_hard_end(xdp)))
+// XXX: Above likely belongs in later patch
+
 struct xdp_frame {
 	void *data;
 	u16 len;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 02/15] mvneta: add XDP frame size to driver
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 03/15] bnxt: " Jesper Dangaard Brouer
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: thomas.petazzoni, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

This marvell driver mvneta uses PAGE_SIZE frames, which makes it
really easy to convert.  It also only updated rxq and now frame_sz
once per NAPI call.

Cc: thomas.petazzoni@bootlin.com
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 1c391f63a26f..e6c6524f63a5 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2310,6 +2310,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(pp->xdp_prog);
 	xdp_buf.rxq = &rxq->xdp_rxq;
+	xdp_buf.frame_sz = PAGE_SIZE;
 
 	/* Fairness NAPI loop */
 	while (rx_proc < budget && rx_proc < rx_todo) {



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 03/15] bnxt: add XDP frame size to driver
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 02/15] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-20 19:22   ` Andy Gospodarek
  2020-03-23 14:18   ` Andy Gospodarek
  2020-03-17 17:29 ` [PATCH RFC v1 04/15] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Michael Chan, Andy Gospodarek, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

This driver uses full PAGE_SIZE pages when XDP is enabled.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index c6f6f2033880..5e3b4a3b69ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = *data_ptr + *len;
 	xdp.rxq = &rxr->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
 	orig_data = xdp.data;
 
 	rcu_read_lock();



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 04/15] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 03/15] bnxt: " Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jeff Kirsher, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

The ixgbe driver have another memory model when compiled on archs with
PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in
two halves, but instead increment rx_buffer->page_offset by truesize of
packet (which include headroom and tailroom for skb_shared_info).

This is done correctly in ixgbe_build_skb(), but in ixgbe_rx_buffer_flip
which is currently only called on XDP_TX and XDP_REDIRECT, it forgets
to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT, for
veth and cpumap.  Fix by adding size of skb_shared_info tailroom.

Maintainers notice: This fix have been queued to Jeff.

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 718931d951bc..ea6834bae04c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2254,7 +2254,8 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 	rx_buffer->page_offset ^= truesize;
 #else
 	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
 				SKB_DATA_ALIGN(size);
 
 	rx_buffer->page_offset += truesize;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 04/15] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-18 20:03   ` Maciej Fijalkowski
  2020-03-17 17:29 ` [PATCH RFC v1 06/15] sfc: fix XDP-redirect in this driver Jesper Dangaard Brouer
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jeff Kirsher, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

The ixgbe driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
normal MTU frame size is 2048 bytes (and headroom 192 bytes).
For PAGE_SIZE larger than 4K, driver advance its rx_buffer->page_offset
with the frame size "truesize".

When driver enable XDP it uses build_skb() which provides the necessary
tailroom for XDP-redirect.

When XDP frame size doesn't depend on RX packet size (4K case), then
xdp.frame_sz can be updated once outside the main NAPI loop.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   17 +++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   18 ++++++++++--------
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 2833e4f041ce..943b643b6ed8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -417,6 +417,23 @@ static inline unsigned int ixgbe_rx_pg_order(struct ixgbe_ring *ring)
 }
 #define ixgbe_rx_pg_size(_ring) (PAGE_SIZE << ixgbe_rx_pg_order(_ring))
 
+static inline unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
+						   unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	/* Notice XDP must use build_skb() mode */
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 #define IXGBE_ITR_ADAPTIVE_MIN_INC	2
 #define IXGBE_ITR_ADAPTIVE_MIN_USECS	10
 #define IXGBE_ITR_ADAPTIVE_MAX_USECS	126
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea6834bae04c..f505ed8c9dc1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2248,16 +2248,10 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 				 struct ixgbe_rx_buffer *rx_buffer,
 				 unsigned int size)
 {
+	unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
 #if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
-
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
-				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2291,6 +2285,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
@@ -2324,7 +2323,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbe_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on size */
+			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 06/15] sfc: fix XDP-redirect in this driver
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 07/15] sfc: add XDP frame size Jesper Dangaard Brouer
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

TODO: Drop as already accepted in net-next as
 commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver")

XDP-redirect is broken in this driver sfc. XDP_REDIRECT requires
tailroom for skb_shared_info when creating an SKB based on the
redirected xdp_frame (both in cpumap and veth).

The fix requires some initial explaining. The driver uses RX page-split
when possible. It reserves the top 64 bytes in the RX-page for storing
dma_addr (struct efx_rx_page_state). It also have the XDP recommended
headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any
tailroom, it can still fit two standard MTU (1500) frames into one page.

The sizeof struct skb_shared_info in 320 bytes. Thus drivers like ixgbe
and i40e, reduce their XDP headroom to 192 bytes, which allows them to
fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048).

The fix is to reduce this drivers headroom to 128 bytes and add the 320
bytes tailroom. This account for reserved top 64 bytes in the page, and
still fit two frame in a page for normal MTUs.

We must never go below 128 bytes of headroom for XDP, as one cacheline
is for xdp_frame area and next cacheline is reserved for metadata area.

Fixes: eb9a36be7f3e ("sfc: perform XDP processing on received packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/sfc/efx_common.c |    9 +++++----
 drivers/net/ethernet/sfc/net_driver.h |    6 ++++++
 drivers/net/ethernet/sfc/rx.c         |    2 +-
 drivers/net/ethernet/sfc/rx_common.c  |    6 +++---
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
index b0d76bc19673..1799ff9a45d9 100644
--- a/drivers/net/ethernet/sfc/efx_common.c
+++ b/drivers/net/ethernet/sfc/efx_common.c
@@ -200,11 +200,11 @@ void efx_link_status_changed(struct efx_nic *efx)
 unsigned int efx_xdp_max_mtu(struct efx_nic *efx)
 {
 	/* The maximum MTU that we can fit in a single page, allowing for
-	 * framing, overhead and XDP headroom.
+	 * framing, overhead and XDP headroom + tailroom.
 	 */
 	int overhead = EFX_MAX_FRAME_LEN(0) + sizeof(struct efx_rx_page_state) +
 		       efx->rx_prefix_size + efx->type->rx_buffer_padding +
-		       efx->rx_ip_align + XDP_PACKET_HEADROOM;
+		       efx->rx_ip_align + EFX_XDP_HEADROOM + EFX_XDP_TAILROOM;
 
 	return PAGE_SIZE - overhead;
 }
@@ -302,8 +302,9 @@ static void efx_start_datapath(struct efx_nic *efx)
 	efx->rx_dma_len = (efx->rx_prefix_size +
 			   EFX_MAX_FRAME_LEN(efx->net_dev->mtu) +
 			   efx->type->rx_buffer_padding);
-	rx_buf_len = (sizeof(struct efx_rx_page_state) + XDP_PACKET_HEADROOM +
-		      efx->rx_ip_align + efx->rx_dma_len);
+	rx_buf_len = (sizeof(struct efx_rx_page_state)   + EFX_XDP_HEADROOM +
+		      efx->rx_ip_align + efx->rx_dma_len + EFX_XDP_TAILROOM);
+
 	if (rx_buf_len <= PAGE_SIZE) {
 		efx->rx_scatter = efx->type->always_rx_scatter;
 		efx->rx_buffer_order = 0;
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 9f9886f222c8..f96b1f9fe119 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -91,6 +91,12 @@
 #define EFX_RX_BUF_ALIGNMENT	4
 #endif
 
+/* Non-standard XDP_PACKET_HEADROOM and tailroom to satisfy XDP_REDIRECT and
+ * still fit two standard MTU size packets into a single 4K page.
+ */
+#define EFX_XDP_HEADROOM	128
+#define EFX_XDP_TAILROOM	SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
+
 /* Forward declare Precision Time Protocol (PTP) support structure. */
 struct efx_ptp_data;
 struct hwtstamp_config;
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index a2042f16babc..260352d97d9d 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -302,7 +302,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	       efx->rx_prefix_size);
 
 	xdp.data = *ehp;
-	xdp.data_hard_start = xdp.data - XDP_PACKET_HEADROOM;
+	xdp.data_hard_start = xdp.data - EFX_XDP_HEADROOM;
 
 	/* No support yet for XDP metadata */
 	xdp_set_data_meta_invalid(&xdp);
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index ee8beb87bdc1..e10c23833515 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -412,10 +412,10 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 			index = rx_queue->added_count & rx_queue->ptr_mask;
 			rx_buf = efx_rx_buffer(rx_queue, index);
 			rx_buf->dma_addr = dma_addr + efx->rx_ip_align +
-					   XDP_PACKET_HEADROOM;
+					   EFX_XDP_HEADROOM;
 			rx_buf->page = page;
 			rx_buf->page_offset = page_offset + efx->rx_ip_align +
-					      XDP_PACKET_HEADROOM;
+					      EFX_XDP_HEADROOM;
 			rx_buf->len = efx->rx_dma_len;
 			rx_buf->flags = 0;
 			++rx_queue->added_count;
@@ -433,7 +433,7 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 void efx_rx_config_page_split(struct efx_nic *efx)
 {
 	efx->rx_page_buf_step = ALIGN(efx->rx_dma_len + efx->rx_ip_align +
-				      XDP_PACKET_HEADROOM,
+				      EFX_XDP_HEADROOM + EFX_XDP_TAILROOM,
 				      EFX_RX_BUF_ALIGNMENT);
 	efx->rx_bufs_per_page = efx->rx_buffer_order ? 1 :
 		((PAGE_SIZE - sizeof(struct efx_rx_page_state)) /



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 07/15] sfc: add XDP frame size
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (5 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 06/15] sfc: fix XDP-redirect in this driver Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 08/15] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

This driver uses RX page-split when possible. It was recently fixed
in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to
add needed tailroom for XDP-redirect.

After the fix efx->rx_page_buf_step is the frame size, with enough
head and tail-room for XDP-redirect.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/sfc/rx.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 260352d97d9d..68c47a8c71df 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -308,6 +308,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + rx_buf->len;
 	xdp.rxq = &rx_queue->xdp_rxq_info;
+	xdp.frame_sz = efx->rx_page_buf_step;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 	rcu_read_unlock();



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 08/15] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (6 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 07/15] sfc: add XDP frame size Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 17:29 ` [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

FIXME: This patch MUST be LAST after all drivers have been updated!!!

Finally, after all drivers have a frame size, allow BPF-helper
bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.

Remember that helper/macro xdp_data_hard_end have reserved some
tailroom.  Thus, this helper makes sure that the BPF-prog don't have
access to this tailroom area.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |    4 ++--
 net/core/filter.c        |   18 ++++++++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 14dc4f9fb3c8..1fa6ab616ec3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1942,8 +1942,8 @@ union bpf_attr {
  * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
- * 		only possible to shrink the packet as of this writing,
- * 		therefore *delta* must be a negative integer.
+ * 		possible to both shrink and grow the packet tail.
+ * 		Shrink done via *delta* being a negative integer.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
diff --git a/net/core/filter.c b/net/core/filter.c
index 4a08c9fb2be7..0ceddee0c678 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3409,12 +3409,26 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
+	void *data_hard_end = xdp_data_hard_end(xdp);
 	void *data_end = xdp->data_end + offset;
 
-	/* only shrinking is allowed for now. */
-	if (unlikely(offset >= 0))
+	/* Notice that xdp_data_hard_end have reserved some tailroom */
+	if (unlikely(data_end >= data_hard_end))
 		return -EINVAL;
 
+	/* DANGER: ALL drivers MUST be converted to init xdp->frame_sz
+	 * - Adding some chicken checks below
+	 * - Will (likely) not be for upstream
+	 */
+	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
+		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
+		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (7 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 08/15] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 20:38   ` Jakub Kicinski
  2020-03-18  9:15   ` Toke Høiland-Jørgensen
  2020-03-17 17:29 ` [PATCH RFC v1 10/15] net: XDP-generic determining XDP frame size Jesper Dangaard Brouer
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

To reviewers: Need some opinions if this is needed?

(TODO: Squash patch)
---
 net/core/filter.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0ceddee0c678..669f29992177 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3432,6 +3432,12 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 
+	// XXX: To reviewers: How paranoid are we? Do we really need to
+	/* clear memory area on grow, as in-theory can contain uninit kmem */
+	if (offset > 0) {
+		memset(xdp->data_end, 0, offset);
+	}
+
 	xdp->data_end = data_end;
 
 	return 0;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 10/15] net: XDP-generic determining XDP frame size
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (8 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-03-17 17:29 ` Jesper Dangaard Brouer
  2020-03-17 17:30 ` [PATCH RFC v1 11/15] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:29 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

The SKB "head" pointer points to the data area that contains
skb_shared_info, that can be found via skb_end_pointer(). Given
xdp->data_hard_start have been established (basically pointing to
skb->head), frame size is between skb_end_pointer() and data_hard_start,
plus the size reserved to skb_shared_info.

Change the bpf_xdp_adjust_tail fix up of skb->len, to be a positive
offset number on grow, and negative number on shrink.  As this seems more
natural when reading the code

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9c941cd38b13..645e5c48a52f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4546,6 +4546,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp->data_meta = xdp->data;
 	xdp->data_end = xdp->data + hlen;
 	xdp->data_hard_start = skb->data - skb_headroom(skb);
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp->frame_sz  = (void *)skb_end_pointer(skb) - xdp->data_hard_start;
+	xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data_end = xdp->data_end;
 	orig_data = xdp->data;
 	eth = (struct ethhdr *)xdp->data;
@@ -4569,14 +4574,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		skb_reset_network_header(skb);
 	}
 
-	/* check if bpf_xdp_adjust_tail was used. it can only "shrink"
-	 * pckt.
-	 */
-	off = orig_data_end - xdp->data_end;
+	/* check if bpf_xdp_adjust_tail was used */
+	off = xdp->data_end - orig_data_end;
 	if (off != 0) {
 		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
-		skb->len -= off;
-
+		skb->len += off; /* positive on grow, negative on shrink */
 	}
 
 	/* check if XDP changed eth hdr such SKB needs update */



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 11/15] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (9 preceding siblings ...)
  2020-03-17 17:29 ` [PATCH RFC v1 10/15] net: XDP-generic determining XDP frame size Jesper Dangaard Brouer
@ 2020-03-17 17:30 ` Jesper Dangaard Brouer
  2020-03-17 17:30 ` [PATCH RFC v1 12/15] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:30 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

Use hole in struct xdp_frame, when adding member frame_sz, which keeps
same sizeof struct (32 bytes)

Drivers ixgbe and sfc had bug cases where the necessary/expected
tailroom was not reserved. This can lead to some hard to catch memory
corruption issues. Having the drivers frame_sz this can be detected when
packet length/end via xdp->data_end exceed the xdp_data_hard_end
pointer, which accounts for the reserved the tailroom.

When detecting this driver issue, simply fail the conversion with NULL,
which results in feedback to driver (failing xdp_do_redirect()) causing
driver to drop packet. Given the lack of consistent XDP stats, this can
be hard to troubleshoot. And given this is a driver bug, we want to
generate some more noise in form of a WARN stack dump (to ID the driver
code that inlined convert_to_xdp_frame).

Inlining the WARN macro is problematic, because it adds an asm
instruction (on Intel CPUs ud2) what influence instruction cache
prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this
and at the same time make identifying the function and line of this
inlined function easier.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   14 +++++++++++++-
 net/core/xdp.c    |    7 +++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 99f4374f6214..55a885aa4e53 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -93,7 +93,8 @@ struct xdp_frame {
 	void *data;
 	u16 len;
 	u16 headroom;
-	u16 metasize;
+	u32 metasize:8;
+	u32 frame_sz:24;
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
@@ -108,6 +109,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+/* Avoids inlining WARN macro in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line);
+#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
+
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
 /* Convert xdp_buff to xdp_frame */
@@ -128,6 +133,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
 		return NULL;
 
+	/* Catch if driver didn't reserve tailroom for skb_shared_info */
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		XDP_WARN("Driver BUG: missing reserved tailroom");
+		return NULL;
+	}
+
 	/* Store info in top of packet */
 	xdp_frame = xdp->data_hard_start;
 
@@ -135,6 +146,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	xdp_frame->len  = xdp->data_end - xdp->data;
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
+	xdp_frame->frame_sz = xdp->frame_sz;
 
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
 	xdp_frame->mem = xdp->rxq->mem;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4c7ea85486af..4bc3026ae218 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/rhashtable.h>
+#include <linux/bug.h>
 #include <net/page_pool.h>
 
 #include <net/xdp.h>
@@ -496,3 +497,9 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
 	return xdpf;
 }
 EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
+
+/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line) {
+	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
+};
+EXPORT_SYMBOL_GPL(xdp_warn);



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 12/15] xdp: cpumap redirect use frame_sz and increase skb_tailroom
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (10 preceding siblings ...)
  2020-03-17 17:30 ` [PATCH RFC v1 11/15] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
@ 2020-03-17 17:30 ` Jesper Dangaard Brouer
  2020-03-17 17:30 ` [PATCH RFC v1 13/15] tun: add XDP frame size Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:30 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

Knowing the memory size backing the packet/xdp_frame data area, and
knowing it already have reserved room for skb_shared_info, simplifies
using build_skb significantly.

With this change we no-longer lie about the SKB truesize, but more
importantly a significant larger skb_tailroom is now provided, e.g. when
drivers uses a full PAGE_SIZE. This extra tailroom (in linear area) can be
used by the network stack when coalescing SKBs (e.g. in skb_try_coalesce,
see TCP cases where tcp_queue_rcv() can 'eat' skb).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 kernel/bpf/cpumap.c |   21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 70f71b154fa5..9c777ac4d4bd 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -162,25 +162,10 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	/* Part of headroom was reserved to xdpf */
 	hard_start_headroom = sizeof(struct xdp_frame) +  xdpf->headroom;
 
-	/* build_skb need to place skb_shared_info after SKB end, and
-	 * also want to know the memory "truesize".  Thus, need to
-	 * know the memory frame size backing xdp_buff.
-	 *
-	 * XDP was designed to have PAGE_SIZE frames, but this
-	 * assumption is not longer true with ixgbe and i40e.  It
-	 * would be preferred to set frame_size to 2048 or 4096
-	 * depending on the driver.
-	 *   frame_size = 2048;
-	 *   frame_len  = frame_size - sizeof(*xdp_frame);
-	 *
-	 * Instead, with info avail, skb_shared_info in placed after
-	 * packet len.  This, unfortunately fakes the truesize.
-	 * Another disadvantage of this approach, the skb_shared_info
-	 * is not at a fixed memory location, with mixed length
-	 * packets, which is bad for cache-line hotness.
+	/* Memory size backing xdp_frame data already have reserved
+	 * room for build_skb to place skb_shared_info in tailroom.
 	 */
-	frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	frame_size = xdpf->frame_sz;
 
 	pkt_data_start = xdpf->data - hard_start_headroom;
 	skb = build_skb_around(skb, pkt_data_start, frame_size);



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 13/15] tun: add XDP frame size
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (11 preceding siblings ...)
  2020-03-17 17:30 ` [PATCH RFC v1 12/15] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
@ 2020-03-17 17:30 ` Jesper Dangaard Brouer
  2020-03-17 17:30 ` [PATCH RFC v1 14/15] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
  2020-03-17 17:30 ` [PATCH RFC v1 15/15] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:30 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

The tun driver have two code paths for running XDP (bpf_prog_run_xdp).
In both cases 'buflen' contains enough tailroom for skb_shared_info.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/tun.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 79f248cb282d..c540ff2f5774 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1706,6 +1706,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &tfile->xdp_rxq;
+		xdp.frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		if (act == XDP_REDIRECT || act == XDP_TX) {
@@ -2445,6 +2446,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		}
 		xdp_set_data_meta_invalid(xdp);
 		xdp->rxq = &tfile->xdp_rxq;
+		xdp->frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 		err = tun_xdp_act(tun, xdp_prog, xdp, act);



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 14/15] veth: xdp using frame_sz in veth driver
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (12 preceding siblings ...)
  2020-03-17 17:30 ` [PATCH RFC v1 13/15] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-03-17 17:30 ` Jesper Dangaard Brouer
  2020-03-17 17:30 ` [PATCH RFC v1 15/15] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:30 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/veth.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8cdc4415fa70..be88625162cd 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -492,6 +492,8 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 					unsigned int *xdp_xmit,
 					struct veth_xdp_tx_bq *bq)
 {
+// hmm... do we really need to exclude sizeof(struct xdp_frame) ?
+// ... as bpf_xdp_adjust_head() already handle this.
 	void *hard_start = frame->data - frame->headroom;
 	void *head = hard_start - sizeof(struct xdp_frame);
 	int len = frame->len, delta = 0;
@@ -506,11 +508,12 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		struct xdp_buff xdp;
 		u32 act;
 
-		xdp.data_hard_start = hard_start;
+		xdp.data_hard_start = hard_start; /* exclude xdp_frame area */
 		xdp.data = frame->data;
 		xdp.data_end = frame->data + frame->len;
 		xdp.data_meta = frame->data - frame->metasize;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = frame->frame_sz - sizeof(struct xdp_frame);
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
@@ -639,6 +642,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
 	xdp.data_end = xdp.data + pktlen;
 	xdp.data_meta = xdp.data;
 	xdp.rxq = &rq->xdp_rxq;
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp.frame_sz = (void *)skb_end_pointer(skb) - xdp.data_hard_start;
+	xdp.frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data = xdp.data;
 	orig_data_end = xdp.data_end;
 
@@ -678,6 +686,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
 	}
 	rcu_read_unlock();
 
+	/* check if bpf_xdp_adjust_head was used */
 	delta = orig_data - xdp.data;
 	off = mac_len + delta;
 	if (off > 0)
@@ -685,9 +694,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
 	else if (off < 0)
 		__skb_pull(skb, -off);
 	skb->mac_header -= delta;
+
+	/* check if bpf_xdp_adjust_tail was used */
 	off = xdp.data_end - orig_data_end;
 	if (off != 0)
-		__skb_put(skb, off);
+		__skb_put(skb, off); /* positive on grow, negative on shrink */
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
 	metalen = xdp.data - xdp.data_meta;



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH RFC v1 15/15] dpaa2-eth: add XDP frame size
  2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (13 preceding siblings ...)
  2020-03-17 17:30 ` [PATCH RFC v1 14/15] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
@ 2020-03-17 17:30 ` Jesper Dangaard Brouer
  14 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-17 17:30 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

This is the full page size:
 #define DPAA2_ETH_RX_BUF_RAW_SIZE	PAGE_SIZE

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 7ff147e89426..8b03b38b33c9 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -301,6 +301,7 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 	xdp.data_hard_start = xdp.data - XDP_PACKET_HEADROOM;
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.rxq = &ch->xdp_rxq;
+	xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-03-17 17:29 ` [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-03-17 20:38   ` Jakub Kicinski
  2020-03-18  9:15   ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2020-03-17 20:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

On Tue, 17 Mar 2020 18:29:53 +0100 Jesper Dangaard Brouer wrote:
> To reviewers: Need some opinions if this is needed?
> 
> (TODO: Squash patch)

I'd vote we don't clear, since we don't clear in adjust head.

We could also add some wrapper around memset() which could be compiled
out based on some CONFIG_ but that could be seen as just moving the
responsibility onto the user..

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 0ceddee0c678..669f29992177 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3432,6 +3432,12 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  	if (unlikely(data_end < xdp->data + ETH_HLEN))
>  		return -EINVAL;
>  
> +	// XXX: To reviewers: How paranoid are we? Do we really need to
> +	/* clear memory area on grow, as in-theory can contain uninit kmem */
> +	if (offset > 0) {
> +		memset(xdp->data_end, 0, offset);
> +	}
> +
>  	xdp->data_end = data_end;
>  
>  	return 0;
> 
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff
  2020-03-17 17:29 ` [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-03-17 20:42   ` Jakub Kicinski
  2020-03-18  6:58     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-03-17 20:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

On Tue, 17 Mar 2020 18:29:12 +0100 Jesper Dangaard Brouer wrote:
> XDP have evolved to support several frame sizes, but xdp_buff was not
> updated with this information. The frame size (frame_sz) member of
> xdp_buff is introduced to know the real size of the memory the frame is
> delivered in.
> 
> When introducing this also make it clear that some tailroom is
> reserved/required when creating SKBs using build_skb().
> 
> It would also have been an option to introduce a pointer to
> data_hard_end (with reserved offset). The advantage with frame_sz is
> that (like rxq) drivers only need to setup/assign this value once per
> NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> store frame_sz inside xdp_rxq_info, because it's varies per packet as it
> can be based/depend on packet length.

Do you reckon it would be too ugly to make xdp-generic suffer and have
it set the length in rxq per packet? We shouldn't handle multiple
packets from the same rxq in parallel, no?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff
  2020-03-17 20:42   ` Jakub Kicinski
@ 2020-03-18  6:58     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-18  6:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sameehj, netdev, bpf, zorik, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, brouer

On Tue, 17 Mar 2020 13:42:43 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Tue, 17 Mar 2020 18:29:12 +0100 Jesper Dangaard Brouer wrote:
> > XDP have evolved to support several frame sizes, but xdp_buff was not
> > updated with this information. The frame size (frame_sz) member of
> > xdp_buff is introduced to know the real size of the memory the frame is
> > delivered in.
> > 
> > When introducing this also make it clear that some tailroom is
> > reserved/required when creating SKBs using build_skb().
> > 
> > It would also have been an option to introduce a pointer to
> > data_hard_end (with reserved offset). The advantage with frame_sz is
> > that (like rxq) drivers only need to setup/assign this value once per
> > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> > store frame_sz inside xdp_rxq_info, because it's varies per packet as it
> > can be based/depend on packet length.  
> 
> Do you reckon it would be too ugly to make xdp-generic suffer and have
> it set the length in rxq per packet? We shouldn't handle multiple
> packets from the same rxq in parallel, no?

It's not only xdp-generic, but also xdp-native drivers like ixgbe and
i40e, that have modes (>4K page) where they have per packet frame size.
As this kind of mode, have in-practice been "allowed" (with out me
realizing it) I expect that other drivers will likely also use this.

Regarding the parallel argument, then Intel at LPC had done experiments
with "RX-bulking" that required multiple xdp_buff's.  It's not exactly
parallel, but I see progress in that direction.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-03-17 17:29 ` [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
  2020-03-17 20:38   ` Jakub Kicinski
@ 2020-03-18  9:15   ` Toke Høiland-Jørgensen
  2020-03-18 10:25     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 29+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-18  9:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> To reviewers: Need some opinions if this is needed?
>
> (TODO: Squash patch)
> ---
>  net/core/filter.c |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 0ceddee0c678..669f29992177 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3432,6 +3432,12 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  	if (unlikely(data_end < xdp->data + ETH_HLEN))
>  		return -EINVAL;
>  
> +	// XXX: To reviewers: How paranoid are we? Do we really need to
> +	/* clear memory area on grow, as in-theory can contain uninit kmem */
> +	if (offset > 0) {
> +		memset(xdp->data_end, 0, offset);
> +	}

This memory will usually be recycled through page_pool or equivalent,
right? So couldn't we clear the pages when they are first allocated?
That way, the only data that would be left there would be packet data
from previous packets...

-Toke


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-03-18  9:15   ` Toke Høiland-Jørgensen
@ 2020-03-18 10:25     ` Jesper Dangaard Brouer
  2020-03-19  5:35       ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-18 10:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, brouer

On Wed, 18 Mar 2020 10:15:38 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> 
> > To reviewers: Need some opinions if this is needed?
> >
> > (TODO: Squash patch)
> > ---
> >  net/core/filter.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 0ceddee0c678..669f29992177 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3432,6 +3432,12 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> >  	if (unlikely(data_end < xdp->data + ETH_HLEN))
> >  		return -EINVAL;
> >  
> > +	// XXX: To reviewers: How paranoid are we? Do we really need to
> > +	/* clear memory area on grow, as in-theory can contain uninit kmem */
> > +	if (offset > 0) {
> > +		memset(xdp->data_end, 0, offset);
> > +	}  
> 
> This memory will usually be recycled through page_pool or equivalent,
> right? So couldn't we clear the pages when they are first allocated?
> That way, the only data that would be left there would be packet data
> from previous packets...

Yes, that is another option, to clear pages on "real" alloc (not
recycle alloc), but it is a bit harder to implement (when not using
page_pool).

And yes, this area will very likely just contain old packet data, but
we cannot be 100% sure.

Previously Alexei have argued that we should not leak pointer values in
XDP.  Which is why we have xdp_scrub_frame(), but this is not 100% the
same.  So, I would like to hear Alexei's opinion ?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver
  2020-03-17 17:29 ` [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-03-18 20:03   ` Maciej Fijalkowski
  2020-03-18 21:23     ` Alexander Duyck
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Fijalkowski @ 2020-03-18 20:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Jeff Kirsher, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	bjorn.topel, kuba

On Tue, Mar 17, 2020 at 06:29:33PM +0100, Jesper Dangaard Brouer wrote:
> The ixgbe driver uses different memory models depending on PAGE_SIZE at
> compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
> normal MTU frame size is 2048 bytes (and headroom 192 bytes).

To be clear the 2048 is the size of buffer given to HW and we slice it up
in a following way:
- 192 bytes dedicated for headroom
- 1500 is max allowed MTU for this setup
- 320 bytes for tailroom (skb shinfo)

In case you go with higher MTU then 3K buffer would be used and it would
came from order1 page and we still do the half split. Just FYI all of this
is for PAGE_SIZE == 4k and L1$ size == 64.

> For PAGE_SIZE larger than 4K, driver advance its rx_buffer->page_offset
> with the frame size "truesize".

Alex, couldn't we base the truesize here somehow on ixgbe_rx_bufsz() since
these are the sizes that we are passing to hw? I must admit I haven't been
in touch with systems with PAGE_SIZE > 4K.

> 
> When driver enable XDP it uses build_skb() which provides the necessary
> tailroom for XDP-redirect.

We still allow to load XDP prog when ring is not using build_skb(). I have
a feeling that we should drop this case now.

Alex/John/Bjorn WDYT?

> 
> When XDP frame size doesn't depend on RX packet size (4K case), then
> xdp.frame_sz can be updated once outside the main NAPI loop.
> 
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   17 +++++++++++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   18 ++++++++++--------
>  2 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 2833e4f041ce..943b643b6ed8 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -417,6 +417,23 @@ static inline unsigned int ixgbe_rx_pg_order(struct ixgbe_ring *ring)
>  }
>  #define ixgbe_rx_pg_size(_ring) (PAGE_SIZE << ixgbe_rx_pg_order(_ring))
>  
> +static inline unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
> +						   unsigned int size)
> +{
> +	unsigned int truesize;
> +
> +#if (PAGE_SIZE < 8192)
> +	truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
> +#else
> +	/* Notice XDP must use build_skb() mode */
> +	truesize = ring_uses_build_skb(rx_ring) ?
> +		SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
> +		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
> +		SKB_DATA_ALIGN(size);
> +#endif
> +	return truesize;
> +}
> +
>  #define IXGBE_ITR_ADAPTIVE_MIN_INC	2
>  #define IXGBE_ITR_ADAPTIVE_MIN_USECS	10
>  #define IXGBE_ITR_ADAPTIVE_MAX_USECS	126
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ea6834bae04c..f505ed8c9dc1 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -2248,16 +2248,10 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
>  				 struct ixgbe_rx_buffer *rx_buffer,
>  				 unsigned int size)
>  {
> +	unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
>  #if (PAGE_SIZE < 8192)
> -	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
> -
>  	rx_buffer->page_offset ^= truesize;
>  #else
> -	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
> -				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
> -				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
> -				SKB_DATA_ALIGN(size);
> -
>  	rx_buffer->page_offset += truesize;
>  #endif
>  }
> @@ -2291,6 +2285,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  
>  	xdp.rxq = &rx_ring->xdp_rxq;
>  
> +	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
> +#if (PAGE_SIZE < 8192)
> +	xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
> +#endif
> +
>  	while (likely(total_rx_packets < budget)) {
>  		union ixgbe_adv_rx_desc *rx_desc;
>  		struct ixgbe_rx_buffer *rx_buffer;
> @@ -2324,7 +2323,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  			xdp.data_hard_start = xdp.data -
>  					      ixgbe_rx_offset(rx_ring);
>  			xdp.data_end = xdp.data + size;
> -
> +#if (PAGE_SIZE > 4096)
> +			/* At larger PAGE_SIZE, frame_sz depend on size */
> +			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
> +#endif
>  			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
>  		}
>  
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver
  2020-03-18 20:03   ` Maciej Fijalkowski
@ 2020-03-18 21:23     ` Alexander Duyck
  2020-03-20 21:44       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 29+ messages in thread
From: Alexander Duyck @ 2020-03-18 21:23 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Jesper Dangaard Brouer, Jubran, Samih, Jeff Kirsher, Netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	Björn Töpel, kuba

On Wed, Mar 18, 2020 at 1:04 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Tue, Mar 17, 2020 at 06:29:33PM +0100, Jesper Dangaard Brouer wrote:
> > The ixgbe driver uses different memory models depending on PAGE_SIZE at
> > compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
> > normal MTU frame size is 2048 bytes (and headroom 192 bytes).
>
> To be clear the 2048 is the size of buffer given to HW and we slice it up
> in a following way:
> - 192 bytes dedicated for headroom
> - 1500 is max allowed MTU for this setup
> - 320 bytes for tailroom (skb shinfo)
>
> In case you go with higher MTU then 3K buffer would be used and it would
> came from order1 page and we still do the half split. Just FYI all of this
> is for PAGE_SIZE == 4k and L1$ size == 64.

True, but for most people this is the most common case since these are
the standard for x86.

> > For PAGE_SIZE larger than 4K, driver advance its rx_buffer->page_offset
> > with the frame size "truesize".
>
> Alex, couldn't we base the truesize here somehow on ixgbe_rx_bufsz() since
> these are the sizes that we are passing to hw? I must admit I haven't been
> in touch with systems with PAGE_SIZE > 4K.

With a page size greater than 4K we can actually get many more uses
out of a page by using the frame size to determine the truesize of the
packet. The truesize is the memory footprint currently being held by
the packet. So once the packet is filled we just have to add the
headroom and tailroom to whatever the hardware wrote instead of having
to use what we gave to the hardware. That gives us better efficiency,
if we used ixgbe_rx_bufsz() we would penalize small packets and that
in turn would likely hurt performance.

> >
> > When driver enable XDP it uses build_skb() which provides the necessary
> > tailroom for XDP-redirect.
>
> We still allow to load XDP prog when ring is not using build_skb(). I have
> a feeling that we should drop this case now.
>
> Alex/John/Bjorn WDYT?

The comment Jesper had about using using build_skb() when XDP is in
use is incorrect. The two are not correlated. The underlying buffer is
the same, however we drop the headroom and tailroom if we are in
_RX_LEGACY mode. We default to build_skb and the option of switching
to legacy Rx is controlled via the device private flags.

However with that said the change itself is mostly harmless, and
likely helps to resolve issues that would be seen if somebody were to
enable XDP while having the RX_LEGACY flag set.

> >
> > When XDP frame size doesn't depend on RX packet size (4K case), then
> > xdp.frame_sz can be updated once outside the main NAPI loop.
> >
> > Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   17 +++++++++++++++++
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   18 ++++++++++--------
> >  2 files changed, 27 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > index 2833e4f041ce..943b643b6ed8 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> > @@ -417,6 +417,23 @@ static inline unsigned int ixgbe_rx_pg_order(struct ixgbe_ring *ring)
> >  }
> >  #define ixgbe_rx_pg_size(_ring) (PAGE_SIZE << ixgbe_rx_pg_order(_ring))
> >
> > +static inline unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
> > +                                                unsigned int size)
> > +{
> > +     unsigned int truesize;
> > +
> > +#if (PAGE_SIZE < 8192)
> > +     truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
> > +#else
> > +     /* Notice XDP must use build_skb() mode */
> > +     truesize = ring_uses_build_skb(rx_ring) ?
> > +             SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
> > +             SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
> > +             SKB_DATA_ALIGN(size);
> > +#endif
> > +     return truesize;
> > +}
> > +
> >  #define IXGBE_ITR_ADAPTIVE_MIN_INC   2
> >  #define IXGBE_ITR_ADAPTIVE_MIN_USECS 10
> >  #define IXGBE_ITR_ADAPTIVE_MAX_USECS 126
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index ea6834bae04c..f505ed8c9dc1 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -2248,16 +2248,10 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
> >                                struct ixgbe_rx_buffer *rx_buffer,
> >                                unsigned int size)
> >  {
> > +     unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
> >  #if (PAGE_SIZE < 8192)
> > -     unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
> > -
> >       rx_buffer->page_offset ^= truesize;
> >  #else
> > -     unsigned int truesize = ring_uses_build_skb(rx_ring) ?
> > -                             SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
> > -                             SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
> > -                             SKB_DATA_ALIGN(size);
> > -
> >       rx_buffer->page_offset += truesize;
> >  #endif
> >  }
> > @@ -2291,6 +2285,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
> >
> >       xdp.rxq = &rx_ring->xdp_rxq;
> >
> > +     /* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
> > +#if (PAGE_SIZE < 8192)
> > +     xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
> > +#endif
> > +
> >       while (likely(total_rx_packets < budget)) {
> >               union ixgbe_adv_rx_desc *rx_desc;
> >               struct ixgbe_rx_buffer *rx_buffer;
> > @@ -2324,7 +2323,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
> >                       xdp.data_hard_start = xdp.data -
> >                                             ixgbe_rx_offset(rx_ring);
> >                       xdp.data_end = xdp.data + size;
> > -
> > +#if (PAGE_SIZE > 4096)
> > +                     /* At larger PAGE_SIZE, frame_sz depend on size */
> > +                     xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
> > +#endif
> >                       skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
> >               }
> >
> >
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-03-18 10:25     ` Jesper Dangaard Brouer
@ 2020-03-19  5:35       ` Alexei Starovoitov
  0 siblings, 0 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2020-03-19  5:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, sameehj, Network Development,
	bpf, zorik, akiyano, gtzalik, Daniel Borkmann, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

On Wed, Mar 18, 2020 at 12:25 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 18 Mar 2020 10:15:38 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> > Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >
> > > To reviewers: Need some opinions if this is needed?
> > >
> > > (TODO: Squash patch)
> > > ---
> > >  net/core/filter.c |    6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 0ceddee0c678..669f29992177 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -3432,6 +3432,12 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> > >     if (unlikely(data_end < xdp->data + ETH_HLEN))
> > >             return -EINVAL;
> > >
> > > +   // XXX: To reviewers: How paranoid are we? Do we really need to
> > > +   /* clear memory area on grow, as in-theory can contain uninit kmem */
> > > +   if (offset > 0) {
> > > +           memset(xdp->data_end, 0, offset);
> > > +   }
> >
> > This memory will usually be recycled through page_pool or equivalent,
> > right? So couldn't we clear the pages when they are first allocated?
> > That way, the only data that would be left there would be packet data
> > from previous packets...
>
> Yes, that is another option, to clear pages on "real" alloc (not
> recycle alloc), but it is a bit harder to implement (when not using
> page_pool).
>
> And yes, this area will very likely just contain old packet data, but
> we cannot be 100% sure.
>
> Previously Alexei have argued that we should not leak pointer values in
> XDP.  Which is why we have xdp_scrub_frame(), but this is not 100% the
> same.  So, I would like to hear Alexei's opinion ?

those were the days when sw didn't need to worry about hw bugs.
Looks like these types of security issues are acceptable now, since
pointer leaks no longer get CVE assigned.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 03/15] bnxt: add XDP frame size to driver
  2020-03-17 17:29 ` [PATCH RFC v1 03/15] bnxt: " Jesper Dangaard Brouer
@ 2020-03-20 19:22   ` Andy Gospodarek
  2020-03-23 14:18   ` Andy Gospodarek
  1 sibling, 0 replies; 29+ messages in thread
From: Andy Gospodarek @ 2020-03-20 19:22 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Michael Chan, Andy Gospodarek, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

On Tue, Mar 17, 2020 at 06:29:22PM +0100, Jesper Dangaard Brouer wrote:
> This driver uses full PAGE_SIZE pages when XDP is enabled.
> 
> Cc: Michael Chan <michael.chan@broadcom.com>
> Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index c6f6f2033880..5e3b4a3b69ea 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
>  	xdp_set_data_meta_invalid(&xdp);
>  	xdp.data_end = *data_ptr + *len;
>  	xdp.rxq = &rxr->xdp_rxq;
> +	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */

So if we want this to be the true size that the packet inside the the DMA
buffer could grow to, I _think_ this would need to be:

	xdp.frame_sz = PAGE_SIZE - XDP_PACKET_HEADROOM;

but I also noted that in patch 8 of the series that there is a check
against data_end - data_hard_start, so functionally your original patch
appears to be correct.

If the intent was just to capture the size of the [DMA] buffer available
for the datagram, maybe calling this new field 'buf_sz' or similar would
be nice as it does not convey anything about the on-wire size like
'frame_sz' does.


>  	orig_data = xdp.data;
>  
>  	rcu_read_lock();
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver
  2020-03-18 21:23     ` Alexander Duyck
@ 2020-03-20 21:44       ` Jesper Dangaard Brouer
  2020-03-20 22:37         ` Alexander Duyck
  0 siblings, 1 reply; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-20 21:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Maciej Fijalkowski, Jubran, Samih, Jeff Kirsher, Netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	Björn Töpel, kuba, brouer

On Wed, 18 Mar 2020 14:23:09 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Wed, Mar 18, 2020 at 1:04 PM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Tue, Mar 17, 2020 at 06:29:33PM +0100, Jesper Dangaard Brouer wrote:  
> > > The ixgbe driver uses different memory models depending on PAGE_SIZE at
> > > compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
> > > normal MTU frame size is 2048 bytes (and headroom 192 bytes).  
> >
> > To be clear the 2048 is the size of buffer given to HW and we slice it up
> > in a following way:
> > - 192 bytes dedicated for headroom
> > - 1500 is max allowed MTU for this setup
> > - 320 bytes for tailroom (skb shinfo)
> >
> > In case you go with higher MTU then 3K buffer would be used and it would
> > came from order1 page and we still do the half split. Just FYI all of this
> > is for PAGE_SIZE == 4k and L1$ size == 64.  
> 
> True, but for most people this is the most common case since these are
> the standard for x86.
> 
> > > For PAGE_SIZE larger than 4K, driver advance its rx_buffer->page_offset
> > > with the frame size "truesize".  
> >
> > Alex, couldn't we base the truesize here somehow on ixgbe_rx_bufsz() since
> > these are the sizes that we are passing to hw? I must admit I haven't been
> > in touch with systems with PAGE_SIZE > 4K.  
> 
> With a page size greater than 4K we can actually get many more uses
> out of a page by using the frame size to determine the truesize of the
> packet. The truesize is the memory footprint currently being held by
> the packet. So once the packet is filled we just have to add the
> headroom and tailroom to whatever the hardware wrote instead of having
> to use what we gave to the hardware. That gives us better efficiency,
> if we used ixgbe_rx_bufsz() we would penalize small packets and that
> in turn would likely hurt performance.
> 
> > >
> > > When driver enable XDP it uses build_skb() which provides the necessary
> > > tailroom for XDP-redirect.  
> >
> > We still allow to load XDP prog when ring is not using build_skb(). I have
> > a feeling that we should drop this case now.
> >
> > Alex/John/Bjorn WDYT?  
> 
> The comment Jesper had about using using build_skb() when XDP is in
> use is incorrect. The two are not correlated. The underlying buffer is
> the same, however we drop the headroom and tailroom if we are in
> _RX_LEGACY mode. We default to build_skb and the option of switching
> to legacy Rx is controlled via the device private flags.

Thanks for catching that.

> However with that said the change itself is mostly harmless, and
> likely helps to resolve issues that would be seen if somebody were to
> enable XDP while having the RX_LEGACY flag set.

So what is the path forward(?).  Are you/Intel okay with disallowing
XDP when the RX_LEGACY flag is set?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver
  2020-03-20 21:44       ` Jesper Dangaard Brouer
@ 2020-03-20 22:37         ` Alexander Duyck
  0 siblings, 0 replies; 29+ messages in thread
From: Alexander Duyck @ 2020-03-20 22:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Maciej Fijalkowski, Jubran, Samih, Jeff Kirsher, Netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	Björn Töpel, kuba

On Fri, Mar 20, 2020 at 2:44 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 18 Mar 2020 14:23:09 -0700
> Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
> > On Wed, Mar 18, 2020 at 1:04 PM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > On Tue, Mar 17, 2020 at 06:29:33PM +0100, Jesper Dangaard Brouer wrote:
> > > > The ixgbe driver uses different memory models depending on PAGE_SIZE at
> > > > compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
> > > > normal MTU frame size is 2048 bytes (and headroom 192 bytes).
> > >
> > > To be clear the 2048 is the size of buffer given to HW and we slice it up
> > > in a following way:
> > > - 192 bytes dedicated for headroom
> > > - 1500 is max allowed MTU for this setup
> > > - 320 bytes for tailroom (skb shinfo)
> > >
> > > In case you go with higher MTU then 3K buffer would be used and it would
> > > came from order1 page and we still do the half split. Just FYI all of this
> > > is for PAGE_SIZE == 4k and L1$ size == 64.
> >
> > True, but for most people this is the most common case since these are
> > the standard for x86.
> >
> > > > For PAGE_SIZE larger than 4K, driver advance its rx_buffer->page_offset
> > > > with the frame size "truesize".
> > >
> > > Alex, couldn't we base the truesize here somehow on ixgbe_rx_bufsz() since
> > > these are the sizes that we are passing to hw? I must admit I haven't been
> > > in touch with systems with PAGE_SIZE > 4K.
> >
> > With a page size greater than 4K we can actually get many more uses
> > out of a page by using the frame size to determine the truesize of the
> > packet. The truesize is the memory footprint currently being held by
> > the packet. So once the packet is filled we just have to add the
> > headroom and tailroom to whatever the hardware wrote instead of having
> > to use what we gave to the hardware. That gives us better efficiency,
> > if we used ixgbe_rx_bufsz() we would penalize small packets and that
> > in turn would likely hurt performance.
> >
> > > >
> > > > When driver enable XDP it uses build_skb() which provides the necessary
> > > > tailroom for XDP-redirect.
> > >
> > > We still allow to load XDP prog when ring is not using build_skb(). I have
> > > a feeling that we should drop this case now.
> > >
> > > Alex/John/Bjorn WDYT?
> >
> > The comment Jesper had about using using build_skb() when XDP is in
> > use is incorrect. The two are not correlated. The underlying buffer is
> > the same, however we drop the headroom and tailroom if we are in
> > _RX_LEGACY mode. We default to build_skb and the option of switching
> > to legacy Rx is controlled via the device private flags.
>
> Thanks for catching that.
>
> > However with that said the change itself is mostly harmless, and
> > likely helps to resolve issues that would be seen if somebody were to
> > enable XDP while having the RX_LEGACY flag set.
>
> So what is the path forward(?).  Are you/Intel okay with disallowing
> XDP when the RX_LEGACY flag is set?

Why would we need to disallow it? It won't work for the redirect use
case, but other use cases should work just fine. I thought with this
patch set you were correctly reporting the headroom or tailroom so
that we would either reallocate or just drop the frame if it cannot be
handled.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 03/15] bnxt: add XDP frame size to driver
  2020-03-17 17:29 ` [PATCH RFC v1 03/15] bnxt: " Jesper Dangaard Brouer
  2020-03-20 19:22   ` Andy Gospodarek
@ 2020-03-23 14:18   ` Andy Gospodarek
  2020-03-23 14:44     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 29+ messages in thread
From: Andy Gospodarek @ 2020-03-23 14:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Michael Chan, Andy Gospodarek, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi

On Tue, Mar 17, 2020 at 06:29:22PM +0100, Jesper Dangaard Brouer wrote:
> This driver uses full PAGE_SIZE pages when XDP is enabled.

Talked with Jesper about this some more on IRC and he clarified
something for me that was bugging me.

> Cc: Michael Chan <michael.chan@broadcom.com>
> Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>

I know this is only an RFC, but feel free to add:

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>

to this patch.  Thanks for your work on this!

> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index c6f6f2033880..5e3b4a3b69ea 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
>  	xdp_set_data_meta_invalid(&xdp);
>  	xdp.data_end = *data_ptr + *len;
>  	xdp.rxq = &rxr->xdp_rxq;
> +	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
>  	orig_data = xdp.data;
>  
>  	rcu_read_lock();
> 
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH RFC v1 03/15] bnxt: add XDP frame size to driver
  2020-03-23 14:18   ` Andy Gospodarek
@ 2020-03-23 14:44     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-23 14:44 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: sameehj, Michael Chan, Andy Gospodarek, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, brouer

On Mon, 23 Mar 2020 10:18:33 -0400
Andy Gospodarek <andy@greyhouse.net> wrote:

> On Tue, Mar 17, 2020 at 06:29:22PM +0100, Jesper Dangaard Brouer wrote:
> > This driver uses full PAGE_SIZE pages when XDP is enabled.  
> 
> Talked with Jesper about this some more on IRC and he clarified
> something for me that was bugging me.

Yes, nice talking to you on IRC. Note some XDP developers are hanging
out on freenode channel #xdp (my nick is netoptimizer from my GitHub name)

> > Cc: Michael Chan <michael.chan@broadcom.com>
> > Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>  
> 
> I know this is only an RFC, but feel free to add:
> 
> Reviewed-by: Andy Gospodarek <gospo@broadcom.com>

Great, I've added this to my current patchset :-)
 
> to this patch.  Thanks for your work on this!

Thanks!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-03-23 14:45 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-17 17:29 [PATCH RFC v1 00/15] XDP extend with knowledge of frame size Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 01/15] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
2020-03-17 20:42   ` Jakub Kicinski
2020-03-18  6:58     ` Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 02/15] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 03/15] bnxt: " Jesper Dangaard Brouer
2020-03-20 19:22   ` Andy Gospodarek
2020-03-23 14:18   ` Andy Gospodarek
2020-03-23 14:44     ` Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 04/15] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 05/15] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
2020-03-18 20:03   ` Maciej Fijalkowski
2020-03-18 21:23     ` Alexander Duyck
2020-03-20 21:44       ` Jesper Dangaard Brouer
2020-03-20 22:37         ` Alexander Duyck
2020-03-17 17:29 ` [PATCH RFC v1 06/15] sfc: fix XDP-redirect in this driver Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 07/15] sfc: add XDP frame size Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 08/15] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
2020-03-17 17:29 ` [PATCH RFC v1 09/15] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
2020-03-17 20:38   ` Jakub Kicinski
2020-03-18  9:15   ` Toke Høiland-Jørgensen
2020-03-18 10:25     ` Jesper Dangaard Brouer
2020-03-19  5:35       ` Alexei Starovoitov
2020-03-17 17:29 ` [PATCH RFC v1 10/15] net: XDP-generic determining XDP frame size Jesper Dangaard Brouer
2020-03-17 17:30 ` [PATCH RFC v1 11/15] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
2020-03-17 17:30 ` [PATCH RFC v1 12/15] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
2020-03-17 17:30 ` [PATCH RFC v1 13/15] tun: add XDP frame size Jesper Dangaard Brouer
2020-03-17 17:30 ` [PATCH RFC v1 14/15] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
2020-03-17 17:30 ` [PATCH RFC v1 15/15] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).