* [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size
@ 2020-04-08 11:50 Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
                   ` (33 more replies)
  0 siblings, 34 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:50 UTC (permalink / raw)
  To: intel-wired-lan

RFC-note: This is only an RFC because net-next is closed.
- Please ACK patches you like, then I will collect those for later.

XDP has evolved to support several frame sizes, but xdp_buff was not
updated with this information. As a side effect, the hard end of the XDP
frame data is unknown, which has limited the BPF helper
bpf_xdp_adjust_tail to only shrinking the packet. This patchset
addresses this and adds packet tail extend/grow.

The purpose of the patchset is ALSO to reserve a memory area that can be
used for storing extra information, specifically for extending XDP with
multi-buffer support. One proposal is to use the same layout as
skb_shared_info, which is why this area is currently 320 bytes.

When converting xdp_frame to SKB (veth and cpumap), the full tailroom
area can now be used and SKB truesize is now correct. For most
drivers this results in a much larger tailroom in the SKB "head" data
area. The network stack can now take advantage of this when doing SKB
coalescing. Thus, a good driver test is to use xdp_redirect_cpu from
samples/bpf/ and do some TCP stream testing.

---

Ilias Apalodimas (1):
      net: netsec: Add support for XDP frame size

Jesper Dangaard Brouer (32):
      xdp: add frame size to xdp_buff
      bnxt: add XDP frame size to driver
      sfc: add XDP frame size
      mvneta: add XDP frame size to driver
      net: XDP-generic determining XDP frame size
      xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
      xdp: cpumap redirect use frame_sz and increase skb_tailroom
      veth: adjust hard_start offset on redirect XDP frames
      veth: xdp using frame_sz in veth driver
      dpaa2-eth: add XDP frame size
      hv_netvsc: add XDP frame size to driver
      qlogic/qede: add XDP frame size to driver
      net: ethernet: ti: add XDP frame size to driver cpsw
      ena: add XDP frame size to amazon NIC driver
      mlx4: add XDP frame size and adjust max XDP MTU
      mlx5: rx queue setup time determine frame_sz for XDP
      net: thunderx: add XDP frame size
      nfp: add XDP frame size to netronome driver
      tun: add XDP frame size
      vhost_net: also populate XDP frame size
      virtio_net: add XDP frame size in two code paths
      ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
      ixgbe: add XDP frame size to driver
      ixgbevf: add XDP frame size to VF driver
      i40e: add XDP frame size to driver
      ice: add XDP frame size to driver
      xdp: for Intel AF_XDP drivers add XDP frame_sz
      xdp: allow bpf_xdp_adjust_tail() to grow packet size
      xdp: clear grow memory in bpf_xdp_adjust_tail()
      bpf: add xdp.frame_sz in bpf_prog_test_run_xdp().
      selftests/bpf: adjust BPF selftest for xdp_adjust_tail
      selftests/bpf: xdp_adjust_tail add grow tail tests


 drivers/net/ethernet/amazon/ena/ena_netdev.c       |    1 
 drivers/net/ethernet/amazon/ena/ena_netdev.h       |    5 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |    1 
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |    1 
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c   |    1 
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   31 ++++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c         |    2 
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   34 ++++--
 drivers/net/ethernet/intel/ice/ice_xsk.c           |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   33 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |    2 
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  |   34 ++++--
 drivers/net/ethernet/marvell/mvneta.c              |   25 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |    3 
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |    1 
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    1 
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |    1 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |    4 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |    1 
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |    1 
 drivers/net/ethernet/qlogic/qede/qede_main.c       |    2 
 drivers/net/ethernet/sfc/rx.c                      |    1 
 drivers/net/ethernet/socionext/netsec.c            |   30 +++--
 drivers/net/ethernet/ti/cpsw.c                     |    1 
 drivers/net/ethernet/ti/cpsw_new.c                 |    1 
 drivers/net/hyperv/netvsc_bpf.c                    |    1 
 drivers/net/hyperv/netvsc_drv.c                    |    2 
 drivers/net/tun.c                                  |    2 
 drivers/net/veth.c                                 |   28 +++--
 drivers/net/virtio_net.c                           |   15 ++
 drivers/vhost/net.c                                |    1 
 include/net/xdp.h                                  |   31 +++++
 include/net/xdp_sock.h                             |   11 ++
 include/uapi/linux/bpf.h                           |    4 -
 kernel/bpf/cpumap.c                                |   21 ---
 net/bpf/test_run.c                                 |   16 ++-
 net/core/dev.c                                     |   14 +-
 net/core/filter.c                                  |   23 +++-
 net/core/xdp.c                                     |    7 +
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |  123 +++++++++++++++++++-
 .../testing/selftests/bpf/progs/test_adjust_tail.c |   30 -----
 .../bpf/progs/test_xdp_adjust_tail_grow.c          |   33 +++++
 .../bpf/progs/test_xdp_adjust_tail_shrink.c        |   30 +++++
 43 files changed, 472 insertions(+), 139 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/progs/test_adjust_tail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c

--



* [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 11:50 ` Jesper Dangaard Brouer
  2020-04-08 17:53   ` Jakub Kicinski
  2020-04-09  0:50   ` Saeed Mahameed
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (32 subsequent siblings)
  33 siblings, 2 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:50 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

XDP has evolved to support several frame sizes, but xdp_buff was not
updated with this information. The frame size (frame_sz) member of
xdp_buff is introduced to know the real size of the memory the frame is
delivered in.

When introducing this also make it clear that some tailroom is
reserved/required when creating SKBs using build_skb().

It would also have been an option to introduce a pointer to
data_hard_end (with reserved offset). The advantage of frame_sz is
that (like rxq) drivers only need to set up/assign this value once per
NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
store frame_sz inside xdp_rxq_info, because it varies per packet, as it
can depend on the packet length.
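
As an illustration of why a single frame_sz value is enough, the
following userspace sketch models how data_hard_end falls out of
data_hard_start and frame_sz. It is not kernel code: the struct
model_xdp_buff, the helper model_data_hard_end() and the constant
MODEL_SHINFO_RESERVE (set to 320 to match the size mentioned in the
cover letter) are assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). */
#define MODEL_SHINFO_RESERVE 320u

/* Hypothetical, reduced model holding only the fields relevant here. */
struct model_xdp_buff {
	unsigned char *data_hard_start;
	uint32_t frame_sz;
};

/* End of the area XDP/BPF may touch: frame end minus the reserved tailroom. */
static inline unsigned char *
model_data_hard_end(const struct model_xdp_buff *xdp)
{
	assert(xdp->frame_sz >= MODEL_SHINFO_RESERVE);
	return xdp->data_hard_start + xdp->frame_sz - MODEL_SHINFO_RESERVE;
}
```

With a 4096-byte frame and a 320-byte reserve, the usable area in this
model ends 3776 bytes after data_hard_start.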

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 40c6d3398458..99f4374f6214 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -6,6 +6,8 @@
 #ifndef __LINUX_NET_XDP_H__
 #define __LINUX_NET_XDP_H__
 
+#include <linux/skbuff.h> /* skb_shared_info */
+
 /**
  * DOC: XDP RX-queue information
  *
@@ -70,8 +72,23 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom */
 };
 
+/* Reserve memory area at end-of data area.
+ *
+ * This macro reserves tailroom in the XDP buffer by limiting the
+ * XDP/BPF data access to data_hard_end.  Notice same area (and size)
+ * is used for XDP_PASS, when constructing the SKB via build_skb().
+ */
+#define xdp_data_hard_end(xdp)				\
+	((xdp)->data_hard_start + (xdp)->frame_sz -	\
+	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+/* Like skb_shinfo */
+#define xdp_shinfo(xdp)	((struct skb_shared_info *)(xdp_data_hard_end(xdp)))
+// XXX: Above likely belongs in later patch
+
 struct xdp_frame {
 	void *data;
 	u16 len;




* [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-04-08 11:50 ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:50 UTC (permalink / raw)
  To: sameehj
  Cc: Michael Chan, Andy Gospodarek, Andy Gospodarek,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This driver uses full PAGE_SIZE pages when XDP is enabled.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index c6f6f2033880..5e3b4a3b69ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = *data_ptr + *len;
 	xdp.rxq = &rxr->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
 	orig_data = xdp.data;
 
 	rcu_read_lock();




* [PATCH RFC v2 03/33] sfc: add XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-08 11:50 ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:50 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This driver uses RX page-split when possible. It was recently fixed
in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to
add needed tailroom for XDP-redirect.

After the fix efx->rx_page_buf_step is the frame size, with enough
head and tail-room for XDP-redirect.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/sfc/rx.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 260352d97d9d..68c47a8c71df 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -308,6 +308,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + rx_buf->len;
 	xdp.rxq = &rx_queue->xdp_rxq_info;
+	xdp.frame_sz = efx->rx_page_buf_step;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 	rcu_read_unlock();




* [PATCH RFC v2 04/33] mvneta: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2020-04-08 11:50 ` [PATCH RFC v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-08 11:50 ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:50 UTC (permalink / raw)
  To: sameehj
  Cc: thomas.petazzoni, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This Marvell driver, mvneta, uses PAGE_SIZE frames, which makes it
really easy to convert.  The driver updates rxq and now frame_sz
once per NAPI call.

This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV, which
can help reduce the number of cache-lines that need to be flushed
when doing DMA sync for_device. Because xdp_adjust_tail can grow the
area accessible to the CPU (which it can possibly write into), the max
sync length *after* bpf_prog_run_xdp() needs to be taken into account.
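
The sync-length reasoning above can be sketched in plain userspace C.
This is only a model under stated assumptions: model_touched_len() and
model_dma_sync_len() are hypothetical helpers, and rx_offset stands in
for the driver's headroom correction. It shows that the length to sync
is the larger of the lengths measured before and after the BPF program
ran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: bytes from the start of the packet area up to
 * data_end. rx_offset stands in for the driver's headroom offset. */
static inline size_t model_touched_len(const unsigned char *hard_start,
				       const unsigned char *data_end,
				       size_t rx_offset)
{
	assert(data_end >= hard_start + rx_offset);
	return (size_t)(data_end - hard_start) - rx_offset;
}

/* The sync length covers the larger of the two lengths, since the
 * program may have grown the tail and written into the new area. */
static inline size_t model_dma_sync_len(size_t len_before, size_t len_after)
{
	return len_after > len_before ? len_after : len_before;
}
```

A shrink keeps the original length as the sync length, while a grow
extends it to the new data_end.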

Cc: thomas.petazzoni@bootlin.com
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 5be61f73b6ab..612a6c273970 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2148,12 +2148,17 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       struct mvneta_stats *stats)
 {
-	unsigned int len;
+	unsigned int len, sync;
+	struct page *page;
 	u32 ret, act;
 
 	len = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
+	sync = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		stats->xdp_pass++;
@@ -2164,9 +2169,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
 			ret = MVNETA_XDP_DROPPED;
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
 		} else {
 			ret = MVNETA_XDP_REDIR;
 			stats->xdp_redirect++;
@@ -2175,10 +2179,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	}
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
-		if (ret != MVNETA_XDP_TX)
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != MVNETA_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
+		}
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2187,8 +2191,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		/* fall through */
 	case XDP_DROP:
-		page_pool_put_page(rxq->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(rxq->page_pool, page, sync, true);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2320,6 +2324,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(pp->xdp_prog);
 	xdp_buf.rxq = &rxq->xdp_rxq;
+	xdp_buf.frame_sz = PAGE_SIZE;
 
 	/* Fairness NAPI loop */
 	while (rx_proc < budget && rx_proc < rx_todo) {




* [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2020-04-08 11:50 ` [PATCH RFC v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-08 11:50 ` Jesper Dangaard Brouer
  2020-04-08 13:09   ` Lorenzo Bianconi
  2020-04-08 11:51 ` [PATCH RFC v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
                   ` (28 subsequent siblings)
  33 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:50 UTC (permalink / raw)
  To: sameehj
  Cc: Ilias Apalodimas, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV, which
can help reduce the number of cache-lines that need to be flushed
when doing DMA sync for_device. Because xdp_adjust_tail can grow the
area accessible to the CPU (which it can possibly write into), the max
sync length *after* bpf_prog_run_xdp() needs to be taken into account.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/socionext/netsec.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index a5a0fb60193a..e1f4be4b3d69 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -884,23 +884,28 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			  struct xdp_buff *xdp)
 {
 	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
-	unsigned int len = xdp->data_end - xdp->data;
+	unsigned int sync, len = xdp->data_end - xdp->data;
 	u32 ret = NETSEC_XDP_PASS;
+	struct page *page;
 	int err;
 	u32 act;
 
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
+	sync = xdp->data_end - xdp->data_hard_start - NETSEC_RXBUF_HEADROOM;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		ret = NETSEC_XDP_PASS;
 		break;
 	case XDP_TX:
 		ret = netsec_xdp_xmit_back(priv, xdp);
-		if (ret != NETSEC_XDP_TX)
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != NETSEC_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
+		}
 		break;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(priv->ndev, xdp, prog);
@@ -908,9 +913,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			ret = NETSEC_XDP_REDIR;
 		} else {
 			ret = NETSEC_XDP_CONSUMED;
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
 		}
 		break;
 	default:
@@ -921,8 +925,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 		/* fall through -- handle aborts by dropping packet */
 	case XDP_DROP:
 		ret = NETSEC_XDP_CONSUMED;
-		page_pool_put_page(dring->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(dring->page_pool, page, sync, true);
 		break;
 	}
 
@@ -936,10 +940,14 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 	struct netsec_rx_pkt_info rx_info;
 	enum dma_data_direction dma_dir;
 	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
 	u16 xdp_xmit = 0;
 	u32 xdp_act = 0;
 	int done = 0;
 
+	xdp.rxq = &dring->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE;
+
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(priv->xdp_prog);
 	dma_dir = page_pool_get_dma_dir(dring->page_pool);
@@ -953,7 +961,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		struct sk_buff *skb = NULL;
 		u16 pkt_len, desc_len;
 		dma_addr_t dma_handle;
-		struct xdp_buff xdp;
 		void *buf_addr;
 
 		if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
@@ -1002,7 +1009,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		xdp.data = desc->addr + NETSEC_RXBUF_HEADROOM;
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + pkt_len;
-		xdp.rxq = &dring->xdp_rxq;
 
 		if (xdp_prog) {
 			xdp_result = netsec_run_xdp(priv, xdp_prog, &xdp);




* [PATCH RFC v2 06/33] net: XDP-generic determining XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2020-04-08 11:50 ` [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The SKB "head" pointer points to the data area that contains
skb_shared_info, which can be found via skb_end_pointer(). Given that
xdp->data_hard_start has been established (basically pointing to
skb->head), the frame size is the distance between skb_end_pointer() and
data_hard_start, plus the size reserved for skb_shared_info.

Change the bpf_xdp_adjust_tail offset adjustment of skb->len to be a
positive number on grow and a negative number on shrink, as this seems
more natural when reading the code.
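
Both points can be illustrated with a small userspace model. The
helper names and MODEL_SHINFO_RESERVE (320, taken from the cover
letter) are assumptions for the sketch and not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). */
#define MODEL_SHINFO_RESERVE 320u

/* Frame size = distance from hard start to the SKB end pointer, plus the
 * skb_shared_info area that sits right after the end pointer. */
static inline uint32_t model_generic_frame_sz(const unsigned char *hard_start,
					      const unsigned char *skb_end)
{
	assert(skb_end >= hard_start);
	return (uint32_t)(skb_end - hard_start) + MODEL_SHINFO_RESERVE;
}

/* Apply a tail adjustment to a packet length: off is positive on grow
 * and negative on shrink, so a single addition covers both cases. */
static inline uint32_t model_apply_tail_off(uint32_t len,
					    const unsigned char *orig_data_end,
					    const unsigned char *new_data_end)
{
	ptrdiff_t off = new_data_end - orig_data_end;

	return (uint32_t)((ptrdiff_t)len + off);
}
```

In this model an end pointer 1728 bytes past the hard start gives a
2048-byte frame, and one signed offset handles both grow and shrink.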

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 9c9e763bfe0e..899920c3a78f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4548,6 +4548,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp->data_meta = xdp->data;
 	xdp->data_end = xdp->data + hlen;
 	xdp->data_hard_start = skb->data - skb_headroom(skb);
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp->frame_sz  = (void *)skb_end_pointer(skb) - xdp->data_hard_start;
+	xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data_end = xdp->data_end;
 	orig_data = xdp->data;
 	eth = (struct ethhdr *)xdp->data;
@@ -4571,14 +4576,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		skb_reset_network_header(skb);
 	}
 
-	/* check if bpf_xdp_adjust_tail was used. it can only "shrink"
-	 * pckt.
-	 */
-	off = orig_data_end - xdp->data_end;
+	/* check if bpf_xdp_adjust_tail was used */
+	off = xdp->data_end - orig_data_end;
 	if (off != 0) {
 		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
-		skb->len -= off;
-
+		skb->len += off; /* positive on grow, negative on shrink */
 	}
 
 	/* check if XDP changed eth hdr such SKB needs update */




* [PATCH RFC v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (5 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Use a hole in struct xdp_frame when adding member frame_sz, which keeps
the same sizeof struct (32 bytes).

Drivers ixgbe and sfc had bug cases where the necessary/expected
tailroom was not reserved. This can lead to some hard-to-catch memory
corruption issues. With the driver's frame_sz available, this can be
detected when the packet length/end via xdp->data_end exceeds the
xdp_data_hard_end pointer, which accounts for the reserved tailroom.

When detecting this driver issue, simply fail the conversion with NULL,
which results in feedback to the driver (failing xdp_do_redirect()),
causing the driver to drop the packet. Given the lack of consistent XDP
stats, this can be hard to troubleshoot. And given this is a driver bug,
we want to generate some more noise in the form of a WARN stack dump (to
ID the driver code that inlined convert_to_xdp_frame).
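
The detection described above can be modelled in userspace as a simple
pointer comparison. The struct model_buff, model_tailroom_ok() and
MODEL_SHINFO_RESERVE (320, as in the cover letter) are hypothetical
names for this sketch and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). */
#define MODEL_SHINFO_RESERVE 320u

/* Hypothetical, reduced model of the fields used by the check. */
struct model_buff {
	unsigned char *data_hard_start;
	unsigned char *data_end;
	uint32_t frame_sz;
};

/* Returns true when the packet end stays within the usable area, i.e.
 * the reserved tailroom at the end of the frame is left untouched. */
static inline bool model_tailroom_ok(const struct model_buff *xdp)
{
	const unsigned char *hard_end;

	assert(xdp->frame_sz >= MODEL_SHINFO_RESERVE);
	hard_end = xdp->data_hard_start + xdp->frame_sz - MODEL_SHINFO_RESERVE;
	return xdp->data_end <= hard_end;
}
```

A caller in this model would treat a false return like the NULL return
of the conversion and drop the packet.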

Inlining the WARN macro is problematic, because it adds an asm
instruction (ud2 on Intel CPUs) that influences instruction cache
prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this
and at the same time make identifying the function and line of this
inlined function easier.
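
The general pattern of keeping the warning out of line while still
capturing the call site can be sketched in userspace as follows. All
names (model_warn, MODEL_WARN, model_check, model_warn_count) are
made up for the example, and the noinline attribute is a GCC/Clang
extension used here because everything lives in one file.

```c
#include <assert.h>
#include <stdio.h>

/* Count of warnings issued; only here so the sketch is observable. */
static int model_warn_count;

/* Slow path kept out of line, so inlined callers only carry a call. */
static __attribute__((noinline))
void model_warn(const char *msg, const char *func, int line)
{
	assert(msg != NULL && func != NULL);
	model_warn_count++;
	fprintf(stderr, "MODEL_WARN: %s(line:%d): %s\n", func, line, msg);
}

/* Captures the call site via __func__ and __LINE__. */
#define MODEL_WARN(msg) model_warn(msg, __func__, __LINE__)

/* Inlined fast-path check: the warning body lives in model_warn(). */
static inline int model_check(int tailroom_missing)
{
	if (tailroom_missing) {
		MODEL_WARN("missing reserved tailroom");
		return -1;
	}
	return 0;
}
```

Because the macro expands at the call site, the reported function and
line are those of the inlined caller rather than of the helper.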

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   14 +++++++++++++-
 net/core/xdp.c    |    7 +++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 99f4374f6214..55a885aa4e53 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -93,7 +93,8 @@ struct xdp_frame {
 	void *data;
 	u16 len;
 	u16 headroom;
-	u16 metasize;
+	u32 metasize:8;
+	u32 frame_sz:24;
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
@@ -108,6 +109,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+/* Avoids inlining WARN macro in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line);
+#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
+
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
 /* Convert xdp_buff to xdp_frame */
@@ -128,6 +133,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
 		return NULL;
 
+	/* Catch if driver didn't reserve tailroom for skb_shared_info */
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		XDP_WARN("Driver BUG: missing reserved tailroom");
+		return NULL;
+	}
+
 	/* Store info in top of packet */
 	xdp_frame = xdp->data_hard_start;
 
@@ -135,6 +146,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	xdp_frame->len  = xdp->data_end - xdp->data;
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
+	xdp_frame->frame_sz = xdp->frame_sz;
 
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
 	xdp_frame->mem = xdp->rxq->mem;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4c7ea85486af..4bc3026ae218 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/rhashtable.h>
+#include <linux/bug.h>
 #include <net/page_pool.h>
 
 #include <net/xdp.h>
@@ -496,3 +497,9 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
 	return xdpf;
 }
 EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
+
+/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line) {
+	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
+}
+EXPORT_SYMBOL_GPL(xdp_warn);




* [PATCH RFC v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (6 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Knowing the memory size backing the packet/xdp_frame data area, and
knowing it already has reserved room for skb_shared_info, simplifies
using build_skb significantly.

With this change we no longer lie about the SKB truesize, but more
importantly a significantly larger skb_tailroom is now provided, e.g. when
drivers use a full PAGE_SIZE. This extra tailroom (in linear area) can be
used by the network stack when coalescing SKBs (e.g. in skb_try_coalesce,
see TCP cases where tcp_queue_rcv() can 'eat' skb).
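
To make the tailroom difference concrete, here is a userspace
calculation comparing the old length-derived frame size with a known
frame_sz. The helpers and constants are assumptions for the sketch:
MODEL_ALIGN_TO (64) stands in for the cache-line alignment used by
SKB_DATA_ALIGN, and MODEL_SHINFO_RESERVE (320) follows the cover
letter.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ALIGN_TO 64u        /* assumption: cache-line alignment */
#define MODEL_SHINFO_RESERVE 320u /* assumption: aligned shared_info size */

/* Round x up to the next multiple of MODEL_ALIGN_TO. */
static inline size_t model_align(size_t x)
{
	return (x + MODEL_ALIGN_TO - 1) & ~(size_t)(MODEL_ALIGN_TO - 1);
}

/* Old scheme: frame size derived from the packet length. */
static inline size_t model_old_frame_size(size_t len, size_t headroom)
{
	return model_align(len + headroom) + model_align(MODEL_SHINFO_RESERVE);
}

/* Tailroom left in the linear area after headroom and packet data. */
static inline size_t model_tailroom(size_t frame_size, size_t len,
				    size_t headroom)
{
	assert(frame_size >= MODEL_SHINFO_RESERVE + headroom + len);
	return frame_size - MODEL_SHINFO_RESERVE - headroom - len;
}
```

For a 1000-byte packet with 256 bytes of headroom, the old scheme
leaves 24 bytes of tailroom in this model, while a 4096-byte frame_sz
leaves 2520 bytes.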

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 kernel/bpf/cpumap.c |   21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 70f71b154fa5..9c777ac4d4bd 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -162,25 +162,10 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	/* Part of headroom was reserved to xdpf */
 	hard_start_headroom = sizeof(struct xdp_frame) +  xdpf->headroom;
 
-	/* build_skb need to place skb_shared_info after SKB end, and
-	 * also want to know the memory "truesize".  Thus, need to
-	 * know the memory frame size backing xdp_buff.
-	 *
-	 * XDP was designed to have PAGE_SIZE frames, but this
-	 * assumption is not longer true with ixgbe and i40e.  It
-	 * would be preferred to set frame_size to 2048 or 4096
-	 * depending on the driver.
-	 *   frame_size = 2048;
-	 *   frame_len  = frame_size - sizeof(*xdp_frame);
-	 *
-	 * Instead, with info avail, skb_shared_info in placed after
-	 * packet len.  This, unfortunately fakes the truesize.
-	 * Another disadvantage of this approach, the skb_shared_info
-	 * is not at a fixed memory location, with mixed length
-	 * packets, which is bad for cache-line hotness.
+	/* Memory size backing xdp_frame data already have reserved
+	 * room for build_skb to place skb_shared_info in tailroom.
 	 */
-	frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	frame_size = xdpf->frame_sz;
 
 	pkt_data_start = xdpf->data - hard_start_headroom;
 	skb = build_skb_around(skb, pkt_data_start, frame_size);
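
To illustrate the tailroom gain described above, here is a minimal
user-space sketch, not kernel code. All names and constants are
hypothetical stand-ins: a 64-byte cache line, 320 bytes for the aligned
skb_shared_info (the size mentioned in the cover letter) and 32 bytes
for struct xdp_frame are assumptions that vary by architecture and
kernel version.

```c
/* Assumed sizes, for illustration only (not taken from the patch) */
#define CPUMAP_SKETCH_CACHE_BYTES 64u   /* stand-in for SMP_CACHE_BYTES */
#define CPUMAP_SKETCH_SHINFO_SIZE 320u  /* stand-in for sizeof(struct skb_shared_info) */
#define CPUMAP_SKETCH_XDPF_SIZE   32u   /* stand-in for sizeof(struct xdp_frame) */

/* Round up to a cache line, modelled on the kernel's SKB_DATA_ALIGN() */
static unsigned int cpumap_sketch_align(unsigned int x)
{
	return (x + CPUMAP_SKETCH_CACHE_BYTES - 1) &
	       ~(CPUMAP_SKETCH_CACHE_BYTES - 1);
}

/* Old scheme: frame size guessed from the packet length */
static unsigned int cpumap_sketch_old_frame_size(unsigned int headroom,
						 unsigned int len)
{
	unsigned int hard_start_headroom = CPUMAP_SKETCH_XDPF_SIZE + headroom;

	return cpumap_sketch_align(len + hard_start_headroom) +
	       cpumap_sketch_align(CPUMAP_SKETCH_SHINFO_SIZE);
}

/* Linear tailroom left once skb_shared_info is placed at the buffer end */
static unsigned int cpumap_sketch_tailroom(unsigned int frame_size,
					   unsigned int headroom,
					   unsigned int len)
{
	unsigned int hard_start_headroom = CPUMAP_SKETCH_XDPF_SIZE + headroom;

	return frame_size - cpumap_sketch_align(CPUMAP_SKETCH_SHINFO_SIZE) -
	       hard_start_headroom - len;
}
```

Under these assumptions, a 1500-byte packet with 224 bytes of headroom
gets only a few dozen bytes of tailroom with the old calculation, while
a frame_sz of 4096 leaves roughly 2000 bytes.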



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 09/33] veth: adjust hard_start offset on redirect XDP frames
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (7 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Toshiaki Makita, Mao Wenan, Toshiaki Makita,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

When native XDP redirects a frame into a veth device, the frame arrives
in the xdp_frame structure. It is then processed in veth_xdp_rcv_one(),
which can run a new XDP bpf_prog on the packet. Doing so requires
converting xdp_frame to xdp_buff, but the tricky part is that the
xdp_frame memory area is located at the top (data_hard_start) of the
memory area that xdp_buff will point into.

The current code tries to protect the xdp_frame area by assigning
xdp_buff.data_hard_start past this memory. This results in 32 bytes
less headroom to expand into via BPF-helper bpf_xdp_adjust_head().

This protection step is actually not needed, because BPF-helper
bpf_xdp_adjust_head() already reserves this area and does not allow
the BPF-prog to expand into it. Thus, it is safe to point
data_hard_start directly at the xdp_frame memory area.

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring")
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
 drivers/net/veth.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index aece0e5eec8c..d5691bb84448 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -564,13 +564,15 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 					struct veth_stats *stats)
 {
 	void *hard_start = frame->data - frame->headroom;
-	void *head = hard_start - sizeof(struct xdp_frame);
 	int len = frame->len, delta = 0;
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
 	unsigned int headroom;
 	struct sk_buff *skb;
 
+	/* bpf_xdp_adjust_head() assures BPF cannot access xdp_frame area */
+	hard_start -= sizeof(struct xdp_frame);
+
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
@@ -592,7 +594,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
@@ -605,7 +606,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
 				frame = &orig_frame;
@@ -629,7 +629,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(head, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, 0);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;
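
The reasoning in this patch relies on the bounds check performed by
bpf_xdp_adjust_head(). Below is a simplified user-space model of that
check, not the kernel helper itself: the struct, the function name and
the 32-byte xdp_frame size are hypothetical stand-ins, and the
metadata handling of the real helper is left out.

```c
/* Simplified stand-in for struct xdp_buff (illustration only) */
struct veth_sketch_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

#define VETH_SKETCH_XDPF_SIZE 32  /* assumed sizeof(struct xdp_frame) */
#define VETH_SKETCH_ETH_HLEN  14  /* Ethernet header length */

/* Move xdp->data by offset bytes (negative grows the headroom use).
 * Returns 0 on success, -1 if the new start would overlap the area
 * reserved for xdp_frame or leave less than an Ethernet header.
 */
static int veth_sketch_adjust_head(struct veth_sketch_xdp_buff *xdp,
				   int offset)
{
	unsigned char *xdp_frame_end =
		xdp->data_hard_start + VETH_SKETCH_XDPF_SIZE;
	unsigned char *data = xdp->data + offset;

	if (data < xdp_frame_end ||
	    data > xdp->data_end - VETH_SKETCH_ETH_HLEN)
		return -1;

	xdp->data = data;
	return 0;
}
```

With data_hard_start pointing directly at the xdp_frame area, a BPF
program can still only move data down to data_hard_start plus the
reserved size, so the frame structure stays out of reach in this model.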



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 10/33] veth: xdp using frame_sz in veth driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (8 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Toshiaki Makita, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The veth driver can run XDP in "native" mode in its own NAPI
handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in
xdp napi ring") packets can arrive in two forms, either xdp_frame or
skb, handled by veth_xdp_rcv_one() or veth_xdp_rcv_skb() respectively.

For packets to arrive in xdp_frame format, they will have been
redirected from an XDP native driver. In case of XDP_PASS or no
XDP-prog attached, the veth driver will allocate and create an SKB.

The current code in veth_xdp_rcv_one() (the xdp_frame case) had to
guess the frame truesize of the incoming xdp_frame when using
veth_build_skb(). With xdp_frame->frame_sz this is no longer
necessary.

Calculating the frame_sz in veth_xdp_rcv_skb() (the skb case) is done
similarly to the XDP-generic handling code in net/core/dev.c.

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/veth.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index d5691bb84448..b586d2fa5551 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -405,10 +405,6 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 {
 	struct sk_buff *skb;
 
-	if (!buflen) {
-		buflen = SKB_DATA_ALIGN(headroom + len) +
-			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	}
 	skb = build_skb(head, buflen);
 	if (!skb)
 		return NULL;
@@ -583,6 +579,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		xdp.data = frame->data;
 		xdp.data_end = frame->data + frame->len;
 		xdp.data_meta = frame->data - frame->metasize;
+		xdp.frame_sz = frame->frame_sz;
 		xdp.rxq = &rq->xdp_rxq;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
@@ -629,7 +626,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(hard_start, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, frame->frame_sz);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;
@@ -695,9 +692,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 			goto drop;
 		}
 
-		nskb = veth_build_skb(head,
-				      VETH_XDP_HEADROOM + mac_len, skb->len,
-				      PAGE_SIZE);
+		nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len,
+				      skb->len, PAGE_SIZE);
 		if (!nskb) {
 			page_frag_free(head);
 			goto drop;
@@ -715,6 +711,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	xdp.data_end = xdp.data + pktlen;
 	xdp.data_meta = xdp.data;
 	xdp.rxq = &rq->xdp_rxq;
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp.frame_sz = (void *)skb_end_pointer(skb) - xdp.data_hard_start;
+	xdp.frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data = xdp.data;
 	orig_data_end = xdp.data_end;
 
@@ -758,6 +759,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
+	/* check if bpf_xdp_adjust_head was used */
 	delta = orig_data - xdp.data;
 	off = mac_len + delta;
 	if (off > 0)
@@ -765,9 +767,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	else if (off < 0)
 		__skb_pull(skb, -off);
 	skb->mac_header -= delta;
+
+	/* check if bpf_xdp_adjust_tail was used */
 	off = xdp.data_end - orig_data_end;
 	if (off != 0)
-		__skb_put(skb, off);
+		__skb_put(skb, off); /* positive on grow, negative on shrink */
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
 	metalen = xdp.data - xdp.data_meta;
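
The frame_sz calculation for the skb case can be sketched in user
space as follows. This is only an arithmetic model: offsets replace
the kernel pointers, the names are hypothetical, and the 64-byte cache
line and 320-byte skb_shared_info are assumed values.

```c
#define VETHSKB_SKETCH_CACHE_BYTES 64u   /* stand-in for SMP_CACHE_BYTES */
#define VETHSKB_SKETCH_SHINFO_SIZE 320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round up to a cache line, modelled on the kernel's SKB_DATA_ALIGN() */
static unsigned int vethskb_sketch_align(unsigned int x)
{
	return (x + VETHSKB_SKETCH_CACHE_BYTES - 1) &
	       ~(VETHSKB_SKETCH_CACHE_BYTES - 1);
}

/* hard_start_off: offset of data_hard_start within the buffer.
 * end_off: offset of the SKB end pointer, where skb_shared_info begins.
 * The result covers the linear area plus the shared-info tailroom.
 */
static unsigned int vethskb_sketch_frame_sz(unsigned int hard_start_off,
					    unsigned int end_off)
{
	unsigned int frame_sz = end_off - hard_start_off;

	frame_sz += vethskb_sketch_align(VETHSKB_SKETCH_SHINFO_SIZE);
	return frame_sz;
}
```

For a buffer built with a 4096-byte size, the end pointer would sit at
offset 4096 - 320 = 3776 under these assumptions, so the computed
frame_sz comes back to the full 4096.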



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 11/33] dpaa2-eth: add XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (9 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Ioana Radulescu, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

This is the full page size:
 #define DPAA2_ETH_RX_BUF_RAW_SIZE	PAGE_SIZE

Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index b6c46639aa4c..d05c60e2da9c 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -301,6 +301,7 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 	xdp.data_hard_start = xdp.data - XDP_PACKET_HEADROOM;
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.rxq = &ch->xdp_rxq;
+	xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (10 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 14:56   ` Haiyang Zhang
  2020-04-08 11:51 ` [PATCH RFC v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
                   ` (21 subsequent siblings)
  33 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The hyperv NIC driver's XDP implementation is rather disappointing, as
enabling XDP on this driver causes a slowdown: it allocates a new page
for each packet and copies over the payload before invoking the XDP
BPF-prog.

The only positive thing is that it's easy to determine the xdp.frame_sz.

When XDP is enabled on this driver, XDP_PASS and XDP_TX will create the
SKB via build_skb (based on the newly allocated page). Now using XDP
frame_sz this will provide more skb_tailroom, which netstack can use for
SKB coalescing (e.g. tcp_try_coalesce -> skb_try_coalesce).

Cc: Wei Liu <wei.liu@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/hyperv/netvsc_bpf.c |    1 +
 drivers/net/hyperv/netvsc_drv.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
index b86611041db6..1e0c024b0a93 100644
--- a/drivers/net/hyperv/netvsc_bpf.c
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -49,6 +49,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
 	xdp_set_data_meta_invalid(xdp);
 	xdp->data_end = xdp->data + len;
 	xdp->rxq = &nvchan->xdp_rxq;
+	xdp->frame_sz = PAGE_SIZE;
 	xdp->handle = 0;
 
 	memcpy(xdp->data, data, len);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index d8e86bdbfba1..651344fea0a5 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -794,7 +794,7 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 	if (xbuf) {
 		unsigned int hdroom = xdp->data - xdp->data_hard_start;
 		unsigned int xlen = xdp->data_end - xdp->data;
-		unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen);
+		unsigned int frag_size = xdp->frame_sz;
 
 		skb = build_skb(xbuf, frag_size);
 



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 13/33] qlogic/qede: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (11 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Ariel Elior, GR-everest-linux-l2, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

The qede driver uses a full page when XDP is enabled. The driver's value
in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when an
XDP bpf_prog is attached.

Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c   |    1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index c6c20776b474..7598ebe0962a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1066,6 +1066,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + *len;
 	xdp.rxq = &rxq->xdp_rxq;
+	xdp.frame_sz = rxq->rx_buf_seg_size; /* PAGE_SIZE when XDP enabled */
 
 	/* Queues always have a full reset currently, so for the time
 	 * being until there's atomic program replace just mark read
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 34fa3917eb33..39b404e8088f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1398,7 +1398,7 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
 	if (rxq->rx_buf_size + size > PAGE_SIZE)
 		rxq->rx_buf_size = PAGE_SIZE - size;
 
-	/* Segment size to spilt a page in multiple equal parts ,
+	/* Segment size to split a page in multiple equal parts,
 	 * unless XDP is used in which case we'd use the entire page.
 	 */
 	if (!edev->xdp_prog) {



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (12 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 11:51 ` [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Grygorii Strashko, Ilias Apalodimas, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The driver code in cpsw.c and cpsw_new.c both use page_pool
with default order-0 pages for their RX-pages.

Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/ti/cpsw.c     |    1 +
 drivers/net/ethernet/ti/cpsw_new.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c2c5bf87da01..58e346ea9898 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -406,6 +406,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		port = priv->emac_port + cpsw->data.dual_emac;
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, port);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 9209e613257d..08e1c5b8f00e 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -348,6 +348,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, priv->emac_port);
 		if (ret != CPSW_XDP_PASS)



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (13 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-22  8:39   ` Jubran, Samih
  2020-04-08 11:51 ` [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
                   ` (18 subsequent siblings)
  33 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Arthur Kiyanovski, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

Frame size ENA_PAGE_SIZE is limited to 16K on systems with a PAGE_SIZE
larger than 16K. Change ENA_XDP_MAX_MTU to also take into account
the reserved tailroom.

Cc: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c |    1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.h |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2cc765df8da3..0fd7db1769f8 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1606,6 +1606,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
 		  "%s qid %d\n", __func__, rx_ring->qid);
 	res_budget = budget;
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = ENA_PAGE_SIZE;
 
 	do {
 		xdp_verdict = XDP_PASS;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 97dfd0c67e84..dd00127dfe9f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -151,8 +151,9 @@
  * The buffer size we share with the device is defined to be ENA_PAGE_SIZE
  */
 
-#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
-				VLAN_HLEN - XDP_PACKET_HEADROOM)
+#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN -	\
+			 VLAN_HLEN - XDP_PACKET_HEADROOM -		\
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define ENA_IS_XDP_INDEX(adapter, index) (((index) >= (adapter)->xdp_first_ring) && \
 	((index) < (adapter)->xdp_first_ring + (adapter)->xdp_num_queues))
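
The effect of reserving tailroom on the maximum XDP MTU can be worked
through with a small user-space sketch. The function and macro names
are hypothetical, and the 320 bytes used for the aligned
skb_shared_info is an assumption (the size mentioned in the cover
letter); the header lengths and XDP headroom are the usual kernel
values.

```c
#define ENA_SKETCH_ETH_HLEN       14u     /* Ethernet header */
#define ENA_SKETCH_ETH_FCS_LEN    4u      /* frame check sequence */
#define ENA_SKETCH_VLAN_HLEN      4u      /* one VLAN tag */
#define ENA_SKETCH_XDP_HEADROOM   256u    /* XDP_PACKET_HEADROOM */
#define ENA_SKETCH_SHINFO_ALIGNED 320u    /* assumed aligned skb_shared_info */
#define ENA_SKETCH_SZ_16K         16384u

/* Frame size is the page size, capped at 16K */
static unsigned int ena_sketch_page_size(unsigned int page_size)
{
	return page_size > ENA_SKETCH_SZ_16K ? ENA_SKETCH_SZ_16K : page_size;
}

/* Max MTU before the patch: no tailroom reserved */
static unsigned int ena_sketch_max_mtu_old(unsigned int page_size)
{
	return ena_sketch_page_size(page_size) - ENA_SKETCH_ETH_HLEN -
	       ENA_SKETCH_ETH_FCS_LEN - ENA_SKETCH_VLAN_HLEN -
	       ENA_SKETCH_XDP_HEADROOM;
}

/* Max MTU after the patch: tailroom for skb_shared_info is subtracted */
static unsigned int ena_sketch_max_mtu_new(unsigned int page_size)
{
	return ena_sketch_max_mtu_old(page_size) - ENA_SKETCH_SHINFO_ALIGNED;
}
```

With a 4096-byte page the limit drops from 3818 to 3498 under these
assumptions; with 64K pages the 16K cap applies first.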



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (14 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
@ 2020-04-08 11:51 ` Jesper Dangaard Brouer
  2020-04-08 12:57   ` Tariq Toukan
  2020-04-08 11:52 ` [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
                   ` (17 subsequent siblings)
  33 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:51 UTC (permalink / raw)
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

The mlx4 driver stores the size of the memory backing the RX packet in
frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096).
For normal mode frag_stride is 2048.

Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    3 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 43dcbd8214c6..5bd3cd37d50f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -51,7 +51,8 @@
 #include "en_port.h"
 
 #define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
-				   XDP_PACKET_HEADROOM))
+				XDP_PACKET_HEADROOM -			    \
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info))))
 
 int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index db3552f2d087..231f08c0276c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -683,6 +683,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(ring->xdp_prog);
 	xdp.rxq = &ring->xdp_rxq;
+	xdp.frame_sz = priv->frag_info[0].frag_stride;
 	doorbell_pending = 0;
 
 	/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (15 preceding siblings ...)
  2020-04-08 11:51 ` [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 12:52   ` Tariq Toukan
  2020-04-09  9:28   ` Maxim Mikityanskiy
  2020-04-08 11:52 ` [PATCH RFC v2 18/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
                   ` (16 subsequent siblings)
  33 siblings, 2 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

The mlx5 driver has multiple memory models, which also change
according to whether an XDP bpf_prog is attached.

The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
 # ethtool --set-priv-flags mlx5p2 rx_striding_rq off

In the general case with 4K page_size and regular MTU packets, the
frame_sz is 2048, and 4096 when XDP is enabled, in both modes.

The info on the given frame size is stored differently depending on the
RQ-mode and encoded in the wqe/mpwqe union in struct mlx5e_rq.
In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
in rq->wqe.info.arr[0].frag_stride.

To reduce the effect on the fast-path, this patch determines the frame_sz
at setup time, to avoid determining the memory model at runtime.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ++++
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 12a61bf82c14..1f280fc142ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -651,6 +651,7 @@ struct mlx5e_rq {
 	struct {
 		u16            umem_headroom;
 		u16            headroom;
+		u32            frame_sz;
 		u8             map_dir;   /* dma map direction */
 	} buff;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f049e0ac308a..de4ad2c9f49a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	if (xsk)
 		xdp.handle = di->xsk.handle;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = rq->buff.frame_sz;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
 	if (xsk) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index dd7f338425eb..b9595315c45b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 		rq->mpwqe.num_strides =
 			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
 
+		rq->buff.frame_sz = (1 << rq->mpwqe.log_stride_sz);
+
 		err = mlx5e_create_rq_umr_mkey(mdev, rq);
 		if (err)
 			goto err_rq_wq_destroy;
@@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
 
 		rq->wqe.info = rqp->frags_info;
+		rq->buff.frame_sz = rq->wqe.info.arr[0].frag_stride;
+
 		rq->wqe.frags =
 			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
 					(wq_sz << rq->wqe.info.log_num_frags)),
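
The setup-time approach described in this patch can be sketched in
user space as below. The enum, struct and function names are
hypothetical simplifications of the driver's types; the point is only
that the mode-dependent lookup happens once, and the fast path reads a
cached value.

```c
/* Simplified stand-ins for the two RQ modes (illustration only) */
enum mlx5_sketch_wq_type {
	MLX5_SKETCH_WQ_STRIDING, /* models MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ */
	MLX5_SKETCH_WQ_CYCLIC,   /* models MLX5_WQ_TYPE_CYCLIC */
};

struct mlx5_sketch_rq {
	enum mlx5_sketch_wq_type wq_type;
	unsigned char log_stride_sz; /* striding mode: log2 of stride size */
	unsigned int frag_stride;    /* cyclic mode: stride of first frag */
	unsigned int frame_sz;       /* cached at setup, read in fast path */
};

/* Setup time: resolve the memory model once and cache frame_sz */
static void mlx5_sketch_setup_frame_sz(struct mlx5_sketch_rq *rq)
{
	switch (rq->wq_type) {
	case MLX5_SKETCH_WQ_STRIDING:
		rq->frame_sz = 1u << rq->log_stride_sz;
		break;
	case MLX5_SKETCH_WQ_CYCLIC:
		rq->frame_sz = rq->frag_stride;
		break;
	}
}

/* Fast path: no branching on the memory model */
static unsigned int mlx5_sketch_fastpath_frame_sz(const struct mlx5_sketch_rq *rq)
{
	return rq->frame_sz;
}
```

A log_stride_sz of 11 or 12 yields 2048 or 4096 respectively, matching
the values given in the commit message.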



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 18/33] net: thunderx: add XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (16 preceding siblings ...)
  2020-04-08 11:52 ` [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 11:52 ` [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Sunil Goutham, Robert Richter, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

To help reviewers, these are the defines related to RCV_FRAG_LEN:

 #define DMA_BUFFER_LEN	1536 /* In multiples of 128bytes */
 #define RCV_FRAG_LEN	(SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \
			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: Robert Richter <rrichter@marvell.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b4b33368698f..2ba0ce115e63 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -552,6 +552,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + len;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM;
 	orig_data = xdp.data;
 
 	rcu_read_lock();
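
As a worked example of the defines quoted in the commit message, the
following user-space sketch evaluates RCV_FRAG_LEN and the resulting
frame_sz. The names are hypothetical, and the 64-byte cache line,
64-byte NET_SKB_PAD and 320-byte skb_shared_info are assumptions that
depend on the architecture and kernel configuration.

```c
#define NICVF_SKETCH_CACHE_BYTES    64u   /* assumed SMP_CACHE_BYTES */
#define NICVF_SKETCH_NET_SKB_PAD    64u   /* assumed NET_SKB_PAD */
#define NICVF_SKETCH_SHINFO_SIZE    320u  /* assumed sizeof(struct skb_shared_info) */
#define NICVF_SKETCH_DMA_BUFFER_LEN 1536u /* DMA_BUFFER_LEN from the driver */
#define NICVF_SKETCH_XDP_HEADROOM   256u  /* XDP_PACKET_HEADROOM */

/* Round up to a cache line, modelled on the kernel's SKB_DATA_ALIGN() */
static unsigned int nicvf_sketch_align(unsigned int x)
{
	return (x + NICVF_SKETCH_CACHE_BYTES - 1) &
	       ~(NICVF_SKETCH_CACHE_BYTES - 1);
}

/* Mirrors the RCV_FRAG_LEN define: data area plus shared-info area */
static unsigned int nicvf_sketch_rcv_frag_len(void)
{
	return nicvf_sketch_align(NICVF_SKETCH_DMA_BUFFER_LEN +
				  NICVF_SKETCH_NET_SKB_PAD) +
	       nicvf_sketch_align(NICVF_SKETCH_SHINFO_SIZE);
}

/* Mirrors the frame_sz assignment added by the patch */
static unsigned int nicvf_sketch_frame_sz(void)
{
	return nicvf_sketch_rcv_frag_len() + NICVF_SKETCH_XDP_HEADROOM;
}
```

Under these assumptions RCV_FRAG_LEN evaluates to 1920 bytes and the
frame_sz to 2176 bytes.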



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (17 preceding siblings ...)
  2020-04-08 11:52 ` [PATCH RFC v2 18/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 17:53   ` Jakub Kicinski
  2020-04-08 11:52 ` [PATCH RFC v2 20/33] tun: add XDP frame size Jesper Dangaard Brouer
                   ` (14 subsequent siblings)
  33 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Jakub Kicinski, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

The Netronome nfp driver already has a true_bufsz variable
that contains what is needed for xdp.frame_sz.

Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9bfb3b077bc1..b9b8c30eab33 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1817,6 +1817,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(dp->xdp_prog);
 	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
+	xdp.frame_sz = true_bufsz;
 	xdp.rxq = &rx_ring->xdp_rxq;
 	tx_ring = r_vec->xdp_ring;
 



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 20/33] tun: add XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (18 preceding siblings ...)
  2020-04-08 11:52 ` [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 11:52 ` [PATCH RFC v2 21/33] vhost_net: also populate " Jesper Dangaard Brouer
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The tun driver has two code paths for running XDP (bpf_prog_run_xdp).
In both cases 'buflen' contains enough tailroom for skb_shared_info.
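
To make the tailroom statement concrete, here is a minimal user-space
sketch. It assumes that buflen is the aligned packet area plus an aligned
skb_shared_info area, and that the reserved area sits at the end of the
frame. The 64-byte alignment, the 320-byte skb_shared_info size and all
function names are assumptions for illustration, not code from tun.c.

```c
#include <assert.h>

#define CACHE_LINE_BYTES  64u   /* assumed SMP_CACHE_BYTES */
#define SHARED_INFO_BYTES 320u  /* assumed sizeof(struct skb_shared_info) */

/* Round x up to a multiple of the cache line size. */
static unsigned int data_align(unsigned int x)
{
	return (x + CACHE_LINE_BYTES - 1) & ~(CACHE_LINE_BYTES - 1);
}

/* Assumed buflen layout: headroom (pad) + packet, then reserved tail. */
static unsigned int sketch_buflen(unsigned int len, unsigned int pad)
{
	return data_align(len + pad) + data_align(SHARED_INFO_BYTES);
}

/* Bytes the packet tail could grow before touching the reserved area.
 * Returns -1 when frame_sz cannot even hold the current layout.
 */
static int grow_room(unsigned int frame_sz, unsigned int pad,
		     unsigned int len)
{
	unsigned int reserved = data_align(SHARED_INFO_BYTES);

	if (frame_sz < reserved + pad + len)
		return -1;
	return (int)(frame_sz - reserved - pad - len);
}
```

With a 1000-byte packet and 256 bytes of headroom this gives a 1600-byte
buffer and 24 bytes of room to grow, which is only the alignment slack.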

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/tun.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 228fe449dc6d..8351bb287a05 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &tfile->xdp_rxq;
+		xdp.frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		if (act == XDP_REDIRECT || act == XDP_TX) {
@@ -2408,6 +2409,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		}
 		xdp_set_data_meta_invalid(xdp);
 		xdp->rxq = &tfile->xdp_rxq;
+		xdp->frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 		err = tun_xdp_act(tun, xdp_prog, xdp, act);



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 21/33] vhost_net: also populate XDP frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (19 preceding siblings ...)
  2020-04-08 11:52 ` [PATCH RFC v2 20/33] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 11:52 ` [PATCH RFC v2 22/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
has an embedded struct tun_xdp_hdr (located at xdp->data_hard_start)
which contains the buffer length 'buflen' (with tailroom for
skb_shared_info). Also storing this buflen in xdp->frame_sz does not
make struct tun_xdp_hdr obsolete, as it also contains a struct
virtio_net_hdr with other information.

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/vhost/net.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 18e205eeb9af..5bf608766fc2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -745,6 +745,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	xdp->data = buf + pad;
 	xdp->data_end = xdp->data + len;
 	hdr->buflen = buflen;
+	xdp->frame_sz = buflen;
 
 	--net->refcnt_bias;
 	alloc_frag->offset += buflen;



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 22/33] virtio_net: add XDP frame size in two code paths
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (20 preceding siblings ...)
  2020-04-08 11:52 ` [PATCH RFC v2 21/33] vhost_net: also populate " Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 11:52 ` [PATCH RFC v2 23/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The virtio_net driver runs inside the guest OS. There are two
XDP receive code paths in virtio_net, namely receive_small() and
receive_mergeable(). The receive_big() function does not support XDP.

In receive_small() the frame size is available in buflen. The buffer
backing these frames is allocated in add_recvbuf_small() with the same
size, except for the headroom, and the tailroom has room reserved for
skb_shared_info. The headroom is encoded in the ctx pointer as a value.

In receive_mergeable() the frame size is more dynamic. There are two
basic cases: (1) the buffer size is based on an exponentially weighted
moving average (see DECLARE_EWMA) of the packet length, or (2) if
virtnet_get_headroom() returns a nonzero headroom, the buffer size is
PAGE_SIZE. This time the ctx pointer encodes two values: the buffer
length "truesize" and the headroom. In case (1), if the RX buffer
size is underestimated, the packet will have been split over several
buffers (see the num_buf info in virtio_net_hdr_mrg_rxbuf, placed at the
top of the buffer area). If that happens, the XDP path does an
xdp_linearize_page operation.
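
The ctx encoding and the frame size choice described above can be
sketched in user space as follows. The 22-bit split between truesize and
headroom, the 4096-byte page size and all function names are assumptions
for illustration. The driver's own helpers are mergeable_ctx_to_truesize()
and mergeable_ctx_to_headroom().

```c
#include <assert.h>
#include <stdint.h>

#define CTX_HEADROOM_SHIFT 22    /* assumed bit position of the headroom */
#define ASSUMED_PAGE_SIZE  4096u /* assumed PAGE_SIZE */

/* Pack truesize (low bits) and headroom (high bits) into one value. */
static uintptr_t len_to_ctx(unsigned int truesize, unsigned int headroom)
{
	return ((uintptr_t)headroom << CTX_HEADROOM_SHIFT) | truesize;
}

static unsigned int ctx_to_headroom(uintptr_t ctx)
{
	return (unsigned int)(ctx >> CTX_HEADROOM_SHIFT);
}

static unsigned int ctx_to_truesize(uintptr_t ctx)
{
	return (unsigned int)(ctx & (((uintptr_t)1 << CTX_HEADROOM_SHIFT) - 1));
}

/* Frame size selection as in the patch: a linearized packet lives in a
 * fresh page, buffers with headroom were allocated as PAGE_SIZE, and
 * otherwise the EWMA-based truesize applies.
 */
static unsigned int mergeable_frame_sz(uintptr_t ctx, int linearized)
{
	if (linearized)
		return ASSUMED_PAGE_SIZE;
	return ctx_to_headroom(ctx) ? ASSUMED_PAGE_SIZE : ctx_to_truesize(ctx);
}
```

A buffer encoded with truesize 1536 and no headroom yields frame_sz 1536,
while any buffer with headroom, or any linearized packet, yields the
assumed page size.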

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/virtio_net.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11f722460513..1df3676da185 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		xdp.data_end = xdp.data + len;
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = buflen;
 		orig_data = xdp.data;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	int offset = buf - page_address(page);
 	struct sk_buff *head_skb, *curr_skb;
 	struct bpf_prog *xdp_prog;
-	unsigned int truesize;
+	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	int err;
 	unsigned int metasize = 0;
+	unsigned int frame_sz;
+	int err;
 
 	head_skb = NULL;
 	stats->bytes += len - vi->hdr_len;
@@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Buffers with headroom use PAGE_SIZE as alloc size,
+		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		 */
+		frame_sz = headroom ? PAGE_SIZE : truesize;
+
 		/* This happens when rx buffer size is underestimated
 		 * or headroom is not enough because of the buffer
 		 * was refilled before XDP is set. This should only
@@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 						      page, offset,
 						      VIRTIO_XDP_HEADROOM,
 						      &len);
+			frame_sz = PAGE_SIZE;
+
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
@@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = frame_sz;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
-	truesize = mergeable_ctx_to_truesize(ctx);
 	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 			 dev->name, len, (unsigned long)ctx);



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 23/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (21 preceding siblings ...)
  2020-04-08 11:52 ` [PATCH RFC v2 22/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-04-08 11:52 ` Jesper Dangaard Brouer
  2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: Jeff Kirsher, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

The ixgbe driver uses a different memory model when compiled on archs
with PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page
in two halves, but instead increments rx_buffer->page_offset by the
truesize of the packet (which includes headroom and tailroom for
skb_shared_info).

This is done correctly in ixgbe_build_skb(), but ixgbe_rx_buffer_flip(),
which is currently only called on XDP_TX and XDP_REDIRECT, forgets
to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT for
veth and cpumap. Fix by adding the size of the skb_shared_info tailroom.

Maintainers notice: This fix has been queued to Jeff.
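
To illustrate why the missing tailroom matters, the sketch below compares
the page_offset advance before and after the fix. It assumes the
skb_shared_info area is placed directly after the aligned packet area.
The 64-byte pad (a stand-in for IXGBE_SKB_PAD), the 64-byte alignment,
the 320-byte skb_shared_info size and the function names are all
hypothetical.

```c
#include <assert.h>

#define CACHE_LINE_BYTES  64u   /* assumed SMP_CACHE_BYTES */
#define SHARED_INFO_BYTES 320u  /* assumed sizeof(struct skb_shared_info) */
#define SKB_PAD_BYTES     64u   /* hypothetical stand-in for IXGBE_SKB_PAD */

static unsigned int data_align(unsigned int x)
{
	return (x + CACHE_LINE_BYTES - 1) & ~(CACHE_LINE_BYTES - 1);
}

/* Advance used by the flip function before the fix (build_skb ring). */
static unsigned int truesize_before_fix(unsigned int size)
{
	return data_align(SKB_PAD_BYTES + size);
}

/* Advance after the fix: also skip the skb_shared_info tailroom. */
static unsigned int truesize_after_fix(unsigned int size)
{
	return data_align(SKB_PAD_BYTES + size) +
	       data_align(SHARED_INFO_BYTES);
}

/* Returns 1 if the next RX buffer would start inside the area that a
 * later SKB conversion uses for this frame's skb_shared_info.
 */
static int next_buffer_overlaps(unsigned int page_offset, unsigned int size,
				unsigned int advance)
{
	unsigned int shinfo_end = page_offset +
				  data_align(SKB_PAD_BYTES + size) +
				  data_align(SHARED_INFO_BYTES);

	return page_offset + advance < shinfo_end;
}
```

For a 1500-byte packet the old advance is 1600 bytes and the next buffer
overlaps the reserved area. The fixed advance is 1920 bytes and does not.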

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 718931d951bc..ea6834bae04c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2254,7 +2254,8 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 	rx_buffer->page_offset ^= truesize;
 #else
 	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
 				SKB_DATA_ALIGN(size);
 
 	rx_buffer->page_offset += truesize;



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 24/33] ixgbe: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52   ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that for
a normal MTU the frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, the driver instead advances its rx_buffer->page_offset by the frame
size "truesize".

For XDP frame size calculations, this means that in PAGE_SIZE larger
than 4K mode the frame_sz changes on a per-packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz only depends on the rx_ring setup
and can be updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There is zero headroom in this mode, but headroom
is a requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and the
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.
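
The two memory models can be summarized in a small user-space sketch. The
patch selects between them at compile time with #if on PAGE_SIZE. The
sketch takes the page size as a runtime parameter so both branches can be
exercised. The pad, alignment and skb_shared_info sizes and the function
name are assumptions for illustration.

```c
#include <assert.h>

#define CACHE_LINE_BYTES  64u   /* assumed SMP_CACHE_BYTES */
#define SHARED_INFO_BYTES 320u  /* assumed sizeof(struct skb_shared_info) */
#define SKB_PAD_BYTES     64u   /* hypothetical stand-in for IXGBE_SKB_PAD */

static unsigned int data_align(unsigned int x)
{
	return (x + CACHE_LINE_BYTES - 1) & ~(CACHE_LINE_BYTES - 1);
}

/* page_order: 0 for normal MTU, 1 when order-1 pages are allocated.
 * uses_build_skb: 0 models legacy-rx mode (no headroom, no tailroom).
 */
static unsigned int rx_frame_truesize(unsigned int page_size,
				      unsigned int page_order,
				      int uses_build_skb, unsigned int size)
{
	/* Page-split model: each frame owns half of the allocation. */
	if (page_size < 8192)
		return (page_size << page_order) / 2;

	/* Large-page model: frame size follows the packet length. */
	if (uses_build_skb)
		return data_align(SKB_PAD_BYTES + size) +
		       data_align(SHARED_INFO_BYTES);
	return data_align(size);
}
```

With 4K pages the result is 2048 bytes (or 4096 with order-1 pages)
regardless of packet length, which is why frame_sz can be set once before
the loop. With 64K pages a 1500-byte packet gives 1920 bytes on a
build_skb ring and 1536 bytes in legacy-rx mode.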

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea6834bae04c..eab5934b04f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2244,20 +2244,30 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
+					    unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 				 struct ixgbe_rx_buffer *rx_buffer,
 				 unsigned int size)
 {
+	unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
 #if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
-
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
-				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2291,6 +2301,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
@@ -2324,7 +2339,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbe_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 25/33] ixgbevf: add XDP frame size to VF driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52   ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This patch mirrors the changes to ixgbe in the previous patch.

This VF driver doesn't support XDP_REDIRECT, but correct tailroom is
still necessary for the BPF-helper xdp_adjust_tail. In legacy-mode +
larger PAGE_SIZE, due to lacking tailroom, we accept that
xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   34 +++++++++++++++++----
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 4622c4ea2e46..62bc3e3b5b9c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1095,19 +1095,31 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring,
+					      unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbevf_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+       return truesize;
+}
+
 static void ixgbevf_rx_buffer_flip(struct ixgbevf_ring *rx_ring,
 				   struct ixgbevf_rx_buffer *rx_buffer,
 				   unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbevf_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = ixgbevf_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -1125,6 +1137,11 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		struct ixgbevf_rx_buffer *rx_buffer;
 		union ixgbe_adv_rx_desc *rx_desc;
@@ -1157,7 +1174,10 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbevf_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbevf_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH RFC v2 26/33] i40e: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52   ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that for
a normal MTU the frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, the driver instead advances its rx_buffer->page_offset by the frame
size "truesize".

For XDP frame size calculations, this means that in PAGE_SIZE larger
than 4K mode the frame_sz changes on a per-packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz only depends on the rx_ring setup
and can be updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There is zero headroom in this mode, but headroom
is a requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and the
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.
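
The legacy-rx limitation can be pictured with a minimal sketch of the
kind of headroom check the message attributes to convert_to_xdp_frame.
It assumes the xdp_frame header is stored in the packet headroom, after
any metadata. The 40-byte header size is a placeholder, not the real
sizeof(struct xdp_frame), and the function name is hypothetical.

```c
#include <assert.h>

#define XDP_FRAME_HDR_BYTES 40u /* placeholder for sizeof(struct xdp_frame) */

/* Returns 1 when the headroom left after metadata can hold the
 * xdp_frame header, 0 when redirect would have to fail.
 */
static int headroom_fits_xdp_frame(unsigned int headroom,
				   unsigned int metasize)
{
	if (metasize > headroom)
		return 0;
	return (headroom - metasize) >= XDP_FRAME_HDR_BYTES;
}
```

With zero headroom, as in legacy-rx mode, the check fails and a redirect
cannot proceed. With the 192-byte headroom mentioned above it passes
unless metadata consumes most of that space.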

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   31 +++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b8496037ef7f..1fb6b1004dcb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1507,6 +1507,23 @@ static inline unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
 	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
 }
 
+static inline unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring,
+						  unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = i40e_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = i40e_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(size + i40e_rx_offset(rx_ring)) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
+
 /**
  * i40e_alloc_mapped_page - recycle or make a new page
  * @rx_ring: ring to use
@@ -2246,13 +2263,11 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 				struct i40e_rx_buffer *rx_buffer,
 				unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = i40e_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(i40e_rx_offset(rx_ring) + size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2335,6 +2350,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	bool failure = false;
 	struct xdp_buff xdp;
 
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
+#endif
 	xdp.rxq = &rx_ring->xdp_rxq;
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
@@ -2389,7 +2407,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 			xdp.data_hard_start = xdp.data -
 					      i40e_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depends on the packet length */
+			xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = i40e_run_xdp(rx_ring, &xdp);
 		}
 




* [PATCH RFC v2 27/33] ice: add XDP frame size to driver
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52   ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that
for a normal MTU the frame size is 2048 bytes (and headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, the driver instead advances its rx_buffer->page_offset by the
frame size "truesize".

For XDP frame size calculations, this means that in PAGE_SIZE larger
than 4K mode the frame_sz changes on a per-packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz depends only on the rx_ring
setup and can be updated once outside the main NAPI loop.
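
The two memory models can be sketched in plain user-space C. This is
only an illustration of the calculation described above, not the driver
code: the cache-line size (64), the skb_shared_info size (320) and the
function frame_truesize() are assumptions, and the page size is a
runtime argument here while the driver selects the branch at compile
time.

```c
/* Illustrative stand-ins, not the kernel's definitions. */
#define CACHE_LINE      64u   /* assumed SMP_CACHE_BYTES */
#define SHARED_INFO_SZ  320u  /* assumed sizeof(struct skb_shared_info) */
#define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))
#define DATA_ALIGN(x)   ALIGN_UP((x), CACHE_LINE)

/*
 * Hypothetical model of the frame "truesize" calculation.
 * page_size:  PAGE_SIZE of the build
 * rx_pg_size: size of the RX page (order-0 or order-1)
 * rx_offset:  headroom in front of packet data (0 in legacy-rx mode)
 * size:       packet length
 */
unsigned int frame_truesize(unsigned int page_size,
			    unsigned int rx_pg_size,
			    unsigned int rx_offset,
			    unsigned int size)
{
	/* Page split mode: every frame gets half of the RX page. */
	if (page_size < 8192)
		return rx_pg_size / 2;

	/* Larger pages, build_skb mode: headroom + data + shared info. */
	if (rx_offset)
		return DATA_ALIGN(rx_offset + size) +
		       DATA_ALIGN(SHARED_INFO_SZ);

	/* Larger pages, legacy-rx mode: no headroom, no reserved tail. */
	return DATA_ALIGN(size);
}
```

With 4K pages the result does not depend on the packet length, which is
why it can be computed once before the NAPI loop. With larger pages it
has to be recomputed for every packet.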

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There is zero headroom in this mode, while
headroom is a requirement for XDP-redirect to work. The conversion to
xdp_frame (convert_to_xdp_frame) will detect this insufficient space,
and the xdp_do_redirect() call will fail. This is deemed acceptable, as
it allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.
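
As a rough sketch of why redirect fails with zero headroom: the
xdp_frame metadata is stored in the packet headroom, so the conversion
has to check that it fits. The function headroom_fits_xdp_frame() and
the 40-byte struct size below are assumptions for illustration, not the
kernel implementation.

```c
#include <stddef.h>

/* Assumed size of struct xdp_frame; the real value depends on the kernel. */
#define MODEL_XDP_FRAME_SZ 40u

/*
 * Hypothetical model of the space check done when converting an
 * xdp_buff to an xdp_frame: the headroom left after the metadata area
 * must be able to hold the xdp_frame struct.
 * Returns 1 if it fits, 0 if the conversion would fail.
 */
int headroom_fits_xdp_frame(size_t headroom, size_t metasize)
{
	if (metasize > headroom)
		return 0;
	return (headroom - metasize) >= MODEL_XDP_FRAME_SZ;
}
```

In legacy-rx mode the headroom is 0, so the check fails and the
redirect is rejected, while XDP_PASS, XDP_DROP and XDP_TX are
unaffected.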

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c |   34 +++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index f67e8362958c..695f86694224 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -423,6 +423,22 @@ static unsigned int ice_rx_offset(struct ice_ring *rx_ring)
 	return 0;
 }
 
+static unsigned int ice_rx_frame_truesize(struct ice_ring *rx_ring,
+					  unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ice_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(ice_rx_offset(rx_ring) + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * ice_run_xdp - Executes an XDP program on initialized xdp_buff
  * @rx_ring: Rx ring
@@ -991,6 +1007,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 	bool failure;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	/* Frame size depends on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ice_rx_frame_truesize(rx_ring, 0);
+#endif
 
 	/* start the loop to process Rx packets bounded by 'budget' */
 	while (likely(total_rx_pkts < (unsigned int)budget)) {
@@ -1038,6 +1058,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		xdp.data_hard_start = xdp.data - ice_rx_offset(rx_ring);
 		xdp.data_meta = xdp.data;
 		xdp.data_end = xdp.data + size;
+#if (PAGE_SIZE > 4096)
+		/* At larger PAGE_SIZE, frame_sz depends on the packet length */
+		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
+#endif
 
 		rcu_read_lock();
 		xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -1051,16 +1075,8 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		if (!xdp_res)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
-			unsigned int truesize;
-
-#if (PAGE_SIZE < 8192)
-			truesize = ice_rx_pg_size(rx_ring) / 2;
-#else
-			truesize = SKB_DATA_ALIGN(ice_rx_offset(rx_ring) +
-						  size);
-#endif
 			xdp_xmit |= xdp_res;
-			ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
+			ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
 		} else {
 			rx_buf->pagecnt_bias++;
 		}




* [PATCH RFC v2 28/33] xdp: for Intel AF_XDP drivers add XDP frame_sz
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 11:52   ` Jesper Dangaard Brouer
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:52 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Björn Töpel, Magnus Karlsson,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Intel drivers implement native AF_XDP zerocopy in separate C-files,
that have their own invocation of bpf_prog_run_xdp(). The setup of
xdp_buff is also handled separately from the normal code path.

This patch updates XDP frame_sz for AF_XDP zerocopy drivers i40e, ice
and ixgbe, as the code changes needed are very similar.  Introduce a
helper function xsk_umem_xdp_frame_sz() for calculating frame size.
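
A minimal user-space sketch of what the helper computes, assuming that
chunk_size_nohr is the umem chunk size with the headroom already
subtracted (as the field name suggests). struct umem_model is a
hypothetical reduced stand-in for struct xdp_umem.

```c
#include <stdint.h>

/* Hypothetical reduced model of struct xdp_umem: only the fields used. */
struct umem_model {
	uint32_t chunk_size_nohr; /* chunk size without the headroom */
	uint32_t headroom;        /* headroom in front of packet data */
};

/* Frame size is the full chunk: headroom plus the area behind it. */
uint32_t umem_xdp_frame_sz(const struct umem_model *umem)
{
	return umem->chunk_size_nohr + umem->headroom;
}
```

Under that assumption, a 2048-byte chunk with 256 bytes of headroom
gives frame_sz = 2048, and the value is the same for every frame of the
umem, so it can be set once before the RX loop.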

Cc: intel-wired-lan@lists.osuosl.org
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   |    2 ++
 drivers/net/ethernet/intel/ice/ice_xsk.c     |    2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |    2 ++
 include/net/xdp_sock.h                       |   11 +++++++++++
 4 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 0b7d29192b2c..2b9184aead5f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -531,12 +531,14 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		struct i40e_rx_buffer *bi;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 8279db15e870..23e5515d4527 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -840,11 +840,13 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_xmit = 0;
 	bool failure = false;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 74b540ebb3dc..a656ee9a1fae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -431,12 +431,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e86ec48ef627..1cd1ec3cea97 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -237,6 +237,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
 	else
 		return address + offset;
 }
+
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return umem->chunk_size_nohr + umem->headroom;
+}
+
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -367,6 +373,11 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
 	return 0;
 }
 
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return 0;
+}
+
 static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	return -EOPNOTSUPP;




* [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (27 preceding siblings ...)
  2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-04-08 11:53 ` Jesper Dangaard Brouer
  2020-04-09  3:31   ` Saeed Mahameed
  2020-04-14  9:56   ` Jesper Dangaard Brouer
  2020-04-08 11:53 ` [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  33 siblings, 2 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:53 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Finally, after all drivers have a frame size, allow BPF-helper
bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.

Remember that helper/macro xdp_data_hard_end has reserved some
tailroom.  Thus, this helper makes sure that the BPF-prog doesn't have
access to this tailroom area.
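
The bounds logic can be modelled in user space as follows. This is a
sketch under assumptions, not the kernel helper: struct xdp_model,
adjust_tail() and the 320-byte reserved tailroom are illustrative, and
all positions are offsets from data_hard_start rather than pointers.

```c
#include <stddef.h>

#define TAILROOM_RESERVED 320u /* assumed reserved skb_shared_info area */
#define MODEL_ETH_HLEN    14u  /* Ethernet header length */

/* Hypothetical model of xdp_buff, using offsets from data_hard_start. */
struct xdp_model {
	size_t data;     /* start of packet data */
	size_t data_end; /* end of packet data */
	size_t frame_sz; /* full frame size, including reserved tailroom */
};

/* Returns 0 and moves data_end on success, -1 (for -EINVAL) otherwise. */
int adjust_tail(struct xdp_model *xdp, long offset)
{
	long data_hard_end = (long)(xdp->frame_sz - TAILROOM_RESERVED);
	long new_end = (long)xdp->data_end + offset;

	/* Growing must not reach into the reserved tailroom. */
	if (new_end > data_hard_end)
		return -1;

	/* Shrinking must leave at least an Ethernet header. */
	if (new_end < (long)(xdp->data + MODEL_ETH_HLEN))
		return -1;

	xdp->data_end = (size_t)new_end;
	return 0;
}
```

For a 2048-byte frame the hard end sits at offset 1728, so a packet can
grow up to that point and no further.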

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |    4 ++--
 net/core/filter.c        |   18 ++++++++++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e29a671d67e..0e5abe991ca3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1969,8 +1969,8 @@ union bpf_attr {
  * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
- * 		only possible to shrink the packet as of this writing,
- * 		therefore *delta* must be a negative integer.
+ * 		possible to both shrink and grow the packet tail.
+ * 		Shrinking is done by passing a negative *delta*.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
diff --git a/net/core/filter.c b/net/core/filter.c
index 7628b947dbc3..4d58a147eed0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3422,12 +3422,26 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
+	void *data_hard_end = xdp_data_hard_end(xdp);
 	void *data_end = xdp->data_end + offset;
 
-	/* only shrinking is allowed for now. */
-	if (unlikely(offset >= 0))
+	/* Notice that xdp_data_hard_end has reserved some tailroom */
+	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
 
+	/* DANGER: ALL drivers MUST be converted to init xdp->frame_sz
+	 * - Adding some chicken checks below
+	 * - Will (likely) not be for upstream
+	 */
+	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
+		WARN(1, "Too small xdp->frame_sz = %u\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
+		WARN(1, "Too BIG xdp->frame_sz = %u\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 




* [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (28 preceding siblings ...)
  2020-04-08 11:53 ` [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-04-08 11:53 ` Jesper Dangaard Brouer
  2020-04-08 21:49   ` David Miller
  2020-04-08 11:53 ` [PATCH RFC v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  33 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:53 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Clear the memory of the tail when a grow happens, because it is too
easy to write an XDP_PASS program that extends the tail, which exposes
this memory to users that can run tcpdump.
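
The idea can be shown on a plain byte buffer. A minimal sketch,
assuming the caller has already done the bounds checks; move_tail() is
a hypothetical name and not a kernel function.

```c
#include <string.h>

/*
 * Move the packet tail by offset bytes and return the new data_end.
 * On grow, the newly exposed bytes are zeroed first so that stale
 * buffer contents never become part of the packet.
 * Assumes the caller has verified that the new end is within bounds.
 */
size_t move_tail(unsigned char *buf, size_t data_end, long offset)
{
	if (offset > 0)
		memset(buf + data_end, 0, (size_t)offset);

	return (size_t)((long)data_end + offset);
}
```

Only the grown region is cleared. Shrinking needs no clearing, as it
just hides bytes that were already packet data.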

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4d58a147eed0..a8674f2a0e24 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3445,6 +3445,11 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 
+	/* Clear memory area on grow, can contain uninit kernel memory */
+	if (offset > 0) {
+		memset(xdp->data_end, 0, offset);
+	}
+
 	xdp->data_end = data_end;
 
 	return 0;




* [PATCH RFC v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp().
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (29 preceding siblings ...)
  2020-04-08 11:53 ` [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-04-08 11:53 ` Jesper Dangaard Brouer
  2020-04-08 11:53 ` [PATCH RFC v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:53 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Update the memory requirements when adding xdp.frame_sz in the BPF
test_run function bpf_prog_test_run_xdp(), which e.g. is used by XDP
selftests.

Specifically add the expected reserved tailroom, but also allocate a
larger memory area to reflect that XDP frames usually come in this
format. Limit the provided packet data size to 4096 minus headroom and
tailroom, as this also reflects a common 3520 bytes MTU limit with XDP.

Note that bpf_test_init already uses a memory allocation method that
clears memory.  Thus, this already guards against leaking uninit kernel
memory.
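
The 3520 figure follows from simple arithmetic, sketched here. The
headroom is XDP_PACKET_HEADROOM (256); the tailroom value of 320 is the
skb_shared_info area size quoted in the cover letter and is an
assumption that may differ between architectures and kernel versions.
The function names are illustrative.

```c
#define MODEL_FRAME_SZ 4096u /* frame size used by the test runner */
#define MODEL_HEADROOM  256u /* XDP_PACKET_HEADROOM */
#define MODEL_TAILROOM  320u /* assumed aligned skb_shared_info size */

/* Largest packet data size that still fits in one frame. */
unsigned int max_test_data_size(void)
{
	return MODEL_FRAME_SZ - MODEL_HEADROOM - MODEL_TAILROOM;
}

/* Returns 1 if a test packet of the given size would be accepted. */
int test_data_size_ok(unsigned int size)
{
	return size <= max_test_data_size();
}
```

So 4096 - 256 - 320 = 3520, and a larger data_size_in is rejected with
-EINVAL before any memory is allocated.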

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/bpf/test_run.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 29dbdd4c29f6..30ba7d38941d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -470,25 +470,34 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
+	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
 	u32 retval, duration;
+	u32 max_data_sz;
 	void *data;
 	int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, XDP_PACKET_HEADROOM + NET_IP_ALIGN, 0);
+	/* XDP has extra tailroom as (most) drivers use full page */
+	max_data_sz = 4096 - headroom - tailroom;
+	if (size > max_data_sz)
+		return -EINVAL;
+
+	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
 	xdp.data_hard_start = data;
-	xdp.data = data + XDP_PACKET_HEADROOM + NET_IP_ALIGN;
+	xdp.data = data + headroom;
 	xdp.data_meta = xdp.data;
 	xdp.data_end = xdp.data + size;
+	xdp.frame_sz = headroom + max_data_sz + tailroom;
 
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp.rxq = &rxqueue->xdp_rxq;
@@ -496,8 +505,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
-	    xdp.data_end != xdp.data + size)
+	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
 		size = xdp.data_end - xdp.data;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
 out:




* [PATCH RFC v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (30 preceding siblings ...)
  2020-04-08 11:53 ` [PATCH RFC v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
@ 2020-04-08 11:53 ` Jesper Dangaard Brouer
  2020-04-08 11:53 ` [PATCH RFC v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  2020-04-08 16:55   ` [Intel-wired-lan] " Alexei Starovoitov
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:53 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

The current selftest for the BPF helper xdp_adjust_tail only shrinks the
tail. Make it clearer that this is a shrink test case.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |    9 +++++-
 .../testing/selftests/bpf/progs/test_adjust_tail.c |   30 --------------------
 .../bpf/progs/test_xdp_adjust_tail_shrink.c        |   30 ++++++++++++++++++++
 3 files changed, 37 insertions(+), 32 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/progs/test_adjust_tail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 3744196d7cba..d258f979d5ef 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -1,9 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 
-void test_xdp_adjust_tail(void)
+void test_xdp_adjust_tail_shrink(void)
 {
-	const char *file = "./test_adjust_tail.o";
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
 	struct bpf_object *obj;
 	char buf[128];
 	__u32 duration, retval, size;
@@ -27,3 +27,8 @@ void test_xdp_adjust_tail(void)
 	      err, errno, retval, size);
 	bpf_object__close(obj);
 }
+
+void test_xdp_adjust_tail(void)
+{
+	test_xdp_adjust_tail_shrink();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_adjust_tail.c b/tools/testing/selftests/bpf/progs/test_adjust_tail.c
deleted file mode 100644
index b7fc85769bdc..000000000000
--- a/tools/testing/selftests/bpf/progs/test_adjust_tail.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0
- * Copyright (c) 2018 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#include <linux/bpf.h>
-#include <linux/if_ether.h>
-#include <bpf/bpf_helpers.h>
-
-int _version SEC("version") = 1;
-
-SEC("xdp_adjust_tail")
-int _xdp_adjust_tail(struct xdp_md *xdp)
-{
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
-	int offset = 0;
-
-	if (data_end - data == 54)
-		offset = 256;
-	else
-		offset = 20;
-	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
-		return XDP_DROP;
-	return XDP_TX;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
new file mode 100644
index 000000000000..c8a7c17b54f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <bpf/bpf_helpers.h>
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail_shrink")
+int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	int offset = 0;
+
+	if (data_end - data == 54) /* sizeof(pkt_v4) */
+		offset = 256; /* shrink too much */
+	else
+		offset = 20;
+	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";




* [PATCH RFC v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (31 preceding siblings ...)
  2020-04-08 11:53 ` [PATCH RFC v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
@ 2020-04-08 11:53 ` Jesper Dangaard Brouer
  2020-04-08 16:55   ` [Intel-wired-lan] " Alexei Starovoitov
  33 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-08 11:53 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

Extend the BPF selftest xdp_adjust_tail with grow-tail tests, which are
added as subtests. The first grow test keeps the same form as the original
shrink test. The second grow test uses the newer bpf_prog_test_run_xattr()
call and does extra checking of the data contents.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |  116 +++++++++++++++++++-
 .../bpf/progs/test_xdp_adjust_tail_grow.c          |   33 ++++++
 2 files changed, 144 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d258f979d5ef..1498627af6e8 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -4,10 +4,10 @@
 void test_xdp_adjust_tail_shrink(void)
 {
 	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, expect_sz;
 	struct bpf_object *obj;
-	char buf[128];
-	__u32 duration, retval, size;
 	int err, prog_fd;
+	char buf[128];
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (CHECK_FAIL(err))
@@ -20,15 +20,121 @@ void test_xdp_adjust_tail_shrink(void)
 	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
 
+	expect_sz = sizeof(pkt_v6) - 20;  /* Test shrink with 20 bytes */
 	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
 				buf, &size, &retval, &duration);
-	CHECK(err || retval != XDP_TX || size != 54,
-	      "ipv6", "err %d errno %d retval %d size %d\n",
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	struct bpf_object *obj;
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	__u32 duration, retval, size, expect_sz;
+	int err, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_DROP,
+	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
+
+	expect_sz = sizeof(pkt_v6) + 40; /* Test grow with 40 bytes */
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6) /* 74 */,
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow2(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	int tailroom = 320; /* SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */
+	struct bpf_object *obj;
+	int err, cnt, i;
+	int max_grow;
+
+	struct bpf_prog_test_run_attr tattr = {
+		.repeat 	= 1,
+		.data_in	= &buf,
+		.data_out	= &buf,
+		.data_size_in	= 0, /* Per test */
+		.data_size_out	= 0, /* Per test */
+	};
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &tattr.prog_fd);
+	if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
+		return;
+
+	/* Test case-64 */
+	memset(buf, 1, sizeof(buf));
+	tattr.data_size_in  =  64; /* Determine test case via pkt size */
+	tattr.data_size_out = 128; /* Limit copy_size */
+	/* Kernel side alloc packet memory area that is zero init */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	CHECK_ATTR(errno != ENOSPC /* Due limit copy_size in bpf_test_finish */
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != 192, /* Expected grow size */
+		   "case-64",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Extra checks for data contents */
+	CHECK_ATTR(tattr.data_size_out != 192
+		   || buf[0]   != 1 ||  buf[63]  != 1  /*  0-63  memset to 1 */
+		   || buf[64]  != 0 ||  buf[127] != 0  /* 64-127 memset to 0 */
+		   || buf[128] != 1 ||  buf[191] != 1, /*128-191 memset to 1 */
+		   "case-64-data",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Test case-128 */
+	memset(buf, 2, sizeof(buf));
+	tattr.data_size_in  = 128; /* Determine test case via pkt size */
+	tattr.data_size_out = sizeof(buf);   /* Copy everything */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	max_grow = 4096 - XDP_PACKET_HEADROOM -	tailroom; /* 3520 */
+	CHECK_ATTR(err
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != max_grow, /* Expect max grow size */
+		   "case-128",
+		   "err %d errno %d retval %d size %d expect-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, max_grow);
+
+	/* Extra checks for data contents: Count grow size, will contain zeros */
+	for (i = 0, cnt = 0; i < sizeof(buf); i++) {
+		if (buf[i] == 0)
+			cnt++;
+	}
+	CHECK_ATTR((cnt != (max_grow - tattr.data_size_in)) /* Grow increase */
+		   || tattr.data_size_out != max_grow, /* Total grow size */
+		   "case-128-data",
+		   "err %d errno %d retval %d size %d grow-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, cnt);
+
 	bpf_object__close(obj);
 }
 
 void test_xdp_adjust_tail(void)
 {
-	test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_shrink"))
+		test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_grow"))
+		test_xdp_adjust_tail_grow();
+	if (test__start_subtest("xdp_adjust_tail_grow2"))
+		test_xdp_adjust_tail_grow2();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
new file mode 100644
index 000000000000..3d66599eee2e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp_adjust_tail_grow")
+int _xdp_adjust_tail_grow(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	unsigned int data_len;
+	int offset = 0;
+
+	/* Data length determine test case */
+	data_len = data_end - data;
+
+	if (data_len == 54) { /* sizeof(pkt_v4) */
+		offset = 4096; /* test too large offset */
+	} else if (data_len == 74) { /* sizeof(pkt_v6) */
+		offset = 40;
+	} else if (data_len == 64) {
+		offset = 128;
+	} else if (data_len == 128) {
+		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+	} else {
+		return XDP_ABORTED; /* No matching test */
+	}
+
+	if (bpf_xdp_adjust_tail(xdp, offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";




* Re: [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-08 11:52 ` [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-04-08 12:52   ` Tariq Toukan
  2020-04-16 12:04     ` Jesper Dangaard Brouer
  2020-04-09  9:28   ` Maxim Mikityanskiy
  1 sibling, 1 reply; 78+ messages in thread
From: Tariq Toukan @ 2020-04-08 12:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Saeed Mahameed, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

Hi Jesper,

Thanks for your patch.
Please see feedback below.

On 4/8/2020 2:52 PM, Jesper Dangaard Brouer wrote:
> The mlx5 driver have multiple memory models, which are also changed
> according to whether a XDP bpf_prog is attached.
> 
> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> 
> On the general case with 4K page_size and regular MTU packet, then
> the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> 
> The info on the given frame size is stored differently depending on the
> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> in rq->wqe.info.arr[0].frag_stride.

Just to clarify: the above description holds as long as we're in the
Linear SKB memory scheme, which is the case when:
1) MTU + headroom + tailroom < PAGE_SIZE, and
2) HW LRO is OFF.

Otherwise, mpwqe.log_stride_sz can be smaller, and frag_stride of 
wqe_info can vary from one index to another.

> 
> To reduce effect on fast-path, this patch determine the frame_sz at
> setup time, to avoid determining the memory model runtime.
> 
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ++++
>   3 files changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 12a61bf82c14..1f280fc142ca 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -651,6 +651,7 @@ struct mlx5e_rq {
>   	struct {
>   		u16            umem_headroom;
>   		u16            headroom;
> +		u32            frame_sz;
>   		u8             map_dir;   /* dma map direction */
>   	} buff;
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index f049e0ac308a..de4ad2c9f49a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	if (xsk)
>   		xdp.handle = di->xsk.handle;
>   	xdp.rxq = &rq->xdp_rxq;
> +	xdp.frame_sz = rq->buff.frame_sz;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
>   	if (xsk) {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index dd7f338425eb..b9595315c45b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   		rq->mpwqe.num_strides =
>   			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
>   
> +		rq->buff.frame_sz = (1 << rq->mpwqe.log_stride_sz);
> +

This is always correct.

>   		err = mlx5e_create_rq_umr_mkey(mdev, rq);
>   		if (err)
>   			goto err_rq_wq_destroy;
> @@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
>   
>   		rq->wqe.info = rqp->frags_info;
> +		rq->buff.frame_sz = rq->wqe.info.arr[0].frag_stride;
> +

This is not always correct.
The size of the last frag for a large MTU might be a full page.
See:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#L2097

However, you won't try to use this value at all in the non-linear SKB 
flow, as it's not compatible with XDP.

Anyway, I'd prefer this value to always be correct, whether or not it's
actually used.
Perhaps rename the field to indicate this?
Something like: single_frame_sz / first_frame_sz ?

>   		rq->wqe.frags =
>   			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
>   					(wq_sz << rq->wqe.info.log_num_frags)),
> 
> 

Thanks,
Tariq


* Re: [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU
  2020-04-08 11:51 ` [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
@ 2020-04-08 12:57   ` Tariq Toukan
  2020-04-14  8:19     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 78+ messages in thread
From: Tariq Toukan @ 2020-04-08 12:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Tariq Toukan, Saeed Mahameed, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi



On 4/8/2020 2:51 PM, Jesper Dangaard Brouer wrote:
> The mlx4 drivers size of memory backing the RX packet is stored in
> frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096).
> For normal mode frag_stride is 2048.
> 
> Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account.
> 
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    3 ++-
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c     |    1 +
>   2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 43dcbd8214c6..5bd3cd37d50f 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -51,7 +51,8 @@
>   #include "en_port.h"
>   
>   #define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
> -				   XDP_PACKET_HEADROOM))
> +				XDP_PACKET_HEADROOM -			    \
> +				SKB_DATA_ALIGN(sizeof(struct skb_shared_info))))
>   
>   int mlx4_en_setup_tc(struct net_device *dev, u8 up)
>   {
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index db3552f2d087..231f08c0276c 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -683,6 +683,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   	rcu_read_lock();
>   	xdp_prog = rcu_dereference(ring->xdp_prog);
>   	xdp.rxq = &ring->xdp_rxq;
> +	xdp.frame_sz = priv->frag_info[0].frag_stride;
>   	doorbell_pending = 0;
>   
>   	/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> 
> 

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Thanks.


* Re: [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size
  2020-04-08 11:50 ` [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
@ 2020-04-08 13:09   ` Lorenzo Bianconi
  2020-04-14  8:07     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 78+ messages in thread
From: Lorenzo Bianconi @ 2020-04-08 13:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Ilias Apalodimas, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Lorenzo Bianconi,
	Saeed Mahameed


> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

> 
> This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that
> can help reduce the number of cache-lines that need to be flushed
> when doing DMA sync for_device. Due to xdp_adjust_tail can grow the
> area accessible to the by the CPU (can possibly write into), then max
> sync length *after* bpf_prog_run_xdp() needs to be taken into account.
> 
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/socionext/netsec.c |   30 ++++++++++++++++++------------
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index a5a0fb60193a..e1f4be4b3d69 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -884,23 +884,28 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
>  			  struct xdp_buff *xdp)
>  {
>  	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
> -	unsigned int len = xdp->data_end - xdp->data;
> +	unsigned int sync, len = xdp->data_end - xdp->data;
>  	u32 ret = NETSEC_XDP_PASS;
> +	struct page *page;
>  	int err;
>  	u32 act;
>  
>  	act = bpf_prog_run_xdp(prog, xdp);
>  
> +	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
> +	sync = xdp->data_end - xdp->data_hard_start - NETSEC_RXBUF_HEADROOM;
> +	sync = max(sync, len);
> +
>  	switch (act) {
>  	case XDP_PASS:
>  		ret = NETSEC_XDP_PASS;
>  		break;
>  	case XDP_TX:
>  		ret = netsec_xdp_xmit_back(priv, xdp);
> -		if (ret != NETSEC_XDP_TX)
> -			page_pool_put_page(dring->page_pool,
> -					   virt_to_head_page(xdp->data), len,
> -					   true);
> +		if (ret != NETSEC_XDP_TX) {
> +			page = virt_to_head_page(xdp->data);
> +			page_pool_put_page(dring->page_pool, page, sync, true);
> +		}
>  		break;
>  	case XDP_REDIRECT:
>  		err = xdp_do_redirect(priv->ndev, xdp, prog);
> @@ -908,9 +913,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
>  			ret = NETSEC_XDP_REDIR;
>  		} else {
>  			ret = NETSEC_XDP_CONSUMED;
> -			page_pool_put_page(dring->page_pool,
> -					   virt_to_head_page(xdp->data), len,
> -					   true);
> +			page = virt_to_head_page(xdp->data);
> +			page_pool_put_page(dring->page_pool, page, sync, true);
>  		}
>  		break;
>  	default:
> @@ -921,8 +925,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
>  		/* fall through -- handle aborts by dropping packet */
>  	case XDP_DROP:
>  		ret = NETSEC_XDP_CONSUMED;
> -		page_pool_put_page(dring->page_pool,
> -				   virt_to_head_page(xdp->data), len, true);
> +		page = virt_to_head_page(xdp->data);
> +		page_pool_put_page(dring->page_pool, page, sync, true);
>  		break;
>  	}
>  
> @@ -936,10 +940,14 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
>  	struct netsec_rx_pkt_info rx_info;
>  	enum dma_data_direction dma_dir;
>  	struct bpf_prog *xdp_prog;
> +	struct xdp_buff xdp;
>  	u16 xdp_xmit = 0;
>  	u32 xdp_act = 0;
>  	int done = 0;
>  
> +	xdp.rxq = &dring->xdp_rxq;
> +	xdp.frame_sz = PAGE_SIZE;
> +
>  	rcu_read_lock();
>  	xdp_prog = READ_ONCE(priv->xdp_prog);
>  	dma_dir = page_pool_get_dma_dir(dring->page_pool);
> @@ -953,7 +961,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
>  		struct sk_buff *skb = NULL;
>  		u16 pkt_len, desc_len;
>  		dma_addr_t dma_handle;
> -		struct xdp_buff xdp;
>  		void *buf_addr;
>  
>  		if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
> @@ -1002,7 +1009,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
>  		xdp.data = desc->addr + NETSEC_RXBUF_HEADROOM;
>  		xdp_set_data_meta_invalid(&xdp);
>  		xdp.data_end = xdp.data + pkt_len;
> -		xdp.rxq = &dring->xdp_rxq;
>  
>  		if (xdp_prog) {
>  			xdp_result = netsec_run_xdp(priv, xdp_prog, &xdp);
> 
> 



* RE: [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver
  2020-04-08 11:51 ` [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-08 14:56   ` Haiyang Zhang
  0 siblings, 0 replies; 78+ messages in thread
From: Haiyang Zhang @ 2020-04-08 14:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Wei Liu, KY Srinivasan, Stephen Hemminger, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed



> -----Original Message-----
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Sent: Wednesday, April 8, 2020 7:52 AM
> To: sameehj@amazon.com
> Cc: Wei Liu <wei.liu@kernel.org>; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Jesper Dangaard Brouer
> <brouer@redhat.com>; netdev@vger.kernel.org; bpf@vger.kernel.org;
> zorik@amazon.com; akiyano@amazon.com; gtzalik@amazon.com; Toke
> Høiland-Jørgensen <toke@redhat.com>; Daniel Borkmann
> <borkmann@iogearbox.net>; Alexei Starovoitov
> <alexei.starovoitov@gmail.com>; John Fastabend
> <john.fastabend@gmail.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; Jeff Kirsher <jeffrey.t.kirsher@intel.com>;
> David Ahern <dsahern@gmail.com>; Willem de Bruijn
> <willemdebruijn.kernel@gmail.com>; Ilias Apalodimas
> <ilias.apalodimas@linaro.org>; Lorenzo Bianconi <lorenzo@kernel.org>;
> Saeed Mahameed <saeedm@mellanox.com>
> Subject: [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver
> 
> The hyperv NIC drivers XDP implementation is rather disappointing as it
> will be a slowdown to enable XDP on this driver, given it will allocate a
> new page for each packet and copy over the payload, before invoking the
> XDP BPF-prog.

As explained when I submitted the XDP support for hv_netvsc: without XDP,
this driver already allocates memory and does a copy for every packet. So
the page allocation for the XDP data buf is not slower than the existing
code path. Also, an optimization that allocates a page only once and
re-uses it within a NAPI cycle will be done.

And my XDP implementation for hv_netvsc transparently passes the xdp_prog
to the associated VF NIC. Many of the Azure VMs are using SRIOV, so the
majority of the data is actually processed directly on the VF driver's XDP
path. So the overhead of the synthetic data path (hv_netvsc) is minimal.

Thanks,
- Haiyang



* Re: [PATCH RFC v2 00/33] XDP extend with knowledge of frame size
  2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-04-08 16:55   ` Alexei Starovoitov
  2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 78+ messages in thread
From: Alexei Starovoitov @ 2020-04-08 16:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Grygorii Strashko, intel-wired-lan, Ariel Elior,
	Andy Gospodarek, K. Y. Srinivasan, Jason Wang, Sunil Goutham,
	Robert Richter, GR-everest-linux-l2, Wei Liu, Magnus Karlsson,
	Ilias Apalodimas, Jeff Kirsher, Arthur Kiyanovski,
	Ioana Radulescu, Alexander Duyck, Björn Töpel,
	Jakub Kicinski, Haiyang Zhang, Lorenzo Bianconi, Mao Wenan,
	Toshiaki Makita, Michael Chan, Saeed Mahameed, Andy Gospodarek,
	Tariq Toukan, Stephen Hemminger, thomas.petazzoni, netdev, bpf,
	zorik, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, John Fastabend, David Ahern, Willem de Bruijn

On Wed, Apr 08, 2020 at 01:50:34PM +0200, Jesper Dangaard Brouer wrote:
> RFC-note: This is only an RFC because net-next is closed.
> - Please ACK patches you like, then I will collect those for later.
> 
> XDP have evolved to support several frame sizes, but xdp_buff was not
> updated with this information. This have caused the side-effect that
> XDP frame data hard end is unknown. This have limited the BPF-helper
> bpf_xdp_adjust_tail to only shrink the packet. This patchset address
> this and add packet tail extend/grow.
> 
> The purpose of the patchset is ALSO to reserve a memory area that can be
> used for storing extra information, specifically for extending XDP with
> multi-buffer support. One proposal is to use same layout as
> skb_shared_info, which is why this area is currently 320 bytes.
> 
> When converting xdp_frame to SKB (veth and cpumap), the full tailroom
> area can now be used and SKB truesize is now correct. For most
> drivers this result in a much larger tailroom in SKB "head" data
> area. The network stack can now take advantage of this when doing SKB
> coalescing. Thus, a good driver test is to use xdp_redirect_cpu from
> samples/bpf/ and do some TCP stream testing.

I did a quick look through the patches. Overall looks good to me.
Nice to see selftests as well.
If you can add an xdp selftest that uses generic xdp on lo or veth
that would be awesome.
I rarely run test_xdp*.sh tests, but run test_progs on every commit.
So having a more comprehensive xdp test as part of test_progs will help us
catch breakage sooner.



* Re: [PATCH RFC v2 28/33] xdp: for Intel AF_XDP drivers add XDP frame_sz
  2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-04-08 17:31     ` Björn Töpel
  -1 siblings, 0 replies; 78+ messages in thread
From: Björn Töpel @ 2020-04-08 17:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: intel-wired-lan, Magnus Karlsson, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Maxim Mikityanskiy

On 2020-04-08 13:52, Jesper Dangaard Brouer wrote:
> Intel drivers implement native AF_XDP zerocopy in separate C-files,
> which have their own invocation of bpf_prog_run_xdp(). The setup of
> xdp_buff is also handled separately from the normal code path.
> 
> This patch updates XDP frame_sz for AF_XDP zerocopy drivers i40e, ice
> and ixgbe, as the code changes needed are very similar.  Introduce a
> helper function xsk_umem_xdp_frame_sz() for calculating frame size.
> 
> Cc: intel-wired-lan@lists.osuosl.org
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks for the patch, Jesper! Note that mlx5 has AF_XDP support as well,
and might need similar changes. Adding Max for input!

For the Intel drivers, and core AF_XDP:
Acked-by: Björn Töpel <bjorn.topel@intel.com>

> ---
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c   |    2 ++
>   drivers/net/ethernet/intel/ice/ice_xsk.c     |    2 ++
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |    2 ++
>   include/net/xdp_sock.h                       |   11 +++++++++++
>   4 files changed, 17 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 0b7d29192b2c..2b9184aead5f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -531,12 +531,14 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>   {
>   	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>   	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
> +	struct xdp_umem *umem = rx_ring->xsk_umem;
>   	unsigned int xdp_res, xdp_xmit = 0;
>   	bool failure = false;
>   	struct sk_buff *skb;
>   	struct xdp_buff xdp;
>   
>   	xdp.rxq = &rx_ring->xdp_rxq;
> +	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
>   
>   	while (likely(total_rx_packets < (unsigned int)budget)) {
>   		struct i40e_rx_buffer *bi;
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 8279db15e870..23e5515d4527 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -840,11 +840,13 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
>   {
>   	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>   	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
> +	struct xdp_umem *umem = rx_ring->xsk_umem;
>   	unsigned int xdp_xmit = 0;
>   	bool failure = false;
>   	struct xdp_buff xdp;
>   
>   	xdp.rxq = &rx_ring->xdp_rxq;
> +	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
>   
>   	while (likely(total_rx_packets < (unsigned int)budget)) {
>   		union ice_32b_rx_flex_desc *rx_desc;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index 74b540ebb3dc..a656ee9a1fae 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -431,12 +431,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>   	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>   	struct ixgbe_adapter *adapter = q_vector->adapter;
>   	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
> +	struct xdp_umem *umem = rx_ring->xsk_umem;
>   	unsigned int xdp_res, xdp_xmit = 0;
>   	bool failure = false;
>   	struct sk_buff *skb;
>   	struct xdp_buff xdp;
>   
>   	xdp.rxq = &rx_ring->xdp_rxq;
> +	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
>   
>   	while (likely(total_rx_packets < budget)) {
>   		union ixgbe_adv_rx_desc *rx_desc;
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index e86ec48ef627..1cd1ec3cea97 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -237,6 +237,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
>   	else
>   		return address + offset;
>   }
> +
> +static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
> +{
> +	return umem->chunk_size_nohr + umem->headroom;
> +}
> +
>   #else
>   static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
> @@ -367,6 +373,11 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
>   	return 0;
>   }
>   
> +static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
> +{
> +	return 0;
> +}
> +
>   static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
>   	return -EOPNOTSUPP;
> 
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-04-08 17:53   ` Jakub Kicinski
  2020-04-09  0:48     ` Saeed Mahameed
  2020-04-14 14:16     ` Jesper Dangaard Brouer
  2020-04-09  0:50   ` Saeed Mahameed
  1 sibling, 2 replies; 78+ messages in thread
From: Jakub Kicinski @ 2020-04-08 17:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

On Wed, 08 Apr 2020 13:50:39 +0200 Jesper Dangaard Brouer wrote:
> XDP has evolved to support several frame sizes, but xdp_buff was not
> updated with this information. The frame size (frame_sz) member of
> xdp_buff is introduced to know the real size of the memory the frame is
> delivered in.
> 
> When introducing this also make it clear that some tailroom is
> reserved/required when creating SKBs using build_skb().
> 
> It would also have been an option to introduce a pointer to
> data_hard_end (with reserved offset). The advantage with frame_sz is
> that (like rxq) drivers only need to set up/assign this value once per
> NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> store frame_sz inside xdp_rxq_info, because it varies per packet as it
> can be based on/depend on packet length.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/net/xdp.h |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 40c6d3398458..99f4374f6214 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -6,6 +6,8 @@
>  #ifndef __LINUX_NET_XDP_H__
>  #define __LINUX_NET_XDP_H__
>  
> +#include <linux/skbuff.h> /* skb_shared_info */
> +
>  /**
>   * DOC: XDP RX-queue information
>   *
> @@ -70,8 +72,23 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	unsigned long handle;
>  	struct xdp_rxq_info *rxq;
> +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/

Perhaps

/* length of packet buffer, starting at data_hard_start */

?

>  };
>  
> +/* Reserve memory area at end-of data area.

I wouldn't say this reserves anything. It just computes the end
pointer, no?

> + *
> + * This macro reserves tailroom in the XDP buffer by limiting the
> + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> + * is used for XDP_PASS, when constructing the SKB via build_skb().
> + */
> +#define xdp_data_hard_end(xdp)				\
> +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

I think it should be said somewhere that the drivers are expected to
DMA map memory up to xdp_data_hard_end(xdp).

> +
> +/* Like skb_shinfo */
> +#define xdp_shinfo(xdp)	((struct skb_shared_info *)(xdp_data_hard_end(xdp)))
> +// XXX: Above likely belongs in later patch
> +
>  struct xdp_frame {
>  	void *data;
>  	u16 len;
> 
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver
  2020-04-08 11:52 ` [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
@ 2020-04-08 17:53   ` Jakub Kicinski
  2020-04-14 14:02     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 78+ messages in thread
From: Jakub Kicinski @ 2020-04-08 17:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

On Wed, 08 Apr 2020 13:52:10 +0200 Jesper Dangaard Brouer wrote:
> The netronome nfp driver already had a true_bufsz variable
> that contains what was needed for xdp.frame_sz.
> 
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  .../net/ethernet/netronome/nfp/nfp_net_common.c    |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> index 9bfb3b077bc1..b9b8c30eab33 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> @@ -1817,6 +1817,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>  	rcu_read_lock();
>  	xdp_prog = READ_ONCE(dp->xdp_prog);
>  	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
> +	xdp.frame_sz = true_bufsz;

Since this matters only with XDP on, can we set it to PAGE_SIZE directly?

But more importantly, the correct value is:

	PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM

as we set hard_start at an offset:

	xdp.data_hard_start = rxbuf->frag + NFP_NET_RX_BUF_HEADROOM;

because the NFP_NET_RX_BUF_HEADROOM area is not DMA mapped.
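
A minimal sketch of the arithmetic behind this suggestion, written as
plain userspace C rather than driver code. The MOCK_ constants are
placeholders chosen for illustration (they are not the real values of
PAGE_SIZE or NFP_NET_RX_BUF_HEADROOM), and both helpers are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; the real PAGE_SIZE and
 * NFP_NET_RX_BUF_HEADROOM depend on the architecture and the driver. */
#define MOCK_PAGE_SIZE   4096u
#define MOCK_RX_HEADROOM 64u

/* Hypothetical helper: size of the buffer as seen from data_hard_start,
 * when data_hard_start sits 'headroom' bytes into the page. */
static unsigned int mock_frame_sz(unsigned int page_size, unsigned int headroom)
{
	assert(headroom <= page_size);
	return page_size - headroom;
}

/* Offset, from the start of the page, at which the frame ends. */
static size_t mock_frame_end(unsigned int headroom, unsigned int frame_sz)
{
	return (size_t)headroom + frame_sz;
}
```

With frame_sz left at the full page size and hard_start at an offset,
the computed end of the frame would land past the end of the page;
subtracting the headroom keeps it at the page boundary.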

>  	xdp.rxq = &rx_ring->xdp_rxq;
>  	tx_ring = r_vec->xdp_ring;

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 26/33] i40e: add XDP frame size to driver
  2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2020-04-08 21:48     ` David Miller
  -1 siblings, 0 replies; 78+ messages in thread
From: David Miller @ 2020-04-08 21:48 UTC (permalink / raw)
  To: brouer
  Cc: sameehj, intel-wired-lan, jeffrey.t.kirsher, alexander.duyck,
	netdev, bpf, zorik, akiyano, gtzalik, toke, borkmann,
	alexei.starovoitov, john.fastabend, dsahern,
	willemdebruijn.kernel, ilias.apalodimas, lorenzo, saeedm

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Wed, 08 Apr 2020 13:52:46 +0200

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index b8496037ef7f..1fb6b1004dcb 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1507,6 +1507,23 @@ static inline unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
>  	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
>  }
>  
> +static inline unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring,
> +						  unsigned int size)

Please don't use inline in foo.c files.  I noticed you properly elided this in
the ice changes so I wonder why it showed up here :-)

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-04-08 11:53 ` [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-04-08 21:49   ` David Miller
  2020-04-14  9:43     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 78+ messages in thread
From: David Miller @ 2020-04-08 21:49 UTC (permalink / raw)
  To: brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik, toke, borkmann,
	alexei.starovoitov, john.fastabend, alexander.duyck,
	jeffrey.t.kirsher, dsahern, willemdebruijn.kernel,
	ilias.apalodimas, lorenzo, saeedm

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Wed, 08 Apr 2020 13:53:06 +0200

> @@ -3445,6 +3445,11 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  	if (unlikely(data_end < xdp->data + ETH_HLEN))
>  		return -EINVAL;
>  
> +	/* Clear memory area on grow, can contain uninit kernel memory */
> +	if (offset > 0) {
> +		memset(xdp->data_end, 0, offset);
> +	}

Single statement basic blocks should elide curly braces.
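
For reference, a sketch of the same hunk with the braces dropped,
wrapped in a userspace mock so it can be compiled on its own. The names
struct mock_xdp_buff and mock_clear_grown_tail() are hypothetical
stand-ins, not kernel definitions:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the xdp_buff fields used here. */
struct mock_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
};

/* Zero the newly exposed bytes when the tail grows (offset > 0), so
 * stale memory is not visible to the program, then move data_end. */
static void mock_clear_grown_tail(struct mock_xdp_buff *xdp, int offset)
{
	assert(xdp && xdp->data_end);

	/* Single-statement body, so no curly braces. */
	if (offset > 0)
		memset(xdp->data_end, 0, (size_t)offset);

	xdp->data_end += offset;
}
```

The mock performs no bounds checking; in the patch those checks come
before the memset().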

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-08 17:53   ` Jakub Kicinski
@ 2020-04-09  0:48     ` Saeed Mahameed
  2020-04-09  1:13       ` Jakub Kicinski
  2020-04-14 14:16     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 78+ messages in thread
From: Saeed Mahameed @ 2020-04-09  0:48 UTC (permalink / raw)
  To: kuba, brouer
  Cc: akiyano, willemdebruijn.kernel, borkmann, jeffrey.t.kirsher,
	john.fastabend, toke, alexei.starovoitov, gtzalik, dsahern,
	sameehj, alexander.duyck, bpf, ilias.apalodimas, zorik, netdev,
	lorenzo

On Wed, 2020-04-08 at 10:53 -0700, Jakub Kicinski wrote:
> On Wed, 08 Apr 2020 13:50:39 +0200 Jesper Dangaard Brouer wrote:
> > XDP has evolved to support several frame sizes, but xdp_buff was not
> > updated with this information. The frame size (frame_sz) member of
> > xdp_buff is introduced to know the real size of the memory the frame is
> > delivered in.
> > 
> > When introducing this also make it clear that some tailroom is
> > reserved/required when creating SKBs using build_skb().
> > 
> > It would also have been an option to introduce a pointer to
> > data_hard_end (with reserved offset). The advantage with frame_sz is
> > that (like rxq) drivers only need to set up/assign this value once per
> > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> > store frame_sz inside xdp_rxq_info, because it varies per packet as it
> > can be based on/depend on packet length.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  include/net/xdp.h |   17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 40c6d3398458..99f4374f6214 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -6,6 +6,8 @@
> >  #ifndef __LINUX_NET_XDP_H__
> >  #define __LINUX_NET_XDP_H__
> >  
> > +#include <linux/skbuff.h> /* skb_shared_info */
> > +
> >  /**
> >   * DOC: XDP RX-queue information
> >   *
> > @@ -70,8 +72,23 @@ struct xdp_buff {
> >  	void *data_hard_start;
> >  	unsigned long handle;
> >  	struct xdp_rxq_info *rxq;
> > +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/
> 
> Perhaps
> 
> /* length of packet buffer, starting at data_hard_start */
> 
> ?
> 
> >  };
> >  
> > +/* Reserve memory area at end-of data area.
> 
> I wouldn't say this reserves anything. It just computes the end
> pointer, no?
> 
> > + *
> > + * This macro reserves tailroom in the XDP buffer by limiting the
> > + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> > + * is used for XDP_PASS, when constructing the SKB via build_skb().
> > + */
> > +#define xdp_data_hard_end(xdp)				\
> > +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> 
> I think it should be said somewhere that the drivers are expected to
> DMA map memory up to xdp_data_hard_end(xdp).
> 

But this works on a specific xdp_buff, while drivers work with the MTU.

And what if the driver wants to have this as an option per packet?
I.e., if there is enough tailroom, then build_skb(); otherwise
allocate a new skb, copy headers, set up data frags, etc.

Imposing such limitations on drivers can be very strict; I think the
decision must remain dynamic per frame.

Of course, drivers should optimize to preserve enough tailroom for all
RX packets.

> > +
> > +/* Like skb_shinfo */
> > +#define xdp_shinfo(xdp)	((struct skb_shared_info *)(xdp_data_hard_end(xdp)))
> > +// XXX: Above likely belongs in later patch
> > +
> >  struct xdp_frame {
> >  	void *data;
> >  	u16 len;
> > 
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
  2020-04-08 17:53   ` Jakub Kicinski
@ 2020-04-09  0:50   ` Saeed Mahameed
  2020-04-16 13:02     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 78+ messages in thread
From: Saeed Mahameed @ 2020-04-09  0:50 UTC (permalink / raw)
  To: brouer, sameehj
  Cc: toke, gtzalik, ilias.apalodimas, borkmann, alexander.duyck,
	john.fastabend, akiyano, zorik, alexei.starovoitov, netdev,
	jeffrey.t.kirsher, bpf, dsahern, lorenzo, willemdebruijn.kernel

On Wed, 2020-04-08 at 13:50 +0200, Jesper Dangaard Brouer wrote:
> XDP has evolved to support several frame sizes, but xdp_buff was not
> updated with this information. The frame size (frame_sz) member of
> xdp_buff is introduced to know the real size of the memory the frame is
> delivered in.
> 
> When introducing this also make it clear that some tailroom is
> reserved/required when creating SKBs using build_skb().
> 
> It would also have been an option to introduce a pointer to
> data_hard_end (with reserved offset). The advantage with frame_sz is
> that (like rxq) drivers only need to set up/assign this value once per
> NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> store frame_sz inside xdp_rxq_info, because it varies per packet as it
> can be based on/depend on packet length.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/net/xdp.h |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 40c6d3398458..99f4374f6214 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -6,6 +6,8 @@
>  #ifndef __LINUX_NET_XDP_H__
>  #define __LINUX_NET_XDP_H__
>  
> +#include <linux/skbuff.h> /* skb_shared_info */
> +

I think it is wrong to make xdp.h depend on skbuff.h.
We must keep xdp.h minimal and independent;
the new macros should be defined in skbuff.h.

>  /**
>   * DOC: XDP RX-queue information
>   *
> @@ -70,8 +72,23 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	unsigned long handle;
>  	struct xdp_rxq_info *rxq;
> +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/

Why u32? u16 should be more than enough.

>  };
>  
> +/* Reserve memory area at end-of data area.
> + *
> + * This macro reserves tailroom in the XDP buffer by limiting the
> + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> + * is used for XDP_PASS, when constructing the SKB via build_skb().
> + */
> +#define xdp_data_hard_end(xdp)				\
> +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +

This macro is not safe when unary operators are used in its argument.
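
One possible reading of this concern: a function-like macro expands its
argument textually, and xdp_data_hard_end() uses "xdp" twice, so an
argument with side effects (such as p++) would be evaluated twice. A
minimal userspace sketch of a static inline alternative that evaluates
its argument once follows. The names struct mock_xdp_buff,
MOCK_SHINFO_SZ (taken from the 320 bytes mentioned in the cover letter)
and mock_xdp_data_hard_end() are assumptions for illustration, not the
kernel definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of the reserved tail area; the cover letter cites 320
 * bytes. The patch uses SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). */
#define MOCK_SHINFO_SZ 320u

/* Hypothetical stand-in for the xdp_buff fields used here. */
struct mock_xdp_buff {
	unsigned char *data_hard_start;
	uint32_t frame_sz;
};

/* A function evaluates its argument exactly once, unlike a macro. */
static inline unsigned char *
mock_xdp_data_hard_end(const struct mock_xdp_buff *xdp)
{
	assert(xdp->frame_sz >= MOCK_SHINFO_SZ);
	return xdp->data_hard_start + xdp->frame_sz - MOCK_SHINFO_SZ;
}
```

Calling mock_xdp_data_hard_end(p++) advances p by one element only,
which is the behaviour a caller would expect.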

> +/* Like skb_shinfo */
> +#define xdp_shinfo(xdp)	((struct skb_shared_info *)(xdp_data_hard_end(xdp)))
> +// XXX: Above likely belongs in later patch
> +
>  struct xdp_frame {
>  	void *data;
>  	u16 len;
> 
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-09  0:48     ` Saeed Mahameed
@ 2020-04-09  1:13       ` Jakub Kicinski
  2020-04-09 23:07         ` Saeed Mahameed
  0 siblings, 1 reply; 78+ messages in thread
From: Jakub Kicinski @ 2020-04-09  1:13 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: brouer, akiyano, willemdebruijn.kernel, borkmann,
	jeffrey.t.kirsher, john.fastabend, toke, alexei.starovoitov,
	gtzalik, dsahern, sameehj, alexander.duyck, bpf,
	ilias.apalodimas, zorik, netdev, lorenzo

On Thu, 9 Apr 2020 00:48:30 +0000 Saeed Mahameed wrote:
> > > + * This macro reserves tailroom in the XDP buffer by limiting the
> > > + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> > > + * is used for XDP_PASS, when constructing the SKB via build_skb().
> > > + */
> > > +#define xdp_data_hard_end(xdp)				\
> > > +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> > > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))  
> > 
> > I think it should be said somewhere that the drivers are expected to
> > DMA map memory up to xdp_data_hard_end(xdp).
> >   
> 
> But this works on a specific xdp_buff, while drivers work with the MTU.
> 
> And what if the driver wants to have this as an option per packet?
> I.e., if there is enough tailroom, then build_skb(); otherwise
> allocate a new skb, copy headers, set up data frags, etc.
> 
> Imposing such limitations on drivers can be very strict; I think the
> decision must remain dynamic per frame.
> 
> Of course, drivers should optimize to preserve enough tailroom for all
> RX packets.

My concern is that a driver may allocate a full page for each frame but
only DMA map the amount that can reasonably contain data given the MTU,
to save on DMA syncs.

Today that wouldn't be a problem, because XDP_REDIRECT will re-map the
page, and XDP_TX has the same MTU.

In this set xdp_data_hard_end is used both to find the end of the memory
buffer and the end of the DMA buffer. The implementation of
bpf_xdp_adjust_tail() assumes anything more than
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) from the end is fair game.

So I was trying to say that we should warn driver authors that the DMA
buffer can now grow / move beyond what the driver may expect in XDP_TX.
Drivers can either DMA map enough memory, or handle the corner case in
a special way.

IDK if that makes sense, we may be talking past each other :)
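
A rough sketch of the corner case described above, in userspace C. It
assumes a driver that records how many bytes of each buffer it DMA
mapped; struct mock_rx_buf, dma_mapped_len and
mock_xdp_tx_needs_remap() are hypothetical names, not taken from any
driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer bookkeeping kept by a driver. */
struct mock_rx_buf {
	size_t buf_len;        /* memory backing the frame, e.g. a full page */
	size_t dma_mapped_len; /* bytes actually DMA mapped, sized from the MTU */
};

/* After a tail grow, the packet may end beyond the DMA mapped region
 * even though it still fits in memory. 'pkt_end_off' is the offset of
 * the end of the packet from the start of the buffer. Returns true when
 * XDP_TX cannot reuse the existing mapping, so the driver has to map
 * more memory or handle the packet in a special way. */
static bool mock_xdp_tx_needs_remap(const struct mock_rx_buf *b,
				    size_t pkt_end_off)
{
	assert(pkt_end_off <= b->buf_len);
	return pkt_end_off > b->dma_mapped_len;
}
```

A driver that maps the whole buffer up front would never hit the true
branch; one that maps only an MTU-sized region would.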

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-08 11:53 ` [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-04-09  3:31   ` Saeed Mahameed
  2020-04-14 12:46     ` Jesper Dangaard Brouer
  2020-04-14  9:56   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 78+ messages in thread
From: Saeed Mahameed @ 2020-04-09  3:31 UTC (permalink / raw)
  To: brouer
  Cc: toke, gtzalik, ilias.apalodimas, borkmann, alexander.duyck,
	john.fastabend, akiyano, zorik, alexei.starovoitov, netdev,
	jeffrey.t.kirsher, bpf, dsahern, lorenzo, willemdebruijn.kernel

On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote:
> Finally, after all drivers have a frame size, allow BPF-helper
> bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.
> 

Can you provide a list of use cases showing why tail extension is
necessary?

And what do you have in mind as the immediate use of
bpf_xdp_adjust_tail()?

Neither the cover letter nor the commit messages list any actual use case.

> Remember that helper/macro xdp_data_hard_end has reserved some
> tailroom.  Thus, this helper makes sure that the BPF-prog doesn't have
> access to this tailroom area.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/uapi/linux/bpf.h |    4 ++--
>  net/core/filter.c        |   18 ++++++++++++++++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e29a671d67e..0e5abe991ca3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1969,8 +1969,8 @@ union bpf_attr {
>   * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
>   * 	Description
>   * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
> - * 		only possible to shrink the packet as of this writing,
> - * 		therefore *delta* must be a negative integer.
> + * 		possible to both shrink and grow the packet tail.
> + * 		Shrink done via *delta* being a negative integer.
>   *
>   * 		A call to this helper is susceptible to change the underlying
>   * 		packet buffer. Therefore, at load time, all checks on pointers
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7628b947dbc3..4d58a147eed0 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>  
>  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  {
> +	void *data_hard_end = xdp_data_hard_end(xdp);
>  	void *data_end = xdp->data_end + offset;
>  
> -	/* only shrinking is allowed for now. */
> -	if (unlikely(offset >= 0))
> +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> +	if (unlikely(data_end > data_hard_end))
>  		return -EINVAL;
>  

I don't know if I like this approach, for a couple of reasons.

1. Drivers will provide an arbitrary frame_sz, which is normally larger
than the MTU and could be a full page. For the XDP_TX action this can be
problematic if XDP programs let oversized packets through, only for
them to be caught at the driver level.

2. xdp_data_hard_end(xdp) has a hardcoded assumption about the skb
shinfo, and it introduces a reverse dependency between xdp_buff and
skbuff.

Both of the above can be solved if the drivers provided the max allowed
frame size, already accounting for MTU and shinfo, when setting
xdp_buff.frame_sz at the driver level.
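
A minimal sketch of what such a driver-provided limit could look like,
in userspace C. All names and constants here are assumptions for
illustration: mock_max_frame_sz() is hypothetical, MOCK_SHINFO_SZ uses
the 320 bytes from the cover letter, and the 14-byte Ethernet header
ignores VLAN tags:

```c
#include <assert.h>

#define MOCK_SHINFO_SZ 320u /* assumed reserved tail area (cover letter) */
#define MOCK_ETH_HLEN  14u  /* Ethernet header without VLAN tags */

/* Hypothetical: largest frame size a driver would advertise, measured
 * from data_hard_start. It is capped both by the memory left after
 * reserving the shinfo area and by what the MTU permits on transmit. */
static unsigned int mock_max_frame_sz(unsigned int buf_len,
				      unsigned int headroom,
				      unsigned int mtu)
{
	unsigned int mem_limit, mtu_limit;

	assert(buf_len >= MOCK_SHINFO_SZ);
	mem_limit = buf_len - MOCK_SHINFO_SZ;
	mtu_limit = headroom + MOCK_ETH_HLEN + mtu;

	return mem_limit < mtu_limit ? mem_limit : mtu_limit;
}
```

Under this scheme the bound used by bpf_xdp_adjust_tail() would come
from the driver rather than from sizeof(struct skb_shared_info) in
common code.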


> +	/* DANGER: ALL drivers MUST be converted to init xdp->frame_sz
> +	 * - Adding some chicken checks below
> +	 * - Will (likely) not be for upstream
> +	 */
> +	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
> +		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
> +		return -EINVAL;
> +	}
> +	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
> +		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
> +		return -EINVAL;
> +	}
> +
>  	if (unlikely(data_end < xdp->data + ETH_HLEN))
>  		return -EINVAL;
>  
> 
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-08 11:52 ` [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
  2020-04-08 12:52   ` Tariq Toukan
@ 2020-04-09  9:28   ` Maxim Mikityanskiy
  1 sibling, 0 replies; 78+ messages in thread
From: Maxim Mikityanskiy @ 2020-04-09  9:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Tariq Toukan, Saeed Mahameed, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi

On 2020-04-08 14:52, Jesper Dangaard Brouer wrote:
> The mlx5 driver have multiple memory models, which are also changed
> according to whether a XDP bpf_prog is attached.
> 
> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> 
> On the general case with 4K page_size and regular MTU packet, then
> the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> 
> The info on the given frame size is stored differently depending on the
> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> in rq->wqe.info.arr[0].frag_stride.
> 
> To reduce effect on fast-path, this patch determine the frame_sz at
> setup time, to avoid determining the memory model runtime.
> 
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ++++
>   3 files changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 12a61bf82c14..1f280fc142ca 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -651,6 +651,7 @@ struct mlx5e_rq {
>   	struct {
>   		u16            umem_headroom;
>   		u16            headroom;
> +		u32            frame_sz;
>   		u8             map_dir;   /* dma map direction */
>   	} buff;
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index f049e0ac308a..de4ad2c9f49a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	if (xsk)
>   		xdp.handle = di->xsk.handle;
>   	xdp.rxq = &rq->xdp_rxq;
> +	xdp.frame_sz = rq->buff.frame_sz;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
>   	if (xsk) {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index dd7f338425eb..b9595315c45b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   		rq->mpwqe.num_strides =
>   			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
>   
> +		rq->buff.frame_sz = (1 << rq->mpwqe.log_stride_sz);

I think it won't be correct from the AF_XDP perspective when unaligned 
chunks are in use. As I see from the patches for Intel's drivers, 
xdp.frame_sz is set to chunk_size_nohr + headroom, which is not always a 
power of two.

Moreover, it won't be correct for standard (aligned to a power of two) 
chunks either, because mlx5e_rx_get_linear_frag_sz always rounds up to 
PAGE_SIZE in case of XDP (this usage of striding RQ is somewhat hackish 
when it comes to AF_XDP), so we will end up with frame_sz == PAGE_SIZE.

So, I think we just need to use `xsk->chunk_size` here for frame_sz. The 
same for non-striding RQ.
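
To make the mismatch concrete, here is a small user-space sketch. The 
struct is a simplified stand-in that only mirrors the two umem fields 
used by xsk_umem_xdp_frame_sz() in patch 28/33; the function names and 
the numbers in the note below are made up for illustration.

```c
#include <stdint.h>

/* Simplified stand-in for the umem fields used in patch 28/33. */
struct umem_model {
	uint32_t chunk_size_nohr;	/* chunk size minus headroom */
	uint32_t headroom;
};

/* What this patch computes for striding RQ: always a power of two. */
static uint32_t stride_frame_sz(uint8_t log_stride_sz)
{
	return (uint32_t)1 << log_stride_sz;
}

/* What the Intel AF_XDP drivers compute: the chunk size as configured. */
static uint32_t umem_frame_sz(const struct umem_model *umem)
{
	return umem->chunk_size_nohr + umem->headroom;
}

/* Helper to show when the two approaches cannot agree. */
static int is_power_of_two(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}
```

With a hypothetical unaligned chunk of 2744 + 256 = 3000 bytes, the 
stride-based value would be 2048 or 4096, neither of which matches the 
chunk.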

> +
>   		err = mlx5e_create_rq_umr_mkey(mdev, rq);
>   		if (err)
>   			goto err_rq_wq_destroy;
> @@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
>   
>   		rq->wqe.info = rqp->frags_info;
> +		rq->buff.frame_sz = rq->wqe.info.arr[0].frag_stride;
> +
>   		rq->wqe.frags =
>   			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
>   					(wq_sz << rq->wqe.info.log_num_frags)),
> 
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 28/33] xdp: for Intel AF_XDP drivers add XDP frame_sz
  2020-04-08 17:31     ` [Intel-wired-lan] " Björn Töpel
@ 2020-04-09  9:33       ` Maxim Mikityanskiy
  -1 siblings, 0 replies; 78+ messages in thread
From: Maxim Mikityanskiy @ 2020-04-09  9:33 UTC (permalink / raw)
  To: Björn Töpel, Jesper Dangaard Brouer, sameehj
  Cc: intel-wired-lan, Magnus Karlsson, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed

On 2020-04-08 20:31, Björn Töpel wrote:
> On 2020-04-08 13:52, Jesper Dangaard Brouer wrote:
>> Intel drivers implement native AF_XDP zerocopy in separate C-files,
>> that have its own invocation of bpf_prog_run_xdp(). The setup of
>> xdp_buff is also handled in separately from normal code path.
>>
>> This patch update XDP frame_sz for AF_XDP zerocopy drivers i40e, ice
>> and ixgbe, as the code changes needed are very similar.  Introduce a
>> helper function xsk_umem_xdp_frame_sz() for calculating frame size.
>>
>> Cc: intel-wired-lan@lists.osuosl.org
>> Cc: Björn Töpel <bjorn.topel@intel.com>
>> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> Thanks for the patch, Jesper! Note that mlx5 has AF_XDP support as well,
> and might need similar changes. Adding Max for input!

Thanks for drawing my attention to this series, Björn! I commented 
regarding frame_sz calculation under the mlx5 patch (17/33).

> For the Intel drivers, and core AF_XDP:
> Acked-by: Björn Töpel <bjorn.topel@intel.com>
> 
>> ---
>>   drivers/net/ethernet/intel/i40e/i40e_xsk.c   |    2 ++
>>   drivers/net/ethernet/intel/ice/ice_xsk.c     |    2 ++
>>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |    2 ++
>>   include/net/xdp_sock.h                       |   11 +++++++++++
>>   4 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
>> b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> index 0b7d29192b2c..2b9184aead5f 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
>> @@ -531,12 +531,14 @@ int i40e_clean_rx_irq_zc(struct i40e_ring 
>> *rx_ring, int budget)
>>   {
>>       unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>>       u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
>> +    struct xdp_umem *umem = rx_ring->xsk_umem;
>>       unsigned int xdp_res, xdp_xmit = 0;
>>       bool failure = false;
>>       struct sk_buff *skb;
>>       struct xdp_buff xdp;
>>       xdp.rxq = &rx_ring->xdp_rxq;
>> +    xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
>>       while (likely(total_rx_packets < (unsigned int)budget)) {
>>           struct i40e_rx_buffer *bi;
>> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c 
>> b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> index 8279db15e870..23e5515d4527 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
>> @@ -840,11 +840,13 @@ int ice_clean_rx_irq_zc(struct ice_ring 
>> *rx_ring, int budget)
>>   {
>>       unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>>       u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
>> +    struct xdp_umem *umem = rx_ring->xsk_umem;
>>       unsigned int xdp_xmit = 0;
>>       bool failure = false;
>>       struct xdp_buff xdp;
>>       xdp.rxq = &rx_ring->xdp_rxq;
>> +    xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
>>       while (likely(total_rx_packets < (unsigned int)budget)) {
>>           union ice_32b_rx_flex_desc *rx_desc;
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c 
>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> index 74b540ebb3dc..a656ee9a1fae 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>> @@ -431,12 +431,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector 
>> *q_vector,
>>       unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>>       struct ixgbe_adapter *adapter = q_vector->adapter;
>>       u16 cleaned_count = ixgbe_desc_unused(rx_ring);
>> +    struct xdp_umem *umem = rx_ring->xsk_umem;
>>       unsigned int xdp_res, xdp_xmit = 0;
>>       bool failure = false;
>>       struct sk_buff *skb;
>>       struct xdp_buff xdp;
>>       xdp.rxq = &rx_ring->xdp_rxq;
>> +    xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
>>       while (likely(total_rx_packets < budget)) {
>>           union ixgbe_adv_rx_desc *rx_desc;
>> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
>> index e86ec48ef627..1cd1ec3cea97 100644
>> --- a/include/net/xdp_sock.h
>> +++ b/include/net/xdp_sock.h
>> @@ -237,6 +237,12 @@ static inline u64 xsk_umem_adjust_offset(struct 
>> xdp_umem *umem, u64 address,
>>       else
>>           return address + offset;
>>   }
>> +
>> +static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
>> +{
>> +    return umem->chunk_size_nohr + umem->headroom;
>> +}
>> +

This new function may be used in mlx5 for mlx5e_build_xsk_param.

>>   #else
>>   static inline int xsk_generic_rcv(struct xdp_sock *xs, struct 
>> xdp_buff *xdp)
>>   {
>> @@ -367,6 +373,11 @@ static inline u64 xsk_umem_adjust_offset(struct 
>> xdp_umem *umem, u64 handle,
>>       return 0;
>>   }
>> +static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
>> +{
>> +    return 0;
>> +}
>> +
>>   static inline int __xsk_map_redirect(struct xdp_sock *xs, struct 
>> xdp_buff *xdp)
>>   {
>>       return -EOPNOTSUPP;
>>
>>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-09  1:13       ` Jakub Kicinski
@ 2020-04-09 23:07         ` Saeed Mahameed
  2020-04-09 23:27           ` Jakub Kicinski
  0 siblings, 1 reply; 78+ messages in thread
From: Saeed Mahameed @ 2020-04-09 23:07 UTC (permalink / raw)
  To: kuba
  Cc: akiyano, willemdebruijn.kernel, borkmann, jeffrey.t.kirsher,
	john.fastabend, toke, alexei.starovoitov, gtzalik, dsahern,
	brouer, sameehj, zorik, alexander.duyck, bpf, ilias.apalodimas,
	netdev, lorenzo

On Wed, 2020-04-08 at 18:13 -0700, Jakub Kicinski wrote:
> On Thu, 9 Apr 2020 00:48:30 +0000 Saeed Mahameed wrote:
> > > > + * This macro reserves tailroom in the XDP buffer by limiting the
> > > > + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> > > > + * is used for XDP_PASS, when constructing the SKB via build_skb().
> > > > + */
> > > > +#define xdp_data_hard_end(xdp)				\
> > > > +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> > > > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))  
> > > 
> > > I think it should be said somewhere that the drivers are expected to
> > > DMA map memory up to xdp_data_hard_end(xdp).
> > >   
> > 
> > but this works on a specific xdp buff, drivers work with mtu
> > 
> > and what if the driver want to have this as an option per packet
> > .. 
> > i.e.: if there is enough tail room, then build_skb, otherwise
> > alloc new skb, copy headers, setup data frags.. etc
> > 
> > having such limitations on driver can be very strict, i think the
> > decision must remain dynamic per frame..
> > 
> > of-course drivers should optimize to preserve enough tail room for all
> > rx packets..
> 
> My concern is that driver may allocate a full page for each frame but
> only DMA map the amount that can reasonably contain data given the MTU.
> To save on DMA syncs.
> 
> Today that wouldn't be a problem, because XDP_REDIRECT will re-map the
> page, and XDP_TX has the same MTU.
> 

I am not worried about DMA at all; I am worried about the XDP progs,
which are now allowed to extend packets beyond the MTU and do XDP_TX.
But thinking about this, I just realized that this can already
happen with xdp_adjust_head()..

But as you stated above, this puts a lot of assumptions on how the driver
should DMA RX buffers.

> In this set xdp_data_hard_end is used both to find the end of memory
> buffer, and end of DMA buffer. Implementation of bpf_xdp_adjust_tail()
> assumes anything < SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) from
> the end is fair game.
> 

But why skb_shared_info in particular, though? This assumes someone
needs this tail for building SKBs; looks weird to me.

> So I was trying to say that we should warn driver authors that the DMA
> buffer can now grow / move beyond what the driver may expect in XDP_TX.

Ack, but can we do it by design? I.e., instead of having hardcoded
limits (e.g. SKB_DATA_ALIGN(shinfo)) in bpf_xdp_adjust_tail(), let the
driver provide this, or any other restrictions, e.g. MTU for TX, or
driver-specific memory model restrictions.

> Drivers can either DMA map enough memory, or handle the corner case in
> a special way.
> 
> IDK if that makes sense, we may be talking past each other :)

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-09 23:07         ` Saeed Mahameed
@ 2020-04-09 23:27           ` Jakub Kicinski
  0 siblings, 0 replies; 78+ messages in thread
From: Jakub Kicinski @ 2020-04-09 23:27 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: akiyano, willemdebruijn.kernel, borkmann, jeffrey.t.kirsher,
	john.fastabend, toke, alexei.starovoitov, gtzalik, dsahern,
	brouer, sameehj, zorik, alexander.duyck, bpf, ilias.apalodimas,
	netdev, lorenzo

On Thu, 9 Apr 2020 23:07:42 +0000 Saeed Mahameed wrote:
> > My concern is that driver may allocate a full page for each frame but
> > only DMA map the amount that can reasonably contain data given the MTU.
> > To save on DMA syncs.
> > 
> > Today that wouldn't be a problem, because XDP_REDIRECT will re-map the
> > page, and XDP_TX has the same MTU.
> 
> I am not worried about dma at all, i am worried about the xdp progs
> which are now allowed to extend packets beyond the mtu and do XDP_TX.
> but as i am thinking about this i just realized that this can already
> happen with xdp_adjust_head()..
> 
> but as you stated above this puts a lot of assumptions on how driver
> should dma rx buffs
> 
> > In this set xdp_data_hard_end is used both to find the end of memory
> > buffer, and end of DMA buffer. Implementation of bpf_xdp_adjust_tail()
> > assumes anything < SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) from
> > the end is fair game.
> 
> but why skb_shared_info in particular though ? this assumes someone
> needs this tail for building skbs .. looks weird to me.

Fair, simplifies the internals, I guess.

> > So I was trying to say that we should warn driver authors that the DMA
> > buffer can now grow / move beyond what the driver may expect in XDP_TX.
> 
> Ack, but can we do it by design ? i.e instead of having hardcoded
> limits (e.g. SKB_DATA_ALIGN(shinfo)) in bpf_xdp_adjust_tail(), let the
> driver provide this, or any other restrictions, e.g mtu for tx, or
> driver specific memory model restrictions .. 

Right, actually for NFP we need to add the check already - looking at
the code - the DMA mapping does not cover anything beyond the headroom +
MTU.
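
A rough user-space sketch of the kind of check meant here, to be run 
before XDP_TX. The struct and function are hypothetical and not taken 
from the NFP driver; offsets are relative to the start of the RX buffer, 
and the mapping is assumed to cover headroom plus an MTU-derived length.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical per-ring RX buffer parameters; not from any real driver. */
struct rx_buf_model {
	size_t headroom;	/* bytes in front of the packet data */
	size_t max_data_len;	/* MTU-derived bytes mapped after headroom */
};

/* True if a packet at data_off spanning pkt_len bytes lies entirely
 * inside the region covered by the DMA mapping.
 */
static bool xdp_tx_fits_dma(const struct rx_buf_model *rb,
			    size_t data_off, size_t pkt_len)
{
	size_t mapped_end = rb->headroom + rb->max_data_len;

	return data_off <= mapped_end && pkt_len <= mapped_end - data_off;
}
```

A driver that fails this check could drop the frame or fall back to a 
slower path, which is the "corner case" handling discussed above.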

> > Drivers can either DMA map enough memory, or handle the corner case in
> > a special way.
> > 
> > IDK if that makes sense, we may be talking past each other :)  

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 27/33] ice: add XDP frame size to driver
  2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
  (?)
@ 2020-04-10  0:59   ` kbuild test robot
  2020-04-14 10:39     ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 78+ messages in thread
From: kbuild test robot @ 2020-04-10  0:59 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2300 bytes --]

Hi Jesper,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on bpf-next/master]
[also build test ERROR on bpf/master linus/master next-20200409]
[cannot apply to jkirsher-next-queue/dev-queue v5.6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/XDP-extend-with-knowledge-of-frame-size/20200410-032658
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: sparc-allyesconfig (attached as .config)
compiler: sparc64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.3.0 make.cross ARCH=sparc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_txrx.c: In function 'ice_rx_frame_truesize':
>> drivers/net/ethernet/intel/ice/ice_txrx.c:439:2: error: expected ';' before 'return'
     439 |  return truesize;
         |  ^~~~~~
>> drivers/net/ethernet/intel/ice/ice_txrx.c:440:1: warning: no return statement in function returning non-void [-Wreturn-type]
     440 | }
         | ^

vim +439 drivers/net/ethernet/intel/ice/ice_txrx.c

   425	
   426	static unsigned int ice_rx_frame_truesize(struct ice_ring *rx_ring,
   427						  unsigned int size)
   428	{
   429		unsigned int truesize;
   430	
   431	#if (PAGE_SIZE < 8192)
   432		truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
   433	#else
   434		truesize = ice_rx_offset(rx_ring) ?
   435			SKB_DATA_ALIGN(ice_rx_offset(rx_ring) + size) +
   436			SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
   437			SKB_DATA_ALIGN(size)
   438	#endif
 > 439		return truesize;
 > 440	}
   441	
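
The error comes from a missing ';' after SKB_DATA_ALIGN(size) in the #else
branch, which is only compiled when PAGE_SIZE >= 8192 (as in this sparc
build). Below is a user-space sketch of the corrected function. The ring
accessors are replaced by plain parameters, and the MODEL_* macros are
stand-ins with assumed values, not the kernel definitions.

```c
#define MODEL_PAGE_SIZE   8192u	/* assumption: sparc64-like page size */
#define MODEL_CACHE_BYTES 64u	/* assumption: alignment unit */
#define MODEL_SHINFO_SZ   320u	/* assumption: sizeof(struct skb_shared_info) */
#define MODEL_ALIGN(x) \
	(((x) + MODEL_CACHE_BYTES - 1) & ~(MODEL_CACHE_BYTES - 1))

static unsigned int model_rx_frame_truesize(unsigned int pg_size,
					    unsigned int rx_offset,
					    unsigned int size)
{
	unsigned int truesize;

#if (MODEL_PAGE_SIZE < 8192)
	(void)rx_offset;
	(void)size;
	truesize = pg_size / 2; /* Must be power-of-2 */
#else
	(void)pg_size;
	truesize = rx_offset ?
		MODEL_ALIGN(rx_offset + size) +
		MODEL_ALIGN(MODEL_SHINFO_SZ) :
		MODEL_ALIGN(size); /* this ';' is what line 437 lacks */
#endif
	return truesize;
}
```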

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 61398 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size
  2020-04-08 13:09   ` Lorenzo Bianconi
@ 2020-04-14  8:07     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14  8:07 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: sameehj, Ilias Apalodimas, netdev, bpf, zorik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David Ahern, Willem de Bruijn,
	Lorenzo Bianconi, Saeed Mahameed, brouer

On Wed, 8 Apr 2020 15:09:23 +0200
Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:

> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org>  
> 
> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>

Thanks, collected ACK for next submission.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU
  2020-04-08 12:57   ` Tariq Toukan
@ 2020-04-14  8:19     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14  8:19 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: sameehj, Tariq Toukan, Saeed Mahameed, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, brouer

On Wed, 8 Apr 2020 15:57:00 +0300
Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

Thanks, collected this reviewed-by.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-04-08 21:49   ` David Miller
@ 2020-04-14  9:43     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14  9:43 UTC (permalink / raw)
  To: David Miller
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik, toke, borkmann,
	alexei.starovoitov, john.fastabend, alexander.duyck,
	jeffrey.t.kirsher, dsahern, willemdebruijn.kernel,
	ilias.apalodimas, lorenzo, saeedm, brouer

On Wed, 08 Apr 2020 14:49:14 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Wed, 08 Apr 2020 13:53:06 +0200
> 
> > @@ -3445,6 +3445,11 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> >  	if (unlikely(data_end < xdp->data + ETH_HLEN))
> >  		return -EINVAL;
> >  
> > +	/* Clear memory area on grow, can contain uninit kernel memory */
> > +	if (offset > 0) {
> > +		memset(xdp->data_end, 0, offset);
> > +	}  
> 
> Single statement basic blocks should elide curly braces.

Fixed
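
For reference, a user-space sketch of the grow-and-clear step with the
braces dropped. The function name is made up for illustration; only the
memset() on grow mirrors the patch.

```c
#include <string.h>
#include <stddef.h>

/* Zero the bytes exposed by a tail grow and return the new end pointer.
 * A non-positive offset exposes no new bytes, so nothing is cleared.
 */
static unsigned char *model_grow_tail(unsigned char *data_end, int offset)
{
	/* Clear memory area on grow, can contain uninit kernel memory */
	if (offset > 0)
		memset(data_end, 0, (size_t)offset);

	return data_end + offset;
}
```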

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-08 11:53 ` [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
  2020-04-09  3:31   ` Saeed Mahameed
@ 2020-04-14  9:56   ` Jesper Dangaard Brouer
  2020-04-14 10:11     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14  9:56 UTC (permalink / raw)
  To: sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, brouer


On Wed, 08 Apr 2020 13:53:01 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7628b947dbc3..4d58a147eed0 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>  
>  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  {
> +	void *data_hard_end = xdp_data_hard_end(xdp);
>  	void *data_end = xdp->data_end + offset;
>  
[...]
> +	/* DANGER: ALL drivers MUST be converted to init xdp->frame_sz
> +	 * - Adding some chicken checks below
> +	 * - Will (likely) not be for upstream
> +	 */
> +	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
> +		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
> +		return -EINVAL;
> +	}
> +	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
> +		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
> +		return -EINVAL;
> +	}

Any opinions on the above checks?
Should they be removed or kept?

The idea is to catch drivers that forgot to update xdp_buff->frame_sz,
by doing some sanity checks on this uninit value.  If I correctly
updated all XDP drivers in this patchset, then these checks should be
unnecessary. But would these WARN checks be valuable for driver
developers converting new drivers to XDP?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Help for reviewers:

+/* Reserve memory area at end-of data area.
+ *
+ * This macro reserves tailroom in the XDP buffer by limiting the
+ * XDP/BPF data access to data_hard_end.  Notice same area (and size)
+ * is used for XDP_PASS, when constructing the SKB via build_skb().
+ */
+#define xdp_data_hard_end(xdp)				\
+	((xdp)->data_hard_start + (xdp)->frame_sz -	\
+	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
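
A worked example of the arithmetic, as a user-space sketch. The 320-byte
reservation is the value stated in the cover letter; the function names
are made up, and the inputs are assumed to satisfy
frame_sz >= reserve + headroom + pkt_len.

```c
#include <stddef.h>

/* Assumption: SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) == 320,
 * the value given in the cover letter.
 */
#define MODEL_SHINFO_RESERVE 320u

/* Offset of data_hard_end from data_hard_start for a given frame size. */
static size_t model_hard_end_offset(size_t frame_sz)
{
	return frame_sz - MODEL_SHINFO_RESERVE;
}

/* Bytes by which a program could still grow the packet tail. */
static size_t model_tailroom(size_t frame_sz, size_t headroom, size_t pkt_len)
{
	return model_hard_end_offset(frame_sz) - headroom - pkt_len;
}
```

For a 4096-byte frame the hard end sits at offset 3776; with 256 bytes of
headroom and a 1500-byte packet that leaves 2020 bytes of tailroom.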


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-14  9:56   ` Jesper Dangaard Brouer
@ 2020-04-14 10:11     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 78+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-14 10:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Wed, 08 Apr 2020 13:53:01 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 7628b947dbc3..4d58a147eed0 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>>  
>>  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>>  {
>> +	void *data_hard_end = xdp_data_hard_end(xdp);
>>  	void *data_end = xdp->data_end + offset;
>>  
> [...]
>> +	/* DANGER: ALL drivers MUST be converted to init xdp->frame_sz
>> +	 * - Adding some chicken checks below
>> +	 * - Will (likely) not be for upstream
>> +	 */
>> +	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
>> +		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
>> +		return -EINVAL;
>> +	}
>> +	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
>> +		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
>> +		return -EINVAL;
>> +	}
>
> Any opinions on above checks?
> Should they be removed or kept?
>
> The idea is to catch drivers that forgot to update xdp_buff->frame_sz,
> by doing some sanity checks on this uninit value.  If I correctly
> updated all XDP drivers in this patchset, then these checks should be
> unnecessary, but will this be valuable for driver developers converting
> new drivers to XDP to have these WARN checks?

Hmm, I wonder if there's a way we could have these kinds of checks
available, but disabled by default? A new macro (e.g.,
XDP_CHECK(condition)) that is only enabled when some debug option is
enabled in the kernel build, perhaps? Or just straight ifdef'ing them
out, but maybe a macro would be generally useful?

-Toke
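
A minimal userspace sketch of what such a debug-only check could look
like. XDP_CHECK and XDP_DEBUG are hypothetical names taken from the
suggestion above, not an existing kernel API, and the struct is a
simplified stand-in for xdp_buff that models only the fields the two
quoted checks read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical build-time switch; in a kernel build this would more
 * likely be a Kconfig option. Enabled by default in this sketch. */
#ifndef XDP_DEBUG
#define XDP_DEBUG 1
#endif

#define MODEL_PAGE_SIZE 4096u

#if XDP_DEBUG
/* Evaluates to 1 (and logs) when the expected condition is violated. */
#define XDP_CHECK(cond) \
	((cond) ? 0 : (fprintf(stderr, "XDP_CHECK failed: %s\n", #cond), 1))
#else
/* Compiled out: the condition is type-checked but never evaluated. */
#define XDP_CHECK(cond) ((void)sizeof(!(cond)), 0)
#endif

/* Simplified stand-in for struct xdp_buff (assumption, not the real layout). */
struct xdp_buff_model {
	unsigned char *data_hard_start;
	unsigned char *data_end;
	unsigned int frame_sz;
};

/* Returns 0 when frame_sz looks initialised, -1 otherwise (stands in
 * for -EINVAL in the quoted patch). */
static int model_frame_sz_sane(const struct xdp_buff_model *xdp)
{
	size_t used = (size_t)(xdp->data_end - xdp->data_hard_start);

	if (XDP_CHECK(xdp->frame_sz >= used))            /* "Too small" case */
		return -1;
	if (XDP_CHECK(xdp->frame_sz <= MODEL_PAGE_SIZE)) /* "Too BIG" case */
		return -1;
	return 0;
}
```

With the switch off, both checks collapse to a constant 0 and the
function always returns 0, which is the "disabled by default" behaviour
asked about.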


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 26/33] i40e: add XDP frame size to driver
  2020-04-08 21:48     ` [Intel-wired-lan] " David Miller
@ 2020-04-14 10:16       ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14 10:16 UTC (permalink / raw)
  To: David Miller
  Cc: sameehj, intel-wired-lan, jeffrey.t.kirsher, alexander.duyck,
	netdev, bpf, zorik, akiyano, gtzalik, toke, borkmann,
	alexei.starovoitov, john.fastabend, dsahern,
	willemdebruijn.kernel, ilias.apalodimas, lorenzo, saeedm, brouer

On Wed, 08 Apr 2020 14:48:45 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Wed, 08 Apr 2020 13:52:46 +0200
> 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > index b8496037ef7f..1fb6b1004dcb 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> > @@ -1507,6 +1507,23 @@ static inline unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
> >  	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
> >  }
> >  
> > +static inline unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring,
> > +						  unsigned int size)  
> 
> Please don't use inline in foo.c files.  I noticed you properly elided this in
> the ice changes so I wonder why it showed up here :-)

Yes, I know I should not do this.  It got here by copy-paste accident,
as I first had ixgbe function in a header file, and later I decided to
move this into the ixgbe C-file, but I had already copy-pasted this
into i40e driver ;-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 27/33] ice: add XDP frame size to driver
  2020-04-10  0:59   ` kbuild test robot
@ 2020-04-14 10:39     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14 10:39 UTC (permalink / raw)
  To: kbuild-all


Thanks for catching the missing semi-colon, fixed.
--Jesper

On Fri, 10 Apr 2020 08:59:46 +0800
kbuild test robot <lkp@intel.com> wrote:

> All error/warnings (new ones prefixed by >>):
> 
>    drivers/net/ethernet/intel/ice/ice_txrx.c: In function 'ice_rx_frame_truesize':
> >> drivers/net/ethernet/intel/ice/ice_txrx.c:439:2: error: expected ';' before 'return'  
>      439 |  return truesize;
>          |  ^~~~~~
> >> drivers/net/ethernet/intel/ice/ice_txrx.c:440:1: warning: no return statement in function returning non-void [-Wreturn-type]  
>      440 | }
>          | ^
> 
> vim +439 drivers/net/ethernet/intel/ice/ice_txrx.c
> 
>    425	
>    426	static unsigned int ice_rx_frame_truesize(struct ice_ring *rx_ring,
>    427						  unsigned int size)
>    428	{
>    429		unsigned int truesize;
>    430	
>    431	#if (PAGE_SIZE < 8192)
>    432		truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
>    433	#else
>    434		truesize = ice_rx_offset(rx_ring) ?
>    435			SKB_DATA_ALIGN(ice_rx_offset(rx_ring) + size) +
>    436			SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
>    437			SKB_DATA_ALIGN(size)
>    438	#endif
>  > 439		return truesize;
>  > 440	}  
>    441	
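
For reference, a self-contained userspace sketch of the same
computation with the missing ';' in place. The MODEL_* constants and
the function parameters are assumptions standing in for PAGE_SIZE,
ice_rx_pg_size(), ice_rx_offset(), SKB_DATA_ALIGN() and
sizeof(struct skb_shared_info); the preprocessor branch is modelled as
a runtime branch so both paths can be exercised.

```c
#include <assert.h>

#define MODEL_CACHE_BYTES 64u   /* assumed cache-line size used for alignment */
#define MODEL_SHINFO_SZ   320u  /* skb_shared_info size quoted in the cover letter */

/* Round x up to a multiple of the (power-of-two) cache-line size. */
#define MODEL_DATA_ALIGN(x) \
	(((x) + MODEL_CACHE_BYTES - 1) & ~(MODEL_CACHE_BYTES - 1))

static unsigned int model_rx_frame_truesize(unsigned int page_size,
					    unsigned int rx_pg_size,
					    unsigned int rx_offset,
					    unsigned int size)
{
	unsigned int truesize;

	if (page_size < 8192) {
		/* Half of the RX page; must be a power of two. */
		truesize = rx_pg_size / 2;
	} else {
		truesize = rx_offset ?
			MODEL_DATA_ALIGN(rx_offset + size) +
			MODEL_DATA_ALIGN(MODEL_SHINFO_SZ) :
			MODEL_DATA_ALIGN(size); /* the ';' the robot flagged */
	}
	return truesize;
}
```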



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-09  3:31   ` Saeed Mahameed
@ 2020-04-14 12:46     ` Jesper Dangaard Brouer
  2020-04-18  3:33       ` Saeed Mahameed
  0 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14 12:46 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: toke, gtzalik, ilias.apalodimas, borkmann, alexander.duyck,
	john.fastabend, akiyano, zorik, alexei.starovoitov, netdev,
	jeffrey.t.kirsher, bpf, dsahern, lorenzo, willemdebruijn.kernel,
	brouer, Steffen Klassert, Willy Tarreau

On Thu, 9 Apr 2020 03:31:14 +0000
Saeed Mahameed <saeedm@mellanox.com> wrote:

> On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote:
> > Finally, after all drivers have a frame size, allow BPF-helper
> > bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.
> >   
> 
> can you provide a list of usecases for why tail extension is necessary
> ?

Use-cases:
(1) IPsec / XFRM needs a tail extend[1][2].
(2) DNS-cache replies in XDP.
(3) HA-proxy ALOHA would need it to convert to XDP.
 
> and what do you have in mind as immediate use of bpf_xdp_adjust_tail()
> ? 

I guess Steffen Klassert's ipsec use-case(1) is the most immediate.

[1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf
[2] http://vger.kernel.org/netconf2019.html

> both cover letter and commit messages didn't list any actual use case..

Sorry about that.

> > Remember that helper/macro xdp_data_hard_end have reserved some
> > tailroom.  Thus, this helper makes sure that the BPF-prog don't have
> > access to this tailroom area.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  include/uapi/linux/bpf.h |    4 ++--
> >  net/core/filter.c        |   18 ++++++++++++++++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
[... cut ...]
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7628b947dbc3..4d58a147eed0 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto
> > bpf_xdp_adjust_head_proto = {
> >  
> >  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> >  {
> > +	void *data_hard_end = xdp_data_hard_end(xdp);
> >  	void *data_end = xdp->data_end + offset;
> >  
> > -	/* only shrinking is allowed for now. */
> > -	if (unlikely(offset >= 0))
> > +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> > +	if (unlikely(data_end > data_hard_end))
> >  		return -EINVAL;
> >    
> 
> i don't know if i like this approach for couple of reasons.
> 
> 1. drivers will provide arbitrary frames_sz, which is normally larger
> than mtu, and could be a full page size, for XDP_TX action this can be
> problematic if xdp progs will allow oversized packets to get caught at
> the driver level..

We already check if MTU is exceeded for a specific device when we
redirect into this, see helper xdp_ok_fwd_dev().  For the XDP_TX case,
I guess some drivers bypass that check, which should be fixed. The
XDP_TX case is IMHO a place where we allow drivers to do special
optimizations, thus drivers can choose to do something faster than
calling generic helper xdp_ok_fwd_dev().  
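
A rough userspace model of the kind of MTU check referred to above. The
struct and function names are hypothetical; the length formula (MTU
plus link-layer header plus one VLAN tag) is an assumption about what
the generic helper does and should be checked against the kernel source.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_VLAN_HLEN 4u /* one 802.1Q tag */

/* Simplified stand-in for the net_device fields the check needs. */
struct net_device_model {
	unsigned int mtu;
	unsigned int hard_header_len;
	int up;
};

/* Returns 0 when a packet of pktlen bytes may be sent on dev. */
static int model_xdp_ok_fwd_dev(const struct net_device_model *dev,
				unsigned int pktlen)
{
	unsigned int max_len;

	if (!dev->up)
		return -ENETDOWN;

	max_len = dev->mtu + dev->hard_header_len + MODEL_VLAN_HLEN;
	if (pktlen > max_len)
		return -EMSGSIZE;

	return 0;
}
```

A driver that takes a faster XDP_TX path would need an equivalent length
test of its own once the tail can grow past the MTU.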
  
> 
> 2. xdp_data_hard_end(xdp) has a hardcoded assumption of the skb shinfo
> and it introduces a reverse dependency between xdp buff and skbuff 
> 
(I'll address this in another mail)

> both of the above can be solved if the drivers provided the max
> allowed frame size, already accounting for mtu and shinfo when setting
> xdp_buff.frame_sz at the driver level.

It seems we look at the problem from two different angles.  You have
the driver's perspective, while I have the network stack's perspective
(the XDP_PASS case).  The mlx5 driver treats XDP as a special case, by
hiding or confining xdp_buff to functions fairly deep in the
call-stack.  My goal is different (moving SKB out of drivers): I see
the xdp_buff/xdp_frame as the main packet object in the drivers, that
gets sent up the network stack (after converting to xdp_frame) and
converted into SKB in core-code (yes, there is a long road ahead). The
larger tailroom can be used by netstack in SKB-coalesce.

The next step is making xdp_buff (and xdp_frame) multi-buffer aware.
This is why I reserve room for skb_shared_info.  I have considered
reducing the size of xdp_buff.frame_sz by sizeof(skb_shared_info),
but it got kind of ugly having this in each driver.

I also considered having drivers setup a direct pointer to
{skb,xdp}_shared_info section in xdp_buff, because that will make it more
flexible (for what I imagine Alexander Duyck wants).  (But we can still
do/change that later, once we start work on the multi-buffer code)
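
To make the reserved tailroom concrete, here is a userspace sketch of
the grow/shrink bound discussed in this mail. The struct is a simplified
stand-in for xdp_buff, 320 is the skb_shared_info size quoted in the
cover letter (already a multiple of 64, so alignment is omitted), and
the lower bound of one Ethernet header is an assumption, since that part
of the helper is cut from the quoted diff.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SHINFO_SZ 320u /* reserved tailroom, per the cover letter */
#define MODEL_ETH_HLEN  14   /* assumed minimum packet length */

/* Simplified stand-in for struct xdp_buff. */
struct xdp_buff_model {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_hard_start;
	unsigned int frame_sz;
};

/* Mirrors the xdp_data_hard_end() macro from patch 01. */
static unsigned char *model_data_hard_end(const struct xdp_buff_model *xdp)
{
	return xdp->data_hard_start + xdp->frame_sz - MODEL_SHINFO_SZ;
}

/* Returns 0 and moves data_end on success, -1 (for -EINVAL) otherwise. */
static int model_adjust_tail(struct xdp_buff_model *xdp, int offset)
{
	/* Work in lengths so no out-of-bounds pointer is ever formed. */
	ptrdiff_t new_len = (xdp->data_end - xdp->data) + offset;
	ptrdiff_t max_len = model_data_hard_end(xdp) - xdp->data;

	if (new_len > max_len)        /* would enter the reserved tailroom */
		return -1;
	if (new_len < MODEL_ETH_HLEN) /* would shrink below a link header */
		return -1;

	xdp->data_end = xdp->data + new_len;
	return 0;
}
```

With a 4096-byte frame and 256 bytes of headroom, a BPF program could
grow the packet to at most 3520 bytes; the last 320 bytes stay out of
its reach.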

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver
  2020-04-08 17:53   ` Jakub Kicinski
@ 2020-04-14 14:02     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14 14:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, brouer

On Wed, 8 Apr 2020 10:53:44 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Wed, 08 Apr 2020 13:52:10 +0200 Jesper Dangaard Brouer wrote:
> > The netronome nfp driver already had a true_bufsz variable
> > that contains what was needed for xdp.frame_sz.
> > 
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  .../net/ethernet/netronome/nfp/nfp_net_common.c    |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> > index 9bfb3b077bc1..b9b8c30eab33 100644
> > --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> > +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> > @@ -1817,6 +1817,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
> >  	rcu_read_lock();
> >  	xdp_prog = READ_ONCE(dp->xdp_prog);
> >  	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
> > +	xdp.frame_sz = true_bufsz;  
> 
> Since this matters only with XDP on - we can set to PAGE_SIZE directly?

Well the value was already calculated for us in true_bufsz, but I can
change that.

> But more importantly the correct value is:
> 
> 	PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM

Thanks for catching that. I will fix.

> as we set hard_start at an offset. 
> 
> 	xdp.data_hard_start = rxbuf->frag + NFP_NET_RX_BUF_HEADROOM;
> 
> Cause NFP_NET_RX_BUF_HEADROOM is not DMA mapped.
> 
> >  	xdp.rxq = &rx_ring->xdp_rxq;
> >  	tx_ring = r_vec->xdp_ring;  
> 
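
The arithmetic behind the correction, as a small userspace sketch. The
function names are hypothetical and the 64-byte headroom used in the
usage note is a placeholder, not the real NFP_NET_RX_BUF_HEADROOM value.
The point is only that frame_sz is measured from data_hard_start, so a
hard start that sits at an offset into the page must shrink frame_sz by
that offset.

```c
#include <assert.h>
#include <stddef.h>

/* frame_sz as first posted: the full page, ignoring the offset. */
static size_t model_frame_sz_naive(size_t page_size)
{
	return page_size;
}

/* frame_sz as corrected in review: the page minus the unmapped headroom. */
static size_t model_frame_sz_fixed(size_t page_size, size_t headroom)
{
	assert(headroom <= page_size);
	return page_size - headroom;
}

/* True when [hard_start_off, hard_start_off + frame_sz) stays in the page. */
static int model_frame_fits(size_t page_size, size_t hard_start_off,
			    size_t frame_sz)
{
	return hard_start_off + frame_sz <= page_size;
}
```

For a 4096-byte page and a 64-byte placeholder headroom, the naive value
describes memory up to offset 4160, past the end of the page, while the
corrected value ends exactly at 4096.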

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-08 17:53   ` Jakub Kicinski
  2020-04-09  0:48     ` Saeed Mahameed
@ 2020-04-14 14:16     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-14 14:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, brouer

On Wed, 8 Apr 2020 10:53:39 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> > + * This macro reserves tailroom in the XDP buffer by limiting the
> > + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> > + * is used for XDP_PASS, when constructing the SKB via build_skb().
> > + */
> > +#define xdp_data_hard_end(xdp)				\
> > +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))  
> 
> I think it should be said somewhere that the drivers are expected to
> DMA map memory up to xdp_data_hard_end(xdp).

No, I don't want driver to DMA map memory up to xdp_data_hard_end(xdp).

The driver will/should map up-to the configured MTU.  Reading ahead, I
can see that you worry about XDP_TX, that doesn't do the MTU check in
xdp_ok_fwd_dev() like we do for XDP_REDIRECT.  I guess we need to check
that before doing XDP_TX, such that we don't DMA sync for_device, for
an area that is not included in the original DMA-map area.  I wonder if
that is a violation, if so, it is also problematic for adjust *head*.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-08 12:52   ` Tariq Toukan
@ 2020-04-16 12:04     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-16 12:04 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: sameehj, Saeed Mahameed, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, brouer

On Wed, 8 Apr 2020 15:52:26 +0300
Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> Hi Jesper,
> 
> Thanks for your patch.
> Please see feedback below.
> 
> On 4/8/2020 2:52 PM, Jesper Dangaard Brouer wrote:
> > The mlx5 driver have multiple memory models, which are also changed
> > according to whether a XDP bpf_prog is attached.
> > 
> > The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
> >   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> > 
> > On the general case with 4K page_size and regular MTU packet, then
> > the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> > 
> > The info on the given frame size is stored differently depending on the
> > RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> > In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> > corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> > In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> > in rq->wqe.info.arr[0].frag_stride.  
> 
> Just to clarify, the above description is true as long as we're in the 
> Linear SKB memory scheme, this holds when:
> 1) MTU + headroom + tailroom < PAGE_SIZE, and
> 2) HW LRO is OFF.
> 
> Otherwise, mpwqe.log_stride_sz can be smaller, and frag_stride of 
> wqe_info can vary from one index to another.
> 
> > 
> > To reduce effect on fast-path, this patch determine the frame_sz at
> > setup time, to avoid determining the memory model runtime.
> > 
> > Cc: Tariq Toukan <tariqt@mellanox.com>
> > Cc: Saeed Mahameed <saeedm@mellanox.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
> >   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    4 ++++
> >   3 files changed, 6 insertions(+)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > index 12a61bf82c14..1f280fc142ca 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > @@ -651,6 +651,7 @@ struct mlx5e_rq {
> >   	struct {
> >   		u16            umem_headroom;
> >   		u16            headroom;
> > +		u32            frame_sz;
> >   		u8             map_dir;   /* dma map direction */
> >   	} buff;
> >   
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> > index f049e0ac308a..de4ad2c9f49a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> > @@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
> >   	if (xsk)
> >   		xdp.handle = di->xsk.handle;
> >   	xdp.rxq = &rq->xdp_rxq;
> > +	xdp.frame_sz = rq->buff.frame_sz;
> >   
> >   	act = bpf_prog_run_xdp(prog, &xdp);
> >   	if (xsk) {
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index dd7f338425eb..b9595315c45b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> >   		rq->mpwqe.num_strides =
> >   			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
> >   
> > +		rq->buff.frame_sz = (1 << rq->mpwqe.log_stride_sz);
> > +  
> 
> This is always correct.
> 
> >   		err = mlx5e_create_rq_umr_mkey(mdev, rq);
> >   		if (err)
> >   			goto err_rq_wq_destroy;
> > @@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
> >   			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
> >   
> >   		rq->wqe.info = rqp->frags_info;
> > +		rq->buff.frame_sz = rq->wqe.info.arr[0].frag_stride;
> > +  
> 
> This is not always correct.
> Size of the last frag for a large MTU might be a full page.
> See:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#L2097
> 
> However, you won't try to use this value at all in the non-linear SKB 
> flow, as it's not compatible with XDP.

Yes, exactly.

> Anyway, I prefer this value to be always true. No matter if it's really 
> used or not.
> Probably rename the field name to indicate this?
> Something like: single_frame_sz / first_frame_sz ?

Okay, I've renamed the field to "first_frame_sz", as this field
only describes the size of the first fragment.  This fits with what
we are currently planning, to only give XDP/eBPF access to the first
fragment in case of multi-buffer XDP. (And then use Daniel's idea of a
BPF-helper to pull in more data if explicitly requested).

Still trying to figure out if this is correct for AF_XDP.

And trying to see if I can get it more correct for the non-linear case,
even though it is not really used in that case.
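
A small userspace sketch of the two facts discussed in this mail: how
the striding-RQ frame size follows from log_stride_sz, and Tariq's
condition for the linear SKB scheme. The function names and the
headroom/tailroom figures in the usage note are illustrative
assumptions, not values taken from the mlx5 driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Striding RQ: the frame size is a power of two given by log_stride_sz. */
static unsigned int model_striding_frame_sz(unsigned int log_stride_sz)
{
	return 1u << log_stride_sz;
}

/* Linear SKB scheme holds when the whole frame fits in one page and
 * HW LRO is off (the two conditions listed in the review). */
static bool model_linear_skb(unsigned int mtu, unsigned int headroom,
			     unsigned int tailroom, unsigned int page_size,
			     bool hw_lro)
{
	return mtu + headroom + tailroom < page_size && !hw_lro;
}
```

With a 1500-byte MTU, 256 bytes of headroom and 320 bytes of tailroom
on a 4K page the linear scheme applies; a 9000-byte MTU or enabling HW
LRO takes the non-linear path, where frame_sz is not used by XDP.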
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-09  0:50   ` Saeed Mahameed
@ 2020-04-16 13:02     ` Jesper Dangaard Brouer
  2020-04-17 23:09       ` Saeed Mahameed
  0 siblings, 1 reply; 78+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-16 13:02 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: sameehj, toke, gtzalik, ilias.apalodimas, borkmann,
	alexander.duyck, john.fastabend, akiyano, zorik,
	alexei.starovoitov, netdev, jeffrey.t.kirsher, bpf, dsahern,
	lorenzo, willemdebruijn.kernel, brouer

On Thu, 9 Apr 2020 00:50:02 +0000
Saeed Mahameed <saeedm@mellanox.com> wrote:

> On Wed, 2020-04-08 at 13:50 +0200, Jesper Dangaard Brouer wrote:
> > XDP have evolved to support several frame sizes, but xdp_buff was not
> > updated with this information. The frame size (frame_sz) member of
> > xdp_buff is introduced to know the real size of the memory the frame
> > is
> > delivered in.
> > 
> > When introducing this also make it clear that some tailroom is
> > reserved/required when creating SKBs using build_skb().
> > 
> > It would also have been an option to introduce a pointer to
> > data_hard_end (with reserved offset). The advantage with frame_sz is
> > that (like rxq) drivers only need to setup/assign this value once per
> > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible
> > to
> > store frame_sz inside xdp_rxq_info, because it's varies per packet as
> > it
> > can be based/depend on packet length.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >  include/net/xdp.h |   17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 40c6d3398458..99f4374f6214 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -6,6 +6,8 @@
> >  #ifndef __LINUX_NET_XDP_H__
> >  #define __LINUX_NET_XDP_H__
> >  
> > +#include <linux/skbuff.h> /* skb_shared_info */
> > +  
> 
> I think it is wrong to make xdp.h depend on skbuff.h
> we must keep xdp.h minimal and independent,

I agree, that it seems strange to have xdp.h include skbuff.h, and I'm
not happy with that approach myself, but the alternatives all looked
kind of ugly.

> the new macros should be defined in skbuff.h 

Moving #define xdp_data_hard_end(xdp) into skbuff.h also seems strange.


> >  /**
> >   * DOC: XDP RX-queue information
> >   *
> > @@ -70,8 +72,23 @@ struct xdp_buff {
> >  	void *data_hard_start;
> >  	unsigned long handle;
> >  	struct xdp_rxq_info *rxq;
> > +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved
> > tailroom*/  
> 
> why u32 ? u16 should be more than enough.. 

Nope.  It needs to be able to store PAGE_SIZE == 65536.

$ echo $((1<<12))
4096
$ echo $((1<<16))
65536

$ printf "0x%X\n" 65536
0x10000
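
The same point as a short C sketch: a 16-bit field tops out at 65535,
so a 64K page size wraps to zero when stored in it. The helper names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store a size in a 16-bit field; values above 65535 wrap modulo 2^16. */
static uint16_t model_store_u16(unsigned int size)
{
	return (uint16_t)size;
}

/* Store a size in a 32-bit field; a 64K page size fits unchanged. */
static uint32_t model_store_u32(unsigned int size)
{
	return (uint32_t)size;
}
```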


> >  };
> >  
> > +/* Reserve memory area at end-of data area.
> > + *
> > + * This macro reserves tailroom in the XDP buffer by limiting the
> > + * XDP/BPF data access to data_hard_end.  Notice same area (and size)
> > + * is used for XDP_PASS, when constructing the SKB via build_skb().
> > + */
> > +#define xdp_data_hard_end(xdp)				\
> > +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> > +  
> 
> this macro is not safe when unary operators are being used

The parentheses around (xdp) do make xdp_data_hard_end(&xdp) work
correctly. What other cases are you worried about?


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff
  2020-04-16 13:02     ` Jesper Dangaard Brouer
@ 2020-04-17 23:09       ` Saeed Mahameed
  0 siblings, 0 replies; 78+ messages in thread
From: Saeed Mahameed @ 2020-04-17 23:09 UTC (permalink / raw)
  To: brouer
  Cc: akiyano, willemdebruijn.kernel, borkmann, jeffrey.t.kirsher,
	john.fastabend, toke, alexei.starovoitov, gtzalik, dsahern,
	sameehj, alexander.duyck, bpf, ilias.apalodimas, zorik, netdev,
	lorenzo

On Thu, 2020-04-16 at 15:02 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 9 Apr 2020 00:50:02 +0000
> Saeed Mahameed <saeedm@mellanox.com> wrote:
> 
> > On Wed, 2020-04-08 at 13:50 +0200, Jesper Dangaard Brouer wrote:
> > > XDP have evolved to support several frame sizes, but xdp_buff was
> > > not
> > > updated with this information. The frame size (frame_sz) member
> > > of
> > > xdp_buff is introduced to know the real size of the memory the
> > > frame
> > > is
> > > delivered in.
> > > 
> > > When introducing this also make it clear that some tailroom is
> > > reserved/required when creating SKBs using build_skb().
> > > 
> > > It would also have been an option to introduce a pointer to
> > > data_hard_end (with reserved offset). The advantage with frame_sz
> > > is
> > > that (like rxq) drivers only need to setup/assign this value once
> > > per
> > > NAPI cycle. Due to XDP-generic (and some drivers) it's not
> > > possible
> > > to
> > > store frame_sz inside xdp_rxq_info, because it's varies per
> > > packet as
> > > it
> > > can be based/depend on packet length.
> > > 
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > ---
> > >  include/net/xdp.h |   17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > index 40c6d3398458..99f4374f6214 100644
> > > --- a/include/net/xdp.h
> > > +++ b/include/net/xdp.h
> > > @@ -6,6 +6,8 @@
> > >  #ifndef __LINUX_NET_XDP_H__
> > >  #define __LINUX_NET_XDP_H__
> > >  
> > > +#include <linux/skbuff.h> /* skb_shared_info */
> > > +  
> > 
> > I think it is wrong to make xdp.h depend on skbuff.h
> > we must keep xdp.h minimal and independent,
> 
> I agree, that it seems strange to have xdp.h include skbuff.h, and
> I'm
> not happy with that approach myself, but the alternatives all looked
> kind of ugly.
> 
> > the new macros should be defined in skbuff.h 
> 
> Moving #define xdp_data_hard_end(xdp) into skbuff.h also seems
> strange.
> 

So maybe we shouldn't have any dependencies by design, and let the
drivers decide how much tailroom they want to preserve, and remove the
hardcoded sizeof(skb_shinfo).. 

maybe per rxq ? on memory model registration ?


> 
> > >  /**
> > >   * DOC: XDP RX-queue information
> > >   *
> > > @@ -70,8 +72,23 @@ struct xdp_buff {
> > >  	void *data_hard_start;
> > >  	unsigned long handle;
> > >  	struct xdp_rxq_info *rxq;
> > > +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved
> > > tailroom*/  
> > 
> > why u32 ? u16 should be more than enough.. 
> 
> Nope.  It need to be able to store PAGE_SIZE == 65536.
> 
> $ echo $((1<<12))
> 4096
> $ echo $((1<<16))
> 65536
> 
> $ printf "0x%X\n" 65536
> 0x10000
> 

:(

> 
> > >  };
> > >  
> > > +/* Reserve memory area at end-of data area.
> > > + *
> > > + * This macro reserves tailroom in the XDP buffer by limiting
> > > the
> > > + * XDP/BPF data access to data_hard_end.  Notice same area (and
> > > size)
> > > + * is used for XDP_PASS, when constructing the SKB via
> > > build_skb().
> > > + */
> > > +#define xdp_data_hard_end(xdp)				\
> > > +	((xdp)->data_hard_start + (xdp)->frame_sz -	\
> > > +	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> > > +  
> > 
> > this macro is not safe when unary operators are being used
> 
> The parentheses round (xdp) does make xdp_data_hard_end(&xdp) work
> correctly. What other cases are you worried about?
> 
> 

consider: 
xdp_data_hard_end(xdp_ptr++)
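
A userspace sketch of the concern. The macro below has the same shape
as the one in the patch, so its argument is expanded twice; a static
function evaluates it once. The struct and helper names are
hypothetical, and a counting helper is used instead of xdp_ptr++
because modifying the same pointer twice in one expression without
sequencing is undefined behaviour in C, which makes it unsuitable for a
runnable demonstration.

```c
#include <assert.h>

#define MODEL_SHINFO_SZ 320u /* reserved tailroom, per the cover letter */

/* Simplified stand-in for struct xdp_buff. */
struct xdp_buff_model {
	unsigned char *data_hard_start;
	unsigned int frame_sz;
};

/* Same shape as xdp_data_hard_end(): 'xdp' appears twice in the body. */
#define model_hard_end_macro(xdp) \
	((xdp)->data_hard_start + (xdp)->frame_sz - MODEL_SHINFO_SZ)

/* Function form: the argument expression is evaluated exactly once. */
static unsigned char *model_hard_end_fn(const struct xdp_buff_model *xdp)
{
	return xdp->data_hard_start + xdp->frame_sz - MODEL_SHINFO_SZ;
}

/* Side-effecting argument: counts how often it is evaluated. */
static int model_eval_count;

static struct xdp_buff_model *model_counted(struct xdp_buff_model *xdp)
{
	model_eval_count++;
	return xdp;
}
```

Passing model_counted(&x) to the macro bumps the counter twice, while
the function form bumps it once; both return the same address.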

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-14 12:46     ` Jesper Dangaard Brouer
@ 2020-04-18  3:33       ` Saeed Mahameed
  0 siblings, 0 replies; 78+ messages in thread
From: Saeed Mahameed @ 2020-04-18  3:33 UTC (permalink / raw)
  To: brouer
  Cc: akiyano, willemdebruijn.kernel, borkmann, jeffrey.t.kirsher,
	john.fastabend, toke, alexei.starovoitov, gtzalik, dsahern,
	alexander.duyck, bpf, ilias.apalodimas, zorik, netdev,
	steffen.klassert, lorenzo, w

On Tue, 2020-04-14 at 14:46 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 9 Apr 2020 03:31:14 +0000
> Saeed Mahameed <saeedm@mellanox.com> wrote:
> 
> > On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote:
> > > Finally, after all drivers have a frame size, allow BPF-helper
> > > bpf_xdp_adjust_tail() to grow or extend packet size at frame
> > > tail.
> > >   
> > 
> > can you provide a list of usecases for why tail extension is
> > necessary
> > ?
> 
> Use-cases:
> (1) IPsec / XFRM needs a tail extend[1][2].
> (2) DNS-cache replies in XDP.
> (3) HA-proxy ALOHA would need it to convert to XDP.
>  
> > and what do you have in mind as immediate use of
> > bpf_xdp_adjust_tail()
> > ? 
> 
> I guess Steffen Klassert's ipsec use-case(1) it the most immediate.
> 
> [1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf
> [2] http://vger.kernel.org/netconf2019.html
> 

Thanks !

> > both cover letter and commit messages didn't list any actual use
> > case..
> 
> Sorry about that.
> 
> > > Remember that helper/macro xdp_data_hard_end have reserved some
> > > tailroom.  Thus, this helper makes sure that the BPF-prog don't
> > > have
> > > access to this tailroom area.
> > > 
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > ---
> > >  include/uapi/linux/bpf.h |    4 ++--
> > >  net/core/filter.c        |   18 ++++++++++++++++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> [... cut ...]
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 7628b947dbc3..4d58a147eed0 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto
> > > bpf_xdp_adjust_head_proto = {
> > >  
> > >  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int,
> > > offset)
> > >  {
> > > +	void *data_hard_end = xdp_data_hard_end(xdp);
> > >  	void *data_end = xdp->data_end + offset;
> > >  
> > > -	/* only shrinking is allowed for now. */
> > > -	if (unlikely(offset >= 0))
> > > +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> > > +	if (unlikely(data_end > data_hard_end))
> > >  		return -EINVAL;
> > >    
> > 
> > i don't know if i like this approach for couple of reasons.
> > 
> > 1. drivers will provide arbitrary frames_sz, which is normally
> > larger
> > than mtu, and could be a full page size, for XDP_TX action this can
> > be
> > problematic if xdp progs will allow oversized packets to get caught
> > at
> > the driver level..
> 
> We already check if MTU is exceeded for a specific device when we
> redirect into this, see helper xdp_ok_fwd_dev().  For the XDP_TX
> case,
> I guess some drivers bypass that check, which should be fixed. The
> XDP_TX case is IMHO a place where we allow drivers do special
> optimizations, thus drivers can choose to do something faster than
> calling generic helper xdp_ok_fwd_dev().  
>   
> > 2. xdp_data_hard_end(xdp) has a hardcoded assumption of the skb
> > shinfo
> > and it introduces a reverse dependency between xdp buff and skbuff 
> > 
> (I'll address this in another mail)
> 
> > both of the above can be solved if the drivers provided the max
> > allowed frame size, already accounting for mtu and shinfo when
> > setting
> > xdp_buff.frame_sz at the driver level.
> 
> It seems we look at the problem from two different angles.  You have
> the drivers perspective, while I have the network stacks perspective
> (the XDP_PASS case).  The mlx5 driver treats XDP as a special case,
> by
> hiding or confining xdp_buff to functions fairly deep in the
> call-stack.  My goal is different (moving SKB out of drivers), I see
> the xdp_buff/xdp_frame as the main packet object in the drivers, that
> gets send up the network stack (after converting to xdp_frame) and
> converted into SKB in core-code (yes, there is a long road-ahead).  The
> larger tailroom can be used by netstack in SKB-coalesce.
> 

But to achieve a proper model, the drivers must be told the size of the
tailroom they must preserve. Right now we are just hardcoding it, in a
place where it doesn't even belong. I don't know what the right solution
is yet, and we are still not there. Once we move memory management
completely out of the driver, we might have a better way to preserve
headroom and tailroom.

> The next step is making xdp_buff (and xdp_frame) multi-buffer aware.
> This is why I reserve room for skb_shared_info.  I have considered

This needs to be carefully crafted, as we don't want to end up with one
more SKB-like thing to deal with.


> reducing the size of xdp_buff.frame_sz, with sizeof(skb_shared_info),
> but it got kind of ugly having this in each drivers.
> 

Can this be done via memory model registration?

> I also considered having drivers setup a direct pointer to
> {skb,xdp}_shared_info section in xdp_buff, because will make it more
> flexible (for what I imagined Alexander Duyck want).  (But we can still
> do/change that later, once we start work in multi-buffer code)
> 

You mean something like xdp->data_tail or xdp->data_hard_end?
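
For reference, the bounds check under discussion can be modelled in user
space roughly as below. This is only a sketch, not the kernel code: the
struct and every model_* name are hypothetical, and the 320-byte reserve is
simply the figure quoted in the cover letter for the skb_shared_info-sized
area. The comparison is done on byte counts rather than on pointers, which
is equivalent to the "data_end > data_hard_end" test in the quoted hunk but
avoids forming out-of-range pointers in a user-space program.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumption: reserved tailroom in bytes, the "currently 320 bytes"
 * figure from the cover letter. */
#define MODEL_TAILROOM_RESERVE 320

/* Hypothetical stand-in for the xdp_buff fields that matter here. */
struct model_xdp_buff {
	unsigned char *data_hard_start; /* start of the whole frame */
	unsigned char *data;            /* start of packet data */
	unsigned char *data_end;        /* end of packet data */
	size_t frame_sz;                /* size of the whole frame */
};

/* Hard end of packet data: frame end minus the reserved tailroom. */
static unsigned char *model_data_hard_end(const struct model_xdp_buff *xdp)
{
	assert(xdp->frame_sz > MODEL_TAILROOM_RESERVE);
	return xdp->data_hard_start + xdp->frame_sz - MODEL_TAILROOM_RESERVE;
}

/* Move data_end by offset bytes: negative shrinks, positive grows. */
static int model_adjust_tail(struct model_xdp_buff *xdp, int offset)
{
	ptrdiff_t room = model_data_hard_end(xdp) - xdp->data_end;
	ptrdiff_t len = xdp->data_end - xdp->data;

	/* Growing must not run into the reserved tailroom. */
	if (offset > room)
		return -EINVAL;
	/* Model-only lower bound: never shrink below the start of data. */
	if (offset < -len)
		return -EINVAL;

	xdp->data_end += offset;
	return 0;
}
```

With a 4096-byte frame, the hard end in this model sits at byte 3776, so a
packet ending at byte 1256 can grow by at most 2520 bytes. If the driver
instead reported a frame_sz that already excluded the reserve, as suggested
above, the subtraction would move out of the helper and into the driver.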


* RE: [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver
  2020-04-08 11:51 ` [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
@ 2020-04-22  8:39   ` Jubran, Samih
  0 siblings, 0 replies; 78+ messages in thread
From: Jubran, Samih @ 2020-04-22  8:39 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Kiyanovski, Arthur, netdev, bpf, Machulsky, Zorik, Kiyanovski,
	Arthur, Tzalik, Guy, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed

Acked-by: Sameeh Jubran <sameehj@amazon.com>

> -----Original Message-----
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Sent: Wednesday, April 8, 2020 2:52 PM
> To: Jubran, Samih <sameehj@amazon.com>
> Cc: Kiyanovski, Arthur <akiyano@amazon.com>; Jesper Dangaard Brouer
> <brouer@redhat.com>; netdev@vger.kernel.org; bpf@vger.kernel.org;
> Machulsky, Zorik <zorik@amazon.com>; Kiyanovski, Arthur
> <akiyano@amazon.com>; Tzalik, Guy <gtzalik@amazon.com>; Toke Høiland-
> Jørgensen <toke@redhat.com>; Daniel Borkmann
> <borkmann@iogearbox.net>; Alexei Starovoitov
> <alexei.starovoitov@gmail.com>; John Fastabend
> <john.fastabend@gmail.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; Jeff Kirsher <jeffrey.t.kirsher@intel.com>;
> David Ahern <dsahern@gmail.com>; Willem de Bruijn
> <willemdebruijn.kernel@gmail.com>; Ilias Apalodimas
> <ilias.apalodimas@linaro.org>; Lorenzo Bianconi <lorenzo@kernel.org>;
> Saeed Mahameed <saeedm@mellanox.com>
> Subject: [EXTERNAL] [PATCH RFC v2 15/33] ena: add XDP frame size to
> amazon NIC driver
> 
> Frame size ENA_PAGE_SIZE is limited to 16K on systems with larger
> PAGE_SIZE than 16K. Change ENA_XDP_MAX_MTU to also take into account
> the reserved tailroom.
> 
> Cc: Arthur Kiyanovski <akiyano@amazon.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c |    1 +
>  drivers/net/ethernet/amazon/ena/ena_netdev.h |    5 +++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 2cc765df8da3..0fd7db1769f8 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -1606,6 +1606,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
>                   "%s qid %d\n", __func__, rx_ring->qid);
>         res_budget = budget;
>         xdp.rxq = &rx_ring->xdp_rxq;
> +       xdp.frame_sz = ENA_PAGE_SIZE;
> 
>         do {
>                 xdp_verdict = XDP_PASS;
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
> index 97dfd0c67e84..dd00127dfe9f 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
> @@ -151,8 +151,9 @@
>   * The buffer size we share with the device is defined to be ENA_PAGE_SIZE
>   */
> 
> -#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
> -                               VLAN_HLEN - XDP_PACKET_HEADROOM)
> +#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN -      \
> +                        VLAN_HLEN - XDP_PACKET_HEADROOM -              \
> +                        SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> 
>  #define ENA_IS_XDP_INDEX(adapter, index) (((index) >= (adapter)->xdp_first_ring) && \
>         ((index) < (adapter)->xdp_first_ring + (adapter)->xdp_num_queues))
> 
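
To make the effect of the ENA_XDP_MAX_MTU change concrete, the arithmetic
can be sketched as follows. The MODEL_* names are hypothetical; the header
lengths and the 256-byte XDP headroom mirror the usual kernel values, and
the 320-byte reserve is assumed from the figure in the cover letter rather
than computed from the real struct skb_shared_info. The page size is a
parameter because ENA_PAGE_SIZE depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ETH_HLEN             14  /* Ethernet header */
#define MODEL_ETH_FCS_LEN           4  /* frame check sequence */
#define MODEL_VLAN_HLEN             4  /* one VLAN tag */
#define MODEL_XDP_PACKET_HEADROOM 256  /* headroom reserved for XDP */
#define MODEL_SHINFO_RESERVE      320  /* assumed tailroom reserve */

/* Per-frame overhead before the patch. */
#define MODEL_OLD_OVERHEAD (MODEL_ETH_HLEN + MODEL_ETH_FCS_LEN + \
			    MODEL_VLAN_HLEN + MODEL_XDP_PACKET_HEADROOM)

/* Max XDP MTU before the patch: page size minus headers and headroom. */
static size_t model_old_max_mtu(size_t page_size)
{
	assert(page_size > MODEL_OLD_OVERHEAD);
	return page_size - MODEL_OLD_OVERHEAD;
}

/* Max XDP MTU after the patch: also subtract the reserved tailroom. */
static size_t model_new_max_mtu(size_t page_size)
{
	assert(page_size > MODEL_OLD_OVERHEAD + MODEL_SHINFO_RESERVE);
	return page_size - MODEL_OLD_OVERHEAD - MODEL_SHINFO_RESERVE;
}
```

Under these assumptions a 4096-byte buffer goes from a maximum MTU of 3818
to 3498, and a 16K buffer from 16106 to 15786.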



Thread overview: 78+ messages
2020-04-08 11:50 [Intel-wired-lan] [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
2020-04-08 17:53   ` Jakub Kicinski
2020-04-09  0:48     ` Saeed Mahameed
2020-04-09  1:13       ` Jakub Kicinski
2020-04-09 23:07         ` Saeed Mahameed
2020-04-09 23:27           ` Jakub Kicinski
2020-04-14 14:16     ` Jesper Dangaard Brouer
2020-04-09  0:50   ` Saeed Mahameed
2020-04-16 13:02     ` Jesper Dangaard Brouer
2020-04-17 23:09       ` Saeed Mahameed
2020-04-08 11:50 ` [PATCH RFC v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:50 ` [PATCH RFC v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
2020-04-08 13:09   ` Lorenzo Bianconi
2020-04-14  8:07     ` Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 14:56   ` Haiyang Zhang
2020-04-08 11:51 ` [PATCH RFC v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
2020-04-08 11:51 ` [PATCH RFC v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
2020-04-22  8:39   ` Jubran, Samih
2020-04-08 11:51 ` [PATCH RFC v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
2020-04-08 12:57   ` Tariq Toukan
2020-04-14  8:19     ` Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 17/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
2020-04-08 12:52   ` Tariq Toukan
2020-04-16 12:04     ` Jesper Dangaard Brouer
2020-04-09  9:28   ` Maxim Mikityanskiy
2020-04-08 11:52 ` [PATCH RFC v2 18/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 19/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
2020-04-08 17:53   ` Jakub Kicinski
2020-04-14 14:02     ` Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 20/33] tun: add XDP frame size Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 21/33] vhost_net: also populate " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 22/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 23/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 24/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 25/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 26/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 21:48   ` David Miller
2020-04-08 21:48     ` [Intel-wired-lan] " David Miller
2020-04-14 10:16     ` Jesper Dangaard Brouer
2020-04-14 10:16       ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 27/33] ice: " Jesper Dangaard Brouer
2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-10  0:59   ` kbuild test robot
2020-04-14 10:39     ` Jesper Dangaard Brouer
2020-04-08 11:52 ` [PATCH RFC v2 28/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
2020-04-08 11:52   ` [Intel-wired-lan] " Jesper Dangaard Brouer
2020-04-08 17:31   ` Björn Töpel
2020-04-08 17:31     ` [Intel-wired-lan] " Björn Töpel
2020-04-09  9:33     ` Maxim Mikityanskiy
2020-04-09  9:33       ` [Intel-wired-lan] " Maxim Mikityanskiy
2020-04-08 11:53 ` [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
2020-04-09  3:31   ` Saeed Mahameed
2020-04-14 12:46     ` Jesper Dangaard Brouer
2020-04-18  3:33       ` Saeed Mahameed
2020-04-14  9:56   ` Jesper Dangaard Brouer
2020-04-14 10:11     ` Toke Høiland-Jørgensen
2020-04-08 11:53 ` [PATCH RFC v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
2020-04-08 21:49   ` David Miller
2020-04-14  9:43     ` Jesper Dangaard Brouer
2020-04-08 11:53 ` [PATCH RFC v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
2020-04-08 11:53 ` [PATCH RFC v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
2020-04-08 11:53 ` [PATCH RFC v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
2020-04-08 16:55 ` [PATCH RFC v2 00/33] XDP extend with knowledge of frame size Alexei Starovoitov
2020-04-08 16:55   ` [Intel-wired-lan] " Alexei Starovoitov
