* [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
@ 2020-05-13  1:45 David Ahern
  2020-05-13  1:45 ` [PATCH v5 bpf-next 01/11] net: Refactor convert_to_xdp_frame David Ahern
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

This series adds support for XDP in the egress path by introducing
a new XDP attachment type, BPF_XDP_EGRESS, and adding a UAPI to
if_link.h for attaching the program to a netdevice and reporting
the program. This allows BPF programs to be run on redirected
xdp_frames with the context showing the Tx device.

This is a missing primitive for XDP: it allows solutions to be built
from small, targeted programs properly distributed along the networking
path, enabling, for example, an egress firewall, ACL, traffic
verification, or packet manipulation based on data specific to the
egress device.

Nothing about running a program in the Tx path requires driver-specific
resources the way the Rx path does. Thus, programs can be run in core
code and attached to the net_device struct similar to skb mode. The
egress attach is done using the new XDP_FLAGS_EGRESS_MODE flag, and
is reported by the kernel using the XDP_ATTACHED_EGRESS_CORE attach
mode with IFLA_XDP_EGRESS_PROG_ID, making the API similar to existing
APIs for XDP.

The egress program is run in bq_xmit_all before invoking ndo_xdp_xmit.
This is similar to cls_bpf programs, which run before the call to
ndo_start_xmit. Together the two locations cover all packets about to
be sent to a device for Tx.

XDP egress programs are not run on skbs, so a cls_bpf counterpart
should also be attached to the device to cover all packets, both
xdp_frames and skbs.

v5:
- rebased to top of bpf-next
- dropped the skb path; cls_bpf provides an option for the same
  functionality without having to take a performance hit (e.g.,
  disabling GSO).
- updated fall through notation to 'fallthrough;' statement per
  checkpatch warning

v4:
- added space in bpftool help in patch 12 - Toke
- updated to top of bpf-next

v3:
- removed IFLA_XDP_EGRESS and dropped back to XDP_FLAGS_EGRESS_MODE
  as the uapi to specify the attach. This caused the ordering of the
  patches to change with the uapi now introduced in the second patch
  and 2 refactoring patches are dropped. Samples and test programs
  updated to use the new API.

v2:
- changed rx checks in xdp_is_valid_access to any expected_attach_type
- add xdp_egress argument to bpftool prog rst document
- do not allow IFLA_XDP and IFLA_XDP_EGRESS in the same config. There
  is no way to roll back IFLA_XDP if IFLA_XDP_EGRESS fails.
- comments from Andrii on libbpf

v1:
- add selftests
- flip the order of xdp generic patches as requested by Toke
- fixed the count arg to do_xdp_egress_frame - Toke
- remove meta data invalidate in __xdp_egress_frame - Toke
- fixed data_hard_start in __xdp_egress_frame - Jesper
- refactored convert_to_xdp_frame to reuse buf to frame code - Jesper
- added missed refactoring patch when generating patch set

RFC v5:
- updated cover letter
- moved running of the eBPF program from ndo_{start,xdp}_xmit to core
  code. Dropped all tun and vhost related changes.
- added egress support to bpftool

RFC v4:
- updated cover letter
- patches related to code movement between tuntap, headers and vhost
  are dropped; the previous RFC ran the XDP program in vhost context,
  whereas this set runs it before queueing to vhost. As part of this,
  moved the invocation of the egress program to tun_net_xmit and
  tun_xdp_xmit.
- renamed do_xdp_generic to do_xdp_generic_rx to emphasize that it is
  called in the Rx path; added an rx argument to do_xdp_generic_core
  since it is used for both directions and needs to know which queue
  values to set in the xdp_buff

RFC v3:
- reworked the patches - splitting patch 1 from RFC v2 into 3, combining
  patch 2 from RFC v2 into the first 3, combining patches 6 and 7 from
  RFC v2 into 1 since both did a trivial rename and export. Reordered
  the patches such that kernel changes are first followed by libbpf and
  an enhancement to a sample.

- moved small xdp related helper functions from tun.c to tun.h to make
  tun_ptr_free usable from the tap code. This is needed to handle the
  case of tap builtin and tun built as a module.

- pkt_ptrs added to `struct tun_file` and passed to tun_consume_packets
  rather than declaring pkts as an array on the stack.

RFC v2:
- New XDP attachment type: Jesper, Toke and Alexei discussed whether
  to introduce a new program type. Since this set adds a way to attach
  a regular XDP program to the Tx path, per Alexei's suggestion a new
  attachment type, BPF_XDP_EGRESS, is introduced.

- libbpf API changes:
  Alexei had suggested an _opts()-style API extension. Accordingly,
  two new libbpf APIs are introduced that are equivalent to the
  existing APIs; the new ones can be extended easily. Please see the
  individual patches for details. The xdp1 sample program is modified
  to use the new APIs.

- tun: Some patches from the previous set are removed as they are
  irrelevant in this series. They will be introduced later.


David Ahern (11):
  net: Refactor convert_to_xdp_frame
  net: uapi for XDP programs in the egress path
  net: Add XDP setup and query commands for Tx programs
  net: Add BPF_XDP_EGRESS as a bpf_attach_type
  xdp: Add xdp_txq_info to xdp_buff
  net: set XDP egress program on netdevice
  net: Support xdp in the Tx path for xdp_frames
  libbpf: Add egress XDP support
  bpftool: Add support for XDP egress
  selftest: Add xdp_egress attach tests
  samples/bpf: add XDP egress support to xdp1

 include/linux/netdevice.h                     |   7 +
 include/net/xdp.h                             |  35 +++--
 include/uapi/linux/bpf.h                      |   3 +
 include/uapi/linux/if_link.h                  |   6 +-
 kernel/bpf/devmap.c                           |  19 ++-
 net/core/dev.c                                | 147 ++++++++++++++++--
 net/core/filter.c                             |  26 ++++
 net/core/rtnetlink.c                          |  23 ++-
 samples/bpf/xdp1_user.c                       |  11 +-
 .../bpf/bpftool/Documentation/bpftool-net.rst |   4 +-
 .../bpftool/Documentation/bpftool-prog.rst    |   2 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   4 +-
 tools/bpf/bpftool/net.c                       |   6 +-
 tools/bpf/bpftool/netlink_dumper.c            |   5 +
 tools/bpf/bpftool/prog.c                      |   2 +-
 tools/include/uapi/linux/bpf.h                |   3 +
 tools/include/uapi/linux/if_link.h            |   6 +-
 tools/lib/bpf/libbpf.c                        |   2 +
 tools/lib/bpf/libbpf.h                        |   1 +
 tools/lib/bpf/netlink.c                       |   6 +
 .../bpf/prog_tests/xdp_egress_attach.c        |  56 +++++++
 .../selftests/bpf/progs/test_xdp_egress.c     |  12 ++
 .../bpf/progs/test_xdp_egress_fail.c          |  16 ++
 23 files changed, 358 insertions(+), 44 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_egress_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_egress.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_egress_fail.c

-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v5 bpf-next 01/11] net: Refactor convert_to_xdp_frame
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
@ 2020-05-13  1:45 ` David Ahern
  2020-05-13  1:45 ` [PATCH v5 bpf-next 02/11] net: uapi for XDP programs in the egress path David Ahern
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Move the guts of convert_to_xdp_frame to a new helper,
update_xdp_frame, so it can be reused in a later patch.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
 include/net/xdp.h | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3cc6d5d84aa4..3264fa882de3 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -93,32 +93,42 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
-/* Convert xdp_buff to xdp_frame */
 static inline
-struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
+bool update_xdp_frame(struct xdp_buff *xdp, struct xdp_frame *xdp_frame)
 {
-	struct xdp_frame *xdp_frame;
 	int metasize;
 	int headroom;
 
-	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
-		return xdp_convert_zc_to_xdp_frame(xdp);
-
 	/* Assure headroom is available for storing info */
 	headroom = xdp->data - xdp->data_hard_start;
 	metasize = xdp->data - xdp->data_meta;
 	metasize = metasize > 0 ? metasize : 0;
 	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
-		return NULL;
-
-	/* Store info in top of packet */
-	xdp_frame = xdp->data_hard_start;
+		return false;
 
 	xdp_frame->data = xdp->data;
 	xdp_frame->len  = xdp->data_end - xdp->data;
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
 
+	return true;
+}
+
+/* Convert xdp_buff to xdp_frame */
+static inline
+struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
+{
+	struct xdp_frame *xdp_frame;
+
+	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
+		return xdp_convert_zc_to_xdp_frame(xdp);
+
+	/* Store info in top of packet */
+	xdp_frame = xdp->data_hard_start;
+
+	if (unlikely(!update_xdp_frame(xdp, xdp_frame)))
+		return NULL;
+
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
 	xdp_frame->mem = xdp->rxq->mem;
 
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 02/11] net: uapi for XDP programs in the egress path
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
  2020-05-13  1:45 ` [PATCH v5 bpf-next 01/11] net: Refactor convert_to_xdp_frame David Ahern
@ 2020-05-13  1:45 ` David Ahern
  2020-05-13  1:45 ` [PATCH v5 bpf-next 03/11] net: Add XDP setup and query commands for Tx programs David Ahern
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Running programs in the egress path, on skbs or xdp_frames, does not
require driver-specific resources the way the Rx path does.
Accordingly, the programs can be run in core code, so add
xdp_egress_prog to net_device to hold a reference to an attached
program.

For the UAPI, add XDP_FLAGS_EGRESS_MODE to specify that the attach is
at egress, add a new attach mode, XDP_ATTACHED_EGRESS_CORE, for
reporting attachment at the core egress level, and add
IFLA_XDP_EGRESS_PROG_ID for reporting the program id. Add
rtnl_xdp_prog_egress to fill in the link message with egress data.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h          |  1 +
 include/uapi/linux/if_link.h       |  6 +++++-
 net/core/rtnetlink.c               | 23 +++++++++++++++++++++--
 tools/include/uapi/linux/if_link.h |  6 +++++-
 4 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5a8d40f1ffe2..594c13d4cd00 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1995,6 +1995,7 @@ struct net_device {
 	unsigned int		real_num_rx_queues;
 
 	struct bpf_prog __rcu	*xdp_prog;
+	struct bpf_prog __rcu	*xdp_egress_prog;
 	unsigned long		gro_flush_timeout;
 	int			napi_defer_hard_irqs;
 	rx_handler_func_t __rcu	*rx_handler;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a009365ad67b..de32a9e94d74 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -976,9 +976,11 @@ enum {
 #define XDP_FLAGS_DRV_MODE		(1U << 2)
 #define XDP_FLAGS_HW_MODE		(1U << 3)
 #define XDP_FLAGS_REPLACE		(1U << 4)
+#define XDP_FLAGS_EGRESS_MODE		(1U << 5)
 #define XDP_FLAGS_MODES			(XDP_FLAGS_SKB_MODE | \
 					 XDP_FLAGS_DRV_MODE | \
-					 XDP_FLAGS_HW_MODE)
+					 XDP_FLAGS_HW_MODE | \
+					 XDP_FLAGS_EGRESS_MODE)
 #define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
 					 XDP_FLAGS_MODES | XDP_FLAGS_REPLACE)
 
@@ -989,6 +991,7 @@ enum {
 	XDP_ATTACHED_SKB,
 	XDP_ATTACHED_HW,
 	XDP_ATTACHED_MULTI,
+	XDP_ATTACHED_EGRESS_CORE,
 };
 
 enum {
@@ -1001,6 +1004,7 @@ enum {
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
 	IFLA_XDP_EXPECTED_FD,
+	IFLA_XDP_EGRESS_PROG_ID,
 	__IFLA_XDP_MAX,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2269199c5891..9084dfebc699 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -982,7 +982,7 @@ static size_t rtnl_xdp_size(void)
 	size_t xdp_size = nla_total_size(0) +	/* nest IFLA_XDP */
 			  nla_total_size(1) +	/* XDP_ATTACHED */
 			  nla_total_size(4) +	/* XDP_PROG_ID (or 1st mode) */
-			  nla_total_size(4);	/* XDP_<mode>_PROG_ID */
+			  nla_total_size(4) * 2; /* XDP_<mode>_PROG_ID */
 
 	return xdp_size;
 }
@@ -1402,6 +1402,18 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static u32 rtnl_xdp_prog_egress(struct net_device *dev)
+{
+	const struct bpf_prog *prog;
+
+	ASSERT_RTNL();
+
+	prog = rtnl_dereference(dev->xdp_egress_prog);
+	if (!prog)
+		return 0;
+	return prog->aux->id;
+}
+
 static u32 rtnl_xdp_prog_skb(struct net_device *dev)
 {
 	const struct bpf_prog *generic_xdp_prog;
@@ -1474,6 +1486,12 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 				  IFLA_XDP_HW_PROG_ID, rtnl_xdp_prog_hw);
 	if (err)
 		goto err_cancel;
+	err = rtnl_xdp_report_one(skb, dev, &prog_id, &mode,
+				  XDP_ATTACHED_EGRESS_CORE,
+				  IFLA_XDP_EGRESS_PROG_ID,
+				  rtnl_xdp_prog_egress);
+	if (err)
+		goto err_cancel;
 
 	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
 	if (err)
@@ -2790,7 +2808,8 @@ static int do_setlink(const struct sk_buff *skb,
 		if (err < 0)
 			goto errout;
 
-		if (xdp[IFLA_XDP_ATTACHED] || xdp[IFLA_XDP_PROG_ID]) {
+		if (xdp[IFLA_XDP_ATTACHED] || xdp[IFLA_XDP_PROG_ID] ||
+		    xdp[IFLA_XDP_EGRESS_PROG_ID]) {
 			err = -EINVAL;
 			goto errout;
 		}
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index cafedbbfefbe..1d61cb46a77c 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -964,9 +964,11 @@ enum {
 #define XDP_FLAGS_DRV_MODE		(1U << 2)
 #define XDP_FLAGS_HW_MODE		(1U << 3)
 #define XDP_FLAGS_REPLACE		(1U << 4)
+#define XDP_FLAGS_EGRESS_MODE		(1U << 5)
 #define XDP_FLAGS_MODES			(XDP_FLAGS_SKB_MODE | \
 					 XDP_FLAGS_DRV_MODE | \
-					 XDP_FLAGS_HW_MODE)
+					 XDP_FLAGS_HW_MODE | \
+					 XDP_FLAGS_EGRESS_MODE)
 #define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
 					 XDP_FLAGS_MODES | XDP_FLAGS_REPLACE)
 
@@ -977,6 +979,7 @@ enum {
 	XDP_ATTACHED_SKB,
 	XDP_ATTACHED_HW,
 	XDP_ATTACHED_MULTI,
+	XDP_ATTACHED_EGRESS_CORE,
 };
 
 enum {
@@ -989,6 +992,7 @@ enum {
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
 	IFLA_XDP_EXPECTED_FD,
+	IFLA_XDP_EGRESS_PROG_ID,
 	__IFLA_XDP_MAX,
 };
 
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 03/11] net: Add XDP setup and query commands for Tx programs
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
  2020-05-13  1:45 ` [PATCH v5 bpf-next 01/11] net: Refactor convert_to_xdp_frame David Ahern
  2020-05-13  1:45 ` [PATCH v5 bpf-next 02/11] net: uapi for XDP programs in the egress path David Ahern
@ 2020-05-13  1:45 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 04/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:45 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add new netdev commands, XDP_SETUP_PROG_EGRESS and
XDP_QUERY_PROG_EGRESS, to query and setup egress programs.

Update dev_change_xdp_fd and dev_xdp_install to check for egress mode
via XDP_FLAGS_EGRESS_MODE in the flags. If the egress bool is set, use
XDP_SETUP_PROG_EGRESS in dev_xdp_install and XDP_QUERY_PROG_EGRESS in
dev_change_xdp_fd.

Signed-off-by: David Ahern <dahern@digitalocean.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 20 +++++++++++++++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 594c13d4cd00..ee0cb73ca18a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -873,8 +873,10 @@ enum bpf_netdev_command {
 	 */
 	XDP_SETUP_PROG,
 	XDP_SETUP_PROG_HW,
+	XDP_SETUP_PROG_EGRESS,
 	XDP_QUERY_PROG,
 	XDP_QUERY_PROG_HW,
+	XDP_QUERY_PROG_EGRESS,
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
diff --git a/net/core/dev.c b/net/core/dev.c
index afff16849c26..c0455e764f97 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8600,13 +8600,16 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 			   struct bpf_prog *prog)
 {
 	bool non_hw = !(flags & XDP_FLAGS_HW_MODE);
+	bool egress = flags & XDP_FLAGS_EGRESS_MODE;
 	struct bpf_prog *prev_prog = NULL;
 	struct netdev_bpf xdp;
 	int err;
 
 	if (non_hw) {
-		prev_prog = bpf_prog_by_id(__dev_xdp_query(dev, bpf_op,
-							   XDP_QUERY_PROG));
+		enum bpf_netdev_command cmd;
+
+		cmd = egress ? XDP_QUERY_PROG_EGRESS : XDP_QUERY_PROG;
+		prev_prog = bpf_prog_by_id(__dev_xdp_query(dev, bpf_op, cmd));
 		if (IS_ERR(prev_prog))
 			prev_prog = NULL;
 	}
@@ -8615,7 +8618,7 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 	if (flags & XDP_FLAGS_HW_MODE)
 		xdp.command = XDP_SETUP_PROG_HW;
 	else
-		xdp.command = XDP_SETUP_PROG;
+		xdp.command = egress ? XDP_SETUP_PROG_EGRESS : XDP_SETUP_PROG;
 	xdp.extack = extack;
 	xdp.flags = flags;
 	xdp.prog = prog;
@@ -8677,12 +8680,18 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	bpf_op_t bpf_op, bpf_chk;
 	struct bpf_prog *prog;
 	bool offload;
+	bool egress;
 	int err;
 
 	ASSERT_RTNL();
 
 	offload = flags & XDP_FLAGS_HW_MODE;
-	query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+	egress = flags & XDP_FLAGS_EGRESS_MODE;
+	if (egress)
+		query = XDP_QUERY_PROG_EGRESS;
+	else
+		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
 	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
@@ -8712,7 +8721,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		}
 	}
 	if (fd >= 0) {
-		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
+		if (!offload && !egress &&
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 04/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (2 preceding siblings ...)
  2020-05-13  1:45 ` [PATCH v5 bpf-next 03/11] net: Add XDP setup and query commands for Tx programs David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 05/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add a new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
at the XDP layer but on the egress path.

Since the egress path will not have ingress_ifindex and rx_queue_index
set, update xdp_is_valid_access to block access to these fields in
the xdp context when a program is attached with expected_attach_type
set.

Update dev_change_xdp_fd to verify that a program's
expected_attach_type is BPF_XDP_EGRESS when the egress argument is set.

The next patch adds support for the egress ifindex.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Co-developed-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/uapi/linux/bpf.h       |  1 +
 net/core/dev.c                 | 11 +++++++++++
 net/core/filter.c              | 11 +++++++++++
 tools/include/uapi/linux/bpf.h |  1 +
 4 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bfb31c1be219..05accb95bb4a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -220,6 +220,7 @@ enum bpf_attach_type {
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
 	BPF_TRACE_ITER,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index c0455e764f97..88672ea4fc80 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8737,6 +8737,17 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
+		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
+			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+		if (!egress && prog->expected_attach_type == BPF_XDP_EGRESS) {
+			NL_SET_ERR_MSG(extack, "XDP program in Rx path can not use BPF_XDP_EGRESS attach type");
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+
 		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
 			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
 			bpf_prog_put(prog);
diff --git a/net/core/filter.c b/net/core/filter.c
index da0634979f53..19272eb7bb8f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6931,6 +6931,17 @@ static bool xdp_is_valid_access(int off, int size,
 				const struct bpf_prog *prog,
 				struct bpf_insn_access_aux *info)
 {
+	/* Rx data is only accessible from original XDP where
+	 * expected_attach_type is not set
+	 */
+	if (prog->expected_attach_type) {
+		switch (off) {
+		case offsetof(struct xdp_md, ingress_ifindex):
+		case offsetof(struct xdp_md, rx_queue_index):
+			return false;
+		}
+	}
+
 	if (type == BPF_WRITE) {
 		if (bpf_prog_is_dev_bound(prog->aux)) {
 			switch (off) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bfb31c1be219..05accb95bb4a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -220,6 +220,7 @@ enum bpf_attach_type {
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
 	BPF_TRACE_ITER,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 05/11] xdp: Add xdp_txq_info to xdp_buff
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (3 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 04/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 06/11] net: set XDP egress program on netdevice David Ahern
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
moment only the device is added. Other fields (queue_index)
can be added as use cases arise.

From a UAPI perspective, add egress_ifindex to xdp context.

Update the verifier to reject accesses to egress_ifindex by
rx programs.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/net/xdp.h              |  5 +++++
 include/uapi/linux/bpf.h       |  2 ++
 net/core/filter.c              | 15 +++++++++++++++
 tools/include/uapi/linux/bpf.h |  2 ++
 4 files changed, 24 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3264fa882de3..2b85b8649201 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -63,6 +63,10 @@ struct xdp_rxq_info {
 	struct xdp_mem_info mem;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
+struct xdp_txq_info {
+	struct net_device *dev;
+};
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
@@ -70,6 +74,7 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	struct xdp_txq_info *txq;
 };
 
 struct xdp_frame {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 05accb95bb4a..54d50dd62498 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3586,6 +3586,8 @@ struct xdp_md {
 	/* Below access go through struct xdp_rxq_info */
 	__u32 ingress_ifindex; /* rxq->dev->ifindex */
 	__u32 rx_queue_index;  /* rxq->queue_index  */
+
+	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
 enum sk_action {
diff --git a/net/core/filter.c b/net/core/filter.c
index 19272eb7bb8f..c1edd800b4f1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6940,6 +6940,11 @@ static bool xdp_is_valid_access(int off, int size,
 		case offsetof(struct xdp_md, rx_queue_index):
 			return false;
 		}
+	} else if (prog->expected_attach_type != BPF_XDP_EGRESS) {
+		switch (off) {
+		case offsetof(struct xdp_md, egress_ifindex):
+			return false;
+		}
 	}
 
 	if (type == BPF_WRITE) {
@@ -7889,6 +7894,16 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct xdp_rxq_info,
 					       queue_index));
 		break;
+	case offsetof(struct xdp_md, egress_ifindex):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, txq));
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev),
+				      si->dst_reg, si->dst_reg,
+				      offsetof(struct xdp_txq_info, dev));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct net_device, ifindex));
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 05accb95bb4a..54d50dd62498 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3586,6 +3586,8 @@ struct xdp_md {
 	/* Below access go through struct xdp_rxq_info */
 	__u32 ingress_ifindex; /* rxq->dev->ifindex */
 	__u32 rx_queue_index;  /* rxq->queue_index  */
+
+	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
 enum sk_action {
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 06/11] net: set XDP egress program on netdevice
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (4 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 05/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 07/11] net: Support xdp in the Tx path for xdp_frames David Ahern
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

This patch handles the plumbing for installing an XDP egress
program on a net_device by handling XDP_SETUP_PROG_EGRESS and
XDP_QUERY_PROG_EGRESS in the generic_xdp_install handler. A new
static key is added to signal when an egress program has been
installed.

Update dev_xdp_uninstall to remove egress programs.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 48 +++++++++++++++++++++++++++------------
 2 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ee0cb73ca18a..651baeb36729 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -752,6 +752,8 @@ struct netdev_rx_queue {
 #endif
 } ____cacheline_aligned_in_smp;
 
+extern struct static_key_false xdp_egress_needed_key;
+
 /*
  * RX queue sysfs structures and functions.
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index 88672ea4fc80..97954f835ceb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4641,6 +4641,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
 }
 
 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
+DEFINE_STATIC_KEY_FALSE(xdp_egress_needed_key);
 
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
@@ -5336,12 +5337,12 @@ static void __netif_receive_skb_list(struct list_head *head)
 
 static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
 {
-	struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
-	struct bpf_prog *new = xdp->prog;
+	struct bpf_prog *old, *new = xdp->prog;
 	int ret = 0;
 
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
+		old = rtnl_dereference(dev->xdp_prog);
 		rcu_assign_pointer(dev->xdp_prog, new);
 		if (old)
 			bpf_prog_put(old);
@@ -5354,11 +5355,25 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
 			dev_disable_gro_hw(dev);
 		}
 		break;
+	case XDP_SETUP_PROG_EGRESS:
+		old = rtnl_dereference(dev->xdp_egress_prog);
+		rcu_assign_pointer(dev->xdp_egress_prog, new);
+		if (old)
+			bpf_prog_put(old);
 
+		if (old && !new)
+			static_branch_dec(&xdp_egress_needed_key);
+		else if (new && !old)
+			static_branch_inc(&xdp_egress_needed_key);
+		break;
 	case XDP_QUERY_PROG:
+		old = rtnl_dereference(dev->xdp_prog);
+		xdp->prog_id = old ? old->aux->id : 0;
+		break;
+	case XDP_QUERY_PROG_EGRESS:
+		old = rtnl_dereference(dev->xdp_egress_prog);
 		xdp->prog_id = old ? old->aux->id : 0;
 		break;
-
 	default:
 		ret = -EINVAL;
 		break;
@@ -8641,6 +8656,10 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	/* Remove generic XDP */
 	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
 
+	/* Remove XDP egress */
+	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL,
+				XDP_FLAGS_EGRESS_MODE, NULL));
+
 	/* Remove from the driver */
 	ndo_bpf = dev->netdev_ops->ndo_bpf;
 	if (!ndo_bpf)
@@ -8687,21 +8706,22 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 
 	offload = flags & XDP_FLAGS_HW_MODE;
 	egress = flags & XDP_FLAGS_EGRESS_MODE;
-	if (egress)
+	if (egress) {
 		query = XDP_QUERY_PROG_EGRESS;
-	else
+		bpf_op = bpf_chk = generic_xdp_install;
+	} else {
 		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
-
-	bpf_op = bpf_chk = ops->ndo_bpf;
-	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
-		NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
-		return -EOPNOTSUPP;
+		bpf_op = bpf_chk = ops->ndo_bpf;
+		if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
+			NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
+			return -EOPNOTSUPP;
+		}
+		if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
+			bpf_op = generic_xdp_install;
+		if (bpf_op == bpf_chk)
+			bpf_chk = generic_xdp_install;
 	}
-	if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
-		bpf_op = generic_xdp_install;
-	if (bpf_op == bpf_chk)
-		bpf_chk = generic_xdp_install;
 
 	prog_id = __dev_xdp_query(dev, bpf_op, query);
 	if (flags & XDP_FLAGS_REPLACE) {
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v5 bpf-next 07/11] net: Support xdp in the Tx path for xdp_frames
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (5 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 06/11] net: set XDP egress program on netdevice David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 08/11] libbpf: Add egress XDP support David Ahern
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add support to run a Tx path program on xdp_frames by adding a hook to
bq_xmit_all before xdp_frames are passed to ndo_xdp_xmit for the device.

If an xdp_frame is dropped by the program, it is removed from the
xdp_frames array with subsequent entries moved up.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h |  2 ++
 kernel/bpf/devmap.c       | 19 +++++++---
 net/core/dev.c            | 74 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 651baeb36729..042190df23d5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3717,6 +3717,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+unsigned int do_xdp_egress(struct net_device *dev, struct xdp_frame **frames,
+			   unsigned int count);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index a51d9fb7a359..513dec5f67b9 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -321,24 +321,33 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 {
 	struct net_device *dev = bq->dev;
 	int sent = 0, drops = 0, err = 0;
+	unsigned int count = bq->count;
 	int i;
 
-	if (unlikely(!bq->count))
+	if (unlikely(!count))
 		return 0;
 
-	for (i = 0; i < bq->count; i++) {
+	for (i = 0; i < count; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
 		prefetch(xdpf);
 	}
 
-	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
+	if (static_branch_unlikely(&xdp_egress_needed_key)) {
+		count = do_xdp_egress(dev, bq->q, count);
+		drops += bq->count - count;
+		/* all frames consumed by the xdp program? */
+		if (!count)
+			goto out;
+	}
+
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, count, bq->q, flags);
 	if (sent < 0) {
 		err = sent;
 		sent = 0;
 		goto error;
 	}
-	drops = bq->count - sent;
+	drops += count - sent;
 out:
 	bq->count = 0;
 
@@ -350,7 +359,7 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	/* If ndo_xdp_xmit fails with an errno, no frames have been
 	 * xmit'ed and it's our responsibility to them free all.
 	 */
-	for (i = 0; i < bq->count; i++) {
+	for (i = 0; i < count; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
 		xdp_return_frame_rx_napi(xdpf);
diff --git a/net/core/dev.c b/net/core/dev.c
index 97954f835ceb..e6a737b84768 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4673,6 +4673,80 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(do_xdp_generic);
 
+static u32 __xdp_egress_frame(struct net_device *dev,
+			      struct bpf_prog *xdp_prog,
+			      struct xdp_frame *xdp_frame,
+			      struct xdp_txq_info *txq)
+{
+	struct xdp_buff xdp;
+	u32 act;
+
+	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom
+			      - sizeof(*xdp_frame);
+	xdp.data = xdp_frame->data;
+	xdp.data_end = xdp.data + xdp_frame->len;
+	xdp.data_meta = xdp.data - xdp_frame->metasize;
+	xdp.txq = txq;
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	switch (act) {
+	case XDP_DROP:
+		fallthrough;
+	case XDP_PASS:
+		break;
+	case XDP_TX:
+		fallthrough;
+	case XDP_REDIRECT:
+		fallthrough;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		fallthrough;
+	case XDP_ABORTED:
+		trace_xdp_exception(dev, xdp_prog, act);
+		act = XDP_DROP;
+		break;
+	}
+
+	/* if not dropping frame, readjust pointers in case
+	 * program made changes to the buffer
+	 */
+	if (act != XDP_DROP) {
+		if (unlikely(!update_xdp_frame(&xdp, xdp_frame)))
+			return XDP_DROP;
+	}
+
+	return act;
+}
+
+unsigned int do_xdp_egress(struct net_device *dev, struct xdp_frame **frames,
+			   unsigned int count)
+{
+	struct bpf_prog *xdp_prog;
+
+	xdp_prog = rcu_dereference(dev->xdp_egress_prog);
+	if (xdp_prog) {
+		struct xdp_txq_info txq = { .dev = dev };
+		unsigned int i, j;
+		u32 act;
+
+		for (i = 0, j = 0; i < count; i++) {
+			struct xdp_frame *frame = frames[i];
+
+			act = __xdp_egress_frame(dev, xdp_prog, frame, &txq);
+			if (act == XDP_DROP) {
+				xdp_return_frame_rx_napi(frame);
+				continue;
+			}
+
+			frames[j] = frame;
+			j++;
+		}
+		count = j;
+	}
+
+	return count;
+}
+
 static int netif_rx_internal(struct sk_buff *skb)
 {
 	int ret;
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 08/11] libbpf: Add egress XDP support
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (6 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 07/11] net: Support xdp in the Tx path for xdp_frames David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 09/11] bpftool: Add support for XDP egress David Ahern
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

A new section name hint, xdp_egress, is added to set the expected attach
type at program load. Programs can use xdp_egress as the prefix in
the SEC statement to load the program with the BPF_XDP_EGRESS
attach type set.

egress_prog_id is added to xdp_link_info to report the program
id.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 tools/lib/bpf/libbpf.c  | 2 ++
 tools/lib/bpf/libbpf.h  | 1 +
 tools/lib/bpf/netlink.c | 6 ++++++
 3 files changed, 9 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3da66540b54b..5d1d513d9958 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6635,6 +6635,8 @@ static const struct bpf_sec_def section_defs[] = {
 		.expected_attach_type = BPF_TRACE_ITER,
 		.is_attach_btf = true,
 		.attach_fn = attach_iter),
+	BPF_EAPROG_SEC("xdp_egress",		BPF_PROG_TYPE_XDP,
+						BPF_XDP_EGRESS),
 	BPF_PROG_SEC("xdp",			BPF_PROG_TYPE_XDP),
 	BPF_PROG_SEC("perf_event",		BPF_PROG_TYPE_PERF_EVENT),
 	BPF_PROG_SEC("lwt_in",			BPF_PROG_TYPE_LWT_IN),
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 8ea69558f0a8..d3ded4b2da02 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -463,6 +463,7 @@ struct xdp_link_info {
 	__u32 hw_prog_id;
 	__u32 skb_prog_id;
 	__u8 attach_mode;
+	__u32 egress_prog_id;
 };
 
 struct bpf_xdp_set_link_opts {
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 312f887570b2..da0b383dbd5d 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -280,6 +280,10 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 		xdp_id->info.hw_prog_id = libbpf_nla_getattr_u32(
 			xdp_tb[IFLA_XDP_HW_PROG_ID]);
 
+	if (xdp_tb[IFLA_XDP_EGRESS_PROG_ID])
+		xdp_id->info.egress_prog_id = libbpf_nla_getattr_u32(
+			xdp_tb[IFLA_XDP_EGRESS_PROG_ID]);
+
 	return 0;
 }
 
@@ -331,6 +335,8 @@ static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
 		return info->hw_prog_id;
 	if (flags & XDP_FLAGS_SKB_MODE)
 		return info->skb_prog_id;
+	if (flags & XDP_FLAGS_EGRESS_MODE)
+		return info->egress_prog_id;
 
 	return 0;
 }
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 09/11] bpftool: Add support for XDP egress
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (7 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 08/11] libbpf: Add egress XDP support David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 10/11] selftest: Add xdp_egress attach tests David Ahern
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add xdp_egress as a program type since it requires a new attach
type. This follows suit with other program type + attach type
combinations and leverages the SEC name in libbpf.

Add NET_ATTACH_TYPE_XDP_EGRESS and update attach_type_strings to
allow a user to specify 'xdp_egress' as the attach or detach point.

Update do_attach_detach_xdp to set XDP_FLAGS_EGRESS_MODE if egress
is selected.

Update do_xdp_dump_one to show egress program ids.

Update the documentation and help output.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 tools/bpf/bpftool/Documentation/bpftool-net.rst  | 4 +++-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 2 +-
 tools/bpf/bpftool/bash-completion/bpftool        | 4 ++--
 tools/bpf/bpftool/net.c                          | 6 +++++-
 tools/bpf/bpftool/netlink_dumper.c               | 5 +++++
 tools/bpf/bpftool/prog.c                         | 2 +-
 6 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst b/tools/bpf/bpftool/Documentation/bpftool-net.rst
index aa7450736179..e1c83188a2b5 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-net.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
@@ -26,7 +26,8 @@ NET COMMANDS
 |	**bpftool** **net help**
 |
 |	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
-|	*ATTACH_TYPE* := { **xdp** | **xdpgeneric** | **xdpdrv** | **xdpoffload** }
+|	*ATTACH_TYPE* :=
+|       { **xdp** | **xdpgeneric** | **xdpdrv** | **xdpoffload** | **xdp_egress** }
 
 DESCRIPTION
 ===========
@@ -63,6 +64,7 @@ DESCRIPTION
                   **xdpgeneric** - Generic XDP. runs at generic XDP hook when packet already enters receive path as skb;
                   **xdpdrv** - Native XDP. runs earliest point in driver's receive path;
                   **xdpoffload** - Offload XDP. runs directly on NIC on each packet reception;
+                  **xdp_egress** - XDP in egress path. runs at core networking level;
 
 	**bpftool** **net detach** *ATTACH_TYPE* **dev** *NAME*
                   Detach bpf program attached to network interface *NAME* with
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 5948e9d89c8d..739dee23c610 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -44,7 +44,7 @@ PROG COMMANDS
 |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6** |
 |		**cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl** |
 |		**cgroup/getsockopt** | **cgroup/setsockopt** |
-|		**struct_ops** | **fentry** | **fexit** | **freplace**
+|		**struct_ops** | **fentry** | **fexit** | **freplace** | **xdp_egress**
 |	}
 |       *ATTACH_TYPE* := {
 |		**msg_verdict** | **stream_verdict** | **stream_parser** | **flow_dissector**
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 9f0f20e73b87..92cc24a0a3cb 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -477,7 +477,7 @@ _bpftool()
                                 cgroup/post_bind4 cgroup/post_bind6 \
                                 cgroup/sysctl cgroup/getsockopt \
                                 cgroup/setsockopt struct_ops \
-                                fentry fexit freplace" -- \
+                                fentry fexit freplace xdp_egress" -- \
                                                    "$cur" ) )
                             return 0
                             ;;
@@ -1022,7 +1022,7 @@ _bpftool()
             ;;
         net)
             local PROG_TYPE='id pinned tag name'
-            local ATTACH_TYPES='xdp xdpgeneric xdpdrv xdpoffload'
+            local ATTACH_TYPES='xdp xdpgeneric xdpdrv xdpoffload xdp_egress'
             case $command in
                 show|list)
                     [[ $prev != "$command" ]] && return 0
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index c5e3895b7c8b..714fa075521b 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -61,6 +61,7 @@ enum net_attach_type {
 	NET_ATTACH_TYPE_XDP_GENERIC,
 	NET_ATTACH_TYPE_XDP_DRIVER,
 	NET_ATTACH_TYPE_XDP_OFFLOAD,
+	NET_ATTACH_TYPE_XDP_EGRESS,
 };
 
 static const char * const attach_type_strings[] = {
@@ -68,6 +69,7 @@ static const char * const attach_type_strings[] = {
 	[NET_ATTACH_TYPE_XDP_GENERIC]	= "xdpgeneric",
 	[NET_ATTACH_TYPE_XDP_DRIVER]	= "xdpdrv",
 	[NET_ATTACH_TYPE_XDP_OFFLOAD]	= "xdpoffload",
+	[NET_ATTACH_TYPE_XDP_EGRESS]	= "xdp_egress",
 };
 
 const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
@@ -286,6 +288,8 @@ static int do_attach_detach_xdp(int progfd, enum net_attach_type attach_type,
 		flags |= XDP_FLAGS_DRV_MODE;
 	if (attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
 		flags |= XDP_FLAGS_HW_MODE;
+	if (attach_type == NET_ATTACH_TYPE_XDP_EGRESS)
+		flags |= XDP_FLAGS_EGRESS_MODE;
 
 	return bpf_set_link_xdp_fd(ifindex, progfd, flags);
 }
@@ -464,7 +468,7 @@ static int do_help(int argc, char **argv)
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_PROGRAM "\n"
-		"       ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
+		"       ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload | xdp_egress }\n"
 		"\n"
 		"Note: Only xdp and tc attachments are supported now.\n"
 		"      For progs attached to cgroups, use \"bpftool cgroup\"\n"
diff --git a/tools/bpf/bpftool/netlink_dumper.c b/tools/bpf/bpftool/netlink_dumper.c
index 5f65140b003b..68e4909b6073 100644
--- a/tools/bpf/bpftool/netlink_dumper.c
+++ b/tools/bpf/bpftool/netlink_dumper.c
@@ -55,6 +55,8 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
 		xdp_dump_prog_id(tb, IFLA_XDP_SKB_PROG_ID, "generic", true);
 		xdp_dump_prog_id(tb, IFLA_XDP_DRV_PROG_ID, "driver", true);
 		xdp_dump_prog_id(tb, IFLA_XDP_HW_PROG_ID, "offload", true);
+		xdp_dump_prog_id(tb, IFLA_XDP_EGRESS_PROG_ID,
+				 "egress", true);
 		if (json_output)
 			jsonw_end_array(json_wtr);
 	} else if (mode == XDP_ATTACHED_DRV) {
@@ -63,6 +65,9 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
 		xdp_dump_prog_id(tb, IFLA_XDP_PROG_ID, "generic", false);
 	} else if (mode == XDP_ATTACHED_HW) {
 		xdp_dump_prog_id(tb, IFLA_XDP_PROG_ID, "offload", false);
+	} else if (mode == XDP_ATTACHED_EGRESS_CORE) {
+		xdp_dump_prog_id(tb, IFLA_XDP_EGRESS_PROG_ID,
+				 "egress", false);
 	}
 
 	NET_END_OBJECT_FINAL;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index b6e5ba568f98..86e19357afc8 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -2014,7 +2014,7 @@ static int do_help(int argc, char **argv)
 		"                 cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
 		"                 cgroup/sendmsg4 | cgroup/sendmsg6 | cgroup/recvmsg4 |\n"
 		"                 cgroup/recvmsg6 | cgroup/getsockopt | cgroup/setsockopt |\n"
-		"                 struct_ops | fentry | fexit | freplace }\n"
+		"                 struct_ops | fentry | fexit | freplace | xdp_egress }\n"
 		"       ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n"
 		"                        flow_dissector }\n"
 		"       METRIC := { cycles | instructions | l1d_loads | llc_misses }\n"
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 10/11] selftest: Add xdp_egress attach tests
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (8 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 09/11] bpftool: Add support for XDP egress David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13  1:46 ` [PATCH v5 bpf-next 11/11] samples/bpf: add XDP egress support to xdp1 David Ahern
  2020-05-13 10:43 ` [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path Toke Høiland-Jørgensen
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add xdp_egress attach tests:
1. verify egress programs cannot access ingress entries in xdp context
2. verify ability to load, attach, and detach xdp egress to a device.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 .../bpf/prog_tests/xdp_egress_attach.c        | 56 +++++++++++++++++++
 .../selftests/bpf/progs/test_xdp_egress.c     | 12 ++++
 .../bpf/progs/test_xdp_egress_fail.c          | 16 ++++++
 3 files changed, 84 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_egress_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_egress.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_egress_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_egress_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_egress_attach.c
new file mode 100644
index 000000000000..5253754b27de
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_egress_attach.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/if_link.h>
+#include <test_progs.h>
+
+#define IFINDEX_LO 1
+
+void test_xdp_egress_attach(void)
+{
+	struct bpf_prog_load_attr attr = {
+		.prog_type = BPF_PROG_TYPE_XDP,
+		.expected_attach_type = BPF_XDP_EGRESS,
+	};
+	struct bpf_prog_info info = {};
+	__u32 id, len = sizeof(info);
+	struct bpf_object *obj;
+	__u32 duration = 0;
+	int err, fd = -1;
+
+	/* should fail - accesses rx queue info */
+	attr.file = "./test_xdp_egress_fail.o",
+	err = bpf_prog_load_xattr(&attr, &obj, &fd);
+	if (CHECK(err == 0 && fd >= 0, "xdp_egress with rx failed to load",
+		 "load of xdp_egress with rx succeeded instead of failed"))
+		return;
+
+	attr.file = "./test_xdp_egress.o",
+	err = bpf_prog_load_xattr(&attr, &obj, &fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (CHECK_FAIL(err))
+		goto out_close;
+
+	err = bpf_set_link_xdp_fd(IFINDEX_LO, fd, XDP_FLAGS_EGRESS_MODE);
+	if (CHECK(err, "xdp attach", "xdp attach failed"))
+		goto out_close;
+
+	err = bpf_get_link_xdp_id(IFINDEX_LO, &id, XDP_FLAGS_EGRESS_MODE);
+	if (CHECK(err || id != info.id, "id_check",
+		  "loaded prog id %u != id %u, err %d", info.id, id, err))
+		goto out;
+
+out:
+	err = bpf_set_link_xdp_fd(IFINDEX_LO, -1, XDP_FLAGS_EGRESS_MODE);
+	if (CHECK(err, "xdp detach", "xdp detach failed"))
+		goto out_close;
+
+	err = bpf_get_link_xdp_id(IFINDEX_LO, &id, XDP_FLAGS_EGRESS_MODE);
+	if (CHECK(err || id, "id_check",
+		  "failed to detach program %u", id))
+		goto out_close;
+
+out_close:
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_egress.c b/tools/testing/selftests/bpf/progs/test_xdp_egress.c
new file mode 100644
index 000000000000..0477e8537b7f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_egress.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp_egress")
+int xdp_egress_good(struct xdp_md *ctx)
+{
+	__u32 idx = ctx->egress_ifindex;
+
+	return idx == 1 ? XDP_DROP : XDP_PASS;
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_egress_fail.c b/tools/testing/selftests/bpf/progs/test_xdp_egress_fail.c
new file mode 100644
index 000000000000..76b47b1d3bc3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_egress_fail.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp_egress")
+int xdp_egress_fail(struct xdp_md *ctx)
+{
+	__u32 rxq = ctx->rx_queue_index;
+	__u32 idx = ctx->ingress_ifindex;
+
+	if (idx == 1)
+		return XDP_DROP;
+
+	return rxq ? XDP_DROP : XDP_PASS;
+}
-- 
2.21.1 (Apple Git-122.3)



* [PATCH v5 bpf-next 11/11] samples/bpf: add XDP egress support to xdp1
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (9 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 10/11] selftest: Add xdp_egress attach tests David Ahern
@ 2020-05-13  1:46 ` David Ahern
  2020-05-13 10:43 ` [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path Toke Høiland-Jørgensen
  11 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13  1:46 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, daniel,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

xdp1 and xdp2 now accept the -E flag to attach the XDP program in the
egress path.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 samples/bpf/xdp1_user.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index c447ad9e3a1d..bb104f4d8c5e 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -73,7 +73,8 @@ static void usage(const char *prog)
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
 		"    -N    enforce native mode\n"
-		"    -F    force loading prog\n",
+		"    -F    force loading prog\n"
+		"    -E    egress path program\n",
 		prog);
 }
 
@@ -85,7 +86,7 @@ int main(int argc, char **argv)
 	};
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
-	const char *optstr = "FSN";
+	const char *optstr = "FSNE";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -103,13 +104,17 @@ int main(int argc, char **argv)
 		case 'F':
 			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+		case 'E':
+			xdp_flags |= XDP_FLAGS_EGRESS_MODE;
+			prog_load_attr.expected_attach_type = BPF_XDP_EGRESS;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
 		}
 	}
 
-	if (!(xdp_flags & XDP_FLAGS_SKB_MODE))
+	if (!(xdp_flags & (XDP_FLAGS_SKB_MODE | XDP_FLAGS_EGRESS_MODE)))
 		xdp_flags |= XDP_FLAGS_DRV_MODE;
 
 	if (optind == argc) {
-- 
2.21.1 (Apple Git-122.3)



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
                   ` (10 preceding siblings ...)
  2020-05-13  1:46 ` [PATCH v5 bpf-next 11/11] samples/bpf: add XDP egress support to xdp1 David Ahern
@ 2020-05-13 10:43 ` Toke Høiland-Jørgensen
  2020-05-13 19:37   ` David Ahern
  11 siblings, 1 reply; 32+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-05-13 10:43 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin, dsahern, David Ahern

David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dahern@digitalocean.com>
>
> This series adds support for XDP in the egress path by introducing
> a new XDP attachment type, BPF_XDP_EGRESS, and adding a UAPI to
> if_link.h for attaching the program to a netdevice and reporting
> the program. This allows bpf programs to be run on redirected xdp
> frames with the context showing the Tx device.
>
> This is a missing primitive for XDP allowing solutions to build small,
> targeted programs properly distributed in the networking path allowing,
> for example, an egress firewall/ACL/traffic verification or packet
> manipulation based on data specific to the egress device.
>
> Nothing about running a program in the Tx path requires driver specific
> resources like the Rx path has. Thus, programs can be run in core
> code and attached to the net_device struct similar to skb mode. The
> egress attach is done using the new XDP_FLAGS_EGRESS_MODE flag, and
> is reported by the kernel using the XDP_ATTACHED_EGRESS_CORE attach
> flag with IFLA_XDP_EGRESS_PROG_ID making the api similar to existing
> APIs for XDP.
>
> The egress program is run in bq_xmit_all before invoking ndo_xdp_xmit.
> This is similar to cls_bpf programs which run before the call to
> ndo_dev_xmit. Together the 2 locations cover all packets about to be
> sent to a device for Tx.
>
> xdp egress programs are not run on skbs, so a cls-bpf counterpart
> should also be attached to the device to cover all packets -
> xdp_frames and skbs.
>
> v5:
> - rebased to top of bpf-next
> - dropped skb path; cls-bpf provides an option for the same functionality
>   without having to take a performance hit (e.g., disabling GSO).

I don't like this. It makes the egress hook asymmetrical with the ingress
hook (ingress hook sees all traffic, egress only some of it). If the
performance hit of disabling GSO is the concern, maybe it's better to
wait until we figure out how to deal with that (presumably by
multi-buffer XDP)?

-Toke



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-13 10:43 ` [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path Toke Høiland-Jørgensen
@ 2020-05-13 19:37   ` David Ahern
  2020-05-15 22:54     ` John Fastabend
  2020-05-18  9:08     ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 32+ messages in thread
From: David Ahern @ 2020-05-13 19:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin, David Ahern

On 5/13/20 4:43 AM, Toke Høiland-Jørgensen wrote:
> I don't like this. It makes the egress hook asymmetrical with the ingress
> hook (ingress hook sees all traffic, egress only some of it). If the
> performance hit of disabling GSO is the concern, maybe it's better to
> wait until we figure out how to deal with that (presumably by
> multi-buffer XDP)?

XDP is for accelerated networking. Disabling a h/w offload feature to
use a s/w feature is just wrong. But it is more than just disabling GSO,
and multi-buffer support for XDP is still not going to solve the
problem. XDP is free form allowing any packet modifications - pushing
and popping headers - and, for example, that blows up all of the skb
markers for mac, network, transport and their inner versions. Walking
the skb after an XDP program has run to reset the markers does not make
sense. Combine this with the generic xdp overhead (e.g., handling skb
clone and linearize), and the whole thing just does not make sense.

We have to accept there are a lot of use cases / code paths that simply can
not be converted to work with both skbs and xdp_frames. The qdisc code
is one example. This is another. Requiring a tc program for the skb path
is an acceptable trade off.


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-13 19:37   ` David Ahern
@ 2020-05-15 22:54     ` John Fastabend
  2020-05-15 23:15       ` David Ahern
  2020-05-18  3:40       ` David Ahern
  2020-05-18  9:08     ` Toke Høiland-Jørgensen
  1 sibling, 2 replies; 32+ messages in thread
From: John Fastabend @ 2020-05-15 22:54 UTC (permalink / raw)
  To: David Ahern, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin, David Ahern

David Ahern wrote:
> On 5/13/20 4:43 AM, Toke Høiland-Jørgensen wrote:
> > I don't like this. I makes the egress hook asymmetrical with the ingress
> > hook (ingress hook sees all traffic, egress only some of it). If the
> > performance hit of disabling GSO is the concern, maybe it's better to
> > wait until we figure out how to deal with that (presumably by
> > multi-buffer XDP)?
> 
> XDP is for accelerated networking. Disabling a h/w offload feature to
> use a s/w feature is just wrong. But it is more than just disabling GSO,
> and multi-buffer support for XDP is still not going to solve the
> problem. XDP is free form allowing any packet modifications - pushing
> and popping headers - and, for example, that blows up all of the skb
> markers for mac, network, transport and their inner versions. Walking
> the skb after an XDP program has run to reset the markers does not make
> sense. Combine this with the generic xdp overhead (e.g., handling skb
> clone and linearize), and the whole thing just does not make sense.
> 
> We have to accept there are a lot of use cases / code paths that simply can
> not be converted to work with both skbs and xdp_frames. The qdisc code
> is one example. This is another. Requiring a tc program for the skb path
> is an acceptable trade off.

Hi David,

Another way to set up egress programs that I had been thinking about is to
build a prog_array map with a slot per interface; then, after doing the
redirect (or, I guess, the tail call program can do the redirect), do the
tail call into the "egress" program.

From the programming side, this would look like:


  ---> ingress xdp bpf                BPF_MAP_TYPE_PROG_ARRAY
         redirect(ifindex)            +---------+
         tail_call(ifindex)           |         |
                      |               +---------+
                      +-------------> | ifindex | 
                                      +---------+
                                      |         |
                                      +---------+


         return XDP_REDIRECT
                        |
                        +-------------> xdp_xmit


The controller would then update the BPF_MAP_TYPE_PROG_ARRAY instead of
attaching to egress interface itself as in the series here. I think it
would only require that the tail call program return XDP_REDIRECT so the
driver knows to follow through with the redirect. OTOH the egress program
can decide to DROP or PASS as well. The DROP case is straightforward: the
packet gets dropped. The PASS case is interesting because it will cause
the packet to go to the stack. Which may or may not be expected I guess.
We could always lint the programs or force the programs to return only
XDP_REDIRECT/XDP_PASS from libbpf side.
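As a rough sketch of the pattern described above (the map size, section
names, and the `pick_egress_ifindex()` helper are all illustrative
assumptions, not code from the series):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 64);	/* one slot per possible egress ifindex */
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} egress_progs SEC(".maps");

SEC("xdp")
int ingress_demux(struct xdp_md *ctx)
{
	/* hypothetical helper; however the program picks the Tx device */
	__u32 ifindex = pick_egress_ifindex(ctx);

	/* set up the redirect first ... */
	bpf_redirect(ifindex, 0);

	/* ... then jump into the per-interface "egress" program, which is
	 * expected to return XDP_REDIRECT (or DROP/PASS). If no program is
	 * installed in this slot, the tail call falls through and we return
	 * the redirect ourselves.
	 */
	bpf_tail_call(ctx, &egress_progs, ifindex);
	return XDP_REDIRECT;
}
```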

Would there be any differences between my example and your series on the
datapath side? I think from the BPF program side the only difference
would be return codes XDP_REDIRECT vs XDP_PASS. The control plane is
different however. I don't have a good sense of one being better than
the other. Do you happen to see some reason to prefer native xdp egress
program types over prog array usage?

From the performance side I suspect they will be more or less equivalent.

On the positive side using a PROG_ARRAY doesn't require a new attach
point. A con might be right-sizing the PROG_ARRAY to map to interfaces?
Do you have 1000's of interfaces here? Or some unknown number of
interfaces? I've had building resizable hash/array maps for a while
on my todo list, so I could add that for other use cases as well if that
was the only problem.

Sorry for the late reply; it took me a bit of time to mull over the
patches.

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-15 22:54     ` John Fastabend
@ 2020-05-15 23:15       ` David Ahern
  2020-05-18 18:10         ` John Fastabend
  2020-05-18  3:40       ` David Ahern
  1 sibling, 1 reply; 32+ messages in thread
From: David Ahern @ 2020-05-15 23:15 UTC (permalink / raw)
  To: John Fastabend, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 5/15/20 4:54 PM, John Fastabend wrote:
> Hi David,
> 
> Another way to set up egress programs that I had been thinking about is to
> build a prog_array map with a slot per interface then after doing the
> redirect (or I guess the tail call program can do the redirect) do the
> tail call into the "egress" program.
> 
> From a programming side this would look like,
> 
> 
>   ---> ingress xdp bpf                BPF_MAP_TYPE_PROG_ARRAY
>          redirect(ifindex)            +---------+
>          tail_call(ifindex)           |         |
>                       |               +---------+
>                       +-------------> | ifindex | 
>                                       +---------+
>                                       |         |
>                                       +---------+
> 
> 
>          return XDP_REDIRECT
>                         |
>                         +-------------> xdp_xmit
> 
> 
> The controller would then update the BPF_MAP_TYPE_PROG_ARRAY instead of
> attaching to egress interface itself as in the series here. I think it
> would only require that tail call program return XDP_REDIRECT so the
> driver knows to follow through with the redirect. OTOH the egress program
> can decide to DROP or PASS as well. The DROP case is straight forward,
> packet gets dropped. The PASS case is interesting because it will cause
> the packet to go to the stack. Which may or may not be expected I guess.
> We could always lint the programs or force the programs to return only
> XDP_REDIRECT/XDP_PASS from libbpf side.
> 
> Would there be any differences from my example and your series from the
> datapath side? I think from the BPF program side the only difference
> would be return codes XDP_REDIRECT vs XDP_PASS. The control plane is
> different however. I don't have a good sense of one being better than
> the other. Do you happen to see some reason to prefer native xdp egress
> program types over prog array usage?

host ingress to VM is one use case; VM to VM on the same host is another.

> 
> From the performance side I suspect they will be more or less equivalent.
> 
> On the positive side using a PROG_ARRAY doesn't require a new attach
> point. A con might be right-sizing the PROG_ARRAY to map to interfaces?
> Do you have 1000's of interfaces here? Or some unknown number of

1000ish is probably the right ballpark - up to 500 VMs on a host, each
with a public and private network connection. From there each interface
can have their own firewall (ingress and egress; most likely VM unique
data, but to be flexible potentially different programs e.g., blacklist
vs whitelist). Each VM will definitely have its own network data - mac
and network addresses, and since VMs are untrusted packet validation in
both directions is a requirement.

With respect to lifecycle management of the programs and the data,
putting VM specific programs and maps on VM specific taps simplifies
management. VM terminates, taps are deleted, programs and maps
disappear. So no validator thread needed to handle stray data / programs
from the inevitable cleanup problems when everything is lumped into 1
program / map or even array of programs and maps.

To me the distributed approach is the simplest and best. The program on
the host nics can be stupid simple; no packet parsing beyond the
ethernet header. Its job is just a traffic demuxer very much like a
switch. All VM logic and data is local to the VM's interfaces.
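A minimal sketch of what such a "stupid simple" host-nic demuxer might look
like - the two maps and their wiring are assumptions for illustration, not
code from the series:

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u8[ETH_ALEN]);	/* VM mac address */
	__type(value, __u32);		/* slot in tx_devs below */
} mac_to_slot SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 1024);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} tx_devs SEC(".maps");

SEC("xdp")
int host_nic_demux(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	__u32 *slot;

	/* no parsing beyond the ethernet header */
	if (data + sizeof(*eth) > data_end)
		return XDP_DROP;

	slot = bpf_map_lookup_elem(&mac_to_slot, eth->h_dest);
	if (!slot)
		return XDP_PASS;	/* not a VM mac; let the host stack have it */

	return bpf_redirect_map(&tx_devs, *slot, 0);
}
```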


> interfaces? I've had building resizable hash/array maps for awhile
> on my todo list so could add that for other use cases as well if that
> was the only problem.
> 
> Sorry for the late reply it took me a bit of time to mull over the
> patches.
> 
> Thanks,
> John
> 



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-15 22:54     ` John Fastabend
  2020-05-15 23:15       ` David Ahern
@ 2020-05-18  3:40       ` David Ahern
  1 sibling, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-18  3:40 UTC (permalink / raw)
  To: John Fastabend, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

I am trying to understand the resistance here. There are ingress/egress
hooks for most of the layers - tc, netfilter, and even within bpf APIs.
Clearly there is a need for this kind of symmetry across the APIs, so
why the resistance or hesitation for XDP?

Stacking programs on the Rx side into the host was brought up 9
revisions ago when the first patches went out. It makes for an
unnecessarily complicated design and is antithetical to the whole
Unix/Linux philosophy of small focused programs linked together to
provide a solution.

Can you elaborate on your concerns?


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-13 19:37   ` David Ahern
  2020-05-15 22:54     ` John Fastabend
@ 2020-05-18  9:08     ` Toke Høiland-Jørgensen
  2020-05-18 14:44       ` David Ahern
  1 sibling, 1 reply; 32+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-05-18  9:08 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin, David Ahern

David Ahern <dsahern@gmail.com> writes:

> On 5/13/20 4:43 AM, Toke Høiland-Jørgensen wrote:
>> I don't like this. It makes the egress hook asymmetrical with the ingress
>> hook (ingress hook sees all traffic, egress only some of it). If the
>> performance hit of disabling GSO is the concern, maybe it's better to
>> wait until we figure out how to deal with that (presumably by
>> multi-buffer XDP)?
>
> XDP is for accelerated networking. Disabling a h/w offload feature to
> use a s/w feature is just wrong. But it is more than just disabling GSO,
> and multi-buffer support for XDP is still not going to solve the
> problem. XDP is free form allowing any packet modifications - pushing
> and popping headers - and, for example, that blows up all of the skb
> markers for mac, network, transport and their inner versions. Walking
> the skb after an XDP program has run to reset the markers does not make
> sense. Combine this with the generic xdp overhead (e.g., handling skb
> clone and linearize), and the whole thing just does not make sense.

I can see your point that fixing up the whole skb after the program has
run is not a good idea. But to me that just indicates that the hook is
in the wrong place: that it really should be in the driver, executed at
a point where the skb data structure is no longer necessary (similar to
how the ingress hook is before the skb is generated).

Otherwise, what you're proposing is not an egress hook, but rather a
'post-REDIRECT hook', which is strictly less powerful. This may or may
not be useful in its own right, but let's not pretend it's a full egress
hook. Personally I feel that the egress hook is what we should be going
for, not this partial thing.

-Toke



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18  9:08     ` Toke Høiland-Jørgensen
@ 2020-05-18 14:44       ` David Ahern
  2020-05-18 18:00         ` Toke Høiland-Jørgensen
  2020-05-18 21:23         ` Daniel Borkmann
  0 siblings, 2 replies; 32+ messages in thread
From: David Ahern @ 2020-05-18 14:44 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin, David Ahern

On 5/18/20 3:08 AM, Toke Høiland-Jørgensen wrote:
> I can see your point that fixing up the whole skb after the program has
> run is not a good idea. But to me that just indicates that the hook is
> in the wrong place: that it really should be in the driver, executed at
> a point where the skb data structure is no longer necessary (similar to
> how the ingress hook is before the skb is generated).

Have you created a cls_bpf program to modify skbs? Have you looked at
the helpers, the restrictions and the tight management of skb changes?
Have you followed the skb from create to device handoff through the
drivers? Have you looked at the history of encapsulations, gso handling,
offloads, ...? I have, and it drove home that the skb path and xdp paths
are radically different. XDP is meant to be light and fast, and trying
to cram an skb down the xdp path is a dead end.

> 
> Otherwise, what you're proposing is not an egress hook, but rather a
> 'post-REDIRECT hook', which is strictly less powerful. This may or may
> not be useful in its own right, but let's not pretend it's a full egress
> hook. Personally I feel that the egress hook is what we should be going
> for, not this partial thing.

You are hand waving. Be specific, with details.

Less powerful how? There are only so many operations you can do to a
packet. What do you want to do and what can't be done with this proposed
change? Why must it be done as XDP vs proper synergy between the 2 paths?


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 14:44       ` David Ahern
@ 2020-05-18 18:00         ` Toke Høiland-Jørgensen
  2020-05-18 21:06           ` Daniel Borkmann
  2020-05-18 23:37           ` David Ahern
  2020-05-18 21:23         ` Daniel Borkmann
  1 sibling, 2 replies; 32+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-05-18 18:00 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin, David Ahern

David Ahern <dsahern@gmail.com> writes:

> On 5/18/20 3:08 AM, Toke Høiland-Jørgensen wrote:
>> I can see your point that fixing up the whole skb after the program has
>> run is not a good idea. But to me that just indicates that the hook is
>> in the wrong place: that it really should be in the driver, executed at
>> a point where the skb data structure is no longer necessary (similar to
>> how the ingress hook is before the skb is generated).
>
> Have you created a cls_bpf program to modify skbs? Have you looked at
> the helpers, the restrictions and the tight management of skb changes?
> Have you followed the skb from create to device handoff through the
> drivers? Have you looked at the history of encapsulations, gso handling,
> offloads, ...?

Have you tried re-reading the first sentence of the paragraph you're
replying to? You know, the one that started with "I can see your point
that..."

>> Otherwise, what you're proposing is not an egress hook, but rather a
>> 'post-REDIRECT hook', which is strictly less powerful. This may or may
>> not be useful in its own right, but let's not pretend it's a full egress
>> hook. Personally I feel that the egress hook is what we should be going
>> for, not this partial thing.
>
> You are hand waving. Be specific, with details.

Are you deliberately trying to antagonise me or something? It's a really
odd way to try to make your case...

> Less powerful how? There are only so many operations you can do to a
> packet. What do you want to do and what can't be done with this proposed
> change? Why must it be done as XDP vs proper synergy between the 2 paths.

I meant 'less powerful' in the obvious sense: it only sees a subset of
the packets going out of the interface. And so I worry that it will (a)
make an already hard to use set of APIs even more confusing, and (b)
turn out to not be enough so we'll end up needing a "real" egress hook.

As I said in my previous email, a post-REDIRECT hook may or may not be
useful in its own right. I'm kinda on the fence about that, but am
actually leaning towards it being useful; however, I am concerned that
it'll end up being redundant if we do get a full egress hook.

-Toke



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-15 23:15       ` David Ahern
@ 2020-05-18 18:10         ` John Fastabend
  2020-05-18 23:52           ` David Ahern
  0 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2020-05-18 18:10 UTC (permalink / raw)
  To: David Ahern, John Fastabend, Toke Høiland-Jørgensen,
	David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

David Ahern wrote:
> On 5/15/20 4:54 PM, John Fastabend wrote:
> > Hi David,
> > 
> > Another way to set up egress programs that I had been thinking about is to
> > build a prog_array map with a slot per interface then after doing the
> > redirect (or I guess the tail call program can do the redirect) do the
> > tail call into the "egress" program.
> > 
> > From a programming side this would look like,
> > 
> > 
> >   ---> ingress xdp bpf                BPF_MAP_TYPE_PROG_ARRAY
> >          redirect(ifindex)            +---------+
> >          tail_call(ifindex)           |         |
> >                       |               +---------+
> >                       +-------------> | ifindex | 
> >                                       +---------+
> >                                       |         |
> >                                       +---------+
> > 
> > 
> >          return XDP_REDIRECT
> >                         |
> >                         +-------------> xdp_xmit
> > 
> > 
> > The controller would then update the BPF_MAP_TYPE_PROG_ARRAY instead of
> > attaching to egress interface itself as in the series here. I think it
> > would only require that tail call program return XDP_REDIRECT so the
> > driver knows to follow through with the redirect. OTOH the egress program
> > can decide to DROP or PASS as well. The DROP case is straight forward,
> > packet gets dropped. The PASS case is interesting because it will cause
> > the packet to go to the stack. Which may or may not be expected I guess.
> > We could always lint the programs or force the programs to return only
> > XDP_REDIRECT/XDP_PASS from libbpf side.
> > 
> > Would there be any differences from my example and your series from the
> > datapath side? I think from the BPF program side the only difference
> > would be return codes XDP_REDIRECT vs XDP_PASS. The control plane is
> > different however. I don't have a good sense of one being better than
> > the other. Do you happen to see some reason to prefer native xdp egress
> > program types over prog array usage?
> 
> host ingress to VM is one use case; VM to VM on the same host is another.

But host ingress to VM would still work with tail calls because the XDP
packet came from another XDP program. At least that is how I understand
it.

VM to VM case, again using tail calls on the sending VM ingress hook
would work also.

> 
> > 
> > From the performance side I suspect they will be more or less equivalent.
> > 
> > On the positive side using a PROG_ARRAY doesn't require a new attach
> > point. A con might be right-sizing the PROG_ARRAY to map to interfaces?
> > Do you have 1000's of interfaces here? Or some unknown number of
> 
> 1000ish is probably the right ballpark - up to 500 VM's on a host each
> with a public and private network connection. From there each interface
> can have their own firewall (ingress and egress; most likely VM unique
> data, but to be flexible potentially different programs e.g., blacklist
> vs whitelist). Each VM will definitely have its own network data - mac
> and network addresses, and since VMs are untrusted packet validation in
> both directions is a requirement.

Understood and makes sense.

> 
> With respect to lifecycle management of the programs and the data,
> putting VM specific programs and maps on VM specific taps simplifies
> management. VM terminates, taps are deleted, programs and maps
> disappear. So no validator thread needed to handle stray data / programs
> from the inevitable cleanup problems when everything is lumped into 1
> program / map or even array of programs and maps.

OK. Also presumably you already have a hook into this event to insert
the tc filter programs, so it's probably a natural hook for mgmt.

> 
> To me the distributed approach is the simplest and best. The program on
> the host nics can be stupid simple; no packet parsing beyond the
> ethernet header. Its job is just a traffic demuxer very much like a
> switch. All VM logic and data is local to the VM's interfaces.

IMO it seems more natural and efficient to use a tail call. But I
can see how, if the ingress program is an l2/l3 switch and the VM hook
is an l2/l3 filter, it feels more like a switch+firewall layout we
would normally use on a "real" (v)switch. Also I think the above point
where cleanup is free because of the tap tear down is a win.

> 
> 
> > interfaces? I've had building resizable hash/array maps for awhile
> > on my todo list so could add that for other use cases as well if that
> > was the only problem.
> > 
> > Sorry for the late reply it took me a bit of time to mull over the
> > patches.
> > 
> > Thanks,
> > John
> > 

Pulling in below because I think it was for me.

> I am trying to understand the resistance here. There are ingress/egress
> hooks for most of the layers - tc, netfilter, and even within bpf APIs.
> Clearly there is a need for this kind of symmetry across the APIs, so
> why the resistance or hesitation for XDP?

Because I don't see it as necessary and it adds another xdp interface. I
also didn't fully understand why it would be useful.

> 
> Stacking programs on the Rx side into the host was brought up 9
> revisions ago when the first patches went out. It makes for an
> unnecessarily complicated design and is antithetical to the whole
> Unix/Linux philosophy of small focused programs linked together to
> provide a solution.

I know it was brought up earlier and at the time the hook was also being
used for skbs. This sort of convinced me it was different from the tail
call example. Once skb usage became impractical, it seems like the
same datapath can be implemented with xdp+prog_array. As I understand
it this is still the case. The datapath could be implemented as a set
of xdp+prog_array hooks but the mgmt life-cycle is different and also
the mental model is a bit different. At least the mental model of the
BPF developer has to be different.

> Can you elaborate on your concerns?

Just understanding the use case.

My summary is the series gives us a few nice things: (a) allows the control
plane to be simpler because programs will not need to be explicitly
garbage collected, (b) we don't have to guess a right-size for a
program array map because we don't have to manage a map at all, and
(c) helps bpf programmers mental model by using separate attach points
for each function.

I can see how (a) and (b) will be useful, so no objections from my
side to merge the series.


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 18:00         ` Toke Høiland-Jørgensen
@ 2020-05-18 21:06           ` Daniel Borkmann
  2020-05-19  0:02             ` David Ahern
  2020-05-18 23:37           ` David Ahern
  1 sibling, 1 reply; 32+ messages in thread
From: Daniel Borkmann @ 2020-05-18 21:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, David Ahern

On 5/18/20 8:00 PM, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@gmail.com> writes:
>> On 5/18/20 3:08 AM, Toke Høiland-Jørgensen wrote:
[...]
>> Less powerful how? There are only so many operations you can do to a
>> packet. What do you want to do and what can't be done with this proposed
>> change? Why must it be done as XDP vs proper synergy between the 2 paths.
> 
> I meant 'less powerful' in the obvious sense: it only sees a subset of
> the packets going out of the interface. And so I worry that it will (a)
> make an already hard to use set of APIs even more confusing, and (b)
> turn out to not be enough so we'll end up needing a "real" egress hook.
> 
> As I said in my previous email, a post-REDIRECT hook may or may not be
> useful in its own right. I'm kinda on the fence about that, but am
> actually leaning towards it being useful; however, I am concerned that
> it'll end up being redundant if we do get a full egress hook.

I tend to agree with this. From a user point of view, say, one that has used
the ingress XDP path before, the expectation would very likely be that an XDP
"egress hook" would see all the traffic similarly as on the ingress side, but
since the skb path has been dropped in this revision - I agree with you, David,
that it makes sense to do so - calling it XDP "egress" then feels a bit misleading
wrt expectations. I'd assume we'd see a lot of confused users on this very list
asking why their BPF program doesn't trigger.

So given we neither call this hook on the skb path, nor XDP_TX nor AF_XDP's TX
path, I was wondering also wrt the discussion with John if it makes sense to
make this hook a property of the devmap _itself_, for example, to have a default
BPF prog upon devmap creation or a dev-specific override that is passed on map
update along with the dev. At least this would make it very clear where this is
logically tied to and triggered from, and if needed (?) would provide potentially
more flexibility on specifying BPF progs to be called while also solving your
use-case.
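To make the idea concrete, one possible shape of such a devmap update from
the control plane - the struct layout and field names here are entirely
hypothetical, just a sketch of what "passing a prog along with the dev"
could look like:

```c
#include <bpf/bpf.h>
#include <linux/types.h>

/* Hypothetical devmap value carrying a per-device egress program fd
 * alongside the ifindex; not an existing UAPI struct.
 */
struct devmap_val {
	__u32 ifindex;		/* device to redirect to */
	int   bpf_prog_fd;	/* program run on frames before ndo_xdp_xmit */
};

int add_tx_dev(int devmap_fd, __u32 slot, __u32 ifindex, int prog_fd)
{
	struct devmap_val val = {
		.ifindex     = ifindex,
		.bpf_prog_fd = prog_fd,	/* e.g. -1 could mean "no egress prog" */
	};

	return bpf_map_update_elem(devmap_fd, &slot, &val, BPF_ANY);
}
```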

Thanks,
Daniel


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 14:44       ` David Ahern
  2020-05-18 18:00         ` Toke Høiland-Jørgensen
@ 2020-05-18 21:23         ` Daniel Borkmann
  1 sibling, 0 replies; 32+ messages in thread
From: Daniel Borkmann @ 2020-05-18 21:23 UTC (permalink / raw)
  To: David Ahern, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, David Ahern

On 5/18/20 4:44 PM, David Ahern wrote:
> On 5/18/20 3:08 AM, Toke Høiland-Jørgensen wrote:
>> I can see your point that fixing up the whole skb after the program has
>> run is not a good idea. But to me that just indicates that the hook is
>> in the wrong place: that it really should be in the driver, executed at
>> a point where the skb data structure is no longer necessary (similar to
>> how the ingress hook is before the skb is generated).
> 
> Have you created a cls_bpf program to modify skbs? Have you looked at
> the helpers, the restrictions and the tight management of skb changes?
> Have you followed the skb from create to device handoff through the
> drivers? Have you looked at the history of encapsulations, gso handling,
> offloads, ...? I have and it drove home that the skb path and xdp paths
> are radically different. XDP is meant to be light and fast, and trying
> to cram an skb down the xdp path is a dead end.

Agree, it's already challenging in itself to abstract the skb internals and
protocol specifics away for tc BPF programs while keeping them reasonably
fast (e.g. not destroying skb GSO specifics, etc). Good example is the whole
bpf_skb_adjust_room() flags mess. :/ The buffer would have to be an XDP
one straight from socket layer and stay that way as an xdp-buff down to the
driver, not the other way around where you'd pay the price of back'n'forth
conversion to xdp-buff and then passing it to the driver while handling/
fixing up all the skb details after the BPF prog was run. AF_XDP's xmit would
be more suited for something like that.

Thanks,
Daniel


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 18:00         ` Toke Høiland-Jørgensen
  2020-05-18 21:06           ` Daniel Borkmann
@ 2020-05-18 23:37           ` David Ahern
  1 sibling, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-18 23:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, john.fastabend,
	ast, kafai, songliubraving, yhs, andriin

On 5/18/20 12:00 PM, Toke Høiland-Jørgensen wrote:
> I meant 'less powerful' in the obvious sense: it only sees a subset of
> the packets going out of the interface. And so I worry that it will (a)
> make an already hard to use set of APIs even more confusing, and (b)
> turn out to not be enough so we'll end up needing a "real" egress hook.
> 
> As I said in my previous email, a post-REDIRECT hook may or may not be
> useful in its own right. I'm kinda on the fence about that, but am
> actually leaning towards it being useful; however, I am concerned that
> it'll end up being redundant if we do get a full egress hook.
> 

I made the changes to mlx5 to run programs in the driver back in early
March. I have looked at both i40e and mlx5 xmit functions all the way to
h/w handoff to get 2 vendor perspectives. With xdp I can push any header
I want - e.g., mpls - and as soon as I do the markers are wrong. Take a
look at mlx5e_sq_xmit and how it gets the transport header offset. Or
i40e_tso. Those markers are necessary for the offloads so there is no
'post skb' location to run a bpf program in the driver and have the
result be sane for hardware handoff.

[ as an aside, a co-worker just happened to hit something like this
today (unrelated to xdp). He called dev_queue_xmit with a large,
manually crafted packet and no skb markers. Both the boxes (connected
back to back) had to be rebooted.]

From what I can see there are 3 ways to run an XDP program on skbs in
the Tx path:
1. disable hardware offloads (which is nonsense - you don't disable H/W
acceleration for S/W acceleration),

2. neuter XDP egress and not allow bpf_xdp_adjust_head (that is a key
feature of XDP), or

3. walk the skb afterwards and reset the markers (performance killer).

I have stared at this code for months; I would love for someone to prove
me wrong.


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 18:10         ` John Fastabend
@ 2020-05-18 23:52           ` David Ahern
  2020-05-19  6:04             ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: David Ahern @ 2020-05-18 23:52 UTC (permalink / raw)
  To: John Fastabend, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 5/18/20 12:10 PM, John Fastabend wrote:
>>
>> host ingress to VM is one use case; VM to VM on the same host is another.
> 
> But host ingress to VM would still work with tail calls because the XDP
> packet came from another XDP program. At least that is how I understand
> it.
> 
> VM to VM case, again using tail calls on the sending VM ingress hook
> would work also.

Understood. I realize I can attach the program array all around; I just
see that as a complex control plane / performance hit depending on how the
programs are wired up.

>>
>> With respect to lifecycle management of the programs and the data,
>> putting VM specific programs and maps on VM specific taps simplifies
>> management. VM terminates, taps are deleted, programs and maps
>> disappear. So no validator thread needed to handle stray data / programs
>> from the inevitable cleanup problems when everything is lumped into 1
>> program / map or even array of programs and maps.
> 
> OK. Also presumably you already have a hook into this event to insert
> the tc filter programs so its probably a natural hook for mgmt.

For VMs there is no reason to have an skb at all, so no tc filter program.

> 
>>
>> To me the distributed approach is the simplest and best. The program on
>> the host nics can be stupid simple; no packet parsing beyond the
> >> ethernet header. Its job is just a traffic demuxer very much like a
>> switch. All VM logic and data is local to the VM's interfaces.
> 
> IMO it seems more natural and efficient to use a tail call. But, I
> can see how if the ingress program is a l2/l3 switch and the VM hook
> is a l2/l3 filter it feels more like a switch+firewall layout we
> would normally use on a "real" (v)switch. Also I think the above point
> where cleanup is free because of the tap tear down is a win.

Exactly. To the VM, the host is part of the network. The host should be
passing the packets as fast and as simply as possible from ingress nic
to vm. It can be done completely as xdp frames and doing so reduces the
CPU cycles per packet in the host (yes, there are caveats to that
statement).

VM to host nic, and VM to VM have their own challenges which need to be
tackled next.

But the end goal is to have all VM traffic touched by the host as xdp
frames and without creating a complex control plane. The distributed
approach is much simpler and cleaner - and seems to follow what Cilium
is doing to a degree, or that is my interpretation of

"By attaching to the TC ingress hook of the host side of this veth pair
Cilium can monitor and enforce policy on all traffic exiting a
container. By attaching a BPF program to the veth pair associated with
each container and routing all network traffic to the host side virtual
devices with another BPF program attached to the tc ingress hook as well
Cilium can monitor and enforce policy on all traffic entering or exiting
the node."

https://docs.cilium.io/en/v1.7/architecture/


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 21:06           ` Daniel Borkmann
@ 2020-05-19  0:02             ` David Ahern
  2020-05-19 13:31               ` Daniel Borkmann
  0 siblings, 1 reply; 32+ messages in thread
From: David Ahern @ 2020-05-19  0:02 UTC (permalink / raw)
  To: Daniel Borkmann, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, David Ahern

On 5/18/20 3:06 PM, Daniel Borkmann wrote:
> So given we neither call this hook on the skb path, nor XDP_TX nor
> AF_XDP's TX path, I was wondering also wrt the discussion with John
> if it makes sense to make this hook a property of the devmap _itself_,
> for example, to have a default BPF prog upon devmap creation or a
> dev-specific override that is passed on map update along with the dev.
> At least this would make it very clear where this is logically tied to
> and triggered from, and if needed (?) would provide potentially more
> flexibility on specifying BPF progs to be called while also solving
> your use-case.
> 

You lost me on the 'property of the devmap.' The programs need to be per
netdevice, and devmap is an array of devices. Can you elaborate?


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-18 23:52           ` David Ahern
@ 2020-05-19  6:04             ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2020-05-19  6:04 UTC (permalink / raw)
  To: David Ahern, John Fastabend, Toke Høiland-Jørgensen,
	David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, daniel, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

David Ahern wrote:
> On 5/18/20 12:10 PM, John Fastabend wrote:
> >>
> >> host ingress to VM is one use case; VM to VM on the same host is another.
> > 
> > But host ingress to VM would still work with tail calls because the XDP
> > packet came from another XDP program. At least that is how I understand
> > it.
> > 
> > VM to VM case, again using tail calls on the sending VM ingress hook
> > would work also.
> 
> Understood. I realize I can attach the program array all around; I just
> see that as a complex control plane / performance hit, depending on how
> the programs are wired up.
> 

Hard to argue without a specific program. I think it could go either way.
I'll concede the control plane might be more complex, but I am not so
convinced about performance. Either way, having a program attached to the
life cycle of the VM seems like something that would be nice to have. In
the tc skb case, if we attach to a qdisc, it is removed automatically when
the device is removed. Having something similar for xdp is probably a good
thing.

Worth following up in Daniel's thread. Another way to do that, instead of
having the program associated with the ifindex, is to have it associated
with the devmap entry. Basically, when we add an entry to the devmap with
a program fd associated with it, they could both be released when the
devmap entry is removed. This will happen automatically if the ifindex
is removed. But, rather than fragment threads too much, I'll wait for
Daniel's reply.

> >>
> >> With respect to lifecycle management of the programs and the data,
> >> putting VM specific programs and maps on VM specific taps simplifies
> >> management. VM terminates, taps are deleted, programs and maps
> >> disappear. So no validator thread needed to handle stray data / programs
> >> from the inevitable cleanup problems when everything is lumped into 1
> >> program / map or even array of programs and maps.
> > 
> > OK. Also presumably you already have a hook into this event to insert
> > the tc filter programs so its probably a natural hook for mgmt.
> 
> For VMs there is no reason to have an skb at all, so no tc filter program.

+1 nice win for sure.

> 
> > 
> >>
> >> To me the distributed approach is the simplest and best. The program on
> >> the host nics can be stupid simple; no packet parsing beyond the
> >> ethernet header. Its job is just a traffic demuxer, very much like a
> >> switch. All VM logic and data is local to the VM's interfaces.
> > 
> > IMO it seems more natural and efficient to use a tail call. But, I
> > can see how if the ingress program is an l2/l3 switch and the VM hook
> > is an l2/l3 filter, it feels more like the switch+firewall layout we
> > would normally use on a "real" (v)switch. Also I think the above point
> > where cleanup is free because of the tap tear down is a win.
> 
> Exactly. To the VM, the host is part of the network. The host should be
> passing the packets as fast and as simply as possible from the ingress
> NIC to the VM. It can be done completely as xdp frames, and doing so
> reduces the CPU cycles per packet on the host (yes, there are caveats
> to that statement).
> 
> VM to host nic, and VM to VM have their own challenges which need to be
> tackled next.
> 
> But the end goal is to have all VM traffic touched by the host as xdp
> frames and without creating a complex control plane. The distributed
> approach is much simpler and cleaner - and seems to follow what Cilium
> is doing to a degree, or that is my interpretation of

+1 agree everything as xdp pkt is a great goal.

> 
> "By attaching to the TC ingress hook of the host side of this veth pair
> Cilium can monitor and enforce policy on all traffic exiting a
> container. By attaching a BPF program to the veth pair associated with
> each container and routing all network traffic to the host side virtual
> devices with another BPF program attached to the tc ingress hook as well
> Cilium can monitor and enforce policy on all traffic entering or exiting
> the node."
> 
> https://docs.cilium.io/en/v1.7/architecture/

In many configurations there are no egress hooks, though, because policy
(the firewall piece) is implemented as part of the ingress hook. Because
the ingress TC hook "knows" where it will redirect a packet, it can also
run the policy logic for that pod/VM/etc.
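[Editor's note: the tail-call layout discussed in this subthread can be
modeled in a few lines of userspace C, for illustration only. An ingress
demuxer picks a destination and "tail calls" the per-destination policy
program from a program array. All names here are made up; a real
implementation would be BPF programs using bpf_tail_call() on a
BPF_MAP_TYPE_PROG_ARRAY.]

```c
#include <assert.h>
#include <stddef.h>

enum xdp_action { XDP_DROP = 1, XDP_PASS = 2, XDP_REDIRECT = 4 };

/* Minimal packet model: destination device and a policy bit. */
struct pkt { int dst_ifindex; int allowed; };

#define MAX_DEVS 8

/* Model of BPF_MAP_TYPE_PROG_ARRAY: one policy prog per destination. */
static enum xdp_action (*prog_array[MAX_DEVS])(struct pkt *);

/* Per-VM policy prog: the "firewall" half of switch+firewall. */
static enum xdp_action vm_policy(struct pkt *p)
{
    return p->allowed ? XDP_REDIRECT : XDP_DROP;
}

/* Ingress demuxer: pick the destination, then "tail call" its policy.
 * In real BPF this is bpf_tail_call(ctx, &progs, idx), which does not
 * return on success; here it is an ordinary indirect call. */
static enum xdp_action ingress_demux(struct pkt *p)
{
    if (p->dst_ifindex < 0 || p->dst_ifindex >= MAX_DEVS)
        return XDP_DROP;
    if (prog_array[p->dst_ifindex])
        return prog_array[p->dst_ifindex](p);
    return XDP_PASS; /* no policy installed for this destination */
}
```

Installing `vm_policy` at index 2 then drops or redirects packets bound
for that device according to the policy, while destinations with no
entry fall through untouched.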


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-19  0:02             ` David Ahern
@ 2020-05-19 13:31               ` Daniel Borkmann
  2020-05-19 14:21                 ` Jesper Dangaard Brouer
                                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Daniel Borkmann @ 2020-05-19 13:31 UTC (permalink / raw)
  To: David Ahern, Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, David Ahern

On 5/19/20 2:02 AM, David Ahern wrote:
> On 5/18/20 3:06 PM, Daniel Borkmann wrote:
>> So given we neither call this hook on the skb path, nor XDP_TX nor
>> AF_XDP's TX path, I was wondering also wrt the discussion with John
>> if it makes sense to make this hook a property of the devmap _itself_,
>> for example, to have a default BPF prog upon devmap creation or a
>> dev-specific override that is passed on map update along with the dev.
>> At least this would make it very clear where this is logically tied to
>> and triggered from, and if needed (?) would provide potentially more
>> flexibility on specifying BPF progs to be called while also solving
>> your use-case.
> 
> You lost me on the 'property of the devmap.' The programs need to be per
> netdevice, and devmap is an array of devices. Can you elaborate?

I meant that the dev{map,hash} would get extended in a way where the
__dev_map_update_elem() receives an (ifindex, BPF prog fd) tuple from
user space and holds the program's ref as long as it is in the map slot.
Then, upon redirect to the given device in the devmap, we'd execute the
prog as well in order to also allow for XDP_DROP policy in there. Upon
map update when we drop the dev from the map slot, we also release the
reference to the associated BPF prog. What I mean to say wrt 'property
of the devmap' is that this program is _only_ used in combination with
redirection to devmap, so given we are not solving all the other egress
cases for reasons mentioned, it would make sense to tie it logically to
the devmap which would also make it clear from a user perspective _when_
the prog is expected to run.

Thanks,
Daniel
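
[Editor's note: the semantics Daniel describes — a devmap slot carrying a
device plus an optional program that runs on redirect and is released with
the slot — can be sketched as a userspace C model, for illustration only.
All names here are hypothetical; the real work would live in the kernel's
devmap code.]

```c
#include <assert.h>
#include <stddef.h>

enum xdp_action { XDP_DROP = 1, XDP_PASS = 2 };

struct frame { int len; };

/* Hypothetical devmap slot: device index plus optional BPF prog.
 * Clearing .prog models releasing the prog ref on map update. */
struct devmap_slot {
    int ifindex;
    enum xdp_action (*prog)(struct frame *); /* NULL = no prog attached */
};

/* Model of redirect: run the slot's prog (if any) before transmit.
 * Returns 1 if the frame was sent, 0 if the prog dropped it. */
static int dev_map_redirect(struct devmap_slot *slot, struct frame *f)
{
    if (slot->prog && slot->prog(f) == XDP_DROP)
        return 0;          /* policy dropped the frame */
    return 1;              /* would call ndo_xdp_xmit() here */
}

/* Example per-slot policy: drop oversized frames. */
static enum xdp_action drop_big(struct frame *f)
{
    return f->len > 1500 ? XDP_DROP : XDP_PASS;
}
```

With `drop_big` in the slot, a redirect of a 9000-byte frame is dropped
while a 64-byte frame is sent; with the slot's prog cleared, everything
passes through, mirroring the "prog lives and dies with the map entry"
lifecycle.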


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-19 13:31               ` Daniel Borkmann
@ 2020-05-19 14:21                 ` Jesper Dangaard Brouer
  2020-05-19 16:58                   ` Lorenzo Bianconi
  2020-05-19 14:52                 ` Toke Høiland-Jørgensen
  2020-05-19 16:37                 ` David Ahern
  2 siblings, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-19 14:21 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Ahern, Toke Høiland-Jørgensen, David Ahern,
	netdev, davem, kuba, prashantbhole.linux, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, David Ahern, brouer,
	Lorenzo Bianconi

On Tue, 19 May 2020 15:31:20 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 5/19/20 2:02 AM, David Ahern wrote:
> > On 5/18/20 3:06 PM, Daniel Borkmann wrote:  
> >>
> >> So given we neither call this hook on the skb path, nor XDP_TX nor
> >> AF_XDP's TX path, I was wondering also wrt the discussion with
> >> John if it makes sense to make this hook a property of the devmap
> >> _itself_, for example, to have a default BPF prog upon devmap
> >> creation or a dev-specific override that is passed on map update
> >> along with the dev. At least this would make it very clear where
> >> this is logically tied to and triggered from, and if needed (?)
> >> would provide potentially more flexibility on specifying BPF
> >> progs to be called while also solving your use-case.  
> > 
> > You lost me on the 'property of the devmap.' The programs need to be per
> > netdevice, and devmap is an array of devices. Can you elaborate?  
> 
> I meant that the dev{map,hash} would get extended in a way where the
> __dev_map_update_elem() receives an (ifindex, BPF prog fd) tuple from
> user space and holds the program's ref as long as it is in the map slot.
> Then, upon redirect to the given device in the devmap, we'd execute the
> prog as well in order to also allow for XDP_DROP policy in there. Upon
> map update when we drop the dev from the map slot, we also release the
> reference to the associated BPF prog. What I mean to say wrt 'property
> of the devmap' is that this program is _only_ used in combination with
> redirection to devmap, so given we are not solving all the other egress
> cases for reasons mentioned, it would make sense to tie it logically to
> the devmap which would also make it clear from a user perspective _when_
> the prog is expected to run.

Yes, I agree.

I also have a use-case for 'cpumap' (cc. Lorenzo, as I asked him to
work on it).  We want to run another XDP program on the CPU that
receives the xdp_frame, and then allow it to XDP redirect again.
It would make a lot of sense to attach this XDP program by inserting
a BPF prog fd into the map as the value.

Notice that we would also need another expected-attach-type for this
case, as we want to allow the XDP program to read xdp_md->ingress_ifindex,
but we no longer have the xdp_rxq_info. Thus, we need to remap that
to xdp_frame->dev_rx->ifindex (instead of rxq->dev->ifindex).

The practical use-case is the espressobin mvneta-based ARM64 board,
which can only receive IRQs + RX frames on CPU 0, but the hardware has
more TX queues that we would like to take advantage of on both CPUs.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-19 13:31               ` Daniel Borkmann
  2020-05-19 14:21                 ` Jesper Dangaard Brouer
@ 2020-05-19 14:52                 ` Toke Høiland-Jørgensen
  2020-05-19 16:37                 ` David Ahern
  2 siblings, 0 replies; 32+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-05-19 14:52 UTC (permalink / raw)
  To: Daniel Borkmann, David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, David Ahern

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 5/19/20 2:02 AM, David Ahern wrote:
>> On 5/18/20 3:06 PM, Daniel Borkmann wrote:
>>> So given we neither call this hook on the skb path, nor XDP_TX nor
>>> AF_XDP's TX path, I was wondering also wrt the discussion with John
>>> if it makes sense to make this hook a property of the devmap _itself_,
>>> for example, to have a default BPF prog upon devmap creation or a
>>> dev-specific override that is passed on map update along with the dev.
>>> At least this would make it very clear where this is logically tied to
>>> and triggered from, and if needed (?) would provide potentially more
>>> flexibility on specifying BPF progs to be called while also solving
>>> your use-case.
>> 
>> You lost me on the 'property of the devmap.' The programs need to be per
>> netdevice, and devmap is an array of devices. Can you elaborate?
>
> I meant that the dev{map,hash} would get extended in a way where the
> __dev_map_update_elem() receives an (ifindex, BPF prog fd) tuple from
> user space and holds the program's ref as long as it is in the map slot.
> Then, upon redirect to the given device in the devmap, we'd execute the
> prog as well in order to also allow for XDP_DROP policy in there. Upon
> map update when we drop the dev from the map slot, we also release the
> reference to the associated BPF prog. What I mean to say wrt 'property
> of the devmap' is that this program is _only_ used in combination with
> redirection to devmap, so given we are not solving all the other egress
> cases for reasons mentioned, it would make sense to tie it logically to
> the devmap which would also make it clear from a user perspective _when_
> the prog is expected to run.

I would be totally on board with this. It also makes sense for the
multicast map type: if you want to fix up the packet after the redirect,
just stick the fixer-upper program into the map along with the ifindex.

-Toke



* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-19 13:31               ` Daniel Borkmann
  2020-05-19 14:21                 ` Jesper Dangaard Brouer
  2020-05-19 14:52                 ` Toke Høiland-Jørgensen
@ 2020-05-19 16:37                 ` David Ahern
  2 siblings, 0 replies; 32+ messages in thread
From: David Ahern @ 2020-05-19 16:37 UTC (permalink / raw)
  To: Daniel Borkmann, netdev
  Cc: Toke Høiland-Jørgensen, davem, kuba,
	prashantbhole.linux, brouer, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin

On 5/19/20 7:31 AM, Daniel Borkmann wrote:
> I meant that the dev{map,hash} would get extended in a way where the
> __dev_map_update_elem() receives an (ifindex, BPF prog fd) tuple from
> user space and holds the program's ref as long as it is in the map slot.
> Then, upon redirect to the given device in the devmap, we'd execute the
> prog as well in order to also allow for XDP_DROP policy in there. Upon
> map update when we drop the dev from the map slot, we also release the
> reference to the associated BPF prog. What I mean to say wrt 'property
> of the devmap' is that this program is _only_ used in combination with
> redirection to devmap, so given we are not solving all the other egress
> cases for reasons mentioned, it would make sense to tie it logically to
> the devmap which would also make it clear from a user perspective _when_
> the prog is expected to run.

Thanks. I will take a look at this.


* Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
  2020-05-19 14:21                 ` Jesper Dangaard Brouer
@ 2020-05-19 16:58                   ` Lorenzo Bianconi
  0 siblings, 0 replies; 32+ messages in thread
From: Lorenzo Bianconi @ 2020-05-19 16:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Daniel Borkmann, David Ahern, Toke Høiland-Jørgensen,
	David Ahern, netdev, davem, kuba, prashantbhole.linux,
	john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	David Ahern


> On Tue, 19 May 2020 15:31:20 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
> > On 5/19/20 2:02 AM, David Ahern wrote:
> > > On 5/18/20 3:06 PM, Daniel Borkmann wrote:  
> > >>
> > >> So given we neither call this hook on the skb path, nor XDP_TX nor
> > >> AF_XDP's TX path, I was wondering also wrt the discussion with
> > >> John if it makes sense to make this hook a property of the devmap
> > >> _itself_, for example, to have a default BPF prog upon devmap
> > >> creation or a dev-specific override that is passed on map update
> > >> along with the dev. At least this would make it very clear where
> > >> this is logically tied to and triggered from, and if needed (?)
> > >> would provide potentially more flexibility on specifying BPF
> > >> progs to be called while also solving your use-case.  
> > > 
> > > You lost me on the 'property of the devmap.' The programs need to be per
> > > netdevice, and devmap is an array of devices. Can you elaborate?  
> > 
> > I meant that the dev{map,hash} would get extended in a way where the
> > __dev_map_update_elem() receives an (ifindex, BPF prog fd) tuple from
> > user space and holds the program's ref as long as it is in the map slot.
> > Then, upon redirect to the given device in the devmap, we'd execute the
> > prog as well in order to also allow for XDP_DROP policy in there. Upon
> > map update when we drop the dev from the map slot, we also release the
> > reference to the associated BPF prog. What I mean to say wrt 'property
> > of the devmap' is that this program is _only_ used in combination with
> > redirection to devmap, so given we are not solving all the other egress
> > cases for reasons mentioned, it would make sense to tie it logically to
> > the devmap which would also make it clear from a user perspective _when_
> > the prog is expected to run.
> 
> Yes, I agree.
> 
> I also have a use-case for 'cpumap' (cc. Lorenzo as I asked him to
> work on it).  We want to run another XDP program on the CPU that
> receives the xdp_frame, and then allow it to XDP redirect again.
> It would make a lot of sense to attach this XDP program by inserting
> a BPF prog fd into the map as the value.
> 
> Notice that we would also need another expected-attach-type for this
> case, as we want to allow the XDP program to read xdp_md->ingress_ifindex,
> but we no longer have the xdp_rxq_info. Thus, we need to remap that
> to xdp_frame->dev_rx->ifindex (instead of rxq->dev->ifindex).

Here I am looking at how we can extend cpumaps in order to pass the
qsize and a BPF program file descriptor from userspace when adding an
element to the map, and to allow cpu_map_update_elem() to load the
program (e.g. similar to dev_change_xdp_fd()).
Doing so, we can have an approach similar to the veth XDP implementation.
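
[Editor's note: a rough userspace C model of that cpumap extension, for
illustration only. The map value carries a qsize plus an optional program,
and ingress_ifindex comes from the frame's dev_rx, since there is no
xdp_rxq_info on the remote CPU. All names here are hypothetical.]

```c
#include <assert.h>
#include <stddef.h>

enum { XDP_DROP = 1, XDP_PASS = 2 };

/* Hypothetical cpumap value: queue size plus an optional prog, standing
 * in for the (qsize, prog fd) pair passed from user space on update. */
struct cpumap_val {
    unsigned int qsize;
    int (*prog)(int ingress_ifindex); /* returns an XDP verdict */
};

/* Model of an xdp_frame: only the field this sketch needs. */
struct xdp_frame { int dev_rx_ifindex; };

/* On the remote CPU there is no xdp_rxq_info, so ingress_ifindex
 * must come from xdp_frame->dev_rx rather than rxq->dev. */
static int cpumap_run_prog(struct cpumap_val *val, struct xdp_frame *frm)
{
    if (!val->prog)
        return XDP_PASS;
    return val->prog(frm->dev_rx_ifindex);
}

/* Example prog: only accept frames that came in on ifindex 1. */
static int accept_if1(int ingress_ifindex)
{
    return ingress_ifindex == 1 ? XDP_PASS : XDP_DROP;
}
```

A value with `accept_if1` installed passes frames received on ifindex 1
and drops the rest; with no prog in the value, the map behaves like
today's cpumap and passes everything.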

Regards,
Lorenzo

> 
> The practical use-case is the espressobin mvneta-based ARM64 board,
> which can only receive IRQs + RX frames on CPU 0, but the hardware has
> more TX queues that we would like to take advantage of on both CPUs.
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 



end of thread, other threads:[~2020-05-19 16:58 UTC | newest]

Thread overview: 32+ messages
2020-05-13  1:45 [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path David Ahern
2020-05-13  1:45 ` [PATCH v5 bpf-next 01/11] net: Refactor convert_to_xdp_frame David Ahern
2020-05-13  1:45 ` [PATCH v5 bpf-next 02/11] net: uapi for XDP programs in the egress path David Ahern
2020-05-13  1:45 ` [PATCH v5 bpf-next 03/11] net: Add XDP setup and query commands for Tx programs David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 04/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 05/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 06/11] net: set XDP egress program on netdevice David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 07/11] net: Support xdp in the Tx path for xdp_frames David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 08/11] libbpf: Add egress XDP support David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 09/11] bpftool: Add support for XDP egress David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 10/11] selftest: Add xdp_egress attach tests David Ahern
2020-05-13  1:46 ` [PATCH v5 bpf-next 11/11] samples/bpf: add XDP egress support to xdp1 David Ahern
2020-05-13 10:43 ` [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path Toke Høiland-Jørgensen
2020-05-13 19:37   ` David Ahern
2020-05-15 22:54     ` John Fastabend
2020-05-15 23:15       ` David Ahern
2020-05-18 18:10         ` John Fastabend
2020-05-18 23:52           ` David Ahern
2020-05-19  6:04             ` John Fastabend
2020-05-18  3:40       ` David Ahern
2020-05-18  9:08     ` Toke Høiland-Jørgensen
2020-05-18 14:44       ` David Ahern
2020-05-18 18:00         ` Toke Høiland-Jørgensen
2020-05-18 21:06           ` Daniel Borkmann
2020-05-19  0:02             ` David Ahern
2020-05-19 13:31               ` Daniel Borkmann
2020-05-19 14:21                 ` Jesper Dangaard Brouer
2020-05-19 16:58                   ` Lorenzo Bianconi
2020-05-19 14:52                 ` Toke Høiland-Jørgensen
2020-05-19 16:37                 ` David Ahern
2020-05-18 23:37           ` David Ahern
2020-05-18 21:23         ` Daniel Borkmann
