Netdev Archive on lore.kernel.org
* [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments
@ 2020-03-19 13:13 Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
                   ` (3 more replies)
  0 siblings, 4 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-19 13:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

This series adds support for atomically replacing the XDP program loaded on an
interface. This is achieved by means of a new netlink attribute that can specify
the expected previous program to replace on the interface. If set, the kernel
will compare this "expected fd" attribute with the program currently loaded on
the interface, and reject the operation if it does not match.

With this primitive, userspace applications can avoid stepping on each other's
toes when simultaneously updating the loaded XDP program.

---

Toke Høiland-Jørgensen (4):
      xdp: Support specifying expected existing program when attaching XDP
      tools: Add EXPECTED_FD-related definitions in if_link.h
      libbpf: Add function to set link XDP fd while specifying old fd
      selftests/bpf: Add tests for attaching XDP programs


 include/linux/netdevice.h                          |  2 +-
 include/uapi/linux/if_link.h                       |  4 +-
 net/core/dev.c                                     | 25 ++++++++--
 net/core/rtnetlink.c                               | 11 +++++
 tools/include/uapi/linux/if_link.h                 |  4 +-
 tools/lib/bpf/libbpf.h                             |  2 +
 tools/lib/bpf/libbpf.map                           |  1 +
 tools/lib/bpf/netlink.c                            | 22 ++++++++-
 .../testing/selftests/bpf/prog_tests/xdp_attach.c  | 55 ++++++++++++++++++++++
 9 files changed, 117 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_attach.c


^ permalink raw reply	[flat|nested] 112+ messages in thread

* [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
@ 2020-03-19 13:13 ` Toke Høiland-Jørgensen
  2020-03-19 22:52   ` Jakub Kicinski
  2020-03-20  2:13   ` Yonghong Song
  2020-03-19 13:13 ` [PATCH bpf-next 2/4] tools: Add EXPECTED_FD-related definitions in if_link.h Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-19 13:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

While it is currently possible for userspace to specify that an existing
XDP program should not be replaced when attaching to an interface, there is
no mechanism to safely replace a specific XDP program with another.

This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
set along with IFLA_XDP_FD. If set, the kernel will check that the program
currently loaded on the interface matches the expected one, and fail the
operation if it does not. This corresponds to a 'cmpxchg' memory operation.

A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
request checking of the EXPECTED_FD attribute. This is needed for userspace
to discover whether the kernel supports the new attribute.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/netdevice.h    |    2 +-
 include/uapi/linux/if_link.h |    4 +++-
 net/core/dev.c               |   25 ++++++++++++++++++++-----
 net/core/rtnetlink.c         |   11 +++++++++++
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b6fedd54cd8e..40b12bd93913 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3767,7 +3767,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags);
+		      int fd, int expected_fd, u32 flags);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
 		    enum bpf_netdev_command cmd);
 int xdp_umem_query(struct net_device *dev, u16 queue_id);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 61e0801c82df..314173f8079e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -972,11 +972,12 @@ enum {
 #define XDP_FLAGS_SKB_MODE		(1U << 1)
 #define XDP_FLAGS_DRV_MODE		(1U << 2)
 #define XDP_FLAGS_HW_MODE		(1U << 3)
+#define XDP_FLAGS_EXPECT_FD		(1U << 4)
 #define XDP_FLAGS_MODES			(XDP_FLAGS_SKB_MODE | \
 					 XDP_FLAGS_DRV_MODE | \
 					 XDP_FLAGS_HW_MODE)
 #define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
-					 XDP_FLAGS_MODES)
+					 XDP_FLAGS_MODES | XDP_FLAGS_EXPECT_FD)
 
 /* These are stored into IFLA_XDP_ATTACHED on dump. */
 enum {
@@ -996,6 +997,7 @@ enum {
 	IFLA_XDP_DRV_PROG_ID,
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
+	IFLA_XDP_EXPECTED_FD,
 	__IFLA_XDP_MAX,
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 25dab1598803..44095326b8d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8654,15 +8654,17 @@ static void dev_xdp_uninstall(struct net_device *dev)
  *	@dev: device
  *	@extack: netlink extended ack
  *	@fd: new program fd or negative value to clear
+ *	@expected_fd: old program fd that userspace expects to replace or clear
  *	@flags: xdp-related flags
  *
  *	Set or clear a bpf program for a device
  */
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags)
+		      int fd, int expected_fd, u32 flags)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	enum bpf_netdev_command query;
+	u32 prog_id, expected_id = 0;
 	struct bpf_prog *prog = NULL;
 	bpf_op_t bpf_op, bpf_chk;
 	bool offload;
@@ -8683,15 +8685,28 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (bpf_op == bpf_chk)
 		bpf_chk = generic_xdp_install;
 
-	if (fd >= 0) {
-		u32 prog_id;
+	prog_id = __dev_xdp_query(dev, bpf_op, query);
+	if (expected_fd >= 0 || (flags & XDP_FLAGS_EXPECT_FD)) {
+		if (expected_fd >= 0) {
+			prog = bpf_prog_get_type_dev(expected_fd, BPF_PROG_TYPE_XDP,
+						     bpf_op == ops->ndo_bpf);
+			if (IS_ERR(prog))
+				return PTR_ERR(prog);
+			expected_id = prog->aux->id;
+			bpf_prog_put(prog);
+		}
 
+		if (prog_id != expected_id) {
+			NL_SET_ERR_MSG(extack, "Active program does not match expected");
+			return -EEXIST;
+		}
+	}
+	if (fd >= 0) {
 		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
 
-		prog_id = __dev_xdp_query(dev, bpf_op, query);
 		if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && prog_id) {
 			NL_SET_ERR_MSG(extack, "XDP program already attached");
 			return -EBUSY;
@@ -8714,7 +8729,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 			return 0;
 		}
 	} else {
-		if (!__dev_xdp_query(dev, bpf_op, query))
+		if (!prog_id)
 			return 0;
 	}
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 14e6ea21c378..09c08980f6f6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1873,6 +1873,7 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
 
 static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = {
 	[IFLA_XDP_FD]		= { .type = NLA_S32 },
+	[IFLA_XDP_EXPECTED_FD]	= { .type = NLA_S32 },
 	[IFLA_XDP_ATTACHED]	= { .type = NLA_U8 },
 	[IFLA_XDP_FLAGS]	= { .type = NLA_U32 },
 	[IFLA_XDP_PROG_ID]	= { .type = NLA_U32 },
@@ -2799,8 +2800,18 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 
 		if (xdp[IFLA_XDP_FD]) {
+			int expected_fd = -1;
+
+			if (xdp[IFLA_XDP_EXPECTED_FD]) {
+				expected_fd = nla_get_s32(xdp[IFLA_XDP_EXPECTED_FD]);
+			} else if (xdp_flags & XDP_FLAGS_EXPECT_FD) {
+				err = -EINVAL;
+				goto errout;
+			}
+
 			err = dev_change_xdp_fd(dev, extack,
 						nla_get_s32(xdp[IFLA_XDP_FD]),
+						expected_fd,
 						xdp_flags);
 			if (err)
 				goto errout;


* [PATCH bpf-next 2/4] tools: Add EXPECTED_FD-related definitions in if_link.h
  2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
@ 2020-03-19 13:13 ` Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 3/4] libbpf: Add function to set link XDP fd while specifying old fd Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for attaching XDP programs Toke Høiland-Jørgensen
  3 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-19 13:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds the IFLA_XDP_EXPECTED_FD netlink attribute definition and the
XDP_FLAGS_EXPECT_FD flag to if_link.h in tools/include.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/include/uapi/linux/if_link.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 024af2d1d0af..e5eced1c28f4 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -960,11 +960,12 @@ enum {
 #define XDP_FLAGS_SKB_MODE		(1U << 1)
 #define XDP_FLAGS_DRV_MODE		(1U << 2)
 #define XDP_FLAGS_HW_MODE		(1U << 3)
+#define XDP_FLAGS_EXPECT_FD		(1U << 4)
 #define XDP_FLAGS_MODES			(XDP_FLAGS_SKB_MODE | \
 					 XDP_FLAGS_DRV_MODE | \
 					 XDP_FLAGS_HW_MODE)
 #define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
-					 XDP_FLAGS_MODES)
+					 XDP_FLAGS_MODES | XDP_FLAGS_EXPECT_FD)
 
 /* These are stored into IFLA_XDP_ATTACHED on dump. */
 enum {
@@ -984,6 +985,7 @@ enum {
 	IFLA_XDP_DRV_PROG_ID,
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
+	IFLA_XDP_EXPECTED_FD,
 	__IFLA_XDP_MAX,
 };
 


* [PATCH bpf-next 3/4] libbpf: Add function to set link XDP fd while specifying old fd
  2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 2/4] tools: Add EXPECTED_FD-related definitions in if_link.h Toke Høiland-Jørgensen
@ 2020-03-19 13:13 ` Toke Høiland-Jørgensen
  2020-03-19 13:13 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for attaching XDP programs Toke Høiland-Jørgensen
  3 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-19 13:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds a new function to set the XDP fd while specifying the old fd to
replace, using the newly added IFLA_XDP_EXPECTED_FD netlink parameter.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/libbpf.h   |    2 ++
 tools/lib/bpf/libbpf.map |    1 +
 tools/lib/bpf/netlink.c  |   22 +++++++++++++++++++++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d38d7a629417..b5ca4f741e28 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -445,6 +445,8 @@ struct xdp_link_info {
 };
 
 LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+LIBBPF_API int bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
+					   __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 				     size_t info_size, __u32 flags);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 5129283c0284..154f1d94fa63 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -244,4 +244,5 @@ LIBBPF_0.0.8 {
 		bpf_link__pin_path;
 		bpf_link__unpin;
 		bpf_program__set_attach_target;
+		bpf_set_link_xdp_fd_replace;
 } LIBBPF_0.0.7;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 431bd25c6cdb..39bd0ead1546 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -132,7 +132,8 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
+					 __u32 flags)
 {
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
@@ -178,6 +179,14 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 		nla->nla_len += nla_xdp->nla_len;
 	}
 
+	if (flags & XDP_FLAGS_EXPECT_FD) {
+		nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
+		nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
+		nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
+		memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
+		nla->nla_len += nla_xdp->nla_len;
+	}
+
 	req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
 
 	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
@@ -191,6 +200,17 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	return ret;
 }
 
+int bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd_replace(ifindex, fd, old_fd,
+					     flags | XDP_FLAGS_EXPECT_FD);
+}
+
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd_replace(ifindex, fd, -1, flags);
+}
+
 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 			     libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {


* [PATCH bpf-next 4/4] selftests/bpf: Add tests for attaching XDP programs
  2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2020-03-19 13:13 ` [PATCH bpf-next 3/4] libbpf: Add function to set link XDP fd while specifying old fd Toke Høiland-Jørgensen
@ 2020-03-19 13:13 ` Toke Høiland-Jørgensen
  3 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-19 13:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds tests for the various replacement operations using
IFLA_XDP_EXPECTED_FD.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 .../testing/selftests/bpf/prog_tests/xdp_attach.c  |   55 ++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_attach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_attach.c
new file mode 100644
index 000000000000..ad974b677e74
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_attach.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#define IFINDEX_LO 1
+
+void test_xdp_attach(void)
+{
+	struct bpf_object *obj1, *obj2, *obj3;
+	const char *file = "./test_xdp.o";
+	int err, fd1, fd2, fd3;
+	__u32 duration = 0;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj1, &fd1);
+	if (CHECK_FAIL(err))
+		return;
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj2, &fd2);
+	if (CHECK_FAIL(err))
+		goto out_1;
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj3, &fd3);
+	if (CHECK_FAIL(err))
+		goto out_2;
+
+	err = bpf_set_link_xdp_fd_replace(IFINDEX_LO, fd1, -1, 0);
+	if (CHECK(err, "load_ok", "initial load failed"))
+		goto out_close;
+
+	err = bpf_set_link_xdp_fd_replace(IFINDEX_LO, fd2, -1, 0);
+	if (CHECK(!err, "load_fail", "load with expected fd didn't fail"))
+		goto out;
+
+	err = bpf_set_link_xdp_fd_replace(IFINDEX_LO, fd2, fd1, 0);
+	if (CHECK(err, "replace_ok", "replace valid old_fd failed"))
+		goto out;
+
+	err = bpf_set_link_xdp_fd_replace(IFINDEX_LO, fd3, fd1, 0);
+	if (CHECK(!err, "replace_fail", "replace invalid old_fd didn't fail"))
+		goto out;
+
+	err = bpf_set_link_xdp_fd_replace(IFINDEX_LO, -1, fd1, 0);
+	if (CHECK(!err, "remove_fail", "remove invalid old_fd didn't fail"))
+		goto out;
+
+	err = bpf_set_link_xdp_fd_replace(IFINDEX_LO, -1, fd2, 0);
+	if (CHECK(err, "remove_ok", "remove valid old_fd failed"))
+		goto out;
+
+out:
+	bpf_set_link_xdp_fd(IFINDEX_LO, -1, 0);
+out_close:
+	bpf_object__close(obj3);
+out_2:
+	bpf_object__close(obj2);
+out_1:
+	bpf_object__close(obj1);
+}


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
@ 2020-03-19 22:52   ` Jakub Kicinski
  2020-03-20  8:48     ` Toke Høiland-Jørgensen
  2020-03-20  2:13   ` Yonghong Song
  1 sibling, 1 reply; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-19 22:52 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> While it is currently possible for userspace to specify that an existing
> XDP program should not be replaced when attaching to an interface, there is
> no mechanism to safely replace a specific XDP program with another.
> 
> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> currently loaded on the interface matches the expected one, and fail the
> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> 
> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> request checking of the EXPECTED_FD attribute. This is needed for userspace
> to discover whether the kernel supports the new attribute.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

I didn't know we wanted to go ahead with this...

If we do please run this thru checkpatch, set .strict_start_type, and
make the expected fd unsigned. A negative expected fd makes no sense.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
  2020-03-19 22:52   ` Jakub Kicinski
@ 2020-03-20  2:13   ` Yonghong Song
  2020-03-20  8:48     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Yonghong Song @ 2020-03-20  2:13 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrii Nakryiko,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Lorenz Bauer, Andrey Ignatov, netdev, bpf



On 3/19/20 6:13 AM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> While it is currently possible for userspace to specify that an existing
> XDP program should not be replaced when attaching to an interface, there is
> no mechanism to safely replace a specific XDP program with another.
> 
> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> currently loaded on the interface matches the expected one, and fail the
> operation if it does not. This corresponds to a 'cmpxchg' memory operation.

The patch set itself looks good to me. But previously there is a
discussion regarding a potential similar functionality through bpf_link.
I guess maintainers (Alexei and Daniel) need to weigh in as some
future vision is involved.

> 
> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> request checking of the EXPECTED_FD attribute. This is needed for userspace
> to discover whether the kernel supports the new attribute.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>   include/linux/netdevice.h    |    2 +-
>   include/uapi/linux/if_link.h |    4 +++-
>   net/core/dev.c               |   25 ++++++++++++++++++++-----
>   net/core/rtnetlink.c         |   11 +++++++++++
>   4 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index b6fedd54cd8e..40b12bd93913 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3767,7 +3767,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>   
>   typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
>   int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
> -		      int fd, u32 flags);
> +		      int fd, int expected_fd, u32 flags);
>   u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
>   		    enum bpf_netdev_command cmd);
>   int xdp_umem_query(struct net_device *dev, u16 queue_id);
[...]

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-19 22:52   ` Jakub Kicinski
@ 2020-03-20  8:48     ` Toke Høiland-Jørgensen
  2020-03-20 17:35       ` Jakub Kicinski
                         ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-20  8:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> While it is currently possible for userspace to specify that an existing
>> XDP program should not be replaced when attaching to an interface, there is
>> no mechanism to safely replace a specific XDP program with another.
>> 
>> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> currently loaded on the interface matches the expected one, and fail the
>> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> 
>> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> to discover whether the kernel supports the new attribute.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> I didn't know we wanted to go ahead with this...

Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
happening with that, though. So since this is a straight-forward
extension of the existing API, that doesn't carry a high implementation
cost, I figured I'd just go ahead with this. Doesn't mean we can't have
something similar in bpf_link as well, of course.

> If we do please run this thru checkpatch, set .strict_start_type,

Will do.

> and make the expected fd unsigned. A negative expected fd makes no
> sense.

A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
flag. I guess you could argue that since we have that flag, setting a
negative expected_fd is not strictly needed. However, I thought it was
weird to have a "this is what I expect" API that did not support
expressing "I expect no program to be attached".

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20  2:13   ` Yonghong Song
@ 2020-03-20  8:48     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-20  8:48 UTC (permalink / raw)
  To: Yonghong Song, Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrii Nakryiko,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Lorenz Bauer, Andrey Ignatov, netdev, bpf

Yonghong Song <yhs@fb.com> writes:

> On 3/19/20 6:13 AM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> While it is currently possible for userspace to specify that an existing
>> XDP program should not be replaced when attaching to an interface, there is
>> no mechanism to safely replace a specific XDP program with another.
>> 
>> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> currently loaded on the interface matches the expected one, and fail the
>> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>
> The patch set itself looks good to me. But previously there is a
> discussion regarding a potential similar functionality through bpf_link.
> I guess maintainers (Alexei and Daniel) need to weigh in as some
> future vision is involved.

Right, sure. See my reply to Jakub for why I went ahead with this
anyway.

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20  8:48     ` Toke Høiland-Jørgensen
@ 2020-03-20 17:35       ` Jakub Kicinski
  2020-03-20 18:17         ` Toke Høiland-Jørgensen
  2020-03-20 18:30         ` John Fastabend
  2020-03-20 20:30       ` Daniel Borkmann
  2020-03-20 20:39       ` Andrii Nakryiko
  2 siblings, 2 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-20 17:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:  
> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> 
> >> While it is currently possible for userspace to specify that an existing
> >> XDP program should not be replaced when attaching to an interface, there is
> >> no mechanism to safely replace a specific XDP program with another.
> >> 
> >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> currently loaded on the interface matches the expected one, and fail the
> >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> 
> >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> to discover whether the kernel supports the new attribute.
> >> 
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>  
> >
> > I didn't know we wanted to go ahead with this...  
> 
> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> happening with that, though. So since this is a straight-forward
> extension of the existing API, that doesn't carry a high implementation
> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> something similar in bpf_link as well, of course.

I'm not really in the loop, but from what I overheard - I think the
bpf_link may be targeting something non-networking first.

> > If we do please run this thru checkpatch, set .strict_start_type,  
> 
> Will do.
> 
> > and make the expected fd unsigned. A negative expected fd makes no
> > sense.  
> 
> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
> flag. I guess you could argue that since we have that flag, setting a
> negative expected_fd is not strictly needed. However, I thought it was
> weird to have a "this is what I expect" API that did not support
> expressing "I expect no program to be attached".

I see it now, not entirely unreasonable.

Why did you choose to use the FD rather than passing prog id directly?
Is the application unlikely to have program ID?

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 17:35       ` Jakub Kicinski
@ 2020-03-20 18:17         ` Toke Høiland-Jørgensen
  2020-03-20 18:35           ` Jakub Kicinski
  2020-03-20 18:30         ` John Fastabend
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-20 18:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

Jakub Kicinski <kuba@kernel.org> writes:

>> > If we do please run this thru checkpatch, set .strict_start_type,  
>> 
>> Will do.
>> 
>> > and make the expected fd unsigned. A negative expected fd makes no
>> > sense.  
>> 
>> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
>> flag. I guess you could argue that since we have that flag, setting a
>> negative expected_fd is not strictly needed. However, I thought it was
>> weird to have a "this is what I expect" API that did not support
>> expressing "I expect no program to be attached".
>
> I see it now, not entirely unreasonable.
>
> Why did you choose to use the FD rather than passing prog id directly?
> Is the application unlikely to have program ID?

For consistency with other APIs. Seems the pattern is generally that
userspace supplies program FDs, and the kernel returns IDs, no?

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 17:35       ` Jakub Kicinski
  2020-03-20 18:17         ` Toke Høiland-Jørgensen
@ 2020-03-20 18:30         ` John Fastabend
  2020-03-20 20:24           ` Andrii Nakryiko
  1 sibling, 1 reply; 112+ messages in thread
From: John Fastabend @ 2020-03-20 18:30 UTC (permalink / raw)
  To: Jakub Kicinski, Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

Jakub Kicinski wrote:
> On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> > Jakub Kicinski <kuba@kernel.org> writes:
> > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:  
> > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> > >> 
> > >> While it is currently possible for userspace to specify that an existing
> > >> XDP program should not be replaced when attaching to an interface, there is
> > >> no mechanism to safely replace a specific XDP program with another.
> > >> 
> > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> > >> currently loaded on the interface matches the expected one, and fail the
> > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> > >> 
> > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> > >> to discover whether the kernel supports the new attribute.
> > >> 
> > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>  
> > >
> > > I didn't know we wanted to go ahead with this...  
> > 
> > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> > happening with that, though. So since this is a straight-forward
> > extension of the existing API, that doesn't carry a high implementation
> > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> > something similar in bpf_link as well, of course.
> 
> I'm not really in the loop, but from what I overheard - I think the
> bpf_link may be targeting something non-networking first.

My preference is to avoid building two different APIs, one for XDP and another
for everything else. If we have userlands that already understand links, and
pinning support is on the way, imo let's use these APIs for networking as well.

Would a link_swap() API (proposed by Andrii iirc) resolve this use case as
well? If not, why not? If it can, it seems like the more general and consistent
solution. I can imagine swapping links is useful in tracing as well and
likely other cases I haven't thought about.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 18:17         ` Toke Høiland-Jørgensen
@ 2020-03-20 18:35           ` Jakub Kicinski
  0 siblings, 0 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-20 18:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Lorenz Bauer,
	Andrey Ignatov, netdev, bpf

On Fri, 20 Mar 2020 19:17:46 +0100 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
> >> > If we do please run this thru checkpatch, set .strict_start_type,    
> >> 
> >> Will do.
> >>   
> >> > and make the expected fd unsigned. A negative expected fd makes no
> >> > sense.    
> >> 
> >> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
> >> flag. I guess you could argue that since we have that flag, setting a
> >> negative expected_fd is not strictly needed. However, I thought it was
> >> weird to have a "this is what I expect" API that did not support
> >> expressing "I expect no program to be attached".  
> >
> > I see it now, not entirely unreasonable.
> >
> > Why did you choose to use the FD rather than passing prog id directly?
> > Is the application unlikely to have program ID?  
> 
> For consistency with other APIs. Seems the pattern is generally that
> userspace supplies program FDs, and the kernel returns IDs, no?

This API just predates the IDs. The "from kernel" API was added when
IDs already existed.

I'd think for cmpxchg it may be easier if user space provides the ID
directly, since that's what gets returned.
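The ID-based variant Jakub describes can be modelled as a plain compare-and-exchange on the attached program's ID. This is a minimal userspace sketch, not kernel code. `struct xdp_slot`, `xdp_cmpxchg_prog_id()` and the `-EEXIST` return value are hypothetical. The sketch assumes that 0 can stand for "no program attached", because BPF program IDs are allocated starting at 1. That convention also covers the "I expect no program to be attached" case that the negative expected_fd was meant to express.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of one interface's XDP slot: the ID of the attached
 * program, where 0 means "no program attached". */
struct xdp_slot {
	_Atomic uint32_t prog_id;
};

/*
 * Install new_id only if the slot still holds expected_id (cmpxchg).
 * expected_id == 0 expresses "I expect no program to be attached".
 * On a mismatch, *actual receives the ID that was really there.
 */
static int xdp_cmpxchg_prog_id(struct xdp_slot *slot, uint32_t expected_id,
			       uint32_t new_id, uint32_t *actual)
{
	uint32_t cur = expected_id;

	if (atomic_compare_exchange_strong(&slot->prog_id, &cur, new_id))
		return 0;
	/* On failure, the observed value has been written back into cur. */
	if (actual)
		*actual = cur;
	return -EEXIST;	/* errno choice is an assumption */
}
```

A caller that loses the race gets the current ID back. It can then re-read the interface state and decide whether to retry, as in a conventional cmpxchg loop.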

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 18:30         ` John Fastabend
@ 2020-03-20 20:24           ` Andrii Nakryiko
  2020-03-23 11:24             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:24 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jakub Kicinski, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Jakub Kicinski wrote:
> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> > > Jakub Kicinski <kuba@kernel.org> writes:
> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> > > >>
> > > >> While it is currently possible for userspace to specify that an existing
> > > >> XDP program should not be replaced when attaching to an interface, there is
> > > >> no mechanism to safely replace a specific XDP program with another.
> > > >>
> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> > > >> currently loaded on the interface matches the expected one, and fail the
> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> > > >>
> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> > > >> to discover whether the kernel supports the new attribute.
> > > >>
> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > >
> > > > I didn't know we wanted to go ahead with this...
> > >
> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> > > happening with that, though. So since this is a straight-forward
> > > extension of the existing API, that doesn't carry a high implementation
> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> > > something similar in bpf_link as well, of course.
> >
> > I'm not really in the loop, but from what I overheard - I think the
> > bpf_link may be targeting something non-networking first.
>
> My preference is to avoid building two different APIs, one for XDP and another
> for everything else. If we have userlands that already understand links, and
> pinning support is on the way, imo let's use these APIs for networking as well.

I agree here. And yes, I've been working on extending bpf_link into
cgroup and then to XDP. We are still discussing some cgroup-specific
details, but the patch is ready. I'm going to post it as an RFC to get
the discussion started, before we do this for XDP.

>
> Would a link_swap() API (proposed by Andrii iirc) resolve this use case as
> well? If not, why not? If it can, it seems like the more general and consistent
> solution. I can imagine swapping links is useful in tracing as well and
> likely other cases I haven't thought about.

Yes, that's the idea. Right now I have an implementation for cgroups, but
the API itself is generic and should/will be extended to tracing and XDP.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20  8:48     ` Toke Høiland-Jørgensen
  2020-03-20 17:35       ` Jakub Kicinski
@ 2020-03-20 20:30       ` Daniel Borkmann
  2020-03-20 20:40         ` Daniel Borkmann
  2020-03-20 20:39       ` Andrii Nakryiko
  2 siblings, 1 reply; 112+ messages in thread
From: Daniel Borkmann @ 2020-03-20 20:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jakub Kicinski
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Lorenz Bauer, Andrey Ignatov, netdev, bpf

On 3/20/20 9:48 AM, Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
>> On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>
>>> While it is currently possible for userspace to specify that an existing
>>> XDP program should not be replaced when attaching to an interface, there is
>>> no mechanism to safely replace a specific XDP program with another.
>>>
>>> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>>> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>>> currently loaded on the interface matches the expected one, and fail the
>>> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>>>
>>> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>>> request checking of the EXPECTED_FD attribute. This is needed for userspace
>>> to discover whether the kernel supports the new attribute.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> I didn't know we wanted to go ahead with this...
> 
> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> happening with that, though. So since this is a straight-forward
> extension of the existing API, that doesn't carry a high implementation
> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> something similar in bpf_link as well, of course.

Overall series looks okay, but before we go down that road, especially given there is
the new bpf_link object now, I would like us to first elaborate and figure out how XDP
fits into the bpf_link concept, where its limitations are, whether it even fits at all,
and what its semantics should realistically look like, given bpf_link is to be generic to
all program types. Then we could extend the atomic replace there generically as well. I
think at the very minimum it might have similarities with what is proposed here, but
from a user-experience point of view I would like to avoid having something similar in the
XDP API and then again in bpf_link, which would just be confusing.

Thanks,
Daniel

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20  8:48     ` Toke Høiland-Jørgensen
  2020-03-20 17:35       ` Jakub Kicinski
  2020-03-20 20:30       ` Daniel Borkmann
@ 2020-03-20 20:39       ` Andrii Nakryiko
  2020-03-23 11:25         ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:39 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Fri, Mar 20, 2020 at 1:48 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>
> >> While it is currently possible for userspace to specify that an existing
> >> XDP program should not be replaced when attaching to an interface, there is
> >> no mechanism to safely replace a specific XDP program with another.
> >>
> >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> currently loaded on the interface matches the expected one, and fail the
> >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >>
> >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> to discover whether the kernel supports the new attribute.
> >>
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >
> > I didn't know we wanted to go ahead with this...
>
> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> happening with that, though. So since this is a straight-forward
> extension of the existing API, that doesn't carry a high implementation
> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> something similar in bpf_link as well, of course.
>
> > If we do please run this thru checkpatch, set .strict_start_type,
>
> Will do.
>
> > and make the expected fd unsigned. A negative expected fd makes no
> > sense.
>
> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
> flag. I guess you could argue that since we have that flag, setting a
> negative expected_fd is not strictly needed. However, I thought it was
> weird to have a "this is what I expect" API that did not support
> expressing "I expect no program to be attached".

For the BPF syscall, the typical approach when an optional FD is
needed seems to be an extra flag (e.g., BPF_F_REPLACE for cgroups); if
the flag is not specified, zero is enforced for that optional fd. That
handles backwards-compatibility cases well, too.
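The flag-plus-zero pattern Andrii describes can be sketched as a validation step. `HYPO_F_REPLACE`, `struct attach_req` and `validate_attach_req()` are hypothetical stand-ins and do not reproduce the real `bpf_attr` layout. Zero alone cannot mean "not provided", because 0 is a valid file descriptor. The flag therefore carries that information, and the field must be zero whenever the flag is absent.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag and request layout, for illustration only. */
#define HYPO_F_REPLACE (1U << 0)

struct attach_req {
	uint32_t flags;
	uint32_t replace_fd;	/* optional; only meaningful with HYPO_F_REPLACE */
};

/* Returns 0 if the request is well-formed, -EINVAL otherwise. */
static int validate_attach_req(const struct attach_req *req)
{
	/* With the flag set, replace_fd is used as-is (0 is a valid fd). */
	if (req->flags & HYPO_F_REPLACE)
		return 0;
	/*
	 * Without the flag, the optional field must be left zeroed, so old
	 * callers that zero-initialise the request keep the old behaviour
	 * and the field stays free to gain a meaning later.
	 */
	return req->replace_fd == 0 ? 0 : -EINVAL;
}
```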

>
> -Toke
>

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 20:30       ` Daniel Borkmann
@ 2020-03-20 20:40         ` Daniel Borkmann
  2020-03-20 21:30           ` Jakub Kicinski
  0 siblings, 1 reply; 112+ messages in thread
From: Daniel Borkmann @ 2020-03-20 20:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jakub Kicinski
  Cc: Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Lorenz Bauer, Andrey Ignatov, netdev, bpf

On 3/20/20 9:30 PM, Daniel Borkmann wrote:
> On 3/20/20 9:48 AM, Toke Høiland-Jørgensen wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>>> On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>
>>>> While it is currently possible for userspace to specify that an existing
>>>> XDP program should not be replaced when attaching to an interface, there is
>>>> no mechanism to safely replace a specific XDP program with another.
>>>>
>>>> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>>>> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>>>> currently loaded on the interface matches the expected one, and fail the
>>>> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>>>>
>>>> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>>>> request checking of the EXPECTED_FD attribute. This is needed for userspace
>>>> to discover whether the kernel supports the new attribute.
>>>>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>
>>> I didn't know we wanted to go ahead with this...
>>
>> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> happening with that, though. So since this is a straight-forward
>> extension of the existing API, that doesn't carry a high implementation
>> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> something similar in bpf_link as well, of course.
> 
> Overall series looks okay, but before we go down that road, especially given there is
> the new bpf_link object now, I would like us to first elaborate and figure out how XDP
> fits into the bpf_link concept, where its limitations are, whether it even fits at all,
> and what its semantics should realistically look like, given bpf_link is to be generic to
> all program types. Then we could extend the atomic replace there generically as well. I
> think at the very minimum it might have similarities with what is proposed here, but
> from a user-experience point of view I would like to avoid having something similar in the
> XDP API and then again in bpf_link, which would just be confusing.

Another aspect of atomic replacement is whether the programs can actually be
replaced atomically at runtime. Last time I looked, some drivers still do
a down/up cycle on replacement, and hence traffic would be interrupted. I would argue
that such an /atomic/ swap operation on bpf_link should also guarantee that this is
not needed (the workaround today would be a simple tail call map as entry point).

Thanks,
Daniel

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 20:40         ` Daniel Borkmann
@ 2020-03-20 21:30           ` Jakub Kicinski
  2020-03-20 21:55             ` Daniel Borkmann
  0 siblings, 1 reply; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-20 21:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Lorenz Bauer, Andrey Ignatov, netdev, bpf

On Fri, 20 Mar 2020 21:40:46 +0100 Daniel Borkmann wrote:
> On 3/20/20 9:30 PM, Daniel Borkmann wrote:
> > On 3/20/20 9:48 AM, Toke Høiland-Jørgensen wrote:  
> >> Jakub Kicinski <kuba@kernel.org> writes:  
> >>> On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:  
> >>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>>>
> >>>> While it is currently possible for userspace to specify that an existing
> >>>> XDP program should not be replaced when attaching to an interface, there is
> >>>> no mechanism to safely replace a specific XDP program with another.
> >>>>
> >>>> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >>>> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >>>> currently loaded on the interface matches the expected one, and fail the
> >>>> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >>>>
> >>>> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >>>> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >>>> to discover whether the kernel supports the new attribute.
> >>>>
> >>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>  
> >>>
> >>> I didn't know we wanted to go ahead with this...  
> >>
> >> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> happening with that, though. So since this is a straight-forward
> >> extension of the existing API, that doesn't carry a high implementation
> >> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> something similar in bpf_link as well, of course.  
> > 
> > Overall series looks okay, but before we go down that road, especially given there is
> > the new bpf_link object now, I would like us to first elaborate and figure out how XDP
> > fits into the bpf_link concept, where its limitations are, whether it even fits at all,
> > and what its semantics should realistically look like, given bpf_link is to be generic to
> > all program types. Then we could extend the atomic replace there generically as well. I
> > think at the very minimum it might have similarities with what is proposed here, but
> > from a user-experience point of view I would like to avoid having something similar in the
> > XDP API and then again in bpf_link, which would just be confusing.
> 
> Another aspect of atomic replacement is whether the programs can actually be
> replaced atomically at runtime. Last time I looked, some drivers still do
> a down/up cycle on replacement, and hence traffic would be interrupted. I would argue
> that such an /atomic/ swap operation on bpf_link should also guarantee that this is
> not needed (the workaround today would be a simple tail call map as entry point).

I don't think that's the case. Drivers generally have a fast path 
for the active-active replace.

Up/Down is only done to remap DMA buffers and change the RX buffer
allocation scheme. That happens when a program is installed or removed,
not when it is replaced.
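The rule described above fits in a small predicate. This is a sketch only. `xdp_needs_reload()` and the stand-in `struct prog` are hypothetical and are not taken from any driver.

```c
#include <stdbool.h>
#include <stddef.h>

struct prog { int id; };	/* stand-in for struct bpf_prog */

/*
 * A down/up cycle is only needed when XDP is switched on or off, since
 * that is when DMA mappings and the RX buffer allocation scheme change.
 * Replacing one program with another keeps that setup intact.
 */
static bool xdp_needs_reload(const struct prog *old_prog,
			     const struct prog *new_prog)
{
	return (old_prog == NULL) != (new_prog == NULL);
}
```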

I'm sure bpf_link would have solved this problem, though, and all 
the other problems we don't actually have :-P

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 21:30           ` Jakub Kicinski
@ 2020-03-20 21:55             ` Daniel Borkmann
  2020-03-20 23:35               ` Jakub Kicinski
  0 siblings, 1 reply; 112+ messages in thread
From: Daniel Borkmann @ 2020-03-20 21:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Lorenz Bauer, Andrey Ignatov, netdev, bpf

On 3/20/20 10:30 PM, Jakub Kicinski wrote:
> On Fri, 20 Mar 2020 21:40:46 +0100 Daniel Borkmann wrote:
>> On 3/20/20 9:30 PM, Daniel Borkmann wrote:
>>> On 3/20/20 9:48 AM, Toke Høiland-Jørgensen wrote:
>>>> Jakub Kicinski <kuba@kernel.org> writes:
>>>>> On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>>>>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>>
>>>>>> While it is currently possible for userspace to specify that an existing
>>>>>> XDP program should not be replaced when attaching to an interface, there is
>>>>>> no mechanism to safely replace a specific XDP program with another.
>>>>>>
>>>>>> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>>>>>> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>>>>>> currently loaded on the interface matches the expected one, and fail the
>>>>>> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>>>>>>
>>>>>> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>>>>>> request checking of the EXPECTED_FD attribute. This is needed for userspace
>>>>>> to discover whether the kernel supports the new attribute.
>>>>>>
>>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>
>>>>> I didn't know we wanted to go ahead with this...
>>>>
>>>> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>>>> happening with that, though. So since this is a straight-forward
>>>> extension of the existing API, that doesn't carry a high implementation
>>>> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>>>> something similar in bpf_link as well, of course.
>>>
>>> Overall series looks okay, but before we go down that road, especially given there is
>>> the new bpf_link object now, I would like us to first elaborate and figure out how XDP
>>> fits into the bpf_link concept, where its limitations are, whether it even fits at all,
>>> and what its semantics should realistically look like, given bpf_link is to be generic to
>>> all program types. Then we could extend the atomic replace there generically as well. I
>>> think at the very minimum it might have similarities with what is proposed here, but
>>> from a user-experience point of view I would like to avoid having something similar in the
>>> XDP API and then again in bpf_link, which would just be confusing.
>>
>> Another aspect of atomic replacement is whether the programs can actually be
>> replaced atomically at runtime. Last time I looked, some drivers still do
>> a down/up cycle on replacement, and hence traffic would be interrupted. I would argue
>> that such an /atomic/ swap operation on bpf_link should also guarantee that this is
>> not needed (the workaround today would be a simple tail call map as entry point).
> 
> I don't think that's the case. Drivers generally have a fast path
> for the active-active replace.
> 
> Up/Down is only done to remap DMA buffers and change the RX buffer
> allocation scheme. That happens when a program is installed or removed,
> not when it is replaced.

I know; sadly, it seems not all drivers adhere to that scheme. I don't have that HW, so I can
only judge from the code, but one example that looked suspicious enough to me is qede_xdp().
It calls qede_xdp_set(), which does a qede_reload() for /every/ prog update. The latter
basically does ...

     if (edev->state == QEDE_STATE_OPEN) {
         qede_unload(edev, QEDE_UNLOAD_NORMAL, true);
         if (args)
             args->func(edev, args);               <-- prog replace here
         qede_load(edev, QEDE_LOAD_RELOAD, true);
         [...]
     }

... now that is one driver. I haven't checked all the others (aside from i40e/ixgbe/mlx4/
mlx5/nfp), but in any case it's also fixable in the driver w/o the extra need for bpf_link.
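A driver-side fix of the kind mentioned here would give prog-to-prog replacement a fast path that publishes the new program pointer instead of running the unload/load cycle. The following is a minimal sketch under stated assumptions. It is not qede's code, and `struct hypo_dev` and `hypo_xdp_swap()` are hypothetical. C11 atomics stand in for the kernel's own pointer-exchange, RCU and refcounting machinery.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NUM_RX_RINGS 4

struct prog { int id; };	/* stand-in for struct bpf_prog */

/* Hypothetical device state: each RX ring reads its program pointer on
 * every poll, so publishing a new pointer takes effect without a reset. */
struct hypo_dev {
	_Atomic(struct prog *) rx_prog[NUM_RX_RINGS];
};

/*
 * Fast path for prog -> prog replacement (old and new both non-NULL):
 * swap the pointer ring by ring instead of doing unload/load. Each ring
 * switches atomically, though different rings may switch at slightly
 * different moments. All rings are assumed to run the same program, so
 * the returned pointer is the old program. The caller must only free it
 * once no ring can still be executing it.
 */
static struct prog *hypo_xdp_swap(struct hypo_dev *dev, struct prog *new_prog)
{
	struct prog *old = NULL;

	for (size_t i = 0; i < NUM_RX_RINGS; i++)
		old = atomic_exchange(&dev->rx_prog[i], new_prog);
	return old;
}
```

Installing or removing a program (a NULL on either side) would still take the existing reload path, because only that case changes the RX buffer setup.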

Thanks,
Daniel

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 21:55             ` Daniel Borkmann
@ 2020-03-20 23:35               ` Jakub Kicinski
  0 siblings, 0 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-20 23:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Lorenz Bauer, Andrey Ignatov, netdev, bpf

On Fri, 20 Mar 2020 22:55:43 +0100 Daniel Borkmann wrote:
> >> Another aspect of atomic replacement is whether the programs can actually be
> >> replaced atomically at runtime. Last time I looked, some drivers still do
> >> a down/up cycle on replacement, and hence traffic would be interrupted. I would argue
> >> that such an /atomic/ swap operation on bpf_link should also guarantee that this is
> >> not needed (the workaround today would be a simple tail call map as entry point).
> > 
> > I don't think that's the case. Drivers generally have a fast path
> > for the active-active replace.
> > 
> > Up/Down is only done to remap DMA buffers and change the RX buffer
> > allocation scheme. That happens when a program is installed or removed,
> > not when it is replaced.
> 
> I know; sadly, it seems not all drivers adhere to that scheme. I don't have that HW, so I can
> only judge from the code, but one example that looked suspicious enough to me is qede_xdp().
> It calls qede_xdp_set(), which does a qede_reload() for /every/ prog update. The latter
> basically does ...
> 
>      if (edev->state == QEDE_STATE_OPEN) {
>          qede_unload(edev, QEDE_UNLOAD_NORMAL, true);
>          if (args)
>              args->func(edev, args);               <-- prog replace here
>          qede_load(edev, QEDE_LOAD_RELOAD, true);
>          [...]
>      }

Ack, one day maybe we can restructure things enough so that drivers
don't have to copy/paste this dance :(

> ... now that is one driver. I haven't checked all the others (aside from i40e/ixgbe/mlx4/
> mlx5/nfp), but in any case it's also fixable in the driver w/o the extra need for bpf_link.

Agreed

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 20:24           ` Andrii Nakryiko
@ 2020-03-23 11:24             ` Toke Høiland-Jørgensen
  2020-03-23 16:54               ` Jakub Kicinski
  2020-03-23 18:14               ` Andrii Nakryiko
  0 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 11:24 UTC (permalink / raw)
  To: Andrii Nakryiko, John Fastabend
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
>>
>> Jakub Kicinski wrote:
>> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
>> > > Jakub Kicinski <kuba@kernel.org> writes:
>> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> > > >>
>> > > >> While it is currently possible for userspace to specify that an existing
>> > > >> XDP program should not be replaced when attaching to an interface, there is
>> > > >> no mechanism to safely replace a specific XDP program with another.
>> > > >>
>> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> > > >> currently loaded on the interface matches the expected one, and fail the
>> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> > > >>
>> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> > > >> to discover whether the kernel supports the new attribute.
>> > > >>
>> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> > > >
>> > > > I didn't know we wanted to go ahead with this...
>> > >
>> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> > > happening with that, though. So since this is a straight-forward
>> > > extension of the existing API, that doesn't carry a high implementation
>> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> > > something similar in bpf_link as well, of course.
>> >
>> > I'm not really in the loop, but from what I overheard - I think the
>> > bpf_link may be targeting something non-networking first.
>>
>> My preference is to avoid building two different APIs, one for XDP and another
>> for everything else. If we have userlands that already understand links, and
>> pinning support is on the way, imo let's use these APIs for networking as well.
>
> I agree here. And yes, I've been working on extending bpf_link into
> cgroup and then to XDP. We are still discussing some cgroup-specific
> details, but the patch is ready. I'm going to post it as an RFC to get
> the discussion started, before we do this for XDP.

Well, my reason for being skeptical about bpf_link and proposing the
netlink-based API is actually exactly this, but in reverse: With
bpf_link we will be in the situation that everything related to a netdev
is configured over netlink *except* XDP.

Other than that, I don't see any reason why the bpf_link API won't work.
So if no one else has any problem with BPF insisting on being a
special snowflake, I guess I can live with it as well... *shrugs* :)

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-20 20:39       ` Andrii Nakryiko
@ 2020-03-23 11:25         ` Toke Høiland-Jørgensen
  2020-03-23 18:07           ` Andrii Nakryiko
  2020-03-23 23:54           ` Andrey Ignatov
  0 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 11:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Mar 20, 2020 at 1:48 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jakub Kicinski <kuba@kernel.org> writes:
>>
>> > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >>
>> >> While it is currently possible for userspace to specify that an existing
>> >> XDP program should not be replaced when attaching to an interface, there is
>> >> no mechanism to safely replace a specific XDP program with another.
>> >>
>> >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> currently loaded on the interface matches the expected one, and fail the
>> >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >>
>> >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> to discover whether the kernel supports the new attribute.
>> >>
>> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >
>> > I didn't know we wanted to go ahead with this...
>>
>> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> happening with that, though. So since this is a straight-forward
>> extension of the existing API, that doesn't carry a high implementation
>> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> something similar in bpf_link as well, of course.
>>
>> > If we do please run this thru checkpatch, set .strict_start_type,
>>
>> Will do.
>>
>> > and make the expected fd unsigned. A negative expected fd makes no
>> > sense.
>>
>> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
>> flag. I guess you could argue that since we have that flag, setting a
>> negative expected_fd is not strictly needed. However, I thought it was
>> weird to have a "this is what I expect" API that did not support
>> expressing "I expect no program to be attached".
>
> For BPF syscall it seems the typical approach when optional FD is
> needed is to have extra flag (e.g., BPF_F_REPLACE for cgroups) and if
> it's not specified - enforce zero for that optional fd. That also
> handles backwards compatibility cases well.

Never did understand how that is supposed to square with 0 being a valid
fd number?

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 11:24             ` Toke Høiland-Jørgensen
@ 2020-03-23 16:54               ` Jakub Kicinski
  2020-03-23 18:14               ` Andrii Nakryiko
  1 sibling, 0 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-23 16:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Mon, 23 Mar 2020 12:24:34 +0100 Toke Høiland-Jørgensen wrote:
> Well, my reason for being skeptical about bpf_link and proposing the
> netlink-based API is actually exactly this, but in reverse: With
> bpf_link we will be in the situation that everything related to a netdev
> is configured over netlink *except* XDP.

+1


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 11:25         ` Toke Høiland-Jørgensen
@ 2020-03-23 18:07           ` Andrii Nakryiko
  2020-03-23 23:54           ` Andrey Ignatov
  1 sibling, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-23 18:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, John Fastabend,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Mon, Mar 23, 2020 at 4:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Fri, Mar 20, 2020 at 1:48 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Jakub Kicinski <kuba@kernel.org> writes:
> >>
> >> > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >>
> >> >> While it is currently possible for userspace to specify that an existing
> >> >> XDP program should not be replaced when attaching to an interface, there is
> >> >> no mechanism to safely replace a specific XDP program with another.
> >> >>
> >> >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> >> currently loaded on the interface matches the expected one, and fail the
> >> >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> >>
> >> >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> >> to discover whether the kernel supports the new attribute.
> >> >>
> >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >
> >> > I didn't know we wanted to go ahead with this...
> >>
> >> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> happening with that, though. So since this is a straight-forward
> >> extension of the existing API, that doesn't carry a high implementation
> >> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> something similar in bpf_link as well, of course.
> >>
> >> > If we do please run this thru checkpatch, set .strict_start_type,
> >>
> >> Will do.
> >>
> >> > and make the expected fd unsigned. A negative expected fd makes no
> >> > sense.
> >>
> >> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
> >> flag. I guess you could argue that since we have that flag, setting a
> >> negative expected_fd is not strictly needed. However, I thought it was
> >> weird to have a "this is what I expect" API that did not support
> >> expressing "I expect no program to be attached".
> >
> > For BPF syscall it seems the typical approach when optional FD is
> > needed is to have extra flag (e.g., BPF_F_REPLACE for cgroups) and if
> > it's not specified - enforce zero for that optional fd. That also
> > handles backwards compatibility cases well.
>
> Never did understand how that is supposed to square with 0 being a valid
> fd number?

You mean the tiny chance that, given invalid userspace program behavior
(setting a valid FD 0 without specifying BPF_F_REPLACE, or not setting
the FD at all while FD 0 happens to be a valid program FD), it might
succeed accidentally? Sure, it's theoretically possible, but highly
unlikely, and in any case it's invalid userspace behavior. So I guess it
was deemed acceptable for the sake of backwards compatibility?
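Whether 0 can be a live fd is easy to check: POSIX requires open() to return the lowest-numbered descriptor not currently open, and Linux applies the same lowest-free-slot allocation to the fds it hands out for BPF objects. A small sketch; note that it closes the caller's stdin, so it is for illustration only:

```c
#include <fcntl.h>
#include <unistd.h>

/* Free descriptor 0 and open a file. Because open() must return the
 * lowest-numbered descriptor not currently open, the new fd is 0.
 * Returns the fd obtained (0 on a typical Linux system), or -1 on error. */
static int open_lands_on_fd_zero(void)
{
    close(STDIN_FILENO);                 /* fd 0 is now unused */
    return open("/dev/null", O_RDONLY);  /* lowest free fd: 0 */
}
```

Any process that has closed stdin can therefore end up holding a perfectly valid program fd at 0, which is why "0 means unset" needs a separate flag to be unambiguous.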

>
> -Toke
>


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 11:24             ` Toke Høiland-Jørgensen
  2020-03-23 16:54               ` Jakub Kicinski
@ 2020-03-23 18:14               ` Andrii Nakryiko
  2020-03-23 19:23                 ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-23 18:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
> > <john.fastabend@gmail.com> wrote:
> >>
> >> Jakub Kicinski wrote:
> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> >> > > Jakub Kicinski <kuba@kernel.org> writes:
> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> > > >>
> >> > > >> While it is currently possible for userspace to specify that an existing
> >> > > >> XDP program should not be replaced when attaching to an interface, there is
> >> > > >> no mechanism to safely replace a specific XDP program with another.
> >> > > >>
> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> > > >> currently loaded on the interface matches the expected one, and fail the
> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> > > >>
> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> > > >> to discover whether the kernel supports the new attribute.
> >> > > >>
> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> > > >
> >> > > > I didn't know we wanted to go ahead with this...
> >> > >
> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> > > happening with that, though. So since this is a straight-forward
> >> > > extension of the existing API, that doesn't carry a high implementation
> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> > > something similar in bpf_link as well, of course.
> >> >
> >> > I'm not really in the loop, but from what I overheard - I think the
> >> > bpf_link may be targeting something non-networking first.
> >>
> >> My preference is to avoid building two different APIs, one for XDP and another
> >> for everything else. If we have userlands that already understand links and
> >> pinning support is on the way, imo let's use these APIs for networking as well.
> >
> > I agree here. And yes, I've been working on extending bpf_link into
> > cgroup and then to XDP. We are still discussing some cgroup-specific
> > details, but the patch is ready. I'm going to post it as an RFC to get
> > the discussion started, before we do this for XDP.
>
> Well, my reason for being skeptical about bpf_link and proposing the
> netlink-based API is actually exactly this, but in reverse: With
> bpf_link we will be in the situation that everything related to a netdev
> is configured over netlink *except* XDP.

One can argue that everything related to use of BPF is going to be
uniform and done through the BPF syscall? Given the variety of possible BPF
hooks/targets, using custom ways to attach for all those many cases is
really bad as well, so having a unifying concept and single entry to
do this is good, no?

>
> Other than that, I don't see any reason why the bpf_link API won't work.
> So I guess that if no one else has any problem with BPF insisting on
> being a special snowflake, I guess I can live with it as well... *shrugs* :)

Apart from the derogatory remark, BPF is a bit special here, because it
requires every potential BPF hook (be it cgroups, xdp, perf_event,
etc) to be aware of BPF program(s) and execute them with a special
macro. So like it or not, it is special and each driver supporting BPF
needs to implement this BPF wiring.

>
> -Toke
>


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 18:14               ` Andrii Nakryiko
@ 2020-03-23 19:23                 ` Toke Høiland-Jørgensen
  2020-03-24  1:01                   ` David Ahern
  2020-03-24  5:00                   ` Andrii Nakryiko
  0 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 19:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
>> > <john.fastabend@gmail.com> wrote:
>> >>
>> >> Jakub Kicinski wrote:
>> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
>> >> > > Jakub Kicinski <kuba@kernel.org> writes:
>> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> > > >>
>> >> > > >> While it is currently possible for userspace to specify that an existing
>> >> > > >> XDP program should not be replaced when attaching to an interface, there is
>> >> > > >> no mechanism to safely replace a specific XDP program with another.
>> >> > > >>
>> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> > > >> currently loaded on the interface matches the expected one, and fail the
>> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >> > > >>
>> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> > > >> to discover whether the kernel supports the new attribute.
>> >> > > >>
>> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> > > >
>> >> > > > I didn't know we wanted to go ahead with this...
>> >> > >
>> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> >> > > happening with that, though. So since this is a straight-forward
>> >> > > extension of the existing API, that doesn't carry a high implementation
>> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> >> > > something similar in bpf_link as well, of course.
>> >> >
>> >> > I'm not really in the loop, but from what I overheard - I think the
>> >> > bpf_link may be targeting something non-networking first.
>> >>
>> >> My preference is to avoid building two different APIs, one for XDP and another
>> >> for everything else. If we have userlands that already understand links and
>> >> pinning support is on the way, imo let's use these APIs for networking as well.
>> >
>> > I agree here. And yes, I've been working on extending bpf_link into
>> > cgroup and then to XDP. We are still discussing some cgroup-specific
>> > details, but the patch is ready. I'm going to post it as an RFC to get
>> > the discussion started, before we do this for XDP.
>>
>> Well, my reason for being skeptical about bpf_link and proposing the
>> netlink-based API is actually exactly this, but in reverse: With
>> bpf_link we will be in the situation that everything related to a netdev
>> is configured over netlink *except* XDP.
>
> One can argue that everything related to use of BPF is going to be
> uniform and done through the BPF syscall? Given the variety of possible BPF
> hooks/targets, using custom ways to attach for all those many cases is
> really bad as well, so having a unifying concept and single entry to
> do this is good, no?

Well, it depends on how you view the BPF subsystem's relation to the
rest of the kernel, I suppose. I tend to view it as a subsystem that
provides a bunch of functionality, which you can setup (using "internal"
BPF APIs), and then attach that object to a different subsystem
(networking) using that subsystem's configuration APIs.

Seeing as this really boils down to a matter of taste, though, I'm not
sure we'll find agreement on this :)

>> Other than that, I don't see any reason why the bpf_link API won't work.
>> So I guess that if no one else has any problem with BPF insisting on
>> being a special snowflake, I guess I can live with it as well... *shrugs* :)
>
> Apart from the derogatory remark,

Yeah, should have left out the 'snowflake' bit, sorry about that...

> BPF is a bit special here, because it requires every potential BPF
> hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
> program(s) and execute them with a special macro. So like it or not, it
> is special and each driver supporting BPF needs to implement this BPF
> wiring.

All that is about internal implementation, though. I'm bothered by the
API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
what you use to configure your netdev except if you want to attach an
XDP program to it").

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 11:25         ` Toke Høiland-Jørgensen
  2020-03-23 18:07           ` Andrii Nakryiko
@ 2020-03-23 23:54           ` Andrey Ignatov
  2020-03-24 10:16             ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Andrey Ignatov @ 2020-03-23 23:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Lorenz Bauer, Networking, bpf

Toke Høiland-Jørgensen <toke@redhat.com> [Mon, 2020-03-23 04:25 -0700]:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> 
> > On Fri, Mar 20, 2020 at 1:48 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Jakub Kicinski <kuba@kernel.org> writes:
> >>
> >> > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >>
> >> >> While it is currently possible for userspace to specify that an existing
> >> >> XDP program should not be replaced when attaching to an interface, there is
> >> >> no mechanism to safely replace a specific XDP program with another.
> >> >>
> >> >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> >> currently loaded on the interface matches the expected one, and fail the
> >> >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> >>
> >> >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> >> to discover whether the kernel supports the new attribute.
> >> >>
> >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >
> >> > I didn't know we wanted to go ahead with this...
> >>
> >> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> happening with that, though. So since this is a straight-forward
> >> extension of the existing API, that doesn't carry a high implementation
> >> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> something similar in bpf_link as well, of course.
> >>
> >> > If we do please run this thru checkpatch, set .strict_start_type,
> >>
> >> Will do.
> >>
> >> > and make the expected fd unsigned. A negative expected fd makes no
> >> > sense.
> >>
> >> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
> >> flag. I guess you could argue that since we have that flag, setting a
> >> negative expected_fd is not strictly needed. However, I thought it was
> >> weird to have a "this is what I expect" API that did not support
> >> expressing "I expect no program to be attached".
> >
> > For BPF syscall it seems the typical approach when optional FD is
> > needed is to have extra flag (e.g., BPF_F_REPLACE for cgroups) and if
> > it's not specified - enforce zero for that optional fd. That also
> > handles backwards compatibility cases well.
> 
> Never did understand how that is supposed to square with 0 being a valid
> fd number?

In BPF_F_REPLACE case (since it was used as an example in this thread)
it's all pretty clear:

* if the flag is set, use the fd from attr.replace_bpf_fd, which can be anything
  (including zero, since it is indeed a valid fd); no problem with that;
* if the flag is not set, ignore replace_bpf_fd completely.

It's described in the commit log of 7dd68b3279f1:

    ...

    BPF_F_REPLACE is introduced to make the user intent clear, since
    replace_bpf_fd alone can't be used for this (its default value, 0, is a
    valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
    future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).

    ...

In other words, the presence of the flag is what matters, not whether the fd attribute is zero.

Hope it clarifies.
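The flag-gated optional fd described above can be sketched in a few lines of C. The struct and the `SKETCH_F_REPLACE` constant are hypothetical stand-ins for `union bpf_attr` and `BPF_F_REPLACE`; only the decision logic (the flag decides, the fd value does not) follows the description:

```c
#include <stdint.h>

/* Hypothetical stand-in for BPF_F_REPLACE. */
#define SKETCH_F_REPLACE (1U << 2)

/* Hypothetical subset of an attach request. */
struct attach_req {
    uint32_t flags;
    int      replace_fd;  /* meaningful only when SKETCH_F_REPLACE is set */
};

/* Return the fd to replace, or -1 when the request does not ask for a
 * replacement. The flag, not the fd value, carries the user's intent, so
 * replace_fd == 0 is accepted as a real descriptor when the flag is set. */
static int replace_target(const struct attach_req *req)
{
    if (!(req->flags & SKETCH_F_REPLACE))
        return -1;           /* replace_fd is ignored completely */
    return req->replace_fd;  /* may legitimately be 0 */
}
```

Because the flag alone signals intent, a zero-initialized request keeps its old meaning (no replacement), which is where the backwards-compatibility benefit comes from.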


-- 
Andrey Ignatov


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 19:23                 ` Toke Høiland-Jørgensen
@ 2020-03-24  1:01                   ` David Ahern
  2020-03-24  4:53                     ` Andrii Nakryiko
  2020-03-24  5:00                   ` Andrii Nakryiko
  1 sibling, 1 reply; 112+ messages in thread
From: David Ahern @ 2020-03-24  1:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On 3/23/20 1:23 PM, Toke Høiland-Jørgensen wrote:
>>>> I agree here. And yes, I've been working on extending bpf_link into
>>>> cgroup and then to XDP. We are still discussing some cgroup-specific
>>>> details, but the patch is ready. I'm going to post it as an RFC to get
>>>> the discussion started, before we do this for XDP.
>>>
>>> Well, my reason for being skeptical about bpf_link and proposing the
>>> netlink-based API is actually exactly this, but in reverse: With
>>> bpf_link we will be in the situation that everything related to a netdev
>>> is configured over netlink *except* XDP.

+1

>>
>> One can argue that everything related to use of BPF is going to be
>> uniform and done through the BPF syscall? Given the variety of possible BPF
>> hooks/targets, using custom ways to attach for all those many cases is
>> really bad as well, so having a unifying concept and single entry to
>> do this is good, no?
> 
> Well, it depends on how you view the BPF subsystem's relation to the
> rest of the kernel, I suppose. I tend to view it as a subsystem that
> provides a bunch of functionality, which you can setup (using "internal"
> BPF APIs), and then attach that object to a different subsystem
> (networking) using that subsystem's configuration APIs.
> 

again, +1.

bpf syscall is used for program-related manipulations like load and
unload. Attaching that program to an object has a type-specific solution -
e.g., netlink for XDP and ioctl for perf_events.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24  1:01                   ` David Ahern
@ 2020-03-24  4:53                     ` Andrii Nakryiko
  2020-03-24 20:55                       ` David Ahern
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-24  4:53 UTC (permalink / raw)
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Mon, Mar 23, 2020 at 6:01 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 3/23/20 1:23 PM, Toke Høiland-Jørgensen wrote:
> >>>> I agree here. And yes, I've been working on extending bpf_link into
> >>>> cgroup and then to XDP. We are still discussing some cgroup-specific
> >>>> details, but the patch is ready. I'm going to post it as an RFC to get
> >>>> the discussion started, before we do this for XDP.
> >>>
> >>> Well, my reason for being skeptical about bpf_link and proposing the
> >>> netlink-based API is actually exactly this, but in reverse: With
> >>> bpf_link we will be in the situation that everything related to a netdev
> >>> is configured over netlink *except* XDP.
>
> +1

Hm... so using **libbpf**'s bpf_set_link_xdp_fd() API (notice "bpf" in
the name of the library and function, and notice no "netlink"), which
exposes absolutely nothing about netlink (it's just an internal
implementation detail and can easily change), is ok. But actually
switching to libbpf's bpf_link would be out of ordinary? Especially
considering that to use freplace programs (for libxdp and chaining)
with libbpf you will use bpf_program and bpf_link abstractions
anyways.

>
> >>
> >> One can argue that everything related to use of BPF is going to be
> >> uniform and done through the BPF syscall? Given the variety of possible BPF
> >> hooks/targets, using custom ways to attach for all those many cases is
> >> really bad as well, so having a unifying concept and single entry to
> >> do this is good, no?
> >
> > Well, it depends on how you view the BPF subsystem's relation to the
> > rest of the kernel, I suppose. I tend to view it as a subsystem that
> > provides a bunch of functionality, which you can setup (using "internal"
> > BPF APIs), and then attach that object to a different subsystem
> > (networking) using that subsystem's configuration APIs.
> >
>
> again, +1.
>
> bpf syscall is used for program-related manipulations like load and

bpf syscall is used for way more than that, actually...

> unload. Attaching that program to an object has a type-specific solution -
> e.g., netlink for XDP and ioctl for perf_events.

That's not true and hasn't been true for at least a while now. cgroup
programs, flow_dissector, lirc_mode2 (whatever that is, I have no
idea) are attached with BPF_PROG_ATTACH. raw_tracepoint and all the
fentry/fexit/fmod_ret/freplace attachments are done also through bpf
syscall. For perf_event related stuff it's done through ioctls right
now, but with bpf_link unification I wouldn't be surprised if it were soon
done through the same LINK_CREATE command, as is done for
cgroup and *other* tracing bpf_links. Because consistent API and
semantics is good, rather than having to do it N different ways for N
different subsystems.
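The unification argument above can be pictured as a single dispatch point: one entry point looks up a per-attach-type callback in a table, so every target shares the same call and error semantics. This is a toy model with made-up names (`link_create()`, `SKETCH_*`, the fake link ids), not the kernel's LINK_CREATE implementation, and it does not settle which interface (bpf syscall or netlink) should own the entry point:

```c
#include <stddef.h>

/* Hypothetical attach types standing in for cgroup, XDP, tracing, ... */
enum sketch_attach_type { SKETCH_CGROUP, SKETCH_XDP, SKETCH_TRACING, SKETCH_NR_TYPES };

/* Each subsystem supplies its own attach callback. */
typedef int (*attach_fn)(int prog_fd, int target);

/* Pretend subsystem hooks; each returns a fake link id on success. */
static int cgroup_attach(int prog_fd, int cgroup_fd)
{
    (void)prog_fd; (void)cgroup_fd;
    return 1;
}

static int xdp_attach(int prog_fd, int ifindex)
{
    (void)prog_fd; (void)ifindex;
    return 2;
}

/* One table for all subsystems; NULL means "not wired up yet". */
static const attach_fn attach_ops[SKETCH_NR_TYPES] = {
    [SKETCH_CGROUP] = cgroup_attach,
    [SKETCH_XDP]    = xdp_attach,
    /* SKETCH_TRACING intentionally left NULL */
};

/* Single user-facing entry point with the same semantics for every type. */
static int link_create(enum sketch_attach_type type, int prog_fd, int target)
{
    if ((unsigned)type >= (unsigned)SKETCH_NR_TYPES || attach_ops[type] == NULL)
        return -1;  /* unsupported attach type */
    return attach_ops[type](prog_fd, target);
}
```

Adding a new hook then means registering one callback rather than inventing a new user-facing attach mechanism.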


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 19:23                 ` Toke Høiland-Jørgensen
  2020-03-24  1:01                   ` David Ahern
@ 2020-03-24  5:00                   ` Andrii Nakryiko
  2020-03-24 10:57                     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-24  5:00 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
> >> > <john.fastabend@gmail.com> wrote:
> >> >>
> >> >> Jakub Kicinski wrote:
> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> > > >>
> >> >> > > >> While it is currently possible for userspace to specify that an existing
> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
> >> >> > > >>
> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> >> > > >>
> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> >> > > >> to discover whether the kernel supports the new attribute.
> >> >> > > >>
> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> > > >
> >> >> > > > I didn't know we wanted to go ahead with this...
> >> >> > >
> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> >> > > happening with that, though. So since this is a straight-forward
> >> >> > > extension of the existing API, that doesn't carry a high implementation
> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> >> > > something similar in bpf_link as well, of course.
> >> >> >
> >> >> > I'm not really in the loop, but from what I overheard - I think the
> >> >> > bpf_link may be targeting something non-networking first.
> >> >>
> >> >> My preference is to avoid building two different APIs, one for XDP and another
> >> >> for everything else. If we have userlands that already understand links and
> >> >> pinning support is on the way, imo let's use these APIs for networking as well.
> >> >
> >> > I agree here. And yes, I've been working on extending bpf_link into
> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
> >> > details, but the patch is ready. I'm going to post it as an RFC to get
> >> > the discussion started, before we do this for XDP.
> >>
> >> Well, my reason for being skeptical about bpf_link and proposing the
> >> netlink-based API is actually exactly this, but in reverse: With
> >> bpf_link we will be in the situation that everything related to a netdev
> >> is configured over netlink *except* XDP.
> >
> > One can argue that everything related to use of BPF is going to be
> > uniform and done through the BPF syscall? Given the variety of possible BPF
> > hooks/targets, using custom ways to attach for all those many cases is
> > really bad as well, so having a unifying concept and single entry to
> > do this is good, no?
>
> Well, it depends on how you view the BPF subsystem's relation to the
> rest of the kernel, I suppose. I tend to view it as a subsystem that
> provides a bunch of functionality, which you can setup (using "internal"
> BPF APIs), and then attach that object to a different subsystem
> (networking) using that subsystem's configuration APIs.
>
> Seeing as this really boils down to a matter of taste, though, I'm not
> sure we'll find agreement on this :)

Yeah, seems like so. But then again, your view and reality don't seem
to correlate completely. cgroup, a lot of tracing,
flow_dissector/lirc_mode2 attachments all are done through BPF
syscall. LINK_CREATE provides an opportunity to finally unify all
those different ways to achieve the same "attach my BPF program to
some target object" semantics.

>
> >> Other than that, I don't see any reason why the bpf_link API won't work.
> >> So I guess that if no one else has any problem with BPF insisting on
> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
> >
> > Apart from the derogatory remark,
>
> Yeah, should have left out the 'snowflake' bit, sorry about that...
>
> > BPF is a bit special here, because it requires every potential BPF
> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
> > program(s) and execute them with a special macro. So like it or not, it
> > is special and each driver supporting BPF needs to implement this BPF
> > wiring.
>
> All that is about internal implementation, though. I'm bothered by the
> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
> what you use to configure your netdev except if you want to attach an
> XDP program to it").
>

See my reply to David. Depends on where you define user API. Is it
libbpf API, which is what most users are using? Or kernel API? If
everyone is using libbpf, does kernel system (bpf syscall vs netlink)
matter all that much?

Also, isn't this "netlink for configuring, except attaching XDP" rule
the case for XDP today anyway? You set up your netdev with netlink,
then go use libbpf's bpf_set_link_xdp_fd()? Where's netlink in the
latter? :)

> -Toke
>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-23 23:54           ` Andrey Ignatov
@ 2020-03-24 10:16             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-24 10:16 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Andrii Nakryiko, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	John Fastabend, Lorenz Bauer, Networking, bpf

Andrey Ignatov <rdna@fb.com> writes:

> Toke Høiland-Jørgensen <toke@redhat.com> [Mon, 2020-03-23 04:25 -0700]:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> 
>> > On Fri, Mar 20, 2020 at 1:48 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Jakub Kicinski <kuba@kernel.org> writes:
>> >>
>> >> > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >>
>> >> >> While it is currently possible for userspace to specify that an existing
>> >> >> XDP program should not be replaced when attaching to an interface, there is
>> >> >> no mechanism to safely replace a specific XDP program with another.
>> >> >>
>> >> >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> >> currently loaded on the interface matches the expected one, and fail the
>> >> >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >> >>
>> >> >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> >> to discover whether the kernel supports the new attribute.
>> >> >>
>> >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >
>> >> > I didn't know we wanted to go ahead with this...
>> >>
>> >> Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> >> happening with that, though. So since this is a straight-forward
>> >> extension of the existing API, that doesn't carry a high implementation
>> >> cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> >> something similar in bpf_link as well, of course.
>> >>
>> >> > If we do please run this thru checkpatch, set .strict_start_type,
>> >>
>> >> Will do.
>> >>
>> >> > and make the expected fd unsigned. A negative expected fd makes no
>> >> > sense.
>> >>
>> >> A negative expected_fd corresponds to setting the UPDATE_IF_NOEXIST
>> >> flag. I guess you could argue that since we have that flag, setting a
>> >> negative expected_fd is not strictly needed. However, I thought it was
>> >> weird to have a "this is what I expect" API that did not support
>> >> expressing "I expect no program to be attached".
>> >
>> > For BPF syscall it seems the typical approach when optional FD is
>> > needed is to have extra flag (e.g., BPF_F_REPLACE for cgroups) and if
>> > it's not specified - enforce zero for that optional fd. That handles
>> > backwards compatibility cases well as well.
>> 
>> Never did understand how that is supposed to square with 0 being a valid
>> fd number?
>
> In BPF_F_REPLACE case (since it was used as an example in this thread)
> it's all pretty clear:
>
> * if the flag is set, use the fd from attr.replace_bpf_fd, which can be
>   anything (incl. zero, since that is indeed a valid fd); no problem there;
> * if the flag is not set, ignore replace_bpf_fd completely.
>
> It's described in the commit log of 7dd68b3279f1:
>
>     ...
>
>     BPF_F_REPLACE is introduced to make the user intent clear, since
>     replace_bpf_fd alone can't be used for this (its default value, 0, is a
>     valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
>     future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
>
>     ...
>
> i.e. what matters is whether the flag is present, not whether the fd
> attribute is zero.
>
> Hope it clarifies.

Yup, it does, thanks! My confusion stemmed from having seen '!= 0' tests
for FDs in various places and wondering how that was supposed to work.
Didn't realise this was handled by way of an accompanying flag, that
does make sense :)

-Toke


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24  5:00                   ` Andrii Nakryiko
@ 2020-03-24 10:57                     ` Toke Høiland-Jørgensen
  2020-03-24 18:53                       ` Jakub Kicinski
                                         ` (2 more replies)
  0 siblings, 3 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-24 10:57 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
>> >> > <john.fastabend@gmail.com> wrote:
>> >> >>
>> >> >> Jakub Kicinski wrote:
>> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
>> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> > > >>
>> >> >> > > >> While it is currently possible for userspace to specify that an existing
>> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
>> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
>> >> >> > > >>
>> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
>> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >> >> > > >>
>> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> >> > > >> to discover whether the kernel supports the new attribute.
>> >> >> > > >>
>> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> > > >
>> >> >> > > > I didn't know we wanted to go ahead with this...
>> >> >> > >
>> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> >> >> > > happening with that, though. So since this is a straight-forward
>> >> >> > > extension of the existing API, that doesn't carry a high implementation
>> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> >> >> > > something similar in bpf_link as well, of course.
>> >> >> >
>> >> >> > I'm not really in the loop, but from what I overheard - I think the
>> >> >> > bpf_link may be targeting something non-networking first.
>> >> >>
>> >> >> My preference is to avoid building two different APIs one for XDP and another
>> >> >> for everything else. If we have userlands that already understand links and
>> >> >> pinning support is on the way imo lets use these APIs for networking as well.
>> >> >
>> >> > I agree here. And yes, I've been working on extending bpf_link into
>> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
>> >> > details, but the patch is ready. I'm going to post it as an RFC to get
>> >> > the discussion started, before we do this for XDP.
>> >>
>> >> Well, my reason for being skeptic about bpf_link and proposing the
>> >> netlink-based API is actually exactly this, but in reverse: With
>> >> bpf_link we will be in the situation that everything related to a netdev
>> >> is configured over netlink *except* XDP.
>> >
>> > One can argue that everything related to use of BPF is going to be
>> > uniform and done through BPF syscall? Given variety of possible BPF
>> > hooks/targets, using custom ways to attach for all those many cases is
>> > really bad as well, so having a unifying concept and single entry to
>> > do this is good, no?
>>
>> Well, it depends on how you view the BPF subsystem's relation to the
>> rest of the kernel, I suppose. I tend to view it as a subsystem that
>> provides a bunch of functionality, which you can setup (using "internal"
>> BPF APIs), and then attach that object to a different subsystem
>> (networking) using that subsystem's configuration APIs.
>>
>> Seeing as this really boils down to a matter of taste, though, I'm not
>> sure we'll find agreement on this :)
>
> Yeah, seems like so. But then again, your view and reality don't seem
> to correlate completely. cgroup, a lot of tracing,
> flow_dissector/lirc_mode2 attachments all are done through BPF
> syscall.

Well, I wasn't talking about any of those subsystems, I was talking
about networking :)

In particular, networking already has a consistent and fairly
well-designed configuration mechanism (i.e., netlink) that we are
generally trying to move more functionality *towards* not *away from*
(see, e.g., converting ethtool to use netlink).

> LINK_CREATE provides an opportunity to finally unify all those
> different ways to achieve the same "attach my BPF program to some
> target object" semantics.

Well I also happen to think that "attach a BPF program to an object" is
the wrong way to think about XDP. Rather, in my mind the model is
"instruct the netdevice to execute this piece of BPF code".

>> >> Other than that, I don't see any reason why the bpf_link API won't work.
>> >> So I guess that if no one else has any problem with BPF insisting on
>> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
>> >
>> > Apart from derogatory remark,
>>
>> Yeah, should have left out the 'snowflake' bit, sorry about that...
>>
>> > BPF is a bit special here, because it requires every potential BPF
>> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
>> > program(s) and execute them with special macro. So like it or not, it
>> > is special and each driver supporting BPF needs to implement this BPF
>> > wiring.
>>
>> All that is about internal implementation, though. I'm bothered by the
>> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
>> what you use to configure your netdev except if you want to attach an
>> XDP program to it").
>>
>
> See my reply to David. Depends on where you define user API. Is it
> libbpf API, which is what most users are using? Or kernel API?

Well I'm talking about the kernel<->userspace API, obviously :)

> If everyone is using libbpf, does kernel system (bpf syscall vs
> netlink) matter all that much?

This argument works the other way as well, though: If libbpf can
abstract the subsystem differences and provide a consistent interface to
"the BPF world", why does BPF need to impose its own syscall API on the
networking subsystem?

-Toke


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 10:57                     ` Toke Høiland-Jørgensen
@ 2020-03-24 18:53                       ` Jakub Kicinski
  2020-03-24 22:30                         ` Andrii Nakryiko
  2020-03-24 19:22                       ` John Fastabend
  2020-03-24 22:25                       ` Andrii Nakryiko
  2 siblings, 1 reply; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-24 18:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Tue, 24 Mar 2020 11:57:45 +0100 Toke Høiland-Jørgensen wrote:
> > If everyone is using libbpf, does kernel system (bpf syscall vs
> > netlink) matter all that much?  
> 
> This argument works the other way as well, though: If libbpf can
> abstract the subsystem differences and provide a consistent interface to
> "the BPF world", why does BPF need to impose its own syscall API on the
> networking subsystem?

Hitting the nail on the head there, again :)

Once upon a time, when we were pushing for libbpf focus & unification,
one of my main motivations was that a solid library that most people
use would give us the ability to provide user space abstractions.

As much as adding new kernel interfaces "to rule them all" is fun, it
has a real cost.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 10:57                     ` Toke Høiland-Jørgensen
  2020-03-24 18:53                       ` Jakub Kicinski
@ 2020-03-24 19:22                       ` John Fastabend
  2020-03-25  1:36                         ` Alexei Starovoitov
  2020-03-25 10:30                         ` Toke Høiland-Jørgensen
  2020-03-24 22:25                       ` Andrii Nakryiko
  2 siblings, 2 replies; 112+ messages in thread
From: John Fastabend @ 2020-03-24 19:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> 
> > On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> >>
> >> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
> >> >> > <john.fastabend@gmail.com> wrote:
> >> >> >>
> >> >> >> Jakub Kicinski wrote:
> >> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> >> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
> >> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> > > >>
> >> >> >> > > >> While it is currently possible for userspace to specify that an existing
> >> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
> >> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
> >> >> >> > > >>
> >> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
> >> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> >> >> > > >>
> >> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> >> >> > > >> to discover whether the kernel supports the new attribute.
> >> >> >> > > >>
> >> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> > > >
> >> >> >> > > > I didn't know we wanted to go ahead with this...
> >> >> >> > >
> >> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> >> >> > > happening with that, though. So since this is a straight-forward
> >> >> >> > > extension of the existing API, that doesn't carry a high implementation
> >> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> >> >> > > something similar in bpf_link as well, of course.
> >> >> >> >
> >> >> >> > I'm not really in the loop, but from what I overheard - I think the
> >> >> >> > bpf_link may be targeting something non-networking first.
> >> >> >>
> >> >> >> My preference is to avoid building two different APIs one for XDP and another
> >> >> >> for everything else. If we have userlands that already understand links and
> >> >> >> pinning support is on the way imo lets use these APIs for networking as well.
> >> >> >
> >> >> > I agree here. And yes, I've been working on extending bpf_link into
> >> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
> >> >> > details, but the patch is ready. I'm going to post it as an RFC to get
> >> >> > the discussion started, before we do this for XDP.
> >> >>
> >> >> Well, my reason for being skeptic about bpf_link and proposing the
> >> >> netlink-based API is actually exactly this, but in reverse: With
> >> >> bpf_link we will be in the situation that everything related to a netdev
> >> >> is configured over netlink *except* XDP.
> >> >
> >> > One can argue that everything related to use of BPF is going to be
> >> > uniform and done through BPF syscall? Given variety of possible BPF
> >> > hooks/targets, using custom ways to attach for all those many cases is
> >> > really bad as well, so having a unifying concept and single entry to
> >> > do this is good, no?
> >>
> >> Well, it depends on how you view the BPF subsystem's relation to the
> >> rest of the kernel, I suppose. I tend to view it as a subsystem that
> >> provides a bunch of functionality, which you can setup (using "internal"
> >> BPF APIs), and then attach that object to a different subsystem
> >> (networking) using that subsystem's configuration APIs.
> >>
> >> Seeing as this really boils down to a matter of taste, though, I'm not
> >> sure we'll find agreement on this :)
> >
> > Yeah, seems like so. But then again, your view and reality don't seem
> > to correlate completely. cgroup, a lot of tracing,
> > flow_dissector/lirc_mode2 attachments all are done through BPF
> > syscall.
> 
> Well, I wasn't talking about any of those subsystems, I was talking
> about networking :)

My experience has been that networking in the strict sense of XDP no
longer exists on its own without cgroups, flow dissector, sockops,
sockmap, tracing, etc. All of these pieces are built, patched, loaded,
pinned and otherwise managed and manipulated as BPF objects via libbpf.

Because I have all this infra in place for other items, it's a bit odd,
IMO, to drop out of the BPF APIs just to swap a program in the XDP case
differently from how I would swap a program anywhere else. I'm
assuming the ability to swap links will be enabled at some point.

Granted, it just means I have some extra functions on the side to manage
the swap, similar to how 'qdisc' would be handled today, but it's still
not as nice an experience in my case as it would be if it were handled
natively.

Anyway, the netlink API is going to have to call into the BPF infra
on the kernel side for verification etc., so it's already not pure
networking.

> 
> In particular, networking already has a consistent and fairly
> well-designed configuration mechanism (i.e., netlink) that we are
> generally trying to move more functionality *towards* not *away from*
> (see, e.g., converting ethtool to use netlink).

True. But BPF programs are going to exist and interoperate with other
programs that are not strictly in the networking space. In fact, the
same library calls might be used on the tracing, cgroup, and XDP sides.
It gets a bit more interesting if the "same" object file (with some
patching) runs in both XDP and sockops land, for example.

> 
> > LINK_CREATE provides an opportunity to finally unify all those
> > different ways to achieve the same "attach my BPF program to some
> > target object" semantics.
> 
> Well I also happen to think that "attach a BPF program to an object" is
> the wrong way to think about XDP. Rather, in my mind the model is
> "instruct the netdevice to execute this piece of BPF code".
> 
> >> >> Other than that, I don't see any reason why the bpf_link API won't work.
> >> >> So I guess that if no one else has any problem with BPF insisting on
> >> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
> >> >
> >> > Apart from derogatory remark,
> >>
> >> Yeah, should have left out the 'snowflake' bit, sorry about that...
> >>
> >> > BPF is a bit special here, because it requires every potential BPF
> >> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
> >> > program(s) and execute them with special macro. So like it or not, it
> >> > is special and each driver supporting BPF needs to implement this BPF
> >> > wiring.
> >>
> >> All that is about internal implementation, though. I'm bothered by the
> >> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
> >> what you use to configure your netdev except if you want to attach an
> >> XDP program to it").
> >>
> >
> > See my reply to David. Depends on where you define user API. Is it
> > libbpf API, which is what most users are using? Or kernel API?
> 
> Well I'm talking about the kernel<->userspace API, obviously :)
> 
> > If everyone is using libbpf, does kernel system (bpf syscall vs
> > netlink) matter all that much?
> 
> This argument works the other way as well, though: If libbpf can
> abstract the subsystem differences and provide a consistent interface to
> "the BPF world", why does BPF need to impose its own syscall API on the
> networking subsystem?

I can make it work either way, netlink or syscall; it's not going
to be a blocker. If we go the netlink route, then the next question is:
does libbpf pull in the ability to swap XDP progs via netlink, or is
that some other lib?

> 
> -Toke
> 



^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24  4:53                     ` Andrii Nakryiko
@ 2020-03-24 20:55                       ` David Ahern
  2020-03-24 22:56                         ` Andrii Nakryiko
  0 siblings, 1 reply; 112+ messages in thread
From: David Ahern @ 2020-03-24 20:55 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On 3/23/20 10:53 PM, Andrii Nakryiko wrote:
> On Mon, Mar 23, 2020 at 6:01 PM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 3/23/20 1:23 PM, Toke Høiland-Jørgensen wrote:
>>>>>> I agree here. And yes, I've been working on extending bpf_link into
>>>>>> cgroup and then to XDP. We are still discussing some cgroup-specific
>>>>>> details, but the patch is ready. I'm going to post it as an RFC to get
>>>>>> the discussion started, before we do this for XDP.
>>>>>
>>>>> Well, my reason for being skeptic about bpf_link and proposing the
>>>>> netlink-based API is actually exactly this, but in reverse: With
>>>>> bpf_link we will be in the situation that everything related to a netdev
>>>>> is configured over netlink *except* XDP.
>>
>> +1
> 
> Hm... so using **libbpf**'s bpf_set_link_xdp_fd() API (notice "bpf" in
> the name of the library and function, and notice no "netlink"), which
> exposes absolutely nothing about netlink (it's just an internal
> implementation detail and can easily change), is ok. But actually
> switching to libbpf's bpf_link would be out of ordinary? Especially
> considering that to use freplace programs (for libxdp and chaining)
> with libbpf you will use bpf_program and bpf_link abstractions
> anyways.

It seems to me you are conflating the libbpf API with the kernel uapi.
Making libbpf user friendly certainly encourages standardization on its
use, but there is no requirement that use of bpf means use of libbpf.

> 
>>
>>>>
>>>> One can argue that everything related to use of BPF is going to be
>>>> uniform and done through BPF syscall? Given variety of possible BPF
>>>> hooks/targets, using custom ways to attach for all those many cases is
>>>> really bad as well, so having a unifying concept and single entry to
>>>> do this is good, no?
>>>
>>> Well, it depends on how you view the BPF subsystem's relation to the
>>> rest of the kernel, I suppose. I tend to view it as a subsystem that
>>> provides a bunch of functionality, which you can setup (using "internal"
>>> BPF APIs), and then attach that object to a different subsystem
>>> (networking) using that subsystem's configuration APIs.
>>>
>>
>> again, +1.
>>
>> bpf syscall is used for program related manipulations like load and
> 
> bpf syscall is used for way more than that, actually...
> 
>> unload. Attaching that program to an object has a type unique solution -
>> e.g., netlink for XDP and ioctl for perf_events.
> 
> That's not true and hasn't been true for at least a while now. cgroup
> programs, flow_dissector, lirc_mode2 (whatever that is, I have no
> idea) are attached with BPF_PROG_ATTACH. raw_tracepoint and all the
> fentry/fexit/fmod_ret/freplace attachments are done also through bpf
> syscall. For perf_event related stuff it's done through ioctls right
> now, but with bpf_link unification I wouldn't be surprised if it will

and it always will be able to. Kernel uapi will not be revoked because a
new way to do something comes along.

> be done through the same LINK_CREATE command soon, as is done for
> cgroup and *other* tracing bpf_links. Because consistent API and
> semantics is good, rather than having to do it N different ways for N
> different subsystems.
> 

That's a bpf/libbpf-centric perspective. What Toke and I are saying is
that the networking-centric perspective matters too, and networking uses
netlink for configuration.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 10:57                     ` Toke Høiland-Jørgensen
  2020-03-24 18:53                       ` Jakub Kicinski
  2020-03-24 19:22                       ` John Fastabend
@ 2020-03-24 22:25                       ` Andrii Nakryiko
  2020-03-25  9:38                         ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-24 22:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Tue, Mar 24, 2020 at 3:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> >>
> >> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
> >> >> > <john.fastabend@gmail.com> wrote:
> >> >> >>
> >> >> >> Jakub Kicinski wrote:
> >> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> >> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
> >> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> > > >>
> >> >> >> > > >> While it is currently possible for userspace to specify that an existing
> >> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
> >> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
> >> >> >> > > >>
> >> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
> >> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> >> >> > > >>
> >> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> >> >> > > >> to discover whether the kernel supports the new attribute.
> >> >> >> > > >>
> >> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> > > >
> >> >> >> > > > I didn't know we wanted to go ahead with this...
> >> >> >> > >
> >> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> >> >> > > happening with that, though. So since this is a straight-forward
> >> >> >> > > extension of the existing API, that doesn't carry a high implementation
> >> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> >> >> > > something similar in bpf_link as well, of course.
> >> >> >> >
> >> >> >> > I'm not really in the loop, but from what I overheard - I think the
> >> >> >> > bpf_link may be targeting something non-networking first.
> >> >> >>
> >> >> >> My preference is to avoid building two different APIs one for XDP and another
> >> >> >> for everything else. If we have userlands that already understand links and
> >> >> >> pinning support is on the way imo lets use these APIs for networking as well.
> >> >> >
> >> >> > I agree here. And yes, I've been working on extending bpf_link into
> >> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
> >> >> > details, but the patch is ready. I'm going to post it as an RFC to get
> >> >> > the discussion started, before we do this for XDP.
> >> >>
> >> >> Well, my reason for being skeptic about bpf_link and proposing the
> >> >> netlink-based API is actually exactly this, but in reverse: With
> >> >> bpf_link we will be in the situation that everything related to a netdev
> >> >> is configured over netlink *except* XDP.
> >> >
> >> > One can argue that everything related to use of BPF is going to be
> >> > uniform and done through BPF syscall? Given variety of possible BPF
> >> > hooks/targets, using custom ways to attach for all those many cases is
> >> > really bad as well, so having a unifying concept and single entry to
> >> > do this is good, no?
> >>
> >> Well, it depends on how you view the BPF subsystem's relation to the
> >> rest of the kernel, I suppose. I tend to view it as a subsystem that
> >> provides a bunch of functionality, which you can setup (using "internal"
> >> BPF APIs), and then attach that object to a different subsystem
> >> (networking) using that subsystem's configuration APIs.
> >>
> >> Seeing as this really boils down to a matter of taste, though, I'm not
> >> sure we'll find agreement on this :)
> >
> > Yeah, seems like so. But then again, your view and reality don't seem
> > to correlate completely. cgroup, a lot of tracing,
> > flow_dissector/lirc_mode2 attachments all are done through BPF
> > syscall.
>
> Well, I wasn't talking about any of those subsystems, I was talking
> about networking :)

So it's not "BPF subsystem's relation to the rest of the kernel" from
your previous email, it's now only "talking about networking"? Since
when the rest of the kernel is networking?

But anyways, I think John addressed modern XDP networking issues in
his email very well already.

>
> In particular, networking already has a consistent and fairly
> well-designed configuration mechanism (i.e., netlink) that we are
> generally trying to move more functionality *towards* not *away from*
> (see, e.g., converting ethtool to use netlink).
>
> > LINK_CREATE provides an opportunity to finally unify all those
> > different ways to achieve the same "attach my BPF program to some
> > target object" semantics.
>
> Well I also happen to think that "attach a BPF program to an object" is
> the wrong way to think about XDP. Rather, in my mind the model is
> "instruct the netdevice to execute this piece of BPF code".

That can't be reconciled, so no point of arguing :) But thinking about
BPF in general, I think it's closer to attach BPF program thinking
(especially all the fexit/fentry, kprobe, etc), where objects that BPF
is attached to is not "active" in the sense of "calling BPF", it's
more of BPF system setting things up (attaching?) in such a way that
BPF program is executed when appropriate.

>
> >> >> Other than that, I don't see any reason why the bpf_link API won't work.
> >> >> So I guess that if no one else has any problem with BPF insisting on
> >> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
> >> >
> >> > Apart from derogatory remark,
> >>
> >> Yeah, should have left out the 'snowflake' bit, sorry about that...
> >>
> >> > BPF is a bit special here, because it requires every potential BPF
> >> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
> >> > program(s) and execute them with special macro. So like it or not, it
> >> > is special and each driver supporting BPF needs to implement this BPF
> >> > wiring.
> >>
> >> All that is about internal implementation, though. I'm bothered by the
> >> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
> >> what you use to configure your netdev except if you want to attach an
> >> XDP program to it").
> >>
> >
> > See my reply to David. Depends on where you define user API. Is it
> > libbpf API, which is what most users are using? Or kernel API?
>
> Well I'm talking about the kernel<->userspace API, obviously :)
>
> > If everyone is using libbpf, does kernel system (bpf syscall vs
> > netlink) matter all that much?
>
> This argument works the other way as well, though: If libbpf can
> abstract the subsystem differences and provide a consistent interface to
> "the BPF world", why does BPF need to impose its own syscall API on the
> networking subsystem?

bpf_link in libbpf started as user-space abstraction only, but we
realized that it's not enough and there is a need to have proper
kernel support and corresponding kernel object, so it's not just
user-space API concerns.

As for having netlink interface for creating link only for XDP. Why
duplicating and maintaining 2 interfaces? All the other subsystems
will go through bpf syscall, only XDP wants to (also) have this
through netlink. This means duplication of UAPI for no added benefit.
It's a LINK_CREATE operations, as well as LINK_UPDATE operations. Do
we need to duplicate LINK_QUERY (once its implemented)? What if we'd
like to support some other generic bpf_link functionality, would it be
ok to add it only to bpf syscall, or we need to duplicate this in
netlink as well?

>
> -Toke
>

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 18:53                       ` Jakub Kicinski
@ 2020-03-24 22:30                         ` Andrii Nakryiko
  2020-03-25  1:25                           ` Jakub Kicinski
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-24 22:30 UTC
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Tue, Mar 24, 2020 at 11:53 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 24 Mar 2020 11:57:45 +0100 Toke Høiland-Jørgensen wrote:
> > > If everyone is using libbpf, does kernel system (bpf syscall vs
> > > netlink) matter all that much?
> >
> > This argument works the other way as well, though: If libbpf can
> > abstract the subsystem differences and provide a consistent interface to
> > "the BPF world", why does BPF need to impose its own syscall API on the
> > networking subsystem?
>
> Hitting the nail on the head there, again :)
>
> Once upon a time when we were pushing for libbpf focus & unification,
> one of my main motivations was that a solid library that most people
> use give us the ability to provide user space abstractions.

Yes, but bpf_link is not a user-space abstraction only anymore. It
started that way and we quickly realized that we still will need
kernel support. Not everything can be abstracted in user-space only.
So I don't see any contradiction here, that's still libbpf focus.

>
> As much as adding new kernel interfaces "to rule them all" is fun, it
> has a real cost.

We are adding kernel interface regardless of XDP (for cgroups and
tracing, then perf_events, etc). The real point and real cost here is
to not have another duplication of same functionality just for XDP use
case. That's the real cost, not the other way around. Don't know how
to emphasize this further.

And there is very little fun involved from my side, believe it or not...

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 20:55                       ` David Ahern
@ 2020-03-24 22:56                         ` Andrii Nakryiko
  0 siblings, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-24 22:56 UTC
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Tue, Mar 24, 2020 at 1:55 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 3/23/20 10:53 PM, Andrii Nakryiko wrote:
> > On Mon, Mar 23, 2020 at 6:01 PM David Ahern <dsahern@gmail.com> wrote:
> >>
> >> On 3/23/20 1:23 PM, Toke Høiland-Jørgensen wrote:
> >>>>>> I agree here. And yes, I've been working on extending bpf_link into
> >>>>>> cgroup and then to XDP. We are still discussing some cgroup-specific
> >>>>>> details, but the patch is ready. I'm going to post it as an RFC to get
> >>>>>> the discussion started, before we do this for XDP.
> >>>>>
> >>>>> Well, my reason for being skeptic about bpf_link and proposing the
> >>>>> netlink-based API is actually exactly this, but in reverse: With
> >>>>> bpf_link we will be in the situation that everything related to a netdev
> >>>>> is configured over netlink *except* XDP.
> >>
> >> +1
> >
> > Hm... so using **libbpf**'s bpf_set_link_xdp_fd() API (notice "bpf" in
> > the name of the library and function, and notice no "netlink"), which
> > exposes absolutely nothing about netlink (it's just an internal
> > implementation detail and can easily change), is ok. But actually
> > switching to libbpf's bpf_link would be out of ordinary? Especially
> > considering that to use freplace programs (for libxdp and chaining)
> > with libbpf you will use bpf_program and bpf_link abstractions
> > anyways.
>
> It seems to me you are conflating libbpf api with the kernel uapi.

I'm not, as you can see in the other email where I explicitly asked
which of these we care about most in this discussion.

> Making libbpf user friendly certainly encourages standardization on its
> use, but there is no requirement that use of bpf means use of libbpf.

Agreed, we can't force anyone to use libbpf. But it seems to be a
pretty popular choice in practice.

>
> >
> >>
> >>>>
> >>>> One can argue that everything related to use of BPF is going to be
> >>>> uniform and done through BPF syscall? Given variety of possible BPF
> >>>> hooks/targets, using custom ways to attach for all those many cases is
> >>>> really bad as well, so having a unifying concept and single entry to
> >>>> do this is good, no?
> >>>
> >>> Well, it depends on how you view the BPF subsystem's relation to the
> >>> rest of the kernel, I suppose. I tend to view it as a subsystem that
> >>> provides a bunch of functionality, which you can setup (using "internal"
> >>> BPF APIs), and then attach that object to a different subsystem
> >>> (networking) using that subsystem's configuration APIs.
> >>>
> >>
> >> again, +1.
> >>
> >> bpf syscall is used for program related manipulations like load and
> >
> > bpf syscall is used for way more than that, actually...
> >
> >> unload. Attaching that program to an object has a type unique solution -
> >> e.g., netlink for XDP and ioctl for perf_events.
> >
> > That's not true and hasn't been true for at least a while now. cgroup
> > programs, flow_dissector, lirc_mode2 (whatever that is, I have no
> > idea) are attached with BPF_PROG_ATTACH. raw_tracepoint and all the
> > fentry/fexit/fmod_ret/freplace attachments are done also through bpf
> > syscall. For perf_event related stuff it's done through ioctls right
> > now, but with bpf_link unification I wouldn't be surprised if it will
>
> and it always will be able to. Kernel uapi will not be revoked because a
> new way to do something comes along.

Good that we are in agreement that BPF attachment is not really a type
unique solution.

Also, I didn't say any of the existing APIs will go away. But to
support pinnable/queryable bpf_link, we'll need a new API for
perf_event. And I believe it should be done through the bpf syscall,
not through more ioctls, which is what we are discussing here w.r.t.
XDP as well. The existing way of attaching a BPF program directly
(with no bpf_link created, no way to pin and query that bpf_link,
etc.) won't go away, of course. But there is no need to duplicate
bpf_link-related APIs in netlink if we are going to do it as part of
the bpf syscall.

>
> > be done through the same LINK_CREATE command soon, as is done for
> > cgroup and *other* tracing bpf_links. Because consistent API and
> > semantics is good, rather than having to do it N different ways for N
> > different subsystems.
> >
>
> That's a bpf / libbpf centric perspective. What Toke and I are saying is
> the networking centric perspective matters to and networking uses
> netlink for configuration.

It's BPF-centric because BPF is much wider than networking, which
allows keeping things in perspective beyond the networking world. It's
about the cost of maintaining UAPIs and consistency across the whole
range of BPF program types. I don't see a good reason to maintain
duplicate APIs. We are going to have a bpf_link API through the bpf
syscall (because of cgroups, tracing, etc.) and it is going to be
generic. So what's the upside of duplicating it in netlink as well?

Can one work with XDP without the bpf syscall? No, one cannot. So we
are not adding a new "syscall dependency" or anything like that.

On the other hand, as a developer, can I develop an XDP application
without using the netlink API at all? Funnily enough, I could if the
BPF syscall allowed attaching to an ifindex, couldn't I? Say I develop
some monitoring application that uses XDP and doesn't intend to
configure any network interface; it just attaches my BPF program and
lets it run for a bit. Why would I bother implementing the entire
netlink protocol just to attach a BPF program? But I also don't
subscribe to the notion that "attaching a BPF program is
configuration", so...

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 22:30                         ` Andrii Nakryiko
@ 2020-03-25  1:25                           ` Jakub Kicinski
  0 siblings, 0 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-25  1:25 UTC
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Tue, 24 Mar 2020 15:30:58 -0700 Andrii Nakryiko wrote:
> On Tue, Mar 24, 2020 at 11:53 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 24 Mar 2020 11:57:45 +0100 Toke Høiland-Jørgensen wrote:  
> > > > If everyone is using libbpf, does kernel system (bpf syscall vs
> > > > netlink) matter all that much?  
> > >
> > > This argument works the other way as well, though: If libbpf can
> > > abstract the subsystem differences and provide a consistent interface to
> > > "the BPF world", why does BPF need to impose its own syscall API on the
> > > networking subsystem?  
> >
> > Hitting the nail on the head there, again :)
> >
> > Once upon a time when we were pushing for libbpf focus & unification,
> > one of my main motivations was that a solid library that most people
> > use give us the ability to provide user space abstractions.  
> 
> Yes, but bpf_link is not a user-space abstraction only anymore. It
> started that way and we quickly realized that we still will need
> kernel support. Not everything can be abstracted in user-space only.
> So I don't see any contradiction here, that's still libbpf focus.
>
> > As much as adding new kernel interfaces "to rule them all" is fun, it
> > has a real cost.  
> 
> We are adding kernel interface regardless of XDP (for cgroups and
> tracing, then perf_events, etc). The real point and real cost here is
> to not have another duplication of same functionality just for XDP use
> case. That's the real cost, not the other way around. Don't know how
> to emphasize this further.

Toke's change is net 30 lines of kernel code while retaining full netlink
compliance and abilities. The integration with libbpf is pretty trivial
as well. He has an actual project which needs this functionality, and
for which his change is sufficient.

Neither LoC/maintenance burden nor use cases are in favor of bpf_link
to put it mildly.

> And there is very little fun involved from my side, believe it or not...


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 19:22                       ` John Fastabend
@ 2020-03-25  1:36                         ` Alexei Starovoitov
  2020-03-25  2:15                           ` Jakub Kicinski
  2020-03-25 10:42                           ` Toke Høiland-Jørgensen
  2020-03-25 10:30                         ` Toke Høiland-Jørgensen
  1 sibling, 2 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-25  1:36 UTC
  To: John Fastabend
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On Tue, Mar 24, 2020 at 12:22:47PM -0700, John Fastabend wrote:
> > 
> > Well, I wasn't talking about any of those subsystems, I was talking
> > about networking :)
> 
> My experience has been that networking in the strict sense of XDP no
> longer exists on its own without cgroups, flow dissector, sockops,
> sockmap, tracing, etc. All of these pieces are built, patched, loaded,
> pinned and otherwise managed and manipulated as BPF objects via libbpf.
> 
> Because I have all this infra in place for other items its a bit odd
> imo to drop out of BPF apis to then swap a program differently in the
> XDP case from how I would swap a program in any other place. I'm
> assuming ability to swap links will be enabled at some point.
> 
> Granted it just means I have some extra functions on the side to manage
> the swap similar to how 'qdisc' would be handled today but still not as
> nice an experience in my case as if it was handled natively.
> 
> Anyways the netlink API is going to have to call into the BPF infra
> on the kernel side for verification, etc so its already not pure
> networking.
> 
> > 
> > In particular, networking already has a consistent and fairly
> > well-designed configuration mechanism (i.e., netlink) that we are
> > generally trying to move more functionality *towards* not *away from*
> > (see, e.g., converting ethtool to use netlink).
> 
> True. But BPF programs are going to exist and interop with other
> programs not exactly in the networking space. Actually library calls
> might be used in tracing, cgroups, and XDP side. It gets a bit more
> interesting if the "same" object file (with some patching) runs in both
> XDP and sockops land for example.

Thanks John for summarizing it very well.
It looks to me that netlink proponents fail to realize that "bpf for
networking" goes way beyond what netlink is doing and capable of doing in the
future. BPF_*_INET_* progs do core networking without any smell of netlink
anywhere. "But, but, but, netlink is the way to configure networking"... is
simply not true. Even in years before BPF sockets and syscalls were the way to
do it. netlink has plenty of awesome properties, but arguing that it's the
only true way to do networking is not matching the reality.
Details are important and every case is different. So imo:
converting ethtool to netlink - great stuff.
converting netdev irq/queue management to netlink - great stuff too.
adding more netlink api for xdp - really bad idea.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25  1:36                         ` Alexei Starovoitov
@ 2020-03-25  2:15                           ` Jakub Kicinski
  2020-03-25 18:06                             ` Alexei Starovoitov
  2020-03-25 10:42                           ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-25  2:15 UTC
  To: Alexei Starovoitov
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On Tue, 24 Mar 2020 18:36:31 -0700 Alexei Starovoitov wrote:
> On Tue, Mar 24, 2020 at 12:22:47PM -0700, John Fastabend wrote:
> > > Well, I wasn't talking about any of those subsystems, I was talking
> > > about networking :)  
> > 
> > My experience has been that networking in the strict sense of XDP no
> > longer exists on its own without cgroups, flow dissector, sockops,
> > sockmap, tracing, etc. All of these pieces are built, patched, loaded,
> > pinned and otherwise managed and manipulated as BPF objects via libbpf.
> > 
> > Because I have all this infra in place for other items its a bit odd
> > imo to drop out of BPF apis to then swap a program differently in the
> > XDP case from how I would swap a program in any other place. I'm
> > assuming ability to swap links will be enabled at some point.
> > 
> > Granted it just means I have some extra functions on the side to manage
> > the swap similar to how 'qdisc' would be handled today but still not as
> > nice an experience in my case as if it was handled natively.
> > 
> > Anyways the netlink API is going to have to call into the BPF infra
> > on the kernel side for verification, etc so its already not pure
> > networking.
> >   
> > > 
> > > In particular, networking already has a consistent and fairly
> > > well-designed configuration mechanism (i.e., netlink) that we are
> > > generally trying to move more functionality *towards* not *away from*
> > > (see, e.g., converting ethtool to use netlink).  
> > 
> > True. But BPF programs are going to exist and interop with other
> > programs not exactly in the networking space. Actually library calls
> > might be used in tracing, cgroups, and XDP side. It gets a bit more
> > interesting if the "same" object file (with some patching) runs in both
> > XDP and sockops land for example.  
> 
> Thanks John for summarizing it very well.
> It looks to me that netlink proponents fail to realize that "bpf for
> networking" goes way beyond what netlink is doing and capable of doing in the
> future. BPF_*_INET_* progs do core networking without any smell of netlink
> anywhere. "But, but, but, netlink is the way to configure networking"... is
> simply not true. Even in years before BPF sockets and syscalls were the way to
> do it. netlink has plenty of awesome properties, but arguing that it's the
> only true way to do networking is not matching the reality.

It is the way to configure XDP today, so it's only natural to
scrutinize the attempts to replace it. 

Also I personally don't think you'd see this much push back trying to
add bpf_link-based stuff to cls_bpf, that's an add-on. XDP is
integrated very fundamentally with the networking stack at this point.

> Details are important and every case is different. So imo:
> converting ethtool to netlink - great stuff.
> converting netdev irq/queue management to netlink - great stuff too.
> adding more netlink api for xdp - really bad idea.

Why is it a bad idea?

There are plenty of things which will only be available over netlink:
configuring the interface so that installing the XDP program is
possible (disabling features, configuring queues, etc.). Chances are
the user gets the ifindex of the interface to attach to over netlink in
the first place. The queue configuration (which you agree belongs in
netlink) will definitely get more complex to allow REDIRECTs to work
more smoothly. AF_XDP needs all sorts of netlink stuff.

Netlink gives us the notification mechanism, which is how we solve
coordination across daemons (something that the BPF subsystem is only
now trying to solve).

The BPF subsystem has a proven track record of reimplementing things
devs don't like or haven't studied (bpftool net, netlink library). So
allowing parts of the kernel netlink API to be duplicated is a real
concern.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 22:25                       ` Andrii Nakryiko
@ 2020-03-25  9:38                         ` Toke Høiland-Jørgensen
  2020-03-25 17:55                           ` Alexei Starovoitov
  2020-03-26  0:16                           ` Andrii Nakryiko
  0 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-25  9:38 UTC
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Mar 24, 2020 at 3:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >> >>
>> >> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
>> >> >> > <john.fastabend@gmail.com> wrote:
>> >> >> >>
>> >> >> >> Jakub Kicinski wrote:
>> >> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
>> >> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> > > >>
>> >> >> >> > > >> While it is currently possible for userspace to specify that an existing
>> >> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
>> >> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
>> >> >> >> > > >>
>> >> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
>> >> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >> >> >> > > >>
>> >> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> >> >> > > >> to discover whether the kernel supports the new attribute.
>> >> >> >> > > >>
>> >> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> > > >
>> >> >> >> > > > I didn't know we wanted to go ahead with this...
>> >> >> >> > >
>> >> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> >> >> >> > > happening with that, though. So since this is a straight-forward
>> >> >> >> > > extension of the existing API, that doesn't carry a high implementation
>> >> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> >> >> >> > > something similar in bpf_link as well, of course.
>> >> >> >> >
>> >> >> >> > I'm not really in the loop, but from what I overheard - I think the
>> >> >> >> > bpf_link may be targeting something non-networking first.
>> >> >> >>
>> >> >> >> My preference is to avoid building two different APIs one for XDP and another
>> >> >> >> for everything else. If we have userlands that already understand links and
>> >> >> >> pinning support is on the way imo lets use these APIs for networking as well.
>> >> >> >
>> >> >> > I agree here. And yes, I've been working on extending bpf_link into
>> >> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
>> >> >> > details, but the patch is ready. I'm going to post it as an RFC to get
>> >> >> > the discussion started, before we do this for XDP.
>> >> >>
>> >> >> Well, my reason for being skeptic about bpf_link and proposing the
>> >> >> netlink-based API is actually exactly this, but in reverse: With
>> >> >> bpf_link we will be in the situation that everything related to a netdev
>> >> >> is configured over netlink *except* XDP.
>> >> >
>> >> > One can argue that everything related to use of BPF is going to be
>> >> > uniform and done through BPF syscall? Given variety of possible BPF
>> >> > hooks/targets, using custom ways to attach for all those many cases is
>> >> > really bad as well, so having a unifying concept and single entry to
>> >> > do this is good, no?
>> >>
>> >> Well, it depends on how you view the BPF subsystem's relation to the
>> >> rest of the kernel, I suppose. I tend to view it as a subsystem that
>> >> provides a bunch of functionality, which you can setup (using "internal"
>> >> BPF APIs), and then attach that object to a different subsystem
>> >> (networking) using that subsystem's configuration APIs.
>> >>
>> >> Seeing as this really boils down to a matter of taste, though, I'm not
>> >> sure we'll find agreement on this :)
>> >
>> > Yeah, seems like so. But then again, your view and reality don't seem
>> > to correlate completely. cgroup, a lot of tracing,
>> > flow_dissector/lirc_mode2 attachments all are done through BPF
>> > syscall.
>>
>> Well, I wasn't talking about any of those subsystems, I was talking
>> about networking :)
>
> So it's not "BPF subsystem's relation to the rest of the kernel" from
> your previous email, it's now only "talking about networking"? Since
> when the rest of the kernel is networking?

Not really, I would likely argue the same for any other subsystem, I
just prefer to limit myself to talking about things I actually know
something about. Hence, networking :)

> But anyways, I think John addressed modern XDP networking issues in
> his email very well already.

Going to reply to that email next...

>> In particular, networking already has a consistent and fairly
>> well-designed configuration mechanism (i.e., netlink) that we are
>> generally trying to move more functionality *towards* not *away from*
>> (see, e.g., converting ethtool to use netlink).
>>
>> > LINK_CREATE provides an opportunity to finally unify all those
>> > different ways to achieve the same "attach my BPF program to some
>> > target object" semantics.
>>
>> Well I also happen to think that "attach a BPF program to an object" is
>> the wrong way to think about XDP. Rather, in my mind the model is
>> "instruct the netdevice to execute this piece of BPF code".
>
> That can't be reconciled, so no point of arguing :) But thinking about
> BPF in general, I think it's closer to attach BPF program thinking
> (especially all the fexit/fentry, kprobe, etc), where objects that BPF
> is attached to is not "active" in the sense of "calling BPF", it's
> more of BPF system setting things up (attaching?) in such a way that
> BPF program is executed when appropriate.

I'd tend to agree with you on most of the tracing stuff, but not on
this. But let's just agree to disagree here :)

>> >> >> Other than that, I don't see any reason why the bpf_link API won't work.
>> >> >> So I guess that if no one else has any problem with BPF insisting on
>> >> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
>> >> >
>> >> > Apart from derogatory remark,
>> >>
>> >> Yeah, should have left out the 'snowflake' bit, sorry about that...
>> >>
>> >> > BPF is a bit special here, because it requires every potential BPF
>> >> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
>> >> > program(s) and execute them with special macro. So like it or not, it
>> >> > is special and each driver supporting BPF needs to implement this BPF
>> >> > wiring.
>> >>
>> >> All that is about internal implementation, though. I'm bothered by the
>> >> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
>> >> what you use to configure your netdev except if you want to attach an
>> >> XDP program to it").
>> >>
>> >
>> > See my reply to David. Depends on where you define user API. Is it
>> > libbpf API, which is what most users are using? Or kernel API?
>>
>> Well I'm talking about the kernel<->userspace API, obviously :)
>>
>> > If everyone is using libbpf, does kernel system (bpf syscall vs
>> > netlink) matter all that much?
>>
>> This argument works the other way as well, though: If libbpf can
>> abstract the subsystem differences and provide a consistent interface to
>> "the BPF world", why does BPF need to impose its own syscall API on the
>> networking subsystem?
>
> bpf_link in libbpf started as user-space abstraction only, but we
> realized that it's not enough and there is a need to have proper
> kernel support and corresponding kernel object, so it's not just
> user-space API concerns.
>
> As for having netlink interface for creating link only for XDP. Why
> duplicating and maintaining 2 interfaces?

Totally agree; why do we need two interfaces? Let's keep the one we
already have - the netlink interface! :)

> All the other subsystems will go through bpf syscall, only XDP wants
> to (also) have this through netlink. This means duplication of UAPI
> for no added benefit. It's a LINK_CREATE operations, as well as
> LINK_UPDATE operations. Do we need to duplicate LINK_QUERY (once its
> implemented)? What if we'd like to support some other generic bpf_link
> functionality, would it be ok to add it only to bpf syscall, or we
> need to duplicate this in netlink as well?

You're saying that like we didn't already have the netlink API. We
essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
this is just adding LINK_UPDATE. It's a straight-forward fix of an
existing API; essentially you're saying we should keep the old API in a
crippled state in order to promote your (proposed) new API.

-Toke


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-24 19:22                       ` John Fastabend
  2020-03-25  1:36                         ` Alexei Starovoitov
@ 2020-03-25 10:30                         ` Toke Høiland-Jørgensen
  2020-03-25 17:56                           ` Alexei Starovoitov
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-25 10:30 UTC
  To: John Fastabend, Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> 
>> > On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >> >>
>> >> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
>> >> >> > <john.fastabend@gmail.com> wrote:
>> >> >> >>
>> >> >> >> Jakub Kicinski wrote:
>> >> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
>> >> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> > > >>
>> >> >> >> > > >> While it is currently possible for userspace to specify that an existing
>> >> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
>> >> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
>> >> >> >> > > >>
>> >> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
>> >> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >> >> >> > > >>
>> >> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> >> >> > > >> to discover whether the kernel supports the new attribute.
>> >> >> >> > > >>
>> >> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> > > >
>> >> >> >> > > > I didn't know we wanted to go ahead with this...
>> >> >> >> > >
>> >> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> >> >> >> > > happening with that, though. So since this is a straight-forward
>> >> >> >> > > extension of the existing API, that doesn't carry a high implementation
>> >> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> >> >> >> > > something similar in bpf_link as well, of course.
>> >> >> >> >
>> >> >> >> > I'm not really in the loop, but from what I overheard - I think the
>> >> >> >> > bpf_link may be targeting something non-networking first.
>> >> >> >>
>> >> >> >> My preference is to avoid building two different APIs one for XDP and another
>> >> >> >> for everything else. If we have userlands that already understand links and
>> >> >> >> pinning support is on the way imo lets use these APIs for networking as well.
>> >> >> >
>> >> >> > I agree here. And yes, I've been working on extending bpf_link into
>> >> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
>> >> >> > details, but the patch is ready. I'm going to post it as an RFC to get
>> >> >> > the discussion started, before we do this for XDP.
>> >> >>
>> >> >> Well, my reason for being skeptic about bpf_link and proposing the
>> >> >> netlink-based API is actually exactly this, but in reverse: With
>> >> >> bpf_link we will be in the situation that everything related to a netdev
>> >> >> is configured over netlink *except* XDP.
>> >> >
>> >> > One can argue that everything related to use of BPF is going to be
>> >> > uniform and done through BPF syscall? Given variety of possible BPF
>> >> > hooks/targets, using custom ways to attach for all those many cases is
>> >> > really bad as well, so having a unifying concept and single entry to
>> >> > do this is good, no?
>> >>
>> >> Well, it depends on how you view the BPF subsystem's relation to the
>> >> rest of the kernel, I suppose. I tend to view it as a subsystem that
>> >> provides a bunch of functionality, which you can setup (using "internal"
>> >> BPF APIs), and then attach that object to a different subsystem
>> >> (networking) using that subsystem's configuration APIs.
>> >>
>> >> Seeing as this really boils down to a matter of taste, though, I'm not
>> >> sure we'll find agreement on this :)
>> >
>> > Yeah, seems like so. But then again, your view and reality don't seem
>> > to correlate completely. cgroup, a lot of tracing,
>> > flow_dissector/lirc_mode2 attachments all are done through BPF
>> > syscall.
>> 
>> Well, I wasn't talking about any of those subsystems, I was talking
>> about networking :)
>
> My experience has been that networking in the strict sense of XDP no
> longer exists on its own without cgroups, flow dissector, sockops,
> sockmap, tracing, etc. All of these pieces are built, patched, loaded,
> pinned and otherwise managed and manipulated as BPF objects via libbpf.
>
> Because I have all this infra in place for other items its a bit odd
> imo to drop out of BPF apis to then swap a program differently in the
> XDP case from how I would swap a program in any other place. I'm
> assuming ability to swap links will be enabled at some point.
>
> Granted it just means I have some extra functions on the side to manage
> the swap similar to how 'qdisc' would be handled today but still not as
> nice an experience in my case as if it was handled natively.

From a BPF application developer PoV I can totally understand the desire
for unified APIs. But that unification can still be achieved at the
libbpf level, while keeping network interface configuration done through
netlink.

> Anyways the netlink API is going to have to call into the BPF infra
> on the kernel side for verification, etc so its already not pure
> networking.

Yes, obviously there are *interactions* between the networking stack and
BPF. But the program attach is still interface configuration. The
netlink operation says "please configure this netdev to hook into the
BPF subsystem with this program".

>> In particular, networking already has a consistent and fairly
>> well-designed configuration mechanism (i.e., netlink) that we are
>> generally trying to move more functionality *towards* not *away from*
>> (see, e.g., converting ethtool to use netlink).
>
> True. But BPF programs are going to exist and interop with other
> programs not exactly in the networking space. Actually library calls
> might be used in tracing, cgroups, and XDP side. It gets a bit more
> interesting if the "same" object file (with some patching) runs in both
> XDP and sockops land for example.

Not really sure why that makes a difference, actually? There will still
be a point at which the network interface configuration is updated to
point to a (new) BPF program.

>> > LINK_CREATE provides an opportunity to finally unify all those
>> > different ways to achieve the same "attach my BPF program to some
>> > target object" semantics.
>> 
>> Well I also happen to think that "attach a BPF program to an object" is
>> the wrong way to think about XDP. Rather, in my mind the model is
>> "instruct the netdevice to execute this piece of BPF code".
>> 
>> >> >> Other than that, I don't see any reason why the bpf_link API won't work.
>> >> >> So I guess that if no one else has any problem with BPF insisting on
>> >> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
>> >> >
>> >> > Apart from derogatory remark,
>> >>
>> >> Yeah, should have left out the 'snowflake' bit, sorry about that...
>> >>
>> >> > BPF is a bit special here, because it requires every potential BPF
>> >> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
>> >> > program(s) and execute them with special macro. So like it or not, it
>> >> > is special and each driver supporting BPF needs to implement this BPF
>> >> > wiring.
>> >>
>> >> All that is about internal implementation, though. I'm bothered by the
>> >> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
>> >> what you use to configure your netdev except if you want to attach an
>> >> XDP program to it").
>> >>
>> >
>> > See my reply to David. Depends on where you define user API. Is it
>> > libbpf API, which is what most users are using? Or kernel API?
>> 
>> Well I'm talking about the kernel<->userspace API, obviously :)
>> 
>> > If everyone is using libbpf, does kernel system (bpf syscall vs
>> > netlink) matter all that much?
>> 
>> This argument works the other way as well, though: If libbpf can
>> abstract the subsystem differences and provide a consistent interface to
>> "the BPF world", why does BPF need to impose its own syscall API on the
>> networking subsystem?
>
> I can make it work either way as a netlink or syscall its not going
> to be a blocker. If we go netlink route then the next question is
> does libbpf pull in the ability to swap XDP progs via netlink or
> is that some other lib?

Not sure what you mean by this? This series does update libbpf with the
new API?

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25  1:36                         ` Alexei Starovoitov
  2020-03-25  2:15                           ` Jakub Kicinski
@ 2020-03-25 10:42                           ` Toke Høiland-Jørgensen
  2020-03-25 18:11                             ` Alexei Starovoitov
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-25 10:42 UTC
  To: Alexei Starovoitov, John Fastabend
  Cc: Andrii Nakryiko, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Mar 24, 2020 at 12:22:47PM -0700, John Fastabend wrote:
>> > 
>> > Well, I wasn't talking about any of those subsystems, I was talking
>> > about networking :)
>> 
>> My experience has been that networking in the strict sense of XDP no
>> longer exists on its own without cgroups, flow dissector, sockops,
>> sockmap, tracing, etc. All of these pieces are built, patched, loaded,
>> pinned and otherwise managed and manipulated as BPF objects via libbpf.
>> 
>> Because I have all this infra in place for other items its a bit odd
>> imo to drop out of BPF apis to then swap a program differently in the
>> XDP case from how I would swap a program in any other place. I'm
>> assuming ability to swap links will be enabled at some point.
>> 
>> Granted it just means I have some extra functions on the side to manage
>> the swap similar to how 'qdisc' would be handled today but still not as
>> nice an experience in my case as if it was handled natively.
>> 
>> Anyways the netlink API is going to have to call into the BPF infra
>> on the kernel side for verification, etc so its already not pure
>> networking.
>> 
>> > 
>> > In particular, networking already has a consistent and fairly
>> > well-designed configuration mechanism (i.e., netlink) that we are
>> > generally trying to move more functionality *towards* not *away from*
>> > (see, e.g., converting ethtool to use netlink).
>> 
>> True. But BPF programs are going to exist and interop with other
>> programs not exactly in the networking space. Actually library calls
>> might be used in tracing, cgroups, and XDP side. It gets a bit more
>> interesting if the "same" object file (with some patching) runs in both
>> XDP and sockops land for example.
>
> Thanks John for summarizing it very well.
> It looks to me that netlink proponents fail to realize that "bpf for
> networking" goes way beyond what netlink is doing and capable of doing in the
> future. BPF_*_INET_* progs do core networking without any smell of netlink
> anywhere. "But, but, but, netlink is the way to configure networking"... is
> simply not true.

That was not what I was saying. Obviously there are other components to
the networking stack than netlink.

What I'm saying is that netlink is the interface the kernel uses to
*configure network devices*. And that attaching an XDP program is a
network device configuration operation. I mean, it:

- Relies on the RTNL lock for synchronisation
- Fundamentally alters the flow of network packets on the device
- Potentially has side effects like link up/down, HWQ reconfig etc

I'm wondering if there's a way to reconcile these views? Maybe making
the bpf_link attachment work by passing the link fd to the netlink API?
That would keep the network interface configuration over netlink, but
would still allow a BPF application to swap out "its" programs via the
bpf_link APIs?

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25  9:38                         ` Toke Høiland-Jørgensen
@ 2020-03-25 17:55                           ` Alexei Starovoitov
  2020-03-26  0:16                           ` Andrii Nakryiko
  1 sibling, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-25 17:55 UTC
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Wed, Mar 25, 2020 at 10:38:32AM +0100, Toke Høiland-Jørgensen wrote:
> >
> > As for having netlink interface for creating link only for XDP. Why
> > duplicating and maintaining 2 interfaces?
> 
> Totally agree; why do we need two interfaces? Let's keep the one we
> already have - the netlink interface! :)

it's not about netlink vs something else.
I already explained that the ownership concept is missing.

> > All the other subsystems will go through bpf syscall, only XDP wants
> > to (also) have this through netlink. This means duplication of UAPI
> > for no added benefit. It's a LINK_CREATE operations, as well as
> > LINK_UPDATE operations. Do we need to duplicate LINK_QUERY (once its
> > implemented)? What if we'd like to support some other generic bpf_link
> > functionality, would it be ok to add it only to bpf syscall, or we
> > need to duplicate this in netlink as well?
> 
> You're saying that like we didn't already have the netlink API. We
> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> existing API; essentially you're saying we should keep the old API in a
> crippled state in order to promote your (proposed) new API.

It's not a fix. It papers over a giant issue with all existing attachment
APIs, regardless of their form (netlink, syscall, etc.).
The commit 7dd68b3279f1 ("bpf: Support replacing cgroup-bpf program in MULTI mode")
is the same paper-over. It's not a fix for broken api. I regret applying it.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25 10:30                         ` Toke Høiland-Jørgensen
@ 2020-03-25 17:56                           ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-25 17:56 UTC
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Andrii Nakryiko, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Wed, Mar 25, 2020 at 11:30:15AM +0100, Toke Høiland-Jørgensen wrote:
> 
> From a BPF application developer PoV I can totally understand the desire
> for unified APIs. But that unification can still be achieved at the
> libbpf level, while keeping network interface configuration done through
> netlink.

it cannot be done at libbpf level. The kernel is missing the ownership concept.
netlink vs other is irrelevant.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25  2:15                           ` Jakub Kicinski
@ 2020-03-25 18:06                             ` Alexei Starovoitov
  2020-03-25 18:20                               ` Jakub Kicinski
  0 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-25 18:06 UTC
  To: Jakub Kicinski
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On Tue, Mar 24, 2020 at 07:15:54PM -0700, Jakub Kicinski wrote:
> 
> It is the way to configure XDP today, so it's only natural to
> scrutinize the attempts to replace it. 

No one is replacing it.

> Also I personally don't think you'd see this much push back trying to
> add bpf_link-based stuff to cls_bpf, that's an add-on. XDP is
> integrated very fundamentally with the networking stack at this point.
> 
> > Details are important and every case is different. So imo:
> > converting ethtool to netlink - great stuff.
> > converting netdev irq/queue management to netlink - great stuff too.
> > adding more netlink api for xdp - really bad idea.
> 
> Why is it a bad idea?

I explained in three other emails. tldr: lack of ownership.

> There are plenty things which will only be available over netlink.
> Configuring the interface so installing the XDP program is possible
> (disabling features, configuring queues etc.). Chances are user gets
> the ifindex of the interface to attach to over netlink in the first
> place. The queue configuration (which you agree belongs in netlink)
> will definitely get more complex to allow REDIRECTs to work more
> smoothly. AF_XDP needs all sort of netlink stuff.

sure. that has nothing to do with ownership of attachment.

> Netlink gives us the notification mechanism which is how we solve
> coordination across daemons (something that BPF subsystem is only 
> now trying to solve).

I don't care about notifications on attachment and no one is trying to
solve that as far as I can see. It's not a problem to solve in the first place.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25 10:42                           ` Toke Høiland-Jørgensen
@ 2020-03-25 18:11                             ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-25 18:11 UTC
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Andrii Nakryiko, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Wed, Mar 25, 2020 at 11:42:57AM +0100, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Tue, Mar 24, 2020 at 12:22:47PM -0700, John Fastabend wrote:
> >> > 
> >> > Well, I wasn't talking about any of those subsystems, I was talking
> >> > about networking :)
> >> 
> >> My experience has been that networking in the strict sense of XDP no
> >> longer exists on its own without cgroups, flow dissector, sockops,
> >> sockmap, tracing, etc. All of these pieces are built, patched, loaded,
> >> pinned and otherwise managed and manipulated as BPF objects via libbpf.
> >> 
> >> Because I have all this infra in place for other items its a bit odd
> >> imo to drop out of BPF apis to then swap a program differently in the
> >> XDP case from how I would swap a program in any other place. I'm
> >> assuming ability to swap links will be enabled at some point.
> >> 
> >> Granted it just means I have some extra functions on the side to manage
> >> the swap similar to how 'qdisc' would be handled today but still not as
> >> nice an experience in my case as if it was handled natively.
> >> 
> >> Anyways the netlink API is going to have to call into the BPF infra
> >> on the kernel side for verification, etc so its already not pure
> >> networking.
> >> 
> >> > 
> >> > In particular, networking already has a consistent and fairly
> >> > well-designed configuration mechanism (i.e., netlink) that we are
> >> > generally trying to move more functionality *towards* not *away from*
> >> > (see, e.g., converting ethtool to use netlink).
> >> 
> >> True. But BPF programs are going to exist and interop with other
> >> programs not exactly in the networking space. Actually library calls
> >> might be used in tracing, cgroups, and XDP side. It gets a bit more
> >> interesting if the "same" object file (with some patching) runs in both
> >> XDP and sockops land for example.
> >
> > Thanks John for summarizing it very well.
> > It looks to me that netlink proponents fail to realize that "bpf for
> > networking" goes way beyond what netlink is doing and capable of doing in the
> > future. BPF_*_INET_* progs do core networking without any smell of netlink
> > anywhere. "But, but, but, netlink is the way to configure networking"... is
> > simply not true.
> 
> That was not what I was saying. Obviously there are other components to
> the networking stack than netlink.
> 
> What I'm saying is that netlink is the interface the kernel uses to
> *configure network devices*. And that attaching an XDP program is a
> network device configuration operation. I mean, it:
> 
> - Relies on the RTNL lock for synchronisation
> - Fundamentally alters the flow of network packets on the device
> - Potentially has side effects like link up/down, HWQ reconfig etc

sure. Attaching a prog to the ingress qdisc can be considered a 'configuration'
of the qdisc because rtnl is needed and whatnot.
That doesn't contradict my point that other APIs (not only netlink) take
the rtnl lock, etc.

> I'm wondering if there's a way to reconcile these views? Maybe making
> the bpf_link attachment work by passing the link fd to the netlink API?

What kind of Frankenstein would that be?

> That would keep the network interface configuration over netlink, but
> would still allow a BPF application to swap out "its" programs via the
> bpf_link APIs?

It's not about swapping. bpf_link brings ownership concept in the first place.
It could be done via bpf syscall, new syscall, netlink, ioctl, you name it.
It's all secondary. The key concept is ownership.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25 18:06                             ` Alexei Starovoitov
@ 2020-03-25 18:20                               ` Jakub Kicinski
  2020-03-25 19:14                                 ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-25 18:20 UTC
  To: Alexei Starovoitov
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On Wed, 25 Mar 2020 11:06:38 -0700 Alexei Starovoitov wrote:
> On Tue, Mar 24, 2020 at 07:15:54PM -0700, Jakub Kicinski wrote:
> > It is the way to configure XDP today, so it's only natural to
> > scrutinize the attempts to replace it.   
> 
> No one is replacing it.

You're blocking extensions to the existing API, that means that part 
of the API is frozen and is being replaced.

> > Also I personally don't think you'd see this much push back trying to
> > add bpf_link-based stuff to cls_bpf, that's an add-on. XDP is
> > integrated very fundamentally with the networking stack at this point.
> >   
> > > Details are important and every case is different. So imo:
> > > converting ethtool to netlink - great stuff.
> > > converting netdev irq/queue management to netlink - great stuff too.
> > > adding more netlink api for xdp - really bad idea.  
> > 
> > Why is it a bad idea?  
> 
> I explained in three other emails. tldr: lack of ownership.

Those came later, I think, thanks.

Fine, maybe one day someone will find the extension you're proposing
useful. To me that's not a justification to freeze the existing API
(you said "adding more netlink api for xdp - really bad idea").

Besides, if you look at Toke's libxdp work (which exists), what's the
ownership of the attached program? Whichever application touched it
last?

The whole auto-detachment thing may work nicely in cls_bpf and
sub-programs attached to the root XDP program, but it's a bit hard 
to imagine how it's useful for the singleton root XDP program.

> > There are plenty things which will only be available over netlink.
> > Configuring the interface so installing the XDP program is possible
> > (disabling features, configuring queues etc.). Chances are user gets
> > the ifindex of the interface to attach to over netlink in the first
> > place. The queue configuration (which you agree belongs in netlink)
> > will definitely get more complex to allow REDIRECTs to work more
> > smoothly. AF_XDP needs all sort of netlink stuff.  
> 
> sure. that has nothing to do with ownership of attachment.

AFAICT the allure to John is the uniform API, and no need for netlink.
I was explaining how that's a bad goal to have.

> > Netlink gives us the notification mechanism which is how we solve
> > coordination across daemons (something that BPF subsystem is only 
> > now trying to solve).  
> 
> I don't care about notifications on attachment and no one is trying to
> solve that as far as I can see. It's not a problem to solve in the first place.

Well, it's the existing solution to the "ownership" problem.
I think most people simply didn't know about it.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25 18:20                               ` Jakub Kicinski
@ 2020-03-25 19:14                                 ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-25 19:14 UTC
  To: Jakub Kicinski
  Cc: John Fastabend, Toke Høiland-Jørgensen,
	Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On Wed, Mar 25, 2020 at 11:20:05AM -0700, Jakub Kicinski wrote:
> On Wed, 25 Mar 2020 11:06:38 -0700 Alexei Starovoitov wrote:
> > On Tue, Mar 24, 2020 at 07:15:54PM -0700, Jakub Kicinski wrote:
> > > It is the way to configure XDP today, so it's only natural to
> > > scrutinize the attempts to replace it.   
> > 
> > No one is replacing it.
> 
> You're blocking extensions to the existing API, that means that part 
> of the API is frozen and is being replaced.

Two things are wrong in the above statement:
1. Extensions are not frozen in general.
2. The API is not being replaced. Ownership is lacking and needs to be added.
   It's a new concept, not a replacement.

> > > Also I personally don't think you'd see this much push back trying to
> > > add bpf_link-based stuff to cls_bpf, that's an add-on. XDP is
> > > integrated very fundamentally with the networking stack at this point.
> > >   
> > > > Details are important and every case is different. So imo:
> > > > converting ethtool to netlink - great stuff.
> > > > converting netdev irq/queue management to netlink - great stuff too.
> > > > adding more netlink api for xdp - really bad idea.  
> > > 
> > > Why is it a bad idea?  
> > 
> > I explained in three other emails. tldr: lack of ownership.
> 
> Those came later, I think, thanks.
> 
> Fine, maybe one day someone will find the extension you're proposing
> useful. To me that's not a justification to freeze the existing API
> (you said "adding more netlink api for xdp - really bad idea").
> 
> Besides, if you look at Toke's libxdp work (which exists), what's the
> ownership of the attached program? Whichever application touched it
> last?
> 
> The whole auto-detachment thing may work nicely in cls_bpf and
> sub-programs attached to the root XDP program, but it's a bit hard 
> to imagine how it's useful for the singleton root XDP program.

bpf_link introduces two new things: 1. ownership, 2. auto-detach.
They are both useful. It looks like the use case for 2 is obvious, but
1 can exist without being FD-based.

> 
> > > There are plenty things which will only be available over netlink.
> > > Configuring the interface so installing the XDP program is possible
> > > (disabling features, configuring queues etc.). Chances are user gets
> > > the ifindex of the interface to attach to over netlink in the first
> > > place. The queue configuration (which you agree belongs in netlink)
> > > will definitely get more complex to allow REDIRECTs to work more
> > > smoothly. AF_XDP needs all sort of netlink stuff.  
> > 
> > sure. that has nothing to do with ownership of attachment.
> 
> AFAICT the allure to John is the uniform API, and no need for netlink.
> I was explaining how that's a bad goal to have.

You clearly misunderstood. Neither John nor I were saying that there is
no need for netlink.

> 
> > > Netlink gives us the notification mechanism which is how we solve
> > > coordination across daemons (something that BPF subsystem is only 
> > > now trying to solve).  
> > 
> > I don't care about notifications on attachment and no one is trying to
> > solve that as far as I can see. It's not a problem to solve in the first place.
> 
> Well, it's the existing solution to the "ownership" problem.
> I think most people simply didn't know about it.

Toke's set introduces the same thing to XDP as
commit 7dd68b3279f1 ("bpf: Support replacing cgroup-bpf program in MULTI mode")
did for cgroup-bpf.
Both are trying to address the same issue, and neither actually solves it.
That cgroup-bpf commit looked like a great solution just three months ago.
Now it's clear it's not fixing the underlying issue.
Same thing with Toke's fix. It feels good now, but it is going to be useless
without introducing ownership.
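The "expected program" check that both changes rely on behaves like a compare-and-swap on the attachment slot. Below is a minimal userspace model of those semantics, not the kernel implementation; `struct prog`, `struct xdp_slot`, `xdp_replace_expected()` and the choice of -EEXIST as the error are illustrative assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded XDP program; only its identity matters. */
struct prog {
    int id;
};

/* Hypothetical per-netdev slot: at most one program, NULL if none attached. */
struct xdp_slot {
    _Atomic(struct prog *) cur;
};

/*
 * Install new_prog only if the currently attached program is 'expected'
 * (NULL meaning "nothing attached"), like cmpxchg. Returns 0 on success,
 * or -EEXIST (illustrative) if someone else changed the slot first.
 */
static int xdp_replace_expected(struct xdp_slot *slot, struct prog *expected,
                                struct prog *new_prog)
{
    struct prog *old = expected;

    if (atomic_compare_exchange_strong(&slot->cur, &old, new_prog))
        return 0;
    /* On failure, 'old' now holds the program that is actually attached. */
    return -EEXIST;
}
```

The check only rejects a stale replace: any process that can name the current program can still pass it as the expected one, which is the missing ownership piece described above.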

Why doesn't that cgroup-bpf commit fix it?
Take a look at that commit. The first paragraph is
"
The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.
"
Then the description goes into explaining how one service wants to replace its prog.
In this case it sort of works because it's a single C++ service with multiple
progs that do different things. There is a 'centralized daemon' (kinda) that
can try to orchestrate. It breaks when there are two C++ services.
That replace_bpf_fd is trying to be a link identifier. But the kernel lacks
that identifier.
I think it would be simpler to understand the ownership if bpf_link had
its own IDR for every link. Every attachment (link) would be an object with its
own ID. We could then iterate over all attachments with GET_NEXT_ID, for example.
But that's nice to have. Not strictly necessary.
The ownership of the attachment needs to be permanent. It needs to belong
to a task and other tasks should not be able to break that attachment.
That cgroup-bpf commit addresses part of the issue by "inventing" an identifier
for the attachment (in the form of the prog_fd that is supposed to be in that
attachment), but does not address the owner part of the attachment.
Only the task(s) that own that attachment should be able to modify the attachment.

One can imagine how attachment ID can be completely implemented with netlink.
Is it a good idea? Not really, because there is no mechanism to transfer the ownership.
Having an FD that points to a kernel object that represents the ownership makes it
easy for user space to pass the ownership (by passing an FD).
Auto-detach part comes for free with FD based bpf_link, but that's not the main feature.
Maybe we will add a flag to disable auto-detach too.
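Passing ownership "by passing an FD" uses ordinary Unix FD passing (SCM_RIGHTS over a Unix-domain socket). A minimal sketch: a pipe FD stands in for a bpf_link FD here as an assumption, since the kernel treats any FD the same way in transit, and `send_fd()`/`recv_fd()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a Unix-domain socket. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char byte = 'x'; /* SCM_RIGHTS must ride along with at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr; /* ensures correct alignment of the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns a new FD for the same object, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own FD referring to the same kernel object; once the sender closes its copy, the receiver is the sole holder, which is the ownership hand-off described above.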

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-25  9:38                         ` Toke Høiland-Jørgensen
  2020-03-25 17:55                           ` Alexei Starovoitov
@ 2020-03-26  0:16                           ` Andrii Nakryiko
  2020-03-26  5:13                             ` Jakub Kicinski
                                               ` (2 more replies)
  1 sibling, 3 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-26  0:16 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Wed, Mar 25, 2020 at 2:38 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Mar 24, 2020 at 3:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> >>
> >> >> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >> >>
> >> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> >> >>
> >> >> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
> >> >> >> > <john.fastabend@gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> Jakub Kicinski wrote:
> >> >> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
> >> >> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
> >> >> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
> >> >> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> >> > > >>
> >> >> >> >> > > >> While it is currently possible for userspace to specify that an existing
> >> >> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
> >> >> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
> >> >> >> >> > > >>
> >> >> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
> >> >> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
> >> >> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
> >> >> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
> >> >> >> >> > > >>
> >> >> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
> >> >> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
> >> >> >> >> > > >> to discover whether the kernel supports the new attribute.
> >> >> >> >> > > >>
> >> >> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> >> >> > > >
> >> >> >> >> > > > I didn't know we wanted to go ahead with this...
> >> >> >> >> > >
> >> >> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
> >> >> >> >> > > happening with that, though. So since this is a straight-forward
> >> >> >> >> > > extension of the existing API, that doesn't carry a high implementation
> >> >> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
> >> >> >> >> > > something similar in bpf_link as well, of course.
> >> >> >> >> >
> >> >> >> >> > I'm not really in the loop, but from what I overheard - I think the
> >> >> >> >> > bpf_link may be targeting something non-networking first.
> >> >> >> >>
> >> >> >> >> My preference is to avoid building two different APIs one for XDP and another
> >> >> >> >> for everything else. If we have userlands that already understand links and
> >> >> >> >> pinning support is on the way imo lets use these APIs for networking as well.
> >> >> >> >
> >> >> >> > I agree here. And yes, I've been working on extending bpf_link into
> >> >> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
> >> >> >> > details, but the patch is ready. I'm going to post it as an RFC to get
> >> >> >> > the discussion started, before we do this for XDP.
> >> >> >>
> >> >> >> Well, my reason for being skeptical about bpf_link and proposing the
> >> >> >> netlink-based API is actually exactly this, but in reverse: With
> >> >> >> bpf_link we will be in the situation that everything related to a netdev
> >> >> >> is configured over netlink *except* XDP.
> >> >> >
> >> >> > One can argue that everything related to use of BPF is going to be
> >> >> > uniform and done through BPF syscall? Given variety of possible BPF
> >> >> > hooks/targets, using custom ways to attach for all those many cases is
> >> >> > really bad as well, so having a unifying concept and single entry to
> >> >> > do this is good, no?
> >> >>
> >> >> Well, it depends on how you view the BPF subsystem's relation to the
> >> >> rest of the kernel, I suppose. I tend to view it as a subsystem that
> >> >> provides a bunch of functionality, which you can setup (using "internal"
> >> >> BPF APIs), and then attach that object to a different subsystem
> >> >> (networking) using that subsystem's configuration APIs.
> >> >>
> >> >> Seeing as this really boils down to a matter of taste, though, I'm not
> >> >> sure we'll find agreement on this :)
> >> >
> >> > Yeah, seems like so. But then again, your view and reality don't seem
> >> > to correlate completely. cgroup, a lot of tracing,
> >> > flow_dissector/lirc_mode2 attachments all are done through BPF
> >> > syscall.
> >>
> >> Well, I wasn't talking about any of those subsystems, I was talking
> >> about networking :)
> >
> > So it's not "BPF subsystem's relation to the rest of the kernel" from
> > your previous email, it's now only "talking about networking"? Since
> > when the rest of the kernel is networking?
>
> Not really, I would likely argue the same for any other subsystem, I

And you would likely lose that argument :) You already agreed that for
tracing this is not the case. BPF is not attached by writing text into
ftrace's debugfs entries. Same for cgroups, we don't
create/update/write special files in cgroupfs, we have an explicit
attachment API in BPF.

BTW, kprobes started out with the same model as XDP has right now. You
had to do a bunch of magic writes into various debugfs files to attach a
BPF program. If the user-space application crashed, the kprobe stayed
attached. This was horrible and led to many problems in real-world
production uses. So a completely different interface was created,
allowing it to be done through perf_event_open(), which creates an anonymous
inode for the BPF program attachment. That allowed a crashing program to
auto-detach its kprobe without harming the production use case.

Now we are coming after cgroup BPF programs, which have similar issues
and similar pains in production. cgroup BPF progs actually have extra
problems: user-space applications can accidentally
replace a critical cgroup program and ruin the day for many folks that
have to deal with production breakage after that. Which is why I'm
implementing bpf_link with all its properties: to solve real pain and
real problem.

Now for XDP. It has the same flawed model. And even if it seems to you
that it's not a big issue, and even if Jakub thinks we are trying to
solve non-existing problem, it is a real problem and a real concern
from people that have to support XDP in production with many
well-meaning developers developing BPF applications independently.
Copying what you wrote in another thread:

> Setting aside the question of which is the best abstraction to represent
> an attachment, it seems to me that the actual behavioural problem (XDP
> programs being overridden by mistake) would be solvable by this patch,
> assuming well-behaved userspace applications.

... this is a horrible and unrealistic assumption that we just cannot
make and accept. However well-behaved userspace applications are, they
are written by people that make mistakes. And rather than blissfully
expecting that everything will be fine, we want enforcement in
place that will prevent some buggy application from wreaking havoc in
production.

> just prefer to limit myself to talking about things I actually know
> something about. Hence, networking :)
>
> > But anyways, I think John addressed modern XDP networking issues in
> > his email very well already.
>
> Going to reply to that email next...
>
> >> In particular, networking already has a consistent and fairly
> >> well-designed configuration mechanism (i.e., netlink) that we are
> >> generally trying to move more functionality *towards* not *away from*
> >> (see, e.g., converting ethtool to use netlink).
> >>
> >> > LINK_CREATE provides an opportunity to finally unify all those
> >> > different ways to achieve the same "attach my BPF program to some
> >> > target object" semantics.
> >>
> >> Well I also happen to think that "attach a BPF program to an object" is
> >> the wrong way to think about XDP. Rather, in my mind the model is
> >> "instruct the netdevice to execute this piece of BPF code".
> >
> > That can't be reconciled, so no point of arguing :) But thinking about
> > BPF in general, I think it's closer to attach BPF program thinking
> > (especially all the fexit/fentry, kprobe, etc), where objects that BPF
> > is attached to is not "active" in the sense of "calling BPF", it's
> > more of BPF system setting things up (attaching?) in such a way that
> > BPF program is executed when appropriate.
>
> I'd tend to agree with you on most of the tracing stuff, but not on
> this. But let's just agree to disagree here :)
>
> >> >> >> Other than that, I don't see any reason why the bpf_link API won't work.
> >> >> >> So I guess that if no one else has any problem with BPF insisting on
> >> >> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
> >> >> >
> >> >> > Apart from derogatory remark,
> >> >>
> >> >> Yeah, should have left out the 'snowflake' bit, sorry about that...
> >> >>
> >> >> > BPF is a bit special here, because it requires every potential BPF
> >> >> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
> >> >> > program(s) and execute them with special macro. So like it or not, it
> >> >> > is special and each driver supporting BPF needs to implement this BPF
> >> >> > wiring.
> >> >>
> >> >> All that is about internal implementation, though. I'm bothered by the
> >> >> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
> >> >> what you use to configure your netdev except if you want to attach an
> >> >> XDP program to it").
> >> >>
> >> >
> >> > See my reply to David. Depends on where you define user API. Is it
> >> > libbpf API, which is what most users are using? Or kernel API?
> >>
> >> Well I'm talking about the kernel<->userspace API, obviously :)
> >>
> >> > If everyone is using libbpf, does kernel system (bpf syscall vs
> >> > netlink) matter all that much?
> >>
> >> This argument works the other way as well, though: If libbpf can
> >> abstract the subsystem differences and provide a consistent interface to
> >> "the BPF world", why does BPF need to impose its own syscall API on the
> >> networking subsystem?
> >
> > bpf_link in libbpf started as user-space abstraction only, but we
> > realized that it's not enough and there is a need to have proper
> > kernel support and corresponding kernel object, so it's not just
> > user-space API concerns.
> >
> > As for having netlink interface for creating link only for XDP. Why
> > duplicating and maintaining 2 interfaces?
>
> Totally agree; why do we need two interfaces? Let's keep the one we
> already have - the netlink interface! :)
>
> > All the other subsystems will go through bpf syscall, only XDP wants
> > to (also) have this through netlink. This means duplication of UAPI
> > for no added benefit. It's a LINK_CREATE operations, as well as
> > LINK_UPDATE operations. Do we need to duplicate LINK_QUERY (once its
> > implemented)? What if we'd like to support some other generic bpf_link
> > functionality, would it be ok to add it only to bpf syscall, or we
> > need to duplicate this in netlink as well?
>
> You're saying that like we didn't already have the netlink API. We
> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> existing API; essentially you're saying we should keep the old API in a
> crippled state in order to promote your (proposed) new API.

This is the fundamental disagreement that we seem to have. XDP's BPF
program attachment is not in any way equivalent to bpf_link. So no,
netlink API currently doesn't have anything that's close to bpf_link.
Let me try to summarize what bpf_link is and what are its fundamental
properties regardless of type of BPF programs.

1. bpf_link represents a connection (pairing?) of a BPF program and the
BPF hook it is attached to. The BPF hook could be a perf event, cgroup,
netdev, etc. It's a completely independent object in itself, alongside
bpf_map and bpf_prog, with its own lifetime and kernel
representation. To a user-space application it is returned as an
installed FD, similar to a loaded BPF program or BPF map. It is
important that it's not just a BPF program, because a BPF program can be
attached to multiple BPF hooks (e.g., the same XDP program can be attached
to multiple interfaces; the same kprobe handler can be installed multiple
times), which means that a BPF program FD isn't enough to
uniquely identify one specific BPF program attachment in order to detach
it or query it. Having a kernel object for this allows encapsulating
all the details of what is attached where, and presenting
user-space with a single handle (FD) to work with.

2. Due to having an FD associated with bpf_link, it's now possible to
talk about "owning" a bpf_link. If an application created a link and never
shared its FD with any other application, it is the sole owner of it.
But it also means that you can share it, if you need to. Now, once the
application closes the FD, or the app crashes and the kernel automatically
closes that FD, the bpf_link refcount is decremented. If it was the last or
only FD, it will trigger automatic detachment and clean-up of that
particular BPF program attachment. Note, not a clean-up of the BPF
program, which can still be attached somewhere else: only that
particular attachment.
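The refcount-driven auto-detach in point 2 can be modelled in a few lines. This is a hypothetical userspace sketch (`struct hook`, `struct link`, `link_init()`, `link_get()` and `link_put()` are illustrative names, not kernel APIs); it shows that dropping the last reference detaches only that one attachment, while the same program stays attached elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook with a single attachment slot (e.g. one netdev). */
struct hook {
    int attached_prog_id; /* 0 means nothing attached */
};

/* Hypothetical link: pairs one program with one hook, refcounted like an FD-backed object. */
struct link {
    int refcnt;
    int prog_id;
    struct hook *hook;
};

/* Creating the link attaches the program; the creator's FD holds the first reference. */
static void link_init(struct link *l, struct hook *h, int prog_id)
{
    l->refcnt = 1;
    l->prog_id = prog_id;
    l->hook = h;
    h->attached_prog_id = prog_id;
}

/* Sharing the FD (dup, fork, SCM_RIGHTS) takes another reference. */
static void link_get(struct link *l)
{
    l->refcnt++;
}

/* Closing an FD (explicitly or on crash) drops a reference; the last one
 * detaches the program from this hook only. */
static void link_put(struct link *l)
{
    assert(l->refcnt > 0);
    if (--l->refcnt == 0) {
        l->hook->attached_prog_id = 0;
        l->hook = NULL;
    }
}
```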

3. This derives from the concept of ownership of bpf_link. Once
a bpf_link is attached, no other application that doesn't own that
bpf_link can replace, detach or modify the link. For some cases it
doesn't matter. E.g., for tracing, all attachments to the same fentry
trampoline are completely independent. But for other cases this is a
crucial property. E.g., when you attach a BPF program in exclusive
(single) mode, it means that particular cgroup and any of its child
cgroups can't have any more BPF programs attached. This is important for
container management systems to enforce invariants and correct
functioning of the system. Right now it's very easy to violate that -
you just go and attach your own BPF program, and previous BPF program
gets automatically detached without original application that put it
there knowing about this. Chaos ensues after that and real people have
to deal with this. Which is why existing
BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
bpf_link support.
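The exclusivity in point 3 boils down to the hook remembering which link owns it and rejecting everyone else. A hypothetical sketch; the struct and function names and the error codes (`-EBUSY`, `-EPERM`) are illustrative choices, not the actual kernel ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical owner token; in the kernel this role is played by the
 * bpf_link object behind an FD. */
struct link {
    const char *owner_name;
};

/* Hypothetical exclusive hook: at most one program, owned by one link. */
struct excl_hook {
    const struct link *owner; /* NULL if nothing is attached */
    int prog_id;
};

/* Attach via a link: refused while another link holds the hook. */
static int link_attach(struct excl_hook *h, const struct link *l, int prog_id)
{
    assert(l != NULL);
    if (h->owner)
        return -EBUSY;
    h->owner = l;
    h->prog_id = prog_id;
    return 0;
}

/* Replace the program: only the owning link may do this. */
static int link_update(struct excl_hook *h, const struct link *l, int prog_id)
{
    if (h->owner != l)
        return -EPERM;
    h->prog_id = prog_id;
    return 0;
}

/* Detach: again owner-only; afterwards anyone may attach. */
static int link_detach(struct excl_hook *h, const struct link *l)
{
    if (h->owner != l)
        return -EPERM;
    h->owner = NULL;
    h->prog_id = 0;
    return 0;
}
```

Unlike the legacy model, a second application's attach attempt fails loudly instead of silently evicting the program that is already there.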

Those same folks have similar concern with XDP. In the world where
container management installs "root" XDP program which other user
applications can plug into (libxdp use case, right?), it's crucial to
ensure that this root XDP program is not accidentally overwritten by
some well-meaning, but not overly cautious developer experimenting in
his own container with XDP programs. This is where bpf_link ownership
plays a huge role. Tupperware agent (FB's container management agent)
would install root XDP program and will hold onto this bpf_link
without sharing it with other applications. That will guarantee that
the system will be stable and can't be compromised.

Now, those were the fundamental things, but I'd also like to touch on the
"nice things" we get with that. Having a proper kernel object representing
a single instance of a BPF program attached to some other kernel object
allows building a uniform and consistent API around bpf_link with
the same semantics. We can do LINK_UPDATE and atomically replace the
BPF program inside the established bpf_link. It's applicable to all
types of BPF program attachment and can be done in a way that ensures
no BPF program invocation is skipped while BPF programs are swapped
(because at the lowest level it boils down to an atomic pointer swap).
Of course not all bpf_links might have this support initially, but
we'll establish a lot of common infrastructure which will make it
simpler, faster and more reliable to add this functionality.
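The "atomic pointer swap" at the core of LINK_UPDATE can be illustrated with C11 atomics. A minimal sketch under stated assumptions: `prog_fn`, `struct hook`, `run_hook()` and `hook_update()` are hypothetical, and the grace period a real kernel needs before releasing the old program (e.g. RCU) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical program: a function run once per packet. */
typedef int (*prog_fn)(int pkt);

/* Hypothetical hook holding the current program pointer. */
struct hook {
    _Atomic(prog_fn) prog;
};

/* Data path: load the pointer once per invocation, so each packet is
 * handled by either the old or the new program, never by neither. */
static int run_hook(struct hook *h, int pkt)
{
    prog_fn fn = atomic_load_explicit(&h->prog, memory_order_acquire);

    return fn ? fn(pkt) : pkt; /* no program attached: pass through */
}

/* Control path: publish the new program with a single atomic exchange and
 * return the old one. A kernel would wait for in-flight readers before
 * freeing it; that step is omitted here. */
static prog_fn hook_update(struct hook *h, prog_fn new_fn)
{
    return atomic_exchange_explicit(&h->prog, new_fn, memory_order_acq_rel);
}

static int prog_drop(int pkt) { (void)pkt; return -1; }
static int prog_pass(int pkt) { return pkt; }
```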

Similarly, we can have LINK_QUERY, which will return essential
information about attachment, particular to a specific kind of BPF
program attachment: BPF program ID itself, attach type, cgroup
ID/ifindex/perf event/whatever else we decide is good to report. If we
start allocating IDs for bpf_link, as Alexei mentioned, now you'll be
able to consistently iterate over all attached BPF programs,
regardless of their types, without a need to first iterate all
possible cgroups/netdevs, etc.

To wrap up: I agree, a consistent API is not a goal in itself, as
Jakub mentioned. But it is a worthy goal nevertheless, especially if
it doesn't cost anything extra. It makes kernel developers' lives
easier, it makes library developers' lives easier, and it makes BPF
overall easier to understand and learn. If we have 10 different
bpf_link types, and 9 out of 10 are going to go through bpf syscall,
but 1 (guess which one?) will be done through netlink, it's not the
end of the world, of course, but that does sound weird. And people
making it sound like developing and attaching BPF program is as
trivial as using one netlink command are just trying to mislead.
Whoever does XDP development will have to learn quite a bit more about
BPF, so making BPF story more consistent and simpler is important for
networking people (who care about XDP, of course) as much as for any
tracing BPF developer.


>
> -Toke
>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26  0:16                           ` Andrii Nakryiko
@ 2020-03-26  5:13                             ` Jakub Kicinski
  2020-03-26 18:09                               ` Andrii Nakryiko
  2020-03-26 19:40                               ` Alexei Starovoitov
  2020-03-26 10:04                             ` Lorenz Bauer
  2020-03-26 12:35                             ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
  2 siblings, 2 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-26  5:13 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Wed, 25 Mar 2020 17:16:13 -0700 Andrii Nakryiko wrote:
> > >> Well, I wasn't talking about any of those subsystems, I was talking
> > >> about networking :)  
> > >
> > > So it's not "BPF subsystem's relation to the rest of the kernel" from
> > > your previous email, it's now only "talking about networking"? Since
> > > when the rest of the kernel is networking?  
> >
> > Not really, I would likely argue the same for any other subsystem, I  
> 
> And you would likely lose that argument :) You already agreed that for
> tracing this is not the case. BPF is not attached by writing text into
> ftrace's debugfs entries. Same for cgroups, we don't
> create/update/write special files in cgroupfs, we have an explicit
> attachment API in BPF.
> 
> BTW, kprobes started out with the same model as XDP has right now. You
> had to do a bunch of magic writes into various debugfs files to attach
> BPF program. If user-space application crashed, kprobe stayed
> attached. This was horrible and led to many problems in real world
> production uses. So a completely different interface was created,
> allowing to do it through perf_event_open() and created anonymous
> inode for BPF program attachment. That allowed crashing program to
> auto-detach kprobe and not harm production use case.
> 
> Now we are coming after cgroup BPF programs, which have similar issues
> and similar pains in production. cgroup BPF progs actually have extra
> problems: user-space applications can accidentally
> replace a critical cgroup program and ruin the day for many folks that
> have to deal with production breakage after that. Which is why I'm
> implementing bpf_link with all its properties: to solve real pain and
> real problem.
>
> Now for XDP. It has same flawed model. And even if it seems to you
> that it's not a big issue, and even if Jakub thinks we are trying to
> solve non-existing problem, it is a real problem and a real concern
> from people that have to support XDP in production with many

More than happy to talk to those folks, and see the tickets.

Toke has actual user space code which needs his extension, and for
which "ownership" makes no difference, as it would just pass to
whoever touched the program last.

> well-meaning developers developing BPF applications independently.

There is only a single program that can be attached to the XDP hook, so
the "everybody attaches their program" model does not apply.

TW agent should just listen on netlink notifications to see if someone
replaced its program. cgroups have multi-attachment and no notifications
(although not sure anyone was explicitly asking for links there,
either).
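Listening for replacements as suggested here means subscribing to rtnetlink link notifications (RTMGRP_LINK) and reading the program ID from the nested IFLA_XDP attribute of each RTM_NEWLINK message. A rough sketch, assuming Linux uapi headers that define IFLA_XDP_PROG_ID; `open_link_monitor()` and `xdp_prog_id_from_msg()` are hypothetical helpers, and the recv() loop that would connect them is left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Open an rtnetlink socket subscribed to link change notifications. */
static int open_link_monitor(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = RTMGRP_LINK;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Extract the ifindex and attached XDP program ID from one RTM_NEWLINK
 * message. Returns 0 if IFLA_XDP_PROG_ID was found, -1 otherwise. */
static int xdp_prog_id_from_msg(struct nlmsghdr *nlh, int *ifindex,
                                __u32 *prog_id)
{
    struct ifinfomsg *ifi;
    struct rtattr *rta;
    int len;

    if (nlh->nlmsg_type != RTM_NEWLINK ||
        nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
        return -1;

    ifi = NLMSG_DATA(nlh);
    *ifindex = ifi->ifi_index;
    rta = (struct rtattr *)((char *)ifi + NLMSG_ALIGN(sizeof(*ifi)));
    len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));

    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
        if ((rta->rta_type & NLA_TYPE_MASK) != IFLA_XDP)
            continue;
        /* IFLA_XDP is a nested attribute: walk its children. */
        struct rtattr *nested = RTA_DATA(rta);
        int nlen = RTA_PAYLOAD(rta);

        for (; RTA_OK(nested, nlen); nested = RTA_NEXT(nested, nlen)) {
            if (nested->rta_type == IFLA_XDP_PROG_ID &&
                RTA_PAYLOAD(nested) >= (int)sizeof(__u32)) {
                memcpy(prog_id, RTA_DATA(nested), sizeof(__u32));
                return 0;
            }
        }
    }
    return -1;
}
```

An agent would compare the reported ID with the one it installed and react when they differ; this detects a replacement after the fact rather than preventing it.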

In production a no-op XDP program is likely to be attached from the
moment machine boots, to avoid traffic interruption and the risk of
something going wrong with the driver when switching from the skb to the
XDP datapath. And then the program is only replaced, not detached.

Not to mention the fact that networking applications generally don't
want to remove their policy from the kernel when they crash :/

> Now, those were fundamental things, but I'd like to touch on a "nice
> things we get with that". Having a proper kernel object representing
> single instance of attached BPF program to some other kernel object
> allows to build an uniform and consistent API around bpf_link with
> same semantics. We can do LINK_UPDATE and allow to atomically replace
> BPF program inside the established bpf_link. It's applicable to all
> types of BPF program attachment and can be done in a way that ensures
> no BPF program invocation is skipped while BPF programs are swapped
> (because at the lowest level it boils down to an atomic pointer swap).
> Of course not all bpf_links might have this support initially, but
> we'll establish a lot of common infrastructure which will make it
> simpler, faster and more reliable to add this functionality.

XDP replace is already atomic, no packet will be passed without either
old or new program executed on it.

> And to wrap up. I agree, consistent API is not a goal in itself, as
> Jakub mentioned. But it is a worthy goal nevertheless, especially if
> it doesn't cost anything extra. It makes kernel developers lives

Not sure how having two interfaces instead of one makes kernel
developers' lives easier.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26  0:16                           ` Andrii Nakryiko
  2020-03-26  5:13                             ` Jakub Kicinski
@ 2020-03-26 10:04                             ` Lorenz Bauer
  2020-03-26 17:47                               ` Jakub Kicinski
                                                 ` (2 more replies)
  2020-03-26 12:35                             ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
  2 siblings, 3 replies; 112+ messages in thread
From: Lorenz Bauer @ 2020-03-26 10:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Thu, 26 Mar 2020 at 00:16, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
[...]
>
> Those same folks have similar concern with XDP. In the world where
> container management installs "root" XDP program which other user
> applications can plug into (libxdp use case, right?), it's crucial to
> ensure that this root XDP program is not accidentally overwritten by
> some well-meaning, but not overly cautious developer experimenting in
> his own container with XDP programs. This is where bpf_link ownership
> plays a huge role. Tupperware agent (FB's container management agent)
> would install root XDP program and will hold onto this bpf_link
> without sharing it with other applications. That will guarantee that
> the system will be stable and can't be compromised.

Thanks for the extensive explanation Andrii.

This is what I imagine you're referring to: Tupperware creates a new network
namespace ns1 and a veth0<>veth1 pair, moves one of the veth devices
(let's say veth1) into ns1 and runs an application in ns1. On which veth
would the XDP program go?

The way I understand it, veth1 would have XDP, and the application in ns1 would
be prevented from attaching a new program? Maybe you can elaborate on this
a little.

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26  0:16                           ` Andrii Nakryiko
  2020-03-26  5:13                             ` Jakub Kicinski
  2020-03-26 10:04                             ` Lorenz Bauer
@ 2020-03-26 12:35                             ` Toke Høiland-Jørgensen
  2020-03-26 19:06                               ` Andrii Nakryiko
  2020-03-26 19:58                               ` Alexei Starovoitov
  2 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-26 12:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> Now for XDP. It has same flawed model. And even if it seems to you
> that it's not a big issue, and even if Jakub thinks we are trying to
> solve non-existing problem, it is a real problem and a real concern
> from people that have to support XDP in production with many
> well-meaning developers developing BPF applications independently.
> Copying what you wrote in another thread:
>
>> Setting aside the question of which is the best abstraction to represent
>> an attachment, it seems to me that the actual behavioural problem (XDP
>> programs being overridden by mistake) would be solvable by this patch,
>> assuming well-behaved userspace applications.
>
> ... this is a horrible and unrealistic assumption that we just cannot
> make and accept. However well-behaved userspace applications are, they
> are written by people that make mistakes. And rather than blissfully
> expect that everything will be fine, we want to have enforcements in
> place that will prevent some buggy application to wreck havoc in
> production.

Look, I'm not trying to tell you how to manage your internal systems.
I'm just objecting to your assertion that your deployment model is the
only one that can possibly work, and the refusal to consider other
alternatives that comes with it.

>> You're saying that like we didn't already have the netlink API. We
>> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
>> this is just adding LINK_UPDATE. It's a straight-forward fix of an
>> existing API; essentially you're saying we should keep the old API in a
>> crippled state in order to promote your (proposed) new API.
>
> This is the fundamental disagreement that we seem to have. XDP's BPF
> program attachment is not in any way equivalent to bpf_link. So no,
> netlink API currently doesn't have anything that's close to bpf_link.
> Let me try to summarize what bpf_link is and what are its fundamental
> properties regardless of type of BPF programs.

First of all, thank you for this summary; that is very useful!

> 1. bpf_link represents a connection (pairing?) of BPF program and some
> BPF hook it is attached to. BPF hook could be perf event, cgroup,
> netdev, etc. It's a completely independent object in itself, along the
> bpf_map and bpf_prog, which has its own lifetime and kernel
> representation. To user-space application it is returned as an
> installed FD, similar to loaded BPF program and BPF map. It is
> important that it's not just a BPF program, because BPF program can be
> attached to multiple BPF hooks (e.g., same XDP program can be attached
> to multiple interface; same kprobe handler can be installed multiple
> times), which means that having BPF program FD isn't enough to
> uniquely represent that one specific BPF program attachment and detach
> it or query it. Having a kernel object for this allows encapsulating
> all the details of what is attached where, and presenting to
> user-space a single handle (FD) to work with.

For XDP there is already a unique handle, it's just implicit: Each
netdev can have exactly one XDP program loaded. So I don't really see
how bpf_link adds anything, other than another API for the same thing?

> 2. Due to having an FD associated with bpf_link, it's possible to
> talk about "owning" bpf_link. If application created link and never
> shared its FD with any other application, it is the sole owner of it.
> But it also means that you can share it, if you need it. Now, once
> application closes FD or app crashes and kernel automatically closes
> that FD, bpf_link refcount is decremented. If it was the last or only
> FD, it will trigger automatic detachment and cleanup of that
> particular BPF program attachment. Note, not a clean up of BPF
> program, which can still be attached somewhere else: only that
> particular attachment.
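To make points 1 and 2 concrete, here is a minimal userspace sketch in C of the lifetime rules as described, assuming a simple refcounted link object. The types demo_prog, demo_netdev and demo_link and their functions are hypothetical illustrations, not the kernel's struct bpf_link or any libbpf API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the link lifetime rules quoted above.
 * demo_prog, demo_netdev and demo_link are illustrative names only; they
 * are not kernel or libbpf types. */
struct demo_prog {
	int id;
};

struct demo_netdev {
	const struct demo_prog *xdp_prog; /* attached program, or NULL */
};

struct demo_link {
	struct demo_netdev *dev;      /* the hook this attachment lives on */
	const struct demo_prog *prog; /* the program being attached */
	int refcnt;                   /* open "FDs" referring to this link */
};

/* Create one attachment of prog to dev; the creator holds one reference. */
static void demo_link_create(struct demo_link *l, struct demo_netdev *dev,
			     const struct demo_prog *prog)
{
	assert(l != NULL && dev != NULL && prog != NULL);
	l->dev = dev;
	l->prog = prog;
	l->refcnt = 1;
	dev->xdp_prog = prog;
}

/* Sharing the link (e.g. handing its FD to another process) adds a reference. */
static void demo_link_get(struct demo_link *l)
{
	l->refcnt++;
}

/* Closing an FD, explicitly or because the process exited, drops a reference.
 * Dropping the last one detaches only this attachment; the program object is
 * not freed and may still be attached to other hooks through other links. */
static void demo_link_put(struct demo_link *l)
{
	assert(l->refcnt > 0);
	if (--l->refcnt == 0)
		l->dev->xdp_prog = NULL;
}
```

In this model the same program attached to two devices yields two separate links, so dropping the last reference to one link detaches only that attachment and leaves the other device untouched.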

This behaviour is actually one of my reservations against bpf_link for
XDP: I think that automatically detaching XDP programs when the FD is
closed is very much the wrong behaviour. An XDP program processes
packets, and when loading one I very much expect it to keep doing that
until I explicitly tell it to stop.

> 3. This derives from the concept of ownership of bpf_link. Once
> bpf_link is attached, no other application that doesn't own that
> bpf_link can replace, detach or modify the link. For some cases it
> doesn't matter. E.g., for tracing, all attachments to the same fentry
> trampoline are completely independent. But for other cases this is a
> crucial property. E.g., when you attach a BPF program in exclusive
> (single) mode, it means that particular cgroup and any of its children
> cgroups can't have any more BPF programs attached. This is important for
> container management systems to enforce invariants and correct
> functioning of the system. Right now it's very easy to violate that -
> you just go and attach your own BPF program, and previous BPF program
> gets automatically detached without original application that put it
> there knowing about this. Chaos ensues after that and real people have
> to deal with this. Which is why existing
> BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> bpf_link support.

I can totally see how having an option to enforce a policy such as
locking out others from installing cgroup BPF programs is useful. But
such an option is just that: policy. So building this policy in as a
fundamental property of the API seems like a bad idea; that is
effectively enforcing policy in the kernel, isn't it?

> Those same folks have similar concern with XDP. In the world where
> container management installs "root" XDP program which other user
> applications can plug into (libxdp use case, right?), it's crucial to
> ensure that this root XDP program is not accidentally overwritten by
> some well-meaning, but not overly cautious developer experimenting in
> his own container with XDP programs. This is where bpf_link ownership
> plays a huge role. Tupperware agent (FB's container management agent)
> would install root XDP program and will hold onto this bpf_link
> without sharing it with other applications. That will guarantee that
> the system will be stable and can't be compromised.
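The ownership guarantee argued for here can be sketched with a small C model of a single-program hook, assuming that a hook owned by a link rejects plain program attachment and can only be changed through the link handle. Every name and error code below is a hypothetical choice for illustration; it is not the kernel's implementation or uAPI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of exclusive ownership of a single-program hook.
 * All names and error codes are illustrative assumptions. */
struct demo_prog {
	int id;
};

struct demo_link;

struct demo_netdev {
	const struct demo_prog *xdp_prog; /* attached program, or NULL */
	const struct demo_link *owner;    /* link holding the hook, or NULL */
};

struct demo_link {
	struct demo_netdev *dev;
};

/* Old-style attach by program: silently replaces whatever is there,
 * unless a link currently owns the hook. */
static int demo_prog_attach(struct demo_netdev *dev, const struct demo_prog *prog)
{
	if (dev->owner != NULL)
		return -EBUSY;
	dev->xdp_prog = prog;
	return 0;
}

/* Create a link that takes exclusive ownership of the hook. */
static int demo_link_create(struct demo_link *l, struct demo_netdev *dev,
			    const struct demo_prog *prog)
{
	assert(l != NULL && dev != NULL);
	if (dev->owner != NULL)
		return -EBUSY;
	l->dev = dev;
	dev->owner = l;
	dev->xdp_prog = prog;
	return 0;
}

/* Replace the program through the link handle. If old_prog is non-NULL the
 * update only succeeds when it is still the attached program. */
static int demo_link_update(const struct demo_link *l,
			    const struct demo_prog *new_prog,
			    const struct demo_prog *old_prog)
{
	if (l->dev->owner != l)
		return -EPERM;
	if (old_prog != NULL && l->dev->xdp_prog != old_prog)
		return -EPERM;
	l->dev->xdp_prog = new_prog;
	return 0;
}
```

Here a stray attach fails while the link exists, and the owner can still swap programs, optionally conditional on the old one still being attached.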

See, this is where we get into "deployment-model-specific territory". I
mean, sure, in the "central management daemon" model, it makes sense
that no other applications can replace the XDP program. But, erm, we
already have a mechanism to ensure that: Just don't grant those
applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
add anything other than a different way to do the same thing?

Additionally, in the case where there is *not* a central management
daemon (i.e., what I'm implementing with libxdp), this would be the flow
implemented by the library without bpf_link:

1. Query kernel for current BPF prog loaded on $IFACE
2. Sanity-check that this program is a dispatcher program installed by
   libxdp
3. Create a new dispatcher program with whatever changes we want to do
   (such as adding another component program).
4. Atomically replace the old program with the new one using the netlink
   API in this patch series.
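Step 4 relies on a compare-and-replace: the swap only happens if the program queried in step 1 is still attached. A minimal sketch of those semantics, modelled in userspace with a C11 atomic compare-and-swap, follows. The demo_* names are hypothetical, and this is not how the kernel side is implemented; the model only mirrors the check-then-swap behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the "expected program" check: the
 * replacement only happens if the program the caller queried is still the
 * one attached. Not kernel code. */
struct demo_prog {
	int id;
};

struct demo_netdev {
	_Atomic(struct demo_prog *) xdp_prog; /* at most one program per device */
};

static void demo_netdev_init(struct demo_netdev *dev, struct demo_prog *prog)
{
	atomic_init(&dev->xdp_prog, prog);
}

/* Swap in new_prog only if *expected is still attached. On failure nothing
 * changes and *expected is overwritten with the program actually attached,
 * so the caller can re-query and retry. */
static bool demo_replace(struct demo_netdev *dev, struct demo_prog **expected,
			 struct demo_prog *new_prog)
{
	assert(dev != NULL && expected != NULL);
	return atomic_compare_exchange_strong(&dev->xdp_prog, expected, new_prog);
}
```

In this model, a caller whose expectation is stale gets false back, together with the program that is actually attached, and can re-run steps 1 to 3 before retrying.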

Whereas with bpf_link, it would be:

1. Find the pinned bpf_link for $IFACE (e.g., load from
   /sys/fs/bpf/iface-links/$IFNAME).
2. Query kernel for current BPF prog linked to $LINK
3. Sanity-check that this program is a dispatcher program installed by
   libxdp
4. Create a new dispatcher program with whatever changes we want to do
   (such as adding another component program).
5. Atomically replace the old program with the new one using the
   LINK_UPDATE bpf() API.


So all this does is add an additional step, and another dependency on
bpffs. And crucially, I really don't see how the "bpf_link is the only
thing that is not fundamentally broken" argument holds up.

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 10:04                             ` Lorenz Bauer
@ 2020-03-26 17:47                               ` Jakub Kicinski
  2020-03-26 19:45                                 ` Alexei Starovoitov
  2020-03-26 18:18                               ` Andrii Nakryiko
  2020-03-26 19:53                               ` Alexei Starovoitov
  2 siblings, 1 reply; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-26 17:47 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Andrey Ignatov,
	Networking, bpf

On Thu, 26 Mar 2020 10:04:53 +0000 Lorenz Bauer wrote:
> On Thu, 26 Mar 2020 at 00:16, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > Those same folks have similar concern with XDP. In the world where
> > container management installs "root" XDP program which other user
> > applications can plug into (libxdp use case, right?), it's crucial to
> > ensure that this root XDP program is not accidentally overwritten by
> > some well-meaning, but not overly cautious developer experimenting in
> > his own container with XDP programs. This is where bpf_link ownership
> > plays a huge role. Tupperware agent (FB's container management agent)
> > would install root XDP program and will hold onto this bpf_link
> > without sharing it with other applications. That will guarantee that
> > the system will be stable and can't be compromised.  
> 
> Thanks for the extensive explanation Andrii.
> 
> This is what I imagine you're referring to: Tupperware creates a new network
> namespace ns1 and a veth0<>veth1 pair, moves one of the veth devices
> (let's say veth1) into ns1 and runs an application in ns1. On which veth
> would the XDP program go?
> 
> The way I understand it, veth1 would have XDP, and the application in ns1 would
> be prevented from attaching a new program? Maybe you can elaborate on this
> a little.

Nope, there are no veths involved. Tupperware mediates the requests
from containers to install programs on the physical interface for
heavy-duty network processing like DDoS protection for the entire
machine.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26  5:13                             ` Jakub Kicinski
@ 2020-03-26 18:09                               ` Andrii Nakryiko
  2020-03-26 19:40                               ` Alexei Starovoitov
  1 sibling, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-26 18:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Wed, Mar 25, 2020 at 10:13 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 25 Mar 2020 17:16:13 -0700 Andrii Nakryiko wrote:
> > > >> Well, I wasn't talking about any of those subsystems, I was talking
> > > >> about networking :)
> > > >
> > > > So it's not "BPF subsystem's relation to the rest of the kernel" from
> > > > your previous email, it's now only "talking about networking"? Since
> > > > when is the rest of the kernel networking?
> > >
> > > Not really, I would likely argue the same for any other subsystem, I
> >
> > And you would likely lose that argument :) You already agreed that for
> > tracing this is not the case. BPF is not attached by writing text into
> > ftrace's debugfs entries. Same for cgroups, we don't
> > create/update/write special files in cgroupfs, we have an explicit
> > attachment API in BPF.
> >
> > BTW, kprobes started out with the same model as XDP has right now. You
> > had to do a bunch of magic writes into various debugfs files to attach
> > BPF program. If the user-space application crashed, the kprobe stayed
> > attached. This was horrible and led to many problems in real-world
> > production uses. So a completely different interface was created,
> > allowing it to be done through perf_event_open(), which creates an
> > anonymous inode for the BPF program attachment. That allowed a crashing
> > program to auto-detach its kprobe and not harm production use cases.
> >
> > Now we are coming after cgroup BPF programs, which have similar issues
> > and similar pains in production. cgroup BPF progs actually have an extra
> > problem: user-space applications can accidentally
> > replace a critical cgroup program and ruin the day for many folks that
> > have to deal with production breakage after that. Which is why I'm
> > implementing bpf_link with all its properties: to solve real pain and
> > real problem.
> >
> > Now for XDP. It has the same flawed model. And even if it seems to you
> > that it's not a big issue, and even if Jakub thinks we are trying to
> > solve a non-existent problem, it is a real problem and a real concern
> > from people that have to support XDP in production with many
>
> More than happy to talk to those folks, and see the tickets.

We can certainly set up some meeting with Andrey and Takshak.

>
> Toke has actual user space code which needs his extension, and for
> which "ownership" makes no difference as it would just be passed with
> whoever touched the program last.

As has been repeated time and time again, we cannot allow any random
application to just go and replace XDP program. Same for cgroups. It's
not a hypothetical problem, it has happened and it has caused
problems.

So just because Toke's prototype doesn't have any protection against
this doesn't mean that is how it will end up.

>
> > well-meaning developers developing BPF applications independently.
>
> There is one single program which can be attached to the XDP hook;
> the "everybody attaches their program" model does not apply.

Yes, but you've followed all the XDP chaining discussion and freplace
stuff up until now, right? There is going to be a single XDP root
program, but other applications are going to plug their freplace
programs into it. And Tupperware wants to control the XDP root program and
not let anyone replace it, even though some programs will need to have
root access anyway.

>
> TW agent should just listen on netlink notifications to see if someone

I'll leave it up to the TW agent team to decide if that's a good idea. But
please educate me. When some app accidentally replaces the XDP program,
how can the TW agent make sure (by following netlink notifications) that
**no** packet is intercepted and mis-routed by this wrong XDP program?
Do netlink notifications guarantee that the listening application will
be able to undo the operation in between two network packets?

> replaced its program. cgroups have multi-attachment and no notifications

Multi-attachment is not always appropriate, which is why Andrey
Ignatov asked to support all modes (NONE, OVERRIDABLE, MULTI).  But
honestly, I've lost track of why this is relevant here.

> (although not sure anyone was explicitly asking for links there,
> either).

Tupperware did.

>
> In production a no-op XDP program is likely to be attached from the
> moment machine boots, to avoid traffic interruption and the risk of
> something going wrong with the driver when switching between skb and
> xdp datapath. And then the program is only replaced, not detached.

Good, so there is no problem pinning it somewhere forever.

>
> Not to mention the fact that networking applications generally don't
> want to remove their policy from the kernel when they crash :/

Yes, which is why bpf_links are trivially pinnable. bpf_link gives a
choice. What's there right now in XDP (program FD attachment) doesn't
give a choice of auto-detaching on application crash for cases where
it's appropriate (some relatively short-running XDP monitoring script,
for example).

>
> > Now, those were fundamental things, but I'd like to touch on the "nice
> > things" we get with that. Having a proper kernel object representing a
> > single instance of a BPF program attached to some other kernel object
> > allows building a uniform and consistent API around bpf_link with the
> > same semantics. We can do LINK_UPDATE and allow atomically replacing the
> > BPF program inside the established bpf_link. It's applicable to all
> > types of BPF program attachment and can be done in a way that ensures
> > no BPF program invocation is skipped while BPF programs are swapped
> > (because at the lowest level it boils down to an atomic pointer swap).
> > Of course not all bpf_links might have this support initially, but
> > we'll establish a lot of common infrastructure which will make it
> > simpler, faster and more reliable to add this functionality.
>
> XDP replace is already atomic, no packet will be passed without either
> old or new program executed on it.

Please re-read what I wrote, the entire thing. You are picking
arbitrary pieces and considering them in isolation. It's either
dishonest or you are missing the point.

>
> > And to wrap up. I agree, consistent API is not a goal in itself, as
> > Jakub mentioned. But it is a worthy goal nevertheless, especially if
> > it doesn't cost anything extra. It makes kernel developers' lives
>
> Not sure how having two interfaces instead of one makes kernel
> developer's life easier.

There is no interface for bpf_link for XDP right now. But let's
separate netlink vs bpf syscall discussion from bpf_link general
discussion.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 10:04                             ` Lorenz Bauer
  2020-03-26 17:47                               ` Jakub Kicinski
@ 2020-03-26 18:18                               ` Andrii Nakryiko
  2020-03-26 19:53                               ` Alexei Starovoitov
  2 siblings, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-26 18:18 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Thu, Mar 26, 2020 at 3:05 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Thu, 26 Mar 2020 at 00:16, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> [...]
> >
> > Those same folks have similar concern with XDP. In the world where
> > container management installs "root" XDP program which other user
> > applications can plug into (libxdp use case, right?), it's crucial to
> > ensure that this root XDP program is not accidentally overwritten by
> > some well-meaning, but not overly cautious developer experimenting in
> > his own container with XDP programs. This is where bpf_link ownership
> > plays a huge role. Tupperware agent (FB's container management agent)
> > would install root XDP program and will hold onto this bpf_link
> > without sharing it with other applications. That will guarantee that
> > the system will be stable and can't be compromised.
>
> Thanks for the extensive explanation Andrii.
>
> This is what I imagine you're referring to: Tupperware creates a new network
> namespace ns1 and a veth0<>veth1 pair, moves one of the veth devices
> (let's say veth1) into ns1 and runs an application in ns1. On which veth
> would the XDP program go?
>
> The way I understand it, veth1 would have XDP, and the application in ns1 would
> be prevented from attaching a new program? Maybe you can elaborate on this
> a little.
>

I'll let people with first-hand knowledge elaborate, if they are willing to share.

> Lorenz
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 12:35                             ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
@ 2020-03-26 19:06                               ` Andrii Nakryiko
  2020-03-27 11:06                                 ` Lorenz Bauer
  2020-03-27 11:46                                 ` Toke Høiland-Jørgensen
  2020-03-26 19:58                               ` Alexei Starovoitov
  1 sibling, 2 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-26 19:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > Now for XDP. It has the same flawed model. And even if it seems to you
> > that it's not a big issue, and even if Jakub thinks we are trying to
> > solve a non-existent problem, it is a real problem and a real concern
> > from people that have to support XDP in production with many
> > well-meaning developers developing BPF applications independently.
> > Copying what you wrote in another thread:
> >
> >> Setting aside the question of which is the best abstraction to represent
> >> an attachment, it seems to me that the actual behavioural problem (XDP
> >> programs being overridden by mistake) would be solvable by this patch,
> >> assuming well-behaved userspace applications.
> >
> > ... this is a horrible and unrealistic assumption that we just cannot
> > make and accept. However well-behaved userspace applications are, they
> > are written by people that make mistakes. And rather than blissfully
> > expect that everything will be fine, we want to have enforcements in
> > place that will prevent some buggy application from wreaking havoc in
> > production.
>
> Look, I'm not trying to tell you how to manage your internal systems.
> I'm just objecting to your assertion that your deployment model is the
> only one that can possibly work, and the refusal to consider other
> alternatives that comes with it.

Your assumption doesn't work for us. Because of that we need something
like bpf_link. Existing attachment API doesn't go away and is still
supported. Feel free to use the existing API. As for the EXPECTED_FD API you
are adding, it will ultimately be up to the maintainers to decide; I
can't block it, even if I wanted to.

>
> >> You're saying that like we didn't already have the netlink API. We
> >> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> >> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> >> existing API; essentially you're saying we should keep the old API in a
> >> crippled state in order to promote your (proposed) new API.
> >
> > This is the fundamental disagreement that we seem to have. XDP's BPF
> > program attachment is not in any way equivalent to bpf_link. So no,
> > netlink API currently doesn't have anything that's close to bpf_link.
> > Let me try to summarize what bpf_link is and what are its fundamental
> > properties regardless of type of BPF programs.
>
> First of all, thank you for this summary; that is very useful!

Sure, you're welcome.

>
> > 1. bpf_link represents a connection (pairing?) of BPF program and some
> > BPF hook it is attached to. BPF hook could be perf event, cgroup,
> > netdev, etc. It's a completely independent object in itself, alongside
> > bpf_map and bpf_prog, with its own lifetime and kernel
> > representation. To a user-space application it is returned as an
> > installed FD, similar to loaded BPF program and BPF map. It is
> > important that it's not just a BPF program, because BPF program can be
> > attached to multiple BPF hooks (e.g., same XDP program can be attached
> > to multiple interfaces; the same kprobe handler can be installed multiple
> > times), which means that having BPF program FD isn't enough to
> > uniquely represent that one specific BPF program attachment and detach
> > it or query it. Having a kernel object for this allows encapsulating
> > all the details of what is attached where, and presenting to
> > user-space a single handle (FD) to work with.
>
> For XDP there is already a unique handle, it's just implicit: Each
> netdev can have exactly one XDP program loaded. So I don't really see
> how bpf_link adds anything, other than another API for the same thing?

I certainly failed to explain things clearly if you are still asking
this. See point #2: once you attach a bpf_link, you can't just replace
it. This is what XDP doesn't have right now.

It's a game of picking features/properties in isolation and saying "we can
do this particular thing this different way with what we have". Please
try to consider all of it together; it's important. No single aspect
of bpf_link is that unique, but it's all of them together that
matter.

>
> > 2. Due to having an FD associated with bpf_link, it's possible to
> > talk about "owning" bpf_link. If application created link and never
> > shared its FD with any other application, it is the sole owner of it.
> > But it also means that you can share it, if you need it. Now, once
> > application closes FD or app crashes and kernel automatically closes
> > that FD, bpf_link refcount is decremented. If it was the last or only
> > FD, it will trigger automatic detachment and cleanup of that
> > particular BPF program attachment. Note, not a clean up of BPF
> > program, which can still be attached somewhere else: only that
> > particular attachment.
>
> This behaviour is actually one of my reservations against bpf_link for
> XDP: I think that automatically detaching XDP programs when the FD is
> closed is very much the wrong behaviour. An XDP program processes
> packets, and when loading one I very much expect it to keep doing that
> until I explicitly tell it to stop.

As you mentioned earlier, "it's not the only one mode". Just like with
tracing APIs, you can imagine scripts that would add their
packet-sniffing XDP program temporarily. If they crash, "temporarily"
turns into "permanently, but no one knows". This is bad. And again,
it's a choice, just with a default to auto-cleanup, because it's safe,
even if it requires an extra step for applications wanting to do
permanent XDP attachment.

>
> > 3. This derives from the concept of ownership of bpf_link. Once
> > bpf_link is attached, no other application that doesn't own that
> > bpf_link can replace, detach or modify the link. For some cases it
> > doesn't matter. E.g., for tracing, all attachments to the same fentry
> > trampoline are completely independent. But for other cases this is a
> > crucial property. E.g., when you attach a BPF program in exclusive
> > (single) mode, it means that particular cgroup and any of its children
> > cgroups can't have any more BPF programs attached. This is important for
> > container management systems to enforce invariants and correct
> > functioning of the system. Right now it's very easy to violate that -
> > you just go and attach your own BPF program, and previous BPF program
> > gets automatically detached without original application that put it
> > there knowing about this. Chaos ensues after that and real people have
> > to deal with this. Which is why existing
> > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> > bpf_link support.
>
> I can totally see how having an option to enforce a policy such as
> locking out others from installing cgroup BPF programs is useful. But
> such an option is just that: policy. So building this policy in as a
> fundamental property of the API seems like a bad idea; that is
> effectively enforcing policy in the kernel, isn't it?

I hope we won't go into a dictionary definition of what "policy" means
here :). For me it's about the guarantee that the kernel gives to user-space.
bpf_link doesn't care about dictating policies. If you don't want this
guarantee - don't use bpf_link, use direct program attachment. As
simple as that. Policy is implemented by user-space application by
using APIs with just the right guarantees.

>
> > Those same folks have similar concern with XDP. In the world where
> > container management installs "root" XDP program which other user
> > applications can plug into (libxdp use case, right?), it's crucial to
> > ensure that this root XDP program is not accidentally overwritten by
> > some well-meaning, but not overly cautious developer experimenting in
> > his own container with XDP programs. This is where bpf_link ownership
> > plays a huge role. Tupperware agent (FB's container management agent)
> > would install root XDP program and will hold onto this bpf_link
> > without sharing it with other applications. That will guarantee that
> > the system will be stable and can't be compromised.
>
> See, this is where we get into "deployment-model-specific territory". I
> mean, sure, in the "central management daemon" model, it makes sense
> that no other applications can replace the XDP program. But, erm, we
> already have a mechanism to ensure that: Just don't grant those
> applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
> add anything other than a different way to do the same thing?

Because there are still applications that need CAP_NET_ADMIN in order
to function (for other reasons than attaching XDP), so it's impossible
to enforce this for everyone.

>
> Additionally, in the case where there is *not* a central management
> daemon (i.e., what I'm implementing with libxdp), this would be the flow
> implemented by the library without bpf_link:
>
> 1. Query kernel for current BPF prog loaded on $IFACE
> 2. Sanity-check that this program is a dispatcher program installed by
>    libxdp
> 3. Create a new dispatcher program with whatever changes we want to do
>    (such as adding another component program).
> 4. Atomically replace the old program with the new one using the netlink
>    API in this patch series.
>
> Whereas with bpf_link, it would be:
>
> 1. Find the pinned bpf_link for $IFACE (e.g., load from
>    /sys/fs/bpf/iface-links/$IFNAME).

But now you can hide this mount point from containerized
root/CAP_NET_ADMIN application, can't you? See the difference? One
might think about bpf_link as a fine-grained capability in this sense.


> 2. Query kernel for current BPF prog linked to $LINK
> 3. Sanity-check that this program is a dispatcher program installed by
>    libxdp
> 4. Create a new dispatcher program with whatever changes we want to do
>    (such as adding another component program).
> 5. Atomically replace the old program with the new one using the
>    LINK_UPDATE bpf() API.
>
>
> So all this does is add an additional step, and another dependency on
> bpffs. And crucially, I really don't see how the "bpf_link is the only
> thing that is not fundamentally broken" argument holds up.
>
> -Toke
>

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26  5:13                             ` Jakub Kicinski
  2020-03-26 18:09                               ` Andrii Nakryiko
@ 2020-03-26 19:40                               ` Alexei Starovoitov
  2020-03-26 20:05                                 ` Edward Cree
  1 sibling, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-26 19:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On Wed, Mar 25, 2020 at 10:13:23PM -0700, Jakub Kicinski wrote:
> >
> > Now for XDP. It has the same flawed model. And even if it seems to you
> > that it's not a big issue, and even if Jakub thinks we are trying to
> > solve a non-existent problem, it is a real problem and a real concern
> > from people that have to support XDP in production with many
> 
> More than happy to talk to those folks, and see the tickets.

Jakub, you have repeatedly demonstrated a lack of understanding of what
bpf_link is despite multiple attempts from me, Andrii and others.
At this point I don't believe in your good intent.
Your repeated attacks on BPF in every thread are out of control.
I kept ignoring your insults for a long time, but I cannot do this anymore.
Please find other threads to contribute your opinions.
They are not welcome here.

> > well-meaning developers developing BPF applications independently.
> 
> There is one single program which can be attached to the XDP hook;
> the "everybody attaches their program" model does not apply.
> 
> TW agent should just listen on netlink notifications to see if someone
> replaced its program.

This is the dumbest idea I've heard in a long time.
Maybe the kernel shouldn't have done ACLs and should only have sent
notifications when a file is accessed by a task that shouldn't have accessed it?
Same level of craziness.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 17:47                               ` Jakub Kicinski
@ 2020-03-26 19:45                                 ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-26 19:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenz Bauer, Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Andrey Ignatov,
	Networking, bpf

On Thu, Mar 26, 2020 at 10:47:55AM -0700, Jakub Kicinski wrote:
> On Thu, 26 Mar 2020 10:04:53 +0000 Lorenz Bauer wrote:
> > On Thu, 26 Mar 2020 at 00:16, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > Those same folks have similar concern with XDP. In the world where
> > > container management installs "root" XDP program which other user
> > > applications can plug into (libxdp use case, right?), it's crucial to
> > > ensure that this root XDP program is not accidentally overwritten by
> > > some well-meaning, but not overly cautious developer experimenting in
> > > his own container with XDP programs. This is where bpf_link ownership
> > > plays a huge role. Tupperware agent (FB's container management agent)
> > > would install root XDP program and will hold onto this bpf_link
> > > without sharing it with other applications. That will guarantee that
> > > the system will be stable and can't be compromised.  
> > 
> > Thanks for the extensive explanation Andrii.
> > 
> > This is what I imagine you're referring to: Tupperware creates a new network
> > namespace ns1 and a veth0<>veth1 pair, moves one of the veth devices
> > (let's say veth1) into ns1 and runs an application in ns1. On which veth
> > would the XDP program go?
> > 
> > The way I understand it, veth1 would have XDP, and the application in ns1 would
> > be prevented from attaching a new program? Maybe you can elaborate on this
> > a little.
> 
> Nope, there are no veths involved. Tupperware mediates the requests
> from containers to install programs on the physical interface for
> heavy-duty network processing like DDoS protection for the entire
> machine.

That's not what is happening.
Jakub, I strongly suggest you avoid talking about things you have no clue about.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 10:04                             ` Lorenz Bauer
  2020-03-26 17:47                               ` Jakub Kicinski
  2020-03-26 18:18                               ` Andrii Nakryiko
@ 2020-03-26 19:53                               ` Alexei Starovoitov
  2020-03-27 11:11                                 ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-26 19:53 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Andrey Ignatov, Networking, bpf

On Thu, Mar 26, 2020 at 10:04:53AM +0000, Lorenz Bauer wrote:
> On Thu, 26 Mar 2020 at 00:16, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> [...]
> >
> > Those same folks have similar concern with XDP. In the world where
> > container management installs "root" XDP program which other user
> > applications can plug into (libxdp use case, right?), it's crucial to
> > ensure that this root XDP program is not accidentally overwritten by
> > some well-meaning, but not overly cautious developer experimenting in
> > his own container with XDP programs. This is where bpf_link ownership
> > plays a huge role. Tupperware agent (FB's container management agent)
> > would install root XDP program and will hold onto this bpf_link
> > without sharing it with other applications. That will guarantee that
> > the system will be stable and can't be compromised.
> 
> Thanks for the extensive explanation Andrii.
> 
> This is what I imagine you're referring to: Tupperware creates a new network
> namespace ns1 and a veth0<>veth1 pair, moves one of the veth devices
> (let's say veth1) into ns1 and runs an application in ns1. On which veth
> would the XDP program go?

As you can imagine, there are many teams and use cases in the data center.
If I said that netns is not used, that wouldn't be true, since there are folks
who use netns, though it's strongly discouraged.
For container usage, though, netns is not used; IP virtualization is done
via cgroup-bpf bind/connect override.
But that's not the case in 100% of containers either.
There are various teams that already use XDP and some that want to start
using it. XDP orchestration is lacking. That's what all the discussions
around libxdp (and now renamed to libdispatcher, right Toke?) are about.
The design of libdispatcher will evolve over time.
No one is saying that we have thought everything through.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 12:35                             ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
  2020-03-26 19:06                               ` Andrii Nakryiko
@ 2020-03-26 19:58                               ` Alexei Starovoitov
  2020-03-27 12:06                                 ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-26 19:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Thu, Mar 26, 2020 at 01:35:13PM +0100, Toke Høiland-Jørgensen wrote:
> 
> Additionally, in the case where there is *not* a central management
> daemon (i.e., what I'm implementing with libxdp), this would be the flow
> implemented by the library without bpf_link:
> 
> 1. Query kernel for current BPF prog loaded on $IFACE
> 2. Sanity-check that this program is a dispatcher program installed by
>    libxdp
> 3. Create a new dispatcher program with whatever changes we want to do
>    (such as adding another component program).
> 4. Atomically replace the old program with the new one using the netlink
>    API in this patch series.

In this model, what stops another application that is not using libdispatcher
from nuking the dispatcher program?

> Whereas with bpf_link, it would be:
> 
> 1. Find the pinned bpf_link for $IFACE (e.g., load from
>    /sys/fs/bpf/iface-links/$IFNAME).
> 2. Query kernel for current BPF prog linked to $LINK
> 3. Sanity-check that this program is a dispatcher program installed by
>    libxdp
> 4. Create a new dispatcher program with whatever changes we want to do
>    (such as adding another component program).
> 5. Atomically replace the old program with the new one using the
>    LINK_UPDATE bpf() API.

Whereas here the dispatcher program is only accessible to libdispatcher.
The bpffs instance needs to be known to libdispatcher only.
That's the ownership I've been talking about.

As discussed earlier, we need a way for a _human_ to nuke the dispatcher program,
but such an API shouldn't be usable from an application/task.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 19:40                               ` Alexei Starovoitov
@ 2020-03-26 20:05                                 ` Edward Cree
  2020-03-27 11:09                                   ` Lorenz Bauer
  2020-03-27 23:11                                   ` Alexei Starovoitov
  0 siblings, 2 replies; 112+ messages in thread
From: Edward Cree @ 2020-03-26 20:05 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf

On 26/03/2020 19:40, Alexei Starovoitov wrote:
> At this point I don't believe in your good intent.
> Your repeated attacks on BPF in every thread are out of control.
> I kept ignoring your insults for long time, but I cannot do this anymore.
> Please find other threads to contribute your opinions.
> They are not welcomed here.
Given that this clearly won't land in this cycle (and neither will bpf_link
 for XDP), can I suggest that everyone involved steps back from the subject
 for a few days to let tempers cool?  It's getting to the point where people
 are burning bridges and saying things they might regret.
I know everyone is under a lot of stress right now.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 19:06                               ` Andrii Nakryiko
@ 2020-03-27 11:06                                 ` Lorenz Bauer
  2020-03-27 16:12                                   ` David Ahern
                                                     ` (3 more replies)
  2020-03-27 11:46                                 ` Toke Høiland-Jørgensen
  1 sibling, 4 replies; 112+ messages in thread
From: Lorenz Bauer @ 2020-03-27 11:06 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Thu, 26 Mar 2020 at 19:06, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >
> > > Now for XDP. It has same flawed model. And even if it seems to you
> > > that it's not a big issue, and even if Jakub thinks we are trying to
> > > solve non-existing problem, it is a real problem and a real concern
> > > from people that have to support XDP in production with many
> > > well-meaning developers developing BPF applications independently.
> > > Copying what you wrote in another thread:
> > >
> > >> Setting aside the question of which is the best abstraction to represent
> > >> an attachment, it seems to me that the actual behavioural problem (XDP
> > >> programs being overridden by mistake) would be solvable by this patch,
> > >> assuming well-behaved userspace applications.
> > >
> > > ... this is a horrible and unrealistic assumption that we just cannot
> > > make and accept. However well-behaved userspace applications are, they
> > > are written by people that make mistakes. And rather than blissfully
> > > expect that everything will be fine, we want to have enforcements in
> > > place that will prevent some buggy application from wreaking havoc in
> > > production.
> >
> > Look, I'm not trying to tell you how to manage your internal systems.
> > I'm just objecting to your assertion that your deployment model is the
> > only one that can possibly work, and the refusal to consider other
> > alternatives that comes with it.
>
> Your assumption doesn't work for us. Because of that we need something
> like bpf_link. Existing attachment API doesn't go away and is still
> supported. Feel free to use existing API. As for EXPECTED_FD API you
> are adding, it will be up to maintainers to decide, ultimately, I
> can't block it, even if I wanted to.
>
> >
> > >> You're saying that like we didn't already have the netlink API. We
> > >> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> > >> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> > >> existing API; essentially you're saying we should keep the old API in a
> > >> crippled state in order to promote your (proposed) new API.
> > >
> > > This is the fundamental disagreement that we seem to have. XDP's BPF
> > > program attachment is not in any way equivalent to bpf_link. So no,
> > > netlink API currently doesn't have anything that's close to bpf_link.
> > > Let me try to summarize what bpf_link is and what are its fundamental
> > > properties regardless of type of BPF programs.
> >
> > First of all, thank you for this summary; that is very useful!
>
> Sure, you're welcome.
>
> >
> > > 1. bpf_link represents a connection (pairing?) of BPF program and some
> > > BPF hook it is attached to. BPF hook could be perf event, cgroup,
> > > netdev, etc. It's a completely independent object in itself, along the
> > > bpf_map and bpf_prog, which has its own lifetime and kernel
> > > representation. To user-space application it is returned as an
> > > installed FD, similar to loaded BPF program and BPF map. It is
> > > important that it's not just a BPF program, because BPF program can be
> > > attached to multiple BPF hooks (e.g., same XDP program can be attached
> > > to multiple interfaces; same kprobe handler can be installed multiple
> > > times), which means that having BPF program FD isn't enough to
> > > uniquely represent that one specific BPF program attachment and detach
> > > it or query it. Having a kernel object for this allows us to encapsulate
> > > all these various details of what is attached where and present to
> > > user-space a single handle (FD) to work with.
> >
> > For XDP there is already a unique handle, it's just implicit: Each
> > netdev can have exactly one XDP program loaded. So I don't really see
> > how bpf_link adds anything, other than another API for the same thing?
>
> I certainly failed to explain things clearly if you are still asking
> this. See point #2, once you attach bpf_link you can't just replace
> it. This is what XDP doesn't have right now.

From your description I like bpf_link, because it'll make attachment easier
to support, and the pinning behaviour also seems nice. I'm really not fussed
by netlink vs syscall, whatever.

However, this behaviour concerns me. It's like Windows not
letting you delete a file while an application has it open, which just leads
to randomly killing programs until you find the right one. It's frustrating
and counterproductive.

You're taking power away from the operator. In your deployment scenario
this might make sense, but I think it's a really bad model in general. If I am
privileged I need to be able to exercise that privilege. This means that if
there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
or whatever, I can break the association.

So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
vice versa. If you need to restrict control, use network namespaces
to hide the devices, instead of hiding the bpffs.

>
> It's a game of picking features/properties in isolation and "we can do
> this particular thing this different way with what we have". Please,
> try consider all of it together, it's important. Every single aspect
> of bpf_link is not that unique, but it's all of them together that
> matter.
>
> >
> > > 2. Due to having FD associated with bpf_link, it's now possible to
> > > talk about "owning" bpf_link. If application created link and never
> > > shared its FD with any other application, it is the sole owner of it.
> > > But it also means that you can share it, if you need it. Now, once
> > > application closes FD or app crashes and kernel automatically closes
> > > that FD, bpf_link refcount is decremented. If it was the last or only
> > > FD, it will trigger automatic detachment and cleanup of that
> > > particular BPF program attachment. Note, not a clean up of BPF
> > > program, which can still be attached somewhere else: only that
> > > particular attachment.
> >
> > This behaviour is actually one of my reservations against bpf_link for
> > XDP: I think that automatically detaching XDP programs when the FD is
> > closed is very much the wrong behaviour. An XDP program processes
> > packets, and when loading one I very much expect it to keep doing that
> > until I explicitly tell it to stop.
>
> As you mentioned earlier, "it's not the only one mode". Just like with
> tracing APIs, you can imagine scripts that would add their
> packet-sniffing XDP program temporarily. If they crash, "temporarily"
> turns into "permanently, but no one knows". This is bad. And again,
> it's a choice, just with a default to auto-cleanup, because it's safe,
> even if it requires extra step for applications willing to do
> permanent XDP attachment.
>
> >
> > > 3. This derives from the concept of ownership of bpf_link. Once
> > > bpf_link is attached, no other application that doesn't own that
> > > bpf_link can replace, detach or modify the link. For some cases it
> > > doesn't matter. E.g., for tracing, all attachments to the same fentry
> > > trampoline are completely independent. But for other cases this is a
> > > crucial property. E.g., when you attach a BPF program in an exclusive
> > > (single) mode, it means that particular cgroup and any of its children
> > > cgroups can't have any more BPF programs attached. This is important for
> > > container management systems to enforce invariants and correct
> > > functioning of the system. Right now it's very easy to violate that -
> > > you just go and attach your own BPF program, and previous BPF program
> > > gets automatically detached without original application that put it
> > > there knowing about this. Chaos ensues after that and real people have
> > > to deal with this. Which is why existing
> > > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> > > bpf_link support.
> >
> > I can totally see how having an option to enforce a policy such as
> > locking out others from installing cgroup BPF programs is useful. But
> > such an option is just that: policy. So building this policy in as a
> > fundamental property of the API seems like a bad idea; that is
> > effectively enforcing policy in the kernel, isn't it?
>
> I hope we won't go into a dictionary definition of what "policy" means
> here :). For me it's about guarantee that kernel gives to user-space.
> bpf_link doesn't care about dictating policies. If you don't want this
> guarantee - don't use bpf_link, use direct program attachment. As
> simple as that. Policy is implemented by user-space application by
> using APIs with just the right guarantees.
>
> >
> > > Those same folks have similar concern with XDP. In the world where
> > > container management installs "root" XDP program which other user
> > > applications can plug into (libxdp use case, right?), it's crucial to
> > > ensure that this root XDP program is not accidentally overwritten by
> > > some well-meaning, but not overly cautious developer experimenting in
> > > his own container with XDP programs. This is where bpf_link ownership
> > > plays a huge role. Tupperware agent (FB's container management agent)
> > > would install root XDP program and will hold onto this bpf_link
> > > without sharing it with other applications. That will guarantee that
> > > the system will be stable and can't be compromised.
> >
> > See this is where we get into "deployment-model specific territory". I
> > mean, sure, in the "central management daemon" model, it makes sense
> > that no other applications can replace the XDP program. But, erm, we
> > already have a mechanism to ensure that: Just don't grant those
> > applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
> > add anything other than a different way to do the same thing?
>
> Because there are still applications that need CAP_NET_ADMIN in order
> to function (for other reasons than attaching XDP), so it's impossible
> to enforce this for everyone.

I think I'm missing some context. CAP_NET_ADMIN is trusted by definition,
so trust these applications to not fiddle with XDP? Are there many of these?
Are they inside a user namespace or something?

>
> >
> > Additionally, in the case where there is *not* a central management
> > daemon (i.e., what I'm implementing with libxdp), this would be the flow
> > implemented by the library without bpf_link:
> >
> > 1. Query kernel for current BPF prog loaded on $IFACE
> > 2. Sanity-check that this program is a dispatcher program installed by
> >    libxdp
> > 3. Create a new dispatcher program with whatever changes we want to do
> >    (such as adding another component program).
> > 4. Atomically replace the old program with the new one using the netlink
> >    API in this patch series.
> >
> > Whereas with bpf_link, it would be:
> >
> > 1. Find the pinned bpf_link for $IFACE (e.g., load from
> >    /sys/fs/bpf/iface-links/$IFNAME).
>
> But now you can hide this mount point from containerized
> root/CAP_NET_ADMIN application, can't you? See the difference? One
> might think about bpf_link as a fine-grained capability in this sense.
>
>
> > 2. Query kernel for current BPF prog linked to $LINK
> > 3. Sanity-check that this program is a dispatcher program installed by
> >    libxdp
> > 4. Create a new dispatcher program with whatever changes we want to do
> >    (such as adding another component program).
> > 5. Atomically replace the old program with the new one using the
> >    LINK_UPDATE bpf() API.
> >
> >
> > So all this does is add an additional step, and another dependency on
> > bpffs. And crucially, I really don't see how the "bpf_link is the only
> > thing that is not fundamentally broken" argument holds up.
> >
> > -Toke
> >



-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 20:05                                 ` Edward Cree
@ 2020-03-27 11:09                                   ` Lorenz Bauer
  2020-03-27 23:11                                   ` Alexei Starovoitov
  1 sibling, 0 replies; 112+ messages in thread
From: Lorenz Bauer @ 2020-03-27 11:09 UTC (permalink / raw)
  To: Edward Cree
  Cc: Alexei Starovoitov, Jakub Kicinski, Andrii Nakryiko,
	Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Thu, 26 Mar 2020 at 20:06, Edward Cree <ecree@solarflare.com> wrote:
>
> On 26/03/2020 19:40, Alexei Starovoitov wrote:
> > At this point I don't believe in your good intent.
> > Your repeated attacks on BPF in every thread are out of control.
> > I kept ignoring your insults for long time, but I cannot do this anymore.
> > Please find other threads to contribute your opinions.
> > They are not welcomed here.
> Given that this clearly won't land in this cycle (and neither will bpf_link
>  for XDP), can I suggest that everyone involved steps back from the subject
>  for a few days to let tempers cool?  It's getting to the point where people
>  are burning bridges and saying things they might regret.
> I know everyone is under a lot of stress right now.

Sorry, I hadn't seen this message before sending my reply. I think
this sounds like a good idea.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 19:53                               ` Alexei Starovoitov
@ 2020-03-27 11:11                                 ` Toke Høiland-Jørgensen
  2020-04-02 20:21                                   ` bpf: ability to attach freplace to multiple parents Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-27 11:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Lorenz Bauer
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> libxdp (and now renamed to libdispatcher, right Toke?)

Not yet :)

I want to get it to initial feature completeness for XDP first, then
think about generalising the dispatcher bits (which has additional
issues, such as figuring out how to manage the dispatcher programs for
different program types).

Current code is in [0], for those following along. There are two bits of
kernel support missing before I can get it to where I want it for an
initial "release": Atomic replace of the dispatcher (this series), and
the ability to attach an freplace program to more than one "parent".
I'll try to get an RFC out for the latter during the merge window, but
I'll probably need some help in figuring out how to make it safe from
the verifier PoV.

-Toke


[0] https://github.com/xdp-project/xdp-tools/tree/xdp-multi-prog

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 19:06                               ` Andrii Nakryiko
  2020-03-27 11:06                                 ` Lorenz Bauer
@ 2020-03-27 11:46                                 ` Toke Høiland-Jørgensen
  2020-03-27 20:07                                   ` Andrii Nakryiko
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-27 11:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > Now for XDP. It has same flawed model. And even if it seems to you
>> > that it's not a big issue, and even if Jakub thinks we are trying to
>> > solve non-existing problem, it is a real problem and a real concern
>> > from people that have to support XDP in production with many
>> > well-meaning developers developing BPF applications independently.
>> > Copying what you wrote in another thread:
>> >
>> >> Setting aside the question of which is the best abstraction to represent
>> >> an attachment, it seems to me that the actual behavioural problem (XDP
>> >> programs being overridden by mistake) would be solvable by this patch,
>> >> assuming well-behaved userspace applications.
>> >
>> > ... this is a horrible and unrealistic assumption that we just cannot
>> > make and accept. However well-behaved userspace applications are, they
>> > are written by people that make mistakes. And rather than blissfully
>> > expect that everything will be fine, we want to have enforcements in
>> > place that will prevent some buggy application from wreaking havoc in
>> > production.
>>
>> Look, I'm not trying to tell you how to manage your internal systems.
>> I'm just objecting to your assertion that your deployment model is the
>> only one that can possibly work, and the refusal to consider other
>> alternatives that comes with it.
>
> Your assumption doesn't work for us. Because of that we need something
> like bpf_link.

I'm not disputing what you need for your use case; you obviously know
better than me. I'm really just saying that your use case is not
everyone's use case.

> Existing attachment API doesn't go away and is still supported. Feel
> free to use existing API.

As far as I'm concerned that's what I'm trying to do. This patch series
is really just fixing a bug in the existing API; to which the response
was "no, that API is fundamentally broken, you have to use bpf_link
instead". And *that* is what I am disputing.

(I do have some reservations about details of bpf_link, see below, but
I'm not actually totally against the whole concept).

>> > 1. bpf_link represents a connection (pairing?) of BPF program and some
>> > BPF hook it is attached to. BPF hook could be perf event, cgroup,
>> > netdev, etc. It's a completely independent object in itself, along the
>> > bpf_map and bpf_prog, which has its own lifetime and kernel
>> > representation. To user-space application it is returned as an
>> > installed FD, similar to loaded BPF program and BPF map. It is
>> > important that it's not just a BPF program, because BPF program can be
>> > attached to multiple BPF hooks (e.g., same XDP program can be attached
>> > to multiple interfaces; same kprobe handler can be installed multiple
>> > times), which means that having BPF program FD isn't enough to
>> > uniquely represent that one specific BPF program attachment and detach
>> > it or query it. Having a kernel object for this allows us to encapsulate
>> > all these various details of what is attached where and present to
>> > user-space a single handle (FD) to work with.
>>
>> For XDP there is already a unique handle, it's just implicit: Each
>> netdev can have exactly one XDP program loaded. So I don't really see
>> how bpf_link adds anything, other than another API for the same thing?
>
> I certainly failed to explain things clearly if you are still asking
> this. See point #2, once you attach bpf_link you can't just replace
> it. This is what XDP doesn't have right now.

Those are two different things, though. I get that #2 is a new
capability provided by bpf_link, I was just saying #1 isn't (for XDP).

>> > 2. Due to having FD associated with bpf_link, it's now possible to
>> > talk about "owning" bpf_link. If application created link and never
>> > shared its FD with any other application, it is the sole owner of it.
>> > But it also means that you can share it, if you need it. Now, once
>> > application closes FD or app crashes and kernel automatically closes
>> > that FD, bpf_link refcount is decremented. If it was the last or only
>> > FD, it will trigger automatic detachment and cleanup of that
>> > particular BPF program attachment. Note, not a clean up of BPF
>> > program, which can still be attached somewhere else: only that
>> > particular attachment.
>>
>> This behaviour is actually one of my reservations against bpf_link for
>> XDP: I think that automatically detaching XDP programs when the FD is
>> closed is very much the wrong behaviour. An XDP program processes
>> packets, and when loading one I very much expect it to keep doing that
>> until I explicitly tell it to stop.
>
> As you mentioned earlier, "it's not the only one mode". Just like with
> tracing APIs, you can imagine scripts that would add their
> packet-sniffing XDP program temporarily. If they crash, "temporarily"
> turns into "permanently, but no one knows". This is bad. And again,
> it's a choice, just with a default to auto-cleanup, because it's safe,
> even if it requires extra step for applications willing to do
> permanent XDP attachment.

Well, there are two aspects to this: One is what should be the default -
I'd argue that for XDP the most common case is 'permanent attachment'.
But that can be worked around at the library level, so it's not that
important (just a bit annoying for the library implementer, which just
so happens to be me in this case :)).

The more important problem is that with "attach link + pin", we need two
operations. So with that there is no longer a way to atomically do a
permanent attach. And also there are two pieces of state (the pinned
bpf_link + the attachment of that to the interface).

>> > 3. This derives from the concept of ownership of bpf_link. Once
>> > bpf_link is attached, no other application that doesn't own that
>> > bpf_link can replace, detach or modify the link. For some cases it
>> > doesn't matter. E.g., for tracing, all attachments to the same fentry
>> > trampoline are completely independent. But for other cases this is a
>> > crucial property. E.g., when you attach a BPF program in an exclusive
>> > (single) mode, it means that particular cgroup and any of its children
>> > cgroups can't have any more BPF programs attached. This is important for
>> > container management systems to enforce invariants and correct
>> > functioning of the system. Right now it's very easy to violate that -
>> > you just go and attach your own BPF program, and previous BPF program
>> > gets automatically detached without original application that put it
>> > there knowing about this. Chaos ensues after that and real people have
>> > to deal with this. Which is why existing
>> > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
>> > bpf_link support.
>>
>> I can totally see how having an option to enforce a policy such as
>> locking out others from installing cgroup BPF programs is useful. But
>> such an option is just that: policy. So building this policy in as a
>> fundamental property of the API seems like a bad idea; that is
>> effectively enforcing policy in the kernel, isn't it?
>
> I hope we won't go into a dictionary definition of what "policy" means
> here :). For me it's about the guarantees that the kernel gives to user-space.
> bpf_link doesn't care about dictating policies. If you don't want this
> guarantee - don't use bpf_link, use direct program attachment. As
> simple as that. Policy is implemented by user-space application by
> using APIs with just the right guarantees.

Yes, but the user-space application shouldn't get to choose the policy -
the system administrator should. So an application should be able to
*request* this behaviour, but it should be a policy decision whether to
allow it. If the "locking" behaviour is built-in to the API, that
separation becomes impossible.

>> > Those same folks have a similar concern with XDP. In the world where
>> > container management installs "root" XDP program which other user
>> > applications can plug into (libxdp use case, right?), it's crucial to
>> > ensure that this root XDP program is not accidentally overwritten by
>> > some well-meaning, but not overly cautious developer experimenting in
>> > his own container with XDP programs. This is where bpf_link ownership
>> > plays a huge role. Tupperware agent (FB's container management agent)
>> > would install root XDP program and will hold onto this bpf_link
>> > without sharing it with other applications. That will guarantee that
>> > the system will be stable and can't be compromised.
>>
>> See this is where we get into "deployment-model specific territory". I
>> mean, sure, in the "central management daemon" model, it makes sense
>> that no other applications can replace the XDP program. But, erm, we
>> already have a mechanism to ensure that: Just don't grant those
>> applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
>> add anything other than a different way to do the same thing?
>
> Because there are still applications that need CAP_NET_ADMIN in order
> to function (for other reasons than attaching XDP), so it's impossible
> to enforce this for everyone.

But if you grant an application CAP_NET_ADMIN, it can wreak all sorts of
havoc (the most obvious being just issuing 'ip link down' on the iface).
So you're implicitly trusting it to be well-behaved; why, then, does this
particular act of misbehaviour need a special kernel enforcement
mechanism?

>> Additionally, in the case where there is *not* a central management
>> daemon (i.e., what I'm implementing with libxdp), this would be the flow
>> implemented by the library without bpf_link:
>>
>> 1. Query kernel for current BPF prog loaded on $IFACE
>> 2. Sanity-check that this program is a dispatcher program installed by
>>    libxdp
>> 3. Create a new dispatcher program with whatever changes we want to do
>>    (such as adding another component program).
>> 4. Atomically replace the old program with the new one using the netlink
>>    API in this patch series.
>>
>> Whereas with bpf_link, it would be:
>>
>> 1. Find the pinned bpf_link for $IFACE (e.g., load from
>>    /sys/fs/bpf/iface-links/$IFNAME).
>
> But now you can hide this mount point from containerized
> root/CAP_NET_ADMIN application, can't you? See the difference? One
> might think about bpf_link as a fine-grained capability in this sense.

Yes, that may be a feature. But it may also be an anti-feature (I can't
move an iface to a new namespace that doesn't have the original bpffs
*without* preventing that namespace from replacing the XDP program).
Also, why are we re-inventing an ad-hoc capability mechanism?

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 19:58                               ` Alexei Starovoitov
@ 2020-03-27 12:06                                 ` Toke Høiland-Jørgensen
  2020-03-27 23:00                                   ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-27 12:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Mar 26, 2020 at 01:35:13PM +0100, Toke Høiland-Jørgensen wrote:
>> 
>> Additionally, in the case where there is *not* a central management
>> daemon (i.e., what I'm implementing with libxdp), this would be the flow
>> implemented by the library without bpf_link:
>> 
>> 1. Query kernel for current BPF prog loaded on $IFACE
>> 2. Sanity-check that this program is a dispatcher program installed by
>>    libxdp
>> 3. Create a new dispatcher program with whatever changes we want to do
>>    (such as adding another component program).
>> 4. Atomically replace the old program with the new one using the netlink
>>    API in this patch series.
>
> in this model what stops another application that is not using libdispatcher from
> nuking the dispatcher program?

Nothing. But nothing is stopping it from issuing 'ip link down' either -
an application with CAP_NET_ADMIN is implicitly trusted to be
well-behaved. This patch series is just adding the kernel primitive that
enables applications to be well-behaved. I consider it an API bug-fix.
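The compare-and-replace semantics that the expected-FD attribute adds can
be sketched with a toy userspace model. It is not kernel code:
`toy_iface` and `toy_xdp_replace` are hypothetical names, programs are
identified by plain integer IDs, and -EEXIST is simply the error this
sketch returns on a mismatch. The model is single-threaded and does not
show how the kernel serializes the check and the swap; it only shows why
an updater that lost a race gets an error instead of silently
overwriting the other program.

```c
#include <errno.h>

/* Toy model, not kernel code: the single XDP slot of one interface. */
struct toy_iface {
	unsigned int prog_id;   /* 0 means no program attached */
};

/* Install new_id only if expected_id is what is attached right now.
 * Returns 0 on success, or -EEXIST if someone else changed the program
 * since the caller last looked. */
int toy_xdp_replace(struct toy_iface *dev, unsigned int expected_id,
		    unsigned int new_id)
{
	if (dev->prog_id != expected_id)
		return -EEXIST;
	dev->prog_id = new_id;
	return 0;
}
```

If two library instances both read program 10 and then both try to
replace it, exactly one succeeds; the other can re-query the interface
and rebuild its dispatcher against the program that is now attached.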

>> Whereas with bpf_link, it would be:
>> 
>> 1. Find the pinned bpf_link for $IFACE (e.g., load from
>>    /sys/fs/bpf/iface-links/$IFNAME).
>> 2. Query kernel for current BPF prog linked to $LINK
>> 3. Sanity-check that this program is a dispatcher program installed by
>>    libxdp
>> 4. Create a new dispatcher program with whatever changes we want to do
>>    (such as adding another component program).
>> 5. Atomically replace the old program with the new one using the
>>    LINK_UPDATE bpf() API.
>
> whereas here the dispatcher program is only accessible to libdispatcher.
> The bpffs instance only needs to be known to libdispatcher.
> That's the ownership I've been talking about.
>
> As discussed earlier, we need a way for a _human_ to nuke the dispatcher program,
> but such an API shouldn't be usable from an application/task.

As long as there is this kind of override in place, I'm not actually
fundamentally opposed to the concept of bpf_link for XDP, as an
additional mechanism. What I'm opposed to is using bpf_link as a reason
to block this series.

In fact, one way to implement the "human override" you mention could be
to reuse the mechanism implemented in this series: if the EXPECTED_FD
passed via netlink is a bpf_link FD, the kernel could interpret that as
an override.
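A toy model of that override idea, purely hypothetical: it is neither
kernel code nor an existing API, and the types, names and the
-EBUSY/-EEXIST error codes are all assumptions of this sketch. A program
owned by a bpf_link cannot be replaced through netlink by matching on the
program, but naming the link itself as the expected FD acts as the
explicit override and breaks the link's ownership.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: what the caller passed as the "expected" FD. */
struct toy_expected {
	bool is_link;      /* true if the FD refers to a bpf_link */
	unsigned int id;   /* link ID if is_link, else program ID */
};

/* Toy model of one interface whose program may be owned by a link. */
struct toy_owned_iface {
	unsigned int prog_id;  /* 0 means no program attached */
	unsigned int link_id;  /* 0 means not owned by any bpf_link */
};

int toy_netlink_replace(struct toy_owned_iface *dev,
			const struct toy_expected *exp,
			unsigned int new_prog_id)
{
	if (dev->link_id != 0) {
		/* Link-owned: only naming the link itself is an override. */
		if (!exp->is_link || exp->id != dev->link_id)
			return -EBUSY;
		dev->link_id = 0;  /* the override breaks the link's ownership */
	} else if (exp->is_link || exp->id != dev->prog_id) {
		/* Plain attachment: ordinary expected-program check. */
		return -EEXIST;
	}
	dev->prog_id = new_prog_id;
	return 0;
}
```

The design choice in the sketch is that applications which only know
the program ID keep getting an error, so the override needs the extra,
deliberate step of obtaining the link FD.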

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 11:06                                 ` Lorenz Bauer
@ 2020-03-27 16:12                                   ` David Ahern
  2020-03-27 20:10                                     ` Andrii Nakryiko
  2020-03-27 23:02                                     ` Alexei Starovoitov
  2020-03-27 19:42                                   ` Andrii Nakryiko
                                                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 112+ messages in thread
From: David Ahern @ 2020-03-27 16:12 UTC (permalink / raw)
  To: Lorenz Bauer, Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On 3/27/20 5:06 AM, Lorenz Bauer wrote:
> However, this behaviour concerns me. It's like Windows not
> letting you delete a file while an application has it open, which just leads
> to randomly killing programs until you find the right one. It's frustrating
> and counterproductive.
> 
> You're taking power away from the operator. In your deployment scenario
> this might make sense, but I think it's a really bad model in general. If I am
> privileged I need to be able to exercise that privilege. This means that if
> there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> or whatever, I can break the association.
> 
> So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
> vice versa. If you need to restrict control, use network namespaces
> to hide the devices, instead of hiding the bpffs.

I had a thought yesterday along similar lines: bpf_link is about
ownership and preventing "accidental" deletes. What's the observability
with respect to learning who owns a program at a specific attach point,
and can that ever be hidden?

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 11:06                                 ` Lorenz Bauer
  2020-03-27 16:12                                   ` David Ahern
@ 2020-03-27 19:42                                   ` Andrii Nakryiko
  2020-03-27 19:45                                   ` Andrii Nakryiko
  2020-03-27 23:09                                   ` Alexei Starovoitov
  3 siblings, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-27 19:42 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 4:07 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Thu, 26 Mar 2020 at 19:06, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > >
> > > > Now for XDP. It has the same flawed model. And even if it seems to you
> > > > that it's not a big issue, and even if Jakub thinks we are trying to
> > > > solve a non-existent problem, it is a real problem and a real concern
> > > > from people that have to support XDP in production with many
> > > > well-meaning developers developing BPF applications independently.
> > > > Copying what you wrote in another thread:
> > > >
> > > >> Setting aside the question of which is the best abstraction to represent
> > > >> an attachment, it seems to me that the actual behavioural problem (XDP
> > > >> programs being overridden by mistake) would be solvable by this patch,
> > > >> assuming well-behaved userspace applications.
> > > >
> > > > ... this is a horrible and unrealistic assumption that we just cannot
> > > > make and accept. However well-behaved userspace applications are, they
> > > > are written by people that make mistakes. And rather than blissfully
> > > > expect that everything will be fine, we want to have enforcements in
> > > > place that will prevent some buggy application from wreaking havoc in
> > > > production.
> > >
> > > Look, I'm not trying to tell you how to manage your internal systems.
> > > I'm just objecting to your assertion that your deployment model is the
> > > only one that can possibly work, and the refusal to consider other
> > > alternatives that comes with it.
> >
> > Your assumption doesn't work for us. Because of that we need something
> > like bpf_link. Existing attachment API doesn't go away and is still
> > supported. Feel free to use existing API. As for EXPECTED_FD API you
> > are adding, it will be up to maintainers to decide, ultimately, I
> > can't block it, even if I wanted to.
> >
> > >
> > > >> You're saying that like we didn't already have the netlink API. We
> > > >> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> > > >> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> > > >> existing API; essentially you're saying we should keep the old API in a
> > > >> crippled state in order to promote your (proposed) new API.
> > > >
> > > > This is the fundamental disagreement that we seem to have. XDP's BPF
> > > > program attachment is not in any way equivalent to bpf_link. So no,
> > > > netlink API currently doesn't have anything that's close to bpf_link.
> > > > Let me try to summarize what bpf_link is and what its fundamental
> > > > properties are, regardless of the type of BPF program.
> > >
> > > First of all, thank you for this summary; that is very useful!
> >
> > Sure, you're welcome.
> >
> > >
> > > > 1. bpf_link represents a connection (pairing?) of BPF program and some
> > > > BPF hook it is attached to. BPF hook could be perf event, cgroup,
> > > > netdev, etc. It's a completely independent object in itself, alongside
> > > > bpf_map and bpf_prog, which has its own lifetime and kernel
> > > > representation. To a user-space application it is returned as an
> > > > installed FD, similar to a loaded BPF program and BPF map. It is
> > > > important that it's not just a BPF program, because a BPF program can be
> > > > attached to multiple BPF hooks (e.g., the same XDP program can be attached
> > > > to multiple interfaces; the same kprobe handler can be installed multiple
> > > > times), which means that having a BPF program FD isn't enough to
> > > > uniquely represent that one specific BPF program attachment and detach
> > > > it or query it. Having a kernel object for this allows encapsulating
> > > > all these various details of what is attached where and presenting to
> > > > user-space a single handle (FD) to work with.
> > >
> > > For XDP there is already a unique handle, it's just implicit: Each
> > > netdev can have exactly one XDP program loaded. So I don't really see
> > > how bpf_link adds anything, other than another API for the same thing?
> >
> > I certainly failed to explain things clearly if you are still asking
> > this. See point #2, once you attach bpf_link you can't just replace
> > it. This is what XDP doesn't have right now.
>
> From your description I like bpf_link, because it'll make attachment easier
> to support, and the pinning behaviour also seems nice. I'm really not fussed
> by netlink vs syscall, whatever.

Great, thanks.

>
> However, this behaviour concerns me. It's like Windows not
> letting you delete a file while an application has it open, which just leads
> to randomly killing programs until you find the right one. It's frustrating
> and counterproductive.
>
> You're taking power away from the operator. In your deployment scenario
> this might make sense, but I think it's a really bad model in general. If I am
> privileged I need to be able to exercise that privilege. This means that if
> there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> or whatever, I can break the association.
>
> So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
> vice versa. If you need to restrict control, use network namespaces
> to hide the devices, instead of hiding the bpffs.

Alexei mentioned a "nuke" option a few times already, which will solve
this. The idea is that a human operator should be able to do this, but
not applications, even though they have CAP_NET_ADMIN. I don't know
exactly what the interface will look like, but it shouldn't allow
applications to just randomly replace bpf_link. There are legitimate use
cases where an application has to have CAP_NET_ADMIN and we can't hide
the netdevice from it, unfortunately.


>
> >
> > It's a game of picking features/properties in isolation and "we can do
> > this particular thing this different way with what we have". Please,
> > try to consider all of it together; it's important. Every single aspect
> > of bpf_link is not that unique, but it's all of them together that
> > matter.
> >
> > >
> > > > 2. Due to having an FD associated with bpf_link, it's now possible to
> > > > talk about "owning" bpf_link. If an application created a link and never
> > > > shared its FD with any other application, it is the sole owner of it.
> > > > But it also means that you can share it, if you need it. Now, once the
> > > > application closes the FD or the app crashes and the kernel automatically closes
> > > > that FD, the bpf_link refcount is decremented. If it was the last or only
> > > > FD, it will trigger automatic detachment and clean up of that
> > > > particular BPF program attachment. Note, not a clean up of BPF
> > > > program, which can still be attached somewhere else: only that
> > > > particular attachment.
> > >
> > > This behaviour is actually one of my reservations against bpf_link for
> > > XDP: I think that automatically detaching XDP programs when the FD is
> > > closed is very much the wrong behaviour. An XDP program processes
> > > packets, and when loading one I very much expect it to keep doing that
> > > until I explicitly tell it to stop.
> >
> > As you mentioned earlier, "it's not the only one mode". Just like with
> > tracing APIs, you can imagine scripts that would add their
> > packet-sniffing XDP program temporarily. If they crash, "temporarily"
> > turns into "permanently, but no one knows". This is bad. And again,
> > it's a choice, just with a default to auto-cleanup, because it's safe,
> > even if it requires an extra step for applications willing to do
> > permanent XDP attachment.
> >
> > >
> > > > 3. This derives from the concept of ownership of bpf_link. Once
> > > > bpf_link is attached, no other application that doesn't own that
> > > > bpf_link can replace, detach or modify the link. For some cases it
> > > > doesn't matter. E.g., for tracing, all attachments to the same fentry
> > > > trampoline are completely independent. But for other cases this is a
> > > > crucial property. E.g., when you attach a BPF program in exclusive
> > > > (single) mode, it means that this cgroup and any of its child
> > > > cgroups can't have any more BPF programs attached. This is important for
> > > > container management systems to enforce invariants and correct
> > > > functioning of the system. Right now it's very easy to violate that -
> > > > you just go and attach your own BPF program, and the previous BPF program
> > > > gets automatically detached without the original application that put it
> > > > there knowing about this. Chaos ensues after that and real people have
> > > > to deal with this. Which is why existing
> > > > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> > > > bpf_link support.
> > >
> > > I can totally see how having an option to enforce a policy such as
> > > locking out others from installing cgroup BPF programs is useful. But
> > > such an option is just that: policy. So building this policy in as a
> > > fundamental property of the API seems like a bad idea; that is
> > > effectively enforcing policy in the kernel, isn't it?
> >
> > I hope we won't go into a dictionary definition of what "policy" means
> > here :). For me it's about the guarantees that the kernel gives to user-space.
> > bpf_link doesn't care about dictating policies. If you don't want this
> > guarantee - don't use bpf_link, use direct program attachment. As
> > simple as that. Policy is implemented by user-space application by
> > using APIs with just the right guarantees.
> >
> > >
> > > > Those same folks have a similar concern with XDP. In the world where
> > > > container management installs "root" XDP program which other user
> > > > applications can plug into (libxdp use case, right?), it's crucial to
> > > > ensure that this root XDP program is not accidentally overwritten by
> > > > some well-meaning, but not overly cautious developer experimenting in
> > > > his own container with XDP programs. This is where bpf_link ownership
> > > > plays a huge role. Tupperware agent (FB's container management agent)
> > > > would install root XDP program and will hold onto this bpf_link
> > > > without sharing it with other applications. That will guarantee that
> > > > the system will be stable and can't be compromised.
> > >
> > > See this is where we get into "deployment-model specific territory". I
> > > mean, sure, in the "central management daemon" model, it makes sense
> > > that no other applications can replace the XDP program. But, erm, we
> > > already have a mechanism to ensure that: Just don't grant those
> > > applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
> > > add anything other than a different way to do the same thing?
> >
> > Because there are still applications that need CAP_NET_ADMIN in order
> > to function (for other reasons than attaching XDP), so it's impossible
> > to enforce this for everyone.
>
> I think I'm missing some context. CAP_NET_ADMIN is trusted by definition,
> so why not trust these applications not to fiddle with XDP? Are there many of these?
> Are they inside a user namespace or something?
>
> >
> > >
> > > Additionally, in the case where there is *not* a central management
> > > daemon (i.e., what I'm implementing with libxdp), this would be the flow
> > > implemented by the library without bpf_link:
> > >
> > > 1. Query kernel for current BPF prog loaded on $IFACE
> > > 2. Sanity-check that this program is a dispatcher program installed by
> > >    libxdp
> > > 3. Create a new dispatcher program with whatever changes we want to do
> > >    (such as adding another component program).
> > > 4. Atomically replace the old program with the new one using the netlink
> > >    API in this patch series.
> > >
> > > Whereas with bpf_link, it would be:
> > >
> > > 1. Find the pinned bpf_link for $IFACE (e.g., load from
> > >    /sys/fs/bpf/iface-links/$IFNAME).
> >
> > But now you can hide this mount point from containerized
> > root/CAP_NET_ADMIN application, can't you? See the difference? One
> > might think about bpf_link as a fine-grained capability in this sense.
> >
> >
> > > 2. Query kernel for current BPF prog linked to $LINK
> > > 3. Sanity-check that this program is a dispatcher program installed by
> > >    libxdp
> > > 4. Create a new dispatcher program with whatever changes we want to do
> > >    (such as adding another component program).
> > > 5. Atomically replace the old program with the new one using the
> > >    LINK_UPDATE bpf() API.
> > >
> > >
> > > So all this does is add an additional step, and another dependency on
> > > bpffs. And crucially, I really don't see how the "bpf_link is the only
> > > thing that is not fundamentally broken" argument holds up.
> > >
> > > -Toke
> > >
>
>
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 11:06                                 ` Lorenz Bauer
  2020-03-27 16:12                                   ` David Ahern
  2020-03-27 19:42                                   ` Andrii Nakryiko
@ 2020-03-27 19:45                                   ` Andrii Nakryiko
  2020-03-27 23:09                                   ` Alexei Starovoitov
  3 siblings, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-27 19:45 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 4:07 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Thu, 26 Mar 2020 at 19:06, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > >
> > > > Now for XDP. It has the same flawed model. And even if it seems to you
> > > > that it's not a big issue, and even if Jakub thinks we are trying to
> > > > solve a non-existent problem, it is a real problem and a real concern
> > > > from people that have to support XDP in production with many
> > > > well-meaning developers developing BPF applications independently.
> > > > Copying what you wrote in another thread:
> > > >
> > > >> Setting aside the question of which is the best abstraction to represent
> > > >> an attachment, it seems to me that the actual behavioural problem (XDP
> > > >> programs being overridden by mistake) would be solvable by this patch,
> > > >> assuming well-behaved userspace applications.
> > > >
> > > > ... this is a horrible and unrealistic assumption that we just cannot
> > > > make and accept. However well-behaved userspace applications are, they
> > > > are written by people that make mistakes. And rather than blissfully
> > > > expect that everything will be fine, we want to have enforcements in
> > > > place that will prevent some buggy application from wreaking havoc in
> > > > production.
> > >
> > > Look, I'm not trying to tell you how to manage your internal systems.
> > > I'm just objecting to your assertion that your deployment model is the
> > > only one that can possibly work, and the refusal to consider other
> > > alternatives that comes with it.
> >
> > Your assumption doesn't work for us. Because of that we need something
> > like bpf_link. Existing attachment API doesn't go away and is still
> > supported. Feel free to use existing API. As for EXPECTED_FD API you
> > are adding, it will be up to maintainers to decide, ultimately, I
> > can't block it, even if I wanted to.
> >
> > >
> > > >> You're saying that like we didn't already have the netlink API. We
> > > >> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> > > >> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> > > >> existing API; essentially you're saying we should keep the old API in a
> > > >> crippled state in order to promote your (proposed) new API.
> > > >
> > > > This is the fundamental disagreement that we seem to have. XDP's BPF
> > > > program attachment is not in any way equivalent to bpf_link. So no,
> > > > netlink API currently doesn't have anything that's close to bpf_link.
> > > > Let me try to summarize what bpf_link is and what its fundamental
> > > > properties are, regardless of the type of BPF program.
> > >
> > > First of all, thank you for this summary; that is very useful!
> >
> > Sure, you're welcome.
> >
> > >
> > > > 1. bpf_link represents a connection (pairing?) of BPF program and some
> > > > BPF hook it is attached to. BPF hook could be perf event, cgroup,
> > > > netdev, etc. It's a completely independent object in itself, alongside
> > > > bpf_map and bpf_prog, which has its own lifetime and kernel
> > > > representation. To a user-space application it is returned as an
> > > > installed FD, similar to a loaded BPF program and BPF map. It is
> > > > important that it's not just a BPF program, because a BPF program can be
> > > > attached to multiple BPF hooks (e.g., the same XDP program can be attached
> > > > to multiple interfaces; the same kprobe handler can be installed multiple
> > > > times), which means that having a BPF program FD isn't enough to
> > > > uniquely represent that one specific BPF program attachment and detach
> > > > it or query it. Having a kernel object for this allows encapsulating
> > > > all these various details of what is attached where and presenting to
> > > > user-space a single handle (FD) to work with.
> > >
> > > For XDP there is already a unique handle, it's just implicit: Each
> > > netdev can have exactly one XDP program loaded. So I don't really see
> > > how bpf_link adds anything, other than another API for the same thing?
> >
> > I certainly failed to explain things clearly if you are still asking
> > this. See point #2, once you attach bpf_link you can't just replace
> > it. This is what XDP doesn't have right now.
>
> From your description I like bpf_link, because it'll make attachment easier
> to support, and the pinning behaviour also seems nice. I'm really not fussed
> by netlink vs syscall, whatever.
>
> However, this behaviour concerns me. It's like Windows not
> letting you delete a file while an application has it open, which just leads
> to randomly killing programs until you find the right one. It's frustrating
> and counterproductive.
>
> You're taking power away from the operator. In your deployment scenario
> this might make sense, but I think it's a really bad model in general. If I am
> privileged I need to be able to exercise that privilege. This means that if
> there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> or whatever, I can break the association.
>
> So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
> vice versa. If you need to restrict control, use network namespaces
> to hide the devices, instead of hiding the bpffs.
>
> >
> > It's a game of picking features/properties in isolation and "we can do
> > this particular thing this different way with what we have". Please,
> > try to consider all of it together; it's important. Every single aspect
> > of bpf_link is not that unique, but it's all of them together that
> > matter.
> >
> > >
> > > > 2. Due to having an FD associated with bpf_link, it's now possible to
> > > > talk about "owning" bpf_link. If an application created a link and never
> > > > shared its FD with any other application, it is the sole owner of it.
> > > > But it also means that you can share it, if you need it. Now, once the
> > > > application closes the FD or the app crashes and the kernel automatically closes
> > > > that FD, the bpf_link refcount is decremented. If it was the last or only
> > > > FD, it will trigger automatic detachment and clean up of that
> > > > particular BPF program attachment. Note, not a clean up of BPF
> > > > program, which can still be attached somewhere else: only that
> > > > particular attachment.
> > >
> > > This behaviour is actually one of my reservations against bpf_link for
> > > XDP: I think that automatically detaching XDP programs when the FD is
> > > closed is very much the wrong behaviour. An XDP program processes
> > > packets, and when loading one I very much expect it to keep doing that
> > > until I explicitly tell it to stop.
> >
> > As you mentioned earlier, "it's not the only one mode". Just like with
> > tracing APIs, you can imagine scripts that would add their
> > packet-sniffing XDP program temporarily. If they crash, "temporarily"
> > turns into "permanently, but no one knows". This is bad. And again,
> > it's a choice, just with a default to auto-cleanup, because it's safe,
> > even if it requires an extra step for applications willing to do
> > permanent XDP attachment.
> >
> > >
> > > > 3. This derives from the concept of ownership of bpf_link. Once
> > > > bpf_link is attached, no other application that doesn't own that
> > > > bpf_link can replace, detach or modify the link. For some cases it
> > > > doesn't matter. E.g., for tracing, all attachments to the same fentry
> > > > trampoline are completely independent. But for other cases this is a
> > > > crucial property. E.g., when you attach a BPF program in exclusive
> > > > (single) mode, it means that this cgroup and any of its child
> > > > cgroups can't have any more BPF programs attached. This is important for
> > > > container management systems to enforce invariants and correct
> > > > functioning of the system. Right now it's very easy to violate that -
> > > > you just go and attach your own BPF program, and the previous BPF program
> > > > gets automatically detached without the original application that put it
> > > > there knowing about this. Chaos ensues after that and real people have
> > > > to deal with this. Which is why existing
> > > > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> > > > bpf_link support.
> > >
> > > I can totally see how having an option to enforce a policy such as
> > > locking out others from installing cgroup BPF programs is useful. But
> > > such an option is just that: policy. So building this policy in as a
> > > fundamental property of the API seems like a bad idea; that is
> > > effectively enforcing policy in the kernel, isn't it?
> >
> > I hope we won't go into a dictionary definition of what "policy" means
> > here :). For me it's about the guarantee that the kernel gives to user-space.
> > bpf_link doesn't care about dictating policies. If you don't want this
> > guarantee - don't use bpf_link, use direct program attachment. As
> > simple as that. Policy is implemented by user-space application by
> > using APIs with just the right guarantees.
> >
> > >
> > > > Those same folks have similar concern with XDP. In the world where
> > > > container management installs "root" XDP program which other user
> > > > applications can plug into (libxdp use case, right?), it's crucial to
> > > > ensure that this root XDP program is not accidentally overwritten by
> > > > some well-meaning, but not overly cautious developer experimenting in
> > > > his own container with XDP programs. This is where bpf_link ownership
> > > > plays a huge role. Tupperware agent (FB's container management agent)
> > > > would install root XDP program and will hold onto this bpf_link
> > > > without sharing it with other applications. That will guarantee that
> > > > the system will be stable and can't be compromised.
> > >
> > > See this is where we get into "deployment-model specific territory". I
> > > mean, sure, in the "central management daemon" model, it makes sense
> > > that no other applications can replace the XDP program. But, erm, we
> > > already have a mechanism to ensure that: Just don't grant those
> > > applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
> > > add anything other than a different way to do the same thing?
> >
> > Because there are still applications that need CAP_NET_ADMIN in order
> > to function (for other reasons than attaching XDP), so it's impossible
> > to enforce this for everyone.
>
> I think I'm missing some context. CAP_NET_ADMIN is trusted by definition,
> so trust these applications to not fiddle with XDP? Are there many of these?
> Are they inside a user namespace or something?

Sorry, missed this part. Yes, in our environment those will be
containerized applications. It doesn't matter how many of them there
are. There might be none right now, but we have no guarantee that a new
one won't appear, because we have many independent teams working
on different applications. And a single rogue application is
enough to wreak havoc. So again, it's about prevention, not just
detection.

>
> >
> > >
> > > Additionally, in the case where there is *not* a central management
> > > daemon (i.e., what I'm implementing with libxdp), this would be the flow
> > > implemented by the library without bpf_link:
> > >
> > > 1. Query kernel for current BPF prog loaded on $IFACE
> > > 2. Sanity-check that this program is a dispatcher program installed by
> > >    libxdp
> > > 3. Create a new dispatcher program with whatever changes we want to do
> > >    (such as adding another component program).
> > > 4. Atomically replace the old program with the new one using the netlink
> > >    API in this patch series.
> > >
> > > Whereas with bpf_link, it would be:
> > >
> > > 1. Find the pinned bpf_link for $IFACE (e.g., load from
> > >    /sys/fs/bpf/iface-links/$IFNAME).
> >
> > But now you can hide this mount point from containerized
> > root/CAP_NET_ADMIN application, can't you? See the difference? One
> > might think about bpf_link as a fine-grained capability in this sense.
> >
> >
> > > 2. Query kernel for current BPF prog linked to $LINK
> > > 3. Sanity-check that this program is a dispatcher program installed by
> > >    libxdp
> > > 4. Create a new dispatcher program with whatever changes we want to do
> > >    (such as adding another component program).
> > > 5. Atomically replace the old program with the new one using the
> > >    LINK_UPDATE bpf() API.
> > >
> > >
> > > So all this does is add an additional step, and another dependency on
> > > bpffs. And crucially, I really don't see how the "bpf_link is the only
> > > thing that is not fundamentally broken" argument holds up.
> > >
> > > -Toke
> > >
>
>
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com
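Both libxdp flows quoted above end in the same compare-and-replace step: the new dispatcher is installed only if the program inspected in the query step is still the one attached. The sketch below models the netlink variant in plain C under stated assumptions: integer ids stand in for program FDs, `struct iface_model`, `replace_expected()` and `add_component()` are hypothetical names, and the result codes are invented for illustration (the real kernel performs the check under its own locking and returns its own errno values).

```c
#include <stdbool.h>

/* Hypothetical model of one interface's XDP hook; ids stand in for FDs. */
struct iface_model {
    int prog_id;             /* 0: nothing attached */
    bool prog_is_dispatcher; /* was the attached program installed by libxdp? */
};

/* Illustrative result codes; the kernel's real errno values may differ. */
enum { REPLACE_OK = 0, REPLACE_NOT_DISPATCHER = -1, REPLACE_RACE = -2 };

/* Step 4: install new_id only if expected_id is still attached.
 * In the real kernel the check and the swap happen atomically. */
static int replace_expected(struct iface_model *ifc, int expected_id, int new_id)
{
    if (ifc->prog_id != expected_id)
        return REPLACE_RACE;
    ifc->prog_id = new_id;
    ifc->prog_is_dispatcher = true;
    return REPLACE_OK;
}

/* Steps 1-4 of the netlink flow; new_id stands in for the dispatcher
 * built in step 3 (the build itself is elided). */
static int add_component(struct iface_model *ifc, int new_id)
{
    int cur = ifc->prog_id;                    /* 1. query current program */
    if (cur != 0 && !ifc->prog_is_dispatcher)
        return REPLACE_NOT_DISPATCHER;         /* 2. not a libxdp dispatcher */
    /* 3. build the new dispatcher (elided) */
    return replace_expected(ifc, cur, new_id); /* 4. compare-and-replace */
}
```

A caller whose view is stale gets `REPLACE_RACE` and can restart from step 1. Note that nothing in this model stops a caller that skips the check entirely.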

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 11:46                                 ` Toke Høiland-Jørgensen
@ 2020-03-27 20:07                                   ` Andrii Nakryiko
  2020-03-27 22:16                                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-27 20:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 4:46 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > Now for XDP. It has the same flawed model. And even if it seems to you
> >> > that it's not a big issue, and even if Jakub thinks we are trying to
> >> > solve non-existing problem, it is a real problem and a real concern
> >> > from people that have to support XDP in production with many
> >> > well-meaning developers developing BPF applications independently.
> >> > Copying what you wrote in another thread:
> >> >
> >> >> Setting aside the question of which is the best abstraction to represent
> >> >> an attachment, it seems to me that the actual behavioural problem (XDP
> >> >> programs being overridden by mistake) would be solvable by this patch,
> >> >> assuming well-behaved userspace applications.
> >> >
> >> > ... this is a horrible and unrealistic assumption that we just cannot
> >> > make and accept. However well-behaved userspace applications are, they
> >> > are written by people that make mistakes. And rather than blissfully
> >> > expect that everything will be fine, we want to have enforcements in
> >> > place that will prevent some buggy application from wreaking havoc in
> >> > production.
> >>
> >> Look, I'm not trying to tell you how to manage your internal systems.
> >> I'm just objecting to your assertion that your deployment model is the
> >> only one that can possibly work, and the refusal to consider other
> >> alternatives that comes with it.
> >
> > Your assumption doesn't work for us. Because of that we need something
> > like bpf_link.
>
> I'm not disputing what you need for your use case; you obviously know
> better than me. I'm really just saying that your use case is not
> everyone's use case.
>
> > Existing attachment API doesn't go away and is still supported. Feel
> > free to use existing API.
>
> As far as I'm concerned that's what I'm trying to do. This patch series
> is really just fixing a bug in the existing API; to which the response
> was "no, that API is fundamentally broken, you have to use bpf_link
> instead". And *that* is what I am disputing.
>
> (I do have some reservations about details of bpf_link, see below, but
> I'm not actually totally against the whole concept).
>
> >> > 1. bpf_link represents a connection (pairing?) of BPF program and some
> >> > BPF hook it is attached to. BPF hook could be perf event, cgroup,
> >> > netdev, etc. It's a completely independent object in itself, alongside
> >> > bpf_map and bpf_prog, which has its own lifetime and kernel
> >> > representation. To user-space application it is returned as an
> >> > installed FD, similar to loaded BPF program and BPF map. It is
> >> > important that it's not just a BPF program, because BPF program can be
> >> > attached to multiple BPF hooks (e.g., same XDP program can be attached
> >> > to multiple interfaces; same kprobe handler can be installed multiple
> >> > times), which means that having BPF program FD isn't enough to
> >> > uniquely represent that one specific BPF program attachment and detach
> >> > it or query it. Having kernel object for this allows to encapsulate
> >> > all these various details of what is attached where and present to
> >> > user-space a single handle (FD) to work with.
> >>
> >> For XDP there is already a unique handle, it's just implicit: Each
> >> netdev can have exactly one XDP program loaded. So I don't really see
> >> how bpf_link adds anything, other than another API for the same thing?
> >
> > I certainly failed to explain things clearly if you are still asking
> > this. See point #2, once you attach bpf_link you can't just replace
> > it. This is what XDP doesn't have right now.
>
> Those are two different things, though. I get that #2 is a new
> capability provided by bpf_link, I was just saying #1 isn't (for XDP).

bpf_link is a combination of those different things... Independently
they are either impossible or insufficient. I'm not sure how that
doesn't answer your question:

> So I don't really see
> how bpf_link adds anything, other than another API for the same thing?

Please stop dodging. Just like with "rest of the kernel", but really
"just networking" from before.

>
> >> > 2. Due to having FD associated with bpf_link, it's now possible to
> >> > talk about "owning" bpf_link. If an application created a link and never
> >> > shared its FD with any other application, it is the sole owner of it.
> >> > But it also means that you can share it, if you need it. Now, once the
> >> > application closes the FD or crashes and the kernel automatically closes
> >> > that FD, the bpf_link refcount is decremented. If it was the last or only
> >> > FD, it will trigger automatic detachment and clean up of that
> >> > particular BPF program attachment. Note, not a clean up of BPF
> >> > program, which can still be attached somewhere else: only that
> >> > particular attachment.
> >>
> >> This behaviour is actually one of my reservations against bpf_link for
> >> XDP: I think that automatically detaching XDP programs when the FD is
> >> closed is very much the wrong behaviour. An XDP program processes
> >> packets, and when loading one I very much expect it to keep doing that
> >> until I explicitly tell it to stop.
> >
> > As you mentioned earlier, "it's not the only one mode". Just like with
> > tracing APIs, you can imagine scripts that would add their
> > packet-sniffing XDP program temporarily. If they crash, "temporarily"
> > turns into "permanently, but no one knows". This is bad. And again,
> > it's a choice, just with a default to auto-cleanup, because it's safe,
> > even if it requires an extra step for applications willing to do
> > permanent XDP attachment.
>
> Well, there are two aspects to this: One is what should be the default -
> I'd argue that for XDP the most common case is 'permanent attachment'.
> But that can be worked around at the library level, so it's not that
> important (just a bit annoying for the library implementer, which just
> so happens to be me in this case :)).

Permanent attachment used to be the common case for tracing until it
wasn't. Same is going to happen with cgroups. So not sure that's
strong argument, plus it's a matter of opinion, I don't think we can
have hard data on what's the most common use case. But my reasons are
due to safety, not popularity. Current default is not safe.

>
> The more important problem is that with "attach link + pin", we need two
> operations. So with that there is no longer a way to atomically do a
> permanent attach. And also there are two pieces of state (the pinned
> bpf_link + the attachment of that to the interface).

What does "atomically do a permanent attach" mean, and why is that
important? If your application attaches XDP and then crashes before it
can pin it, then it will be detached and cleaned up, which should
happen for every buggy program or if the environment is not set up
correctly (e.g., BPF FS is not mounted). If the program can't get to
pinning without crashing, fix the program. How is this an important problem?
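The attach-then-pin sequence discussed above can be modelled with a reference count: the creating FD holds one reference, a bpffs pin holds another, and the attachment is torn down only when the count reaches zero. This is a hypothetical, single-threaded sketch (`struct link_model`, `link_create()`, `link_get()` and `link_put()` are illustrative names, not kernel code); it shows why a crash between attach and pin leaves nothing behind, while a completed pin survives the process exiting.

```c
#include <stdbool.h>

/* Hypothetical model of a bpf_link's lifetime, not kernel code. */
struct link_model {
    int refcnt;     /* open FDs plus bpffs pins */
    bool attached;  /* is the program still attached to its hook? */
};

/* Attach: the creating application receives the first reference (its FD). */
static void link_create(struct link_model *l)
{
    l->refcnt = 1;
    l->attached = true;
}

/* Take an extra reference, e.g. by pinning in bpffs or sharing the FD. */
static void link_get(struct link_model *l)
{
    l->refcnt++;
}

/* Drop a reference (close(), process exit, unpin); the last one detaches. */
static void link_put(struct link_model *l)
{
    if (--l->refcnt == 0)
        l->attached = false;
}
```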

>
> >> > 3. This derives from the concept of ownership of bpf_link. Once
> >> > bpf_link is attached, no other application that doesn't own that
> >> > bpf_link can replace, detach or modify the link. For some cases it
> >> > doesn't matter. E.g., for tracing, all attachments to the same fentry
> >> > trampoline are completely independent. But for other cases this is a
> >> > crucial property. E.g., when you attach a BPF program in exclusive
> >> > (single) mode, it means that particular cgroup and any of its children
> >> > cgroups can't have any more BPF programs attached. This is important for
> >> > container management systems to enforce invariants and correct
> >> > functioning of the system. Right now it's very easy to violate that -
> >> > you just go and attach your own BPF program, and the previous BPF program
> >> > gets automatically detached without the original application that put it
> >> > there knowing about this. Chaos ensues after that and real people have
> >> > to deal with this. Which is why existing
> >> > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> >> > bpf_link support.
> >>
> >> I can totally see how having an option to enforce a policy such as
> >> locking out others from installing cgroup BPF programs is useful. But
> >> such an option is just that: policy. So building this policy in as a
> >> fundamental property of the API seems like a bad idea; that is
> >> effectively enforcing policy in the kernel, isn't it?
> >
> > I hope we won't go into a dictionary definition of what "policy" means
> > here :). For me it's about the guarantee that the kernel gives to user-space.
> > bpf_link doesn't care about dictating policies. If you don't want this
> > guarantee - don't use bpf_link, use direct program attachment. As
> > simple as that. Policy is implemented by user-space application by
> > using APIs with just the right guarantees.
>
> Yes, but the user-space application shouldn't get to choose the policy -
> the system administrator should. So an application should be able to
> *request* this behaviour, but it should be a policy decision whether to
> allow it. If the "locking" behaviour is built-in to the API, that
> separation becomes impossible.

This doesn't make any sense. If the kernel said that it successfully
attached my program, then the application has every right to believe it
is attached, and that it won't be arbitrarily replaced by some other
application. This is not policy; these are fundamental guarantees.

Imagine that one application opens a file and then seeks to some
position. Then another application runs, opens the same file and seeks to
another position. Suddenly the first application's file position gets
reset because of an independent second application. That's not policy,
that's a broken API.

Also, regarding all this talk about well-behaved cooperating applications:
if that were a reliable way to do things, kernels would just
implement cooperative multi-tasking and be done with it.
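The guarantee described here can be sketched as a hook that records which holder owns its link: only that holder may update the program, and a plain attach from anyone else is refused while the link exists. Everything below is a hypothetical model (`struct hook_model`, the owner tokens and the error values are invented for illustration); in particular, rejecting a non-link attach with a "busy" error is an assumption about how the kernel would behave, not a quote of its code.

```c
#include <stddef.h>

/* Hypothetical model of one attach point (e.g. one netdev's XDP hook). */
struct hook_model {
    int prog_id;             /* currently attached program, 0 if none */
    const void *link_owner;  /* holder of the bpf_link, NULL if none */
};

enum { ATTACH_OK = 0, ATTACH_BUSY = -1, ATTACH_NOT_OWNER = -2 };

/* Direct (non-link) attach: allowed only while no link owns the hook. */
static int direct_attach(struct hook_model *h, int prog_id)
{
    if (h->link_owner != NULL)
        return ATTACH_BUSY;
    h->prog_id = prog_id;
    return ATTACH_OK;
}

/* Create a link: fails if someone already owns the hook. */
static int link_attach(struct hook_model *h, const void *owner, int prog_id)
{
    if (h->link_owner != NULL)
        return ATTACH_BUSY;
    h->link_owner = owner;
    h->prog_id = prog_id;
    return ATTACH_OK;
}

/* Update through the link: only the owner may swap the program. */
static int link_update(struct hook_model *h, const void *owner, int prog_id)
{
    if (h->link_owner == NULL || h->link_owner != owner)
        return ATTACH_NOT_OWNER;
    h->prog_id = prog_id;
    return ATTACH_OK;
}
```

Like the file-offset analogy, the second application's action simply cannot disturb state it does not hold a handle to.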

>
> >> > Those same folks have similar concern with XDP. In the world where
> >> > container management installs "root" XDP program which other user
> >> > applications can plug into (libxdp use case, right?), it's crucial to
> >> > ensure that this root XDP program is not accidentally overwritten by
> >> > some well-meaning, but not overly cautious developer experimenting in
> >> > his own container with XDP programs. This is where bpf_link ownership
> >> > plays a huge role. Tupperware agent (FB's container management agent)
> >> > would install root XDP program and will hold onto this bpf_link
> >> > without sharing it with other applications. That will guarantee that
> >> > the system will be stable and can't be compromised.
> >>
> >> See this is where we get into "deployment-model specific territory". I
> >> mean, sure, in the "central management daemon" model, it makes sense
> >> that no other applications can replace the XDP program. But, erm, we
> >> already have a mechanism to ensure that: Just don't grant those
> >> applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
> >> add anything other than a different way to do the same thing?
> >
> > Because there are still applications that need CAP_NET_ADMIN in order
> > to function (for other reasons than attaching XDP), so it's impossible
> > to enforce this for everyone.
>
> But if you grant an application CAP_NET_ADMIN, it can wreak all sorts of
> havoc (the most obvious being just issuing 'ip link down' on the iface).
> So you're implicitly trusting it to be well-behaved, so why does this
> particular act of misbehaviour need a special kernel enforcement
> mechanism?

Well-behaved in the sense of not bringing system down, yes. But not
well-behaved in the sense of aware of all other BPF XDP users. It's
impossible to coordinate with 100% guarantee in a real-world big
company environment.

>
> >> Additionally, in the case where there is *not* a central management
> >> daemon (i.e., what I'm implementing with libxdp), this would be the flow
> >> implemented by the library without bpf_link:
> >>
> >> 1. Query kernel for current BPF prog loaded on $IFACE
> >> 2. Sanity-check that this program is a dispatcher program installed by
> >>    libxdp
> >> 3. Create a new dispatcher program with whatever changes we want to do
> >>    (such as adding another component program).
> >> 4. Atomically replace the old program with the new one using the netlink
> >>    API in this patch series.
> >>
> >> Whereas with bpf_link, it would be:
> >>
> >> 1. Find the pinned bpf_link for $IFACE (e.g., load from
> >>    /sys/fs/bpf/iface-links/$IFNAME).
> >
> > But now you can hide this mount point from containerized
> > root/CAP_NET_ADMIN application, can't you? See the difference? One
> > might think about bpf_link as a fine-grained capability in this sense.
>
> Yes, that may be a feature. But it may also be an anti-feature (I can't
> move an iface to a new namespace that doesn't have the original bpffs
> *without* preventing that namespace from replacing the XDP program).

There are other ways to share bpf_link FD, if sharing BPF FS is not an option.

> Also, why are we re-inventing an ad-hoc capability mechanism?

We are not; it's an analogy. Also, as far as I understand, existing
capabilities are not fine-grained enough to express any of this, are they?
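Which "other ways" are meant is not spelled out; one standard Unix mechanism (our assumption, not something stated in the thread) is passing the FD itself over a Unix-domain socket as `SCM_RIGHTS` ancillary data. The sketch below sends a descriptor across a `socketpair()`. Because creating a real bpf_link FD needs privileges, a pipe FD stands in for it, and `send_fd()`/`recv_fd()` are illustrative helper names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a Unix-domain socket; 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of real data must accompany the FD */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr; /* forces correct alignment of the buffer */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new FD or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor referring to the same open file, so for a bpf_link this would amount to shared ownership without exposing any bpffs path.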

>
> -Toke
>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 16:12                                   ` David Ahern
@ 2020-03-27 20:10                                     ` Andrii Nakryiko
  2020-03-27 23:02                                     ` Alexei Starovoitov
  1 sibling, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-27 20:10 UTC (permalink / raw)
  To: David Ahern
  Cc: Lorenz Bauer, Toke Høiland-Jørgensen, John Fastabend,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Andrey Ignatov,
	Networking, bpf

On Fri, Mar 27, 2020 at 9:12 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 3/27/20 5:06 AM, Lorenz Bauer wrote:
> > However, this behaviour concerns me. It's like Windows not
> > letting you delete a file while an application has it opened, which just leads
> > to randomly killing programs until you find the right one. It's frustrating
> > and counterproductive.
> >
> > You're taking power away from the operator. In your deployment scenario
> > this might make sense, but I think it's a really bad model in general. If I am
> > privileged I need to be able to exercise that privilege. This means that if
> > there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> > or whatever, I can break the association.
> >
> > So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
> > vice versa. If you need to restrict control, use network namespaces
> > to hide the devices, instead of hiding the bpffs.
>
> I had a thought yesterday along similar lines: bpf_link is about
> ownership and preventing "accidental" deletes. What's the observability
> wrt learning who owns a program at a specific attach point, and can
> that ever be hidden?

We are talking about adding LINK_QUERY command that will return
attached BPF program and ifindex or cgroup (or whatever else) that it
is attached to.

If it's about which application holds an open FD to bpf_link, it's the
same problem as with any other FD; I'm not sure there is a
well-defined solution to this problem. Using a drgn script to get this
is one possible solution that can be implemented today without
extending any kernel APIs.
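For concreteness, the information such a LINK_QUERY command is described as returning (the attached program plus an ifindex or a cgroup) could be represented roughly as below. The command was still being discussed, so `struct link_query_info`, its fields and `describe_link()` are hypothetical and do not reflect any kernel UAPI.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical kinds of attach target a link query might report. */
enum link_kind { LINK_KIND_XDP, LINK_KIND_CGROUP };

struct link_query_info {
    enum link_kind kind;
    uint32_t prog_id;          /* program attached through the link */
    union {
        uint32_t ifindex;      /* valid when kind == LINK_KIND_XDP */
        uint64_t cgroup_id;    /* valid when kind == LINK_KIND_CGROUP */
    } target;
};

/* Render a one-line summary; returns snprintf()'s result, -1 if unknown kind. */
static int describe_link(const struct link_query_info *info, char *buf, size_t len)
{
    switch (info->kind) {
    case LINK_KIND_XDP:
        return snprintf(buf, len, "prog %" PRIu32 " on ifindex %" PRIu32,
                        info->prog_id, info->target.ifindex);
    case LINK_KIND_CGROUP:
        return snprintf(buf, len, "prog %" PRIu32 " on cgroup %" PRIu64,
                        info->prog_id, info->target.cgroup_id);
    }
    return -1;
}
```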

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 20:07                                   ` Andrii Nakryiko
@ 2020-03-27 22:16                                     ` Toke Høiland-Jørgensen
  2020-03-27 22:54                                       ` Andrii Nakryiko
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-27 22:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> Please stop dodging. Just like with "rest of the kernel", but really
> "just networking" from before.

Look, if we can't have this conversation without throwing around
accusations of bad faith, I think it is best we just take Ed's advice
and leave it until after the merge window.

-Toke


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 22:16                                     ` Toke Høiland-Jørgensen
@ 2020-03-27 22:54                                       ` Andrii Nakryiko
  2020-03-28  1:09                                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-27 22:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 3:17 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > Please stop dodging. Just like with "rest of the kernel", but really
> > "just networking" from before.
>
> Look, if we can't have this conversation without throwing around
> accusations of bad faith, I think it is best we just take Ed's advice
> and leave it until after the merge window.
>

Toke, if me pointing out that you are dodging original discussion and
pivoting offends you, by all means, you don't have to continue. But if
you are still with me, let's look at this particular part of
discussion:

>> >> For XDP there is already a unique handle, it's just implicit: Each
>> >> netdev can have exactly one XDP program loaded. So I don't really see
>> >> how bpf_link adds anything, other than another API for the same thing?
>> >
>> > I certainly failed to explain things clearly if you are still asking
>> > this. See point #2, once you attach bpf_link you can't just replace
>> > it. This is what XDP doesn't have right now.
>>
>> Those are two different things, though. I get that #2 is a new
>> capability provided by bpf_link, I was just saying #1 isn't (for XDP).
>
> bpf_link is a combination of those different things... Independently
> they are either impossible or insufficient. I'm not sure how that
> doesn't answer your question:
>
>> So I don't really see
>> how bpf_link adds anything, other than another API for the same thing?
>
> Please stop dodging. Just like with "rest of the kernel", but really
> "just networking" from before.

You said "So I don't really see how bpf_link adds anything, other than
another API for the same thing?". I explained that bpf_link is not the
same thing that exists already, thus it's not another API for the same
thing. You picked one property of bpf_link and claimed it's the same
as what XDP has right now. "I get that #2 is a new capability provided
by bpf_link, I was just saying #1 isn't (for XDP)". So should I read
that as if you are agreeing and your original objection is rescinded?
If yes, then good, this part is concluded and I'm sorry if I
misinterpreted your answer.

But if not, then you again are picking one property and just saying
"but XDP has it" without considering all of bpf_link properties as a
whole. In that case I do think you are arguing not in good faith.
Simple as that. I also hope I don't have to go all the way back to
"rest of the kernel", pivoted to "just networking" w.r.t.
subsystem-specific configuration/attachment APIs to explain another
reference.

P.S. Honestly, I don't know what the merge window has to do with this
whole discussion...

> -Toke
>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 12:06                                 ` Toke Høiland-Jørgensen
@ 2020-03-27 23:00                                   ` Alexei Starovoitov
  2020-03-28  1:43                                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-27 23:00 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Fri, Mar 27, 2020 at 01:06:46PM +0100, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Thu, Mar 26, 2020 at 01:35:13PM +0100, Toke Høiland-Jørgensen wrote:
> >> 
> >> Additionally, in the case where there is *not* a central management
> >> daemon (i.e., what I'm implementing with libxdp), this would be the flow
> >> implemented by the library without bpf_link:
> >> 
> >> 1. Query kernel for current BPF prog loaded on $IFACE
> >> 2. Sanity-check that this program is a dispatcher program installed by
> >>    libxdp
> >> 3. Create a new dispatcher program with whatever changes we want to do
> >>    (such as adding another component program).
> >> 4. Atomically replace the old program with the new one using the netlink
> >>    API in this patch series.
> >
> > in this model what stops another application that is not using libdispatcher from
> > nuking the dispatcher program?
> 
> Nothing. But nothing is stopping it from issuing 'ip link down' either -
> an application with CAP_NET_ADMIN is implicitly trusted to be
> well-behaved. This patch series is just adding the kernel primitive that
> enables applications to be well-behaved. I consider it an API bug-fix.

I think what you're proposing is not a fix, but a band-aid.
And from what I can read in this thread you remain unconvinced that
you will hit exactly the same issues we're describing.
We hit them already and you will hit them a year from now.
Simply because fb usage of all parts of bpf is about 3-4 years ahead
of everyone else.
I'm trying to convince you that your libxdp will be in much better
shape a year from now. It will be prepared for a situation when
other libxdp clones exist and are trying to do the same.
While you're saying:
"let me shoot myself in the foot. I know what I'm doing. I'll be fine".
I know you will not be. And soon enough you'll come back proposing
locking, id, owner apis for xdp.

> >> Whereas with bpf_link, it would be:
> >> 
> >> 1. Find the pinned bpf_link for $IFACE (e.g., load from
> >>    /sys/fs/bpf/iface-links/$IFNAME).
> >> 2. Query kernel for current BPF prog linked to $LINK
> >> 3. Sanity-check that this program is a dispatcher program installed by
> >>    libxdp
> >> 4. Create a new dispatcher program with whatever changes we want to do
> >>    (such as adding another component program).
> >> 5. Atomically replace the old program with the new one using the
> >>    LINK_UPDATE bpf() API.
> >
> > whereas here dispatcher program is only accessible to libdispatcher.
> > Instance of bpffs needs to be known to libdispatcher only.
> > That's the ownership I've been talking about.
> >
> > As discussed early we need a way for _human_ to nuke dispatcher program,
> > but such api shouldn't be usable out of application/task.
> 
> As long as there is this kind of override in place, I'm not actually
> fundamentally opposed to the concept of bpf_link for XDP, as an
> additional mechanism. What I'm opposed to is using bpf_link as a reason
> to block this series.
> 
> In fact, a way to implement the "human override" you mention, could be
> to reuse the mechanism implemented in this series: If the EXPECTED_FD
> passed via netlink is a bpf_link FD, that could be interpreted as an
> override by the kernel.

That's not a "human override". You want to use expected_fd in libxdp.
That's not human. That means any 'yum install firewall' will be nuking
the bpf_link and the careful orchestration of our libxdp.

As far as blocking cap_net_admin...
you mentioned that use case is to do:
sudo yum install firewall1
sudo yum install firewall2

when these packages are being installed they will invoke startup scripts
that will install their dispatcher progs on eth0.
Imagine firewall2 is not using the correct version of libxdp, or uses a buggy one.
All the good work from firewall1 goes down the drain.
Note in both cases you only need cap_net_admin to install the prog.
The packages will not be reconfiguring eth0. They need to be told
which interface to apply firewall to. That's all.
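The firewall1/firewall2 scenario can be replayed against a netlink-style attach with an optional expected program. In this hypothetical model (`struct eth_model` and `nl_attach()` are illustrative, and an `expected_id` of -1 stands for "no check requested"), a firewall2 built on a correct libxdp backs off when its view is stale, while a buggy one that omits the check silently replaces firewall1's dispatcher. The check only protects applications that opt in to it.

```c
/* Hypothetical model of eth0's XDP hook; 0 means nothing attached. */
struct eth_model {
    int prog_id;
};

/* Netlink-style attach. expected_id == -1 means the caller asked for no
 * check (legacy behaviour); otherwise the attach fails on a mismatch. */
static int nl_attach(struct eth_model *eth, int new_id, int expected_id)
{
    if (expected_id != -1 && eth->prog_id != expected_id)
        return -1; /* mismatch: a cooperative caller backs off here */
    eth->prog_id = new_id;
    return 0;
}
```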

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 16:12                                   ` David Ahern
  2020-03-27 20:10                                     ` Andrii Nakryiko
@ 2020-03-27 23:02                                     ` Alexei Starovoitov
  2020-03-30 15:25                                       ` Edward Cree
  1 sibling, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-27 23:02 UTC (permalink / raw)
  To: David Ahern
  Cc: Lorenz Bauer, Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 10:12:05AM -0600, David Ahern wrote:
> On 3/27/20 5:06 AM, Lorenz Bauer wrote:
> > However, this behaviour concerns me. It's like Windows not
> > letting you delete a file while an application has it opened, which just leads
> > to randomly killing programs until you find the right one. It's frustrating
> > and counterproductive.
> > 
> > You're taking power away from the operator. In your deployment scenario
> > this might make sense, but I think it's a really bad model in general. If I am
> > privileged I need to be able to exercise that privilege. This means that if
> > there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> > or whatever, I can break the association.
> > 
> > So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
> > vice versa. If you need to restrict control, use network namespaces
> > to hide the devices, instead of hiding the bpffs.
> 
> I had a thought yesterday along similar lines: bpf_link is about
> ownership and preventing "accidental" deletes. What's the observability
> w.r.t. learning who owns a program at a specific attach point, and can
> that ever be hidden?

Absolutely. All links should be visible somehow.
An IDR for links, with equivalent get_next_id and get_fd_from_id, will be available.
The mechanism for "human override" is TBD.
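A minimal user-space model of the get-next-ID walk mentioned above, assuming link IDs live in a sorted table: a tool starts from ID 0 and keeps asking for the next larger ID until there is none. The table, `link_get_next_id()` and `collect_link_ids()` are hypothetical names for illustration only, not the kernel or libbpf interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical sorted table of live link IDs, standing in for the kernel's IDR. */
static const unsigned int live_link_ids[] = { 3, 7, 12 };
static const size_t n_live = sizeof(live_link_ids) / sizeof(live_link_ids[0]);

/* Model of get_next_id: store the smallest live ID strictly greater than
 * start_id in *next_id and return 0, or return -ENOENT when there is none. */
static int link_get_next_id(unsigned int start_id, unsigned int *next_id)
{
	for (size_t i = 0; i < n_live; i++) {
		if (live_link_ids[i] > start_id) {
			*next_id = live_link_ids[i];
			return 0;
		}
	}
	return -ENOENT;
}

/* Walk every link starting from ID 0, as an inspection tool would.
 * Stores up to max IDs in out and returns how many were found. */
static size_t collect_link_ids(unsigned int *out, size_t max)
{
	unsigned int id = 0;
	size_t n = 0;

	while (n < max && link_get_next_id(id, &id) == 0)
		out[n++] = id;
	return n;
}
```

On this table the walk yields 3, 7 and 12; a real tool would then turn each ID into an fd (the get_fd_from_id step) and query what the link is attached to.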

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 11:06                                 ` Lorenz Bauer
                                                     ` (2 preceding siblings ...)
  2020-03-27 19:45                                   ` Andrii Nakryiko
@ 2020-03-27 23:09                                   ` Alexei Starovoitov
  3 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-27 23:09 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 11:06:59AM +0000, Lorenz Bauer wrote:
> 
> From your description I like bpf_link, because it'll make attachment easier
> to support, and the pinning behaviour also seems nice. I'm really not fussed
> by netlink vs syscall, whatever.
> 
> However, this behaviour concerns me. It's like Windows not
> letting you delete a file while an application has it opened, which just leads
> to randomly killing programs until you find the right one. It's frustrating
> and counterproductive.
> 
> You're taking power away from the operator. In your deployment scenario
> this might make sense, but I think it's a really bad model in general. If I am
> privileged I need to be able to exercise that privilege. This means that if
> there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> or whatever, I can break the association.

I think I read a lot of assumptions in the above statement that are not the case.
Let me clarify:
bpf_link will not freeze the netdev so that you cannot move it.
If you want to ifdown it, that's fine. It can go down.
If you want to move it to another netns, that's also fine. The bpf_link based attachment
will either become dangling or continue to exist in a different namespace.
That behavior is TBD.
If bpf_link was attached to a veth and you want to delete that veth, that's also
fine. bpf_link will surely be dangling at this point.

bpf_link is about preserving the ownership of the attachment of a program
to a netdev. I don't see how this is comparable with deletion of files in Windows.
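The ownership semantics described above can be modelled in a few lines of C. This is a sketch under stated assumptions: `struct xdp_hook`, the three functions and the specific error codes are hypothetical, chosen only to show that the link owner can update the program, a plain replacement is refused while the link exists, and removing the device is never blocked but leaves the link dangling.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one netdev's XDP hook; not kernel code. */
struct xdp_hook {
	int prog_id;       /* 0 means no program attached */
	int owner_link_id; /* 0 means no link owns the attachment */
	bool dev_alive;    /* false once the device has been deleted */
};

/* Plain (netlink-style) replacement: refused while a link owns the hook. */
static int hook_replace(struct xdp_hook *h, int new_prog_id)
{
	if (!h->dev_alive)
		return -ENODEV;
	if (h->owner_link_id)
		return -EBUSY; /* illustrative error code */
	h->prog_id = new_prog_id;
	return 0;
}

/* Update through the link: only the owning link may swap the program. */
static int link_update(struct xdp_hook *h, int link_id, int new_prog_id)
{
	if (!h->dev_alive)
		return -ENOLINK; /* device is gone, so the link is dangling */
	if (h->owner_link_id != link_id)
		return -EPERM;
	h->prog_id = new_prog_id;
	return 0;
}

/* Deleting the device is always allowed; any link simply dangles. */
static void dev_delete(struct xdp_hook *h)
{
	h->dev_alive = false;
	h->prog_id = 0;
}
```

The point of the model is that the restriction applies only to replacing the attached program, not to the device's own lifecycle.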

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-26 20:05                                 ` Edward Cree
  2020-03-27 11:09                                   ` Lorenz Bauer
@ 2020-03-27 23:11                                   ` Alexei Starovoitov
  1 sibling, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-27 23:11 UTC (permalink / raw)
  To: Edward Cree
  Cc: Jakub Kicinski, Andrii Nakryiko,
	Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Thu, Mar 26, 2020 at 1:06 PM Edward Cree <ecree@solarflare.com> wrote:
>
> On 26/03/2020 19:40, Alexei Starovoitov wrote:
> > At this point I don't believe in your good intent.
> > Your repeated attacks on BPF in every thread are out of control.
> > I kept ignoring your insults for a long time, but I cannot do this anymore.
> > Please find other threads to contribute your opinions.
> > They are not welcome here.
> Given that this clearly won't land in this cycle (and neither will bpf_link
>  for XDP), can I suggest that everyone involved steps back from the subject
>  for a few days to let tempers cool?  It's getting to the point where people
>  are burning bridges and saying things they might regret.
> I know everyone is under a lot of stress right now.

Same opinion a day later. No regrets.

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 22:54                                       ` Andrii Nakryiko
@ 2020-03-28  1:09                                         ` Toke Høiland-Jørgensen
  2020-03-28  1:44                                           ` Andrii Nakryiko
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-28  1:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Mar 27, 2020 at 3:17 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > Please stop dodging. Just like with "rest of the kernel", but really
>> > "just networking" from before.
>>
>> Look, if we can't have this conversation without throwing around
>> accusations of bad faith, I think it is best we just take Ed's advice
>> and leave it until after the merge window.
>>
>
> Toke, if me pointing out that you are dodging original discussion and
> pivoting offends you,

It does, because I'm not. See below.

> But if you are still with me, let's look at this particular part of
> discussion:
>
>>> >> For XDP there is already a unique handle, it's just implicit: Each
>>> >> netdev can have exactly one XDP program loaded. So I don't really see
>>> >> how bpf_link adds anything, other than another API for the same thing?
>>> >
>>> > I certainly failed to explain things clearly if you are still asking
>>> > this. See point #2, once you attach bpf_link you can't just replace
>>> > it. This is what XDP doesn't have right now.
>>>
>>> Those are two different things, though. I get that #2 is a new
>>> capability provided by bpf_link, I was just saying #1 isn't (for XDP).
>>
>> bpf_link is a combination of those different things... Independently
>> they are either impossible or insufficient. I'm not sure how that
>> doesn't answer your question:
>>
>>> So I don't really see
>>> how bpf_link adds anything, other than another API for the same thing?
>>
>> Please stop dodging. Just like with "rest of the kernel", but really
>> "just networking" from before.
>
> You said "So I don't really see how bpf_link adds anything, other than
> another API for the same thing?". I explained that bpf_link is not the
> same thing that exists already, thus it's not another API for the same
> thing. You picked one property of bpf_link and claimed it's the same
> as what XDP has right now. "I get that #2 is a new capability provided
> by bpf_link, I was just saying #1 isn't (for XDP)". So should I read
> that as if you are agreeing and your original objection is rescinded?
> If yes, then good, this part is concluded and I'm sorry if I
> misinterpreted your answer.

Yes, I do believe that was a misinterpretation. Basically, by my
paraphrasing, our argument goes something like this:

What you said was: "bpf_link adds three things: 1. unique attachment
identifier, 2. auto-detach and 3. preventing others from overriding it".

And I replied: "1. already exists for XDP, 2. I don't think is the right
behaviour for XDP, and 3. I don't see the point of - hence I don't
believe bpf_link adds anything useful for my use case"

I was not trying to cherry-pick any of the properties, and I do
understand that 2. and 3. are new properties; I just disagree about how
useful they are (and thus whether they are worth introducing another API
for).

> But if not, then you again are picking one property and just saying
> "but XDP has it" without considering all of bpf_link properties as a
> whole. In that case I do think you are arguing not in good faith.

I really don't see how you could read my emails and come to that
conclusion. But obviously you did, so I'll take that into consideration
and see if I can express myself more clearly in the future. But know this: I
never deliberately argue in bad faith; so even if it seems like I am,
please extend me the courtesy of assuming that this is due to either a
misunderstanding or an honest difference in opinion. I will try to do
the same for you.

> Simple as that. I also hope I don't have to go all the way back to
> "rest of the kernel", pivoted to "just networking" w.r.t.
> subsystem-specific configuration/attachment APIs to explain another
> reference.

Again, I was not trying to "pivot", or attempting to use rhetorical
tricks to "win" or anything like that. I was making the observation that
when two subsystems interact, it's quite natural that there will be
clashes between their different "traditions". And
that how you view the subsystems' relationship with each other obviously
affects your opinion of what the right thing to do is in such a
situation. I never meant to imply anything concrete about BPF in
anything other than a networking context. And again, I don't understand
how you could read that out of what I wrote, but I'll take the fact that
you did into consideration in the future.

> P.S. I don't know how merge window has anything to do with this whole
> discussion, honestly...

Nothing apart from the merge window being a conveniently delimited
period of time to step away from things and focus on something else.

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 23:00                                   ` Alexei Starovoitov
@ 2020-03-28  1:43                                     ` Toke Høiland-Jørgensen
  2020-03-28  2:26                                       ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-28  1:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Mar 27, 2020 at 01:06:46PM +0100, Toke Høiland-Jørgensen wrote:
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> 
>> > On Thu, Mar 26, 2020 at 01:35:13PM +0100, Toke Høiland-Jørgensen wrote:
>> >> 
>> >> Additionally, in the case where there is *not* a central management
>> >> daemon (i.e., what I'm implementing with libxdp), this would be the flow
>> >> implemented by the library without bpf_link:
>> >> 
>> >> 1. Query kernel for current BPF prog loaded on $IFACE
>> >> 2. Sanity-check that this program is a dispatcher program installed by
>> >>    libxdp
>> >> 3. Create a new dispatcher program with whatever changes we want to do
>> >>    (such as adding another component program).
>> >> 4. Atomically replace the old program with the new one using the netlink
>> >>    API in this patch series.
>> >
>> > in this model what stops another application that is not using libdispatcher from
>> > nuking the dispatcher program?
>> 
>> Nothing. But nothing is stopping it from issuing 'ip link down' either -
>> an application with CAP_NET_ADMIN is implicitly trusted to be
>> well-behaved. This patch series is just adding the kernel primitive that
>> enables applications to be well-behaved. I consider it an API bug-fix.
>
> I think what you're proposing is not a fix, but a band-aid.

Even if that were the case, I don't see how that is an argument for not
fixing the old API. I mean, it's not going away, so why not improve it,
even though we disagree whether that improvement will make it "not
broken" or "less broken"? I could understand why you wouldn't want to do
that if it was a huge and invasive change; but it really isn't...

> And from what I can read in this thread you remain unconvinced that
> you will hit exactly the same issues we're describing.

Yes, quite right :)

> We hit them already and you will hit them a year from now.
> Simply because fb usage of all parts of bpf is about 3-4 years ahead
> of everyone else.
> I'm trying to convince you that your libxdp will be in much better
> shape a year from now. It will be prepared for a situation when
> other libxdp clones exist and are trying to do the same.
> While you're saying:
> "let me shoot myself in the foot. I know what I'm doing. I'll be fine".

I'm not saying "let me shoot myself in the foot", I'm saying that the
protections you are talking about won't make any meaningful difference
for the amount of foot-shooting that will end up happening.

>> >> Whereas with bpf_link, it would be:
>> >> 
>> >> 1. Find the pinned bpf_link for $IFACE (e.g., load from
>> >>    /sys/fs/bpf/iface-links/$IFNAME).
>> >> 2. Query kernel for current BPF prog linked to $LINK
>> >> 3. Sanity-check that this program is a dispatcher program installed by
>> >>    libxdp
>> >> 4. Create a new dispatcher program with whatever changes we want to do
>> >>    (such as adding another component program).
>> >> 5. Atomically replace the old program with the new one using the
>> >>    LINK_UPDATE bpf() API.
>> >
>> > whereas here dispatcher program is only accessible to libdispatcher.
>> > Instance of bpffs needs to be known to libdispatcher only.
>> > That's the ownership I've been talking about.
>> >
>> > As discussed earlier, we need a way for a _human_ to nuke the dispatcher program,
>> > but such an API shouldn't be usable from within an application/task.
>> 
>> As long as there is this kind of override in place, I'm not actually
>> fundamentally opposed to the concept of bpf_link for XDP, as an
>> additional mechanism. What I'm opposed to is using bpf_link as a reason
>> to block this series.
>> 
>> In fact, a way to implement the "human override" you mention, could be
>> to reuse the mechanism implemented in this series: If the EXPECTED_FD
>> passed via netlink is a bpf_link FD, that could be interpreted as an
>> override by the kernel.
>
> That's not "human override". You want to use expected_fd in libxdp.
> That's not human. That's any 'yum install firewall' nuking
> the bpf_link and the careful orchestration of our libxdp.

No, I was certainly not planning to use that to teach libxdp to just
nuke any bpf_link it finds attached to an interface. Quite the contrary,
the point of this series is to allow libxdp to *avoid* replacing
something on the interface that it didn't put there itself.
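Replacing only what libxdp expects to find amounts to a compare-and-swap on the interface's attached program. A minimal sketch, assuming a single atomic program ID stands in for the kernel's per-device state (the series itself passes an expected fd over netlink and the kernel does the comparison); `attached_prog_id`, `replace_if_expected()` and the `-EEXIST` return value are illustrative, not the patch's code.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the program ID attached to one interface
 * (0 = none). This models kernel state; it is not a real API. */
static _Atomic unsigned int attached_prog_id;

/* Swap in new_id only if the attached program is still expected_id.
 * On mismatch nothing changes and the caller gets an error, instead of
 * silently overwriting a program someone else installed meanwhile. */
static int replace_if_expected(unsigned int expected_id, unsigned int new_id)
{
	unsigned int cur = expected_id;

	if (atomic_compare_exchange_strong(&attached_prog_id, &cur, new_id))
		return 0;
	return -EEXIST; /* illustrative error code */
}
```

If two installers both read ID 5 and then try to swap in their own programs, the first succeeds; the second gets an error and can re-read the current state rather than discarding the first installer's work.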

> As far as blocking cap_net_admin...
> you mentioned that use case is to do:
> sudo yum install firewall1
> sudo yum install firewall2
>
> when these packages are being installed they will invoke startup scripts
> that will install their dispatcher progs on eth0.
> Imagine firewall2 is not using the correct version of libxdp, or is using a buggy one.
> All the good work from firewall1 went down the drain.

With a pinned bpf_link, both applications will get the link fd from the
same place, so if firewall2 (or its version of libxdp) is buggy, surely
it can interfere with firewall1 just as much as it could if they were
both using the netlink API, no?

-Toke


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28  1:09                                         ` Toke Høiland-Jørgensen
@ 2020-03-28  1:44                                           ` Andrii Nakryiko
  2020-03-28 19:43                                             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-28  1:44 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 6:10 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Fri, Mar 27, 2020 at 3:17 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > Please stop dodging. Just like with "rest of the kernel", but really
> >> > "just networking" from before.
> >>
> >> Look, if we can't have this conversation without throwing around
> >> accusations of bad faith, I think it is best we just take Ed's advice
> >> and leave it until after the merge window.
> >>
> >
> > Toke, if me pointing out that you are dodging original discussion and
> > pivoting offends you,
>
> It does, because I'm not. See below.
>
> > But if you are still with me, let's look at this particular part of
> > discussion:
> >
> >>> >> For XDP there is already a unique handle, it's just implicit: Each
> >>> >> netdev can have exactly one XDP program loaded. So I don't really see
> >>> >> how bpf_link adds anything, other than another API for the same thing?
> >>> >
> >>> > I certainly failed to explain things clearly if you are still asking
> >>> > this. See point #2, once you attach bpf_link you can't just replace
> >>> > it. This is what XDP doesn't have right now.
> >>>
> >>> Those are two different things, though. I get that #2 is a new
> >>> capability provided by bpf_link, I was just saying #1 isn't (for XDP).
> >>
> >> bpf_link is a combination of those different things... Independently
> >> they are either impossible or insufficient. I'm not sure how that
> >> doesn't answer your question:
> >>
> >>> So I don't really see
> >>> how bpf_link adds anything, other than another API for the same thing?
> >>
> >> Please stop dodging. Just like with "rest of the kernel", but really
> >> "just networking" from before.
> >
> > You said "So I don't really see how bpf_link adds anything, other than
> > another API for the same thing?". I explained that bpf_link is not the
> > same thing that exists already, thus it's not another API for the same
> > thing. You picked one property of bpf_link and claimed it's the same
> > as what XDP has right now. "I get that #2 is a new capability provided
> > by bpf_link, I was just saying #1 isn't (for XDP)". So should I read
> > that as if you are agreeing and your original objection is rescinded?
> > If yes, then good, this part is concluded and I'm sorry if I
> > misinterpreted your answer.
>
> Yes, I do believe that was a misinterpretation. Basically, by my
> paraphrasing, our argument goes something like this:
>
> What you said was: "bpf_link adds three things: 1. unique attachment
> identifier, 2. auto-detach and 3. preventing others from overriding it".
>
> And I replied: "1. already exists for XDP, 2. I don't think is the right
> behaviour for XDP, and 3. I don't see the point of - hence I don't
> believe bpf_link adds anything useful for my use case"
>
> I was not trying to cherry-pick any of the properties, and I do
> understand that 2. and 3. are new properties; I just disagree about how
> useful they are (and thus whether they are worth introducing another API
> for).
>

I appreciate you summarizing. It makes everything clearer. I also
don't have much to add after so many rounds.

> > But if not, then you again are picking one property and just saying
> > "but XDP has it" without considering all of bpf_link properties as a
> > whole. In that case I do think you are arguing not in good faith.
>
> I really don't see how you could read my emails and come to that
> conclusion. But obviously you did, so I'll take that into consideration
> and see if I can express myself more clearly in the future. But know this: I
> never deliberately argue in bad faith; so even if it seems like I am,
> please extend me the courtesy of assuming that this is due to either a
> misunderstanding or an honest difference in opinion. I will try to do
> the same for you.

I guess me citing your previous replies and pointing out
inconsistencies (at least from my interpretation of them) should have
been a signal ;) But I do assume good faith to the extent possible,
which is why we are still here at almost 80 emails in.

>
> > Simple as that. I also hope I don't have to go all the way back to
> > "rest of the kernel", pivoted to "just networking" w.r.t.
> > subsystem-specific configuration/attachment APIs to explain another
> > reference.
>
> Again, I was not trying to "pivot", or attempting to use rhetorical
> tricks to "win" or anything like that. I was making the observation that
> when two subsystems interact, it's quite natural that there will be
> clashes between their different "traditions". And
> that how you view the subsystems' relationship with each other obviously
> affects your opinion of what the right thing to do is in such a
> situation. I never meant to imply anything concrete about BPF in
> anything other than a networking context. And again, I don't understand
> how you could read that out of what I wrote, but I'll take the fact that
> you did into consideration in the future.

Because "rest of the kernel" meant "cgroup subsystem" as well, which
was clearly not the case w.r.t. BPF. But alright, water under the
bridge, let's just not use generalizations too much going forward.

> > P.S. I don't know how merge window has anything to do with this whole
> > discussion, honestly...
>
> Nothing apart from the merge window being a conveniently delimited
> period of time to step away from things and focus on something else.
>
> -Toke
>

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28  1:43                                     ` Toke Høiland-Jørgensen
@ 2020-03-28  2:26                                       ` Alexei Starovoitov
  2020-03-28 19:34                                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-28  2:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
> 
> No, I was certainly not planning to use that to teach libxdp to just
> nuke any bpf_link it finds attached to an interface. Quite the contrary,
> the point of this series is to allow libxdp to *avoid* replacing
> something on the interface that it didn't put there itself.

Exactly! "that it didn't put there itself".
How are you going to do that?
I really hope you thought it through and came up with magic.
Because I tried and couldn't figure out how to do that with IFLA_XDP*
Please walk me step by step how do you think it's possible.

I'm saying that without bpf_link for xdp libxdp has no ability to identify
an attachment that is theirs.

I suspect what is happening is that you found the first missing kernel feature
while implementing libxdp and are trying to fix it by extending the kernel API.
Well, the reason libxdp is not part of libbpf is for it to be flexible
in design and have an unstable API.
But you're using this unstable project as the reason to add stable APIs
both to the kernel and libbpf. I don't think that's workable because...

> I could understand why you wouldn't want to do
> that if it was a huge and invasive change; but it really isn't...

Yes. It's a small API extension to both kernel and libbpf.
But it means that by accepting this small change I sign up to maintain it
forever. And I can see a second and third such small experimental change
coming in the future. All such design revisions of libxdp will end up on my
plate to support forever in the kernel and in libbpf. I'm not excited to
support all of this experimental code.

I see two ways out of this stalemate:
1. assume that the replace_fd extension landed and develop libxdp further
   into a fully fledged library. Maybe not a complete library, but at least
   for a few more weeks. If you then still think replace_fd is enough,
   I'll land it.
2. I can land replace_fd now, but please don't be surprised if
   I revert it several weeks from now when it's clear that
   it's not enough.
 
Which one do you prefer?

* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28  2:26                                       ` Alexei Starovoitov
@ 2020-03-28 19:34                                         ` Toke Høiland-Jørgensen
  2020-03-28 23:35                                           ` Alexei Starovoitov
  2020-03-29 20:23                                           ` Andrii Nakryiko
  0 siblings, 2 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-28 19:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
>> 
>> No, I was certainly not planning to use that to teach libxdp to just
>> nuke any bpf_link it finds attached to an interface. Quite the contrary,
>> the point of this series is to allow libxdp to *avoid* replacing
>> something on the interface that it didn't put there itself.
>
> Exactly! "that it didn't put there itself".
> How are you going to do that?
> I really hope you thought it through and came up with magic.
> Because I tried and couldn't figure out how to do that with IFLA_XDP*
> Please walk me step by step how do you think it's possible.

I'm inspecting the BPF program itself to make sure it's compatible.
Specifically, I'm embedding a piece of metadata into the program BTF,
using Andrii's encoding trick that we also use for defining maps. So
xdp-dispatcher.c contains this[0]:

__uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);

and libxdp will refuse to touch any program that it finds loaded on an
iface which doesn't have this, or which has a version number that is
higher than what the library understands. The code implementing the
check itself is this[1]:

static int check_dispatcher_version(struct btf *btf)
{
	const char *name = "dispatcher_version";
	const struct btf_type *sec, *def;
	__u32 version;

	/* Find the metadata section in the program's BTF. */
	sec = btf_get_datasec(btf, XDP_METADATA_SECTION);
	if (!sec)
		return -ENOENT;

	/* __uint() encodes the version as a pointer-to-array variable. */
	def = btf_get_section_var(btf, sec, name, BTF_KIND_PTR);
	if (IS_ERR(def))
		return PTR_ERR(def);

	/* Decode the integer value carried by that variable's type. */
	if (!get_field_int(btf, name, def, &version))
		return -ENOENT;

	/* Refuse to touch dispatchers newer than this library understands. */
	if (version > XDP_DISPATCHER_VERSION) {
		pr_warn("XDP dispatcher version %u higher than supported %d\n",
			version, XDP_DISPATCHER_VERSION);
		return -EOPNOTSUPP;
	}
	pr_debug("Verified XDP dispatcher version %u <= %d\n",
		 version, XDP_DISPATCHER_VERSION);
	return 0;
}

and is called both when loading the BPF object code from disk, and
before operating on a program already loaded into the kernel.
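The encoding trick mentioned above can be reproduced in plain C. The `__uint()` macro below is the same definition libbpf's `bpf_helpers.h` uses; `XDP_DISPATCHER_VERSION` is given an illustrative value, the `SEC()` annotation is omitted because this is not a BPF object, and `encoded_version()` is a hypothetical helper that recovers the number with `sizeof`, which is where the value lives (as the array's element count) when it is read back from BTF.

```c
/* Same definition as libbpf's bpf_helpers.h: the value becomes the
 * dimension of the array that the declared pointer points to. */
#define __uint(name, val) int (*name)[val]

#define XDP_DISPATCHER_VERSION 1 /* illustrative value */

/* Expands to: static int (*dispatcher_version)[1];
 * The pointer itself is never set; the version exists only in its type. */
static __uint(dispatcher_version, XDP_DISPATCHER_VERSION);

/* Recover the encoded number from the type alone. sizeof is evaluated at
 * compile time here, so the null pointer is never dereferenced. */
static unsigned int encoded_version(void)
{
	return (unsigned int)(sizeof(*dispatcher_version) / sizeof(int));
}
```

Because the value is carried by the type rather than by data, it survives into the program's BTF without occupying any meaningful storage.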

> I'm saying that without bpf_link for xdp libxdp has no ability to
> identify an attachment that is theirs.

Ah, so *that* was what you meant with "unique attachment". It never
occurred to me that answering this question ("is it my program?") was to
be a feature of bpf_link; I always assumed that would be a property of
the bpf_prog itself.

Any reason what I'm describing above wouldn't work for you?

> I suspect what is happening is that you found the first missing kernel feature
> while implementing libxdp and are trying to fix it by extending the kernel API.
> Well, the reason libxdp is not part of libbpf is for it to be flexible
> in design and have an unstable API.
> But you're using this unstable project as the reason to add stable APIs
> both to the kernel and libbpf. I don't think that's workable because...

That's certainly not my intention. I have done my best to think through
the minimum amount of kernel support I need to implement the
libxdp multi-prog feature set. When the initial freplace support landed
there were three things missing:

1. Ability to make freplace attachments permanent
2. Atomic replace of XDP programs
3. Multi-attach for freplace

Andrii already solved 1. with pinning, this is my attempt to solve 2.,
and 3. is TBD.

>> I could understand why you wouldn't want to do
>> that if it was a huge and invasive change; but it really isn't...
>
> Yes. It's a small API extension to both kernel and libbpf.
> But it means that by accepting this small change I sign up to maintain it
> forever. And I can see a second and third such small experimental change
> coming in the future. All such design revisions of libxdp will end up on my
> plate to support forever in the kernel and in libbpf. I'm not excited to
> support all of this experimental code.

I understand that, but as I said it's really not my intention to just
dump experimental code on you. And I also do consider this an obvious
API bugfix that is useful in its own right.

> I see two ways out of this stalemate:
> 1. assume that the replace_fd extension landed and develop libxdp further
>    into a fully fledged library. Maybe not a complete library, but at least
>    for a few more weeks. If you then still think replace_fd is enough,
>    I'll land it.
> 2. I can land replace_fd now, but please don't be surprised if
>    I revert it several weeks from now when it's clear that
>    it's not enough.
>  
> Which one do you prefer?

I prefer 2. Reverting if it does turn out that I'm wrong is fine. Heck,
in that case I'll even send the revert myself :)

-Toke

[0] https://github.com/xdp-project/xdp-tools/blob/xdp-multi-prog/lib/libxdp/xdp-dispatcher.c.in#L61
[1] https://github.com/xdp-project/xdp-tools/blob/xdp-multi-prog/lib/libxdp/libxdp.c#L824


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28  1:44                                           ` Andrii Nakryiko
@ 2020-03-28 19:43                                             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-28 19:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Mar 27, 2020 at 6:10 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Fri, Mar 27, 2020 at 3:17 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > Please stop dodging. Just like with "rest of the kernel", but really
>> >> > "just networking" from before.
>> >>
>> >> Look, if we can't have this conversation without throwing around
>> >> accusations of bad faith, I think it is best we just take Ed's advice
>> >> and leave it until after the merge window.
>> >>
>> >
>> > Toke, if me pointing out that you are dodging original discussion and
>> > pivoting offends you,
>>
>> It does, because I'm not. See below.
>>
>> > But if you are still with me, let's look at this particular part of
>> > discussion:
>> >
>> >>> >> For XDP there is already a unique handle, it's just implicit: Each
>> >>> >> netdev can have exactly one XDP program loaded. So I don't really see
>> >>> >> how bpf_link adds anything, other than another API for the same thing?
>> >>> >
>> >>> > I certainly failed to explain things clearly if you are still asking
>> >>> > this. See point #2, once you attach bpf_link you can't just replace
>> >>> > it. This is what XDP doesn't have right now.
>> >>>
>> >>> Those are two different things, though. I get that #2 is a new
>> >>> capability provided by bpf_link, I was just saying #1 isn't (for XDP).
>> >>
>> >> bpf_link is combination of those different things... Independently
>> >> they are either impossible or insufficient. I'm not sure how that
>> >> doesn't answer your question:
>> >>
>> >>> So I don't really see
>> >>> how bpf_link adds anything, other than another API for the same thing?
>> >>
>> >> Please stop dodging. Just like with "rest of the kernel", but really
>> >> "just networking" from before.
>> >
>> > You said "So I don't really see how bpf_link adds anything, other than
>> > another API for the same thing?". I explained that bpf_link is not the
>> > same thing that exists already, thus it's not another API for the same
>> > thing. You picked one property of bpf_link and claimed it's the same
>> > as what XDP has right now. "I get that #2 is a new capability provided
>> > by bpf_link, I was just saying #1 isn't (for XDP)". So should I read
>> > that as if you are agreeing and your original objection is rescinded?
>> > If yes, then good, this part is concluded and I'm sorry if I
>> > misinterpreted your answer.
>>
>> Yes, I do believe that was a misinterpretation. Basically, by my
>> paraphrasing, our argument goes something like this:
>>
>> What you said was: "bpf_link adds three things: 1. unique attachment
>> identifier, 2. auto-detach and 3. preventing others from overriding it".
>>
>> And I replied: "1. already exists for XDP, 2. I don't think is the right
>> behaviour for XDP, and 3. I don't see the point of - hence I don't
>> believe bpf_link adds anything useful for my use case"
>>
>> I was not trying to cherry-pick any of the properties, and I do
>> understand that 2. and 3. are new properties; I just disagree about how
>> useful they are (and thus whether they are worth introducing another API
>> for).
>>
>
> I appreciate you summarizing. It makes everything clearer. I also
> don't have much to add after so many rounds.

Right, great, let's leave this here, then :)

>> > But if not, then you again are picking one property and just saying
>> > "but XDP has it" without considering all of bpf_link properties as a
>> > whole. In that case I do think you are arguing not in good faith.
>>
>> I really don't see how you could read my emails and come to that
>> conclusion. But obviously you did, so I'll take that into consideration
>> and see if I can express myself more clearly in the future. But know this: I
>> never deliberately argue in bad faith; so even if it seems like I am,
>> please extend me the courtesy of assuming that this is due to either a
>> misunderstanding or an honest difference in opinion. I will try to do
>> the same for you.
>
> I guess me citing your previous replies and pointing out to
> inconsistencies (at least from my interpretation of them) should have
> been a signal ;)

Well, it was my impression that we were making progress on this; which
is why I got so offended when I suddenly felt myself being accused :/

> But I do assume good faith to the extent possible, which is why we are
> still here at almost 80 emails in.

Great, thank you! And yeah, those emails did stack up, didn't they? I do
think we've made some progress, though, miscommunication and all :)

>> > Simple as that. I also hope I don't have to go all the way back to
>> > "rest of the kernel", pivoted to "just networking" w.r.t.
>> > subsystem-specific configuration/attachment APIs to explain another
>> > reference.
>>
>> Again, I was not trying to "pivot", or attempting to use rhetorical
>> tricks to "win" or anything like that. I was making an observation about
>> how, when two subsystems interact, it's quite natural that there will
>> be clashes between their different "traditions". And
>> that how you view the subsystems' relationship with each other obviously
>> affects your opinion of what the right thing to do is in such a
>> situation. I never meant to imply anything concrete about BPF in
>> anything other than a networking context. And again, I don't understand
>> how you could read that out of what I wrote, but I'll take the fact that
>> you did into consideration in the future.
>
> Because "rest of the kernel" meant "cgroup subsystem" as well, which
> was clearly not the case w.r.t. BPF. But alright, water under the
> bridge, let's just not use generalizations too much going forward.

Sure, sounds good.

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28 19:34                                         ` Toke Høiland-Jørgensen
@ 2020-03-28 23:35                                           ` Alexei Starovoitov
  2020-03-29 10:39                                             ` Toke Høiland-Jørgensen
  2020-03-29 20:23                                           ` Andrii Nakryiko
  1 sibling, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-28 23:35 UTC
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Sat, Mar 28, 2020 at 08:34:12PM +0100, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
> >> 
> >> No, I was certainly not planning to use that to teach libxdp to just
> >> nuke any bpf_link it finds attached to an interface. Quite the contrary,
> >> the point of this series is to allow libxdp to *avoid* replacing
> >> something on the interface that it didn't put there itself.
> >
> > Exactly! "that it didn't put there itself".
> > How are you going to do that?
> > I really hope you thought it through and came up with magic.
> > Because I tried and couldn't figure out how to do that with IFLA_XDP*
> > Please walk me step by step how do you think it's possible.
> 
> I'm inspecting the BPF program itself to make sure it's compatible.
> Specifically, I'm embedding a piece of metadata into the program BTF,
> using Andrii's encoding trick that we also use for defining maps. So
> xdp-dispatcher.c contains this[0]:
> 
> __uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);
> 
> and libxdp will refuse to touch any program that it finds loaded on an
> iface which doesn't have this, or which has a version number that is
> higher than what the library understands.

so libxdp will do:
ifindex -> id of currently attached prog -> fd -> prog_info -> btf -> read map
-> find "dispatcher_version"
and then it will do replace_fd with new version of the dispatcher ?
I see how this approach helps the second set of races (from fd into "dispatcher_version")
when another libxdp is doing the same.
But there is still a race in query->id->fd. Much smaller though.
In that sense replace_fd is a better behaved prog replacement than
just calling bpf_set_link_xdp_fd() without XDP_FLAGS_UPDATE_IF_NOEXIST.
But not much. The libxdp doesn't own the attachment.
If replace_fd fails what libxdp is going to do?
Try the whole thing from the beginning?
ifindex -> id2 -> fd2 ...
Say it succeeded.
But the libxdp1 that won the first race has no clue that libxdp2
retried and there is a different dispatcher prog there.
So you'll add netlink notifiers for libxdp to watch?
That would mean that some user space process has to be always running
while a typical firewall doesn't need any user space. The firewall.rpm can
install its prog with all firewall rules, permanently link it to
the interface and exit.
But let's continue. Is a single libxdp daemon now waiting for notifications,
or are both libxdp1 and libxdp2, as part of two firewalls that are
being 'yum installed', waiting for notifications?
How is the fight between libxdp1 and libxdp2, each installing what it wants,
going to be resolved?
If their versions are the same I think they will settle quickly,
since both libraries will see a dispatcher prog with the expected version number, right?
What if the versions are different? Is the older or the newer libxdp supposed to give up?
If libxdp2 is newer it will still be able to use older dispatcher prog
that was installed by libxdp1, but it would need to disable all new
user facing library features?

I guess all that is acceptable behavior to some libxdp users.

> > I'm saying that without bpf_link for xdp libxdp has no ability to
> > identify an attachment that is theirs.
> 
> Ah, so *that* was what you meant with "unique attachment". It never
> occurred to me that answering this question ("is it my program?") was to
> be a feature of bpf_link; I always assumed that would be a property of
> the bpf_prog itself.
> 
> Any reason what I'm describing above wouldn't work for you?

I don't see how this is even apples to apples comparison.
Racy query via id with sort-of "atomic" replacement and no ownership
vs guaranteed attachment with exact ownership and no races.

> > I see two ways out of this stalemate:
> > 1. assume that replace_fd extension landed and develop libxdp further
> >    into a fully fledged library. Maybe not a complete library, but at least
> >    for a few more weeks. If you then still think replace_fd is enough
> >    I'll land it.
> > 2. I can land replace_fd now, but please don't be surprised that
> >    I will revert it several weeks from now when it's clear that
> >    it's not enough.
> >  
> > Which one do you prefer?
> 
> I prefer 2. Reverting if it does turn out that I'm wrong is fine. Heck,
> in that case I'll even send the revert myself :)

Ok. Applied.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28 23:35                                           ` Alexei Starovoitov
@ 2020-03-29 10:39                                             ` Toke Høiland-Jørgensen
  2020-03-29 19:26                                               ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-29 10:39 UTC
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Mar 28, 2020 at 08:34:12PM +0100, Toke Høiland-Jørgensen wrote:
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> 
>> > On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
>> >> 
>> >> No, I was certainly not planning to use that to teach libxdp to just
>> >> nuke any bpf_link it finds attached to an interface. Quite the contrary,
>> >> the point of this series is to allow libxdp to *avoid* replacing
>> >> something on the interface that it didn't put there itself.
>> >
>> > Exactly! "that it didn't put there itself".
>> > How are you going to do that?
>> > I really hope you thought it through and came up with magic.
>> > Because I tried and couldn't figure out how to do that with IFLA_XDP*
>> > Please walk me step by step how do you think it's possible.
>> 
>> I'm inspecting the BPF program itself to make sure it's compatible.
>> Specifically, I'm embedding a piece of metadata into the program BTF,
>> using Andrii's encoding trick that we also use for defining maps. So
>> xdp-dispatcher.c contains this[0]:
>> 
>> __uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);
>> 
>> and libxdp will refuse to touch any program that it finds loaded on an
>> iface which doesn't have this, or which has a version number that is
>> higher than what the library understands.
>
> so libxdp will do:
> ifindex -> id of currently attached prog -> fd -> prog_info -> btf -> read map
> -> find "dispatcher_version"
> and then it will do replace_fd with new version of the dispatcher ?
> I see how this approach helps the second set of races (from fd into "dispatcher_version")
> when another libxdp is doing the same.
> But there is still a race in query->id->fd. Much smaller though.

You mean the program can disappear before the ID can be turned into an
fd? Yeah, I guess that can happen, but that can just be treated as a
failure that triggers the retry logic.

> In that sense replace_fd is a better behaved prog replacement than
> just calling bpf_set_link_xdp_fd() without XDP_FLAGS_UPDATE_IF_NOEXIST.
> But not much. The libxdp doesn't own the attachment.
> If replace_fd fails what libxdp is going to do?
> Try the whole thing from the beginning?
> ifindex -> id2 -> fd2 ...

Yes, this is predicated on a "retry on failure" logic.

> Say it succeeded.
> But the libxdp1 that won the first race has no clue that libxdp2
> retried and there is a different dispatcher prog there.
> So you'll add netlink notifiers for libxdp to watch ?

No, the idea is that the dispatchers are compatible. So app1 installs
dispatcher1 with sequence (prog1), then app2 installs dispatcher2 with
sequence (prog1,prog2) - or (prog2,prog1) depending on ordering.
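The "compatible dispatchers" idea can be sketched as a merge step: accept the attached program only if it carries dispatcher metadata with a version this library understands, then emit a new dispatcher containing the old program list plus the new program. This assumes the version and program list were already recovered from the attached program (the BTF and `dispatcher_version` parsing is not shown). `struct dispatcher_model`, `merge_dispatcher()` and the 10-slot limit are hypothetical, and simple appending stands in for whatever ordering libxdp actually applies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PROGS 10 /* arbitrary slot limit for this sketch */

/* What libxdp would recover from the attached program's metadata. */
struct dispatcher_model {
	int has_version;   /* 0 = not a dispatcher we recognise */
	uint32_t version;  /* models dispatcher_version */
	size_t n_progs;
	uint32_t progs[MODEL_MAX_PROGS];
};

/*
 * Build the replacement dispatcher: keep every program from the current one
 * and append new_prog. cur == NULL means nothing is attached yet.
 * Returns 0, -EPERM (foreign or too-new program: hands off) or -E2BIG.
 * out is only written on success.
 */
static int merge_dispatcher(const struct dispatcher_model *cur,
			    uint32_t new_prog, uint32_t lib_version,
			    struct dispatcher_model *out)
{
	size_t n = 0;

	assert(out != NULL && out != cur);

	if (cur) {
		/* Not ours, or written by a newer libxdp: refuse to touch it. */
		if (!cur->has_version || cur->version > lib_version)
			return -EPERM;
		if (cur->n_progs >= MODEL_MAX_PROGS)
			return -E2BIG;
		for (size_t i = 0; i < cur->n_progs; i++)
			out->progs[n++] = cur->progs[i];
	}
	out->progs[n++] = new_prog;
	out->n_progs = n;
	out->has_version = 1;
	out->version = lib_version; /* we emit our own dispatcher format */
	return 0;
}
```

The result would then be installed with the expected-program replace, naming the dispatcher that was inspected, so that a concurrent change sends the caller back to re-read and re-merge.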

> That would mean that some user space process has to be always running
> while a typical firewall doesn't need any user space. The firewall.rpm can
> install its prog with all firewall rules, permanently link it to
> the interface and exit.
> But let's continue. Is a single libxdp daemon now waiting for notifications,
> or are both libxdp1 and libxdp2, as part of two firewalls that are
> being 'yum installed', waiting for notifications?
> How is the fight between libxdp1 and libxdp2, each installing what it wants,
> going to be resolved?
> If their versions are the same I think they will settle quickly,
> since both libraries will see a dispatcher prog with the expected version number, right?
> What if the versions are different? Is the older or the newer libxdp supposed to give up?
> If libxdp2 is newer it will still be able to use older dispatcher prog
> that was installed by libxdp1, but it would need to disable all new
> user facing library features?

It will depend on what changes between versions, I guess. But yeah, I
don't think we can completely rule out that a "compatibility mode" may
be necessary at some point. This is orthogonal to how the programs are
being attached, though.

> I guess all that is acceptable behavior to some libxdp users.

I believe so.

>> > I'm saying that without bpf_link for xdp libxdp has no ability to
>> > identify an attachment that is theirs.
>> 
>> Ah, so *that* was what you meant with "unique attachment". It never
>> occurred to me that answering this question ("is it my program?") was to
>> be a feature of bpf_link; I always assumed that would be a property of
>> the bpf_prog itself.
>> 
>> Any reason what I'm describing above wouldn't work for you?
>
> I don't see how this is even apples to apples comparison.
> Racy query via id with sort-of "atomic" replacement and no ownership
> vs guaranteed attachment with exact ownership and no races.

No, I guess in your "management daemon" case the kernel-enforced
exclusivity does come in handy. And as I said, I can live with there
being two APIs as long as there's a reasonable way to override the
bpf_link "lock" :)

>> > I see two ways out of this stalemate:
>> > 1. assume that replace_fd extension landed and develop libxdp further
>> >    into a fully fledged library. Maybe not a complete library, but at least
>> >    for a few more weeks. If you then still think replace_fd is enough
>> >    I'll land it.
>> > 2. I can land replace_fd now, but please don't be surprised that
>> >    I will revert it several weeks from now when it's clear that
>> >    it's not enough.
>> >  
>> > Which one do you prefer?
>> 
>> I prefer 2. Reverting if it does turn out that I'm wrong is fine. Heck,
>> in that case I'll even send the revert myself :)
>
> Ok. Applied.

Great, thanks!

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-29 10:39                                             ` Toke Høiland-Jørgensen
@ 2020-03-29 19:26                                               ` Alexei Starovoitov
  2020-03-30 10:19                                                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-29 19:26 UTC
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Sun, Mar 29, 2020 at 12:39:21PM +0200, Toke Høiland-Jørgensen wrote:
> 
> > I guess all that is acceptable behavior to some libxdp users.
> 
> I believe so.

Not for us. Sadly that's where we part ways. We will not be using your libxdp.
Existing xdp api was barely usable in the datacenter environment. replace_fd
makes no difference.

> exclusivity does come in handy. And as I said, I can live with there
> being two APIs as long as there's a reasonable way to override the
> bpf_link "lock" :)

I explained many times already that bpf_link for XDP is NOT a second api to do
the same thing. I understand that you think it's a second api, but when you
keep repeating 'second api' it makes other folks (who also don't understand the
difference) to make wrong conclusions that they can use either to achieve the
same thing. They cannot. And it makes my job explaining harder. So please drop
'second api' narrative.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-28 19:34                                         ` Toke Høiland-Jørgensen
  2020-03-28 23:35                                           ` Alexei Starovoitov
@ 2020-03-29 20:23                                           ` Andrii Nakryiko
  2020-03-30 13:53                                             ` Toke Høiland-Jørgensen
  2020-03-30 15:41                                             ` Edward Cree
  1 sibling, 2 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-29 20:23 UTC
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Sat, Mar 28, 2020 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
> >>
> >> No, I was certainly not planning to use that to teach libxdp to just
> >> nuke any bpf_link it finds attached to an interface. Quite the contrary,
> >> the point of this series is to allow libxdp to *avoid* replacing
> >> something on the interface that it didn't put there itself.
> >
> > Exactly! "that it didn't put there itself".
> > How are you going to do that?
> > I really hope you thought it through and came up with magic.
> > Because I tried and couldn't figure out how to do that with IFLA_XDP*
> > Please walk me step by step how do you think it's possible.
>
> I'm inspecting the BPF program itself to make sure it's compatible.
> Specifically, I'm embedding a piece of metadata into the program BTF,
> using Andrii's encoding trick that we also use for defining maps. So
> xdp-dispatcher.c contains this[0]:
>
> __uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);
>
> and libxdp will refuse to touch any program that it finds loaded on an

But you can't say the same about other XDP applications that do not
use libxdp. So will your library come with a huge warning, e.g.:

WARNING! ANY XDP APPLICATION YOU INSTALL ON YOUR MACHINE THAT DOESN'T
USE LIBXDP WILL BREAK YOUR FIREWALLS/ROUTERS/OTHER LIBXDP
APPLICATIONS. USE AT YOUR OWN RISK.

So you install your libxdp-based firewalls and are happy. Then you
decide to install this awesome packet analyzer, which doesn't know
about libxdp yet. Suddenly, you have the packet analyzer, but no more
firewall, until the user somehow notices that it's gone. Or the firewall
periodically checks that it's still running. Both not great, IMO, but
might be acceptable for some users, I guess. But imagine all the
confusion for user, especially if he doesn't give a damn about XDP and
other buzzwords, but only needs a reliable firewall :)

[...]
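For comparison, the "firewall periodically checks that it's still running" fallback amounts to polling the attached program ID (for example with libbpf's `bpf_get_link_xdp_id()`) and classifying the result. A minimal sketch of the classification step follows; `enum xdp_watch_status` and `xdp_watch_check()` are hypothetical names, and the polling itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

enum xdp_watch_status {
	XDP_WATCH_OK,       /* our program is still the attached one */
	XDP_WATCH_DETACHED, /* the slot is empty */
	XDP_WATCH_REPLACED, /* someone else's program is attached */
};

/* attached_id: what a query of the interface returned (0 = none). */
static enum xdp_watch_status xdp_watch_check(uint32_t attached_id,
					     uint32_t own_id)
{
	assert(own_id != 0);
	if (attached_id == own_id)
		return XDP_WATCH_OK;
	return attached_id == 0 ? XDP_WATCH_DETACHED : XDP_WATCH_REPLACED;
}
```

Such a check only detects the breakage after the fact; it cannot stop another application from replacing the program in the first place.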


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-29 19:26                                               ` Alexei Starovoitov
@ 2020-03-30 10:19                                                 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-30 10:19 UTC
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sun, Mar 29, 2020 at 12:39:21PM +0200, Toke Høiland-Jørgensen wrote:
>> 
>> > I guess all that is acceptable behavior to some libxdp users.
>> 
>> I believe so.
>
> Not for us. Sadly that's where we part ways. We will not be using your
> libxdp.

Up to you of course, but I must say that I'm a bit surprised that you
state this so categorically at this stage. As far as I'm concerned,
there is still plenty of opportunity for cooperation on this, and I'm
still quite willing to accommodate your needs in libxdp.

> Existing xdp api was barely usable in the datacenter environment. replace_fd
> makes no difference.
>
>> exclusivity does come in handy. And as I said, I can live with there
>> being two APIs as long as there's a reasonable way to override the
>> bpf_link "lock" :)
>
> I explained many times already that bpf_link for XDP is NOT a second api to do
> the same thing. I understand that you think it's a second api, but when you
> keep repeating 'second api' it leads other folks (who also don't understand the
> difference) to wrongly conclude that they can use either to achieve the
> same thing. They cannot. And it makes my job explaining harder. So please drop
> 'second api' narrative.

I understand that they're different. Really, I do. That doesn't change
the fact that there will be two ways to install an XDP program, though.
Which is all I meant by "two APIs"; I was not implying they would be
completely equivalent in all ways.

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-29 20:23                                           ` Andrii Nakryiko
@ 2020-03-30 13:53                                             ` Toke Høiland-Jørgensen
  2020-03-30 20:17                                               ` Andrii Nakryiko
  2020-03-30 15:41                                             ` Edward Cree
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-30 13:53 UTC
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Sat, Mar 28, 2020 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
>> >>
>> >> No, I was certainly not planning to use that to teach libxdp to just
>> >> nuke any bpf_link it finds attached to an interface. Quite the contrary,
>> >> the point of this series is to allow libxdp to *avoid* replacing
>> >> something on the interface that it didn't put there itself.
>> >
>> > Exactly! "that it didn't put there itself".
>> > How are you going to do that?
>> > I really hope you thought it through and came up with magic.
>> > Because I tried and couldn't figure out how to do that with IFLA_XDP*
>> > Please walk me step by step how do you think it's possible.
>>
>> I'm inspecting the BPF program itself to make sure it's compatible.
>> Specifically, I'm embedding a piece of metadata into the program BTF,
>> using Andrii's encoding trick that we also use for defining maps. So
>> xdp-dispatcher.c contains this[0]:
>>
>> __uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);
>>
>> and libxdp will refuse to touch any program that it finds loaded on an
>
> But you can't say the same about other XDP applications that do not
> use libxdp. So will your library come with a huge warning, e.g.:
>
> WARNING! ANY XDP APPLICATION YOU INSTALL ON YOUR MACHINE THAT DOESN'T
> USE LIBXDP WILL BREAK YOUR FIREWALLS/ROUTERS/OTHER LIBXDP
> APPLICATIONS. USE AT YOUR OWN RISK.
>
> So you install your libxdp-based firewalls and are happy. Then you
> decide to install this awesome packet analyzer, which doesn't know
> about libxdp yet. Suddenly, you have the packet analyzer, but no more
> firewall, until the user somehow notices that it's gone. Or the firewall
> periodically checks that it's still running. Both not great, IMO, but
> might be acceptable for some users, I guess. But imagine all the
> confusion for user, especially if he doesn't give a damn about XDP and
> other buzzwords, but only needs a reliable firewall :)

Yes, whereas if the firewall is using bpf_link, then the packet analyser
will be locked out and can't do its thing. Either way you end up with a
broken application; it's just moving the breakage. In the case of
firewall vs packet analyser it's probably clear what the right
precedence is, but what if it's firewall vs IDS? Or two different
firewall-type applications?

This is the reason I don't believe the problem bpf_link solves is such a
big deal: Since multi-prog is implemented in userspace it *fundamentally
requires* applications to coordinate. So all the kernel needs to provide,
is a way to help well-behaved applications do this coordination, for
which REPLACE_FD is sufficient.

Now, this picture changes a bit if you have a more-privileged
application managing things - such as the "xdp daemon" I believe you're
using, right? In that case it becomes obvious which application should
have precedence, and the "lock-out" feature makes sense (assuming you
can't just use capabilities to enforce the access restriction). This is
why I keep saying that I understand why you want bpf_link for your use
case, I just don't think it'll help mine much... :)

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-27 23:02                                     ` Alexei Starovoitov
@ 2020-03-30 15:25                                       ` Edward Cree
  2020-03-31  3:43                                         ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Edward Cree @ 2020-03-30 15:25 UTC
  To: Alexei Starovoitov, David Ahern
  Cc: Lorenz Bauer, Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Andrey Ignatov, Networking, bpf

On 27/03/2020 23:02, Alexei Starovoitov wrote:
> On Fri, Mar 27, 2020 at 10:12:05AM -0600, David Ahern wrote:
>> I had a thought yesterday along similar lines: bpf_link is about
>> ownership and preventing "accidental" deletes.
> The mechanism for "human override" is tbd.
Then that's a question you really need to solve, especially if you're
 going to push bpf_link quite so... forcefully.
Everything that a human operator can do, so can any program with the
 same capabilities/wheel bits.  Especially as the API that the
 operator-tool uses *will* be open and documented.  The Unix Way does
 not allow unscriptable interfaces, and heavily frowns at any kind of
 distinction between 'humans' and 'programs'.
So what will the override look like?  A bpf() syscall with a special
 BPF_F_IM_A_HUMAN_AND_I_KNOW_WHAT_IM_DOING flag?  ptracing the link
 owner, so that you can close() its fd?  Something in between?

In any case, the question is orthogonal to the bpf_link vs. netlink
 issue: the netlink XDP attach could be done with a flag that means
 "don't allow replacement/removal without EXPECTED_FD".  No?

-ed


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-29 20:23                                           ` Andrii Nakryiko
  2020-03-30 13:53                                             ` Toke Høiland-Jørgensen
@ 2020-03-30 15:41                                             ` Edward Cree
  2020-03-30 19:13                                               ` Jakub Kicinski
  2020-03-31  4:01                                               ` Alexei Starovoitov
  1 sibling, 2 replies; 112+ messages in thread
From: Edward Cree @ 2020-03-30 15:41 UTC
  To: Andrii Nakryiko, Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On 29/03/2020 21:23, Andrii Nakryiko wrote:
> But you can't say the same about other XDP applications that do not
> use libxdp. So will your library come with a huge warning
What about a system-wide policy switch to decide whether replacing/
 removing an XDP program without EXPECTED_FD is allowed?  That way
 the sysadmin gets to choose whether it's the firewall or the packet
 analyser that breaks, rather than baking a policy into the design.
Then libxdp just needs to say in the README "you might want to turn
 on this switch".  Or maybe it defaults to on, and the other program
 has to talk you into turning it off if it wants to be 'ill-behaved'.
Either way, affected users will be driven to the kernel's
 documentation for the policy switch, where we can tell them whatever
 we think they need to know.

-ed


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-30 15:41                                             ` Edward Cree
@ 2020-03-30 19:13                                               ` Jakub Kicinski
  2020-03-31  4:01                                               ` Alexei Starovoitov
  1 sibling, 0 replies; 112+ messages in thread
From: Jakub Kicinski @ 2020-03-30 19:13 UTC
  To: Edward Cree
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	Alexei Starovoitov, John Fastabend, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Mon, 30 Mar 2020 16:41:46 +0100 Edward Cree wrote:
> On 29/03/2020 21:23, Andrii Nakryiko wrote:
> > But you can't say the same about other XDP applications that do not
> > use libxdp. So will your library come with a huge warning  
> What about a system-wide policy switch to decide whether replacing/
>  removing an XDP program without EXPECTED_FD is allowed?  That way
>  the sysadmin gets to choose whether it's the firewall or the packet
>  analyser that breaks, rather than baking a policy into the design.
> Then libxdp just needs to say in the README "you might want to turn
>  on this switch".  Or maybe it defaults to on, and the other program
>  has to talk you into turning it off if it wants to be 'ill-behaved'.
> Either way, affected users will be driven to the kernel's
>  documentation for the policy switch, where we can tell them whatever
>  we think they need to know.

I had the same thought. But then again all samples specify IF_NOEXIST
AFAICS, and users will file bugs for replacing other apps. IMHO it's
kind of a responsibility of the distro to make sure that apps it packages
don't break each other. 

The mechanism to be well behaved exists, it's the sad reality of
backward compatibility that we can't just make it enforced by default
(IF_NOEXIST vs ALLOW_REPLACE).
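A toy model of the two behaviours contrasted here. The flag value mirrors `XDP_FLAGS_UPDATE_IF_NOEXIST` from include/uapi/linux/if_link.h, and `-EBUSY` mirrors what the kernel's attach path returns when that flag is set and a program is already attached. As we read it, ALLOW_REPLACE is shorthand for the default, flag-less behaviour rather than a real flag. The `struct iface` model and `xdp_attach()` are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Same value as in include/uapi/linux/if_link.h; redefined so the
 * sketch builds without kernel headers. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)

/* Toy model of one interface's single XDP slot. */
struct iface {
	unsigned int prog_id;	/* 0 = no program attached */
};

/* Without the flag, whatever is already attached is silently replaced. */
static int xdp_attach(struct iface *dev, unsigned int new_id,
		      unsigned int flags)
{
	if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) && dev->prog_id)
		return -EBUSY;
	dev->prog_id = new_id;
	return 0;
}
```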

So adding a knob seems perfectly reasonable, but perhaps we should see
one or two examples of apps actually getting it wrong before adding a
knob?


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-30 13:53                                             ` Toke Høiland-Jørgensen
@ 2020-03-30 20:17                                               ` Andrii Nakryiko
  2020-03-31 10:13                                                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-30 20:17 UTC
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Mon, Mar 30, 2020 at 6:53 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Sat, Mar 28, 2020 at 12:34 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Sat, Mar 28, 2020 at 02:43:18AM +0100, Toke Høiland-Jørgensen wrote:
> >> >>
> >> >> No, I was certainly not planning to use that to teach libxdp to just
> >> >> nuke any bpf_link it finds attached to an interface. Quite the contrary,
> >> >> the point of this series is to allow libxdp to *avoid* replacing
> >> >> something on the interface that it didn't put there itself.
> >> >
> >> > Exactly! "that it didn't put there itself".
> >> > How are you going to do that?
> >> > I really hope you thought it through and came up with magic.
> >> > Because I tried and couldn't figure out how to do that with IFLA_XDP*.
> >> > Please walk me through, step by step, how you think it's possible.
> >>
> >> I'm inspecting the BPF program itself to make sure it's compatible.
> >> Specifically, I'm embedding a piece of metadata into the program BTF,
> >> using Andrii's encoding trick that we also use for defining maps. So
> >> xdp-dispatcher.c contains this[0]:
> >>
> >> __uint(dispatcher_version, XDP_DISPATCHER_VERSION) SEC(XDP_METADATA_SECTION);
> >>
> >> and libxdp will refuse to touch any program that it finds loaded on an
> >
> > But you can't say the same about other XDP applications that do not
> > use libxdp. So will your library come with a huge warning, e.g.:
> >
> > WARNING! ANY XDP APPLICATION YOU INSTALL ON YOUR MACHINE THAT DOESN'T
> > USE LIBXDP WILL BREAK YOUR FIREWALLS/ROUTERS/OTHER LIBXDP
> > APPLICATIONS. USE AT YOUR OWN RISK.
> >
> > So you install your libxdp-based firewalls and are happy. Then you
> > decide to install this awesome packet analyzer, which doesn't know
> > about libxdp yet. Suddenly, you get the packet analyzer, but no more
> > firewall, until the user somehow notices that it's gone. Or the firewall
> > periodically checks that it's still running. Both not great, IMO, but
> > might be acceptable for some users, I guess. But imagine all the
> > confusion for the user, especially if he doesn't give a damn about XDP and
> > other buzzwords, but only needs a reliable firewall :)
>
> Yes, whereas if the firewall is using bpf_link, then the packet analyser
> will be locked out and can't do its thing. Either way you end up with a
> broken application; it's just moving the breakage. In the case of

Hm... In one case, the firewall installation reported success and then
stopped working with no notification and the user having no clue. In
the other, the packet analyzer refused to start and reported an error
to the user. Let's agree to disagree, but to me those are not at all
equivalent. Silent failure is so much worse than an application failing
to start in the first place.

> firewall vs packet analyser it's probably clear what the right
> precedence is, but what if it's firewall vs IDS? Or two different
> firewall-type applications?
>
> This is the reason I don't believe the problem bpf_link solves is such a
> big deal: Since multi-prog is implemented in userspace it *fundamentally
> requires* applications to coordinate. So all the kernel needs to provide
> is a way to help well-behaved applications do this coordination, for
> which REPLACE_FD is sufficient.
>
> Now, this picture changes a bit if you have a more-privileged
> application managing things - such as the "xdp daemon" I believe you're
> using, right? In that case it becomes obvious which application should
> have precedence, and the "lock-out" feature makes sense (assuming you
> can't just use capabilities to enforce the access restriction). This is
> why I keep saying that I understand why you want bpf_link for your use
> case; I just don't think it'll help mine much... :)
>
> -Toke
>
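The metadata line Toke quotes above relies on libbpf's `__uint()` macro, defined in bpf_helpers.h as `int (*name)[val]`. The value is stored as an array dimension in the variable's type, so it ends up in the program's BTF rather than in any program data, and libxdp can inspect it to decide whether a loaded program is compatible. A minimal plain-C sketch of why the encoding works; the version number is a made-up placeholder, and `SEC()` is omitted so it builds without libbpf headers:

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as libbpf's bpf_helpers.h: encode val in the type. */
#define __uint(name, val) int (*name)[val]

/* Placeholder value, for illustration only. */
#define XDP_DISPATCHER_VERSION 2

/* Declares a global of type int (*)[2]: a null pointer carrying no data,
 * only a type. */
__uint(dispatcher_version, XDP_DISPATCHER_VERSION);

/* A BTF reader recovers the number from the array type's element count;
 * in C the same number falls out of sizeof, which does not evaluate var. */
#define ENCODED_UINT(var) (sizeof(*(var)) / sizeof(int))
```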


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-30 15:25                                       ` Edward Cree
@ 2020-03-31  3:43                                         ` Alexei Starovoitov
  2020-03-31 22:05                                           ` Edward Cree
  0 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-31  3:43 UTC
  To: Edward Cree
  Cc: David Ahern, Lorenz Bauer, Andrii Nakryiko,
	Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Mon, Mar 30, 2020 at 04:25:07PM +0100, Edward Cree wrote:
> On 27/03/2020 23:02, Alexei Starovoitov wrote:
> > On Fri, Mar 27, 2020 at 10:12:05AM -0600, David Ahern wrote:
> >> I had a thought yesterday along similar lines: bpf_link is about
> >> ownership and preventing "accidental" deletes.
> > The mechanism for "human override" is tbd.
> Then that's a question you really need to solve, especially if you're
>  going to push bpf_link quite so... forcefully.
> Everything that a human operator can do, so can any program with the
>  same capabilities/wheel bits.  Especially as the API that the
>  operator-tool uses *will* be open and documented.  The Unix Way does
>  not allow unscriptable interfaces, and heavily frowns at any kind of
>  distinction between 'humans' and 'programs'.

Can you share a link to such a philosophy?
I was thinking of something like a CAPTCHA-style 'confirm you're not a robot'
button.
So humans doing 'bpftool link show' followed by 'bpftool link del id 123'
will work as expected, but processes cannot use the same api to
nuke other processes.

> So what will the override look like?  A bpf() syscall with a special
>  BPF_F_IM_A_HUMAN_AND_I_KNOW_WHAT_IM_DOING flag?  ptracing the link
>  owner, so that you can close() its fd?  Something in between?

not sure yet.

> In any case, the question is orthogonal to the bpf_link vs. netlink
>  issue: the netlink XDP attach could be done with a flag that means
>  "don't allow replacement/removal without EXPECTED_FD".  No?

Nothing to do with netlink, of course. Both the XDP and tc bpf hooks
are missing the concept of an owner of the attachment.
For tc it's easier to implement and understand, since it allows
multi prog. If process A attaches a tc clsbpf prog via bpf_link
another process B will not be able to nuke it.
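A toy model of the ownership property described here: each attachment is tied to the handle returned when it was created, and detaching requires that handle, so process B cannot remove process A's program. In the real design the handle is a bpf_link file descriptor and the attachment goes away with the link; the structs and functions below are hypothetical and only capture the "no detach without the handle" rule.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PROGS 8

/* Handle returned at attach time (stands in for a bpf_link fd). */
struct link {
	int slot;	/* index in the hook's table, -1 once detached */
};

/* A multi-prog hook where every attachment records its owning link. */
struct hook {
	unsigned int prog_id[MAX_PROGS];	/* 0 = free slot */
	const struct link *owner[MAX_PROGS];
};

static int hook_attach(struct hook *h, unsigned int prog_id, struct link *l)
{
	assert(prog_id != 0);
	for (int i = 0; i < MAX_PROGS; i++) {
		if (h->prog_id[i] == 0) {
			h->prog_id[i] = prog_id;
			h->owner[i] = l;
			l->slot = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Detaching requires the very link that created the attachment; there is
 * deliberately no "detach by prog id" entry point. */
static int hook_detach(struct hook *h, struct link *l)
{
	if (l->slot < 0 || l->slot >= MAX_PROGS || h->owner[l->slot] != l)
		return -EPERM;
	h->prog_id[l->slot] = 0;
	h->owner[l->slot] = NULL;
	l->slot = -1;
	return 0;
}
```

A "forged" handle that merely points at the same slot is rejected, because the hook compares against the owner it recorded at attach time.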


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-30 15:41                                             ` Edward Cree
  2020-03-30 19:13                                               ` Jakub Kicinski
@ 2020-03-31  4:01                                               ` Alexei Starovoitov
  2020-03-31 11:34                                                 ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-31  4:01 UTC
  To: Edward Cree
  Cc: Andrii Nakryiko, Toke Høiland-Jørgensen,
	John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf

On Mon, Mar 30, 2020 at 04:41:46PM +0100, Edward Cree wrote:
> On 29/03/2020 21:23, Andrii Nakryiko wrote:
> > But you can't say the same about other XDP applications that do not
> > use libxdp. So will your library come with a huge warning
> What about a system-wide policy switch to decide whether replacing/
>  removing an XDP program without EXPECTED_FD is allowed?  That way
>  the sysadmin gets to choose whether it's the firewall or the packet
>  analyser that breaks, rather than baking a policy into the design.
> Then libxdp just needs to say in the README "you might want to turn
>  on this switch".  Or maybe it defaults to on, and the other program
>  has to talk you into turning it off if it wants to be 'ill-behaved'.

Yeah, something like this can work for xdp only, but
it won't work for tc, since ownership is missing.
It looks like such a policy knob would be re-inventing bpf_link for
one specific xdp case, only because xdp has one program per attachment.

Imagine it was easy to come up with a sensible policy and allow
multiple progs in the xdp hook.
How would you implement such a policy knob?
processA attaches prog XDP_A. processB attaches prog XDP_B.
Unless they start tagging their individual programs with BTF tags
(as Toke is planning to do), there is no way to tell them apart.
Then processA can iterate all progs in a hook, find its prog
based on the tag and tell the kernel: "find and replace an xdp prog with old_fd
with new_fd on this ifindex".
Kinda works, but it doesn't stop processB from accidentally detaching prog XDP_A
that was installed by processA.

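For contrast, a toy multi-prog hook with no owner concept, following the scenario above: a replace keyed on the old program behaves like a compare-and-exchange, which protects processA against stale updates, but nothing stops processB from removing XDP_A. All names are hypothetical, and plain integer ids stand in for program fds.

```c
#include <assert.h>
#include <errno.h>

#define MAX_PROGS 8

/* A multi-prog hook that tracks programs but not who attached them. */
struct xdp_hook {
	unsigned int prog_id[MAX_PROGS];	/* 0 = empty slot */
};

static int hook_attach(struct xdp_hook *h, unsigned int id)
{
	assert(id != 0);
	for (int i = 0; i < MAX_PROGS; i++) {
		if (h->prog_id[i] == 0) {
			h->prog_id[i] = id;
			return 0;
		}
	}
	return -ENOSPC;
}

/* "Find and replace the prog old_id with new_id" (new_id == 0 detaches).
 * Keyed on program identity, so it acts like a compare-and-exchange,
 * but anyone who can name a program can replace or remove it. */
static int hook_replace(struct xdp_hook *h, unsigned int old_id,
			unsigned int new_id)
{
	if (old_id == 0)
		return -EINVAL;
	for (int i = 0; i < MAX_PROGS; i++) {
		if (h->prog_id[i] == old_id) {
			h->prog_id[i] = new_id;
			return 0;
		}
	}
	return -ENOENT;
}
```

In this model, processA upgrading 1 to 3 succeeds and a stale second upgrade from 1 fails with `-ENOENT`; but a later `hook_replace(h, 3, 0)` issued by processB removes XDP_A just as easily, since the hook cannot tell the callers apart.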
The kernel's job is to share system resources, like memory and cpu time.
The hook is such a resource too. The ownership concept that is part of
bpf_link allows such sharing.

> Either way, affected users will be driven to the kernel's
>  documentation for the policy switch, where we can tell them whatever
>  we think they need to know.

In the data center there are no users. A few months back I described it
as a single-user system. A bunch of processes are competing for resources.
They can be all root, or all nobody, or containers with userns.
Neither user id nor caps can serve as the separator among processes for
the job of sharing a bpf hook.
The tc/xdp/cgroup/tracing bpf attachment points need to be safely
shared among N root processes that are not cooperating with each other.
For tc, cgroup, and tracing the problem is solved with bpf_link, since
they all allow multi prog.
XDP is the hardest, since it does single prog only.
That's what we're trying to solve with libdispatcher.
I think if it goes well it can become part of the kernel, and the kernel
will do multi prog XDP attach. And all hooks will be symmetrical.
But looking at the size of this thread, and at how much misunderstanding
there still is about basic concepts like bpf_link, I'm not hopeful that
libdispatcher will ever become part of the kernel.


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-30 20:17                                               ` Andrii Nakryiko
@ 2020-03-31 10:13                                                 ` Toke Høiland-Jørgensen
  2020-03-31 13:48                                                   ` Daniel Borkmann
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-31 10:13 UTC
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

>> > So you install your libxdp-based firewalls and are happy. Then you
>> > decide to install this awesome packet analyzer, which doesn't know
>> > about libxdp yet. Suddenly, you get the packet analyzer, but no more
>> > firewall, until the user somehow notices that it's gone. Or the firewall
>> > periodically checks that it's still running. Both not great, IMO, but
>> > might be acceptable for some users, I guess. But imagine all the
>> > confusion for the user, especially if he doesn't give a damn about XDP and
>> > other buzzwords, but only needs a reliable firewall :)
>>
>> Yes, whereas if the firewall is using bpf_link, then the packet analyser
>> will be locked out and can't do its thing. Either way you end up with a
>> broken application; it's just moving the breakage. In the case of
>
> Hm... In one case, the firewall installation reported success and then
> stopped working with no notification and the user having no clue. In
> the other, the packet analyzer refused to start and reported an error
> to the user. Let's agree to disagree, but to me those are not at all
> equivalent. Silent failure is so much worse than an application failing
> to start in the first place.

Oh, sure, obvious failures are preferable to silent ones, no doubt about
that. But for things to actually *work*, both applications need to agree
on how to do things, which in practice means they'll need to use the
same library. At which point you can solve this problem in the
library.

So again, I'm not saying the two are equivalent, I am just disagreeing
with you about how big the benefit is. And sure, we can agree to
disagree on that :)

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31  4:01                                               ` Alexei Starovoitov
@ 2020-03-31 11:34                                                 ` Toke Høiland-Jørgensen
  2020-03-31 18:52                                                   ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-31 11:34 UTC
  To: Alexei Starovoitov, Edward Cree
  Cc: Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Mar 30, 2020 at 04:41:46PM +0100, Edward Cree wrote:
>> On 29/03/2020 21:23, Andrii Nakryiko wrote:
>> > But you can't say the same about other XDP applications that do not
>> > use libxdp. So will your library come with a huge warning
>> What about a system-wide policy switch to decide whether replacing/
>>  removing an XDP program without EXPECTED_FD is allowed?  That way
>>  the sysadmin gets to choose whether it's the firewall or the packet
>>  analyser that breaks, rather than baking a policy into the design.
>> Then libxdp just needs to say in the README "you might want to turn
>>  on this switch".  Or maybe it defaults to on, and the other program
>>  has to talk you into turning it off if it wants to be 'ill-behaved'.
>
> Yeah, something like this can work for xdp only, but
> it won't work for tc, since ownership is missing.
> It looks like such a policy knob would be re-inventing bpf_link for
> one specific xdp case, only because xdp has one program per attachment.

You keep talking about this as though bpf_link was the existing API and
we're discussing adding another, when in reality it's the other way
around.

> Imagine it was easy to come up with a sensible policy and allow
> multiple progs in the xdp hook.
> How would you implement such a policy knob?
> processA attaches prog XDP_A. processB attaches prog XDP_B.
> Unless they start tagging their individual programs with BTF tags
> (as Toke is planning to do), there is no way to tell them apart.
> Then processA can iterate all progs in a hook, find its prog
> based on the tag and tell the kernel: "find and replace an xdp prog with old_fd
> with new_fd on this ifindex".
> Kinda works, but it doesn't stop processB from accidentally detaching prog XDP_A
> that was installed by processA.
>
> The kernel's job is to share system resources, like memory and cpu time.
> The hook is such a resource too. The ownership concept that is part of
> bpf_link allows such sharing.

FWIW I actually agree that the bpf_link ownership concept makes sense
for the individual attachments in a multi-prog hook; including for XDP.
And I've started thinking about whether the bpf_link fd can work as the
reference being returned by libxdp after a component program is
attached. I have some reservations, but I'll start a new thread on that
once I'm a bit further along with it...

[...]

> XDP is the hardest, since it does single prog only.
> That's what we're trying to solve with libdispatcher.
> I think if it goes well it can become part of the kernel, and the kernel
> will do multi prog XDP attach. And all hooks will be symmetrical.

Now *that* I'd like to see! I've said from the beginning that I think
XDP multi-prog should be part of the kernel, so if we can get there via
this detour I'm all for it.

> But looking at the size of this thread, and at how much
> misunderstanding there still is about basic concepts like bpf_link, I'm
> not hopeful that libdispatcher will ever become part of the kernel.

I don't share your pessimism. If we can stop writing off honest
disagreement about design tradeoffs as just "misunderstanding", I think
we can get there.

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31 10:13                                                 ` Toke Høiland-Jørgensen
@ 2020-03-31 13:48                                                   ` Daniel Borkmann
  2020-03-31 15:00                                                     ` Toke Høiland-Jørgensen
  2020-03-31 20:15                                                     ` Andrii Nakryiko
  0 siblings, 2 replies; 112+ messages in thread
From: Daniel Borkmann @ 2020-03-31 13:48 UTC
  To: Toke Høiland-Jørgensen, Andrii Nakryiko
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf, dsahern

On 3/31/20 12:13 PM, Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> 
>>>> So you install your libxdp-based firewalls and are happy. Then you
>>>> decide to install this awesome packet analyzer, which doesn't know
>>>> about libxdp yet. Suddenly, you get the packet analyzer, but no more
>>>> firewall, until the user somehow notices that it's gone. Or the firewall
>>>> periodically checks that it's still running. Both not great, IMO, but
>>>> might be acceptable for some users, I guess. But imagine all the
>>>> confusion for the user, especially if he doesn't give a damn about XDP and
>>>> other buzzwords, but only needs a reliable firewall :)
>>>
>>> Yes, whereas if the firewall is using bpf_link, then the packet analyser
>>> will be locked out and can't do its thing. Either way you end up with a
>>> broken application; it's just moving the breakage. In the case of
>>
>> Hm... In one case, the firewall installation reported success and then
>> stopped working with no notification and the user having no clue. In
>> the other, the packet analyzer refused to start and reported an error
>> to the user. Let's agree to disagree, but to me those are not at all
>> equivalent. Silent failure is so much worse than an application failing
>> to start in the first place.

I sort of agree with both of you that either case is not great. The silent
override we currently have is not great since it can be evicted at any time
but also bpf_link to lock-out other programs at XDP layer is not great either
since there is also a huge potential to break existing programs. It's probably
best to discuss on an actual proposal to see the concrete semantics, but my
concerns, assuming I didn't misunderstand or get confused about something along
the way (if so, please let me know), currently are:

  - System service XYZ starts to use XDP with bpf_link one day. Somehow this
    application gets shipped by default on mainstream distros and starts up
    during init, then effectively locking out everyone else that used to use
    the hook today "just fine" given they owned / orchestrated the underlying
    networking on the host namespace for the nodes they manage (and for that
    it worked before). Now the latter app somehow needs to work around this
    breakage by undoing the damage that XYZ did in order to be able to operate
    again. A 'human override' was mentioned; I presume whatever it will be,
    it will also be used by applications when they don't have another choice.
    Otherwise we need to go and tell users that XDP is now only _entirely_
    reserved for system service XYZ if you run distro ABC, but not for everyone
    else anymore; what answer is there to this? From a PoV where one owns the
    entire distro and ecosystem, this is fine, but where this is not the case
    as in the rest of the world having to rely on mainstream distros, what is
    the answer to users (and especially "those that don't give a damn about XDP,
    but just want to get stuff to work" that used to work before, even if we
    think silent override is broken)? If the answer is to just 'shrug' and tell
    'sorry that's the new way it is right now', then apps will try to use whatever
    'human override' there is, and we're back to square one. To provide a
    concrete example: if Cilium was configured to load some of its programs on
    the XDP hook, it currently replaces whatever was there before. The assumption
    is that, in the scenario we're in, we can orchestrate the hostns networking
    just fine on K8s nodes since there is just one CNI plugin taking care of that
    (and that generally also holds true for the other hooks we're using today).
    Now, while we could switch to bpf_link as well and implement it in iproute2
    for this specific case, what if someone else starts up earlier and locks
    our stuff out? How would we work around it?

  - Assuming we have XDP with bpf_link in place with the above, now applications
    are forced to start using bpf_link in order to not be locked out by others
    using bpf_link as otherwise their application would break. So they need to
    support the "old" way of attaching programs as we have today for older
    kernels and need to support the bpf_link attachment for newer kernels since
    they cannot rely on the old / existing API anymore. There is also a world
    outside of C/C++ and thus libbpf / lib{xdp,dispatcher} or whatever, so the
    whole rest of the ecosystem is forced to implement it as well due to breakage
    concerns, which is understandable, but quite a burden.

  - Equally, in the case of Toke's cmpxchg-like mechanism in XDP itself, what
    happens if an application uses this API, assuming the library returns an
    error to the application when the expected program is not attached? Would
    the application then go for a forceful override with the existing API
    today, or would it voluntarily bail out and refuse to work because some
    other non-cooperating application was loaded in the meantime? What is the
    cmpxchg-like mechanism realistically solving then? (And again, please
    keep in mind that we cannot force the entire world to use one single
    library to rule 'em all; there are plenty of other language runtimes out
    in the wild that cannot just import C/C++.)

Thoughts?

Thanks,
Daniel
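One possible answer to the question in the third point above, written as a hypothetical library policy: retry when the program that won the race is a dispatcher the library recognises, and otherwise report an error unless the user explicitly asked to override. The enum, the function and its inputs are assumptions for illustration; how "recognised" is determined (for example from BTF metadata) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* What to do after the kernel rejected a replace because the attached
 * program is no longer the one we expected (hypothetical policy). */
enum conflict_action {
	CONFLICT_RETRY,	/* re-read, rebuild the dispatcher, try again */
	CONFLICT_BAIL,	/* leave the interface alone and report an error */
	CONFLICT_FORCE,	/* blind replace via the old API */
};

static enum conflict_action on_conflict(bool cur_is_known_dispatcher,
					bool user_forced)
{
	if (cur_is_known_dispatcher)
		return CONFLICT_RETRY;	/* a cooperating app won the race */
	if (user_forced)
		return CONFLICT_FORCE;	/* explicit opt-in to clobbering */
	return CONFLICT_BAIL;		/* unknown, non-cooperating program */
}
```

Under this policy the cmpxchg never causes a silent override on its own; forcing remains an explicit, visible choice.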


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31 13:48                                                   ` Daniel Borkmann
@ 2020-03-31 15:00                                                     ` Toke Høiland-Jørgensen
  2020-03-31 20:19                                                       ` Andrii Nakryiko
  2020-03-31 20:15                                                     ` Andrii Nakryiko
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-31 15:00 UTC
  To: Daniel Borkmann, Andrii Nakryiko
  Cc: Alexei Starovoitov, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, David S. Miller, Jesper Dangaard Brouer,
	Lorenz Bauer, Andrey Ignatov, Networking, bpf, dsahern

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 3/31/20 12:13 PM, Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> 
>>>>> So you install your libxdp-based firewalls and are happy. Then you
>>>>> decide to install this awesome packet analyzer, which doesn't know
>>>>> about libxdp yet. Suddenly, you get the packet analyzer, but no more
>>>>> firewall, until the user somehow notices that it's gone. Or the firewall
>>>>> periodically checks that it's still running. Both not great, IMO, but
>>>>> might be acceptable for some users, I guess. But imagine all the
>>>>> confusion for the user, especially if he doesn't give a damn about XDP and
>>>>> other buzzwords, but only needs a reliable firewall :)
>>>>
>>>> Yes, whereas if the firewall is using bpf_link, then the packet analyser
>>>> will be locked out and can't do its thing. Either way you end up with a
>>>> broken application; it's just moving the breakage. In the case of
>>>
>>> Hm... In one case, the firewall installation reported success and then
>>> stopped working with no notification and the user having no clue. In
>>> the other, the packet analyzer refused to start and reported an error
>>> to the user. Let's agree to disagree, but to me those are not at all
>>> equivalent. Silent failure is so much worse than an application failing
>>> to start in the first place.
>
> I sort of agree with both of you that either case is not great. The silent
> override we currently have is not great since it can be evicted at any time
> but also bpf_link to lock-out other programs at XDP layer is not great either
> since there is also a huge potential to break existing programs. It's probably
> best to discuss on an actual proposal to see the concrete semantics, but my
> concerns, assuming I didn't misunderstand or get confused about something along
> the way (if so, please let me know), currently are:

I think you're summarising the issues well, with perhaps one thing
missing: The goal is to enable multi-prog execution, i.e., execute two
programs in sequence. So, when things work correctly the flow should be:

App1, loading prog1:
- get current program from $IFACE
- current program is NULL:
  -> build dispatcher(prog1)
  -> load dispatcher onto $IFACE with UPDATE_IF_NOEXIST flag
  -> success

Then, app2 loading prog2:
- get current program from $IFACE
- current program is dispatcher(prog1):
  -> build new dispatcher(prog1,prog2)
  -> atomically replace old dispatcher with new one
  -> success

As long as app1 and app2 agree on what a dispatcher looks like, and how
to update it, they can cooperatively install themselves in the chain, as
long as there's a way to resolve the race between reading and updating
the state in the kernel.
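The two-step flow above can be modelled in userspace C, with a compare-and-exchange standing in for the kernel-side check: an empty slot corresponds to loading with UPDATE_IF_NOEXIST, and a non-empty one to replacing with the expected (old) program. This is a sketch under assumptions: the `struct dispatcher` layout, `load_prog()`, and retrying after a lost race are illustrative choices, not libxdp's actual design.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_PROGS 10

/* A dispatcher: an immutable, ordered list of component programs. */
struct dispatcher {
	int nprogs;
	unsigned int progs[MAX_PROGS];
};

/* Stand-in for the interface's single XDP slot (NULL = empty). */
static _Atomic(struct dispatcher *) iface_slot;

/* Copy the old component list (if any) and append prog. */
static struct dispatcher *build_dispatcher(const struct dispatcher *old,
					   unsigned int prog)
{
	struct dispatcher *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	if (old)
		*d = *old;
	d->progs[d->nprogs++] = prog;
	return d;
}

/* Add prog to the chain. The compare-and-exchange succeeds only if the
 * slot still holds what we read: NULL plays the role of
 * UPDATE_IF_NOEXIST, a dispatcher pointer the role of the expected old
 * program. On a lost race, re-read the state and try again. */
static int load_prog(unsigned int prog)
{
	for (;;) {
		struct dispatcher *cur = atomic_load(&iface_slot);
		struct dispatcher *next;

		if (cur && cur->nprogs == MAX_PROGS)
			return -ENOSPC;
		next = build_dispatcher(cur, prog);
		if (!next)
			return -ENOMEM;
		if (atomic_compare_exchange_strong(&iface_slot, &cur, next)) {
			/* Safe only because this demo is single-threaded;
			 * concurrent readers would need refcounting. */
			free(cur);
			return 0;
		}
		free(next);	/* another app updated the slot first */
	}
}
```

App1's `load_prog(1)` finds the slot empty and installs dispatcher(prog1); App2's `load_prog(2)` then swaps in dispatcher(prog1, prog2), but only if dispatcher(prog1) is still the one attached.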

However, if they *don't* agree on how to build the dispatcher and run in
sequence, they are fundamentally incompatible. Which also means that
multi-prog operation is going to be incompatible with any application
that was written before it was implemented. The only way to avoid that
is to provide the multi-prog support in the kernel, in a way that is
compatible with the old API. I'm not sure if this is even possible; but
I certainly got a very emphatic NACK on any attempt to implement the
support in the kernel when I posted my initial patch back in the fall.

Also, to your point about needing a specific library: I've been saying
"using the same library" because I think that is the most likely way to
get applications to agree. But really, what's needed is more like a
protocol; there could in theory be several independent implementations
that interoperate. However, I don't see a way to make things compatible
with applications that don't follow that protocol; we only get to pick
the failure mode (and those failure modes I think you summarised quite
well).

-Toke



* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31 11:34                                                 ` Toke Høiland-Jørgensen
@ 2020-03-31 18:52                                                   ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-31 18:52 UTC
  To: Toke Høiland-Jørgensen
  Cc: Edward Cree, Andrii Nakryiko, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf

On Tue, Mar 31, 2020 at 01:34:00PM +0200, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Mon, Mar 30, 2020 at 04:41:46PM +0100, Edward Cree wrote:
> >> On 29/03/2020 21:23, Andrii Nakryiko wrote:
> >> > But you can't say the same about other XDP applications that do not
> >> > use libxdp. So will your library come with a huge warning
> >> What about a system-wide policy switch to decide whether replacing/
> >>  removing an XDP program without EXPECTED_FD is allowed?  That way
> >>  the sysadmin gets to choose whether it's the firewall or the packet
> >>  analyser that breaks, rather than baking a policy into the design.
> >> Then libxdp just needs to say in the README "you might want to turn
> >>  on this switch".  Or maybe it defaults to on, and the other program
> >>  has to talk you into turning it off if it wants to be 'ill-behaved'.
> >
> > yeah. something like this can work for xdp only, but
> > it won't work for tc, since ownership is missing.
> > It looks like such a policy knob would be re-inventing bpf_link for
> > one specific xdp case, only because xdp has one program per attachment.
> 
> You keep talking about this as though bpf_link was the existing API and
> we're discussing adding another, when in reality it's the other way
> around.

We have explained several times already that it is an existing API.
The _name_ bpf_link was formed only recently, but the concept
has existed for a very long time.
The raw_tp attach is nothing but bpf_link. It's FD based and it
preserves ownership (program execution guarantee).
Nothing can nuke it from under the process.
This was an api from the day one. See
commit c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
from March 2018.
Then FD based [ku]probe and tracepoints were added
with the same two properties of bpf_link concept.
Then fentry/fexit attachment. Also FD based and execution guarantee.
And finally freplace, which is the exact equivalent of bpf_link for xdp.
Since freplace can only be one, attaching freplace prog to another
program locks out any other process from attaching a different freplace
prog in the same spot (the same hook/function in the target prog).
To me that behavior looks like 100% equivalency to bpf_link for xdp.
Meanwhile, raw_tp/kprobe/tp/fentry/fexit/bpf_lsm are 100% equivalent to
what we want to do with bpf_link for TC (FD based multi prog with
all progs running and execution guarantee).
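Edward's system-wide switch, quoted earlier in this message, can be sketched as a small user-space model. Everything here is hypothetical (no such knob exists in this thread): `require_expected` stands in for the proposed switch, integer program ids stand in for fds, and the -EEXIST/-EPERM return codes are arbitrary choices for the model. With the switch on, a caller must name the program it expects to replace; blind replacement of an attached program is refused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one XDP hook; prog_id 0 means "nothing attached". */
struct xdp_hook {
    int prog_id;
};

/* Hypothetical system-wide policy switch from Edward's proposal. */
static bool require_expected = false;

/*
 * Attach new_prog to the hook.
 * - expected != NULL: behave like the "expected fd" idea and only replace
 *   when the currently attached program matches *expected.
 * - expected == NULL: plain attach; refused if the switch is on and some
 *   program is already attached.
 */
static int xdp_hook_set(struct xdp_hook *hook, int new_prog, const int *expected)
{
    assert(hook != NULL);
    if (expected != NULL) {
        if (hook->prog_id != *expected)
            return -EEXIST;   /* someone else changed the hook in the meantime */
    } else if (require_expected && hook->prog_id != 0) {
        return -EPERM;        /* blind replacement forbidden by policy */
    }
    hook->prog_id = new_prog;
    return 0;
}
```

As Alexei notes, a switch like this only covers XDP's one-program-per-hook case; it does not give tc the ownership semantics that bpf_link provides.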


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31 13:48                                                   ` Daniel Borkmann
  2020-03-31 15:00                                                     ` Toke Høiland-Jørgensen
@ 2020-03-31 20:15                                                     ` Andrii Nakryiko
  1 sibling, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-31 20:15 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	John Fastabend, Jakub Kicinski, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David S. Miller, Jesper Dangaard Brouer, Lorenz Bauer,
	Andrey Ignatov, Networking, bpf, David Ahern

On Tue, Mar 31, 2020 at 6:49 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 3/31/20 12:13 PM, Toke Høiland-Jørgensen wrote:
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >
> >>>> So you install your libxdp-based firewalls and are happy. Then you
> >>>> decide to install this awesome packet analyzer, which doesn't know
> >>>> about libxdp yet. Suddenly, you get the packet analyzer, but no more
> >>>> firewall, until the user somehow notices that it's gone. Or the firewall
> >>>> periodically checks that it's still running. Both not great, IMO, but
> >>>> might be acceptable for some users, I guess. But imagine all the
> >>>> confusion for user, especially if he doesn't give a damn about XDP and
> >>>> other buzzwords, but only needs a reliable firewall :)
> >>>
> >>> Yes, whereas if the firewall is using bpf_link, then the packet analyser
> >>> will be locked out and can't do its thing. Either way you end up with a
> >>> broken application; it's just moving the breakage. In the case of
> >>
> >> Hm... In one case firewall installation reported success and stopped
> >> working afterwards with no notification and user having no clue. In
> >> another, packet analyzer refused to start and reported error to user.
> >> We'll have to agree to disagree here: those are not at all equivalent.
> >> To me, silent failure is so much worse than an application failing to
> >> start in the first place.
>
> I sort of agree with both of you that either case is not great. The silent
> override we currently have is not great since it can be evicted at any time
> but also using bpf_link to lock out other programs at the XDP layer is not great either
> since there is also huge potential to break existing programs. It's probably

I disagree with the premise that in the current setup two XDP applications
can work together at all. Best case, one will fail if it was given the option
not to attach when something is already attached. Worst case, both will
happily assume (and report to user) that they are working, but only
the one attached last will actually work. Or maybe it's not even the
worst scenario, if both of them use netlink notification and keep
re-attaching themselves and detaching "opponent" (a fun bot fight to
watch...)

So what I hope we are discussing here is the world where some
applications are moving to libxdp or some other co-operative
approach/daemon. In that case, it is of utmost importance (otherwise it's
an unreliable, half-working solution) to prevent XDP applications that are
not aware of this cooperative approach from breaking cooperating ones.
And that can be done only if libxdp/daemon can guarantee that *if it
successfully* attaches the root XDP program, it won't be replaced by an
oblivious XDP application that is not aware of it. So yes, in this
case a non-cooperating application won't work (as it might not with the
current API), but at least it will report that it can't attach, as
opposed to breaking *all* other nicely behaving and cooperating XDP
applications **silently**. There are probably few things more frustrating
than silent corruption/breakage, IMO, as both a user and a
programmer.
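The difference Andrii describes, between a silent override and an attach that fails loudly, fits in a few lines of a hypothetical user-space model. This is not kernel code: `link_owned` stands in for a hook held by a bpf_link-style owner, program ids stand in for fds, and the -EBUSY return is a choice made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one XDP hook; prog_id 0 means "empty". */
struct hook_model {
    int prog_id;
    bool link_owned;   /* attached via an owning, link-style attachment */
};

/* Legacy-style attach: overrides whatever is attached, unless a link owns it. */
static int attach_legacy(struct hook_model *h, int prog_id)
{
    assert(h != NULL);
    if (h->link_owned)
        return -EBUSY;        /* the oblivious app gets an error it can report */
    h->prog_id = prog_id;     /* any previous program is silently evicted */
    return 0;
}

/* Link-style attach: only into an empty hook, and records ownership. */
static int attach_link(struct hook_model *h, int prog_id)
{
    assert(h != NULL);
    if (h->prog_id != 0)
        return -EBUSY;
    h->prog_id = prog_id;
    h->link_owned = true;
    return 0;
}
```

With legacy attaches, the firewall and the analyzer both report success but only the last one runs; with a link-owned hook, the firewall keeps running and the analyzer gets an error it can show the user.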

> best to discuss on an actual proposal to see the concrete semantics, but my
> concerns, assuming I didn't misunderstand or got confused on something along
> the way (if so, please let me know), currently are:
>
>   - System service XYZ starts to use XDP with bpf_link one day. Somehow this
>     application gets shipped by default on mainstream distros and starts up
>     during init, then effectively locking out everyone else that used to use
>     the hook today "just fine" given they owned / orchestrated the underlying
>     networking on the host namespace for the nodes they manage (and for that

If XYZ didn't use bpf_link and just used the existing API, it would break
everything as well, because, see above, they can't co-exist. The
difference is in the amount of nondeterminism (who starts first and who's
second) and awareness (whether the app even knows that it's broken).
Neither is in favor of the existing API.

>     it worked before). Now such latter app somehow needs to work around this
>     breakage by undoing the damage that XYZ did in order to be able to operate
>     again. 'Human override' was mentioned. I presume whatever it will be,
>     it will also be done by applications when they don't have another choice.

No, it's the opposite. That's why it's **human override**. It's not
intended to be used by application to "unblock" itself.

>     Otherwise we need to go and tell users that XDP is now only _entirely_
>     reserved for system service XYZ if you run distro ABC, but not for everyone
>     else anymore; what answer is there to this? From a PoV where one owns the

That's exactly what Toke's libxdp is intending to be. An XDP
coordination solution/library that other applications that want to
co-exist on the same network interface **have** to use. It intends to
allow all XDP applications to co-exist. That would be the answer - go
use that library.

>     entire distro and ecosystem, this is fine, but where this is not the case
>     as in the rest of the world having to rely on mainstream distros, what is
>     the answer to users (and especially "those that don't give a damn about XDP,
>     but just want to get stuff to work" that used to work before, even if we
>     think silent override is broken)? If the answer is to just 'shrug' and tell
>     'sorry that's the new way it is right now', then apps will try to use whatever
>     'human override' there is, and we're back to square one. To provide a
>     concrete example: if Cilium was configured to load some of its programs on
>     XDP hook, it currently replaces whatever it was there before. The assumption
>     is, that in the scenario we're in, we can orchestrate the hostns networking
>     just fine on K8s nodes since there is just one CNI plugin taking care of that
>     (and that generally also holds true for the other hooks we're using today).
>     Now, while we could switch to bpf_link as well and implement it in iproute2
>     for this specific case, what if someone else starts up earlier and locks
>     our stuff out? How would we work around it?

For one, you can use some tool/script (like the one I posted
yesterday: [0]) to find the "offending" application that's not expected to
be using XDP and kill it (and/or investigate why on earth it got
installed/started in your infrastructure without you being aware). I
think this solution is better than the nuke ("human override") option,
because it gives you clues on what's misbehaving and needs fixing in
the first place.

  [0] https://gist.github.com/anakryiko/562dff8e39c619a5ee247bb55aa057c7

>
>   - Assuming we have XDP with bpf_link in place with the above, now applications
>     are forced to start using bpf_link in order to not be locked out by others
>     using bpf_link as otherwise their application would break. So they need to
>     support the "old" way of attaching programs as we have today for older
>     kernels and need to support the bpf_link attachment for newer kernels since
>     they cannot rely on the old / existing API anymore. There is also a world
>     outside of C/C++ and thus libbpf / lib{xdp,dispatcher} or whatever, so the
>     whole rest of the ecosystem is forced to implement it as well due to breakage
>     concerns, understandable, but quite a burden.

Multi-XDP (on the same netdev, of course) doesn't exist and is not
possible today with the existing API and semantics. The world outside of
C/C++ will need to use compatible mechanisms: either linking and using
libxdp by whatever means their language/runtime provides, or at least
re-implementing the same set of protocols and behaviors.

>
>   - Equally, in case of Toke's implementation for the cmpxchg-like mechanism in
>     XDP itself, what happens if an application uses this API and assuming the
>     library would return the error to the application using it if the expected
>     program is not attached? Would the application then go for a forceful override
>     with the existing API, or would it voluntarily bail out and refuse to
>     work if some other non-cooperating application was loaded in the meantime?
>     What is the cmpxchg-like mechanism then solving realistically? (And again,
>     please keep all in mind we cannot force the entire world to use one single
>     library to rule 'em all, there are plenty of other language runtimes out in
>     the wild that cannot just import C/C++.)

It's not about using one specific library, but it is about following
the same protocol. I think that's what Toke, Alexei, Andrey and others
are assuming - that yes, if people want to write reliable XDP
applications co-existing with each other, they will have to use the
same library (easier, if possible) or at least follow the same
protocol.


>
> Thoughts?
>
> Thanks,
> Daniel


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31 15:00                                                     ` Toke Høiland-Jørgensen
@ 2020-03-31 20:19                                                       ` Andrii Nakryiko
  0 siblings, 0 replies; 112+ messages in thread
From: Andrii Nakryiko @ 2020-03-31 20:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Jakub Kicinski, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Lorenz Bauer, Andrey Ignatov, Networking,
	bpf, David Ahern

On Tue, Mar 31, 2020 at 8:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Daniel Borkmann <daniel@iogearbox.net> writes:
>
> > On 3/31/20 12:13 PM, Toke Høiland-Jørgensen wrote:
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >>>>> So you install your libxdp-based firewalls and are happy. Then you
> >>>>> decide to install this awesome packet analyzer, which doesn't know
> >>>>> about libxdp yet. Suddenly, you get the packet analyzer, but no more
> >>>>> firewall, until the user somehow notices that it's gone. Or the firewall
> >>>>> periodically checks that it's still running. Both not great, IMO, but
> >>>>> might be acceptable for some users, I guess. But imagine all the
> >>>>> confusion for user, especially if he doesn't give a damn about XDP and
> >>>>> other buzzwords, but only needs a reliable firewall :)
> >>>>
> >>>> Yes, whereas if the firewall is using bpf_link, then the packet analyser
> >>>> will be locked out and can't do its thing. Either way you end up with a
> >>>> broken application; it's just moving the breakage. In the case of
> >>>
> >>> Hm... In one case firewall installation reported success and stopped
> >>> working afterwards with no notification and user having no clue. In
> >>> another, packet analyzer refused to start and reported error to user.
> >>> We'll have to agree to disagree here: those are not at all equivalent.
> >>> To me, silent failure is so much worse than an application failing to
> >>> start in the first place.
> >
> > I sort of agree with both of you that either case is not great. The silent
> > override we currently have is not great since it can be evicted at any time
> > but also using bpf_link to lock out other programs at the XDP layer is not great either
> > since there is also huge potential to break existing programs. It's probably
> > best to discuss on an actual proposal to see the concrete semantics, but my
> > concerns, assuming I didn't misunderstand or got confused on something along
> > the way (if so, please let me know), currently are:
>
> I think you're summarising the issues well, with perhaps one thing
> missing: The goal is to enable multi-prog execution, i.e., execute two
> programs in sequence. So, when things work correctly the flow should be:
>
> App1, loading prog1:
> - get current program from $IFACE
> - current program is NULL:
>   -> build dispatcher(prog1)
>   -> load dispatcher onto $IFACE with UPDATE_IF_NOEXIST flag
>   -> success
>
> Then, app2 loading prog2:
> - get current program from $IFACE
> - current program is dispatcher(prog1):
>   -> build new dispatcher(prog1,prog2)
>   -> atomically replace old dispatcher with new one
>   -> success
>
> As long as app1 and app2 agree on what a dispatcher looks like, and how
> to update it, they can cooperatively install themselves in the chain, as
> long as there's a way to resolve the race between reading and updating
> the state in the kernel.
>
> However, if they *don't* agree on how to build the dispatcher and run in
> sequence, they are fundamentally incompatible. Which also means that
> multi-prog operation is going to be incompatible with any application
> that was written before it was implemented. The only way to avoid that
> is to provide the multi-prog support in the kernel, in a way that is
> compatible with the old API. I'm not sure if this is even possible; but
> I certainly got a very emphatic NACK on any attempt to implement the
> support in the kernel when I posted my initial patch back in the fall.
>
> Also, to your point about needing a specific library: I've been saying
> "using the same library" because I think that is the most likely way to
> get applications to agree. But really, what's needed is more like a
> protocol; there could in theory be several independent implementations
> that interoperate. However, I don't see a way to make things compatible
> with applications that don't follow that protocol; we only get to pick
> the failure mode (and those failure modes I think you summarised quite
> well).

Well, for once we agree with Toke in this thread (regarding last two
paragraphs) :)

>
> -Toke
>
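The cooperative flow Toke lays out above (read the current program, build a new dispatcher, then replace the old one only if it is still in place) can be modelled with a C11 compare-and-swap. This is a hypothetical sketch, not the kernel or libxdp implementation: a "dispatcher" is encoded as a bitmask of the component programs it runs, 0 means nothing is attached (so swapping against 0 plays the role of the UPDATE_IF_NOEXIST case), and `replace_expected()` stands in for the "expected fd" check this series adds. The -EEXIST and -EAGAIN codes are choices made for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model: the interface's XDP slot holds a dispatcher, encoded as
 * a bitmask of component programs (0 = nothing attached). */
struct iface_model {
    _Atomic unsigned int xdp_slot;
};

/* Replace the slot only if it still holds `expected` (the "expected fd" idea). */
static int replace_expected(struct iface_model *ifc, unsigned int expected,
                            unsigned int new_disp)
{
    assert(ifc != NULL);
    return atomic_compare_exchange_strong(&ifc->xdp_slot, &expected, new_disp)
               ? 0 : -EEXIST;
}

/* Cooperative install: read the current dispatcher, build one that also runs
 * `component`, and swap it in atomically; if another app won the race in
 * between, re-read and retry. */
static int install_component(struct iface_model *ifc, unsigned int component,
                             int max_tries)
{
    assert(ifc != NULL);
    for (int i = 0; i < max_tries; i++) {
        unsigned int cur = atomic_load(&ifc->xdp_slot);
        unsigned int next = cur | component;   /* "build dispatcher(cur, new)" */
        if (replace_expected(ifc, cur, next) == 0)
            return 0;
    }
    return -EAGAIN;
}
```

The model works only because every value in the slot is a dispatcher that both applications know how to extend. A program installed by an application that does not follow the protocol would be an opaque value that `cur | component` cannot meaningfully extend, which is the incompatibility Toke and Andrii agree on.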


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31  3:43                                         ` Alexei Starovoitov
@ 2020-03-31 22:05                                           ` Edward Cree
  2020-03-31 22:16                                             ` Alexei Starovoitov
  0 siblings, 1 reply; 112+ messages in thread
From: Edward Cree @ 2020-03-31 22:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Ahern, Lorenz Bauer, Andrii Nakryiko,
	Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On 31/03/2020 04:43, Alexei Starovoitov wrote:
> On Mon, Mar 30, 2020 at 04:25:07PM +0100, Edward Cree wrote:
>> Everything that a human operator can do, so can any program with the
>>  same capabilities/wheel bits.  Especially as the API that the
>>  operator-tool uses *will* be open and documented.  The Unix Way does
>>  not allow unscriptable interfaces, and heavily frowns at any kind of
>>  distinction between 'humans' and 'programs'.
> can you share a link on such philosophy?
It's not quite as explicit about it as I'd like, but
 http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2877684
 is the closest I can find right now.

-ed


* Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
  2020-03-31 22:05                                           ` Edward Cree
@ 2020-03-31 22:16                                             ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-03-31 22:16 UTC (permalink / raw)
  To: Edward Cree
  Cc: David Ahern, Lorenz Bauer, Andrii Nakryiko,
	Toke Høiland-Jørgensen, John Fastabend, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, David S. Miller,
	Jesper Dangaard Brouer, Andrey Ignatov, Networking, bpf

On Tue, Mar 31, 2020 at 11:05:50PM +0100, Edward Cree wrote:
> On 31/03/2020 04:43, Alexei Starovoitov wrote:
> > On Mon, Mar 30, 2020 at 04:25:07PM +0100, Edward Cree wrote:
> >> Everything that a human operator can do, so can any program with the
> >>  same capabilities/wheel bits.  Especially as the API that the
> >>  operator-tool uses *will* be open and documented.  The Unix Way does
> >>  not allow unscriptable interfaces, and heavily frowns at any kind of
> >>  distinction between 'humans' and 'programs'.
> > can you share a link on such philosophy?
> It's not quite as explicit about it as I'd like, but
>  http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2877684
>  is the closest I can find right now.

I knew the bit you linked, and I've read several of the "Rule of" sections up
and down that doc, and still don't see any mention of 'humans' vs 'programs'.
Unix philosophy can be rephrased as divide-and-conquer, which is the #1
principle in bpf architecture. In other words: build the smallest
possible mechanisms that are composable.


* bpf: ability to attach freplace to multiple parents
  2020-03-27 11:11                                 ` Toke Høiland-Jørgensen
@ 2020-04-02 20:21                                   ` Alexei Starovoitov
  2020-04-02 21:23                                     ` Toke Høiland-Jørgensen
  2020-04-02 21:24                                     ` Andrey Ignatov
  0 siblings, 2 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-04-02 20:21 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Andrii Nakryiko, David S. Miller,
	Andrey Ignatov, Networking, bpf

On Fri, Mar 27, 2020 at 12:11:15PM +0100, Toke Høiland-Jørgensen wrote:
> 
> Current code is in [0], for those following along. There are two bits of
> kernel support missing before I can get it to where I want it for an
> initial "release": Atomic replace of the dispatcher (this series), and
> the ability to attach an freplace program to more than one "parent".
> I'll try to get an RFC out for the latter during the merge window, but
> I'll probably need some help in figuring out how to make it safe from
> the verifier PoV.

I have some thoughts on the second part "ability to attach an freplace
to more than one 'parent'".
I think the solution should be more generic than just freplace.
fentry/fexit need to have the same feature.
A few folks have already said that they want to attach fentry to multiple
kernel functions. It's similar to what people do with kprobe progs now
(attach to multiple functions and differentiate the attach point based on parent IP).
Similarly, "bpftool profile" needs it to avoid creating a new pair of fentry/fexit
progs for every target bpf prog it's collecting stats about.
I didn't add this ability to fentry/fexit/freplace only to simplify the
initial implementation ;) I think the time has come.
Currently fentry/fexit/freplace progs have single prog->aux->linked_prog pointer.
It just needs to become a linked list.
The api extension could be like this:
bpf_raw_tp_open(prog_fd, attach_prog_fd, attach_btf_id);
(currently it's just bpf_raw_tp_open(prog_fd))
The same pair of (attach_prog_fd, attach_btf_id) is already passed into prog_load
to hold the linked_prog and its corresponding btf_id.
I'm proposing to extend raw_tp_open with this pair as well to
attach an existing fentry/fexit/freplace prog to another target.
Internally, the kernel would verify that the BTF of the current linked_prog
exactly matches the BTF of the other requested linked_prog and,
if they match, attach the same prog to two target programs (in case of freplace)
or two kernel functions (in case of fentry/fexit).

Toke, Andrey,
if above kinda makes sense from high level description
I can prototype it quickly and then we can discuss details
in the patches ?
Or we can drill further into details and discuss corner cases.
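The proposal above (turning the single `prog->aux->linked_prog` pointer into a list, and attaching to a new target only when its BTF matches exactly) can be modelled in plain C. This is a hypothetical sketch, not kernel code: `struct ext_prog` and `struct attach_target` are illustrative names, and `func_sig` is an integer stand-in for the BTF prototype of the replaced or traced function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* One attach point: a target prog (or vmlinux) plus the BTF id of the function. */
struct attach_target {
    int target_prog_id;
    int btf_id;
    struct attach_target *next;
};

/* Hypothetical extension prog, verified once against `func_sig`. */
struct ext_prog {
    int func_sig;                    /* stand-in for the BTF func prototype */
    struct attach_target *targets;   /* formerly a single linked_prog pointer */
};

/* Attach to one more target, but only if the prototypes match exactly. */
static int ext_prog_attach(struct ext_prog *prog, int target_prog_id,
                           int btf_id, int target_sig)
{
    assert(prog != NULL);
    if (target_sig != prog->func_sig)
        return -EINVAL;              /* prototypes differ: refuse to attach */

    struct attach_target *t = malloc(sizeof(*t));
    if (t == NULL)
        return -ENOMEM;
    t->target_prog_id = target_prog_id;
    t->btf_id = btf_id;
    t->next = prog->targets;         /* push onto the list of parents */
    prog->targets = t;
    return 0;
}

static size_t ext_prog_target_count(const struct ext_prog *prog)
{
    size_t n = 0;
    for (const struct attach_target *t = prog->targets; t != NULL; t = t->next)
        n++;
    return n;
}

/* Drop every attachment and free the list. */
static void ext_prog_detach_all(struct ext_prog *prog)
{
    while (prog->targets != NULL) {
        struct attach_target *t = prog->targets;
        prog->targets = t->next;
        free(t);
    }
}
```

The same program is verified once and can then sit under several parents. The only per-attach check is the exact prototype match, which is what keeps the earlier verification valid for each new target.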


* Re: bpf: ability to attach freplace to multiple parents
  2020-04-02 20:21                                   ` bpf: ability to attach freplace to multiple parents Alexei Starovoitov
@ 2020-04-02 21:23                                     ` Toke Høiland-Jørgensen
  2020-04-02 21:54                                       ` Alexei Starovoitov
  2020-04-02 21:24                                     ` Andrey Ignatov
  1 sibling, 1 reply; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-02 21:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Andrii Nakryiko, David S. Miller,
	Andrey Ignatov, Networking, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Mar 27, 2020 at 12:11:15PM +0100, Toke Høiland-Jørgensen wrote:
>> 
>> Current code is in [0], for those following along. There are two bits of
>> kernel support missing before I can get it to where I want it for an
>> initial "release": Atomic replace of the dispatcher (this series), and
>> the ability to attach an freplace program to more than one "parent".
>> I'll try to get an RFC out for the latter during the merge window, but
>> I'll probably need some help in figuring out how to make it safe from
>> the verifier PoV.
>
> I have some thoughts on the second part "ability to attach an freplace
> to more than one 'parent'".
> I think the solution should be more generic than just freplace.
> fentry/fexit need to have the same feature.
> A few folks have already said that they want to attach fentry to multiple
> kernel functions. It's similar to what people do with kprobe progs now
> (attach to multiple functions and differentiate the attach point based on parent IP).
> Similarly, "bpftool profile" needs it to avoid creating a new pair of fentry/fexit
> progs for every target bpf prog it's collecting stats about.
> I didn't add this ability to fentry/fexit/freplace only to simplify the
> initial implementation ;) I think the time has come.

Yup, I agree that it makes sense to do the same for fentry/fexit.

> Currently fentry/fexit/freplace progs have single prog->aux->linked_prog pointer.
> It just needs to become a linked list.
> The api extension could be like this:
> bpf_raw_tp_open(prog_fd, attach_prog_fd, attach_btf_id);
> (currently it's just bpf_raw_tp_open(prog_fd))
> The same pair of (attach_prog_fd, attach_btf_id) is already passed into prog_load
> to hold the linked_prog and its corresponding btf_id.
> I'm proposing to extend raw_tp_open with this pair as well to
> attach an existing fentry/fexit/freplace prog to another target.
> Internally, the kernel would verify that the BTF of the current linked_prog
> exactly matches the BTF of the other requested linked_prog and,
> if they match, attach the same prog to two target programs (in case of freplace)
> or two kernel functions (in case of fentry/fexit).

API-wise this was exactly what I had in mind as well.

> Toke, Andrey,
> if above kinda makes sense from high level description
> I can prototype it quickly and then we can discuss details
> in the patches ?
> Or we can drill further into details and discuss corner cases.

I have one detail to discuss: What would the bpf_raw_tp_open() call
return on the second attachment? A second reference to the same bpf_link
fd as the initial attachment, or a different link?

For the dispatcher use case, the former would make sense: If the
bpf_link is returned to the application as a canonical reference to its
program's attachment, it should persist even when the dispatcher program
itself is replaced from underneath it. But I'm not sure if the same is
true for all such secondary attachments?

-Toke



* Re: bpf: ability to attach freplace to multiple parents
  2020-04-02 20:21                                   ` bpf: ability to attach freplace to multiple parents Alexei Starovoitov
  2020-04-02 21:23                                     ` Toke Høiland-Jørgensen
@ 2020-04-02 21:24                                     ` Andrey Ignatov
  2020-04-02 22:01                                       ` Alexei Starovoitov
  1 sibling, 1 reply; 112+ messages in thread
From: Andrey Ignatov @ 2020-04-02 21:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann,
	Andrii Nakryiko, David S. Miller, Networking, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> [Thu, 2020-04-02 13:22 -0700]:
> On Fri, Mar 27, 2020 at 12:11:15PM +0100, Toke Høiland-Jørgensen wrote:
> > 
> > Current code is in [0], for those following along. There are two bits of
> > kernel support missing before I can get it to where I want it for an
> > initial "release": Atomic replace of the dispatcher (this series), and
> > the ability to attach an freplace program to more than one "parent".
> > I'll try to get an RFC out for the latter during the merge window, but
> > I'll probably need some help in figuring out how to make it safe from
> > the verifier PoV.
> 
> I have some thoughts on the second part "ability to attach an freplace
> to more than one 'parent'".
> I think the solution should be more generic than just freplace.
> fentry/fexit need to have the same feature.
> A few folks have already said that they want to attach fentry to multiple
> kernel functions. It's similar to what people do with kprobe progs now
> (attach to multiple functions and differentiate the attach point based on parent IP).
> Similarly, "bpftool profile" needs it to avoid creating a new pair of fentry/fexit
> progs for every target bpf prog it's collecting stats about.
> I didn't add this ability to fentry/fexit/freplace only to simplify the
> initial implementation ;) I think the time has come.
> Currently fentry/fexit/freplace progs have single prog->aux->linked_prog pointer.
> It just needs to become a linked list.
> The api extension could be like this:
> bpf_raw_tp_open(prog_fd, attach_prog_fd, attach_btf_id);
> (currently it's just bpf_raw_tp_open(prog_fd))
> The same pair of (attach_prog_fd, attach_btf_id) is already passed into prog_load
> to hold the linked_prog and its corresponding btf_id.
> I'm proposing to extend raw_tp_open with this pair as well to
> attach an existing fentry/fexit/freplace prog to another target.
> Internally, the kernel would verify that the BTF of the current linked_prog
> exactly matches the BTF of the other requested linked_prog and,
> if they match, attach the same prog to two target programs (in case of freplace)
> or two kernel functions (in case of fentry/fexit).
> 
> Toke, Andrey,
> if above kinda makes sense from high level description
> I can prototype it quickly and then we can discuss details
> in the patches ?
> Or we can drill further into details and discuss corner cases.

That makes sense to me.

I've also been thinking of a way to "transition" an ext prog from one
target program to another, but I had the impression that limiting the number
of target progs to one for an ext prog is "by design" and hard to
change, and was looking at introducing a way to duplicate an existing ext
prog by its fd but with a different attach_prog_fd and attach_btf_id
(something like a BPF_PROG_DUP command) instead.

But since you're saying that there are actually many use-cases to be
able to attach freplace/fexit/fentry to multiple target programs, that
works as well. Happy to look at the prototype when it's available.

Thanks.


-- 
Andrey Ignatov


* Re: bpf: ability to attach freplace to multiple parents
  2020-04-02 21:23                                     ` Toke Høiland-Jørgensen
@ 2020-04-02 21:54                                       ` Alexei Starovoitov
  2020-04-03  8:38                                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 112+ messages in thread
From: Alexei Starovoitov @ 2020-04-02 21:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Andrii Nakryiko, David S. Miller,
	Andrey Ignatov, Networking, bpf

On Thu, Apr 02, 2020 at 11:23:12PM +0200, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Fri, Mar 27, 2020 at 12:11:15PM +0100, Toke Høiland-Jørgensen wrote:
> >> 
> >> Current code is in [0], for those following along. There are two bits of
> >> kernel support missing before I can get it to where I want it for an
> >> initial "release": Atomic replace of the dispatcher (this series), and
> >> the ability to attach an freplace program to more than one "parent".
> >> I'll try to get an RFC out for the latter during the merge window, but
> >> I'll probably need some help in figuring out how to make it safe from
> >> the verifier PoV.
> >
> > I have some thoughts on the second part "ability to attach an freplace
> > to more than one 'parent'".
> > I think the solution should be more generic than just freplace.
> > fentry/fexit need to have the same feature.
> > A few folks have already said that they want to attach fentry to multiple
> > kernel functions. It's similar to what people do with kprobe progs now
> > (attach to multiple functions and differentiate the attach point based on parent IP).
> > Similarly, "bpftool profile" needs it to avoid creating a new pair of fentry/fexit
> > progs for every target bpf prog it's collecting stats about.
> > I didn't add this ability to fentry/fexit/freplace only to simplify the
> > initial implementation ;) I think the time has come.
> 
> Yup, I agree that it makes sense to do the same for fentry/fexit.
> 
> > Currently fentry/fexit/freplace progs have a single prog->aux->linked_prog pointer.
> > It just needs to become a linked list.
> > The API extension could be like this:
> > bpf_raw_tp_open(prog_fd, attach_prog_fd, attach_btf_id);
> > (currently it's just bpf_raw_tp_open(prog_fd))
> > The same pair of (attach_prog_fd, attach_btf_id) is already passed into prog_load
> > to hold the linked_prog and its corresponding btf_id.
> > I'm proposing to extend raw_tp_open with this pair as well to
> > attach an existing fentry/fexit/freplace prog to another target.
> > Internally, the kernel verifies that the BTF of the current linked_prog
> > exactly matches the BTF of the newly requested linked_prog and,
> > if they match, attaches the same prog to two target programs (in case of freplace)
> > or two kernel functions (in case of fentry/fexit).
> 
> API-wise this was exactly what I had in mind as well.

perfect!
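To make the "linked list instead of a single linked_prog pointer" idea concrete, here is a minimal userspace model. It is not kernel code: `struct prog`, `struct target_node`, `prog_attach_target()` and the other names are hypothetical, and BTF is reduced to a prototype string. It only illustrates the rule described above, namely that an already-verified prog may gain another target when the target's BTF matches exactly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* One attach point of an fentry/fexit/freplace prog (hypothetical model):
 * a target prog for freplace, or a kernel function for fentry/fexit. */
struct target_node {
	int target_id;              /* stands in for attach_prog_fd / a vmlinux func */
	struct target_node *next;
};

struct prog {
	const char *btf_sig;        /* stands in for the BTF prototype it was verified against */
	struct target_node *linked; /* list instead of a single linked_prog pointer */
};

/* Attach an already-verified prog to one more target. Allowed only when the
 * target's BTF matches exactly, so the prog need not be re-verified. */
static int prog_attach_target(struct prog *p, int target_id, const char *target_sig)
{
	struct target_node *n;

	assert(p != NULL && target_sig != NULL);
	if (strcmp(p->btf_sig, target_sig) != 0)
		return -EINVAL;
	n = malloc(sizeof(*n));
	if (!n)
		return -ENOMEM;
	n->target_id = target_id;
	n->next = p->linked;
	p->linked = n;
	return 0;
}

/* Detach from one target, e.g. when the corresponding link is closed. */
static int prog_detach_target(struct prog *p, int target_id)
{
	struct target_node **pp;

	for (pp = &p->linked; *pp; pp = &(*pp)->next) {
		if ((*pp)->target_id == target_id) {
			struct target_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -ENOENT;
}

/* Number of targets this prog is currently attached to. */
static size_t prog_target_count(const struct prog *p)
{
	size_t n = 0;
	const struct target_node *t;

	for (t = p->linked; t; t = t->next)
		n++;
	return n;
}
```

The exact-match check is the design point: since the second target has the same prototype as the first, the verification result can be reused instead of re-running the verifier.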

> > Toke, Andrey,
> > if above kinda makes sense from high level description
> > I can prototype it quickly and then we can discuss details
> > in the patches?
> > Or we can drill further into details and discuss corner cases.
> 
> I have one detail to discuss: What would the bpf_raw_tp_open() call
> return on the second attachment? A second reference to the same bpf_link
> fd as the initial attachment, or a different link?

It's a different link.
For fentry/fexit/freplace the link is a pair:
  // target           ...         bpf_prog
(target_prog_fd_or_vmlinux, fentry_exit_replace_prog_fd).

So for xdp case we will have:
root_link = (eth0_ifindex, dispatcher_prog_fd) // dispatcher prog attached to eth0
link1 = (dispatcher_prog_fd, xdp_firewall1_fd) // 1st extension prog attached to dispatcher
link2 = (dispatcher_prog_fd, xdp_firewall2_fd) // 2nd extension prog attached to dispatcher

Now libxdp wants to update the dispatcher prog.
It generates a new dispatcher prog with more placeholder entries or a new policy:
new_dispatcher_prog_fd.
It's not attached anywhere yet.
Then libxdp calls the new bpf_raw_tp_open() API proposed above to create:
link3 = (new_dispatcher_prog_fd, xdp_firewall1_fd)
link4 = (new_dispatcher_prog_fd, xdp_firewall2_fd)
Now both firewalls are attached to both the old and the new dispatcher prog.
Both firewalls are still executing via the old dispatcher prog, which is the active one.
Now libxdp calls:
bpf_link_update(root_link, dispatcher_prog_fd, new_dispatcher_prog_fd)
which atomically replaces the old dispatcher prog with the new one on eth0.
The traffic keeps flowing into both firewalls. No packets are lost.
But now it goes through the new dispatcher prog.
libxdp can now:
close(dispatcher_prog_fd);
close(link1);
close(link2);
Closing (and thereby destroying) the two links removes the old dispatcher prog
from the linked list in xdp_firewall1_prog->aux->linked_prog_list and from
xdp_firewall2_prog->aux->linked_prog_list.
Notice that there is no need to explicitly detach the old dispatcher prog from eth0.
link_update() already did that when it replaced it with the new dispatcher prog.
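The dispatcher-replacement sequence above can be modeled in plain C as a sketch. Everything here is hypothetical: `struct root_link`, `ext_link_create()` (standing in for the proposed bpf_raw_tp_open() extension) and `root_link_update()` are illustrative names, not kernel or libbpf APIs. A C11 compare-and-swap stands in for the atomic replacement with an expected old prog, and returning `-EPERM` on mismatch is a modeling choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct prog {
	const char *name;
};

/* root_link = (ifindex, dispatcher prog): the prog that runs on the interface */
struct root_link {
	int ifindex;
	_Atomic(struct prog *) prog;
};

/* extension link = (target dispatcher prog, extension prog) */
struct ext_link {
	struct prog *target;
	struct prog *ext;
	bool open;
};

#define MAX_LINKS 8
static struct ext_link links[MAX_LINKS];

/* Attach an existing extension prog to one more dispatcher prog. */
static int ext_link_create(struct prog *target, struct prog *ext)
{
	for (int i = 0; i < MAX_LINKS; i++) {
		if (!links[i].open) {
			links[i] = (struct ext_link){ .target = target, .ext = ext, .open = true };
			return i;
		}
	}
	return -ENOSPC;
}

static void ext_link_close(int id)
{
	assert(id >= 0 && id < MAX_LINKS);
	links[id].open = false;
}

/* Swap the interface's prog in one atomic step, but only if it is still `old`. */
static int root_link_update(struct root_link *rl, struct prog *old, struct prog *new_prog)
{
	struct prog *expected = old;

	if (!atomic_compare_exchange_strong(&rl->prog, &expected, new_prog))
		return -EPERM;
	return 0;
}

/* Number of extension progs a packet arriving on the interface passes through. */
static int deliver_packet(struct root_link *rl)
{
	struct prog *active = atomic_load(&rl->prog);
	int n = 0;

	for (int i = 0; i < MAX_LINKS; i++)
		if (links[i].open && links[i].target == active)
			n++;
	return n;
}
```

Because the extension links to the new dispatcher exist before the swap, `deliver_packet()` reports both firewalls at every step, and closing the old links afterwards changes nothing for traffic.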


* Re: bpf: ability to attach freplace to multiple parents
  2020-04-02 21:24                                     ` Andrey Ignatov
@ 2020-04-02 22:01                                       ` Alexei Starovoitov
  0 siblings, 0 replies; 112+ messages in thread
From: Alexei Starovoitov @ 2020-04-02 22:01 UTC
  To: Andrey Ignatov
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann,
	Andrii Nakryiko, David S. Miller, Networking, bpf

On Thu, Apr 02, 2020 at 02:24:33PM -0700, Andrey Ignatov wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> [Thu, 2020-04-02 13:22 -0700]:
> > On Fri, Mar 27, 2020 at 12:11:15PM +0100, Toke Høiland-Jørgensen wrote:
> > > 
> > > Current code is in [0], for those following along. There are two bits of
> > > kernel support missing before I can get it to where I want it for an
> > > initial "release": Atomic replace of the dispatcher (this series), and
> > > the ability to attach an freplace program to more than one "parent".
> > > I'll try to get an RFC out for the latter during the merge window, but
> > > I'll probably need some help in figuring out how to make it safe from
> > > the verifier PoV.
> > 
> > I have some thoughts on the second part "ability to attach an freplace
> > to more than one 'parent'".
> > I think the solution should be more generic than just freplace.
> > fentry/fexit need to have the same feature.
> > A few folks have already said that they want to attach fentry to multiple
> > kernel functions. It's similar to what people do with kprobe progs now
> > (attach to multiple functions and differentiate the attach point based on the parent IP).
> > Similarly, "bpftool profile" needs it to avoid creating a new pair of fentry/fexit
> > progs for every target bpf prog it's collecting stats about.
> > I didn't add this ability to fentry/fexit/freplace only to simplify the
> > initial implementation ;) I think the time has come.
> > Currently fentry/fexit/freplace progs have a single prog->aux->linked_prog pointer.
> > It just needs to become a linked list.
> > The API extension could be like this:
> > bpf_raw_tp_open(prog_fd, attach_prog_fd, attach_btf_id);
> > (currently it's just bpf_raw_tp_open(prog_fd))
> > The same pair of (attach_prog_fd, attach_btf_id) is already passed into prog_load
> > to hold the linked_prog and its corresponding btf_id.
> > I'm proposing to extend raw_tp_open with this pair as well to
> > attach an existing fentry/fexit/freplace prog to another target.
> > Internally, the kernel verifies that the BTF of the current linked_prog
> > exactly matches the BTF of the newly requested linked_prog and,
> > if they match, attaches the same prog to two target programs (in case of freplace)
> > or two kernel functions (in case of fentry/fexit).
> > 
> > Toke, Andrey,
> > if above kinda makes sense from high level description
> > I can prototype it quickly and then we can discuss details
> > in the patches?
> > Or we can drill further into details and discuss corner cases.
> 
> That makes sense to me.
> 
> I've also been thinking of a way to "transition" an ext prog from one
> target program to another, but I had the impression that limiting the number
> of target progs to one for an ext prog is "by design" and hard to
> change, and was looking at introducing a way to duplicate an existing ext
> prog by its fd but with a different attach_prog_fd and attach_btf_id (something
> like a BPF_PROG_DUP command) instead.

I think cloning the whole program is useful.
IIRC, the Cilium folks wanted the ability for a clone to have a different 'flavor' of
the program, e.g. re-verifying and re-optimizing it with new dead code elimination
when global data changed: load a prog to manage traffic for one k8s
container with a given IP, then clone this prog for a different container with a
different IP. So a clone is effectively a fast load where the verifier
potentially doesn't need to do full verification of all paths. I think it's
still a useful feature, but for this case, I hope, a much simpler approach would do.


* Re: bpf: ability to attach freplace to multiple parents
  2020-04-02 21:54                                       ` Alexei Starovoitov
@ 2020-04-03  8:38                                         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 112+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-03  8:38 UTC
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Andrii Nakryiko, David S. Miller,
	Andrey Ignatov, Networking, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> It's a different link.
> For fentry/fexit/freplace the link is a pair:
>   // target           ...         bpf_prog
> (target_prog_fd_or_vmlinux, fentry_exit_replace_prog_fd).
>
> So for xdp case we will have:
> root_link = (eth0_ifindex, dispatcher_prog_fd) // dispatcher prog attached to eth0
> link1 = (dispatcher_prog_fd, xdp_firewall1_fd) // 1st extension prog attached to dispatcher
> link2 = (dispatcher_prog_fd, xdp_firewall2_fd) // 2nd extension prog attached to dispatcher
>
> Now libxdp wants to update the dispatcher prog.
> It generates a new dispatcher prog with more placeholder entries or a new policy:
> new_dispatcher_prog_fd.
> It's not attached anywhere yet.
> Then libxdp calls the new bpf_raw_tp_open() API proposed above to create:
> link3 = (new_dispatcher_prog_fd, xdp_firewall1_fd)
> link4 = (new_dispatcher_prog_fd, xdp_firewall2_fd)
> Now both firewalls are attached to both the old and the new dispatcher prog.
> Both firewalls are still executing via the old dispatcher prog, which is the active one.
> Now libxdp calls:
> bpf_link_update(root_link, dispatcher_prog_fd, new_dispatcher_prog_fd)
> which atomically replaces the old dispatcher prog with the new one on eth0.
> The traffic keeps flowing into both firewalls. No packets are lost.
> But now it goes through the new dispatcher prog.
> libxdp can now:
> close(dispatcher_prog_fd);
> close(link1);
> close(link2);
> Closing (and thereby destroying) the two links removes the old dispatcher prog
> from the linked list in xdp_firewall1_prog->aux->linked_prog_list and from
> xdp_firewall2_prog->aux->linked_prog_list.
> Notice that there is no need to explicitly detach the old dispatcher prog from eth0.
> link_update() already did that when it replaced it with the new dispatcher prog.

Yeah, this was the flow I had in mind already. However, what I meant was
that *from the PoV of an application consuming the link fd*, this would
lead to dangling links.

I.e., an application does:

app1_link_fd = libxdp_install_prog(prog1);

and stores app1_link_fd somewhere (just holds on to it, or pins it
somewhere).

Then later, another application does:

app2_link_fd = libxdp_install_prog(prog2);

but this has the side-effect of replacing the dispatcher, so
app1_link_fd is now no longer valid.

This can be worked around, of course (e.g., just return the prog_fd and
hide any link_fd details inside the library), but if the point of
bpf_link is that the application could hold on to it and use it for
subsequent replacements, that would be nice to have for consumers of the
library as well, no?

-Toke
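The dangling-link problem described above can be reproduced with a toy model. The names (`struct lib_state`, `lib_install_prog()`, `lib_current_link()`) are hypothetical and do not correspond to the real libxdp API; a generation counter stands in for "which dispatcher prog a link is attached to". The last helper sketches the workaround mentioned in the message: give the application a stable prog identifier and resolve it to a current link inside the library.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROGS 8

/* Hypothetical library state: several XDP progs multiplexed behind one
 * dispatcher; installing a prog regenerates (replaces) the dispatcher. */
struct lib_state {
	int dispatcher_gen;     /* bumped every time the dispatcher is replaced */
	int progs[MAX_PROGS];   /* stable ids of the installed progs */
	int nprogs;
};

/* A link as the application sees it: bound to one dispatcher generation. */
struct app_link {
	int dispatcher_gen;
	int prog_id;
};

static bool app_link_valid(const struct lib_state *s, struct app_link l)
{
	return l.dispatcher_gen == s->dispatcher_gen;
}

/* Installing a prog replaces the dispatcher, so every app_link handed out
 * earlier now refers to the old, detached dispatcher. */
static struct app_link lib_install_prog(struct lib_state *s, int prog_id)
{
	assert(s->nprogs < MAX_PROGS);
	s->progs[s->nprogs++] = prog_id;
	s->dispatcher_gen++;
	return (struct app_link){ .dispatcher_gen = s->dispatcher_gen, .prog_id = prog_id };
}

/* Workaround: the application keeps only the stable prog id and asks the
 * library for a link against whatever dispatcher is currently active. */
static bool lib_current_link(const struct lib_state *s, int prog_id, struct app_link *out)
{
	for (int i = 0; i < s->nprogs; i++) {
		if (s->progs[i] == prog_id) {
			*out = (struct app_link){ .dispatcher_gen = s->dispatcher_gen,
						  .prog_id = prog_id };
			return true;
		}
	}
	return false;
}
```

In this model app1's handle becomes stale as soon as app2 installs its prog, while lookups by prog id keep working; the trade-off is that the application can no longer use the link itself for later replacements, which is the limitation raised above.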




Thread overview: 112+ messages
2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
2020-03-19 22:52   ` Jakub Kicinski
2020-03-20  8:48     ` Toke Høiland-Jørgensen
2020-03-20 17:35       ` Jakub Kicinski
2020-03-20 18:17         ` Toke Høiland-Jørgensen
2020-03-20 18:35           ` Jakub Kicinski
2020-03-20 18:30         ` John Fastabend
2020-03-20 20:24           ` Andrii Nakryiko
2020-03-23 11:24             ` Toke Høiland-Jørgensen
2020-03-23 16:54               ` Jakub Kicinski
2020-03-23 18:14               ` Andrii Nakryiko
2020-03-23 19:23                 ` Toke Høiland-Jørgensen
2020-03-24  1:01                   ` David Ahern
2020-03-24  4:53                     ` Andrii Nakryiko
2020-03-24 20:55                       ` David Ahern
2020-03-24 22:56                         ` Andrii Nakryiko
2020-03-24  5:00                   ` Andrii Nakryiko
2020-03-24 10:57                     ` Toke Høiland-Jørgensen
2020-03-24 18:53                       ` Jakub Kicinski
2020-03-24 22:30                         ` Andrii Nakryiko
2020-03-25  1:25                           ` Jakub Kicinski
2020-03-24 19:22                       ` John Fastabend
2020-03-25  1:36                         ` Alexei Starovoitov
2020-03-25  2:15                           ` Jakub Kicinski
2020-03-25 18:06                             ` Alexei Starovoitov
2020-03-25 18:20                               ` Jakub Kicinski
2020-03-25 19:14                                 ` Alexei Starovoitov
2020-03-25 10:42                           ` Toke Høiland-Jørgensen
2020-03-25 18:11                             ` Alexei Starovoitov
2020-03-25 10:30                         ` Toke Høiland-Jørgensen
2020-03-25 17:56                           ` Alexei Starovoitov
2020-03-24 22:25                       ` Andrii Nakryiko
2020-03-25  9:38                         ` Toke Høiland-Jørgensen
2020-03-25 17:55                           ` Alexei Starovoitov
2020-03-26  0:16                           ` Andrii Nakryiko
2020-03-26  5:13                             ` Jakub Kicinski
2020-03-26 18:09                               ` Andrii Nakryiko
2020-03-26 19:40                               ` Alexei Starovoitov
2020-03-26 20:05                                 ` Edward Cree
2020-03-27 11:09                                   ` Lorenz Bauer
2020-03-27 23:11                                   ` Alexei Starovoitov
2020-03-26 10:04                             ` Lorenz Bauer
2020-03-26 17:47                               ` Jakub Kicinski
2020-03-26 19:45                                 ` Alexei Starovoitov
2020-03-26 18:18                               ` Andrii Nakryiko
2020-03-26 19:53                               ` Alexei Starovoitov
2020-03-27 11:11                                 ` Toke Høiland-Jørgensen
2020-04-02 20:21                                   ` bpf: ability to attach freplace to multiple parents Alexei Starovoitov
2020-04-02 21:23                                     ` Toke Høiland-Jørgensen
2020-04-02 21:54                                       ` Alexei Starovoitov
2020-04-03  8:38                                         ` Toke Høiland-Jørgensen
2020-04-02 21:24                                     ` Andrey Ignatov
2020-04-02 22:01                                       ` Alexei Starovoitov
2020-03-26 12:35                             ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
2020-03-26 19:06                               ` Andrii Nakryiko
2020-03-27 11:06                                 ` Lorenz Bauer
2020-03-27 16:12                                   ` David Ahern
2020-03-27 20:10                                     ` Andrii Nakryiko
2020-03-27 23:02                                     ` Alexei Starovoitov
2020-03-30 15:25                                       ` Edward Cree
2020-03-31  3:43                                         ` Alexei Starovoitov
2020-03-31 22:05                                           ` Edward Cree
2020-03-31 22:16                                             ` Alexei Starovoitov
2020-03-27 19:42                                   ` Andrii Nakryiko
2020-03-27 19:45                                   ` Andrii Nakryiko
2020-03-27 23:09                                   ` Alexei Starovoitov
2020-03-27 11:46                                 ` Toke Høiland-Jørgensen
2020-03-27 20:07                                   ` Andrii Nakryiko
2020-03-27 22:16                                     ` Toke Høiland-Jørgensen
2020-03-27 22:54                                       ` Andrii Nakryiko
2020-03-28  1:09                                         ` Toke Høiland-Jørgensen
2020-03-28  1:44                                           ` Andrii Nakryiko
2020-03-28 19:43                                             ` Toke Høiland-Jørgensen
2020-03-26 19:58                               ` Alexei Starovoitov
2020-03-27 12:06                                 ` Toke Høiland-Jørgensen
2020-03-27 23:00                                   ` Alexei Starovoitov
2020-03-28  1:43                                     ` Toke Høiland-Jørgensen
2020-03-28  2:26                                       ` Alexei Starovoitov
2020-03-28 19:34                                         ` Toke Høiland-Jørgensen
2020-03-28 23:35                                           ` Alexei Starovoitov
2020-03-29 10:39                                             ` Toke Høiland-Jørgensen
2020-03-29 19:26                                               ` Alexei Starovoitov
2020-03-30 10:19                                                 ` Toke Høiland-Jørgensen
2020-03-29 20:23                                           ` Andrii Nakryiko
2020-03-30 13:53                                             ` Toke Høiland-Jørgensen
2020-03-30 20:17                                               ` Andrii Nakryiko
2020-03-31 10:13                                                 ` Toke Høiland-Jørgensen
2020-03-31 13:48                                                   ` Daniel Borkmann
2020-03-31 15:00                                                     ` Toke Høiland-Jørgensen
2020-03-31 20:19                                                       ` Andrii Nakryiko
2020-03-31 20:15                                                     ` Andrii Nakryiko
2020-03-30 15:41                                             ` Edward Cree
2020-03-30 19:13                                               ` Jakub Kicinski
2020-03-31  4:01                                               ` Alexei Starovoitov
2020-03-31 11:34                                                 ` Toke Høiland-Jørgensen
2020-03-31 18:52                                                   ` Alexei Starovoitov
2020-03-20 20:30       ` Daniel Borkmann
2020-03-20 20:40         ` Daniel Borkmann
2020-03-20 21:30           ` Jakub Kicinski
2020-03-20 21:55             ` Daniel Borkmann
2020-03-20 23:35               ` Jakub Kicinski
2020-03-20 20:39       ` Andrii Nakryiko
2020-03-23 11:25         ` Toke Høiland-Jørgensen
2020-03-23 18:07           ` Andrii Nakryiko
2020-03-23 23:54           ` Andrey Ignatov
2020-03-24 10:16             ` Toke Høiland-Jørgensen
2020-03-20  2:13   ` Yonghong Song
2020-03-20  8:48     ` Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 2/4] tools: Add EXPECTED_FD-related definitions in if_link.h Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 3/4] libbpf: Add function to set link XDP fd while specifying old fd Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for attaching XDP programs Toke Høiland-Jørgensen
