bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/6] Add support for cgroup bpf_link
@ 2020-03-20 20:36 Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

bpf_link abstraction itself was formalized in [0] with justifications for why
its semantics is a good fit for attaching BPF programs of various types. This
patch set adds bpf_link-based BPF program attachment mechanism for cgroup BPF
programs.

Cgroup BPF link is semantically compatible with current BPF_F_ALLOW_MULTI
semantics of attaching cgroup BPF programs directly. Thus cgroup bpf_link can
co-exist with legacy BPF program multi-attachment.

bpf_link is destroyed and automatically detached when the last open FD holding
the reference to bpf_link is closed. This means that by default, when the
process that created bpf_link exits, attached BPF program will be
automatically detached due to bpf_link's clean up code. Cgroup bpf_link, like
any other bpf_link, can be pinned in BPF FS and by those means survive the
exit of process that created the link. This is useful in many scenarios to
provide long-living BPF program attachments. Pinning also means that there
could be many owners of bpf_link through independent FDs.

Additionally, auto-detachmet of cgroup bpf_link is implemented. When cgroup is
dying it will automatically detach all active bpf_links. This ensures that
cgroup clean up is not delayed due to active bpf_link even despite no chance
for any BPF program to be run for a given cgroup. In that sense it's similar
to existing behavior of dropping refcnt of attached bpf_prog. But in the case
of bpf_link, bpf_link is not destroyed and is still available to user as long
as at least one active FD is still open (or if it's pinned in BPF FS).

There are two main cgroup-specific differences between bpf_link-based and
direct bpf_prog-based attachment.

First, as opposed to direct bpf_prog attachment, cgroup itself doesn't "own"
bpf_link, which makes it possible to auto-clean up attached bpf_link when user
process abruptly exits without explicitly detaching BPF program. This makes
for a safe default behavior proven in BPF tracing program types. But bpf_link
doesn't bump cgroup->bpf.refcnt as well and because of that doesn't prevent
cgroup from cleaning up its BPF state.

Second, only owners of bpf_link (those who created bpf_link in the first place
or obtained a new FD by opening bpf_link from BPF FS) can detach and/or update
it. This makes sure that no other process can accidentally remove/replace BPF
program.

This patch set also implements LINK_UPDATE sub-command, which allows to
replace bpf_link's underlying bpf_prog, similarly to BPF_F_REPLACE flag
behavior for direct bpf_prog cgroup attachment. Similarly to LINK_CREATE, it
is supposed to be generic command for different types of bpf_links.

  [0] https://lore.kernel.org/bpf/20200228223948.360936-1-andriin@fb.com/

Andrii Nakryiko (6):
  bpf: factor out cgroup storages operations
  bpf: factor out attach_type to prog_type mapping for attach/detach
  bpf: implement bpf_link-based cgroup BPF program attachment
  bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  libbpf: add support for bpf_link-based cgroup attachment
  selftests/bpf: test FD-based cgroup attachment

 include/linux/bpf-cgroup.h                    |  31 +-
 include/linux/bpf.h                           |  10 +-
 include/uapi/linux/bpf.h                      |  21 +-
 kernel/bpf/cgroup.c                           | 502 +++++++++++++-----
 kernel/bpf/syscall.c                          | 275 ++++++----
 kernel/cgroup/cgroup.c                        |  35 +-
 tools/include/uapi/linux/bpf.h                |  21 +-
 tools/lib/bpf/bpf.c                           |  34 ++
 tools/lib/bpf/bpf.h                           |  19 +
 tools/lib/bpf/libbpf.c                        |  46 ++
 tools/lib/bpf/libbpf.h                        |   8 +-
 tools/lib/bpf/libbpf.map                      |   4 +
 .../selftests/bpf/prog_tests/cgroup_link.c    | 242 +++++++++
 .../selftests/bpf/progs/test_cgroup_link.c    |  24 +
 14 files changed, 1042 insertions(+), 230 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_cgroup_link.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 1/6] bpf: factor out cgroup storages operations
  2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
@ 2020-03-20 20:36 ` Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Refactor cgroup attach/detach code to abstract away common operations
performed on all types of cgroup storages. This makes high-level logic more
apparent, plus allows to reuse more code across multiple functions.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/cgroup.c | 118 +++++++++++++++++++++++++++-----------------
 1 file changed, 72 insertions(+), 46 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 9a500fadbef5..9c8472823a7f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -28,6 +28,58 @@ void cgroup_bpf_offline(struct cgroup *cgrp)
 	percpu_ref_kill(&cgrp->bpf.refcnt);
 }
 
+static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_free(storages[stype]);
+}
+
+static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
+				     struct bpf_prog *prog)
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype) {
+		storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
+		if (IS_ERR(storages[stype])) {
+			storages[stype] = NULL;
+			bpf_cgroup_storages_free(storages);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[],
+				       struct bpf_cgroup_storage *src[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		dst[stype] = src[stype];
+}
+
+static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
+				     struct cgroup* cgrp,
+				     enum bpf_attach_type attach_type)
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
+}
+
+static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_unlink(storages[stype]);
+}
+
 /**
  * cgroup_bpf_release() - put references of all bpf programs and
  *                        release all cgroup bpf data
@@ -37,7 +89,6 @@ static void cgroup_bpf_release(struct work_struct *work)
 {
 	struct cgroup *p, *cgrp = container_of(work, struct cgroup,
 					       bpf.release_work);
-	enum bpf_cgroup_storage_type stype;
 	struct bpf_prog_array *old_array;
 	unsigned int type;
 
@@ -50,10 +101,8 @@ static void cgroup_bpf_release(struct work_struct *work)
 		list_for_each_entry_safe(pl, tmp, progs, node) {
 			list_del(&pl->node);
 			bpf_prog_put(pl->prog);
-			for_each_cgroup_storage_type(stype) {
-				bpf_cgroup_storage_unlink(pl->storage[stype]);
-				bpf_cgroup_storage_free(pl->storage[stype]);
-			}
+			bpf_cgroup_storages_unlink(pl->storage);
+			bpf_cgroup_storages_free(pl->storage);
 			kfree(pl);
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
@@ -138,7 +187,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
 				   enum bpf_attach_type type,
 				   struct bpf_prog_array **array)
 {
-	enum bpf_cgroup_storage_type stype;
+	struct bpf_prog_array_item *item;
 	struct bpf_prog_array *progs;
 	struct bpf_prog_list *pl;
 	struct cgroup *p = cgrp;
@@ -166,10 +215,10 @@ static int compute_effective_progs(struct cgroup *cgrp,
 			if (!pl->prog)
 				continue;
 
-			progs->items[cnt].prog = pl->prog;
-			for_each_cgroup_storage_type(stype)
-				progs->items[cnt].cgroup_storage[stype] =
-					pl->storage[stype];
+			item = &progs->items[cnt];
+			item->prog = pl->prog;
+			bpf_cgroup_storages_assign(item->cgroup_storage,
+						   pl->storage);
 			cnt++;
 		}
 	} while ((p = cgroup_parent(p)));
@@ -305,7 +354,6 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
 		*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
 	struct bpf_prog_list *pl, *replace_pl = NULL;
-	enum bpf_cgroup_storage_type stype;
 	int err;
 
 	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
@@ -341,37 +389,25 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 		replace_pl = list_first_entry(progs, typeof(*pl), node);
 	}
 
-	for_each_cgroup_storage_type(stype) {
-		storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
-		if (IS_ERR(storage[stype])) {
-			storage[stype] = NULL;
-			for_each_cgroup_storage_type(stype)
-				bpf_cgroup_storage_free(storage[stype]);
-			return -ENOMEM;
-		}
-	}
+	if (bpf_cgroup_storages_alloc(storage, prog))
+		return -ENOMEM;
 
 	if (replace_pl) {
 		pl = replace_pl;
 		old_prog = pl->prog;
-		for_each_cgroup_storage_type(stype) {
-			old_storage[stype] = pl->storage[stype];
-			bpf_cgroup_storage_unlink(old_storage[stype]);
-		}
+		bpf_cgroup_storages_unlink(pl->storage);
+		bpf_cgroup_storages_assign(old_storage, pl->storage);
 	} else {
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
-			for_each_cgroup_storage_type(stype)
-				bpf_cgroup_storage_free(storage[stype]);
+			bpf_cgroup_storages_free(storage);
 			return -ENOMEM;
 		}
 		list_add_tail(&pl->node, progs);
 	}
 
 	pl->prog = prog;
-	for_each_cgroup_storage_type(stype)
-		pl->storage[stype] = storage[stype];
-
+	bpf_cgroup_storages_assign(pl->storage, storage);
 	cgrp->bpf.flags[type] = saved_flags;
 
 	err = update_effective_progs(cgrp, type);
@@ -379,27 +415,20 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 		goto cleanup;
 
 	static_branch_inc(&cgroup_bpf_enabled_key);
-	for_each_cgroup_storage_type(stype) {
-		if (!old_storage[stype])
-			continue;
-		bpf_cgroup_storage_free(old_storage[stype]);
-	}
+	bpf_cgroup_storages_free(old_storage);
 	if (old_prog) {
 		bpf_prog_put(old_prog);
 		static_branch_dec(&cgroup_bpf_enabled_key);
 	}
-	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_link(storage[stype], cgrp, type);
+	bpf_cgroup_storages_link(storage, cgrp, type);
 	return 0;
 
 cleanup:
 	/* and cleanup the prog list */
 	pl->prog = old_prog;
-	for_each_cgroup_storage_type(stype) {
-		bpf_cgroup_storage_free(pl->storage[stype]);
-		pl->storage[stype] = old_storage[stype];
-		bpf_cgroup_storage_link(old_storage[stype], cgrp, type);
-	}
+	bpf_cgroup_storages_free(pl->storage);
+	bpf_cgroup_storages_assign(pl->storage, old_storage);
+	bpf_cgroup_storages_link(pl->storage, cgrp, type);
 	if (!replace_pl) {
 		list_del(&pl->node);
 		kfree(pl);
@@ -420,7 +449,6 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			enum bpf_attach_type type)
 {
 	struct list_head *progs = &cgrp->bpf.progs[type];
-	enum bpf_cgroup_storage_type stype;
 	u32 flags = cgrp->bpf.flags[type];
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_prog_list *pl;
@@ -467,10 +495,8 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
 	/* now can actually delete it from this cgroup list */
 	list_del(&pl->node);
-	for_each_cgroup_storage_type(stype) {
-		bpf_cgroup_storage_unlink(pl->storage[stype]);
-		bpf_cgroup_storage_free(pl->storage[stype]);
-	}
+	bpf_cgroup_storages_unlink(pl->storage);
+	bpf_cgroup_storages_free(pl->storage);
 	kfree(pl);
 	if (list_empty(progs))
 		/* last program was detached, reset flags to zero */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach
  2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
@ 2020-03-20 20:36 ` Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Factor out logic mapping expected program attach type to program type and
subsequent handling of program attach/detach. Also list out all supported
cgroup BPF program types explicitly to prevent accidental bugs once more
program types are added to a mapping. Do the same for prog_query API.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/syscall.c | 153 +++++++++++++++++++------------------------
 1 file changed, 66 insertions(+), 87 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 85567a6ea5f9..fd4181939064 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2535,36 +2535,18 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	}
 }
 
-#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
-
-#define BPF_F_ATTACH_MASK \
-	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
-
-static int bpf_prog_attach(const union bpf_attr *attr)
+static enum bpf_prog_type
+attach_type_to_prog_type(enum bpf_attach_type attach_type)
 {
-	enum bpf_prog_type ptype;
-	struct bpf_prog *prog;
-	int ret;
-
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	if (CHECK_ATTR(BPF_PROG_ATTACH))
-		return -EINVAL;
-
-	if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
-		return -EINVAL;
-
-	switch (attr->attach_type) {
+	switch (attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
-		ptype = BPF_PROG_TYPE_CGROUP_SKB;
+		return BPF_PROG_TYPE_CGROUP_SKB;
 		break;
 	case BPF_CGROUP_INET_SOCK_CREATE:
 	case BPF_CGROUP_INET4_POST_BIND:
 	case BPF_CGROUP_INET6_POST_BIND:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SOCK;
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
@@ -2573,37 +2555,53 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_UDP6_SENDMSG:
 	case BPF_CGROUP_UDP4_RECVMSG:
 	case BPF_CGROUP_UDP6_RECVMSG:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 	case BPF_CGROUP_SOCK_OPS:
-		ptype = BPF_PROG_TYPE_SOCK_OPS;
-		break;
+		return BPF_PROG_TYPE_SOCK_OPS;
 	case BPF_CGROUP_DEVICE:
-		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
-		break;
+		return BPF_PROG_TYPE_CGROUP_DEVICE;
 	case BPF_SK_MSG_VERDICT:
-		ptype = BPF_PROG_TYPE_SK_MSG;
-		break;
+		return BPF_PROG_TYPE_SK_MSG;
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
-		ptype = BPF_PROG_TYPE_SK_SKB;
-		break;
+		return BPF_PROG_TYPE_SK_SKB;
 	case BPF_LIRC_MODE2:
-		ptype = BPF_PROG_TYPE_LIRC_MODE2;
-		break;
+		return BPF_PROG_TYPE_LIRC_MODE2;
 	case BPF_FLOW_DISSECTOR:
-		ptype = BPF_PROG_TYPE_FLOW_DISSECTOR;
-		break;
+		return BPF_PROG_TYPE_FLOW_DISSECTOR;
 	case BPF_CGROUP_SYSCTL:
-		ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SYSCTL;
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
 	default:
-		return -EINVAL;
+		return BPF_PROG_TYPE_UNSPEC;
 	}
+}
+
+#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
+
+#define BPF_F_ATTACH_MASK \
+	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
+
+static int bpf_prog_attach(const union bpf_attr *attr)
+{
+	enum bpf_prog_type ptype;
+	struct bpf_prog *prog;
+	int ret;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (CHECK_ATTR(BPF_PROG_ATTACH))
+		return -EINVAL;
+
+	if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
+		return -EINVAL;
+
+	ptype = attach_type_to_prog_type(attr->attach_type);
+	if (ptype == BPF_PROG_TYPE_UNSPEC)
+		return -EINVAL;
 
 	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
 	if (IS_ERR(prog))
@@ -2625,8 +2623,17 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
 		break;
-	default:
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_SOCK_OPS:
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
+		break;
+	default:
+		ret = -EINVAL;
 	}
 
 	if (ret)
@@ -2646,53 +2653,27 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_PROG_DETACH))
 		return -EINVAL;
 
-	switch (attr->attach_type) {
-	case BPF_CGROUP_INET_INGRESS:
-	case BPF_CGROUP_INET_EGRESS:
-		ptype = BPF_PROG_TYPE_CGROUP_SKB;
-		break;
-	case BPF_CGROUP_INET_SOCK_CREATE:
-	case BPF_CGROUP_INET4_POST_BIND:
-	case BPF_CGROUP_INET6_POST_BIND:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
-		break;
-	case BPF_CGROUP_INET4_BIND:
-	case BPF_CGROUP_INET6_BIND:
-	case BPF_CGROUP_INET4_CONNECT:
-	case BPF_CGROUP_INET6_CONNECT:
-	case BPF_CGROUP_UDP4_SENDMSG:
-	case BPF_CGROUP_UDP6_SENDMSG:
-	case BPF_CGROUP_UDP4_RECVMSG:
-	case BPF_CGROUP_UDP6_RECVMSG:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-		break;
-	case BPF_CGROUP_SOCK_OPS:
-		ptype = BPF_PROG_TYPE_SOCK_OPS;
-		break;
-	case BPF_CGROUP_DEVICE:
-		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
-		break;
-	case BPF_SK_MSG_VERDICT:
-		return sock_map_get_from_fd(attr, NULL);
-	case BPF_SK_SKB_STREAM_PARSER:
-	case BPF_SK_SKB_STREAM_VERDICT:
+	ptype = attach_type_to_prog_type(attr->attach_type);
+
+	switch (ptype) {
+	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_SK_SKB:
 		return sock_map_get_from_fd(attr, NULL);
-	case BPF_LIRC_MODE2:
+	case BPF_PROG_TYPE_LIRC_MODE2:
 		return lirc_prog_detach(attr);
-	case BPF_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		return skb_flow_dissector_bpf_prog_detach(attr);
-	case BPF_CGROUP_SYSCTL:
-		ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
-		break;
-	case BPF_CGROUP_GETSOCKOPT:
-	case BPF_CGROUP_SETSOCKOPT:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
-		break;
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_SOCK_OPS:
+		return cgroup_bpf_prog_detach(attr, ptype);
 	default:
 		return -EINVAL;
 	}
-
-	return cgroup_bpf_prog_detach(attr, ptype);
 }
 
 #define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
@@ -2726,7 +2707,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_SYSCTL:
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
-		break;
+		return cgroup_bpf_prog_query(attr, uattr);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
@@ -2734,8 +2715,6 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	default:
 		return -EINVAL;
 	}
-
-	return cgroup_bpf_prog_query(attr, uattr);
 }
 
 #define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
@ 2020-03-20 20:36 ` Andrii Nakryiko
  2020-03-20 21:33   ` Stanislav Fomichev
  2020-03-20 23:46   ` [Potential Spoof] " Andrey Ignatov
  2020-03-20 20:36 ` [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Implement new sub-command to attach cgroup BPF programs and return FD-based
bpf_link back on success. bpf_link, once attached to cgroup, cannot be
replaced, except by owner having its FD. cgroup bpf_link has semantics of
BPF_F_ALLOW_MULTI BPF program attachments and can co-exist with
non-bpf_link-based BPF cgroup attachments.

To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
BPF program can be executed, implement auto-detachment of link. When
cgroup_bpf_release() is called, all attached bpf_links are forced to release
cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
well as still owning underlying bpf_prog. This is because user-space might
still have FDs open and active, so bpf_link as a user-referenced object can't
be freed yet. Once last active FD is closed, bpf_link will be freed and
underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
touched, because cgroup is released already.

The inherent race between bpf_cgroup_link release (from closing last FD) and
cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
the only additional check required is when bpf_cgroup_link attempts to detach
itself from cgroup. At that time we need to check whether there is still
cgroup associated with that link. And if not, exit with success, because
bpf_cgroup_link was already successfully detached.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h     |  27 ++-
 include/linux/bpf.h            |  10 +-
 include/uapi/linux/bpf.h       |   9 +-
 kernel/bpf/cgroup.c            | 313 +++++++++++++++++++++++++--------
 kernel/bpf/syscall.c           |  62 +++++--
 kernel/cgroup/cgroup.c         |  14 +-
 tools/include/uapi/linux/bpf.h |   9 +-
 7 files changed, 345 insertions(+), 99 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index a7cd5c7a2509..ab95824a1d99 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -51,9 +51,16 @@ struct bpf_cgroup_storage {
 	struct rcu_head rcu;
 };
 
+struct bpf_cgroup_link {
+	struct bpf_link link;
+	struct cgroup *cgroup;
+	enum bpf_attach_type type;
+};
+
 struct bpf_prog_list {
 	struct list_head node;
 	struct bpf_prog *prog;
+	struct bpf_cgroup_link *link;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
 };
 
@@ -84,20 +91,23 @@ struct cgroup_bpf {
 int cgroup_bpf_inherit(struct cgroup *cgrp);
 void cgroup_bpf_offline(struct cgroup *cgrp);
 
-int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-			struct bpf_prog *replace_prog,
+int __cgroup_bpf_attach(struct cgroup *cgrp,
+			struct bpf_prog *prog, struct bpf_prog *replace_prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type, u32 flags);
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);
 
 /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */
-int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
+int cgroup_bpf_attach(struct cgroup *cgrp,
+		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
+		      struct bpf_cgroup_link *link, enum bpf_attach_type type,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, u32 flags);
+		      enum bpf_attach_type type);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
 
@@ -332,6 +342,7 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype, struct bpf_prog *prog);
 int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype);
+int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr);
 #else
@@ -354,6 +365,12 @@ static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 	return -EINVAL;
 }
 
+static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
+					 struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
 static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
 					union bpf_attr __user *uattr)
 {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb981c204fa..0f7c2f48c734 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1081,15 +1081,23 @@ extern int sysctl_unprivileged_bpf_disabled;
 int bpf_map_new_fd(struct bpf_map *map, int flags);
 int bpf_prog_new_fd(struct bpf_prog *prog);
 
-struct bpf_link;
+struct bpf_link {
+	atomic64_t refcnt;
+	const struct bpf_link_ops *ops;
+	struct bpf_prog *prog;
+	struct work_struct work;
+};
 
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
+
 };
 
 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
 		   struct bpf_prog *prog);
+void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
+		      int link_fd);
 void bpf_link_inc(struct bpf_link *link);
 void bpf_link_put(struct bpf_link *link);
 int bpf_link_new_fd(struct bpf_link *link);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5d01c5c7e598..fad9f79bb8f1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -111,6 +111,7 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
+	BPF_LINK_CREATE,
 };
 
 enum bpf_map_type {
@@ -539,7 +540,7 @@ union bpf_attr {
 		__u32		prog_cnt;
 	} query;
 
-	struct {
+	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
 		__u64 name;
 		__u32 prog_fd;
 	} raw_tracepoint;
@@ -567,6 +568,12 @@ union bpf_attr {
 		__u64		probe_offset;	/* output: probe_offset */
 		__u64		probe_addr;	/* output: probe_addr */
 	} task_fd_query;
+
+	struct { /* struct used by BPF_LINK_CREATE command */
+		__u32		prog_fd;	/* eBPF program to attach */
+		__u32		target_fd;	/* object to attach to */
+		__u32		attach_type;	/* attach type */
+	} link_create;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 9c8472823a7f..b960e8633f23 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -80,6 +80,17 @@ static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
 		bpf_cgroup_storage_unlink(storages[stype]);
 }
 
+/* Called when bpf_cgroup_link is auto-detached from dying cgroup.
+ * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
+ * doesn't free link memory, which will eventually be done by bpf_link's
+ * release() callback, when its last FD is closed.
+ */
+static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
+{
+	cgroup_put(link->cgroup);
+	link->cgroup = NULL;
+}
+
 /**
  * cgroup_bpf_release() - put references of all bpf programs and
  *                        release all cgroup bpf data
@@ -100,7 +111,10 @@ static void cgroup_bpf_release(struct work_struct *work)
 
 		list_for_each_entry_safe(pl, tmp, progs, node) {
 			list_del(&pl->node);
-			bpf_prog_put(pl->prog);
+			if (pl->prog)
+				bpf_prog_put(pl->prog);
+			if (pl->link)
+				bpf_cgroup_link_auto_detach(pl->link);
 			bpf_cgroup_storages_unlink(pl->storage);
 			bpf_cgroup_storages_free(pl->storage);
 			kfree(pl);
@@ -134,6 +148,18 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref)
 	queue_work(system_wq, &cgrp->bpf.release_work);
 }
 
+/* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through
+ * link or direct prog.
+ */
+static struct bpf_prog *prog_list_prog(struct bpf_prog_list *pl)
+{
+	if (pl->prog)
+		return pl->prog;
+	if (pl->link)
+		return pl->link->link.prog;
+	return NULL;
+}
+
 /* count number of elements in the list.
  * it's slow but the list cannot be long
  */
@@ -143,7 +169,7 @@ static u32 prog_list_length(struct list_head *head)
 	u32 cnt = 0;
 
 	list_for_each_entry(pl, head, node) {
-		if (!pl->prog)
+		if (!prog_list_prog(pl))
 			continue;
 		cnt++;
 	}
@@ -212,11 +238,11 @@ static int compute_effective_progs(struct cgroup *cgrp,
 			continue;
 
 		list_for_each_entry(pl, &p->bpf.progs[type], node) {
-			if (!pl->prog)
+			if (!prog_list_prog(pl))
 				continue;
 
 			item = &progs->items[cnt];
-			item->prog = pl->prog;
+			item->prog = prog_list_prog(pl);
 			bpf_cgroup_storages_assign(item->cgroup_storage,
 						   pl->storage);
 			cnt++;
@@ -333,19 +359,62 @@ static int update_effective_progs(struct cgroup *cgrp,
 
 #define BPF_CGROUP_MAX_PROGS 64
 
+static struct bpf_prog_list *find_attach_entry(struct list_head *progs,
+					       struct bpf_prog *prog,
+					       struct bpf_cgroup_link *link,
+					       struct bpf_prog *replace_prog,
+					       bool allow_multi)
+{
+	struct bpf_prog_list *pl;
+
+	/* legacy single-attach case */
+	if (!allow_multi) {
+		if (list_empty(progs))
+			return NULL;
+		return list_first_entry(progs, typeof(*pl), node);
+	}
+
+	/* direct prog multi-attach case */
+	if (prog) {
+		list_for_each_entry(pl, progs, node) {
+			if (pl->prog == prog)
+				/* disallow attaching the same prog twice */
+				return ERR_PTR(-EINVAL);
+			if (replace_prog && pl->prog == replace_prog)
+				/* a match found */
+				return pl;
+		}
+		if (replace_prog)
+			/* prog to replace not found for cgroup */
+			return ERR_PTR(-ENOENT);
+		return NULL;
+	}
+
+	/* link (multi-attach) case */
+	list_for_each_entry(pl, progs, node) {
+		if (pl->link == link)
+			/* disallow attaching the same link twice */
+			return ERR_PTR(-EINVAL);
+	}
+	return NULL;
+}
+
 /**
- * __cgroup_bpf_attach() - Attach the program to a cgroup, and
+ * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
  *                         propagate the change to descendants
  * @cgrp: The cgroup which descendants to traverse
  * @prog: A program to attach
+ * @link: A link to attach
  * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
  * @type: Type of attach operation
  * @flags: Option flags
  *
+ * Exactly one of @prog or @link can be non-null.
  * Must be called with cgroup_mutex held.
  */
-int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-			struct bpf_prog *replace_prog,
+int __cgroup_bpf_attach(struct cgroup *cgrp,
+			struct bpf_prog *prog, struct bpf_prog *replace_prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type, u32 flags)
 {
 	u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
@@ -353,13 +422,19 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
 		*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
-	struct bpf_prog_list *pl, *replace_pl = NULL;
+	struct bpf_prog_list *pl;
 	int err;
 
 	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
 	    ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
 		/* invalid combination */
 		return -EINVAL;
+	if (link && (prog || replace_prog))
+		/* only either link or prog/replace_prog can be specified */
+		return -EINVAL;
+	if (!!replace_prog != !!(flags & BPF_F_REPLACE))
+		/* replace_prog implies BPF_F_REPLACE, and vice versa */
+		return -EINVAL;
 
 	if (!hierarchy_allows_attach(cgrp, type))
 		return -EPERM;
@@ -374,26 +449,16 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
 		return -E2BIG;
 
-	if (flags & BPF_F_ALLOW_MULTI) {
-		list_for_each_entry(pl, progs, node) {
-			if (pl->prog == prog)
-				/* disallow attaching the same prog twice */
-				return -EINVAL;
-			if (pl->prog == replace_prog)
-				replace_pl = pl;
-		}
-		if ((flags & BPF_F_REPLACE) && !replace_pl)
-			/* prog to replace not found for cgroup */
-			return -ENOENT;
-	} else if (!list_empty(progs)) {
-		replace_pl = list_first_entry(progs, typeof(*pl), node);
-	}
+	pl = find_attach_entry(progs, prog, link, replace_prog,
+			       flags & BPF_F_ALLOW_MULTI);
+	if (IS_ERR(pl))
+		return PTR_ERR(pl);
 
-	if (bpf_cgroup_storages_alloc(storage, prog))
+	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
 		return -ENOMEM;
 
-	if (replace_pl) {
-		pl = replace_pl;
+	if (pl) {
+		/* only non-link case is possible */
 		old_prog = pl->prog;
 		bpf_cgroup_storages_unlink(pl->storage);
 		bpf_cgroup_storages_assign(old_storage, pl->storage);
@@ -407,6 +472,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	}
 
 	pl->prog = prog;
+	pl->link = link;
 	bpf_cgroup_storages_assign(pl->storage, storage);
 	cgrp->bpf.flags[type] = saved_flags;
 
@@ -414,80 +480,91 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (err)
 		goto cleanup;
 
-	static_branch_inc(&cgroup_bpf_enabled_key);
 	bpf_cgroup_storages_free(old_storage);
-	if (old_prog) {
+	if (old_prog)
 		bpf_prog_put(old_prog);
-		static_branch_dec(&cgroup_bpf_enabled_key);
-	}
-	bpf_cgroup_storages_link(storage, cgrp, type);
+	else
+		static_branch_inc(&cgroup_bpf_enabled_key);
+	bpf_cgroup_storages_link(pl->storage, cgrp, type);
 	return 0;
 
 cleanup:
-	/* and cleanup the prog list */
-	pl->prog = old_prog;
+	if (old_prog)
+		pl->prog = old_prog;
 	bpf_cgroup_storages_free(pl->storage);
 	bpf_cgroup_storages_assign(pl->storage, old_storage);
 	bpf_cgroup_storages_link(pl->storage, cgrp, type);
-	if (!replace_pl) {
+	if (!old_prog) {
 		list_del(&pl->node);
 		kfree(pl);
 	}
 	return err;
 }
 
+static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
+					       struct bpf_prog *prog,
+					       struct bpf_cgroup_link *link,
+					       bool allow_multi)
+{
+	struct bpf_prog_list *pl;
+
+	if (!allow_multi) {
+		if (list_empty(progs))
+			/* report error when trying to detach and nothing is attached */
+			return ERR_PTR(-ENOENT);
+
+		/* to maintain backward compatibility NONE and OVERRIDE cgroups
+		 * allow detaching with invalid FD (prog==NULL)
+		 */
+		return list_first_entry(progs, typeof(*pl), node);
+	}
+
+	if (!prog && !link)
+		/* to detach MULTI prog the user has to specify valid FD
+		 * of the program to be detached
+		 */
+		return ERR_PTR(-EINVAL);
+
+	/* find the prog and detach it */
+	list_for_each_entry(pl, progs, node) {
+		if (pl->prog == prog && pl->link == link)
+			return pl;
+	}
+	return ERR_PTR(-ENOENT);
+}
+
 /**
- * __cgroup_bpf_detach() - Detach the program from a cgroup, and
+ * __cgroup_bpf_detach() - Detach the program or link from a cgroup, and
  *                         propagate the change to descendants
  * @cgrp: The cgroup which descendants to traverse
  * @prog: A program to detach or NULL
+ * @prog: A link to detach or NULL
  * @type: Type of detach operation
  *
+ * At most one of @prog or @link can be non-NULL.
  * Must be called with cgroup_mutex held.
  */
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-			enum bpf_attach_type type)
+			struct bpf_cgroup_link *link, enum bpf_attach_type type)
 {
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	u32 flags = cgrp->bpf.flags[type];
-	struct bpf_prog *old_prog = NULL;
 	struct bpf_prog_list *pl;
+	struct bpf_prog *old_prog;
 	int err;
 
-	if (flags & BPF_F_ALLOW_MULTI) {
-		if (!prog)
-			/* to detach MULTI prog the user has to specify valid FD
-			 * of the program to be detached
-			 */
-			return -EINVAL;
-	} else {
-		if (list_empty(progs))
-			/* report error when trying to detach and nothing is attached */
-			return -ENOENT;
-	}
+	if (prog && link)
+		/* only one of prog or link can be specified */
+		return -EINVAL;
 
-	if (flags & BPF_F_ALLOW_MULTI) {
-		/* find the prog and detach it */
-		list_for_each_entry(pl, progs, node) {
-			if (pl->prog != prog)
-				continue;
-			old_prog = prog;
-			/* mark it deleted, so it's ignored while
-			 * recomputing effective
-			 */
-			pl->prog = NULL;
-			break;
-		}
-		if (!old_prog)
-			return -ENOENT;
-	} else {
-		/* to maintain backward compatibility NONE and OVERRIDE cgroups
-		 * allow detaching with invalid FD (prog==NULL)
-		 */
-		pl = list_first_entry(progs, typeof(*pl), node);
-		old_prog = pl->prog;
-		pl->prog = NULL;
-	}
+	pl = find_detach_entry(progs, prog, link, flags & BPF_F_ALLOW_MULTI);
+	if (IS_ERR(pl))
+		return PTR_ERR(pl);
+
+	/* mark it deleted, so it's ignored while recomputing effective */
+	old_prog = pl->prog;
+	pl->prog = NULL;
+	pl->link = NULL;
 
 	err = update_effective_progs(cgrp, type);
 	if (err)
@@ -501,14 +578,15 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (list_empty(progs))
 		/* last program was detached, reset flags to zero */
 		cgrp->bpf.flags[type] = 0;
-
-	bpf_prog_put(old_prog);
+	if (old_prog)
+		bpf_prog_put(old_prog);
 	static_branch_dec(&cgroup_bpf_enabled_key);
 	return 0;
 
 cleanup:
-	/* and restore back old_prog */
+	/* restore back prog or link */
 	pl->prog = old_prog;
+	pl->link = link;
 	return err;
 }
 
@@ -521,6 +599,7 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	u32 flags = cgrp->bpf.flags[type];
 	struct bpf_prog_array *effective;
+	struct bpf_prog *prog;
 	int cnt, ret = 0, i;
 
 	effective = rcu_dereference_protected(cgrp->bpf.effective[type],
@@ -551,7 +630,8 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 
 		i = 0;
 		list_for_each_entry(pl, progs, node) {
-			id = pl->prog->aux->id;
+			prog = prog_list_prog(pl);
+			id = prog->aux->id;
 			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
 				return -EFAULT;
 			if (++i == cnt)
@@ -581,8 +661,8 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 		}
 	}
 
-	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type,
-				attr->attach_flags);
+	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, NULL,
+				attr->attach_type, attr->attach_flags);
 
 	if (replace_prog)
 		bpf_prog_put(replace_prog);
@@ -604,7 +684,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	if (IS_ERR(prog))
 		prog = NULL;
 
-	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0);
+	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type);
 	if (prog)
 		bpf_prog_put(prog);
 
@@ -612,6 +692,87 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	return ret;
 }
 
+static void bpf_cgroup_link_release(struct bpf_link *link)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+
+	/* link might have been auto-detached by dying cgroup already,
+	 * in that case our work is done here
+	 */
+	if (!cg_link->cgroup)
+		return;
+
+	mutex_lock(&cgroup_mutex);
+
+	/* re-check cgroup under lock again */
+	if (!cg_link->cgroup) {
+		mutex_unlock(&cgroup_mutex);
+		return;
+	}
+
+	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
+				    cg_link->type));
+
+	mutex_unlock(&cgroup_mutex);
+	cgroup_put(cg_link->cgroup);
+}
+
+static void bpf_cgroup_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+
+	kfree(cg_link);
+}
+
+const struct bpf_link_ops bpf_cgroup_link_lops = {
+	.release = bpf_cgroup_link_release,
+	.dealloc = bpf_cgroup_link_dealloc,
+};
+
+int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct bpf_cgroup_link *link;
+	struct file *link_file;
+	struct cgroup *cgrp;
+	int err, link_fd;
+
+	cgrp = cgroup_get_from_fd(attr->link_create.target_fd);
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link) {
+		err = -ENOMEM;
+		goto out_put_cgroup;
+	}
+	bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog);
+	link->cgroup = cgrp;
+	link->type = attr->link_create.attach_type;
+
+	link_file = bpf_link_new_file(&link->link, &link_fd);
+	if (IS_ERR(link_file)) {
+		kfree(link);
+		err = PTR_ERR(link_file);
+		goto out_put_cgroup;
+	}
+
+	err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type,
+				BPF_F_ALLOW_MULTI);
+	if (err) {
+		bpf_link_cleanup(&link->link, link_file, link_fd);
+		goto out_put_cgroup;
+	}
+
+	fd_install(link_fd, link_file);
+	return link_fd;
+
+out_put_cgroup:
+	cgroup_put(cgrp);
+	return err;
+}
+
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fd4181939064..f6e7d32a2632 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2173,13 +2173,6 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
-struct bpf_link {
-	atomic64_t refcnt;
-	const struct bpf_link_ops *ops;
-	struct bpf_prog *prog;
-	struct work_struct work;
-};
-
 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
 		   struct bpf_prog *prog)
 {
@@ -2193,8 +2186,8 @@ void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
  * anon_inode's release() call. This helper manages marking bpf_link as
  * defunct, releases anon_inode file and puts reserved FD.
  */
-static void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
-			     int link_fd)
+void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
+		      int link_fd)
 {
 	link->prog = NULL;
 	fput(link_file);
@@ -2252,7 +2245,8 @@ static int bpf_link_release(struct inode *inode, struct file *filp)
 #ifdef CONFIG_PROC_FS
 static const struct bpf_link_ops bpf_raw_tp_lops;
 static const struct bpf_link_ops bpf_tracing_link_lops;
-static const struct bpf_link_ops bpf_xdp_link_lops;
+
+extern const struct bpf_link_ops bpf_cgroup_link_lops;
 
 static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 {
@@ -2265,6 +2259,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 		link_type = "raw_tracepoint";
 	else if (link->ops == &bpf_tracing_link_lops)
 		link_type = "tracing";
+	else if (link->ops == &bpf_cgroup_link_lops)
+		link_type = "cgroup";
 	else
 		link_type = "unknown";
 
@@ -3533,6 +3529,49 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 	return err;
 }
 
+#define BPF_LINK_CREATE_LAST_FIELD link_create.attach_type
+static int link_create(union bpf_attr *attr)
+{
+	enum bpf_prog_type ptype;
+	struct bpf_prog *prog;
+	int ret;
+
+	if (CHECK_ATTR(BPF_LINK_CREATE))
+		return -EINVAL;
+
+	ptype = attach_type_to_prog_type(attr->link_create.attach_type);
+	if (ptype == BPF_PROG_TYPE_UNSPEC)
+		return -EINVAL;
+
+	prog = bpf_prog_get_type(attr->link_create.prog_fd, ptype);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	ret = bpf_prog_attach_check_attach_type(prog,
+						attr->link_create.attach_type);
+	if (ret)
+		goto err_out;
+
+	switch (ptype) {
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+		ret = cgroup_bpf_link_attach(attr, prog);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+err_out:
+	if (ret < 0)
+		bpf_prog_put(prog);
+	return ret;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -3643,6 +3682,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_DELETE_BATCH:
 		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH);
 		break;
+	case BPF_LINK_CREATE:
+		err = link_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 3dead0416b91..219624fba9ba 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6303,27 +6303,31 @@ void cgroup_sk_free(struct sock_cgroup_data *skcd)
 #endif	/* CONFIG_SOCK_CGROUP_DATA */
 
 #ifdef CONFIG_CGROUP_BPF
-int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
+int cgroup_bpf_attach(struct cgroup *cgrp,
+		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
+		      struct bpf_cgroup_link *link,
+		      enum bpf_attach_type type,
 		      u32 flags)
 {
 	int ret;
 
 	mutex_lock(&cgroup_mutex);
-	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, type, flags);
+	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
+
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, u32 flags)
+		      enum bpf_attach_type type)
 {
 	int ret;
 
 	mutex_lock(&cgroup_mutex);
-	ret = __cgroup_bpf_detach(cgrp, prog, type);
+	ret = __cgroup_bpf_detach(cgrp, prog, NULL, type);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
+
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr)
 {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5d01c5c7e598..fad9f79bb8f1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -111,6 +111,7 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
+	BPF_LINK_CREATE,
 };
 
 enum bpf_map_type {
@@ -539,7 +540,7 @@ union bpf_attr {
 		__u32		prog_cnt;
 	} query;
 
-	struct {
+	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
 		__u64 name;
 		__u32 prog_fd;
 	} raw_tracepoint;
@@ -567,6 +568,12 @@ union bpf_attr {
 		__u64		probe_offset;	/* output: probe_offset */
 		__u64		probe_addr;	/* output: probe_addr */
 	} task_fd_query;
+
+	struct { /* struct used by BPF_LINK_CREATE command */
+		__u32		prog_fd;	/* eBPF program to attach */
+		__u32		target_fd;	/* object to attach to */
+		__u32		attach_type;	/* attach type */
+	} link_create;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-03-20 20:36 ` [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
@ 2020-03-20 20:36 ` Andrii Nakryiko
  2020-03-21  0:33   ` kbuild test robot
                     ` (2 more replies)
  2020-03-20 20:36 ` [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
  2020-03-20 20:36 ` [PATCH bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko
  5 siblings, 3 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
under given bpf_link. Currently this is only supported for bpf_cgroup_link,
but will be extended to other kinds of bpf_links in follow-up patches.

For bpf_cgroup_link, implemented functionality matches existing semantics for
direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
unconditionally set new bpf_prog regardless of which bpf_prog is currently
active under given bpf_link, or, optionally, can specify expected active
bpf_prog. If active bpf_prog doesn't match expected one, operation is a noop
and returns a failure.

cgroup_bpf_replace() operation is resolving race between auto-detachment and
bpf_prog update in the same fashion as it's done for bpf_link detachment,
except in this case update has no way of succeeding because of target cgroup
marked as dying. So in this case error is returned.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h |  4 ++
 include/uapi/linux/bpf.h   | 12 ++++++
 kernel/bpf/cgroup.c        | 77 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c       | 60 +++++++++++++++++++++++++++++
 kernel/cgroup/cgroup.c     | 21 +++++++++++
 5 files changed, 174 insertions(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index ab95824a1d99..5735d8bfd69e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -98,6 +98,8 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
+int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+			 struct bpf_prog *new_prog);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);
 
@@ -108,6 +110,8 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type);
+int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+		       struct bpf_prog *old_prog, struct bpf_prog *new_prog);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fad9f79bb8f1..fa944093f9fc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -112,6 +112,7 @@ enum bpf_cmd {
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
+	BPF_LINK_UPDATE,
 };
 
 enum bpf_map_type {
@@ -574,6 +575,17 @@ union bpf_attr {
 		__u32		target_fd;	/* object to attach to */
 		__u32		attach_type;	/* attach type */
 	} link_create;
+
+	struct { /* struct used by BPF_LINK_UPDATE command */
+		__u32		link_fd;	/* link fd */
+		/* new program fd to update link with */
+		__u32		new_prog_fd;
+		__u32		flags;		/* extra flags */
+		/* expected link's program fd; is specified only if
+		 * BPF_F_REPLACE flag is set in flags */
+		__u32		old_prog_fd;
+	} link_update;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index b960e8633f23..b9f4971336f3 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -501,6 +501,83 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	return err;
 }
 
+/* Swap updated BPF program for given link in effective program arrays across
+ * all descendant cgroups. This function is guaranteed to succeed.
+ */
+static void replace_effective_prog(struct cgroup *cgrp,
+				   enum bpf_attach_type type,
+				   struct bpf_cgroup_link *link)
+{
+	struct bpf_prog_array_item *item;
+	struct cgroup_subsys_state *css;
+	struct bpf_prog_array *progs;
+	struct bpf_prog_list *pl;
+	struct list_head *head;
+	struct cgroup *cg;
+	int pos;
+
+	css_for_each_descendant_pre(css, &cgrp->self) {
+		struct cgroup *desc = container_of(css, struct cgroup, self);
+
+		if (percpu_ref_is_zero(&desc->bpf.refcnt))
+			continue;
+
+		/* found position of link in effective progs array */
+		for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
+			if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
+				continue;
+
+			head = &cg->bpf.progs[type];
+			list_for_each_entry(pl, head, node) {
+				if (!prog_list_prog(pl))
+					continue;
+				if (pl->link == link)
+					goto found;
+				pos++;
+			}
+		}
+found:
+		BUG_ON(!cg);
+		progs = rcu_dereference_protected(
+				desc->bpf.effective[type],
+				lockdep_is_held(&cgroup_mutex));
+		item = &progs->items[pos];
+		WRITE_ONCE(item->prog, link->link.prog);
+	}
+}
+
+/**
+ * __cgroup_bpf_replace() - Replace link's program and propagate the change
+ *                          to descendants
+ * @cgrp: The cgroup which descendants to traverse
+ * @link: A link for which to replace BPF program
+ * @type: Type of attach operation
+ *
+ * Must be called with cgroup_mutex held.
+ */
+int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+			 struct bpf_prog *new_prog)
+{
+	struct list_head *progs = &cgrp->bpf.progs[link->type];
+	struct bpf_prog *old_prog;
+	struct bpf_prog_list *pl;
+	bool found = false;
+
+	list_for_each_entry(pl, progs, node) {
+		if (pl->link == link) {
+			found = true;
+			break;
+		}
+	}
+	if (!found)
+		return -ENOENT;
+
+	old_prog = xchg(&link->link.prog, new_prog);
+	replace_effective_prog(cgrp, link->type, link);
+	bpf_prog_put(old_prog);
+	return 0;
+}
+
 static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
 					       struct bpf_prog *prog,
 					       struct bpf_cgroup_link *link,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f6e7d32a2632..1ff7aaa2c727 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3572,6 +3572,63 @@ static int link_create(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
+
+static int link_update(union bpf_attr *attr)
+{
+	struct bpf_prog *old_prog = NULL, *new_prog;
+	enum bpf_prog_type ptype;
+	struct bpf_link *link;
+	u32 flags;
+	int ret;
+
+	if (CHECK_ATTR(BPF_LINK_UPDATE))
+		return -EINVAL;
+
+	flags = attr->link_update.flags;
+	if (flags & ~BPF_F_REPLACE)
+		return -EINVAL;
+
+	link = bpf_link_get_from_fd(attr->link_update.link_fd);
+	if (IS_ERR(link))
+		return PTR_ERR(link);
+
+	new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
+	if (IS_ERR(new_prog))
+		return PTR_ERR(new_prog);
+
+	if (flags & BPF_F_REPLACE) {
+		old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
+		if (IS_ERR(old_prog)) {
+			ret = PTR_ERR(old_prog);
+			old_prog = NULL;
+			goto out_put_progs;
+		}
+	}
+
+	if (link->ops == &bpf_cgroup_link_lops) {
+		struct bpf_cgroup_link *cg_link;
+
+		cg_link = container_of(link, struct bpf_cgroup_link, link);
+		ptype = attach_type_to_prog_type(cg_link->type);
+		if (ptype != new_prog->type) {
+			ret = -EINVAL;
+			goto out_put_progs;
+		}
+		ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
+					 old_prog, new_prog);
+	} else {
+		ret = -EINVAL;
+	}
+
+out_put_progs:
+	if (old_prog)
+		bpf_prog_put(old_prog);
+	if (ret)
+		bpf_prog_put(new_prog);
+	return ret;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -3685,6 +3742,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_LINK_CREATE:
 		err = link_create(&attr);
 		break;
+	case BPF_LINK_UPDATE:
+		err = link_update(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 219624fba9ba..d4787fccf183 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6317,6 +6317,27 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 	return ret;
 }
 
+int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+		       struct bpf_prog *old_prog, struct bpf_prog *new_prog)
+{
+	int ret;
+
+	mutex_lock(&cgroup_mutex);
+	/* link might have been auto-released by dying cgroup, so fail */
+	if (!link->cgroup) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	if (old_prog && link->link.prog != old_prog) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+	ret = __cgroup_bpf_replace(cgrp, link, new_prog);
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+	return ret;
+}
+
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment
  2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2020-03-20 20:36 ` [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
@ 2020-03-20 20:36 ` Andrii Nakryiko
  2020-03-23 11:02   ` Toke Høiland-Jørgensen
  2020-03-20 20:36 ` [PATCH bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko
  5 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.

If expected_attach_type is not specified explicitly with
bpf_program__set_expected_attach_type(), libbpf will try to determine proper
attach type from BPF program's section definition.

Also add support for bpf_link's underlying BPF program replacement:
  - unconditional through high-level bpf_link__update_program() API;
  - cmpxchg-like with specifying expected current BPF program through
    low-level bpf_link_update() API.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/include/uapi/linux/bpf.h | 12 +++++++++
 tools/lib/bpf/bpf.c            | 34 +++++++++++++++++++++++++
 tools/lib/bpf/bpf.h            | 19 ++++++++++++++
 tools/lib/bpf/libbpf.c         | 46 ++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h         |  8 +++++-
 tools/lib/bpf/libbpf.map       |  4 +++
 6 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index fad9f79bb8f1..fa944093f9fc 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -112,6 +112,7 @@ enum bpf_cmd {
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
+	BPF_LINK_UPDATE,
 };
 
 enum bpf_map_type {
@@ -574,6 +575,17 @@ union bpf_attr {
 		__u32		target_fd;	/* object to attach to */
 		__u32		attach_type;	/* attach type */
 	} link_create;
+
+	struct { /* struct used by BPF_LINK_UPDATE command */
+		__u32		link_fd;	/* link fd */
+		/* new program fd to update link with */
+		__u32		new_prog_fd;
+		__u32		flags;		/* extra flags */
+		/* expected link's program fd; is specified only if
+		 * BPF_F_REPLACE flag is set in flags */
+		__u32		old_prog_fd;
+	} link_update;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index c6dafe563176..35c34fc81bd0 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -584,6 +584,40 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
 	return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
 }
 
+int bpf_link_create(int prog_fd, int target_fd,
+		    enum bpf_attach_type attach_type,
+		    const struct bpf_link_create_opts *opts)
+{
+	union bpf_attr attr;
+
+	if (!OPTS_VALID(opts, bpf_link_create_opts))
+		return -EINVAL;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.link_create.prog_fd = prog_fd;
+	attr.link_create.target_fd = target_fd;
+	attr.link_create.attach_type = attach_type;
+
+	return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
+}
+
+int bpf_link_update(int link_fd, int new_prog_fd,
+		    const struct bpf_link_update_opts *opts)
+{
+	union bpf_attr attr;
+
+	if (!OPTS_VALID(opts, bpf_link_update_opts))
+		return -EINVAL;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.link_update.link_fd = link_fd;
+	attr.link_update.new_prog_fd = new_prog_fd;
+	attr.link_update.flags = OPTS_GET(opts, flags, 0);
+	attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
+
+	return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
+}
+
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
 {
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index b976e77316cc..46d47afdd887 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -168,6 +168,25 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
 				enum bpf_attach_type type);
 
+struct bpf_link_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+};
+#define bpf_link_create_opts__last_field sz
+
+LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
+			       enum bpf_attach_type attach_type,
+			       const struct bpf_link_create_opts *opts);
+
+struct bpf_link_update_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;	   /* extra flags */
+	__u32 old_prog_fd; /* expected old program FD */
+};
+#define bpf_link_update_opts__last_field old_prog_fd
+
+LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
+			       const struct bpf_link_update_opts *opts);
+
 struct bpf_prog_test_run_attr {
 	int prog_fd;
 	int repeat;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 085e41f9b68e..8b23c70033d3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6951,6 +6951,12 @@ struct bpf_link {
 	bool disconnected;
 };
 
+/* Replace link's underlying BPF program with the new one */
+int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
+{
+	return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
+}
+
 /* Release "ownership" of underlying BPF resource (typically, BPF program
  * attached to some BPF hook, e.g., tracepoint, kprobe, etc). Disconnected
  * link, when destructed through bpf_link__destroy() call won't attempt to
@@ -7489,6 +7495,46 @@ static struct bpf_link *attach_trace(const struct bpf_sec_def *sec,
 	return bpf_program__attach_trace(prog);
 }
 
+struct bpf_link *bpf_program__attach_cgroup(struct bpf_program *prog,
+					    int cgroup_fd)
+{
+	const struct bpf_sec_def *sec_def;
+	enum bpf_attach_type attach_type;
+	char errmsg[STRERR_BUFSIZE];
+	struct bpf_link *link;
+	int prog_fd, link_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("program '%s': can't attach before loaded\n",
+			bpf_program__title(prog, false));
+		return ERR_PTR(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	if (!attach_type) {
+		sec_def = find_sec_def(bpf_program__title(prog, false));
+		if (sec_def)
+			attach_type = sec_def->attach_type;
+	}
+	link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, NULL);
+	if (link_fd < 0) {
+		link_fd = -errno;
+		free(link);
+		pr_warn("program '%s': failed to attach to cgroup: %s\n",
+			bpf_program__title(prog, false),
+			libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+		return ERR_PTR(link_fd);
+	}
+	link->fd = link_fd;
+	return link;
+}
+
 struct bpf_link *bpf_program__attach(struct bpf_program *prog)
 {
 	const struct bpf_sec_def *sec_def;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d38d7a629417..af9ececb8ae9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -224,6 +224,8 @@ LIBBPF_API int bpf_link__fd(const struct bpf_link *link);
 LIBBPF_API const char *bpf_link__pin_path(const struct bpf_link *link);
 LIBBPF_API int bpf_link__pin(struct bpf_link *link, const char *path);
 LIBBPF_API int bpf_link__unpin(struct bpf_link *link);
+LIBBPF_API int bpf_link__update_program(struct bpf_link *link,
+					struct bpf_program *prog);
 LIBBPF_API void bpf_link__disconnect(struct bpf_link *link);
 LIBBPF_API int bpf_link__destroy(struct bpf_link *link);
 
@@ -245,11 +247,15 @@ bpf_program__attach_tracepoint(struct bpf_program *prog,
 LIBBPF_API struct bpf_link *
 bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
 				   const char *tp_name);
-
 LIBBPF_API struct bpf_link *
 bpf_program__attach_trace(struct bpf_program *prog);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd);
+
 struct bpf_map;
+
 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map);
+
 struct bpf_insn;
 
 /*
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 5129283c0284..793c5af07b23 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -243,5 +243,9 @@ LIBBPF_0.0.8 {
 		bpf_link__pin;
 		bpf_link__pin_path;
 		bpf_link__unpin;
+		bpf_link__update_program;
+		bpf_link_create;
+		bpf_link_update;
+		bpf_program__attach_cgroup;
 		bpf_program__set_attach_target;
 } LIBBPF_0.0.7;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 6/6] selftests/bpf: test FD-based cgroup attachment
  2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2020-03-20 20:36 ` [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
@ 2020-03-20 20:36 ` Andrii Nakryiko
  5 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 20:36 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add selftests to exercise FD-based cgroup BPF program attachments and their
intermixing with legacy stateful cgroup BPF attachments. Auto-detachment and
program replacement (both unconditional and cmpxchng-like) are tested as well.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/prog_tests/cgroup_link.c    | 242 ++++++++++++++++++
 .../selftests/bpf/progs/test_cgroup_link.c    |  24 ++
 2 files changed, 266 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_cgroup_link.c

diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_link.c b/tools/testing/selftests/bpf/prog_tests/cgroup_link.c
new file mode 100644
index 000000000000..9596aa5f2c07
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_link.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "cgroup_helpers.h"
+#include "test_cgroup_link.skel.h"
+
+static __u32 duration = 0;
+#define PING_CMD	"ping -q -c1 -w1 127.0.0.1 > /dev/null"
+
+void test_cgroup_link(void)
+{
+	struct test_cgroup_link *skel;
+	struct {
+		const char *path;
+		int fd;
+	} cgs[] = {
+		{ "/cg1" },
+		{ "/cg1/cg2" },
+		{ "/cg1/cg2/cg3" },
+	};
+	int last_cg = ARRAY_SIZE(cgs) - 1, cg_nr = ARRAY_SIZE(cgs);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, link_upd_opts);
+	struct bpf_link *links[ARRAY_SIZE(cgs)] = {};
+	__u32 prog_ids[5], prog_cnt = 0, attach_flags;
+	int i = 0, err, prog_fd;
+	bool detach_legacy = false;
+
+	skel = test_cgroup_link__open_and_load();
+	if (CHECK(!skel, "skel_open_load", "failed to open/load skeleton\n"))
+		return;
+	prog_fd = bpf_program__fd(skel->progs.egress);
+
+	err = setup_cgroup_environment();
+	if (CHECK(err, "cg_init", "failed: %d\n", err))
+		goto cleanup;
+
+	for (i = 0; i < cg_nr; i++) {
+		cgs[i].fd = create_and_get_cgroup(cgs[i].path);
+		if (CHECK(cgs[i].fd < 0, "cg_create", "fail: %d\n", cgs[i].fd))
+			goto cleanup;
+	}
+
+	err = join_cgroup(cgs[last_cg].path);
+	if (CHECK(err, "cg_join", "fail: %d\n", err))
+		goto cleanup;
+
+	for (i = 0; i < cg_nr; i++) {
+		links[i] = bpf_program__attach_cgroup(skel->progs.egress,
+						      cgs[i].fd);
+		if (CHECK(IS_ERR(links[i]), "cg_attach", "i: %d, err: %ld\n",
+				 i, PTR_ERR(links[i])))
+			goto cleanup;
+	}
+
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->calls != cg_nr, "call_cnt", "exp %d, got %d\n",
+		  cg_nr, skel->bss->calls))
+		goto cleanup;
+
+	/* query the number of effective progs in last cg */
+	CHECK_FAIL(bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS,
+				  BPF_F_QUERY_EFFECTIVE, NULL, NULL,
+				  &prog_cnt));
+	if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n",
+		  cg_nr, prog_cnt))
+		goto cleanup;
+	/* query the effective prog IDs in last cg */
+	CHECK_FAIL(bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS,
+				  BPF_F_QUERY_EFFECTIVE, &attach_flags,
+				  prog_ids, &prog_cnt));
+	if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n",
+		  cg_nr, prog_cnt))
+		goto cleanup;
+	CHECK_FAIL(attach_flags != BPF_F_ALLOW_MULTI);
+	for (i = 1; i < prog_cnt; i++)
+		CHECK(prog_ids[i - 1] != prog_ids[i], "prod_id_check",
+		      "idx %d, prev id %d, cur id %d\n",
+		      i, prog_ids[i - 1], prog_ids[i]);
+
+	/* detach bottom program and ping again */
+	bpf_link__destroy(links[last_cg]);
+	links[last_cg] = NULL;
+
+	skel->bss->calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->calls != cg_nr - 1, "call_cnt", "exp %d, got %d\n",
+		  cg_nr - 1, skel->bss->calls))
+		goto cleanup;
+
+	/* mix in with non link-based multi-attachments */
+	err = bpf_prog_attach(prog_fd, cgs[last_cg].fd,
+			      BPF_CGROUP_INET_EGRESS, BPF_F_ALLOW_MULTI);
+	if (CHECK(err, "cg_attach_legacy", "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = true;
+
+	links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress,
+						    cgs[last_cg].fd);
+	if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n",
+		  PTR_ERR(links[last_cg])))
+		goto cleanup;
+
+	skel->bss->calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	CHECK(skel->bss->calls != cg_nr + 1, "call_cnt", "exp %d, got %d\n",
+	      cg_nr + 1, skel->bss->calls);
+
+	bpf_link__destroy(links[last_cg]);
+	links[last_cg] = NULL;
+
+	if (CHECK(bpf_prog_detach2(prog_fd, cgs[last_cg].fd,
+		  BPF_CGROUP_INET_EGRESS), "cg_detach_legacy",
+		  "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = false;
+
+	/* attempt to mix in with legacy exclusive prog attachment */
+	err = bpf_prog_attach(prog_fd, cgs[last_cg].fd,
+			      BPF_CGROUP_INET_EGRESS, 0);
+	if (CHECK(err, "cg_attach_exclusive", "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = true;
+
+	links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress,
+						    cgs[last_cg].fd);
+	if (CHECK(!IS_ERR(links[last_cg]), "cg_attach_fail", "unexpected success\n"))
+		goto cleanup;
+
+	skel->bss->calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	CHECK(skel->bss->calls != cg_nr, "call_cnt", "exp %d, got %d\n",
+	      cg_nr, skel->bss->calls);
+
+	if (CHECK(bpf_prog_detach2(prog_fd, cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS),
+		  "cg_detach_exclusive", "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = false;
+
+	/* re-attach last bpf_link to finish off test */
+	links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress,
+						    cgs[last_cg].fd);
+	if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n",
+		  PTR_ERR(links[last_cg])))
+		goto cleanup;
+
+	skel->bss->calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->calls != cg_nr, "call_cnt", "exp %d, got %d\n",
+		  cg_nr, skel->bss->calls))
+		goto cleanup;
+	if (CHECK(skel->bss->alt_calls != 0, "alt_call_cnt", "exp %d, got %d\n",
+		  0, skel->bss->alt_calls))
+		goto cleanup;
+
+	/* replace BPF programs inside their links for all but first link */
+	for (i = 1; i < cg_nr; i++) {
+		err = bpf_link__update_program(links[i], skel->progs.egress_alt);
+		if (CHECK(err, "prog_upd", "link #%d\n", i))
+			goto cleanup;
+	}
+
+	skel->bss->calls = 0;
+	skel->bss->alt_calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->calls != 1, "call_cnt",
+		  "exp %d, got %d\n", 1, skel->bss->calls))
+		goto cleanup;
+	if (CHECK(skel->bss->alt_calls != cg_nr - 1, "alt_call_cnt",
+		  "exp %d, got %d\n", cg_nr - 1, skel->bss->alt_calls))
+		goto cleanup;
+
+	/* Attempt program update with wrong expected BPF program */
+	link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress_alt);
+	link_upd_opts.flags = BPF_F_REPLACE;
+	err = bpf_link_update(bpf_link__fd(links[0]),
+			      bpf_program__fd(skel->progs.egress_alt),
+			      &link_upd_opts);
+	if (CHECK(err == 0 || errno != EPERM, "prog_cmpxchg1",
+		  "unexpectedly succeeded, err %d, errno %d\n", err, -errno))
+		goto cleanup;
+
+	/* Compare-exchange single link program from egress to egress_alt */
+	link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress);
+	link_upd_opts.flags = BPF_F_REPLACE;
+	err = bpf_link_update(bpf_link__fd(links[0]),
+			      bpf_program__fd(skel->progs.egress_alt),
+			      &link_upd_opts);
+	if (CHECK(err, "prog_cmpxchg2", "errno %d\n", -errno))
+		goto cleanup;
+
+	skel->bss->calls = 0;
+	skel->bss->alt_calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->calls != 0, "call_cnt",
+		  "exp %d, got %d\n", 0, skel->bss->calls))
+		goto cleanup;
+	if (CHECK(skel->bss->alt_calls != cg_nr, "alt_call_cnt",
+		  "exp %d, got %d\n", cg_nr, skel->bss->alt_calls))
+		goto cleanup;
+
+	/* close cgroup FDs before detaching links */
+	for (i = 0; i < cg_nr; i++) {
+		if (cgs[i].fd > 0) {
+			close(cgs[i].fd);
+			cgs[i].fd = -1;
+		}
+	}
+
+	/* BPF programs should still get called */
+	skel->bss->alt_calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->alt_calls != cg_nr, "alt_all_cnt",
+		  "exp %d, got %d\n", cg_nr, skel->bss->alt_calls))
+		goto cleanup;
+
+	/* leave cgroup and remove them, don't detach programs */
+	cleanup_cgroup_environment();
+
+	/* BPF programs should have been auto-detached */
+	skel->bss->alt_calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->alt_calls, "alt_call_cnt", "exp %d, got %d\n",
+		  cg_nr, skel->bss->alt_calls))
+		goto cleanup;
+
+cleanup:
+	if (detach_legacy)
+		bpf_prog_detach2(prog_fd, cgs[last_cg].fd,
+				 BPF_CGROUP_INET_EGRESS);
+
+	for (i = 0; i < cg_nr; i++) {
+		if (!IS_ERR(links[i]))
+			bpf_link__destroy(links[i]);
+	}
+	test_cgroup_link__destroy(skel);
+
+	for (i = 0; i < cg_nr; i++) {
+		if (cgs[i].fd > 0)
+			close(cgs[i].fd);
+	}
+	cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_cgroup_link.c b/tools/testing/selftests/bpf/progs/test_cgroup_link.c
new file mode 100644
index 000000000000..77e47b9e4446
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_cgroup_link.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+int calls = 0;
+int alt_calls = 0;
+
+SEC("cgroup_skb/egress1")
+int egress(struct __sk_buff *skb)
+{
+	__sync_fetch_and_add(&calls, 1);
+	return 1;
+}
+
+SEC("cgroup_skb/egress2")
+int egress_alt(struct __sk_buff *skb)
+{
+	__sync_fetch_and_add(&alt_calls, 1);
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-20 20:36 ` [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
@ 2020-03-20 21:33   ` Stanislav Fomichev
  2020-03-20 21:47     ` Andrii Nakryiko
  2020-03-20 23:46   ` [Potential Spoof] " Andrey Ignatov
  1 sibling, 1 reply; 22+ messages in thread
From: Stanislav Fomichev @ 2020-03-20 21:33 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team

On 03/20, Andrii Nakryiko wrote:
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 5d01c5c7e598..fad9f79bb8f1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -111,6 +111,7 @@ enum bpf_cmd {
>  	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
>  	BPF_MAP_UPDATE_BATCH,
>  	BPF_MAP_DELETE_BATCH,
[..]
> +	BPF_LINK_CREATE,
Curious, why did you decide to add new command versus reusing existing
BPF_PROG_ATTACH/BPF_PROG_DETACH pair? Can we have a new flag like
BPF_F_NOT_OWNED that we can set when calling BPF_PROG_ATTACH to trigger
all these new bpf_link properties (like cgroup not holding an extra ref)?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-20 21:33   ` Stanislav Fomichev
@ 2020-03-20 21:47     ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-20 21:47 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Mar 20, 2020 at 2:33 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 03/20, Andrii Nakryiko wrote:
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 5d01c5c7e598..fad9f79bb8f1 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -111,6 +111,7 @@ enum bpf_cmd {
> >       BPF_MAP_LOOKUP_AND_DELETE_BATCH,
> >       BPF_MAP_UPDATE_BATCH,
> >       BPF_MAP_DELETE_BATCH,
> [..]
> > +     BPF_LINK_CREATE,
> Curious, why did you decide to add new command versus reusing existing
> BPF_PROG_ATTACH/BPF_PROG_DETACH pair? Can we have a new flag like
> BPF_F_NOT_OWNED that we can set when calling BPF_PROG_ATTACH to trigger
> all these new bpf_link properties (like cgroup not holding an extra ref)?

It was my initial approach, but I've got internal feedback that this
will be actually more confusing than useful. E.g., for bpf_link case,
BPF_PROG_DETACH is disabled, so having BPF_PROG_ATTACH is creating a
false expectation in this case. BPF_PROG attach is returning 0 on
succes and <0 on error, but BPF_LINK_CREATE is returning error or >0
FD.

Also with BPF_LINK_UPDATE, it makes more sense to have dedicated
per-link operations.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Potential Spoof] [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-20 20:36 ` [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
  2020-03-20 21:33   ` Stanislav Fomichev
@ 2020-03-20 23:46   ` Andrey Ignatov
  2020-03-21  0:19     ` Andrii Nakryiko
  1 sibling, 1 reply; 22+ messages in thread
From: Andrey Ignatov @ 2020-03-20 23:46 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team

Andrii Nakryiko <andriin@fb.com> [Fri, 2020-03-20 13:37 -0700]:
> Implement new sub-command to attach cgroup BPF programs and return FD-based
> bpf_link back on success. bpf_link, once attached to cgroup, cannot be
> replaced, except by owner having its FD. cgroup bpf_link has semantics of
> BPF_F_ALLOW_MULTI BPF program attachments and can co-exist with

Hi Andrii,

Is there any reason to limit it to only BPF_F_ALLOW_MULTI?

The thing is BPF_F_ALLOW_MULTI not only allows to attach multiple
programs to specified cgroup but also controls what programs can later
be attached to a sub-cgroup, and in BPF_F_ALLOW_MULTI case both
sub-cgroup programs and specified cgroup programs will be executed (in
this order).

There many use-cases though when it's desired to either completely
disallow attaching programs to a sub-cgroup or override parent cgroup's
program behavior in sub-cgroup. If bpf_link covers only
BPF_F_ALLOW_MULTI, those scenarios won't be able to leverage it.

This double-purpose of BPF_F_ALLOW_MULTI is a pain ... For example if
one wants to attach multiple programs to a cgroup but disallow attaching
programs to a sub-cgroup it's currently impossible (well, w/o additional
cgroup level just for this).

> non-bpf_link-based BPF cgroup attachments.
> 
> To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
> BPF program can be executed, implement auto-detachment of link. When
> cgroup_bpf_release() is called, all attached bpf_links are forced to release
> cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
> well as still owning underlying bpf_prog. This is because user-space might
> still have FDs open and active, so bpf_link as a user-referenced object can't
> be freed yet. Once last active FD is closed, bpf_link will be freed and
> underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
> touched, because cgroup is released already.
> 
> The inherent race between bpf_cgroup_link release (from closing last FD) and
> cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
> the only additional check required is when bpf_cgroup_link attempts to detach
> itself from cgroup. At that time we need to check whether there is still
> cgroup associated with that link. And if not, exit with success, because
> bpf_cgroup_link was already successfully detached.
> 
> Acked-by: Roman Gushchin <guro@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  include/linux/bpf-cgroup.h     |  27 ++-
>  include/linux/bpf.h            |  10 +-
>  include/uapi/linux/bpf.h       |   9 +-
>  kernel/bpf/cgroup.c            | 313 +++++++++++++++++++++++++--------
>  kernel/bpf/syscall.c           |  62 +++++--
>  kernel/cgroup/cgroup.c         |  14 +-
>  tools/include/uapi/linux/bpf.h |   9 +-
>  7 files changed, 345 insertions(+), 99 deletions(-)
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index a7cd5c7a2509..ab95824a1d99 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -51,9 +51,16 @@ struct bpf_cgroup_storage {
>  	struct rcu_head rcu;
>  };
>  
> +struct bpf_cgroup_link {
> +	struct bpf_link link;
> +	struct cgroup *cgroup;
> +	enum bpf_attach_type type;
> +};
> +
>  struct bpf_prog_list {
>  	struct list_head node;
>  	struct bpf_prog *prog;
> +	struct bpf_cgroup_link *link;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
>  };
>  
> @@ -84,20 +91,23 @@ struct cgroup_bpf {
>  int cgroup_bpf_inherit(struct cgroup *cgrp);
>  void cgroup_bpf_offline(struct cgroup *cgrp);
>  
> -int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> -			struct bpf_prog *replace_prog,
> +int __cgroup_bpf_attach(struct cgroup *cgrp,
> +			struct bpf_prog *prog, struct bpf_prog *replace_prog,
> +			struct bpf_cgroup_link *link,
>  			enum bpf_attach_type type, u32 flags);
>  int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> +			struct bpf_cgroup_link *link,
>  			enum bpf_attach_type type);
>  int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  		       union bpf_attr __user *uattr);
>  
>  /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */
> -int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> -		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
> +int cgroup_bpf_attach(struct cgroup *cgrp,
> +		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
> +		      struct bpf_cgroup_link *link, enum bpf_attach_type type,
>  		      u32 flags);
>  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> -		      enum bpf_attach_type type, u32 flags);
> +		      enum bpf_attach_type type);
>  int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  		     union bpf_attr __user *uattr);
>  
> @@ -332,6 +342,7 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
>  			   enum bpf_prog_type ptype, struct bpf_prog *prog);
>  int cgroup_bpf_prog_detach(const union bpf_attr *attr,
>  			   enum bpf_prog_type ptype);
> +int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
>  int cgroup_bpf_prog_query(const union bpf_attr *attr,
>  			  union bpf_attr __user *uattr);
>  #else
> @@ -354,6 +365,12 @@ static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr,
>  	return -EINVAL;
>  }
>  
> +static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
> +					 struct bpf_prog *prog)
> +{
> +	return -EINVAL;
> +}
> +
>  static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
>  					union bpf_attr __user *uattr)
>  {
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index bdb981c204fa..0f7c2f48c734 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1081,15 +1081,23 @@ extern int sysctl_unprivileged_bpf_disabled;
>  int bpf_map_new_fd(struct bpf_map *map, int flags);
>  int bpf_prog_new_fd(struct bpf_prog *prog);
>  
> -struct bpf_link;
> +struct bpf_link {
> +	atomic64_t refcnt;
> +	const struct bpf_link_ops *ops;
> +	struct bpf_prog *prog;
> +	struct work_struct work;
> +};
>  
>  struct bpf_link_ops {
>  	void (*release)(struct bpf_link *link);
>  	void (*dealloc)(struct bpf_link *link);
> +
>  };
>  
>  void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
>  		   struct bpf_prog *prog);
> +void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
> +		      int link_fd);
>  void bpf_link_inc(struct bpf_link *link);
>  void bpf_link_put(struct bpf_link *link);
>  int bpf_link_new_fd(struct bpf_link *link);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 5d01c5c7e598..fad9f79bb8f1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -111,6 +111,7 @@ enum bpf_cmd {
>  	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
>  	BPF_MAP_UPDATE_BATCH,
>  	BPF_MAP_DELETE_BATCH,
> +	BPF_LINK_CREATE,
>  };
>  
>  enum bpf_map_type {
> @@ -539,7 +540,7 @@ union bpf_attr {
>  		__u32		prog_cnt;
>  	} query;
>  
> -	struct {
> +	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
>  		__u64 name;
>  		__u32 prog_fd;
>  	} raw_tracepoint;
> @@ -567,6 +568,12 @@ union bpf_attr {
>  		__u64		probe_offset;	/* output: probe_offset */
>  		__u64		probe_addr;	/* output: probe_addr */
>  	} task_fd_query;
> +
> +	struct { /* struct used by BPF_LINK_CREATE command */
> +		__u32		prog_fd;	/* eBPF program to attach */
> +		__u32		target_fd;	/* object to attach to */
> +		__u32		attach_type;	/* attach type */
> +	} link_create;
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 9c8472823a7f..b960e8633f23 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -80,6 +80,17 @@ static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
>  		bpf_cgroup_storage_unlink(storages[stype]);
>  }
>  
> +/* Called when bpf_cgroup_link is auto-detached from dying cgroup.
> + * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
> + * doesn't free link memory, which will eventually be done by bpf_link's
> + * release() callback, when its last FD is closed.
> + */
> +static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
> +{
> +	cgroup_put(link->cgroup);
> +	link->cgroup = NULL;
> +}
> +
>  /**
>   * cgroup_bpf_release() - put references of all bpf programs and
>   *                        release all cgroup bpf data
> @@ -100,7 +111,10 @@ static void cgroup_bpf_release(struct work_struct *work)
>  
>  		list_for_each_entry_safe(pl, tmp, progs, node) {
>  			list_del(&pl->node);
> -			bpf_prog_put(pl->prog);
> +			if (pl->prog)
> +				bpf_prog_put(pl->prog);
> +			if (pl->link)
> +				bpf_cgroup_link_auto_detach(pl->link);
>  			bpf_cgroup_storages_unlink(pl->storage);
>  			bpf_cgroup_storages_free(pl->storage);
>  			kfree(pl);
> @@ -134,6 +148,18 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref)
>  	queue_work(system_wq, &cgrp->bpf.release_work);
>  }
>  
> +/* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through
> + * link or direct prog.
> + */
> +static struct bpf_prog *prog_list_prog(struct bpf_prog_list *pl)
> +{
> +	if (pl->prog)
> +		return pl->prog;
> +	if (pl->link)
> +		return pl->link->link.prog;
> +	return NULL;
> +}
> +
>  /* count number of elements in the list.
>   * it's slow but the list cannot be long
>   */
> @@ -143,7 +169,7 @@ static u32 prog_list_length(struct list_head *head)
>  	u32 cnt = 0;
>  
>  	list_for_each_entry(pl, head, node) {
> -		if (!pl->prog)
> +		if (!prog_list_prog(pl))
>  			continue;
>  		cnt++;
>  	}
> @@ -212,11 +238,11 @@ static int compute_effective_progs(struct cgroup *cgrp,
>  			continue;
>  
>  		list_for_each_entry(pl, &p->bpf.progs[type], node) {
> -			if (!pl->prog)
> +			if (!prog_list_prog(pl))
>  				continue;
>  
>  			item = &progs->items[cnt];
> -			item->prog = pl->prog;
> +			item->prog = prog_list_prog(pl);
>  			bpf_cgroup_storages_assign(item->cgroup_storage,
>  						   pl->storage);
>  			cnt++;
> @@ -333,19 +359,62 @@ static int update_effective_progs(struct cgroup *cgrp,
>  
>  #define BPF_CGROUP_MAX_PROGS 64
>  
> +static struct bpf_prog_list *find_attach_entry(struct list_head *progs,
> +					       struct bpf_prog *prog,
> +					       struct bpf_cgroup_link *link,
> +					       struct bpf_prog *replace_prog,
> +					       bool allow_multi)
> +{
> +	struct bpf_prog_list *pl;
> +
> +	/* legacy single-attach case */
> +	if (!allow_multi) {
> +		if (list_empty(progs))
> +			return NULL;
> +		return list_first_entry(progs, typeof(*pl), node);
> +	}
> +
> +	/* direct prog multi-attach case */
> +	if (prog) {
> +		list_for_each_entry(pl, progs, node) {
> +			if (pl->prog == prog)
> +				/* disallow attaching the same prog twice */
> +				return ERR_PTR(-EINVAL);
> +			if (replace_prog && pl->prog == replace_prog)
> +				/* a match found */
> +				return pl;
> +		}
> +		if (replace_prog)
> +			/* prog to replace not found for cgroup */
> +			return ERR_PTR(-ENOENT);
> +		return NULL;
> +	}
> +
> +	/* link (multi-attach) case */
> +	list_for_each_entry(pl, progs, node) {
> +		if (pl->link == link)
> +			/* disallow attaching the same link twice */
> +			return ERR_PTR(-EINVAL);
> +	}
> +	return NULL;
> +}
> +
>  /**
> - * __cgroup_bpf_attach() - Attach the program to a cgroup, and
> + * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
>   *                         propagate the change to descendants
>   * @cgrp: The cgroup which descendants to traverse
>   * @prog: A program to attach
> + * @link: A link to attach
>   * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
>   * @type: Type of attach operation
>   * @flags: Option flags
>   *
> + * Exactly one of @prog or @link can be non-null.
>   * Must be called with cgroup_mutex held.
>   */
> -int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> -			struct bpf_prog *replace_prog,
> +int __cgroup_bpf_attach(struct cgroup *cgrp,
> +			struct bpf_prog *prog, struct bpf_prog *replace_prog,
> +			struct bpf_cgroup_link *link,
>  			enum bpf_attach_type type, u32 flags)
>  {
>  	u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
> @@ -353,13 +422,19 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	struct bpf_prog *old_prog = NULL;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
>  		*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
> -	struct bpf_prog_list *pl, *replace_pl = NULL;
> +	struct bpf_prog_list *pl;
>  	int err;
>  
>  	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
>  	    ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>  		/* invalid combination */
>  		return -EINVAL;
> +	if (link && (prog || replace_prog))
> +		/* only either link or prog/replace_prog can be specified */
> +		return -EINVAL;
> +	if (!!replace_prog != !!(flags & BPF_F_REPLACE))
> +		/* replace_prog implies BPF_F_REPLACE, and vice versa */
> +		return -EINVAL;
>  
>  	if (!hierarchy_allows_attach(cgrp, type))
>  		return -EPERM;
> @@ -374,26 +449,16 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
>  		return -E2BIG;
>  
> -	if (flags & BPF_F_ALLOW_MULTI) {
> -		list_for_each_entry(pl, progs, node) {
> -			if (pl->prog == prog)
> -				/* disallow attaching the same prog twice */
> -				return -EINVAL;
> -			if (pl->prog == replace_prog)
> -				replace_pl = pl;
> -		}
> -		if ((flags & BPF_F_REPLACE) && !replace_pl)
> -			/* prog to replace not found for cgroup */
> -			return -ENOENT;
> -	} else if (!list_empty(progs)) {
> -		replace_pl = list_first_entry(progs, typeof(*pl), node);
> -	}
> +	pl = find_attach_entry(progs, prog, link, replace_prog,
> +			       flags & BPF_F_ALLOW_MULTI);
> +	if (IS_ERR(pl))
> +		return PTR_ERR(pl);
>  
> -	if (bpf_cgroup_storages_alloc(storage, prog))
> +	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
>  		return -ENOMEM;
>  
> -	if (replace_pl) {
> -		pl = replace_pl;
> +	if (pl) {
> +		/* only non-link case is possible */
>  		old_prog = pl->prog;
>  		bpf_cgroup_storages_unlink(pl->storage);
>  		bpf_cgroup_storages_assign(old_storage, pl->storage);
> @@ -407,6 +472,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	}
>  
>  	pl->prog = prog;
> +	pl->link = link;
>  	bpf_cgroup_storages_assign(pl->storage, storage);
>  	cgrp->bpf.flags[type] = saved_flags;
>  
> @@ -414,80 +480,91 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	if (err)
>  		goto cleanup;
>  
> -	static_branch_inc(&cgroup_bpf_enabled_key);
>  	bpf_cgroup_storages_free(old_storage);
> -	if (old_prog) {
> +	if (old_prog)
>  		bpf_prog_put(old_prog);
> -		static_branch_dec(&cgroup_bpf_enabled_key);
> -	}
> -	bpf_cgroup_storages_link(storage, cgrp, type);
> +	else
> +		static_branch_inc(&cgroup_bpf_enabled_key);
> +	bpf_cgroup_storages_link(pl->storage, cgrp, type);
>  	return 0;
>  
>  cleanup:
> -	/* and cleanup the prog list */
> -	pl->prog = old_prog;
> +	if (old_prog)
> +		pl->prog = old_prog;
>  	bpf_cgroup_storages_free(pl->storage);
>  	bpf_cgroup_storages_assign(pl->storage, old_storage);
>  	bpf_cgroup_storages_link(pl->storage, cgrp, type);
> -	if (!replace_pl) {
> +	if (!old_prog) {
>  		list_del(&pl->node);
>  		kfree(pl);
>  	}
>  	return err;
>  }
>  
> +static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
> +					       struct bpf_prog *prog,
> +					       struct bpf_cgroup_link *link,
> +					       bool allow_multi)
> +{
> +	struct bpf_prog_list *pl;
> +
> +	if (!allow_multi) {
> +		if (list_empty(progs))
> +			/* report error when trying to detach and nothing is attached */
> +			return ERR_PTR(-ENOENT);
> +
> +		/* to maintain backward compatibility NONE and OVERRIDE cgroups
> +		 * allow detaching with invalid FD (prog==NULL)
> +		 */
> +		return list_first_entry(progs, typeof(*pl), node);
> +	}
> +
> +	if (!prog && !link)
> +		/* to detach MULTI prog the user has to specify valid FD
> +		 * of the program to be detached
> +		 */
> +		return ERR_PTR(-EINVAL);
> +
> +	/* find the prog and detach it */
> +	list_for_each_entry(pl, progs, node) {
> +		if (pl->prog == prog && pl->link == link)
> +			return pl;
> +	}
> +	return ERR_PTR(-ENOENT);
> +}
> +
>  /**
> - * __cgroup_bpf_detach() - Detach the program from a cgroup, and
> + * __cgroup_bpf_detach() - Detach the program or link from a cgroup, and
>   *                         propagate the change to descendants
>   * @cgrp: The cgroup which descendants to traverse
>   * @prog: A program to detach or NULL
> + * @prog: A link to detach or NULL
>   * @type: Type of detach operation
>   *
> + * At most one of @prog or @link can be non-NULL.
>   * Must be called with cgroup_mutex held.
>   */
>  int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> -			enum bpf_attach_type type)
> +			struct bpf_cgroup_link *link, enum bpf_attach_type type)
>  {
>  	struct list_head *progs = &cgrp->bpf.progs[type];
>  	u32 flags = cgrp->bpf.flags[type];
> -	struct bpf_prog *old_prog = NULL;
>  	struct bpf_prog_list *pl;
> +	struct bpf_prog *old_prog;
>  	int err;
>  
> -	if (flags & BPF_F_ALLOW_MULTI) {
> -		if (!prog)
> -			/* to detach MULTI prog the user has to specify valid FD
> -			 * of the program to be detached
> -			 */
> -			return -EINVAL;
> -	} else {
> -		if (list_empty(progs))
> -			/* report error when trying to detach and nothing is attached */
> -			return -ENOENT;
> -	}
> +	if (prog && link)
> +		/* only one of prog or link can be specified */
> +		return -EINVAL;
>  
> -	if (flags & BPF_F_ALLOW_MULTI) {
> -		/* find the prog and detach it */
> -		list_for_each_entry(pl, progs, node) {
> -			if (pl->prog != prog)
> -				continue;
> -			old_prog = prog;
> -			/* mark it deleted, so it's ignored while
> -			 * recomputing effective
> -			 */
> -			pl->prog = NULL;
> -			break;
> -		}
> -		if (!old_prog)
> -			return -ENOENT;
> -	} else {
> -		/* to maintain backward compatibility NONE and OVERRIDE cgroups
> -		 * allow detaching with invalid FD (prog==NULL)
> -		 */
> -		pl = list_first_entry(progs, typeof(*pl), node);
> -		old_prog = pl->prog;
> -		pl->prog = NULL;
> -	}
> +	pl = find_detach_entry(progs, prog, link, flags & BPF_F_ALLOW_MULTI);
> +	if (IS_ERR(pl))
> +		return PTR_ERR(pl);
> +
> +	/* mark it deleted, so it's ignored while recomputing effective */
> +	old_prog = pl->prog;
> +	pl->prog = NULL;
> +	pl->link = NULL;
>  
>  	err = update_effective_progs(cgrp, type);
>  	if (err)
> @@ -501,14 +578,15 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  	if (list_empty(progs))
>  		/* last program was detached, reset flags to zero */
>  		cgrp->bpf.flags[type] = 0;
> -
> -	bpf_prog_put(old_prog);
> +	if (old_prog)
> +		bpf_prog_put(old_prog);
>  	static_branch_dec(&cgroup_bpf_enabled_key);
>  	return 0;
>  
>  cleanup:
> -	/* and restore back old_prog */
> +	/* restore back prog or link */
>  	pl->prog = old_prog;
> +	pl->link = link;
>  	return err;
>  }
>  
> @@ -521,6 +599,7 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  	struct list_head *progs = &cgrp->bpf.progs[type];
>  	u32 flags = cgrp->bpf.flags[type];
>  	struct bpf_prog_array *effective;
> +	struct bpf_prog *prog;
>  	int cnt, ret = 0, i;
>  
>  	effective = rcu_dereference_protected(cgrp->bpf.effective[type],
> @@ -551,7 +630,8 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  
>  		i = 0;
>  		list_for_each_entry(pl, progs, node) {
> -			id = pl->prog->aux->id;
> +			prog = prog_list_prog(pl);
> +			id = prog->aux->id;
>  			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
>  				return -EFAULT;
>  			if (++i == cnt)
> @@ -581,8 +661,8 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
>  		}
>  	}
>  
> -	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type,
> -				attr->attach_flags);
> +	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, NULL,
> +				attr->attach_type, attr->attach_flags);
>  
>  	if (replace_prog)
>  		bpf_prog_put(replace_prog);
> @@ -604,7 +684,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
>  	if (IS_ERR(prog))
>  		prog = NULL;
>  
> -	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0);
> +	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type);
>  	if (prog)
>  		bpf_prog_put(prog);
>  
> @@ -612,6 +692,87 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
>  	return ret;
>  }
>  
> +static void bpf_cgroup_link_release(struct bpf_link *link)
> +{
> +	struct bpf_cgroup_link *cg_link =
> +		container_of(link, struct bpf_cgroup_link, link);
> +
> +	/* link might have been auto-detached by dying cgroup already,
> +	 * in that case our work is done here
> +	 */
> +	if (!cg_link->cgroup)
> +		return;
> +
> +	mutex_lock(&cgroup_mutex);
> +
> +	/* re-check cgroup under lock again */
> +	if (!cg_link->cgroup) {
> +		mutex_unlock(&cgroup_mutex);
> +		return;
> +	}
> +
> +	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
> +				    cg_link->type));
> +
> +	mutex_unlock(&cgroup_mutex);
> +	cgroup_put(cg_link->cgroup);
> +}
> +
> +static void bpf_cgroup_link_dealloc(struct bpf_link *link)
> +{
> +	struct bpf_cgroup_link *cg_link =
> +		container_of(link, struct bpf_cgroup_link, link);
> +
> +	kfree(cg_link);
> +}
> +
> +const struct bpf_link_ops bpf_cgroup_link_lops = {
> +	.release = bpf_cgroup_link_release,
> +	.dealloc = bpf_cgroup_link_dealloc,
> +};
> +
> +int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	struct bpf_cgroup_link *link;
> +	struct file *link_file;
> +	struct cgroup *cgrp;
> +	int err, link_fd;
> +
> +	cgrp = cgroup_get_from_fd(attr->link_create.target_fd);
> +	if (IS_ERR(cgrp))
> +		return PTR_ERR(cgrp);
> +
> +	link = kzalloc(sizeof(*link), GFP_USER);
> +	if (!link) {
> +		err = -ENOMEM;
> +		goto out_put_cgroup;
> +	}
> +	bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog);
> +	link->cgroup = cgrp;
> +	link->type = attr->link_create.attach_type;
> +
> +	link_file = bpf_link_new_file(&link->link, &link_fd);
> +	if (IS_ERR(link_file)) {
> +		kfree(link);
> +		err = PTR_ERR(link_file);
> +		goto out_put_cgroup;
> +	}
> +
> +	err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type,
> +				BPF_F_ALLOW_MULTI);
> +	if (err) {
> +		bpf_link_cleanup(&link->link, link_file, link_fd);
> +		goto out_put_cgroup;
> +	}
> +
> +	fd_install(link_fd, link_file);
> +	return link_fd;
> +
> +out_put_cgroup:
> +	cgroup_put(cgrp);
> +	return err;
> +}
> +
>  int cgroup_bpf_prog_query(const union bpf_attr *attr,
>  			  union bpf_attr __user *uattr)
>  {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index fd4181939064..f6e7d32a2632 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2173,13 +2173,6 @@ static int bpf_obj_get(const union bpf_attr *attr)
>  				attr->file_flags);
>  }
>  
> -struct bpf_link {
> -	atomic64_t refcnt;
> -	const struct bpf_link_ops *ops;
> -	struct bpf_prog *prog;
> -	struct work_struct work;
> -};
> -
>  void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
>  		   struct bpf_prog *prog)
>  {
> @@ -2193,8 +2186,8 @@ void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
>   * anon_inode's release() call. This helper manages marking bpf_link as
>   * defunct, releases anon_inode file and puts reserved FD.
>   */
> -static void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
> -			     int link_fd)
> +void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
> +		      int link_fd)
>  {
>  	link->prog = NULL;
>  	fput(link_file);
> @@ -2252,7 +2245,8 @@ static int bpf_link_release(struct inode *inode, struct file *filp)
>  #ifdef CONFIG_PROC_FS
>  static const struct bpf_link_ops bpf_raw_tp_lops;
>  static const struct bpf_link_ops bpf_tracing_link_lops;
> -static const struct bpf_link_ops bpf_xdp_link_lops;
> +
> +extern const struct bpf_link_ops bpf_cgroup_link_lops;
>  
>  static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
>  {
> @@ -2265,6 +2259,8 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
>  		link_type = "raw_tracepoint";
>  	else if (link->ops == &bpf_tracing_link_lops)
>  		link_type = "tracing";
> +	else if (link->ops == &bpf_cgroup_link_lops)
> +		link_type = "cgroup";
>  	else
>  		link_type = "unknown";
>  
> @@ -3533,6 +3529,49 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
>  	return err;
>  }
>  
> +#define BPF_LINK_CREATE_LAST_FIELD link_create.attach_type
> +static int link_create(union bpf_attr *attr)
> +{
> +	enum bpf_prog_type ptype;
> +	struct bpf_prog *prog;
> +	int ret;
> +
> +	if (CHECK_ATTR(BPF_LINK_CREATE))
> +		return -EINVAL;
> +
> +	ptype = attach_type_to_prog_type(attr->link_create.attach_type);
> +	if (ptype == BPF_PROG_TYPE_UNSPEC)
> +		return -EINVAL;
> +
> +	prog = bpf_prog_get_type(attr->link_create.prog_fd, ptype);
> +	if (IS_ERR(prog))
> +		return PTR_ERR(prog);
> +
> +	ret = bpf_prog_attach_check_attach_type(prog,
> +						attr->link_create.attach_type);
> +	if (ret)
> +		goto err_out;
> +
> +	switch (ptype) {
> +	case BPF_PROG_TYPE_CGROUP_SKB:
> +	case BPF_PROG_TYPE_CGROUP_SOCK:
> +	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
> +	case BPF_PROG_TYPE_SOCK_OPS:
> +	case BPF_PROG_TYPE_CGROUP_DEVICE:
> +	case BPF_PROG_TYPE_CGROUP_SYSCTL:
> +	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
> +		ret = cgroup_bpf_link_attach(attr, prog);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +	}
> +
> +err_out:
> +	if (ret < 0)
> +		bpf_prog_put(prog);
> +	return ret;
> +}
> +
>  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>  {
>  	union bpf_attr attr = {};
> @@ -3643,6 +3682,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>  	case BPF_MAP_DELETE_BATCH:
>  		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH);
>  		break;
> +	case BPF_LINK_CREATE:
> +		err = link_create(&attr);
> +		break;
>  	default:
>  		err = -EINVAL;
>  		break;
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 3dead0416b91..219624fba9ba 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6303,27 +6303,31 @@ void cgroup_sk_free(struct sock_cgroup_data *skcd)
>  #endif	/* CONFIG_SOCK_CGROUP_DATA */
>  
>  #ifdef CONFIG_CGROUP_BPF
> -int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
> -		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
> +int cgroup_bpf_attach(struct cgroup *cgrp,
> +		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
> +		      struct bpf_cgroup_link *link,
> +		      enum bpf_attach_type type,
>  		      u32 flags)
>  {
>  	int ret;
>  
>  	mutex_lock(&cgroup_mutex);
> -	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, type, flags);
> +	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags);
>  	mutex_unlock(&cgroup_mutex);
>  	return ret;
>  }
> +
>  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> -		      enum bpf_attach_type type, u32 flags)
> +		      enum bpf_attach_type type)
>  {
>  	int ret;
>  
>  	mutex_lock(&cgroup_mutex);
> -	ret = __cgroup_bpf_detach(cgrp, prog, type);
> +	ret = __cgroup_bpf_detach(cgrp, prog, NULL, type);
>  	mutex_unlock(&cgroup_mutex);
>  	return ret;
>  }
> +
>  int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  		     union bpf_attr __user *uattr)
>  {
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 5d01c5c7e598..fad9f79bb8f1 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -111,6 +111,7 @@ enum bpf_cmd {
>  	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
>  	BPF_MAP_UPDATE_BATCH,
>  	BPF_MAP_DELETE_BATCH,
> +	BPF_LINK_CREATE,
>  };
>  
>  enum bpf_map_type {
> @@ -539,7 +540,7 @@ union bpf_attr {
>  		__u32		prog_cnt;
>  	} query;
>  
> -	struct {
> +	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
>  		__u64 name;
>  		__u32 prog_fd;
>  	} raw_tracepoint;
> @@ -567,6 +568,12 @@ union bpf_attr {
>  		__u64		probe_offset;	/* output: probe_offset */
>  		__u64		probe_addr;	/* output: probe_addr */
>  	} task_fd_query;
> +
> +	struct { /* struct used by BPF_LINK_CREATE command */
> +		__u32		prog_fd;	/* eBPF program to attach */
> +		__u32		target_fd;	/* object to attach to */
> +		__u32		attach_type;	/* attach type */
> +	} link_create;
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> -- 
> 2.17.1
> 

-- 
Andrey Ignatov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Potential Spoof] [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-20 23:46   ` [Potential Spoof] " Andrey Ignatov
@ 2020-03-21  0:19     ` Andrii Nakryiko
  2020-03-23 18:03       ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-21  0:19 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Mar 20, 2020 at 4:46 PM Andrey Ignatov <rdna@fb.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> [Fri, 2020-03-20 13:37 -0700]:
> > Implement new sub-command to attach cgroup BPF programs and return FD-based
> > bpf_link back on success. bpf_link, once attached to cgroup, cannot be
> > replaced, except by owner having its FD. cgroup bpf_link has semantics of
> > BPF_F_ALLOW_MULTI BPF program attachments and can co-exist with
>
> Hi Andrii,
>
> Is there any reason to limit it to only BPF_F_ALLOW_MULTI?
>

No technical reason, just felt like the good default behavior. It's
possible to support all the same flags as with legacy BPF program
attachment, but see below...

> The thing is BPF_F_ALLOW_MULTI not only allows to attach multiple
> programs to specified cgroup but also controls what programs can later
> be attached to a sub-cgroup, and in BPF_F_ALLOW_MULTI case both
> sub-cgroup programs and specified cgroup programs will be executed (in
> this order).
>
> There many use-cases though when it's desired to either completely
> disallow attaching programs to a sub-cgroup or override parent cgroup's
> program behavior in sub-cgroup. If bpf_link covers only
> BPF_F_ALLOW_MULTI, those scenarios won't be able to leverage it.
>
> This double-purpose of BPF_F_ALLOW_MULTI is a pain ... For example if

Yeah, exactly. I don't know historical reasons for why it was done the
way it was done (i.e., BPF_F_ALLOW_MULTI and BPF_F_ALLOW_OVERRIDE
flags), so maybe someone can give some insights there. But I wonder if
inheritance policy should be orthogonal to a single vs multiple
bpf_progs limit for a given cgroup. They could be specified on
per-cgroup level, not per-BPF program (and then making sure flags
match). That way it would be possible to express more nuanced
policies, like allowing multiple programs for a root cgroup, but
disallowing attach new BPF programs for any child cgroup, etc.

Would be good to get some more perspectives on this...

> one wants to attach multiple programs to a cgroup but disallow attaching
> programs to a sub-cgroup it's currently impossible (well, w/o additional
> cgroup level just for this).
>
> > non-bpf_link-based BPF cgroup attachments.
> >
> > To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
> > BPF program can be executed, implement auto-detachment of link. When
> > cgroup_bpf_release() is called, all attached bpf_links are forced to release
> > cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
> > well as still owning underlying bpf_prog. This is because user-space might
> > still have FDs open and active, so bpf_link as a user-referenced object can't
> > be freed yet. Once last active FD is closed, bpf_link will be freed and
> > underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
> > touched, because cgroup is released already.
> >
> > The inherent race between bpf_cgroup_link release (from closing last FD) and
> > cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
> > the only additional check required is when bpf_cgroup_link attempts to detach
> > itself from cgroup. At that time we need to check whether there is still
> > cgroup associated with that link. And if not, exit with success, because
> > bpf_cgroup_link was already successfully detached.
> >
> > Acked-by: Roman Gushchin <guro@fb.com>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  include/linux/bpf-cgroup.h     |  27 ++-
> >  include/linux/bpf.h            |  10 +-
> >  include/uapi/linux/bpf.h       |   9 +-
> >  kernel/bpf/cgroup.c            | 313 +++++++++++++++++++++++++--------
> >  kernel/bpf/syscall.c           |  62 +++++--
> >  kernel/cgroup/cgroup.c         |  14 +-
> >  tools/include/uapi/linux/bpf.h |   9 +-
> >  7 files changed, 345 insertions(+), 99 deletions(-)
> >

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-20 20:36 ` [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
@ 2020-03-21  0:33   ` kbuild test robot
  2020-03-21  0:58     ` Andrii Nakryiko
  2020-03-21  1:28   ` kbuild test robot
  2020-03-23 10:58   ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 22+ messages in thread
From: kbuild test robot @ 2020-03-21  0:33 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: kbuild-all, bpf, netdev, ast, daniel

[-- Attachment #1: Type: text/plain, Size: 5760 bytes --]

Hi Andrii,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]
[cannot apply to bpf/master cgroup/for-next v5.6-rc6 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Add-support-for-cgroup-bpf_link/20200321-045848
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:11,
                    from include/linux/list.h:9,
                    from include/linux/timer.h:5,
                    from include/linux/workqueue.h:9,
                    from include/linux/bpf.h:9,
                    from kernel/bpf/syscall.c:4:
   kernel/bpf/syscall.c: In function 'link_update':
>> include/linux/kernel.h:994:51: error: dereferencing pointer to incomplete type 'struct bpf_cgroup_link'
     994 |  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
         |                                                   ^~
   include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
     330 |   if (!(condition))     \
         |         ^~~~~~~~~
   include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
     350 |  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
         |  ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     994 |  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
         |  ^~~~~~~~~~~~~~~~
   include/linux/kernel.h:994:20: note: in expansion of macro '__same_type'
     994 |  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
         |                    ^~~~~~~~~~~
>> kernel/bpf/syscall.c:3612:13: note: in expansion of macro 'container_of'
    3612 |   cg_link = container_of(link, struct bpf_cgroup_link, link);
         |             ^~~~~~~~~~~~
   In file included from <command-line>:
>> include/linux/compiler_types.h:129:35: error: invalid use of undefined type 'struct bpf_cgroup_link'
     129 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
         |                                   ^~~~~~~~~~~~~~~~~~
   include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
      17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
         |                                ^~~~~~~~~~~~~~~~~~~
   include/linux/kernel.h:997:21: note: in expansion of macro 'offsetof'
     997 |  ((type *)(__mptr - offsetof(type, member))); })
         |                     ^~~~~~~~
>> kernel/bpf/syscall.c:3612:13: note: in expansion of macro 'container_of'
    3612 |   cg_link = container_of(link, struct bpf_cgroup_link, link);
         |             ^~~~~~~~~~~~
>> kernel/bpf/syscall.c:3618:9: error: implicit declaration of function 'cgroup_bpf_replace'; did you mean 'cgroup_bpf_offline'? [-Werror=implicit-function-declaration]
    3618 |   ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
         |         ^~~~~~~~~~~~~~~~~~
         |         cgroup_bpf_offline
   cc1: some warnings being treated as errors

vim +994 include/linux/kernel.h

cf14f27f82af78 Alexei Starovoitov 2018-03-28  984  
^1da177e4c3f41 Linus Torvalds     2005-04-16  985  /**
^1da177e4c3f41 Linus Torvalds     2005-04-16  986   * container_of - cast a member of a structure out to the containing structure
^1da177e4c3f41 Linus Torvalds     2005-04-16  987   * @ptr:	the pointer to the member.
^1da177e4c3f41 Linus Torvalds     2005-04-16  988   * @type:	the type of the container struct this is embedded in.
^1da177e4c3f41 Linus Torvalds     2005-04-16  989   * @member:	the name of the member within the struct.
^1da177e4c3f41 Linus Torvalds     2005-04-16  990   *
^1da177e4c3f41 Linus Torvalds     2005-04-16  991   */
^1da177e4c3f41 Linus Torvalds     2005-04-16  992  #define container_of(ptr, type, member) ({				\
c7acec713d14c6 Ian Abbott         2017-07-12  993  	void *__mptr = (void *)(ptr);					\
c7acec713d14c6 Ian Abbott         2017-07-12 @994  	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
c7acec713d14c6 Ian Abbott         2017-07-12  995  			 !__same_type(*(ptr), void),			\
c7acec713d14c6 Ian Abbott         2017-07-12  996  			 "pointer type mismatch in container_of()");	\
c7acec713d14c6 Ian Abbott         2017-07-12  997  	((type *)(__mptr - offsetof(type, member))); })
^1da177e4c3f41 Linus Torvalds     2005-04-16  998  

:::::: The code at line 994 was first introduced by commit
:::::: c7acec713d14c6ce8a20154f9dfda258d6bcad3b kernel.h: handle pointers to arrays better in container_of()

:::::: TO: Ian Abbott <abbotti@mev.co.uk>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 13182 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-21  0:33   ` kbuild test robot
@ 2020-03-21  0:58     ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-21  0:58 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Andrii Nakryiko, kbuild-all, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann

On Fri, Mar 20, 2020 at 5:34 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Andrii,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on bpf-next/master]
> [cannot apply to bpf/master cgroup/for-next v5.6-rc6 next-20200320]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Add-support-for-cgroup-bpf_link/20200321-045848
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: m68k-sun3_defconfig (attached as .config)
> compiler: m68k-linux-gcc (GCC) 9.2.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=9.2.0 make.cross ARCH=m68k
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All error/warnings (new ones prefixed by >>):
>
>    In file included from include/linux/kernel.h:11,
>                     from include/linux/list.h:9,
>                     from include/linux/timer.h:5,
>                     from include/linux/workqueue.h:9,
>                     from include/linux/bpf.h:9,
>                     from kernel/bpf/syscall.c:4:
>    kernel/bpf/syscall.c: In function 'link_update':
> >> include/linux/kernel.h:994:51: error: dereferencing pointer to incomplete type 'struct bpf_cgroup_link'
>      994 |  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
>          |                                                   ^~
>    include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
>      330 |   if (!(condition))     \
>          |         ^~~~~~~~~
>    include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
>      350 |  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
>          |  ^~~~~~~~~~~~~~~~~~~
>    include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
>       39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>          |                                     ^~~~~~~~~~~~~~~~~~
>    include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
>      994 |  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
>          |  ^~~~~~~~~~~~~~~~
>    include/linux/kernel.h:994:20: note: in expansion of macro '__same_type'
>      994 |  BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
>          |                    ^~~~~~~~~~~
> >> kernel/bpf/syscall.c:3612:13: note: in expansion of macro 'container_of'
>     3612 |   cg_link = container_of(link, struct bpf_cgroup_link, link);
>          |             ^~~~~~~~~~~~
>    In file included from <command-line>:
> >> include/linux/compiler_types.h:129:35: error: invalid use of undefined type 'struct bpf_cgroup_link'
>      129 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
>          |                                   ^~~~~~~~~~~~~~~~~~
>    include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
>       17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
>          |                                ^~~~~~~~~~~~~~~~~~~
>    include/linux/kernel.h:997:21: note: in expansion of macro 'offsetof'
>      997 |  ((type *)(__mptr - offsetof(type, member))); })
>          |                     ^~~~~~~~
> >> kernel/bpf/syscall.c:3612:13: note: in expansion of macro 'container_of'
>     3612 |   cg_link = container_of(link, struct bpf_cgroup_link, link);
>          |             ^~~~~~~~~~~~
> >> kernel/bpf/syscall.c:3618:9: error: implicit declaration of function 'cgroup_bpf_replace'; did you mean 'cgroup_bpf_offline'? [-Werror=implicit-function-declaration]
>     3618 |   ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
>          |         ^~~~~~~~~~~~~~~~~~
>          |         cgroup_bpf_offline
>    cc1: some warnings being treated as errors
>

Forgot to stub out cgroup_bpf_replace(), will fix in v2.


> vim +994 include/linux/kernel.h
>
> cf14f27f82af78 Alexei Starovoitov 2018-03-28  984
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  985  /**
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  986   * container_of - cast a member of a structure out to the containing structure
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  987   * @ptr:     the pointer to the member.
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  988   * @type:    the type of the container struct this is embedded in.
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  989   * @member:  the name of the member within the struct.
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  990   *
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  991   */
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  992  #define container_of(ptr, type, member) ({                           \
> c7acec713d14c6 Ian Abbott         2017-07-12  993       void *__mptr = (void *)(ptr);                                   \
> c7acec713d14c6 Ian Abbott         2017-07-12 @994       BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&   \
> c7acec713d14c6 Ian Abbott         2017-07-12  995                        !__same_type(*(ptr), void),                    \
> c7acec713d14c6 Ian Abbott         2017-07-12  996                        "pointer type mismatch in container_of()");    \
> c7acec713d14c6 Ian Abbott         2017-07-12  997       ((type *)(__mptr - offsetof(type, member))); })
> ^1da177e4c3f41 Linus Torvalds     2005-04-16  998
>
> :::::: The code at line 994 was first introduced by commit
> :::::: c7acec713d14c6ce8a20154f9dfda258d6bcad3b kernel.h: handle pointers to arrays better in container_of()
>
> :::::: TO: Ian Abbott <abbotti@mev.co.uk>
> :::::: CC: Linus Torvalds <torvalds@linux-foundation.org>
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-20 20:36 ` [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
  2020-03-21  0:33   ` kbuild test robot
@ 2020-03-21  1:28   ` kbuild test robot
  2020-03-23 10:58   ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 22+ messages in thread
From: kbuild test robot @ 2020-03-21  1:28 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: kbuild-all, bpf, netdev, ast, daniel

[-- Attachment #1: Type: text/plain, Size: 5416 bytes --]

Hi Andrii,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]
[cannot apply to bpf/master cgroup/for-next v5.6-rc6 next-20200320]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Add-support-for-cgroup-bpf_link/20200321-045848
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gcc (GCC) 5.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=5.5.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:11:0,
                    from include/linux/list.h:9,
                    from include/linux/timer.h:5,
                    from include/linux/workqueue.h:9,
                    from include/linux/bpf.h:9,
                    from kernel/bpf/syscall.c:4:
   kernel/bpf/syscall.c: In function 'link_update':
   include/linux/kernel.h:994:51: error: dereferencing pointer to incomplete type 'struct bpf_cgroup_link'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
                                                      ^
   include/linux/compiler.h:330:9: note: in definition of macro '__compiletime_assert'
      if (!(condition))     \
            ^
   include/linux/compiler.h:350:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^
   include/linux/kernel.h:994:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
     ^
   include/linux/kernel.h:994:20: note: in expansion of macro '__same_type'
     BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
                       ^
   kernel/bpf/syscall.c:3612:13: note: in expansion of macro 'container_of'
      cg_link = container_of(link, struct bpf_cgroup_link, link);
                ^
   In file included from <command-line>:0:0:
>> kernel/bpf/syscall.c:3612:39: error: invalid use of undefined type 'struct bpf_cgroup_link'
      cg_link = container_of(link, struct bpf_cgroup_link, link);
                                          ^
   include/linux/compiler_types.h:129:54: note: in definition of macro '__compiler_offsetof'
    #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
                                                         ^
   include/linux/kernel.h:997:21: note: in expansion of macro 'offsetof'
     ((type *)(__mptr - offsetof(type, member))); })
                        ^
   kernel/bpf/syscall.c:3612:13: note: in expansion of macro 'container_of'
      cg_link = container_of(link, struct bpf_cgroup_link, link);
                ^
>> kernel/bpf/syscall.c:3618:9: error: implicit declaration of function 'cgroup_bpf_replace' [-Werror=implicit-function-declaration]
      ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
            ^
   cc1: some warnings being treated as errors

vim +3612 kernel/bpf/syscall.c

  3576	
  3577	static int link_update(union bpf_attr *attr)
  3578	{
  3579		struct bpf_prog *old_prog = NULL, *new_prog;
  3580		enum bpf_prog_type ptype;
  3581		struct bpf_link *link;
  3582		u32 flags;
  3583		int ret;
  3584	
  3585		if (CHECK_ATTR(BPF_LINK_UPDATE))
  3586			return -EINVAL;
  3587	
  3588		flags = attr->link_update.flags;
  3589		if (flags & ~BPF_F_REPLACE)
  3590			return -EINVAL;
  3591	
  3592		link = bpf_link_get_from_fd(attr->link_update.link_fd);
  3593		if (IS_ERR(link))
  3594			return PTR_ERR(link);
  3595	
  3596		new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
  3597		if (IS_ERR(new_prog))
  3598			return PTR_ERR(new_prog);
  3599	
  3600		if (flags & BPF_F_REPLACE) {
  3601			old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
  3602			if (IS_ERR(old_prog)) {
  3603				ret = PTR_ERR(old_prog);
  3604				old_prog = NULL;
  3605				goto out_put_progs;
  3606			}
  3607		}
  3608	
  3609		if (link->ops == &bpf_cgroup_link_lops) {
  3610			struct bpf_cgroup_link *cg_link;
  3611	
> 3612			cg_link = container_of(link, struct bpf_cgroup_link, link);
  3613			ptype = attach_type_to_prog_type(cg_link->type);
  3614			if (ptype != new_prog->type) {
  3615				ret = -EINVAL;
  3616				goto out_put_progs;
  3617			}
> 3618			ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
  3619						 old_prog, new_prog);
  3620		} else {
  3621			ret = -EINVAL;
  3622		}
  3623	
  3624	out_put_progs:
  3625		if (old_prog)
  3626			bpf_prog_put(old_prog);
  3627		if (ret)
  3628			bpf_prog_put(new_prog);
  3629		return ret;
  3630	}
  3631	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 20603 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-20 20:36 ` [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
  2020-03-21  0:33   ` kbuild test robot
  2020-03-21  1:28   ` kbuild test robot
@ 2020-03-23 10:58   ` Toke Høiland-Jørgensen
  2020-03-23 17:55     ` Andrii Nakryiko
  2 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 10:58 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Andrii Nakryiko <andriin@fb.com> writes:

> Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
> under given bpf_link. Currently this is only supported for bpf_cgroup_link,
> but will be extended to other kinds of bpf_links in follow-up patches.
>
> For bpf_cgroup_link, implemented functionality matches existing semantics for
> direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
> unconditionally set new bpf_prog regardless of which bpf_prog is currently
> active under given bpf_link, or, optionally, can specify expected active
> bpf_prog. If active bpf_prog doesn't match expected one, operation is a noop
> and returns a failure.

Nit: I'd consider a 'noop' to be something that succeeds, so that last
sentence is a contradiction. Maybe "If active bpf_prog doesn't match
expected one, the kernel will abort the operation and return a failure."?

> cgroup_bpf_replace() operation is resolving race between auto-detachment and
> bpf_prog update in the same fashion as it's done for bpf_link detachment,
> except in this case update has no way of succeeding because of target cgroup
> marked as dying. So in this case error is returned.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  include/linux/bpf-cgroup.h |  4 ++
>  include/uapi/linux/bpf.h   | 12 ++++++
>  kernel/bpf/cgroup.c        | 77 ++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c       | 60 +++++++++++++++++++++++++++++
>  kernel/cgroup/cgroup.c     | 21 +++++++++++
>  5 files changed, 174 insertions(+)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index ab95824a1d99..5735d8bfd69e 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -98,6 +98,8 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>  int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  			struct bpf_cgroup_link *link,
>  			enum bpf_attach_type type);
> +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> +			 struct bpf_prog *new_prog);
>  int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  		       union bpf_attr __user *uattr);
>  
> @@ -108,6 +110,8 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
>  		      u32 flags);
>  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  		      enum bpf_attach_type type);
> +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> +		       struct bpf_prog *old_prog, struct bpf_prog *new_prog);
>  int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>  		     union bpf_attr __user *uattr);
>  
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index fad9f79bb8f1..fa944093f9fc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -112,6 +112,7 @@ enum bpf_cmd {
>  	BPF_MAP_UPDATE_BATCH,
>  	BPF_MAP_DELETE_BATCH,
>  	BPF_LINK_CREATE,
> +	BPF_LINK_UPDATE,
>  };

I feel like there's a BPF_LINK_QUERY operation missing here? Otherwise,
how is userspace supposed to discover which program is currently
attached to a link?

>  enum bpf_map_type {
> @@ -574,6 +575,17 @@ union bpf_attr {
>  		__u32		target_fd;	/* object to attach to */
>  		__u32		attach_type;	/* attach type */
>  	} link_create;
> +
> +	struct { /* struct used by BPF_LINK_UPDATE command */
> +		__u32		link_fd;	/* link fd */
> +		/* new program fd to update link with */
> +		__u32		new_prog_fd;
> +		__u32		flags;		/* extra flags */
> +		/* expected link's program fd; is specified only if
> +		 * BPF_F_REPLACE flag is set in flags */
> +		__u32		old_prog_fd;
> +	} link_update;
> +
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index b960e8633f23..b9f4971336f3 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -501,6 +501,83 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	return err;
>  }
>  
> +/* Swap updated BPF program for given link in effective program arrays across
> + * all descendant cgroups. This function is guaranteed to succeed.
> + */
> +static void replace_effective_prog(struct cgroup *cgrp,
> +				   enum bpf_attach_type type,
> +				   struct bpf_cgroup_link *link)
> +{
> +	struct bpf_prog_array_item *item;
> +	struct cgroup_subsys_state *css;
> +	struct bpf_prog_array *progs;
> +	struct bpf_prog_list *pl;
> +	struct list_head *head;
> +	struct cgroup *cg;
> +	int pos;
> +
> +	css_for_each_descendant_pre(css, &cgrp->self) {
> +		struct cgroup *desc = container_of(css, struct cgroup, self);
> +
> +		if (percpu_ref_is_zero(&desc->bpf.refcnt))
> +			continue;
> +
> +		/* found position of link in effective progs array */
> +		for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> +			if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> +				continue;
> +
> +			head = &cg->bpf.progs[type];
> +			list_for_each_entry(pl, head, node) {
> +				if (!prog_list_prog(pl))
> +					continue;
> +				if (pl->link == link)
> +					goto found;
> +				pos++;
> +			}
> +		}
> +found:
> +		BUG_ON(!cg);
> +		progs = rcu_dereference_protected(
> +				desc->bpf.effective[type],
> +				lockdep_is_held(&cgroup_mutex));
> +		item = &progs->items[pos];
> +		WRITE_ONCE(item->prog, link->link.prog);
> +	}
> +}
> +
> +/**
> + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> + *                          to descendants
> + * @cgrp: The cgroup which descendants to traverse
> + * @link: A link for which to replace BPF program
> + * @type: Type of attach operation
> + *
> + * Must be called with cgroup_mutex held.
> + */
> +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> +			 struct bpf_prog *new_prog)
> +{
> +	struct list_head *progs = &cgrp->bpf.progs[link->type];
> +	struct bpf_prog *old_prog;
> +	struct bpf_prog_list *pl;
> +	bool found = false;
> +
> +	list_for_each_entry(pl, progs, node) {
> +		if (pl->link == link) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	if (!found)
> +		return -ENOENT;
> +
> +	old_prog = xchg(&link->link.prog, new_prog);
> +	replace_effective_prog(cgrp, link->type, link);
> +	bpf_prog_put(old_prog);
> +	return 0;
> +}
> +
>  static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
>  					       struct bpf_prog *prog,
>  					       struct bpf_cgroup_link *link,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index f6e7d32a2632..1ff7aaa2c727 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3572,6 +3572,63 @@ static int link_create(union bpf_attr *attr)
>  	return ret;
>  }
>  
> +#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
> +
> +static int link_update(union bpf_attr *attr)
> +{
> +	struct bpf_prog *old_prog = NULL, *new_prog;
> +	enum bpf_prog_type ptype;
> +	struct bpf_link *link;
> +	u32 flags;
> +	int ret;
> +
> +	if (CHECK_ATTR(BPF_LINK_UPDATE))
> +		return -EINVAL;
> +
> +	flags = attr->link_update.flags;
> +	if (flags & ~BPF_F_REPLACE)
> +		return -EINVAL;
> +
> +	link = bpf_link_get_from_fd(attr->link_update.link_fd);
> +	if (IS_ERR(link))
> +		return PTR_ERR(link);
> +
> +	new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
> +	if (IS_ERR(new_prog))
> +		return PTR_ERR(new_prog);
> +
> +	if (flags & BPF_F_REPLACE) {
> +		old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
> +		if (IS_ERR(old_prog)) {
> +			ret = PTR_ERR(old_prog);
> +			old_prog = NULL;
> +			goto out_put_progs;
> +		}
> +	}

Shouldn't the default be to require an old FD and do atomic update, but
provide a flag (BPF_F_CLOBBER?) to opt-in to unconditional replace
behaviour? Since the unconditional replace is inherently racy I don't
think it should be the default; in fact, I'm not sure if it should be
allowed at all?

> +	if (link->ops == &bpf_cgroup_link_lops) {
> +		struct bpf_cgroup_link *cg_link;
> +
> +		cg_link = container_of(link, struct bpf_cgroup_link, link);
> +		ptype = attach_type_to_prog_type(cg_link->type);
> +		if (ptype != new_prog->type) {
> +			ret = -EINVAL;
> +			goto out_put_progs;
> +		}
> +		ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
> +					 old_prog, new_prog);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +out_put_progs:
> +	if (old_prog)
> +		bpf_prog_put(old_prog);
> +	if (ret)
> +		bpf_prog_put(new_prog);
> +	return ret;
> +}
> +
>  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>  {
>  	union bpf_attr attr = {};
> @@ -3685,6 +3742,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>  	case BPF_LINK_CREATE:
>  		err = link_create(&attr);
>  		break;
> +	case BPF_LINK_UPDATE:
> +		err = link_update(&attr);
> +		break;
>  	default:
>  		err = -EINVAL;
>  		break;
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 219624fba9ba..d4787fccf183 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6317,6 +6317,27 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
>  	return ret;
>  }
>  
> +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> +		       struct bpf_prog *old_prog, struct bpf_prog *new_prog)
> +{
> +	int ret;
> +
> +	mutex_lock(&cgroup_mutex);
> +	/* link might have been auto-released by dying cgroup, so fail */
> +	if (!link->cgroup) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +	if (old_prog && link->link.prog != old_prog) {
> +		ret = -EPERM;
> +		goto out_unlock;
> +	}
> +	ret = __cgroup_bpf_replace(cgrp, link, new_prog);
> +out_unlock:
> +	mutex_unlock(&cgroup_mutex);
> +	return ret;
> +}
> +
>  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>  		      enum bpf_attach_type type)
>  {
> -- 
> 2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment
  2020-03-20 20:36 ` [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
@ 2020-03-23 11:02   ` Toke Høiland-Jørgensen
  2020-03-23 17:58     ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 11:02 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Andrii Nakryiko <andriin@fb.com> writes:

> Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
> create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.
>
> If expected_attach_type is not specified explicitly with
> bpf_program__set_expected_attach_type(), libbpf will try to determine proper
> attach type from BPF program's section definition.
>
> Also add support for bpf_link's underlying BPF program replacement:
>   - unconditional through high-level bpf_link__update_program() API;
>   - cmpxchg-like with specifying expected current BPF program through
>     low-level bpf_link_update() API.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/include/uapi/linux/bpf.h | 12 +++++++++
>  tools/lib/bpf/bpf.c            | 34 +++++++++++++++++++++++++
>  tools/lib/bpf/bpf.h            | 19 ++++++++++++++
>  tools/lib/bpf/libbpf.c         | 46 ++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h         |  8 +++++-
>  tools/lib/bpf/libbpf.map       |  4 +++
>  6 files changed, 122 insertions(+), 1 deletion(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index fad9f79bb8f1..fa944093f9fc 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -112,6 +112,7 @@ enum bpf_cmd {
>  	BPF_MAP_UPDATE_BATCH,
>  	BPF_MAP_DELETE_BATCH,
>  	BPF_LINK_CREATE,
> +	BPF_LINK_UPDATE,
>  };
>  
>  enum bpf_map_type {
> @@ -574,6 +575,17 @@ union bpf_attr {
>  		__u32		target_fd;	/* object to attach to */
>  		__u32		attach_type;	/* attach type */
>  	} link_create;
> +
> +	struct { /* struct used by BPF_LINK_UPDATE command */
> +		__u32		link_fd;	/* link fd */
> +		/* new program fd to update link with */
> +		__u32		new_prog_fd;
> +		__u32		flags;		/* extra flags */
> +		/* expected link's program fd; is specified only if
> +		 * BPF_F_REPLACE flag is set in flags */
> +		__u32		old_prog_fd;
> +	} link_update;
> +
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index c6dafe563176..35c34fc81bd0 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -584,6 +584,40 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
>  	return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
>  }
>  
> +int bpf_link_create(int prog_fd, int target_fd,
> +		    enum bpf_attach_type attach_type,
> +		    const struct bpf_link_create_opts *opts)
> +{
> +	union bpf_attr attr;
> +
> +	if (!OPTS_VALID(opts, bpf_link_create_opts))
> +		return -EINVAL;
> +
> +	memset(&attr, 0, sizeof(attr));
> +	attr.link_create.prog_fd = prog_fd;
> +	attr.link_create.target_fd = target_fd;
> +	attr.link_create.attach_type = attach_type;
> +
> +	return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
> +}
> +
> +int bpf_link_update(int link_fd, int new_prog_fd,
> +		    const struct bpf_link_update_opts *opts)
> +{
> +	union bpf_attr attr;
> +
> +	if (!OPTS_VALID(opts, bpf_link_update_opts))
> +		return -EINVAL;
> +
> +	memset(&attr, 0, sizeof(attr));
> +	attr.link_update.link_fd = link_fd;
> +	attr.link_update.new_prog_fd = new_prog_fd;
> +	attr.link_update.flags = OPTS_GET(opts, flags, 0);
> +	attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
> +
> +	return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
> +}
> +
>  int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
>  		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
>  {
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index b976e77316cc..46d47afdd887 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -168,6 +168,25 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
>  LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
>  				enum bpf_attach_type type);
>  
> +struct bpf_link_create_opts {
> +	size_t sz; /* size of this struct for forward/backward compatibility */
> +};
> +#define bpf_link_create_opts__last_field sz
> +
> +LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
> +			       enum bpf_attach_type attach_type,
> +			       const struct bpf_link_create_opts *opts);
> +
> +struct bpf_link_update_opts {
> +	size_t sz; /* size of this struct for forward/backward compatibility */
> +	__u32 flags;	   /* extra flags */
> +	__u32 old_prog_fd; /* expected old program FD */
> +};
> +#define bpf_link_update_opts__last_field old_prog_fd
> +
> +LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
> +			       const struct bpf_link_update_opts *opts);
> +
>  struct bpf_prog_test_run_attr {
>  	int prog_fd;
>  	int repeat;
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 085e41f9b68e..8b23c70033d3 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -6951,6 +6951,12 @@ struct bpf_link {
>  	bool disconnected;
>  };
>  
> +/* Replace link's underlying BPF program with the new one */
> +int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
> +{
> +	return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
> +}

I would expect bpf_link to keep track of the previous program and
automatically fill it in with this operation. I.e., it should be
possible to do something like:

link = bpf_link__open("/sys/fs/bpf/my_link");
prog = bpf_link__get_prog(link);
new_prog = enhance_prog(prog);
err = bpf_link__update_program(link, new_prog);

and have atomic replacement "just work". This obviously implies that
bpf_link__open() should use that BPF_LINK_QUERY operation I was
requesting in my comment to the previous patch :)

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-23 10:58   ` Toke Høiland-Jørgensen
@ 2020-03-23 17:55     ` Andrii Nakryiko
  2020-03-23 19:29       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-23 17:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Mar 23, 2020 at 3:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> writes:
>
> > Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
> > under given bpf_link. Currently this is only supported for bpf_cgroup_link,
> > but will be extended to other kinds of bpf_links in follow-up patches.
> >
> > For bpf_cgroup_link, implemented functionality matches existing semantics for
> > direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
> > unconditionally set new bpf_prog regardless of which bpf_prog is currently
> > active under given bpf_link, or, optionally, can specify expected active
> > bpf_prog. If active bpf_prog doesn't match expected one, operation is a noop
> > and returns a failure.
>
> Nit: I'd consider a 'noop' to be something that succeeds, so that last
> sentence is a contradiction. Maybe "If active bpf_prog doesn't match
> expected one, the kernel will abort the operation and return a failure."?

Heh, for me "noop" (no operation) means no changes done, it doesn't
mean that syscall itself is successful. But I'll change the wording to
be less confusing.

>
> > cgroup_bpf_replace() operation is resolving race between auto-detachment and
> > bpf_prog update in the same fashion as it's done for bpf_link detachment,
> > except in this case update has no way of succeeding because of target cgroup
> > marked as dying. So in this case error is returned.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  include/linux/bpf-cgroup.h |  4 ++
> >  include/uapi/linux/bpf.h   | 12 ++++++
> >  kernel/bpf/cgroup.c        | 77 ++++++++++++++++++++++++++++++++++++++
> >  kernel/bpf/syscall.c       | 60 +++++++++++++++++++++++++++++
> >  kernel/cgroup/cgroup.c     | 21 +++++++++++
> >  5 files changed, 174 insertions(+)
> >
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index ab95824a1d99..5735d8bfd69e 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -98,6 +98,8 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> >  int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >                       struct bpf_cgroup_link *link,
> >                       enum bpf_attach_type type);
> > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > +                      struct bpf_prog *new_prog);
> >  int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> >                      union bpf_attr __user *uattr);
> >
> > @@ -108,6 +110,8 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
> >                     u32 flags);
> >  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >                     enum bpf_attach_type type);
> > +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > +                    struct bpf_prog *old_prog, struct bpf_prog *new_prog);
> >  int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
> >                    union bpf_attr __user *uattr);
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index fad9f79bb8f1..fa944093f9fc 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -112,6 +112,7 @@ enum bpf_cmd {
> >       BPF_MAP_UPDATE_BATCH,
> >       BPF_MAP_DELETE_BATCH,
> >       BPF_LINK_CREATE,
> > +     BPF_LINK_UPDATE,
> >  };
>
> I feel like there's a BPF_LINK_QUERY operation missing here? Otherwise,
> how is userspace supposed to discover which program is currently
> attached to a link?

Probably, but it should return not just attached BPF program, but also
whatever target it is attached to (e.g., cgroup, ifindex, perf event
fd, etc). But we'll need to design it properly, so I didn't do
implementation yet.

>
> >  enum bpf_map_type {
> > @@ -574,6 +575,17 @@ union bpf_attr {
> >               __u32           target_fd;      /* object to attach to */
> >               __u32           attach_type;    /* attach type */
> >       } link_create;
> > +
> > +     struct { /* struct used by BPF_LINK_UPDATE command */
> > +             __u32           link_fd;        /* link fd */
> > +             /* new program fd to update link with */
> > +             __u32           new_prog_fd;
> > +             __u32           flags;          /* extra flags */
> > +             /* expected link's program fd; is specified only if
> > +              * BPF_F_REPLACE flag is set in flags */
> > +             __u32           old_prog_fd;
> > +     } link_update;
> > +
> >  } __attribute__((aligned(8)));
> >
> >  /* The description below is an attempt at providing documentation to eBPF
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index b960e8633f23..b9f4971336f3 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -501,6 +501,83 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       return err;
> >  }
> >
> > +/* Swap updated BPF program for given link in effective program arrays across
> > + * all descendant cgroups. This function is guaranteed to succeed.
> > + */
> > +static void replace_effective_prog(struct cgroup *cgrp,
> > +                                enum bpf_attach_type type,
> > +                                struct bpf_cgroup_link *link)
> > +{
> > +     struct bpf_prog_array_item *item;
> > +     struct cgroup_subsys_state *css;
> > +     struct bpf_prog_array *progs;
> > +     struct bpf_prog_list *pl;
> > +     struct list_head *head;
> > +     struct cgroup *cg;
> > +     int pos;
> > +
> > +     css_for_each_descendant_pre(css, &cgrp->self) {
> > +             struct cgroup *desc = container_of(css, struct cgroup, self);
> > +
> > +             if (percpu_ref_is_zero(&desc->bpf.refcnt))
> > +                     continue;
> > +
> > +             /* found position of link in effective progs array */
> > +             for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> > +                     if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> > +                             continue;
> > +
> > +                     head = &cg->bpf.progs[type];
> > +                     list_for_each_entry(pl, head, node) {
> > +                             if (!prog_list_prog(pl))
> > +                                     continue;
> > +                             if (pl->link == link)
> > +                                     goto found;
> > +                             pos++;
> > +                     }
> > +             }
> > +found:
> > +             BUG_ON(!cg);
> > +             progs = rcu_dereference_protected(
> > +                             desc->bpf.effective[type],
> > +                             lockdep_is_held(&cgroup_mutex));
> > +             item = &progs->items[pos];
> > +             WRITE_ONCE(item->prog, link->link.prog);
> > +     }
> > +}
> > +
> > +/**
> > + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> > + *                          to descendants
> > + * @cgrp: The cgroup which descendants to traverse
> > + * @link: A link for which to replace BPF program
> > + * @type: Type of attach operation
> > + *
> > + * Must be called with cgroup_mutex held.
> > + */
> > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > +                      struct bpf_prog *new_prog)
> > +{
> > +     struct list_head *progs = &cgrp->bpf.progs[link->type];
> > +     struct bpf_prog *old_prog;
> > +     struct bpf_prog_list *pl;
> > +     bool found = false;
> > +
> > +     list_for_each_entry(pl, progs, node) {
> > +             if (pl->link == link) {
> > +                     found = true;
> > +                     break;
> > +             }
> > +     }
> > +     if (!found)
> > +             return -ENOENT;
> > +
> > +     old_prog = xchg(&link->link.prog, new_prog);
> > +     replace_effective_prog(cgrp, link->type, link);
> > +     bpf_prog_put(old_prog);
> > +     return 0;
> > +}
> > +
> >  static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
> >                                              struct bpf_prog *prog,
> >                                              struct bpf_cgroup_link *link,
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index f6e7d32a2632..1ff7aaa2c727 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3572,6 +3572,63 @@ static int link_create(union bpf_attr *attr)
> >       return ret;
> >  }
> >
> > +#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
> > +
> > +static int link_update(union bpf_attr *attr)
> > +{
> > +     struct bpf_prog *old_prog = NULL, *new_prog;
> > +     enum bpf_prog_type ptype;
> > +     struct bpf_link *link;
> > +     u32 flags;
> > +     int ret;
> > +
> > +     if (CHECK_ATTR(BPF_LINK_UPDATE))
> > +             return -EINVAL;
> > +
> > +     flags = attr->link_update.flags;
> > +     if (flags & ~BPF_F_REPLACE)
> > +             return -EINVAL;
> > +
> > +     link = bpf_link_get_from_fd(attr->link_update.link_fd);
> > +     if (IS_ERR(link))
> > +             return PTR_ERR(link);
> > +
> > +     new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
> > +     if (IS_ERR(new_prog))
> > +             return PTR_ERR(new_prog);
> > +
> > +     if (flags & BPF_F_REPLACE) {
> > +             old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
> > +             if (IS_ERR(old_prog)) {
> > +                     ret = PTR_ERR(old_prog);
> > +                     old_prog = NULL;
> > +                     goto out_put_progs;
> > +             }
> > +     }
>
> Shouldn't the default be to require an old FD and do atomic update, but
> provide a flag (BPF_F_CLOBBER?) to opt-in to unconditional replace
> behaviour? Since the unconditional replace is inherently racy I don't
> think it should be the default; in fact, I'm not sure if it should be
> allowed at all?

I don't feel strongly either way, but I expect majority of use cases
to use non-pinned bpf_link with just FD open by an application, where
application knows that no one else can update program from under link.
In which case setting new BPF program won't be racy.

>
> > +     if (link->ops == &bpf_cgroup_link_lops) {
> > +             struct bpf_cgroup_link *cg_link;
> > +
> > +             cg_link = container_of(link, struct bpf_cgroup_link, link);
> > +             ptype = attach_type_to_prog_type(cg_link->type);
> > +             if (ptype != new_prog->type) {
> > +                     ret = -EINVAL;
> > +                     goto out_put_progs;
> > +             }
> > +             ret = cgroup_bpf_replace(cg_link->cgroup, cg_link,
> > +                                      old_prog, new_prog);
> > +     } else {
> > +             ret = -EINVAL;
> > +     }
> > +
> > +out_put_progs:
> > +     if (old_prog)
> > +             bpf_prog_put(old_prog);
> > +     if (ret)
> > +             bpf_prog_put(new_prog);
> > +     return ret;
> > +}
> > +
> >  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
> >  {
> >       union bpf_attr attr = {};
> > @@ -3685,6 +3742,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> >       case BPF_LINK_CREATE:
> >               err = link_create(&attr);
> >               break;
> > +     case BPF_LINK_UPDATE:
> > +             err = link_update(&attr);
> > +             break;
> >       default:
> >               err = -EINVAL;
> >               break;
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index 219624fba9ba..d4787fccf183 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -6317,6 +6317,27 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
> >       return ret;
> >  }
> >
> > +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > +                    struct bpf_prog *old_prog, struct bpf_prog *new_prog)
> > +{
> > +     int ret;
> > +
> > +     mutex_lock(&cgroup_mutex);
> > +     /* link might have been auto-released by dying cgroup, so fail */
> > +     if (!link->cgroup) {
> > +             ret = -EINVAL;
> > +             goto out_unlock;
> > +     }
> > +     if (old_prog && link->link.prog != old_prog) {
> > +             ret = -EPERM;
> > +             goto out_unlock;
> > +     }
> > +     ret = __cgroup_bpf_replace(cgrp, link, new_prog);
> > +out_unlock:
> > +     mutex_unlock(&cgroup_mutex);
> > +     return ret;
> > +}
> > +
> >  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
> >                     enum bpf_attach_type type)
> >  {
> > --
> > 2.17.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment
  2020-03-23 11:02   ` Toke Høiland-Jørgensen
@ 2020-03-23 17:58     ` Andrii Nakryiko
  2020-03-23 19:31       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-23 17:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Mar 23, 2020 at 4:02 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> writes:
>
> > Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
> > create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.
> >
> > If expected_attach_type is not specified explicitly with
> > bpf_program__set_expected_attach_type(), libbpf will try to determine proper
> > attach type from BPF program's section definition.
> >
> > Also add support for bpf_link's underlying BPF program replacement:
> >   - unconditional through high-level bpf_link__update_program() API;
> >   - cmpxchg-like with specifying expected current BPF program through
> >     low-level bpf_link_update() API.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  tools/include/uapi/linux/bpf.h | 12 +++++++++
> >  tools/lib/bpf/bpf.c            | 34 +++++++++++++++++++++++++
> >  tools/lib/bpf/bpf.h            | 19 ++++++++++++++
> >  tools/lib/bpf/libbpf.c         | 46 ++++++++++++++++++++++++++++++++++
> >  tools/lib/bpf/libbpf.h         |  8 +++++-
> >  tools/lib/bpf/libbpf.map       |  4 +++
> >  6 files changed, 122 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index fad9f79bb8f1..fa944093f9fc 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -112,6 +112,7 @@ enum bpf_cmd {
> >       BPF_MAP_UPDATE_BATCH,
> >       BPF_MAP_DELETE_BATCH,
> >       BPF_LINK_CREATE,
> > +     BPF_LINK_UPDATE,
> >  };
> >
> >  enum bpf_map_type {
> > @@ -574,6 +575,17 @@ union bpf_attr {
> >               __u32           target_fd;      /* object to attach to */
> >               __u32           attach_type;    /* attach type */
> >       } link_create;
> > +
> > +     struct { /* struct used by BPF_LINK_UPDATE command */
> > +             __u32           link_fd;        /* link fd */
> > +             /* new program fd to update link with */
> > +             __u32           new_prog_fd;
> > +             __u32           flags;          /* extra flags */
> > +             /* expected link's program fd; is specified only if
> > +              * BPF_F_REPLACE flag is set in flags */
> > +             __u32           old_prog_fd;
> > +     } link_update;
> > +
> >  } __attribute__((aligned(8)));
> >
> >  /* The description below is an attempt at providing documentation to eBPF
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index c6dafe563176..35c34fc81bd0 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -584,6 +584,40 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
> >       return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
> >  }
> >
> > +int bpf_link_create(int prog_fd, int target_fd,
> > +                 enum bpf_attach_type attach_type,
> > +                 const struct bpf_link_create_opts *opts)
> > +{
> > +     union bpf_attr attr;
> > +
> > +     if (!OPTS_VALID(opts, bpf_link_create_opts))
> > +             return -EINVAL;
> > +
> > +     memset(&attr, 0, sizeof(attr));
> > +     attr.link_create.prog_fd = prog_fd;
> > +     attr.link_create.target_fd = target_fd;
> > +     attr.link_create.attach_type = attach_type;
> > +
> > +     return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
> > +}
> > +
> > +int bpf_link_update(int link_fd, int new_prog_fd,
> > +                 const struct bpf_link_update_opts *opts)
> > +{
> > +     union bpf_attr attr;
> > +
> > +     if (!OPTS_VALID(opts, bpf_link_update_opts))
> > +             return -EINVAL;
> > +
> > +     memset(&attr, 0, sizeof(attr));
> > +     attr.link_update.link_fd = link_fd;
> > +     attr.link_update.new_prog_fd = new_prog_fd;
> > +     attr.link_update.flags = OPTS_GET(opts, flags, 0);
> > +     attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
> > +
> > +     return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
> > +}
> > +
> >  int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
> >                  __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
> >  {
> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> > index b976e77316cc..46d47afdd887 100644
> > --- a/tools/lib/bpf/bpf.h
> > +++ b/tools/lib/bpf/bpf.h
> > @@ -168,6 +168,25 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
> >  LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
> >                               enum bpf_attach_type type);
> >
> > +struct bpf_link_create_opts {
> > +     size_t sz; /* size of this struct for forward/backward compatibility */
> > +};
> > +#define bpf_link_create_opts__last_field sz
> > +
> > +LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
> > +                            enum bpf_attach_type attach_type,
> > +                            const struct bpf_link_create_opts *opts);
> > +
> > +struct bpf_link_update_opts {
> > +     size_t sz; /* size of this struct for forward/backward compatibility */
> > +     __u32 flags;       /* extra flags */
> > +     __u32 old_prog_fd; /* expected old program FD */
> > +};
> > +#define bpf_link_update_opts__last_field old_prog_fd
> > +
> > +LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
> > +                            const struct bpf_link_update_opts *opts);
> > +
> >  struct bpf_prog_test_run_attr {
> >       int prog_fd;
> >       int repeat;
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 085e41f9b68e..8b23c70033d3 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -6951,6 +6951,12 @@ struct bpf_link {
> >       bool disconnected;
> >  };
> >
> > +/* Replace link's underlying BPF program with the new one */
> > +int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
> > +{
> > +     return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
> > +}
>
> I would expect bpf_link to keep track of the previous program and
> automatically fill it in with this operation. I.e., it should be
> possible to do something like:
>
> link = bpf_link__open("/sys/fs/bpf/my_link");
> prog = bpf_link__get_prog(link);

I don't think libbpf is able to construct struct bpf_program from link
info. It can get program FD, of course, but struct bpf_program is much
more than that and not sure kernel has all the necessary info. Some
parts of bpf_program is coming from ELF file, which is gone by this
time.

> new_prog = enhance_prog(prog);
> err = bpf_link__update_program(link, new_prog);
>
> and have atomic replacement "just work". This obviously implies that
> bpf_link__open() should use that BPF_LINK_QUERY operation I was
> requesting in my comment to the previous patch :)

This will depend on which way we go with mandatory/default expected
program FD vs optional.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Potential Spoof] [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-21  0:19     ` Andrii Nakryiko
@ 2020-03-23 18:03       ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-23 18:03 UTC (permalink / raw)
  To: Andrey Ignatov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Mar 20, 2020 at 5:19 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Mar 20, 2020 at 4:46 PM Andrey Ignatov <rdna@fb.com> wrote:
> >
> > Andrii Nakryiko <andriin@fb.com> [Fri, 2020-03-20 13:37 -0700]:
> > > Implement new sub-command to attach cgroup BPF programs and return FD-based
> > > bpf_link back on success. bpf_link, once attached to cgroup, cannot be
> > > replaced, except by owner having its FD. cgroup bpf_link has semantics of
> > > BPF_F_ALLOW_MULTI BPF program attachments and can co-exist with
> >
> > Hi Andrii,
> >
> > Is there any reason to limit it to only BPF_F_ALLOW_MULTI?
> >
>
> No technical reason, just felt like the good default behavior. It's
> possible to support all the same flags as with legacy BPF program
> attachment, but see below...

So I went ahead and just added support for all the modes, it's very
minor change in kernel itself, but needs a lot more selftesting logic,
given all the new modes. Once I finish with that, I'll post v2 that
also fixes build with !CONFIG_CGROUP_BPF. We can continue discussing
orthogonal inheritance policies independently, if there is any
intereset.

>
> > The thing is BPF_F_ALLOW_MULTI not only allows to attach multiple
> > programs to specified cgroup but also controls what programs can later
> > be attached to a sub-cgroup, and in BPF_F_ALLOW_MULTI case both
> > sub-cgroup programs and specified cgroup programs will be executed (in
> > this order).
> >
> > There many use-cases though when it's desired to either completely
> > disallow attaching programs to a sub-cgroup or override parent cgroup's
> > program behavior in sub-cgroup. If bpf_link covers only
> > BPF_F_ALLOW_MULTI, those scenarios won't be able to leverage it.
> >
> > This double-purpose of BPF_F_ALLOW_MULTI is a pain ... For example if
>
> Yeah, exactly. I don't know historical reasons for why it was done the
> way it was done (i.e., BPF_F_ALLOW_MULTI and BPF_F_ALLOW_OVERRIDE
> flags), so maybe someone can give some insights there. But I wonder if
> inheritance policy should be orthogonal to a single vs multiple
> bpf_progs limit for a given cgroup. They could be specified on
> per-cgroup level, not per-BPF program (and then making sure flags
> match). That way it would be possible to express more nuanced
> policies, like allowing multiple programs for a root cgroup, but
> disallowing attach new BPF programs for any child cgroup, etc.
>
> Would be good to get some more perspectives on this...
>
> > one wants to attach multiple programs to a cgroup but disallow attaching
> > programs to a sub-cgroup it's currently impossible (well, w/o additional
> > cgroup level just for this).
> >
> > > non-bpf_link-based BPF cgroup attachments.
> > >
> > > To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
> > > BPF program can be executed, implement auto-detachment of link. When
> > > cgroup_bpf_release() is called, all attached bpf_links are forced to release
> > > cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
> > > well as still owning underlying bpf_prog. This is because user-space might
> > > still have FDs open and active, so bpf_link as a user-referenced object can't
> > > be freed yet. Once last active FD is closed, bpf_link will be freed and
> > > underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
> > > touched, because cgroup is released already.
> > >
> > > The inherent race between bpf_cgroup_link release (from closing last FD) and
> > > cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
> > > the only additional check required is when bpf_cgroup_link attempts to detach
> > > itself from cgroup. At that time we need to check whether there is still
> > > cgroup associated with that link. And if not, exit with success, because
> > > bpf_cgroup_link was already successfully detached.
> > >
> > > Acked-by: Roman Gushchin <guro@fb.com>
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > ---
> > >  include/linux/bpf-cgroup.h     |  27 ++-
> > >  include/linux/bpf.h            |  10 +-
> > >  include/uapi/linux/bpf.h       |   9 +-
> > >  kernel/bpf/cgroup.c            | 313 +++++++++++++++++++++++++--------
> > >  kernel/bpf/syscall.c           |  62 +++++--
> > >  kernel/cgroup/cgroup.c         |  14 +-
> > >  tools/include/uapi/linux/bpf.h |   9 +-
> > >  7 files changed, 345 insertions(+), 99 deletions(-)
> > >
>
> [...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-23 17:55     ` Andrii Nakryiko
@ 2020-03-23 19:29       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 19:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Mar 23, 2020 at 3:58 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andriin@fb.com> writes:
>>
>> > Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
>> > under given bpf_link. Currently this is only supported for bpf_cgroup_link,
>> > but will be extended to other kinds of bpf_links in follow-up patches.
>> >
>> > For bpf_cgroup_link, implemented functionality matches existing semantics for
>> > direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
>> > unconditionally set new bpf_prog regardless of which bpf_prog is currently
>> > active under given bpf_link, or, optionally, can specify expected active
>> > bpf_prog. If active bpf_prog doesn't match expected one, operation is a noop
>> > and returns a failure.
>>
>> Nit: I'd consider a 'noop' to be something that succeeds, so that last
>> sentence is a contradiction. Maybe "If active bpf_prog doesn't match
>> expected one, the kernel will abort the operation and return a failure."?
>
> Heh, for me "noop" (no operation) means no changes done, it doesn't
> mean that syscall itself is successful. But I'll change the wording to
> be less confusing.

Cool.

>>
>> > cgroup_bpf_replace() operation is resolving race between auto-detachment and
>> > bpf_prog update in the same fashion as it's done for bpf_link detachment,
>> > except in this case update has no way of succeeding because of target cgroup
>> > marked as dying. So in this case error is returned.
>> >
>> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> > ---
>> >  include/linux/bpf-cgroup.h |  4 ++
>> >  include/uapi/linux/bpf.h   | 12 ++++++
>> >  kernel/bpf/cgroup.c        | 77 ++++++++++++++++++++++++++++++++++++++
>> >  kernel/bpf/syscall.c       | 60 +++++++++++++++++++++++++++++
>> >  kernel/cgroup/cgroup.c     | 21 +++++++++++
>> >  5 files changed, 174 insertions(+)
>> >
>> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
>> > index ab95824a1d99..5735d8bfd69e 100644
>> > --- a/include/linux/bpf-cgroup.h
>> > +++ b/include/linux/bpf-cgroup.h
>> > @@ -98,6 +98,8 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>> >  int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>> >                       struct bpf_cgroup_link *link,
>> >                       enum bpf_attach_type type);
>> > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
>> > +                      struct bpf_prog *new_prog);
>> >  int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>> >                      union bpf_attr __user *uattr);
>> >
>> > @@ -108,6 +110,8 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
>> >                     u32 flags);
>> >  int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
>> >                     enum bpf_attach_type type);
>> > +int cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
>> > +                    struct bpf_prog *old_prog, struct bpf_prog *new_prog);
>> >  int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>> >                    union bpf_attr __user *uattr);
>> >
>> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> > index fad9f79bb8f1..fa944093f9fc 100644
>> > --- a/include/uapi/linux/bpf.h
>> > +++ b/include/uapi/linux/bpf.h
>> > @@ -112,6 +112,7 @@ enum bpf_cmd {
>> >       BPF_MAP_UPDATE_BATCH,
>> >       BPF_MAP_DELETE_BATCH,
>> >       BPF_LINK_CREATE,
>> > +     BPF_LINK_UPDATE,
>> >  };
>>
>> I feel like there's a BPF_LINK_QUERY operation missing here? Otherwise,
>> how is userspace supposed to discover which program is currently
>> attached to a link?
>
> Probably, but it should return not just attached BPF program, but also
> whatever target it is attached to (e.g., cgroup, ifindex, perf event
> fd, etc).

Yes, sure.

> But we'll need to design it properly, so I didn't do implementation
> yet.

Right, OK.

>> >  enum bpf_map_type {
>> > @@ -574,6 +575,17 @@ union bpf_attr {
>> >               __u32           target_fd;      /* object to attach to */
>> >               __u32           attach_type;    /* attach type */
>> >       } link_create;
>> > +
>> > +     struct { /* struct used by BPF_LINK_UPDATE command */
>> > +             __u32           link_fd;        /* link fd */
>> > +             /* new program fd to update link with */
>> > +             __u32           new_prog_fd;
>> > +             __u32           flags;          /* extra flags */
>> > +             /* expected link's program fd; is specified only if
>> > +              * BPF_F_REPLACE flag is set in flags */
>> > +             __u32           old_prog_fd;
>> > +     } link_update;
>> > +
>> >  } __attribute__((aligned(8)));
>> >
>> >  /* The description below is an attempt at providing documentation to eBPF
>> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> > index b960e8633f23..b9f4971336f3 100644
>> > --- a/kernel/bpf/cgroup.c
>> > +++ b/kernel/bpf/cgroup.c
>> > @@ -501,6 +501,83 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>> >       return err;
>> >  }
>> >
>> > +/* Swap updated BPF program for given link in effective program arrays across
>> > + * all descendant cgroups. This function is guaranteed to succeed.
>> > + */
>> > +static void replace_effective_prog(struct cgroup *cgrp,
>> > +                                enum bpf_attach_type type,
>> > +                                struct bpf_cgroup_link *link)
>> > +{
>> > +     struct bpf_prog_array_item *item;
>> > +     struct cgroup_subsys_state *css;
>> > +     struct bpf_prog_array *progs;
>> > +     struct bpf_prog_list *pl;
>> > +     struct list_head *head;
>> > +     struct cgroup *cg;
>> > +     int pos;
>> > +
>> > +     css_for_each_descendant_pre(css, &cgrp->self) {
>> > +             struct cgroup *desc = container_of(css, struct cgroup, self);
>> > +
>> > +             if (percpu_ref_is_zero(&desc->bpf.refcnt))
>> > +                     continue;
>> > +
>> > +             /* found position of link in effective progs array */
>> > +             for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
>> > +                     if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
>> > +                             continue;
>> > +
>> > +                     head = &cg->bpf.progs[type];
>> > +                     list_for_each_entry(pl, head, node) {
>> > +                             if (!prog_list_prog(pl))
>> > +                                     continue;
>> > +                             if (pl->link == link)
>> > +                                     goto found;
>> > +                             pos++;
>> > +                     }
>> > +             }
>> > +found:
>> > +             BUG_ON(!cg);
>> > +             progs = rcu_dereference_protected(
>> > +                             desc->bpf.effective[type],
>> > +                             lockdep_is_held(&cgroup_mutex));
>> > +             item = &progs->items[pos];
>> > +             WRITE_ONCE(item->prog, link->link.prog);
>> > +     }
>> > +}
>> > +
>> > +/**
>> > + * __cgroup_bpf_replace() - Replace link's program and propagate the change
>> > + *                          to descendants
>> > + * @cgrp: The cgroup which descendants to traverse
>> > + * @link: A link for which to replace BPF program
>> > + * @type: Type of attach operation
>> > + *
>> > + * Must be called with cgroup_mutex held.
>> > + */
>> > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
>> > +                      struct bpf_prog *new_prog)
>> > +{
>> > +     struct list_head *progs = &cgrp->bpf.progs[link->type];
>> > +     struct bpf_prog *old_prog;
>> > +     struct bpf_prog_list *pl;
>> > +     bool found = false;
>> > +
>> > +     list_for_each_entry(pl, progs, node) {
>> > +             if (pl->link == link) {
>> > +                     found = true;
>> > +                     break;
>> > +             }
>> > +     }
>> > +     if (!found)
>> > +             return -ENOENT;
>> > +
>> > +     old_prog = xchg(&link->link.prog, new_prog);
>> > +     replace_effective_prog(cgrp, link->type, link);
>> > +     bpf_prog_put(old_prog);
>> > +     return 0;
>> > +}
>> > +
>> >  static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
>> >                                              struct bpf_prog *prog,
>> >                                              struct bpf_cgroup_link *link,
>> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> > index f6e7d32a2632..1ff7aaa2c727 100644
>> > --- a/kernel/bpf/syscall.c
>> > +++ b/kernel/bpf/syscall.c
>> > @@ -3572,6 +3572,63 @@ static int link_create(union bpf_attr *attr)
>> >       return ret;
>> >  }
>> >
>> > +#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
>> > +
>> > +static int link_update(union bpf_attr *attr)
>> > +{
>> > +     struct bpf_prog *old_prog = NULL, *new_prog;
>> > +     enum bpf_prog_type ptype;
>> > +     struct bpf_link *link;
>> > +     u32 flags;
>> > +     int ret;
>> > +
>> > +     if (CHECK_ATTR(BPF_LINK_UPDATE))
>> > +             return -EINVAL;
>> > +
>> > +     flags = attr->link_update.flags;
>> > +     if (flags & ~BPF_F_REPLACE)
>> > +             return -EINVAL;
>> > +
>> > +     link = bpf_link_get_from_fd(attr->link_update.link_fd);
>> > +     if (IS_ERR(link))
>> > +             return PTR_ERR(link);
>> > +
>> > +     new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
>> > +     if (IS_ERR(new_prog))
>> > +             return PTR_ERR(new_prog);
>> > +
>> > +     if (flags & BPF_F_REPLACE) {
>> > +             old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
>> > +             if (IS_ERR(old_prog)) {
>> > +                     ret = PTR_ERR(old_prog);
>> > +                     old_prog = NULL;
>> > +                     goto out_put_progs;
>> > +             }
>> > +     }
>>
>> Shouldn't the default be to require an old FD and do atomic update, but
>> provide a flag (BPF_F_CLOBBER?) to opt-in to unconditional replace
>> behaviour? Since the unconditional replace is inherently racy I don't
>> think it should be the default; in fact, I'm not sure if it should be
>> allowed at all?
>
> I don't feel strongly either way, but I expect majority of use cases
> to use non-pinned bpf_link with just FD open by an application, where
> application knows that no one else can update program from under link.
> In which case setting new BPF program won't be racy.

Ah. As you've probably noticed, my mental model is somewhat biased
towards utilities that exit after invocation, so didn't think about that :)

Still, even in the case of such a long-running application, it would
still know which fd was already attached, so it could still supply it,
even though there's no possibility of a race. Especially if we abstract
that behind the bpf_link data structure in libbpf.

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment
  2020-03-23 17:58     ` Andrii Nakryiko
@ 2020-03-23 19:31       ` Toke Høiland-Jørgensen
  2020-03-24 23:05         ` Andrii Nakryiko
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-03-23 19:31 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Mar 23, 2020 at 4:02 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andriin@fb.com> writes:
>>
>> > Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
>> > create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.
>> >
>> > If expected_attach_type is not specified explicitly with
>> > bpf_program__set_expected_attach_type(), libbpf will try to determine proper
>> > attach type from BPF program's section definition.
>> >
>> > Also add support for bpf_link's underlying BPF program replacement:
>> >   - unconditional through high-level bpf_link__update_program() API;
>> >   - cmpxchg-like with specifying expected current BPF program through
>> >     low-level bpf_link_update() API.
>> >
>> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> > ---
>> >  tools/include/uapi/linux/bpf.h | 12 +++++++++
>> >  tools/lib/bpf/bpf.c            | 34 +++++++++++++++++++++++++
>> >  tools/lib/bpf/bpf.h            | 19 ++++++++++++++
>> >  tools/lib/bpf/libbpf.c         | 46 ++++++++++++++++++++++++++++++++++
>> >  tools/lib/bpf/libbpf.h         |  8 +++++-
>> >  tools/lib/bpf/libbpf.map       |  4 +++
>> >  6 files changed, 122 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> > index fad9f79bb8f1..fa944093f9fc 100644
>> > --- a/tools/include/uapi/linux/bpf.h
>> > +++ b/tools/include/uapi/linux/bpf.h
>> > @@ -112,6 +112,7 @@ enum bpf_cmd {
>> >       BPF_MAP_UPDATE_BATCH,
>> >       BPF_MAP_DELETE_BATCH,
>> >       BPF_LINK_CREATE,
>> > +     BPF_LINK_UPDATE,
>> >  };
>> >
>> >  enum bpf_map_type {
>> > @@ -574,6 +575,17 @@ union bpf_attr {
>> >               __u32           target_fd;      /* object to attach to */
>> >               __u32           attach_type;    /* attach type */
>> >       } link_create;
>> > +
>> > +     struct { /* struct used by BPF_LINK_UPDATE command */
>> > +             __u32           link_fd;        /* link fd */
>> > +             /* new program fd to update link with */
>> > +             __u32           new_prog_fd;
>> > +             __u32           flags;          /* extra flags */
>> > +             /* expected link's program fd; is specified only if
>> > +              * BPF_F_REPLACE flag is set in flags */
>> > +             __u32           old_prog_fd;
>> > +     } link_update;
>> > +
>> >  } __attribute__((aligned(8)));
>> >
>> >  /* The description below is an attempt at providing documentation to eBPF
>> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> > index c6dafe563176..35c34fc81bd0 100644
>> > --- a/tools/lib/bpf/bpf.c
>> > +++ b/tools/lib/bpf/bpf.c
>> > @@ -584,6 +584,40 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
>> >       return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
>> >  }
>> >
>> > +int bpf_link_create(int prog_fd, int target_fd,
>> > +                 enum bpf_attach_type attach_type,
>> > +                 const struct bpf_link_create_opts *opts)
>> > +{
>> > +     union bpf_attr attr;
>> > +
>> > +     if (!OPTS_VALID(opts, bpf_link_create_opts))
>> > +             return -EINVAL;
>> > +
>> > +     memset(&attr, 0, sizeof(attr));
>> > +     attr.link_create.prog_fd = prog_fd;
>> > +     attr.link_create.target_fd = target_fd;
>> > +     attr.link_create.attach_type = attach_type;
>> > +
>> > +     return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
>> > +}
>> > +
>> > +int bpf_link_update(int link_fd, int new_prog_fd,
>> > +                 const struct bpf_link_update_opts *opts)
>> > +{
>> > +     union bpf_attr attr;
>> > +
>> > +     if (!OPTS_VALID(opts, bpf_link_update_opts))
>> > +             return -EINVAL;
>> > +
>> > +     memset(&attr, 0, sizeof(attr));
>> > +     attr.link_update.link_fd = link_fd;
>> > +     attr.link_update.new_prog_fd = new_prog_fd;
>> > +     attr.link_update.flags = OPTS_GET(opts, flags, 0);
>> > +     attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
>> > +
>> > +     return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
>> > +}
>> > +
>> >  int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
>> >                  __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
>> >  {
>> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> > index b976e77316cc..46d47afdd887 100644
>> > --- a/tools/lib/bpf/bpf.h
>> > +++ b/tools/lib/bpf/bpf.h
>> > @@ -168,6 +168,25 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
>> >  LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
>> >                               enum bpf_attach_type type);
>> >
>> > +struct bpf_link_create_opts {
>> > +     size_t sz; /* size of this struct for forward/backward compatibility */
>> > +};
>> > +#define bpf_link_create_opts__last_field sz
>> > +
>> > +LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
>> > +                            enum bpf_attach_type attach_type,
>> > +                            const struct bpf_link_create_opts *opts);
>> > +
>> > +struct bpf_link_update_opts {
>> > +     size_t sz; /* size of this struct for forward/backward compatibility */
>> > +     __u32 flags;       /* extra flags */
>> > +     __u32 old_prog_fd; /* expected old program FD */
>> > +};
>> > +#define bpf_link_update_opts__last_field old_prog_fd
>> > +
>> > +LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
>> > +                            const struct bpf_link_update_opts *opts);
>> > +
>> >  struct bpf_prog_test_run_attr {
>> >       int prog_fd;
>> >       int repeat;
>> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> > index 085e41f9b68e..8b23c70033d3 100644
>> > --- a/tools/lib/bpf/libbpf.c
>> > +++ b/tools/lib/bpf/libbpf.c
>> > @@ -6951,6 +6951,12 @@ struct bpf_link {
>> >       bool disconnected;
>> >  };
>> >
>> > +/* Replace link's underlying BPF program with the new one */
>> > +int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
>> > +{
>> > +     return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
>> > +}
>>
>> I would expect bpf_link to keep track of the previous program and
>> automatically fill it in with this operation. I.e., it should be
>> possible to do something like:
>>
>> link = bpf_link__open("/sys/fs/bpf/my_link");
>> prog = bpf_link__get_prog(link);
>
> I don't think libbpf is able to construct struct bpf_program from link
> info. It can get program FD, of course, but struct bpf_program is much
> more than that and not sure kernel has all the necessary info. Some
> parts of bpf_program is coming from ELF file, which is gone by this
> time.

Hmm, sure, maybe, but it could still get enough information (such as the
prog fd, and everything returned by GET_PROG_INFO) for userspace could
do something meaningful with the result. So that would turn the above
into bpf_link__get_prog_fd(), and struct bpf_link would contain the fd
of the currently-attached program so it can be supplied in any future
replacement calls.

-Toke


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment
  2020-03-23 19:31       ` Toke Høiland-Jørgensen
@ 2020-03-24 23:05         ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2020-03-24 23:05 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Mar 23, 2020 at 12:31 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Mar 23, 2020 at 4:02 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andriin@fb.com> writes:
> >>
> >> > Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
> >> > create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.
> >> >
> >> > If expected_attach_type is not specified explicitly with
> >> > bpf_program__set_expected_attach_type(), libbpf will try to determine proper
> >> > attach type from BPF program's section definition.
> >> >
> >> > Also add support for bpf_link's underlying BPF program replacement:
> >> >   - unconditional through high-level bpf_link__update_program() API;
> >> >   - cmpxchg-like with specifying expected current BPF program through
> >> >     low-level bpf_link_update() API.
> >> >
> >> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> >> > ---
> >> >  tools/include/uapi/linux/bpf.h | 12 +++++++++
> >> >  tools/lib/bpf/bpf.c            | 34 +++++++++++++++++++++++++
> >> >  tools/lib/bpf/bpf.h            | 19 ++++++++++++++
> >> >  tools/lib/bpf/libbpf.c         | 46 ++++++++++++++++++++++++++++++++++
> >> >  tools/lib/bpf/libbpf.h         |  8 +++++-
> >> >  tools/lib/bpf/libbpf.map       |  4 +++
> >> >  6 files changed, 122 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> > index fad9f79bb8f1..fa944093f9fc 100644
> >> > --- a/tools/include/uapi/linux/bpf.h
> >> > +++ b/tools/include/uapi/linux/bpf.h
> >> > @@ -112,6 +112,7 @@ enum bpf_cmd {
> >> >       BPF_MAP_UPDATE_BATCH,
> >> >       BPF_MAP_DELETE_BATCH,
> >> >       BPF_LINK_CREATE,
> >> > +     BPF_LINK_UPDATE,
> >> >  };
> >> >
> >> >  enum bpf_map_type {
> >> > @@ -574,6 +575,17 @@ union bpf_attr {
> >> >               __u32           target_fd;      /* object to attach to */
> >> >               __u32           attach_type;    /* attach type */
> >> >       } link_create;
> >> > +
> >> > +     struct { /* struct used by BPF_LINK_UPDATE command */
> >> > +             __u32           link_fd;        /* link fd */
> >> > +             /* new program fd to update link with */
> >> > +             __u32           new_prog_fd;
> >> > +             __u32           flags;          /* extra flags */
> >> > +             /* expected link's program fd; is specified only if
> >> > +              * BPF_F_REPLACE flag is set in flags */
> >> > +             __u32           old_prog_fd;
> >> > +     } link_update;
> >> > +
> >> >  } __attribute__((aligned(8)));
> >> >
> >> >  /* The description below is an attempt at providing documentation to eBPF
> >> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >> > index c6dafe563176..35c34fc81bd0 100644
> >> > --- a/tools/lib/bpf/bpf.c
> >> > +++ b/tools/lib/bpf/bpf.c
> >> > @@ -584,6 +584,40 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
> >> >       return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
> >> >  }
> >> >
> >> > +int bpf_link_create(int prog_fd, int target_fd,
> >> > +                 enum bpf_attach_type attach_type,
> >> > +                 const struct bpf_link_create_opts *opts)
> >> > +{
> >> > +     union bpf_attr attr;
> >> > +
> >> > +     if (!OPTS_VALID(opts, bpf_link_create_opts))
> >> > +             return -EINVAL;
> >> > +
> >> > +     memset(&attr, 0, sizeof(attr));
> >> > +     attr.link_create.prog_fd = prog_fd;
> >> > +     attr.link_create.target_fd = target_fd;
> >> > +     attr.link_create.attach_type = attach_type;
> >> > +
> >> > +     return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
> >> > +}
> >> > +
> >> > +int bpf_link_update(int link_fd, int new_prog_fd,
> >> > +                 const struct bpf_link_update_opts *opts)
> >> > +{
> >> > +     union bpf_attr attr;
> >> > +
> >> > +     if (!OPTS_VALID(opts, bpf_link_update_opts))
> >> > +             return -EINVAL;
> >> > +
> >> > +     memset(&attr, 0, sizeof(attr));
> >> > +     attr.link_update.link_fd = link_fd;
> >> > +     attr.link_update.new_prog_fd = new_prog_fd;
> >> > +     attr.link_update.flags = OPTS_GET(opts, flags, 0);
> >> > +     attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
> >> > +
> >> > +     return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
> >> > +}
> >> > +
> >> >  int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
> >> >                  __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
> >> >  {
> >> > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> >> > index b976e77316cc..46d47afdd887 100644
> >> > --- a/tools/lib/bpf/bpf.h
> >> > +++ b/tools/lib/bpf/bpf.h
> >> > @@ -168,6 +168,25 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
> >> >  LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
> >> >                               enum bpf_attach_type type);
> >> >
> >> > +struct bpf_link_create_opts {
> >> > +     size_t sz; /* size of this struct for forward/backward compatibility */
> >> > +};
> >> > +#define bpf_link_create_opts__last_field sz
> >> > +
> >> > +LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
> >> > +                            enum bpf_attach_type attach_type,
> >> > +                            const struct bpf_link_create_opts *opts);
> >> > +
> >> > +struct bpf_link_update_opts {
> >> > +     size_t sz; /* size of this struct for forward/backward compatibility */
> >> > +     __u32 flags;       /* extra flags */
> >> > +     __u32 old_prog_fd; /* expected old program FD */
> >> > +};
> >> > +#define bpf_link_update_opts__last_field old_prog_fd
> >> > +
> >> > +LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
> >> > +                            const struct bpf_link_update_opts *opts);
> >> > +
> >> >  struct bpf_prog_test_run_attr {
> >> >       int prog_fd;
> >> >       int repeat;
> >> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> > index 085e41f9b68e..8b23c70033d3 100644
> >> > --- a/tools/lib/bpf/libbpf.c
> >> > +++ b/tools/lib/bpf/libbpf.c
> >> > @@ -6951,6 +6951,12 @@ struct bpf_link {
> >> >       bool disconnected;
> >> >  };
> >> >
> >> > +/* Replace link's underlying BPF program with the new one */
> >> > +int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
> >> > +{
> >> > +     return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
> >> > +}
> >>
> >> I would expect bpf_link to keep track of the previous program and
> >> automatically fill it in with this operation. I.e., it should be
> >> possible to do something like:
> >>
> >> link = bpf_link__open("/sys/fs/bpf/my_link");
> >> prog = bpf_link__get_prog(link);
> >
> > I don't think libbpf is able to construct struct bpf_program from link
> > info. It can get program FD, of course, but struct bpf_program is much
> > more than that and not sure kernel has all the necessary info. Some
> > parts of bpf_program is coming from ELF file, which is gone by this
> > time.
>
> Hmm, sure, maybe, but it could still get enough information (such as the
> prog fd, and everything returned by GET_PROG_INFO) for userspace could
> do something meaningful with the result. So that would turn the above
> into bpf_link__get_prog_fd(), and struct bpf_link would contain the fd
> of the currently-attached program so it can be supplied in any future
> replacement calls.

Yes, at that will probably be implementation if we go with "expected
always required" as a default. But I'm still not sure that's the right
default.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-03-24 23:06 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-20 20:36 [PATCH bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
2020-03-20 20:36 ` [PATCH bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
2020-03-20 20:36 ` [PATCH bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
2020-03-20 20:36 ` [PATCH bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
2020-03-20 21:33   ` Stanislav Fomichev
2020-03-20 21:47     ` Andrii Nakryiko
2020-03-20 23:46   ` [Potential Spoof] " Andrey Ignatov
2020-03-21  0:19     ` Andrii Nakryiko
2020-03-23 18:03       ` Andrii Nakryiko
2020-03-20 20:36 ` [PATCH bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
2020-03-21  0:33   ` kbuild test robot
2020-03-21  0:58     ` Andrii Nakryiko
2020-03-21  1:28   ` kbuild test robot
2020-03-23 10:58   ` Toke Høiland-Jørgensen
2020-03-23 17:55     ` Andrii Nakryiko
2020-03-23 19:29       ` Toke Høiland-Jørgensen
2020-03-20 20:36 ` [PATCH bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
2020-03-23 11:02   ` Toke Høiland-Jørgensen
2020-03-23 17:58     ` Andrii Nakryiko
2020-03-23 19:31       ` Toke Høiland-Jørgensen
2020-03-24 23:05         ` Andrii Nakryiko
2020-03-20 20:36 ` [PATCH bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).