bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link
@ 2020-03-25  6:57 Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

bpf_link abstraction itself was formalized in [0] with justifications for why
its semantics is a good fit for attaching BPF programs of various types. This
patch set adds bpf_link-based BPF program attachment mechanism for cgroup BPF
programs.

Cgroup BPF link implements all the modes and semantics of existing BPF program
attachment: exclusive, exclusive overridable and multi-attachment. Thus cgroup
bpf_link can co-exist with legacy BPF program multi-attachment. See patch #4
for detailed explanation of inter-operability between legacy and
bpf_link-based attachments.

bpf_link is destroyed and automatically detached when the last open FD holding
the reference to bpf_link is closed. This means that by default, when the
process that created bpf_link exits, attached BPF program will be
automatically detached due to bpf_link's clean up code. Cgroup bpf_link, like
any other bpf_link, can be pinned in BPF FS and by those means survive the
exit of process that created the link. This is useful in many scenarios to
provide long-living BPF program attachments. Pinning also means that there
could be many owners of bpf_link through independent FDs.

Additionally, auto-detachmet of cgroup bpf_link is implemented. When cgroup is
dying it will automatically detach all active bpf_links. This ensures that
cgroup clean up is not delayed due to active bpf_link even despite no chance
for any BPF program to be run for a given cgroup. In that sense it's similar
to existing behavior of dropping refcnt of attached bpf_prog. But in the case
of bpf_link, bpf_link is not destroyed and is still available to user as long
as at least one active FD is still open (or if it's pinned in BPF FS).

There are two main cgroup-specific differences between bpf_link-based and
direct bpf_prog-based attachment.

First, as opposed to direct bpf_prog attachment, cgroup itself doesn't "own"
bpf_link, which makes it possible to auto-clean up attached bpf_link when user
process abruptly exits without explicitly detaching BPF program. This makes
for a safe default behavior proven in BPF tracing program types. But bpf_link
doesn't bump cgroup->bpf.refcnt as well and because of that doesn't prevent
cgroup from cleaning up its BPF state.

Second, only owners of bpf_link (those who created bpf_link in the first place
or obtained a new FD by opening bpf_link from BPF FS) can detach and/or update
it. This makes sure that no other process can accidentally remove/replace BPF
program.

This patch set also implements LINK_UPDATE sub-command, which allows to
replace bpf_link's underlying bpf_prog, similarly to BPF_F_REPLACE flag
behavior for direct bpf_prog cgroup attachment. Similarly to LINK_CREATE, it
is supposed to be generic command for different types of bpf_links.

  [0] https://lore.kernel.org/bpf/20200228223948.360936-1-andriin@fb.com/

v1->v2:
  - implement exclusive and overridable exclusive modes (Andrey Ignatov);
  - fix build for !CONFIG_CGROUP_BPF build;
  - add more selftests for non-multi mode and inter-operability;

Andrii Nakryiko (6):
  bpf: factor out cgroup storages operations
  bpf: factor out attach_type to prog_type mapping for attach/detach
  bpf: implement bpf_link-based cgroup BPF program attachment
  bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  libbpf: add support for bpf_link-based cgroup attachment
  selftests/bpf: test FD-based cgroup attachment

 include/linux/bpf-cgroup.h                    |  40 +-
 include/linux/bpf.h                           |  10 +-
 include/uapi/linux/bpf.h                      |  22 +-
 kernel/bpf/cgroup.c                           | 518 ++++++++++++++----
 kernel/bpf/syscall.c                          | 267 +++++----
 kernel/cgroup/cgroup.c                        |  41 +-
 tools/include/uapi/linux/bpf.h                |  22 +-
 tools/lib/bpf/bpf.c                           |  35 ++
 tools/lib/bpf/bpf.h                           |  20 +
 tools/lib/bpf/libbpf.c                        |  49 ++
 tools/lib/bpf/libbpf.h                        |   9 +-
 tools/lib/bpf/libbpf.map                      |   4 +
 .../selftests/bpf/prog_tests/cgroup_link.c    | 235 ++++++++
 .../selftests/bpf/progs/test_cgroup_link.c    |  24 +
 14 files changed, 1067 insertions(+), 229 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_cgroup_link.c

-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 bpf-next 1/6] bpf: factor out cgroup storages operations
  2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
@ 2020-03-25  6:57 ` Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Refactor cgroup attach/detach code to abstract away common operations
performed on all types of cgroup storages. This makes high-level logic more
apparent, plus allows to reuse more code across multiple functions.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/cgroup.c | 118 +++++++++++++++++++++++++++-----------------
 1 file changed, 72 insertions(+), 46 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 9a500fadbef5..9c8472823a7f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -28,6 +28,58 @@ void cgroup_bpf_offline(struct cgroup *cgrp)
 	percpu_ref_kill(&cgrp->bpf.refcnt);
 }
 
+static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_free(storages[stype]);
+}
+
+static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
+				     struct bpf_prog *prog)
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype) {
+		storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
+		if (IS_ERR(storages[stype])) {
+			storages[stype] = NULL;
+			bpf_cgroup_storages_free(storages);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[],
+				       struct bpf_cgroup_storage *src[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		dst[stype] = src[stype];
+}
+
+static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
+				     struct cgroup* cgrp,
+				     enum bpf_attach_type attach_type)
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
+}
+
+static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
+{
+	enum bpf_cgroup_storage_type stype;
+
+	for_each_cgroup_storage_type(stype)
+		bpf_cgroup_storage_unlink(storages[stype]);
+}
+
 /**
  * cgroup_bpf_release() - put references of all bpf programs and
  *                        release all cgroup bpf data
@@ -37,7 +89,6 @@ static void cgroup_bpf_release(struct work_struct *work)
 {
 	struct cgroup *p, *cgrp = container_of(work, struct cgroup,
 					       bpf.release_work);
-	enum bpf_cgroup_storage_type stype;
 	struct bpf_prog_array *old_array;
 	unsigned int type;
 
@@ -50,10 +101,8 @@ static void cgroup_bpf_release(struct work_struct *work)
 		list_for_each_entry_safe(pl, tmp, progs, node) {
 			list_del(&pl->node);
 			bpf_prog_put(pl->prog);
-			for_each_cgroup_storage_type(stype) {
-				bpf_cgroup_storage_unlink(pl->storage[stype]);
-				bpf_cgroup_storage_free(pl->storage[stype]);
-			}
+			bpf_cgroup_storages_unlink(pl->storage);
+			bpf_cgroup_storages_free(pl->storage);
 			kfree(pl);
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
@@ -138,7 +187,7 @@ static int compute_effective_progs(struct cgroup *cgrp,
 				   enum bpf_attach_type type,
 				   struct bpf_prog_array **array)
 {
-	enum bpf_cgroup_storage_type stype;
+	struct bpf_prog_array_item *item;
 	struct bpf_prog_array *progs;
 	struct bpf_prog_list *pl;
 	struct cgroup *p = cgrp;
@@ -166,10 +215,10 @@ static int compute_effective_progs(struct cgroup *cgrp,
 			if (!pl->prog)
 				continue;
 
-			progs->items[cnt].prog = pl->prog;
-			for_each_cgroup_storage_type(stype)
-				progs->items[cnt].cgroup_storage[stype] =
-					pl->storage[stype];
+			item = &progs->items[cnt];
+			item->prog = pl->prog;
+			bpf_cgroup_storages_assign(item->cgroup_storage,
+						   pl->storage);
 			cnt++;
 		}
 	} while ((p = cgroup_parent(p)));
@@ -305,7 +354,6 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
 		*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
 	struct bpf_prog_list *pl, *replace_pl = NULL;
-	enum bpf_cgroup_storage_type stype;
 	int err;
 
 	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
@@ -341,37 +389,25 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 		replace_pl = list_first_entry(progs, typeof(*pl), node);
 	}
 
-	for_each_cgroup_storage_type(stype) {
-		storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
-		if (IS_ERR(storage[stype])) {
-			storage[stype] = NULL;
-			for_each_cgroup_storage_type(stype)
-				bpf_cgroup_storage_free(storage[stype]);
-			return -ENOMEM;
-		}
-	}
+	if (bpf_cgroup_storages_alloc(storage, prog))
+		return -ENOMEM;
 
 	if (replace_pl) {
 		pl = replace_pl;
 		old_prog = pl->prog;
-		for_each_cgroup_storage_type(stype) {
-			old_storage[stype] = pl->storage[stype];
-			bpf_cgroup_storage_unlink(old_storage[stype]);
-		}
+		bpf_cgroup_storages_unlink(pl->storage);
+		bpf_cgroup_storages_assign(old_storage, pl->storage);
 	} else {
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
-			for_each_cgroup_storage_type(stype)
-				bpf_cgroup_storage_free(storage[stype]);
+			bpf_cgroup_storages_free(storage);
 			return -ENOMEM;
 		}
 		list_add_tail(&pl->node, progs);
 	}
 
 	pl->prog = prog;
-	for_each_cgroup_storage_type(stype)
-		pl->storage[stype] = storage[stype];
-
+	bpf_cgroup_storages_assign(pl->storage, storage);
 	cgrp->bpf.flags[type] = saved_flags;
 
 	err = update_effective_progs(cgrp, type);
@@ -379,27 +415,20 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 		goto cleanup;
 
 	static_branch_inc(&cgroup_bpf_enabled_key);
-	for_each_cgroup_storage_type(stype) {
-		if (!old_storage[stype])
-			continue;
-		bpf_cgroup_storage_free(old_storage[stype]);
-	}
+	bpf_cgroup_storages_free(old_storage);
 	if (old_prog) {
 		bpf_prog_put(old_prog);
 		static_branch_dec(&cgroup_bpf_enabled_key);
 	}
-	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_link(storage[stype], cgrp, type);
+	bpf_cgroup_storages_link(storage, cgrp, type);
 	return 0;
 
 cleanup:
 	/* and cleanup the prog list */
 	pl->prog = old_prog;
-	for_each_cgroup_storage_type(stype) {
-		bpf_cgroup_storage_free(pl->storage[stype]);
-		pl->storage[stype] = old_storage[stype];
-		bpf_cgroup_storage_link(old_storage[stype], cgrp, type);
-	}
+	bpf_cgroup_storages_free(pl->storage);
+	bpf_cgroup_storages_assign(pl->storage, old_storage);
+	bpf_cgroup_storages_link(pl->storage, cgrp, type);
 	if (!replace_pl) {
 		list_del(&pl->node);
 		kfree(pl);
@@ -420,7 +449,6 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			enum bpf_attach_type type)
 {
 	struct list_head *progs = &cgrp->bpf.progs[type];
-	enum bpf_cgroup_storage_type stype;
 	u32 flags = cgrp->bpf.flags[type];
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_prog_list *pl;
@@ -467,10 +495,8 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
 	/* now can actually delete it from this cgroup list */
 	list_del(&pl->node);
-	for_each_cgroup_storage_type(stype) {
-		bpf_cgroup_storage_unlink(pl->storage[stype]);
-		bpf_cgroup_storage_free(pl->storage[stype]);
-	}
+	bpf_cgroup_storages_unlink(pl->storage);
+	bpf_cgroup_storages_free(pl->storage);
 	kfree(pl);
 	if (list_empty(progs))
 		/* last program was detached, reset flags to zero */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach
  2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
@ 2020-03-25  6:57 ` Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Factor out logic mapping expected program attach type to program type and
subsequent handling of program attach/detach. Also list out all supported
cgroup BPF program types explicitly to prevent accidental bugs once more
program types are added to a mapping. Do the same for prog_query API.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/syscall.c | 153 +++++++++++++++++++------------------------
 1 file changed, 66 insertions(+), 87 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 85567a6ea5f9..fd4181939064 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2535,36 +2535,18 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	}
 }
 
-#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
-
-#define BPF_F_ATTACH_MASK \
-	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
-
-static int bpf_prog_attach(const union bpf_attr *attr)
+static enum bpf_prog_type
+attach_type_to_prog_type(enum bpf_attach_type attach_type)
 {
-	enum bpf_prog_type ptype;
-	struct bpf_prog *prog;
-	int ret;
-
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	if (CHECK_ATTR(BPF_PROG_ATTACH))
-		return -EINVAL;
-
-	if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
-		return -EINVAL;
-
-	switch (attr->attach_type) {
+	switch (attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
-		ptype = BPF_PROG_TYPE_CGROUP_SKB;
+		return BPF_PROG_TYPE_CGROUP_SKB;
 		break;
 	case BPF_CGROUP_INET_SOCK_CREATE:
 	case BPF_CGROUP_INET4_POST_BIND:
 	case BPF_CGROUP_INET6_POST_BIND:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SOCK;
 	case BPF_CGROUP_INET4_BIND:
 	case BPF_CGROUP_INET6_BIND:
 	case BPF_CGROUP_INET4_CONNECT:
@@ -2573,37 +2555,53 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_UDP6_SENDMSG:
 	case BPF_CGROUP_UDP4_RECVMSG:
 	case BPF_CGROUP_UDP6_RECVMSG:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
 	case BPF_CGROUP_SOCK_OPS:
-		ptype = BPF_PROG_TYPE_SOCK_OPS;
-		break;
+		return BPF_PROG_TYPE_SOCK_OPS;
 	case BPF_CGROUP_DEVICE:
-		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
-		break;
+		return BPF_PROG_TYPE_CGROUP_DEVICE;
 	case BPF_SK_MSG_VERDICT:
-		ptype = BPF_PROG_TYPE_SK_MSG;
-		break;
+		return BPF_PROG_TYPE_SK_MSG;
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
-		ptype = BPF_PROG_TYPE_SK_SKB;
-		break;
+		return BPF_PROG_TYPE_SK_SKB;
 	case BPF_LIRC_MODE2:
-		ptype = BPF_PROG_TYPE_LIRC_MODE2;
-		break;
+		return BPF_PROG_TYPE_LIRC_MODE2;
 	case BPF_FLOW_DISSECTOR:
-		ptype = BPF_PROG_TYPE_FLOW_DISSECTOR;
-		break;
+		return BPF_PROG_TYPE_FLOW_DISSECTOR;
 	case BPF_CGROUP_SYSCTL:
-		ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SYSCTL;
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
-		break;
+		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
 	default:
-		return -EINVAL;
+		return BPF_PROG_TYPE_UNSPEC;
 	}
+}
+
+#define BPF_PROG_ATTACH_LAST_FIELD replace_bpf_fd
+
+#define BPF_F_ATTACH_MASK \
+	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI | BPF_F_REPLACE)
+
+static int bpf_prog_attach(const union bpf_attr *attr)
+{
+	enum bpf_prog_type ptype;
+	struct bpf_prog *prog;
+	int ret;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (CHECK_ATTR(BPF_PROG_ATTACH))
+		return -EINVAL;
+
+	if (attr->attach_flags & ~BPF_F_ATTACH_MASK)
+		return -EINVAL;
+
+	ptype = attach_type_to_prog_type(attr->attach_type);
+	if (ptype == BPF_PROG_TYPE_UNSPEC)
+		return -EINVAL;
 
 	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
 	if (IS_ERR(prog))
@@ -2625,8 +2623,17 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
 		break;
-	default:
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_SOCK_OPS:
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
+		break;
+	default:
+		ret = -EINVAL;
 	}
 
 	if (ret)
@@ -2646,53 +2653,27 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	if (CHECK_ATTR(BPF_PROG_DETACH))
 		return -EINVAL;
 
-	switch (attr->attach_type) {
-	case BPF_CGROUP_INET_INGRESS:
-	case BPF_CGROUP_INET_EGRESS:
-		ptype = BPF_PROG_TYPE_CGROUP_SKB;
-		break;
-	case BPF_CGROUP_INET_SOCK_CREATE:
-	case BPF_CGROUP_INET4_POST_BIND:
-	case BPF_CGROUP_INET6_POST_BIND:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
-		break;
-	case BPF_CGROUP_INET4_BIND:
-	case BPF_CGROUP_INET6_BIND:
-	case BPF_CGROUP_INET4_CONNECT:
-	case BPF_CGROUP_INET6_CONNECT:
-	case BPF_CGROUP_UDP4_SENDMSG:
-	case BPF_CGROUP_UDP6_SENDMSG:
-	case BPF_CGROUP_UDP4_RECVMSG:
-	case BPF_CGROUP_UDP6_RECVMSG:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
-		break;
-	case BPF_CGROUP_SOCK_OPS:
-		ptype = BPF_PROG_TYPE_SOCK_OPS;
-		break;
-	case BPF_CGROUP_DEVICE:
-		ptype = BPF_PROG_TYPE_CGROUP_DEVICE;
-		break;
-	case BPF_SK_MSG_VERDICT:
-		return sock_map_get_from_fd(attr, NULL);
-	case BPF_SK_SKB_STREAM_PARSER:
-	case BPF_SK_SKB_STREAM_VERDICT:
+	ptype = attach_type_to_prog_type(attr->attach_type);
+
+	switch (ptype) {
+	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_SK_SKB:
 		return sock_map_get_from_fd(attr, NULL);
-	case BPF_LIRC_MODE2:
+	case BPF_PROG_TYPE_LIRC_MODE2:
 		return lirc_prog_detach(attr);
-	case BPF_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		return skb_flow_dissector_bpf_prog_detach(attr);
-	case BPF_CGROUP_SYSCTL:
-		ptype = BPF_PROG_TYPE_CGROUP_SYSCTL;
-		break;
-	case BPF_CGROUP_GETSOCKOPT:
-	case BPF_CGROUP_SETSOCKOPT:
-		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
-		break;
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_SOCK_OPS:
+		return cgroup_bpf_prog_detach(attr, ptype);
 	default:
 		return -EINVAL;
 	}
-
-	return cgroup_bpf_prog_detach(attr, ptype);
 }
 
 #define BPF_PROG_QUERY_LAST_FIELD query.prog_cnt
@@ -2726,7 +2707,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_SYSCTL:
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
-		break;
+		return cgroup_bpf_prog_query(attr, uattr);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
@@ -2734,8 +2715,6 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	default:
 		return -EINVAL;
 	}
-
-	return cgroup_bpf_prog_query(attr, uattr);
 }
 
 #define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment
  2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
@ 2020-03-25  6:57 ` Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Implement new sub-command to attach cgroup BPF programs and return FD-based
bpf_link back on success. bpf_link, once attached to cgroup, cannot be
replaced, except by owner having its FD. cgroup bpf_link supports the sema
semantics and mode of direct BPF program attachment and co-exist with
non-bpf_link-based BPF cgroup attachments. E.g., for BPF_F_ALLOW_MULTI case,
both link-based and prog-based attachments can be freely intermixed. For
exclusive mode (overridable and not), bpf_link-based attachment can replace
bpf_prog-based single attachment, but not vice versa. This way bpf_link always
preserves its promise that it can be replaced only by its owner. This also
allows to forcefully replace legacy BPF program and prevent further
replacement by attaching bpf_link.

To prevent bpf_cgroup_link from keeping cgroup alive past the point when no
BPF program can be executed, implement auto-detachment of link. When
cgroup_bpf_release() is called, all attached bpf_links are forced to release
cgroup refcounts, but they leave bpf_link otherwise active and allocated, as
well as still owning underlying bpf_prog. This is because user-space might
still have FDs open and active, so bpf_link as a user-referenced object can't
be freed yet. Once last active FD is closed, bpf_link will be freed and
underlying bpf_prog refcount will be dropped. But cgroup refcount won't be
touched, because cgroup is released already.

The inherent race between bpf_cgroup_link release (from closing last FD) and
cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
the only additional check required is when bpf_cgroup_link attempts to detach
itself from cgroup. At that time we need to check whether there is still
cgroup associated with that link. And if not, exit with success, because
bpf_cgroup_link was already successfully detached.

Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h     |  29 ++-
 include/linux/bpf.h            |  10 +-
 include/uapi/linux/bpf.h       |  10 +-
 kernel/bpf/cgroup.c            | 328 +++++++++++++++++++++++++--------
 kernel/bpf/syscall.c           |  62 ++++++-
 kernel/cgroup/cgroup.c         |  14 +-
 tools/include/uapi/linux/bpf.h |  10 +-
 7 files changed, 364 insertions(+), 99 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index a7cd5c7a2509..d2d969669564 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -51,9 +51,18 @@ struct bpf_cgroup_storage {
 	struct rcu_head rcu;
 };
 
+struct bpf_cgroup_link {
+	struct bpf_link link;
+	struct cgroup *cgroup;
+	enum bpf_attach_type type;
+};
+
+extern const struct bpf_link_ops bpf_cgroup_link_lops;
+
 struct bpf_prog_list {
 	struct list_head node;
 	struct bpf_prog *prog;
+	struct bpf_cgroup_link *link;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE];
 };
 
@@ -84,20 +93,23 @@ struct cgroup_bpf {
 int cgroup_bpf_inherit(struct cgroup *cgrp);
 void cgroup_bpf_offline(struct cgroup *cgrp);
 
-int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-			struct bpf_prog *replace_prog,
+int __cgroup_bpf_attach(struct cgroup *cgrp,
+			struct bpf_prog *prog, struct bpf_prog *replace_prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type, u32 flags);
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);
 
 /* Wrapper for __cgroup_bpf_*() protected by cgroup_mutex */
-int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
+int cgroup_bpf_attach(struct cgroup *cgrp,
+		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
+		      struct bpf_cgroup_link *link, enum bpf_attach_type type,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, u32 flags);
+		      enum bpf_attach_type type);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
 
@@ -332,6 +344,7 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype, struct bpf_prog *prog);
 int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 			   enum bpf_prog_type ptype);
+int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr);
 #else
@@ -354,6 +367,12 @@ static inline int cgroup_bpf_prog_detach(const union bpf_attr *attr,
 	return -EINVAL;
 }
 
+static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
+					 struct bpf_prog *prog)
+{
+	return -EINVAL;
+}
+
 static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
 					union bpf_attr __user *uattr)
 {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb981c204fa..0f7c2f48c734 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1081,15 +1081,23 @@ extern int sysctl_unprivileged_bpf_disabled;
 int bpf_map_new_fd(struct bpf_map *map, int flags);
 int bpf_prog_new_fd(struct bpf_prog *prog);
 
-struct bpf_link;
+struct bpf_link {
+	atomic64_t refcnt;
+	const struct bpf_link_ops *ops;
+	struct bpf_prog *prog;
+	struct work_struct work;
+};
 
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
+
 };
 
 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
 		   struct bpf_prog *prog);
+void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
+		      int link_fd);
 void bpf_link_inc(struct bpf_link *link);
 void bpf_link_put(struct bpf_link *link);
 int bpf_link_new_fd(struct bpf_link *link);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5d01c5c7e598..948ebbfd401b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -111,6 +111,7 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
+	BPF_LINK_CREATE,
 };
 
 enum bpf_map_type {
@@ -539,7 +540,7 @@ union bpf_attr {
 		__u32		prog_cnt;
 	} query;
 
-	struct {
+	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
 		__u64 name;
 		__u32 prog_fd;
 	} raw_tracepoint;
@@ -567,6 +568,13 @@ union bpf_attr {
 		__u64		probe_offset;	/* output: probe_offset */
 		__u64		probe_addr;	/* output: probe_addr */
 	} task_fd_query;
+
+	struct { /* struct used by BPF_LINK_CREATE command */
+		__u32		prog_fd;	/* eBPF program to attach */
+		__u32		target_fd;	/* object to attach to */
+		__u32		attach_type;	/* attach type */
+		__u32		flags;		/* extra flags */
+	} link_create;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 9c8472823a7f..c5cedc8c3428 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -80,6 +80,17 @@ static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
 		bpf_cgroup_storage_unlink(storages[stype]);
 }
 
+/* Called when bpf_cgroup_link is auto-detached from dying cgroup.
+ * It drops cgroup and bpf_prog refcounts, and marks bpf_link as defunct. It
+ * doesn't free link memory, which will eventually be done by bpf_link's
+ * release() callback, when its last FD is closed.
+ */
+static void bpf_cgroup_link_auto_detach(struct bpf_cgroup_link *link)
+{
+	cgroup_put(link->cgroup);
+	link->cgroup = NULL;
+}
+
 /**
  * cgroup_bpf_release() - put references of all bpf programs and
  *                        release all cgroup bpf data
@@ -100,7 +111,10 @@ static void cgroup_bpf_release(struct work_struct *work)
 
 		list_for_each_entry_safe(pl, tmp, progs, node) {
 			list_del(&pl->node);
-			bpf_prog_put(pl->prog);
+			if (pl->prog)
+				bpf_prog_put(pl->prog);
+			if (pl->link)
+				bpf_cgroup_link_auto_detach(pl->link);
 			bpf_cgroup_storages_unlink(pl->storage);
 			bpf_cgroup_storages_free(pl->storage);
 			kfree(pl);
@@ -134,6 +148,18 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref)
 	queue_work(system_wq, &cgrp->bpf.release_work);
 }
 
+/* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through
+ * link or direct prog.
+ */
+static struct bpf_prog *prog_list_prog(struct bpf_prog_list *pl)
+{
+	if (pl->prog)
+		return pl->prog;
+	if (pl->link)
+		return pl->link->link.prog;
+	return NULL;
+}
+
 /* count number of elements in the list.
  * it's slow but the list cannot be long
  */
@@ -143,7 +169,7 @@ static u32 prog_list_length(struct list_head *head)
 	u32 cnt = 0;
 
 	list_for_each_entry(pl, head, node) {
-		if (!pl->prog)
+		if (!prog_list_prog(pl))
 			continue;
 		cnt++;
 	}
@@ -212,11 +238,11 @@ static int compute_effective_progs(struct cgroup *cgrp,
 			continue;
 
 		list_for_each_entry(pl, &p->bpf.progs[type], node) {
-			if (!pl->prog)
+			if (!prog_list_prog(pl))
 				continue;
 
 			item = &progs->items[cnt];
-			item->prog = pl->prog;
+			item->prog = prog_list_prog(pl);
 			bpf_cgroup_storages_assign(item->cgroup_storage,
 						   pl->storage);
 			cnt++;
@@ -333,19 +359,66 @@ static int update_effective_progs(struct cgroup *cgrp,
 
 #define BPF_CGROUP_MAX_PROGS 64
 
+static struct bpf_prog_list *find_attach_entry(struct list_head *progs,
+					       struct bpf_prog *prog,
+					       struct bpf_cgroup_link *link,
+					       struct bpf_prog *replace_prog,
+					       bool allow_multi)
+{
+	struct bpf_prog_list *pl;
+
+	/* single-attach case */
+	if (!allow_multi) {
+		if (list_empty(progs))
+			return NULL;
+
+		pl = list_first_entry(progs, typeof(*pl), node);
+		if (pl->link)
+			/* can't replace existing link */
+			return ERR_PTR(-EINVAL);
+		return pl;
+	}
+
+	list_for_each_entry(pl, progs, node) {
+		if (prog && pl->prog == prog)
+			/* disallow attaching the same prog twice */
+			return ERR_PTR(-EINVAL);
+		if (link && pl->link == link)
+			/* disallow attaching the same link twice */
+			return ERR_PTR(-EINVAL);
+	}
+
+
+	/* direct prog multi-attach w/ replacement case */
+	if (replace_prog) {
+		list_for_each_entry(pl, progs, node) {
+			if (pl->prog == replace_prog)
+				/* a match found */
+				return pl;
+		}
+		/* prog to replace not found for cgroup */
+		return ERR_PTR(-ENOENT);
+	}
+
+	return NULL;
+}
+
 /**
- * __cgroup_bpf_attach() - Attach the program to a cgroup, and
+ * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
  *                         propagate the change to descendants
  * @cgrp: The cgroup which descendants to traverse
  * @prog: A program to attach
+ * @link: A link to attach
  * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
  * @type: Type of attach operation
  * @flags: Option flags
  *
+ * Exactly one of @prog or @link can be non-null.
  * Must be called with cgroup_mutex held.
  */
-int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-			struct bpf_prog *replace_prog,
+int __cgroup_bpf_attach(struct cgroup *cgrp,
+			struct bpf_prog *prog, struct bpf_prog *replace_prog,
+			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type, u32 flags)
 {
 	u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
@@ -353,13 +426,19 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE],
 		*old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {NULL};
-	struct bpf_prog_list *pl, *replace_pl = NULL;
+	struct bpf_prog_list *pl;
 	int err;
 
 	if (((flags & BPF_F_ALLOW_OVERRIDE) && (flags & BPF_F_ALLOW_MULTI)) ||
 	    ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
 		/* invalid combination */
 		return -EINVAL;
+	if (link && (prog || replace_prog))
+		/* only either link or prog/replace_prog can be specified */
+		return -EINVAL;
+	if (!!replace_prog != !!(flags & BPF_F_REPLACE))
+		/* replace_prog implies BPF_F_REPLACE, and vice versa */
+		return -EINVAL;
 
 	if (!hierarchy_allows_attach(cgrp, type))
 		return -EPERM;
@@ -374,26 +453,15 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
 		return -E2BIG;
 
-	if (flags & BPF_F_ALLOW_MULTI) {
-		list_for_each_entry(pl, progs, node) {
-			if (pl->prog == prog)
-				/* disallow attaching the same prog twice */
-				return -EINVAL;
-			if (pl->prog == replace_prog)
-				replace_pl = pl;
-		}
-		if ((flags & BPF_F_REPLACE) && !replace_pl)
-			/* prog to replace not found for cgroup */
-			return -ENOENT;
-	} else if (!list_empty(progs)) {
-		replace_pl = list_first_entry(progs, typeof(*pl), node);
-	}
+	pl = find_attach_entry(progs, prog, link, replace_prog,
+			       flags & BPF_F_ALLOW_MULTI);
+	if (IS_ERR(pl))
+		return PTR_ERR(pl);
 
-	if (bpf_cgroup_storages_alloc(storage, prog))
+	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
 		return -ENOMEM;
 
-	if (replace_pl) {
-		pl = replace_pl;
+	if (pl) {
 		old_prog = pl->prog;
 		bpf_cgroup_storages_unlink(pl->storage);
 		bpf_cgroup_storages_assign(old_storage, pl->storage);
@@ -407,6 +475,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	}
 
 	pl->prog = prog;
+	pl->link = link;
 	bpf_cgroup_storages_assign(pl->storage, storage);
 	cgrp->bpf.flags[type] = saved_flags;
 
@@ -414,80 +483,97 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (err)
 		goto cleanup;
 
-	static_branch_inc(&cgroup_bpf_enabled_key);
 	bpf_cgroup_storages_free(old_storage);
-	if (old_prog) {
+	if (old_prog)
 		bpf_prog_put(old_prog);
-		static_branch_dec(&cgroup_bpf_enabled_key);
-	}
-	bpf_cgroup_storages_link(storage, cgrp, type);
+	else
+		static_branch_inc(&cgroup_bpf_enabled_key);
+	bpf_cgroup_storages_link(pl->storage, cgrp, type);
 	return 0;
 
 cleanup:
-	/* and cleanup the prog list */
-	pl->prog = old_prog;
+	if (old_prog) {
+		pl->prog = old_prog;
+		pl->link = NULL;
+	}
 	bpf_cgroup_storages_free(pl->storage);
 	bpf_cgroup_storages_assign(pl->storage, old_storage);
 	bpf_cgroup_storages_link(pl->storage, cgrp, type);
-	if (!replace_pl) {
+	if (!old_prog) {
 		list_del(&pl->node);
 		kfree(pl);
 	}
 	return err;
 }
 
+static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
+					       struct bpf_prog *prog,
+					       struct bpf_cgroup_link *link,
+					       bool allow_multi)
+{
+	struct bpf_prog_list *pl;
+
+	if (!allow_multi) {
+		if (list_empty(progs))
+			/* report error when trying to detach and nothing is attached */
+			return ERR_PTR(-ENOENT);
+
+		pl = list_first_entry(progs, typeof(*pl), node);
+		/* link can't be detached except by its own destructor */
+		if (pl->link && pl->link != link)
+			return ERR_PTR(-ENOENT);
+		/* to maintain backward compatibility NONE and OVERRIDE cgroups
+		 * allow detaching with invalid FD (prog==NULL) in legacy mode
+		 */
+		return list_first_entry(progs, typeof(*pl), node);
+	}
+
+	if (!prog && !link)
+		/* to detach MULTI prog the user has to specify valid FD
+		 * of the program to be detached
+		 */
+		return ERR_PTR(-EINVAL);
+
+	/* find the prog and detach it */
+	list_for_each_entry(pl, progs, node) {
+		if (pl->prog == prog && pl->link == link)
+			return pl;
+	}
+	return ERR_PTR(-ENOENT);
+}
+
 /**
- * __cgroup_bpf_detach() - Detach the program from a cgroup, and
+ * __cgroup_bpf_detach() - Detach the program or link from a cgroup, and
  *                         propagate the change to descendants
  * @cgrp: The cgroup which descendants to traverse
  * @prog: A program to detach or NULL
+ * @prog: A link to detach or NULL
  * @type: Type of detach operation
  *
+ * At most one of @prog or @link can be non-NULL.
  * Must be called with cgroup_mutex held.
  */
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-			enum bpf_attach_type type)
+			struct bpf_cgroup_link *link, enum bpf_attach_type type)
 {
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	u32 flags = cgrp->bpf.flags[type];
-	struct bpf_prog *old_prog = NULL;
 	struct bpf_prog_list *pl;
+	struct bpf_prog *old_prog;
 	int err;
 
-	if (flags & BPF_F_ALLOW_MULTI) {
-		if (!prog)
-			/* to detach MULTI prog the user has to specify valid FD
-			 * of the program to be detached
-			 */
-			return -EINVAL;
-	} else {
-		if (list_empty(progs))
-			/* report error when trying to detach and nothing is attached */
-			return -ENOENT;
-	}
+	if (prog && link)
+		/* only one of prog or link can be specified */
+		return -EINVAL;
 
-	if (flags & BPF_F_ALLOW_MULTI) {
-		/* find the prog and detach it */
-		list_for_each_entry(pl, progs, node) {
-			if (pl->prog != prog)
-				continue;
-			old_prog = prog;
-			/* mark it deleted, so it's ignored while
-			 * recomputing effective
-			 */
-			pl->prog = NULL;
-			break;
-		}
-		if (!old_prog)
-			return -ENOENT;
-	} else {
-		/* to maintain backward compatibility NONE and OVERRIDE cgroups
-		 * allow detaching with invalid FD (prog==NULL)
-		 */
-		pl = list_first_entry(progs, typeof(*pl), node);
-		old_prog = pl->prog;
-		pl->prog = NULL;
-	}
+	pl = find_detach_entry(progs, prog, link, flags & BPF_F_ALLOW_MULTI);
+	if (IS_ERR(pl))
+		return PTR_ERR(pl);
+
+	/* mark it deleted, so it's ignored while recomputing effective */
+	old_prog = pl->prog;
+	pl->prog = NULL;
+	pl->link = NULL;
 
 	err = update_effective_progs(cgrp, type);
 	if (err)
@@ -501,14 +587,15 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 	if (list_empty(progs))
 		/* last program was detached, reset flags to zero */
 		cgrp->bpf.flags[type] = 0;
-
-	bpf_prog_put(old_prog);
+	if (old_prog)
+		bpf_prog_put(old_prog);
 	static_branch_dec(&cgroup_bpf_enabled_key);
 	return 0;
 
 cleanup:
-	/* and restore back old_prog */
+	/* restore back prog or link */
 	pl->prog = old_prog;
+	pl->link = link;
 	return err;
 }
 
@@ -521,6 +608,7 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	u32 flags = cgrp->bpf.flags[type];
 	struct bpf_prog_array *effective;
+	struct bpf_prog *prog;
 	int cnt, ret = 0, i;
 
 	effective = rcu_dereference_protected(cgrp->bpf.effective[type],
@@ -551,7 +639,8 @@ int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 
 		i = 0;
 		list_for_each_entry(pl, progs, node) {
-			id = pl->prog->aux->id;
+			prog = prog_list_prog(pl);
+			id = prog->aux->id;
 			if (copy_to_user(prog_ids + i, &id, sizeof(id)))
 				return -EFAULT;
 			if (++i == cnt)
@@ -581,8 +670,8 @@ int cgroup_bpf_prog_attach(const union bpf_attr *attr,
 		}
 	}
 
-	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, attr->attach_type,
-				attr->attach_flags);
+	ret = cgroup_bpf_attach(cgrp, prog, replace_prog, NULL,
+				attr->attach_type, attr->attach_flags);
 
 	if (replace_prog)
 		bpf_prog_put(replace_prog);
@@ -604,7 +693,7 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	if (IS_ERR(prog))
 		prog = NULL;
 
-	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type, 0);
+	ret = cgroup_bpf_detach(cgrp, prog, attr->attach_type);
 	if (prog)
 		bpf_prog_put(prog);
 
@@ -612,6 +701,93 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	return ret;
 }
 
+static void bpf_cgroup_link_release(struct bpf_link *link)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+
+	/* link might have been auto-detached by dying cgroup already,
+	 * in that case our work is done here
+	 */
+	if (!cg_link->cgroup)
+		return;
+
+	mutex_lock(&cgroup_mutex);
+
+	/* re-check cgroup under lock again */
+	if (!cg_link->cgroup) {
+		mutex_unlock(&cgroup_mutex);
+		return;
+	}
+
+	WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
+				    cg_link->type));
+
+	mutex_unlock(&cgroup_mutex);
+	cgroup_put(cg_link->cgroup);
+}
+
+static void bpf_cgroup_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+
+	kfree(cg_link);
+}
+
+const struct bpf_link_ops bpf_cgroup_link_lops = {
+	.release = bpf_cgroup_link_release,
+	.dealloc = bpf_cgroup_link_dealloc,
+};
+
+#define BPF_CGROUP_LINK_CREATE_MASK \
+	(BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI)
+
+int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct bpf_cgroup_link *link;
+	struct file *link_file;
+	struct cgroup *cgrp;
+	int err, link_fd;
+
+	if (attr->link_create.flags & ~BPF_CGROUP_LINK_CREATE_MASK)
+		return -EINVAL;
+
+	cgrp = cgroup_get_from_fd(attr->link_create.target_fd);
+	if (IS_ERR(cgrp))
+		return PTR_ERR(cgrp);
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link) {
+		err = -ENOMEM;
+		goto out_put_cgroup;
+	}
+	bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog);
+	link->cgroup = cgrp;
+	link->type = attr->link_create.attach_type;
+
+	link_file = bpf_link_new_file(&link->link, &link_fd);
+	if (IS_ERR(link_file)) {
+		kfree(link);
+		err = PTR_ERR(link_file);
+		goto out_put_cgroup;
+	}
+
+	err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type,
+				attr->link_create.flags);
+	if (err) {
+		bpf_link_cleanup(&link->link, link_file, link_fd);
+		goto out_put_cgroup;
+	}
+
+	fd_install(link_fd, link_file);
+	return link_fd;
+
+out_put_cgroup:
+	cgroup_put(cgrp);
+	return err;
+}
+
 int cgroup_bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fd4181939064..638ec8b54741 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2173,13 +2173,6 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
-struct bpf_link {
-	atomic64_t refcnt;
-	const struct bpf_link_ops *ops;
-	struct bpf_prog *prog;
-	struct work_struct work;
-};
-
 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
 		   struct bpf_prog *prog)
 {
@@ -2193,8 +2186,8 @@ void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
  * anon_inode's release() call. This helper manages marking bpf_link as
  * defunct, releases anon_inode file and puts reserved FD.
  */
-static void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
-			     int link_fd)
+void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
+		      int link_fd)
 {
 	link->prog = NULL;
 	fput(link_file);
@@ -2252,7 +2245,6 @@ static int bpf_link_release(struct inode *inode, struct file *filp)
 #ifdef CONFIG_PROC_FS
 static const struct bpf_link_ops bpf_raw_tp_lops;
 static const struct bpf_link_ops bpf_tracing_link_lops;
-static const struct bpf_link_ops bpf_xdp_link_lops;
 
 static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 {
@@ -2265,6 +2257,10 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 		link_type = "raw_tracepoint";
 	else if (link->ops == &bpf_tracing_link_lops)
 		link_type = "tracing";
+#ifdef CONFIG_CGROUP_BPF
+	else if (link->ops == &bpf_cgroup_link_lops)
+		link_type = "cgroup";
+#endif
 	else
 		link_type = "unknown";
 
@@ -3533,6 +3529,49 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 	return err;
 }
 
+#define BPF_LINK_CREATE_LAST_FIELD link_create.flags
+static int link_create(union bpf_attr *attr)
+{
+	enum bpf_prog_type ptype;
+	struct bpf_prog *prog;
+	int ret;
+
+	if (CHECK_ATTR(BPF_LINK_CREATE))
+		return -EINVAL;
+
+	ptype = attach_type_to_prog_type(attr->link_create.attach_type);
+	if (ptype == BPF_PROG_TYPE_UNSPEC)
+		return -EINVAL;
+
+	prog = bpf_prog_get_type(attr->link_create.prog_fd, ptype);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	ret = bpf_prog_attach_check_attach_type(prog,
+						attr->link_create.attach_type);
+	if (ret)
+		goto err_out;
+
+	switch (ptype) {
+	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_CGROUP_SOCK:
+	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_SOCK_OPS:
+	case BPF_PROG_TYPE_CGROUP_DEVICE:
+	case BPF_PROG_TYPE_CGROUP_SYSCTL:
+	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+		ret = cgroup_bpf_link_attach(attr, prog);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+err_out:
+	if (ret < 0)
+		bpf_prog_put(prog);
+	return ret;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -3643,6 +3682,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_DELETE_BATCH:
 		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH);
 		break;
+	case BPF_LINK_CREATE:
+		err = link_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 3dead0416b91..219624fba9ba 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6303,27 +6303,31 @@ void cgroup_sk_free(struct sock_cgroup_data *skcd)
 #endif	/* CONFIG_SOCK_CGROUP_DATA */
 
 #ifdef CONFIG_CGROUP_BPF
-int cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      struct bpf_prog *replace_prog, enum bpf_attach_type type,
+int cgroup_bpf_attach(struct cgroup *cgrp,
+		      struct bpf_prog *prog, struct bpf_prog *replace_prog,
+		      struct bpf_cgroup_link *link,
+		      enum bpf_attach_type type,
 		      u32 flags)
 {
 	int ret;
 
 	mutex_lock(&cgroup_mutex);
-	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, type, flags);
+	ret = __cgroup_bpf_attach(cgrp, prog, replace_prog, link, type, flags);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
+
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
-		      enum bpf_attach_type type, u32 flags)
+		      enum bpf_attach_type type)
 {
 	int ret;
 
 	mutex_lock(&cgroup_mutex);
-	ret = __cgroup_bpf_detach(cgrp, prog, type);
+	ret = __cgroup_bpf_detach(cgrp, prog, NULL, type);
 	mutex_unlock(&cgroup_mutex);
 	return ret;
 }
+
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr)
 {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5d01c5c7e598..948ebbfd401b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -111,6 +111,7 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
+	BPF_LINK_CREATE,
 };
 
 enum bpf_map_type {
@@ -539,7 +540,7 @@ union bpf_attr {
 		__u32		prog_cnt;
 	} query;
 
-	struct {
+	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
 		__u64 name;
 		__u32 prog_fd;
 	} raw_tracepoint;
@@ -567,6 +568,13 @@ union bpf_attr {
 		__u64		probe_offset;	/* output: probe_offset */
 		__u64		probe_addr;	/* output: probe_addr */
 	} task_fd_query;
+
+	struct { /* struct used by BPF_LINK_CREATE command */
+		__u32		prog_fd;	/* eBPF program to attach */
+		__u32		target_fd;	/* object to attach to */
+		__u32		attach_type;	/* attach type */
+		__u32		flags;		/* extra flags */
+	} link_create;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-03-25  6:57 ` [PATCH v2 bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
@ 2020-03-25  6:57 ` Andrii Nakryiko
  2020-03-25 22:57   ` kbuild test robot
                     ` (2 more replies)
  2020-03-25  6:57 ` [PATCH v2 bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko
  5 siblings, 3 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from
under given bpf_link. Currently this is only supported for bpf_cgroup_link,
but will be extended to other kinds of bpf_links in follow-up patches.

For bpf_cgroup_link, implemented functionality matches existing semantics for
direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either
unconditionally set new bpf_prog regardless of which bpf_prog is currently
active under given bpf_link, or, optionally, can specify expected active
bpf_prog. If active bpf_prog doesn't match expected one, no changes are
performed, old bpf_link stays intact and attached, operation returns
a failure.

cgroup_bpf_replace() operation is resolving race between auto-detachment and
bpf_prog update in the same fashion as it's done for bpf_link detachment,
except in this case update has no way of succeeding because of target cgroup
marked as dying. So in this case error is returned.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h | 11 ++++++
 include/uapi/linux/bpf.h   | 12 ++++++
 kernel/bpf/cgroup.c        | 80 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c       | 52 +++++++++++++++++++++++++
 kernel/cgroup/cgroup.c     | 27 +++++++++++++
 5 files changed, 182 insertions(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index d2d969669564..a8d78efd3cea 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -100,6 +100,8 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
+int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+			 struct bpf_prog *new_prog);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);
 
@@ -110,6 +112,8 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type);
+int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
+		       struct bpf_prog *new_prog);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
 
@@ -373,6 +377,13 @@ static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
 	return -EINVAL;
 }
 
+static inline int cgroup_bpf_replace(struct bpf_link *link,
+				     struct bpf_prog *old_prog,
+				     struct bpf_prog *new_prog)
+{
+	return -EINVAL;
+}
+
 static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
 					union bpf_attr __user *uattr)
 {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 948ebbfd401b..d7583483fca5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -112,6 +112,7 @@ enum bpf_cmd {
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
+	BPF_LINK_UPDATE,
 };
 
 enum bpf_map_type {
@@ -575,6 +576,17 @@ union bpf_attr {
 		__u32		attach_type;	/* attach type */
 		__u32		flags;		/* extra flags */
 	} link_create;
+
+	struct { /* struct used by BPF_LINK_UPDATE command */
+		__u32		link_fd;	/* link fd */
+		/* new program fd to update link with */
+		__u32		new_prog_fd;
+		__u32		flags;		/* extra flags */
+		/* expected link's program fd; is specified only if
+		 * BPF_F_REPLACE flag is set in flags */
+		__u32		old_prog_fd;
+	} link_update;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index c5cedc8c3428..2c70e2c95cb7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -506,6 +506,86 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	return err;
 }
 
+/* Swap updated BPF program for given link in effective program arrays across
+ * all descendant cgroups. This function is guaranteed to succeed.
+ */
+static void replace_effective_prog(struct cgroup *cgrp,
+				   enum bpf_attach_type type,
+				   struct bpf_cgroup_link *link)
+{
+	struct bpf_prog_array_item *item;
+	struct cgroup_subsys_state *css;
+	struct bpf_prog_array *progs;
+	struct bpf_prog_list *pl;
+	struct list_head *head;
+	struct cgroup *cg;
+	int pos;
+
+	css_for_each_descendant_pre(css, &cgrp->self) {
+		struct cgroup *desc = container_of(css, struct cgroup, self);
+
+		if (percpu_ref_is_zero(&desc->bpf.refcnt))
+			continue;
+
+		/* found position of link in effective progs array */
+		for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
+			if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
+				continue;
+
+			head = &cg->bpf.progs[type];
+			list_for_each_entry(pl, head, node) {
+				if (!prog_list_prog(pl))
+					continue;
+				if (pl->link == link)
+					goto found;
+				pos++;
+			}
+		}
+found:
+		BUG_ON(!cg);
+		progs = rcu_dereference_protected(
+				desc->bpf.effective[type],
+				lockdep_is_held(&cgroup_mutex));
+		item = &progs->items[pos];
+		WRITE_ONCE(item->prog, link->link.prog);
+	}
+}
+
+/**
+ * __cgroup_bpf_replace() - Replace link's program and propagate the change
+ *                          to descendants
+ * @cgrp: The cgroup which descendants to traverse
+ * @link: A link for which to replace BPF program
+ * @type: Type of attach operation
+ *
+ * Must be called with cgroup_mutex held.
+ */
+int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
+			 struct bpf_prog *new_prog)
+{
+	struct list_head *progs = &cgrp->bpf.progs[link->type];
+	struct bpf_prog *old_prog;
+	struct bpf_prog_list *pl;
+	bool found = false;
+
+	if (link->link.prog->type != new_prog->type)
+		return -EINVAL;
+
+	list_for_each_entry(pl, progs, node) {
+		if (pl->link == link) {
+			found = true;
+			break;
+		}
+	}
+	if (!found)
+		return -ENOENT;
+
+	old_prog = xchg(&link->link.prog, new_prog);
+	replace_effective_prog(cgrp, link->type, link);
+	bpf_prog_put(old_prog);
+	return 0;
+}
+
 static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
 					       struct bpf_prog *prog,
 					       struct bpf_cgroup_link *link,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 638ec8b54741..a52426e1e0df 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3572,6 +3572,55 @@ static int link_create(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_LINK_UPDATE_LAST_FIELD link_update.old_prog_fd
+
+static int link_update(union bpf_attr *attr)
+{
+	struct bpf_prog *old_prog = NULL, *new_prog;
+	struct bpf_link *link;
+	u32 flags;
+	int ret;
+
+	if (CHECK_ATTR(BPF_LINK_UPDATE))
+		return -EINVAL;
+
+	flags = attr->link_update.flags;
+	if (flags & ~BPF_F_REPLACE)
+		return -EINVAL;
+
+	link = bpf_link_get_from_fd(attr->link_update.link_fd);
+	if (IS_ERR(link))
+		return PTR_ERR(link);
+
+	new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
+	if (IS_ERR(new_prog))
+		return PTR_ERR(new_prog);
+
+	if (flags & BPF_F_REPLACE) {
+		old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
+		if (IS_ERR(old_prog)) {
+			ret = PTR_ERR(old_prog);
+			old_prog = NULL;
+			goto out_put_progs;
+		}
+	}
+
+#ifdef CONFIG_CGROUP_BPF
+	if (link->ops == &bpf_cgroup_link_lops) {
+		ret = cgroup_bpf_replace(link, old_prog, new_prog);
+		goto out_put_progs;
+	}
+#endif
+	ret = -EINVAL;
+
+out_put_progs:
+	if (old_prog)
+		bpf_prog_put(old_prog);
+	if (ret)
+		bpf_prog_put(new_prog);
+	return ret;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -3685,6 +3734,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_LINK_CREATE:
 		err = link_create(&attr);
 		break;
+	case BPF_LINK_UPDATE:
+		err = link_update(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 219624fba9ba..915dda3f7f19 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6317,6 +6317,33 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 	return ret;
 }
 
+int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
+		       struct bpf_prog *new_prog)
+{
+	struct bpf_cgroup_link *cg_link;
+	int ret;
+
+	if (link->ops != &bpf_cgroup_link_lops)
+		return -EINVAL;
+
+	cg_link = container_of(link, struct bpf_cgroup_link, link);
+
+	mutex_lock(&cgroup_mutex);
+	/* link might have been auto-released by dying cgroup, so fail */
+	if (!cg_link->cgroup) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	if (old_prog && link->prog != old_prog) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+	ret = __cgroup_bpf_replace(cg_link->cgroup, cg_link, new_prog);
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+	return ret;
+}
+
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment
  2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2020-03-25  6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
@ 2020-03-25  6:57 ` Andrii Nakryiko
  2020-03-25  6:57 ` [PATCH v2 bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko
  5 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to
create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API.

If expected_attach_type is not specified explicitly with
bpf_program__set_expected_attach_type(), libbpf will try to determine proper
attach type from BPF program's section definition.

Also add support for bpf_link's underlying BPF program replacement:
  - unconditional through high-level bpf_link__update_program() API;
  - cmpxchg-like with specifying expected current BPF program through
    low-level bpf_link_update() API.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/include/uapi/linux/bpf.h | 12 +++++++++
 tools/lib/bpf/bpf.c            | 35 ++++++++++++++++++++++++
 tools/lib/bpf/bpf.h            | 20 ++++++++++++++
 tools/lib/bpf/libbpf.c         | 49 ++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h         |  9 ++++++-
 tools/lib/bpf/libbpf.map       |  4 +++
 6 files changed, 128 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 948ebbfd401b..d7583483fca5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -112,6 +112,7 @@ enum bpf_cmd {
 	BPF_MAP_UPDATE_BATCH,
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
+	BPF_LINK_UPDATE,
 };
 
 enum bpf_map_type {
@@ -575,6 +576,17 @@ union bpf_attr {
 		__u32		attach_type;	/* attach type */
 		__u32		flags;		/* extra flags */
 	} link_create;
+
+	struct { /* struct used by BPF_LINK_UPDATE command */
+		__u32		link_fd;	/* link fd */
+		/* new program fd to update link with */
+		__u32		new_prog_fd;
+		__u32		flags;		/* extra flags */
+		/* expected link's program fd; is specified only if
+		 * BPF_F_REPLACE flag is set in flags */
+		__u32		old_prog_fd;
+	} link_update;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index c6dafe563176..b5eecb390c0c 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -584,6 +584,41 @@ int bpf_prog_detach2(int prog_fd, int target_fd, enum bpf_attach_type type)
 	return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
 }
 
+int bpf_link_create(int prog_fd, int target_fd,
+		    enum bpf_attach_type attach_type,
+		    const struct bpf_link_create_opts *opts)
+{
+	union bpf_attr attr;
+
+	if (!OPTS_VALID(opts, bpf_link_create_opts))
+		return -EINVAL;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.link_create.prog_fd = prog_fd;
+	attr.link_create.target_fd = target_fd;
+	attr.link_create.attach_type = attach_type;
+	attr.link_create.flags = OPTS_GET(opts, flags, 0);
+
+	return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
+}
+
+int bpf_link_update(int link_fd, int new_prog_fd,
+		    const struct bpf_link_update_opts *opts)
+{
+	union bpf_attr attr;
+
+	if (!OPTS_VALID(opts, bpf_link_update_opts))
+		return -EINVAL;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.link_update.link_fd = link_fd;
+	attr.link_update.new_prog_fd = new_prog_fd;
+	attr.link_update.flags = OPTS_GET(opts, flags, 0);
+	attr.link_update.old_prog_fd = OPTS_GET(opts, old_prog_fd, 0);
+
+	return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
+}
+
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
 {
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index b976e77316cc..880879f4e434 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -168,6 +168,26 @@ LIBBPF_API int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
 				enum bpf_attach_type type);
 
+struct bpf_link_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+};
+#define bpf_link_create_opts__last_field flags
+
+LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
+			       enum bpf_attach_type attach_type,
+			       const struct bpf_link_create_opts *opts);
+
+struct bpf_link_update_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;	   /* extra flags */
+	__u32 old_prog_fd; /* expected old program FD */
+};
+#define bpf_link_update_opts__last_field old_prog_fd
+
+LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
+			       const struct bpf_link_update_opts *opts);
+
 struct bpf_prog_test_run_attr {
 	int prog_fd;
 	int repeat;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 085e41f9b68e..09055fcb3c33 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6951,6 +6951,12 @@ struct bpf_link {
 	bool disconnected;
 };
 
+/* Replace link's underlying BPF program with the new one */
+int bpf_link__update_program(struct bpf_link *link, struct bpf_program *prog)
+{
+	return bpf_link_update(bpf_link__fd(link), bpf_program__fd(prog), NULL);
+}
+
 /* Release "ownership" of underlying BPF resource (typically, BPF program
  * attached to some BPF hook, e.g., tracepoint, kprobe, etc). Disconnected
  * link, when destructed through bpf_link__destroy() call won't attempt to
@@ -7489,6 +7495,49 @@ static struct bpf_link *attach_trace(const struct bpf_sec_def *sec,
 	return bpf_program__attach_trace(prog);
 }
 
+struct bpf_link *bpf_program__attach_cgroup(struct bpf_program *prog,
+					    int cgroup_fd, __u32 flags)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+	const struct bpf_sec_def *sec_def;
+	enum bpf_attach_type attach_type;
+	char errmsg[STRERR_BUFSIZE];
+	struct bpf_link *link;
+	int prog_fd, link_fd;
+
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("program '%s': can't attach before loaded\n",
+			bpf_program__title(prog, false));
+		return ERR_PTR(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	if (!attach_type) {
+		sec_def = find_sec_def(bpf_program__title(prog, false));
+		if (sec_def)
+			attach_type = sec_def->attach_type;
+	}
+	opts.flags = flags;
+	link_fd = bpf_link_create(prog_fd, cgroup_fd, attach_type, &opts);
+	if (link_fd < 0) {
+		link_fd = -errno;
+		free(link);
+		pr_warn("program '%s': failed to attach to cgroup: %s\n",
+			bpf_program__title(prog, false),
+			libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+		return ERR_PTR(link_fd);
+	}
+	link->fd = link_fd;
+	return link;
+}
+
 struct bpf_link *bpf_program__attach(struct bpf_program *prog)
 {
 	const struct bpf_sec_def *sec_def;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d38d7a629417..38288e8709b6 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -224,6 +224,8 @@ LIBBPF_API int bpf_link__fd(const struct bpf_link *link);
 LIBBPF_API const char *bpf_link__pin_path(const struct bpf_link *link);
 LIBBPF_API int bpf_link__pin(struct bpf_link *link, const char *path);
 LIBBPF_API int bpf_link__unpin(struct bpf_link *link);
+LIBBPF_API int bpf_link__update_program(struct bpf_link *link,
+					struct bpf_program *prog);
 LIBBPF_API void bpf_link__disconnect(struct bpf_link *link);
 LIBBPF_API int bpf_link__destroy(struct bpf_link *link);
 
@@ -245,11 +247,16 @@ bpf_program__attach_tracepoint(struct bpf_program *prog,
 LIBBPF_API struct bpf_link *
 bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
 				   const char *tp_name);
-
 LIBBPF_API struct bpf_link *
 bpf_program__attach_trace(struct bpf_program *prog);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd,
+			   __u32 flags);
+
 struct bpf_map;
+
 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map);
+
 struct bpf_insn;
 
 /*
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 5129283c0284..793c5af07b23 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -243,5 +243,9 @@ LIBBPF_0.0.8 {
 		bpf_link__pin;
 		bpf_link__pin_path;
 		bpf_link__unpin;
+		bpf_link__update_program;
+		bpf_link_create;
+		bpf_link_update;
+		bpf_program__attach_cgroup;
 		bpf_program__set_attach_target;
 } LIBBPF_0.0.7;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 bpf-next 6/6] selftests/bpf: test FD-based cgroup attachment
  2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2020-03-25  6:57 ` [PATCH v2 bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
@ 2020-03-25  6:57 ` Andrii Nakryiko
  5 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-25  6:57 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, rdna
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add selftests to exercise FD-based cgroup BPF program attachments and their
intermixing with legacy cgroup BPF attachments. Auto-detachment and program
replacement (both unconditional and cmpxchng-like) are tested as well.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/prog_tests/cgroup_link.c    | 235 ++++++++++++++++++
 .../selftests/bpf/progs/test_cgroup_link.c    |  24 ++
 2 files changed, 259 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_link.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_cgroup_link.c

diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_link.c b/tools/testing/selftests/bpf/prog_tests/cgroup_link.c
new file mode 100644
index 000000000000..2076a9861f74
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_link.c
@@ -0,0 +1,235 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "cgroup_helpers.h"
+#include "test_cgroup_link.skel.h"
+
+static __u32 duration = 0;
+#define PING_CMD	"ping -q -c1 -w1 127.0.0.1 > /dev/null"
+
+static struct test_cgroup_link *skel = NULL;
+
+int ping_and_check(int exp_calls, int exp_alt_calls)
+{
+	skel->bss->calls = 0;
+	skel->bss->alt_calls = 0;
+	CHECK_FAIL(system(PING_CMD));
+	if (CHECK(skel->bss->calls != exp_calls, "call_cnt",
+		  "exp %d, got %d\n", exp_calls, skel->bss->calls))
+		return -EINVAL;
+	if (CHECK(skel->bss->alt_calls != exp_alt_calls, "alt_call_cnt",
+		  "exp %d, got %d\n", exp_alt_calls, skel->bss->alt_calls))
+		return -EINVAL;
+	return 0;
+}
+
+void test_cgroup_link(void)
+{
+	struct {
+		const char *path;
+		int fd;
+	} cgs[] = {
+		{ "/cg1" },
+		{ "/cg1/cg2" },
+		{ "/cg1/cg2/cg3" },
+	};
+	int last_cg = ARRAY_SIZE(cgs) - 1, cg_nr = ARRAY_SIZE(cgs);
+	DECLARE_LIBBPF_OPTS(bpf_link_update_opts, link_upd_opts);
+	struct bpf_link *links[ARRAY_SIZE(cgs)] = {}, *tmp_link;
+	__u32 prog_ids[5], prog_cnt = 0, attach_flags;
+	int i = 0, err, prog_fd;
+	bool detach_legacy = false;
+
+	skel = test_cgroup_link__open_and_load();
+	if (CHECK(!skel, "skel_open_load", "failed to open/load skeleton\n"))
+		return;
+	prog_fd = bpf_program__fd(skel->progs.egress);
+
+	err = setup_cgroup_environment();
+	if (CHECK(err, "cg_init", "failed: %d\n", err))
+		goto cleanup;
+
+	for (i = 0; i < cg_nr; i++) {
+		cgs[i].fd = create_and_get_cgroup(cgs[i].path);
+		if (CHECK(cgs[i].fd < 0, "cg_create", "fail: %d\n", cgs[i].fd))
+			goto cleanup;
+	}
+
+	err = join_cgroup(cgs[last_cg].path);
+	if (CHECK(err, "cg_join", "fail: %d\n", err))
+		goto cleanup;
+
+	for (i = 0; i < cg_nr; i++) {
+		links[i] = bpf_program__attach_cgroup(skel->progs.egress,
+						      cgs[i].fd,
+						      BPF_F_ALLOW_MULTI);
+		if (CHECK(IS_ERR(links[i]), "cg_attach", "i: %d, err: %ld\n",
+				 i, PTR_ERR(links[i])))
+			goto cleanup;
+	}
+
+	ping_and_check(cg_nr, 0);
+
+	/* query the number of effective progs in last cg */
+	err = bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS,
+			     BPF_F_QUERY_EFFECTIVE, NULL, NULL,
+			     &prog_cnt);
+	CHECK_FAIL(err);
+	if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n",
+		  cg_nr, prog_cnt))
+		goto cleanup;
+	/* query the effective prog IDs in last cg */
+	err = bpf_prog_query(cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS,
+			     BPF_F_QUERY_EFFECTIVE, &attach_flags,
+			     prog_ids, &prog_cnt);
+	CHECK_FAIL(err);
+	if (CHECK(prog_cnt != cg_nr, "effect_cnt", "exp %d, got %d\n",
+		  cg_nr, prog_cnt))
+		goto cleanup;
+	CHECK_FAIL(attach_flags != BPF_F_ALLOW_MULTI);
+	for (i = 1; i < prog_cnt; i++) {
+		CHECK(prog_ids[i - 1] != prog_ids[i], "prod_id_check",
+		      "idx %d, prev id %d, cur id %d\n",
+		      i, prog_ids[i - 1], prog_ids[i]);
+	}
+
+	/* detach bottom program and ping again */
+	bpf_link__destroy(links[last_cg]);
+	links[last_cg] = NULL;
+
+	ping_and_check(cg_nr - 1, 0);
+
+	/* mix in with non link-based multi-attachments */
+	err = bpf_prog_attach(prog_fd, cgs[last_cg].fd,
+			      BPF_CGROUP_INET_EGRESS, BPF_F_ALLOW_MULTI);
+	if (CHECK(err, "cg_attach_legacy", "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = true;
+
+	links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress,
+						    cgs[last_cg].fd,
+						    BPF_F_ALLOW_MULTI);
+	if (CHECK(IS_ERR(links[last_cg]), "cg_attach", "err: %ld\n",
+		  PTR_ERR(links[last_cg])))
+		goto cleanup;
+
+	/* attempt to mix in with exclusive bpf_link */
+	tmp_link = bpf_program__attach_cgroup(skel->progs.egress,
+					      cgs[last_cg].fd,
+					      BPF_F_ALLOW_OVERRIDE);
+	if (CHECK(!IS_ERR(tmp_link), "cg_attach_fail", "unexpected success!\n")) {
+		bpf_link__destroy(tmp_link);
+		goto cleanup;
+	}
+
+	ping_and_check(cg_nr + 1, 0);
+
+	/* detach */
+	bpf_link__destroy(links[last_cg]);
+	links[last_cg] = NULL;
+
+	/* detach legacy */
+	err = bpf_prog_detach2(prog_fd, cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS);
+	if (CHECK(err, "cg_detach_legacy", "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = false;
+
+	/* attach legacy exclusive prog attachment */
+	err = bpf_prog_attach(prog_fd, cgs[last_cg].fd,
+			      BPF_CGROUP_INET_EGRESS, 0);
+	if (CHECK(err, "cg_attach_exclusive", "errno=%d\n", errno))
+		goto cleanup;
+	detach_legacy = true;
+
+	/* replace legacy exlusive prog with exclusive bpf_link */
+	links[last_cg] = bpf_program__attach_cgroup(skel->progs.egress,
+						    cgs[last_cg].fd,
+						    0);
+	if (CHECK(IS_ERR(links[last_cg]), "cg_replace", "err: %ld\n",
+		  PTR_ERR(links[last_cg])))
+		goto cleanup;
+	detach_legacy = false;
+
+	ping_and_check(cg_nr, 0);
+
+	/* detach */
+	err = bpf_prog_detach2(prog_fd, cgs[last_cg].fd, BPF_CGROUP_INET_EGRESS);
+	if (CHECK(!err, "cg_detach_legacy", "unexpected success!\n"))
+		goto cleanup;
+
+	/* attempt to mix in with multi-attach bpf_link */
+	tmp_link = bpf_program__attach_cgroup(skel->progs.egress,
+					      cgs[last_cg].fd,
+					      BPF_F_ALLOW_OVERRIDE);
+	if (CHECK(!IS_ERR(tmp_link), "cg_attach_fail", "unexpected success!\n")) {
+		bpf_link__destroy(tmp_link);
+		goto cleanup;
+	}
+
+	ping_and_check(cg_nr, 0);
+
+	/* replace BPF programs inside their links for all but first link */
+	for (i = 1; i < cg_nr; i++) {
+		err = bpf_link__update_program(links[i], skel->progs.egress_alt);
+		if (CHECK(err, "prog_upd", "link #%d\n", i))
+			goto cleanup;
+	}
+
+	ping_and_check(1, cg_nr - 1);
+
+	/* Attempt program update with wrong expected BPF program */
+	link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress_alt);
+	link_upd_opts.flags = BPF_F_REPLACE;
+	err = bpf_link_update(bpf_link__fd(links[0]),
+			      bpf_program__fd(skel->progs.egress_alt),
+			      &link_upd_opts);
+	if (CHECK(err == 0 || errno != EPERM, "prog_cmpxchg1",
+		  "unexpectedly succeeded, err %d, errno %d\n", err, -errno))
+		goto cleanup;
+
+	/* Compare-exchange single link program from egress to egress_alt */
+	link_upd_opts.old_prog_fd = bpf_program__fd(skel->progs.egress);
+	link_upd_opts.flags = BPF_F_REPLACE;
+	err = bpf_link_update(bpf_link__fd(links[0]),
+			      bpf_program__fd(skel->progs.egress_alt),
+			      &link_upd_opts);
+	if (CHECK(err, "prog_cmpxchg2", "errno %d\n", -errno))
+		goto cleanup;
+
+	/* ping */
+	ping_and_check(0, cg_nr);
+
+	/* close cgroup FDs before detaching links */
+	for (i = 0; i < cg_nr; i++) {
+		if (cgs[i].fd > 0) {
+			close(cgs[i].fd);
+			cgs[i].fd = -1;
+		}
+	}
+
+	/* BPF programs should still get called */
+	ping_and_check(0, cg_nr);
+
+	/* leave cgroup and remove them, don't detach programs */
+	cleanup_cgroup_environment();
+
+	/* BPF programs should have been auto-detached */
+	ping_and_check(0, 0);
+
+cleanup:
+	if (detach_legacy)
+		bpf_prog_detach2(prog_fd, cgs[last_cg].fd,
+				 BPF_CGROUP_INET_EGRESS);
+
+	for (i = 0; i < cg_nr; i++) {
+		if (!IS_ERR(links[i]))
+			bpf_link__destroy(links[i]);
+	}
+	test_cgroup_link__destroy(skel);
+
+	for (i = 0; i < cg_nr; i++) {
+		if (cgs[i].fd > 0)
+			close(cgs[i].fd);
+	}
+	cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_cgroup_link.c b/tools/testing/selftests/bpf/progs/test_cgroup_link.c
new file mode 100644
index 000000000000..77e47b9e4446
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_cgroup_link.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+int calls = 0;
+int alt_calls = 0;
+
+SEC("cgroup_skb/egress1")
+int egress(struct __sk_buff *skb)
+{
+	__sync_fetch_and_add(&calls, 1);
+	return 1;
+}
+
+SEC("cgroup_skb/egress2")
+int egress_alt(struct __sk_buff *skb)
+{
+	__sync_fetch_and_add(&alt_calls, 1);
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-25  6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
@ 2020-03-25 22:57   ` kbuild test robot
  2020-03-26  1:28     ` Andrii Nakryiko
  2020-03-26  0:45   ` kbuild test robot
  2020-03-26 23:35   ` Alexei Starovoitov
  2 siblings, 1 reply; 14+ messages in thread
From: kbuild test robot @ 2020-03-25 22:57 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: kbuild-all, bpf, netdev, ast, daniel, rdna

[-- Attachment #1: Type: text/plain, Size: 2550 bytes --]

Hi Andrii,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]
[cannot apply to bpf/master cgroup/for-next v5.6-rc7 next-20200325]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Add-support-for-cgroup-bpf_link/20200326-055942
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/cgroup-defs.h:22:0,
                    from include/linux/cgroup.h:28,
                    from include/linux/memcontrol.h:13,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/x86/kernel/asm-offsets.c:13:
>> include/linux/bpf-cgroup.h:380:45: warning: 'struct bpf_link' declared inside parameter list will not be visible outside of this definition or declaration
    static inline int cgroup_bpf_replace(struct bpf_link *link,
                                                ^~~~~~~~
--
   In file included from include/linux/cgroup-defs.h:22:0,
                    from include/linux/cgroup.h:28,
                    from include/linux/memcontrol.h:13,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/x86/kernel/asm-offsets.c:13:
>> include/linux/bpf-cgroup.h:380:45: warning: 'struct bpf_link' declared inside parameter list will not be visible outside of this definition or declaration
    static inline int cgroup_bpf_replace(struct bpf_link *link,
                                                ^~~~~~~~
   20 real  6 user  8 sys  71.33% cpu 	make prepare

vim +380 include/linux/bpf-cgroup.h

   379	
 > 380	static inline int cgroup_bpf_replace(struct bpf_link *link,
   381					     struct bpf_prog *old_prog,
   382					     struct bpf_prog *new_prog)
   383	{
   384		return -EINVAL;
   385	}
   386	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7246 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-25  6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
  2020-03-25 22:57   ` kbuild test robot
@ 2020-03-26  0:45   ` kbuild test robot
  2020-03-26 23:35   ` Alexei Starovoitov
  2 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2020-03-26  0:45 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: kbuild-all, bpf, netdev, ast, daniel, rdna

[-- Attachment #1: Type: text/plain, Size: 2751 bytes --]

Hi Andrii,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]
[cannot apply to bpf/master cgroup/for-next v5.6-rc7 next-20200325]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Add-support-for-cgroup-bpf_link/20200326-055942
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: mips-fuloong2e_defconfig (attached as .config)
compiler: mips64el-linux-gcc (GCC) 5.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=5.5.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/cgroup-defs.h:22:0,
                    from include/linux/cgroup.h:28,
                    from include/linux/memcontrol.h:13,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/mips/kernel/asm-offsets.c:17:
>> include/linux/bpf-cgroup.h:382:17: warning: 'struct bpf_link' declared inside parameter list
             struct bpf_prog *new_prog)
                    ^
>> include/linux/bpf-cgroup.h:382:17: warning: its scope is only this definition or declaration, which is probably not what you want
--
   In file included from include/linux/cgroup-defs.h:22:0,
                    from include/linux/cgroup.h:28,
                    from include/linux/memcontrol.h:13,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/mips/kernel/asm-offsets.c:17:
>> include/linux/bpf-cgroup.h:382:17: warning: 'struct bpf_link' declared inside parameter list
             struct bpf_prog *new_prog)
                    ^
>> include/linux/bpf-cgroup.h:382:17: warning: its scope is only this definition or declaration, which is probably not what you want
   20 real  6 user  9 sys  75.80% cpu 	make prepare

vim +382 include/linux/bpf-cgroup.h

   379	
   380	static inline int cgroup_bpf_replace(struct bpf_link *link,
   381					     struct bpf_prog *old_prog,
 > 382					     struct bpf_prog *new_prog)
   383	{
   384		return -EINVAL;
   385	}
   386	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19209 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-25 22:57   ` kbuild test robot
@ 2020-03-26  1:28     ` Andrii Nakryiko
  0 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-26  1:28 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrey Ignatov

On Wed, Mar 25, 2020 at 3:58 PM kbuild test robot <lkp@intel.com> wrote:
>
> Hi Andrii,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on bpf-next/master]
> [cannot apply to bpf/master cgroup/for-next v5.6-rc7 next-20200325]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
>
> url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/Add-support-for-cgroup-bpf_link/20200326-055942
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: i386-tinyconfig (attached as .config)
> compiler: gcc-7 (Debian 7.5.0-5) 7.5.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
>    In file included from include/linux/cgroup-defs.h:22:0,
>                     from include/linux/cgroup.h:28,
>                     from include/linux/memcontrol.h:13,
>                     from include/linux/swap.h:9,
>                     from include/linux/suspend.h:5,
>                     from arch/x86/kernel/asm-offsets.c:13:
> >> include/linux/bpf-cgroup.h:380:45: warning: 'struct bpf_link' declared inside parameter list will not be visible outside of this definition or declaration
>     static inline int cgroup_bpf_replace(struct bpf_link *link,
>                                                 ^~~~~~~~
> --
>    In file included from include/linux/cgroup-defs.h:22:0,
>                     from include/linux/cgroup.h:28,
>                     from include/linux/memcontrol.h:13,
>                     from include/linux/swap.h:9,
>                     from include/linux/suspend.h:5,
>                     from arch/x86/kernel/asm-offsets.c:13:
> >> include/linux/bpf-cgroup.h:380:45: warning: 'struct bpf_link' declared inside parameter list will not be visible outside of this definition or declaration
>     static inline int cgroup_bpf_replace(struct bpf_link *link,
>                                                 ^~~~~~~~
>    20 real  6 user  8 sys  71.33% cpu   make prepare
>
> vim +380 include/linux/bpf-cgroup.h
>
>    379

It's a matter of forward declaring struct bpf_link here. Will fix in
v3, but I'll wait a bit for any feedback before updating.

>  > 380  static inline int cgroup_bpf_replace(struct bpf_link *link,
>    381                                       struct bpf_prog *old_prog,
>    382                                       struct bpf_prog *new_prog)
>    383  {
>    384          return -EINVAL;
>    385  }
>    386
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-25  6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
  2020-03-25 22:57   ` kbuild test robot
  2020-03-26  0:45   ` kbuild test robot
@ 2020-03-26 23:35   ` Alexei Starovoitov
  2020-03-26 23:59     ` Andrii Nakryiko
  2 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2020-03-26 23:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, ast, daniel, rdna, andrii.nakryiko, kernel-team

On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote:
>  
> +/* Swap updated BPF program for given link in effective program arrays across
> + * all descendant cgroups. This function is guaranteed to succeed.
> + */
> +static void replace_effective_prog(struct cgroup *cgrp,
> +				   enum bpf_attach_type type,
> +				   struct bpf_cgroup_link *link)
> +{
> +	struct bpf_prog_array_item *item;
> +	struct cgroup_subsys_state *css;
> +	struct bpf_prog_array *progs;
> +	struct bpf_prog_list *pl;
> +	struct list_head *head;
> +	struct cgroup *cg;
> +	int pos;
> +
> +	css_for_each_descendant_pre(css, &cgrp->self) {
> +		struct cgroup *desc = container_of(css, struct cgroup, self);
> +
> +		if (percpu_ref_is_zero(&desc->bpf.refcnt))
> +			continue;
> +
> +		/* found position of link in effective progs array */
> +		for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> +			if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> +				continue;
> +
> +			head = &cg->bpf.progs[type];
> +			list_for_each_entry(pl, head, node) {
> +				if (!prog_list_prog(pl))
> +					continue;
> +				if (pl->link == link)
> +					goto found;
> +				pos++;
> +			}
> +		}
> +found:
> +		BUG_ON(!cg);
> +		progs = rcu_dereference_protected(
> +				desc->bpf.effective[type],
> +				lockdep_is_held(&cgroup_mutex));
> +		item = &progs->items[pos];
> +		WRITE_ONCE(item->prog, link->link.prog);
> +	}
> +}
> +
> +/**
> + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> + *                          to descendants
> + * @cgrp: The cgroup which descendants to traverse
> + * @link: A link for which to replace BPF program
> + * @type: Type of attach operation
> + *
> + * Must be called with cgroup_mutex held.
> + */
> +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> +			 struct bpf_prog *new_prog)
> +{
> +	struct list_head *progs = &cgrp->bpf.progs[link->type];
> +	struct bpf_prog *old_prog;
> +	struct bpf_prog_list *pl;
> +	bool found = false;
> +
> +	if (link->link.prog->type != new_prog->type)
> +		return -EINVAL;
> +
> +	list_for_each_entry(pl, progs, node) {
> +		if (pl->link == link) {
> +			found = true;
> +			break;
> +		}
> +	}
> +	if (!found)
> +		return -ENOENT;
> +
> +	old_prog = xchg(&link->link.prog, new_prog);
> +	replace_effective_prog(cgrp, link->type, link);

I think with 'found = true' in this function you're assuming that it will be
found in replace_effective_prog() ? I don't think that's the case.
Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup
with another link and then try to LINK_UPDATE the former. The link is there,
but the prog is not executing and it's not in effective array. What LINK_UPDATE
suppose to do? I guess it should succeed?
Even trickier that the prog will be in effective array in some of
css_for_each_descendant_pre() and not in others. This cgroup attach semantics
were convoluted from the day one. Apparently people use all three variants now,
but I wouldn't bet that everyone understands it.
Hence my proposal to support F_ALLOW_MULTI for links only. At least initially.
It's so much simpler to explain. And owning bpf_link will guarantee that the
prog is executing (unless cgroup is removed and sockets are closed). I guess
default (no-override) is acceptable to bpf_link as well and in that sense it
will be very similar to XDP with single prog attached. So I think I can live
with default and ALLOW_MULTI for now. But we should probably redesign
overriding capabilities. Folks need to attach multiple progs to a given cgroup
and disallow all progs in children. Currently it's not possible to do, since
MULTI in the parent allows at least one (default, override or multi) in the
children. bpf_link inheriting this logic won't help to solve this use case. It
feels that link should stay as multi only and override or not in the children
should be a separate property. Probably not related to link at all. It fits
better as a cgroup permission.

Anyhow I'm going to apply patches 1 and 2, since they are good cleanup
regardless of what we decide here.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-26 23:35   ` Alexei Starovoitov
@ 2020-03-26 23:59     ` Andrii Nakryiko
  2020-03-27  0:34       ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-26 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrey Ignatov, Kernel Team

On Thu, Mar 26, 2020 at 4:35 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote:
> >
> > +/* Swap updated BPF program for given link in effective program arrays across
> > + * all descendant cgroups. This function is guaranteed to succeed.
> > + */
> > +static void replace_effective_prog(struct cgroup *cgrp,
> > +                                enum bpf_attach_type type,
> > +                                struct bpf_cgroup_link *link)
> > +{
> > +     struct bpf_prog_array_item *item;
> > +     struct cgroup_subsys_state *css;
> > +     struct bpf_prog_array *progs;
> > +     struct bpf_prog_list *pl;
> > +     struct list_head *head;
> > +     struct cgroup *cg;
> > +     int pos;
> > +
> > +     css_for_each_descendant_pre(css, &cgrp->self) {
> > +             struct cgroup *desc = container_of(css, struct cgroup, self);
> > +
> > +             if (percpu_ref_is_zero(&desc->bpf.refcnt))
> > +                     continue;
> > +
> > +             /* found position of link in effective progs array */
> > +             for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> > +                     if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> > +                             continue;
> > +
> > +                     head = &cg->bpf.progs[type];
> > +                     list_for_each_entry(pl, head, node) {
> > +                             if (!prog_list_prog(pl))
> > +                                     continue;
> > +                             if (pl->link == link)
> > +                                     goto found;
> > +                             pos++;
> > +                     }
> > +             }
> > +found:
> > +             BUG_ON(!cg);
> > +             progs = rcu_dereference_protected(
> > +                             desc->bpf.effective[type],
> > +                             lockdep_is_held(&cgroup_mutex));
> > +             item = &progs->items[pos];
> > +             WRITE_ONCE(item->prog, link->link.prog);
> > +     }
> > +}
> > +
> > +/**
> > + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> > + *                          to descendants
> > + * @cgrp: The cgroup which descendants to traverse
> > + * @link: A link for which to replace BPF program
> > + * @type: Type of attach operation
> > + *
> > + * Must be called with cgroup_mutex held.
> > + */
> > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > +                      struct bpf_prog *new_prog)
> > +{
> > +     struct list_head *progs = &cgrp->bpf.progs[link->type];
> > +     struct bpf_prog *old_prog;
> > +     struct bpf_prog_list *pl;
> > +     bool found = false;
> > +
> > +     if (link->link.prog->type != new_prog->type)
> > +             return -EINVAL;
> > +
> > +     list_for_each_entry(pl, progs, node) {
> > +             if (pl->link == link) {
> > +                     found = true;
> > +                     break;
> > +             }
> > +     }
> > +     if (!found)
> > +             return -ENOENT;
> > +
> > +     old_prog = xchg(&link->link.prog, new_prog);
> > +     replace_effective_prog(cgrp, link->type, link);
>
> I think with 'found = true' in this function you're assuming that it will be
> found in replace_effective_prog() ? I don't think that's the case.
> Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup
> with another link and then try to LINK_UPDATE the former. The link is there,
> but the prog is not executing and it's not in effective array. What LINK_UPDATE
> suppose to do? I guess it should succeed?

Yes, this is a great catch! I should have used ALLOW_OVERRIDE at the
root cgroup level in my selftest, it would catch it immediately.

BUG_ON(!cg) in replace_effective_prog() is too aggressive, if I
replace it with `if (!cg) continue;` it will handle this as well.

> Even trickier that the prog will be in effective array in some of
> css_for_each_descendant_pre() and not in others. This cgroup attach semantics
> were convoluted from the day one. Apparently people use all three variants now,
> but I wouldn't bet that everyone understands it.

Agree about convoluted logic, spent enormous time understanding and
modifying it :)

But apart from BUG_ON(!cg) problem, everything works because each
level of hierarchy is treated independently in
replace_effective_prog(). Search for attached link on each level is
reset and performed anew. If found - we replace program, if not - must
be ALLOW_OVERRIDE case, i.e., actually overridden link.

> Hence my proposal to support F_ALLOW_MULTI for links only. At least initially.
> It's so much simpler to explain. And owning bpf_link will guarantee that the
> prog is executing (unless cgroup is removed and sockets are closed). I guess
> default (no-override) is acceptable to bpf_link as well and in that sense it
> will be very similar to XDP with single prog attached. So I think I can live
> with default and ALLOW_MULTI for now. But we should probably redesign
> overriding capabilities. Folks need to attach multiple progs to a given cgroup
> and disallow all progs in children. Currently it's not possible to do, since
> MULTI in the parent allows at least one (default, override or multi) in the
> children. bpf_link inheriting this logic won't help to solve this use case. It
> feels that link should stay as multi only and override or not in the children
> should be a separate property. Probably not related to link at all. It fits
> better as a cgroup permission.

Yeah, we had a brief discussion with Andrey on mailing list. Not sure
what the solution looks like, but it should be orthogonal to link/prog
attachment operation, probably.

If you insist and Andrey is ok with dropping ALLOW_OVERRIDE, it's
easy. But fixing the logic to handle it is also easy. So are we sure
about supporting 2 out of 3 existing modes? :)

>
> Anyhow I'm going to apply patches 1 and 2, since they are good cleanup
> regardless of what we decide here.

Thanks, will rebase on top of bpf-next master for v3.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-26 23:59     ` Andrii Nakryiko
@ 2020-03-27  0:34       ` Alexei Starovoitov
  2020-03-27  0:55         ` Andrii Nakryiko
  0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2020-03-27  0:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrey Ignatov, Kernel Team

On Thu, Mar 26, 2020 at 04:59:06PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 26, 2020 at 4:35 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote:
> > >
> > > +/* Swap updated BPF program for given link in effective program arrays across
> > > + * all descendant cgroups. This function is guaranteed to succeed.
> > > + */
> > > +static void replace_effective_prog(struct cgroup *cgrp,
> > > +                                enum bpf_attach_type type,
> > > +                                struct bpf_cgroup_link *link)
> > > +{
> > > +     struct bpf_prog_array_item *item;
> > > +     struct cgroup_subsys_state *css;
> > > +     struct bpf_prog_array *progs;
> > > +     struct bpf_prog_list *pl;
> > > +     struct list_head *head;
> > > +     struct cgroup *cg;
> > > +     int pos;
> > > +
> > > +     css_for_each_descendant_pre(css, &cgrp->self) {
> > > +             struct cgroup *desc = container_of(css, struct cgroup, self);
> > > +
> > > +             if (percpu_ref_is_zero(&desc->bpf.refcnt))
> > > +                     continue;
> > > +
> > > +             /* found position of link in effective progs array */
> > > +             for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> > > +                     if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> > > +                             continue;
> > > +
> > > +                     head = &cg->bpf.progs[type];
> > > +                     list_for_each_entry(pl, head, node) {
> > > +                             if (!prog_list_prog(pl))
> > > +                                     continue;
> > > +                             if (pl->link == link)
> > > +                                     goto found;
> > > +                             pos++;
> > > +                     }
> > > +             }
> > > +found:
> > > +             BUG_ON(!cg);
> > > +             progs = rcu_dereference_protected(
> > > +                             desc->bpf.effective[type],
> > > +                             lockdep_is_held(&cgroup_mutex));
> > > +             item = &progs->items[pos];
> > > +             WRITE_ONCE(item->prog, link->link.prog);
> > > +     }
> > > +}
> > > +
> > > +/**
> > > + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> > > + *                          to descendants
> > > + * @cgrp: The cgroup which descendants to traverse
> > > + * @link: A link for which to replace BPF program
> > > + * @type: Type of attach operation
> > > + *
> > > + * Must be called with cgroup_mutex held.
> > > + */
> > > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > > +                      struct bpf_prog *new_prog)
> > > +{
> > > +     struct list_head *progs = &cgrp->bpf.progs[link->type];
> > > +     struct bpf_prog *old_prog;
> > > +     struct bpf_prog_list *pl;
> > > +     bool found = false;
> > > +
> > > +     if (link->link.prog->type != new_prog->type)
> > > +             return -EINVAL;
> > > +
> > > +     list_for_each_entry(pl, progs, node) {
> > > +             if (pl->link == link) {
> > > +                     found = true;
> > > +                     break;
> > > +             }
> > > +     }
> > > +     if (!found)
> > > +             return -ENOENT;
> > > +
> > > +     old_prog = xchg(&link->link.prog, new_prog);
> > > +     replace_effective_prog(cgrp, link->type, link);
> >
> > I think with 'found = true' in this function you're assuming that it will be
> > found in replace_effective_prog() ? I don't think that's the case.
> > Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup
> > with another link and then try to LINK_UPDATE the former. The link is there,
> > but the prog is not executing and it's not in effective array. What LINK_UPDATE
> > suppose to do? I guess it should succeed?
> 
> Yes, this is a great catch! I should have used ALLOW_OVERRIDE at the
> root cgroup level in my selftest, it would catch it immediately.
> 
> BUG_ON(!cg) in replace_effective_prog() is too aggressive, if I
> replace it with `if (!cg) continue;` it will handle this as well.
> 
> > Even trickier that the prog will be in effective array in some of
> > css_for_each_descendant_pre() and not in others. This cgroup attach semantics
> > were convoluted from the day one. Apparently people use all three variants now,
> > but I wouldn't bet that everyone understands it.
> 
> Agree about convoluted logic, spent enormous time understanding and
> modifying it :)
> 
> But apart from BUG_ON(!cg) problem, everything works because each
> level of hierarchy is treated independently in
> replace_effective_prog(). Search for attached link on each level is
> reset and performed anew. If found - we replace program, if not - must
> be ALLOW_OVERRIDE case, i.e., actually overridden link.
> 
> > Hence my proposal to support F_ALLOW_MULTI for links only. At least initially.
> > It's so much simpler to explain. And owning bpf_link will guarantee that the
> > prog is executing (unless cgroup is removed and sockets are closed). I guess
> > default (no-override) is acceptable to bpf_link as well and in that sense it
> > will be very similar to XDP with single prog attached. So I think I can live
> > with default and ALLOW_MULTI for now. But we should probably redesign
> > overriding capabilities. Folks need to attach multiple progs to a given cgroup
> > and disallow all progs in children. Currently it's not possible to do, since
> > MULTI in the parent allows at least one (default, override or multi) in the
> > children. bpf_link inheriting this logic won't help to solve this use case. It
> > feels that link should stay as multi only and override or not in the children
> > should be a separate property. Probably not related to link at all. It fits
> > better as a cgroup permission.
> 
> Yeah, we had a brief discussion with Andrey on mailing list. Not sure
> what the solution looks like, but it should be orthogonal to link/prog
> attachment operation, probably.
> 
> If you insist and Andrey is ok with dropping ALLOW_OVERRIDE, it's
> easy. But fixing the logic to handle it is also easy. So are we sure
> about supporting 2 out of 3 existing modes? :)

I wasn't clear enough. My preference is only multi for bpf_link with a concrete
plan how cgroup permissions can do no-override, selective override, and
whatever else container folks need.
I can imagine somebody may want to attach bind/connect at outer container level
and disallow this specific attach_type for children while allowing other
cgroup-bpf prog types in inner containers. There is no way to do so now and
flags for bpf_link is not the answer either.

> Thanks, will rebase on top of bpf-next master for v3.

please wait with repost until this discussion settles.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
  2020-03-27  0:34       ` Alexei Starovoitov
@ 2020-03-27  0:55         ` Andrii Nakryiko
  0 siblings, 0 replies; 14+ messages in thread
From: Andrii Nakryiko @ 2020-03-27  0:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Andrey Ignatov, Kernel Team

On Thu, Mar 26, 2020 at 5:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 26, 2020 at 04:59:06PM -0700, Andrii Nakryiko wrote:
> > On Thu, Mar 26, 2020 at 4:35 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote:
> > > >
> > > > +/* Swap updated BPF program for given link in effective program arrays across
> > > > + * all descendant cgroups. This function is guaranteed to succeed.
> > > > + */
> > > > +static void replace_effective_prog(struct cgroup *cgrp,
> > > > +                                enum bpf_attach_type type,
> > > > +                                struct bpf_cgroup_link *link)
> > > > +{
> > > > +     struct bpf_prog_array_item *item;
> > > > +     struct cgroup_subsys_state *css;
> > > > +     struct bpf_prog_array *progs;
> > > > +     struct bpf_prog_list *pl;
> > > > +     struct list_head *head;
> > > > +     struct cgroup *cg;
> > > > +     int pos;
> > > > +
> > > > +     css_for_each_descendant_pre(css, &cgrp->self) {
> > > > +             struct cgroup *desc = container_of(css, struct cgroup, self);
> > > > +
> > > > +             if (percpu_ref_is_zero(&desc->bpf.refcnt))
> > > > +                     continue;
> > > > +
> > > > +             /* found position of link in effective progs array */
> > > > +             for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> > > > +                     if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> > > > +                             continue;
> > > > +
> > > > +                     head = &cg->bpf.progs[type];
> > > > +                     list_for_each_entry(pl, head, node) {
> > > > +                             if (!prog_list_prog(pl))
> > > > +                                     continue;
> > > > +                             if (pl->link == link)
> > > > +                                     goto found;
> > > > +                             pos++;
> > > > +                     }
> > > > +             }
> > > > +found:
> > > > +             BUG_ON(!cg);
> > > > +             progs = rcu_dereference_protected(
> > > > +                             desc->bpf.effective[type],
> > > > +                             lockdep_is_held(&cgroup_mutex));
> > > > +             item = &progs->items[pos];
> > > > +             WRITE_ONCE(item->prog, link->link.prog);
> > > > +     }
> > > > +}
> > > > +
> > > > +/**
> > > > + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> > > > + *                          to descendants
> > > > + * @cgrp: The cgroup which descendants to traverse
> > > > + * @link: A link for which to replace BPF program
> > > > + * @type: Type of attach operation
> > > > + *
> > > > + * Must be called with cgroup_mutex held.
> > > > + */
> > > > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > > > +                      struct bpf_prog *new_prog)
> > > > +{
> > > > +     struct list_head *progs = &cgrp->bpf.progs[link->type];
> > > > +     struct bpf_prog *old_prog;
> > > > +     struct bpf_prog_list *pl;
> > > > +     bool found = false;
> > > > +
> > > > +     if (link->link.prog->type != new_prog->type)
> > > > +             return -EINVAL;
> > > > +
> > > > +     list_for_each_entry(pl, progs, node) {
> > > > +             if (pl->link == link) {
> > > > +                     found = true;
> > > > +                     break;
> > > > +             }
> > > > +     }
> > > > +     if (!found)
> > > > +             return -ENOENT;
> > > > +
> > > > +     old_prog = xchg(&link->link.prog, new_prog);
> > > > +     replace_effective_prog(cgrp, link->type, link);
> > >
> > > I think with 'found = true' in this function you're assuming that it will be
> > > found in replace_effective_prog() ? I don't think that's the case.
> > > Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup
> > > with another link and then try to LINK_UPDATE the former. The link is there,
> > > but the prog is not executing and it's not in effective array. What LINK_UPDATE
> > > suppose to do? I guess it should succeed?
> >
> > Yes, this is a great catch! I should have used ALLOW_OVERRIDE at the
> > root cgroup level in my selftest, it would catch it immediately.
> >
> > BUG_ON(!cg) in replace_effective_prog() is too aggressive, if I
> > replace it with `if (!cg) continue;` it will handle this as well.
> >
> > > Even trickier that the prog will be in effective array in some of
> > > css_for_each_descendant_pre() and not in others. This cgroup attach semantics
> > > were convoluted from the day one. Apparently people use all three variants now,
> > > but I wouldn't bet that everyone understands it.
> >
> > Agree about convoluted logic, spent enormous time understanding and
> > modifying it :)
> >
> > But apart from BUG_ON(!cg) problem, everything works because each
> > level of hierarchy is treated independently in
> > replace_effective_prog(). Search for attached link on each level is
> > reset and performed anew. If found - we replace program, if not - must
> > be ALLOW_OVERRIDE case, i.e., actually overridden link.
> >
> > > Hence my proposal to support F_ALLOW_MULTI for links only. At least initially.
> > > It's so much simpler to explain. And owning bpf_link will guarantee that the
> > > prog is executing (unless cgroup is removed and sockets are closed). I guess
> > > default (no-override) is acceptable to bpf_link as well and in that sense it
> > > will be very similar to XDP with single prog attached. So I think I can live
> > > with default and ALLOW_MULTI for now. But we should probably redesign
> > > overriding capabilities. Folks need to attach multiple progs to a given cgroup
> > > and disallow all progs in children. Currently it's not possible to do, since
> > > MULTI in the parent allows at least one (default, override or multi) in the
> > > children. bpf_link inheriting this logic won't help to solve this use case. It
> > > feels that link should stay as multi only and override or not in the children
> > > should be a separate property. Probably not related to link at all. It fits
> > > better as a cgroup permission.
> >
> > Yeah, we had a brief discussion with Andrey on mailing list. Not sure
> > what the solution looks like, but it should be orthogonal to link/prog
> > attachment operation, probably.
> >
> > If you insist and Andrey is ok with dropping ALLOW_OVERRIDE, it's
> > easy. But fixing the logic to handle it is also easy. So are we sure
> > about supporting 2 out of 3 existing modes? :)
>
> I wasn't clear enough. My preference is only multi for bpf_link with a concrete

Ah, ok, it certainly read as default + multi should be supported.
Alright, I'll drop NONE and OVERRIDE mode (so back to initial
version).

> plan how cgroup permissions can do no-override, selective override, and
> whatever else container folks need.
> I can imagine somebody may want to attach bind/connect at outer container level
> and disallow this specific attach_type for children while allowing other
> cgroup-bpf prog types in inner containers. There is no way to do so now and
> flags for bpf_link is not the answer either.
>
> > Thanks, will rebase on top of bpf-next master for v3.
>
> please wait with repost until this discussion settles.

Sure, will do.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-27  0:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-25  6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
2020-03-25  6:57 ` [PATCH v2 bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
2020-03-25  6:57 ` [PATCH v2 bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
2020-03-25  6:57 ` [PATCH v2 bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
2020-03-25  6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
2020-03-25 22:57   ` kbuild test robot
2020-03-26  1:28     ` Andrii Nakryiko
2020-03-26  0:45   ` kbuild test robot
2020-03-26 23:35   ` Alexei Starovoitov
2020-03-26 23:59     ` Andrii Nakryiko
2020-03-27  0:34       ` Alexei Starovoitov
2020-03-27  0:55         ` Andrii Nakryiko
2020-03-25  6:57 ` [PATCH v2 bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
2020-03-25  6:57 ` [PATCH v2 bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).