bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once
@ 2020-07-17  0:16 YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches YiFei Zhu
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: YiFei Zhu @ 2020-07-17  0:16 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko,
	Martin KaFai Lau, YiFei Zhu

From: YiFei Zhu <zhuyifei@google.com>

To access the storage in a CGROUP_STORAGE map, one uses
bpf_get_local_storage helper, which is extremely fast due to its
use of per-CPU variables. However, its whole code is built on
the assumption that one map can only be used by one program at any
time, and this prohibits any sharing of data between multiple
programs using these maps, eliminating a lot of use cases, such
as some per-cgroup configuration storage, written to by a
setsockopt program and read by a cg_sock_addr program.

Why not use other map types? The great part of CGROUP_STORAGE map
is that it is isolated by different cgroups its attached to. When
one program uses bpf_get_local_storage, even on the same map, it
gets different storages if it were run as a result of attaching
to different cgroups. The kernel manages the storages, simplifying
BPF program or userspace. In theory, one could probably use other
maps like array or hash to do the same thing, but it would be a
major overhead / complexity. Userspace needs to know when a cgroup
is being freed in order to free up a space in the replacement map.

This patch set introduces a significant change to the semantics of
CGROUP_STORAGE map type. Instead of each storage being tied to one
single attachment, it is shared across different attachments to
the same cgroup, and persists until either the map or the cgroup
attached to is being freed.

The attach_type field of the map key struct bpf_cgroup_storage_key
is now unused. If userspace reads it it will always be zero. If
userspace sends us a non-zero value it will be ignored.

How could this break existing users?
* Users that uses detach & reattach / program replacement as a
  shortcut to zeroing the storage. Since we need sharing between
  programs, we cannot zero the storage. Users that expect this
  behavior should either attach a program with a new map, or
  explicitly zero the map with a syscall.
* Programs that expect isolation from different attach types. In
  reality, attaching the same program to different attach types,
  relying on that expected_attach_type not being enforced, should
  rarely happen, if at all.
* Userspace that does memcmp on the storage key when fetching
  map keys. In reality, if user wants to use a fixed key they
  would use {delete,lookup,update}_elem, rather than get_next_key.
These cases are dependent on undocumented implementation details, 
so the impact should be very minimal.

Patch 1 introduces a test on the old expected behavior of the map
type.

Patch 2 introduces a test showing how two programs cannot share
one such map.

Patch 3 implements the change of semantics to the map.

Patch 4 amends the new test such that it yields the behavior we
expect from the change.

Patch 5 documents the map type.

Changes since RFC:
* Clarify commit message in patch 3 such that it says the lifetime
  of the storage is ended at the freeing of the cgroup_bpf, rather
  than the cgroup itself.
* Restored an -ENOMEM check in __cgroup_bpf_attach.
* Update selftests for recent change in network_helpers API.

Changes since v1:
* s/CHECK_FAIL/CHECK/
* s/bpf_prog_attach/bpf_program__attach_cgroup/
* Moved test__start_subtest to test_cg_storage_multi.
* Removed some redundant CHECK_FAIL where they are already CHECK-ed.

Changes since v2:
* Lock cgroup_mutex during map_free.
* Publish new storages only if attach is successful, by tracking
  exactly which storages are reused in an array of bools.
* Mention bpftool map dump showing a value of zero for attach_type
  in patch 3 commit message.

YiFei Zhu (5):
  selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches
  selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs
  bpf: Make cgroup storages shared across attaches on the same cgroup
  selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress
  Documentation/bpf: Document CGROUP_STORAGE map type

 Documentation/bpf/index.rst                   |   9 +
 Documentation/bpf/map_cgroup_storage.rst      |  95 +++++++
 include/linux/bpf-cgroup.h                    |  15 +-
 include/uapi/linux/bpf.h                      |   2 +-
 kernel/bpf/cgroup.c                           |  69 +++--
 kernel/bpf/core.c                             |  12 -
 kernel/bpf/local_storage.c                    |  85 +++---
 tools/include/uapi/linux/bpf.h                |   2 +-
 .../bpf/prog_tests/cg_storage_multi.c         | 265 ++++++++++++++++++
 .../selftests/bpf/progs/cg_storage_multi.h    |  13 +
 .../progs/cg_storage_multi_egress_ingress.c   |  45 +++
 .../bpf/progs/cg_storage_multi_egress_only.c  |  33 +++
 12 files changed, 551 insertions(+), 94 deletions(-)
 create mode 100644 Documentation/bpf/map_cgroup_storage.rst
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi.h
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v3 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches
  2020-07-17  0:16 [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
@ 2020-07-17  0:16 ` YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 2/5] selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs YiFei Zhu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: YiFei Zhu @ 2020-07-17  0:16 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko,
	Martin KaFai Lau, YiFei Zhu

From: YiFei Zhu <zhuyifei@google.com>

This test creates a parent cgroup, and a child of that cgroup.
It attaches a cgroup_skb/egress program that simply counts packets,
to a global variable (ARRAY map), and to a CGROUP_STORAGE map.
The program is first attached to the parent cgroup only, then to
parent and child.

The test cases sends a message within the child cgroup, and because
the program is inherited across parent / child cgroups, it will
trigger the egress program for both the parent and child, if they
exist. The program, when looking up a CGROUP_STORAGE map, uses the
cgroup and attach type of the attachment parameters; therefore,
both attaches uses different cgroup storages.

We assert that all packet counts returns what we expects.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 .../bpf/prog_tests/cg_storage_multi.c         | 163 ++++++++++++++++++
 .../bpf/progs/cg_storage_multi_egress_only.c  |  30 ++++
 2 files changed, 193 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
new file mode 100644
index 000000000000..6d5a2194e036
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright 2020 Google LLC.
+ */
+
+#include <test_progs.h>
+#include <cgroup_helpers.h>
+#include <network_helpers.h>
+
+#include "cg_storage_multi_egress_only.skel.h"
+
+#define PARENT_CGROUP "/cgroup_storage"
+#define CHILD_CGROUP "/cgroup_storage/child"
+
+static int duration;
+
+static bool assert_storage(struct bpf_map *map, const char *cgroup_path,
+			   __u32 expected)
+{
+	struct bpf_cgroup_storage_key key = {0};
+	__u32 value;
+	int map_fd;
+
+	map_fd = bpf_map__fd(map);
+
+	key.cgroup_inode_id = get_cgroup_id(cgroup_path);
+	key.attach_type = BPF_CGROUP_INET_EGRESS;
+	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) < 0,
+		  "map-lookup", "errno %d", errno))
+		return true;
+	if (CHECK(value != expected,
+		  "assert-storage", "got %u expected %u", value, expected))
+		return true;
+
+	return false;
+}
+
+static bool assert_storage_noexist(struct bpf_map *map, const char *cgroup_path)
+{
+	struct bpf_cgroup_storage_key key = {0};
+	__u32 value;
+	int map_fd;
+
+	map_fd = bpf_map__fd(map);
+
+	key.cgroup_inode_id = get_cgroup_id(cgroup_path);
+	key.attach_type = BPF_CGROUP_INET_EGRESS;
+	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) == 0,
+		  "map-lookup", "succeeded, expected ENOENT"))
+		return true;
+	if (CHECK(errno != ENOENT,
+		  "map-lookup", "errno %d, expected ENOENT", errno))
+		return true;
+
+	return false;
+}
+
+static bool connect_send(const char *cgroup_path)
+{
+	bool res = true;
+	int server_fd = -1, client_fd = -1;
+
+	if (join_cgroup(cgroup_path))
+		goto out_clean;
+
+	server_fd = start_server(AF_INET, SOCK_DGRAM, NULL, 0, 0);
+	if (server_fd < 0)
+		goto out_clean;
+
+	client_fd = connect_to_fd(server_fd, 0);
+	if (client_fd < 0)
+		goto out_clean;
+
+	if (send(client_fd, "message", strlen("message"), 0) < 0)
+		goto out_clean;
+
+	res = false;
+
+out_clean:
+	close(client_fd);
+	close(server_fd);
+	return res;
+}
+
+static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
+{
+	struct cg_storage_multi_egress_only *obj;
+	struct bpf_link *parent_link = NULL, *child_link = NULL;
+	bool err;
+
+	obj = cg_storage_multi_egress_only__open_and_load();
+	if (CHECK(!obj, "skel-load", "errno %d", errno))
+		return;
+
+	/* Attach to parent cgroup, trigger packet from child.
+	 * Assert that there is only one run and in that run the storage is
+	 * parent cgroup's storage.
+	 * Also assert that child cgroup's storage does not exist
+	 */
+	parent_link = bpf_program__attach_cgroup(obj->progs.egress,
+						 parent_cgroup_fd);
+	if (CHECK(IS_ERR(parent_link), "parent-cg-attach",
+		  "err %ld", PTR_ERR(parent_link)))
+		goto close_bpf_object;
+	err = connect_send(CHILD_CGROUP);
+	if (CHECK(err, "first-connect-send", "errno %d", errno))
+		goto close_bpf_object;
+	if (CHECK(obj->bss->invocations != 1,
+		  "first-invoke", "invocations=%d", obj->bss->invocations))
+		goto close_bpf_object;
+	if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 1))
+		goto close_bpf_object;
+	if (assert_storage_noexist(obj->maps.cgroup_storage, CHILD_CGROUP))
+		goto close_bpf_object;
+
+	/* Attach to parent and child cgroup, trigger packet from child.
+	 * Assert that there are two additional runs, one that run with parent
+	 * cgroup's storage and one with child cgroup's storage.
+	 */
+	child_link = bpf_program__attach_cgroup(obj->progs.egress,
+						child_cgroup_fd);
+	if (CHECK(IS_ERR(child_link), "child-cg-attach",
+		  "err %ld", PTR_ERR(child_link)))
+		goto close_bpf_object;
+	err = connect_send(CHILD_CGROUP);
+	if (CHECK(err, "second-connect-send", "errno %d", errno))
+		goto close_bpf_object;
+	if (CHECK(obj->bss->invocations != 3,
+		  "second-invoke", "invocations=%d", obj->bss->invocations))
+		goto close_bpf_object;
+	if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 2))
+		goto close_bpf_object;
+	if (assert_storage(obj->maps.cgroup_storage, CHILD_CGROUP, 1))
+		goto close_bpf_object;
+
+close_bpf_object:
+	if (parent_link)
+		bpf_link__destroy(parent_link);
+	if (child_link)
+		bpf_link__destroy(child_link);
+
+	cg_storage_multi_egress_only__destroy(obj);
+}
+
+void test_cg_storage_multi(void)
+{
+	int parent_cgroup_fd = -1, child_cgroup_fd = -1;
+
+	parent_cgroup_fd = test__join_cgroup(PARENT_CGROUP);
+	if (CHECK(parent_cgroup_fd < 0, "cg-create-parent", "errno %d", errno))
+		goto close_cgroup_fd;
+	child_cgroup_fd = create_and_get_cgroup(CHILD_CGROUP);
+	if (CHECK(child_cgroup_fd < 0, "cg-create-child", "errno %d", errno))
+		goto close_cgroup_fd;
+
+	if (test__start_subtest("egress_only"))
+		test_egress_only(parent_cgroup_fd, child_cgroup_fd);
+
+close_cgroup_fd:
+	close(child_cgroup_fd);
+	close(parent_cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c
new file mode 100644
index 000000000000..ec0165d07105
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright 2020 Google LLC.
+ */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+	__type(key, struct bpf_cgroup_storage_key);
+	__type(value, __u32);
+} cgroup_storage SEC(".maps");
+
+__u32 invocations = 0;
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	__u32 *ptr_cg_storage = bpf_get_local_storage(&cgroup_storage, 0);
+
+	__sync_fetch_and_add(ptr_cg_storage, 1);
+	__sync_fetch_and_add(&invocations, 1);
+
+	return 1;
+}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 bpf-next 2/5] selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs
  2020-07-17  0:16 [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches YiFei Zhu
@ 2020-07-17  0:16 ` YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup YiFei Zhu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: YiFei Zhu @ 2020-07-17  0:16 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko,
	Martin KaFai Lau, YiFei Zhu

From: YiFei Zhu <zhuyifei@google.com>

The current assumption is that the lifetime of a cgroup storage
is tied to the program's attachment. The storage is created in
cgroup_bpf_attach, and released upon cgroup_bpf_detach and
cgroup_bpf_release.

Because the current semantics is that each attachment gets a
completely independent cgroup storage, and you can have multiple
programs attached to the same (cgroup, attach type) pair, the key
of the CGROUP_STORAGE map, looking up the map with this pair could
yield multiple storages, and that is not permitted. Therefore,
the kernel verifier checks that two programs cannot share the same
CGROUP_STORAGE map, even if they have different expected attach
types, considering that the actual attach type does not always
have to be equal to the expected attach type.

The test creates a CGROUP_STORAGE map and make it shared across
two different programs, one cgroup_skb/egress and one /ingress.
It asserts that the two programs cannot be both loaded, due to
verifier failure from the above reason.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 .../bpf/prog_tests/cg_storage_multi.c         | 42 +++++++++++++----
 .../selftests/bpf/progs/cg_storage_multi.h    | 13 ++++++
 .../progs/cg_storage_multi_egress_ingress.c   | 45 +++++++++++++++++++
 .../bpf/progs/cg_storage_multi_egress_only.c  |  9 ++--
 4 files changed, 98 insertions(+), 11 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi.h
 create mode 100644 tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
index 6d5a2194e036..1f4ab437ddb9 100644
--- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -8,7 +8,10 @@
 #include <cgroup_helpers.h>
 #include <network_helpers.h>
 
+#include "progs/cg_storage_multi.h"
+
 #include "cg_storage_multi_egress_only.skel.h"
+#include "cg_storage_multi_egress_ingress.skel.h"
 
 #define PARENT_CGROUP "/cgroup_storage"
 #define CHILD_CGROUP "/cgroup_storage/child"
@@ -16,10 +19,10 @@
 static int duration;
 
 static bool assert_storage(struct bpf_map *map, const char *cgroup_path,
-			   __u32 expected)
+			   struct cgroup_value *expected)
 {
 	struct bpf_cgroup_storage_key key = {0};
-	__u32 value;
+	struct cgroup_value value;
 	int map_fd;
 
 	map_fd = bpf_map__fd(map);
@@ -29,8 +32,8 @@ static bool assert_storage(struct bpf_map *map, const char *cgroup_path,
 	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) < 0,
 		  "map-lookup", "errno %d", errno))
 		return true;
-	if (CHECK(value != expected,
-		  "assert-storage", "got %u expected %u", value, expected))
+	if (CHECK(memcmp(&value, expected, sizeof(struct cgroup_value)),
+		  "assert-storage", "storages differ"))
 		return true;
 
 	return false;
@@ -39,7 +42,7 @@ static bool assert_storage(struct bpf_map *map, const char *cgroup_path,
 static bool assert_storage_noexist(struct bpf_map *map, const char *cgroup_path)
 {
 	struct bpf_cgroup_storage_key key = {0};
-	__u32 value;
+	struct cgroup_value value;
 	int map_fd;
 
 	map_fd = bpf_map__fd(map);
@@ -86,6 +89,7 @@ static bool connect_send(const char *cgroup_path)
 static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
 {
 	struct cg_storage_multi_egress_only *obj;
+	struct cgroup_value expected_cgroup_value;
 	struct bpf_link *parent_link = NULL, *child_link = NULL;
 	bool err;
 
@@ -109,7 +113,9 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
 	if (CHECK(obj->bss->invocations != 1,
 		  "first-invoke", "invocations=%d", obj->bss->invocations))
 		goto close_bpf_object;
-	if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 1))
+	expected_cgroup_value = (struct cgroup_value) { .egress_pkts = 1 };
+	if (assert_storage(obj->maps.cgroup_storage,
+			   PARENT_CGROUP, &expected_cgroup_value))
 		goto close_bpf_object;
 	if (assert_storage_noexist(obj->maps.cgroup_storage, CHILD_CGROUP))
 		goto close_bpf_object;
@@ -129,9 +135,13 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
 	if (CHECK(obj->bss->invocations != 3,
 		  "second-invoke", "invocations=%d", obj->bss->invocations))
 		goto close_bpf_object;
-	if (assert_storage(obj->maps.cgroup_storage, PARENT_CGROUP, 2))
+	expected_cgroup_value = (struct cgroup_value) { .egress_pkts = 2 };
+	if (assert_storage(obj->maps.cgroup_storage,
+			   PARENT_CGROUP, &expected_cgroup_value))
 		goto close_bpf_object;
-	if (assert_storage(obj->maps.cgroup_storage, CHILD_CGROUP, 1))
+	expected_cgroup_value = (struct cgroup_value) { .egress_pkts = 1 };
+	if (assert_storage(obj->maps.cgroup_storage,
+			   CHILD_CGROUP, &expected_cgroup_value))
 		goto close_bpf_object;
 
 close_bpf_object:
@@ -143,6 +153,19 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
 	cg_storage_multi_egress_only__destroy(obj);
 }
 
+static void test_egress_ingress(int parent_cgroup_fd, int child_cgroup_fd)
+{
+	struct cg_storage_multi_egress_ingress *obj;
+
+	/* Cannot load both programs due to verifier failure:
+	 * "only one cgroup storage of each type is allowed"
+	 */
+	obj = cg_storage_multi_egress_ingress__open_and_load();
+	if (CHECK(obj || errno != EBUSY,
+		  "skel-load", "errno %d, expected EBUSY", errno))
+		return;
+}
+
 void test_cg_storage_multi(void)
 {
 	int parent_cgroup_fd = -1, child_cgroup_fd = -1;
@@ -157,6 +180,9 @@ void test_cg_storage_multi(void)
 	if (test__start_subtest("egress_only"))
 		test_egress_only(parent_cgroup_fd, child_cgroup_fd);
 
+	if (test__start_subtest("egress_ingress"))
+		test_egress_ingress(parent_cgroup_fd, child_cgroup_fd);
+
 close_cgroup_fd:
 	close(child_cgroup_fd);
 	close(parent_cgroup_fd);
diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi.h b/tools/testing/selftests/bpf/progs/cg_storage_multi.h
new file mode 100644
index 000000000000..a0778fe7857a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cg_storage_multi.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __PROGS_CG_STORAGE_MULTI_H
+#define __PROGS_CG_STORAGE_MULTI_H
+
+#include <asm/types.h>
+
+struct cgroup_value {
+	__u32 egress_pkts;
+	__u32 ingress_pkts;
+};
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c
new file mode 100644
index 000000000000..9ce386899365
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_ingress.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright 2020 Google LLC.
+ */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <bpf/bpf_helpers.h>
+
+#include "progs/cg_storage_multi.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+	__type(key, struct bpf_cgroup_storage_key);
+	__type(value, struct cgroup_value);
+} cgroup_storage SEC(".maps");
+
+__u32 invocations = 0;
+
+SEC("cgroup_skb/egress")
+int egress(struct __sk_buff *skb)
+{
+	struct cgroup_value *ptr_cg_storage =
+		bpf_get_local_storage(&cgroup_storage, 0);
+
+	__sync_fetch_and_add(&ptr_cg_storage->egress_pkts, 1);
+	__sync_fetch_and_add(&invocations, 1);
+
+	return 1;
+}
+
+SEC("cgroup_skb/ingress")
+int ingress(struct __sk_buff *skb)
+{
+	struct cgroup_value *ptr_cg_storage =
+		bpf_get_local_storage(&cgroup_storage, 0);
+
+	__sync_fetch_and_add(&ptr_cg_storage->ingress_pkts, 1);
+	__sync_fetch_and_add(&invocations, 1);
+
+	return 1;
+}
diff --git a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c
index ec0165d07105..44ad46b33539 100644
--- a/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c
+++ b/tools/testing/selftests/bpf/progs/cg_storage_multi_egress_only.c
@@ -10,10 +10,12 @@
 #include <linux/udp.h>
 #include <bpf/bpf_helpers.h>
 
+#include "progs/cg_storage_multi.h"
+
 struct {
 	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
 	__type(key, struct bpf_cgroup_storage_key);
-	__type(value, __u32);
+	__type(value, struct cgroup_value);
 } cgroup_storage SEC(".maps");
 
 __u32 invocations = 0;
@@ -21,9 +23,10 @@ __u32 invocations = 0;
 SEC("cgroup_skb/egress")
 int egress(struct __sk_buff *skb)
 {
-	__u32 *ptr_cg_storage = bpf_get_local_storage(&cgroup_storage, 0);
+	struct cgroup_value *ptr_cg_storage =
+		bpf_get_local_storage(&cgroup_storage, 0);
 
-	__sync_fetch_and_add(ptr_cg_storage, 1);
+	__sync_fetch_and_add(&ptr_cg_storage->egress_pkts, 1);
 	__sync_fetch_and_add(&invocations, 1);
 
 	return 1;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup
  2020-07-17  0:16 [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 2/5] selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs YiFei Zhu
@ 2020-07-17  0:16 ` YiFei Zhu
  2020-07-18  3:30   ` Martin KaFai Lau
  2020-07-17  0:16 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type YiFei Zhu
  4 siblings, 1 reply; 8+ messages in thread
From: YiFei Zhu @ 2020-07-17  0:16 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko,
	Martin KaFai Lau, YiFei Zhu

From: YiFei Zhu <zhuyifei@google.com>

This change comes in several parts:

One, the restriction that the CGROUP_STORAGE map can only be used
by one program is removed. This results in the removal of the field
'aux' in struct bpf_cgroup_storage_map, and removal of relevant
code associated with the field, and removal of now-noop functions
bpf_free_cgroup_storage and bpf_cgroup_storage_release.

Second, because there could be multiple attach types to the same
cgroup, the attach type is completely ignored on comparison in
the map key. Newly added keys have it zeroed. The only value in
the key that still matters is the cgroup inode. bpftool map dump
will also show an attach type of zero.

Third, because the storages are now shared, the storages cannot
be unconditionally freed on program detach. There could be two
ways to solve this issue:
* A. Reference count the usage of the storages, and free when the
     last program is detached.
* B. Free only when the storage is impossible to be referred to
     again, i.e. when either the cgroup_bpf it is attached to, or
     the map itself, is freed.
Option A has the side effect that, when the user detach and
reattach a program, whether the program gets a fresh storage
depends on whether there is another program attached using that
storage. This could trigger races if the user is multi-threaded,
and since nondeterminism in data races is evil, go with option B.

The both the map and the cgroup_bpf now tracks their associated
storages, and the storage unlink and free are removed from
cgroup_bpf_detach and added to cgroup_bpf_release and
cgroup_storage_map_free. The latter also new holds the cgroup_mutex
to prevent any races with the former.

Fourth, on attach, we reuse the old storage if the key already
exists in the map. Because the rbtree traversal holds the spinlock
of the map, during which we can't allocate a new storage if we
don't find an old storage, instead we preallocate the storage
unconditionally, and free the preallocated storage if we find an
old storage in the map. This results in a change of semantics in
bpf_cgroup_storage{,s}_link, and rename cgroup_storage_insert to
cgroup_storage_lookup_insert that does both lookup and conditionally
insert or free. bpf_cgroup_storage{,s}_link also tracks exactly
which storages are reused in an array of bools, so it can unlink
and free the new storages in the event that attachment failed
later than link. bpf_cgroup_storages_{free,unlink} accepts the
bool array in order to facilitate that.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 include/linux/bpf-cgroup.h     | 15 +++---
 include/uapi/linux/bpf.h       |  2 +-
 kernel/bpf/cgroup.c            | 69 +++++++++++++++------------
 kernel/bpf/core.c              | 12 -----
 kernel/bpf/local_storage.c     | 85 ++++++++++++++++------------------
 tools/include/uapi/linux/bpf.h |  2 +-
 6 files changed, 91 insertions(+), 94 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2c6f26670acc..c83cd8862298 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -46,7 +46,8 @@ struct bpf_cgroup_storage {
 	};
 	struct bpf_cgroup_storage_map *map;
 	struct bpf_cgroup_storage_key key;
-	struct list_head list;
+	struct list_head list_map;
+	struct list_head list_cg;
 	struct rb_node node;
 	struct rcu_head rcu;
 };
@@ -78,6 +79,9 @@ struct cgroup_bpf {
 	struct list_head progs[MAX_BPF_ATTACH_TYPE];
 	u32 flags[MAX_BPF_ATTACH_TYPE];
 
+	/* list of cgroup shared storages */
+	struct list_head storages;
+
 	/* temp storage for effective prog array used by prog_attach/detach */
 	struct bpf_prog_array *inactive;
 
@@ -164,12 +168,11 @@ static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage
 struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
 					enum bpf_cgroup_storage_type stype);
 void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
-void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
-			     struct cgroup *cgroup,
-			     enum bpf_attach_type type);
+struct bpf_cgroup_storage *
+bpf_cgroup_storage_link(struct bpf_cgroup_storage *new_storage,
+			struct cgroup *cgroup, bool *storage_reused);
 void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
 int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map);
-void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *map);
 
 int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
@@ -383,8 +386,6 @@ static inline void bpf_cgroup_storage_set(
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {}
 static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux,
 					    struct bpf_map *map) { return 0; }
-static inline void bpf_cgroup_storage_release(struct bpf_prog_aux *aux,
-					      struct bpf_map *map) {}
 static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
 	struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return NULL; }
 static inline void bpf_cgroup_storage_free(
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7ac3992dacfe..b14f008ad028 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -78,7 +78,7 @@ struct bpf_lpm_trie_key {
 
 struct bpf_cgroup_storage_key {
 	__u64	cgroup_inode_id;	/* cgroup inode id */
-	__u32	attach_type;		/* program attach type */
+	__u32	attach_type;		/* program attach type, unused */
 };
 
 /* BPF syscall commands, see bpf(2) man-page for details. */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ac53102e244a..6b1ef4a809bb 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -28,12 +28,14 @@ void cgroup_bpf_offline(struct cgroup *cgrp)
 	percpu_ref_kill(&cgrp->bpf.refcnt);
 }
 
-static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[])
+static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[],
+				     bool *storage_reused)
 {
 	enum bpf_cgroup_storage_type stype;
 
 	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_free(storages[stype]);
+		if (!storage_reused || !storage_reused[stype])
+			bpf_cgroup_storage_free(storages[stype]);
 }
 
 static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
@@ -45,7 +47,7 @@ static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
 		storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
 		if (IS_ERR(storages[stype])) {
 			storages[stype] = NULL;
-			bpf_cgroup_storages_free(storages);
+			bpf_cgroup_storages_free(storages, NULL);
 			return -ENOMEM;
 		}
 	}
@@ -63,21 +65,24 @@ static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[],
 }
 
 static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
-				     struct cgroup* cgrp,
-				     enum bpf_attach_type attach_type)
+				     struct cgroup *cgrp, bool *storage_reused)
 {
 	enum bpf_cgroup_storage_type stype;
 
 	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
+		storages[stype] =
+			bpf_cgroup_storage_link(storages[stype], cgrp,
+					        &storage_reused[stype]);
 }
 
-static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
+static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[],
+				       bool *storage_reused)
 {
 	enum bpf_cgroup_storage_type stype;
 
 	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_unlink(storages[stype]);
+		if (!storage_reused || !storage_reused[stype])
+			bpf_cgroup_storage_unlink(storages[stype]);
 }
 
 /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
@@ -101,22 +106,23 @@ static void cgroup_bpf_release(struct work_struct *work)
 	struct cgroup *p, *cgrp = container_of(work, struct cgroup,
 					       bpf.release_work);
 	struct bpf_prog_array *old_array;
+	struct list_head *storages = &cgrp->bpf.storages;
+	struct bpf_cgroup_storage *storage, *stmp;
+
 	unsigned int type;
 
 	mutex_lock(&cgroup_mutex);
 
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
 		struct list_head *progs = &cgrp->bpf.progs[type];
-		struct bpf_prog_list *pl, *tmp;
+		struct bpf_prog_list *pl, *pltmp;
 
-		list_for_each_entry_safe(pl, tmp, progs, node) {
+		list_for_each_entry_safe(pl, pltmp, progs, node) {
 			list_del(&pl->node);
 			if (pl->prog)
 				bpf_prog_put(pl->prog);
 			if (pl->link)
 				bpf_cgroup_link_auto_detach(pl->link);
-			bpf_cgroup_storages_unlink(pl->storage);
-			bpf_cgroup_storages_free(pl->storage);
 			kfree(pl);
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
@@ -126,6 +132,11 @@ static void cgroup_bpf_release(struct work_struct *work)
 		bpf_prog_array_free(old_array);
 	}
 
+	list_for_each_entry_safe(storage, stmp, storages, list_cg) {
+		bpf_cgroup_storage_unlink(storage);
+		bpf_cgroup_storage_free(storage);
+	}
+
 	mutex_unlock(&cgroup_mutex);
 
 	for (p = cgroup_parent(cgrp); p; p = cgroup_parent(p))
@@ -290,6 +301,8 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
 	for (i = 0; i < NR; i++)
 		INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
 
+	INIT_LIST_HEAD(&cgrp->bpf.storages);
+
 	for (i = 0; i < NR; i++)
 		if (compute_effective_progs(cgrp, i, &arrays[i]))
 			goto cleanup;
@@ -422,7 +435,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
-	struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
+	bool storage_reused[MAX_BPF_CGROUP_STORAGE_TYPE];
 	struct bpf_prog_list *pl;
 	int err;
 
@@ -455,22 +468,22 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (IS_ERR(pl))
 		return PTR_ERR(pl);
 
-	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
-		return -ENOMEM;
-
 	if (pl) {
 		old_prog = pl->prog;
-		bpf_cgroup_storages_unlink(pl->storage);
-		bpf_cgroup_storages_assign(old_storage, pl->storage);
 	} else {
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
-		if (!pl) {
-			bpf_cgroup_storages_free(storage);
+		if (!pl)
 			return -ENOMEM;
-		}
+
 		list_add_tail(&pl->node, progs);
 	}
 
+	err = bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog);
+	if (err)
+		goto cleanup;
+
+	bpf_cgroup_storages_link(storage, cgrp, storage_reused);
+
 	pl->prog = prog;
 	pl->link = link;
 	bpf_cgroup_storages_assign(pl->storage, storage);
@@ -478,24 +491,24 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 
 	err = update_effective_progs(cgrp, type);
 	if (err)
-		goto cleanup;
+		goto cleanup_unlink;
 
-	bpf_cgroup_storages_free(old_storage);
 	if (old_prog)
 		bpf_prog_put(old_prog);
 	else
 		static_branch_inc(&cgroup_bpf_enabled_key);
-	bpf_cgroup_storages_link(pl->storage, cgrp, type);
 	return 0;
 
+cleanup_unlink:
+	bpf_cgroup_storages_unlink(storage, storage_reused);
+
 cleanup:
+	bpf_cgroup_storages_free(storage, storage_reused);
+
 	if (old_prog) {
 		pl->prog = old_prog;
 		pl->link = NULL;
 	}
-	bpf_cgroup_storages_free(pl->storage);
-	bpf_cgroup_storages_assign(pl->storage, old_storage);
-	bpf_cgroup_storages_link(pl->storage, cgrp, type);
 	if (!old_prog) {
 		list_del(&pl->node);
 		kfree(pl);
@@ -679,8 +692,6 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
 	/* now can actually delete it from this cgroup list */
 	list_del(&pl->node);
-	bpf_cgroup_storages_unlink(pl->storage);
-	bpf_cgroup_storages_free(pl->storage);
 	kfree(pl);
 	if (list_empty(progs))
 		/* last program was detached, reset flags to zero */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..f367fe7422ea 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2042,24 +2042,12 @@ int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 								     : 0;
 }
 
-static void bpf_free_cgroup_storage(struct bpf_prog_aux *aux)
-{
-	enum bpf_cgroup_storage_type stype;
-
-	for_each_cgroup_storage_type(stype) {
-		if (!aux->cgroup_storage[stype])
-			continue;
-		bpf_cgroup_storage_release(aux, aux->cgroup_storage[stype]);
-	}
-}
-
 void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 			  struct bpf_map **used_maps, u32 len)
 {
 	struct bpf_map *map;
 	u32 i;
 
-	bpf_free_cgroup_storage(aux);
 	for (i = 0; i < len; i++) {
 		map = used_maps[i];
 		if (map->ops->map_poke_untrack)
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 51bd5a8cb01b..78ffe69ff1d8 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -9,6 +9,8 @@
 #include <linux/slab.h>
 #include <uapi/linux/btf.h>
 
+#include "../cgroup/cgroup-internal.h"
+
 DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
 
 #ifdef CONFIG_CGROUP_BPF
@@ -20,7 +22,6 @@ struct bpf_cgroup_storage_map {
 	struct bpf_map map;
 
 	spinlock_t lock;
-	struct bpf_prog_aux *aux;
 	struct rb_root root;
 	struct list_head list;
 };
@@ -38,10 +39,6 @@ static int bpf_cgroup_storage_key_cmp(
 		return -1;
 	else if (key1->cgroup_inode_id > key2->cgroup_inode_id)
 		return 1;
-	else if (key1->attach_type < key2->attach_type)
-		return -1;
-	else if (key1->attach_type > key2->attach_type)
-		return 1;
 	return 0;
 }
 
@@ -81,8 +78,9 @@ static struct bpf_cgroup_storage *cgroup_storage_lookup(
 	return NULL;
 }
 
-static int cgroup_storage_insert(struct bpf_cgroup_storage_map *map,
-				 struct bpf_cgroup_storage *storage)
+static struct bpf_cgroup_storage *
+cgroup_storage_lookup_insert(struct bpf_cgroup_storage_map *map,
+			     struct bpf_cgroup_storage *storage)
 {
 	struct rb_root *root = &map->root;
 	struct rb_node **new = &(root->rb_node), *parent = NULL;
@@ -101,14 +99,15 @@ static int cgroup_storage_insert(struct bpf_cgroup_storage_map *map,
 			new = &((*new)->rb_right);
 			break;
 		default:
-			return -EEXIST;
+			bpf_cgroup_storage_free(storage);
+			return this;
 		}
 	}
 
 	rb_link_node(&storage->node, parent, new);
 	rb_insert_color(&storage->node, root);
 
-	return 0;
+	return NULL;
 }
 
 static void *cgroup_storage_lookup_elem(struct bpf_map *_map, void *_key)
@@ -131,10 +130,7 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key,
 	struct bpf_cgroup_storage *storage;
 	struct bpf_storage_buffer *new;
 
-	if (unlikely(flags & ~(BPF_F_LOCK | BPF_EXIST | BPF_NOEXIST)))
-		return -EINVAL;
-
-	if (unlikely(flags & BPF_NOEXIST))
+	if (unlikely(flags & ~(BPF_F_LOCK | BPF_EXIST)))
 		return -EINVAL;
 
 	if (unlikely((flags & BPF_F_LOCK) &&
@@ -250,16 +246,15 @@ static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key,
 		if (!storage)
 			goto enoent;
 
-		storage = list_next_entry(storage, list);
+		storage = list_next_entry(storage, list_map);
 		if (!storage)
 			goto enoent;
 	} else {
 		storage = list_first_entry(&map->list,
-					 struct bpf_cgroup_storage, list);
+					 struct bpf_cgroup_storage, list_map);
 	}
 
 	spin_unlock_bh(&map->lock);
-	next->attach_type = storage->key.attach_type;
 	next->cgroup_inode_id = storage->key.cgroup_inode_id;
 	return 0;
 
@@ -318,6 +313,17 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 static void cgroup_storage_map_free(struct bpf_map *_map)
 {
 	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
+	struct list_head *storages = &map->list;
+	struct bpf_cgroup_storage *storage, *stmp;
+
+	mutex_lock(&cgroup_mutex);
+
+	list_for_each_entry_safe(storage, stmp, storages, list_map) {
+		bpf_cgroup_storage_unlink(storage);
+		bpf_cgroup_storage_free(storage);
+	}
+
+	mutex_unlock(&cgroup_mutex);
 
 	WARN_ON(!RB_EMPTY_ROOT(&map->root));
 	WARN_ON(!list_empty(&map->list));
@@ -431,13 +437,10 @@ int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map)
 
 	spin_lock_bh(&map->lock);
 
-	if (map->aux && map->aux != aux)
-		goto unlock;
 	if (aux->cgroup_storage[stype] &&
 	    aux->cgroup_storage[stype] != _map)
 		goto unlock;
 
-	map->aux = aux;
 	aux->cgroup_storage[stype] = _map;
 	ret = 0;
 unlock:
@@ -446,20 +449,6 @@ int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map)
 	return ret;
 }
 
-void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *_map)
-{
-	enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
-	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
-
-	spin_lock_bh(&map->lock);
-	if (map->aux == aux) {
-		WARN_ON(aux->cgroup_storage[stype] != _map);
-		map->aux = NULL;
-		aux->cgroup_storage[stype] = NULL;
-	}
-	spin_unlock_bh(&map->lock);
-}
-
 static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages)
 {
 	size_t size;
@@ -562,24 +551,31 @@ void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage)
 		call_rcu(&storage->rcu, free_percpu_cgroup_storage_rcu);
 }
 
-void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
-			     struct cgroup *cgroup,
-			     enum bpf_attach_type type)
+struct bpf_cgroup_storage *
+bpf_cgroup_storage_link(struct bpf_cgroup_storage *new_storage,
+			struct cgroup *cgroup, bool *storage_reused)
 {
 	struct bpf_cgroup_storage_map *map;
+	struct bpf_cgroup_storage *old_storage;
 
-	if (!storage)
-		return;
+	if (!new_storage)
+		return NULL;
 
-	storage->key.attach_type = type;
-	storage->key.cgroup_inode_id = cgroup_id(cgroup);
+	new_storage->key.cgroup_inode_id = cgroup_id(cgroup);
 
-	map = storage->map;
+	map = new_storage->map;
 
 	spin_lock_bh(&map->lock);
-	WARN_ON(cgroup_storage_insert(map, storage));
-	list_add(&storage->list, &map->list);
+	old_storage = cgroup_storage_lookup_insert(map, new_storage);
+	if (!old_storage) {
+		list_add(&new_storage->list_map, &map->list);
+		list_add(&new_storage->list_cg, &cgroup->bpf.storages);
+	}
 	spin_unlock_bh(&map->lock);
+
+	*storage_reused = old_storage;
+
+	return old_storage ? : new_storage;
 }
 
 void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage)
@@ -596,7 +592,8 @@ void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage)
 	root = &map->root;
 	rb_erase(&storage->node, root);
 
-	list_del(&storage->list);
+	list_del(&storage->list_map);
+	list_del(&storage->list_cg);
 	spin_unlock_bh(&map->lock);
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7ac3992dacfe..b14f008ad028 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -78,7 +78,7 @@ struct bpf_lpm_trie_key {
 
 struct bpf_cgroup_storage_key {
 	__u64	cgroup_inode_id;	/* cgroup inode id */
-	__u32	attach_type;		/* program attach type */
+	__u32	attach_type;		/* program attach type, unused */
 };
 
 /* BPF syscall commands, see bpf(2) man-page for details. */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 bpf-next 4/5] selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress
  2020-07-17  0:16 [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
                   ` (2 preceding siblings ...)
  2020-07-17  0:16 ` [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup YiFei Zhu
@ 2020-07-17  0:16 ` YiFei Zhu
  2020-07-17  0:16 ` [PATCH v3 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type YiFei Zhu
  4 siblings, 0 replies; 8+ messages in thread
From: YiFei Zhu @ 2020-07-17  0:16 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko,
	Martin KaFai Lau, YiFei Zhu

From: YiFei Zhu <zhuyifei@google.com>

This mirrors the original egress-only test. The cgroup_storage is
now extended to have two packet counters, one for egress and one
for ingress. The behavior of the counters are exactly the same as
the original egress-only test, only that the total number of
invocations doubles from having both egress and ingress being
counted.

The field attach_type in the map key is ignored in the kernel;
however, keeping it is pointless here and we are demonstrating the
expected usage of the map, so it is removed. That said, keeping the
field will not fail the test, for backwards compatibility reasons.
In other words, the original egress-only test is not affected by
the change in CGROUP_STORAGE behavior and will pass in both cases.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 .../bpf/prog_tests/cg_storage_multi.c         | 90 +++++++++++++++++--
 1 file changed, 83 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
index 1f4ab437ddb9..aa2b448c4214 100644
--- a/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
+++ b/tools/testing/selftests/bpf/prog_tests/cg_storage_multi.c
@@ -28,7 +28,6 @@ static bool assert_storage(struct bpf_map *map, const char *cgroup_path,
 	map_fd = bpf_map__fd(map);
 
 	key.cgroup_inode_id = get_cgroup_id(cgroup_path);
-	key.attach_type = BPF_CGROUP_INET_EGRESS;
 	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) < 0,
 		  "map-lookup", "errno %d", errno))
 		return true;
@@ -48,7 +47,6 @@ static bool assert_storage_noexist(struct bpf_map *map, const char *cgroup_path)
 	map_fd = bpf_map__fd(map);
 
 	key.cgroup_inode_id = get_cgroup_id(cgroup_path);
-	key.attach_type = BPF_CGROUP_INET_EGRESS;
 	if (CHECK(bpf_map_lookup_elem(map_fd, &key, &value) == 0,
 		  "map-lookup", "succeeded, expected ENOENT"))
 		return true;
@@ -156,14 +154,92 @@ static void test_egress_only(int parent_cgroup_fd, int child_cgroup_fd)
 static void test_egress_ingress(int parent_cgroup_fd, int child_cgroup_fd)
 {
 	struct cg_storage_multi_egress_ingress *obj;
+	struct cgroup_value expected_cgroup_value;
+	struct bpf_link *parent_egress_link = NULL, *parent_ingress_link = NULL;
+	struct bpf_link *child_egress_link = NULL, *child_ingress_link = NULL;
+	bool err;
 
-	/* Cannot load both programs due to verifier failure:
-	 * "only one cgroup storage of each type is allowed"
-	 */
 	obj = cg_storage_multi_egress_ingress__open_and_load();
-	if (CHECK(obj || errno != EBUSY,
-		  "skel-load", "errno %d, expected EBUSY", errno))
+	if (CHECK(!obj, "skel-load", "errno %d", errno))
 		return;
+
+	/* Attach to parent cgroup, trigger packet from child.
+	 * Assert that there is two runs, one with parent cgroup egress and
+	 * one with parent cgroup ingress.
+	 * Also assert that child cgroup's storage does not exist
+	 */
+	parent_egress_link = bpf_program__attach_cgroup(obj->progs.egress,
+							parent_cgroup_fd);
+	if (CHECK(IS_ERR(parent_egress_link), "parent-egress-cg-attach",
+		  "err %ld", PTR_ERR(parent_egress_link)))
+		goto close_bpf_object;
+	parent_ingress_link = bpf_program__attach_cgroup(obj->progs.ingress,
+							 parent_cgroup_fd);
+	if (CHECK(IS_ERR(parent_ingress_link), "parent-ingress-cg-attach",
+		  "err %ld", PTR_ERR(parent_ingress_link)))
+		goto close_bpf_object;
+	err = connect_send(CHILD_CGROUP);
+	if (CHECK(err, "first-connect-send", "errno %d", errno))
+		goto close_bpf_object;
+	if (CHECK(obj->bss->invocations != 2,
+		  "first-invoke", "invocations=%d", obj->bss->invocations))
+		goto close_bpf_object;
+	expected_cgroup_value = (struct cgroup_value) {
+		.egress_pkts = 1,
+		.ingress_pkts = 1,
+	};
+	if (assert_storage(obj->maps.cgroup_storage,
+			   PARENT_CGROUP, &expected_cgroup_value))
+		goto close_bpf_object;
+	if (assert_storage_noexist(obj->maps.cgroup_storage, CHILD_CGROUP))
+		goto close_bpf_object;
+
+	/* Attach to parent and child cgroup, trigger packet from child.
+	 * Assert that there is four additional runs, parent cgroup egress and
+	 * ingress, child cgroup egress and ingress.
+	 */
+	child_egress_link = bpf_program__attach_cgroup(obj->progs.egress,
+						       child_cgroup_fd);
+	if (CHECK(IS_ERR(child_egress_link), "child-egress-cg-attach",
+		  "err %ld", PTR_ERR(child_egress_link)))
+		goto close_bpf_object;
+	child_ingress_link = bpf_program__attach_cgroup(obj->progs.ingress,
+							child_cgroup_fd);
+	if (CHECK(IS_ERR(child_ingress_link), "child-ingress-cg-attach",
+		  "err %ld", PTR_ERR(child_ingress_link)))
+		goto close_bpf_object;
+	err = connect_send(CHILD_CGROUP);
+	if (CHECK(err, "second-connect-send", "errno %d", errno))
+		goto close_bpf_object;
+	if (CHECK(obj->bss->invocations != 6,
+		  "second-invoke", "invocations=%d", obj->bss->invocations))
+		goto close_bpf_object;
+	expected_cgroup_value = (struct cgroup_value) {
+		.egress_pkts = 2,
+		.ingress_pkts = 2,
+	};
+	if (assert_storage(obj->maps.cgroup_storage,
+			   PARENT_CGROUP, &expected_cgroup_value))
+		goto close_bpf_object;
+	expected_cgroup_value = (struct cgroup_value) {
+		.egress_pkts = 1,
+		.ingress_pkts = 1,
+	};
+	if (assert_storage(obj->maps.cgroup_storage,
+			   CHILD_CGROUP, &expected_cgroup_value))
+		goto close_bpf_object;
+
+close_bpf_object:
+	if (parent_egress_link)
+		bpf_link__destroy(parent_egress_link);
+	if (parent_ingress_link)
+		bpf_link__destroy(parent_ingress_link);
+	if (child_egress_link)
+		bpf_link__destroy(child_egress_link);
+	if (child_ingress_link)
+		bpf_link__destroy(child_ingress_link);
+
+	cg_storage_multi_egress_ingress__destroy(obj);
 }
 
 void test_cg_storage_multi(void)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type
  2020-07-17  0:16 [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
                   ` (3 preceding siblings ...)
  2020-07-17  0:16 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress YiFei Zhu
@ 2020-07-17  0:16 ` YiFei Zhu
  4 siblings, 0 replies; 8+ messages in thread
From: YiFei Zhu @ 2020-07-17  0:16 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko,
	Martin KaFai Lau, YiFei Zhu

From: YiFei Zhu <zhuyifei@google.com>

The machanics and usage are not very straightforward. Given the
changes it's better to document how it works and how to use it,
rather than having to rely on the examples and implementation to
infer what is going on.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 Documentation/bpf/index.rst              |  9 +++
 Documentation/bpf/map_cgroup_storage.rst | 95 ++++++++++++++++++++++++
 2 files changed, 104 insertions(+)
 create mode 100644 Documentation/bpf/map_cgroup_storage.rst

diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index 38b4db8be7a2..26f4bb3107fc 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -48,6 +48,15 @@ Program types
    bpf_lsm
 
 
+Map types
+=========
+
+.. toctree::
+   :maxdepth: 1
+
+   map_cgroup_storage
+
+
 Testing and debugging BPF
 =========================
 
diff --git a/Documentation/bpf/map_cgroup_storage.rst b/Documentation/bpf/map_cgroup_storage.rst
new file mode 100644
index 000000000000..b7210cb3f294
--- /dev/null
+++ b/Documentation/bpf/map_cgroup_storage.rst
@@ -0,0 +1,95 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2020 Google LLC.
+
+===========================
+BPF_MAP_TYPE_CGROUP_STORAGE
+===========================
+
+The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fix-sized
+storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
+attach to cgroups; the programs are made available by the same config. The
+storage is identified by the cgroup the program is attached to.
+
+This document describes the usage and semantics of the
+``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors was changed in
+Linux 5.9 and this document will describe the differences.
+
+Usage
+=====
+
+The map uses key of type ``struct bpf_cgroup_storage_key``, declared in
+``linux/bpf.h``::
+
+    struct bpf_cgroup_storage_key {
+            __u64 cgroup_inode_id;
+            __u32 attach_type;
+    };
+
+``cgroup_inode_id`` is the inode id of the cgroup directory.
+``attach_type`` was the the program's attach type prior to Linux 5.9, since 5.9
+it is ignored and kept for backwards compatibility.
+
+To access the storage in a program, use ``bpf_get_local_storage``::
+
+    void *bpf_get_local_storage(void *map, u64 flags)
+
+``flags`` is reserved for future use and must be 0.
+
+There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
+can be accessed by multiple programs across different CPUs, and user should
+take care of synchronization by themselves.
+
+Example usage::
+
+    #include <linux/bpf.h>
+
+    struct {
+            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+            __type(key, struct bpf_cgroup_storage_key);
+            __type(value, __u32);
+    } cgroup_storage SEC(".maps");
+
+    int program(struct __sk_buff *skb)
+    {
+            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
+            __sync_fetch_and_add(ptr_cg_storage-, 1);
+
+            return 0;
+    }
+
+Semantics
+=========
+
+``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
+per-CPU variant will have different memory regions for each CPU for each
+storage. The non-per-CPU will have the same memory region for each storage.
+
+Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
+for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
+that uses the map. A program may be attached to multiple cgroups or have
+multiple attach types, and each attach creates a fresh zeroed storage. The
+storage is freed upon detach.
+
+Userspace may use the the attach parameters of cgroup and attach type pair
+in ``struct bpf_cgroup_storage_key`` as the key to the BPF map APIs to read or
+update the storage for a given attachment.
+
+Since Linux 5.9, storage can be shared by multiple programs, and attach type
+is ignored. When a program is attached to a cgroup, the kernel would create a
+new storage only if the map does not already contain an entry for the cgroup,
+or else the old storage is reused for the new attachment. Storage is freed
+only when either the map or the cgroup attached to is being freed. Detaching
+will not directly free the storage, but it may cause the reference to the map
+to reach zero and indirectly freeing all storage in the map.
+
+Userspace may use the the attach parameters of cgroup only in
+``struct bpf_cgroup_storage_key`` as the key to the BPF map APIs to read or
+update the storage for a given attachment. The struct also contains an
+``attach_type`` field; this field is ignored.
+
+In all versions, the storage is bound at attach time. Even if the program is
+attached to parent and triggers in child, the storage still belongs to the
+parent.
+
+Userspace cannot create a new entry in the map or delete an existing entry.
+Program test runs always use a temporary storage.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup
  2020-07-17  0:16 ` [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup YiFei Zhu
@ 2020-07-18  3:30   ` Martin KaFai Lau
  2020-07-18  3:48     ` YiFei Zhu
  0 siblings, 1 reply; 8+ messages in thread
From: Martin KaFai Lau @ 2020-07-18  3:30 UTC (permalink / raw)
  To: YiFei Zhu
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Stanislav Fomichev,
	Mahesh Bandewar, Roman Gushchin, Andrii Nakryiko, YiFei Zhu

On Thu, Jul 16, 2020 at 07:16:27PM -0500, YiFei Zhu wrote:

> Fourth, on attach, we reuse the old storage if the key already
> exists in the map. Because the rbtree traversal holds the spinlock
> of the map, during which we can't allocate a new storage if we
> don't find an old storage, instead we preallocate the storage
> unconditionally, and free the preallocated storage if we find an
> old storage in the map. This results in a change of semantics in
> bpf_cgroup_storage{,s}_link, and rename cgroup_storage_insert to
> cgroup_storage_lookup_insert that does both lookup and conditionally
> insert or free. bpf_cgroup_storage{,s}_link also tracks exactly
> which storages are reused in an array of bools, so it can unlink
> and free the new storages in the event that attachment failed
> later than link. bpf_cgroup_storages_{free,unlink} accepts the
> bool array in order to facilitate that.

[ ... ]

> ---
>  include/linux/bpf-cgroup.h     | 15 +++---
>  include/uapi/linux/bpf.h       |  2 +-
>  kernel/bpf/cgroup.c            | 69 +++++++++++++++------------
>  kernel/bpf/core.c              | 12 -----
>  kernel/bpf/local_storage.c     | 85 ++++++++++++++++------------------
>  tools/include/uapi/linux/bpf.h |  2 +-
>  6 files changed, 91 insertions(+), 94 deletions(-)
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 2c6f26670acc..c83cd8862298 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -46,7 +46,8 @@ struct bpf_cgroup_storage {
>  	};
>  	struct bpf_cgroup_storage_map *map;
>  	struct bpf_cgroup_storage_key key;
> -	struct list_head list;
> +	struct list_head list_map;
> +	struct list_head list_cg;
>  	struct rb_node node;
>  	struct rcu_head rcu;
>  };
> @@ -78,6 +79,9 @@ struct cgroup_bpf {
>  	struct list_head progs[MAX_BPF_ATTACH_TYPE];
>  	u32 flags[MAX_BPF_ATTACH_TYPE];
>  
> +	/* list of cgroup shared storages */
> +	struct list_head storages;
> +
>  	/* temp storage for effective prog array used by prog_attach/detach */
>  	struct bpf_prog_array *inactive;
>  
> @@ -164,12 +168,11 @@ static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage
>  struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
>  					enum bpf_cgroup_storage_type stype);
>  void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
> -void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
> -			     struct cgroup *cgroup,
> -			     enum bpf_attach_type type);
> +struct bpf_cgroup_storage *
> +bpf_cgroup_storage_link(struct bpf_cgroup_storage *new_storage,
> +			struct cgroup *cgroup, bool *storage_reused);
>  void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
>  int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map);
> -void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *map);
>  
>  int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
>  int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> @@ -383,8 +386,6 @@ static inline void bpf_cgroup_storage_set(
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {}
>  static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux,
>  					    struct bpf_map *map) { return 0; }
> -static inline void bpf_cgroup_storage_release(struct bpf_prog_aux *aux,
> -					      struct bpf_map *map) {}
>  static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
>  	struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return NULL; }
>  static inline void bpf_cgroup_storage_free(
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 7ac3992dacfe..b14f008ad028 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -78,7 +78,7 @@ struct bpf_lpm_trie_key {
>  
>  struct bpf_cgroup_storage_key {
>  	__u64	cgroup_inode_id;	/* cgroup inode id */
> -	__u32	attach_type;		/* program attach type */
> +	__u32	attach_type;		/* program attach type, unused */
>  };
>  
>  /* BPF syscall commands, see bpf(2) man-page for details. */
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index ac53102e244a..6b1ef4a809bb 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -28,12 +28,14 @@ void cgroup_bpf_offline(struct cgroup *cgrp)
>  	percpu_ref_kill(&cgrp->bpf.refcnt);
>  }
>  
> -static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[])
> +static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[],
> +				     bool *storage_reused)
>  {
>  	enum bpf_cgroup_storage_type stype;
>  
>  	for_each_cgroup_storage_type(stype)
> -		bpf_cgroup_storage_free(storages[stype]);
> +		if (!storage_reused || !storage_reused[stype])
> +			bpf_cgroup_storage_free(storages[stype]);
>  }
>  
>  static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
> @@ -45,7 +47,7 @@ static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
>  		storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
>  		if (IS_ERR(storages[stype])) {
>  			storages[stype] = NULL;
> -			bpf_cgroup_storages_free(storages);
> +			bpf_cgroup_storages_free(storages, NULL);
>  			return -ENOMEM;
>  		}
>  	}
> @@ -63,21 +65,24 @@ static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[],
>  }
>  
>  static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
> -				     struct cgroup* cgrp,
> -				     enum bpf_attach_type attach_type)
> +				     struct cgroup *cgrp, bool *storage_reused)
>  {
>  	enum bpf_cgroup_storage_type stype;
>  
>  	for_each_cgroup_storage_type(stype)
> -		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
> +		storages[stype] =
> +			bpf_cgroup_storage_link(storages[stype], cgrp,
> +					        &storage_reused[stype]);
>  }
>  
> -static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
> +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[],
> +				       bool *storage_reused)
>  {
>  	enum bpf_cgroup_storage_type stype;
>  
>  	for_each_cgroup_storage_type(stype)
> -		bpf_cgroup_storage_unlink(storages[stype]);
> +		if (!storage_reused || !storage_reused[stype])
> +			bpf_cgroup_storage_unlink(storages[stype]);
>  }
>  
>  /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
> @@ -101,22 +106,23 @@ static void cgroup_bpf_release(struct work_struct *work)
>  	struct cgroup *p, *cgrp = container_of(work, struct cgroup,
>  					       bpf.release_work);
>  	struct bpf_prog_array *old_array;
> +	struct list_head *storages = &cgrp->bpf.storages;
> +	struct bpf_cgroup_storage *storage, *stmp;
> +
>  	unsigned int type;
>  
>  	mutex_lock(&cgroup_mutex);
>  
>  	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
>  		struct list_head *progs = &cgrp->bpf.progs[type];
> -		struct bpf_prog_list *pl, *tmp;
> +		struct bpf_prog_list *pl, *pltmp;
>  
> -		list_for_each_entry_safe(pl, tmp, progs, node) {
> +		list_for_each_entry_safe(pl, pltmp, progs, node) {
>  			list_del(&pl->node);
>  			if (pl->prog)
>  				bpf_prog_put(pl->prog);
>  			if (pl->link)
>  				bpf_cgroup_link_auto_detach(pl->link);
> -			bpf_cgroup_storages_unlink(pl->storage);
> -			bpf_cgroup_storages_free(pl->storage);
>  			kfree(pl);
>  			static_branch_dec(&cgroup_bpf_enabled_key);
>  		}
> @@ -126,6 +132,11 @@ static void cgroup_bpf_release(struct work_struct *work)
>  		bpf_prog_array_free(old_array);
>  	}
>  
> +	list_for_each_entry_safe(storage, stmp, storages, list_cg) {
> +		bpf_cgroup_storage_unlink(storage);
> +		bpf_cgroup_storage_free(storage);
> +	}
> +
>  	mutex_unlock(&cgroup_mutex);
>  
>  	for (p = cgroup_parent(cgrp); p; p = cgroup_parent(p))
> @@ -290,6 +301,8 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>  	for (i = 0; i < NR; i++)
>  		INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
>  
> +	INIT_LIST_HEAD(&cgrp->bpf.storages);
> +
>  	for (i = 0; i < NR; i++)
>  		if (compute_effective_progs(cgrp, i, &arrays[i]))
>  			goto cleanup;
> @@ -422,7 +435,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	struct list_head *progs = &cgrp->bpf.progs[type];
>  	struct bpf_prog *old_prog = NULL;
>  	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> -	struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> +	bool storage_reused[MAX_BPF_CGROUP_STORAGE_TYPE];
>  	struct bpf_prog_list *pl;
>  	int err;
>  
> @@ -455,22 +468,22 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>  	if (IS_ERR(pl))
>  		return PTR_ERR(pl);
>  
> -	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
> -		return -ENOMEM;
> -
>  	if (pl) {
>  		old_prog = pl->prog;
> -		bpf_cgroup_storages_unlink(pl->storage);
> -		bpf_cgroup_storages_assign(old_storage, pl->storage);
>  	} else {
>  		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
> -		if (!pl) {
> -			bpf_cgroup_storages_free(storage);
> +		if (!pl)
>  			return -ENOMEM;
> -		}
> +
>  		list_add_tail(&pl->node, progs);
>  	}
>  
> +	err = bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog);
> +	if (err)
> +		goto cleanup;
> +
> +	bpf_cgroup_storages_link(storage, cgrp, storage_reused);
> +
>  	pl->prog = prog;
>  	pl->link = link;
>  	bpf_cgroup_storages_assign(pl->storage, storage);
> @@ -478,24 +491,24 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>  
>  	err = update_effective_progs(cgrp, type);
>  	if (err)
> -		goto cleanup;
> +		goto cleanup_unlink;
>  
> -	bpf_cgroup_storages_free(old_storage);
>  	if (old_prog)
>  		bpf_prog_put(old_prog);
>  	else
>  		static_branch_inc(&cgroup_bpf_enabled_key);
> -	bpf_cgroup_storages_link(pl->storage, cgrp, type);
>  	return 0;
>  
> +cleanup_unlink:
> +	bpf_cgroup_storages_unlink(storage, storage_reused);
> +
I still failed to understand why there is a need to do this dance
that always link/publish first and then unlink/unpublish during failure.
It causes all these changes to add and track "storage_reused" params
in a few functions for handling this one failure. That also requires
to introduce the cgroup_storage_lookup_insert().

Going back to my earlier comment in v2 which I didn't here any feedback:

**** snippet ****
>> lookup old, found=>reuse, not-found=>alloc.
>>
>> Only publish the new storage after the attach has succeeded.
*** snippet ****

I try to put them in code here (uncompiled code).  wdyt?

static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
				     struct bpf_cgroup_storage *new_storages[],
				     struct bpf_prog *prog,
				     struct cgroup *cgrp)
{
	enum bpf_cgroup_storage_type stype;
	struct bpf_cgroup_storage_key key;
	struct bpf_map *map;

	key.cgroup_inode_id = cgroup_id(cgrp);
	key.attach_type = 0;

	for_each_cgroup_storage_type(stype) {
		map = prog->aux->cgroup_storage[stype];
		if (!map)
			continue;

		storages[stype] = cgroup_storage_lookup((void *)map, &key, false);
		if (!storages[stype]) {
			struct bpf_cgroup_storage *new_storage;

			new_storage = bpf_cgroup_storage_alloc(prog, stype);
			if (IS_ERR(new_storage)) {
				bpf_cgroup_storages_free(new_storages);
				return PTR_ERR(new_storage);
			}
			storages[stype] = new_storage;
			new_storages[stype] = new_storage;
		}
	}

	return 0;
}

@@ -422,7 +439,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
-	struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
+	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
 	struct bpf_prog_list *pl;
 	int err;
 
@@ -455,17 +472,16 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (IS_ERR(pl))
 		return PTR_ERR(pl);
 
-	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
+	if (bpf_cgroup_storages_alloc(storage, new_storage,
+				      prog ? : link->link.prog, cgrp))
 		return -ENOMEM;
 
 	if (pl) {
 		old_prog = pl->prog;
-		bpf_cgroup_storages_unlink(pl->storage);
-		bpf_cgroup_storages_assign(old_storage, pl->storage);
 	} else {
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
-			bpf_cgroup_storages_free(storage);
+			bpf_cgroup_storages_free(new_storage);
 			return -ENOMEM;
 		}
 		list_add_tail(&pl->node, progs);
@@ -480,12 +496,11 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (err)
 		goto cleanup;
 
-	bpf_cgroup_storages_free(old_storage);
 	if (old_prog)
 		bpf_prog_put(old_prog);
 	else
 		static_branch_inc(&cgroup_bpf_enabled_key);
-	bpf_cgroup_storages_link(pl->storage, cgrp, type);
+	bpf_cgroup_storages_link(new_storage, cgrp, type);
 	return 0;
 
 cleanup:
@@ -493,9 +508,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 		pl->prog = old_prog;
 		pl->link = NULL;
 	}
-	bpf_cgroup_storages_free(pl->storage);
-	bpf_cgroup_storages_assign(pl->storage, old_storage);
-	bpf_cgroup_storages_link(pl->storage, cgrp, type);
+	bpf_cgroup_storages_free(new_storage);
 	if (!old_prog) {
 		list_del(&pl->node);
 		kfree(pl);

[ ... ]

> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index 51bd5a8cb01b..78ffe69ff1d8 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
[ ... ]
> @@ -318,6 +313,17 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>  static void cgroup_storage_map_free(struct bpf_map *_map)
>  {
>  	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
> +	struct list_head *storages = &map->list;
> +	struct bpf_cgroup_storage *storage, *stmp;
> +
> +	mutex_lock(&cgroup_mutex);
> +
> +	list_for_each_entry_safe(storage, stmp, storages, list_map) {
> +		bpf_cgroup_storage_unlink(storage);
> +		bpf_cgroup_storage_free(storage);
> +	}
> +
> +	mutex_unlock(&cgroup_mutex);
>  
>  	WARN_ON(!RB_EMPTY_ROOT(&map->root));
>  	WARN_ON(!list_empty(&map->list));
> @@ -431,13 +437,10 @@ int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map)
>  
>  	spin_lock_bh(&map->lock);
>  
> -	if (map->aux && map->aux != aux)
> -		goto unlock;
>  	if (aux->cgroup_storage[stype] &&
>  	    aux->cgroup_storage[stype] != _map)
>  		goto unlock;
>  
> -	map->aux = aux;
Is spin_lock_bh(&map->lock) still required in this function?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup
  2020-07-18  3:30   ` Martin KaFai Lau
@ 2020-07-18  3:48     ` YiFei Zhu
  0 siblings, 0 replies; 8+ messages in thread
From: YiFei Zhu @ 2020-07-18  3:48 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: YiFei Zhu, bpf, Alexei Starovoitov, Daniel Borkmann,
	Stanislav Fomichev, Mahesh Bandewar, Roman Gushchin,
	Andrii Nakryiko

On Fri, Jul 17, 2020 at 10:30 PM Martin KaFai Lau <kafai@fb.com> wrote:
> I still failed to understand why there is a need to do this dance
> that always link/publish first and then unlink/unpublish during failure.
> It causes all these changes to add and track "storage_reused" params
> in a few functions for handling this one failure. That also requires
> to introduce the cgroup_storage_lookup_insert().
>
> Going back to my earlier comment in v2 which I didn't here any feedback:
>
> **** snippet ****
> >> lookup old, found=>reuse, not-found=>alloc.
> >>
> >> Only publish the new storage after the attach has succeeded.
> *** snippet ****
>
> I try to put them in code here (uncompiled code).  wdyt?

Ah, I see what you mean now. I was under the false impression that
multiple CPUs may attempt to link at the same time, so one would need
a weird dance to avoid races and allocating-during-spinlock, but this
is not the case, given that they are under the cgroup_mutex. Thanks
for pointing that out. Will fix in v4.

> >       spin_lock_bh(&map->lock);
> >
> > -     if (map->aux && map->aux != aux)
> > -             goto unlock;
> >       if (aux->cgroup_storage[stype] &&
> >           aux->cgroup_storage[stype] != _map)
> >               goto unlock;
> >
> > -     map->aux = aux;
> Is spin_lock_bh(&map->lock) still required in this function?

No. Will fix in v4.

YiFei Zhu

On Fri, Jul 17, 2020 at 10:30 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Jul 16, 2020 at 07:16:27PM -0500, YiFei Zhu wrote:
>
> > Fourth, on attach, we reuse the old storage if the key already
> > exists in the map. Because the rbtree traversal holds the spinlock
> > of the map, during which we can't allocate a new storage if we
> > don't find an old storage, instead we preallocate the storage
> > unconditionally, and free the preallocated storage if we find an
> > old storage in the map. This results in a change of semantics in
> > bpf_cgroup_storage{,s}_link, and rename cgroup_storage_insert to
> > cgroup_storage_lookup_insert that does both lookup and conditionally
> > insert or free. bpf_cgroup_storage{,s}_link also tracks exactly
> > which storages are reused in an array of bools, so it can unlink
> > and free the new storages in the event that attachment failed
> > later than link. bpf_cgroup_storages_{free,unlink} accepts the
> > bool array in order to facilitate that.
>
> [ ... ]
>
> > ---
> >  include/linux/bpf-cgroup.h     | 15 +++---
> >  include/uapi/linux/bpf.h       |  2 +-
> >  kernel/bpf/cgroup.c            | 69 +++++++++++++++------------
> >  kernel/bpf/core.c              | 12 -----
> >  kernel/bpf/local_storage.c     | 85 ++++++++++++++++------------------
> >  tools/include/uapi/linux/bpf.h |  2 +-
> >  6 files changed, 91 insertions(+), 94 deletions(-)
> >
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index 2c6f26670acc..c83cd8862298 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -46,7 +46,8 @@ struct bpf_cgroup_storage {
> >       };
> >       struct bpf_cgroup_storage_map *map;
> >       struct bpf_cgroup_storage_key key;
> > -     struct list_head list;
> > +     struct list_head list_map;
> > +     struct list_head list_cg;
> >       struct rb_node node;
> >       struct rcu_head rcu;
> >  };
> > @@ -78,6 +79,9 @@ struct cgroup_bpf {
> >       struct list_head progs[MAX_BPF_ATTACH_TYPE];
> >       u32 flags[MAX_BPF_ATTACH_TYPE];
> >
> > +     /* list of cgroup shared storages */
> > +     struct list_head storages;
> > +
> >       /* temp storage for effective prog array used by prog_attach/detach */
> >       struct bpf_prog_array *inactive;
> >
> > @@ -164,12 +168,11 @@ static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage
> >  struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
> >                                       enum bpf_cgroup_storage_type stype);
> >  void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
> > -void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
> > -                          struct cgroup *cgroup,
> > -                          enum bpf_attach_type type);
> > +struct bpf_cgroup_storage *
> > +bpf_cgroup_storage_link(struct bpf_cgroup_storage *new_storage,
> > +                     struct cgroup *cgroup, bool *storage_reused);
> >  void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
> >  int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map);
> > -void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *map);
> >
> >  int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
> >  int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> > @@ -383,8 +386,6 @@ static inline void bpf_cgroup_storage_set(
> >       struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {}
> >  static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux,
> >                                           struct bpf_map *map) { return 0; }
> > -static inline void bpf_cgroup_storage_release(struct bpf_prog_aux *aux,
> > -                                           struct bpf_map *map) {}
> >  static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
> >       struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return NULL; }
> >  static inline void bpf_cgroup_storage_free(
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 7ac3992dacfe..b14f008ad028 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -78,7 +78,7 @@ struct bpf_lpm_trie_key {
> >
> >  struct bpf_cgroup_storage_key {
> >       __u64   cgroup_inode_id;        /* cgroup inode id */
> > -     __u32   attach_type;            /* program attach type */
> > +     __u32   attach_type;            /* program attach type, unused */
> >  };
> >
> >  /* BPF syscall commands, see bpf(2) man-page for details. */
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index ac53102e244a..6b1ef4a809bb 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -28,12 +28,14 @@ void cgroup_bpf_offline(struct cgroup *cgrp)
> >       percpu_ref_kill(&cgrp->bpf.refcnt);
> >  }
> >
> > -static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[])
> > +static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[],
> > +                                  bool *storage_reused)
> >  {
> >       enum bpf_cgroup_storage_type stype;
> >
> >       for_each_cgroup_storage_type(stype)
> > -             bpf_cgroup_storage_free(storages[stype]);
> > +             if (!storage_reused || !storage_reused[stype])
> > +                     bpf_cgroup_storage_free(storages[stype]);
> >  }
> >
> >  static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
> > @@ -45,7 +47,7 @@ static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
> >               storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
> >               if (IS_ERR(storages[stype])) {
> >                       storages[stype] = NULL;
> > -                     bpf_cgroup_storages_free(storages);
> > +                     bpf_cgroup_storages_free(storages, NULL);
> >                       return -ENOMEM;
> >               }
> >       }
> > @@ -63,21 +65,24 @@ static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[],
> >  }
> >
> >  static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
> > -                                  struct cgroup* cgrp,
> > -                                  enum bpf_attach_type attach_type)
> > +                                  struct cgroup *cgrp, bool *storage_reused)
> >  {
> >       enum bpf_cgroup_storage_type stype;
> >
> >       for_each_cgroup_storage_type(stype)
> > -             bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
> > +             storages[stype] =
> > +                     bpf_cgroup_storage_link(storages[stype], cgrp,
> > +                                             &storage_reused[stype]);
> >  }
> >
> > -static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
> > +static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[],
> > +                                    bool *storage_reused)
> >  {
> >       enum bpf_cgroup_storage_type stype;
> >
> >       for_each_cgroup_storage_type(stype)
> > -             bpf_cgroup_storage_unlink(storages[stype]);
> > +             if (!storage_reused || !storage_reused[stype])
> > +                     bpf_cgroup_storage_unlink(storages[stype]);
> >  }
> >
> >  /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
> > @@ -101,22 +106,23 @@ static void cgroup_bpf_release(struct work_struct *work)
> >       struct cgroup *p, *cgrp = container_of(work, struct cgroup,
> >                                              bpf.release_work);
> >       struct bpf_prog_array *old_array;
> > +     struct list_head *storages = &cgrp->bpf.storages;
> > +     struct bpf_cgroup_storage *storage, *stmp;
> > +
> >       unsigned int type;
> >
> >       mutex_lock(&cgroup_mutex);
> >
> >       for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
> >               struct list_head *progs = &cgrp->bpf.progs[type];
> > -             struct bpf_prog_list *pl, *tmp;
> > +             struct bpf_prog_list *pl, *pltmp;
> >
> > -             list_for_each_entry_safe(pl, tmp, progs, node) {
> > +             list_for_each_entry_safe(pl, pltmp, progs, node) {
> >                       list_del(&pl->node);
> >                       if (pl->prog)
> >                               bpf_prog_put(pl->prog);
> >                       if (pl->link)
> >                               bpf_cgroup_link_auto_detach(pl->link);
> > -                     bpf_cgroup_storages_unlink(pl->storage);
> > -                     bpf_cgroup_storages_free(pl->storage);
> >                       kfree(pl);
> >                       static_branch_dec(&cgroup_bpf_enabled_key);
> >               }
> > @@ -126,6 +132,11 @@ static void cgroup_bpf_release(struct work_struct *work)
> >               bpf_prog_array_free(old_array);
> >       }
> >
> > +     list_for_each_entry_safe(storage, stmp, storages, list_cg) {
> > +             bpf_cgroup_storage_unlink(storage);
> > +             bpf_cgroup_storage_free(storage);
> > +     }
> > +
> >       mutex_unlock(&cgroup_mutex);
> >
> >       for (p = cgroup_parent(cgrp); p; p = cgroup_parent(p))
> > @@ -290,6 +301,8 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
> >       for (i = 0; i < NR; i++)
> >               INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
> >
> > +     INIT_LIST_HEAD(&cgrp->bpf.storages);
> > +
> >       for (i = 0; i < NR; i++)
> >               if (compute_effective_progs(cgrp, i, &arrays[i]))
> >                       goto cleanup;
> > @@ -422,7 +435,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       struct list_head *progs = &cgrp->bpf.progs[type];
> >       struct bpf_prog *old_prog = NULL;
> >       struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> > -     struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> > +     bool storage_reused[MAX_BPF_CGROUP_STORAGE_TYPE];
> >       struct bpf_prog_list *pl;
> >       int err;
> >
> > @@ -455,22 +468,22 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> >       if (IS_ERR(pl))
> >               return PTR_ERR(pl);
> >
> > -     if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
> > -             return -ENOMEM;
> > -
> >       if (pl) {
> >               old_prog = pl->prog;
> > -             bpf_cgroup_storages_unlink(pl->storage);
> > -             bpf_cgroup_storages_assign(old_storage, pl->storage);
> >       } else {
> >               pl = kmalloc(sizeof(*pl), GFP_KERNEL);
> > -             if (!pl) {
> > -                     bpf_cgroup_storages_free(storage);
> > +             if (!pl)
> >                       return -ENOMEM;
> > -             }
> > +
> >               list_add_tail(&pl->node, progs);
> >       }
> >
> > +     err = bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog);
> > +     if (err)
> > +             goto cleanup;
> > +
> > +     bpf_cgroup_storages_link(storage, cgrp, storage_reused);
> > +
> >       pl->prog = prog;
> >       pl->link = link;
> >       bpf_cgroup_storages_assign(pl->storage, storage);
> > @@ -478,24 +491,24 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
> >
> >       err = update_effective_progs(cgrp, type);
> >       if (err)
> > -             goto cleanup;
> > +             goto cleanup_unlink;
> >
> > -     bpf_cgroup_storages_free(old_storage);
> >       if (old_prog)
> >               bpf_prog_put(old_prog);
> >       else
> >               static_branch_inc(&cgroup_bpf_enabled_key);
> > -     bpf_cgroup_storages_link(pl->storage, cgrp, type);
> >       return 0;
> >
> > +cleanup_unlink:
> > +     bpf_cgroup_storages_unlink(storage, storage_reused);
> > +
> I still failed to understand why there is a need to do this dance
> that always link/publish first and then unlink/unpublish during failure.
> It causes all these changes to add and track "storage_reused" params
> in a few functions for handling this one failure. That also requires
> to introduce the cgroup_storage_lookup_insert().
>
> Going back to my earlier comment in v2 which I didn't here any feedback:
>
> **** snippet ****
> >> lookup old, found=>reuse, not-found=>alloc.
> >>
> >> Only publish the new storage after the attach has succeeded.
> *** snippet ****
>
> I try to put them in code here (uncompiled code).  wdyt?
>
> static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
>                                      struct bpf_cgroup_storage *new_storages[],
>                                      struct bpf_prog *prog,
>                                      struct cgroup *cgrp)
> {
>         enum bpf_cgroup_storage_type stype;
>         struct bpf_cgroup_storage_key key;
>         struct bpf_map *map;
>
>         key.cgroup_inode_id = cgroup_id(cgrp);
>         key.attach_type = 0;
>
>         for_each_cgroup_storage_type(stype) {
>                 map = prog->aux->cgroup_storage[stype];
>                 if (!map)
>                         continue;
>
>                 storages[stype] = cgroup_storage_lookup((void *)map, &key, false);
>                 if (!storages[stype]) {
>                         struct bpf_cgroup_storage *new_storage;
>
>                         new_storage = bpf_cgroup_storage_alloc(prog, stype);
>                         if (IS_ERR(new_storage)) {
>                                 bpf_cgroup_storages_free(new_storages);
>                                 return PTR_ERR(new_storage);
>                         }
>                         storages[stype] = new_storage;
>                         new_storages[stype] = new_storage;
>                 }
>         }
>
>         return 0;
> }
>
> @@ -422,7 +439,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>         struct list_head *progs = &cgrp->bpf.progs[type];
>         struct bpf_prog *old_prog = NULL;
>         struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> -       struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
> +       struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
>         struct bpf_prog_list *pl;
>         int err;
>
> @@ -455,17 +472,16 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>         if (IS_ERR(pl))
>                 return PTR_ERR(pl);
>
> -       if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
> +       if (bpf_cgroup_storages_alloc(storage, new_storage,
> +                                     prog ? : link->link.prog, cgrp))
>                 return -ENOMEM;
>
>         if (pl) {
>                 old_prog = pl->prog;
> -               bpf_cgroup_storages_unlink(pl->storage);
> -               bpf_cgroup_storages_assign(old_storage, pl->storage);
>         } else {
>                 pl = kmalloc(sizeof(*pl), GFP_KERNEL);
>                 if (!pl) {
> -                       bpf_cgroup_storages_free(storage);
> +                       bpf_cgroup_storages_free(new_storage);
>                         return -ENOMEM;
>                 }
>                 list_add_tail(&pl->node, progs);
> @@ -480,12 +496,11 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>         if (err)
>                 goto cleanup;
>
> -       bpf_cgroup_storages_free(old_storage);
>         if (old_prog)
>                 bpf_prog_put(old_prog);
>         else
>                 static_branch_inc(&cgroup_bpf_enabled_key);
> -       bpf_cgroup_storages_link(pl->storage, cgrp, type);
> +       bpf_cgroup_storages_link(new_storage, cgrp, type);
>         return 0;
>
>  cleanup:
> @@ -493,9 +508,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
>                 pl->prog = old_prog;
>                 pl->link = NULL;
>         }
> -       bpf_cgroup_storages_free(pl->storage);
> -       bpf_cgroup_storages_assign(pl->storage, old_storage);
> -       bpf_cgroup_storages_link(pl->storage, cgrp, type);
> +       bpf_cgroup_storages_free(new_storage);
>         if (!old_prog) {
>                 list_del(&pl->node);
>                 kfree(pl);
>
> [ ... ]
>
> > diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> > index 51bd5a8cb01b..78ffe69ff1d8 100644
> > --- a/kernel/bpf/local_storage.c
> > +++ b/kernel/bpf/local_storage.c
> [ ... ]
> > @@ -318,6 +313,17 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
> >  static void cgroup_storage_map_free(struct bpf_map *_map)
> >  {
> >       struct bpf_cgroup_storage_map *map = map_to_storage(_map);
> > +     struct list_head *storages = &map->list;
> > +     struct bpf_cgroup_storage *storage, *stmp;
> > +
> > +     mutex_lock(&cgroup_mutex);
> > +
> > +     list_for_each_entry_safe(storage, stmp, storages, list_map) {
> > +             bpf_cgroup_storage_unlink(storage);
> > +             bpf_cgroup_storage_free(storage);
> > +     }
> > +
> > +     mutex_unlock(&cgroup_mutex);
> >
> >       WARN_ON(!RB_EMPTY_ROOT(&map->root));
> >       WARN_ON(!list_empty(&map->list));
> > @@ -431,13 +437,10 @@ int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map)
> >
> >       spin_lock_bh(&map->lock);
> >
> > -     if (map->aux && map->aux != aux)
> > -             goto unlock;
> >       if (aux->cgroup_storage[stype] &&
> >           aux->cgroup_storage[stype] != _map)
> >               goto unlock;
> >
> > -     map->aux = aux;
> Is spin_lock_bh(&map->lock) still required in this function?
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-07-18  3:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17  0:16 [PATCH v3 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
2020-07-17  0:16 ` [PATCH v3 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches YiFei Zhu
2020-07-17  0:16 ` [PATCH v3 bpf-next 2/5] selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs YiFei Zhu
2020-07-17  0:16 ` [PATCH v3 bpf-next 3/5] bpf: Make cgroup storages shared across attaches on the same cgroup YiFei Zhu
2020-07-18  3:30   ` Martin KaFai Lau
2020-07-18  3:48     ` YiFei Zhu
2020-07-17  0:16 ` [PATCH v3 bpf-next 4/5] selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress YiFei Zhu
2020-07-17  0:16 ` [PATCH v3 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type YiFei Zhu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).