From: Martin KaFai Lau <kafai@fb.com>
To: YiFei Zhu <zhuyifei1999@gmail.com>
Cc: <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Stanislav Fomichev <sdf@google.com>,
	Mahesh Bandewar <maheshb@google.com>,
	Roman Gushchin <guro@fb.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	YiFei Zhu <zhuyifei@google.com>
Subject: Re: [PATCH v5 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once
Date: Thu, 23 Jul 2020 17:05:51 -0700	[thread overview]
Message-ID: <20200724000551.qi4y2isxtgdtcgzc@kafai-mbp> (raw)
In-Reply-To: <cover.1595489786.git.zhuyifei@google.com>

On Thu, Jul 23, 2020 at 02:40:53AM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@google.com>
> 
> To access the storage in a CGROUP_STORAGE map, one uses the
> bpf_get_local_storage helper, which is extremely fast due to its
> use of per-CPU variables. However, its implementation is built on
> the assumption that a map can only be used by one program at any
> time, which prohibits any sharing of data between multiple
> programs using these maps. This eliminates a lot of use cases,
> such as per-cgroup configuration storage, written to by a
> setsockopt program and read by a cg_sock_addr program.
> 
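[For context, a minimal kernel-side sketch of how a program reaches this
storage; the map name, value type, and section are illustrative, not
taken from the patch set:]

```c
/* Minimal BPF-side sketch (illustrative names), built against libbpf's
 * bpf_helpers.h.  Each attachment of this program gets its own storage
 * transparently via bpf_get_local_storage(). */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__type(key, struct bpf_cgroup_storage_key);
	__type(value, __u64);
} cg_storage SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress(struct __sk_buff *skb)
{
	/* Returns a pointer to this attachment's storage; never NULL
	 * for CGROUP_STORAGE maps. */
	__u64 *cnt = bpf_get_local_storage(&cg_storage, 0);

	__sync_fetch_and_add(cnt, 1);
	return 1; /* allow the packet */
}

char _license[] SEC("license") = "GPL";
```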
> Why not use other map types? The great part of the CGROUP_STORAGE
> map is that it is isolated per cgroup it is attached to. When a
> program uses bpf_get_local_storage, even on the same map, it gets
> a different storage if it runs as a result of an attachment to a
> different cgroup. The kernel manages the storages, simplifying
> both the BPF program and userspace. In theory, one could use other
> map types like array or hash to do the same thing, but at a major
> cost in overhead and complexity: userspace would need to know when
> a cgroup is freed in order to free the corresponding entry in the
> replacement map.
> 
> This patch set introduces a significant change to the semantics of
> the CGROUP_STORAGE map type. Instead of each storage being tied to
> a single attachment, it is shared across different attachments to
> the same cgroup, and persists until either the map or the cgroup
> it is attached to is freed.
> 
> Users may use u64 as the key to the map, in which case the attach
> type is ignored during key comparison, and programs of different
> attach types will share the same storage if they are attached to
> the same cgroup.
> 
> How could this break existing users?
> * Users that use detach & reattach / program replacement as a
>   shortcut to zeroing the storage. Since we need sharing between
>   programs, we cannot zero the storage. Users that expect this
>   behavior should either attach a program with a new map, or
>   explicitly zero the map with a syscall.
> This case depends on undocumented implementation details, so the
> impact should be very minimal.
> 
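[A userspace sketch of the "explicitly zero the map with a syscall"
option above, using libbpf's bpf_map_update_elem. The function name,
the fd/cgroup-id plumbing, and the single-__u64 value size are all
illustrative assumptions, not part of the patch set:]

```c
#include <linux/bpf.h>
#include <bpf/bpf.h>

/* Zero one cgroup's storage in a u64-keyed CGROUP_STORAGE map.
 * map_fd and cgroup_id must come from the surrounding application. */
static int zero_cgroup_storage(int map_fd, __u64 cgroup_id)
{
	__u64 zero_value = 0;

	/* BPF_EXIST: the storage already exists for an attached cgroup;
	 * we only overwrite its contents. */
	return bpf_map_update_elem(map_fd, &cgroup_id, &zero_value, BPF_EXIST);
}
```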
> Patch 1 introduces a test on the old expected behavior of the map
> type.
> 
> Patch 2 introduces a test showing how two programs cannot share
> one such map.
> 
> Patch 3 implements the change of semantics to the map.
> 
> Patch 4 amends the new test such that it yields the behavior we
> expect from the change.
> 
> Patch 5 documents the map type.
> 
> Changes since RFC:
> * Clarify commit message in patch 3 such that it says the lifetime
>   of the storage is ended at the freeing of the cgroup_bpf, rather
>   than the cgroup itself.
> * Restored an -ENOMEM check in __cgroup_bpf_attach.
> * Update selftests for recent change in network_helpers API.
> 
> Changes since v1:
> * s/CHECK_FAIL/CHECK/
> * s/bpf_prog_attach/bpf_program__attach_cgroup/
> * Moved test__start_subtest to test_cg_storage_multi.
> * Removed some redundant CHECK_FAIL where they are already CHECK-ed.
> 
> Changes since v2:
> * Lock cgroup_mutex during map_free.
> * Publish new storages only if attach is successful, by tracking
>   exactly which storages are reused in an array of bools.
> * Mention bpftool map dump showing a value of zero for attach_type
>   in patch 3 commit message.
> 
> Changes since v3:
> * Use a much simpler lookup and allocate-if-not-exist from the fact
>   that cgroup_mutex is locked during attach.
> * Removed an unnecessary spinlock hold.
> 
> Changes since v4:
> * Changed semantics so that if the key type is struct
>   bpf_cgroup_storage_key, the map retains isolation between
>   different attach types. Sharing between different attach types
>   only occurs when the key type is u64.
> * Adapted tests and docs for the above change.
Thanks for the v5.  Overall this looks good.  I left some nit
comments on the code changes.  The doc needs some more details.
With that,

Acked-by: Martin KaFai Lau <kafai@fb.com>


Thread overview: 11+ messages
2020-07-23  7:40 [PATCH v5 bpf-next 0/5] Make BPF CGROUP_STORAGE map usable by different programs at once YiFei Zhu
2020-07-23  7:40 ` [PATCH v5 bpf-next 1/5] selftests/bpf: Add test for CGROUP_STORAGE map on multiple attaches YiFei Zhu
2020-07-24  0:03   ` Martin KaFai Lau
2020-07-23  7:40 ` [PATCH v5 bpf-next 2/5] selftests/bpf: Test CGROUP_STORAGE map can't be used by multiple progs YiFei Zhu
2020-07-23 23:59   ` Martin KaFai Lau
2020-07-23  7:40 ` [PATCH v5 bpf-next 3/5] bpf: Make cgroup storages shared between programs on the same cgroup YiFei Zhu
2020-07-24  0:00   ` Martin KaFai Lau
2020-07-23  7:40 ` [PATCH v5 bpf-next 4/5] selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress YiFei Zhu
2020-07-23  7:40 ` [PATCH v5 bpf-next 5/5] Documentation/bpf: Document CGROUP_STORAGE map type YiFei Zhu
2020-07-24  0:02   ` Martin KaFai Lau
2020-07-24  0:05 ` Martin KaFai Lau [this message]
