Re: [PATCH v3 bpf-next 02/14] bpf: introduce cgroup storage maps

From: Roman Gushchin <guro@fb.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<kernel-team@fb.com>, Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH v3 bpf-next 02/14] bpf: introduce cgroup storage maps
Date: Fri, 27 Jul 2018 10:12:43 -0700	[thread overview]
Message-ID: <20180727171239.GA9128@castle> (raw)
In-Reply-To: <a7bbd507-3de9-e574-7187-f7915add7474@iogearbox.net>

On Fri, Jul 27, 2018 at 06:11:31AM +0200, Daniel Borkmann wrote:
> On 07/20/2018 07:45 PM, Roman Gushchin wrote:
> > This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps:
> > a special type of maps which are implementing the cgroup storage.
> > 
> > From the userspace point of view it's almost a generic
> > hash map with the (cgroup inode id, attachment type) pair
> > used as a key.
> > 
> > The only difference is that some operations are restricted:
> >   1) a user can't create new entries,
> >   2) a user can't remove existing entries.
> > 
> > The lookup from userspace is o(log(n)).
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Acked-by: Martin KaFai Lau <kafai@fb.com>
> 
> (First of all sorry for the late review, only limited availability this week
>  on my side.)

Np, thank you for the review!

> 
> > ---
> >  include/linux/bpf-cgroup.h |  38 +++++
> >  include/linux/bpf.h        |   1 +
> >  include/linux/bpf_types.h  |   3 +
> >  include/uapi/linux/bpf.h   |   6 +
> >  kernel/bpf/Makefile        |   1 +
> >  kernel/bpf/local_storage.c | 367 +++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/bpf/verifier.c      |  12 ++
> >  7 files changed, 428 insertions(+)
> >  create mode 100644 kernel/bpf/local_storage.c
> > 
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index 79795c5fa7c3..6b0e7bd4b154 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -3,19 +3,39 @@
> >  #define _BPF_CGROUP_H
> >  
> >  #include <linux/jump_label.h>
> > +#include <linux/rbtree.h>
> >  #include <uapi/linux/bpf.h>
> >  
> [...]
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 15d69b278277..0b089ba4595d 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -5140,6 +5140,14 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
> >  				return -E2BIG;
> >  			}
> >  
> > +			if (map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE &&
> > +			    bpf_cgroup_storage_assign(env->prog, map)) {
> > +				verbose(env,
> > +					"only one cgroup storage is allowed\n");
> > +				fdput(f);
> > +				return -EBUSY;
> > +			}
> > +
> >  			/* hold the map. If the program is rejected by verifier,
> >  			 * the map will be released by release_maps() or it
> >  			 * will be used by the valid program until it's unloaded
> > @@ -5148,6 +5156,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
> >  			map = bpf_map_inc(map, false);
> >  			if (IS_ERR(map)) {
> >  				fdput(f);
> > +				if (map->map_type ==
> > +				    BPF_MAP_TYPE_CGROUP_STORAGE)
> > +					bpf_cgroup_storage_release(env->prog,
> > +								   map);
> 
> I think this behavior is a bit strange, meaning that we would reset the map via
> bpf_cgroup_storage_release() in this case, but if we error out and exit in any
> later instruction the prior bpf_cgroup_storage_assign() is not undone, meaning
> at this point we have no other choice but to destroy the map since any later
> BPF prog load with bpf_cgroup_storage_assign() attempt would fail with -EBUSY
> even though it's not assigned anywhere, is that correct? Same also on any other
> errors along the prog load path. E.g. say, as one example, your verifier buffer
> is too small, so any retry with the very same program from loader side would fail
> above due to different prog pointers?

Yeah, I see...
We should call bpf_cgroup_storage_release() from the generic verifier error path.
I'll fix this and the leak in the other patch and resend soon.

Thanks!