Re: [PATCH bpf v2 0/6] Fix attach / detach uapi for sockmap and flow_dissector

From: Yonghong Song <yhs@fb.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Stanislav Fomichev <sdf@google.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>,
	kernel-team <kernel-team@cloudflare.com>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf v2 0/6] Fix attach / detach uapi for sockmap and flow_dissector
Date: Tue, 30 Jun 2020 08:08:33 -0700
Message-ID: <8fcf1a4c-5a5a-280a-65eb-fa8bc8a298c1@fb.com>
In-Reply-To: <CACAyw9_5Dg=dTMk3TQiYFE3vzUuq68V2-NcpZCuiQqJFPn-0Dw@mail.gmail.com>

On 6/30/20 1:39 AM, Lorenz Bauer wrote:
> On Tue, 30 Jun 2020 at 06:48, Yonghong Song <yhs@fb.com> wrote:
>>
>> Since bpf_iter is mentioned here, I would like to provide a little
>> context on how target_fd in link_create is treated there.
> 
> Thanks!
> 
>> Currently, target_fd is always 0 as it is not used. This is
>> just easier if we want to use it in the future.
>>
>> In the future, bpf_iter can maintain that target_fd must be 0
>> or it may not so. For example, it can add a flag value in
>> link_create such that when flag is set it will take whatever
>> value in target_fd and use it. Or it may just take a non-0
>> target_fd as an indication of the flag is set. I have not
>> finalized patches yet. I intend to do the latter, i.e.,
>> taking a non-0 target_fd. But we will see once my bpf_iter
>> patches for map elements are out.
> 
> I had a piece of code for sockmap which did something like this:
> 
>      prog = bpf_prog_get(attr->attach_bpf_fd)
>      if (IS_ERR(prog))
>          if (!attr->attach_bpf_fd)
>              // fall back to old behaviour
>          else
>              return PTR_ERR(prog)
>      else if (prog->type != TYPE)
>          return -EINVAL
> 
> The benefit is that it continues to work if a binary is invoked with
> stdin closed, which could lead to a BPF program with fd 0.
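
For context on how fd 0 can end up referring to something other than stdin: POSIX hands out the lowest free descriptor, so once stdin is closed the next open() (or a BPF_PROG_LOAD) gets 0. A minimal user-space sketch, where the helper name fd_after_closing_stdin is made up for illustration and /dev/null stands in for a BPF program fd:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: close stdin, open a file, and report which
 * descriptor number the kernel handed out. stdin is restored afterwards
 * if it was open to begin with.
 */
int fd_after_closing_stdin(void)
{
    int saved = dup(STDIN_FILENO);   /* -1 if stdin was already closed */
    int fd, result;

    close(STDIN_FILENO);

    /* The lowest free descriptor is now 0, so open() returns 0. */
    fd = open("/dev/null", O_RDONLY);
    result = fd;
    if (fd >= 0)
        close(fd);

    if (saved >= 0) {
        dup2(saved, STDIN_FILENO);   /* put stdin back */
        close(saved);
    }
    return result;
}
```

This is why an API that treats fd 0 as "not set" can misbehave for a binary started with stdin closed.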

For bpf_iter, there is no legacy behavior to preserve, so I would have
something like
     /* somecondition could be new attr->flags, or some kernel
      * internal checking
      */
     if (somecondition) {
       /* not accepting fd 0 */
       if (attr->attach_bpf_fd == 0)
         return -EINVAL;
       prog = bpf_prog_get(attr->attach_bpf_fd);
       if (IS_ERR(prog))
         return PTR_ERR(prog);
     } else if (attr->attach_bpf_fd != 0)
       return -EINVAL;
or I could have
     if (somecondition) {
       /* accepting any fd, including 0 */
       prog = bpf_prog_get(attr->attach_bpf_fd);
       if (IS_ERR(prog))
         return PTR_ERR(prog);
     } else if (attr->attach_bpf_fd != 0)
       return -EINVAL;

This "somecondition" is false for the current bpf_iter, so the existing
requirement that attr->attach_bpf_fd == 0 is still enforced.
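
To make the difference between the two variants concrete, here is a minimal user-space sketch. It is not kernel code: fake_prog_lookup is a made-up stand-in for bpf_prog_get() that pretends fds 0..7 are valid programs, and check_fd_strict / check_fd_permissive are hypothetical names for the two options.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for bpf_prog_get(): fds 0..7 are "valid". */
static int fake_prog_lookup(int fd)
{
    return (fd >= 0 && fd < 8) ? 0 : -EBADF;
}

/* Variant 1: when the condition is set, fd 0 is rejected outright. */
int check_fd_strict(bool somecondition, int fd)
{
    if (somecondition) {
        if (fd == 0)
            return -EINVAL;
        return fake_prog_lookup(fd);
    }
    /* Condition not set: the field must stay 0. */
    return fd != 0 ? -EINVAL : 0;
}

/* Variant 2: when the condition is set, any fd is looked up, 0 included. */
int check_fd_permissive(bool somecondition, int fd)
{
    if (somecondition)
        return fake_prog_lookup(fd);
    /* Condition not set: the field must stay 0. */
    return fd != 0 ? -EINVAL : 0;
}
```

The two only differ for (somecondition set, fd == 0): the first returns -EINVAL, the second accepts the fd if it refers to a program. With somecondition unset, both keep today's rule that the field must be 0.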

> 
> Could this work for bpf_iter as well?
> 
>>
>> There is another example where 0 and non-0 prog_fd make a difference.
>> The attach_prog_fd field when doing prog_load.
>> When attach_prog_fd is 0, it means attaching to vmlinux through
>> attach_btf_id. If attach_prog_fd is not 0, it means attaching to
>> another bpf program (replace). So user space (libbpf) may
>> already need to pay attention to this.
> 
> That is unfortunate. What was the reason to use 0 instead of -1 to
> attach to vmlinux?

Attaching to vmlinux came first, and at that time attach_prog_fd
did not exist. Later, when the replace prog feature was introduced,
attach_prog_fd was added. This field is used to differentiate
between vmlinux func attachment and bpf_prog attachment. It is a
little unfortunate, but using 0 is easier because check_attr in the
kernel ensures that all kernel-unsupported fields are 0.
Using -1 would break that.
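
A rough user-space sketch of why 0 is the convenient sentinel. This only models the idea of the check, not the kernel's actual implementation; struct demo_attr and tail_is_zero are hypothetical names, and the two-field layout is simplified for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical attr layout. */
struct demo_attr {
    uint32_t attach_btf_id;   /* known to the older kernel */
    uint32_t attach_prog_fd;  /* field added later */
};

/*
 * Return true if every byte from known_size up to total_size is zero,
 * i.e. user space did not set anything this kernel does not understand.
 */
bool tail_is_zero(const void *attr, size_t known_size, size_t total_size)
{
    const unsigned char *p = attr;
    size_t i;

    for (i = known_size; i < total_size; i++)
        if (p[i] != 0)
            return false;
    return true;
}
```

A kernel that only knows attach_btf_id would call this with known_size set to offsetof(struct demo_attr, attach_prog_fd). A caller that leaves attach_prog_fd at 0 passes on old and new kernels alike, while a caller that writes -1 to mean "vmlinux" would be rejected by the older kernel.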

> 
>>
>>> work around for fd 0 should we need to in the future.
>>>
>>> The detach case is more problematic: both cgroups and lirc2 verify
>>> that attach_bpf_fd matches the currently attached program. This
>>> way you need access to the program fd to be able to remove it.
>>> Neither sockmap nor flow_dissector do this. flow_dissector even
>>> has a check for CAP_NET_ADMIN because of this. The patch set
>>> addresses this by implementing the desired behaviour.
>>>
>>> There is a possibility for user space breakage: any callers that
>>> don't provide the correct fd will fail with ENOENT. For sockmap
>>> the risk is low: even the selftests assume that sockmap works
>>> the way I described. For flow_dissector the story is less
>>> straightforward, and the selftests use a variety of arguments.
>>>
>>> I've includes fixes tags for the oldest commits that allow an easy
>>> backport, however the behaviour dates back to when sockmap and
>>> flow_dissector were introduced. What is the best way to handle these?
>>>
>>> This set is based on top of Jakub's work "bpf, netns: Prepare
>>> for multi-prog attachment" available at
>>> https://lore.kernel.org/bpf/87k0zwmhtb.fsf@cloudflare.com/T/
>>>
>>> Since v1:
>>> - Adjust selftests
>>> - Implement detach behaviour
>>>
>>> Lorenz Bauer (6):
>>>     bpf: flow_dissector: check value of unused flags to BPF_PROG_ATTACH
>>>     bpf: flow_dissector: check value of unused flags to BPF_PROG_DETACH
>>>     bpf: sockmap: check value of unused args to BPF_PROG_ATTACH
>>>     bpf: sockmap: require attach_bpf_fd when detaching a program
>>>     selftests: bpf: pass program and target_fd in flow_dissector_reattach
>>>     selftests: bpf: pass program to bpf_prog_detach in flow_dissector
>>>
>>>    include/linux/bpf-netns.h                     |  5 +-
>>>    include/linux/bpf.h                           | 13 ++++-
>>>    include/linux/skmsg.h                         | 13 +++++
>>>    kernel/bpf/net_namespace.c                    | 22 ++++++--
>>>    kernel/bpf/syscall.c                          |  6 +--
>>>    net/core/sock_map.c                           | 53 +++++++++++++++++--
>>>    .../selftests/bpf/prog_tests/flow_dissector.c |  4 +-
>>>    .../bpf/prog_tests/flow_dissector_reattach.c  | 12 ++---
>>>    8 files changed, 103 insertions(+), 25 deletions(-)
>>>
> 
> 
>