Re: [PATCH bpf-next 1/2] bpf: change uapi for bpf iterator map elements

From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 1/2] bpf: change uapi for bpf iterator map elements
Date: Sun, 2 Aug 2020 23:21:13 -0700	[thread overview]
Message-ID: <b9cbedf6-e407-51d7-53f5-fae7b91905e8@fb.com> (raw)
In-Reply-To: <CAEf4Bzbr--=tbmLqrgbtA4ERy8KmCYvBDfP5PciXx9x3yWpmsQ@mail.gmail.com>

On 8/2/20 10:11 PM, Andrii Nakryiko wrote:
> On Sun, Aug 2, 2020 at 7:23 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 8/2/20 6:25 PM, Andrii Nakryiko wrote:
>>> On Sat, Aug 1, 2020 at 9:22 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>> Commit a5cbe05a6673 ("bpf: Implement bpf iterator for
>>>> map elements") added bpf iterator support for
>>>> map elements. The map element bpf iterator requires
>>>> info to identify a particular map. In the above
>>>> commit, the attr->link_create.target_fd is used
>>>> to carry map_fd and an enum bpf_iter_link_info
>>>> is added to uapi to specify the target_fd actually
>>>> representing a map_fd:
>>>>       enum bpf_iter_link_info {
>>>>           BPF_ITER_LINK_UNSPEC = 0,
>>>>           BPF_ITER_LINK_MAP_FD = 1,
>>>>
>>>>           MAX_BPF_ITER_LINK_INFO,
>>>>       };
>>>>
>>>> This is an extensible approach as we can grow
>>>> enumerator for pid, cgroup_id, etc. and we can
>>>> unionize target_fd for pid, cgroup_id, etc.
>>>> But in the future, there are chances that
>>>> more complex customization may happen, e.g.,
>>>> for tasks, it could be filtered based on
>>>> both cgroup_id and user_id.
>>>>
>>>> This patch changed the uapi to have fields
>>>>           __aligned_u64   iter_info;
>>>>           __u32           iter_info_len;
>>>> for additional iter_info for link_create.
>>>> The iter_info is defined as
>>>>           union bpf_iter_link_info {
>>>>                   struct {
>>>>                           __u32   map_fd;
>>>>                   } map;
>>>>           };
>>>>
>>>> So future extension for additional customization
>>>> will be easier. The bpf_iter_link_info will be
>>>> passed to target callback to validate and generic
>>>> bpf_iter framework does not need to deal it any
>>>> more.
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    include/linux/bpf.h            | 10 ++++---
>>>>    include/uapi/linux/bpf.h       | 15 +++++-----
>>>>    kernel/bpf/bpf_iter.c          | 52 +++++++++++++++-------------------
>>>>    kernel/bpf/map_iter.c          | 37 ++++++++++++++++++------
>>>>    kernel/bpf/syscall.c           |  2 +-
>>>>    net/core/bpf_sk_storage.c      | 37 ++++++++++++++++++------
>>>>    tools/include/uapi/linux/bpf.h | 15 +++++-----
>>>>    7 files changed, 104 insertions(+), 64 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>>    int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>>>    {
>>>> +       union bpf_iter_link_info __user *ulinfo;
>>>>           struct bpf_link_primer link_primer;
>>>>           struct bpf_iter_target_info *tinfo;
>>>> -       struct bpf_iter_aux_info aux = {};
>>>> +       union bpf_iter_link_info linfo;
>>>>           struct bpf_iter_link *link;
>>>> -       u32 prog_btf_id, target_fd;
>>>> +       u32 prog_btf_id, linfo_len;
>>>>           bool existed = false;
>>>> -       struct bpf_map *map;
>>>>           int err;
>>>>
>>>> +       memset(&linfo, 0, sizeof(union bpf_iter_link_info));
>>>> +
>>>> +       ulinfo = u64_to_user_ptr(attr->link_create.iter_info);
>>>> +       linfo_len = attr->link_create.iter_info_len;
>>>> +       if (ulinfo && linfo_len) {
>>>
>>> We probably want to be more strict here: if either pointer or len is
>>> non-zero, both should be present and valid. Otherwise we can have
>>> garbage in iter_info, as long as iter_info_len is zero.
>>
>> yes, it is possible iter_info_len = 0 and iter_info is not null and
>> if this happens, iter_info will not be examined.
>>
>> in kernel, we have places this is handled similarly. For example,
>> for cgroup bpf_prog query.
>>
>> kernel/bpf/cgroup.c, function __cgroup_bpf_query
>>
>>     __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
>>     ...
>>     if (attr->query.prog_cnt == 0 || !prog_ids || !cnt)
>>       return 0;
>>
>> In the above case, it is possible prog_cnt = 0 and prog_ids != NULL,
>> or prog_ids == NULL and prog_cnt != 0, and we won't return error
>> to user space.
>>
>> Not 100% sure whether we have convention here or not.
> 
> I don't know either, but I'd assume that we didn't think about 100%
> strictness when originally implementing this. So I'd go with a very
> strict check for this new functionality.

Agreed. This should be fine as the functionality is new.

> 
>>
>>>
>>>> +               err = bpf_check_uarg_tail_zero(ulinfo, sizeof(linfo),
>>>> +                                              linfo_len);
>>>> +               if (err)
>>>> +                       return err;
>>>> +               linfo_len = min_t(u32, linfo_len, sizeof(linfo));
>>>> +               if (copy_from_user(&linfo, ulinfo, linfo_len))
>>>> +                       return -EFAULT;
>>>> +       }
>>>> +
>>>>           prog_btf_id = prog->aux->attach_btf_id;
>>>>           mutex_lock(&targets_mutex);
>>>>           list_for_each_entry(tinfo, &targets, list) {
>>>> @@ -411,13 +425,6 @@ int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>>>           if (!existed)
>>>>                   return -ENOENT;
>>>>
>>>> -       /* Make sure user supplied flags are target expected. */
>>>> -       target_fd = attr->link_create.target_fd;
>>>> -       if (attr->link_create.flags != tinfo->reg_info->req_linfo)
>>>> -               return -EINVAL;
>>>> -       if (!attr->link_create.flags && target_fd)
>>>> -               return -EINVAL;
>>>> -
>>>
>>> Please still ensure that no flags are specified.
>>
>> Make sense. I also need to ensure target_fd is 0 since it is not used
>> any more.
>>
> 
> yep, good catch
> 
>>>
>>>
>>>>           link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
>>>>           if (!link)
>>>>                   return -ENOMEM;
[...]