netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Alexei Starovoitov <ast@fb.com>
Cc: Martin KaFai Lau <kafai@fb.com>, Andrii Nakryiko <andriin@fb.com>,
	bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v1 03/19] bpf: add bpf_map iterator
Date: Wed, 29 Apr 2020 13:15:02 -0700	[thread overview]
Message-ID: <cc802671-76e6-e911-0e4e-53a4e99c69ff@fb.com> (raw)
In-Reply-To: <CAEf4BzZgZ7h_asHNGk_34vJv_yvLtWGcTGwdTO4fgLPySaG-Eg@mail.gmail.com>



On 4/29/20 12:19 PM, Andrii Nakryiko wrote:
> On Wed, Apr 29, 2020 at 8:34 AM Alexei Starovoitov <ast@fb.com> wrote:
>>
>> On 4/28/20 11:44 PM, Yonghong Song wrote:
>>>
>>>
>>> On 4/28/20 11:40 PM, Andrii Nakryiko wrote:
>>>> On Tue, Apr 28, 2020 at 11:30 PM Alexei Starovoitov <ast@fb.com> wrote:
>>>>>
>>>>> On 4/28/20 11:20 PM, Yonghong Song wrote:
>>>>>>
>>>>>>
>>>>>> On 4/28/20 11:08 PM, Andrii Nakryiko wrote:
>>>>>>> On Tue, Apr 28, 2020 at 10:10 PM Yonghong Song <yhs@fb.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/28/20 7:44 PM, Alexei Starovoitov wrote:
>>>>>>>>> On 4/28/20 6:15 PM, Yonghong Song wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/28/20 5:48 PM, Alexei Starovoitov wrote:
>>>>>>>>>>> On 4/28/20 5:37 PM, Martin KaFai Lau wrote:
>>>>>>>>>>>>> +    prog = bpf_iter_get_prog(seq, sizeof(struct
>>>>>>>>>>>>> bpf_iter_seq_map_info),
>>>>>>>>>>>>> +                 &meta.session_id, &meta.seq_num,
>>>>>>>>>>>>> +                 v == (void *)0);
>>>>>>>>>>>>     From looking at seq_file.c, when will show() be called with
>>>>>>>>>>>> "v ==
>>>>>>>>>>>> NULL"?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> that v == NULL here and the whole verifier change just to allow
>>>>>>>>>>> NULL...
>>>>>>>>>>> may be use seq_num as an indicator of the last elem instead?
>>>>>>>>>>> Like seq_num with upper bit set to indicate that it's last?
>>>>>>>>>>
>>>>>>>>>> We could. But then verifier won't have an easy way to verify that.
>>>>>>>>>> For example, the above is expected:
>>>>>>>>>>
>>>>>>>>>>          int prog(struct bpf_map *map, u64 seq_num) {
>>>>>>>>>>             if (seq_num >> 63)
>>>>>>>>>>               return 0;
>>>>>>>>>>             ... map->id ...
>>>>>>>>>>             ... map->user_cnt ...
>>>>>>>>>>          }
>>>>>>>>>>
>>>>>>>>>> But if user writes
>>>>>>>>>>
>>>>>>>>>>          int prog(struct bpf_map *map, u64 seq_num) {
>>>>>>>>>>              ... map->id ...
>>>>>>>>>>              ... map->user_cnt ...
>>>>>>>>>>          }
>>>>>>>>>>
>>>>>>>>>> verifier won't be easy to conclude inproper map pointer tracing
>>>>>>>>>> here and in the above map->id, map->user_cnt will cause
>>>>>>>>>> exceptions and they will silently get value 0.
>>>>>>>>>
>>>>>>>>> I mean always pass valid object pointer into the prog.
>>>>>>>>> In above case 'map' will always be valid.
>>>>>>>>> Consider prog that iterating all map elements.
>>>>>>>>> It's weird that the prog would always need to do
>>>>>>>>> if (map == 0)
>>>>>>>>>       goto out;
>>>>>>>>> even if it doesn't care about finding last.
>>>>>>>>> All progs would have to have such extra 'if'.
>>>>>>>>> If we always pass valid object than there is no need
>>>>>>>>> for such extra checks inside the prog.
>>>>>>>>> First and last element can be indicated via seq_num
>>>>>>>>> or via another flag or via helper call like is_this_last_elem()
>>>>>>>>> or something.
>>>>>>>>
>>>>>>>> Okay, I see what you mean now. Basically this means
>>>>>>>> seq_ops->next() should try to get/maintain next two elements,
>>>>>>>
>>>>>>> What about the case when there are no elements to iterate to begin
>>>>>>> with? In that case, we still need to call bpf_prog for (empty)
>>>>>>> post-aggregation, but we have no valid element... For bpf_map
>>>>>>> iteration we could have fake empty bpf_map that would be passed, but
>>>>>>> I'm not sure it's applicable for any time of object (e.g., having a
>>>>>>> fake task_struct is probably quite a bit more problematic?)...
>>>>>>
>>>>>> Oh, yes, thanks for reminding me of this. I put a call to
>>>>>> bpf_prog in seq_ops->stop() especially to handle no object
>>>>>> case. In that case, seq_ops->start() will return NULL,
>>>>>> seq_ops->next() won't be called, and then seq_ops->stop()
>>>>>> is called. My earlier attempt tries to hook with next()
>>>>>> and then find it not working in all cases.
>>>>>
>>>>> wait a sec. seq_ops->stop() is not the end.
>>>>> With lseek of seq_file it can be called multiple times.
>>>
>>> Yes, I have taken care of this. when the object is NULL,
>>> bpf program will be called. When the object is NULL again,
>>> it won't be called. The private data remembers it has
>>> been called with NULL.
>>
>> Even without lseek stop() will be called multiple times.
>> If I read seq_file.c correctly it will be called before
>> every copy_to_user(). Which means that for a lot of text
>> (or if read() is done with small buffer) there will be
>> plenty of start,show,show,stop sequences.
> 
> 
> Right start/stop can be called multiple times, but seems like there
> are clear indicators of beginning of iteration and end of iteration:
> - start() with seq_num == 0 is start of iteration (can be called
> multiple times, if first element overflows buffer);
> - stop() with p == NULL is end of iteration (seems like can be called
> multiple times as well, if user keeps read()'ing after iteration
> completed).
> 
> There is another problem with stop(), though. If BPF program will
> attempt to output anything during stop(), that output will be just
> discarded. Not great. Especially if that output overflows and we need

The stop() output will not be discarded in the following cases:
    - regular show() objects overflow and stop() BPF program not called
    - regular show() objects not overflow, which means iteration is done,
      and stop() BPF program does not overflow.

The stop() seq_file output will be discarded if
    - regular show() objects not overflow and stop() BPF program output
      overflows.
    - no objects to iterate, BPF program got called, but its seq_file
      write/printf will be discarded.

Two options here:
   - implement Alexei suggestion to look ahead two elements to
     always having valid object and indicating the last element
     with a special flag.
   - Per Andrii's suggestion below to implement new way or to
     tweak seq_file() a little bit to resolve the above cases
     where stop() seq_file outputs being discarded.

Will try to experiment with both above options...


> to re-allocate buffer.
> 
> We are trying to use seq_file just to reuse 140 lines of code in
> seq_read(), which is no magic, just a simple double buffer and retry
> piece of logic. We don't need lseek and traverse, we don't need all
> the escaping stuff. I think bpf_iter implementation would be much
> simpler if bpf_iter had better control over iteration. Then this whole
> "end of iteration" behavior would be crystal clear. Should we maybe
> reconsider again?
> 
> I understand we want to re-use networking iteration code, but we can
> still do that with custom implementation of seq_read, because we are
> still using struct seq_file and follow its semantics. The change would
> be to allow stop(NULL) (or any stop() call for that matter) to perform
> output (and handle retry and buffer re-allocation). Or, alternatively,
> coupled with seq_operations intercept proposal in patch #7 discussion,
> we can add extra method (e.g., finish()) that would be called after
> all elements are traversed and will allow to emit extra stuff. We can
> do that (implement finish()) in seq_read, as well, if that's going to
> fly ok with seq_file maintainers, of course.
> 

  reply	other threads:[~2020-04-29 20:15 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-27 20:12 [PATCH bpf-next v1 00/19] bpf: implement bpf iterator for kernel data Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 01/19] net: refactor net assignment for seq_net_private structure Yonghong Song
2020-04-29  5:38   ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 02/19] bpf: implement an interface to register bpf_iter targets Yonghong Song
2020-04-28 16:20   ` Martin KaFai Lau
2020-04-28 16:50     ` Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 03/19] bpf: add bpf_map iterator Yonghong Song
2020-04-29  0:37   ` Martin KaFai Lau
2020-04-29  0:48     ` Alexei Starovoitov
2020-04-29  1:15       ` Yonghong Song
2020-04-29  2:44         ` Alexei Starovoitov
2020-04-29  5:09           ` Yonghong Song
2020-04-29  6:08             ` Andrii Nakryiko
2020-04-29  6:20               ` Yonghong Song
2020-04-29  6:30                 ` Alexei Starovoitov
2020-04-29  6:40                   ` Andrii Nakryiko
2020-04-29  6:44                     ` Yonghong Song
2020-04-29 15:34                       ` Alexei Starovoitov
2020-04-29 18:14                         ` Yonghong Song
2020-04-29 19:19                         ` Andrii Nakryiko
2020-04-29 20:15                           ` Yonghong Song [this message]
2020-04-30  3:06                             ` Alexei Starovoitov
2020-04-30  4:01                               ` Yonghong Song
2020-04-29  6:34                 ` Martin KaFai Lau
2020-04-29  6:51                   ` Yonghong Song
2020-04-29 19:25                     ` Andrii Nakryiko
2020-04-29  1:02     ` Yonghong Song
2020-04-29  6:04   ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 04/19] bpf: allow loading of a bpf_iter program Yonghong Song
2020-04-29  0:54   ` Martin KaFai Lau
2020-04-29  1:27     ` Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 05/19] bpf: support bpf tracing/iter programs for BPF_LINK_CREATE Yonghong Song
2020-04-29  1:17   ` [Potential Spoof] " Martin KaFai Lau
2020-04-29  6:25   ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 06/19] bpf: support bpf tracing/iter programs for BPF_LINK_UPDATE Yonghong Song
2020-04-29  1:32   ` Martin KaFai Lau
2020-04-29  5:04     ` Yonghong Song
2020-04-29  5:58       ` Martin KaFai Lau
2020-04-29  6:32         ` Andrii Nakryiko
2020-04-29  6:41           ` Martin KaFai Lau
2020-04-27 20:12 ` [PATCH bpf-next v1 07/19] bpf: create anonymous bpf iterator Yonghong Song
2020-04-29  5:39   ` Martin KaFai Lau
2020-04-29  6:56   ` Andrii Nakryiko
2020-04-29  7:06     ` Yonghong Song
2020-04-29 18:16       ` Andrii Nakryiko
2020-04-29 18:46         ` Martin KaFai Lau
2020-04-29 19:20           ` Yonghong Song
2020-04-29 20:50             ` Martin KaFai Lau
2020-04-29 20:54               ` Yonghong Song
2020-04-29 19:39   ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 08/19] bpf: create file " Yonghong Song
2020-04-29 20:40   ` Andrii Nakryiko
2020-04-30 18:02     ` Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 09/19] bpf: add PTR_TO_BTF_ID_OR_NULL support Yonghong Song
2020-04-29 20:46   ` Andrii Nakryiko
2020-04-29 20:51     ` Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 10/19] bpf: add netlink and ipv6_route targets Yonghong Song
2020-04-28 19:49   ` kbuild test robot
2020-04-28 19:50   ` [RFC PATCH] bpf: __bpf_iter__netlink() can be static kbuild test robot
2020-04-27 20:12 ` [PATCH bpf-next v1 11/19] bpf: add task and task/file targets Yonghong Song
2020-04-30  2:08   ` Andrii Nakryiko
2020-05-01 17:23     ` Yonghong Song
2020-05-01 19:01       ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 12/19] bpf: add bpf_seq_printf and bpf_seq_write helpers Yonghong Song
2020-04-28  6:02   ` kbuild test robot
2020-04-28 16:35     ` Yonghong Song
2020-04-30 20:06       ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 13/19] bpf: handle spilled PTR_TO_BTF_ID properly when checking stack_boundary Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 14/19] bpf: support variable length array in tracing programs Yonghong Song
2020-04-30 20:04   ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 15/19] tools/libbpf: add bpf_iter support Yonghong Song
2020-04-30  1:41   ` Andrii Nakryiko
2020-05-02  7:17     ` Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 16/19] tools/bpftool: add bpf_iter support for bptool Yonghong Song
2020-04-28  9:27   ` Quentin Monnet
2020-04-28 17:35     ` Yonghong Song
2020-04-29  8:37       ` Quentin Monnet
2020-04-27 20:12 ` [PATCH bpf-next v1 17/19] tools/bpf: selftests: add iterator programs for ipv6_route and netlink Yonghong Song
2020-04-30  2:12   ` Andrii Nakryiko
2020-04-27 20:12 ` [PATCH bpf-next v1 18/19] tools/bpf: selftests: add iter progs for bpf_map/task/task_file Yonghong Song
2020-04-27 20:12 ` [PATCH bpf-next v1 19/19] tools/bpf: selftests: add bpf_iter selftests Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cc802671-76e6-e911-0e4e-53a4e99c69ff@fb.com \
    --to=yhs@fb.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kafai@fb.com \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).