From: Song Liu <songliubraving@fb.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "open list:BPF (Safe dynamic programs and tools)" 
	<netdev@vger.kernel.org>,
	"open list:BPF (Safe dynamic programs and tools)" 
	<bpf@vger.kernel.org>,
	"duanxiongchun@bytedance.com" <duanxiongchun@bytedance.com>,
	"wangdongdong.6@bytedance.com" <wangdongdong.6@bytedance.com>,
	Muchun Song <songmuchun@bytedance.com>,
	"Cong Wang" <cong.wang@bytedance.com>,
	Alexei Starovoitov <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, "Martin Lau" <kafai@fb.com>,
	Yonghong Song <yhs@fb.com>
Subject: Re: [RFC Patch bpf-next] bpf: introduce bpf timer
Date: Tue, 6 Apr 2021 23:36:48 +0000
Message-ID: <B014E4B4-D542-4005-97D5-5A3DDE446B95@fb.com>
In-Reply-To: <CAM_iQpVwDvpMa2bVwx-2=ePGrkaeCG2HZh4szO9=vAP4ur4xzw@mail.gmail.com>



> On Apr 6, 2021, at 9:48 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Mon, Apr 5, 2021 at 11:18 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Apr 5, 2021, at 6:24 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Mon, Apr 5, 2021 at 6:08 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Apr 5, 2021, at 4:49 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> 
>>>>> On Fri, Apr 2, 2021 at 4:31 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Apr 2, 2021, at 1:57 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>>>> 
>>>>>>> Ideally I even prefer to create timers in kernel-space too, but as I already
>>>>>>> explained, this seems impossible to me.
>>>>>> 
>>>>>> Would hrtimer (include/linux/hrtimer.h) work?
>>>>> 
>>>>> By impossible, I meant it is impossible (to me) to take a refcnt to the callback
>>>>> prog if we create the timer in kernel-space. So, hrtimer is the same in this
>>>>> perspective.
>>>>> 
>>>>> Thanks.
>>>> 
>>>> I guess I am not following 100%. Here is what I would propose:
>>>> 
>>>> We only introduce a new program type BPF_PROG_TYPE_TIMER. No new map type.
>>>> The new program will trigger based on a timer, and the program can somehow
>>>> control the period of the timer (for example, via return value).
>>> 
>>> Like we already discussed, with this approach the "timer" itself is not
>>> visible to kernel, that is, only manageable in user-space. Or do you disagree?
>> 
>> Do you mean we need mechanisms to control the timer, like stop the timer,
>> trigger the timer immediately, etc.? And we need these mechanisms in kernel?
>> And by "in kernel-space" I assume you mean from BPF programs.
> 
> Yes, of course. In the context of our discussion, kernel-space only means
> eBPF code running in kernel-space. And like I showed in pseudo code,
> we want to manage the timer in eBPF code too, that is, updating timer
> expiration time and even deleting a timer. The point is we want to give
> users as much flexibility as possible, so that they can use it in whatever
> scenarios they want. We do not decide how they use them, they do.
> 
>> 
>> If these are correct, how about something like:
>> 
>> 1. A new program BPF_PROG_TYPE_TIMER, which by default will trigger on a timer.
>>   Note that, the timer here is embedded in the program. So all the operations
>>   are on the program.
>> 2. Allow adding such BPF_PROG_TYPE_TIMER programs to a map of type
>>   BPF_MAP_TYPE_PROG_ARRAY.
>> 3. Some new helpers that access the program via the BPF_MAP_TYPE_PROG_ARRAY map.
>>   Actually, maybe we can reuse existing bpf_tail_call().
> 
> Reusing bpf_tail_call() just for timer sounds even crazier than
> my current approach. So... what's the advantage of your approach
> compared to mine?

Since I don't know much about conntrack, I don't know which is better.
The following are just my thoughts based on the example you showed. It is
likely that I have misunderstood something.

IIUC, the problem with a user-space timer is that we need a dedicated task
for each wait-trigger loop. So I am proposing a BPF program that would
trigger upon timer expiration.

The advantage (I think) is to not introduce a separate timer entity. 
The timer is bundled with the program.  
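
To make "bundled" concrete, here is a rough sketch in plain user-space C,
not BPF. It only models the idea that the program's return value decides
whether it runs again. Every name in it (struct timer_prog,
timer_prog_tick, count_to_three, simulate) is made up for illustration;
none of this is an existing kernel interface, and the tick-based clock is
an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convention: the program returns 0 to stop, or N > 0 to be
 * run again N ticks later. */
typedef uint64_t (*timer_prog_fn)(void *ctx);

struct timer_prog {
	timer_prog_fn fn;
	void *ctx;
	uint64_t next_fire;	/* absolute tick; 0 means disarmed */
};

/* Run the program if it is due at 'now'. Returns 1 if it ran, 0 if not. */
static int timer_prog_tick(struct timer_prog *tp, uint64_t now)
{
	uint64_t delay;

	assert(tp && tp->fn);
	if (tp->next_fire == 0 || now < tp->next_fire)
		return 0;

	delay = tp->fn(tp->ctx);
	/* The return value is the only re-arm control in this model. */
	tp->next_fire = delay ? now + delay : 0;
	return 1;
}

/* Example program: run three times, 10 ticks apart, then stop. */
static uint64_t count_to_three(void *ctx)
{
	int *runs = ctx;

	return ++(*runs) < 3 ? 10 : 0;
}

/* Drive a simulated clock from tick 1 to 'ticks'; return number of runs. */
static int simulate(struct timer_prog *tp, uint64_t ticks)
{
	int fired = 0;

	for (uint64_t now = 1; now <= ticks; now++)
		fired += timer_prog_tick(tp, now);
	return fired;
}
```

There is no separate timer object to create, look up, or free in this
model: arming state lives in the same structure as the program, which is
the property I am after.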

> 
> 
>> 
>> The BPF program and map will look like:
>> 
>> ==================== 8< ==========================
>> struct data_elem {
>>        __u64 expiration;
>>        /* other data */
>> };
> 
> So, expiration is separated from "timer" itself. Naturally, expiration
> belongs to the timer itself.

In this example, expiration is not related to the timer. It is just part
of the data element. It is possible that we won't need the expiration for 
some use cases. 
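
As a rough illustration of that separation, again in plain user-space C
and not BPF: the sweep below only looks at a field stored in each element
and knows nothing about what triggers it. The in_use flag and the
sweep_expired() name are assumptions for this sketch; a real map would
delete the element instead of clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct data_elem {
	uint64_t expiration;	/* absolute time; unit is up to the caller */
	bool in_use;		/* stand-in for "present in the map" */
	/* other data */
};

/* Drop every element whose expiration is before 'now'.
 * Returns the number of elements dropped. */
static int sweep_expired(struct data_elem *elems, size_t n, uint64_t now)
{
	int removed = 0;

	assert(elems != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (elems[i].in_use && elems[i].expiration < now) {
			elems[i].in_use = false;
			removed++;
		}
	}
	return removed;
}
```

Whatever calls the sweep (a timer, a tracepoint, user space) can use the
returned count to decide what to do next.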

> 
>> 
>> struct {
>>        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
>>        __uint(max_entries, 256);
>>        __type(key, __u32);
>>        __type(value, struct data_elem);
>> } data_map SEC(".maps");
>> 
>> struct {
>>        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
>>        __uint(max_entries, 256);
>>        __type(key, __u32);
>>        __type(value, __u64);
>> } timer_prog_map SEC(".maps");
> 
> So, users have to use two maps just for a timer. Looks unnecessarily
> complex to me.

The data_map is not for the timer program, it is for the actual data. 
timer_prog_map is also optional here. We only need it when we want to 
trigger the program sooner than the scheduled time. If we can wait a
little longer, timer_prog_map can also be removed.

> 
>> 
>> static __u64
>> check_expired_elem(struct bpf_map *map, __u32 *key, __u64 *val,
>>                 int *data)
>> {
>>        u64 expires = *val;
>> 
>>        if (expires < bpf_jiffies64()) {
> 
> Value is a 64-bit 'expiration', so it is not atomic to read/write it on 32bit
> CPU. And user-space could update it in parallel to this timer callback.
> So basically we have to use a bpf spinlock here.
> 
> 
>>                bpf_map_delete_elem(map, key);
>>                (*data)++;
>>        }
>>        return 0;
>> }
>> 
>> SEC("timer")
>> int clean_up_timer(void)
>> {
>>        int count = 0;
>> 
>>        bpf_for_each_map_elem(&data_map, check_expired_elem, &count, 0);
>>        if (count)
>>                return 0; // not re-arm this timer
>>        else
>>                return 10; // reschedule this timer after 10 jiffies
>> }
>> 
>> SEC("tp_btf/XXX")
>> int another_trigger(void)
>> {
>>        if (some_condition)
>>                bpf_tail_call(NULL, &timer_prog_map, idx);
> 
> Are you sure you can use bpf_tail_call() to call a prog asynchronously?

I am not sure that we are going to use bpf_tail_call() here. If necessary,
we can introduce a new helper.


I am not sure whether this makes sense. I feel there is still some 
misunderstanding. It will be helpful if you can share more information 
about the overall design. 

BTW: this could be a good topic for the BPF office hour. See more details
here:

https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0

Thanks,
Song

