Re: [PATCH v2 bpf-next 1/3] bpf: Introduce bpf_timer

From: Cong Wang <xiyou.wangcong@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, kernel-team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Introduce bpf_timer
Date: Mon, 14 Jun 2021 23:10:46 -0700	[thread overview]
Message-ID: <CAM_iQpV2fv=MMhf3w+YpGDXCYaMKVO_hoACL0=oXmn_pDUVexg@mail.gmail.com> (raw)
In-Reply-To: <20210611184516.tpjvlaxjc4zdeqe6@ast-mbp.dhcp.thefacebook.com>

On Fri, Jun 11, 2021 at 11:45 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 11:42:24PM -0700, Cong Wang wrote:
> > On Thu, Jun 10, 2021 at 9:27 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
>
> Please stick to one email thread in the future, ok?
>
> I'll consolidate them here:
>
> > What is your use case to justify your own code? Asking because
> > you deny mine, so clearly my use case is not yours.
>
> I mentioned several use cases in the prior threads.
> tldr: any periodic event in tracing, networking, security.
> Garbage collection falls into this category as well, but please internalize
> that implementing conntrack as it is today in the kernel is an explicit non-goal.

You need to read my use case again, it is for the conntrack
in Cilium, not the kernel one.

>
> > And more importantly, why not just use BPF_TEST_RUN with
> > a user-space timer? Asking because you offer no API to read or
> > modify timer expiration, so literally the same with BPF_TEST_RUN
> > approach.
>
> a wrapper on top of hrtimer_get_remaining() like bpf_timer_get_remaining()
> is trivial to add, but what is the use case?

If you do not have any use case, then stick to BPF_TEST_RUN
with user-space timers? And of course your patches are not needed
at all.

>
> > >
> > > Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
> > > in hash/array/lru maps as regular field and helpers to operate on it:
> >
> > Can be or has to be? Huge difference here.
>
> map elements don't have to use timers.

You interpret this in a wrong way, what I asked is whether a bpf timer has
to be embedded in a map. IOW, can a bpf timer be a standalone global
data?

>
> > In the other thread, you said it is global data, which implies that it does
> > not have to be in a map.
>
> global data is a map. That was explained in the prior thread as well.
>

I think you implied bpf timer can exist without a map, hence I am asking.

> >
> > In your test case or your example, all timers are still in a map. So what has
> > changed since then? Looks nothing to me.
>
> look again?

Yes, I just looked at it again, only more confusing, not less.

>
> > Hmm, finally you begin refcounting it, which you were strongly against. ;)
>
> That was already answered in the prior thread.
> tldr: there were two options. This is one of them. Another can be added
> in the future as well.
>
> > Three questions:
> >
> > 1. Can t->prog be freed between bpf_timer_init() and bpf_timer_start()?
>
> yes.

Good. So if a program which only initializes the timer and then exits,
the other program which shares this timer will crash when it calls
bpf_timer_start(), right?

>
> > If the timer subprog is always in the same prog which installs it, then
>
> installs it? I'm not following the quesiton.
>
> > this is fine. But then how do multiple programs share a timer?
>
> there is only one callback function.

That's exactly my question. How is one callback function shared
by multiple eBPF programs which want to share the timer?

>
> > In the
> > case of conntrack, either ingress or egress could install the timer,
> > it only depends which one gets traffic first. Do they have to copy
> > the same subprog for the same timer?
>
> conntrack is an explicit non-goal.

I interpret this as you do not want timers to be shared by multiple
eBPF programs, correct? Weirdly, maps are shared, timers are
placed in a map, so timers should be shared naturally too.

>
> >
> > 2. Can t->prog be freed between a timer expiration and bpf_timer_start()
> > again?
>
> If it's already armed with the first bpf_timer_start() it won't be freed.

Why? I see t->prog is released in your timer callback:

+       bpf_prog_put(prog);
+       this_cpu_write(hrtimer_running, NULL);
+       return HRTIMER_NORESTART;

>
> > It gets a refcnt when starting a timer and puts it when cancelling
> > or expired, so t->prog can be freed right after cancelling or expired. What
> > if another program which shares this timer wants to restart this timer?
>
> There is only one callback_fn per timer. Another program can share
> the struct bpf_timer and the map. It might have subprog callback_fn code
> that looks exactly the same as callback_fn in the first prog.
> For example when libbpf loads progs/timer.c (after it was compiled into .o)
> it might share a subprog in the future (when kernel has support for
> dynamic linking). From bpf user pov it's a single .c file.
> The split into programs and subprograms is an implemenation detail
> that C programmer doesn't need to worry about.

Not exactly, they share a same C file but still can be loaded/unloaded
separately. And logically, whether a timer has been initialized once or
twice makes a huge difference for programers.

>
> > 3. Since you offer no API to read the expiration time, why not just use
> > BPF_TEST_RUN with a user-space timer? This is preferred by Andrii.
>
> Andrii point was that there should be no syscall cmds that replicate
> bpf_timer_init/start/cancel helpers. I agree with this.

Actually there is no strong reason to bother a bpf timer unless you
want to access the timer itself, which mostly contains expiration.
User-space timers work just fine for your cases, even if not, extending
BPF_TEST_RUN should too.

>
>
> > Thanks.
> >
> > Another unpopular point of view:
> >
> > This init() is not suitable for bpf programs, because unlike kernel modules,
> > there is no init or exit functions for a bpf program. And timer init
> > is typically
> > called during module init.
>
> Already answerd this in the prior thread. There will be __init and __fini like
> subprograms in bpf progs.

I interpret this as init does not make sense until we have __init and __fini?

>
> Please apply the patches to your local tree and do few experiments based
> on selftests/bpf/progs/timer.c. I think experimenting with the code
> will answer all of your questions.

Sounds like you find a great excuse for a failure of documentation.
What I asked are just fundamental design questions you should have
covered in your cover letter, which is literally empty.

Thanks.