Re: [PATCH v4 bpf-next 1/9] bpf: Introduce bpf timers.

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Martin KaFai Lau <kafai@fb.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v4 bpf-next 1/9] bpf: Introduce bpf timers.
Date: Sun, 4 Jul 2021 07:19:19 -0700	[thread overview]
Message-ID: <CAADnVQKYGBdZJiMsxMVRX8axEbH_Uh+HekcECpiZqU2oWeWv2Q@mail.gmail.com> (raw)
In-Reply-To: <20210702010455.3h4v5c4g7wx2aeth@kafai-mbp.dhcp.thefacebook.com>

On Thu, Jul 1, 2021 at 6:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Jul 01, 2021 at 12:20:36PM -0700, Alexei Starovoitov wrote:
> [ ... ]
>
> > +BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
> > +        struct bpf_prog *, prog)
> > +{
> > +     struct bpf_hrtimer *t;
> > +     struct bpf_prog *prev;
> > +     int ret = 0;
> > +
> > +     if (in_nmi())
> > +             return -EOPNOTSUPP;
> > +     ____bpf_spin_lock(&timer->lock); /* including irqsave */
> > +     t = timer->timer;
> > +     if (!t) {
> > +             ret = -EINVAL;
> > +             goto out;
> > +     }
> > +     if (!atomic64_read(&(t->map->usercnt))) {
> > +             /* maps with timers must be either held by user space
> > +              * or pinned in bpffs. Otherwise timer might still be
> > +              * running even when bpf prog is detached and user space
> > +              * is gone, since map_release_uref won't ever be called.
> > +              */
> > +             ret = -EPERM;
> > +             goto out;
> > +     }
> > +     prev = t->prog;
> > +     if (prev != prog) {
> > +             if (prev)
> > +                     /* Drop prev prog refcnt when swapping with new prog */
> > +                     bpf_prog_put(prev);
> > +             /* Bump prog refcnt once. Every bpf_timer_set_callback()
> > +              * can pick different callback_fn-s within the same prog.
> > +              */
> > +             bpf_prog_inc(prog);
> I think prog->aux->refcnt could be zero here when the bpf prog
> is making its final run and before the rcu grace section ended,
> so bpf_prog_inc_not_zero() should be used here.

good point.
What should be the failure mode?
Return the error from the helper and set the prog/callback to NULL?

> > +             t->prog = prog;
> > +     }
> > +     t->callback_fn = callback_fn;
> > +out:
> > +     ____bpf_spin_unlock(&timer->lock); /* including irqrestore */
> > +     return ret;
> > +}
> > +
>
> [ ... ]
>
> > @@ -5837,6 +5906,8 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
> >           func_id != BPF_FUNC_map_pop_elem &&
> >           func_id != BPF_FUNC_map_peek_elem &&
> >           func_id != BPF_FUNC_for_each_map_elem &&
> > +         func_id != BPF_FUNC_timer_init &&
> > +         func_id != BPF_FUNC_timer_set_callback &&
> It seems checking the posion map_ptr_state is not needed.
> Is this change needed?

+1. Leftover from earlier versions.

> [ ... ]
>
> > @@ -12584,6 +12662,46 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                       continue;
> >               }
> >
> > +             if (insn->imm == BPF_FUNC_timer_set_callback) {
> > +                     /* There is no need to do:
> > +                      *     aux = &env->insn_aux_data[i + delta];
> > +                      *     if (bpf_map_ptr_poisoned(aux)) return -EINVAL;
> > +                      * for bpf_timer_set_callback(). If the same callback_fn is shared
> > +                      * by different timers in different maps the poisoned check
> > +                      * will return false positive.
> > +                      *
> > +                      * The verifier will process callback_fn as many times as necessary
> > +                      * with different maps and the register states prepared by
> > +                      * set_timer_callback_state will be accurate.
> > +                      *
> > +                      * The following use case is valid:
> > +                      *   map1 is shared by prog1, prog2, prog3.
> > +                      *   prog1 calls bpf_timer_init for some map1 elements
> > +                      *   prog2 calls bpf_timer_set_callback for some map1 elements.
> > +                      *     Those that were not bpf_timer_init-ed will return -EINVAL.
> > +                      *   prog3 calls bpf_timer_start for some map1 elements.
> > +                      *     Those that were not both bpf_timer_init-ed and
> > +                      *     bpf_timer_set_callback-ed will return -EINVAL.
> > +                      */
> > +                     struct bpf_insn ld_addrs[2] = {
> > +                             BPF_LD_IMM64(BPF_REG_3, (long)prog),
> The "prog" pointer value is used here.
>
> > +                     };
> > +
> > +                     insn_buf[0] = ld_addrs[0];
> > +                     insn_buf[1] = ld_addrs[1];
> > +                     insn_buf[2] = *insn;
> > +                     cnt = 3;
> > +
> > +                     new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> > +                     if (!new_prog)
> > +                             return -ENOMEM;
> > +
> > +                     delta    += cnt - 1;
> > +                     env->prog = prog = new_prog;
> After bpf_patch_insn_data(), a new prog may be allocated.
> Is the above old "prog" pointer value updated accordingly?
> I could have missed something.

excellent catch. The patching of prog can go bad either here
or later if patching of some other insn happened to change prog.
I'll try to switch to dynamic prog fetching via ksym.
The timers won't work in the interpreted mode though.
But that's better trade-off than link-list of insns to patch with a prog
after all of bpf_patch_insn_data are done?
Some other way to fix this issue?