From: Yonghong Song <yhs@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: <davem@davemloft.net>, <daniel@iogearbox.net>,
<andrii@kernel.org>, <netdev@vger.kernel.org>,
<bpf@vger.kernel.org>, <kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 1/8] bpf: Introduce bpf timers.
Date: Mon, 28 Jun 2021 19:24:28 -0700
Message-ID: <bcc2f155-129d-12f1-1e3d-c741c746df10@fb.com>
In-Reply-To: <20210629014607.fz5tkewb6n3u6pvr@ast-mbp.dhcp.thefacebook.com>
On 6/28/21 6:46 PM, Alexei Starovoitov wrote:
> On Fri, Jun 25, 2021 at 09:54:11AM -0700, Yonghong Song wrote:
>>
>>
>> On 6/23/21 7:25 PM, Alexei Starovoitov wrote:
>>> From: Alexei Starovoitov <ast@kernel.org>
>>>
>>> Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
>>> in hash/array/lru maps as a regular field and helpers to operate on it:
>>>
>>> // Initialize the timer.
>>> // First 4 bits of 'flags' specify clockid.
>>> // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
>>> long bpf_timer_init(struct bpf_timer *timer, int flags);
>>>
>>> // Arm the timer to call the static function callback_fn and set its
>>> // expiration 'nsec' nanoseconds from the current time.
>>> long bpf_timer_start(struct bpf_timer *timer, void *callback_fn, u64 nsec);
>>>
>>> // Cancel the timer and wait for callback_fn to finish if it was running.
>>> long bpf_timer_cancel(struct bpf_timer *timer);
>>>
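A minimal user-space sketch of the clockid rule in the bpf_timer_init() comment above. It assumes that "first 4 bits" means the low 4 bits of 'flags' and that any higher set bit is rejected. The names timer_flags_to_clockid() and HYPO_TIMER_CLOCKID_MASK, and the -EINVAL return, are hypothetical and not taken from the patch:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* make sure glibc exposes the Linux-specific CLOCK_BOOTTIME */
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical: the low 4 bits of 'flags' carry the clockid (assumption). */
#define HYPO_TIMER_CLOCKID_MASK 0xf

/* All three accepted clocks must fit in the 4-bit field. */
static_assert(CLOCK_BOOTTIME <= HYPO_TIMER_CLOCKID_MASK,
	      "accepted clockids fit in 4 bits");

static int timer_flags_to_clockid(int flags, clockid_t *clockid)
{
	int id = flags & HYPO_TIMER_CLOCKID_MASK;

	/* Assumption: bits above the clockid field are reserved and must be 0. */
	if (flags & ~HYPO_TIMER_CLOCKID_MASK)
		return -EINVAL;

	/* Only the clocks listed in the helper comment are accepted. */
	switch (id) {
	case CLOCK_MONOTONIC:
	case CLOCK_REALTIME:
	case CLOCK_BOOTTIME:
		*clockid = id;
		return 0;
	default:
		return -EINVAL;
	}
}
```

With this reading, passing CLOCK_REALTIME directly as 'flags', as the example program does, is valid because its value (0) already sits in the low bits.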
>>> Here is how a BPF program might look:
>>> struct map_elem {
>>> 	int counter;
>>> 	struct bpf_timer timer;
>>> };
>>>
>>> struct {
>>> 	__uint(type, BPF_MAP_TYPE_HASH);
>>> 	__uint(max_entries, 1000);
>>> 	__type(key, int);
>>> 	__type(value, struct map_elem);
>>> } hmap SEC(".maps");
>>>
>>> /* val points to the particular map element that contains the bpf_timer. */
>>> static int timer_cb(void *map, int *key, struct map_elem *val);
>>>
>>> SEC("fentry/bpf_fentry_test1")
>>> int BPF_PROG(test1, int a)
>>> {
>>> 	struct map_elem *val;
>>> 	int key = 0;
>>>
>>> 	val = bpf_map_lookup_elem(&hmap, &key);
>>> 	if (val) {
>>> 		bpf_timer_init(&val->timer, CLOCK_REALTIME);
>>> 		/* call timer_cb in 1 usec */
>>> 		bpf_timer_start(&val->timer, timer_cb, 1000);
>>> 	}
>>> 	return 0;
>>> }
>>>
>>> This patch adds helper implementations that rely on hrtimers
>>> to call bpf functions as timers expire.
>>> The following patches add necessary safety checks.
>>>
>>> Only programs with CAP_BPF are allowed to use bpf_timer.
>>>
>>> The number of timers used by the program is constrained by
>>> the memcg recorded at map creation time.
>>>
>>> The bpf_timer_init() helper receives a hidden 'map' argument and
>>> bpf_timer_start() receives a hidden 'prog' argument, both supplied by the verifier.
>>> The prog pointer is needed to refcount the bpf program so that the
>>> program doesn't get freed while the timer is armed. This approach relies on the
>>> "user refcnt" scheme used in prog_array, which stores bpf programs for
>>> bpf_tail_call. bpf_timer_start() increments the prog refcnt, and this is
>>> paired with bpf_timer_cancel(), which drops the prog refcnt. The
>>> ops->map_release_uref callback is responsible for cancelling the timers and
>>> dropping the prog refcnt when the last user space reference to the map is gone.
>>> This uref approach ensures that Ctrl-C of the user space process does
>>> not leave timers running forever, unless user space explicitly pinned a map
>>> that contains timers in bpffs.
>>>
>>> The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
>>> and free the timer if the given map element has one allocated.
>>> The "bpftool map update" command can therefore be used to cancel timers.
>>>
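The refcount pairing described in the two paragraphs above can be modelled in plain C. This is a hypothetical user-space sketch, not the helpers.c code: model_prog stands in for struct bpf_prog and model_timer for the per-element timer state. The rule that re-arming with a different prog swaps the reference is an assumption, made so that repeated starts do not leak references:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bpf program's refcount. */
struct model_prog {
	int refcnt;
};

/* Hypothetical stand-in for the per-element timer state. */
struct model_timer {
	bool armed;
	struct model_prog *prog; /* prog whose reference the timer holds */
};

/* start: take a prog reference unless this prog's reference is already held. */
static void model_timer_start(struct model_timer *t, struct model_prog *prog)
{
	if (t->prog != prog) {
		if (t->prog)
			t->prog->refcnt--; /* assumption: drop the old prog's ref */
		prog->refcnt++;
		t->prog = prog;
	}
	t->armed = true;
}

/* cancel: disarm and drop the reference taken by start. */
static void model_timer_cancel(struct model_timer *t)
{
	t->armed = false;
	if (t->prog) {
		t->prog->refcnt--;
		assert(t->prog->refcnt >= 0);
		t->prog = NULL;
	}
}

/* Element delete/update cancels that element's timer. */
static void model_map_delete_elem(struct model_timer *timers, size_t idx)
{
	model_timer_cancel(&timers[idx]);
}

/* Last user-space reference dropped: cancel every timer in the map. */
static void model_map_release_uref(struct model_timer *timers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_timer_cancel(&timers[i]);
}
```

In this model, starting the same timer twice with the same prog takes only one reference. Deleting the element or releasing the last user-space reference drops it again, so no armed timer can keep a program alive after user space goes away.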
>>> 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because an
>>> unnamed '__u64 :64' bit-field provides 8 bytes of padding but only 1-byte alignment.
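A compile-time check of that layout rule. It assumes a GCC- or Clang-compatible compiler (a struct containing only unnamed bit-fields is a GNU C extension) and uses uint64_t in place of __u64, mirroring the 'struct bpf_timer' and 'struct map_elem' definitions from the patch description:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the patch's struct bpf_timer, with uint64_t for __u64. */
struct bpf_timer {
	uint64_t :64;
	uint64_t :64;
} __attribute__((aligned(8)));

/* The map value from the example program in the patch description. */
struct map_elem {
	int counter;
	struct bpf_timer timer;
};

/* With aligned(8), the 16 bytes of opaque storage start on an 8-byte
 * boundary, so 4 bytes of padding follow the int counter. */
static_assert(_Alignof(struct bpf_timer) == 8, "bpf_timer is 8-byte aligned");
static_assert(sizeof(struct bpf_timer) == 16, "two 64-bit words of storage");
static_assert(offsetof(struct map_elem, timer) == 8, "padding after counter");
```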
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>>> ---
>>> include/linux/bpf.h | 3 +
>>> include/uapi/linux/bpf.h | 55 +++++++
>>> kernel/bpf/helpers.c | 281 +++++++++++++++++++++++++++++++++
>>> kernel/bpf/verifier.c | 138 ++++++++++++++++
>>> kernel/trace/bpf_trace.c | 2 +-
>>> scripts/bpf_doc.py | 2 +
>>> tools/include/uapi/linux/bpf.h | 55 +++++++
>>> 7 files changed, 535 insertions(+), 1 deletion(-)
>>>
>> [...]
>>> @@ -12533,6 +12607,70 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>>> continue;
>>> }
>>> + if (insn->imm == BPF_FUNC_timer_init) {
>>> + aux = &env->insn_aux_data[i + delta];
>>> + if (bpf_map_ptr_poisoned(aux)) {
>>> + verbose(env, "bpf_timer_init abusing map_ptr\n");
>>> + return -EINVAL;
>>> + }
>>> + map_ptr = BPF_MAP_PTR(aux->map_ptr_state);
>>> + {
>>> + struct bpf_insn ld_addrs[2] = {
>>> + BPF_LD_IMM64(BPF_REG_3, (long)map_ptr),
>>> + };
>>> +
>>> + insn_buf[0] = ld_addrs[0];
>>> + insn_buf[1] = ld_addrs[1];
>>> + }
>>> + insn_buf[2] = *insn;
>>> + cnt = 3;
>>> +
>>> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>>> + if (!new_prog)
>>> + return -ENOMEM;
>>> +
>>> + delta += cnt - 1;
>>> + env->prog = prog = new_prog;
>>> + insn = new_prog->insnsi + i + delta;
>>> + goto patch_call_imm;
>>> + }
>>> +
>>> + if (insn->imm == BPF_FUNC_timer_start) {
>>> + /* There is no need to do:
>>> + * aux = &env->insn_aux_data[i + delta];
>>> + * if (bpf_map_ptr_poisoned(aux)) return -EINVAL;
>>> + * for bpf_timer_start(). If the same callback_fn is shared
>>> + * by different timers in different maps the poisoned check
>>> + * will return false positive.
>>> + *
>>> + * The verifier will process callback_fn as many times as necessary
>>> + * with different maps and the register states prepared by
>>> + * set_timer_start_callback_state will be accurate.
>>> + *
>>> + * There is no need for bpf_timer_start() to check in the
>>> + * run-time that bpf_hrtimer->map stored during bpf_timer_init()
>>> + * is the same map as in bpf_timer_start()
>>> + * because it's the same map element value.
>>
>> I am puzzled by the above comment. Could you explain more?
>> bpf_timer_start() checks whether the timer is initialized via a NULL
>> check on timer->timer, and proceeds only if a valid timer has been
>> initialized. I think the following scenario is also supported:
>> 1. map1 is shared by prog1 and prog2
>> 2. prog1 calls bpf_timer_init for all map1 elements
>> 3. prog2 calls bpf_timer_start for some or all map1 elements.
>> So when prog2 is verified, it does not call bpf_timer_init() at all.
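A hypothetical user-space model of that scenario: which program calls init does not matter, only that some program did so before start. The model_* names are illustrative, and the -EINVAL and -EBUSY return codes are assumptions rather than values confirmed in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Stands in for the kernel-side state that bpf_timer_init() allocates. */
struct model_hrtimer {
	int clockid;
};

/* A map element's timer slot; hrt == NULL means "never initialized". */
struct model_timer {
	struct model_hrtimer *hrt;
};

/* Any program may initialize the slot once. */
static int model_timer_init(struct model_timer *t, int clockid)
{
	assert(t); /* caller bug if NULL */
	if (t->hrt)
		return -EBUSY; /* assumption: re-init of a live timer is refused */
	t->hrt = calloc(1, sizeof(*t->hrt));
	if (!t->hrt)
		return -ENOMEM;
	t->hrt->clockid = clockid;
	return 0;
}

/* start only checks that *some* program ran init, not which one. */
static int model_timer_start(struct model_timer *t)
{
	assert(t);
	if (!t->hrt)
		return -EINVAL; /* assumption: uninitialized timer is rejected */
	return 0;
}

/* Element delete: release the allocated state. */
static void model_timer_free(struct model_timer *t)
{
	free(t->hrt);
	t->hrt = NULL;
}
```

Because the check happens at run time on the element itself, prog2 can be verified without ever containing a bpf_timer_init() call.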
>
> Right. Such timer sharing between two progs is supported.
> From prog2's point of view bpf_timer_init() was not called, but it certainly
> had to be called by this or the other prog.
> I'll rephrase the last paragraph.
okay.
>
> While talking to Martin about the API, he pointed out that
> passing callback_fn to timer_start() doesn't cover the full use case
> of replacing a prog. So in the next spin I'll split it into
> bpf_timer_set_callback(timer, callback_fn);
> bpf_timer_start(timer, nsec);
> This way the callback and prog can be replaced without resetting
> the timer expiry, which could be useful.
I took a brief look at patches 4-6 and they look okay. But since
you will change the helper signatures, I will hold off and check the
next revision instead.
BTW, does this mean the following scenario will be supported?
prog1: bpf_timer_set_callback(timer, callback_fn)
prog2: bpf_timer_start(timer, nsec)
So prog2 can start a timer that calls prog1's callback_fn?
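A hypothetical user-space model of the proposed split, illustrating the scenario in the question. The callback stored by one caller is the one that runs when another caller arms the timer, and replacing the callback leaves a pending expiry untouched. None of these names are kernel APIs, and the error for starting without a callback is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*model_cb_t)(void *map, int *key, void *val);

struct model_timer {
	model_cb_t callback; /* set by model_timer_set_callback() */
	uint64_t expires_ns; /* set by model_timer_start(); 0 = not armed */
};

/* Stand-ins for callbacks living in two different programs. */
static int prog1_cb(void *map, int *key, void *val)
{
	(void)map; (void)key; (void)val;
	return 1;
}

static int prog2_cb(void *map, int *key, void *val)
{
	(void)map; (void)key; (void)val;
	return 2;
}

/* Replace only the callback; a pending expiry is left as is. */
static void model_timer_set_callback(struct model_timer *t, model_cb_t cb)
{
	assert(t);
	t->callback = cb;
}

/* Arm (or re-arm) the timer; it will run whatever callback is set at expiry. */
static int model_timer_start(struct model_timer *t, uint64_t now_ns, uint64_t nsec)
{
	assert(t);
	if (!t->callback)
		return -EINVAL; /* assumption: no callback set yet */
	t->expires_ns = now_ns + nsec;
	return 0;
}

/* Returns 0 if not due; otherwise disarms and returns the callback's result. */
static int model_timer_fire(struct model_timer *t, uint64_t now_ns)
{
	if (!t->expires_ns || now_ns < t->expires_ns)
		return 0;
	t->expires_ns = 0;
	return t->callback(NULL, NULL, NULL);
}
```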
>
> Also Daniel and Andrii pointed out that CPU pinning would be the next
> feature request. The API is extensible enough to add it in the future.
> I'm going to delay implementing it until the implications of
> bpf_smp_call_single() are understood.
Do we need to add a 'flags' parameter to the bpf_timer_start() helper
so we can encode the target CPU in 'flags'?
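One possible encoding, purely to illustrate the question: bit 0 of a 64-bit 'flags' requests pinning and the upper 32 bits carry the target CPU. The HYPO_TIMER_* macros and helper names are hypothetical; nothing in this thread defines such a layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 = pin to a CPU, bits 32..63 = target CPU. */
#define HYPO_TIMER_PIN_CPU   (1ULL << 0)
#define HYPO_TIMER_CPU_SHIFT 32

/* Encode a pinning request for 'cpu'. */
static uint64_t hypo_timer_flags_pin(uint32_t cpu)
{
	return HYPO_TIMER_PIN_CPU | ((uint64_t)cpu << HYPO_TIMER_CPU_SHIFT);
}

/* Decode: returns true and stores the CPU if pinning was requested. */
static bool hypo_timer_flags_cpu(uint64_t flags, uint32_t *cpu)
{
	assert(cpu);
	if (!(flags & HYPO_TIMER_PIN_CPU))
		return false;
	*cpu = (uint32_t)(flags >> HYPO_TIMER_CPU_SHIFT);
	return true;
}
```

Keeping the CPU number in a separate bit range leaves the low bits free for other future flags, which is one argument for adding 'flags' now even if it must be 0 initially.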