From: Benjamin Tissoires <bentiss@kernel.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-input@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Jiri Kosina <jikos@kernel.org>,
Benjamin Tissoires <benjamin.tissoires@redhat.com>,
Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>
Subject: Re: [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers
Date: Fri, 16 Feb 2024 15:58:20 +0100 [thread overview]
Message-ID: <fckhc367l6eha2gpftixhzjdsmo2jts5p6ir6ukx2q5xndsbhf@btzjwvuamcv4> (raw)
In-Reply-To: <87eddccx1q.fsf@toke.dk>
On Feb 16 2024, Toke Høiland-Jørgensen wrote:
> Benjamin Tissoires <bentiss@kernel.org> writes:
>
> > On Feb 15 2024, Martin KaFai Lau wrote:
> >> On 2/14/24 9:18 AM, Benjamin Tissoires wrote:
> >> > +static void bpf_timer_work_cb(struct work_struct *work)
> >> > +{
> >> > + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
> >> > + struct bpf_map *map = t->map;
> >> > + void *value = t->value;
> >> > + bpf_callback_t callback_fn;
> >> > + void *key;
> >> > + u32 idx;
> >> > +
> >> > + BTF_TYPE_EMIT(struct bpf_timer);
> >> > +
> >> > + rcu_read_lock();
> >> > + callback_fn = rcu_dereference(t->sleepable_cb_fn);
> >> > + rcu_read_unlock();
> >>
> >> I took a very brief look at patch 2. One thing that may worth to ask here,
> >> the rcu_read_unlock() seems to be done too early. It is protecting the
> >> t->sleepable_cb_fn (?), so should it be done after finished using the
> >> callback_fn?
> >
> > Probably :)
> >
> > TBH, everytime I work with RCUs I spent countless hours trying to
> > re-understand everything, and in this case I'm currently in the "let's
> > make it work" process than fixing concurrency issues.
> > I still gave it a shot in case it solves my issue, but no, I still have
> > the crash.
> >
> > But given that callback_fn might sleep, isn't it an issue to keep the
> > RCU_reader lock so long? (we don't seem to call synchronize_rcu() so it
> > might be fine, but I'd like the confirmation from someone else).
>
> You're right, it isn't. From the RCU/checklist.rst doc:
>
> 13. Unlike most flavors of RCU, it *is* permissible to block in an
> SRCU read-side critical section (demarked by srcu_read_lock()
> and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
> Please note that if you don't need to sleep in read-side critical
> sections, you should be using RCU rather than SRCU, because RCU
> is almost always faster and easier to use than is SRCU.
>
> So we can't use the regular RCU protection for the callback in this
> usage. We'll need to either convert it to SRCU, or add another
> protection mechanism to make sure the callback function is not freed
> from under us (like a refcnt). I suspect the latter may be simpler (from
> reading the rest of that documentation around SRCU.
Currently I'm thinking at also incrementing the ->prog held in the
bpf_hrtimer which should prevent the callback to be freed, if I'm not wrong.
Then I should be able to just release the rcu_read_unlock before calling
the actual callback. And then put the ref on ->prog once done.
But to be able to do that I might need to protect ->prog with an RCU
too.
>
> >> A high level design question. The intention of the new
> >> bpf_timer_set_sleepable_cb() kfunc is actually to delay work to a workqueue.
> >> It is useful to delay work from the bpf_timer_cb and it may also useful to
> >> delay work from other bpf running context (e.g. the networking hooks like
> >> "tc"). The bpf_timer_set_sleepable_cb() seems to be unnecessary forcing
> >> delay-work must be done in a bpf_timer_cb.
> >
> > Basically I'm just a monkey here. I've been told that I should use
> > bpf_timer[0]. But my implementation is not finished, as Alexei mentioned
> > that we should bypass hrtimer if I'm not wrong [1].
>
> I don't think getting rid of the hrtimer in favour of
> schedule_delayed_work() makes any sense. schedule_delayed_work() does
> exactly the same as you're doing in this version of the patch: it
> schedules a timer callback, and calls queue_work() from inside that
> timer callback. It just uses "regular" timers instead of hrtimers. So I
> don't think there's any performance benefit from using that facility; on
> the contrary, it would require extra logic to handle cancellation etc;
> might as well just re-use the existing hrtimer-based callback logic we
> already have, and do a schedule_work() from the hrtimer callback like
> you're doing now.
I agree that we can nicely emulate delayed_timer with the current patch
series. However, if I understand Alexei's idea (and Martin's) there are
cases where we just want schedule_work(), without any timer involved.
That makes a weird timer (with a delay always equal to 0), but it would
allow to satisfy those latency issues.
So (and this also answers your second email today) I'm thinking at:
- have multiple flags to control the timer (with dedicated timer_cb
kernel functions):
- BPF_F_TIMER_HRTIMER (default)
- BPF_F_TIMER_WORKER (no timer, just workqueue)
- BPF_F_TIMER_DELAYED_WORKER (hrtimer + workqueue, or actual
delayed_work, but that's re-implementing stuffs)
- have bpf_timer_set_callback() and bpf_timer_set_sleepable_cb() strictly
equivalent in terms of processing, but the latter just instructs the
verifier that the callback can be sleepable (so calling
bpf_timer_set_callback() on a BPF_F_TIMER_DELAYED_WORKER is fine as
long as the target callback is using non sleepable kfuncs).
- ensure that bpf_timer_set_sleepable_cb() is invalid when called with
a non sleepable timer flag.
- ensure that when the bpf_timer has no hrtimer we also return -EINVAL
when a delay is given in bpf_timer_start().
Actually, BPF_F_TIMER_DELAYED_WORKER could be just a combination of
(BPF_F_TIMER_HRTIMER | BPF_F_TIMER_WORKER) which would reflect the
reality of how things are implemented.
Thinking through this a little bit more, maybe we should have
BPF_F_TIMER_SLEEPABLE instead of BPF_F_TIMER_WORKER. We can still have
BPF_F_TIMER_SLEEPABLE_BH when WQ_BH gets merged. _SLEEPABLE allows to
not bind the flag to the implementation...
Cheers,
Benjamin
next prev parent reply other threads:[~2024-02-16 14:58 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-14 17:18 [PATCH RFC bpf-next v2 00/10] allow HID-BPF to do device IOs Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 01/10] bpf/verifier: introduce in_sleepable() helper Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers Benjamin Tissoires
2024-02-15 15:23 ` Benjamin Tissoires
2024-02-16 9:50 ` Benjamin Tissoires
2024-02-21 2:59 ` Alexei Starovoitov
2024-02-21 16:06 ` Benjamin Tissoires
2024-02-16 6:36 ` Martin KaFai Lau
2024-02-16 8:13 ` Benjamin Tissoires
2024-02-16 14:18 ` Toke Høiland-Jørgensen
2024-02-16 14:58 ` Benjamin Tissoires [this message]
2024-02-17 13:42 ` Toke Høiland-Jørgensen
2024-02-21 2:49 ` Alexei Starovoitov
2024-02-16 14:20 ` Toke Høiland-Jørgensen
2024-02-16 16:58 ` Kui-Feng Lee
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 03/10] bpf/verifier: allow more maps in sleepable bpf programs Benjamin Tissoires
2024-02-21 2:51 ` Alexei Starovoitov
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 04/10] HID: bpf/dispatch: regroup kfuncs definitions Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 05/10] HID: bpf: export hid_hw_output_report as a BPF kfunc Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 06/10] selftests/hid: Add test for hid_bpf_hw_output_report Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 07/10] HID: bpf: allow to inject HID event from BPF Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 08/10] selftests/hid: add tests for hid_bpf_input_report Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 09/10] HID: bpf: allow to use bpf_timer_set_sleepable_cb() in tracing callbacks Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 10/10] selftests/hid: add test for bpf_timer Benjamin Tissoires
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fckhc367l6eha2gpftixhzjdsmo2jts5p6ir6ukx2q5xndsbhf@btzjwvuamcv4 \
--to=bentiss@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=benjamin.tissoires@redhat.com \
--cc=bpf@vger.kernel.org \
--cc=corbet@lwn.net \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=haoluo@google.com \
--cc=jikos@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-input@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=sdf@google.com \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=toke@redhat.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).