bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Tissoires <bentiss@kernel.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
	bpf@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-input@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-kselftest@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	 Yonghong Song <yonghong.song@linux.dev>,
	KP Singh <kpsingh@kernel.org>,
	 Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,  Jiri Kosina <jikos@kernel.org>,
	Benjamin Tissoires <benjamin.tissoires@redhat.com>,
	 Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>
Subject: Re: [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers
Date: Fri, 16 Feb 2024 15:58:20 +0100	[thread overview]
Message-ID: <fckhc367l6eha2gpftixhzjdsmo2jts5p6ir6ukx2q5xndsbhf@btzjwvuamcv4> (raw)
In-Reply-To: <87eddccx1q.fsf@toke.dk>

On Feb 16 2024, Toke Høiland-Jørgensen wrote:
> Benjamin Tissoires <bentiss@kernel.org> writes:
> 
> > On Feb 15 2024, Martin KaFai Lau wrote:
> >> On 2/14/24 9:18 AM, Benjamin Tissoires wrote:
> >> > +static void bpf_timer_work_cb(struct work_struct *work)
> >> > +{
> >> > +	struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
> >> > +	struct bpf_map *map = t->map;
> >> > +	void *value = t->value;
> >> > +	bpf_callback_t callback_fn;
> >> > +	void *key;
> >> > +	u32 idx;
> >> > +
> >> > +	BTF_TYPE_EMIT(struct bpf_timer);
> >> > +
> >> > +	rcu_read_lock();
> >> > +	callback_fn = rcu_dereference(t->sleepable_cb_fn);
> >> > +	rcu_read_unlock();
> >> 
> >> I took a very brief look at patch 2. One thing that may worth to ask here,
> >> the rcu_read_unlock() seems to be done too early. It is protecting the
> >> t->sleepable_cb_fn (?), so should it be done after finished using the
> >> callback_fn?
> >
> > Probably :)
> >
> > TBH, everytime I work with RCUs I spent countless hours trying to
> > re-understand everything, and in this case I'm currently in the "let's
> > make it work" process than fixing concurrency issues.
> > I still gave it a shot in case it solves my issue, but no, I still have
> > the crash.
> >
> > But given that callback_fn might sleep, isn't it an issue to keep the
> > RCU_reader lock so long? (we don't seem to call synchronize_rcu() so it
> > might be fine, but I'd like the confirmation from someone else).
> 
> You're right, it isn't. From the RCU/checklist.rst doc:
> 
> 13.	Unlike most flavors of RCU, it *is* permissible to block in an
> 	SRCU read-side critical section (demarked by srcu_read_lock()
> 	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
> 	Please note that if you don't need to sleep in read-side critical
> 	sections, you should be using RCU rather than SRCU, because RCU
> 	is almost always faster and easier to use than is SRCU.
> 
> So we can't use the regular RCU protection for the callback in this
> usage. We'll need to either convert it to SRCU, or add another
> protection mechanism to make sure the callback function is not freed
> from under us (like a refcnt). I suspect the latter may be simpler (from
> reading the rest of that documentation around SRCU.

Currently I'm thinking at also incrementing the ->prog held in the
bpf_hrtimer which should prevent the callback to be freed, if I'm not wrong.
Then I should be able to just release the rcu_read_unlock before calling
the actual callback. And then put the ref on ->prog once done.

But to be able to do that I might need to protect ->prog with an RCU
too.

> 
> >> A high level design question. The intention of the new
> >> bpf_timer_set_sleepable_cb() kfunc is actually to delay work to a workqueue.
> >> It is useful to delay work from the bpf_timer_cb and it may also useful to
> >> delay work from other bpf running context (e.g. the networking hooks like
> >> "tc"). The bpf_timer_set_sleepable_cb() seems to be unnecessary forcing
> >> delay-work must be done in a bpf_timer_cb.
> >
> > Basically I'm just a monkey here. I've been told that I should use
> > bpf_timer[0]. But my implementation is not finished, as Alexei mentioned
> > that we should bypass hrtimer if I'm not wrong [1].
> 
> I don't think getting rid of the hrtimer in favour of
> schedule_delayed_work() makes any sense. schedule_delayed_work() does
> exactly the same as you're doing in this version of the patch: it
> schedules a timer callback, and calls queue_work() from inside that
> timer callback. It just uses "regular" timers instead of hrtimers. So I
> don't think there's any performance benefit from using that facility; on
> the contrary, it would require extra logic to handle cancellation etc;
> might as well just re-use the existing hrtimer-based callback logic we
> already have, and do a schedule_work() from the hrtimer callback like
> you're doing now.

I agree that we can nicely emulate delayed_timer with the current patch
series. However, if I understand Alexei's idea (and Martin's) there are
cases where we just want schedule_work(), without any timer involved.
That makes a weird timer (with a delay always equal to 0), but it would
allow to satisfy those latency issues.

So (and this also answers your second email today) I'm thinking at:
- have multiple flags to control the timer (with dedicated timer_cb
  kernel functions):
  - BPF_F_TIMER_HRTIMER (default)
  - BPF_F_TIMER_WORKER (no timer, just workqueue)
  - BPF_F_TIMER_DELAYED_WORKER (hrtimer + workqueue, or actual
    delayed_work, but that's re-implementing stuffs)
- have bpf_timer_set_callback() and bpf_timer_set_sleepable_cb() strictly
  equivalent in terms of processing, but the latter just instructs the
  verifier that the callback can be sleepable (so calling
  bpf_timer_set_callback() on a BPF_F_TIMER_DELAYED_WORKER is fine as
  long as the target callback is using non sleepable kfuncs).
- ensure that bpf_timer_set_sleepable_cb() is invalid when called with
  a non sleepable timer flag.
- ensure that when the bpf_timer has no hrtimer we also return -EINVAL
  when a delay is given in bpf_timer_start().

Actually, BPF_F_TIMER_DELAYED_WORKER could be just a combination of
(BPF_F_TIMER_HRTIMER | BPF_F_TIMER_WORKER) which would reflect the
reality of how things are implemented.

Thinking through this a little bit more, maybe we should have
BPF_F_TIMER_SLEEPABLE instead of BPF_F_TIMER_WORKER. We can still have
BPF_F_TIMER_SLEEPABLE_BH when WQ_BH gets merged. _SLEEPABLE allows to
not bind the flag to the implementation...

Cheers,
Benjamin

  reply	other threads:[~2024-02-16 14:58 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-14 17:18 [PATCH RFC bpf-next v2 00/10] allow HID-BPF to do device IOs Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 01/10] bpf/verifier: introduce in_sleepable() helper Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers Benjamin Tissoires
2024-02-15 15:23   ` Benjamin Tissoires
2024-02-16  9:50     ` Benjamin Tissoires
2024-02-21  2:59       ` Alexei Starovoitov
2024-02-21 16:06         ` Benjamin Tissoires
2024-02-16  6:36   ` Martin KaFai Lau
2024-02-16  8:13     ` Benjamin Tissoires
2024-02-16 14:18       ` Toke Høiland-Jørgensen
2024-02-16 14:58         ` Benjamin Tissoires [this message]
2024-02-17 13:42           ` Toke Høiland-Jørgensen
2024-02-21  2:49           ` Alexei Starovoitov
2024-02-16 14:20   ` Toke Høiland-Jørgensen
2024-02-16 16:58   ` Kui-Feng Lee
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 03/10] bpf/verifier: allow more maps in sleepable bpf programs Benjamin Tissoires
2024-02-21  2:51   ` Alexei Starovoitov
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 04/10] HID: bpf/dispatch: regroup kfuncs definitions Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 05/10] HID: bpf: export hid_hw_output_report as a BPF kfunc Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 06/10] selftests/hid: Add test for hid_bpf_hw_output_report Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 07/10] HID: bpf: allow to inject HID event from BPF Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 08/10] selftests/hid: add tests for hid_bpf_input_report Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 09/10] HID: bpf: allow to use bpf_timer_set_sleepable_cb() in tracing callbacks Benjamin Tissoires
2024-02-14 17:18 ` [PATCH RFC bpf-next v2 10/10] selftests/hid: add test for bpf_timer Benjamin Tissoires

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fckhc367l6eha2gpftixhzjdsmo2jts5p6ir6ukx2q5xndsbhf@btzjwvuamcv4 \
    --to=bentiss@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=benjamin.tissoires@redhat.com \
    --cc=bpf@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=jikos@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-input@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=sdf@google.com \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=toke@redhat.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).