From: Barret Rhoden <brho@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org,
joshdon@google.com, pjt@google.com, derkling@google.com,
haoluo@google.com, dvernet@meta.com, dschatzberg@meta.com,
dskarlat@cs.cmu.edu, riel@surriel.com,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class
Date: Fri, 2 Dec 2022 12:08:27 -0500
Message-ID: <8d146099-a12a-c5a1-4829-dec95497fdca@google.com>
In-Reply-To: <20221130082313.3241517-15-tj@kernel.org>
hi -
On 11/30/22 03:22, Tejun Heo wrote:
[...]
> +static bool consume_dispatch_q(struct rq *rq, struct rq_flags *rf,
> +			       struct scx_dispatch_q *dsq)
> +{
> +	struct scx_rq *scx_rq = &rq->scx;
> +	struct task_struct *p;
> +	struct rq *task_rq;
> +	bool moved = false;
> +retry:
> +	if (list_empty(&dsq->fifo))
> +		return false;
> +
> +	raw_spin_lock(&dsq->lock);
> +	list_for_each_entry(p, &dsq->fifo, scx.dsq_node) {
> +		task_rq = task_rq(p);
> +		if (rq == task_rq)
> +			goto this_rq;
> +		if (likely(rq->online) && !is_migration_disabled(p) &&
> +		    cpumask_test_cpu(cpu_of(rq), p->cpus_ptr))
> +			goto remote_rq;
> +	}
> +	raw_spin_unlock(&dsq->lock);
> +	return false;
> +
> +this_rq:
> +	/* @dsq is locked and @p is on this rq */
> +	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
> +	list_move_tail(&p->scx.dsq_node, &scx_rq->local_dsq.fifo);
> +	dsq->nr--;
> +	scx_rq->local_dsq.nr++;
> +	p->scx.dsq = &scx_rq->local_dsq;
> +	raw_spin_unlock(&dsq->lock);
> +	return true;
> +
> +remote_rq:
> +#ifdef CONFIG_SMP
> +	/*
> +	 * @dsq is locked and @p is on a remote rq. @p is currently protected by
> +	 * @dsq->lock. We want to pull @p to @rq but may deadlock if we grab
> +	 * @task_rq while holding @dsq and @rq locks. As dequeue can't drop the
> +	 * rq lock or fail, do a little dancing from our side. See
> +	 * move_task_to_local_dsq().
> +	 */
> +	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
> +	list_del_init(&p->scx.dsq_node);
> +	dsq->nr--;
> +	p->scx.holding_cpu = raw_smp_processor_id();
> +	raw_spin_unlock(&dsq->lock);
> +
> +	rq_unpin_lock(rq, rf);
> +	double_lock_balance(rq, task_rq);
> +	rq_repin_lock(rq, rf);
> +
> +	moved = move_task_to_local_dsq(rq, p);

you might be able to avoid the double_lock_balance() by using
move_queued_task(), which internally hands off the old rq lock and
returns with the new rq lock held.

the pattern for consume_dispatch_q() would be something like:

- kfunc from bpf, with this_rq lock held
- notice p isn't on this_rq, goto remote_rq:
- do sched_ext accounting, like the this_rq->dsq->nr--
- unlock this_rq
- p_rq = task_rq_lock(p)
- double-check that task_rq(p) didn't change to this_rq during that unlock
- new_rq = move_queued_task(p_rq, rf, p, new_cpu)
- do sched_ext accounting like new_rq->dsq->nr++
- unlock new_rq
- relock the original this_rq
- return to bpf

you still end up grabbing both locks, but just not at the same time.

plus, task_rq_lock() takes the guesswork out of whether you're getting
p's rq lock or not. it looks like you're using the holding_cpu to
handle the race where p moves cpus after you read task_rq(p) but before
you lock that task_rq. maybe you can drop the whole concept of the
holding_cpu?
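
to make the hand-off concrete, here is a minimal userspace sketch of the steps above, with pthread mutexes standing in for rq locks. everything named toy_* is hypothetical and made up for illustration; it is not the kernel API and it leaves out pi_lock, rq pinning, irq state and the dsq bookkeeping. it only shows the lock ordering: lock p's rq with a recheck loop (roughly what task_rq_lock() does), then drop the source lock before taking the destination lock (roughly what move_queued_task() does).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* hypothetical userspace stand-ins; none of these are kernel types */
struct toy_rq {
	pthread_mutex_t lock;
	int nr;				/* queued tasks, protected by lock */
};

struct toy_task {
	_Atomic(struct toy_rq *) rq;	/* written only with the old rq locked */
};

static void toy_rq_init(struct toy_rq *rq, int nr)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->nr = nr;
}

/* rough analogue of task_rq_lock(): lock p's rq, retry if p moved meanwhile */
static struct toy_rq *toy_task_rq_lock(struct toy_task *p)
{
	for (;;) {
		struct toy_rq *rq = atomic_load(&p->rq);

		pthread_mutex_lock(&rq->lock);
		if (rq == atomic_load(&p->rq))
			return rq;	/* p cannot change rq while we hold this lock */
		pthread_mutex_unlock(&rq->lock);
	}
}

/*
 * rough analogue of move_queued_task(): enter with @src locked, return with
 * @dst locked. the two locks are never held together. the real kernel also
 * marks the task as migrating across the gap; this toy skips that.
 */
static struct toy_rq *toy_move_queued_task(struct toy_rq *src,
					   struct toy_task *p,
					   struct toy_rq *dst)
{
	assert(src != dst);
	src->nr--;
	atomic_store(&p->rq, dst);
	pthread_mutex_unlock(&src->lock);

	pthread_mutex_lock(&dst->lock);
	dst->nr++;
	return dst;
}

/* enter and return with @this_rq locked; true if @p ended up on @this_rq */
static bool toy_pull_task(struct toy_rq *this_rq, struct toy_task *p)
{
	struct toy_rq *p_rq;

	pthread_mutex_unlock(&this_rq->lock);

	p_rq = toy_task_rq_lock(p);
	if (p_rq == this_rq)
		return true;	/* p migrated here during the unlock */

	/* dst is this_rq, so the final unlock/relock pair collapses */
	return toy_move_queued_task(p_rq, p, this_rq) == this_rq;
}
```

a caller would take this_rq's lock, call toy_pull_task(this_rq, p), and get control back with the same lock held. because the destination in this sketch is always this_rq, the "unlock new_rq, relock this_rq" steps from the list turn into a no-op; they would only matter if new_cpu could be some other cpu.
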
thanks,
barret
> +
> +	double_unlock_balance(rq, task_rq);
> +#endif /* CONFIG_SMP */
> +	if (likely(moved))
> +		return true;
> +	goto retry;
> +}