From: Tejun Heo <tj@kernel.org>
To: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: Waiman Long <longman@redhat.com>,
Qais Yousef <qais.yousef@arm.com>,
Xuewen Yan <xuewen.yan@unisoc.com>,
rafael@kernel.org, viresh.kumar@linaro.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, linux-kernel@vger.kernel.org,
ke.wang@unisoc.com, xuewyan@foxmail.com,
linux-pm@vger.kernel.org, Lukasz Luba <Lukasz.Luba@arm.com>
Subject: Re: [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil
Date: Wed, 27 Jul 2022 10:09:04 -1000 [thread overview]
Message-ID: <YuGbYCfAG81mZBnN@slm.duckdns.org> (raw)
In-Reply-To: <CAB8ipk-8cbur-m733py-cw4bXCt7gkd8gAOXtKO+-fV1B2EeZw@mail.gmail.com>
Hello,
On Wed, Jul 20, 2022 at 03:45:27PM +0800, Xuewen Yan wrote:
> Dear all
>
> On Wed, Jul 13, 2022 at 11:20 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Tue, Jul 12, 2022 at 10:49:57PM -0400, Waiman Long wrote:
> > > > Well, the only thing I can think of is always grabbing cpus_read_lock()
> > > > before grabbing threadgroup_rwsem. Waiman, what do you think?
> > >
> > > That is a possible solution as cpus_read_lock() is rather lightweight. It is
> > > a good practice to acquire it first.
> >
> > On a similar note, I think we probably should re-enable percpu operations on
> > threadgroup_rwsem too by default and allow users who are affected by slower
> > write path operations to opt-in. After the new CLONE_INTO_CGROUP which
> > doesn't need the rwsem, we have far fewer write lockers after all.
> >
>
> If there's any patch for me to try? I would be very grateful.
Can youp please see whether the following patch fixes the problem?
Thanks.
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 13c8e91d78620..7caefc8af9127 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2345,6 +2345,38 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
}
EXPORT_SYMBOL_GPL(task_cgroup_path);
+/**
+ * lock_threadgroup - Stabilize threadgroups
+ *
+ * cgroup migration operations need the threadgroups stabilized against forks
+ * and exits, which can be achieved by write-locking cgroup_threadgroup_rwsem.
+ *
+ * Note that write-locking threadgdroup_rwsem is nested inside CPU hotplug
+ * disable. This is because cpuset needs CPU hotplug disabled during ->attach()
+ * and bringing up a CPU may involve creating new tasks which can require
+ * read-locking threadgroup_rwsem. If we call cpuset's ->attach() with
+ * threadgroup_rwsem write-locked with hotplug enabled, we can deadlock by
+ * cpuset waiting for an on-going CPU hotplug operation which in turn is waiting
+ * for the threadgroup_rwsem to be released to create new tasks. See the
+ * following for more details:
+ *
+ * http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
+ */
+static void lock_threadgroup(void)
+{
+ cpus_read_lock();
+ percpu_down_write(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * lock_threadgroup - Undo lock_threadgroup
+ */
+static void unlock_threadgroup(void)
+{
+ percpu_up_write(&cgroup_threadgroup_rwsem);
+ cpus_read_unlock();
+}
+
/**
* cgroup_migrate_add_task - add a migration target task to a migration context
* @task: target task
@@ -2822,7 +2854,6 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
bool *locked)
- __acquires(&cgroup_threadgroup_rwsem)
{
struct task_struct *tsk;
pid_t pid;
@@ -2840,7 +2871,7 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
*/
lockdep_assert_held(&cgroup_mutex);
if (pid || threadgroup) {
- percpu_down_write(&cgroup_threadgroup_rwsem);
+ lock_threadgroup();
*locked = true;
} else {
*locked = false;
@@ -2876,7 +2907,7 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
out_unlock_threadgroup:
if (*locked) {
- percpu_up_write(&cgroup_threadgroup_rwsem);
+ unlock_threadgroup();
*locked = false;
}
out_unlock_rcu:
@@ -2885,7 +2916,6 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
}
void cgroup_procs_write_finish(struct task_struct *task, bool locked)
- __releases(&cgroup_threadgroup_rwsem)
{
struct cgroup_subsys *ss;
int ssid;
@@ -2894,7 +2924,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked)
put_task_struct(task);
if (locked)
- percpu_up_write(&cgroup_threadgroup_rwsem);
+ unlock_threadgroup();
+
for_each_subsys(ss, ssid)
if (ss->post_attach)
ss->post_attach();
@@ -2953,7 +2984,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
lockdep_assert_held(&cgroup_mutex);
- percpu_down_write(&cgroup_threadgroup_rwsem);
+ lock_threadgroup();
/* look up all csses currently attached to @cgrp's subtree */
spin_lock_irq(&css_set_lock);
@@ -2984,7 +3015,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
ret = cgroup_migrate_execute(&mgctx);
out_finish:
cgroup_migrate_finish(&mgctx);
- percpu_up_write(&cgroup_threadgroup_rwsem);
+ unlock_threadgroup();
return ret;
}
next prev parent reply other threads:[~2022-07-27 20:09 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-05 12:37 [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil Xuewen Yan
2022-07-11 7:21 ` Xuewen Yan
2022-07-11 7:32 ` Lukasz Luba
2022-07-11 7:43 ` Xuewen Yan
2022-07-11 7:47 ` Viresh Kumar
2022-07-11 7:50 ` Lukasz Luba
2022-07-11 17:46 ` Qais Yousef
2022-07-11 20:58 ` Tejun Heo
2022-07-11 21:11 ` Steven Rostedt
2022-07-12 5:42 ` Xuewen Yan
2022-07-12 5:57 ` Xuewen Yan
2022-07-13 3:02 ` Waiman Long
2022-07-12 12:57 ` Qais Yousef
2022-07-12 16:13 ` Tejun Heo
2022-07-12 17:14 ` Qais Yousef
2022-07-27 22:01 ` Peter Zijlstra
2022-07-27 22:05 ` Tejun Heo
2022-07-13 2:49 ` Waiman Long
2022-07-13 3:00 ` Tejun Heo
2022-07-20 7:45 ` Xuewen Yan
2022-07-27 20:09 ` Tejun Heo [this message]
2022-07-29 7:33 ` Xuewen Yan
2022-07-29 8:39 ` Qais Yousef
2022-07-29 19:59 ` Tejun Heo
2022-08-09 2:02 ` Xuewen Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YuGbYCfAG81mZBnN@slm.duckdns.org \
--to=tj@kernel.org \
--cc=Lukasz.Luba@arm.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=ke.wang@unisoc.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=qais.yousef@arm.com \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
--cc=xuewen.yan94@gmail.com \
--cc=xuewen.yan@unisoc.com \
--cc=xuewyan@foxmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).