linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: Waiman Long <longman@redhat.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Xuewen Yan <xuewen.yan@unisoc.com>,
	rafael@kernel.org, viresh.kumar@linaro.org, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, linux-kernel@vger.kernel.org,
	ke.wang@unisoc.com, xuewyan@foxmail.com,
	linux-pm@vger.kernel.org, Lukasz Luba <Lukasz.Luba@arm.com>
Subject: Re: [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil
Date: Wed, 27 Jul 2022 10:09:04 -1000	[thread overview]
Message-ID: <YuGbYCfAG81mZBnN@slm.duckdns.org> (raw)
In-Reply-To: <CAB8ipk-8cbur-m733py-cw4bXCt7gkd8gAOXtKO+-fV1B2EeZw@mail.gmail.com>

Hello,

On Wed, Jul 20, 2022 at 03:45:27PM +0800, Xuewen Yan wrote:
> Dear all
> 
> On Wed, Jul 13, 2022 at 11:20 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Tue, Jul 12, 2022 at 10:49:57PM -0400, Waiman Long wrote:
> > > > Well, the only thing I can think of is always grabbing cpus_read_lock()
> > > > before grabbing threadgroup_rwsem. Waiman, what do you think?
> > >
> > > That is a possible solution as cpus_read_lock() is rather lightweight. It is
> > > a good practice to acquire it first.
> >
> > On a similar note, I think we probably should re-enable percpu operations on
> > threadgroup_rwsem too by default and allow users who are affected by slower
> > write path operations to opt-in. After the new CLONE_INTO_CGROUP which
> > doesn't need the rwsem, we have far fewer write lockers after all.
> >
> 
>  If there's any patch for me to try? I would be very grateful.

Can youp please see whether the following patch fixes the problem?

Thanks.

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 13c8e91d78620..7caefc8af9127 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2345,6 +2345,38 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+/**
+ * lock_threadgroup - Stabilize threadgroups
+ *
+ * cgroup migration operations need the threadgroups stabilized against forks
+ * and exits, which can be achieved by write-locking cgroup_threadgroup_rwsem.
+ *
+ * Note that write-locking threadgdroup_rwsem is nested inside CPU hotplug
+ * disable. This is because cpuset needs CPU hotplug disabled during ->attach()
+ * and bringing up a CPU may involve creating new tasks which can require
+ * read-locking threadgroup_rwsem. If we call cpuset's ->attach() with
+ * threadgroup_rwsem write-locked with hotplug enabled, we can deadlock by
+ * cpuset waiting for an on-going CPU hotplug operation which in turn is waiting
+ * for the threadgroup_rwsem to be released to create new tasks. See the
+ * following for more details:
+ *
+ * http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
+ */
+static void lock_threadgroup(void)
+{
+	cpus_read_lock();
+	percpu_down_write(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * lock_threadgroup - Undo lock_threadgroup
+ */
+static void unlock_threadgroup(void)
+{
+	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cpus_read_unlock();
+}
+
 /**
  * cgroup_migrate_add_task - add a migration target task to a migration context
  * @task: target task
@@ -2822,7 +2854,6 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 
 struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 					     bool *locked)
-	__acquires(&cgroup_threadgroup_rwsem)
 {
 	struct task_struct *tsk;
 	pid_t pid;
@@ -2840,7 +2871,7 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 	 */
 	lockdep_assert_held(&cgroup_mutex);
 	if (pid || threadgroup) {
-		percpu_down_write(&cgroup_threadgroup_rwsem);
+		lock_threadgroup();
 		*locked = true;
 	} else {
 		*locked = false;
@@ -2876,7 +2907,7 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 
 out_unlock_threadgroup:
 	if (*locked) {
-		percpu_up_write(&cgroup_threadgroup_rwsem);
+		unlock_threadgroup();
 		*locked = false;
 	}
 out_unlock_rcu:
@@ -2885,7 +2916,6 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 }
 
 void cgroup_procs_write_finish(struct task_struct *task, bool locked)
-	__releases(&cgroup_threadgroup_rwsem)
 {
 	struct cgroup_subsys *ss;
 	int ssid;
@@ -2894,7 +2924,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked)
 	put_task_struct(task);
 
 	if (locked)
-		percpu_up_write(&cgroup_threadgroup_rwsem);
+		unlock_threadgroup();
+
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
@@ -2953,7 +2984,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 
 	lockdep_assert_held(&cgroup_mutex);
 
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	lock_threadgroup();
 
 	/* look up all csses currently attached to @cgrp's subtree */
 	spin_lock_irq(&css_set_lock);
@@ -2984,7 +3015,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 	ret = cgroup_migrate_execute(&mgctx);
 out_finish:
 	cgroup_migrate_finish(&mgctx);
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	unlock_threadgroup();
 	return ret;
 }
 

  reply	other threads:[~2022-07-27 20:09 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05 12:37 [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil Xuewen Yan
2022-07-11  7:21 ` Xuewen Yan
2022-07-11  7:32   ` Lukasz Luba
2022-07-11  7:43     ` Xuewen Yan
2022-07-11  7:47       ` Viresh Kumar
2022-07-11  7:50       ` Lukasz Luba
2022-07-11 17:46 ` Qais Yousef
2022-07-11 20:58   ` Tejun Heo
2022-07-11 21:11     ` Steven Rostedt
2022-07-12  5:42       ` Xuewen Yan
2022-07-12  5:57       ` Xuewen Yan
2022-07-13  3:02       ` Waiman Long
2022-07-12 12:57     ` Qais Yousef
2022-07-12 16:13       ` Tejun Heo
2022-07-12 17:14         ` Qais Yousef
2022-07-27 22:01         ` Peter Zijlstra
2022-07-27 22:05           ` Tejun Heo
2022-07-13  2:49     ` Waiman Long
2022-07-13  3:00       ` Tejun Heo
2022-07-20  7:45         ` Xuewen Yan
2022-07-27 20:09           ` Tejun Heo [this message]
2022-07-29  7:33             ` Xuewen Yan
2022-07-29  8:39               ` Qais Yousef
2022-07-29 19:59                 ` Tejun Heo
2022-08-09  2:02                   ` Xuewen Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YuGbYCfAG81mZBnN@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=Lukasz.Luba@arm.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=ke.wang@unisoc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=xuewen.yan94@gmail.com \
    --cc=xuewen.yan@unisoc.com \
    --cc=xuewyan@foxmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).