From: Steven Rostedt <rostedt@goodmis.org>
To: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: williams@redhat.com, Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
bristot@redhat.com, jkacur@redhat.com, efault@gmx.de,
hpa@zytor.com, torvalds@linux-foundation.org, swood@redhat.com,
linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Date: Fri, 19 Jan 2018 13:11:21 -0500 [thread overview]
Message-ID: <20180119131121.22dac3d3@gandalf.local.home> (raw)
In-Reply-To: <20180119174617.GA6563@codeaurora.org>
On Fri, 19 Jan 2018 23:16:17 +0530
Pavan Kondeti <pkondeti@codeaurora.org> wrote:
> I am thinking of another problem because of the race between
> rto_push_irq_work_func() and rq_attach_root() where rq->rd is modified.
>
> Lets say, we cache the rq->rd here and queued the IRQ work on a remote
> CPU. In the mean time, the rq_attach_root() might drop all the references
> to this cached (old) rd and wants to free it. The rq->rd is freed in
> RCU-sched callback. If that remote CPU is in RCU quiescent state, the rq->rd
> can get freed before the IRQ work is executed. This results in the corruption
> of the remote CPU's IRQ work list. Right?
>
> Taking rq->lock in rto_push_irq_work_func() also does not help here. Probably
> we have to wait for the IRQ work to finish before freeing the older root domain
> in RCU-sched callback.
I was wondering about this too. Yeah, it would require an RCU like
update. Once the rd was unreferenced, it would need to wait for the
irq works to to finish before freeing it.
The easy way to do this is to simply up the refcount when sending the
domain. Something like this:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 862a513adca3..89a086ed2b16 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1907,9 +1907,8 @@ static void push_rt_tasks(struct rq *rq)
* the rt_loop_next will cause the iterator to perform another scan.
*
*/
-static int rto_next_cpu(struct rq *rq)
+static int rto_next_cpu(struct root_domain *rd)
{
- struct root_domain *rd = rq->rd;
int next;
int cpu;
@@ -1985,19 +1984,24 @@ static void tell_cpu_to_push(struct rq *rq)
* Otherwise it is finishing up and an ipi needs to be sent.
*/
if (rq->rd->rto_cpu < 0)
- cpu = rto_next_cpu(rq);
+ cpu = rto_next_cpu(rq->rd);
raw_spin_unlock(&rq->rd->rto_lock);
rto_start_unlock(&rq->rd->rto_loop_start);
- if (cpu >= 0)
+ if (cpu >= 0) {
+ /* Make sure the rd does not get freed while pushing */
+ sched_get_rd(rq->rd);
irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+ }
}
/* Called from hardirq context */
void rto_push_irq_work_func(struct irq_work *work)
{
+ struct root_domain *rd =
+ container_of(work, struct root_domain, rto_push_work);
struct rq *rq;
int cpu;
@@ -2013,18 +2017,20 @@ void rto_push_irq_work_func(struct irq_work *work)
raw_spin_unlock(&rq->lock);
}
- raw_spin_lock(&rq->rd->rto_lock);
+ raw_spin_lock(&rd->rto_lock);
/* Pass the IPI to the next rt overloaded queue */
- cpu = rto_next_cpu(rq);
+ cpu = rto_next_cpu(rd);
- raw_spin_unlock(&rq->rd->rto_lock);
+ raw_spin_unlock(&rd->rto_lock);
- if (cpu < 0)
+ if (cpu < 0) {
+ sched_put_rd(rd);
return;
+ }
/* Try the next RT overloaded CPU */
- irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+ irq_work_queue_on(&rd->rto_push_work, cpu);
}
#endif /* HAVE_RT_PUSH_IPI */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e95505e23c6..fb5fc458547f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -691,6 +691,8 @@ extern struct mutex sched_domains_mutex;
extern void init_defrootdomain(void);
extern int sched_init_domains(const struct cpumask *cpu_map);
extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
+extern void sched_get_rd(struct root_domain *rd);
+extern void sched_put_rd(struct root_domain *rd);
#ifdef HAVE_RT_PUSH_IPI
extern void rto_push_irq_work_func(struct irq_work *work);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 034cbed7f88b..519b024f4e94 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -259,6 +259,19 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
call_rcu_sched(&old_rd->rcu, free_rootdomain);
}
+void sched_get_rd(struct root_domain *rd)
+{
+ atomic_inc(&rd->refcount);
+}
+
+void sched_put_rd(struct root_domain *rd)
+{
+ if (!atomic_dec_and_test(&rd->refcount))
+ return;
+
+ call_rcu_sched(&rd->rcu, free_rootdomain);
+}
+
static int init_rootdomain(struct root_domain *rd)
{
if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL))
-- Steve
next prev parent reply other threads:[~2018-01-19 18:11 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-24 15:47 [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic Steven Rostedt
2017-05-04 14:41 ` Peter Zijlstra
2017-05-04 15:01 ` Steven Rostedt
2017-05-04 15:32 ` Peter Zijlstra
2017-05-04 17:25 ` Steven Rostedt
2017-05-04 18:42 ` Peter Zijlstra
2017-05-04 19:03 ` Steven Rostedt
2017-05-05 4:26 ` Mike Galbraith
2017-05-05 5:16 ` Mike Galbraith
2017-05-05 11:05 ` Peter Zijlstra
2017-05-05 12:02 ` Steven Rostedt
2017-05-05 17:39 ` Peter Zijlstra
2017-05-05 18:59 ` Steven Rostedt
2017-05-06 7:52 ` Peter Zijlstra
2017-10-10 11:02 ` [tip:sched/core] sched/rt: Simplify the IPI based RT " tip-bot for Steven Rostedt (Red Hat)
2018-01-19 9:23 ` Pavan Kondeti
2018-01-19 15:03 ` Steven Rostedt
2018-01-19 15:44 ` Pavan Kondeti
2018-01-19 15:53 ` Steven Rostedt
2018-01-19 17:46 ` Pavan Kondeti
2018-01-19 18:11 ` Steven Rostedt [this message]
2018-01-19 18:12 ` Steven Rostedt
2018-01-19 18:57 ` Pavan Kondeti
2018-01-19 19:51 ` Steven Rostedt
2018-01-20 4:56 ` Pavan Kondeti
2018-01-19 18:54 ` Pavan Kondeti
2018-02-06 11:54 ` [tip:sched/urgent] sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() tip-bot for Steven Rostedt (VMware)
2018-02-07 4:14 ` Steven Rostedt
2018-02-06 11:54 ` [tip:sched/urgent] sched/rt: Up the root domain ref count when passing it around via IPIs tip-bot for Steven Rostedt (VMware)
2018-02-07 4:15 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180119131121.22dac3d3@gandalf.local.home \
--to=rostedt@goodmis.org \
--cc=bristot@redhat.com \
--cc=efault@gmx.de \
--cc=hpa@zytor.com \
--cc=jkacur@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=pkondeti@codeaurora.org \
--cc=swood@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=williams@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).