* [PATCH 0/2] sched/rt: Fix race with sched IPIs and root domain changes
@ 2018-01-24  1:45 Steven Rostedt
  2018-01-24  1:45 ` [PATCH 1/2] sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() Steven Rostedt
  2018-01-24  1:45 ` [PATCH 2/2] sched/rt: Up the root domain ref count when passing it around via IPIs Steven Rostedt
  0 siblings, 2 replies; 3+ messages in thread
From: Steven Rostedt @ 2018-01-24  1:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Pavan Kondeti

Hi Ingo and Peter,

Pavan reported that he was able to trigger a crash when doing CPU
hotplug and scheduling of real-time tasks. It came down to the
IPI logic. Here's the issue broken down:

 - The RT overloaded mask and variables are associated with a root domain.

 - The IPI logic uses the RT overload mask and variables.

 - The scheduler (under rq->lock) kicks off the IPI logic.

 - The IPI logic accesses the RT overload mask and variables without
   the rq->lock held.

 - The root domain can be modified while the rq->lock is not held.

What Pavan saw first came from taking the rq->rd->rto_lock and then
releasing it. There were cases where the unlock was done on a lock that
was not held. What happened was that during CPU hotplug, rq->rd changed
while the IPIs were going around, so the spin_lock(rq->rd->rto_lock)
took a different lock than the one released by the
spin_unlock(rq->rd->rto_lock).
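
Roughly, the window looks like this (an illustrative interleaving, not an
exact trace; rq_attach_root() is the hotplug path that swaps rq->rd):

    CPU 0 (IPI chain)                        CPU 1 (CPU hotplug)
    -----------------                        -------------------
    rto_push_irq_work_func()
      raw_spin_lock(&rq->rd->rto_lock)       /* locks the old rd */
                                             rq_attach_root()
                                               rq->rd = new rd
      raw_spin_unlock(&rq->rd->rto_lock)     /* unlocks the new rd! */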

The first patch fixes that. Instead of using rq = this_rq() and then
accessing the root domain with rq->rd, we can simply grab the rd with
container_of(work), as the irq work itself is embedded in the root
domain.
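
A rough sketch of what that looks like (the actual diff is in patch 1
below):

	void rto_push_irq_work_func(struct irq_work *work)
	{
		/*
		 * The irq work is embedded in the root domain, so this rd
		 * stays the same for the whole IPI round, even if rq->rd
		 * changes underneath us.
		 */
		struct root_domain *rd =
			container_of(work, struct root_domain, rto_push_work);
		...
		raw_spin_lock(&rd->rto_lock);
		cpu = rto_next_cpu(rd);
		raw_spin_unlock(&rd->rto_lock);
		...
	}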

Then another issue came up while discussing this: not only can rq->rd
change, but the rd itself can be freed. Thus, to keep the rd around, the
second patch adds a sched_get_rd() and sched_put_rd() interface with
which the RT scheduler can bump the root domain's ref count when it
kicks off the IPIs, and then drop it when there are no more IPIs to go
around. sched_put_rd() will also free the root domain if the ref count
drops to zero, just as it does when the root domain is released.
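
In rough terms, the ref counting ends up looking like this (the actual
diff is in patch 2 below):

	/* tell_cpu_to_push(): first IPI of the round, under rq->lock */
	if (cpu >= 0) {
		/* Make sure the rd does not get freed while pushing */
		sched_get_rd(rq->rd);
		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
	}

	/* rto_push_irq_work_func(): no more overloaded CPUs, round done */
	if (cpu < 0) {
		sched_put_rd(rd);
		return;
	}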

Please take these patches, unless you have any concerns with them.
Let me know what those concerns are.

Thanks!

-- Steve



  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/sched/core

Head SHA1: 4397f04575c44e1440ec2e49b6302785c95fd2f8
fa972f829b2ce71bb0ff58bb97e80dcd001ab66c


Steven Rostedt (VMware) (2):
      sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
      sched/rt: Up the root domain ref count when passing it around via IPIs

----
 kernel/sched/rt.c       | 24 +++++++++++++++---------
 kernel/sched/sched.h    |  2 ++
 kernel/sched/topology.c | 13 +++++++++++++
 3 files changed, 30 insertions(+), 9 deletions(-)

* [PATCH 1/2] sched/rt: Use container_of() to get root domain in rto_push_irq_work_func()
  2018-01-24  1:45 [PATCH 0/2] sched/rt: Fix race with sched IPIs and root domain changes Steven Rostedt
@ 2018-01-24  1:45 ` Steven Rostedt
  2018-01-24  1:45 ` [PATCH 2/2] sched/rt: Up the root domain ref count when passing it around via IPIs Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2018-01-24  1:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Pavan Kondeti, stable

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

When the rto_push_irq_work_func() is called, it looks at the RT overloaded
bitmask in the root domain via the runqueue (rq->rd). The problem is that
during CPU up and down, nothing here stops rq->rd from changing between
taking the rq->rd->rto_lock and releasing it. That means the lock that is
released is not the same lock that was taken.

Instead of using this_rq()->rd to get the root domain, as the irq work is
part of the root domain, we can simply get the root domain from the irq work
that is passed to the routine:

 container_of(work, struct root_domain, rto_push_work)

This keeps the root domain consistent.

Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
Cc: stable@vger.kernel.org
Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic")
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/sched/rt.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 665ace2fc558..274b55e4a560 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1907,9 +1907,8 @@ static void push_rt_tasks(struct rq *rq)
  * the rt_loop_next will cause the iterator to perform another scan.
  *
  */
-static int rto_next_cpu(struct rq *rq)
+static int rto_next_cpu(struct root_domain *rd)
 {
-	struct root_domain *rd = rq->rd;
 	int next;
 	int cpu;
 
@@ -1985,7 +1984,7 @@ static void tell_cpu_to_push(struct rq *rq)
 	 * Otherwise it is finishing up and an ipi needs to be sent.
 	 */
 	if (rq->rd->rto_cpu < 0)
-		cpu = rto_next_cpu(rq);
+		cpu = rto_next_cpu(rq->rd);
 
 	raw_spin_unlock(&rq->rd->rto_lock);
 
@@ -1998,6 +1997,8 @@ static void tell_cpu_to_push(struct rq *rq)
 /* Called from hardirq context */
 void rto_push_irq_work_func(struct irq_work *work)
 {
+	struct root_domain *rd =
+		container_of(work, struct root_domain, rto_push_work);
 	struct rq *rq;
 	int cpu;
 
@@ -2013,18 +2014,18 @@ void rto_push_irq_work_func(struct irq_work *work)
 		raw_spin_unlock(&rq->lock);
 	}
 
-	raw_spin_lock(&rq->rd->rto_lock);
+	raw_spin_lock(&rd->rto_lock);
 
 	/* Pass the IPI to the next rt overloaded queue */
-	cpu = rto_next_cpu(rq);
+	cpu = rto_next_cpu(rd);
 
-	raw_spin_unlock(&rq->rd->rto_lock);
+	raw_spin_unlock(&rd->rto_lock);
 
 	if (cpu < 0)
 		return;
 
 	/* Try the next RT overloaded CPU */
-	irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	irq_work_queue_on(&rd->rto_push_work, cpu);
 }
 #endif /* HAVE_RT_PUSH_IPI */
 
-- 
2.15.1

* [PATCH 2/2] sched/rt: Up the root domain ref count when passing it around via IPIs
  2018-01-24  1:45 [PATCH 0/2] sched/rt: Fix race with sched IPIs and root domain changes Steven Rostedt
  2018-01-24  1:45 ` [PATCH 1/2] sched/rt: Use container_of() to get root domain in rto_push_irq_work_func() Steven Rostedt
@ 2018-01-24  1:45 ` Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2018-01-24  1:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Pavan Kondeti, stable

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

When issuing an IPI RT push, where an IPI is sent to each CPU that has more
than one RT task scheduled on it, the code references the root domain's
rto_mask, which contains all the CPUs within the root domain that have more
than one RT task in the runnable state. The problem is that, after the IPIs
are initiated, the rq->lock is released. This means that the root domain
that is associated with the run queue could be freed while the IPIs are
going around.

Add a sched_get_rd() and a sched_put_rd() that will increment and decrement
the root domain's ref count respectively. This way when initiating the IPIs,
the scheduler will up the root domain's ref count before releasing the
rq->lock, ensuring that the root domain does not go away until the IPI round
is complete.

Link: http://lkml.kernel.org/r/CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com
Cc: stable@vger.kernel.org
Fixes: 4bdced5c9a292 ("sched/rt: Simplify the IPI based RT balancing logic")
Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/sched/rt.c       |  9 +++++++--
 kernel/sched/sched.h    |  2 ++
 kernel/sched/topology.c | 13 +++++++++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 274b55e4a560..3401f588c916 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1990,8 +1990,11 @@ static void tell_cpu_to_push(struct rq *rq)
 
 	rto_start_unlock(&rq->rd->rto_loop_start);
 
-	if (cpu >= 0)
+	if (cpu >= 0) {
+		/* Make sure the rd does not get freed while pushing */
+		sched_get_rd(rq->rd);
 		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	}
 }
 
 /* Called from hardirq context */
@@ -2021,8 +2024,10 @@ void rto_push_irq_work_func(struct irq_work *work)
 
 	raw_spin_unlock(&rd->rto_lock);
 
-	if (cpu < 0)
+	if (cpu < 0) {
+		sched_put_rd(rd);
 		return;
+	}
 
 	/* Try the next RT overloaded CPU */
 	irq_work_queue_on(&rd->rto_push_work, cpu);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e95505e23c6..fb5fc458547f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -691,6 +691,8 @@ extern struct mutex sched_domains_mutex;
 extern void init_defrootdomain(void);
 extern int sched_init_domains(const struct cpumask *cpu_map);
 extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
+extern void sched_get_rd(struct root_domain *rd);
+extern void sched_put_rd(struct root_domain *rd);
 
 #ifdef HAVE_RT_PUSH_IPI
 extern void rto_push_irq_work_func(struct irq_work *work);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 034cbed7f88b..519b024f4e94 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -259,6 +259,19 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 		call_rcu_sched(&old_rd->rcu, free_rootdomain);
 }
 
+void sched_get_rd(struct root_domain *rd)
+{
+	atomic_inc(&rd->refcount);
+}
+
+void sched_put_rd(struct root_domain *rd)
+{
+	if (!atomic_dec_and_test(&rd->refcount))
+		return;
+
+	call_rcu_sched(&rd->rcu, free_rootdomain);
+}
+
 static int init_rootdomain(struct root_domain *rd)
 {
 	if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL))
-- 
2.15.1
