linux-kernel.vger.kernel.org archive mirror
* [patch 0/4] x86/irq: Plug a couple of cpu hotplug races
@ 2015-07-05 17:12 Thomas Gleixner
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-05 17:12 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin, Joerg Roedel,
	Borislav Petkov, Yanmin Zhang

Jin debugged a subtle race in the cpu hotplug code, which caused me to
look deeper into this. In the process I unearthed quite a few racy
constructs.

Aside from the x86-specific problems, I discovered a generic issue
which needs to be addressed in the cpu hotplug code.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-05 17:12 [patch 0/4] x86/irq: Plug a couple of cpu hotplug races Thomas Gleixner
@ 2015-07-05 17:12 ` Thomas Gleixner
  2015-07-07  9:48   ` [tip:irq/urgent] hotplug: Prevent alloc/free " tip-bot for Thomas Gleixner
                     ` (3 more replies)
  2015-07-05 17:12 ` [patch 2/4] x86: Plug irq vector hotplug race Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-05 17:12 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin, Joerg Roedel,
	Borislav Petkov, Yanmin Zhang

[-- Attachment #1: hotplug-prevent-irq-setup-teardown.patch --]
[-- Type: text/plain, Size: 4126 bytes --]

When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.

When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassigns the affinities because
interrupt descriptors might be freed underneath.

Example:

CPU1				CPU2
cpu_up/down
 irq_desc = irq_to_desc(irq);
				remove_from_radix_tree(desc);
 raw_spin_lock(&desc->lock);
				free(desc);
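
With CONFIG_SPARSE_IRQ the lookup on CPU1 is a plain radix tree lookup
without any reference counting - a simplified sketch of irq_to_desc()
from kernel/irq/irqdesc.c:

struct irq_desc *irq_to_desc(unsigned int irq)
{
        /*
         * No refcount and no RCU read side: the returned pointer is
         * only valid as long as nobody frees the descriptor.
         */
        return radix_tree_lookup(&irq_desc_tree, irq);
}

So once CPU2 has removed the descriptor from the radix tree and freed
it, CPU1 spins on freed memory.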

We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kinds of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:

Prevent allocation and freeing of interrupt descriptors across cpu
hotplug.
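
For illustration, the lock/unlock helpers are thin wrappers around the
mutex which already serializes descriptor allocation and free. A
minimal sketch, assuming the existing sparse_irq_lock in
kernel/irq/irqdesc.c:

/* kernel/irq/irqdesc.c (sketch) */
static DEFINE_MUTEX(sparse_irq_lock);

void irq_lock_sparse(void)
{
        mutex_lock(&sparse_irq_lock);
}

void irq_unlock_sparse(void)
{
        mutex_unlock(&sparse_irq_lock);
}

irq_alloc_descs() and irq_free_descs() take the same mutex, so holding
it across the hotplug operation blocks both allocation and free.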

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqdesc.h |    7 ++++++-
 kernel/cpu.c            |   21 ++++++++++++++++++++-
 kernel/irq/internals.h  |    4 ----
 3 files changed, 26 insertions(+), 6 deletions(-)

Index: tip/include/linux/irqdesc.h
===================================================================
--- tip.orig/include/linux/irqdesc.h
+++ tip/include/linux/irqdesc.h
@@ -90,7 +90,12 @@ struct irq_desc {
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
 
-#ifndef CONFIG_SPARSE_IRQ
+#ifdef CONFIG_SPARSE_IRQ
+extern void irq_lock_sparse(void);
+extern void irq_unlock_sparse(void);
+#else
+static inline void irq_lock_sparse(void) { }
+static inline void irq_unlock_sparse(void) { }
 extern struct irq_desc irq_desc[NR_IRQS];
 #endif
 
Index: tip/kernel/cpu.c
===================================================================
--- tip.orig/kernel/cpu.c
+++ tip/kernel/cpu.c
@@ -392,13 +392,19 @@ static int __ref _cpu_down(unsigned int
 	smpboot_park_threads(cpu);
 
 	/*
-	 * So now all preempt/rcu users must observe !cpu_active().
+	 * Prevent irq alloc/free while the dying cpu reorganizes the
+	 * interrupt affinities.
 	 */
+	irq_lock_sparse();
 
+	/*
+	 * So now all preempt/rcu users must observe !cpu_active().
+	 */
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
+		irq_unlock_sparse();
 		goto out_release;
 	}
 	BUG_ON(cpu_online(cpu));
@@ -415,6 +421,9 @@ static int __ref _cpu_down(unsigned int
 	smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
 	per_cpu(cpu_dead_idle, cpu) = false;
 
+	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
+	irq_unlock_sparse();
+
 	hotplug_cpu__broadcast_tick_pull(cpu);
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
@@ -517,8 +526,18 @@ static int _cpu_up(unsigned int cpu, int
 		goto out_notify;
 	}
 
+	/*
+	 * Some architectures have to walk the irq descriptors to
+	 * setup the vector space for the cpu which comes online.
+	 * Prevent irq alloc/free across the bringup.
+	 */
+	irq_lock_sparse();
+
 	/* Arch-specific enabling code. */
 	ret = __cpu_up(cpu, idle);
+
+	irq_unlock_sparse();
+
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
Index: tip/kernel/irq/internals.h
===================================================================
--- tip.orig/kernel/irq/internals.h
+++ tip/kernel/irq/internals.h
@@ -76,12 +76,8 @@ extern void unmask_threaded_irq(struct i
 
 #ifdef CONFIG_SPARSE_IRQ
 static inline void irq_mark_irq(unsigned int irq) { }
-extern void irq_lock_sparse(void);
-extern void irq_unlock_sparse(void);
 #else
 extern void irq_mark_irq(unsigned int irq);
-static inline void irq_lock_sparse(void) { }
-static inline void irq_unlock_sparse(void) { }
 #endif
 
 extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 2/4] x86: Plug irq vector hotplug race
  2015-07-05 17:12 [patch 0/4] x86/irq: Plug a couple of cpu hotplug races Thomas Gleixner
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
@ 2015-07-05 17:12 ` Thomas Gleixner
  2015-07-07  9:57   ` [tip:x86/urgent] x86/irq: " tip-bot for Thomas Gleixner
  2015-07-05 17:12 ` [patch 3/4] x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable() Thomas Gleixner
  2015-07-05 17:12 ` [patch 4/4] x86/irq: Retrieve irq data after locking irq_desc Thomas Gleixner
  3 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-05 17:12 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin, Joerg Roedel,
	Borislav Petkov, Yanmin Zhang

[-- Attachment #1: x86-plug-irq-vector-race.patch --]
[-- Type: text/plain, Size: 4634 bytes --]

Jin debugged a nasty cpu hotplug race which results in leaking an irq
vector on the newly hotplugged cpu.

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       setup_vector_irq                  arch_teardown_msi_irq
        __setup_vector_irq		   native_teardown_msi_irq
          lock(vector_lock)		     destroy_irq 
          install vectors
          unlock(vector_lock)
					       lock(vector_lock)
--->                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)
    lock(vector_lock)
    set_cpu_online
    unlock(vector_lock)

This leaves the irq vector(s) which are torn down on CPU M stale in
the vector array of CPU N, because CPU M does not see CPU N online
yet. There is a similar issue with concurrent setup of new interrupts.

The alloc/free protection of irq descriptors does not prevent the
above race, because it merely prevents interrupt descriptors from
going away or changing concurrently.

Prevent this by moving the call to setup_vector_irq() into the
vector_lock held region which protects set_cpu_online():

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       lock(vector_lock)                arch_teardown_msi_irq
       setup_vector_irq()
        __setup_vector_irq		   native_teardown_msi_irq
          install vectors		     destroy_irq 
       set_cpu_online
       unlock(vector_lock)
					       lock(vector_lock)
                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)

So cpu M either sees cpu N online before clearing the vector, or cpu N
installs the vectors after cpu M has cleared it.
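
This works because the teardown side walks cpu_online_mask with
vector_lock held as well, so it only touches cpus which it can
actually observe online. Roughly, assuming the current shape of
__clear_irq_vector() in arch/x86/kernel/apic/vector.c (irq move
cleanup omitted):

static void __clear_irq_vector(int irq, struct apic_chip_data *data)
{
        int cpu, vector = data->cfg.vector;

        /*
         * Runs with vector_lock held, so the online mask cannot
         * change while we clear the per cpu vector entries.
         */
        for_each_cpu_and(cpu, data->domain, cpu_online_mask)
                per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED;

        data->cfg.vector = 0;
        cpumask_clear(data->domain);
}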

Reported-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/vector.c |   10 ++--------
 arch/x86/kernel/smpboot.c     |   13 +++++--------
 2 files changed, 7 insertions(+), 16 deletions(-)

Index: tip/arch/x86/kernel/apic/vector.c
===================================================================
--- tip.orig/arch/x86/kernel/apic/vector.c
+++ tip/arch/x86/kernel/apic/vector.c
@@ -409,12 +409,6 @@ static void __setup_vector_irq(int cpu)
 	int irq, vector;
 	struct apic_chip_data *data;
 
-	/*
-	 * vector_lock will make sure that we don't run into irq vector
-	 * assignments that might be happening on another cpu in parallel,
-	 * while we setup our initial vector to irq mappings.
-	 */
-	raw_spin_lock(&vector_lock);
 	/* Mark the inuse vectors */
 	for_each_active_irq(irq) {
 		data = apic_chip_data(irq_get_irq_data(irq));
@@ -436,16 +430,16 @@ static void __setup_vector_irq(int cpu)
 		if (!cpumask_test_cpu(cpu, data->domain))
 			per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED;
 	}
-	raw_spin_unlock(&vector_lock);
 }
 
 /*
- * Setup the vector to irq mappings.
+ * Setup the vector to irq mappings. Must be called with vector_lock held.
  */
 void setup_vector_irq(int cpu)
 {
 	int irq;
 
+	lockdep_assert_held(&vector_lock);
 	/*
 	 * On most of the platforms, legacy PIC delivers the interrupts on the
 	 * boot cpu. But there are certain platforms where PIC interrupts are
Index: tip/arch/x86/kernel/smpboot.c
===================================================================
--- tip.orig/arch/x86/kernel/smpboot.c
+++ tip/arch/x86/kernel/smpboot.c
@@ -171,11 +171,6 @@ static void smp_callin(void)
 	apic_ap_setup();
 
 	/*
-	 * Need to setup vector mappings before we enable interrupts.
-	 */
-	setup_vector_irq(smp_processor_id());
-
-	/*
 	 * Save our processor parameters. Note: this information
 	 * is needed for clock calibration.
 	 */
@@ -246,11 +241,13 @@ static void notrace start_secondary(void
 #endif
 
 	/*
-	 * We need to hold vector_lock so there the set of online cpus
-	 * does not change while we are assigning vectors to cpus.  Holding
-	 * this lock ensures we don't half assign or remove an irq from a cpu.
+	 * Lock vector_lock and initialize the vectors on this cpu
+	 * before setting the cpu online. We must set it online with
+	 * vector_lock held to prevent a concurrent setup/teardown
+	 * from seeing a half valid vector space.
 	 */
 	lock_vector_lock();
+	setup_vector_irq(smp_processor_id());
 	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
 	cpu_set_state_online(smp_processor_id());



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 3/4] x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
  2015-07-05 17:12 [patch 0/4] x86/irq: Plug a couple of cpu hotplug races Thomas Gleixner
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
  2015-07-05 17:12 ` [patch 2/4] x86: Plug irq vector hotplug race Thomas Gleixner
@ 2015-07-05 17:12 ` Thomas Gleixner
  2015-07-07  9:57   ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
  2015-07-05 17:12 ` [patch 4/4] x86/irq: Retrieve irq data after locking irq_desc Thomas Gleixner
  3 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-05 17:12 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin, Joerg Roedel,
	Borislav Petkov, Yanmin Zhang

[-- Attachment #1: x86-protect-irq-vector-check.patch --]
[-- Type: text/plain, Size: 1606 bytes --]

It's unsafe to examine fields in the irq descriptor w/o holding the
descriptor lock. Add proper locking.

While at it, add a comment why the vector check can run lockless.
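
For illustration, the fields examined here are all modified with
desc->lock held. E.g. free_irq() clears desc->action under the lock,
so the lockless read can observe half-updated state:

CPU A (down check)                      CPU B (driver)
data = irq_desc_get_irq_data(desc);     free_irq()
cpumask_copy(..., data->affinity);        raw_spin_lock(&desc->lock);
                                          desc->action = NULL;
irq_has_action(irq)                       raw_spin_unlock(&desc->lock);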

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: tip/arch/x86/kernel/irq.c
===================================================================
--- tip.orig/arch/x86/kernel/irq.c
+++ tip/arch/x86/kernel/irq.c
@@ -347,14 +347,22 @@ int check_irq_vectors_for_cpu_disable(vo
 			if (!desc)
 				continue;
 
+			/*
+			 * Protect against concurrent action removal,
+			 * affinity changes etc.
+			 */
+			raw_spin_lock(&desc->lock);
 			data = irq_desc_get_irq_data(desc);
 			cpumask_copy(&affinity_new, data->affinity);
 			cpumask_clear_cpu(this_cpu, &affinity_new);
 
 			/* Do not count inactive or per-cpu irqs. */
-			if (!irq_has_action(irq) || irqd_is_per_cpu(data))
+			if (!irq_has_action(irq) || irqd_is_per_cpu(data)) {
+				raw_spin_unlock(&desc->lock);
 				continue;
+			}
 
+			raw_spin_unlock(&desc->lock);
 			/*
 			 * A single irq may be mapped to multiple
 			 * cpu's vector_irq[] (for example IOAPIC cluster
@@ -385,6 +393,9 @@ int check_irq_vectors_for_cpu_disable(vo
 		 * vector. If the vector is marked in the used vectors
 		 * bitmap or an irq is assigned to it, we don't count
 		 * it as available.
+		 *
+		 * As this is an inaccurate snapshot anyway, we can do
+		 * this w/o holding vector_lock.
 		 */
 		for (vector = FIRST_EXTERNAL_VECTOR;
 		     vector < first_system_vector; vector++) {



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 4/4] x86/irq: Retrieve irq data after locking irq_desc
  2015-07-05 17:12 [patch 0/4] x86/irq: Plug a couple of cpu hotplug races Thomas Gleixner
                   ` (2 preceding siblings ...)
  2015-07-05 17:12 ` [patch 3/4] x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable() Thomas Gleixner
@ 2015-07-05 17:12 ` Thomas Gleixner
  2015-07-07  9:58   ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
  3 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-05 17:12 UTC (permalink / raw)
  To: LKML
  Cc: Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin, Joerg Roedel,
	Borislav Petkov, Yanmin Zhang

[-- Attachment #1: x86-irq-protect-fixup-irqs.patch --]
[-- Type: text/plain, Size: 1275 bytes --]

irq_data is protected by irq_desc->lock, so retrieving the irq chip
from irq_data outside the lock is racy vs. a concurrent update. Move
it into the lock held region.

While at it, add a comment why the vector walk does not require
vector_lock.
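
For illustration, the update side changes the chip pointer with
desc->lock held. A simplified sketch of irq_set_chip() from
kernel/irq/chip.c:

int irq_set_chip(unsigned int irq, struct irq_chip *chip)
{
        unsigned long flags;
        struct irq_desc *desc = irq_get_desc_lock(irq, &flags, 0);

        if (!desc)
                return -EINVAL;

        /*
         * The chip pointer changes under desc->lock. A lockless
         * reader can end up with an inconsistent chip/irq_data pair.
         */
        desc->irq_data.chip = chip ? chip : &no_irq_chip;

        irq_put_desc_unlock(desc, flags);
        return 0;
}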

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/irq.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: tip/arch/x86/kernel/irq.c
===================================================================
--- tip.orig/arch/x86/kernel/irq.c
+++ tip/arch/x86/kernel/irq.c
@@ -497,6 +497,11 @@ void fixup_irqs(void)
 	 */
 	mdelay(1);
 
+	/*
+	 * We can walk the vector array of this cpu without holding
+	 * vector_lock because the cpu is already marked !online, so
+	 * nothing else will touch it.
+	 */
 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
 		unsigned int irr;
 
@@ -508,9 +513,9 @@ void fixup_irqs(void)
 			irq = __this_cpu_read(vector_irq[vector]);
 
 			desc = irq_to_desc(irq);
+			raw_spin_lock(&desc->lock);
 			data = irq_desc_get_irq_data(desc);
 			chip = irq_data_get_irq_chip(data);
-			raw_spin_lock(&desc->lock);
 			if (chip->irq_retrigger) {
 				chip->irq_retrigger(data);
 				__this_cpu_write(vector_irq[vector], VECTOR_RETRIGGERED);



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:irq/urgent] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
@ 2015-07-07  9:48   ` tip-bot for Thomas Gleixner
  2015-07-07 20:06   ` tip-bot for Thomas Gleixner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-07-07  9:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, jin.xiao, yanmin_zhang, mingo, hpa, linux-kernel, tglx,
	peterz, jroedel

Commit-ID:  fc862aa8288be8ace91013375ff0e3c48815c662
Gitweb:     http://git.kernel.org/tip/fc862aa8288be8ace91013375ff0e3c48815c662
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 5 Jul 2015 17:12:30 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 7 Jul 2015 11:33:44 +0200

hotplug: Prevent alloc/free of irq descriptors during cpu up/down

When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.

When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassigns the affinities because
interrupt descriptors might be freed underneath.

Example:

CPU1				CPU2
cpu_up/down
 irq_desc = irq_to_desc(irq);
				remove_from_radix_tree(desc);
 raw_spin_lock(&desc->lock);
				free(desc);

We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kinds of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:

Prevent allocation and freeing of interrupt descriptors across cpu
hotplug.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
---
 include/linux/irqdesc.h |  7 ++++++-
 kernel/cpu.c            | 21 ++++++++++++++++++++-
 kernel/irq/internals.h  |  4 ----
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 624a668..fcea4e4 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -87,7 +87,12 @@ struct irq_desc {
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
 
-#ifndef CONFIG_SPARSE_IRQ
+#ifdef CONFIG_SPARSE_IRQ
+extern void irq_lock_sparse(void);
+extern void irq_unlock_sparse(void);
+#else
+static inline void irq_lock_sparse(void) { }
+static inline void irq_unlock_sparse(void) { }
 extern struct irq_desc irq_desc[NR_IRQS];
 #endif
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9c9c9fa..fa6dc67 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -392,13 +392,19 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	smpboot_park_threads(cpu);
 
 	/*
-	 * So now all preempt/rcu users must observe !cpu_active().
+	 * Prevent irq alloc/free while the dying cpu reorganizes the
+	 * interrupt affinities.
 	 */
+	irq_lock_sparse();
 
+	/*
+	 * So now all preempt/rcu users must observe !cpu_active().
+	 */
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
+		irq_unlock_sparse();
 		goto out_release;
 	}
 	BUG_ON(cpu_online(cpu));
@@ -415,6 +421,9 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
 	per_cpu(cpu_dead_idle, cpu) = false;
 
+	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
+	irq_unlock_sparse();
+
 	hotplug_cpu__broadcast_tick_pull(cpu);
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
@@ -517,8 +526,18 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen)
 		goto out_notify;
 	}
 
+	/*
+	 * Some architectures have to walk the irq descriptors to
+	 * setup the vector space for the cpu which comes online.
+	 * Prevent irq alloc/free across the bringup.
+	 */
+	irq_lock_sparse();
+
 	/* Arch-specific enabling code. */
 	ret = __cpu_up(cpu, idle);
+
+	irq_unlock_sparse();
+
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 4834ee8..61008b8 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -76,12 +76,8 @@ extern void unmask_threaded_irq(struct irq_desc *desc);
 
 #ifdef CONFIG_SPARSE_IRQ
 static inline void irq_mark_irq(unsigned int irq) { }
-extern void irq_lock_sparse(void);
-extern void irq_unlock_sparse(void);
 #else
 extern void irq_mark_irq(unsigned int irq);
-static inline void irq_lock_sparse(void) { }
-static inline void irq_unlock_sparse(void) { }
 #endif
 
 extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:x86/urgent] x86/irq: Plug irq vector hotplug race
  2015-07-05 17:12 ` [patch 2/4] x86: Plug irq vector hotplug race Thomas Gleixner
@ 2015-07-07  9:57   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-07-07  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jin.xiao, mingo, peterz, bp, jroedel, tglx, hpa, linux-kernel,
	yanmin_zhang

Commit-ID:  5a3f75e3f02836518ce49536e9c460ca8e1fa290
Gitweb:     http://git.kernel.org/tip/5a3f75e3f02836518ce49536e9c460ca8e1fa290
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 5 Jul 2015 17:12:32 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 7 Jul 2015 11:54:04 +0200

x86/irq: Plug irq vector hotplug race

Jin debugged a nasty cpu hotplug race which results in leaking an irq
vector on the newly hotplugged cpu.

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       setup_vector_irq                  arch_teardown_msi_irq
        __setup_vector_irq		   native_teardown_msi_irq
          lock(vector_lock)		     destroy_irq 
          install vectors
          unlock(vector_lock)
					       lock(vector_lock)
--->                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)
    lock(vector_lock)
    set_cpu_online
    unlock(vector_lock)

This leaves the irq vector(s) which are torn down on CPU M stale in
the vector array of CPU N, because CPU M does not see CPU N online
yet. There is a similar issue with concurrent setup of new interrupts.

The alloc/free protection of irq descriptors does not prevent the
above race, because it merely prevents interrupt descriptors from
going away or changing concurrently.

Prevent this by moving the call to setup_vector_irq() into the
vector_lock held region which protects set_cpu_online():

cpu N				cpu M
native_cpu_up                   device_shutdown
  do_boot_cpu			  free_msi_irqs
  start_secondary                   arch_teardown_msi_irqs
    smp_callin                        default_teardown_msi_irqs
       lock(vector_lock)                arch_teardown_msi_irq
       setup_vector_irq()
        __setup_vector_irq		   native_teardown_msi_irq
          install vectors		     destroy_irq 
       set_cpu_online
       unlock(vector_lock)
					       lock(vector_lock)
                                  	       __clear_irq_vector
                                    	       unlock(vector_lock)

So cpu M either sees cpu N online before clearing the vector, or cpu N
installs the vectors after cpu M has cleared it.

Reported-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.141898931@linutronix.de
---
 arch/x86/kernel/apic/vector.c | 10 ++--------
 arch/x86/kernel/smpboot.c     | 13 +++++--------
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 28eba2d..f813261 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -409,12 +409,6 @@ static void __setup_vector_irq(int cpu)
 	int irq, vector;
 	struct apic_chip_data *data;
 
-	/*
-	 * vector_lock will make sure that we don't run into irq vector
-	 * assignments that might be happening on another cpu in parallel,
-	 * while we setup our initial vector to irq mappings.
-	 */
-	raw_spin_lock(&vector_lock);
 	/* Mark the inuse vectors */
 	for_each_active_irq(irq) {
 		data = apic_chip_data(irq_get_irq_data(irq));
@@ -436,16 +430,16 @@ static void __setup_vector_irq(int cpu)
 		if (!cpumask_test_cpu(cpu, data->domain))
 			per_cpu(vector_irq, cpu)[vector] = VECTOR_UNDEFINED;
 	}
-	raw_spin_unlock(&vector_lock);
 }
 
 /*
- * Setup the vector to irq mappings.
+ * Setup the vector to irq mappings. Must be called with vector_lock held.
  */
 void setup_vector_irq(int cpu)
 {
 	int irq;
 
+	lockdep_assert_held(&vector_lock);
 	/*
 	 * On most of the platforms, legacy PIC delivers the interrupts on the
 	 * boot cpu. But there are certain platforms where PIC interrupts are
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0bd8c1d..d3010aa 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -171,11 +171,6 @@ static void smp_callin(void)
 	apic_ap_setup();
 
 	/*
-	 * Need to setup vector mappings before we enable interrupts.
-	 */
-	setup_vector_irq(smp_processor_id());
-
-	/*
 	 * Save our processor parameters. Note: this information
 	 * is needed for clock calibration.
 	 */
@@ -239,11 +234,13 @@ static void notrace start_secondary(void *unused)
 	check_tsc_sync_target();
 
 	/*
-	 * We need to hold vector_lock so there the set of online cpus
-	 * does not change while we are assigning vectors to cpus.  Holding
-	 * this lock ensures we don't half assign or remove an irq from a cpu.
+	 * Lock vector_lock and initialize the vectors on this cpu
+	 * before setting the cpu online. We must set it online with
+	 * vector_lock held to prevent a concurrent setup/teardown
+	 * from seeing a half valid vector space.
 	 */
 	lock_vector_lock();
+	setup_vector_irq(smp_processor_id());
 	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
 	cpu_set_state_online(smp_processor_id());

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:x86/urgent] x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
  2015-07-05 17:12 ` [patch 3/4] x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable() Thomas Gleixner
@ 2015-07-07  9:57   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-07-07  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, linux-kernel, hpa, yanmin_zhang, bp, jin.xiao, peterz,
	jroedel, mingo

Commit-ID:  cbb24dc761d95fe39a7a122bb1b298e9604cae15
Gitweb:     http://git.kernel.org/tip/cbb24dc761d95fe39a7a122bb1b298e9604cae15
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 5 Jul 2015 17:12:33 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 7 Jul 2015 11:54:04 +0200

x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()

It's unsafe to examine fields in the irq descriptor w/o holding the
descriptor lock. Add proper locking.

While at it, add a comment why the vector check can run lockless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.236544164@linutronix.de
---
 arch/x86/kernel/irq.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 88b36648..85ca76e 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -347,14 +347,22 @@ int check_irq_vectors_for_cpu_disable(void)
 			if (!desc)
 				continue;
 
+			/*
+			 * Protect against concurrent action removal,
+			 * affinity changes etc.
+			 */
+			raw_spin_lock(&desc->lock);
 			data = irq_desc_get_irq_data(desc);
 			cpumask_copy(&affinity_new, data->affinity);
 			cpumask_clear_cpu(this_cpu, &affinity_new);
 
 			/* Do not count inactive or per-cpu irqs. */
-			if (!irq_has_action(irq) || irqd_is_per_cpu(data))
+			if (!irq_has_action(irq) || irqd_is_per_cpu(data)) {
+				raw_spin_unlock(&desc->lock);
 				continue;
+			}
 
+			raw_spin_unlock(&desc->lock);
 			/*
 			 * A single irq may be mapped to multiple
 			 * cpu's vector_irq[] (for example IOAPIC cluster
@@ -385,6 +393,9 @@ int check_irq_vectors_for_cpu_disable(void)
 		 * vector. If the vector is marked in the used vectors
 		 * bitmap or an irq is assigned to it, we don't count
 		 * it as available.
+		 *
+		 * As this is an inaccurate snapshot anyway, we can do
+		 * this w/o holding vector_lock.
 		 */
 		for (vector = FIRST_EXTERNAL_VECTOR;
 		     vector < first_system_vector; vector++) {

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:x86/urgent] x86/irq: Retrieve irq data after locking irq_desc
  2015-07-05 17:12 ` [patch 4/4] x86/irq: Retrieve irq data after locking irq_desc Thomas Gleixner
@ 2015-07-07  9:58   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-07-07  9:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, linux-kernel, yanmin_zhang, peterz, mingo, jroedel, jin.xiao,
	tglx, hpa

Commit-ID:  09cf92b784fae6109450c5d64f9908066d605249
Gitweb:     http://git.kernel.org/tip/09cf92b784fae6109450c5d64f9908066d605249
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 5 Jul 2015 17:12:35 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 7 Jul 2015 11:54:04 +0200

x86/irq: Retrieve irq data after locking irq_desc

irq_data is protected by irq_desc->lock, so retrieving the irq chip
from irq_data outside the lock is racy vs. a concurrent update. Move
it into the lock held region.

While at it, add a comment why the vector walk does not require
vector_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.331320612@linutronix.de
---
 arch/x86/kernel/irq.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 85ca76e..c7dfe1b 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -497,6 +497,11 @@ void fixup_irqs(void)
 	 */
 	mdelay(1);
 
+	/*
+	 * We can walk the vector array of this cpu without holding
+	 * vector_lock because the cpu is already marked !online, so
+	 * nothing else will touch it.
+	 */
 	for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
 		unsigned int irr;
 
@@ -508,9 +513,9 @@ void fixup_irqs(void)
 			irq = __this_cpu_read(vector_irq[vector]);
 
 			desc = irq_to_desc(irq);
+			raw_spin_lock(&desc->lock);
 			data = irq_desc_get_irq_data(desc);
 			chip = irq_data_get_irq_chip(data);
-			raw_spin_lock(&desc->lock);
 			if (chip->irq_retrigger) {
 				chip->irq_retrigger(data);
 				__this_cpu_write(vector_irq[vector], VECTOR_RETRIGGERED);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:irq/urgent] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
  2015-07-07  9:48   ` [tip:irq/urgent] hotplug: Prevent alloc/free " tip-bot for Thomas Gleixner
@ 2015-07-07 20:06   ` tip-bot for Thomas Gleixner
  2015-07-08  9:37   ` tip-bot for Thomas Gleixner
  2015-07-14 14:39   ` [patch 1/4] hotplug: Prevent alloc/free " Boris Ostrovsky
  3 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-07-07 20:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, hpa, tglx, jroedel, peterz, jin.xiao, linux-kernel,
	yanmin_zhang, mingo

Commit-ID:  bdcbafe3402cb337752c4c8bce3445ee4c5559a5
Gitweb:     http://git.kernel.org/tip/bdcbafe3402cb337752c4c8bce3445ee4c5559a5
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 5 Jul 2015 17:12:30 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 7 Jul 2015 22:03:22 +0200

hotplug: Prevent alloc/free of irq descriptors during cpu up/down

When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.

When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassigns the affinities because
interrupt descriptors might be freed underneath.

Example:

CPU1				CPU2
cpu_up/down
 irq_desc = irq_to_desc(irq);
				remove_from_radix_tree(desc);
 raw_spin_lock(&desc->lock);
				free(desc);

We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kinds of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:

Prevent allocation and freeing of interrupt descriptors across cpu
hotplug.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
---
 kernel/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index fa6dc67..6a37454 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -21,6 +21,7 @@
 #include <linux/suspend.h>
 #include <linux/lockdep.h>
 #include <linux/tick.h>
+#include <linux/irq.h>
 #include <trace/events/power.h>
 
 #include "smpboot.h"

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip:irq/urgent] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
  2015-07-07  9:48   ` [tip:irq/urgent] hotplug: Prevent alloc/free " tip-bot for Thomas Gleixner
  2015-07-07 20:06   ` tip-bot for Thomas Gleixner
@ 2015-07-08  9:37   ` tip-bot for Thomas Gleixner
  2015-07-14 14:39   ` [patch 1/4] hotplug: Prevent alloc/free " Boris Ostrovsky
  3 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Thomas Gleixner @ 2015-07-08  9:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jroedel, linux-kernel, bp, peterz, tglx, jin.xiao, yanmin_zhang,
	hpa, mingo

Commit-ID:  a899418167264c7bac574b1a0f1b2c26c5b0995a
Gitweb:     http://git.kernel.org/tip/a899418167264c7bac574b1a0f1b2c26c5b0995a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sun, 5 Jul 2015 17:12:30 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 8 Jul 2015 11:32:25 +0200

hotplug: Prevent alloc/free of irq descriptors during cpu up/down

When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.

When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassigns the affinities because
interrupt descriptors might be freed underneath.

Example:

CPU1				CPU2
cpu_up/down
 irq_desc = irq_to_desc(irq);
				remove_from_radix_tree(desc);
 raw_spin_lock(&desc->lock);
				free(desc);

We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kinds of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:

Prevent allocation and freeing of interrupt descriptors across cpu
hotplug.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
---
 include/linux/irqdesc.h |  7 ++++++-
 kernel/cpu.c            | 22 +++++++++++++++++++++-
 kernel/irq/internals.h  |  4 ----
 3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 624a668..fcea4e4 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -87,7 +87,12 @@ struct irq_desc {
 	const char		*name;
 } ____cacheline_internodealigned_in_smp;
 
-#ifndef CONFIG_SPARSE_IRQ
+#ifdef CONFIG_SPARSE_IRQ
+extern void irq_lock_sparse(void);
+extern void irq_unlock_sparse(void);
+#else
+static inline void irq_lock_sparse(void) { }
+static inline void irq_unlock_sparse(void) { }
 extern struct irq_desc irq_desc[NR_IRQS];
 #endif
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9c9c9fa..6a37454 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -21,6 +21,7 @@
 #include <linux/suspend.h>
 #include <linux/lockdep.h>
 #include <linux/tick.h>
+#include <linux/irq.h>
 #include <trace/events/power.h>
 
 #include "smpboot.h"
@@ -392,13 +393,19 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	smpboot_park_threads(cpu);
 
 	/*
-	 * So now all preempt/rcu users must observe !cpu_active().
+	 * Prevent irq alloc/free while the dying cpu reorganizes the
+	 * interrupt affinities.
 	 */
+	irq_lock_sparse();
 
+	/*
+	 * So now all preempt/rcu users must observe !cpu_active().
+	 */
 	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
+		irq_unlock_sparse();
 		goto out_release;
 	}
 	BUG_ON(cpu_online(cpu));
@@ -415,6 +422,9 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
 	per_cpu(cpu_dead_idle, cpu) = false;
 
+	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
+	irq_unlock_sparse();
+
 	hotplug_cpu__broadcast_tick_pull(cpu);
 	/* This actually kills the CPU. */
 	__cpu_die(cpu);
@@ -517,8 +527,18 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen)
 		goto out_notify;
 	}
 
+	/*
+	 * Some architectures have to walk the irq descriptors to
+	 * setup the vector space for the cpu which comes online.
+	 * Prevent irq alloc/free across the bringup.
+	 */
+	irq_lock_sparse();
+
 	/* Arch-specific enabling code. */
 	ret = __cpu_up(cpu, idle);
+
+	irq_unlock_sparse();
+
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 4834ee8..61008b8 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -76,12 +76,8 @@ extern void unmask_threaded_irq(struct irq_desc *desc);
 
 #ifdef CONFIG_SPARSE_IRQ
 static inline void irq_mark_irq(unsigned int irq) { }
-extern void irq_lock_sparse(void);
-extern void irq_unlock_sparse(void);
 #else
 extern void irq_mark_irq(unsigned int irq);
-static inline void irq_lock_sparse(void) { }
-static inline void irq_unlock_sparse(void) { }
 #endif
 
 extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
                     ` (2 preceding siblings ...)
  2015-07-08  9:37   ` tip-bot for Thomas Gleixner
@ 2015-07-14 14:39   ` Boris Ostrovsky
  2015-07-14 15:44     ` Thomas Gleixner
  3 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2015-07-14 14:39 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin, Joerg Roedel,
	Borislav Petkov, Yanmin Zhang, xen-devel

On 07/05/2015 01:12 PM, Thomas Gleixner wrote:
> When a cpu goes up some architectures (e.g. x86) have to walk the irq
> space to set up the vector space for the cpu. While this needs extra
> protection at the architecture level we can avoid a few race
> conditions by preventing the concurrent allocation/free of irq
> descriptors and the associated data.
>
> When a cpu goes down it moves the interrupts which are targeted to
> this cpu away by reassigning the affinities. While this happens
> interrupts can be allocated and freed, which opens a can of race
> conditions in the code which reassigns the affinities because
> interrupt descriptors might be freed underneath.
>
> Example:
>
> CPU1				CPU2
> cpu_up/down
>   irq_desc = irq_to_desc(irq);
> 				remove_from_radix_tree(desc);
>   raw_spin_lock(&desc->lock);
> 				free(desc);
>
> We could protect the irq descriptors with RCU, but that would require
> a full tree change of all accesses to interrupt descriptors. But
> fortunately these kinds of race conditions are rather limited to a few
> things like cpu hotplug. The normal setup/teardown is very well
> serialized. So the simpler and obvious solution is:
>
> Prevent allocation and freeing of interrupt descriptors across cpu
> hotplug.


This breaks Xen guests that allocate interrupt descriptors in .cpu_up().

Any chance this locking can be moved into arch code? Otherwise we will 
need to have something like arch_post_cpu_up() after the lock is released.
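
Something along these lines maybe (rough sketch, hook name made up):

/* kernel/cpu.c: hypothetical weak default, overridden by Xen */
int __weak arch_post_cpu_up(unsigned int cpu)
{
	return 0;
}

and in _cpu_up():

	ret = __cpu_up(cpu, idle);
	irq_unlock_sparse();
	if (!ret)
		ret = arch_post_cpu_up(cpu);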

(The patch doesn't appear to have any side effects for the down path 
since Xen guests deallocate descriptors in __cpu_die()).


-boris


>
> Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
> ---
>   include/linux/irqdesc.h |    7 ++++++-
>   kernel/cpu.c            |   21 ++++++++++++++++++++-
>   kernel/irq/internals.h  |    4 ----
>   3 files changed, 26 insertions(+), 6 deletions(-)
>
> Index: tip/include/linux/irqdesc.h
> ===================================================================
> --- tip.orig/include/linux/irqdesc.h
> +++ tip/include/linux/irqdesc.h
> @@ -90,7 +90,12 @@ struct irq_desc {
>   	const char		*name;
>   } ____cacheline_internodealigned_in_smp;
>
> -#ifndef CONFIG_SPARSE_IRQ
> +#ifdef CONFIG_SPARSE_IRQ
> +extern void irq_lock_sparse(void);
> +extern void irq_unlock_sparse(void);
> +#else
> +static inline void irq_lock_sparse(void) { }
> +static inline void irq_unlock_sparse(void) { }
>   extern struct irq_desc irq_desc[NR_IRQS];
>   #endif
>
> Index: tip/kernel/cpu.c
> ===================================================================
> --- tip.orig/kernel/cpu.c
> +++ tip/kernel/cpu.c
> @@ -392,13 +392,19 @@ static int __ref _cpu_down(unsigned int
>   	smpboot_park_threads(cpu);
>
>   	/*
> -	 * So now all preempt/rcu users must observe !cpu_active().
> +	 * Prevent irq alloc/free while the dying cpu reorganizes the
> +	 * interrupt affinities.
>   	 */
> +	irq_lock_sparse();
>
> +	/*
> +	 * So now all preempt/rcu users must observe !cpu_active().
> +	 */
>   	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
>   	if (err) {
>   		/* CPU didn't die: tell everyone.  Can't complain. */
>   		cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
> +		irq_unlock_sparse();
>   		goto out_release;
>   	}
>   	BUG_ON(cpu_online(cpu));
> @@ -415,6 +421,9 @@ static int __ref _cpu_down(unsigned int
>   	smp_mb(); /* Read from cpu_dead_idle before __cpu_die(). */
>   	per_cpu(cpu_dead_idle, cpu) = false;
>
> +	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
> +	irq_unlock_sparse();
> +
>   	hotplug_cpu__broadcast_tick_pull(cpu);
>   	/* This actually kills the CPU. */
>   	__cpu_die(cpu);
> @@ -517,8 +526,18 @@ static int _cpu_up(unsigned int cpu, int
>   		goto out_notify;
>   	}
>
> +	/*
> +	 * Some architectures have to walk the irq descriptors to
> +	 * setup the vector space for the cpu which comes online.
> +	 * Prevent irq alloc/free across the bringup.
> +	 */
> +	irq_lock_sparse();
> +
>   	/* Arch-specific enabling code. */
>   	ret = __cpu_up(cpu, idle);
> +
> +	irq_unlock_sparse();
> +
>   	if (ret != 0)
>   		goto out_notify;
>   	BUG_ON(!cpu_online(cpu));
> Index: tip/kernel/irq/internals.h
> ===================================================================
> --- tip.orig/kernel/irq/internals.h
> +++ tip/kernel/irq/internals.h
> @@ -76,12 +76,8 @@ extern void unmask_threaded_irq(struct i
>
>   #ifdef CONFIG_SPARSE_IRQ
>   static inline void irq_mark_irq(unsigned int irq) { }
> -extern void irq_lock_sparse(void);
> -extern void irq_unlock_sparse(void);
>   #else
>   extern void irq_mark_irq(unsigned int irq);
> -static inline void irq_lock_sparse(void) { }
> -static inline void irq_unlock_sparse(void) { }
>   #endif
>
>   extern void init_kstat_irqs(struct irq_desc *desc, int node, int nr);
>
>
> --


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 14:39   ` [patch 1/4] hotplug: Prevent alloc/free " Boris Ostrovsky
@ 2015-07-14 15:44     ` Thomas Gleixner
  2015-07-14 16:03       ` Boris Ostrovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-14 15:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin,
	Joerg Roedel, Borislav Petkov, Yanmin Zhang, xen-devel

On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> > Prevent allocation and freeing of interrupt descriptors across cpu
> > hotplug.
> 
> 
> This breaks Xen guests that allocate interrupt descriptors in .cpu_up().

And where exactly does XEN allocate those descriptors?
 
> Any chance this locking can be moved into arch code?

No.

> (The patch doesn't appear to have any side effects for the down path since Xen
> guests deallocate descriptors in __cpu_die()).
 
Exact place please.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 15:44     ` Thomas Gleixner
@ 2015-07-14 16:03       ` Boris Ostrovsky
  2015-07-14 17:32         ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2015-07-14 16:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin,
	Joerg Roedel, Borislav Petkov, Yanmin Zhang, xen-devel

On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>>> Prevent allocation and freeing of interrupt descriptors across cpu
>>> hotplug.
>>
>> This breaks Xen guests that allocate interrupt descriptors in .cpu_up().
> And where exactly does XEN allocate those descriptors?

xen_cpu_up()
     xen_setup_timer()
         bind_virq_to_irqhandler()
             bind_virq_to_irq()
                 xen_allocate_irq_dynamic()
                     xen_allocate_irqs_dynamic()
                         irq_alloc_descs()


There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()


>   
>> Any chance this locking can be moved into arch code?
> No.
>
>> (The patch doesn't appear to have any side effects for the down path since Xen
>> guests deallocate descriptors in __cpu_die()).
>   
> Exact place please.

Which place? Where the descriptors are deallocated?

__cpu_die()
     xen_cpu_die()
         xen_teardown_timer()
             unbind_from_irqhandler()
                 unbind_from_irq()
                     __unbind_from_irq()
                         xen_free_irq()
                             irq_free_descs()
                                 free_desc()

-boris


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 16:03       ` Boris Ostrovsky
@ 2015-07-14 17:32         ` Thomas Gleixner
  2015-07-14 20:04           ` [Xen-devel] " Boris Ostrovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-14 17:32 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Peter Anvin, xiao jin,
	Joerg Roedel, Borislav Petkov, Yanmin Zhang, xen-devel

On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
> > On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> > > > Prevent allocation and freeing of interrupt descriptors across cpu
> > > > hotplug.
> > > 
> > > This breaks Xen guests that allocate interrupt descriptors in .cpu_up().
> > And where exactly does XEN allocate those descriptors?
> 
> xen_cpu_up()
>     xen_setup_timer()
>         bind_virq_to_irqhandler()
>             bind_virq_to_irq()
>                 xen_allocate_irq_dynamic()
>                     xen_allocate_irqs_dynamic()
>                         irq_alloc_descs()
> 
> 
> There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()

Sigh.
 
> 
> >   
> > > Any chance this locking can be moved into arch code?
> > No.

The issue here is that all architectures need that protection, and only
Xen does irq allocations in cpu_up().

So moving that protection into architecture code is not really an
option.

> > > Otherwise we will need to have something like arch_post_cpu_up()
> > > after the lock is released.

I'm not sure that this will work. You probably want to do this in the
cpu prepare stage, i.e. before calling __cpu_up().

I have to walk the dogs now. Will look into it later tonight.

> > > (The patch doesn't appear to have any side effects for the down path since
> > > Xen
> > > guests deallocate descriptors in __cpu_die()).
> >   Exact place please.
> 
> Whose place? Where descriptors are deallocated?
> 
> __cpu_die()
>     xen_cpu_die()
>         xen_teardown_timer()
>             unbind_from_irqhandler()
>                 unbind_from_irq()
>                     __unbind_from_irq()
>                         xen_free_irq()
>                             irq_free_descs()
>                                 free_desc()

Right, that's outside the lock held region.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 17:32         ` Thomas Gleixner
@ 2015-07-14 20:04           ` Boris Ostrovsky
  2015-07-14 20:15             ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2015-07-14 20:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yanmin Zhang, Joerg Roedel, Peter Zijlstra, LKML, Ingo Molnar,
	Peter Anvin, xen-devel, Borislav Petkov, xiao jin

On 07/14/2015 01:32 PM, Thomas Gleixner wrote:
> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>> On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
>>> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>>>>> Prevent allocation and freeing of interrupt descriptors across cpu
>>>>> hotplug.
>>>> This breaks Xen guests that allocate interrupt descriptors in .cpu_up().
>>> And where exactly does XEN allocate those descriptors?
>> xen_cpu_up()
>>      xen_setup_timer()
>>          bind_virq_to_irqhandler()
>>              bind_virq_to_irq()
>>                  xen_allocate_irq_dynamic()
>>                      xen_allocate_irqs_dynamic()
>>                          irq_alloc_descs()
>>
>>
>> There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()
> Sigh.
>   
>>>    
>>>> Any chance this locking can be moved into arch code?
>>> No.
> The issue here is that all architectures need that protection, and only
> Xen does irq allocations in cpu_up().
>
> So moving that protection into architecture code is not really an
> option.
>
>>>> Otherwise we will need to have something like arch_post_cpu_up()
>>>> after the lock is released.
> I'm not sure that this will work. You probably want to do this in the
> cpu prepare stage, i.e. before calling __cpu_up().



For PV guests (the ones that use xen_cpu_up()) it will work either 
before or after __cpu_up(). At least my (somewhat limited) testing 
didn't show any problems so far.

However, HVM CPUs use xen_hvm_cpu_up(), and if you read the comments
there you will see that xen_smp_intr_init() needs to be called before
native_cpu_up(), but xen_init_lock_cpu() (which eventually calls
irq_alloc_descs()) needs to be called after.

I think I can split xen_init_lock_cpu() so that the part that needs to 
be called after will avoid going into irq core code. And then the rest 
will go into arch_cpu_prepare().


-boris

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 20:04           ` [Xen-devel] " Boris Ostrovsky
@ 2015-07-14 20:15             ` Thomas Gleixner
  2015-07-14 21:07               ` Boris Ostrovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2015-07-14 20:15 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Yanmin Zhang, Joerg Roedel, Peter Zijlstra, LKML, Ingo Molnar,
	Peter Anvin, xen-devel, Borislav Petkov, xiao jin

On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> On 07/14/2015 01:32 PM, Thomas Gleixner wrote:
> > On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> > > On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
> > > > On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> > > > > > Prevent allocation and freeing of interrupt descriptors across cpu
> > > > > > hotplug.
> > > > > This breaks Xen guests that allocate interrupt descriptors in
> > > > > .cpu_up().
> > > > And where exactly does XEN allocate those descriptors?
> > > xen_cpu_up()
> > >      xen_setup_timer()
> > >          bind_virq_to_irqhandler()
> > >              bind_virq_to_irq()
> > >                  xen_allocate_irq_dynamic()
> > >                      xen_allocate_irqs_dynamic()
> > >                          irq_alloc_descs()
> > > 
> > > 
> > > There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()
> > Sigh.
> >   
> > > >    
> > > > > Any chance this locking can be moved into arch code?
> > > > No.
> > The issue here is that all architectures need that protection, and only
> > Xen does irq allocations in cpu_up().
> > 
> > So moving that protection into architecture code is not really an
> > option.
> > 
> > > > > Otherwise we will need to have something like arch_post_cpu_up()
> > > > > after the lock is released.
> > I'm not sure that this will work. You probably want to do this in the
> > cpu prepare stage, i.e. before calling __cpu_up().
> 
> For PV guests (the ones that use xen_cpu_up()) it will work either before or
> after __cpu_up(). At least my (somewhat limited) testing didn't show any
> problems so far.
> 
> However, HVM CPUs use xen_hvm_cpu_up() and if you read comments there you will
> see that xen_smp_intr_init() needs to be called before native_cpu_up() but
> xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
> called after.
> 
> I think I can split xen_init_lock_cpu() so that the part that needs to be
> called after will avoid going into irq core code. And then the rest will go
> into arch_cpu_prepare().

I think we should revisit this for 4.3. For 4.2 we can do the trivial
variant and move the locking into native_cpu_up(), x86 only. x86 was
the only arch on which such wreckage has been seen in the wild, but we
should have that protection for all archs in the long run.

Patch below should fix the issue.

Thanks,

	tglx
---
commit d4a969314077914a623f3e2c5120cd2ef31aba30
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Jul 14 22:03:57 2015 +0200

    genirq: Revert sparse irq locking around __cpu_up() and move it to x86 for now
    
    Boris reported that the sparse_irq protection around __cpu_up() in the
    generic code causes a regression on Xen. Xen allocates interrupt
    descriptors and other resources in its xen_cpu_up() function, so it
    deadlocks on the sparse_irq_lock.
    
    There is no simple fix for this, and we really should have the
    protection for all architectures, but for now the only solution is to
    move it to x86, where actual wreckage due to the lack of protection
    has been observed.
    
    Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Fixes: a89941816726 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down'
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: xiao jin <jin.xiao@intel.com>
    Cc: Joerg Roedel <jroedel@suse.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
    Cc: xen-devel <xen-devel@lists.xenproject.org>

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d3010aa79daf..b1f3ed9c7a9e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -992,8 +992,17 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	common_cpu_up(cpu, tidle);
 
+	/*
+	 * We have to walk the irq descriptors to setup the vector
+	 * space for the cpu which comes online.  Prevent irq
+	 * alloc/free across the bringup.
+	 */
+	irq_lock_sparse();
+
 	err = do_boot_cpu(apicid, cpu, tidle);
+
 	if (err) {
+		irq_unlock_sparse();
 		pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
 		return -EIO;
 	}
@@ -1011,6 +1020,8 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 		touch_nmi_watchdog();
 	}
 
+	irq_unlock_sparse();
+
 	return 0;
 }
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6a374544d495..5644ec5582b9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -527,18 +527,9 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen)
 		goto out_notify;
 	}
 
-	/*
-	 * Some architectures have to walk the irq descriptors to
-	 * setup the vector space for the cpu which comes online.
-	 * Prevent irq alloc/free across the bringup.
-	 */
-	irq_lock_sparse();
-
 	/* Arch-specific enabling code. */
 	ret = __cpu_up(cpu, idle);
 
-	irq_unlock_sparse();
-
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 20:15             ` Thomas Gleixner
@ 2015-07-14 21:07               ` Boris Ostrovsky
  2016-03-12  9:19                 ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Boris Ostrovsky @ 2015-07-14 21:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yanmin Zhang, Joerg Roedel, Peter Zijlstra, LKML, xiao jin,
	Peter Anvin, xen-devel, Borislav Petkov, Ingo Molnar

On 07/14/2015 04:15 PM, Thomas Gleixner wrote:
> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>> On 07/14/2015 01:32 PM, Thomas Gleixner wrote:
>>> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>>>> On 07/14/2015 11:44 AM, Thomas Gleixner wrote:
>>>>> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>>>>>>> Prevent allocation and freeing of interrupt descriptors across cpu
>>>>>>> hotplug.
>>>>>> This breaks Xen guests that allocate interrupt descriptors in
>>>>>> .cpu_up().
>>>>> And where exactly does XEN allocate those descriptors?
>>>> xen_cpu_up()
>>>>       xen_setup_timer()
>>>>           bind_virq_to_irqhandler()
>>>>               bind_virq_to_irq()
>>>>                   xen_allocate_irq_dynamic()
>>>>                       xen_allocate_irqs_dynamic()
>>>>                           irq_alloc_descs()
>>>>
>>>>
>>>> There is also a similar pass via xen_cpu_up() -> xen_smp_intr_init()
>>> Sigh.
>>>    
>>>>>     
>>>>>> Any chance this locking can be moved into arch code?
>>>>> No.
>>> The issue here is that all architectures need that protection and just
>>> Xen does irq allocations in cpu_up.
>>>
>>> So moving that protection into architecture code is not really an
>>> option.
>>>
>>>>>> Otherwise we will need to have something like arch_post_cpu_up()
>>>>>> after the lock is released.
>>> I'm not sure that this will work. You probably want to do this in the
>>> cpu prepare stage, i.e. before calling __cpu_up().
>> For PV guests (the ones that use xen_cpu_up()) it will work either before or
>> after __cpu_up(). At least my (somewhat limited) testing didn't show any
>> problems so far.
>>
>> However, HVM CPUs use xen_hvm_cpu_up() and if you read comments there you will
>> see that xen_smp_intr_init() needs to be called before native_cpu_up() but
>> xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
>> called after.
>>
>> I think I can split xen_init_lock_cpu() so that the part that needs to be
>> called after will avoid going into irq core code. And then the rest will go
>> into arch_cpu_prepare().
> I think we should revisit this for 4.3. For 4.2 we can do the trivial
> variant and move the locking into native_cpu_up(), x86 only. x86 was
> the only arch on which such wreckage has been seen in the wild, but we
> should have that protection for all archs in the long run.
>
> Patch below should fix the issue.


Thanks! Most of my tests passed; I had a couple of failures, but I will
need to see whether they are related to this patch.

-boris

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2015-07-14 21:07               ` Boris Ostrovsky
@ 2016-03-12  9:19                 ` Thomas Gleixner
  2016-03-14 13:12                   ` Boris Ostrovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2016-03-12  9:19 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Yanmin Zhang, Joerg Roedel, Peter Zijlstra, LKML, xiao jin,
	Peter Anvin, xen-devel, Borislav Petkov, Ingo Molnar

Boris,

On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
> On 07/14/2015 04:15 PM, Thomas Gleixner wrote:
> > > > The issue here is that all architectures need that protection and just
> > > > Xen does irq allocations in cpu_up.
> > > > 
> > > > So moving that protection into architecture code is not really an
> > > > option.
> > > > 
> > > > > > > Otherwise we will need to have something like arch_post_cpu_up()
> > > > > > > after the lock is released.
> > > > I'm not sure that this will work. You probably want to do this in the
> > > > cpu prepare stage, i.e. before calling __cpu_up().
> > > For PV guests (the ones that use xen_cpu_up()) it will work either before
> > > or
> > > after __cpu_up(). At least my (somewhat limited) testing didn't show any
> > > problems so far.
> > > 
> > > However, HVM CPUs use xen_hvm_cpu_up() and if you read comments there you
> > > will
> > > see that xen_smp_intr_init() needs to be called before native_cpu_up() but
> > > xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
> > > called after.
> > > 
> > > I think I can split xen_init_lock_cpu() so that the part that needs to be
> > > called after will avoid going into irq core code. And then the rest will
> > > go
> > > into arch_cpu_prepare().
> > I think we should revisit this for 4.3. For 4.2 we can do the trivial
> > variant and move the locking into native_cpu_up(), x86 only. x86 was
> > the only arch on which such wreckage has been seen in the wild, but we
> > should have that protection for all archs in the long run.
> > 
> > Patch below should fix the issue.
> 
> Thanks! Most of my tests passed; I had a couple of failures, but I will
> need to see whether they are related to this patch.

Did you ever get around to addressing that irq allocation from within cpu_up()?

I really want to generalize the protection instead of carrying that x86-only
hack forever.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down
  2016-03-12  9:19                 ` Thomas Gleixner
@ 2016-03-14 13:12                   ` Boris Ostrovsky
  0 siblings, 0 replies; 20+ messages in thread
From: Boris Ostrovsky @ 2016-03-14 13:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yanmin Zhang, Joerg Roedel, Peter Zijlstra, LKML, xiao jin,
	Peter Anvin, xen-devel, Borislav Petkov, Ingo Molnar

On 03/12/2016 04:19 AM, Thomas Gleixner wrote:
> Boris,
>
> On Tue, 14 Jul 2015, Boris Ostrovsky wrote:
>> On 07/14/2015 04:15 PM, Thomas Gleixner wrote:
>>>>> The issue here is that all architectures need that protection and just
>>>>> Xen does irq allocations in cpu_up.
>>>>>
>>>>> So moving that protection into architecture code is not really an
>>>>> option.
>>>>>
>>>>>>>> Otherwise we will need to have something like arch_post_cpu_up()
>>>>>>>> after the lock is released.
>>>>> I'm not sure that this will work. You probably want to do this in the
>>>>> cpu prepare stage, i.e. before calling __cpu_up().
>>>> For PV guests (the ones that use xen_cpu_up()) it will work either before
>>>> or
>>>> after __cpu_up(). At least my (somewhat limited) testing didn't show any
>>>> problems so far.
>>>>
>>>> However, HVM CPUs use xen_hvm_cpu_up() and if you read comments there you
>>>> will
>>>> see that xen_smp_intr_init() needs to be called before native_cpu_up() but
>>>> xen_init_lock_cpu() (which eventually calls irq_alloc_descs()) needs to be
>>>> called after.
>>>>
>>>> I think I can split xen_init_lock_cpu() so that the part that needs to be
>>>> called after will avoid going into irq core code. And then the rest will
>>>> go
>>>> into arch_cpu_prepare().
>>> I think we should revisit this for 4.3. For 4.2 we can do the trivial
>>> variant and move the locking into native_cpu_up(), x86 only. x86 was
>>> the only arch on which such wreckage has been seen in the wild, but we
>>> should have that protection for all archs in the long run.
>>>
>>> Patch below should fix the issue.
>> Thanks! Most of my tests passed; I had a couple of failures, but I will
>> need to see whether they are related to this patch.
> Did you ever get around to addressing that irq allocation from within cpu_up()?
>
> I really want to generalize the protection instead of carrying that x86-only
> hack forever.

Sorry, I completely forgot about this. Let me see how I can move the
allocations out from under the lock. I might just be able to put them in
CPU notifiers: most into CPU_UP_PREPARE, but the spinlock interrupt may
need to go into CPU_ONLINE.
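
As a rough, untested sketch with the cpu notifier API;
xen_alloc_cpu_irqs() and xen_init_spinlock_irq() are placeholders for
the actual allocation paths, not existing functions:

#include <linux/cpu.h>
#include <linux/notifier.h>

static int xen_hotplug_notify(struct notifier_block *self,
			      unsigned long action, void *hcpu)
{
	int cpu = (long)hcpu;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:
		/* Runs before __cpu_up(), i.e. outside the sparse irq
		 * lock, so irq_alloc_descs() is safe from here. */
		if (xen_alloc_cpu_irqs(cpu))
			return NOTIFY_BAD;
		break;
	case CPU_ONLINE:
		/* The spinlock interrupt can only be set up once the
		 * cpu is actually online. */
		xen_init_spinlock_irq(cpu);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block xen_hotplug_nb = {
	.notifier_call = xen_hotplug_notify,
};

Registering it early with register_cpu_notifier(&xen_hotplug_nb) should
be enough to get the ordering right.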

-boris

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread

Thread overview: 20+ messages
2015-07-05 17:12 [patch 0/4] x86/irq: Plug a couple of cpu hotplug races Thomas Gleixner
2015-07-05 17:12 ` [patch 1/4] hotplug: Prevent alloc/free of irq descriptors during cpu up/down Thomas Gleixner
2015-07-07  9:48   ` [tip:irq/urgent] hotplug: Prevent alloc/ free " tip-bot for Thomas Gleixner
2015-07-07 20:06   ` tip-bot for Thomas Gleixner
2015-07-08  9:37   ` tip-bot for Thomas Gleixner
2015-07-14 14:39   ` [patch 1/4] hotplug: Prevent alloc/free " Boris Ostrovsky
2015-07-14 15:44     ` Thomas Gleixner
2015-07-14 16:03       ` Boris Ostrovsky
2015-07-14 17:32         ` Thomas Gleixner
2015-07-14 20:04           ` [Xen-devel] " Boris Ostrovsky
2015-07-14 20:15             ` Thomas Gleixner
2015-07-14 21:07               ` Boris Ostrovsky
2016-03-12  9:19                 ` Thomas Gleixner
2016-03-14 13:12                   ` Boris Ostrovsky
2015-07-05 17:12 ` [patch 2/4] x86: Plug irq vector hotplug race Thomas Gleixner
2015-07-07  9:57   ` [tip:x86/urgent] x86/irq: " tip-bot for Thomas Gleixner
2015-07-05 17:12 ` [patch 3/4] x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable() Thomas Gleixner
2015-07-07  9:57   ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
2015-07-05 17:12 ` [patch 4/4] x86/irq: Retrieve irq data after locking irq_desc Thomas Gleixner
2015-07-07  9:58   ` [tip:x86/urgent] " tip-bot for Thomas Gleixner
