linux-kernel.vger.kernel.org archive mirror
* [patch 0/5] x86/irq: Cure various interrupt issues
@ 2019-06-25 11:13 Thomas Gleixner
  2019-06-25 11:13 ` [patch 1/5] genirq: Delay deactivation in free_irq() Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-25 11:13 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier

This series addresses a few long-standing issues:

  1) The spurious interrupt warning which is emitted occasionally for
     no obvious reason. Mostly harmless, but annoying

  2) The spurious system vector detection which got wrecked quite some
     time ago and can completely wedge a machine. A preliminary version
     was posted yesterday; this one is actually verified to do what it
     claims to do.

Details in the various patches.

For your convenience the series is available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.irq

Thanks,

	tglx

8<-----------------
 arch/x86/entry/entry_32.S      |   24 ++++++++++++++++++++++++
 arch/x86/entry/entry_64.S      |   30 ++++++++++++++++++++++++++----
 arch/x86/include/asm/hw_irq.h  |    5 ++++-
 arch/x86/kernel/apic/apic.c    |   33 ++++++++++++++++++++++-----------
 arch/x86/kernel/apic/io_apic.c |   39 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/apic/vector.c  |    4 ++--
 arch/x86/kernel/idt.c          |    3 ++-
 arch/x86/kernel/irq.c          |    2 +-
 include/linux/irq.h            |    2 ++
 kernel/irq/autoprobe.c         |    6 +++---
 kernel/irq/chip.c              |    6 ++++++
 kernel/irq/cpuhotplug.c        |    2 +-
 kernel/irq/internals.h         |    1 +
 kernel/irq/manage.c            |   41 +++++++++++++++++++++++++++++++++++------
 14 files changed, 168 insertions(+), 30 deletions(-)



* [patch 1/5] genirq: Delay deactivation in free_irq()
  2019-06-25 11:13 [patch 0/5] x86/irq: Cure various interrupt issues Thomas Gleixner
@ 2019-06-25 11:13 ` Thomas Gleixner
  2019-06-28  7:42   ` Marc Zyngier
  2019-06-25 11:13 ` [patch 2/5] genirq: Add optional hardware synchronization for shutdown Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-25 11:13 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier

When interrupts are shut down, they are immediately deactivated in the
irqdomain hierarchy. While this looks obviously correct, there is a subtle
issue:

There might be an interrupt in flight when free_irq() is invoking the
shutdown. This is properly handled at the irq descriptor / primary handler
level, but the deactivation might completely disable resources which are
required to acknowledge the interrupt.

Split the shutdown code and deactivate the interrupt after synchronization
in free_irq(). Fix up all other call sites, where this is not an issue, to
invoke the combined shutdown_and_deactivate() function instead.

This might still be an issue if the servicing of the in-flight interrupt is
delayed on a remote CPU beyond the invocation of synchronize_irq(), but
that cannot be handled at this level and needs to be handled in the
synchronize_irq() context.
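
In effect the ordering in __free_irq() becomes (simplified sketch; locking,
bookkeeping and the threaded handler parts are omitted):

	/* Last action gone: mask and disable, but keep the domain active */
	irq_shutdown(desc);
	...
	/* Wait for handlers which are already running on other CPUs */
	synchronize_hardirq(irq);
	...
	/* Now it is safe to tear down the domain mapping and the resources */
	irq_domain_deactivate_irq(&desc->irq_data);
	irq_release_resources(desc);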

Fixes: f8264e34965a ("irqdomain: Introduce new interfaces to support hierarchy irqdomains")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/autoprobe.c  |    6 +++---
 kernel/irq/chip.c       |    6 ++++++
 kernel/irq/cpuhotplug.c |    2 +-
 kernel/irq/internals.h  |    1 +
 kernel/irq/manage.c     |   12 +++++++++++-
 5 files changed, 22 insertions(+), 5 deletions(-)

--- a/kernel/irq/autoprobe.c
+++ b/kernel/irq/autoprobe.c
@@ -90,7 +90,7 @@ unsigned long probe_irq_on(void)
 			/* It triggered already - consider it spurious. */
 			if (!(desc->istate & IRQS_WAITING)) {
 				desc->istate &= ~IRQS_AUTODETECT;
-				irq_shutdown(desc);
+				irq_shutdown_and_deactivate(desc);
 			} else
 				if (i < 32)
 					mask |= 1 << i;
@@ -127,7 +127,7 @@ unsigned int probe_irq_mask(unsigned lon
 				mask |= 1 << i;
 
 			desc->istate &= ~IRQS_AUTODETECT;
-			irq_shutdown(desc);
+			irq_shutdown_and_deactivate(desc);
 		}
 		raw_spin_unlock_irq(&desc->lock);
 	}
@@ -169,7 +169,7 @@ int probe_irq_off(unsigned long val)
 				nr_of_irqs++;
 			}
 			desc->istate &= ~IRQS_AUTODETECT;
-			irq_shutdown(desc);
+			irq_shutdown_and_deactivate(desc);
 		}
 		raw_spin_unlock_irq(&desc->lock);
 	}
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -314,6 +314,12 @@ void irq_shutdown(struct irq_desc *desc)
 		}
 		irq_state_clr_started(desc);
 	}
+}
+
+
+void irq_shutdown_and_deactivate(struct irq_desc *desc)
+{
+	irq_shutdown(desc);
 	/*
 	 * This must be called even if the interrupt was never started up,
 	 * because the activation can happen before the interrupt is
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -116,7 +116,7 @@ static bool migrate_one_irq(struct irq_d
 		 */
 		if (irqd_affinity_is_managed(d)) {
 			irqd_set_managed_shutdown(d);
-			irq_shutdown(desc);
+			irq_shutdown_and_deactivate(desc);
 			return false;
 		}
 		affinity = cpu_online_mask;
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -82,6 +82,7 @@ extern int irq_activate_and_startup(stru
 extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
 
 extern void irq_shutdown(struct irq_desc *desc);
+extern void irq_shutdown_and_deactivate(struct irq_desc *desc);
 extern void irq_enable(struct irq_desc *desc);
 extern void irq_disable(struct irq_desc *desc);
 extern void irq_percpu_enable(struct irq_desc *desc, unsigned int cpu);
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/interrupt.h>
+#include <linux/irqdomain.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/rt.h>
@@ -1699,6 +1700,7 @@ static struct irqaction *__free_irq(stru
 	/* If this was the last handler, shut down the IRQ line: */
 	if (!desc->action) {
 		irq_settings_clr_disable_unlazy(desc);
+		/* Only shutdown. Deactivate after synchronize_irq() */
 		irq_shutdown(desc);
 	}
 
@@ -1768,6 +1770,14 @@ static struct irqaction *__free_irq(stru
 		 * require it to deallocate resources over the slow bus.
 		 */
 		chip_bus_lock(desc);
+		/*
+		 * There is no interrupt on the fly anymore. Deactivate it
+		 * completely.
+		 */
+		raw_spin_lock_irqsave(&desc->lock, flags);
+		irq_domain_deactivate_irq(&desc->irq_data);
+		raw_spin_unlock_irqrestore(&desc->lock, flags);
+
 		irq_release_resources(desc);
 		chip_bus_sync_unlock(desc);
 		irq_remove_timings(desc);
@@ -1855,7 +1865,7 @@ static const void *__cleanup_nmi(unsigne
 	}
 
 	irq_settings_clr_disable_unlazy(desc);
-	irq_shutdown(desc);
+	irq_shutdown_and_deactivate(desc);
 
 	irq_release_resources(desc);
 




* [patch 2/5] genirq: Add optional hardware synchronization for shutdown
  2019-06-25 11:13 [patch 0/5] x86/irq: Cure various interrupt issues Thomas Gleixner
  2019-06-25 11:13 ` [patch 1/5] genirq: Delay deactivation in free_irq() Thomas Gleixner
@ 2019-06-25 11:13 ` Thomas Gleixner
  2019-06-26 13:03   ` Thomas Gleixner
  2019-06-28  7:59   ` Marc Zyngier
  2019-06-25 11:13 ` [patch 3/5] x86/ioapic: Implement irq_inflight() callback Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-25 11:13 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier

free_irq() ensures that no hardware interrupt handler is executing on a
different CPU before actually releasing resources and deactivating the
interrupt completely in a domain hierarchy.

But that does not catch the case where the interrupt is in flight at the
hardware level but not yet serviced by the target CPU. That creates an
interesting race condition:

   CPU 0                  CPU 1               IRQ CHIP

                                              interrupt is raised
                                              sent to CPU1
			  Unable to handle
			  immediately
			  (interrupts off,
			   deep idle delay)
   mask()
   ...
   free()
     shutdown()
     synchronize_irq()
     release_resources()
                          do_IRQ()
                            -> resources are not available

That might be harmless and just trigger a spurious interrupt warning, but
some interrupt chips might get into a wedged state.

Provide infrastructure for interrupt chips to provide an optional
irq_inflight() callback and use it for the synchronization in free_irq().

synchronize_[hard]irq() are not using this mechanism as it might actually
deadlock under certain conditions.
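
A chip driver which can observe that state could wire the callback up along
these lines (hypothetical sketch with made-up foo_* names; the real IO-APIC
implementation follows in the next patch):

	static int foo_irq_inflight(struct irq_data *d)
	{
		/*
		 * Non-zero when the hardware has forwarded the interrupt
		 * to the target CPU, but it was not acknowledged yet.
		 */
		return foo_read_pending(irqd_to_hwirq(d));
	}

	static struct irq_chip foo_chip = {
		.name		= "FOO",
		.irq_mask	= foo_mask,
		.irq_unmask	= foo_unmask,
		.irq_inflight	= foo_irq_inflight,
	};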

Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irq.h |    2 ++
 kernel/irq/manage.c |   29 ++++++++++++++++++++++++-----
 2 files changed, 26 insertions(+), 5 deletions(-)

--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -418,6 +418,7 @@ static inline irq_hw_number_t irqd_to_hw
  *			required. This is used for CPU hotplug where the
  *			target CPU is not yet set in the cpu_online_mask.
  * @irq_retrigger:	resend an IRQ to the CPU
+ * @irq_inflight:	chip level detection of interrupts in flight (optional)
  * @irq_set_type:	set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
  * @irq_set_wake:	enable/disable power-management wake-on of an IRQ
  * @irq_bus_lock:	function to lock access to slow bus (i2c) chips
@@ -462,6 +463,7 @@ struct irq_chip {
 
 	int		(*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
 	int		(*irq_retrigger)(struct irq_data *data);
+	int		(*irq_inflight)(struct irq_data *data);
 	int		(*irq_set_type)(struct irq_data *data, unsigned int flow_type);
 	int		(*irq_set_wake)(struct irq_data *data, unsigned int on);
 
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -35,8 +35,10 @@ static int __init setup_forced_irqthread
 early_param("threadirqs", setup_forced_irqthreads);
 #endif
 
-static void __synchronize_hardirq(struct irq_desc *desc)
+static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
 {
+	struct irq_data *irqd = irq_desc_get_irq_data(desc);
+	struct irq_chip *chip = irq_data_get_irq_chip(irqd);
 	bool inprogress;
 
 	do {
@@ -52,6 +54,13 @@ static void __synchronize_hardirq(struct
 		/* Ok, that indicated we're done: double-check carefully. */
 		raw_spin_lock_irqsave(&desc->lock, flags);
 		inprogress = irqd_irq_inprogress(&desc->irq_data);
+
+		/*
+		 * If requested and supported, check at the chip whether it
+		 * is in flight at the hardware level:
+		 */
+		if (!inprogress && sync_chip && chip && chip->irq_inflight)
+			inprogress = chip->irq_inflight(irqd);
 		raw_spin_unlock_irqrestore(&desc->lock, flags);
 
 		/* Oops, that failed? */
@@ -74,13 +83,16 @@ static void __synchronize_hardirq(struct
  *	Returns: false if a threaded handler is active.
  *
  *	This function may be called - with care - from IRQ context.
+ *
+ *	It does not check whether there is an interrupt on flight at the
+ *	hardware level, but not serviced yet, as this might deadlock.
  */
 bool synchronize_hardirq(unsigned int irq)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
 
 	if (desc) {
-		__synchronize_hardirq(desc);
+		__synchronize_hardirq(desc, false);
 		return !atomic_read(&desc->threads_active);
 	}
 
@@ -97,13 +109,16 @@ EXPORT_SYMBOL(synchronize_hardirq);
  *	holding a resource the IRQ handler may need you will deadlock.
  *
  *	This function may be called - with care - from IRQ context.
+ *
+ *	It does not check whether there is an interrupt on flight at the
+ *	hardware level, but not serviced yet, as this might deadlock.
  */
 void synchronize_irq(unsigned int irq)
 {
 	struct irq_desc *desc = irq_to_desc(irq);
 
 	if (desc) {
-		__synchronize_hardirq(desc);
+		__synchronize_hardirq(desc, false);
 		/*
 		 * We made sure that no hardirq handler is
 		 * running. Now verify that no threaded handlers are
@@ -1729,8 +1744,12 @@ static struct irqaction *__free_irq(stru
 
 	unregister_handler_proc(irq, action);
 
-	/* Make sure it's not being used on another CPU: */
-	synchronize_hardirq(irq);
+	/*
+	 * Make sure it's not being used on another CPU and if the chip
+	 * supports it also make sure that there is no (not yet serviced)
+	 * interrupt on flight at the hardware level.
+	 */
+	__synchronize_hardirq(desc, true);
 
 #ifdef CONFIG_DEBUG_SHIRQ
 	/*




* [patch 3/5] x86/ioapic: Implement irq_inflight() callback
  2019-06-25 11:13 [patch 0/5] x86/irq: Cure various interrupt issues Thomas Gleixner
  2019-06-25 11:13 ` [patch 1/5] genirq: Delay deactivation in free_irq() Thomas Gleixner
  2019-06-25 11:13 ` [patch 2/5] genirq: Add optional hardware synchronization for shutdown Thomas Gleixner
@ 2019-06-25 11:13 ` Thomas Gleixner
  2019-06-25 11:13 ` [patch 4/5] x86/irq: Handle spurious interrupt after shutdown gracefully Thomas Gleixner
  2019-06-25 11:13 ` [patch 5/5] x86/irq: Separate unused system vectors from spurious entry again Thomas Gleixner
  4 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-25 11:13 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier

When an interrupt is shut down in free_irq() there might be an in-flight
interrupt, not yet serviced, pending in the IO-APIC remote IRR. That
means the interrupt has been sent to the target CPU's local APIC, but the
target CPU is in a state which delays the servicing.

So free_irq() would proceed to free resources and to clear the vector
because synchronize_hardirq() does not see an interrupt handler in
progress.

That can trigger a spurious interrupt warning, which is harmless and just
confuses users, but it also can leave the remote IRR in a stale state
because once the handler is invoked the interrupt resources might be freed
already and therefore acknowledgement is not possible anymore.

Implement the new irq_inflight() callback for the IO-APIC irq chip. The
callback is invoked from free_irq() via __synchronize_hardirq(). Check the
remote IRR bit of the interrupt and return 'in flight' if it is set and the
interrupt is configured in level mode. For edge mode the remote IRR has no
meaning.

As this is only meaningful for level triggered interrupts this won't cure
the potential spurious interrupt warning for edge triggered interrupts, but
the edge trigger case does not result in stale hardware state. This has to
be addressed separately at the vector/interrupt entry level.

Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/io_apic.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1893,6 +1893,43 @@ static int ioapic_set_affinity(struct ir
 	return ret;
 }
 
+/*
+ * Interrupt shutdown masks the ioapic pin, but the interrupt might already
+ * be in flight, but not yet serviced by the target CPU. That means
+ * __synchronize_hardirq() would return and claim that everything is calmed
+ * down. So free_irq() would proceed and deactivate the interrupt and free
+ * resources.
+ *
+ * Once the target CPU comes around to service it, it will find a cleared
+ * vector and complain. While the spurious interrupt is harmless, the full
+ * release of resources might prevent the interrupt from being acknowledged
+ * which keeps the hardware in a weird state.
+ *
+ * Verify that the corresponding Remote-IRR bits are clear.
+ */
+static int ioapic_irq_inflight(struct irq_data *irqd)
+{
+	struct mp_chip_data *mcd = irqd->chip_data;
+	struct IO_APIC_route_entry rentry;
+	struct irq_pin_list *p;
+	int ret = 0;
+
+	raw_spin_lock(&ioapic_lock);
+	for_each_irq_pin(p, mcd->irq_2_pin) {
+		rentry = __ioapic_read_entry(p->apic, p->pin);
+		/*
+		 * The remote IRR is only valid in level trigger mode. Its
+		 * meaning is undefined for edge triggered interrupts.
+		 */
+		if (rentry.irr && rentry.trigger) {
+			ret = 1;
+			break;
+		}
+	}
+	raw_spin_unlock(&ioapic_lock);
+	return ret;
+}
+
 static struct irq_chip ioapic_chip __read_mostly = {
 	.name			= "IO-APIC",
 	.irq_startup		= startup_ioapic_irq,
@@ -1902,6 +1939,7 @@ static struct irq_chip ioapic_chip __rea
 	.irq_eoi		= ioapic_ack_level,
 	.irq_set_affinity	= ioapic_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_inflight		= ioapic_irq_inflight,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -1914,6 +1952,7 @@ static struct irq_chip ioapic_ir_chip __
 	.irq_eoi		= ioapic_ir_ack_level,
 	.irq_set_affinity	= ioapic_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_inflight		= ioapic_irq_inflight,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 




* [patch 4/5] x86/irq: Handle spurious interrupt after shutdown gracefully
  2019-06-25 11:13 [patch 0/5] x86/irq: Cure various interrupt issues Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-06-25 11:13 ` [patch 3/5] x86/ioapic: Implement irq_inflight() callback Thomas Gleixner
@ 2019-06-25 11:13 ` Thomas Gleixner
  2019-06-25 11:13 ` [patch 5/5] x86/irq: Separate unused system vectors from spurious entry again Thomas Gleixner
  4 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-25 11:13 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier

Since the rework of the vector management, warnings about spurious
interrupts have been reported. Robert provided some more information and
did an initial analysis. The following situation leads to these warnings:

   CPU 0                  CPU 1               IO_APIC

                                              interrupt is raised
                                              sent to CPU1
			  Unable to handle
			  immediately
			  (interrupts off,
			   deep idle delay)
   mask()
   ...
   free()
     shutdown()
     synchronize_irq()
     clear_vector()
                          do_IRQ()
                            -> vector is clear

Before the rework the vector entries of legacy interrupts were statically
assigned and occupied precious vector space while most of them were
unused. Due to that the above situation was handled silently: the vector
entry still pointed to the interrupt descriptor, so the late interrupt was
delivered to its core handler, which noticed that the interrupt was shut
down and returned.

While this has usually been observed with legacy interrupts, the situation
is not limited to them. Any other interrupt source, e.g. MSI, can cause the
same issue.

After adding proper synchronization for level triggered interrupts, this
can only happen for edge triggered interrupts where the IO-APIC obviously
cannot provide information about interrupts in flight.

While the spurious warning is actually harmless in this case, it worries
users and driver developers.

Handle it gracefully by marking the vector entry as VECTOR_SHUTDOWN instead
of VECTOR_UNUSED when the vector is freed up.

If the above late handling happens, the spurious detector will not complain
and the entry is switched to VECTOR_UNUSED. Any subsequent spurious interrupt
on that vector will trigger the spurious warning as before.
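
With that the spurious path in do_IRQ() behaves roughly like this
(simplified sketch of the surrounding code; only the condition is changed
by the patch below):

	desc = __this_cpu_read(vector_irq[vector]);
	if (!handle_irq(desc, regs)) {
		ack_APIC_irq();

		if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN) {
			pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
					     __func__, smp_processor_id(),
					     vector);
		} else {
			/* Consume silently and re-arm the spurious detection */
			__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
		}
	}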

Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/hw_irq.h |    3 ++-
 arch/x86/kernel/apic/vector.c |    4 ++--
 arch/x86/kernel/irq.c         |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -151,7 +151,8 @@ extern char irq_entries_start[];
 #endif
 
 #define VECTOR_UNUSED		NULL
-#define VECTOR_RETRIGGERED	((void *)~0UL)
+#define VECTOR_SHUTDOWN		((void *)~0UL)
+#define VECTOR_RETRIGGERED	((void *)~1UL)
 
 typedef struct irq_desc* vector_irq_t[NR_VECTORS];
 DECLARE_PER_CPU(vector_irq_t, vector_irq);
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -340,7 +340,7 @@ static void clear_irq_vector(struct irq_
 	trace_vector_clear(irqd->irq, vector, apicd->cpu, apicd->prev_vector,
 			   apicd->prev_cpu);
 
-	per_cpu(vector_irq, apicd->cpu)[vector] = VECTOR_UNUSED;
+	per_cpu(vector_irq, apicd->cpu)[vector] = VECTOR_SHUTDOWN;
 	irq_matrix_free(vector_matrix, apicd->cpu, vector, managed);
 	apicd->vector = 0;
 
@@ -349,7 +349,7 @@ static void clear_irq_vector(struct irq_
 	if (!vector)
 		return;
 
-	per_cpu(vector_irq, apicd->prev_cpu)[vector] = VECTOR_UNUSED;
+	per_cpu(vector_irq, apicd->prev_cpu)[vector] = VECTOR_SHUTDOWN;
 	irq_matrix_free(vector_matrix, apicd->prev_cpu, vector, managed);
 	apicd->prev_vector = 0;
 	apicd->move_in_progress = 0;
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -247,7 +247,7 @@ u64 arch_irq_stat(void)
 	if (!handle_irq(desc, regs)) {
 		ack_APIC_irq();
 
-		if (desc != VECTOR_RETRIGGERED) {
+		if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN) {
 			pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
 					     __func__, smp_processor_id(),
 					     vector);




* [patch 5/5] x86/irq: Separate unused system vectors from spurious entry again
  2019-06-25 11:13 [patch 0/5] x86/irq: Cure various interrupt issues Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-06-25 11:13 ` [patch 4/5] x86/irq: Handle spurious interrupt after shutdown gracefully Thomas Gleixner
@ 2019-06-25 11:13 ` Thomas Gleixner
  4 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-25 11:13 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier, Jan Kiszka,
	Jan Beulich

Quite some time ago the interrupt entry stubs for unused vectors in the
system vector range got removed and directly mapped to the spurious
interrupt vector entry point.

Sounds reasonable, but it's subtly broken. The spurious interrupt vector
entry point pushes vector number 0xFF on the stack which makes the whole
logic in __smp_spurious_interrupt() pointless.

As a consequence any spurious interrupt which comes from a vector != 0xFF
is treated as a real spurious interrupt (vector 0xFF) and not
acknowledged. That subsequently stalls all interrupt vectors of equal and
lower priority, which brings the system to a grinding halt.

This can happen because even on 64-bit the system vector space is not
guaranteed to be fully populated. A full compile time handling of the
unused vectors is not possible because quite a few of them are conditionally
populated at runtime.

Bring the entry stubs back, which wastes 160 bytes if all stubs are unused,
but gains the proper handling back. There is no point in selectively sparing
some of the stubs which are known to be unused at compile time, as the
required code in the IDT management would be way larger and convoluted.

Do not route the spurious entries through common_interrupt and do_IRQ() as
the original code did. Route them to smp_spurious_interrupt(), which
evaluates the vector number and acts accordingly now that the real vector
numbers are handed in.
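
The stubs use the same vector encoding as the device interrupt stubs: each
stub pushes (~vector + 0x80), which always fits into a signed byte, and the
common entry code undoes the +0x80 offset before handing control to C. As a
worked example for vector 0xed: the stub pushes (~0xed + 0x80) = -110; the
entry code adds -0x80, leaving ~0xed (i.e. -238) in the orig_ax slot, from
which the handler recovers 0xed by complementing the value again, just like
the existing entry code does for device vectors.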

Fix up the pr_warn() so the actual spurious vector (0xff) is clearly
distinguished from the other vectors and also note for the vectored case
whether it was pending in the ISR or not.

 "Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen."
 "Spurious interrupt vector 0xed on CPU#1. Acked."
 "Spurious interrupt vector 0xee on CPU#1. Not pending!."

Fixes: 2414e021ac8d ("x86: Avoid building unused IRQ entry stubs")
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jan Beulich <jbeulich@suse.com>
---
 arch/x86/entry/entry_32.S     |   24 ++++++++++++++++++++++++
 arch/x86/entry/entry_64.S     |   30 ++++++++++++++++++++++++++----
 arch/x86/include/asm/hw_irq.h |    2 ++
 arch/x86/kernel/apic/apic.c   |   33 ++++++++++++++++++++++-----------
 arch/x86/kernel/idt.c         |    3 ++-
 5 files changed, 76 insertions(+), 16 deletions(-)

--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1104,6 +1104,30 @@ ENTRY(irq_entries_start)
     .endr
 END(irq_entries_start)
 
+#ifdef CONFIG_X86_LOCAL_APIC
+	.align 8
+ENTRY(spurious_entries_start)
+    vector=FIRST_SYSTEM_VECTOR
+    .rept (NR_VECTORS - FIRST_SYSTEM_VECTOR)
+	pushl	$(~vector+0x80)			/* Note: always in signed byte range */
+    vector=vector+1
+	jmp	common_spurious
+	.align	8
+    .endr
+END(spurious_entries_start)
+
+common_spurious:
+	ASM_CLAC
+	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
+	SAVE_ALL switch_stacks=1
+	ENCODE_FRAME_POINTER
+	TRACE_IRQS_OFF
+	movl	%esp, %eax
+	call	smp_spurious_interrupt
+	jmp	ret_from_intr
+ENDPROC(common_spurious)
+#endif
+
 /*
  * the CPU automatically disables interrupts when executing an IRQ vector,
  * so IRQ-flags tracing has to follow that:
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -375,6 +375,18 @@ ENTRY(irq_entries_start)
     .endr
 END(irq_entries_start)
 
+	.align 8
+ENTRY(spurious_entries_start)
+    vector=FIRST_SYSTEM_VECTOR
+    .rept (NR_VECTORS - FIRST_SYSTEM_VECTOR)
+	UNWIND_HINT_IRET_REGS
+	pushq	$(~vector+0x80)			/* Note: always in signed byte range */
+	jmp	common_spurious
+	.align	8
+	vector=vector+1
+    .endr
+END(spurious_entries_start)
+
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
 #ifdef CONFIG_DEBUG_ENTRY
 	pushq %rax
@@ -571,10 +583,20 @@ END(interrupt_entry)
 
 /* Interrupt entry/exit. */
 
-	/*
-	 * The interrupt stubs push (~vector+0x80) onto the stack and
-	 * then jump to common_interrupt.
-	 */
+/*
+ * The interrupt stubs push (~vector+0x80) onto the stack and
+ * then jump to common_spurious/interrupt.
+ */
+common_spurious:
+	addq	$-0x80, (%rsp)			/* Adjust vector to [-256, -1] range */
+	call	interrupt_entry
+	UNWIND_HINT_REGS indirect=1
+	call	smp_spurious_interrupt		/* rdi points to pt_regs */
+	jmp	ret_from_intr
+END(common_spurious)
+_ASM_NOKPROBE(common_spurious)
+
+/* common_interrupt is a hotpath. Align it */
 	.p2align CONFIG_X86_L1_CACHE_SHIFT
 common_interrupt:
 	addq	$-0x80, (%rsp)			/* Adjust vector to [-256, -1] range */
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -150,6 +150,8 @@ extern char irq_entries_start[];
 #define trace_irq_entries_start irq_entries_start
 #endif
 
+extern char spurious_entries_start[];
+
 #define VECTOR_UNUSED		NULL
 #define VECTOR_SHUTDOWN		((void *)~0UL)
 #define VECTOR_RETRIGGERED	((void *)~1UL)
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2040,21 +2040,32 @@ void __init register_lapic_address(unsig
 	entering_irq();
 	trace_spurious_apic_entry(vector);
 
+	inc_irq_stat(irq_spurious_count);
+
+	/*
+	 * If this is a spurious interrupt then do not acknowledge
+	 */
+	if (vector == SPURIOUS_APIC_VECTOR) {
+		/* See SDM vol 3 */
+		pr_info("Spurious APIC interrupt (vector 0xFF) on CPU#%d, should never happen.\n",
+			smp_processor_id());
+		goto out;
+	}
+
 	/*
-	 * Check if this really is a spurious interrupt and ACK it
-	 * if it is a vectored one.  Just in case...
-	 * Spurious interrupts should not be ACKed.
+	 * If it is a vectored one, verify it's set in the ISR. If set,
+	 * acknowledge it.
 	 */
 	v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
-	if (v & (1 << (vector & 0x1f)))
+	if (v & (1 << (vector & 0x1f))) {
+		pr_info("Spurious interrupt (vector 0x%02x) on CPU#%d. Acked\n",
+			vector, smp_processor_id());
 		ack_APIC_irq();
-
-	inc_irq_stat(irq_spurious_count);
-
-	/* see sw-dev-man vol 3, chapter 7.4.13.5 */
-	pr_info("spurious APIC interrupt through vector %02x on CPU#%d, "
-		"should never happen.\n", vector, smp_processor_id());
-
+	} else {
+		pr_info("Spurious interrupt (vector 0x%02x) on CPU#%d. Not pending!\n",
+			vector, smp_processor_id());
+	}
+out:
 	trace_spurious_apic_exit(vector);
 	exiting_irq();
 }
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -319,7 +319,8 @@ void __init idt_setup_apic_and_irq_gates
 #ifdef CONFIG_X86_LOCAL_APIC
 	for_each_clear_bit_from(i, system_vectors, NR_VECTORS) {
 		set_bit(i, system_vectors);
-		set_intr_gate(i, spurious_interrupt);
+		entry = spurious_entries_start + 8 * (i - FIRST_SYSTEM_VECTOR);
+		set_intr_gate(i, entry);
 	}
 #endif
 }




* Re: [patch 2/5] genirq: Add optional hardware synchronization for shutdown
  2019-06-25 11:13 ` [patch 2/5] genirq: Add optional hardware synchronization for shutdown Thomas Gleixner
@ 2019-06-26 13:03   ` Thomas Gleixner
  2019-06-28  7:59   ` Marc Zyngier
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-26 13:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial, Marc Zyngier

On Tue, 25 Jun 2019, Thomas Gleixner wrote:
>   *	Returns: false if a threaded handler is active.
>   *
>   *	This function may be called - with care - from IRQ context.
> + *
> + *	It does not check whether there is an interrupt on flight at the
> + *	hardware level, but not serviced yet, as this might deadlock.

  + when invoked from an interrupt disabled region.



* Re: [patch 1/5] genirq: Delay deactivation in free_irq()
  2019-06-25 11:13 ` [patch 1/5] genirq: Delay deactivation in free_irq() Thomas Gleixner
@ 2019-06-28  7:42   ` Marc Zyngier
  2019-06-28  9:28     ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2019-06-28  7:42 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial

Hi Thomas,

On 25/06/2019 12:13, Thomas Gleixner wrote:
> When interrupts are shutdown, they are immediately deactivated in the
> irqdomain hierarchy. While this looks obviously correct there is a subtle
> issue:
> 
> There might be an interrupt in flight when free_irq() is invoking the
> shutdown. This is properly handled at the irq descriptor / primary handler
> level, but the deactivation might completely disable resources which are
> required to acknowledge the interrupt.
> 
> Split the shutdown code and deactivate the interrupt after synchronization
> in free_irq(). Fixup all other usage sites where this is not an issue to
> invoke the combined shutdown_and_deactivate() function instead.
> 
> This still might be an issue if the interrupt in flight servicing is
> delayed on a remote CPU beyond the invocation of synchronize_irq(), but
> that cannot be handled at that level and needs to be handled in the
> synchronize_irq() context.
> 
> Fixes: f8264e34965a ("irqdomain: Introduce new interfaces to support hierarchy irqdomains")
> Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/irq/autoprobe.c  |    6 +++---
>  kernel/irq/chip.c       |    6 ++++++
>  kernel/irq/cpuhotplug.c |    2 +-
>  kernel/irq/internals.h  |    1 +
>  kernel/irq/manage.c     |   12 +++++++++++-
>  5 files changed, 22 insertions(+), 5 deletions(-)
> 
> --- a/kernel/irq/autoprobe.c
> +++ b/kernel/irq/autoprobe.c
> @@ -90,7 +90,7 @@ unsigned long probe_irq_on(void)
>  			/* It triggered already - consider it spurious. */
>  			if (!(desc->istate & IRQS_WAITING)) {
>  				desc->istate &= ~IRQS_AUTODETECT;
> -				irq_shutdown(desc);
> +				irq_shutdown_and_deactivate(desc);
>  			} else
>  				if (i < 32)
>  					mask |= 1 << i;
> @@ -127,7 +127,7 @@ unsigned int probe_irq_mask(unsigned lon
>  				mask |= 1 << i;
>  
>  			desc->istate &= ~IRQS_AUTODETECT;
> -			irq_shutdown(desc);
> +			irq_shutdown_and_deactivate(desc);
>  		}
>  		raw_spin_unlock_irq(&desc->lock);
>  	}
> @@ -169,7 +169,7 @@ int probe_irq_off(unsigned long val)
>  				nr_of_irqs++;
>  			}
>  			desc->istate &= ~IRQS_AUTODETECT;
> -			irq_shutdown(desc);
> +			irq_shutdown_and_deactivate(desc);
>  		}
>  		raw_spin_unlock_irq(&desc->lock);
>  	}
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -314,6 +314,12 @@ void irq_shutdown(struct irq_desc *desc)
>  		}
>  		irq_state_clr_started(desc);
>  	}
> +}
> +
> +
> +void irq_shutdown_and_deactivate(struct irq_desc *desc)
> +{
> +	irq_shutdown(desc);
>  	/*
>  	 * This must be called even if the interrupt was never started up,
>  	 * because the activation can happen before the interrupt is
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -116,7 +116,7 @@ static bool migrate_one_irq(struct irq_d
>  		 */
>  		if (irqd_affinity_is_managed(d)) {
>  			irqd_set_managed_shutdown(d);
> -			irq_shutdown(desc);
> +			irq_shutdown_and_deactivate(desc);
>  			return false;
>  		}
>  		affinity = cpu_online_mask;
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -82,6 +82,7 @@ extern int irq_activate_and_startup(stru
>  extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
>  
>  extern void irq_shutdown(struct irq_desc *desc);
> +extern void irq_shutdown_and_deactivate(struct irq_desc *desc);
>  extern void irq_enable(struct irq_desc *desc);
>  extern void irq_disable(struct irq_desc *desc);
>  extern void irq_percpu_enable(struct irq_desc *desc, unsigned int cpu);
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -13,6 +13,7 @@
>  #include <linux/module.h>
>  #include <linux/random.h>
>  #include <linux/interrupt.h>
> +#include <linux/irqdomain.h>
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/sched/rt.h>
> @@ -1699,6 +1700,7 @@ static struct irqaction *__free_irq(stru
>  	/* If this was the last handler, shut down the IRQ line: */
>  	if (!desc->action) {
>  		irq_settings_clr_disable_unlazy(desc);
> +		/* Only shutdown. Deactivate after synchronize_irq() */

synchronize_hardirq()

>  		irq_shutdown(desc);
>  	}
>  
> @@ -1768,6 +1770,14 @@ static struct irqaction *__free_irq(stru
>  		 * require it to deallocate resources over the slow bus.
>  		 */
>  		chip_bus_lock(desc);
> +		/*
> +		 * There is no interrupt on the fly anymore. Deactivate it
> +		 * completely.
> +		 */
> +		raw_spin_lock_irqsave(&desc->lock, flags);
> +		irq_domain_deactivate_irq(&desc->irq_data);
> +		raw_spin_unlock_irqrestore(&desc->lock, flags);
> +
>  		irq_release_resources(desc);
>  		chip_bus_sync_unlock(desc);
>  		irq_remove_timings(desc);
> @@ -1855,7 +1865,7 @@ static const void *__cleanup_nmi(unsigne
>  	}
>  
>  	irq_settings_clr_disable_unlazy(desc);
> -	irq_shutdown(desc);
> +	irq_shutdown_and_deactivate(desc);
>  
>  	irq_release_resources(desc);

The patch makes complete sense, so this comment is only a nit, feel free
to ignore me: I find it a bit odd that irq_shutdown() doesn't do the
full thing anymore, given that it is the "canonical" function, and that
__free_irq() is the only one that has special needs. Suggestion below.

Irrespective of which version you prefer:

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 29d6c7d070b4..aeab853fcc10 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -301,7 +301,7 @@ int irq_activate_and_startup(struct irq_desc *desc, bool resend)
 
 static void __irq_disable(struct irq_desc *desc, bool mask);
 
-void irq_shutdown(struct irq_desc *desc)
+void __irq_shutdown(struct irq_desc *desc)
 {
 	if (irqd_is_started(&desc->irq_data)) {
 		desc->depth = 1;
@@ -314,6 +314,11 @@ void irq_shutdown(struct irq_desc *desc)
 		}
 		irq_state_clr_started(desc);
 	}
+}
+
+void irq_shutdown(struct irq_desc *desc)
+{
+	__irq_shutdown(desc);
 	/*
 	 * This must be called even if the interrupt was never started up,
 	 * because the activation can happen before the interrupt is
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 70c3053bc1f6..107f7d342e3d 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -81,6 +81,7 @@ extern int irq_activate(struct irq_desc *desc);
 extern int irq_activate_and_startup(struct irq_desc *desc, bool resend);
 extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
 
+extern void __irq_shutdown(struct irq_desc *desc);
 extern void irq_shutdown(struct irq_desc *desc);
 extern void irq_enable(struct irq_desc *desc);
 extern void irq_disable(struct irq_desc *desc);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 78f3ddeb7fe4..f19e3589989e 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -13,6 +13,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/interrupt.h>
+#include <linux/irqdomain.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/rt.h>
@@ -1699,7 +1700,8 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
 	/* If this was the last handler, shut down the IRQ line: */
 	if (!desc->action) {
 		irq_settings_clr_disable_unlazy(desc);
-		irq_shutdown(desc);
+		/* Only shutdown. Deactivate after synchronize_hardirq() */
+		__irq_shutdown(desc);
 	}
 
 #ifdef CONFIG_SMP
@@ -1768,6 +1770,14 @@ static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
 		 * require it to deallocate resources over the slow bus.
 		 */
 		chip_bus_lock(desc);
+		/*
+		 * There is no interrupt on the fly anymore. Deactivate it
+		 * completely.
+		 */
+		raw_spin_lock_irqsave(&desc->lock, flags);
+		irq_domain_deactivate_irq(&desc->irq_data);
+		raw_spin_unlock_irqrestore(&desc->lock, flags);
+
 		irq_release_resources(desc);
 		chip_bus_sync_unlock(desc);
 		irq_remove_timings(desc);

-- 
Jazz is not dead. It just smells funny...


* Re: [patch 2/5] genirq: Add optional hardware synchronization for shutdown
  2019-06-25 11:13 ` [patch 2/5] genirq: Add optional hardware synchronization for shutdown Thomas Gleixner
  2019-06-26 13:03   ` Thomas Gleixner
@ 2019-06-28  7:59   ` Marc Zyngier
  2019-06-28  9:41     ` Thomas Gleixner
  1 sibling, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2019-06-28  7:59 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial

On 25/06/2019 12:13, Thomas Gleixner wrote:
> free_irq() ensures that no hardware interrupt handler is executing on a
> different CPU before actually releasing resources and deactivating the
> interrupt completely in a domain hierarchy.
> 
> But that does not catch the case where the interrupt is on flight at the
> hardware level but not yet serviced by the target CPU. That creates an
> interesing race condition:
> 
>    CPU 0                  CPU 1               IRQ CHIP
> 
>                                               interrupt is raised
>                                               sent to CPU1
> 			  Unable to handle
> 			  immediately
> 			  (interrupts off,
> 			   deep idle delay)
>    mask()
>    ...
>    free()
>      shutdown()
>      synchronize_irq()
>      release_resources()
>                           do_IRQ()
>                             -> resources are not available
> 
> That might be harmless and just trigger a spurious interrupt warning, but
> some interrupt chips might get into a wedged state.
> 
> Provide infrastructure for interrupt chips to provide an optional
> irq_inflight() callback and use it for the synchronization in free_irq().
> 
> synchronize_[hard]irq() are not using this mechanism as it might actually
> deadlock unter certain conditions.
> 
> Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
> Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/irq.h |    2 ++
>  kernel/irq/manage.c |   29 ++++++++++++++++++++++++-----
>  2 files changed, 26 insertions(+), 5 deletions(-)
> 
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -418,6 +418,7 @@ static inline irq_hw_number_t irqd_to_hw
>   *			required. This is used for CPU hotplug where the
>   *			target CPU is not yet set in the cpu_online_mask.
>   * @irq_retrigger:	resend an IRQ to the CPU
> + * @irq_inflight:	chip level detection of interrupts in flight (optional)
>   * @irq_set_type:	set the flow type (IRQ_TYPE_LEVEL/etc.) of an IRQ
>   * @irq_set_wake:	enable/disable power-management wake-on of an IRQ
>   * @irq_bus_lock:	function to lock access to slow bus (i2c) chips
> @@ -462,6 +463,7 @@ struct irq_chip {
>  
>  	int		(*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
>  	int		(*irq_retrigger)(struct irq_data *data);
> +	int		(*irq_inflight)(struct irq_data *data);

I wonder how different this irq_inflight() is from the irq_get_irqchip_state()
that can already report a number of states from the HW.

It is also unclear to me how "in flight" maps to the internal state of
the interrupt controller: Is it pending? Acked? If the interrupt has been
masked already, pending should be harmless (the interrupt won't fire anymore).
But seeing it in the acked state would probably mean that it is on its way
to being handled, with a potential splat.

>  	int		(*irq_set_type)(struct irq_data *data, unsigned int flow_type);
>  	int		(*irq_set_wake)(struct irq_data *data, unsigned int on);
>  
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -35,8 +35,10 @@ static int __init setup_forced_irqthread
>  early_param("threadirqs", setup_forced_irqthreads);
>  #endif
>  
> -static void __synchronize_hardirq(struct irq_desc *desc)
> +static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
>  {
> +	struct irq_data *irqd = irq_desc_get_irq_data(desc);
> +	struct irq_chip *chip = irq_data_get_irq_chip(irqd);
>  	bool inprogress;
>  
>  	do {
> @@ -52,6 +54,13 @@ static void __synchronize_hardirq(struct
>  		/* Ok, that indicated we're done: double-check carefully. */
>  		raw_spin_lock_irqsave(&desc->lock, flags);
>  		inprogress = irqd_irq_inprogress(&desc->irq_data);
> +
> +		/*
> +		 * If requested and supported, check at the chip whether it
> +		 * is in flight at the hardware level:
> +		 */
> +		if (!inprogress && sync_chip && chip && chip->irq_inflight)
> +			inprogress = chip->irq_inflight(irqd);

To expand on what I was trying to express above, I wonder if that would
be similar in effect to:

		if (!inprogress && sync_chip && chip && chip->irq_get_irqchip_state)
			chip->irq_get_irqchip_state(irqd, IRQCHIP_STATE_ACTIVE, &inprogress);
				

>  		raw_spin_unlock_irqrestore(&desc->lock, flags);
>  
>  		/* Oops, that failed? */
> @@ -74,13 +83,16 @@ static void __synchronize_hardirq(struct
>   *	Returns: false if a threaded handler is active.
>   *
>   *	This function may be called - with care - from IRQ context.
> + *
> + *	It does not check whether there is an interrupt on flight at the
> + *	hardware level, but not serviced yet, as this might deadlock.
>   */
>  bool synchronize_hardirq(unsigned int irq)
>  {
>  	struct irq_desc *desc = irq_to_desc(irq);
>  
>  	if (desc) {
> -		__synchronize_hardirq(desc);
> +		__synchronize_hardirq(desc, false);
>  		return !atomic_read(&desc->threads_active);
>  	}
>  
> @@ -97,13 +109,16 @@ EXPORT_SYMBOL(synchronize_hardirq);
>   *	holding a resource the IRQ handler may need you will deadlock.
>   *
>   *	This function may be called - with care - from IRQ context.
> + *
> + *	It does not check whether there is an interrupt on flight at the
> + *	hardware level, but not serviced yet, as this might deadlock.
>   */
>  void synchronize_irq(unsigned int irq)
>  {
>  	struct irq_desc *desc = irq_to_desc(irq);
>  
>  	if (desc) {
> -		__synchronize_hardirq(desc);
> +		__synchronize_hardirq(desc, false);
>  		/*
>  		 * We made sure that no hardirq handler is
>  		 * running. Now verify that no threaded handlers are
> @@ -1729,8 +1744,12 @@ static struct irqaction *__free_irq(stru
>  
>  	unregister_handler_proc(irq, action);
>  
> -	/* Make sure it's not being used on another CPU: */
> -	synchronize_hardirq(irq);
> +	/*
> +	 * Make sure it's not being used on another CPU and if the chip
> +	 * supports it also make sure that there is no (not yet serviced)
> +	 * interrupt on flight at the hardware level.
> +	 */
> +	__synchronize_hardirq(desc, true);
>  
>  #ifdef CONFIG_DEBUG_SHIRQ
>  	/*
> 
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [patch 1/5] genirq: Delay deactivation in free_irq()
  2019-06-28  7:42   ` Marc Zyngier
@ 2019-06-28  9:28     ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-28  9:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial

Marc,

On Fri, 28 Jun 2019, Marc Zyngier wrote:
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -1699,6 +1700,7 @@ static struct irqaction *__free_irq(stru
> >  	/* If this was the last handler, shut down the IRQ line: */
> >  	if (!desc->action) {
> >  		irq_settings_clr_disable_unlazy(desc);
> > +		/* Only shutdown. Deactivate after synchronize_irq() */
> 
> synchronize_hardirq()

Correct.

> >  
> >  	irq_release_resources(desc);
> The patch makes complete sense, so this comment is only a nit, feel free 
> to ignore me: I find it a bit odd that irq_shutdown() doesn't do the 
> full thing anymore, given that it is the "canonical" function, and that 
> __free_irq is the only one that has special needs. Suggestion below.

We have pretty much the same thing with startup, so I made it symmetric.

> Irrespective of which version you prefer:
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

Thanks for looking into this!

       tglx


* Re: [patch 2/5] genirq: Add optional hardware synchronization for shutdown
  2019-06-28  7:59   ` Marc Zyngier
@ 2019-06-28  9:41     ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2019-06-28  9:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: LKML, x86, Robert Hodaszi, Vadim Pasternak, Ido Schimmel,
	Greg Kroah-Hartman, linux-serial

Marc,

On Fri, 28 Jun 2019, Marc Zyngier wrote:
> On 25/06/2019 12:13, Thomas Gleixner wrote:
> >  
> >  	int		(*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
> >  	int		(*irq_retrigger)(struct irq_data *data);
> > +	int		(*irq_inflight)(struct irq_data *data);
> 
> I wonder how different this irq_inflight() is from the irq_get_irqchip_state()
> that can already report a number of states from the HW.
> 
> It is also unclear to me how "in flight" maps to the internal state of
> the interrupt controller: Is it pending? Acked? If the interrupt has been
> masked already, pending should be harmless (the interrupt won't fire anymore).
> But seeing it in the acked would probably mean that it is on its way to being
> handled, with a potential splat.

"in flight" means that the interrupt chip (in the offending case the IO-APIC)
has that interrupt raised internally _AND_ already propagated to the
destination CPU (in this case the local APIC of the destination). The CPU
has accepted the interrupt and now the chip is in a state where it waits
for it to be acknowledged. If we proceed in that state then the destination
CPU will eventually handle it _after_ all the resources are freed. But that
means that the in-flight/wait-for-acknowledge state becomes stale because
the handling CPU does not find the connection to that chip anymore.

> > +		/*
> > +		 * If requested and supported, check at the chip whether it
> > +		 * is in flight at the hardware level:
> > +		 */
> > +		if (!inprogress && sync_chip && chip && chip->irq_inflight)
> > +			inprogress = chip->irq_inflight(irqd);
> 
> To expand on what I was trying to exptree above, I wonder if that would 
> be similar to in effect to:
> 
> 		if (!inprogress && sync_chip && chip && chip->irq_get_irqchip_state)
> 			chip->irq_get_irqchip_state(irqd, IRQCHIP_STATE_ACTIVE, &inprogress);

Ah, indeed that could be mapped to it.

I'm happy to get rid of the extra callback. Now the question is whether
this would give a headache to any of the chips which already implement
that callback, and whether it needs to become conditional on the trigger
type at the core level. For the IO-APIC this state is only defined
for level type, which makes sense as edge is fire and forget.
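
One possible shape of that check in __synchronize_hardirq() (sketch only,
reusing the existing irq_get_irqchip_state() callback and the ACTIVE state,
with the trigger type check done at the core level):

	if (!inprogress && sync_chip && chip && chip->irq_get_irqchip_state &&
	    irqd_get_trigger_type(irqd) & IRQ_TYPE_LEVEL_MASK)
		chip->irq_get_irqchip_state(irqd, IRQCHIP_STATE_ACTIVE,
					    &inprogress);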

Thanks,

	tglx


end of thread

Thread overview: 11+ messages
2019-06-25 11:13 [patch 0/5] x86/irq: Cure various interrupt issues Thomas Gleixner
2019-06-25 11:13 ` [patch 1/5] genirq: Delay deactivation in free_irq() Thomas Gleixner
2019-06-28  7:42   ` Marc Zyngier
2019-06-28  9:28     ` Thomas Gleixner
2019-06-25 11:13 ` [patch 2/5] genirq: Add optional hardware synchronization for shutdown Thomas Gleixner
2019-06-26 13:03   ` Thomas Gleixner
2019-06-28  7:59   ` Marc Zyngier
2019-06-28  9:41     ` Thomas Gleixner
2019-06-25 11:13 ` [patch 3/5] x86/ioapic: Implement irq_inflight() callback Thomas Gleixner
2019-06-25 11:13 ` [patch 4/5] x86/irq: Handle spurious interrupt after shutdown gracefully Thomas Gleixner
2019-06-25 11:13 ` [patch 5/5] x86/irq: Separate unused system vectors from spurious entry again Thomas Gleixner
