* [dovetail][PATCH] x86: pipeline: Fix vector stall after vector error handling
@ 2021-11-08  6:00 Florian Bezdeka
  2021-11-08  6:54 ` Jan Kiszka
  2021-11-08  7:56 ` Philippe Gerum
  0 siblings, 2 replies; 3+ messages in thread
From: Florian Bezdeka @ 2021-11-08  6:00 UTC (permalink / raw)
  To: xenomai, rpm

Whenever an IRQ arrived for a vector whose descriptor was NULL or in
one of the error states, the interrupt was not acknowledged at the
APIC. This can happen if a device driver cleans up a vector while an
IRQ for it is still in flight.

This has two effects:
  - If the affected vector is re-assigned later, it does not work: the
    IRQ never makes its way to the CPU.
  - Interrupts with lower priority are no longer delivered to the CPU.

The problem was observed on a fairly large Intel Xeon machine where
some vectors / IRQs were used temporarily, cleaned up, and re-assigned
later.

Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---
 arch/x86/kernel/irq_pipeline.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/irq_pipeline.c b/arch/x86/kernel/irq_pipeline.c
index 48d9959bc11a..63de68141b21 100644
--- a/arch/x86/kernel/irq_pipeline.c
+++ b/arch/x86/kernel/irq_pipeline.c
@@ -239,6 +239,8 @@ void arch_handle_irq(struct pt_regs *regs, u8 vector, bool irq_movable)
 	} else {
 		desc = __this_cpu_read(vector_irq[vector]);
 		if (unlikely(IS_ERR_OR_NULL(desc))) {
+			__ack_APIC_irq();
+
 			if (desc == VECTOR_UNUSED) {
 				pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
 						__func__, smp_processor_id(),
-- 
2.30.2
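
Editorial illustration (not part of the patch): the two effects listed
in the commit message follow from how a local APIC tracks in-service
interrupts. The sketch below is a deliberately simplified user-space
model, not kernel code. All toy_* names are hypothetical, the priority
class is taken as the upper four bits of the vector, and TPR, IRR and
pending-interrupt queuing are left out. It only shows that skipping
the EOI on the error path leaves the in-service bit set, which blocks
the same vector and every vector of equal or lower priority class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC: one in-service (ISR) bit per vector. Assumption: a
 * real APIC also has TPR and IRR state, which this model ignores. */
struct toy_apic {
	bool in_service[256];
};

/* Priority class of a vector: its upper four bits. */
static int toy_prio_class(int vector)
{
	return vector >> 4;
}

/* Highest priority class currently in service, or -1 if none. */
static int toy_highest_isr_class(const struct toy_apic *apic)
{
	for (int v = 255; v >= 0; v--)
		if (apic->in_service[v])
			return toy_prio_class(v);
	return -1;
}

/* Deliver a vector only if its class is strictly higher than anything
 * in service. A real APIC would keep it pending; here we just fail. */
static bool toy_deliver(struct toy_apic *apic, uint8_t vector)
{
	if (toy_prio_class(vector) <= toy_highest_isr_class(apic))
		return false;
	apic->in_service[vector] = true;
	return true;
}

/* EOI: clear the highest-priority in-service bit. */
static void toy_eoi(struct toy_apic *apic)
{
	for (int v = 255; v >= 0; v--) {
		if (apic->in_service[v]) {
			apic->in_service[v] = false;
			return;
		}
	}
}

/* Handler with an error path for a missing descriptor. The
 * ack_on_error flag selects the behaviour before/after the fix. */
static void toy_handle_irq(struct toy_apic *apic, bool have_desc,
			   bool ack_on_error)
{
	if (!have_desc) {
		if (ack_on_error)
			toy_eoi(apic);	/* what the patch adds */
		return;
	}
	toy_eoi(apic);	/* regular handling ends with an EOI */
}

/* An IRQ hits a stale vector (0x51, no descriptor), then probe_vector
 * fires. Returns whether the probe reaches the CPU. */
static bool toy_scenario(bool ack_on_error, uint8_t probe_vector)
{
	struct toy_apic apic = { .in_service = { false } };
	bool delivered = toy_deliver(&apic, 0x51);

	assert(delivered);	/* an idle APIC accepts the first IRQ */
	toy_handle_irq(&apic, false, ack_on_error);
	return toy_deliver(&apic, probe_vector);
}
```

In this model, toy_scenario(false, 0x41) and toy_scenario(false, 0x51)
both return false: the lower-priority vector and the re-used vector
stay blocked. With ack_on_error set to true, both return true.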




* Re: [dovetail][PATCH] x86: pipeline: Fix vector stall after vector error handling
  2021-11-08  6:00 [dovetail][PATCH] x86: pipeline: Fix vector stall after vector error handling Florian Bezdeka
@ 2021-11-08  6:54 ` Jan Kiszka
  2021-11-08  7:56 ` Philippe Gerum
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Kiszka @ 2021-11-08  6:54 UTC (permalink / raw)
  To: Florian Bezdeka, xenomai, rpm

On 08.11.21 07:00, Florian Bezdeka wrote:
> Whenever an IRQ arrived for a vector whose descriptor was NULL or in
> one of the error states, the interrupt was not acknowledged at the
> APIC. This can happen if a device driver cleans up a vector while an
> IRQ for it is still in flight.
> 
> This has two effects:
>   - If the affected vector is re-assigned later, it does not work: the
>     IRQ never makes its way to the CPU.
>   - Interrupts with lower priority are no longer delivered to the CPU.
> 
> The problem was observed on a fairly large Intel Xeon machine where
> some vectors / IRQs were used temporarily, cleaned up, and re-assigned
> later.
> 
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>  arch/x86/kernel/irq_pipeline.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kernel/irq_pipeline.c b/arch/x86/kernel/irq_pipeline.c
> index 48d9959bc11a..63de68141b21 100644
> --- a/arch/x86/kernel/irq_pipeline.c
> +++ b/arch/x86/kernel/irq_pipeline.c
> @@ -239,6 +239,8 @@ void arch_handle_irq(struct pt_regs *regs, u8 vector, bool irq_movable)
>  	} else {
>  		desc = __this_cpu_read(vector_irq[vector]);
>  		if (unlikely(IS_ERR_OR_NULL(desc))) {
> +			__ack_APIC_irq();
> +
>  			if (desc == VECTOR_UNUSED) {
>  				pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
>  						__func__, smp_processor_id(),
> 

Nice catch! And very hard work to get there - well done!

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: [dovetail][PATCH] x86: pipeline: Fix vector stall after vector error handling
  2021-11-08  6:00 [dovetail][PATCH] x86: pipeline: Fix vector stall after vector error handling Florian Bezdeka
  2021-11-08  6:54 ` Jan Kiszka
@ 2021-11-08  7:56 ` Philippe Gerum
  1 sibling, 0 replies; 3+ messages in thread
From: Philippe Gerum @ 2021-11-08  7:56 UTC (permalink / raw)
  To: Florian Bezdeka; +Cc: xenomai, jan.kiszka, henning.schild


Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> Whenever an IRQ arrived for a vector whose descriptor was NULL or in
> one of the error states, the interrupt was not acknowledged at the
> APIC. This can happen if a device driver cleans up a vector while an
> IRQ for it is still in flight.
>
> This has two effects:
>   - If the affected vector is re-assigned later, it does not work: the
>     IRQ never makes its way to the CPU.
>   - Interrupts with lower priority are no longer delivered to the CPU.
>
> The problem was observed on a fairly large Intel Xeon machine where
> some vectors / IRQs were used temporarily, cleaned up, and re-assigned
> later.
>
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>  arch/x86/kernel/irq_pipeline.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/kernel/irq_pipeline.c b/arch/x86/kernel/irq_pipeline.c
> index 48d9959bc11a..63de68141b21 100644
> --- a/arch/x86/kernel/irq_pipeline.c
> +++ b/arch/x86/kernel/irq_pipeline.c
> @@ -239,6 +239,8 @@ void arch_handle_irq(struct pt_regs *regs, u8 vector, bool irq_movable)
>  	} else {
>  		desc = __this_cpu_read(vector_irq[vector]);
>  		if (unlikely(IS_ERR_OR_NULL(desc))) {
> +			__ack_APIC_irq();
> +
>  			if (desc == VECTOR_UNUSED) {
>  				pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
>  						__func__, smp_processor_id(),

Ouch. Thanks for digging this. Merged upstream.

-- 
Philippe.


