All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
@ 2015-07-29  8:12 Taichi Kageyama
  2015-07-29  8:12 ` [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-29  8:12 UTC (permalink / raw)
  To: gregkh, tglx, peter, jiang.liu
  Cc: Taichi Kageyama, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi


This patch set provides a workaround to avoid the following problem.
It's based on Linux 4.2-rc4 mainstream kernel.
I've tested this patch set on x86-64 machine and KVM.

RFC
--------------------------
During interrupt probing phase, irq affinity of candidate IRQs
can be changed immediately and safely?
I'd like to discuss how irq affinity should be set during 
interrupt probing phase without paying attention to the kind of
chip->irq_set_affinity.

[patch v2 2/3] tries to set irq affinity and expects that irq affinity
is set immediately if possible.
I've tested this patch with 4.1-rc3 and 4.2-rc4, but the behavior of
this patch was different between both versions and depends on the 
kind of chip->irq_set_affinity although I could test only 2 types 
of machines.
I don't know whether these differences are problem or not. 
It seems other modules don't call irq_do_set_affinity() directly
before irq setup, so my usage of irq affinity may not be good.


v4.1-rc3 with CONFIG_GENERIC_PENDING_IRQ
  + x86-64(IvyBridge): intel_ioapic_set_affinity()
        -  irq affinity is changed immediately[No pending]
  + KVM(x86-64): native_ioapic_set_affinity()
        -  irq affinity is changed immediately  [No pending]
        -  assign_irq_vector() fails with EBUSY because the status 
           is still "move_in_progress"
           when other device calls setup_affinity() with the same irq.

v4.2-rc4 with CONFIG_GENERIC_PENDING_IRQ
   + x86-64(IvyBridge): intel_ir_set_affinity()
        - irq affinity is changed immediately [No pending]
   + KVM(x86-64): ioapic_set_affinity()
        - irq affinity is NOT changed immediately [Pending]
        - The following error was shown when other device calls
          setup_affinity() with the same irq 
          because the status is still "move_in_progress".
                "Failed to recover vector for irq 6"


Problem
--------------------------
There're cases where autoconfig_irq() fails during boot.
In these cases, the console doesn't work in interrupt mode
and "input overrun" (which makes operation mistakes) can happen
on some systems. This problem can happen with high rate every boot
once it occurs because the boot sequence is always almost same.
I saw the original problem on RHEL6.6.

Conditions of Reproduction
--------------------------
- Need non-PnP console serial 
  or PnP console without CONFIG_SERIAL_8250_PNP
- Build with CONFIG_SERIAL_8250_DETECT_IRQ.
- Keep interrupt disabled on the CPU which is used to detect 
  an interrupt during the timeout of autoconfig_irq().
     + Kick printk() on the CPU which detects interrupt 
       from a console serial port.

Change Log
--------------------------
v1:
  http://www.spinics.net/lists/linux-serial/msg17744.html
v2:
- Updated commit log of v1 patch 1/2 --> v2 1/3
- Removed v1 patch 2/2
- Added v2 2/3 patch to set irq affinity
- Added v2 3/3 patch to resolve other cases of this problem
   This is based on Peter's idea.
   It depends on v2 2/3 to set irq affinity.


Taichi Kageyama (3):
  serial: 8250: Fix autoconfig_irq() to avoid race conditions
  genirq: Add a function to set irq affinity of candidate IRQs
  serial: 8250: Fix autoconfig_irq() to reduce the risk of failure

 drivers/tty/serial/8250/8250_core.c | 15 +++++++++++++++
 include/linux/interrupt.h           |  4 ++++
 kernel/irq/autoprobe.c              | 31 +++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

-- 
2.4.6

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions
  2015-07-29  8:12 [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
@ 2015-07-29  8:12 ` Taichi Kageyama
  2015-08-06 12:53   ` Peter Hurley
  2015-07-29  8:12 ` [RFC PATCH v2 2/3] genirq: Add a function to set irq affinity of candidate IRQs Taichi Kageyama
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-29  8:12 UTC (permalink / raw)
  To: gregkh, tglx, peter, jiang.liu
  Cc: Taichi Kageyama, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi

The following race conditions can happen when a serial port is used
as console.

Case1: CPU_B is used to detect an interrupt from a serial port,
       but it can have interrupts disabled during the waiting time.
Case2: CPU_B clears UART_IER just after CPU_A sets UART_IER and then
       a serial port may not make an interrupt.
Case3: CPU_A sets UART_IER just after CPU_B clears UART_IER.
       This is an unexpected behavior for serial8250_console_write().

CPU_A [autoconfig_irq]      |  CPU_B [serial8250_console_write]
----------------------------|---------------------------------------
                            |
probe_irq_on()              |  spin_lock_irqsave(&port->lock,)
serial_outp(,UART_IER,0x0f) |  serial_out(,UART_IER,0)
udelay(20);                 |  uart_console_write()
probe_irq_off()             |
                            |  spin_unlock_irqrestore(&port->lock,)

Case1 and 2 can make autoconfig_irq() failed.
In these cases, the console doesn't work in interrupt mode and
"input overrun" (which can make operation mistakes) can happen
on some systems. Especially in the Case1, It is known that the
problem happens with high rate every boot once it occurs
because the boot sequence is always almost same.

port mutex makes sure that the autoconfig operation is exclusive of
any other concurrent HW access except by the console operation.
console lock is required in autoconfig_irq().

Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
---
 drivers/tty/serial/8250/8250_core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git v4.2-rc4.org/drivers/tty/serial/8250/8250_core.c v4.2-rc4.work/drivers/tty/serial/8250/8250_core.c
index 37fff12..ed1e23e 100644
--- v4.2-rc4.org/drivers/tty/serial/8250/8250_core.c
+++ v4.2-rc4.work/drivers/tty/serial/8250/8250_core.c
@@ -1303,6 +1303,9 @@ static void autoconfig_irq(struct uart_8250_port *up)
 		inb_p(ICP);
 	}
 
+	if (uart_console(port))
+		console_lock();
+
 	/* forget possible initially masked and pending IRQ */
 	probe_irq_off(probe_irq_on());
 	save_mcr = serial_in(up, UART_MCR);
@@ -1334,6 +1337,9 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	if (port->flags & UPF_FOURPORT)
 		outb_p(save_ICP, ICP);
 
+	if (uart_console(port))
+		console_unlock();
+
 	port->irq = (irq > 0) ? irq : 0;
 }
 
-- 
2.4.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 2/3] genirq: Add a function to set irq affinity of candidate IRQs
  2015-07-29  8:12 [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
  2015-07-29  8:12 ` [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
@ 2015-07-29  8:12 ` Taichi Kageyama
  2015-07-29  8:13 ` [RFC PATCH v2 3/3] serial: 8250: Fix autoconfig_irq() to reduce the risk of failure Taichi Kageyama
  2015-07-29 10:32 ` [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Thomas Gleixner
  3 siblings, 0 replies; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-29  8:12 UTC (permalink / raw)
  To: gregkh, tglx, peter, jiang.liu
  Cc: Taichi Kageyama, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi

This new function can specify which CPU is used for interrupt probing.

An interrupt probing code expects a CPU detects an interrupt from the
target device, but it doesn't work when the CPU has interrupts disabled
during the waiting time.
The probing code can use this function to specify which CPU detects the
interrupt and reduce the risk of the failure as far as possible.

Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
---
 include/linux/interrupt.h |  4 ++++
 kernel/irq/autoprobe.c    | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git v4.2-rc4.org/include/linux/interrupt.h v4.2-rc4.work/include/linux/interrupt.h
index be7e75c..4f08f6d 100644
--- v4.2-rc4.org/include/linux/interrupt.h
+++ v4.2-rc4.work/include/linux/interrupt.h
@@ -647,10 +647,14 @@ static inline unsigned int probe_irq_mask(unsigned long val)
 {
 	return 0;
 }
+static inline void probe_irq_set_affinity(unsigned long val, cpumask_t *mask)
+{
+}
 #else
 extern unsigned long probe_irq_on(void);	/* returns 0 on failure */
 extern int probe_irq_off(unsigned long);	/* returns 0 or negative on failure */
 extern unsigned int probe_irq_mask(unsigned long);	/* returns mask of ISA interrupts */
+extern void probe_irq_set_affinity(unsigned long, cpumask_t *);
 #endif
 
 #ifdef CONFIG_PROC_FS
diff --git v4.2-rc4.org/kernel/irq/autoprobe.c v4.2-rc4.work/kernel/irq/autoprobe.c
index 0119b9d..24e7647 100644
--- v4.2-rc4.org/kernel/irq/autoprobe.c
+++ v4.2-rc4.work/kernel/irq/autoprobe.c
@@ -183,3 +183,34 @@ int probe_irq_off(unsigned long val)
 }
 EXPORT_SYMBOL(probe_irq_off);
 
+/**
+ *	probe_irq_set_affinity	- change smp_affinity during autodetect
+ *	@val: mask of potential interrupts (unused)
+ *
+ *	Sets smp_affinity of candidate irq lines if possible.
+ *
+ *	An interrupt probing code expects a CPU detects an interrupt
+ *	from the target device, but it doesn't work when the CPU has
+ *	interrupts disabled during the waiting time.
+ *	The code can use this function to specify which CPU detects
+ *	the interrupt and reduce the risk of the failure.
+ */
+void probe_irq_set_affinity(unsigned long val, cpumask_t *mask)
+{
+	int i;
+	struct irq_desc *desc;
+
+	for_each_irq_desc(i, desc) {
+		raw_spin_lock_irq(&desc->lock);
+
+		if (desc->istate & IRQS_AUTODETECT) {
+			struct irq_data *data = &desc->irq_data;
+
+			if (data->chip && data->chip->irq_set_affinity)
+				irq_do_set_affinity(data, mask, false);
+		}
+		raw_spin_unlock_irq(&desc->lock);
+	}
+}
+EXPORT_SYMBOL(probe_irq_set_affinity);
+
-- 
2.4.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v2 3/3] serial: 8250: Fix autoconfig_irq() to reduce the risk of failure
  2015-07-29  8:12 [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
  2015-07-29  8:12 ` [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
  2015-07-29  8:12 ` [RFC PATCH v2 2/3] genirq: Add a function to set irq affinity of candidate IRQs Taichi Kageyama
@ 2015-07-29  8:13 ` Taichi Kageyama
  2015-07-29 10:32 ` [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Thomas Gleixner
  3 siblings, 0 replies; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-29  8:13 UTC (permalink / raw)
  To: gregkh, tglx, peter, jiang.liu
  Cc: Taichi Kageyama, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi

autoconfig_irq() expects a CPU detects an interrupt from a serial
port, but it doesn't work when the CPU has interrupts disabled
during the waiting time.
New one tries to specify own CPU to probe the interrupt
and reduce the risk of the failure as far as possible.

Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: Peter Hurley <peter@hurleysoftware.com>
---
 drivers/tty/serial/8250/8250_core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git v4.2-rc4.org/drivers/tty/serial/8250/8250_core.c v4.2-rc4.work/drivers/tty/serial/8250/8250_core.c
index ed1e23e..85e8cdf 100644
--- v4.2-rc4.org/drivers/tty/serial/8250/8250_core.c
+++ v4.2-rc4.work/drivers/tty/serial/8250/8250_core.c
@@ -1295,6 +1295,7 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	unsigned int ICP = 0;
 	unsigned long irqs;
 	int irq;
+	cpumask_t this_cpu;
 
 	if (port->flags & UPF_FOURPORT) {
 		ICP = (port->iobase & 0xfe0) | 0x1f;
@@ -1313,6 +1314,13 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	serial_out(up, UART_MCR, UART_MCR_OUT1 | UART_MCR_OUT2);
 
 	irqs = probe_irq_on();
+
+	/* Try to detect irq on the current CPU */
+	cpumask_clear(&this_cpu);
+	preempt_disable();
+	cpumask_set_cpu(smp_processor_id(), &this_cpu);
+	probe_irq_set_affinity(irqs, &this_cpu);
+
 	serial_out(up, UART_MCR, 0);
 	udelay(10);
 	if (port->flags & UPF_FOURPORT) {
@@ -1330,6 +1338,7 @@ static void autoconfig_irq(struct uart_8250_port *up)
 	serial_out(up, UART_TX, 0xFF);
 	udelay(20);
 	irq = probe_irq_off(irqs);
+	preempt_enable();
 
 	serial_out(up, UART_MCR, save_mcr);
 	serial_out(up, UART_IER, save_ier);
-- 
2.4.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-29  8:12 [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
                   ` (2 preceding siblings ...)
  2015-07-29  8:13 ` [RFC PATCH v2 3/3] serial: 8250: Fix autoconfig_irq() to reduce the risk of failure Taichi Kageyama
@ 2015-07-29 10:32 ` Thomas Gleixner
  2015-07-29 11:51   ` Peter Hurley
  3 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-07-29 10:32 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: gregkh, peter, jiang.liu, linux-serial, linux-kernel, jslaby,
	prarit, Naoya Horiguchi

On Wed, 29 Jul 2015, Taichi Kageyama wrote:
> - Keep interrupt disabled on the CPU which is used to detect 
>   an interrupt during the timeout of autoconfig_irq().
>      + Kick printk() on the CPU which detects interrupt 
>        from a console serial port.

This is wrong to begin with. How is that supposed to work on an UP
machine? Not at all.

So no, fix the code which has interrupts disabled accross autoprobing
and do not try to apply bandaids somewhere else.

Thanks,

	tglx





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-29 10:32 ` [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Thomas Gleixner
@ 2015-07-29 11:51   ` Peter Hurley
  2015-07-29 11:53     ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2015-07-29 11:51 UTC (permalink / raw)
  To: Thomas Gleixner, Taichi Kageyama
  Cc: gregkh, jiang.liu, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi

On 07/29/2015 06:32 AM, Thomas Gleixner wrote:
> On Wed, 29 Jul 2015, Taichi Kageyama wrote:
>> - Keep interrupt disabled on the CPU which is used to detect 
>>   an interrupt during the timeout of autoconfig_irq().
>>      + Kick printk() on the CPU which detects interrupt 
>>        from a console serial port.
> 
> This is wrong to begin with. How is that supposed to work on an UP
> machine? Not at all.
> 
> So no, fix the code which has interrupts disabled accross autoprobing
> and do not try to apply bandaids somewhere else.

Like printk() from some unrelated driver?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-29 11:51   ` Peter Hurley
@ 2015-07-29 11:53     ` Thomas Gleixner
  2015-07-29 13:17       ` Peter Hurley
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-07-29 11:53 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Taichi Kageyama, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi

On Wed, 29 Jul 2015, Peter Hurley wrote:
> On 07/29/2015 06:32 AM, Thomas Gleixner wrote:
> > On Wed, 29 Jul 2015, Taichi Kageyama wrote:
> >> - Keep interrupt disabled on the CPU which is used to detect 
> >>   an interrupt during the timeout of autoconfig_irq().
> >>      + Kick printk() on the CPU which detects interrupt 
> >>        from a console serial port.
> > 
> > This is wrong to begin with. How is that supposed to work on an UP
> > machine? Not at all.
> > 
> > So no, fix the code which has interrupts disabled accross autoprobing
> > and do not try to apply bandaids somewhere else.
> 
> Like printk() from some unrelated driver?

If that's the cause for the wreckage then yes, we need a way to tell
the printk code not to call into the driver until that initialization
step is done. It's that simple.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-29 11:53     ` Thomas Gleixner
@ 2015-07-29 13:17       ` Peter Hurley
  2015-07-29 13:35         ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2015-07-29 13:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Taichi Kageyama, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi

On 07/29/2015 07:53 AM, Thomas Gleixner wrote:
> On Wed, 29 Jul 2015, Peter Hurley wrote:
>> On 07/29/2015 06:32 AM, Thomas Gleixner wrote:
>>> On Wed, 29 Jul 2015, Taichi Kageyama wrote:
>>>> - Keep interrupt disabled on the CPU which is used to detect 
>>>>   an interrupt during the timeout of autoconfig_irq().
>>>>      + Kick printk() on the CPU which detects interrupt 
>>>>        from a console serial port.
>>>
>>> This is wrong to begin with. How is that supposed to work on an UP
>>> machine? Not at all.
>>>
>>> So no, fix the code which has interrupts disabled accross autoprobing
>>> and do not try to apply bandaids somewhere else.
>>
>> Like printk() from some unrelated driver?
> 
> If that's the cause for the wreckage then yes, we need a way to tell
> the printk code not to call into the driver until that initialization
> step is done. It's that simple.

Like this?

--- >% --
Subject: [PATCH] genirq: Disable printk() during irq probe

printk() disables interrupts for extended periods of time while
outputting to console drivers. This breaks irq probing since the
triggered irq may not be serviced (in time) if scheduled on the
printk() cpu.

Claim the console_lock() which effectively disables console output.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
---
 kernel/irq/autoprobe.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/irq/autoprobe.c b/kernel/irq/autoprobe.c
index 0119b9d..47535d2 100644
--- a/kernel/irq/autoprobe.c
+++ b/kernel/irq/autoprobe.c
@@ -39,6 +39,13 @@ unsigned long probe_irq_on(void)
 	 */
 	async_synchronize_full();
 	mutex_lock(&probing_active);
+
+	/*
+	 * printk() breaks irq probing - disable printk output until probe
+	 * completes
+	 */
+	console_lock();
+
 	/*
 	 * something may have generated an irq long ago and we want to
 	 * flush such a longstanding irq before considering it as spurious.
@@ -132,6 +139,7 @@ unsigned int probe_irq_mask(unsigned long val)
 		}
 		raw_spin_unlock_irq(&desc->lock);
 	}
+	console_unlock();
 	mutex_unlock(&probing_active);
 
 	return mask & val;
@@ -174,6 +182,7 @@ int probe_irq_off(unsigned long val)
 		}
 		raw_spin_unlock_irq(&desc->lock);
 	}
+	console_unlock();
 	mutex_unlock(&probing_active);
 
 	if (nr_of_irqs > 1)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-29 13:17       ` Peter Hurley
@ 2015-07-29 13:35         ` Thomas Gleixner
  2015-07-30  1:41           ` Taichi Kageyama
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-07-29 13:35 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Taichi Kageyama, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi, Peter Zijlstra

On Wed, 29 Jul 2015, Peter Hurley wrote:
> On 07/29/2015 07:53 AM, Thomas Gleixner wrote:
> > On Wed, 29 Jul 2015, Peter Hurley wrote:
> >> On 07/29/2015 06:32 AM, Thomas Gleixner wrote:
> >>> On Wed, 29 Jul 2015, Taichi Kageyama wrote:
> >>>> - Keep interrupt disabled on the CPU which is used to detect 
> >>>>   an interrupt during the timeout of autoconfig_irq().
> >>>>      + Kick printk() on the CPU which detects interrupt 
> >>>>        from a console serial port.
> >>>
> >>> This is wrong to begin with. How is that supposed to work on an UP
> >>> machine? Not at all.
> >>>
> >>> So no, fix the code which has interrupts disabled accross autoprobing
> >>> and do not try to apply bandaids somewhere else.
> >>
> >> Like printk() from some unrelated driver?
> > 
> > If that's the cause for the wreckage then yes, we need a way to tell
> > the printk code not to call into the driver until that initialization
> > step is done. It's that simple.
> 
> Like this?

Looks about right.
 
> --- >% --
> Subject: [PATCH] genirq: Disable printk() during irq probe
> 
> printk() disables interrupts for extended periods of time while
> outputting to console drivers. This breaks irq probing since the
> triggered irq may not be serviced (in time) if scheduled on the
> printk() cpu.
> 
> Claim the console_lock() which effectively disables console output.
> 
> Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
> ---
>  kernel/irq/autoprobe.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/kernel/irq/autoprobe.c b/kernel/irq/autoprobe.c
> index 0119b9d..47535d2 100644
> --- a/kernel/irq/autoprobe.c
> +++ b/kernel/irq/autoprobe.c
> @@ -39,6 +39,13 @@ unsigned long probe_irq_on(void)
>  	 */
>  	async_synchronize_full();
>  	mutex_lock(&probing_active);
> +
> +	/*
> +	 * printk() breaks irq probing - disable printk output until probe
> +	 * completes
> +	 */
> +	console_lock();
> +
>  	/*
>  	 * something may have generated an irq long ago and we want to
>  	 * flush such a longstanding irq before considering it as spurious.
> @@ -132,6 +139,7 @@ unsigned int probe_irq_mask(unsigned long val)
>  		}
>  		raw_spin_unlock_irq(&desc->lock);
>  	}
> +	console_unlock();
>  	mutex_unlock(&probing_active);
>  
>  	return mask & val;
> @@ -174,6 +182,7 @@ int probe_irq_off(unsigned long val)
>  		}
>  		raw_spin_unlock_irq(&desc->lock);
>  	}
> +	console_unlock();
>  	mutex_unlock(&probing_active);
>  
>  	if (nr_of_irqs > 1)
> -- 
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-29 13:35         ` Thomas Gleixner
@ 2015-07-30  1:41           ` Taichi Kageyama
  2015-07-30 10:12             ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-30  1:41 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Hurley
  Cc: gregkh, jiang.liu, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi, Peter Zijlstra

On 2015/07/29 22:35, Thomas Gleixner wrote:
> On Wed, 29 Jul 2015, Peter Hurley wrote:
>> On 07/29/2015 07:53 AM, Thomas Gleixner wrote:
>>> On Wed, 29 Jul 2015, Peter Hurley wrote:
>>>> On 07/29/2015 06:32 AM, Thomas Gleixner wrote:
>>>>> On Wed, 29 Jul 2015, Taichi Kageyama wrote:
>>>>>> - Keep interrupt disabled on the CPU which is used to detect 
>>>>>>   an interrupt during the timeout of autoconfig_irq().
>>>>>>      + Kick printk() on the CPU which detects interrupt 
>>>>>>        from a console serial port.
>>>>>
>>>>> This is wrong to begin with. How is that supposed to work on an UP
>>>>> machine? Not at all.
>>>>>
>>>>> So no, fix the code which has interrupts disabled accross autoprobing
>>>>> and do not try to apply bandaids somewhere else.
>>>>
>>>> Like printk() from some unrelated driver?
>>>
>>> If that's the cause for the wreckage then yes, we need a way to tell
>>> the printk code not to call into the driver until that initialization
>>> step is done. It's that simple.
>>
>> Like this?
> 
> Looks about right.


Thomas,
Thank you for your comments.
The status of interrupt disabled should be short,
but it cannot be controlled from autorpobing side.
That's why I was thinking fixing autoprobing itself was better.

However, I remember Peter also mentioned about your idea first.
OK, I change my mind to fix the real problem simply.
I actually have seen the problem caused only by "printk()" on RHEL.


Peter,
I know your code is sample (console.h is required), 
but it is conflict with [patch v2 1/3].
I think serial8250_console_write should not touch ctrl reg during autoconfig_irq. 
To resolve the real problem, I think keeping only [patch v2 1/3] is best(opt1).
What do you think?

opt1. keep [patch v2 1/3]
    + Don't touch other legacy drivers using autoprobe.
       Each driver can use console_lock to fix this problem if it happens.

opt2. fix probe_irq_on/mask/off for all legacy devices which use them
    + Discard [patch v2 1/3]
    + Fix autoconfig_irq and probe_irq_*
    + Test for all these drivers is required.
      --> Probably, it works but I cannot test them.


Thanks,
Taichi



>  
>> --- >% --
>> Subject: [PATCH] genirq: Disable printk() during irq probe
>>
>> printk() disables interrupts for extended periods of time while
>> outputting to console drivers. This breaks irq probing since the
>> triggered irq may not be serviced (in time) if scheduled on the
>> printk() cpu.
>>
>> Claim the console_lock() which effectively disables console output.
>>
>> Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
>> ---
>>  kernel/irq/autoprobe.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/irq/autoprobe.c b/kernel/irq/autoprobe.c
>> index 0119b9d..47535d2 100644
>> --- a/kernel/irq/autoprobe.c
>> +++ b/kernel/irq/autoprobe.c
>> @@ -39,6 +39,13 @@ unsigned long probe_irq_on(void)
>>  	 */
>>  	async_synchronize_full();
>>  	mutex_lock(&probing_active);
>> +
>> +	/*
>> +	 * printk() breaks irq probing - disable printk output until probe
>> +	 * completes
>> +	 */
>> +	console_lock();
>> +
>>  	/*
>>  	 * something may have generated an irq long ago and we want to
>>  	 * flush such a longstanding irq before considering it as spurious.
>> @@ -132,6 +139,7 @@ unsigned int probe_irq_mask(unsigned long val)
>>  		}
>>  		raw_spin_unlock_irq(&desc->lock);
>>  	}
>> +	console_unlock();
>>  	mutex_unlock(&probing_active);
>>  
>>  	return mask & val;
>> @@ -174,6 +182,7 @@ int probe_irq_off(unsigned long val)
>>  		}
>>  		raw_spin_unlock_irq(&desc->lock);
>>  	}
>> +	console_unlock();
>>  	mutex_unlock(&probing_active);
>>  
>>  	if (nr_of_irqs > 1)
>> -- 
>> 2.5.0
>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-30  1:41           ` Taichi Kageyama
@ 2015-07-30 10:12             ` Thomas Gleixner
  2015-07-30 13:43               ` Peter Hurley
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-07-30 10:12 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: Peter Hurley, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi, Peter Zijlstra

On Thu, 30 Jul 2015, Taichi Kageyama wrote:
> On 2015/07/29 22:35, Thomas Gleixner wrote:
> I know your code is sample (console.h is required), 
> but it is conflict with [patch v2 1/3].
> I think serial8250_console_write should not touch ctrl reg during autoconfig_irq. 
> To resolve the real problem, I think keeping only [patch v2 1/3] is best(opt1).
> What do you think?
> 
> opt1. keep [patch v2 1/3]
>     + Don't touch other legacy drivers using autoprobe.
>        Each driver can use console_lock to fix this problem if it happens.

No, we already know that autoprobing and console access can cause
this, so we fix it at the core code and be done with it.
 
Can you please test Peters patch and confirm that is solves it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-30 10:12             ` Thomas Gleixner
@ 2015-07-30 13:43               ` Peter Hurley
  2015-07-30 22:12                 ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2015-07-30 13:43 UTC (permalink / raw)
  To: Thomas Gleixner, Taichi Kageyama
  Cc: gregkh, jiang.liu, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi, Peter Zijlstra

On 07/30/2015 06:12 AM, Thomas Gleixner wrote:
> On Thu, 30 Jul 2015, Taichi Kageyama wrote:
>> On 2015/07/29 22:35, Thomas Gleixner wrote:
>> I know your code is sample (console.h is required), 
>> but it is conflict with [patch v2 1/3].
>> I think serial8250_console_write should not touch ctrl reg during autoconfig_irq. 
>> To resolve the real problem, I think keeping only [patch v2 1/3] is best(opt1).
>> What do you think?
>>
>> opt1. keep [patch v2 1/3]
>>     + Don't touch other legacy drivers using autoprobe.
>>        Each driver can use console_lock to fix this problem if it happens.
> 
> No, we already know that autoprobing and console access can cause
> this, so we fix it at the core code and be done with it.
>  
> Can you please test Peters patch and confirm that is solves it.

Honestly, I'm not too sure this is the way to go.

Messing around with irqsoff tracer for 30 mins turned up:
3.664ms in intel_unmap_page
  - iotlb flush, spinlock contention on iova_rbtree_lock
1.726ms in intel_map_page()
  - iommu core @ __alloc_and_insert_iova_range()
1.859ms in syslog_print_all()
  - which is holding the logbuf_lock so that's pretty bad anyway
387us in nouveau @ nv50_vm_flush()
  - gpu tlb flush

I have irqsoff trace reports for all of these if anyone is interested.

Looks like lockdep would need to be off as well because I saw but
failed to capture a save_trace() in the 300us range.

I think this is just the tip of the iceberg for irqsoff.

I'm not saying these don't need fixing as well, but there's no way
irq probe will ever be reliable with this approach.

Going back to the RFC idea of pinning the irq affinity to the cpu
actually doing the probing (which is in a known context), what about
that is broken on UP? Just the implementation or is it the fundamental
concept?

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-30 13:43               ` Peter Hurley
@ 2015-07-30 22:12                 ` Thomas Gleixner
  2015-07-30 23:43                   ` Peter Hurley
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2015-07-30 22:12 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Taichi Kageyama, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi, Peter Zijlstra

On Thu, 30 Jul 2015, Peter Hurley wrote:
> Honestly, I'm not too sure this is the way to go.
> 
> Messing around with irqsoff tracer for 30 mins turned up:
> 3.664ms in intel_unmap_page
>   - iotlb flush, spinlock contention on iova_rbtree_lock
> 1.726ms in intel_map_page()
>   - iommu core @ __alloc_and_insert_iova_range()
> 1.859ms in syslog_print_all()
>   - which is holding the logbuf_lock so that's pretty bad anyway
> 387us in nouveau @ nv50_vm_flush()
>   - gpu tlb flush
> 
> I have irqsoff trace reports for all of these if anyone is interested.
> 
> Looks like lockdep would need to be off as well because I saw but
> failed to capture a save_trace() in the 300us range.
> 
> I think this is just the tip of the iceberg for irqsoff.

I agree.
 
> I'm not saying these don't need fixing as well, but there's no way
> irq probe will ever be reliable with this approach.

irq probe is a known to be unreliable heuristic endavour anyway and it
cannot ever become truly reliable, except you put a gazillion of
restrictions to the system state on it.

> Going back to the RFC idea of pinning the irq affinity to the cpu
> actually doing the probing (which is in a known context), what about
> that is broken on UP? Just the implementation or is it the fundamental
> concept?

First of all, there is no guarantee that you can affine these
interrupts at all. We have interrupt controllers which cannot set the
affinity and they deliver to cpus in a round robin scheme or whatever
hardware designers thought would be clever.

Second, what prevents the following scenario on UP or the affine core:

	probe_start()
	interrupt
	  looong running handler (usb is an obvious candidate)
	     printk()

That will swallow your uart state and ruin detection as well.

So for the problem at hand, we really need to prevent that something
is fiddling with the uart in the first place and the most obvious way
is to serialize against printk.

We can debate whether the autoprobe code is the right place or not, we
can actually stick it into the 8250 driver and be done with it
because:

If you look at the actual autoprobe users aside of 8250. That's really
all ancient ISA hardware and hardly interesting. So all we really care
about are the 8250 serial ports.

Now lets look at the 8250 serial ports. I just checked the random
collection of machines I have access to:

 In 100% of all cases ttyS0 is on irq4 and ttyS1 is on irq3

 All of the machines have even a correct PNP entry of the irq resource
 for the serial ports. And there is pretty old crap in that lot.

So the real question is: Why would we autoprobe in the first place?

Debian, Fedora, OpenSuse, SLES have:

   # CONFIG_SERIAL_8250_DETECT_IRQ is not set

and so do my kernels.

I just built one with that option enabled for the fun of it and it
still uses the PNP information. No autoprobing.

So why are you interested in that serial irq autoprobe crap at all?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-30 22:12                 ` Thomas Gleixner
@ 2015-07-30 23:43                   ` Peter Hurley
  2015-07-31  7:02                       ` Taichi Kageyama
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2015-07-30 23:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Taichi Kageyama, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi, Peter Zijlstra

On 07/30/2015 06:12 PM, Thomas Gleixner wrote:
> On Thu, 30 Jul 2015, Peter Hurley wrote:
>> Honestly, I'm not too sure this is the way to go.
>>
>> Messing around with irqsoff tracer for 30 mins turned up:
>> 3.664ms in intel_unmap_page
>>   - iotlb flush, spinlock contention on iova_rbtree_lock
>> 1.726ms in intel_map_page()
>>   - iommu core @ __alloc_and_insert_iova_range()
>> 1.859ms in syslog_print_all()
>>   - which is holding the logbuf_lock so that's pretty bad anyway
>> 387us in nouveau @ nv50_vm_flush()
>>   - gpu tlb flush
>>
>> I have irqsoff trace reports for all of these if anyone is interested.
>>
>> Looks like lockdep would need to be off as well because I saw but
>> failed to capture a save_trace() in the 300us range.
>>
>> I think this is just the tip of the iceberg for irqsoff.
> 
> I agree.
>  
>> I'm not saying these don't need fixing as well, but there's no way
>> irq probe will ever be reliable with this approach.
> 
> irq probe is a known to be unreliable heuristic endavour anyway and it
> cannot ever become truly reliable, except you put a gazillion of
> restrictions to the system state on it.

Yep, totally agree. As you note below, this functionality is completely
disabled on every known distro kernel.


>> Going back to the RFC idea of pinning the irq affinity to the cpu
>> actually doing the probing (which is in a known context), what about
>> that is broken on UP? Just the implementation or is it the fundamental
>> concept?
> 
> First of all, there is no guarantee that you can affine these
> interrupts at all. We have interrupt controllers which cannot set the
> affinity and they deliver to cpus in a round robin scheme or whatever
> hardware designers thought would be clever.
> 
> Second, what prevents the following scenario on UP or the affine core:
> 
> 	probe_start()
> 	interrupt
> 	  looong running handler (usb is an obvious candidate)
> 	     printk()
> 
> That will swallow your uart state and ruin detection as well.

Yeah, ok, so fundamentally broken concept to pin the irqs for probing.
Thanks for clarifying that.


> So for the problem at hand, we really need to prevent that something
> is fiddling with the uart in the first place and the most obvious way
> is to serialize against printk.

I'm ok with just the original patch 1 (which I reviewed some time ago).


> We can debate whether the autoprobe code is the right place or not, we
> can actually stick it into the 8250 driver and be done with it
> because:
> 
> If you look at the actual autoprobe users aside of 8250. That's really
> all ancient ISA hardware and hardly interesting. So all we really care
> about are the 8250 serial ports.
> 
> Now lets look at the 8250 serial ports. I just checked the random
> collection of machines I have access to:
> 
>  In 100% of all cases ttyS0 is on irq4 and ttyS1 is on irq3
> 
>  All of the machines have even a correct PNP entry of the irq resource
>  for the serial ports. And there is pretty old crap in that lot.
> 
> So the real question is: Why would we autoprobe in the first place?
> 
> Debian, Fedora, OpenSuse, SLES have:
> 
>    # CONFIG_SERIAL_8250_DETECT_IRQ is not set
> 
> and so do my kernels.
> 
> I just built one with that option enabled for the fun of it and it
> still uses the PNP information. No autoprobing.
> 
> So why are you interested in that serial irq autoprobe crap at all?

Agree, but I guess the hardware in question is non-PNP serial-over-LAN.
It's Taichi's hardware so he can be more specific. Also, this problem
would not apply to 8250 ports @ the ISA addresses (3f8,irq 4 & 2f8,irq 3)
because those are predefined on the platform.

Taichi's original proposal was to add module parameters for the serial
driver, which I am dead set against, having just struggled to deal with
ancient module parameters while splitting the 8250 driver.

I also noted in reviewing that proposal that user-space tools (setserial)
can reset the irq to a known value after driver load. Ubuntu, for one,
runs setserial as part of boot (to restore serial hardware to known-working
configuration).

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-30 23:43                   ` Peter Hurley
@ 2015-07-31  7:02                       ` Taichi Kageyama
  0 siblings, 0 replies; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-31  7:02 UTC (permalink / raw)
  To: Peter Hurley, Thomas Gleixner
  Cc: gregkh, jiang.liu, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi, Peter Zijlstra

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5514 bytes --]

On 2015/07/31 8:43, Peter Hurley wrote:
> On 07/30/2015 06:12 PM, Thomas Gleixner wrote:
>> On Thu, 30 Jul 2015, Peter Hurley wrote:
>>> Honestly, I'm not too sure this is the way to go.
>>>
>>> Messing around with irqsoff tracer for 30 mins turned up:
>>> 3.664ms in intel_unmap_page
>>>   - iotlb flush, spinlock contention on iova_rbtree_lock
>>> 1.726ms in intel_map_page()
>>>   - iommu core @ __alloc_and_insert_iova_range()
>>> 1.859ms in syslog_print_all()
>>>   - which is holding the logbuf_lock so that's pretty bad anyway
>>> 387us in nouveau @ nv50_vm_flush()
>>>   - gpu tlb flush
>>>
>>> I have irqsoff trace reports for all of these if anyone is interested.
>>>
>>> Looks like lockdep would need to be off as well because I saw but
>>> failed to capture a save_trace() in the 300us range.
>>>
>>> I think this is just the tip of the iceberg for irqsoff.
>>
>> I agree.
>>  
>>> I'm not saying these don't need fixing as well, but there's no way
>>> irq probe will ever be reliable with this approach.
>>
>> irq probe is a known to be unreliable heuristic endavour anyway and it
>> cannot ever become truly reliable, except you put a gazillion of
>> restrictions to the system state on it.
> 
> Yep, totally agree. As you note below, this functionality is completely
> disabled on every known distro kernel.
> 
> 
>>> Going back to the RFC idea of pinning the irq affinity to the cpu
>>> actually doing the probing (which is in a known context), what about
>>> that is broken on UP? Just the implementation or is it the fundamental
>>> concept?
>>
>> First of all, there is no guarantee that you can affine these
>> interrupts at all. We have interrupt controllers which cannot set the
>> affinity and they deliver to cpus in a round robin scheme or whatever
>> hardware designers thought would be clever.
>>
>> Second, what prevents the following scenario on UP or the affine core:
>>
>> 	probe_start()
>> 	interrupt
>> 	  looong running handler (usb is an obvious candidate)
>> 	     printk()
>>
>> That will swallow your uart state and ruin detection as well.


Yes, the RFC idea was just for reducing the risk of failure, not perfect.
I know it depends on the interrupt controllers,
but I didn't know the error occurred after others set affinity.
I understand there is no guarantee at all in my RFC case.

> Yeah, ok, so fundamentally broken concept to pin the irqs for probing.
> Thanks for clarifying that.
> 
> 
>> So for the problem at hand, we really need to prevent that something
>> is fiddling with the uart in the first place and the most obvious way
>> is to serialize against printk.
> 
> I'm ok with just the original patch 1 (which I reviewed some time ago).
> 
> 
>> We can debate whether the autoprobe code is the right place or not, we
>> can actually stick it into the 8250 driver and be done with it
>> because:
>>
>> If you look at the actual autoprobe users aside of 8250. That's really
>> all ancient ISA hardware and hardly interesting. So all we really care
>> about are the 8250 serial ports.

In this case, I think [patch v2 1/3] is enough.
console_lock is required in autoconfig_irq() to resolve other race conditions
before calling probe_irq_on().

>>
>> Now lets look at the 8250 serial ports. I just checked the random
>> collection of machines I have access to:
>>
>>  In 100% of all cases ttyS0 is on irq4 and ttyS1 is on irq3
>>
>>  All of the machines have even a correct PNP entry of the irq resource
>>  for the serial ports. And there is pretty old crap in that lot.
>>
>> So the real question is: Why would we autoprobe in the first place?
>>
>> Debian, Fedora, OpenSuse, SLES have:
>>
>>    # CONFIG_SERIAL_8250_DETECT_IRQ is not set
>>
>> and so do my kernels.
>> I just built one with that option enabled for the fun of it and it
>> still uses the PNP information. No autoprobing.
>>
>> So why are you interested in that serial irq autoprobe crap at all?
> 

Because RHEL6 uses CONFIG_SERIAL_8250_DETECT_IRQ=y unfortunately.

> Agree, but I guess the hardware in question is non-PNP serial-over-LAN.
> It's Taichi's hardware so he can be more specific. Also, this problem
> would not apply to 8250 ports @ the ISA addresses (3f8,irq 4 & 2f8,irq 3)
> because those are predefined on the platform.
> 
> Taichi's original proposal was to add module parameters for the serial
> driver, which I am dead set against, having just struggled to deal with
> ancient module parameters while splitting the 8250 driver.

As you guys mentioned, irq3 and 4 for console ports are predefined basically.
That's why I introduced original patch to skip autoconfig_irq only for console
although the parameter thing was not good idea.
I'm using one of very large HP platform.
It expects ISA addresses for SOL, but doesn't provides PNP define like PNP0501.
So autoconfig_irq() can overwrites valid irq with invalid one...


> I also noted in reviewing that proposal that user-space tools (setserial)
> can reset the irq to a known value after driver load. Ubuntu, for one,
> runs setserial as part of boot (to restore serial hardware to known-working
> configuration).

I already got the following console solutions after discussion with Peter.
 - Force set irq before any APs and getty open /dev/console. 
   Users have to know valid irq#.
 - Fix FW to define PNP
 - CONFIG_SERIAL_8250_DETECT_IRQ=n


Thanks,
Taichi

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
@ 2015-07-31  7:02                       ` Taichi Kageyama
  0 siblings, 0 replies; 18+ messages in thread
From: Taichi Kageyama @ 2015-07-31  7:02 UTC (permalink / raw)
  To: Peter Hurley, Thomas Gleixner
  Cc: gregkh, jiang.liu, linux-serial, linux-kernel, jslaby, prarit,
	Naoya Horiguchi, Peter Zijlstra

On 2015/07/31 8:43, Peter Hurley wrote:
> On 07/30/2015 06:12 PM, Thomas Gleixner wrote:
>> On Thu, 30 Jul 2015, Peter Hurley wrote:
>>> Honestly, I'm not too sure this is the way to go.
>>>
>>> Messing around with irqsoff tracer for 30 mins turned up:
>>> 3.664ms in intel_unmap_page
>>>   - iotlb flush, spinlock contention on iova_rbtree_lock
>>> 1.726ms in intel_map_page()
>>>   - iommu core @ __alloc_and_insert_iova_range()
>>> 1.859ms in syslog_print_all()
>>>   - which is holding the logbuf_lock so that's pretty bad anyway
>>> 387us in nouveau @ nv50_vm_flush()
>>>   - gpu tlb flush
>>>
>>> I have irqsoff trace reports for all of these if anyone is interested.
>>>
>>> Looks like lockdep would need to be off as well because I saw but
>>> failed to capture a save_trace() in the 300us range.
>>>
>>> I think this is just the tip of the iceberg for irqsoff.
>>
>> I agree.
>>  
>>> I'm not saying these don't need fixing as well, but there's no way
>>> irq probe will ever be reliable with this approach.
>>
>> irq probe is a known to be unreliable heuristic endavour anyway and it
>> cannot ever become truly reliable, except you put a gazillion of
>> restrictions to the system state on it.
> 
> Yep, totally agree. As you note below, this functionality is completely
> disabled on every known distro kernel.
> 
> 
>>> Going back to the RFC idea of pinning the irq affinity to the cpu
>>> actually doing the probing (which is in a known context), what about
>>> that is broken on UP? Just the implementation or is it the fundamental
>>> concept?
>>
>> First of all, there is no guarantee that you can affine these
>> interrupts at all. We have interrupt controllers which cannot set the
>> affinity and they deliver to cpus in a round robin scheme or whatever
>> hardware designers thought would be clever.
>>
>> Second, what prevents the following scenario on UP or the affine core:
>>
>> 	probe_start()
>> 	interrupt
>> 	  looong running handler (usb is an obvious candidate)
>> 	     printk()
>>
>> That will swallow your uart state and ruin detection as well.


Yes, the RFC idea was just for reducing the risk of failure, not perfect.
I know it depends on the interrupt controllers,
but I didn't know the error occurred after others set affinity.
I understand there is no guarantee at all in my RFC case.

> Yeah, ok, so fundamentally broken concept to pin the irqs for probing.
> Thanks for clarifying that.
> 
> 
>> So for the problem at hand, we really need to prevent that something
>> is fiddling with the uart in the first place and the most obvious way
>> is to serialize against printk.
> 
> I'm ok with just the original patch 1 (which I reviewed some time ago).
> 
> 
>> We can debate whether the autoprobe code is the right place or not, we
>> can actually stick it into the 8250 driver and be done with it
>> because:
>>
>> If you look at the actual autoprobe users aside of 8250. That's really
>> all ancient ISA hardware and hardly interesting. So all we really care
>> about are the 8250 serial ports.

In this case, I think [patch v2 1/3] is enough.
console_lock is required in autoconfig_irq() to resolve other race conditions
before calling probe_irq_on().

>>
>> Now lets look at the 8250 serial ports. I just checked the random
>> collection of machines I have access to:
>>
>>  In 100% of all cases ttyS0 is on irq4 and ttyS1 is on irq3
>>
>>  All of the machines have even a correct PNP entry of the irq resource
>>  for the serial ports. And there is pretty old crap in that lot.
>>
>> So the real question is: Why would we autoprobe in the first place?
>>
>> Debian, Fedora, OpenSuse, SLES have:
>>
>>    # CONFIG_SERIAL_8250_DETECT_IRQ is not set
>>
>> and so do my kernels.
>> I just built one with that option enabled for the fun of it and it
>> still uses the PNP information. No autoprobing.
>>
>> So why are you interested in that serial irq autoprobe crap at all?
> 

Because RHEL6 uses CONFIG_SERIAL_8250_DETECT_IRQ=y unfortunately.

> Agree, but I guess the hardware in question is non-PNP serial-over-LAN.
> It's Taichi's hardware so he can be more specific. Also, this problem
> would not apply to 8250 ports @ the ISA addresses (3f8,irq 4 & 2f8,irq 3)
> because those are predefined on the platform.
> 
> Taichi's original proposal was to add module parameters for the serial
> driver, which I am dead set against, having just struggled to deal with
> ancient module parameters while splitting the 8250 driver.

As you guys mentioned, irq3 and 4 for console ports are predefined basically.
That's why I introduced original patch to skip autoconfig_irq only for console
although the parameter thing was not good idea.
I'm using one of very large HP platform.
It expects ISA addresses for SOL, but doesn't provides PNP define like PNP0501.
So autoconfig_irq() can overwrites valid irq with invalid one...


> I also noted in reviewing that proposal that user-space tools (setserial)
> can reset the irq to a known value after driver load. Ubuntu, for one,
> runs setserial as part of boot (to restore serial hardware to known-working
> configuration).

I already got the following console solutions after discussion with Peter.
 - Force set irq before any APs and getty open /dev/console. 
   Users have to know valid irq#.
 - Fix FW to define PNP
 - CONFIG_SERIAL_8250_DETECT_IRQ=n


Thanks,
Taichi


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console
  2015-07-31  7:02                       ` Taichi Kageyama
  (?)
@ 2015-08-02  9:52                       ` Thomas Gleixner
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2015-08-02  9:52 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: Peter Hurley, gregkh, jiang.liu, linux-serial, linux-kernel,
	jslaby, prarit, Naoya Horiguchi, Peter Zijlstra

On Fri, 31 Jul 2015, Taichi Kageyama wrote:
> On 2015/07/31 8:43, Peter Hurley wrote:
> >> If you look at the actual autoprobe users aside of 8250. That's really
> >> all ancient ISA hardware and hardly interesting. So all we really care
> >> about are the 8250 serial ports.
> 
> In this case, I think [patch v2 1/3] is enough.
> console_lock is required in autoconfig_irq() to resolve other race conditions
> before calling probe_irq_on().

Fine with me.
 
> >> So why are you interested in that serial irq autoprobe crap at all?
> > 
> Because RHEL6 uses CONFIG_SERIAL_8250_DETECT_IRQ=y unfortunately.

Doh!
 
> I already got the following console solutions after discussion with Peter.
>  - Force set irq before any APs and getty open /dev/console. 
>    Users have to know valid irq#.
>  - Fix FW to define PNP
>  - CONFIG_SERIAL_8250_DETECT_IRQ=n

Ack!

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions
  2015-07-29  8:12 ` [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
@ 2015-08-06 12:53   ` Peter Hurley
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Hurley @ 2015-08-06 12:53 UTC (permalink / raw)
  To: Taichi Kageyama, gregkh, tglx
  Cc: jiang.liu, linux-serial, linux-kernel, jslaby, prarit, Naoya Horiguchi

On 07/29/2015 04:12 AM, Taichi Kageyama wrote:
> The following race conditions can happen when a serial port is used
> as console.
> 
> Case1: CPU_B is used to detect an interrupt from a serial port,
>        but it can have interrupts disabled during the waiting time.
> Case2: CPU_B clears UART_IER just after CPU_A sets UART_IER and then
>        a serial port may not make an interrupt.
> Case3: CPU_A sets UART_IER just after CPU_B clears UART_IER.
>        This is an unexpected behavior for serial8250_console_write().
> 
> CPU_A [autoconfig_irq]      |  CPU_B [serial8250_console_write]
> ----------------------------|---------------------------------------
>                             |
> probe_irq_on()              |  spin_lock_irqsave(&port->lock,)
> serial_outp(,UART_IER,0x0f) |  serial_out(,UART_IER,0)
> udelay(20);                 |  uart_console_write()
> probe_irq_off()             |
>                             |  spin_unlock_irqrestore(&port->lock,)
> 
> Case1 and 2 can make autoconfig_irq() failed.
> In these cases, the console doesn't work in interrupt mode and
> "input overrun" (which can make operation mistakes) can happen
> on some systems. Especially in the Case1, It is known that the
> problem happens with high rate every boot once it occurs
> because the boot sequence is always almost same.
> 
> port mutex makes sure that the autoconfig operation is exclusive of
> any other concurrent HW access except by the console operation.
> console lock is required in autoconfig_irq().
> 
> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Reviewed-by: Peter Hurley <peter@hurleysoftware.com>

Please resend this one patch to Greg as a non-RFC patch.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-08-06 12:53 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-29  8:12 [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
2015-07-29  8:12 ` [RFC PATCH v2 1/3] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
2015-08-06 12:53   ` Peter Hurley
2015-07-29  8:12 ` [RFC PATCH v2 2/3] genirq: Add a function to set irq affinity of candidate IRQs Taichi Kageyama
2015-07-29  8:13 ` [RFC PATCH v2 3/3] serial: 8250: Fix autoconfig_irq() to reduce the risk of failure Taichi Kageyama
2015-07-29 10:32 ` [RFC PATCH v2 0/3] genirq, serial: 8250: Workaround to avoid irq=0 for console Thomas Gleixner
2015-07-29 11:51   ` Peter Hurley
2015-07-29 11:53     ` Thomas Gleixner
2015-07-29 13:17       ` Peter Hurley
2015-07-29 13:35         ` Thomas Gleixner
2015-07-30  1:41           ` Taichi Kageyama
2015-07-30 10:12             ` Thomas Gleixner
2015-07-30 13:43               ` Peter Hurley
2015-07-30 22:12                 ` Thomas Gleixner
2015-07-30 23:43                   ` Peter Hurley
2015-07-31  7:02                     ` Taichi Kageyama
2015-07-31  7:02                       ` Taichi Kageyama
2015-08-02  9:52                       ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.