linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console
@ 2015-06-05  9:55 Taichi Kageyama
  2015-06-05  9:57 ` [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Taichi Kageyama @ 2015-06-05  9:55 UTC (permalink / raw)
  To: gregkh; +Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2161 bytes --]

This patch set provides a workaround to avoid the following problem.
It's based on Linux 4.1-rc3 mainstream kernel.
I've tested this patch set on x86-64 machine and KVM.

Problem
--------------------------
There're cases where autoconfig_irq() fails during boot.
In these cases, the console doesn't work in interrupt mode,
the mode cannot be changed anymore, and "input overrun"
(which can make operation mistakes) happens easily.
This problem happens with high rate every boot once it occurs
because the boot sequence is always almost same.
I saw the original problem on RHEL6.6.

Fix
--------------------------
serial: 8250: Fix autoconfig_irq() to avoid race conditions
  Fix one of autoconfig_irq() failure case.
serial: 8250: Allow to skip autoconfig_irq() for a console
  Provide a workaround to avoid other autoconfig_irq() failure cases.

Conditions of Reproduction
--------------------------
- Build with CONFIG_SERIAL_8250_DETECT_IRQ.
- Need non-PnP console serial or PnP console with no CONFIG_SERIAL_8250_PNP
- Kick printk() repeatedly on the CPU which can handle an interrupt
   from a console serial during autoconfig_irq(). The CPU is basically cpu0.
- Disable the interrupt of the CPU for longer during autoconfig_irq().

Note
--------------------------
Ideally, I think autoconfig_irq() should be fixed completely,
but it's hard from the following points
as long as auto_irq algorithm is used.
  - How long it should wait for an interrrupt
  - How to assure the interrupt of the CPU enabled during auto_irq
  - How to know which CPU can handle an interrupt from a serial.

Do you have any other idea?
In my opinion, providing a workaround is better than
applying big changes to the old function for legacy devices.

Taichi Kageyama (2):
   serial: 8250: Fix autoconfig_irq() to avoid race conditions
   serial: 8250: Allow to skip autoconfig_irq() for a console

  drivers/tty/serial/8250/8250_core.c |   23 +++++++++++++++++++++++
  1 files changed, 23 insertions(+), 0 deletions(-)
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions
  2015-06-05  9:55 [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
@ 2015-06-05  9:57 ` Taichi Kageyama
  2015-07-08 23:35   ` Peter Hurley
  2015-07-23 22:32   ` gregkh
  2015-06-05 10:03 ` [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console Taichi Kageyama
  2015-07-07  8:13 ` [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
  2 siblings, 2 replies; 21+ messages in thread
From: Taichi Kageyama @ 2015-06-05  9:57 UTC (permalink / raw)
  To: gregkh; +Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2373 bytes --]

The following race conditions can happen if a serial is used as console.
  Case1. CPU_B handles an interrupt from a serial
    autoconfig_irq() fails whether the interrupt is raised or not
    if CPU_B is disabled to handle interrupts for longer than it expects.
  Case2. CPU_B clears UART_IER just after CPU_A sets UART_IER
    A serial may not make an interrupt.
    autoconfig_irq() can fail if the interrupt is not raised.
  Case3. CPU_A sets UART_IER just after CPU_B clears UART_IER
    This is an unexpected behavior for uart_console_write().

  CPU_A [autoconfig_irq]      CPU_B [serial8250_console_write]
  -----------------------------------------------------------------
  probe_irq_on()              spin_lock_irqsave(&port->lock,)
  serial_outp(,UART_IER,0x0f) serial_out(,UART_IER,0)
  udelay(20);                 uart_console_write()
  probe_irq_off()
                              spin_unlock_irqrestore(&port->lock,)
  -----------------------------------------------------------------

If autoconfig_irq() fails, the console doesn't work in interrupt mode,
the mode cannot be changed anymore, and "input overrun"
(which can make operation mistakes) happens easily.
This problem happens with high rate every boot once it occurs
because the boot sequence is always almost same.

Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
  drivers/tty/serial/8250/8250_core.c |    6 ++++++
  1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 4506e40..6bf31f2 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -1309,6 +1309,9 @@ static void autoconfig_irq(struct uart_8250_port *up)
  	unsigned long irqs;
  	int irq;

+	if (uart_console(port))
+		console_lock();
+
  	if (port->flags & UPF_FOURPORT) {
  		ICP = (port->iobase & 0xfe0) | 0x1f;
  		save_ICP = inb_p(ICP);
@@ -1347,6 +1350,9 @@ static void autoconfig_irq(struct uart_8250_port *up)
  	if (port->flags & UPF_FOURPORT)
  		outb_p(save_ICP, ICP);

+	if (uart_console(port))
+		console_unlock();
+
  	port->irq = (irq > 0) ? irq : 0;
  }

-- 
1.7.1

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-06-05  9:55 [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
  2015-06-05  9:57 ` [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
@ 2015-06-05 10:03 ` Taichi Kageyama
  2015-07-08 11:55   ` Peter Hurley
  2015-07-07  8:13 ` [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
  2 siblings, 1 reply; 21+ messages in thread
From: Taichi Kageyama @ 2015-06-05 10:03 UTC (permalink / raw)
  To: gregkh; +Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4161 bytes --]

This patch provides a new parameter as a workaround of the following
problem. It allows us to skip autoconfig_irq() and to use a well-known irq
number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.

There're cases where autoconfig_irq() fails during boot.
In these cases, the console doesn't work in interrupt mode,
the mode cannot be changed anymore, and "input overrun"
(which can make operation mistakes) happens easily.
This problem happens with high rate every boot once it occurs
because the boot sequence is always almost same.

autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
during the waiting time, but there're some cases where the CPU cannot
handle the interrupt for longer than the time. It completely depends on
how other functions work on the CPU. Ideally, autoconfig_irq() should be
fixed to control these cases but as long as auto_irq algorithm is used,
it's hard to know which CPU can handle an interrupt from a serial and
to assure the interrupt of the CPU is enabled during auto_irq.
Meanwhile for legacy consoles, they actually don't require auto_irq
because they basically use well-known irq number.
For non-console serials, this workaround is not required
because setserial command can kick autoconfig_irq() again for them.

Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
  drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
  1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 6bf31f2..60fda28 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)

  static unsigned int skip_txen_test; /* force skip of txen test at init time */

+static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
+
  /*
   * Debugging.
   */
@@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
  		if (skip_txen_test)
  			up->port.flags |= UPF_NO_TXEN_TEST;

+		if (uart_console(&up->port) && skip_cons_autoirq)
+			up->port.flags &= ~UPF_AUTO_IRQ;
+
  		uart_add_one_port(drv, &up->port);
  	}
  }
@@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
  		if (skip_txen_test)
  			uart->port.flags |= UPF_NO_TXEN_TEST;

+		if (uart_console(&uart->port) && skip_cons_autoirq)
+			uart->port.flags &= ~UPF_AUTO_IRQ;
+
  		if (up->port.flags & UPF_FIXED_TYPE)
  			uart->port.type = up->port.type;

@@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
  		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
  		if (skip_txen_test)
  			uart->port.flags |= UPF_NO_TXEN_TEST;
+
+		if (uart_console(&uart->port) && skip_cons_autoirq)
+			uart->port.flags &= ~UPF_AUTO_IRQ;
+
  		uart->port.type = PORT_UNKNOWN;
  		uart->port.dev = &serial8250_isa_devs->dev;
  		uart->capabilities = 0;
@@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
  module_param(skip_txen_test, uint, 0644);
  MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");

+module_param(skip_cons_autoirq, uint, 0644);
+MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
+
  #ifdef CONFIG_SERIAL_8250_RSA
  module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
  MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
@@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
  	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
  	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
  	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
+	module_param_cb(skip_cons_autoirq, &param_ops_uint,
+		&skip_cons_autoirq, 0644);
  #ifdef CONFIG_SERIAL_8250_RSA
  	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
  		&param_array_ops, .arr = &__param_arr_probe_rsa,
-- 
1.7.1

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console
  2015-06-05  9:55 [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
  2015-06-05  9:57 ` [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
  2015-06-05 10:03 ` [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console Taichi Kageyama
@ 2015-07-07  8:13 ` Taichi Kageyama
  2015-07-07 22:42   ` gregkh
  2 siblings, 1 reply; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-07  8:13 UTC (permalink / raw)
  To: gregkh; +Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2346 bytes --]

On 2015/06/05 18:55, Taichi Kageyama wrote:
> This patch set provides a workaround to avoid the following problem.
> It's based on Linux 4.1-rc3 mainstream kernel.
> I've tested this patch set on x86-64 machine and KVM.
>
> Problem
> --------------------------
> There're cases where autoconfig_irq() fails during boot.
> In these cases, the console doesn't work in interrupt mode,
> the mode cannot be changed anymore, and "input overrun"
> (which can make operation mistakes) happens easily.
> This problem happens with high rate every boot once it occurs
> because the boot sequence is always almost same.
> I saw the original problem on RHEL6.6.
>
> Fix
> --------------------------
> serial: 8250: Fix autoconfig_irq() to avoid race conditions
>   Fix one of autoconfig_irq() failure case.
> serial: 8250: Allow to skip autoconfig_irq() for a console
>   Provide a workaround to avoid other autoconfig_irq() failure cases.
>
> Conditions of Reproduction
> --------------------------
> - Build with CONFIG_SERIAL_8250_DETECT_IRQ.
> - Need non-PnP console serial or PnP console with no CONFIG_SERIAL_8250_PNP
> - Kick printk() repeatedly on the CPU which can handle an interrupt
>    from a console serial during autoconfig_irq(). The CPU is basically cpu0.
> - Disable the interrupt of the CPU for longer during autoconfig_irq().
>
> Note
> --------------------------
> Ideally, I think autoconfig_irq() should be fixed completely,
> but it's hard from the following points
> as long as auto_irq algorithm is used.
>   - How long it should wait for an interrrupt
>   - How to assure the interrupt of the CPU enabled during auto_irq
>   - How to know which CPU can handle an interrupt from a serial.
>
> Do you have any other idea?
> In my opinion, providing a workaround is better than
> applying big changes to the old function for legacy devices.
>
> Taichi Kageyama (2):
>    serial: 8250: Fix autoconfig_irq() to avoid race conditions
>    serial: 8250: Allow to skip autoconfig_irq() for a console
>
>   drivers/tty/serial/8250/8250_core.c |   23 +++++++++++++++++++++++
>   1 files changed, 23 insertions(+), 0 deletions(-)
>

Could you review this patch set?


Thanks,
Taichiÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console
  2015-07-07  8:13 ` [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
@ 2015-07-07 22:42   ` gregkh
  0 siblings, 0 replies; 21+ messages in thread
From: gregkh @ 2015-07-07 22:42 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

On Tue, Jul 07, 2015 at 08:13:59AM +0000, Taichi Kageyama wrote:
> On 2015/06/05 18:55, Taichi Kageyama wrote:
> > This patch set provides a workaround to avoid the following problem.
> > It's based on Linux 4.1-rc3 mainstream kernel.
> > I've tested this patch set on x86-64 machine and KVM.
> >
> > Problem
> > --------------------------
> > There're cases where autoconfig_irq() fails during boot.
> > In these cases, the console doesn't work in interrupt mode,
> > the mode cannot be changed anymore, and "input overrun"
> > (which can make operation mistakes) happens easily.
> > This problem happens with high rate every boot once it occurs
> > because the boot sequence is always almost same.
> > I saw the original problem on RHEL6.6.
> >
> > Fix
> > --------------------------
> > serial: 8250: Fix autoconfig_irq() to avoid race conditions
> >   Fix one of autoconfig_irq() failure case.
> > serial: 8250: Allow to skip autoconfig_irq() for a console
> >   Provide a workaround to avoid other autoconfig_irq() failure cases.
> >
> > Conditions of Reproduction
> > --------------------------
> > - Build with CONFIG_SERIAL_8250_DETECT_IRQ.
> > - Need non-PnP console serial or PnP console with no CONFIG_SERIAL_8250_PNP
> > - Kick printk() repeatedly on the CPU which can handle an interrupt
> >    from a console serial during autoconfig_irq(). The CPU is basically cpu0.
> > - Disable the interrupt of the CPU for longer during autoconfig_irq().
> >
> > Note
> > --------------------------
> > Ideally, I think autoconfig_irq() should be fixed completely,
> > but it's hard from the following points
> > as long as auto_irq algorithm is used.
> >   - How long it should wait for an interrrupt
> >   - How to assure the interrupt of the CPU enabled during auto_irq
> >   - How to know which CPU can handle an interrupt from a serial.
> >
> > Do you have any other idea?
> > In my opinion, providing a workaround is better than
> > applying big changes to the old function for legacy devices.
> >
> > Taichi Kageyama (2):
> >    serial: 8250: Fix autoconfig_irq() to avoid race conditions
> >    serial: 8250: Allow to skip autoconfig_irq() for a console
> >
> >   drivers/tty/serial/8250/8250_core.c |   23 +++++++++++++++++++++++
> >   1 files changed, 23 insertions(+), 0 deletions(-)
> >
> 
> Could you review this patch set?

Relax, it's in my queue...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-06-05 10:03 ` [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console Taichi Kageyama
@ 2015-07-08 11:55   ` Peter Hurley
  2015-07-08 12:53     ` Prarit Bhargava
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Hurley @ 2015-07-08 11:55 UTC (permalink / raw)
  To: Taichi Kageyama, gregkh
  Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

Hi Taichi,

On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
> This patch provides a new parameter as a workaround of the following
> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
> 
> There're cases where autoconfig_irq() fails during boot.
> In these cases, the console doesn't work in interrupt mode,
> the mode cannot be changed anymore, and "input overrun"
> (which can make operation mistakes) happens easily.
> This problem happens with high rate every boot once it occurs
> because the boot sequence is always almost same.
> 
> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
> during the waiting time, but there're some cases where the CPU cannot
> handle the interrupt for longer than the time. It completely depends on
> how other functions work on the CPU. Ideally, autoconfig_irq() should be
> fixed

It completely depends on how long some other driver has interrupts disabled,
which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
need fixing.

A typical cause of this behavior is printk spew from overly-instrumented
debugging of some driver (trace_printk is better suited to heavy instrumentation).

Regards,
Peter Hurley


> to control these cases but as long as auto_irq algorithm is used,
> it's hard to know which CPU can handle an interrupt from a serial and
> to assure the interrupt of the CPU is enabled during auto_irq.
> Meanwhile for legacy consoles, they actually don't require auto_irq
> because they basically use well-known irq number.
> For non-console serials, this workaround is not required
> because setserial command can kick autoconfig_irq() again for them.
> 
> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> ---
>   drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>   1 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
> index 6bf31f2..60fda28 100644
> --- a/drivers/tty/serial/8250/8250_core.c
> +++ b/drivers/tty/serial/8250/8250_core.c
> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
> 
>   static unsigned int skip_txen_test; /* force skip of txen test at init time */
> 
> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
> +
>   /*
>    * Debugging.
>    */
> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>   		if (skip_txen_test)
>   			up->port.flags |= UPF_NO_TXEN_TEST;
> 
> +		if (uart_console(&up->port) && skip_cons_autoirq)
> +			up->port.flags &= ~UPF_AUTO_IRQ;
> +
>   		uart_add_one_port(drv, &up->port);
>   	}
>   }
> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>   		if (skip_txen_test)
>   			uart->port.flags |= UPF_NO_TXEN_TEST;
> 
> +		if (uart_console(&uart->port) && skip_cons_autoirq)
> +			uart->port.flags &= ~UPF_AUTO_IRQ;
> +
>   		if (up->port.flags & UPF_FIXED_TYPE)
>   			uart->port.type = up->port.type;
> 
> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>   		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>   		if (skip_txen_test)
>   			uart->port.flags |= UPF_NO_TXEN_TEST;
> +
> +		if (uart_console(&uart->port) && skip_cons_autoirq)
> +			uart->port.flags &= ~UPF_AUTO_IRQ;
> +
>   		uart->port.type = PORT_UNKNOWN;
>   		uart->port.dev = &serial8250_isa_devs->dev;
>   		uart->capabilities = 0;
> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>   module_param(skip_txen_test, uint, 0644);
>   MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
> 
> +module_param(skip_cons_autoirq, uint, 0644);
> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
> +
>   #ifdef CONFIG_SERIAL_8250_RSA
>   module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>   MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>   	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>   	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>   	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
> +		&skip_cons_autoirq, 0644);
>   #ifdef CONFIG_SERIAL_8250_RSA
>   	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>   		&param_array_ops, .arr = &__param_arr_probe_rsa,
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-08 11:55   ` Peter Hurley
@ 2015-07-08 12:53     ` Prarit Bhargava
  2015-07-08 13:51       ` Peter Hurley
  0 siblings, 1 reply; 21+ messages in thread
From: Prarit Bhargava @ 2015-07-08 12:53 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Taichi Kageyama, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi



On 07/08/2015 07:55 AM, Peter Hurley wrote:
> Hi Taichi,
> 
> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>> This patch provides a new parameter as a workaround of the following
>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>
>> There're cases where autoconfig_irq() fails during boot.
>> In these cases, the console doesn't work in interrupt mode,
>> the mode cannot be changed anymore, and "input overrun"
>> (which can make operation mistakes) happens easily.
>> This problem happens with high rate every boot once it occurs
>> because the boot sequence is always almost same.
>>
>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>> during the waiting time, but there're some cases where the CPU cannot
>> handle the interrupt for longer than the time. It completely depends on
>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>> fixed
> 
> It completely depends on how long some other driver has interrupts disabled,
> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
> need fixing.
> 
> A typical cause of this behavior is printk spew from overly-instrumented
> debugging of some driver (trace_printk is better suited to heavy instrumentation).
> 

Peter, I understand what you're saying, however the problem is that this is the
serial driver which is one of, if not _the_ main debug output mechanism in the
kernel.

I'm not sure I agree with doing this patch, but perhaps an easier approach would
be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
a previously known good value?  That way the issue of the broken driver can be
debugged.

P.

> Regards,
> Peter Hurley
> 
> 
>> to control these cases but as long as auto_irq algorithm is used,
>> it's hard to know which CPU can handle an interrupt from a serial and
>> to assure the interrupt of the CPU is enabled during auto_irq.
>> Meanwhile for legacy consoles, they actually don't require auto_irq
>> because they basically use well-known irq number.
>> For non-console serials, this workaround is not required
>> because setserial command can kick autoconfig_irq() again for them.
>>
>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> ---
>>   drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>>   1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
>> index 6bf31f2..60fda28 100644
>> --- a/drivers/tty/serial/8250/8250_core.c
>> +++ b/drivers/tty/serial/8250/8250_core.c
>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>
>>   static unsigned int skip_txen_test; /* force skip of txen test at init time */
>>
>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>> +
>>   /*
>>    * Debugging.
>>    */
>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>>   		if (skip_txen_test)
>>   			up->port.flags |= UPF_NO_TXEN_TEST;
>>
>> +		if (uart_console(&up->port) && skip_cons_autoirq)
>> +			up->port.flags &= ~UPF_AUTO_IRQ;
>> +
>>   		uart_add_one_port(drv, &up->port);
>>   	}
>>   }
>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>>   		if (skip_txen_test)
>>   			uart->port.flags |= UPF_NO_TXEN_TEST;
>>
>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>> +
>>   		if (up->port.flags & UPF_FIXED_TYPE)
>>   			uart->port.type = up->port.type;
>>
>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>   		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>   		if (skip_txen_test)
>>   			uart->port.flags |= UPF_NO_TXEN_TEST;
>> +
>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>> +
>>   		uart->port.type = PORT_UNKNOWN;
>>   		uart->port.dev = &serial8250_isa_devs->dev;
>>   		uart->capabilities = 0;
>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>>   module_param(skip_txen_test, uint, 0644);
>>   MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
>>
>> +module_param(skip_cons_autoirq, uint, 0644);
>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
>> +
>>   #ifdef CONFIG_SERIAL_8250_RSA
>>   module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>   MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>   	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>>   	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>>   	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
>> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
>> +		&skip_cons_autoirq, 0644);
>>   #ifdef CONFIG_SERIAL_8250_RSA
>>   	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>>   		&param_array_ops, .arr = &__param_arr_probe_rsa,
>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-08 12:53     ` Prarit Bhargava
@ 2015-07-08 13:51       ` Peter Hurley
  2015-07-08 14:00         ` Prarit Bhargava
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Hurley @ 2015-07-08 13:51 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Taichi Kageyama, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi

Hi Prarit,

On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
> 
> 
> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>> Hi Taichi,
>>
>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>> This patch provides a new parameter as a workaround of the following
>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>
>>> There're cases where autoconfig_irq() fails during boot.
>>> In these cases, the console doesn't work in interrupt mode,
>>> the mode cannot be changed anymore, and "input overrun"
>>> (which can make operation mistakes) happens easily.
>>> This problem happens with high rate every boot once it occurs
>>> because the boot sequence is always almost same.
>>>
>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>> during the waiting time, but there're some cases where the CPU cannot
>>> handle the interrupt for longer than the time. It completely depends on
>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>> fixed
>>
>> It completely depends on how long some other driver has interrupts disabled,
>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>> need fixing.
>>
>> A typical cause of this behavior is printk spew from overly-instrumented
>> debugging of some driver (trace_printk is better suited to heavy instrumentation).
>>
> 
> Peter, I understand what you're saying, however the problem is that this is the
> serial driver which is one of, if not _the_ main debug output mechanism in the
> kernel.
> 
> I'm not sure I agree with doing this patch, but perhaps an easier approach would
> be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
> a previously known good value?  That way the issue of the broken driver can be
> debugged.

For debugging purposes, can you use a kernel built with
CONFIG_SERIAL_8250_DETECT_IRQ=n?

What is the platform and device type?

Regards,
Peter Hurley


> P.
> 
>> Regards,
>> Peter Hurley
>>
>>
>>> to control these cases but as long as auto_irq algorithm is used,
>>> it's hard to know which CPU can handle an interrupt from a serial and
>>> to assure the interrupt of the CPU is enabled during auto_irq.
>>> Meanwhile for legacy consoles, they actually don't require auto_irq
>>> because they basically use well-known irq number.
>>> For non-console serials, this workaround is not required
>>> because setserial command can kick autoconfig_irq() again for them.
>>>
>>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> ---
>>>   drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>>>   1 files changed, 17 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
>>> index 6bf31f2..60fda28 100644
>>> --- a/drivers/tty/serial/8250/8250_core.c
>>> +++ b/drivers/tty/serial/8250/8250_core.c
>>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>>
>>>   static unsigned int skip_txen_test; /* force skip of txen test at init time */
>>>
>>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>>> +
>>>   /*
>>>    * Debugging.
>>>    */
>>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>>>   		if (skip_txen_test)
>>>   			up->port.flags |= UPF_NO_TXEN_TEST;
>>>
>>> +		if (uart_console(&up->port) && skip_cons_autoirq)
>>> +			up->port.flags &= ~UPF_AUTO_IRQ;
>>> +
>>>   		uart_add_one_port(drv, &up->port);
>>>   	}
>>>   }
>>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>>>   		if (skip_txen_test)
>>>   			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>
>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>> +
>>>   		if (up->port.flags & UPF_FIXED_TYPE)
>>>   			uart->port.type = up->port.type;
>>>
>>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>>   		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>>   		if (skip_txen_test)
>>>   			uart->port.flags |= UPF_NO_TXEN_TEST;
>>> +
>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>> +
>>>   		uart->port.type = PORT_UNKNOWN;
>>>   		uart->port.dev = &serial8250_isa_devs->dev;
>>>   		uart->capabilities = 0;
>>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>>>   module_param(skip_txen_test, uint, 0644);
>>>   MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
>>>
>>> +module_param(skip_cons_autoirq, uint, 0644);
>>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
>>> +
>>>   #ifdef CONFIG_SERIAL_8250_RSA
>>>   module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>>   MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>>   	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>>>   	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>>>   	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
>>> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
>>> +		&skip_cons_autoirq, 0644);
>>>   #ifdef CONFIG_SERIAL_8250_RSA
>>>   	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>>>   		&param_array_ops, .arr = &__param_arr_probe_rsa,
>>>
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-08 13:51       ` Peter Hurley
@ 2015-07-08 14:00         ` Prarit Bhargava
  2015-07-09  5:32           ` Taichi Kageyama
  0 siblings, 1 reply; 21+ messages in thread
From: Prarit Bhargava @ 2015-07-08 14:00 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Taichi Kageyama, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi



On 07/08/2015 09:51 AM, Peter Hurley wrote:
> Hi Prarit,
> 
> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>
>>
>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>> Hi Taichi,
>>>
>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>> This patch provides a new parameter as a workaround of the following
>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>
>>>> There're cases where autoconfig_irq() fails during boot.
>>>> In these cases, the console doesn't work in interrupt mode,
>>>> the mode cannot be changed anymore, and "input overrun"
>>>> (which can make operation mistakes) happens easily.
>>>> This problem happens with high rate every boot once it occurs
>>>> because the boot sequence is always almost same.
>>>>
>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>> during the waiting time, but there're some cases where the CPU cannot
>>>> handle the interrupt for longer than the time. It completely depends on
>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>> fixed
>>>
>>> It completely depends on how long some other driver has interrupts disabled,
>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>> need fixing.
>>>
>>> A typical cause of this behavior is printk spew from overly-instrumented
>>> debugging of some driver (trace_printk is better suited to heavy instrumentation).
>>>
>>
>> Peter, I understand what you're saying, however the problem is that this is the
>> serial driver which is one of, if not _the_ main debug output mechanism in the
>> kernel.
>>
>> I'm not sure I agree with doing this patch, but perhaps an easier approach would
>> be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
>> a previously known good value?  That way the issue of the broken driver can be
>> debugged.
> 
> For debugging purposes, can you use a kernel built with
> CONFIG_SERIAL_8250_DETECT_IRQ=n?
> 
> What is the platform and device type?
> 

Taichi?

P.

> Regards,
> Peter Hurley
> 
> 
>> P.
>>
>>> Regards,
>>> Peter Hurley
>>>
>>>
>>>> to control these cases but as long as auto_irq algorithm is used,
>>>> it's hard to know which CPU can handle an interrupt from a serial and
>>>> to assure the interrupt of the CPU is enabled during auto_irq.
>>>> Meanwhile for legacy consoles, they actually don't require auto_irq
>>>> because they basically use well-known irq number.
>>>> For non-console serials, this workaround is not required
>>>> because setserial command can kick autoconfig_irq() again for them.
>>>>
>>>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>> ---
>>>>   drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>>>>   1 files changed, 17 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
>>>> index 6bf31f2..60fda28 100644
>>>> --- a/drivers/tty/serial/8250/8250_core.c
>>>> +++ b/drivers/tty/serial/8250/8250_core.c
>>>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>>>
>>>>   static unsigned int skip_txen_test; /* force skip of txen test at init time */
>>>>
>>>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>>>> +
>>>>   /*
>>>>    * Debugging.
>>>>    */
>>>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>>>>   		if (skip_txen_test)
>>>>   			up->port.flags |= UPF_NO_TXEN_TEST;
>>>>
>>>> +		if (uart_console(&up->port) && skip_cons_autoirq)
>>>> +			up->port.flags &= ~UPF_AUTO_IRQ;
>>>> +
>>>>   		uart_add_one_port(drv, &up->port);
>>>>   	}
>>>>   }
>>>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>>>>   		if (skip_txen_test)
>>>>   			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>
>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>> +
>>>>   		if (up->port.flags & UPF_FIXED_TYPE)
>>>>   			uart->port.type = up->port.type;
>>>>
>>>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>>>   		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>>>   		if (skip_txen_test)
>>>>   			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>> +
>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>> +
>>>>   		uart->port.type = PORT_UNKNOWN;
>>>>   		uart->port.dev = &serial8250_isa_devs->dev;
>>>>   		uart->capabilities = 0;
>>>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>>>>   module_param(skip_txen_test, uint, 0644);
>>>>   MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
>>>>
>>>> +module_param(skip_cons_autoirq, uint, 0644);
>>>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
>>>> +
>>>>   #ifdef CONFIG_SERIAL_8250_RSA
>>>>   module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>>>   MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>>>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>>>   	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>>>>   	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>>>>   	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
>>>> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
>>>> +		&skip_cons_autoirq, 0644);
>>>>   #ifdef CONFIG_SERIAL_8250_RSA
>>>>   	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>>>>   		&param_array_ops, .arr = &__param_arr_probe_rsa,
>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions
  2015-06-05  9:57 ` [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
@ 2015-07-08 23:35   ` Peter Hurley
  2015-07-23 22:32   ` gregkh
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Hurley @ 2015-07-08 23:35 UTC (permalink / raw)
  To: Taichi Kageyama, gregkh
  Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

On 06/05/2015 05:57 AM, Taichi Kageyama wrote:
> The following race conditions can happen if a serial is used as console.
>   Case1. CPU_B handles an interrupt from a serial
>     autoconfig_irq() fails whether the interrupt is raised or not
>     if CPU_B is disabled to handle interrupts for longer than it expects.
>   Case2. CPU_B clears UART_IER just after CPU_A sets UART_IER
>     A serial may not make an interrupt.
>     autoconfig_irq() can fail if the interrupt is not raised.
>   Case3. CPU_A sets UART_IER just after CPU_B clears UART_IER
>     This is an unexpected behavior for uart_console_write().
> 
>   CPU_A [autoconfig_irq]      CPU_B [serial8250_console_write]
>   -----------------------------------------------------------------
>   probe_irq_on()              spin_lock_irqsave(&port->lock,)
>   serial_outp(,UART_IER,0x0f) serial_out(,UART_IER,0)
>   udelay(20);                 uart_console_write()
>   probe_irq_off()
>                               spin_unlock_irqrestore(&port->lock,)
>   -----------------------------------------------------------------
> 
> If autoconfig_irq() fails, the console doesn't work in interrupt mode,
> the mode cannot be changed anymore, and "input overrun"
> (which can make operation mistakes) happens easily.
> This problem happens with high rate every boot once it occurs
> because the boot sequence is always almost same.

Reviewed-by: Peter Hurley <peter@hurleysoftware.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-08 14:00         ` Prarit Bhargava
@ 2015-07-09  5:32           ` Taichi Kageyama
  2015-07-11  0:12             ` Peter Hurley
  0 siblings, 1 reply; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-09  5:32 UTC (permalink / raw)
  To: Prarit Bhargava, Peter Hurley
  Cc: gregkh, linux-serial, jslaby, linux-kernel, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 7825 bytes --]

Hi Peter and Prarit,

On 2015/07/08 23:00, Prarit Bhargava wrote:
>
>
> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>> Hi Prarit,
>>
>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>
>>>
>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>> Hi Taichi,
>>>>
>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>> This patch provides a new parameter as a workaround of the following
>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>
>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>> (which can make operation mistakes) happens easily.
>>>>> This problem happens with high rate every boot once it occurs
>>>>> because the boot sequence is always almost same.
>>>>>
>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>> fixed
>>>>

Thank you for your comments.

>>>> It completely depends on how long some other driver has interrupts disabled,

Agree.

>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>> need fixing.

Peter, ideally, you're right.
However, we cannot assume how long other drivers disable interrupts.
That's why I introduced this workaround.
In my opinion, a console is important and always should be available
even if other drivers have a bad behavior.

>>>>
>>>> A typical cause of this behavior is printk spew from overly-instrumented
>>>> debugging of some driver (trace_printk is better suited to heavy instrumentation).
>>>>
>>>
>>> Peter, I understand what you're saying, however the problem is that this is the
>>> serial driver which is one of, if not _the_ main debug output mechanism in the
>>> kernel.

I tried fixing the one of them, 8250 driver.
  "[PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions".
  https://lkml.org/lkml/2015/6/5/218

The problem can happen when the drivers show just info level message during boot.
It depends on the timing completely.

>>>
>>> I'm not sure I agree with doing this patch, but perhaps an easier approach would
>>> be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
>>> a previously known good value?  That way the issue of the broken driver can be
>>> debugged.

Prarit, I think it's good if the serial id can be also specified.

I thought adding irq option to "console" kernel parameter before,
but it was not good idea because the impact was not small
and passing irq# was not required in most cases.
    - PnP serial device can get valid irq#
    - Most on-board serial for console use legacy fixed irq#, like irq3, irq4.

So, I designed this patch like below.
  - It allows console serial to skip autoconfig_irq.
    Actually I assumes console serial uses a typical irq# in old_serial_port[],
    but it's the same case where CONFIG_SERIAL_8250_DETECT_IRQ is not defined.
  - It allows non-console serial to kick autoconfig_irq.
    setserial command can try it again later if it fails on boot time.

>>
>> For debugging purposes, can you use a kernel built with
>> CONFIG_SERIAL_8250_DETECT_IRQ=n?

I know "CONFIG_SERIAL_8250_DETECT_IRQ=n" is one of solutions
because autoconfig_irq() is not used on boot time in this case.
However, I know some Linux distributions actually use this config
and I assume someone may require it.

>>
>> What is the platform and device type?
>>
>
> Taichi?

I saw the original problem on x86-64 platforms with RHEL6.6.
The serial was SOL(serial over LAN), not registered as PnP device.


Thanks,
Taichi


>
> P.
>
>> Regards,
>> Peter Hurley
>>
>>
>>> P.
>>>
>>>> Regards,
>>>> Peter Hurley
>>>>
>>>>
>>>>> to control these cases but as long as auto_irq algorithm is used,
>>>>> it's hard to know which CPU can handle an interrupt from a serial and
>>>>> to assure the interrupt of the CPU is enabled during auto_irq.
>>>>> Meanwhile for legacy consoles, they actually don't require auto_irq
>>>>> because they basically use well-known irq number.
>>>>> For non-console serials, this workaround is not required
>>>>> because setserial command can kick autoconfig_irq() again for them.
>>>>>
>>>>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>> ---
>>>>>    drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>>>>>    1 files changed, 17 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
>>>>> index 6bf31f2..60fda28 100644
>>>>> --- a/drivers/tty/serial/8250/8250_core.c
>>>>> +++ b/drivers/tty/serial/8250/8250_core.c
>>>>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>>>>
>>>>>    static unsigned int skip_txen_test; /* force skip of txen test at init time */
>>>>>
>>>>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>>>>> +
>>>>>    /*
>>>>>     * Debugging.
>>>>>     */
>>>>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>>>>>    		if (skip_txen_test)
>>>>>    			up->port.flags |= UPF_NO_TXEN_TEST;
>>>>>
>>>>> +		if (uart_console(&up->port) && skip_cons_autoirq)
>>>>> +			up->port.flags &= ~UPF_AUTO_IRQ;
>>>>> +
>>>>>    		uart_add_one_port(drv, &up->port);
>>>>>    	}
>>>>>    }
>>>>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>>>>>    		if (skip_txen_test)
>>>>>    			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>>
>>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>>> +
>>>>>    		if (up->port.flags & UPF_FIXED_TYPE)
>>>>>    			uart->port.type = up->port.type;
>>>>>
>>>>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>>>>    		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>>>>    		if (skip_txen_test)
>>>>>    			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>> +
>>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>>> +
>>>>>    		uart->port.type = PORT_UNKNOWN;
>>>>>    		uart->port.dev = &serial8250_isa_devs->dev;
>>>>>    		uart->capabilities = 0;
>>>>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>>>>>    module_param(skip_txen_test, uint, 0644);
>>>>>    MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
>>>>>
>>>>> +module_param(skip_cons_autoirq, uint, 0644);
>>>>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
>>>>> +
>>>>>    #ifdef CONFIG_SERIAL_8250_RSA
>>>>>    module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>>>>    MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>>>>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>>>>    	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>>>>>    	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>>>>>    	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
>>>>> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
>>>>> +		&skip_cons_autoirq, 0644);
>>>>>    #ifdef CONFIG_SERIAL_8250_RSA
>>>>>    	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>>>>>    		&param_array_ops, .arr = &__param_arr_probe_rsa,
>>>>>
>>>>
>>
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-09  5:32           ` Taichi Kageyama
@ 2015-07-11  0:12             ` Peter Hurley
  2015-07-14  1:16               ` Taichi Kageyama
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Hurley @ 2015-07-11  0:12 UTC (permalink / raw)
  To: Taichi Kageyama, Prarit Bhargava
  Cc: gregkh, linux-serial, jslaby, linux-kernel, Naoya Horiguchi

On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
> On 2015/07/08 23:00, Prarit Bhargava wrote:
>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>
>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>> (which can make operation mistakes) happens easily.
>>>>>> This problem happens with high rate every boot once it occurs
>>>>>> because the boot sequence is always almost same.
>>>>>>
>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>> fixed
>>>>>
> 
> Thank you for your comments.
> 
>>>>> It completely depends on how long some other driver has interrupts disabled,
> 
> Agree.
> 
>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>> need fixing.
> 
> Peter, ideally, you're right.
> However, we cannot assume how long other drivers disable interrupts.
> That's why I introduced this workaround.
> In my opinion, a console is important and always should be available
> even if other drivers have a bad behavior.

I have no problem with wanting to make the console more robust, but
rather with the hacky way this is being done.

Better solutions:
1. Fix autoprobing to force irq affinity to autoprobing cpu
2. Fix how the 8250 driver behaves with no irq
3. Fix printk locking to be more granular (this needs doing anyway)
4. Add exclusive port mode for console

Plus it wouldn't hurt to add some diagnostics to automate discovery
of misbehaving drivers that break autoprobing.

Perhaps there's something else going on wrt your Serial-over-Lan device,
like the driver doesn't have it in the correct state. How do you know
that the device has actually raised IRQ and that the problem is
disabled interrupts?


>>>>> A typical cause of this behavior is printk spew from overly-instrumented
>>>>> debugging of some driver (trace_printk is better suited to heavy instrumentation).
>>>>>
>>>>
>>>> Peter, I understand what you're saying, however the problem is that this is the
>>>> serial driver which is one of, if not _the_ main debug output mechanism in the
>>>> kernel.
> 
> I tried fixing the one of them, 8250 driver.
>   "[PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions".
>   https://lkml.org/lkml/2015/6/5/218
> 
> The problem can happen when the drivers show just info level message during boot.
> It depends on the timing completely.

Right, but patch 1/2 addresses that problem, correct?


>>>> I'm not sure I agree with doing this patch, but perhaps an easier approach would
>>>> be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
>>>> a previously known good value?  That way the issue of the broken driver can be
>>>> debugged.
> 
> Prarit, I think it's good if the serial id can be also specified.
> 
> I thought adding irq option to "console" kernel parameter before,
> but it was not good idea because the impact was not small
> and passing irq# was not required in most cases.
>     - PnP serial device can get valid irq#
>     - Most on-board serial for console use legacy fixed irq#, like irq3, irq4.
> 
> So, I designed this patch like below.
>   - It allows console serial to skip autoconfig_irq.
>     Actually I assumes console serial uses a typical irq# in old_serial_port[],
>     but it's the same case where CONFIG_SERIAL_8250_DETECT_IRQ is not defined.
>   - It allows non-console serial to kick autoconfig_irq.
>     setserial command can try it again later if it fails on boot time.
> 
>>>
>>> For debugging purposes, can you use a kernel built with
>>> CONFIG_SERIAL_8250_DETECT_IRQ=n?
> 
> I know "CONFIG_SERIAL_8250_DETECT_IRQ=n" is one of solutions
> because autoconfig_irq() is not used on boot time in this case.
> However, I know some Linux distributions actually use this config
> and I assume someone may require it.

I don't want to add a module parameter to preemptively fix some
problem that "someone else might require".

Module parameters should absolutely be a last resort solution to
an actual problem that is not solvable some other way.

Regards,
Peter Hurley

>>> What is the platform and device type?
>>>
>>
>> Taichi?
> 
> I saw the original problem on x86-64 platforms with RHEL6.6.
> The serial was SOL(serial over LAN), not registered as PnP device.


[...]

>>>>>> to control these cases but as long as auto_irq algorithm is used,
>>>>>> it's hard to know which CPU can handle an interrupt from a serial and
>>>>>> to assure the interrupt of the CPU is enabled during auto_irq.
>>>>>> Meanwhile for legacy consoles, they actually don't require auto_irq
>>>>>> because they basically use well-known irq number.
>>>>>> For non-console serials, this workaround is not required
>>>>>> because setserial command can kick autoconfig_irq() again for them.
>>>>>>
>>>>>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>>> ---
>>>>>>    drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>>>>>>    1 files changed, 17 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
>>>>>> index 6bf31f2..60fda28 100644
>>>>>> --- a/drivers/tty/serial/8250/8250_core.c
>>>>>> +++ b/drivers/tty/serial/8250/8250_core.c
>>>>>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>>>>>
>>>>>>    static unsigned int skip_txen_test; /* force skip of txen test at init time */
>>>>>>
>>>>>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>>>>>> +
>>>>>>    /*
>>>>>>     * Debugging.
>>>>>>     */
>>>>>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>>>>>>    		if (skip_txen_test)
>>>>>>    			up->port.flags |= UPF_NO_TXEN_TEST;
>>>>>>
>>>>>> +		if (uart_console(&up->port) && skip_cons_autoirq)
>>>>>> +			up->port.flags &= ~UPF_AUTO_IRQ;
>>>>>> +
>>>>>>    		uart_add_one_port(drv, &up->port);
>>>>>>    	}
>>>>>>    }
>>>>>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>>>>>>    		if (skip_txen_test)
>>>>>>    			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>>>
>>>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>>>> +
>>>>>>    		if (up->port.flags & UPF_FIXED_TYPE)
>>>>>>    			uart->port.type = up->port.type;
>>>>>>
>>>>>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>>>>>    		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>>>>>    		if (skip_txen_test)
>>>>>>    			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>>> +
>>>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>>>> +
>>>>>>    		uart->port.type = PORT_UNKNOWN;
>>>>>>    		uart->port.dev = &serial8250_isa_devs->dev;
>>>>>>    		uart->capabilities = 0;
>>>>>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>>>>>>    module_param(skip_txen_test, uint, 0644);
>>>>>>    MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
>>>>>>
>>>>>> +module_param(skip_cons_autoirq, uint, 0644);
>>>>>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
>>>>>> +
>>>>>>    #ifdef CONFIG_SERIAL_8250_RSA
>>>>>>    module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>>>>>    MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>>>>>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>>>>>    	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>>>>>>    	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>>>>>>    	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
>>>>>> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
>>>>>> +		&skip_cons_autoirq, 0644);
>>>>>>    #ifdef CONFIG_SERIAL_8250_RSA
>>>>>>    	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>>>>>>    		&param_array_ops, .arr = &__param_arr_probe_rsa,
>>>>>>
>>>>>
>>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-11  0:12             ` Peter Hurley
@ 2015-07-14  1:16               ` Taichi Kageyama
  2015-07-14 19:29                 ` Peter Hurley
  0 siblings, 1 reply; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-14  1:16 UTC (permalink / raw)
  To: Peter Hurley, Prarit Bhargava
  Cc: gregkh, linux-serial, jslaby, linux-kernel, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 11318 bytes --]

On 2015/07/11 9:12, Peter Hurley wrote:
> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>
>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>> because the boot sequence is always almost same.
>>>>>>>
>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>> fixed
>>>>>>
>>
>> Thank you for your comments.
>>
>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>
>> Agree.
>>
>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>> need fixing.
>>
>> Peter, ideally, you're right.
>> However, we cannot assume how long other drivers disable interrupts.
>> That's why I introduced this workaround.
>> In my opinion, a console is important and always should be available
>> even if other drivers have a bad behavior.
>
> I have no problem with wanting to make the console more robust, but
> rather with the hacky way this is being done.

Hi Peter,

Thank you for your advice.
If there is other way to fix this problem simply,
I also think it's better than the dirty hack.

> Better solutions:
> 1. Fix autoprobing to force irq affinity to autoprobing cpu

I couldn't make sure which CPU handled serial interrupt
on all platforms before irq# was not known.
Do you know the way to detect which CPU is used for console serial?
The way is safe for all platforms?

> 2. Fix how the 8250 driver behaves with no irq

The console works with polling-mode if irq==0.
I think this behavior should not be changed.

> 3. Fix printk locking to be more granular (this needs doing anyway)
> 4. Add exclusive port mode for console

These two are solutions only for the problem which is caused by printk.
I think my [patch 1/2] is enough to resolve the case at this time.

> Plus it wouldn't hurt to add some diagnostics to automate discovery
> of misbehaving drivers that break autoprobing.

Probably we can add retry code, but it cannot fix this problem completely.
And the retry can be just waste time for some devices because the driver
cannot recognize whether interrupt is disabled temporarily or
interrupt mode is not available as long as using probe_irq_on/off().

I though other workarounds, but they were also not good idea.
  + Close all console sessions to kick autoconfig again
      Console is always opened by getty or other programs after boot.
      It's hard to close them. I think the driver should be fixed.
  + Call autoconfig by force even if console is opened.
      It can be called forcibly if uart_do_autoconfig() is changed,
      but I think console should not be used by userland during autoconfig
      although it is always used by printk...

> Perhaps there's something else going on wrt your Serial-over-Lan device,
> like the driver doesn't have it in the correct state. How do you know
> that the device has actually raised IRQ and that the problem is
> disabled interrupts?

This problem happened on my several systems every boot, luckily:)
I sent NMI just before checking raised IRQ to investigate it.
The following facts were verified.
  + On my system, CPU0 always handled a serial interrupts.
  + serial8250_console_write() kept CPU0 interrupt disabled in this timing.
    It was called when tsc_refine_calibration_work() printed a info message.
  + When I just commented out the printk in tsc_refine_calibration_work(),
    my serial device raised an interrupt immediately
    just after writing their register and CPU0 handled the interrupt.

Additionally, I have seen "irq==0" on other type system too.
I verified it can be reproduced on KVM  but I kept cpu0 disabled
intentionally during autoconfig_irq() at that time.


>>>>>> A typical cause of this behavior is printk spew from overly-instrumented
>>>>>> debugging of some driver (trace_printk is better suited to heavy instrumentation).
>>>>>>
>>>>>
>>>>> Peter, I understand what you're saying, however the problem is that this is the
>>>>> serial driver which is one of, if not _the_ main debug output mechanism in the
>>>>> kernel.
>>
>> I tried fixing the one of them, 8250 driver.
>>    "[PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions".
>>    https://lkml.org/lkml/2015/6/5/218
>>
>> The problem can happen when the drivers show just info level message during boot.
>> It depends on the timing completely.
>
> Right, but patch 1/2 addresses that problem, correct?

Yes.


Thanks,
Taichi


>
>
>>>>> I'm not sure I agree with doing this patch, but perhaps an easier approach would
>>>>> be to add a debug kernel parameter like "serial.force_irq=4" to force the irq to
>>>>> a previously known good value?  That way the issue of the broken driver can be
>>>>> debugged.
>>
>> Prarit, I think it's good if the serial id can be also specified.
>>
>> I thought adding irq option to "console" kernel parameter before,
>> but it was not good idea because the impact was not small
>> and passing irq# was not required in most cases.
>>      - PnP serial device can get valid irq#
>>      - Most on-board serial for console use legacy fixed irq#, like irq3, irq4.
>>
>> So, I designed this patch like below.
>>    - It allows console serial to skip autoconfig_irq.
>>      Actually I assumes console serial uses a typical irq# in old_serial_port[],
>>      but it's the same case where CONFIG_SERIAL_8250_DETECT_IRQ is not defined.
>>    - It allows non-console serial to kick autoconfig_irq.
>>      setserial command can try it again later if it fails on boot time.
>>
>>>>
>>>> For debugging purposes, can you use a kernel built with
>>>> CONFIG_SERIAL_8250_DETECT_IRQ=n?
>>
>> I know "CONFIG_SERIAL_8250_DETECT_IRQ=n" is one of solutions
>> because autoconfig_irq() is not used on boot time in this case.
>> However, I know some Linux distributions actually use this config
>> and I assume someone may require it.
>
> I don't want to add a module parameter to preemptively fix some
> problem that "someone else might require".
>
> Module parameters should absolutely be a last resort solution to
> an actual problem that is not solvable some other way.
>
> Regards,
> Peter Hurley
>
>>>> What is the platform and device type?
>>>>
>>>
>>> Taichi?
>>
>> I saw the original problem on x86-64 platforms with RHEL6.6.
>> The serial was SOL(serial over LAN), not registered as PnP device.
>
>
> [...]
>
>>>>>>> to control these cases but as long as auto_irq algorithm is used,
>>>>>>> it's hard to know which CPU can handle an interrupt from a serial and
>>>>>>> to assure the interrupt of the CPU is enabled during auto_irq.
>>>>>>> Meanwhile for legacy consoles, they actually don't require auto_irq
>>>>>>> because they basically use well-known irq number.
>>>>>>> For non-console serials, this workaround is not required
>>>>>>> because setserial command can kick autoconfig_irq() again for them.
>>>>>>>
>>>>>>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>>>>>>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>>>>> ---
>>>>>>>     drivers/tty/serial/8250/8250_core.c |   17 +++++++++++++++++
>>>>>>>     1 files changed, 17 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
>>>>>>> index 6bf31f2..60fda28 100644
>>>>>>> --- a/drivers/tty/serial/8250/8250_core.c
>>>>>>> +++ b/drivers/tty/serial/8250/8250_core.c
>>>>>>> @@ -65,6 +65,8 @@ static int serial_index(struct uart_port *port)
>>>>>>>
>>>>>>>     static unsigned int skip_txen_test; /* force skip of txen test at init time */
>>>>>>>
>>>>>>> +static unsigned int skip_cons_autoirq; /* force skip autoirq for console */
>>>>>>> +
>>>>>>>     /*
>>>>>>>      * Debugging.
>>>>>>>      */
>>>>>>> @@ -3336,6 +3338,9 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
>>>>>>>     		if (skip_txen_test)
>>>>>>>     			up->port.flags |= UPF_NO_TXEN_TEST;
>>>>>>>
>>>>>>> +		if (uart_console(&up->port) && skip_cons_autoirq)
>>>>>>> +			up->port.flags &= ~UPF_AUTO_IRQ;
>>>>>>> +
>>>>>>>     		uart_add_one_port(drv, &up->port);
>>>>>>>     	}
>>>>>>>     }
>>>>>>> @@ -3875,6 +3880,9 @@ int serial8250_register_8250_port(struct uart_8250_port *up)
>>>>>>>     		if (skip_txen_test)
>>>>>>>     			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>>>>
>>>>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>>>>> +
>>>>>>>     		if (up->port.flags & UPF_FIXED_TYPE)
>>>>>>>     			uart->port.type = up->port.type;
>>>>>>>
>>>>>>> @@ -3936,6 +3944,10 @@ void serial8250_unregister_port(int line)
>>>>>>>     		uart->port.flags &= ~UPF_BOOT_AUTOCONF;
>>>>>>>     		if (skip_txen_test)
>>>>>>>     			uart->port.flags |= UPF_NO_TXEN_TEST;
>>>>>>> +
>>>>>>> +		if (uart_console(&uart->port) && skip_cons_autoirq)
>>>>>>> +			uart->port.flags &= ~UPF_AUTO_IRQ;
>>>>>>> +
>>>>>>>     		uart->port.type = PORT_UNKNOWN;
>>>>>>>     		uart->port.dev = &serial8250_isa_devs->dev;
>>>>>>>     		uart->capabilities = 0;
>>>>>>> @@ -4044,6 +4056,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
>>>>>>>     module_param(skip_txen_test, uint, 0644);
>>>>>>>     MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");
>>>>>>>
>>>>>>> +module_param(skip_cons_autoirq, uint, 0644);
>>>>>>> +MODULE_PARM_DESC(skip_cons_autoirq, "Skip auto irq for console during boot");
>>>>>>> +
>>>>>>>     #ifdef CONFIG_SERIAL_8250_RSA
>>>>>>>     module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
>>>>>>>     MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
>>>>>>> @@ -4070,6 +4085,8 @@ static void __used s8250_options(void)
>>>>>>>     	module_param_cb(share_irqs, &param_ops_uint, &share_irqs, 0644);
>>>>>>>     	module_param_cb(nr_uarts, &param_ops_uint, &nr_uarts, 0644);
>>>>>>>     	module_param_cb(skip_txen_test, &param_ops_uint, &skip_txen_test, 0644);
>>>>>>> +	module_param_cb(skip_cons_autoirq, &param_ops_uint,
>>>>>>> +		&skip_cons_autoirq, 0644);
>>>>>>>     #ifdef CONFIG_SERIAL_8250_RSA
>>>>>>>     	__module_param_call(MODULE_PARAM_PREFIX, probe_rsa,
>>>>>>>     		&param_array_ops, .arr = &__param_arr_probe_rsa,
>>>>>>>
>>>>>>
>>>>
>
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-14  1:16               ` Taichi Kageyama
@ 2015-07-14 19:29                 ` Peter Hurley
  2015-07-16  9:58                   ` Taichi Kageyama
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Hurley @ 2015-07-14 19:29 UTC (permalink / raw)
  To: Taichi Kageyama, Prarit Bhargava
  Cc: gregkh, linux-serial, jslaby, linux-kernel, Naoya Horiguchi

Hi Taichi,

On 07/13/2015 09:16 PM, Taichi Kageyama wrote:
> On 2015/07/11 9:12, Peter Hurley wrote:
>> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>>
>>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>>> because the boot sequence is always almost same.
>>>>>>>>
>>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>>> fixed
>>>>>>>
>>>
>>> Thank you for your comments.
>>>
>>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>>
>>> Agree.
>>>
>>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>>> need fixing.
>>>
>>> Peter, ideally, you're right.
>>> However, we cannot assume how long other drivers disable interrupts.
>>> That's why I introduced this workaround.
>>> In my opinion, a console is important and always should be available
>>> even if other drivers have a bad behavior.
>>
>> I have no problem with wanting to make the console more robust, but
>> rather with the hacky way this is being done.
> 
> Hi Peter,
> 
> Thank you for your advice.
> If there is other way to fix this problem simply,
> I also think it's better than the dirty hack.

While module parameters seem like "simple" solutions at the time,
they add real maintenance burden, because they establish userspace
requirements that must be preserved forever to avoid breakage.


>> Better solutions:
>> 1. Fix autoprobing to force irq affinity to autoprobing cpu
> 
> I couldn't make sure which CPU handled serial interrupt
> on all platforms before irq# was not known.
> Do you know the way to detect which CPU is used for console serial?


The basic idea would be:
1. disable preemption
2. for each irq descriptor selected for autoprobing, set the irq
   affinity to the current processor.
3. probe the i/o port as is done now
4. stop probing
5. re-enable preemption.

With this solution, your patch 1/2 wouldn't be required either
because the worker thread that disabled interrupts wouldn't be
running on the cpu detecting the triggered irq(s).

I would imagine most or all of this would be done in
probe_irq_on(), possibly refactored to perform the preemption
disable and irq affinity.


> The way is safe for all platforms?

Please understand though, autoprobing is not safe, period.
Even says so in Kconfig.


>> 2. Fix how the 8250 driver behaves with no irq
> 
> The console works with polling-mode if irq==0.
> I think this behavior should not be changed.

So another solution would be to use setserial to set the irq during
boot, right?


>> 3. Fix printk locking to be more granular (this needs doing anyway)
>> 4. Add exclusive port mode for console
> 
> These two are solutions only for the problem which is caused by printk.
> I think my [patch 1/2] is enough to resolve the case at this time.
> 
>> Plus it wouldn't hurt to add some diagnostics to automate discovery
>> of misbehaving drivers that break autoprobing.
> 
> Probably we can add retry code, but it cannot fix this problem completely.
> And the retry can be just waste time for some devices because the driver
> cannot recognize whether interrupt is disabled temporarily or
> interrupt mode is not available as long as using probe_irq_on/off().
> 
> I though other workarounds, but they were also not good idea.
>   + Close all console sessions to kick autoconfig again
>       Console is always opened by getty or other programs after boot.
>       It's hard to close them. I think the driver should be fixed.
>   + Call autoconfig by force even if console is opened.
>       It can be called forcibly if uart_do_autoconfig() is changed,
>       but I think console should not be used by userland during autoconfig
>       although it is always used by printk...

Yeah, none of these workarounds seem appropriate.

Regards,
Peter Hurley



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-14 19:29                 ` Peter Hurley
@ 2015-07-16  9:58                   ` Taichi Kageyama
  2015-07-20 16:36                     ` Peter Hurley
  0 siblings, 1 reply; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-16  9:58 UTC (permalink / raw)
  To: Peter Hurley, Prarit Bhargava
  Cc: gregkh, linux-serial, jslaby, linux-kernel, Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6250 bytes --]

Hi Peter,

On 2015/07/15 4:29, Peter Hurley wrote:
> On 07/13/2015 09:16 PM, Taichi Kageyama wrote:
>> On 2015/07/11 9:12, Peter Hurley wrote:
>>> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>>>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>>>
>>>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>>>> because the boot sequence is always almost same.
>>>>>>>>>
>>>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>>>> fixed
>>>>>>>>
>>>>
>>>> Thank you for your comments.
>>>>
>>>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>>>
>>>> Agree.
>>>>
>>>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>>>> need fixing.
>>>>
>>>> Peter, ideally, you're right.
>>>> However, we cannot assume how long other drivers disable interrupts.
>>>> That's why I introduced this workaround.
>>>> In my opinion, a console is important and always should be available
>>>> even if other drivers have a bad behavior.
>>>
>>> I have no problem with wanting to make the console more robust, but
>>> rather with the hacky way this is being done.
>>
>> Hi Peter,
>>
>> Thank you for your advice.
>> If there is other way to fix this problem simply,
>> I also think it's better than the dirty hack.
>
> While module parameters seem like "simple" solutions at the time,
> they add real maintenance burden, because they establish userspace
> requirements that must be preserved forever to avoid breakage.

Yeah, I agree with you.

>>> Better solutions:
>>> 1. Fix autoprobing to force irq affinity to autoprobing cpu
>>
>> I couldn't make sure which CPU handled serial interrupt
>> on all platforms before irq# was not known.
>> Do you know the way to detect which CPU is used for console serial?
>
>
> The basic idea would be:
> 1. disable preemption
> 2. for each irq descriptor selected for autoprobing, set the irq
>     affinity to the current processor.
> 3. probe the i/o port as is done now
> 4. stop probing
> 5. re-enable preemption.

Thanks, I think it works.

> With this solution, your patch 1/2 wouldn't be required either
> because the worker thread that disabled interrupts wouldn't be
> running on the cpu detecting the triggered irq(s).

I still need my patch 1/2 which fixes also other cases (see case2 & 3).
I think both port->lock and console_lock are required in your solution.
to avoid deadlock because printk() can be called on every context.

> I would imagine most or all of this would be done in
> probe_irq_on(), possibly refactored to perform the preemption
> disable and irq affinity.

I think introducing new function like "probe_irq_set_affinity()" is better
than modifying probe_irq_on(). I cannot test all legacy devices and
I don't have any reason to break the code which works for other devices.

>> The way is safe for all platforms?
>
> Please understand though, autoprobing is not safe, period.
> Even says so in Kconfig.

OK, I'll try to create new patch which makes autoprobing safer as possible.
New patch is going to be like below.
  1. console and port lock
  2. probe_irq_on()
  3. probe_irq_set_affinity(&cpumask)
  4. probe_irq_off()
  5. port and console unlock

I think we don't have to care saving original affinity value and restore it.
Probably serial8250_startup->... setup_affinity() resets irq affinity
as default every after autoconfig_irq() is called.

>>> 2. Fix how the 8250 driver behaves with no irq
>>
>> The console works with polling-mode if irq==0.
>> I think this behavior should not be changed.
>
> So another solution would be to use setserial to set the irq during
> boot, right?

Right, if it works. Users have to kick setserial before some programs
which opens the console and make sure its irq# is valid.


Thanks,
Taichi


>>> 3. Fix printk locking to be more granular (this needs doing anyway)
>>> 4. Add exclusive port mode for console
>>
>> These two are solutions only for the problem which is caused by printk.
>> I think my [patch 1/2] is enough to resolve the case at this time.
>>
>>> Plus it wouldn't hurt to add some diagnostics to automate discovery
>>> of misbehaving drivers that break autoprobing.
>>
>> Probably we can add retry code, but it cannot fix this problem completely.
>> And the retry can be just waste time for some devices because the driver
>> cannot recognize whether interrupt is disabled temporarily or
>> interrupt mode is not available as long as using probe_irq_on/off().
>>
>> I though other workarounds, but they were also not good idea.
>>    + Close all console sessions to kick autoconfig again
>>        Console is always opened by getty or other programs after boot.
>>        It's hard to close them. I think the driver should be fixed.
>>    + Call autoconfig by force even if console is opened.
>>        It can be called forcibly if uart_do_autoconfig() is changed,
>>        but I think console should not be used by userland during autoconfig
>>        although it is always used by printk...
>
> Yeah, none of these workarounds seem appropriate.
>
> Regards,
> Peter Hurley
>
>
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-16  9:58                   ` Taichi Kageyama
@ 2015-07-20 16:36                     ` Peter Hurley
  2015-07-21  9:44                       ` Taichi Kageyama
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Hurley @ 2015-07-20 16:36 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: Prarit Bhargava, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi

Hi Taichi,

On 07/16/2015 05:58 AM, Taichi Kageyama wrote:
> On 2015/07/15 4:29, Peter Hurley wrote:
>> On 07/13/2015 09:16 PM, Taichi Kageyama wrote:
>>> On 2015/07/11 9:12, Peter Hurley wrote:
>>>> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>>>>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>>>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>>>>
>>>>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>>>>> because the boot sequence is always almost same.
>>>>>>>>>>
>>>>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>>>>> fixed
>>>>>>>>>
>>>>>
>>>>> Thank you for your comments.
>>>>>
>>>>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>>>>
>>>>> Agree.
>>>>>
>>>>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>>>>> need fixing.
>>>>>
>>>>> Peter, ideally, you're right.
>>>>> However, we cannot assume how long other drivers disable interrupts.
>>>>> That's why I introduced this workaround.
>>>>> In my opinion, a console is important and always should be available
>>>>> even if other drivers have a bad behavior.
>>>>
>>>> I have no problem with wanting to make the console more robust, but
>>>> rather with the hacky way this is being done.
>>>
>>> Hi Peter,
>>>
>>> Thank you for your advice.
>>> If there is other way to fix this problem simply,
>>> I also think it's better than the dirty hack.
>>
>> While module parameters seem like "simple" solutions at the time,
>> they add real maintenance burden, because they establish userspace
>> requirements that must be preserved forever to avoid breakage.
> 
> Yeah, I agree with you.
> 
>>>> Better solutions:
>>>> 1. Fix autoprobing to force irq affinity to autoprobing cpu
>>>
>>> I couldn't make sure which CPU handled serial interrupt
>>> on all platforms before irq# was not known.
>>> Do you know the way to detect which CPU is used for console serial?
>>
>>
>> The basic idea would be:
>> 1. disable preemption
>> 2. for each irq descriptor selected for autoprobing, set the irq
>>     affinity to the current processor.
>> 3. probe the i/o port as is done now
>> 4. stop probing
>> 5. re-enable preemption.
> 
> Thanks, I think it works.
> 
>> With this solution, your patch 1/2 wouldn't be required either
>> because the worker thread that disabled interrupts wouldn't be
>> running on the cpu detecting the triggered irq(s).
> 
> I still need my patch 1/2 which fixes also other cases (see case2 & 3).
> I think both port->lock and console_lock are required in your solution.
> to avoid deadlock because printk() can be called on every context.
> 
>> I would imagine most or all of this would be done in
>> probe_irq_on(), possibly refactored to perform the preemption
>> disable and irq affinity.
> 
> I think introducing new function like "probe_irq_set_affinity()" is better
> than modifying probe_irq_on(). I cannot test all legacy devices and
> I don't have any reason to break the code which works for other devices.

That's fine, although most of the arguments for fixing this in the serial
driver apply equally to other users of probe_irq_on().


>>> The way is safe for all platforms?
>>
>> Please understand though, autoprobing is not safe, period.
>> Even says so in Kconfig.
> 
> OK, I'll try to create new patch which makes autoprobing safer as possible.
> New patch is going to be like below.
>   1. console and port lock
>   2. probe_irq_on()
>   3. probe_irq_set_affinity(&cpumask)
>   4. probe_irq_off()
>   5. port and console unlock

The port->lock can't be taken in this context because hard irq
has to be disabled with port->lock which defeats the purpose of
pinning the irq affinity to the current cpu.

What are you concerned about being concurrent with autoconfig_irq()?
Many operations are excluded by the port->mutex.


> I think we don't have to care saving original affinity value and restore it.
> Probably serial8250_startup->... setup_affinity() resets irq affinity
> as default every after autoconfig_irq() is called.
> 
>>>> 2. Fix how the 8250 driver behaves with no irq
>>>
>>> The console works with polling-mode if irq==0.
>>> I think this behavior should not be changed.
>>
>> So another solution would be to use setserial to set the irq during
>> boot, right?
> 
> Right, if it works. Users have to kick setserial before some programs
> which opens the console and make sure its irq# is valid.

Ubuntu does this as part of boot (restores saved serial setup) but I
understand other distros might not do this.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-20 16:36                     ` Peter Hurley
@ 2015-07-21  9:44                       ` Taichi Kageyama
  2015-07-27 19:44                         ` Peter Hurley
  0 siblings, 1 reply; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-21  9:44 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Prarit Bhargava, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6762 bytes --]

Hi Peter,

On 2015/07/21 1:36, Peter Hurley wrote:
> On 07/16/2015 05:58 AM, Taichi Kageyama wrote:
>> On 2015/07/15 4:29, Peter Hurley wrote:
>>> On 07/13/2015 09:16 PM, Taichi Kageyama wrote:
>>>> On 2015/07/11 9:12, Peter Hurley wrote:
>>>>> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>>>>>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>>>>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>>>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>>>>>
>>>>>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>>>>>> because the boot sequence is always almost same.
>>>>>>>>>>>
>>>>>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>>>>>> fixed
>>>>>>>>>>
>>>>>>
>>>>>> Thank you for your comments.
>>>>>>
>>>>>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>>>>>
>>>>>> Agree.
>>>>>>
>>>>>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>>>>>> need fixing.
>>>>>>
>>>>>> Peter, ideally, you're right.
>>>>>> However, we cannot assume how long other drivers disable interrupts.
>>>>>> That's why I introduced this workaround.
>>>>>> In my opinion, a console is important and always should be available
>>>>>> even if other drivers have a bad behavior.
>>>>>
>>>>> I have no problem with wanting to make the console more robust, but
>>>>> rather with the hacky way this is being done.
>>>>
>>>> Hi Peter,
>>>>
>>>> Thank you for your advice.
>>>> If there is other way to fix this problem simply,
>>>> I also think it's better than the dirty hack.
>>>
>>> While module parameters seem like "simple" solutions at the time,
>>> they add real maintenance burden, because they establish userspace
>>> requirements that must be preserved forever to avoid breakage.
>>
>> Yeah, I agree with you.
>>
>>>>> Better solutions:
>>>>> 1. Fix autoprobing to force irq affinity to autoprobing cpu
>>>>
>>>> I couldn't make sure which CPU handled serial interrupt
>>>> on all platforms before irq# was not known.
>>>> Do you know the way to detect which CPU is used for console serial?
>>>
>>>
>>> The basic idea would be:
>>> 1. disable preemption
>>> 2. for each irq descriptor selected for autoprobing, set the irq
>>>      affinity to the current processor.
>>> 3. probe the i/o port as is done now
>>> 4. stop probing
>>> 5. re-enable preemption.
>>
>> Thanks, I think it works.
>>
>>> With this solution, your patch 1/2 wouldn't be required either
>>> because the worker thread that disabled interrupts wouldn't be
>>> running on the cpu detecting the triggered irq(s).
>>
>> I still need my patch 1/2 which fixes also other cases (see case2 & 3).
>> I think both port->lock and console_lock are required in your solution.
>> to avoid deadlock because printk() can be called on every context.
>>
>>> I would imagine most or all of this would be done in
>>> probe_irq_on(), possibly refactored to perform the preemption
>>> disable and irq affinity.
>>
>> I think introducing new function like "probe_irq_set_affinity()" is better
>> than modifying probe_irq_on(). I cannot test all legacy devices and
>> I don't have any reason to break the code which works for other devices.
>
> That's fine, although most of the arguments for fixing this in the serial
> driver apply equally to other users of probe_irq_on().
>
>
>>>> The way is safe for all platforms?
>>>
>>> Please understand though, autoprobing is not safe, period.
>>> Even says so in Kconfig.
>>
>> OK, I'll try to create new patch which makes autoprobing safer as possible.
>> New patch is going to be like below.
>>    1. console and port lock
>>    2. probe_irq_on()
>>    3. probe_irq_set_affinity(&cpumask)
>>    4. probe_irq_off()
>>    5. port and console unlock
>
> The port->lock can't be taken in this context because hard irq
> has to be disabled with port->lock which defeats the purpose of
> pinning the irq affinity to the current cpu.

My test code uses spin_lock() instead of spin_lock_irqsave().

> What are you concerned about being concurrent with autoconfig_irq()?
> Many operations are excluded by the port->mutex.

Actually I don't have any concerns as long as console_lock() is used,
but I thought protecting port was better during auto_irq
or register operations as same as autoconfig().

I was thinking they are used as the following purposes;
   console_lock()
    + Make sure serial8250_console_write() doesn't disable interrupt,
      try to get port->lock or touch the ctrl register of the port.
      # serial8250_console_write() can be called in any context.
   spin_lock()
    + Make sure the probing runs on the current CPU only
      to handle a serial irq by itself after setting irq affinity.
    + Make sure any other CPUs don't touch the ctrl register of the port.

It seems my test code has been working fine so far,
but let me know if you have any concerns about using spin_lock()
instead of preempt_disable().


Thanks,
Taichi


>> I think we don't have to care saving original affinity value and restore it.
>> Probably serial8250_startup->... setup_affinity() resets irq affinity
>> as default every after autoconfig_irq() is called.
>>
>>>>> 2. Fix how the 8250 driver behaves with no irq
>>>>
>>>> The console works with polling-mode if irq==0.
>>>> I think this behavior should not be changed.
>>>
>>> So another solution would be to use setserial to set the irq during
>>> boot, right?
>>
>> Right, if it works. Users have to kick setserial before some programs
>> which opens the console and make sure its irq# is valid.
>
> Ubuntu does this as part of boot (restores saved serial setup) but I
> understand other distros might not do this.
>
> Regards,
> Peter Hurley
>
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions
  2015-06-05  9:57 ` [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
  2015-07-08 23:35   ` Peter Hurley
@ 2015-07-23 22:32   ` gregkh
  2015-07-29  8:13     ` Taichi Kageyama
  1 sibling, 1 reply; 21+ messages in thread
From: gregkh @ 2015-07-23 22:32 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

On Fri, Jun 05, 2015 at 09:57:40AM +0000, Taichi Kageyama wrote:
> The following race conditions can happen if a serial is used as console.
>   Case1. CPU_B handles an interrupt from a serial
>     autoconfig_irq() fails whether the interrupt is raised or not
>     if CPU_B is disabled to handle interrupts for longer than it expects.
>   Case2. CPU_B clears UART_IER just after CPU_A sets UART_IER
>     A serial may not make an interrupt.
>     autoconfig_irq() can fail if the interrupt is not raised.
>   Case3. CPU_A sets UART_IER just after CPU_B clears UART_IER
>     This is an unexpected behavior for uart_console_write().
> 
>   CPU_A [autoconfig_irq]      CPU_B [serial8250_console_write]
>   -----------------------------------------------------------------
>   probe_irq_on()              spin_lock_irqsave(&port->lock,)
>   serial_outp(,UART_IER,0x0f) serial_out(,UART_IER,0)
>   udelay(20);                 uart_console_write()
>   probe_irq_off()
>                               spin_unlock_irqrestore(&port->lock,)
>   -----------------------------------------------------------------
> 
> If autoconfig_irq() fails, the console doesn't work in interrupt mode,
> the mode cannot be changed anymore, and "input overrun"
> (which can make operation mistakes) happens easily.
> This problem happens with high rate every boot once it occurs
> because the boot sequence is always almost same.
> 
> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
> ---
>   drivers/tty/serial/8250/8250_core.c |    6 ++++++
>   1 files changed, 6 insertions(+), 0 deletions(-)

Does not apply to 4.2-rc3 :(

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-21  9:44                       ` Taichi Kageyama
@ 2015-07-27 19:44                         ` Peter Hurley
  2015-07-29  8:09                           ` Taichi Kageyama
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Hurley @ 2015-07-27 19:44 UTC (permalink / raw)
  To: Taichi Kageyama
  Cc: Prarit Bhargava, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi

[-- Attachment #1: Type: text/plain, Size: 7563 bytes --]

On 07/21/2015 05:44 AM, Taichi Kageyama wrote:
> Hi Peter,
> 
> On 2015/07/21 1:36, Peter Hurley wrote:
>> On 07/16/2015 05:58 AM, Taichi Kageyama wrote:
>>> On 2015/07/15 4:29, Peter Hurley wrote:
>>>> On 07/13/2015 09:16 PM, Taichi Kageyama wrote:
>>>>> On 2015/07/11 9:12, Peter Hurley wrote:
>>>>>> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>>>>>>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>>>>>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>>>>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>>>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>>>>>>
>>>>>>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>>>>>>> because the boot sequence is always almost same.
>>>>>>>>>>>>
>>>>>>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>>>>>>> fixed
>>>>>>>>>>>
>>>>>>>
>>>>>>> Thank you for your comments.
>>>>>>>
>>>>>>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>>>>>>
>>>>>>> Agree.
>>>>>>>
>>>>>>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>>>>>>> need fixing.
>>>>>>>
>>>>>>> Peter, ideally, you're right.
>>>>>>> However, we cannot assume how long other drivers disable interrupts.
>>>>>>> That's why I introduced this workaround.
>>>>>>> In my opinion, a console is important and always should be available
>>>>>>> even if other drivers have a bad behavior.
>>>>>>
>>>>>> I have no problem with wanting to make the console more robust, but
>>>>>> rather with the hacky way this is being done.
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> Thank you for your advice.
>>>>> If there is other way to fix this problem simply,
>>>>> I also think it's better than the dirty hack.
>>>>
>>>> While module parameters seem like "simple" solutions at the time,
>>>> they add real maintenance burden, because they establish userspace
>>>> requirements that must be preserved forever to avoid breakage.
>>>
>>> Yeah, I agree with you.
>>>
>>>>>> Better solutions:
>>>>>> 1. Fix autoprobing to force irq affinity to autoprobing cpu
>>>>>
>>>>> I couldn't make sure which CPU handled serial interrupt
>>>>> on all platforms before irq# was not known.
>>>>> Do you know the way to detect which CPU is used for console serial?
>>>>
>>>>
>>>> The basic idea would be:
>>>> 1. disable preemption
>>>> 2. for each irq descriptor selected for autoprobing, set the irq
>>>>      affinity to the current processor.
>>>> 3. probe the i/o port as is done now
>>>> 4. stop probing
>>>> 5. re-enable preemption.
>>>
>>> Thanks, I think it works.
>>>
>>>> With this solution, your patch 1/2 wouldn't be required either
>>>> because the worker thread that disabled interrupts wouldn't be
>>>> running on the cpu detecting the triggered irq(s).
>>>
>>> I still need my patch 1/2 which fixes also other cases (see case2 & 3).
>>> I think both port->lock and console_lock are required in your solution.
>>> to avoid deadlock because printk() can be called on every context.
>>>
>>>> I would imagine most or all of this would be done in
>>>> probe_irq_on(), possibly refactored to perform the preemption
>>>> disable and irq affinity.
>>>
>>> I think introducing new function like "probe_irq_set_affinity()" is better
>>> than modifying probe_irq_on(). I cannot test all legacy devices and
>>> I don't have any reason to break the code which works for other devices.
>>
>> That's fine, although most of the arguments for fixing this in the serial
>> driver apply equally to other users of probe_irq_on().
>>
>>
>>>>> The way is safe for all platforms?
>>>>
>>>> Please understand though, autoprobing is not safe, period.
>>>> Even says so in Kconfig.
>>>
>>> OK, I'll try to create new patch which makes autoprobing safer as possible.
>>> New patch is going to be like below.
>>>    1. console and port lock
>>>    2. probe_irq_on()
>>>    3. probe_irq_set_affinity(&cpumask)
>>>    4. probe_irq_off()
>>>    5. port and console unlock
>>
>> The port->lock can't be taken in this context because hard irq
>> has to be disabled with port->lock which defeats the purpose of
>> pinning the irq affinity to the current cpu.
> 
> My test code uses spin_lock() instead of spin_lock_irqsave().
> 
>> What are you concerned about being concurrent with autoconfig_irq()?
>> Many operations are excluded by the port->mutex.
> 
> Actually I don't have any concerns as long as console_lock() is used,
> but I thought protecting port was better during auto_irq
> or register operations as same as autoconfig().
> 
> I was thinking they are used as the following purposes;
>    console_lock()
>     + Make sure serial8250_console_write() doesn't disable interrupt,
>       try to get port->lock or touch the ctrl register of the port.
>       # serial8250_console_write() can be called in any context.
>    spin_lock()
>     + Make sure the probing runs on the current CPU only
>       to handle a serial irq by itself after setting irq affinity.
>     + Make sure any other CPUs don't touch the ctrl register of the port.
> 
> It seems my test code has been working fine so far,
> but let me know if you have any concerns about using spin_lock()
> instead of preempt_disable().

If you turn on lockdep, taking the port->lock without disabling irq will
assert.

A quick static analysis shows autoconfig_irq() reachable via 2 different
call trees:

[1]  uart_add_one_port()
        lock global port_mutex (to prevent concurrent port add/remove)
        lock port->mutex
        uart_configure_port()
           ops->config_port => serial8250_config_port()
              autoconfig_irq()

[2]  ioctl(TIOCSERCONFIG)
        uart_do_autoconfig()
           lock port->mutex
           uart_shutdown()
           ops->config_port => serial8250_config_port()
              autoconfig_irq()

Call tree #1 cannot execute concurrently with any other driver function
because the tty device doesn't even exist at that time.

ioctl(TIOCSERCONFIG) -- call tree #2 -- is pretty much a hack and tries
to do its best to prevent concurrent driver function/hardware access.
So it takes the port->mutex which prevents many concurrent operations,
and shuts down the port hardware. In other words, the autoconfig operation
is intended to be exclusive of any other concurrent hardware access (except
console).

I say 'intended' because this is broken if the line discipline is echoing;
I just fixed this in uart_close() and now realize it's possible wherever
uart_shutdown() is called -- so I need to fix that harder. But my point is
that no other lock should not be necessary.

Please feel free to double-check my work.

Regards,
Peter Hurley

PS -I attached a catalog of concurrent operations excluded by port->mutex.

[-- Attachment #2: port_mutex.analysis --]
[-- Type: text/plain, Size: 1019 bytes --]



8250_port.c
	uses port->mutex to mutually exclude ->fcr changes



serial core:
       mutually excluded operations guaranteed by port->mutex
       		->config_port()
		->set_termios()

		uart_configure_port()
		uart_resume_port()
		uart_suspend_port()
		uart_open() => uart_startup() => uart_port_startup
		uart_hangup() => uart_shutdown()  ** too much coverage **
		uart_close() => uart_shutdown()  ** see uart_close.analysis **
		uart_break_ctl()
		uart_tiocmset()
		uart_tiocmget()
		uart_add_one_port() ** most of this including tty_port_register_device_attr **

		/* ioctls */
		uart_get_info()		// TIOCGSERIAL
		uart_set_info()		// TIOCSSERIAL
		uart_do_autoconfig()	// TIOCSERCONFIG
		uart_get_lsr_info()	// TIOCSERGETLSR
		uart_get_rs485_config	// TIOCGRS485
		uart_set_rs485_config	// TIOCSRS485
		// any serial driver ioctl


		uart_change_pm()

		link/unlink uart_state <=> uart_port
		access to uart_port->flags


		tty_port_tty_set() for uart == incidental


TODO

TIOCMIWAIT handling is messy; simplify

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console
  2015-07-27 19:44                         ` Peter Hurley
@ 2015-07-29  8:09                           ` Taichi Kageyama
  0 siblings, 0 replies; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-29  8:09 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Prarit Bhargava, gregkh, linux-serial, jslaby, linux-kernel,
	Naoya Horiguchi

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 8395 bytes --]

On 2015/07/28 4:44, Peter Hurley wrote:
> On 07/21/2015 05:44 AM, Taichi Kageyama wrote:
>> On 2015/07/21 1:36, Peter Hurley wrote:
>>> On 07/16/2015 05:58 AM, Taichi Kageyama wrote:
>>>> On 2015/07/15 4:29, Peter Hurley wrote:
>>>>> On 07/13/2015 09:16 PM, Taichi Kageyama wrote:
>>>>>> On 2015/07/11 9:12, Peter Hurley wrote:
>>>>>>> On 07/09/2015 01:32 AM, Taichi Kageyama wrote:
>>>>>>>> On 2015/07/08 23:00, Prarit Bhargava wrote:
>>>>>>>>> On 07/08/2015 09:51 AM, Peter Hurley wrote:
>>>>>>>>>> On 07/08/2015 08:53 AM, Prarit Bhargava wrote:
>>>>>>>>>>> On 07/08/2015 07:55 AM, Peter Hurley wrote:
>>>>>>>>>>>> On 06/05/2015 06:03 AM, Taichi Kageyama wrote:
>>>>>>>>>>>>> This patch provides a new parameter as a workaround of the following
>>>>>>>>>>>>> problem. It allows us to skip autoconfig_irq() and to use a well-known irq
>>>>>>>>>>>>> number for a console even if CONFIG_SERIAL_8250_DETECT_IRQ is defined.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There're cases where autoconfig_irq() fails during boot.
>>>>>>>>>>>>> In these cases, the console doesn't work in interrupt mode,
>>>>>>>>>>>>> the mode cannot be changed anymore, and "input overrun"
>>>>>>>>>>>>> (which can make operation mistakes) happens easily.
>>>>>>>>>>>>> This problem happens with high rate every boot once it occurs
>>>>>>>>>>>>> because the boot sequence is always almost same.
>>>>>>>>>>>>>
>>>>>>>>>>>>> autoconfig_irq() assumes that a CPU can handle an interrupt from a serial
>>>>>>>>>>>>> during the waiting time, but there're some cases where the CPU cannot
>>>>>>>>>>>>> handle the interrupt for longer than the time. It completely depends on
>>>>>>>>>>>>> how other functions work on the CPU. Ideally, autoconfig_irq() should be
>>>>>>>>>>>>> fixed
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you for your comments.
>>>>>>>>
>>>>>>>>>>>> It completely depends on how long some other driver has interrupts disabled,
>>>>>>>>
>>>>>>>> Agree.
>>>>>>>>
>>>>>>>>>>>> which is a problem that needs fixed _in that driver_. autoconfig_irq() does not
>>>>>>>>>>>> need fixing.
>>>>>>>>
>>>>>>>> Peter, ideally, you're right.
>>>>>>>> However, we cannot assume how long other drivers disable interrupts.
>>>>>>>> That's why I introduced this workaround.
>>>>>>>> In my opinion, a console is important and always should be available
>>>>>>>> even if other drivers have a bad behavior.
>>>>>>>
>>>>>>> I have no problem with wanting to make the console more robust, but
>>>>>>> rather with the hacky way this is being done.
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> Thank you for your advice.
>>>>>> If there is other way to fix this problem simply,
>>>>>> I also think it's better than the dirty hack.
>>>>>
>>>>> While module parameters seem like "simple" solutions at the time,
>>>>> they add real maintenance burden, because they establish userspace
>>>>> requirements that must be preserved forever to avoid breakage.
>>>>
>>>> Yeah, I agree with you.
>>>>
>>>>>>> Better solutions:
>>>>>>> 1. Fix autoprobing to force irq affinity to autoprobing cpu
>>>>>>
>>>>>> I couldn't make sure which CPU handled serial interrupt
>>>>>> on all platforms before irq# was not known.
>>>>>> Do you know the way to detect which CPU is used for console serial?
>>>>>
>>>>>
>>>>> The basic idea would be:
>>>>> 1. disable preemption
>>>>> 2. for each irq descriptor selected for autoprobing, set the irq
>>>>>      affinity to the current processor.
>>>>> 3. probe the i/o port as is done now
>>>>> 4. stop probing
>>>>> 5. re-enable preemption.
>>>>
>>>> Thanks, I think it works.
>>>>
>>>>> With this solution, your patch 1/2 wouldn't be required either
>>>>> because the worker thread that disabled interrupts wouldn't be
>>>>> running on the cpu detecting the triggered irq(s).
>>>>
>>>> I still need my patch 1/2 which fixes also other cases (see case2 & 3).
>>>> I think both port->lock and console_lock are required in your solution.
>>>> to avoid deadlock because printk() can be called on every context.
>>>>
>>>>> I would imagine most or all of this would be done in
>>>>> probe_irq_on(), possibly refactored to perform the preemption
>>>>> disable and irq affinity.
>>>>
>>>> I think introducing new function like "probe_irq_set_affinity()" is better
>>>> than modifying probe_irq_on(). I cannot test all legacy devices and
>>>> I don't have any reason to break the code which works for other devices.
>>>
>>> That's fine, although most of the arguments for fixing this in the serial
>>> driver apply equally to other users of probe_irq_on().
>>>
>>>
>>>>>> The way is safe for all platforms?
>>>>>
>>>>> Please understand though, autoprobing is not safe, period.
>>>>> Even says so in Kconfig.
>>>>
>>>> OK, I'll try to create new patch which makes autoprobing safer as possible.
>>>> New patch is going to be like below.
>>>>    1. console and port lock
>>>>    2. probe_irq_on()
>>>>    3. probe_irq_set_affinity(&cpumask)
>>>>    4. probe_irq_off()
>>>>    5. port and console unlock
>>>
>>> The port->lock can't be taken in this context because hard irq
>>> has to be disabled with port->lock which defeats the purpose of
>>> pinning the irq affinity to the current cpu.
>>
>> My test code uses spin_lock() instead of spin_lock_irqsave().
>>
>>> What are you concerned about being concurrent with autoconfig_irq()?
>>> Many operations are excluded by the port->mutex.
>>
>> Actually I don't have any concerns as long as console_lock() is used,
>> but I thought protecting port was better during auto_irq
>> or register operations as same as autoconfig().
>>
>> I was thinking they are used as the following purposes;
>>    console_lock()
>>     + Make sure serial8250_console_write() doesn't disable interrupt,
>>       try to get port->lock or touch the ctrl register of the port.
>>       # serial8250_console_write() can be called in any context.
>>    spin_lock()
>>     + Make sure the probing runs on the current CPU only
>>       to handle a serial irq by itself after setting irq affinity.
>>     + Make sure any other CPUs don't touch the ctrl register of the port.
>>
>> It seems my test code has been working fine so far,
>> but let me know if you have any concerns about using spin_lock()
>> instead of preempt_disable().
> 
> If you turn on lockdep, taking the port->lock without disabling irq will
> assert.

You're right. 
I've changed my code, thank you!

I actually have tested preempt_disable version too.
It has been working on my machine, but I've found it doesn't work on KVM with v4.1.
Now I'm lost about how irq affinity should be set during interrupt probing phase...
I guess more fixes are required to achieve this. 
I'll send v2 as RFC.

> A quick static analysis shows autoconfig_irq() reachable via 2 different
> call trees:
> 
> [1]  uart_add_one_port()
>         lock global port_mutex (to prevent concurrent port add/remove)
>         lock port->mutex
>         uart_configure_port()
>            ops->config_port => serial8250_config_port()
>               autoconfig_irq()
> 
> [2]  ioctl(TIOCSERCONFIG)
>         uart_do_autoconfig()
>            lock port->mutex
>            uart_shutdown()
>            ops->config_port => serial8250_config_port()
>               autoconfig_irq()
> 
> Call tree #1 cannot execute concurrently with any other driver function
> because the tty device doesn't even exist at that time.
> 
> ioctl(TIOCSERCONFIG) -- call tree #2 -- is pretty much a hack and tries
> to do its best to prevent concurrent driver function/hardware access.
> So it takes the port->mutex which prevents many concurrent operations,
> and shuts down the port hardware. In other words, the autoconfig operation
> is intended to be exclusive of any other concurrent hardware access (except
> console).
> 
> I say 'intended' because this is broken if the line discipline is echoing;
> I just fixed this in uart_close() and now realize it's possible wherever
> uart_shutdown() is called -- so I need to fix that harder. But my point is
> that no other lock should not be necessary.
> 
> Please feel free to double-check my work.

Thank you for the details. I understand. 
I've updated commit log.


Thanks,
Taichi



> 
> Regards,
> Peter Hurley
> 
> PS -I attached a catalog of concurrent operations excluded by port->mutex.
> 
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions
  2015-07-23 22:32   ` gregkh
@ 2015-07-29  8:13     ` Taichi Kageyama
  0 siblings, 0 replies; 21+ messages in thread
From: Taichi Kageyama @ 2015-07-29  8:13 UTC (permalink / raw)
  To: gregkh; +Cc: linux-serial, jslaby, linux-kernel, prarit, Naoya Horiguchi

On 2015/07/24 7:32, gregkh@linuxfoundation.org wrote:
> On Fri, Jun 05, 2015 at 09:57:40AM +0000, Taichi Kageyama wrote:
>> The following race conditions can happen if a serial is used as console.
>>   Case1. CPU_B handles an interrupt from a serial
>>     autoconfig_irq() fails whether the interrupt is raised or not
>>     if CPU_B is disabled to handle interrupts for longer than it expects.
>>   Case2. CPU_B clears UART_IER just after CPU_A sets UART_IER
>>     A serial may not make an interrupt.
>>     autoconfig_irq() can fail if the interrupt is not raised.
>>   Case3. CPU_A sets UART_IER just after CPU_B clears UART_IER
>>     This is an unexpected behavior for uart_console_write().
>>
>>   CPU_A [autoconfig_irq]      CPU_B [serial8250_console_write]
>>   -----------------------------------------------------------------
>>   probe_irq_on()              spin_lock_irqsave(&port->lock,)
>>   serial_outp(,UART_IER,0x0f) serial_out(,UART_IER,0)
>>   udelay(20);                 uart_console_write()
>>   probe_irq_off()
>>                               spin_unlock_irqrestore(&port->lock,)
>>   -----------------------------------------------------------------
>>
>> If autoconfig_irq() fails, the console doesn't work in interrupt mode,
>> the mode cannot be changed anymore, and "input overrun"
>> (which can make operation mistakes) happens easily.
>> This problem happens with high rate every boot once it occurs
>> because the boot sequence is always almost same.
>>
>> Signed-off-by: Taichi Kageyama <t-kageyama@cp.jp.nec.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
>> ---
>>   drivers/tty/serial/8250/8250_core.c |    6 ++++++
>>   1 files changed, 6 insertions(+), 0 deletions(-)
> 
> Does not apply to 4.2-rc3 :(
> 

Sorry for the trouble and late reply.
It's my fault. It was encoded as base64 by Exchange.
I sent new patch set including this patch.
I updated commit log.


Regards,
Taichi

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-07-29  8:22 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-05  9:55 [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
2015-06-05  9:57 ` [PATCH 1/2] serial: 8250: Fix autoconfig_irq() to avoid race conditions Taichi Kageyama
2015-07-08 23:35   ` Peter Hurley
2015-07-23 22:32   ` gregkh
2015-07-29  8:13     ` Taichi Kageyama
2015-06-05 10:03 ` [PATCH 2/2] serial: 8250: Allow to skip autoconfig_irq() for a console Taichi Kageyama
2015-07-08 11:55   ` Peter Hurley
2015-07-08 12:53     ` Prarit Bhargava
2015-07-08 13:51       ` Peter Hurley
2015-07-08 14:00         ` Prarit Bhargava
2015-07-09  5:32           ` Taichi Kageyama
2015-07-11  0:12             ` Peter Hurley
2015-07-14  1:16               ` Taichi Kageyama
2015-07-14 19:29                 ` Peter Hurley
2015-07-16  9:58                   ` Taichi Kageyama
2015-07-20 16:36                     ` Peter Hurley
2015-07-21  9:44                       ` Taichi Kageyama
2015-07-27 19:44                         ` Peter Hurley
2015-07-29  8:09                           ` Taichi Kageyama
2015-07-07  8:13 ` [PATCH 0/2] serial: 8250: Workaround to avoid irq=0 for console Taichi Kageyama
2015-07-07 22:42   ` gregkh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).