All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] serial: 8250 check iir rdi in interrupt
@ 2012-10-23  0:19 Min Zhang
  2012-10-23 10:01 ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Min Zhang @ 2012-10-23  0:19 UTC (permalink / raw)
  To: gregkh, linux-serial; +Cc: linux-kernel

The patch works around two UART interrupt bugs when the serial console is 
flooded with inputs:
1. syslog shows "serial8250: too much works for irq"
2. serial console stops responding to key stroke

serial8250_handle_irq() checks UART_IIR_RDI before reading receive fifo
and clears bogus interrupt UART_IIR_RDI without accompanying UART_LSR_DR,
otherwise RDI interrupt could freeze or too many unhandled RDI interrupts.

Added module parameter skip_rdi_check to opt out this workaround.

Tested on Radisys ATCA 46XX which uses FPGA 16550-compatible and
other generic 16550 UART. It takes from an hour to days to reproduce by
pumping inputs to serial console continously using TeraTerm script:

---
  drivers/tty/serial/8250/8250.c |   55 ++++++++++++++++++++++++++++++++++++++++
  1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/8250/8250.c b/drivers/tty/serial/8250/8250.c
index 3ba4234..838dd22 100644
--- a/drivers/tty/serial/8250/8250.c
+++ b/drivers/tty/serial/8250/8250.c
@@ -64,6 +64,7 @@ static int serial_index(struct uart_port *port)
  }

  static unsigned int skip_txen_test; /* force skip of txen test at init time */
+static unsigned int skip_rdi_check; /* skip of IIR RDI check in interrupt */

  /*
   * Debugging.
@@ -1479,6 +1480,51 @@ unsigned int serial8250_modem_status(struct uart_8250_port *up)
  EXPORT_SYMBOL_GPL(serial8250_modem_status);

  /*
+ * Check if status UART_LSR_RD accompanies with interrupt UART_IIR_RDI.
+ * If they are mismatch, massage the status or interupt cause accordingly:
+ *
+ * Return a cleared UART_LSR_RD status if there is no accompanying
+ * UART_IIR_RDI. Hopefully the new status is used by interrupt handler
+ * to skip reading receive FIFO. Otherwise some UART controller stops
+ * generating RDI interrupt after this unnotified FIFO read, until other
+ * interrupts maybe transmit interrupt reads UART_LSR again.
+ *
+ * Or clear interrupt cause UART_IIR_RDI without UART_LSR_RD. The UART sets
+ * UART_IIR_RDI *even* if the received data has been read out from the FIFO
+ * before the timeout occurs.  To clear UART_IIR_RDI, read receive buffer
+ * register. Reading it also clears timeout interrupt for 16550+. Otherwise
+ * the uncleared UART_IIR_RDI will keep triggering IRQ but interrupt
+ * handler finds nothing to do.
+ *
+ * Skip this workaround if interrupt is not expected, such as backup timer,
+ * so that handler can still solely rely on original status register.
+ */
+static unsigned char serial8250_iir_rdi_check(struct uart_8250_port *up,
+					     unsigned char status,
+					     unsigned int iir)
+{
+	unsigned int ier, rdi_stat, rdi_intr;
+
+	/* skip for handler without interrupt */
+	if (!up->port.irq)
+		return status;
+
+	/* skip for polling handler such as backup timer */
+	ier = serial_in(up, UART_IER);
+	if (!(ier & UART_IER_RDI))
+		return status;
+
+	rdi_stat = status & UART_LSR_DR;
+	rdi_intr = iir & UART_IIR_RDI;
+
+	if (rdi_stat && !rdi_intr)
+		status &= ~UART_LSR_DR;
+	else if (!rdi_stat && rdi_intr)
+		serial_in(up, UART_RX);
+	return status;
+}
+
+/*
   * This handles the interrupt from one port.
   */
  int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
@@ -1497,6 +1543,12 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)

  	DEBUG_INTR("status = %x...", status);

+	/* Some UART controller has mismatched UART_IIR_RDI and UART_LSR_DR,
+	   which causes either too many interrupts or interrupt freeze
+	 */
+	if (!skip_rdi_check)
+		status = serial8250_iir_rdi_check(up, status, iir);
+
  	if (status & (UART_LSR_DR | UART_LSR_BI))
  		status = serial8250_rx_chars(up, status);
  	serial8250_modem_status(up);
@@ -3338,6 +3390,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
  module_param(skip_txen_test, uint, 0644);
  MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");

+module_param(skip_rdi_check, uint, 0644);
+MODULE_PARM_DESC(skip_rdi_check, "Skip checking IIR RDI bug in interrupt");
+
  #ifdef CONFIG_SERIAL_8250_RSA
  module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
  MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
-- 
1.7.7.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] serial: 8250 check iir rdi in interrupt
  2012-10-23  0:19 [PATCH] serial: 8250 check iir rdi in interrupt Min Zhang
@ 2012-10-23 10:01 ` Alan Cox
  2012-10-23 19:43   ` Min Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Cox @ 2012-10-23 10:01 UTC (permalink / raw)
  To: Min Zhang; +Cc: gregkh, linux-serial, linux-kernel

> Added module parameter skip_rdi_check to opt out this workaround.

NAK. Anything like this should be runtime.

> Tested on Radisys ATCA 46XX which uses FPGA 16550-compatible and
> other generic 16550 UART. It takes from an hour to days to reproduce by
> pumping inputs to serial console continously using TeraTerm script:

You turn this on by default but it's a nasty IRQ latency penalty
on a lot of x86 platforms with the uarts on the lpc bus.

What I am not clear on from this is

- do you see it on both the ports (the bug that is)
- if you do see it on both are you sure its not in reality a symptom of
  some other console/irq handling race ?

Alan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] serial: 8250 check iir rdi in interrupt
  2012-10-23 10:01 ` Alan Cox
@ 2012-10-23 19:43   ` Min Zhang
  2012-10-26 14:19     ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Min Zhang @ 2012-10-23 19:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: gregkh, linux-serial, linux-kernel

On Tue, Oct 23, 2012 at 3:01 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> > Added module parameter skip_rdi_check to opt out this workaround.
>
> NAK. Anything like this should be runtime.

One can echo 1 (or 0) > /sys/modules/8250/parameters/skip_rdi_check
during run time to turn it off (or on) dynamically. Does it count as
runtime?

> > Tested on Radisys ATCA 46XX which uses FPGA 16550-compatible and
> > other generic 16550 UART. It takes from an hour to days to reproduce by
> > pumping inputs to serial console continously using TeraTerm script:
>
> You turn this on by default but it's a nasty IRQ latency penalty
> on a lot of x86 platforms with the uarts on the lpc bus.

I agree. Will this patch be more acceptable if default is off? I can't
narrow it hardware down since it is all generic UART.,

> What I am not clear on from this is
>
> - do you see it on both the ports (the bug that is)

No, each hardware only has one serial console port that has traffic,
and only one of the two symptom occur on one type of hardware. That is
hardware 1 ttyS0 has "too much work for irq", and hardware 2 ttyS0 has
console freeze under a separate test. I group them together since they
occur using the same console flooding test script and under similar
RDI root cause.

> - if you do see it on both are you sure its not in reality a symptom of
>   some other console/irq handling race ?

It is racing. For "too much work for irq", here is sequence events
analyzed by a Motorola engineer:

        1) Data arrives in the FIFO, but not enough to cause an
interrupt
        2) The transmitter is started.
        3) A transmit needs data interrupt occurs (0xC2 in the
IIR)
        4) The processing function is called and it reads the
LSR
        5) The LSR indicates that the transmitter needs data,
but also indicates the presence of data in the FIFO (0x61 in the LSR)
        6) The processing function receives the characters, and
outputs data to the FIFO
        7) At the exact time (very very small window) that the
character is read from the FIFO, the FIFO timeout occurs locking in an
interrupt cause
        8) The next loop through the interrupt code begins
        9) The IIR now indicates the data timeout interrupt
(0xCC in the IIR)
        10) The processing function is called and it reads the
LSR
        11) The LSR is 0 indicating nothing to do
        12) The interrupt loop continues (the IIR won't clear
until a character is pulled) until it reaches its max count and
displays the error.

The other console freeze symptom is caused by similar sequence. The
last interrupt before interrupt stops always shows IIR=0xC2 and
LSR=0x21, which means has transmit interrupt but both transmit and
receive status.

After interrupt stops, i insmod a module to force read: IIR=0xC6,
IER=0x0F, still no interrupt. Then I read LSR=0xE3., which is what the
next interrupt would have done, makes interrupt resume again. Instead
of force reading LSR, I can also resume interrupt by forcing a printk,
which triggers a new transmit interrupt that reads LSR anyway.

>
> Alan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] serial: 8250 check iir rdi in interrupt
  2012-10-23 19:43   ` Min Zhang
@ 2012-10-26 14:19     ` Alan Cox
  2012-10-26 21:57       ` Min Zhang
  2012-10-26 22:16       ` [PATCHv2] " Min Zhang
  0 siblings, 2 replies; 7+ messages in thread
From: Alan Cox @ 2012-10-26 14:19 UTC (permalink / raw)
  To: Min Zhang; +Cc: gregkh, linux-serial, linux-kernel

> It is racing. For "too much work for irq", here is sequence events
> analyzed by a Motorola engineer:

Thanks - this is enormously helpful in understanding the report.

>         5) The LSR indicates that the transmitter needs data,
> but also indicates the presence of data in the FIFO (0x61 in the LSR)
>         6) The processing function receives the characters, and
> outputs data to the FIFO
>         7) At the exact time (very very small window) that the
> character is read from the FIFO, the FIFO timeout occurs locking in an
> interrupt cause
>         8) The next loop through the interrupt code begins
>         9) The IIR now indicates the data timeout interrupt
> (0xCC in the IIR)
>         10) The processing function is called and it reads the
> LSR
>         11) The LSR is 0 indicating nothing to do

>         12) The interrupt loop continues (the IIR won't clear
> until a character is pulled) until it reaches its max count and
> displays the error.

So we only need to check this in serial8250_handle_irq when IIR indicates
a data timeout interrupt ? 

Can we do

	if ((iir & 0x0F) == 0x0C) {
		/* Expensive RDI check */
	}



Alan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] serial: 8250 check iir rdi in interrupt
  2012-10-26 14:19     ` Alan Cox
@ 2012-10-26 21:57       ` Min Zhang
  2012-10-30 14:24         ` Alan Cox
  2012-10-26 22:16       ` [PATCHv2] " Min Zhang
  1 sibling, 1 reply; 7+ messages in thread
From: Min Zhang @ 2012-10-26 21:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: gregkh, linux-serial, linux-kernel

On Fri, Oct 26, 2012 at 7:19 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> So we only need to check this in serial8250_handle_irq when IIR indicates
> a data timeout interrupt ?
>
> Can we do
>
>         if ((iir & 0x0F) == 0x0C) {
>                 /* Expensive RDI check */
>         }
>
>
>
> Alan

Checking data timeout interrupt is only for "too much work for irq"
problem. There is another console freeze problem which is caused by
reading receive FIFO when there is no RDI interrupt, such as during
THRI transmit interrupt, so no timeout interrupt. This time one has to
check two both IIR and LSR. After all these checking, it becomes the
original patch anyway.

I am posting another simpler revision to exclude timer handler from
this workaround, and make this workaround default off and inline.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv2] serial: 8250 check iir rdi in interrupt
  2012-10-26 14:19     ` Alan Cox
  2012-10-26 21:57       ` Min Zhang
@ 2012-10-26 22:16       ` Min Zhang
  1 sibling, 0 replies; 7+ messages in thread
From: Min Zhang @ 2012-10-26 22:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: gregkh, linux-serial, linux-kernel

The patch works around two UART interrupt bugs when the serial console is
flooded with inputs:
1. syslog shows "serial8250: too much works for irq"
2. serial console stops responding to key stroke

serial8250_handle_irq() checks UART_IIR_RDI before reading receive fifo
and clears bogus interrupt UART_IIR_RDI without accompanying UART_LSR_DR,
otherwise RDI interrupt could freeze or too many unhandled RDI interrupts.

Added module parameter skip_rdi_check to opt out this workaround.

Tested on Radisys ATCA 46XX which uses FPGA 16550-compatible and
other generic 16550 UART. It takes from an hour to days to reproduce by
pumping inputs to serial console continously using TeraTerm script:

Signed-off-by: Min Zhang <mzhang@mvista.com>
---
  drivers/tty/serial/8250/8250.c |   50 ++++++++++++++++++++++++++++++++++++++++
  1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/drivers/tty/serial/8250/8250.c b/drivers/tty/serial/8250/8250.c
index 3ba4234..dfc13d1 100644
--- a/drivers/tty/serial/8250/8250.c
+++ b/drivers/tty/serial/8250/8250.c
@@ -64,6 +64,7 @@ static int serial_index(struct uart_port *port)
  }

  static unsigned int skip_txen_test; /* force skip of txen test at init time */
+static unsigned int skip_rdi_check = 1; /* skip of IIR RDI check in interrupt */

  /*
   * Debugging.
@@ -1479,6 +1480,46 @@ unsigned int serial8250_modem_status(struct uart_8250_port *up)
  EXPORT_SYMBOL_GPL(serial8250_modem_status);

  /*
+ * Check if status UART_LSR_RD accompanies with interrupt UART_IIR_RDI.
+ * If they are mismatch, massage the status or interupt cause accordingly:
+ *
+ * Return a cleared UART_LSR_RD status if there is no accompanying
+ * UART_IIR_RDI. Hopefully the new status is used by interrupt handler
+ * to skip reading receive FIFO. Otherwise some UART controller stops
+ * generating RDI interrupt after this unnotified FIFO read, until other
+ * interrupts maybe transmit interrupt reads UART_LSR again.
+ *
+ * Or clear interrupt cause UART_IIR_RDI without UART_LSR_RD. The UART sets
+ * UART_IIR_RDI *even* if the received data has been read out from the FIFO
+ * before the timeout occurs.  To clear UART_IIR_RDI, read receive buffer
+ * register. Reading it also clears timeout interrupt for 16550+. Otherwise
+ * the uncleared UART_IIR_RDI will keep triggering IRQ but interrupt
+ * handler finds nothing to do.
+ *
+ * Skip this workaround if interrupt is not expected, such as backup timer,
+ * so that handler can still solely rely on original status register.
+ */
+static inline unsigned char serial8250_iir_rdi_check(struct uart_8250_port *up,
+						     unsigned char status,
+						     unsigned int iir)
+{
+	unsigned int rdi_stat, rdi_intr;
+
+	/* skip for timer based handler */
+	if (up->timer.data)
+		return status;
+
+	rdi_stat = status & UART_LSR_DR;
+	rdi_intr = iir & UART_IIR_RDI;
+
+	if (rdi_stat && !rdi_intr)
+		status &= ~UART_LSR_DR;
+	else if (!rdi_stat && rdi_intr)
+		serial_in(up, UART_RX);
+	return status;
+}
+
+/*
   * This handles the interrupt from one port.
   */
  int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
@@ -1497,6 +1538,12 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)

  	DEBUG_INTR("status = %x...", status);

+	/* Some UART controller has mismatched UART_IIR_RDI and UART_LSR_DR,
+	   which causes either too many interrupts or interrupt freeze
+	 */
+	if (!skip_rdi_check)
+		status = serial8250_iir_rdi_check(up, status, iir);
+
  	if (status & (UART_LSR_DR | UART_LSR_BI))
  		status = serial8250_rx_chars(up, status);
  	serial8250_modem_status(up);
@@ -3338,6 +3385,9 @@ MODULE_PARM_DESC(nr_uarts, "Maximum number of UARTs supported. (1-" __MODULE_STR
  module_param(skip_txen_test, uint, 0644);
  MODULE_PARM_DESC(skip_txen_test, "Skip checking for the TXEN bug at init time");

+module_param(skip_rdi_check, uint, 0644);
+MODULE_PARM_DESC(skip_rdi_check, "Skip checking IIR RDI bug in interrupt");
+
  #ifdef CONFIG_SERIAL_8250_RSA
  module_param_array(probe_rsa, ulong, &probe_rsa_count, 0444);
  MODULE_PARM_DESC(probe_rsa, "Probe I/O ports for RSA");
-- 
1.7.0.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] serial: 8250 check iir rdi in interrupt
  2012-10-26 21:57       ` Min Zhang
@ 2012-10-30 14:24         ` Alan Cox
  0 siblings, 0 replies; 7+ messages in thread
From: Alan Cox @ 2012-10-30 14:24 UTC (permalink / raw)
  To: Min Zhang; +Cc: gregkh, linux-serial, linux-kernel

On Fri, 26 Oct 2012 14:57:31 -0700
Min Zhang <mzhang@mvista.com> wrote:

> On Fri, Oct 26, 2012 at 7:19 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> 
> > So we only need to check this in serial8250_handle_irq when IIR indicates
> > a data timeout interrupt ?
> >
> > Can we do
> >
> >         if ((iir & 0x0F) == 0x0C) {
> >                 /* Expensive RDI check */
> >         }
> >
> >
> >
> > Alan
> 
> Checking data timeout interrupt is only for "too much work for irq"
> problem. There is another console freeze problem which is caused by
> reading receive FIFO when there is no RDI interrupt, such as during
> THRI transmit interrupt, so no timeout interrupt. This time one has to
> check two both IIR and LSR. After all these checking, it becomes the
> original patch anyway.
> 
> I am posting another simpler revision to exclude timer handler from
> this workaround, and make this workaround default off and inline.

We have two problems one of which is understood. Your patch continues to
do an extra read in an absolutely critical hot path and one that on many
systems will cost upwards of 8uS.

Can we therefore work through this neatly one problem at a time ?

In addition it seems your patch causes data loss if your check triggers
and a character appears during the check, at which point I don't see what
stops your RX read from dropping a byte of real user data ?

Alan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-10-30 14:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-23  0:19 [PATCH] serial: 8250 check iir rdi in interrupt Min Zhang
2012-10-23 10:01 ` Alan Cox
2012-10-23 19:43   ` Min Zhang
2012-10-26 14:19     ` Alan Cox
2012-10-26 21:57       ` Min Zhang
2012-10-30 14:24         ` Alan Cox
2012-10-26 22:16       ` [PATCHv2] " Min Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.