linux-kernel.vger.kernel.org archive mirror
* [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-10-29 19:55 Tim_T_Murphy
  2004-10-29 20:20 ` Russell King
  2004-10-29 21:08 ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Paul Fulghum
  0 siblings, 2 replies; 27+ messages in thread
From: Tim_T_Murphy @ 2004-10-29 19:55 UTC (permalink / raw)
  To: linux-kernel

I am new to the list, hope this is ok..
I've read about several problems others are having with the new 2.6 serial driver in the list, and tried to see if their solutions solved my issue also, but unfortunately none that I have tried yet have helped.

We're migrating our applications for the Dell Remote Access Controller (DRAC) from a 2.4 kernel to a 2.6 kernel. Communication between the apps and the DRAC happens over a ppp link which is established via a service startup script; the script uses setserial to prepare an unused tty (based on the assigned hardware information, obtained via lspci), and the script then calls pppd to finish/establish the link.

Everything works fine with the UP kernel, although there is a message in syslog regarding a spinlock (issued at approximately the same point in time where the SMP kernel hangs):
---
Oct 29 13:34:47 racjag-1 kernel: CSLIP: code copyright 1989 Regents of the University of California
Oct 29 13:34:47 racjag-1 kernel: PPP generic driver version 2.4.2
Oct 29 13:34:47 racjag-1 udev[3875]: creating device node '/dev/ppp'
Oct 29 13:34:47 racjag-1 pppd[3884]: pppd 2.4.2 started by root, uid 0
Oct 29 13:34:47 racjag-1 racser: pppd startup succeeded
Oct 29 13:34:48 racjag-1 chat[3886]: send (CLIENT^M)
Oct 29 13:34:48 racjag-1 chat[3886]: expect (CLIENTSERVER)
Oct 29 13:34:48 racjag-1 kernel: drivers/serial/serial_core.c:102: spin_lock(drivers/serial/serial_core.c:023f2548) already locked by drivers/serial/8250.c/1015
Oct 29 13:34:48 racjag-1 kernel: drivers/serial/8250.c:1017: spin_unlock(drivers/serial/serial_core.c:023f2548) not locked
Oct 29 13:34:48 racjag-1 chat[3886]: CLIENTSERVER
Oct 29 13:34:48 racjag-1 chat[3886]:  -- got it 
Oct 29 13:34:48 racjag-1 chat[3886]: send ()
Oct 29 13:34:48 racjag-1 pppd[3884]: Serial connection established.
Oct 29 13:34:48 racjag-1 pppd[3884]: Using interface ppp0
Oct 29 13:34:48 racjag-1 pppd[3884]: Connect: ppp0 <--> /dev/ttyS2
Oct 29 13:34:49 racjag-1 pppd[3884]: local  IP address 192.168.234.235
Oct 29 13:34:49 racjag-1 pppd[3884]: remote IP address 192.168.234.236
---

With the SMP kernel, it hangs very soon after starting pppd.
I enabled DEBUG in the serial driver and captured the syslog when the problem happens, but this is not detailed enough for me to finger the exact problem:
---
Oct 28 14:04:52 racjag-1 kernel: CSLIP: code copyright 1989 Regents of the University of California
Oct 28 14:04:52 racjag-1 kernel: PPP generic driver version 2.4.2
Oct 28 14:04:52 racjag-1 udev[3621]: creating device node '/dev/ppp'
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 kernel: Trying to free nonexistent resource <00000000-00000007>
Oct 28 14:05:19 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_flush_buffer(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_flush_buffer(2) called
Oct 28 14:05:19 racjag-1 pppd[3681]: pppd 2.4.1 started by root, uid 0
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 racser: pppd startup succeeded
Oct 28 14:05:20 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:20 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:20 racjag-1 chat[3683]: send (CLIENT^M)
---
The system hangs right there; must press and hold power to get the system to shut down.

Any suggestions to narrow down the cause?  Please cc my email as I do not subscribe to this list.
Thanks,
Tim


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 19:55 [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Tim_T_Murphy
@ 2004-10-29 20:20 ` Russell King
  2004-10-29 22:18   ` Paul Fulghum
  2004-10-29 23:40   ` Paul Fulghum
  2004-10-29 21:08 ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Paul Fulghum
  1 sibling, 2 replies; 27+ messages in thread
From: Russell King @ 2004-10-29 20:20 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Fri, Oct 29, 2004 at 02:55:10PM -0500, Tim_T_Murphy@Dell.com wrote:
> I've read about several problems others are having with the new 2.6
> serial driver in the list, and tried to see if their solutions solved
> my issue also, but unfortunately none that I have tried yet have helped.

Well, this is the first I know of this kind of problem...

> We're migrating our applications for the Dell Remote Access Controller
> (DRAC) to run on a 2.6 kernel from a 2.4 kernel. Communication between
> the apps and the DRAC happens over a ppp link which is established via
> a service startup script; the script uses setserial to prepare an unused
> tty (based on the assigned hardware information, obtained via lspci),
> and the script then calls pppd to finish/establish the link.

Shouldn't 8250_pci set up the ports for you already?  If not, what needs
to be done to achieve this?  Using setserial to set up ports for PCI cards
isn't the preferred way of doing this.

At a guess, you've enabled "low latency" setting on this port ?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 19:55 [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Tim_T_Murphy
  2004-10-29 20:20 ` Russell King
@ 2004-10-29 21:08 ` Paul Fulghum
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Fulghum @ 2004-10-29 21:08 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Fri, 2004-10-29 at 14:55, Tim_T_Murphy@Dell.com wrote:
> Oct 29 13:34:48 racjag-1 chat[3886]: expect (CLIENTSERVER)
> Oct 29 13:34:48 racjag-1 kernel: drivers/serial/serial_core.c:102: spin_lock(drivers/serial/serial_core.c:023f2548) already locked by drivers/serial/8250.c/1015
> Oct 29 13:34:48 racjag-1 kernel: drivers/serial/8250.c:1017: spin_unlock(drivers/serial/serial_core.c:023f2548) not locked
> Oct 29 13:34:48 racjag-1 chat[3886]: CLIENTSERVER

One way this can happen is a receive interrupt:

serial8250_interrupt();
    spin_lock(port->lock);
    serial8250_handle_port();
       receive_chars();
          flip.work.func(); /* if FLIP buffer full */
             ldisc->receive_buf(); /* N_TTY */
                 tty->driver->flush_chars();
                     uart_start();
                        spin_lock(port->lock); *BANG*

Try the attached patch and report what happens.

-- 
Paul Fulghum
paulkf@microgate.com

--- linux-2.6.8/drivers/serial/8250.c	2004-08-14 00:36:13.000000000 -0500
+++ b/drivers/serial/8250.c	2004-10-29 15:58:28.076014336 -0500
@@ -830,9 +830,13 @@ receive_chars(struct uart_8250_port *up,
 
 	do {
 		if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
-			tty->flip.work.func((void *)tty);
-			if (tty->flip.count >= TTY_FLIPBUF_SIZE)
-				return; // if TTY_DONT_FLIP is set
+			/* no room in flip buffer, discard rx FIFO contents to clear IRQ */
+			do {
+				serial_inp(up, UART_RX);
+				up->port.icount.overrun++;
+				*status = serial_inp(up, UART_LSR);
+			} while ((*status & UART_LSR_DR) && (max_count-- > 0));
+			return;	/* if TTY_DONT_FLIP is set */
 		}
 		ch = serial_inp(up, UART_RX);
 		*tty->flip.char_buf_ptr = ch;
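
The call chain above can be reproduced in miniature in user space. The sketch below is only an illustration, not kernel code: `dbg_lock`, `model_interrupt` and `model_uart_start` are made-up names, and the lock merely records its holder so that a second acquire is reported instead of spinning forever, roughly the way the spinlock debugging output on the UP kernel reported it. On SMP a real spinlock has no such escape, so the nested acquire would spin for good.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a debug spinlock: it remembers who holds it. */
struct dbg_lock {
    int locked;
    const char *owner;  /* name of the function that took the lock */
};

/* Returns 0 on success, -1 if the lock is already held (the *BANG* case). */
int dbg_lock_acquire(struct dbg_lock *l, const char *who)
{
    if (l->locked) {
        fprintf(stderr, "%s: lock already held by %s\n", who, l->owner);
        return -1;
    }
    l->locked = 1;
    l->owner = who;
    return 0;
}

void dbg_lock_release(struct dbg_lock *l)
{
    assert(l->locked);
    l->locked = 0;
    l->owner = NULL;
}

/* Models uart_start(): it takes the port lock on its own. */
int model_uart_start(struct dbg_lock *port_lock)
{
    if (dbg_lock_acquire(port_lock, "uart_start") != 0)
        return -1;
    dbg_lock_release(port_lock);
    return 0;
}

/* Models the interrupt handler: it holds the port lock and, when
 * push_from_isr is nonzero, runs the flip routine synchronously, which
 * ends in uart_start(). Returns -1 if the nested acquire failed. */
int model_interrupt(struct dbg_lock *port_lock, int push_from_isr)
{
    int rc = 0;

    if (dbg_lock_acquire(port_lock, "serial8250_interrupt") != 0)
        return -1;
    if (push_from_isr)
        rc = model_uart_start(port_lock);
    dbg_lock_release(port_lock);
    return rc;
}
```

Calling `model_interrupt(&lock, 1)` reports the nested acquire, while `model_interrupt(&lock, 0)` followed by a later `model_uart_start(&lock)` succeeds, which is the effect of deferring the flip work out of the handler.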




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 20:20 ` Russell King
@ 2004-10-29 22:18   ` Paul Fulghum
  2004-10-29 23:40   ` Paul Fulghum
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Fulghum @ 2004-10-29 22:18 UTC (permalink / raw)
  To: Russell King; +Cc: Tim_T_Murphy, Linux Kernel list

On Fri, 2004-10-29 at 15:20, Russell King wrote:
> At a guess, you've enabled "low latency" setting on this port ?

Ah, that would explain the problem better than
the code path I saw (flip buffer full).
The problem is still the same: calling the flip
work routine from the ISR, which calls through
N_TTY receive_buf->flush_chars->start_tx.

-- 
Paul Fulghum
paulkf@microgate.com




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 20:20 ` Russell King
  2004-10-29 22:18   ` Paul Fulghum
@ 2004-10-29 23:40   ` Paul Fulghum
  2004-10-30 22:43     ` Alan Cox
  1 sibling, 1 reply; 27+ messages in thread
From: Paul Fulghum @ 2004-10-29 23:40 UTC (permalink / raw)
  To: Russell King; +Cc: Tim_T_Murphy, Linux Kernel list

On Fri, 2004-10-29 at 15:20, Russell King wrote:
> At a guess, you've enabled "low latency" setting on this port ?

Would it make sense to do something like (in tty_io.c) the following?

void tty_flip_buffer_push(struct tty_struct *tty)
{
	if (tty->low_latency) {
		if (in_interrupt()) {
			printk(KERN_ERR "tty_flip_buffer_push called with low latency from interrupt!\n");
			dump_stack();
			schedule_delayed_work(&tty->flip.work, 1);
		}
		else
			flush_to_ldisc((void *) tty);
	}
	else
		schedule_delayed_work(&tty->flip.work, 1);
}
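
The decision the proposed function makes can be written out as a small table. The following user-space sketch is an assumption-laden model, not the kernel function: `model_flip_push` and the `push_action` values are invented names, and the two flags are passed in as plain integers rather than read from `tty->low_latency` and `in_interrupt()`.

```c
/* What the proposed tty_flip_buffer_push() would do in each case. */
enum push_action {
    PUSH_FLUSH_NOW,      /* call flush_to_ldisc() directly */
    PUSH_DEFER,          /* schedule the flip work for later */
    PUSH_DEFER_AND_WARN  /* defer, and log the offending caller */
};

/* low_latency: nonzero if the port has the low latency flag set.
 * in_irq:      nonzero if the caller is running in interrupt context. */
enum push_action model_flip_push(int low_latency, int in_irq)
{
    if (!low_latency)
        return PUSH_DEFER;
    return in_irq ? PUSH_DEFER_AND_WARN : PUSH_FLUSH_NOW;
}
```

Only the combination of low latency and process context flushes synchronously; every path from interrupt context ends up deferred.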

-- 
Paul Fulghum
paulkf@microgate.com




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 23:40   ` Paul Fulghum
@ 2004-10-30 22:43     ` Alan Cox
  2004-10-31  0:26       ` Paul Fulghum
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2004-10-30 22:43 UTC (permalink / raw)
  To: Paul Fulghum; +Cc: Russell King, Tim_T_Murphy, Linux Kernel Mailing List

On Sad, 2004-10-30 at 00:40, Paul Fulghum wrote:
> On Fri, 2004-10-29 at 15:20, Russell King wrote:
> > At a guess, you've enabled "low latency" setting on this port ?
> 
> Would it make sense to do something like (in tty_io.c) the following?

Not really because it can legally occur if you flip the low latency
flag while a transaction is queued. It might work if you waited for
scheduled work to complete in the flag changing.



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-30 22:43     ` Alan Cox
@ 2004-10-31  0:26       ` Paul Fulghum
  2004-11-01  7:14         ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel Stuart MacDonald
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Fulghum @ 2004-10-31  0:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Russell King, Tim_T_Murphy, Linux Kernel Mailing List

On Sat, 2004-10-30 at 17:43, Alan Cox wrote:
> On Sad, 2004-10-30 at 00:40, Paul Fulghum wrote:
> > Would it make sense to do something like (in tty_io.c) the following?
> 
> Not really because it can legally occur if you flip the low latency
> flag while a transaction is queued. It might work if you waited for
> scheduled work to complete in the flag changing.

I don't see how having flush_to_ldisc() queued
or already running (on another processor) negates
the prohibition on calling tty_flip_buffer_push()
with low_latency set in interrupt context.

The comments for tty_flip_buffer_push() state the
function should not be called in interrupt context
if low_latency is set (no exceptions are listed).
Meaning flush_to_ldisc() should only be called
in process context.

If flush_to_ldisc() is queued or already executing,
there is no protection against calling
flush_to_ldisc() again, directly in interrupt context.
TTY_DONT_FLIP is no protection, that is only set
in read_chan() of n_tty.c

If I'm missing something, please point it out.

-- 
Paul Fulghum
paulkf@microgate.com




* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-10-31  0:26       ` Paul Fulghum
@ 2004-11-01  7:14         ` Stuart MacDonald
  2004-11-01 14:10           ` Paul Fulghum
  0 siblings, 1 reply; 27+ messages in thread
From: Stuart MacDonald @ 2004-11-01  7:14 UTC (permalink / raw)
  To: 'Paul Fulghum', 'Alan Cox'
  Cc: 'Russell King',
	Tim_T_Murphy, 'Linux Kernel Mailing List'

From: Paul Fulghum
> I don't see how having flush_to_ldisc() queued
> or already running (on another processor) negates
> the prohibition on calling tty_flip_buffer_push()
> with low_latency set in interrupt context.

I always thought the whole point of low_latency was to make the
receive-path very fast, which means specifically allowing the flip
routine to run from the ISR. So checking for calling from the ISR and
specifically disallowing that is basically negating the entire raison
d'etre for low_latency.

Having said that, the interrupt context "taint" that is allowed by the
low_latency flag has been a thorn in our side for some time. It would
be nice if that path was cleaned up to run properly from interrupt or
process context.

..Stu
www.connecttech.com



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01  7:14         ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel Stuart MacDonald
@ 2004-11-01 14:10           ` Paul Fulghum
  2004-11-01 15:12             ` Stuart MacDonald
  2004-11-01 23:02             ` Alan Cox
  0 siblings, 2 replies; 27+ messages in thread
From: Paul Fulghum @ 2004-11-01 14:10 UTC (permalink / raw)
  To: Stuart MacDonald
  Cc: 'Alan Cox', 'Russell King',
	'Linux Kernel Mailing List'

Stuart MacDonald wrote:
> From: Paul Fulghum
> I always thought the whole point of low_latency was to make the
> receive-path very fast, which means specifically allowing the flip
> routine to run from the ISR. So checking for calling from the ISR and
> specifically disallowing that is basically negating the entire raison
> d'etre for low_latency.

I thought it was to speed processing if the
caller was already in process context. Maybe the
real intentions are lost to history.

Moving forward, Alan stated that the flip
routine should not be called in interrupt context.
His last post concerning some transient state
of low_latency has confused me.

Currently, with the 8250 driver and N_TTY
line discipline, calling the flip routine from
ISR causes an SMP deadlock. There are two paths that
cause this:
1. low_latency is set
2. flip buffer becomes full

So calling the flip routine from the ISR may work
with some specific drivers, but it would be
dangerous to assume this works in all cases.

-- 
Paul Fulghum
paulkf@microgate.com


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01 14:10           ` Paul Fulghum
@ 2004-11-01 15:12             ` Stuart MacDonald
  2004-11-01 23:02             ` Alan Cox
  1 sibling, 0 replies; 27+ messages in thread
From: Stuart MacDonald @ 2004-11-01 15:12 UTC (permalink / raw)
  To: 'Paul Fulghum', tytso
  Cc: 'Alan Cox', 'Russell King',
	'Linux Kernel Mailing List'

From: Paul Fulghum [mailto:paulkf@microgate.com] 
> Stuart MacDonald wrote:
> > I always thought the whole point of low_latency was to make the
> > receive-path very fast, which means specifically allowing the flip
> > routine to run from the ISR. So checking for calling from the ISR and
> > specifically disallowing that is basically negating the entire raison
> > d'etre for low_latency.
> 
> I thought it was to speed processing if the
> caller was already in process context. Maybe the
> real intentions are lost to history.

Best person to ask may be Ted; he was once the serial maintainer. Ted?

> Moving forward, Alan stated that the flip
> routine should not be called in interrupt context.
> His last post concerning some transient state
> of low_latency has confused me.

I didn't follow that either, but I wasn't reading too closely.

> Currently, with the 8250 driver and N_TTY
> line discipline, calling the flip routine from
> ISR causes an SMP deadlock. There are two paths that
> cause this:
> 1. low_latency is set
> 2. flip buffer becomes full
> 
> So calling the flip routine from the ISR may work
> with some specific drivers, but it would be
> dangerous to assume this works in all cases.

I haven't looked at the 2.6 serial rewrite in depth yet, but the
problem always existed in the 2.4 driver. I got around the problem by
checking for interrupt context and taking the locks or not at a much
earlier stage.

..Stu
www.connecttech.com



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01 14:10           ` Paul Fulghum
  2004-11-01 15:12             ` Stuart MacDonald
@ 2004-11-01 23:02             ` Alan Cox
  2004-11-02  0:18               ` Paul Fulghum
  1 sibling, 1 reply; 27+ messages in thread
From: Alan Cox @ 2004-11-01 23:02 UTC (permalink / raw)
  To: Paul Fulghum
  Cc: Stuart MacDonald, 'Russell King',
	'Linux Kernel Mailing List'

On Llu, 2004-11-01 at 14:10, Paul Fulghum wrote:
> I thought it was to speed processing if the
> caller was already in process context. Maybe the
> real intentions are lost to history.

It was added way back by Ted to improve performance when dealing with
low latency requirements for I/O. 

> Moving forward, Alan stated that the flip
> routine should not be called in interrupt context.
> His last post concerning some transient state
> of low_latency has confused me.

You were correct about that




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01 23:02             ` Alan Cox
@ 2004-11-02  0:18               ` Paul Fulghum
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Fulghum @ 2004-11-02  0:18 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stuart MacDonald, 'Russell King',
	'Linux Kernel Mailing List'

On Mon, 2004-11-01 at 17:02, Alan Cox wrote:
> On Llu, 2004-11-01 at 14:10, Paul Fulghum wrote:
> > His last post concerning some transient state
> > of low_latency has confused me.
> 
> You were correct about that

What? That I'm easily confused?
*snicker*

-- 
Paul Fulghum
paulkf@microgate.com




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2005-01-07  1:54     ` Alan Cox
@ 2005-01-07 14:04       ` Paul Fulghum
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Fulghum @ 2005-01-07 14:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tim_T_Murphy, rmk+lkml, Linux Kernel Mailing List

Alan Cox wrote:
> On Gwe, 2005-01-07 at 00:43, Paul Fulghum wrote:
> 
>>IIRC that guarantees a deadlock on SMP due to the
>>generic serial layer trying to grab a spinlock
>>that is already held. (Which prompted the original
>>bug report by Tim several months ago)
> 
> 
> I fixed the tty locking issues with that. If there are any left they
> should be solely in the serial generic code and I've no idea there

Yes, that is where the locking problems were.
When I last looked at it the problem call path was:

serial8250_interrupt();
    spin_lock(port->lock);
    serial8250_handle_port();
       receive_chars();
          flip.work.func(); /* if FLIP buffer full or low_latency set */
              ldisc->receive_buf(); /* N_TTY */
                  tty->driver->flush_chars();
                     uart_start();
                        spin_lock(port->lock); *BANG*

--
Paul Fulghum
Microgate Systems, Ltd



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2005-01-07  0:43   ` Paul Fulghum
@ 2005-01-07  1:54     ` Alan Cox
  2005-01-07 14:04       ` Paul Fulghum
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2005-01-07  1:54 UTC (permalink / raw)
  To: Paul Fulghum; +Cc: Tim_T_Murphy, rmk+lkml, Linux Kernel Mailing List

On Gwe, 2005-01-07 at 00:43, Paul Fulghum wrote:
> IIRC that guarantees a deadlock on SMP due to the
> generic serial layer trying to grab a spinlock
> that is already held. (Which prompted the original
> bug report by Tim several months ago)

I fixed the tty locking issues with that. If there are any left they
should be solely in the serial generic code and I've no idea there



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2005-01-06 23:11 ` Alan Cox
@ 2005-01-07  0:43   ` Paul Fulghum
  2005-01-07  1:54     ` Alan Cox
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Fulghum @ 2005-01-07  0:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: Tim_T_Murphy, rmk+lkml, Linux Kernel Mailing List

Alan Cox wrote:
> On Iau, 2005-01-06 at 22:47, Tim_T_Murphy@Dell.com wrote:
> 
>>>anything i can do to avoid dropping characters without using 
>>>low_latency, which still hangs SMP kernels?
>>
>>this patch fixes the problem for me, but it's probably an awful hack -- a
>>brief interrupt storm occurs until tty processes its buffer, but IMHO
>>that's better than dropping characters.
> 
> Presumably this is a device with a fake 8250 that produces sudden large
> bursts of data ? If so then for now you -need- to set low_latency and
> should probably do it by the PCI vendor subid/device id. The problem is
> that the serial layer expects serial data arriving at serial speeds. It
> completely breaks down when it hits an emulation of a generic uart that
> suddenly receives 32Kbytes of data at ethernet speed.
> 
> The longer term fix for this is when the flip buffers go away, and the
> same problem gets cleaned up for things like mainframes and some of the
> high performance DMA devices. Until then just set low_latency and
> comment it as "not your fault" 8)

IIRC that guarantees a deadlock on SMP due to the
generic serial layer trying to grab a spinlock
that is already held. (Which prompted the original
bug report by Tim several months ago)

Perhaps the FIFO trigger threshold for this
specific device can be altered
to try and smooth the amount of data dumped
per IRQ.
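
For reference, the receive trigger level on a 16550A-style UART is selected through the top two bits of the FIFO control register. The sketch below maps a requested trigger level to the FCR byte using the standard 16550A bit layout; `fcr_for_rx_trigger` is a hypothetical helper, and whether the DRAC's emulated UART actually honours the trigger level is an open question that this does not answer.

```c
/* Standard 16550A FIFO control register bits. */
#define FCR_ENABLE_FIFO 0x01
#define FCR_TRIGGER_1   0x00
#define FCR_TRIGGER_4   0x40
#define FCR_TRIGGER_8   0x80
#define FCR_TRIGGER_14  0xC0

/* Return the FCR byte that enables the FIFO with the requested receive
 * trigger level, or -1 if a 16550A has no such level. */
int fcr_for_rx_trigger(int bytes)
{
    switch (bytes) {
    case 1:
        return FCR_ENABLE_FIFO | FCR_TRIGGER_1;
    case 4:
        return FCR_ENABLE_FIFO | FCR_TRIGGER_4;
    case 8:
        return FCR_ENABLE_FIFO | FCR_TRIGGER_8;
    case 14:
        return FCR_ENABLE_FIFO | FCR_TRIGGER_14;
    default:
        return -1;
    }
}
```

A lower trigger level means more interrupts carrying fewer bytes each, which is the smoothing effect suggested above.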

--
Paul Fulghum
paulkf@microgate.com


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2005-01-06 23:50 Tim_T_Murphy
  0 siblings, 0 replies; 27+ messages in thread
From: Tim_T_Murphy @ 2005-01-06 23:50 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel


> this patch fixes the problem for me, but it's probably an awful hack --
> a brief interrupt storm occurs until tty processes its buffer,
> but IMHO that's better than dropping characters.

sorry, i see now that it's not an interrupt storm but rather the
interrupt handler doesn't end until it quits due to 'too much work'.

tim


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2005-01-06 22:47 Tim_T_Murphy
@ 2005-01-06 23:11 ` Alan Cox
  2005-01-07  0:43   ` Paul Fulghum
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2005-01-06 23:11 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: rmk+lkml, Linux Kernel Mailing List

On Iau, 2005-01-06 at 22:47, Tim_T_Murphy@Dell.com wrote:
> > anything i can do to avoid dropping characters without using 
> > low_latency, which still hangs SMP kernels?
> 
> this patch fixes the problem for me, but it's probably an awful hack -- a
> brief interrupt storm occurs until tty processes its buffer, but IMHO
> that's better than dropping characters.

On a PCI device you may never get to process the buffer if you do that.
2.6.10 throws away the other bytes carefully and clears the IRQ.
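
As a rough user-space picture of "throw the bytes away and clear the IRQ": the sketch below drains a simulated receive FIFO, counting every discarded byte as an overrun, until the data-ready bit drops or a work limit is hit. It is loosely modelled on the drain loop posted earlier in this thread, not on the actual 2.6.10 source; `fake_uart`, `fake_read_lsr`, `fake_read_rx` and `drain_rx` are all invented for the illustration.

```c
#include <stddef.h>

#define LSR_DR 0x01  /* data ready bit of the 16550 line status register */

/* Simulated UART: a byte array standing in for the hardware RX FIFO. */
struct fake_uart {
    const unsigned char *fifo;
    size_t len;
    size_t pos;
    unsigned long overrun;  /* bytes discarded because nobody could take them */
};

unsigned char fake_read_lsr(const struct fake_uart *u)
{
    return u->pos < u->len ? LSR_DR : 0;
}

unsigned char fake_read_rx(struct fake_uart *u)
{
    return u->pos < u->len ? u->fifo[u->pos++] : 0;
}

/* Discard up to max_count pending bytes. Returns the number discarded. */
size_t drain_rx(struct fake_uart *u, int max_count)
{
    size_t dropped = 0;

    while ((fake_read_lsr(u) & LSR_DR) && max_count-- > 0) {
        (void)fake_read_rx(u);  /* reading RX is what empties the FIFO */
        u->overrun++;
        dropped++;
    }
    return dropped;
}
```

Once the loop finishes with the data-ready bit clear, the device has no reason to keep its interrupt asserted; if the work limit is reached first, data is still pending and the interrupt stays raised.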

Presumably this is a device with a fake 8250 that produces sudden large
bursts of data ? If so then for now you -need- to set low_latency and
should probably do it by the PCI vendor subid/device id. The problem is
that the serial layer expects serial data arriving at serial speeds. It
completely breaks down when it hits an emulation of a generic uart that
suddenly receives 32Kbytes of data at ethernet speed.

The longer term fix for this is when the flip buffers go away, and the
same problem gets cleaned up for things like mainframes and some of the
high performance DMA devices. Until then just set low_latency and
comment it as "not your fault" 8)

Alan



* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2005-01-06 22:47 Tim_T_Murphy
  2005-01-06 23:11 ` Alan Cox
  0 siblings, 1 reply; 27+ messages in thread
From: Tim_T_Murphy @ 2005-01-06 22:47 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel


> anything i can do to avoid dropping characters without using 
> low_latency, which still hangs SMP kernels?

this patch fixes the problem for me, but it's probably an awful hack -- a
brief interrupt storm occurs until tty processes its buffer, but IMHO
that's better than dropping characters.

is there a better alternative?
thanks,
tim

--- 8250-orig.c	2005-01-06 16:25:24.000000000 -0600
+++ 8250.c	2005-01-06 16:27:21.000000000 -0600
@@ -989,8 +989,10 @@
 		if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
 			if(tty->low_latency)
 				tty_flip_buffer_push(tty);
-			/* If this failed then we will throw away the
-			   bytes but must do so to clear interrupts */
+			else
+				break;
+			/* If this failed then we will just leave now 
+			   rather than dropping bytes (interrupts not cleared) */
 		}
 		ch = serial_inp(up, UART_RX);
 		flag = TTY_NORMAL;


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2005-01-06 14:55 Tim_T_Murphy
  0 siblings, 0 replies; 27+ messages in thread
From: Tim_T_Murphy @ 2005-01-06 14:55 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel

sorry for the huge delay since my last post on this, but disabling
low_latency is resulting in dropped characters.

this looks to be exactly what was reported in
http://www.uwsg.iu.edu/hypermail/linux/kernel/0212.0/0412.html

anything i can do to avoid dropping characters without using
low_latency, which still hangs SMP kernels?
thanks,
tim
> -----Original Message-----
> From: Murphy, Tim T 
> Sent: Monday, November 01, 2004 10:07 AM
> To: 'Russell King'
> Cc: linux-kernel@vger.kernel.org
> Subject: RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
> 
> 
> > Thanks for testing - I'll be adding this to mainline kernels.
> Thanks Russell.
> I'd be glad to help by testing any further low_latency 
> related patches also.
> Tim
> 


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-11-01 16:06 Tim_T_Murphy
  0 siblings, 0 replies; 27+ messages in thread
From: Tim_T_Murphy @ 2004-11-01 16:06 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel

> Thanks for testing - I'll be adding this to mainline kernels.
Thanks Russell.
I'd be glad to help by testing any further low_latency related patches
also.
Tim


* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-11-01 14:28 Tim_T_Murphy
@ 2004-11-01 14:35 ` Russell King
  0 siblings, 0 replies; 27+ messages in thread
From: Russell King @ 2004-11-01 14:35 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Mon, Nov 01, 2004 at 08:28:35AM -0600, Tim_T_Murphy@Dell.com wrote:
> > Ok, could you check whether this patch automatically detects 
> > the serial port please?
> 
> Yes, other than fixing a couple typos: 
> 	uart_offest -> uart_offset
> 	PCI_ID_ANY -> PCI_ANY_ID

Thanks for testing - I'll be adding this to mainline kernels.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-11-01 14:28 Tim_T_Murphy
  2004-11-01 14:35 ` Russell King
  0 siblings, 1 reply; 27+ messages in thread
From: Tim_T_Murphy @ 2004-11-01 14:28 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel

> Ok, could you check whether this patch automatically detects 
> the serial port please?

Yes, other than fixing a couple typos: 
	uart_offest -> uart_offset
	PCI_ID_ANY -> PCI_ANY_ID
I now get ttyS4 in my /proc/tty/driver/serial output, on bootup:

serinfo:1.0 driver revision:
0: uart:16550A port:000003F8 irq:4 tx:22 rx:0 RI
1: uart:16550A port:000002F8 irq:3 tx:22 rx:0 RI
2: uart:unknown port:000003E8 irq:4
3: uart:unknown port:000002E8 irq:3
4: uart:16550A port:0000EC40 irq:201 tx:0 rx:0 CTS|DSR|CD
5: uart:unknown port:00000000 irq:0
6: uart:unknown port:00000000 irq:0
7: uart:unknown port:00000000 irq:0

Also: the removal of "low_latency" does avoid the hang with the SMP
kernel; I am removing this setting from our service startup script.  In
addition, I will be changing the script to only perform the setserial
commands against an unused tty if it cannot first identify a tty that
already describes our virtual uart (ala Russell's 8250_pci fix).
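
The check described here (does some tty already describe the virtual UART?) amounts to scanning /proc/tty/driver/serial for a detected port at the expected I/O address. The real startup script is presumably shell; the C sketch below only shows the matching logic for one line of that file, with `match_serial_line` as a hypothetical helper and the line format taken from the output quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/tty/driver/serial. Returns the tty index if the
 * line describes a detected UART at want_port, otherwise -1. */
int match_serial_line(const char *line, unsigned long want_port)
{
    int idx, irq;
    unsigned long port;
    char uart[32];

    if (sscanf(line, "%d: uart:%31s port:%lx irq:%d",
               &idx, uart, &port, &irq) != 4)
        return -1;  /* header line or unexpected format */
    if (strcmp(uart, "unknown") == 0)
        return -1;  /* slot exists but no UART was detected there */
    return port == want_port ? idx : -1;
}
```

With the output quoted above, looking for port 0xEC40 yields index 4 (ttyS4); if no line matches, the script would fall back to the setserial path.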

Thanks to all who replied, much appreciated!
Tim


* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 23:30 Tim_T_Murphy
@ 2004-10-30 16:02 ` Russell King
  0 siblings, 0 replies; 27+ messages in thread
From: Russell King @ 2004-10-30 16:02 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Fri, Oct 29, 2004 at 06:30:01PM -0500, Tim_T_Murphy@Dell.com wrote:
> 
> > Well, if you forward lspci -vvx and the "maddr" and "irqno" information
> > (in private mail if you prefer) then I'll fix 8250_pci to work.
> 
> maddr:	10		# note, this is for the UP kernel. for SMP, maddr=201
> irqno:	ec40
> lspci -d 1028:0008 -vvx:

Ok, could you check whether this patch automatically detects the serial
port please?

Thanks.

diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x *.orig -x *.rej orig/drivers/serial/8250_pci.c linux/drivers/serial/8250_pci.c
--- orig/drivers/serial/8250_pci.c	Sat Oct 23 11:39:13 2004
+++ linux/drivers/serial/8250_pci.c	Sat Oct 30 16:57:59 2004
@@ -1026,6 +1026,7 @@ enum pci_board_num_t {
 
 	pbn_b1_bt_2_921600,
 
+	pbn_b1_1_1382400,
 	pbn_b1_2_1382400,
 	pbn_b1_4_1382400,
 	pbn_b1_8_1382400,
@@ -1253,6 +1254,12 @@ static struct pci_board pci_boards[] __d
 		.uart_offset	= 8,
 	},
 
+	[pbn_b1_1_1382400] = {
+		.flags		= FL_BASE1,
+		.num_ports	= 1,
+		.base_baud	= 1382400,
+		.uart_offest	= 8,
+	},
 	[pbn_b1_2_1382400] = {
 		.flags		= FL_BASE1,
 		.num_ports	= 2,
@@ -2109,6 +2116,13 @@ static struct pci_device_id serial_pci_t
 		pbn_b0_bt_1_460800 },
 
 	/*
+	 * Dell Remote Access Card III - Tim_T_Murphy@Dell.com
+	 */
+	{	PCI_VENDOR_ID_DELL, PCI_DEVICE_ID_DELL_RACIII,
+		PCI_ID_ANY, PCI_ID_ANY, 0, 0,
+		pbn_b1_1_1382400 },
+
+	/*
 	 * RAStel 2 port modem, gerg@moreton.com.au
 	 */
 	{	PCI_VENDOR_ID_MORETON, PCI_DEVICE_ID_RASTEL_2PORT,
diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x *.orig -x *.rej orig/include/linux/pci_ids.h linux/include/linux/pci_ids.h
--- orig/include/linux/pci_ids.h	Sat Oct 23 11:40:03 2004
+++ linux/include/linux/pci_ids.h	Sat Oct 30 16:52:46 2004
@@ -522,6 +522,7 @@
 #define PCI_DEVICE_ID_AI_M1435		0x1435
 
 #define PCI_VENDOR_ID_DELL              0x1028
+#define PCI_DEVICE_ID_DELL_RACIII	0x0008
 
 #define PCI_VENDOR_ID_MATROX		0x102B
 #define PCI_DEVICE_ID_MATROX_MGA_2	0x0518
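
As a rough illustration of what base_baud = 1382400 in the new
pbn_b1_1_1382400 entry implies: a 16550-style UART gets its line rate by
dividing the base baud by an integer divisor. The sketch below only does
that division for a few example rates. The helper name divisor_for and the
rates chosen are assumptions for illustration, and the rounding the real
driver applies is ignored.

```shell
#!/bin/sh
# base_baud as given in the pci_boards[] entry added by the patch.
base_baud=1382400

# divisor_for RATE: print the integer divisor that yields RATE exactly,
# or fail if base_baud is not a whole multiple of RATE.
divisor_for() {
	rate=$1
	if [ $((base_baud % rate)) -ne 0 ]; then
		echo "no exact divisor for $rate" >&2
		return 1
	fi
	echo $((base_baud / rate))
}

# Example rates only; they are not taken from the thread.
for r in 9600 115200 1382400; do
	echo "$r baud -> divisor $(divisor_for "$r")"
done
```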

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-10-29 23:33 Tim_T_Murphy
  0 siblings, 0 replies; 27+ messages in thread
From: Tim_T_Murphy @ 2004-10-29 23:33 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel

> maddr:	10		# note, this is for the UP kernel. for SMP, maddr=201
> irqno:	ec40

duh, i got maddr and irqno backwards in my last post, sorry.
Tim


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-10-29 23:30 Tim_T_Murphy
  2004-10-30 16:02 ` Russell King
  0 siblings, 1 reply; 27+ messages in thread
From: Tim_T_Murphy @ 2004-10-29 23:30 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel


> Well, if you forward lspci -vvx and the "maddr" and "irqno" information
> (in private mail if you prefer) then I'll fix 8250_pci to work.

maddr:	10		# note, this is for the UP kernel. for SMP, maddr=201
irqno:	ec40
lspci -d 1028:0008 -vvx:

00:08.1 Class ff00: Dell Remote Access Card III
	Subsystem: Dell Remote Access Card III
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin B routed to IRQ 10
	Region 0: Memory at fe202000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at ec40 [size=64]
	Region 2: Memory at feb00000 (32-bit, prefetchable) [size=512K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 28 10 08 00 03 01 90 02 00 00 00 ff 10 20 80 00
10: 00 20 20 fe 41 ec 00 00 08 00 b0 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 08 00
30: 00 00 00 00 48 00 00 00 00 00 00 00 0a 02 00 00
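
The port and IRQ can be cross-checked against the raw config-space bytes in
the dump above. The following is a minimal sketch, assuming the standard PCI
type 0 header layout (BAR1 at offset 0x14, interrupt line at offset 0x3c);
the helper names byte_at and io_bar_base are made up for illustration.

```shell
#!/bin/sh
# Rows 0x10 and 0x30 copied from the lspci -x output above.
row10="00 20 20 fe 41 ec 00 00 08 00 b0 fe 00 00 00 00"
row30="00 00 00 00 48 00 00 00 00 00 00 00 0a 02 00 00"

# byte_at "ROW" N: print byte N (0-based) of a 16-byte dump row.
byte_at() {
	row=$1
	n=$2
	# Word splitting is intended here: one positional parameter per byte.
	set -- $row
	shift "$n"
	printf '%s\n' "$1"
}

# io_bar_base "ROW" N: decode the little-endian 32-bit I/O BAR that starts
# at byte N of the row and print its port base in hex.
io_bar_base() {
	b0=$(byte_at "$1" "$2")
	b1=$(byte_at "$1" $(($2 + 1)))
	b2=$(byte_at "$1" $(($2 + 2)))
	b3=$(byte_at "$1" $(($2 + 3)))
	raw=$((0x$b3$b2$b1$b0))
	# Bit 0 set marks an I/O BAR; bits 1:0 are flags, not address bits.
	printf '%x\n' $((raw & ~3))
}

port=0x$(io_bar_base "$row10" 4)	# BAR1: offset 0x14 is byte 4 of row 0x10
irq=$((0x$(byte_at "$row30" 12)))	# interrupt line: offset 0x3c is byte 12 of row 0x30
echo "port $port irq $irq"
```

This agrees with the decoded lines in the same dump ("Region 1: I/O ports at
ec40" and "routed to IRQ 10").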

> I think dropping low_latency will work around the problem for the time
> being.

Thanks a lot for the help and advice, I will try this and report
results.

Tim


* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 21:04 Tim_T_Murphy
@ 2004-10-29 21:14 ` Russell King
  0 siblings, 0 replies; 27+ messages in thread
From: Russell King @ 2004-10-29 21:14 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Fri, Oct 29, 2004 at 04:04:40PM -0500, Tim_T_Murphy@Dell.com wrote:
> > Shouldn't 8250_pci setup the ports already for you?  If not, what
> > needs to be done to achieve this.  Using setserial to setup ports
> > for PCI cards isn't the preferred way of doing this.
> 
> good question, i will have to understand more to answer it though.
> our product has used this method for almost 2 years now.

Well, if you forward lspci -vvx and the "maddr" and "irqno" information
(in private mail if you prefer) then I'll fix 8250_pci to work.

> > At a guess, you've enabled "low latency" setting on this port ?
> 
> yes.  here's a snippet from the script:
> 
> 	echo -n "Starting ${racsvc}: "
> 	# set serial characteristics for RAC device
> 	setserial /dev/${ttyid} \
> 		port 0x${maddr} irq ${irqno} ^skip_test autoconfig
> 	setserial /dev/${ttyid} \
> 		uart 16550A low_latency baud_base 1382400	\
> 		close_delay 0 closing_wait infinite
> 	# now start pppd
> 	/sbin/modprobe -q ppp >/dev/null 2>&1
> 	/sbin/modprobe -q ppp_async >/dev/null 2>&1
> 	daemon pppd call ${service}
> 	RETVAL=$?

I think dropping low_latency will work around the problem for the time
being.
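
For illustration, the setserial part of the quoted script with low_latency
dropped might look roughly like the sketch below. It is a dry run by default
and only prints the commands. The tty name is hypothetical, the port and irq
values are the ones reported elsewhere in this thread for the UP kernel, and
the run and rac_setserial helpers are assumptions, not part of the original
script.

```shell
#!/bin/sh
# Change to 0 to really invoke setserial; 1 only prints the commands.
DRYRUN=1

ttyid=ttyS2	# hypothetical tty name, not taken from the thread
maddr=ec40	# I/O port base reported for the RAC (Region 1)
irqno=10	# IRQ reported for the UP kernel

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
	if [ "$DRYRUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Same two setserial calls as the quoted script, minus low_latency.
rac_setserial() {
	run setserial "/dev/${ttyid}" \
		port "0x${maddr}" irq "${irqno}" ^skip_test autoconfig
	run setserial "/dev/${ttyid}" \
		uart 16550A baud_base 1382400 \
		close_delay 0 closing_wait infinite
}

rac_setserial
```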

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-10-29 21:04 Tim_T_Murphy
  2004-10-29 21:14 ` Russell King
  0 siblings, 1 reply; 27+ messages in thread
From: Tim_T_Murphy @ 2004-10-29 21:04 UTC (permalink / raw)
  To: rmk+lkml; +Cc: linux-kernel


> Shouldn't 8250_pci setup the ports already for you?  If not, what needs
> to be done to achieve this.  Using setserial to setup ports for PCI cards
> isn't the preferred way of doing this.

good question, i will have to understand more to answer it though.
our product has used this method for almost 2 years now.

> At a guess, you've enabled "low latency" setting on this port ?

yes.  here's a snippet from the script:

	echo -n "Starting ${racsvc}: "
	# set serial characteristics for RAC device
	setserial /dev/${ttyid} \
		port 0x${maddr} irq ${irqno} ^skip_test autoconfig
	setserial /dev/${ttyid} \
		uart 16550A low_latency baud_base 1382400	\
		close_delay 0 closing_wait infinite
	# now start pppd
	/sbin/modprobe -q ppp >/dev/null 2>&1
	/sbin/modprobe -q ppp_async >/dev/null 2>&1
	daemon pppd call ${service}
	RETVAL=$?

Thanks
Tim


Thread overview: 27+ messages
2004-10-29 19:55 [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Tim_T_Murphy
2004-10-29 20:20 ` Russell King
2004-10-29 22:18   ` Paul Fulghum
2004-10-29 23:40   ` Paul Fulghum
2004-10-30 22:43     ` Alan Cox
2004-10-31  0:26       ` Paul Fulghum
2004-11-01  7:14         ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel Stuart MacDonald
2004-11-01 14:10           ` Paul Fulghum
2004-11-01 15:12             ` Stuart MacDonald
2004-11-01 23:02             ` Alan Cox
2004-11-02  0:18               ` Paul Fulghum
2004-10-29 21:08 ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Paul Fulghum
2004-10-29 21:04 Tim_T_Murphy
2004-10-29 21:14 ` Russell King
2004-10-29 23:30 Tim_T_Murphy
2004-10-30 16:02 ` Russell King
2004-10-29 23:33 Tim_T_Murphy
2004-11-01 14:28 Tim_T_Murphy
2004-11-01 14:35 ` Russell King
2004-11-01 16:06 Tim_T_Murphy
2005-01-06 14:55 Tim_T_Murphy
2005-01-06 22:47 Tim_T_Murphy
2005-01-06 23:11 ` Alan Cox
2005-01-07  0:43   ` Paul Fulghum
2005-01-07  1:54     ` Alan Cox
2005-01-07 14:04       ` Paul Fulghum
2005-01-06 23:50 Tim_T_Murphy
