linux-kernel.vger.kernel.org archive mirror
* [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
@ 2004-10-29 19:55 Tim_T_Murphy
  2004-10-29 20:20 ` Russell King
  2004-10-29 21:08 ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Paul Fulghum
  0 siblings, 2 replies; 13+ messages in thread
From: Tim_T_Murphy @ 2004-10-29 19:55 UTC (permalink / raw)
  To: linux-kernel

I am new to the list; I hope this is OK.
I've read on the list about several problems others are having with the new 2.6 serial driver and tried their solutions to see whether they solved my issue as well, but none of the ones I have tried so far has helped.

We're migrating our applications for the Dell Remote Access Controller (DRAC) from a 2.4 kernel to a 2.6 kernel. Communication between the apps and the DRAC happens over a ppp link, which is established by a service startup script; the script uses setserial to prepare an unused tty (based on the assigned hardware information, obtained via lspci) and then calls pppd to establish the link.

Everything works fine with the UP kernel, although there is a message in syslog regarding a spinlock (issued at approximately the same point in time at which the SMP kernel hangs):
---
Oct 29 13:34:47 racjag-1 kernel: CSLIP: code copyright 1989 Regents of the University of California
Oct 29 13:34:47 racjag-1 kernel: PPP generic driver version 2.4.2
Oct 29 13:34:47 racjag-1 udev[3875]: creating device node '/dev/ppp'
Oct 29 13:34:47 racjag-1 pppd[3884]: pppd 2.4.2 started by root, uid 0
Oct 29 13:34:47 racjag-1 racser: pppd startup succeeded
Oct 29 13:34:48 racjag-1 chat[3886]: send (CLIENT^M)
Oct 29 13:34:48 racjag-1 chat[3886]: expect (CLIENTSERVER)
Oct 29 13:34:48 racjag-1 kernel: drivers/serial/serial_core.c:102: spin_lock(drivers/serial/serial_core.c:023f2548) already locked by drivers/serial/8250.c/1015
Oct 29 13:34:48 racjag-1 kernel: drivers/serial/8250.c:1017: spin_unlock(drivers/serial/serial_core.c:023f2548) not locked
Oct 29 13:34:48 racjag-1 chat[3886]: CLIENTSERVER
Oct 29 13:34:48 racjag-1 chat[3886]:  -- got it 
Oct 29 13:34:48 racjag-1 chat[3886]: send ()
Oct 29 13:34:48 racjag-1 pppd[3884]: Serial connection established.
Oct 29 13:34:48 racjag-1 pppd[3884]: Using interface ppp0
Oct 29 13:34:48 racjag-1 pppd[3884]: Connect: ppp0 <--> /dev/ttyS2
Oct 29 13:34:49 racjag-1 pppd[3884]: local  IP address 192.168.234.235
Oct 29 13:34:49 racjag-1 pppd[3884]: remote IP address 192.168.234.236
---

With the SMP kernel, the system hangs very soon after pppd starts.
I enabled DEBUG in the serial driver and captured the syslog when the problem happens, but it is not detailed enough for me to pinpoint the exact problem:
---
Oct 28 14:04:52 racjag-1 kernel: CSLIP: code copyright 1989 Regents of the University of California
Oct 28 14:04:52 racjag-1 kernel: PPP generic driver version 2.4.2
Oct 28 14:04:52 racjag-1 udev[3621]: creating device node '/dev/ppp'
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 kernel: Trying to free nonexistent resource <00000000-00000007>
Oct 28 14:05:19 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_flush_buffer(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:19 racjag-1 kernel: uart_flush_buffer(2) called
Oct 28 14:05:19 racjag-1 pppd[3681]: pppd 2.4.1 started by root, uid 0
Oct 28 14:05:19 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:19 racjag-1 racser: pppd startup succeeded
Oct 28 14:05:20 racjag-1 kernel: uart_open(2) called
Oct 28 14:05:20 racjag-1 kernel: uart_close(2) called
Oct 28 14:05:20 racjag-1 chat[3683]: send (CLIENT^M)
---
The system hangs right there; I have to press and hold the power button to get the system to shut down.

Any suggestions to narrow down the cause?  Please cc my email as I do not subscribe to this list.
Thanks,
Tim



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 19:55 [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Tim_T_Murphy
@ 2004-10-29 20:20 ` Russell King
  2004-10-29 22:18   ` Paul Fulghum
  2004-10-29 23:40   ` Paul Fulghum
  2004-10-29 21:08 ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Paul Fulghum
  1 sibling, 2 replies; 13+ messages in thread
From: Russell King @ 2004-10-29 20:20 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Fri, Oct 29, 2004 at 02:55:10PM -0500, Tim_T_Murphy@Dell.com wrote:
> I've read about several problems others are having with the new 2.6
> serial driver in the list, and tried to see if their solutions solved
> my issue also, but unfortunately none that I have tried yet have helped.

Well, this is the first I know of this kind of problem...

> We're migrating our applications for the Dell Remote Access Controller
> (DRAC) to run on a 2.6 kernel from a 2.4 kernel. Communication between
> the apps and the DRAC happens over a ppp link which is established via
> a service startup script; the script uses setserial to prepare an unused
> tty (based on the assigned hardware information, obtained via lspci),
> and the script then calls pppd to finish/establish the link.

Shouldn't 8250_pci set up the ports for you already?  If not, what needs
to be done to achieve this?  Using setserial to set up ports for PCI cards
isn't the preferred way of doing this.

At a guess, you've enabled the "low latency" setting on this port?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 19:55 [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel Tim_T_Murphy
  2004-10-29 20:20 ` Russell King
@ 2004-10-29 21:08 ` Paul Fulghum
  1 sibling, 0 replies; 13+ messages in thread
From: Paul Fulghum @ 2004-10-29 21:08 UTC (permalink / raw)
  To: Tim_T_Murphy; +Cc: linux-kernel

On Fri, 2004-10-29 at 14:55, Tim_T_Murphy@Dell.com wrote:
> Oct 29 13:34:48 racjag-1 chat[3886]: expect (CLIENTSERVER)
> Oct 29 13:34:48 racjag-1 kernel: drivers/serial/serial_core.c:102: spin_lock(drivers/serial/serial_core.c:023f2548) already locked by drivers/serial/8250.c/1015
> Oct 29 13:34:48 racjag-1 kernel: drivers/serial/8250.c:1017: spin_unlock(drivers/serial/serial_core.c:023f2548) not locked
> Oct 29 13:34:48 racjag-1 chat[3886]: CLIENTSERVER

One way this can happen is a receive interrupt:

serial8250_interrupt();
    spin_lock(port->lock);
    serial8250_handle_port();
       receive_chars();
          flip.work.func(); /* if FLIP buffer full */
             ldisc->receive_buf(); /* N_TTY */
                 tty->driver->flush_chars();
                     uart_start();
                        spin_lock(port->lock); *BANG*
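
The call chain above can be modelled in userspace. The following is a minimal sketch, not kernel code: `struct dbg_lock`, `fake_interrupt()` and `fake_uart_start()` are hypothetical stand-ins for port->lock, serial8250_interrupt() and uart_start(). Instead of spinning, the lock records its holder and reports a failed second acquisition, which is roughly what the "already locked by" message in the UP log suggests the debug spinlock does. On SMP, the same second acquisition would spin forever.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug lock: records its holder instead of spinning. */
struct dbg_lock {
	bool held;
	const char *owner;
};

/* Returns false if the lock is already held (a real spinlock would spin). */
static bool dbg_lock_acquire(struct dbg_lock *l, const char *who)
{
	if (l->held)
		return false;
	l->held = true;
	l->owner = who;
	return true;
}

static void dbg_lock_release(struct dbg_lock *l)
{
	l->held = false;
	l->owner = NULL;
}

/* Stand-in for uart_start(): it wants the port lock for itself. */
static bool fake_uart_start(struct dbg_lock *port_lock)
{
	if (!dbg_lock_acquire(port_lock, "uart_start"))
		return false;	/* lock already held by our own caller */
	dbg_lock_release(port_lock);
	return true;
}

/*
 * Stand-in for the ISR: it holds the port lock while the receive
 * path calls all the way down into fake_uart_start().
 */
static bool fake_interrupt(struct dbg_lock *port_lock)
{
	bool ok;

	if (!dbg_lock_acquire(port_lock, "interrupt"))
		return false;
	ok = fake_uart_start(port_lock);	/* nested acquisition fails */
	dbg_lock_release(port_lock);
	return ok;
}
```

Called on its own, fake_uart_start() succeeds; called through fake_interrupt(), it fails because the caller still holds the lock.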

Try the attached patch and report what happens.

-- 
Paul Fulghum
paulkf@microgate.com

--- linux-2.6.8/drivers/serial/8250.c	2004-08-14 00:36:13.000000000 -0500
+++ b/drivers/serial/8250.c	2004-10-29 15:58:28.076014336 -0500
@@ -830,9 +830,13 @@ receive_chars(struct uart_8250_port *up,
 
 	do {
 		if (unlikely(tty->flip.count >= TTY_FLIPBUF_SIZE)) {
-			tty->flip.work.func((void *)tty);
-			if (tty->flip.count >= TTY_FLIPBUF_SIZE)
-				return; // if TTY_DONT_FLIP is set
+			/* no room in flip buffer, discard rx FIFO contents to clear IRQ */
+			do {
+				serial_inp(up, UART_RX);
+				up->port.icount.overrun++;
+				*status = serial_inp(up, UART_LSR);
+			} while ((*status & UART_LSR_DR) && (max_count-- > 0));
+			return;	/* if TTY_DONT_FLIP is set */
 		}
 		ch = serial_inp(up, UART_RX);
 		*tty->flip.char_buf_ptr = ch;
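
The bound on the discard loop in the patch can be checked with a small userspace model. This is only a sketch under stated assumptions: `struct fake_uart`, `fake_data_ready()` and `drain_rx_fifo()` are hypothetical names, the FIFO is reduced to a counter of pending bytes, and no real hardware registers are involved. It mirrors the do/while shape of the patch to show how many bytes are discarded for a given max_count budget.

```c
#include <assert.h>

/* Hypothetical model of the UART receive side. */
struct fake_uart {
	int pending;		/* bytes waiting in the rx FIFO */
	unsigned long overrun;	/* counterpart of icount.overrun */
};

/* Stand-in for testing UART_LSR_DR in the line status register. */
static int fake_data_ready(const struct fake_uart *u)
{
	return u->pending > 0;
}

/*
 * Same loop shape as the patch: read and discard one byte, count it
 * as an overrun, then continue while data is ready and budget remains.
 * Returns the number of bytes discarded.
 */
static int drain_rx_fifo(struct fake_uart *u, int max_count)
{
	int reads = 0;

	/* The receive path is only entered with data ready. */
	assert(u->pending > 0);

	do {
		u->pending--;	/* stands in for serial_inp(up, UART_RX) */
		u->overrun++;
		reads++;
	} while (fake_data_ready(u) && (max_count-- > 0));

	return reads;
}
```

With a large budget the loop empties the FIFO; with a small budget it stops after max_count + 1 reads and leaves the remaining bytes for the next interrupt.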




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 20:20 ` Russell King
@ 2004-10-29 22:18   ` Paul Fulghum
  2004-10-29 23:40   ` Paul Fulghum
  1 sibling, 0 replies; 13+ messages in thread
From: Paul Fulghum @ 2004-10-29 22:18 UTC (permalink / raw)
  To: Russell King; +Cc: Tim_T_Murphy, Linux Kernel list

On Fri, 2004-10-29 at 15:20, Russell King wrote:
> At a guess, you've enabled "low latency" setting on this port ?

Ah, that would explain the problem better than
the code path I saw (flip buffer full).
The problem is still the same: calling the flip
work routine from the ISR, which calls through
N_TTY receive_buf->flush_chars->start_tx.

-- 
Paul Fulghum
paulkf@microgate.com




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 20:20 ` Russell King
  2004-10-29 22:18   ` Paul Fulghum
@ 2004-10-29 23:40   ` Paul Fulghum
  2004-10-30 22:43     ` Alan Cox
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Fulghum @ 2004-10-29 23:40 UTC (permalink / raw)
  To: Russell King; +Cc: Tim_T_Murphy, Linux Kernel list

On Fri, 2004-10-29 at 15:20, Russell King wrote:
> At a guess, you've enabled "low latency" setting on this port ?

Would it make sense to do something like (in tty_io.c) the following?

void tty_flip_buffer_push(struct tty_struct *tty)
{
	if (tty->low_latency) {
		if (in_interrupt()) {
			printk(KERN_ERR "tty_flip_buffer_push called with low latency from interrupt!\n");
			dump_stack();
			schedule_delayed_work(&tty->flip.work, 1);
		}
		else
			flush_to_ldisc((void *) tty);
	}
	else
		schedule_delayed_work(&tty->flip.work, 1);
}
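
The branching in the proposed function reduces to a small decision table, sketched below in plain C. The enum and `choose_push_action()` are hypothetical names used only for illustration; the real function would call flush_to_ldisc() or schedule_delayed_work() rather than return a value.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes of the proposal. */
enum push_action {
	PUSH_FLUSH_NOW,		/* call flush_to_ldisc() directly */
	PUSH_DEFER,		/* queue the flip work as usual */
	PUSH_DEFER_AND_WARN	/* queue the work and log the misuse */
};

/*
 * Decide how to push the flip buffer, given the low_latency flag
 * and whether the caller is in interrupt context.
 */
static enum push_action choose_push_action(bool low_latency, bool in_irq)
{
	if (!low_latency)
		return PUSH_DEFER;

	/* low_latency from interrupt context is the case to catch */
	return in_irq ? PUSH_DEFER_AND_WARN : PUSH_FLUSH_NOW;
}
```

Only the combination of low_latency set and process context flushes directly; every other combination ends up on the work queue.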

-- 
Paul Fulghum
paulkf@microgate.com




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-29 23:40   ` Paul Fulghum
@ 2004-10-30 22:43     ` Alan Cox
  2004-10-31  0:26       ` Paul Fulghum
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Cox @ 2004-10-30 22:43 UTC (permalink / raw)
  To: Paul Fulghum; +Cc: Russell King, Tim_T_Murphy, Linux Kernel Mailing List

On Sad, 2004-10-30 at 00:40, Paul Fulghum wrote:
> On Fri, 2004-10-29 at 15:20, Russell King wrote:
> > At a guess, you've enabled "low latency" setting on this port ?
> 
> Would it make sense to do something like (in tty_io.c) the following?

Not really because it can legally occur if you flip the low latency
flag while a transaction is queued. It might work if you waited for
scheduled work to complete in the flag changing.



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UP kernel
  2004-10-30 22:43     ` Alan Cox
@ 2004-10-31  0:26       ` Paul Fulghum
  2004-11-01  7:14         ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel Stuart MacDonald
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Fulghum @ 2004-10-31  0:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Russell King, Tim_T_Murphy, Linux Kernel Mailing List

On Sat, 2004-10-30 at 17:43, Alan Cox wrote:
> On Sad, 2004-10-30 at 00:40, Paul Fulghum wrote:
> > Would it make sense to do something like (in tty_io.c) the following?
> 
> Not really because it can legally occur if you flip the low latency
> flag while a transaction is queued. It might work if you waited for
> scheduled work to complete in the flag changing.

I don't see how having flush_to_ldisc() queued
or already running (on another processor) negates
the prohibition on calling tty_flip_buffer_push()
with low_latency set in interrupt context.

The comments for tty_flip_buffer_push() state the
function should not be called in interrupt context
if low_latency is set (no exceptions are listed).
Meaning flush_to_ldisc() should only be called
in process context.

If flush_to_ldisc() is queued or already executing,
there is no protection against calling
flush_to_ldisc() again, directly in interrupt context.
TTY_DONT_FLIP is no protection; it is only set
in read_chan() of n_tty.c.

If I'm missing something, please point it out.

-- 
Paul Fulghum
paulkf@microgate.com




* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-10-31  0:26       ` Paul Fulghum
@ 2004-11-01  7:14         ` Stuart MacDonald
  2004-11-01 14:10           ` Paul Fulghum
  0 siblings, 1 reply; 13+ messages in thread
From: Stuart MacDonald @ 2004-11-01  7:14 UTC (permalink / raw)
  To: 'Paul Fulghum', 'Alan Cox'
  Cc: 'Russell King',
	Tim_T_Murphy, 'Linux Kernel Mailing List'

From: Paul Fulghum
> I don't see how having flush_to_ldisc() queued
> or already running (on another processor) negates
> the prohibition on calling tty_flip_buffer_push()
> with low_latency set in interrupt context.

I always thought the whole point of low_latency was to make the
receive-path very fast, which means specifically allowing the flip
routine to run from the ISR. So checking for calling from the ISR and
specifically disallowing that is basically negating the entire raison
d'etre for low_latency.

Having said that, the interrupt context "taint" that is allowed by the
low_latency flag has been a thorn in our side for some time. It would
be nice if that path was cleaned up to run properly from interrupt or
process context.

..Stu
www.connecttech.com



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01  7:14         ` [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel Stuart MacDonald
@ 2004-11-01 14:10           ` Paul Fulghum
  2004-11-01 15:12             ` Stuart MacDonald
  2004-11-01 23:02             ` Alan Cox
  0 siblings, 2 replies; 13+ messages in thread
From: Paul Fulghum @ 2004-11-01 14:10 UTC (permalink / raw)
  To: Stuart MacDonald
  Cc: 'Alan Cox', 'Russell King',
	'Linux Kernel Mailing List'

Stuart MacDonald wrote:
> From: Paul Fulghum
> I always thought the whole point of low_latency was to make the
> receive-path very fast, which means specifically allowing the flip
> routine to run from the ISR. So checking for calling from the ISR and
> specifically disallowing that is basically negating the entire raison
> d'etre for low_latency.

I thought it was to speed processing if the
caller was already in process context. Maybe the
real intentions are lost to history.

Moving forward, Alan stated that the flip
routine should not be called in interrupt context.
His last post concerning some transient state
of low_latency has confused me.

Currently, with the 8250 driver and the N_TTY
line discipline, calling the flip routine from
the ISR causes an SMP deadlock. There are two paths that
cause this:
1. low_latency is set
2. flip buffer becomes full

So calling the flip routine from the ISR may work
with some specific drivers, but it would be
dangerous to assume this works in all cases.

-- 
Paul Fulghum
paulkf@microgate.com


* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01 14:10           ` Paul Fulghum
@ 2004-11-01 15:12             ` Stuart MacDonald
  2004-11-01 23:02             ` Alan Cox
  1 sibling, 0 replies; 13+ messages in thread
From: Stuart MacDonald @ 2004-11-01 15:12 UTC (permalink / raw)
  To: 'Paul Fulghum', tytso
  Cc: 'Alan Cox', 'Russell King',
	'Linux Kernel Mailing List'

From: Paul Fulghum [mailto:paulkf@microgate.com] 
> Stuart MacDonald wrote:
> > I always thought the whole point of low_latency was to make the
> > receive-path very fast, which means specifically allowing the flip
> > routine to run from the ISR. So checking for calling from the ISR and
> > specifically disallowing that is basically negating the entire raison
> > d'etre for low_latency.
> 
> I thought it was to speed processing if the
> caller was already in process context. Maybe the
> real intentions are lost to history.

Best person to ask may be Ted; he was once the serial maintainer. Ted?

> Moving forward, Alan stated that the flip
> routine should not be called in interrupt context.
> His last post concerning some transient state
> of low_latency has confused me.

I didn't follow that either, but I wasn't reading too closely.

> Currently, with the 8250 driver and N_TTY
> line discipline, calling the flip routine from
> ISR causes an SMP deadlock. There are two paths that
> cause this:
> 1. low_latency is set
> 2. flip buffer becomes full
> 
> So calling the flip routine from the ISR may work
> with some specific drivers, but it would be
> dangerous to assume this works in all cases.

I haven't looked at the 2.6 serial rewrite in depth yet, but the
problem always existed in the 2.4 driver. I got around the problem by
checking for interrupt context and taking the locks or not at a much
earlier stage.

..Stu
www.connecttech.com



* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01 14:10           ` Paul Fulghum
  2004-11-01 15:12             ` Stuart MacDonald
@ 2004-11-01 23:02             ` Alan Cox
  2004-11-02  0:18               ` Paul Fulghum
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Cox @ 2004-11-01 23:02 UTC (permalink / raw)
  To: Paul Fulghum
  Cc: Stuart MacDonald, 'Russell King',
	'Linux Kernel Mailing List'

On Llu, 2004-11-01 at 14:10, Paul Fulghum wrote:
> I thought it was to speed processing if the
> caller was already in process context. Maybe the
> real intentions are lost to history.

It was added way back by Ted to improve performance when dealing with
low latency requirements for I/O. 

> Moving forward, Alan stated that the flip
> routine should not be called in interrupt context.
> His last post concerning some transient state
> of low_latency has confused me.

You were correct about that




* Re: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
  2004-11-01 23:02             ` Alan Cox
@ 2004-11-02  0:18               ` Paul Fulghum
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Fulghum @ 2004-11-02  0:18 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stuart MacDonald, 'Russell King',
	'Linux Kernel Mailing List'

On Mon, 2004-11-01 at 17:02, Alan Cox wrote:
> On Llu, 2004-11-01 at 14:10, Paul Fulghum wrote:
> > His last post concerning some transient state
> > of low_latency has confused me.
> 
> You were correct about that

What? That I'm easily confused?
*snicker*

-- 
Paul Fulghum
paulkf@microgate.com




* RE: [BUG][2.6.8.1] serial driver hangs SMP kernel, but not the UPkernel
@ 2005-01-10 20:36 Tim_T_Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Tim_T_Murphy @ 2005-01-10 20:36 UTC (permalink / raw)
  To: alan; +Cc: rmk+lkml, linux-kernel, paulkf


Thanks for your comments and advice, I am new to
the kernel and appreciate your taking the time.

> Presumably this is a device with a fake 8250 that 
> produces sudden large bursts of data ? If so then 
> for now you -need- to set low_latency and should 
> probably do it by the PCI vendor subid/device id. 
> The problem is that the serial layer expects serial 
> data arriving at serial speeds. It completely breaks 
> down when it hits an emulation of a generic uart that
> suddenly receives 32Kbytes of data at ethernet speed.

Yes. Thanks, this confirms what I suspected.

> The longer term fix for this is when the flip buffers
> go away, and the same problem gets cleaned up for 
> things like mainframes and some high performance DMA 
> devices. 

Is this, or a short-term fix, expected anytime soon?

The problem for me is that my application no longer works
with the 2.6 kernels, since it relies on the kernel's
serial support -- which worked fine with 2.4 kernels.

If there's anything I can do to expedite a fix please
let me know -- I've spent the past few days learning
and working with the code, but I obviously have a 
ways to go before I sleep..

Tim
