* ST16C654 stoppes transmitting after a while
@ 2012-07-17  7:43 Tobias Arp
  2012-07-17  9:52 ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Tobias Arp @ 2012-07-17  7:43 UTC (permalink / raw)
  To: linux-serial

Hi,

We use a quad UART (ST16C654) on our system (ARM board, EP9315 processor) with kernel 3.2. The quad UART is connected to a single IRQ, so its four ports have to share it. Multiple simultaneous connections work fine for some hours (three modem connections and one null modem connection, transferring data in both directions with hardware handshake (CRTSCTS) enabled). Then one connection, often the null modem connection, can no longer send data to the remote host. The driver is the 8250 serial driver.

I found that the hw_stopped flag is set to 1 and never cleared to 0 again, although CTS is set. Both an oscilloscope measurement and the modem status register confirm this (I modified the serial8250_get_mctrl function to print these values).

It seems to me that a CTS change (UART_MSR_DCTS) is lost.
Could this cause this behaviour?
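
To make the suspected failure mode concrete, here is a small userspace model, not kernel code. It assumes the usual 16550-style behaviour where the delta bits in the modem status register are cleared by reading it, and a handler that only touches hw_stopped when the delta-CTS bit is set. The names sim_uart, sim_set_cts, sim_msr_read and sim_handle_msr are made up for this sketch; the bit values mirror UART_MSR_DCTS and UART_MSR_CTS.

```c
#include <stdbool.h>

/* Bit values as in the 16550 modem status register. */
#define SIM_MSR_DCTS 0x01u /* CTS changed since the last MSR read */
#define SIM_MSR_CTS  0x10u /* current CTS level */

/* Hypothetical model of one UART channel, for illustration only. */
struct sim_uart {
	bool cts;        /* level on the CTS pin */
	bool dcts;       /* latched "CTS changed" bit */
	bool hw_stopped; /* software view: transmitter held off */
};

/* The pin changes; the hardware latches the delta bit. */
static void sim_set_cts(struct sim_uart *u, bool level)
{
	if (u->cts != level) {
		u->cts = level;
		u->dcts = true;
	}
}

/* Reading MSR returns level and delta, and clears the delta. */
static unsigned int sim_msr_read(struct sim_uart *u)
{
	unsigned int msr = 0;

	if (u->cts)
		msr |= SIM_MSR_CTS;
	if (u->dcts)
		msr |= SIM_MSR_DCTS;
	u->dcts = false;
	return msr;
}

/* Delta-gated flow control update: without DCTS nothing is touched. */
static void sim_handle_msr(struct sim_uart *u, unsigned int msr)
{
	if (!(msr & SIM_MSR_DCTS))
		return;
	u->hw_stopped = !(msr & SIM_MSR_CTS);
}
```

In this model, if CTS drops and is handled, then rises again, and some read of the register discards its result before the handler runs, the handler sees CTS set but no delta and leaves hw_stopped at 1.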

Any suggestions for a workaround are appreciated.

Thanks in advance

Tobias






* Re: ST16C654 stoppes transmitting after a while
  2012-07-17  7:43 ST16C654 stoppes transmitting after a while Tobias Arp
@ 2012-07-17  9:52 ` Alan Cox
  2012-07-17 10:05   ` Tobias Arp
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2012-07-17  9:52 UTC (permalink / raw)
  To: Tobias Arp; +Cc: linux-serial

> I found that the hw_stopped flag is set to 1 and never cleared to 0 again, although CTS is set. Both an oscilloscope measurement and the modem status register confirm this (I modified the serial8250_get_mctrl function to print these values).
> 
> It seems to me that a CTS change (UART_MSR_DCTS) is lost.
> Could this cause this behaviour?

Possibly. You'd need to instrument your kernel as well to prove it. It
could also be a race in the kernel drivers somewhere. We certainly had
those in old kernels here and there.
 
> Any suggestions for a workaround are appreciated.

What kernel are you running?


* Re: ST16C654 stoppes transmitting after a while
  2012-07-17  9:52 ` Alan Cox
@ 2012-07-17 10:05   ` Tobias Arp
  2012-07-17 11:21     ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Tobias Arp @ 2012-07-17 10:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-serial

I run kernel 3.2. Upgrading to a newer kernel would not be so easy at the moment.

I am not sure how to check if this change is really lost.


* Re: ST16C654 stoppes transmitting after a while
  2012-07-17 10:05   ` Tobias Arp
@ 2012-07-17 11:21     ` Alan Cox
  2012-07-17 11:36       ` Tobias Arp
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2012-07-17 11:21 UTC (permalink / raw)
  To: Tobias Arp; +Cc: linux-serial

On Tue, 17 Jul 2012 12:05:21 +0200
"Tobias Arp" <tobiasarp@gmx.de> wrote:

> I run kernel 3.2. Upgrading to a newer kernel would not be so easy at the moment.

3.2 is certainly new enough that it should have all the major fixes in.

> I am not sure how to check if this change is really lost.

With a debugger, dump the state of the port structures on the hung port
once it has hung. Something like Red Hat's systemtap might also be very
helpful in doing this kind of monitoring.
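
As a lighter-weight complement to a debugger, the modem lines can also be sampled from userspace with the TIOCMGET ioctl and compared with what an oscilloscope shows. This is a minimal sketch, assuming a Linux system; the helper names read_modem_lines and describe_modem_lines are invented here, and the device path passed in would be whichever port has hung.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Read the modem control/status lines of a tty; returns 0 on success. */
static int read_modem_lines(const char *path, int *bits)
{
	/* O_NONBLOCK so that open() does not wait for carrier. */
	int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);
	int ret;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, TIOCMGET, bits);
	close(fd);
	return ret < 0 ? -1 : 0;
}

/* Format the bits as e.g. "RTS=1 CTS=0 DSR=0 DCD=0". */
static void describe_modem_lines(int bits, char *buf, size_t len)
{
	snprintf(buf, len, "RTS=%d CTS=%d DSR=%d DCD=%d",
		 !!(bits & TIOCM_RTS), !!(bits & TIOCM_CTS),
		 !!(bits & TIOCM_DSR), !!(bits & TIOCM_CAR));
}
```

A CTS=1 reading on a port that still refuses to transmit would point at the software state rather than the wiring. Note that sampling the lines this way makes the driver read the modem status register, so it can itself disturb what is being observed.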

My first guess would be that you hit a race between the port stopping on
one processor and the flow control event on another. We have one known
case there still being debugged but I thought it could only be hit if you
were using ttys in low latency mode.

Alan


* Re: ST16C654 stoppes transmitting after a while
  2012-07-17 11:21     ` Alan Cox
@ 2012-07-17 11:36       ` Tobias Arp
  2012-07-17 16:38         ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Tobias Arp @ 2012-07-17 11:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-serial

This is a single-core ARM processor (Cirrus EP9315). Could this race condition happen in this case?

But I have CONFIG_PREEMPT set. Shall I take a stab at it by switching it off?

Tobias

* Re: ST16C654 stoppes transmitting after a while
  2012-07-17 11:36       ` Tobias Arp
@ 2012-07-17 16:38         ` Alan Cox
  2012-07-18  6:37           ` Tobias Arp
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2012-07-17 16:38 UTC (permalink / raw)
  To: Tobias Arp; +Cc: linux-serial

On Tue, 17 Jul 2012 13:36:03 +0200
"Tobias Arp" <tobiasarp@gmx.de> wrote:

> This is a single-core ARM processor (Cirrus EP9315). Could this race condition happen in this case?

In theory, yes. On the other hand, it ought to be quite easy to test for.

How fast are your links? Would it be fair to characterise your
environment as a fairly slow CPU handling a lot of fast serial links?

(just trying to understand what the likely places to look might be)



* Re: ST16C654 stoppes transmitting after a while
  2012-07-17 16:38         ` Alan Cox
@ 2012-07-18  6:37           ` Tobias Arp
  2012-07-18  7:45             ` Tobias Arp
                               ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tobias Arp @ 2012-07-18  6:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-serial

The serial links are all set up to 115200 baud.

Dumping out the tty-struct when it hangs shows me this:

magic: 0x5401
name: ttyS0
flags: 0x00000A00
count: 1
stopped: 0
hw_stopped: 1
flow_stopped: 0
packet: 0
low_latency: 0
warned: 0
ctrl_status: 0x00
receive_room: 4095
...

It is a little strange that it is always the null modem connection that shows this behaviour. Maybe that is because it has the highest data transfer rate (all other connections go through ISDN modems, single-channel ISDN)?





* Re: ST16C654 stoppes transmitting after a while
  2012-07-18  6:37           ` Tobias Arp
@ 2012-07-18  7:45             ` Tobias Arp
  2012-07-18 11:08               ` Alan Cox
  2012-07-20  4:25             ` Tobias Arp
  2012-07-20 10:10             ` Tobias Arp
  2 siblings, 1 reply; 12+ messages in thread
From: Tobias Arp @ 2012-07-18  7:45 UTC (permalink / raw)
  To: Tobias Arp, alan; +Cc: linux-serial

The problem occurs very quickly when the baud rate is set to 230400 on a single null modem connection (without using the other ports of the quad UART). So I would say your guess is right...
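
For a sense of the timing involved at these rates, the time each character occupies the line can be worked out directly. The sketch below assumes 10 bits per character (8N1 framing: start bit, 8 data bits, stop bit), which the thread does not state; char_time_us and chars_per_second are names chosen for illustration.

```c
/* Time one character occupies the line, in microseconds. */
static double char_time_us(unsigned int baud, unsigned int bits_per_char)
{
	return 1e6 * (double)bits_per_char / (double)baud;
}

/* Characters per second at a given rate and framing. */
static double chars_per_second(unsigned int baud, unsigned int bits_per_char)
{
	return (double)baud / (double)bits_per_char;
}
```

With 10-bit framing this gives roughly 86.8 microseconds per character at 115200 baud and 43.4 microseconds at 230400 baud, so doubling the rate halves the time between consecutive characters.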



* Re: ST16C654 stoppes transmitting after a while
  2012-07-18  7:45             ` Tobias Arp
@ 2012-07-18 11:08               ` Alan Cox
  2012-07-18 11:16                 ` Tobias Arp
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2012-07-18 11:08 UTC (permalink / raw)
  To: Tobias Arp; +Cc: linux-serial

On Wed, 18 Jul 2012 09:45:30 +0200
"Tobias Arp" <tobiasarp@gmx.de> wrote:

> The problem occurs very quickly when the baud rate is set to 230400 on a single null modem connection (without using the other ports of the quad UART). So I would say your guess is right...

> > magic: 0x5401
> > name: ttyS0
> > flags: 0x00000A00
> > count: 1
> > stopped: 0
> > hw_stopped: 1
> > flow_stopped: 0
> > packet: 0
> > low_latency: 0
> > warned: 0
> > ctrl_status: 0x00
> > receive_room: 4095

OK, that's not quite what I had expected to see.

So the port is in use, not stopped at the higher levels and has lots of
receiver room as expected.

The hw_stopped flag is the hardware state - so somehow we've missed a
transition on the flow control or the UART hasn't bothered reporting it
or the IRQ got lost.

When it jams transmitting, does it unjam if you send a single character
the other way?

Alan


* Re: ST16C654 stoppes transmitting after a while
  2012-07-18 11:08               ` Alan Cox
@ 2012-07-18 11:16                 ` Tobias Arp
  0 siblings, 0 replies; 12+ messages in thread
From: Tobias Arp @ 2012-07-18 11:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-serial

> When it jams transmitting, does it unjam if you send a single character
> the other way?

No, it doesn't unjam when a single character is sent from the remote side.
It only unjams after unplugging and replugging the null modem cable (so an
IRQ is generated by the DCD or CTS change while replugging the cable).



* Re: ST16C654 stoppes transmitting after a while
  2012-07-18  6:37           ` Tobias Arp
  2012-07-18  7:45             ` Tobias Arp
@ 2012-07-20  4:25             ` Tobias Arp
  2012-07-20 10:10             ` Tobias Arp
  2 siblings, 0 replies; 12+ messages in thread
From: Tobias Arp @ 2012-07-20  4:25 UTC (permalink / raw)
  To: Tobias Arp, alan; +Cc: linux-serial

BTW: all interfaces work fine with our old kernel version 2.6.15.

Tobias




* Re: ST16C654 stoppes transmitting after a while
  2012-07-18  6:37           ` Tobias Arp
  2012-07-18  7:45             ` Tobias Arp
  2012-07-20  4:25             ` Tobias Arp
@ 2012-07-20 10:10             ` Tobias Arp
  2 siblings, 0 replies; 12+ messages in thread
From: Tobias Arp @ 2012-07-20 10:10 UTC (permalink / raw)
  To: Tobias Arp, alan; +Cc: linux-serial

I could track down the problem:

The interrupt for the MSR change (DCTS) gets lost at this point:

An interrupt is being handled, CTS changes during transmit_chars(), and this change is not recognized.

Is spin_lock_irqsave() too slow (on my platform and under these circumstances)? How can I avoid this?

In 8250.c:


/*
 * This handles the interrupt from one port.
 */
static void serial8250_handle_port(struct uart_8250_port *up)
{
	unsigned int status;
	unsigned int status2;
	unsigned long flags;

	spin_lock_irqsave(&up->port.lock, flags);

	status = serial_inp(up, UART_LSR);

	DEBUG_INTR("status = %x...", status);

	if (status & (UART_LSR_DR | UART_LSR_BI))
		receive_chars(up, &status);
	check_modem_status(up);
	if (status & UART_LSR_THRE)
		transmit_chars(up);
	spin_unlock_irqrestore(&up->port.lock, flags);
}
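
The sequence described above can be reproduced in a small userspace model, which is a sketch and not a patch for 8250.c. It assumes that the delta-CTS bit stays latched in MSR until the register is read, that the modem status check only acts when that bit is set, and that no further interrupt is delivered for the change, which is the situation suspected here but not confirmed. sim_port, sim_handle_port and the recheck flag are hypothetical; the model raises CTS at the point where transmit_chars() would be running.

```c
#include <stdbool.h>

#define SIM_MSR_DCTS 0x01u /* CTS changed since the last MSR read */
#define SIM_MSR_CTS  0x10u /* current CTS level */

/* Hypothetical model of one port, for illustration only. */
struct sim_port {
	bool cts;                 /* level on the CTS pin */
	bool dcts;                /* latched delta bit, cleared by reading MSR */
	bool hw_stopped;          /* transmitter held off by flow control */
	bool cts_rises_during_tx; /* hook: peer raises CTS mid-handler */
};

/* Reading MSR returns level and delta, and clears the delta. */
static unsigned int sim_read_msr(struct sim_port *p)
{
	unsigned int msr = (p->cts ? SIM_MSR_CTS : 0) |
			   (p->dcts ? SIM_MSR_DCTS : 0);

	p->dcts = false;
	return msr;
}

/* Act only when the delta bit is set (an assumption of this model). */
static void sim_check_modem_status(struct sim_port *p)
{
	unsigned int msr = sim_read_msr(p);

	if (msr & SIM_MSR_DCTS)
		p->hw_stopped = !(msr & SIM_MSR_CTS);
}

/* Stand-in for transmit_chars(): the peer changes CTS while it runs. */
static void sim_transmit_chars(struct sim_port *p)
{
	if (p->cts_rises_during_tx && !p->cts) {
		p->cts = true;
		p->dcts = true;
	}
}

/* One pass of the handler; recheck adds a second MSR read at the end. */
static void sim_handle_port(struct sim_port *p, bool recheck)
{
	sim_check_modem_status(p);
	sim_transmit_chars(p);
	if (recheck)
		sim_check_modem_status(p);
}
```

In this model a pass without the re-check ends with hw_stopped at 1 and the delta still latched, while a pass with the re-check clears hw_stopped. Whether re-reading MSR after transmit_chars() is the right fix for the real driver would need testing on the hardware.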


Thanks

Tobias


