[PATCH 0/4] serial: omap: robustify for high speed transfers

* [PATCH 0/4] serial: omap: robustify for high speed transfers
@ 2016-01-22 10:27 ` John Ogness
  0 siblings, 0 replies; 18+ messages in thread
From: John Ogness @ 2016-01-22 10:27 UTC (permalink / raw)
  To: gregkh
  Cc: vinod.koul, dan.j.williams, peter, bigeasy, tony, nsekhar,
	peter.ujfalusi, dmaengine, linux-serial, linux-kernel

The DMA-enabled OMAP UART driver in its current form queues 48 bytes for a
DMA-RX transfer. After the transfer is complete, a new transfer of 48 bytes
is queued. The DMA completion callback runs in tasklet context, so a
reschedule with context switch is required for the completion to be
processed and the next 48 bytes to be queued.

When running at a high speed such as 3Mbit, the CPU has 128us between when
the DMA hardware transfer completes and when the DMA hardware must be fully
prepared for the next transfer. For an embedded board running applications,
this does not give the CPU much time. If the UART is using hardware flow
control, this situation results in a dramatic decrease in real transfer
speeds. If flow control is not used, the CPU will almost certainly be
forced to drop data.
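
The 128us figure follows from the 48-byte DMA period and the line rate. A minimal sketch of that arithmetic, assuming 8 bits on the wire per byte (the helper name `dma_period_us` is hypothetical and not part of the driver):

```python
def dma_period_us(baud: int, period_bytes: int = 48, bits_per_frame: int = 8) -> float:
    """Microseconds the UART needs to fill one DMA period at full line rate."""
    return period_bytes * bits_per_frame * 1_000_000 / baud


# 8 bits per byte, as assumed for the 128us figure
print(dma_period_us(3_000_000))
# 10 bits per byte (start bit + 8 data bits + stop bit)
print(dma_period_us(3_000_000, bits_per_frame=10))
```

With start and stop bits counted, the window is somewhat longer, but it stays well under a millisecond.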

This patch series modifies the UART driver to use cyclic DMA transfers
with a growable ring buffer that is sized according to the baud rate. The ring buffer is
large enough to hold at least 1s of RX-data. (At 3Mbit that is 367KiB.) In
order to ensure that data in the ring buffer is not overwritten before
being processed by the tty layer, a hrtimer is used as a watchdog.

With this patch series, the UART driver is resilient against latencies up
to 500ms. This means that if no flow control is used, data will not be
dropped until such latencies occur. If hardware flow control is used,
real transfer speeds will not be affected until such latencies occur.

Patch series against next-20160122.

John Ogness (4):
  ARM: edma: special case slot limit workaround
  tty: serial: 8250: add optional spinlock arg to serial8250_rx_chars
  tty: serial: 8250: omap: convert to using cyclic transfers
  tty: serial: 8250: omap: consume spurious interrupts

 drivers/dma/edma.c                  |   25 +-
 drivers/tty/serial/8250/8250.h      |    2 +
 drivers/tty/serial/8250/8250_fsl.c  |    2 +-
 drivers/tty/serial/8250/8250_omap.c |  430 ++++++++++++++++++++++++-----------
 drivers/tty/serial/8250/8250_port.c |    9 +-
 include/linux/serial_8250.h         |    3 +-
 6 files changed, 333 insertions(+), 138 deletions(-)

-- 
1.7.10.4

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-01-22 10:27 ` John Ogness
  (?)
@ 2016-01-25 18:56 ` Peter Hurley
  2016-01-29 16:35   ` John Ogness
  -1 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2016-01-25 18:56 UTC (permalink / raw)
  To: John Ogness, gregkh
  Cc: vinod.koul, dan.j.williams, bigeasy, tony, nsekhar,
	peter.ujfalusi, dmaengine, linux-serial, linux-kernel

On 01/22/2016 02:27 AM, John Ogness wrote:
> The DMA-enabled OMAP UART driver in its current form queues 48 bytes for a
> DMA-RX transfer. After the transfer is complete, a new transfer of 48 bytes
> is queued. The DMA completion callback runs in tasklet context, so a
> reschedule with context switch is required for the completion to be
> processed and the next 48 bytes to be queued.
> 
> When running at a high speed such as 3Mbit, the CPU has 128us between when
> the DMA hardware transfer completes and when the DMA hardware must be fully
> prepared for the next transfer. For an embedded board running applications,
> this does not give the CPU much time. If the UART is using hardware flow
> control, this situation results in a dramatic decrease in real transfer
> speeds. If flow control is not used, the CPU will almost certainly be
> forced to drop data.

I'm not convinced by this logic at all.
Tasklets are not affected by the scheduler because they run in softirq.
Or is this -RT?

I'm not seeing this problem on other platforms at this baud rate, and
on this platform, all I see is lockups with DMA.

What is the test setup to reproduce these results?


> This patch series modifies the UART driver to use cyclic DMA transfers
> with a growable ring buffer to accommodate baud rates. The ring buffer is
> large enough to hold at least 1s of RX-data. 

> (At 3Mbit that is 367KiB.)

Math slightly off because the frame is typically 10 bits, not 8.

> In order to ensure that data in the ring buffer is not overwritten before
> being processed by the tty layer, a hrtimer is used as a watchdog.

How'd it go from "We're just missing 128us window" to "This holds 1s of data"?

And with a latency hit this bad, you'll never get the data to the process
because the tty buffer kworker will buffer-overflow too and it's much more
susceptible to timing latency (although not as bad now that it's exclusively
on the unbounded workqueue).

Regards,
Peter Hurley


> With this patch series, the UART driver is resilient against latencies up
> to 500ms. This means that if no flow control is used, data will not be
> dropped until such latencies occur. If hardware flow control is used,
> real transfer speeds will not be affected until such latencies occur.
> 
> Patch series against next-20160122.
> 
> John Ogness (4):
>   ARM: edma: special case slot limit workaround
>   tty: serial: 8250: add optional spinlock arg to serial8250_rx_chars
>   tty: serial: 8250: omap: convert to using cyclic transfers
>   tty: serial: 8250: omap: consume spurious interrupts
> 
>  drivers/dma/edma.c                  |   25 +-
>  drivers/tty/serial/8250/8250.h      |    2 +
>  drivers/tty/serial/8250/8250_fsl.c  |    2 +-
>  drivers/tty/serial/8250/8250_omap.c |  430 ++++++++++++++++++++++++-----------
>  drivers/tty/serial/8250/8250_port.c |    9 +-
>  include/linux/serial_8250.h         |    3 +-
>  6 files changed, 333 insertions(+), 138 deletions(-)
> 

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-01-25 18:56 ` Peter Hurley
@ 2016-01-29 16:35   ` John Ogness
  2016-02-03  1:21     ` Peter Hurley
  0 siblings, 1 reply; 18+ messages in thread
From: John Ogness @ 2016-01-29 16:35 UTC (permalink / raw)
  To: Peter Hurley
  Cc: gregkh, vinod.koul, dan.j.williams, bigeasy, tony, nsekhar,
	peter.ujfalusi, dmaengine, linux-serial, linux-kernel

Hi Peter,

On 2016-01-25, Peter Hurley <peter@hurleysoftware.com> wrote:
>> The DMA-enabled OMAP UART driver in its current form queues 48 bytes
>> for a DMA-RX transfer. After the transfer is complete, a new transfer
>> of 48 bytes is queued. The DMA completion callback runs in tasklet
>> context, so a reschedule with context switch is required for the
>> completion to be processed and the next 48 bytes to be queued.
>> 
>> When running at a high speed such as 3Mbit, the CPU has 128us between
>> when the DMA hardware transfer completes and when the DMA hardware
>> must be fully prepared for the next transfer. For an embedded board
>> running applications, this does not give the CPU much time. If the
>> UART is using hardware flow control, this situation results in a
>> dramatic decrease in real transfer speeds. If flow control is not
>> used, the CPU will almost certainly be forced to drop data.
>
> I'm not convinced by this logic at all.
> Tasklets are not affected by the scheduler because they run in softirq.
> Or is this -RT?

Softirq runs as SCHED_OTHER. It is quite easy to create a scenario where
DMA completion tasklets for this driver are not being serviced fast
enough.

> I'm not seeing this problem on other platforms at this baud rate,

Do you run 3Mbit on other platforms without hardware flow control? I
mention this because turning on hardware flow control can cover up the
driver shortcomings by slowing down the transfers. What good is 3Mbit
hardware if the driver never lets it get above 500Kbit on bulk
transfers?

> and on this platform, all I see is lockups with DMA.

I have seen (and fixed) interesting issues with the AM335x eDMA, but I
have not experienced lockups in any of my testing. I'd be curious how
you trigger that.

> What is the test setup to reproduce these results?

Two Beaglebone boards connected via ttyS1. ttyS1's are set to raw mode
at 3Mbit.

sender:   cat bigfile > /dev/ttyS1
receiver: cat /dev/ttyS1 > bigfile

I am working on creating concrete examples that demonstrate not only
that this patch series reduces system load (and thus can increase
throughput on heavy system loads with hardware flow control), but also
that it is able to handle baud rates without data loss well beyond the
current implementation when no flow control is used.

I wanted to wait until I had all the results before answering your
email. But I'm getting caught up in other tasks right now, so it may
take a few more weeks.

>> This patch series modifies the UART driver to use cyclic DMA transfers
>> with a growable ring buffer to accommodate baud rates. The ring buffer is
>> large enough to hold at least 1s of RX-data. 
>> (At 3Mbit that is 367KiB.)
>
> Math slightly off because the frame is typically 10 bits, not 8.

I was thinking 8 was the minimal frame size. Thanks for pointing that
out. A frame can contain 7-12 bits so I will modify the code to create a
buffer appropriate for the UART settings. At 3Mbit with 5n1 the driver
would require a 419KiB ring buffer (8929 DMA periods of 48 bytes).
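
The sizing can be sketched as follows. This is an illustration only, not the driver code; the helper names are hypothetical, and rounding up to whole DMA periods and whole KiB is an assumption:

```python
PERIOD_BYTES = 48  # DMA period size discussed in this thread


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def ring_periods(baud: int, bits_per_frame: int, seconds: int = 1) -> int:
    """Number of 48-byte DMA periods needed to hold `seconds` of RX data."""
    rx_bytes = ceil_div(baud * seconds, bits_per_frame)
    return ceil_div(rx_bytes, PERIOD_BYTES)


def ring_kib(baud: int, bits_per_frame: int, seconds: int = 1) -> int:
    """Ring buffer size in KiB, rounded up."""
    total_bytes = ring_periods(baud, bits_per_frame, seconds) * PERIOD_BYTES
    return ceil_div(total_bytes, 1024)


# 7 bits = 5n1 (start + 5 data + stop), 8 = original estimate, 10 = 8n1
for bits in (7, 8, 10):
    print(bits, ring_periods(3_000_000, bits), ring_kib(3_000_000, bits))
```

For a 7-bit frame at 3Mbit this gives 8929 periods and 419KiB, and for 8 bits it gives the 367KiB from the cover letter.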

>> In order to ensure that data in the ring buffer is not overwritten before
>> being processed by the tty layer, a hrtimer is used as a watchdog.
>
> How'd it go from "We're just missing 128us window" to "This holds 1s
> of data"?

First, you need to recognize that DMA completion tasklets can be delayed
significantly due to interrupt loads or rtprio processes (even on non-RT
systems). And at 3Mbit we are talking about >12000 interrupts per
second!

When using cyclic transfers, the only real concern is that the DMA
overwrites data in the ring buffer before the CPU has processed it due
to tasklet delays. That is what the hrtimer watchdog is for.

Assuming the DMA is zooming along at full speed, the watchdog must be
able to trigger before the ring buffer can fill up. If the watchdog sees
the ring buffer is getting full, it pauses the DMA engine. But with
cyclic DMA we never know if the DMA is zooming or sitting idle. So even
on an idle system, the watchdog must assume DMA zooming and continually
fire to check the status.

I chose 1 second buffer sizes and set the watchdog to fire at half
that. On an idle system you will see at most 2 new interrupts per second
due to this patch series. I thought that would be an acceptable trade
off. Whether the watchdog should fire at 50% buffer full or say 90%
buffer full is something that could be debated. But to answer your
question, the big ring buffer is really to keep the watchdog interrupts
low frequency.
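
To illustrate the relationship between buffer size and watchdog frequency, here is a rough sketch. It is not the driver code: the function names are hypothetical, and the pause criterion (pause when the unprocessed data plus the worst-case arrival during one watchdog interval would exceed the ring) is an assumption about how such a check could be written:

```python
def watchdog_interval_ms(buffer_seconds: float = 1.0, fire_fraction: float = 0.5) -> float:
    """Timer interval when the watchdog fires after `fire_fraction` of the
    time the DMA needs to fill the whole ring at full line rate."""
    return buffer_seconds * fire_fraction * 1000


def idle_wakeups_per_second(buffer_seconds: float = 1.0, fire_fraction: float = 0.5) -> float:
    """Extra timer interrupts per second caused by the watchdog on an idle system."""
    return 1000 / watchdog_interval_ms(buffer_seconds, fire_fraction)


def should_pause_dma(unprocessed_bytes: int, ring_bytes: int, fire_fraction: float = 0.5) -> bool:
    """Assumed check: could the DMA, running at full speed until the next
    watchdog tick, overwrite data the CPU has not processed yet?"""
    worst_case_arrival = ring_bytes * fire_fraction
    return unprocessed_bytes + worst_case_arrival > ring_bytes


print(watchdog_interval_ms())     # interval for a 1 second ring, firing at half
print(idle_wakeups_per_second())  # watchdog interrupts per second when idle
print(should_pause_dma(100_000, 375_024))
print(should_pause_dma(250_000, 375_024))
```

A smaller ring with the same fraction shortens the interval and raises the idle interrupt rate proportionally, which is the trade-off described above.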

> And with a latency hit this bad, you'll never get the data to the
> process because the tty buffer kworker will buffer-overflow too and
> it's much more susceptible to timing latency (although not as bad now
> that it's exclusively on the unbounded workqueue).

Yes, you are correct. But I think that is a problem that should be
addressed at the tty layer.

John Ogness

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-01-29 16:35   ` John Ogness
@ 2016-02-03  1:21     ` Peter Hurley
  2016-02-11 12:02       ` John Ogness
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2016-02-03  1:21 UTC (permalink / raw)
  To: John Ogness
  Cc: gregkh, vinod.koul, dan.j.williams, bigeasy, tony, nsekhar,
	peter.ujfalusi, dmaengine, linux-serial, linux-kernel

On 01/29/2016 08:35 AM, John Ogness wrote:
> Hi Peter,
> 
> On 2016-01-25, Peter Hurley <peter@hurleysoftware.com> wrote:
>>> The DMA-enabled OMAP UART driver in its current form queues 48 bytes
>>> for a DMA-RX transfer. After the transfer is complete, a new transfer
>>> of 48 bytes is queued. The DMA completion callback runs in tasklet
>>> context, so a reschedule with context switch is required for the
>>> completion to be processed and the next 48 bytes to be queued.
>>>
>>> When running at a high speed such as 3Mbit, the CPU has 128us between
>>> when the DMA hardware transfer completes and when the DMA hardware
>>> must be fully prepared for the next transfer. For an embedded board
>>> running applications, this does not give the CPU much time. If the
>>> UART is using hardware flow control, this situation results in a
>>> dramatic decrease in real transfer speeds. If flow control is not
>>> used, the CPU will almost certainly be forced to drop data.
>>
>> I'm not convinced by this logic at all.
>> Tasklets are not affected by the scheduler because they run in softirq.
>> Or is this -RT?
> 
> Softirq runs as SCHED_OTHER. It is quite easy to create a scenario where
> DMA completion tasklets for this driver are not being serviced fast
> enough.
> 
>> I'm not seeing this problem on other platforms at this baud rate,
> 
> Do you run 3Mbit on other platforms without hardware flow control?

Yes, but only unidirectionally.


> I mention this because turning on hardware flow control can cover up the
> driver shortcomings by slowing down the transfers. What good is 3Mbit
> hardware if the driver never lets it get above 500Kbit on bulk
> transfers?

That's interesting. I wonder why the 6x hit when using h/w flow control.
Any thoughts on that?

>> and on this platform, all I see is lockups with DMA.
> 
> I have seen (and fixed) interesting issues with the AM335x eDMA, but I
> have not experienced lockups in any of my testing. I'd be curious how
> you trigger that.

I haven't tested it since 4.1. I'll go back and re-enable DMA and retest.


>> What is the test setup to reproduce these results?
> 
> Two Beaglebone boards connected via ttyS1. ttyS1's are set to raw mode
> at 3Mbit.
> 
> sender:   cat bigfile > /dev/ttyS1
> receiver: cat /dev/ttyS1 > bigfile

Ok, I can repro something similar.


> I am working on creating concrete examples that demonstrate not only
> that this patch series reduces system load (and thus can increase
> throughput on heavy system loads with hardware flow control), but also
> that it is able to handle baud rates without data loss well beyond the
> current implementation when no flow control is used.
> 
> I wanted to wait until I had all the results before answering your
> email. But I'm getting caught up in other tasks right now, so it may
> take a few more weeks.

Ok. So just to be clear here: this patchset is really all about
performance improvement and not correct operation?


>>> This patch series modifies the UART driver to use cyclic DMA transfers
>>> with a growable ring buffer to accommodate baud rates. The ring buffer is
>>> large enough to hold at least 1s of RX-data. 
>>> (At 3Mbit that is 367KiB.)
>>
>> Math slightly off because the frame is typically 10 bits, not 8.
> 
> I was thinking 8 was the minimal frame size. Thanks for pointing that
> out. A frame can contain 7-12 bits so I will modify the code to create a
> buffer appropriate for the UART settings. At 3Mbit with 5n1 the driver
> would require a 419KiB ring buffer (8929 DMA periods of 48 bytes).

More about this below.


>>> In order to ensure that data in the ring buffer is not overwritten before
>>> being processed by the tty layer, a hrtimer is used as a watchdog.
>>
>> How'd it go from "We're just missing 128us window" to "This holds 1s
>> of data"?
> 
> First, you need to recognize that DMA completion tasklets can be delayed
> significantly due to interrupt loads or rtprio processes (even on non-RT
> systems). And at 3Mbit we are talking about >12000 interrupts per
> second!

Not sure I see 12000 ints/sec. unless you're talking about full-duplex
at max rate in both directions?  3Mbit/sec / 10-bit frame / 48 bytes/dma = 
6250 ints/sec.
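
The same estimate as a small sketch, assuming one completion interrupt per 48-byte DMA period and a receiver running continuously at full line rate (the function name is hypothetical):

```python
def dma_completions_per_second(baud: int, bits_per_frame: int = 10,
                               period_bytes: int = 48) -> float:
    """DMA completion interrupts per second for one direction at full line rate."""
    bytes_per_second = baud / bits_per_frame
    return bytes_per_second / period_bytes


print(dma_completions_per_second(3_000_000))                    # 10-bit frames
print(dma_completions_per_second(3_000_000, bits_per_frame=8))  # 8 bits per byte
```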

But again, interrupt load is not going to result in 100ms service intervals.
So I think we're really talking about a (misbehaved) rtprio process that's
starving i/o.


> When using cyclic transfers, the only real concern is that the DMA
> overwrites data in the ring buffer before the CPU has processed it due
> to tasklet delays. That is what the hrtimer watchdog is for.
> 
> Assuming the DMA is zooming along at full speed, the watchdog must be
> able to trigger before the ring buffer can fill up. If the watchdog sees
> the ring buffer is getting full, it pauses the DMA engine. But with
> cyclic DMA we never know if the DMA is zooming or sitting idle. So even
> on an idle system, the watchdog must assume DMA zooming and continually
> fire to check the status.
> 
> I chose 1 second buffer sizes and set the watchdog to fire at half
> that. On an idle system you will see at most 2 new interrupts per second
> due to this patch series. I thought that would be an acceptable trade
> off. Whether the watchdog should fire at 50% buffer full or say 90%
> buffer full is something that could be debated. But to answer your
> question, the big ring buffer is really to keep the watchdog interrupts
> low frequency.

Ok, but your fundamental premise is that all of this is an acceptable
space-time tradeoff for everyone using this platform, when it's not.

So I'm trying to understand the actual use case you're trying to address.
I doubt that's 5n1, full-duplex.


>> And with a latency hit this bad, you'll never get the data to the
>> process because the tty buffer kworker will buffer-overflow too and
>> it's much more susceptible to timing latency (although not as bad now
>> that it's exclusively on the unbounded workqueue).
> 
> Yes, you are correct. But I think that is a problem that should be
> addressed at the tty layer.

I disagree. I think you should fix the source of 500ms latency.

Regards,
Peter Hurley

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-03  1:21     ` Peter Hurley
@ 2016-02-11 12:02       ` John Ogness
  2016-02-11 21:00         ` Tony Lindgren
  0 siblings, 1 reply; 18+ messages in thread
From: John Ogness @ 2016-02-11 12:02 UTC (permalink / raw)
  To: Peter Hurley
  Cc: gregkh, vinod.koul, dan.j.williams, bigeasy, tony, nsekhar,
	peter.ujfalusi, dmaengine, linux-serial, linux-kernel

On 2016-02-03, Peter Hurley <peter@hurleysoftware.com> wrote:
>>>> The DMA-enabled OMAP UART driver in its current form queues 48
>>>> bytes for a DMA-RX transfer. After the transfer is complete, a new
>>>> transfer of 48 bytes is queued. The DMA completion callback runs in
>>>> tasklet context, so a reschedule with context switch is required
>>>> for the completion to be processed and the next 48 bytes to be
>>>> queued.
>>>>
>>>> When running at a high speed such as 3Mbit, the CPU has 128us
>>>> between when the DMA hardware transfer completes and when the DMA
>>>> hardware must be fully prepared for the next transfer. For an
>>>> embedded board running applications, this does not give the CPU
>>>> much time. If the UART is using hardware flow control, this
>>>> situation results in a dramatic decrease in real transfer
>>>> speeds. If flow control is not used, the CPU will almost certainly
>>>> be forced to drop data.
>>>
>>> I'm not convinced by this logic at all.
>>> Tasklets are not affected by the scheduler because they run in
>>> softirq.  Or is this -RT?
>> 
>> Softirq runs as SCHED_OTHER. It is quite easy to create a scenario
>> where DMA completion tasklets for this driver are not being serviced
>> fast enough.
>> 
>>> I'm not seeing this problem on other platforms at this baud rate,
>> 
>> Do you run 3Mbit on other platforms without hardware flow control?
>
> Yes, but only unidirectionally.

It surprises me to hear that you are running UART with DMA on an AM335x
platform and are able to sustain 3Mbit transfers with the driver in its
current form.

>> I mention this because turning on hardware flow control can cover up
>> the driver shortcomings by slowing down the transfers. What good is
>> 3Mbit hardware if the driver never lets it get above 500Kbit on bulk
>> transfers?
>
> That's interesting. I wonder why the 6x hit when using h/w flow
> control.  Any thoughts on that?

The CPU is busy handling interrupts, copying data to the tty layer, and
setting up DMA transfers. No CPU cycles to spare. Without h/w flow
control it is not fast enough to keep up and data is lost. With h/w flow
control the UART controller is slowing the transfer so that the CPU can
keep up.

>>> and on this platform, all I see is lockups with DMA.
>> 
>> I have seen (and fixed) interesting issues with the AM335x eDMA, but
>> I have not experienced lockups in any of my testing. I'd be curious
>> how you trigger that.
>
> I haven't tested it since 4.1. I'll go back and re-enable DMA and
> retest.

You will need to remove the broken-flag that is in mainline. Otherwise
the driver will refuse to use DMA.

>>> What is the test setup to reproduce these results?
>> 
>> Two Beaglebone boards connected via ttyS1. ttyS1's are set to raw
>> mode at 3Mbit.
>> 
>> sender:   cat bigfile > /dev/ttyS1
>> receiver: cat /dev/ttyS1 > bigfile
>
> Ok, I can repro something similar.
>
>> I am working on creating concrete examples that demonstrate not only
>> that this patch series reduces system load (and thus can increase
>> throughput on heavy system loads with hardware flow control), but
>> also that it is able to handle baud rates without data loss well
>> beyond the current implementation when no flow control is used.
>> 
>> I wanted to wait until I had all the results before answering your
>> email. But I'm getting caught up in other tasks right now, so it may
>> take a few more weeks.
>
> Ok. So just to be clear here: this patchset is really all about
> performance improvement and not correct operation?

Correct.

>>>> This patch series modifies the UART driver to use cyclic DMA
>>>> transfers with a growable ring buffer to accommodate baud
>>>> rates. The ring buffer is large enough to hold at least 1s of
>>>> RX-data.  (At 3Mbit that is 367KiB.)
>>>
>>> Math slightly off because the frame is typically 10 bits, not 8.
>> 
>> I was thinking 8 was the minimal frame size. Thanks for pointing that
>> out. A frame can contain 7-12 bits so I will modify the code to
>> create a buffer appropriate for the UART settings. At 3Mbit with 5n1
>> the driver would require a 419KiB ring buffer (8929 DMA periods of 48
>> bytes).
>
> More about this below.
>
>>>> In order to ensure that data in the ring buffer is not overwritten
>>>> before being processed by the tty layer, a hrtimer is used as a
>>>> watchdog.
>>>
>>> How'd it go from "We're just missing 128us window" to "This holds 1s
>>> of data"?
>> 
>> First, you need to recognize that DMA completion tasklets can be
>> delayed significantly due to interrupt loads or rtprio processes
>> (even on non-RT systems). And at 3Mbit we are talking about >12000
>> interrupts per second!
>
> Not sure I see 12000 ints/sec. unless you're talking about full-duplex
> at max rate in both directions?  3Mbit/sec / 10-bit frame / 48
> bytes/dma = 6250 ints/sec.

At these speeds, nearly every DMA interrupt is accompanied by a spurious
UART interrupt. So, sadly, the interrupts are doubled.

It is on my TODO list to verify if the spurious UART interrupts exactly
match the recently added [0] spurious interrupt detection in omap-intc.

> But again, interrupt load is not going to result in 100ms service
> intervals.  So I think we're really talking about a (misbehaved)
> rtprio process that's starving i/o.

I am running my tests with busybox-init, busybox-sh, and
busybox-cat. That is it!

>> When using cyclic transfers, the only real concern is that the DMA
>> overwrites data in the ring buffer before the CPU has processed it
>> due to tasklet delays. That is what the hrtimer watchdog is for.
>> 
>> Assuming the DMA is zooming along at full speed, the watchdog must be
>> able to trigger before the ring buffer can fill up. If the watchdog
>> sees the ring buffer is getting full, it pauses the DMA engine. But
>> with cyclic DMA we never know if the DMA is zooming or sitting
>> idle. So even on an idle system, the watchdog must assume DMA zooming
>> and continually fire to check the status.
>> 
>> I chose 1 second buffer sizes and set the watchdog to fire at half
>> that. On an idle system you will see at most 2 new interrupts per
>> second due to this patch series. I thought that would be an
>> acceptable trade off. Whether the watchdog should fire at 50% buffer
>> full or say 90% buffer full is something that could be debated. But
>> to answer your question, the big ring buffer is really to keep the
>> watchdog interrupts low frequency.
>
> Ok, but your fundamental premise is that all of this is an acceptable
> space-time tradeoff for everyone using this platform, when it's not.
>
> So I'm trying to understand the actual use case you're trying to
> address.  I doubt that's 5n1, full-duplex.

The actual use case is playing/recording high quality audio via
bluetooth.

>>> And with a latency hit this bad, you'll never get the data to the
>>> process because the tty buffer kworker will buffer-overflow too and
>>> it's much more susceptible to timing latency (although not as bad now
>>> that it's exclusively on the unbounded workqueue).
>> 
>> Yes, you are correct. But I think that is a problem that should be
>> addressed at the tty layer.
>
> I disagree. I think you should fix the source of 500ms latency.

All my experimentation and testing has shown that the latency occurs
because the CPU has too much to handle. So I devised an implementation
based on cyclic transfers and a ring buffer to free up the CPU. With the
new implementation the CPU is not required to set up a DMA transfer every
112-192us. And if the CPU is only able to service a small percentage of
the DMA completion interrupts, there can still exist full speed
transfers with no data loss.

Now if the tty layer is unable to handle 3Mbit because the CPU is so
busy, then it may be necessary to offload more work from the CPU. For
example, giving the tty its own high priority workqueue also eased the
situation. Now the CPU will miss even more of the DMA completion
interrupts, but the watchdog and ring buffer can allow that.

Perhaps instead we should be discussing a new API where UART drivers can
DMA bulk transfers directly to userspace. Until now my efforts have been
focussed on improving performance of the existing framework.

John Ogness

[0] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1000296.html

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-11 12:02       ` John Ogness
@ 2016-02-11 21:00         ` Tony Lindgren
  2016-02-22 15:30           ` John Ogness
  0 siblings, 1 reply; 18+ messages in thread
From: Tony Lindgren @ 2016-02-11 21:00 UTC (permalink / raw)
  To: John Ogness
  Cc: Peter Hurley, gregkh, vinod.koul, dan.j.williams, bigeasy,
	nsekhar, peter.ujfalusi, dmaengine, linux-serial, linux-kernel

Hi,

* John Ogness <john.ogness@linutronix.de> [160211 04:04]:
> 
> At these speeds, nearly every DMA interrupt is accompanied by a spurious
> UART interrupt. So, sadly, the interrupts are doubled.
> 
> It is on my TODO list to verify if the spurious UART interrupts exactly
> match the recently added [0] spurious interrupt detection in omap-intc.

If you're seeing spurious interrupts you may want to try adding
a flush of posted write at the end of the 8250_omap interrupt
handler. Basically read back some register from the 8250. This
has fixed so far pretty much all the spurious IRQ issues for
omaps using the drivers/irqchip/irq-omap-intc.c, meaning omap3
and am335x and ti81xx variants too most likely.

Regards,

Tony

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-11 21:00         ` Tony Lindgren
@ 2016-02-22 15:30           ` John Ogness
  2016-02-22 19:38             ` Tony Lindgren
  2016-02-23  9:59               ` Sekhar Nori
  0 siblings, 2 replies; 18+ messages in thread
From: John Ogness @ 2016-02-22 15:30 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Peter Hurley, gregkh, vinod.koul, dan.j.williams, bigeasy,
	nsekhar, peter.ujfalusi, dmaengine, linux-serial, linux-kernel

Hi Tony,

On 2016-02-11, Tony Lindgren <tony@atomide.com> wrote:
>> At these speeds, nearly every DMA interrupt is accompanied by a
>> spurious UART interrupt. So, sadly, the interrupts are doubled.
>> 
>> It is on my TODO list to verify if the spurious UART interrupts
>> exactly match the recently added [0] spurious interrupt detection in
>> omap-intc.
>
> If you're seeing spurious interrupts you may want to try adding
> a flush of posted write at the end of the 8250_omap interrupt
> handler. Basically read back some register from the 8250. This
> has fixed so far pretty much all the spurious IRQ issues for
> omaps using the drivers/irqchip/irq-omap-intc.c, meaning omap3
> and am335x and ti81xx variants too most likely.

I have done significant testing with this using linux-next-20160219. The
only changes I made were to disable the "rx_dma_broken" feature so that
DMA would definately be used. I created a simple test where I send 48000
bytes at 230400bps over UART from one device to another. Several
different target devices and configurations were used to test the RX-DMA
feature of the 8250_omap. The expected result is 1000 DMA interrupts and
0 UART interrupts.
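
The expected count follows from the 48-byte RX-DMA transfer size mentioned in
the cover letter. A small sketch of the arithmetic, with hypothetical helper
names; the number of bits per character on the wire is passed in as a
parameter because the framing is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Number of full chunk-sized DMA transfers contained in total_bytes. */
static uint32_t expected_dma_irqs(uint32_t total_bytes, uint32_t chunk)
{
	assert(chunk > 0);
	return total_bytes / chunk;
}

/* Time in microseconds (rounded down) to receive `bytes` characters. */
static uint64_t transfer_time_us(uint32_t bytes, uint32_t bits_per_char,
				 uint32_t baud)
{
	assert(baud > 0);
	return (uint64_t)bytes * bits_per_char * 1000000u / baud;
}
```

expected_dma_irqs(48000, 48) gives the 1000 DMA interrupts above, and
transfer_time_us(48, 8, 3000000) gives the 128us figure from the cover
letter. With 10 bits per character (8N1 framing, an assumption), a 48-byte
transfer at 230400bps takes roughly 2083us.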

With the am335x (Beaglebone Black, eDMA engine) I see 1000 DMA
interrupts and 1000 spurious UART interrupts. The spurious UART
interrupts arrive 30-50us _before_ the DMA interrupts. Always.

If I disable UART timeout interrupts (RDI), the same test generates no
spurious UART interrupts. Only 1000 DMA interrupts.

I ran the same test using a dra7 board (sDMA engine) as the target
device. RDI was enabled. Here I see no spurious interrupts.

I modified the dra7 device tree to use the eDMA engine with the
UART. RDI was enabled. Here I also see no spurious interrupts.

The dra7 uses a different interrupt controller (irq-gic instead of
irq-omap-intc), which probably explains why dra7+edma+rdi works
correctly.

I tried adding various and multiple register reads at the end of the
interrupt handlers, but this made no difference. What is interesting is
the fact that the spurious UART interrupt always _precedes_ the DMA
interrupt and by a significant yet relatively consistent amount
(30-50us). Even the very first DMA interrupt is preceded by the
spurious interrupt. It is as if the UART timeout logic is triggering
because it does not notice that the eDMA is pulling the data from the
FIFO. But only when the irq-omap-intc is involved.

Tony, if you have any further ideas, I'd be happy to try them out.

Summary: If DMA is ever going to be re-enabled for am335x/8250_omap,
then it will be necessary to return IRQ_HANDLED for the spurious UART
interrupts that will precede each DMA-RX completion interrupt.

John Ogness

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-22 15:30           ` John Ogness
@ 2016-02-22 19:38             ` Tony Lindgren
  2016-02-23  9:59               ` Sekhar Nori
  1 sibling, 0 replies; 18+ messages in thread
From: Tony Lindgren @ 2016-02-22 19:38 UTC (permalink / raw)
  To: John Ogness
  Cc: Peter Hurley, gregkh, vinod.koul, dan.j.williams, bigeasy,
	nsekhar, peter.ujfalusi, dmaengine, linux-serial, linux-kernel,
	linux-omap

* John Ogness <john.ogness@linutronix.de> [160222 07:30]:
> Hi Tony,
> 
> On 2016-02-11, Tony Lindgren <tony@atomide.com> wrote:
> >> At these speeds, nearly every DMA interrupt is accompanied by a
> >> spurious UART interrupt. So, sadly, the interrupts are doubled.
> >> 
> >> It is on my TODO list to verify if the spurious UART interrupts
> >> exactly match the recently added [0] spurious interrupt detection in
> >> omap-intc.
> >
> > > If you're seeing spurious interrupts you may want to try adding
> > a flush of posted write at the end of the 8250_omap interrupt
> > handler. Basically read back some register from the 8250. This
> > has fixed so far pretty much all the spurious IRQ issues for
> > omaps using the drivers/irqchip/irq-omap-intc.c, meaning omap3
> > and am335x and ti81xx variants too most likely.
> 
> I have done significant testing with this using linux-next-20160219. The
> only changes I made were to disable the "rx_dma_broken" feature so that
> > DMA would definitely be used. I created a simple test where I send 48000
> bytes at 230400bps over UART from one device to another. Several
> different target devices and configurations were used to test the RX-DMA
> feature of the 8250_omap. The expected result is 1000 DMA interrupts and
> 0 UART interrupts.
> 
> With the am335x (Beaglebone Black, eDMA engine) I see 1000 DMA
> interrupts and 1000 spurious UART interrupts. The spurious UART
> interrupts arrive 30-50us _before_ the DMA interrupts. Always.
> 
> If I disable UART timeout interrupts (RDI), the same test generates no
> spurious UART interrupts. Only 1000 DMA interrupts.
> 
> I ran the same test using a dra7 board (sDMA engine) as the target
> device. RDI was enabled. Here I see no spurious interrupts.
> 
> I modified the dra7 device tree to use the eDMA engine with the
> UART. RDI was enabled. Here I also see no spurious interrupts.
> 
> The dra7 uses a different interrupt controller (irq-gic instead of
> irq-omap-intc), which probably explains why dra7+edma+rdi works
> correctly.
> 
> I tried adding various and multiple register reads at the end of the
> interrupt handlers, but this made no difference. What is interesting is
> the fact that the spurious UART interrupt always _precedes_ the DMA
> interrupt and by a significant yet relatively consistent amount
> (30-50us). Even the very first DMA interrupt is preceded by the
> spurious interrupt. It is as if the UART timeout logic is triggering
> because it does not notice that the eDMA is pulling the data from the
> FIFO. But only when the irq-omap-intc is involved.
> 
> Tony, if you have any further ideas, I'd be happy to try them out.
> 
> Summary: If DMA is ever going to be re-enabled for am335x/8250_omap,
> then it will be necessary to return IRQ_HANDLED for the spurious UART
> interrupts that will precede each DMA-RX completion interrupt.

Well, thanks for checking. It sounds like this is some UART-specific
issue. I guess one more thing you could try is adding read-backs
to the functions starting the DMA transfers and seeing if that makes
any difference.

Regards,

Tony

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-22 15:30           ` John Ogness
@ 2016-02-23  9:59               ` Sekhar Nori
  2016-02-23  9:59               ` Sekhar Nori
  1 sibling, 0 replies; 18+ messages in thread
From: Sekhar Nori @ 2016-02-23  9:59 UTC (permalink / raw)
  To: John Ogness, Tony Lindgren
  Cc: Peter Hurley, gregkh, vinod.koul, dan.j.williams, bigeasy,
	peter.ujfalusi, dmaengine, linux-serial, linux-kernel

On Monday 22 February 2016 09:00 PM, John Ogness wrote:
> Hi Tony,
> 
> On 2016-02-11, Tony Lindgren <tony@atomide.com> wrote:
>>> At these speeds, nearly every DMA interrupt is accompanied by a
>>> spurious UART interrupt. So, sadly, the interrupts are doubled.
>>>
>>> It is on my TODO list to verify if the spurious UART interrupts
>>> exactly match the recently added [0] spurious interrupt detection in
>>> omap-intc.
>>
>> If you're seeing spurious interrupts you may want to try adding
>> a flush of posted write at the end of the 8250_omap interrupt
>> handler. Basically read back some register from the 8250. This
>> has fixed so far pretty much all the spurious IRQ issues for
>> omaps using the drivers/irqchip/irq-omap-intc.c, meaning omap3
>> and am335x and ti81xx variants too most likely.
> 
> I have done significant testing with this using linux-next-20160219. The
> only changes I made were to disable the "rx_dma_broken" feature so that
> DMA would definitely be used. I created a simple test where I send 48000
> bytes at 230400bps over UART from one device to another. Several
> different target devices and configurations were used to test the RX-DMA
> feature of the 8250_omap. The expected result is 1000 DMA interrupts and
> 0 UART interrupts.
> 
> With the am335x (Beaglebone Black, eDMA engine) I see 1000 DMA
> interrupts and 1000 spurious UART interrupts. The spurious UART
> interrupts arrive 30-50us _before_ the DMA interrupts. Always.
> 
> If I disable UART timeout interrupts (RDI), the same test generates no
> spurious UART interrupts. Only 1000 DMA interrupts.

To be clear, these interrupts are not caught as spurious by the
interrupt controller (INTC). They are detected by INTC as UART
interrupts. Just that you don't expect a timeout interrupt to happen at
the time you see the interrupt, correct?

Thanks,
Sekhar

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-23  9:59               ` Sekhar Nori
  (?)
@ 2016-02-23 12:43               ` Sebastian Andrzej Siewior
  2016-02-23 16:56                 ` Andy Shevchenko
  -1 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-02-23 12:43 UTC (permalink / raw)
  To: Sekhar Nori, John Ogness, Tony Lindgren
  Cc: Peter Hurley, gregkh, vinod.koul, dan.j.williams, peter.ujfalusi,
	dmaengine, linux-serial, linux-kernel

On 02/23/2016 10:59 AM, Sekhar Nori wrote:
>> With the am335x (Beaglebone Black, eDMA engine) I see 1000 DMA
>> interrupts and 1000 spurious UART interrupts. The spurious UART
>> interrupts arrive 30-50us _before_ the DMA interrupts. Always.
>>
>> If I disable UART timeout interrupts (RDI), the same test generates no
>> spurious UART interrupts. Only 1000 DMA interrupts.
> 
> To be clear, these interrupts are not caught as spurious by the
> interrupt controller (INTC). They are detected by INTC as UART
> interrupts. Just that you don't expect a timeout interrupt to happen at
> the time you see the interrupt, correct?

From what I remember the INTC says it is UART, correct.
But UART's status register says "no interrupt" (IIR has UART_IIR_NO_INT
set). So the UART driver returns IRQ_NONE which counts as spurious.

It is just that once you disable RDI there are no more interrupts coming
during DMA transfer.
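
To illustrate the point, a self-contained sketch of the decision the handler
makes. The constant mirrors UART_IIR_NO_INT (bit 0 of IIR); the enum, the
function name and the ack_spurious switch are hypothetical and only model
the option of returning IRQ_HANDLED for these interrupts:

```c
#include <stdint.h>

#define UART_IIR_NO_INT	0x01	/* bit 0 set: no interrupt pending */

/* Local stand-ins for the kernel's irqreturn_t values. */
enum irq_result { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/*
 * With ack_spurious == 0 an interrupt that arrives while IIR reports
 * "no interrupt" is reported as IRQ_NONE, which the IRQ core counts
 * as unhandled. With ack_spurious != 0 it is claimed as handled.
 */
static enum irq_result classify_uart_irq(uint8_t iir, int ack_spurious)
{
	if (iir & UART_IIR_NO_INT)
		return ack_spurious ? IRQ_HANDLED : IRQ_NONE;

	return IRQ_HANDLED;	/* a real source is pending; service it */
}
```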

> Thanks,
> Sekhar
> 
Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-23 12:43               ` Sebastian Andrzej Siewior
@ 2016-02-23 16:56                 ` Andy Shevchenko
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Shevchenko @ 2016-02-23 16:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Sekhar Nori, John Ogness, Tony Lindgren, Peter Hurley,
	Greg Kroah-Hartman, Vinod Koul, Dan Williams, Peter Ujfalusi,
	dmaengine, linux-serial, linux-kernel

On Tue, Feb 23, 2016 at 2:43 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 02/23/2016 10:59 AM, Sekhar Nori wrote:
>>> With the am335x (Beaglebone Black, eDMA engine) I see 1000 DMA
>>> interrupts and 1000 spurious UART interrupts. The spurious UART
>>> interrupts arrive 30-50us _before_ the DMA interrupts. Always.
>>>
>>> If I disable UART timeout interrupts (RDI), the same test generates no
>>> spurious UART interrupts. Only 1000 DMA interrupts.
>>
>> To be clear, these interrupts are not caught as spurious by the
>> interrupt controller (INTC). They are detected by INTC as UART
>> interrupts. Just that you don't expect a timeout interrupt to happen at
>> the time you see the interrupt, correct?
>
> From what I remember the INTC says it is UART, correct.
> But UART's status register says "no interrupt" (IIR has UART_IIR_NO_INT
> set). So the UART driver returns IRQ_NONE which counts as spurious.
>
> It is just that once you disable RDI there are no more interrupts coming
> during DMA transfer.

Hmm... How did I miss this discussion?

I'm trying to resolve a few issues I found on 8250_dw using higher baud
rates and DMA on internal loopback.
One of the issues I hit is a burst of spurious interrupts which leads to
the "too much work for irq" message.

Digging into Intel's documentation on some UART, I found the following:

"Receive Data Available Interrupt
...
It is recommended to disable this interrupt when running with DMA mode."

The problem is that we have no separate bit to control timeout
interrupts from the UART.
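
A sketch of what that means for the interrupt enable register, assuming the
standard 16550 IER bit layout. The helper name ier_for_dma is hypothetical:

```c
#include <stdint.h>

/* Standard 16550 interrupt enable bits. */
#define UART_IER_RDI	0x01	/* receive data available, also RX timeout */
#define UART_IER_THRI	0x02	/* transmitter holding register empty */
#define UART_IER_RLSI	0x04	/* receiver line status */
#define UART_IER_MSI	0x08	/* modem status */

/*
 * Following the documentation's recommendation means clearing RDI
 * while DMA is active. Since the RX timeout has no enable bit of its
 * own, it is switched off together with RDI.
 */
static uint8_t ier_for_dma(uint8_t ier)
{
	return ier & (uint8_t)~UART_IER_RDI;
}
```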

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-23  9:59               ` Sekhar Nori
  (?)
  (?)
@ 2016-02-24  3:20               ` Peter Hurley
  2016-02-24 15:37                   ` Sekhar Nori
  -1 siblings, 1 reply; 18+ messages in thread
From: Peter Hurley @ 2016-02-24  3:20 UTC (permalink / raw)
  To: Sekhar Nori, John Ogness, Tony Lindgren
  Cc: gregkh, vinod.koul, dan.j.williams, bigeasy, peter.ujfalusi,
	dmaengine, linux-serial, linux-kernel

On 02/23/2016 01:59 AM, Sekhar Nori wrote:
> On Monday 22 February 2016 09:00 PM, John Ogness wrote:
>> Hi Tony,
>>
>> On 2016-02-11, Tony Lindgren <tony@atomide.com> wrote:
>>>> At these speeds, nearly every DMA interrupt is accompanied by a
>>>> spurious UART interrupt. So, sadly, the interrupts are doubled.
>>>>
>>>> It is on my TODO list to verify if the spurious UART interrupts
>>>> exactly match the recently added [0] spurious interrupt detection in
>>>> omap-intc.
>>>
>>> If you're seeing spurious interrupts you may want to try adding
>>> a flush of posted write at the end of the 8250_omap interrupt
>>> handler. Basically read back some register from the 8250. This
>>> has fixed so far pretty much all the spurious IRQ issues for
>>> omaps using the drivers/irqchip/irq-omap-intc.c, meaning omap3
>>> and am335x and ti81xx variants too most likely.
>>
>> I have done significant testing with this using linux-next-20160219. The
>> only changes I made were to disable the "rx_dma_broken" feature so that
>> DMA would definitely be used. I created a simple test where I send 48000
>> bytes at 230400bps over UART from one device to another. Several
>> different target devices and configurations were used to test the RX-DMA
>> feature of the 8250_omap. The expected result is 1000 DMA interrupts and
>> 0 UART interrupts.
>>
>> With the am335x (Beaglebone Black, eDMA engine) I see 1000 DMA
>> interrupts and 1000 spurious UART interrupts. The spurious UART
>> interrupts arrive 30-50us _before_ the DMA interrupts. Always.
>>
>> If I disable UART timeout interrupts (RDI), the same test generates no
>> spurious UART interrupts. Only 1000 DMA interrupts.
> 
> To be clear, these interrupts are not caught as spurious by the
> interrupt controller (INTC). They are detected by INTC as UART
> interrupts. Just that you don't expect a timeout interrupt to happen at
> the time you see the interrupt, correct?

Just to follow-up on what Sebastian wrote.

As he pointed out, these spurious interrupts are not timeout interrupts.
Since IIR_UART[0] == 1, no uart interrupt is pending.

As he wrote, these count as spurious interrupts and trigger
interrupt shutdown at 100000 (unless acked as uart interrupts).
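
A simplified model of that shutdown logic, as a sketch only. The struct and
function names are hypothetical, and the model leaves out the time-based
reset of the unhandled counter that the kernel's spurious IRQ detection also
applies. The thresholds (a window of 100000 interrupts, more than 99900 of
them unhandled) are the ones used by the generic IRQ code as far as I know:

```c
#include <stdint.h>

struct irq_stats {
	uint32_t irq_count;		/* interrupts seen in this window */
	uint32_t irqs_unhandled;	/* of those, how many were IRQ_NONE */
	int disabled;			/* line has been shut down */
};

/* Account for one interrupt on the line. */
static void note_irq(struct irq_stats *s, int handled)
{
	if (s->disabled)
		return;
	if (!handled)
		s->irqs_unhandled++;
	if (++s->irq_count < 100000)
		return;

	/* End of a 100000-interrupt window: evaluate and start over. */
	if (s->irqs_unhandled > 99900)
		s->disabled = 1;
	s->irq_count = 0;
	s->irqs_unhandled = 0;
}

/* Feed n spurious interrupts; ack != 0 means they are claimed as handled. */
static int uart_irq_disabled_after(uint32_t n, int ack)
{
	struct irq_stats s = { 0, 0, 0 };
	uint32_t i;

	for (i = 0; i < n; i++)
		note_irq(&s, ack);
	return s.disabled;
}
```

In this model the line is shut down on the 100000th unacknowledged spurious
interrupt and stays enabled if the handler claims them.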

These spurious interrupts very nearly correspond 1:1 (but not quite)
with each dma submission. So, for example, one test run had:

    @3Mbaud line rate
    195826 submits
    195823 completions

    195704 spurious interrupts (i.e., interrupts with IIR_UART[0] == 1)
         0 RLSI interrupts (no line errors) (IIR_UART == 0xc6)
         2 RX timeout interrupts (IIR_UART == 0xcc),
           one during i/o test and one at the end of i/o test
         6 RDI interrupts (IIR_UART == 0xc4)

The spurious interrupts occur with nearly 1:1 correspondence at _all_
line rates.
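
For reference, a sketch of how the IIR_UART values listed above decode. The
enum and function names are hypothetical, and the 0x0e ID mask is an
assumption that is sufficient for the values shown here (the upper bits,
0xc0, indicate that the FIFOs are enabled):

```c
#include <stdint.h>

#define IIR_NO_INT	0x01	/* bit 0 set: nothing pending */
#define IIR_ID_MASK	0x0e	/* interrupt ID bits used in this sketch */
#define IIR_RDI		0x04	/* receive data available */
#define IIR_RLSI	0x06	/* receiver line status */
#define IIR_RX_TIMEOUT	0x0c	/* RX timeout */

enum iir_kind {
	IIR_KIND_NONE,
	IIR_KIND_RDI,
	IIR_KIND_RLSI,
	IIR_KIND_RX_TIMEOUT,
	IIR_KIND_OTHER,
};

static enum iir_kind iir_decode(uint8_t iir)
{
	if (iir & IIR_NO_INT)
		return IIR_KIND_NONE;	/* counted as spurious above */

	switch (iir & IIR_ID_MASK) {
	case IIR_RDI:
		return IIR_KIND_RDI;
	case IIR_RLSI:
		return IIR_KIND_RLSI;
	case IIR_RX_TIMEOUT:
		return IIR_KIND_RX_TIMEOUT;
	default:
		return IIR_KIND_OTHER;
	}
}
```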

Presumably, the 6 RDI interrupts come from too-slow submission of
the next DMA, so that the uart rx fifo had already reached the rx trigger
level. [NOTE: we should at least be using ping-pong dma buffers for rx so
that there is always a next DMA buffer when the current buffer is completed].
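
A minimal sketch of the ping-pong idea, as a pure simulation. The struct, the
function names and the 48-byte chunk size are assumptions for illustration;
a real implementation would queue both buffers with the DMA engine instead
of flipping an index:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_CHUNK 48

struct pingpong {
	uint8_t buf[2][RX_CHUNK];
	int hw;			/* buffer the (simulated) DMA fills next */
	size_t delivered;	/* bytes handed to the consumer so far */
};

/* Simulated hardware: fill the buffer currently owned by the DMA. */
static void pingpong_dma_fill(struct pingpong *pp, uint8_t val)
{
	memset(pp->buf[pp->hw], val, RX_CHUNK);
}

/*
 * Completion: the hardware moves on to the other buffer at once,
 * while the returned (completed) buffer can be processed at leisure.
 */
static const uint8_t *pingpong_complete(struct pingpong *pp)
{
	const uint8_t *done = pp->buf[pp->hw];

	pp->hw ^= 1;
	pp->delivered += RX_CHUNK;
	return done;
}
```

The point of the design is that the completed buffer stays untouched while
the other one is being filled, so a late completion handler does not leave
the hardware without a destination.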

There is no documentation in any of the OMAP TRMs regarding RDI
interrupts while in DMA mode. Some guidance from TI would be appreciated.


Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-24  3:20               ` Peter Hurley
@ 2016-02-24 15:37                   ` Sekhar Nori
  0 siblings, 0 replies; 18+ messages in thread
From: Sekhar Nori @ 2016-02-24 15:37 UTC (permalink / raw)
  To: Peter Hurley, John Ogness, Tony Lindgren
  Cc: gregkh, vinod.koul, dan.j.williams, bigeasy, peter.ujfalusi,
	dmaengine, linux-serial, linux-kernel

On Wednesday 24 February 2016 08:50 AM, Peter Hurley wrote:
> Just to follow-up on what Sebastian wrote.
> 
> As he pointed out, these spurious interrupts are not timeout interrupts.
> Since IIR_UART[0] == 1, no uart interrupt is pending.
> 
> As he wrote, these count as spurious interrupts and trigger
> interrupt shutdown at 100000 (unless acked as uart interrupts).

Okay, by adding a printk to where the check for UART_IIR_NO_INT is in
omap_8250_dma_handle_irq(), I do see that UART irq handler is called
when there is apparently no interrupt.

I don't see the error interrupt count in /proc/interrupts go up although
the code is returning IRQ_NONE when this happens. I initially thought
that must be because of the interrupt being IRQF_SHARED. But getting rid
of IRQF_SHARED still does not lead to error count going up. I need to
spend some more time to see what is going on.

> These spurious interrupts very nearly correspond 1:1 (but not quite)
> with each dma submission. So, for example, one test run had:
> 
>     @3Mbaud line rate
>     195826 submits
>     195823 completions
> 
>     195704 spurious interrupts (i.e., interrupts with IIR_UART[0] == 1)
>          0 RLSI interrupts (no line errors) (IIR_UART == 0xc6)
>          2 RX timeout interrupts (IIR_UART == 0xcc),
>            one during i/o test and one at the end of i/o test
>          6 RDI interrupts (IIR_UART == 0xc4)
> 
> The spurious interrupts occur with nearly 1:1 correspondence at _all_
> line rates.
> 
> Presumably, the 6 RDI interrupts are from too-slow submission of
> the next DMA and the uart rx fifo has reached rx trigger level already.
> [NOTE: we should at least be using ping-pong dma buffers for rx so that
> there is always a next DMA buffer when the current buffer is completed].
> 
> There is no documentation in any of the OMAP TRMs regarding RDI
> interrupts while in DMA mode. Some guidance from TI would be appreciated.

UART interrupts triggering while UART_IIR_NO_INT is set is weird enough.
I will check around internally with hardware folks here. Getting an
answer might take time. But this is easily reproducible so I am
optimistic we will get an answer soon.

Regards,
Sekhar

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-24 15:37                   ` Sekhar Nori
  (?)
@ 2016-02-24 15:46                   ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-02-24 15:46 UTC (permalink / raw)
  To: Sekhar Nori, Peter Hurley, John Ogness, Tony Lindgren
  Cc: gregkh, vinod.koul, dan.j.williams, peter.ujfalusi, dmaengine,
	linux-serial, linux-kernel

On 02/24/2016 04:37 PM, Sekhar Nori wrote:
> I don't see the error interrupt count in /proc/interrupts go up although
> the code is returning IRQ_NONE when this happens. I initially thought
> that must be because of the interrupt being IRQF_SHARED. But getting rid
> of IRQF_SHARED still does not lead to error count going up. I need to
> spend some more time to see what is going on.

That error counter goes up if the interrupt controller cannot find an
interrupt number. That means a HW interrupt was raised, but after the
interrupt source is checked (by the GIC) there is none. In this case
we have one: the UART.

> Regards,
> Sekhar

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/4] serial: omap: robustify for high speed transfers
  2016-02-24 15:37                   ` Sekhar Nori
  (?)
  (?)
@ 2016-03-07 20:23                   ` Peter Hurley
  -1 siblings, 0 replies; 18+ messages in thread
From: Peter Hurley @ 2016-03-07 20:23 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: John Ogness, Tony Lindgren, gregkh, vinod.koul, dan.j.williams,
	bigeasy, peter.ujfalusi, dmaengine, linux-serial, linux-kernel

On 02/24/2016 07:37 AM, Sekhar Nori wrote:
> On Wednesday 24 February 2016 08:50 AM, Peter Hurley wrote:
>> Just to follow-up on what Sebastian wrote.
>>
>> As he pointed out, these spurious interrupts are not timeout interrupts.
>> Since IIR_UART[0] == 1, no uart interrupt is pending.
>>
>> As he wrote, these count as spurious interrupts and trigger
>> interrupt shutdown at 100000 (unless acked as uart interrupts).
> 
> Okay, by adding a printk to where the check for UART_IIR_NO_INT is in
> omap_8250_dma_handle_irq(), I do see that UART irq handler is called
> when there is apparently no interrupt.
> 
> I don't see the error interrupt count in /proc/interrupts go up although
> the code is returning IRQ_NONE when this happens. I initially thought
> that must be because of the interrupt being IRQF_SHARED. But getting rid
> of IRQF_SHARED still does not lead to error count going up. I need to
> spend some more time to see what is going on.
> 
>> These spurious interrupts very nearly correspond 1:1 (but not quite)
>> with each dma submission. So, for example, one test run had:
>>
>>     @3Mbaud line rate
>>     195826 submits
>>     195823 completions
>>
>>     195704 spurious interrupts (i.e., interrupts with IIR_UART[0] == 1)
>>          0 RLSI interrupts (no line errors) (IIR_UART == 0xc6)
>>          2 RX timeout interrupts (IIR_UART == 0xcc),
>>            one during i/o test and one at the end of i/o test
>>          6 RDI interrupts (IIR_UART == 0xc4)
>>
>> The spurious interrupts occur with nearly 1:1 correspondence at _all_
>> line rates.
>>
>> Presumably, the 6 RDI interrupts are from too-slow submission of
>> the next DMA and the uart rx fifo has reached rx trigger level already.
>> [NOTE: we should at least be using ping-pong dma buffers for rx so that
>> there is always a next DMA buffer when the current buffer is completed].
>>
>> There is no documentation in any of the OMAP TRMs regarding RDI
>> interrupts while in DMA mode. Some guidance from TI would be appreciated.
> 
> UART interrupts triggering while UART_IIR_NO_INT is set is weird enough.
> I will check around internally with hardware folks here. Getting an
> answer might take time. But this is easily reproducible so I am
> optimistic we will get an answer soon.

Thanks.

Also, after looking over the latest errata for am335x, I was surprised
not to see an erratum for our TX DMA workaround.

Currently, to get memory-to-device DMA to start *on am335x only* requires
writing the 1st byte to the UART fifo to trigger DMA, which is pretty odd.
It's almost as if the TX DMA trigger is edge-triggered rather than
level-triggered.

Let me know if you need more info.


Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-03-07 20:23 UTC | newest]

Thread overview: 18+ messages
2016-01-22 10:27 [PATCH 0/4] serial: omap: robustify for high speed transfers John Ogness
2016-01-22 10:27 ` John Ogness
2016-01-25 18:56 ` Peter Hurley
2016-01-29 16:35   ` John Ogness
2016-02-03  1:21     ` Peter Hurley
2016-02-11 12:02       ` John Ogness
2016-02-11 21:00         ` Tony Lindgren
2016-02-22 15:30           ` John Ogness
2016-02-22 19:38             ` Tony Lindgren
2016-02-23  9:59             ` Sekhar Nori
2016-02-23  9:59               ` Sekhar Nori
2016-02-23 12:43               ` Sebastian Andrzej Siewior
2016-02-23 16:56                 ` Andy Shevchenko
2016-02-24  3:20               ` Peter Hurley
2016-02-24 15:37                 ` Sekhar Nori
2016-02-24 15:37                   ` Sekhar Nori
2016-02-24 15:46                   ` Sebastian Andrzej Siewior
2016-03-07 20:23                   ` Peter Hurley
