From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jiri Slaby <jirislaby@kernel.org>,
"open list:SERIAL DRIVERS" <linux-serial@vger.kernel.org>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx
Date: Wed, 15 Mar 2023 16:47:45 +0200 (EET) [thread overview]
Message-ID: <322489-e120-dcbf-fffc-d9df31d8c499@linux.intel.com> (raw)
In-Reply-To: <00de13b2-0ed9-efe2-e8d8-c9370c04e80b@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 9175 bytes --]
On Tue, 14 Mar 2023, Hans de Goede wrote:
> On 3/14/23 17:55, Ilpo Järvinen wrote:
> > On Tue, 14 Mar 2023, Hans de Goede wrote:
> >> On 3/14/23 12:48, Ilpo Järvinen wrote:
> >>> On Tue, 14 Mar 2023, Hans de Goede wrote:
> >>>> On 3/14/23 11:55, Ilpo Järvinen wrote:
> >>>>> On Tue, 14 Mar 2023, Hans de Goede wrote:
> >>>>>
> >>>>>> I have spend the last couple of days debugging a problem with Bluetooth
> >>>>>> adapters (HCIs) connected over an UART connection on Intel Bay Trail
> >>>>>> and Cherry Trail devices.
> >>>>>>
> >>>>>> After much debugging I found out that sometimes the first byte of
> >>>>>> a received packet (data[0]) would be overwritten with a 0 byte.
> >>>>>>
> >>>>>> E.g. this packet received during init of a BCM4324B3 (1) Bluetooth HCI:
> >>>>>>
> >>>>>> 04 0e 0a 01 79 fc 00 54 fe ff ff 00 00
> >>>>>>
> >>>>>> would sometimes turn into:
> >>>>>>
> >>>>>> 00 0e 0a 01 79 fc 00 54 fe ff ff 00 00
> >>>>>>
> >>>>>> Further investigation revealed that this goes away if I stop
> >>>>>> the dw_dmac module from loading, leading to:
> >>>>>> "dw-apb-uart 80860F0A:00: failed to request DMA"
> >>>>>> and the UART working without DMA support.
> >>>>>>
> >>>>>> Testing various kernels showed that this problem was introduced
> >>>>>> in v5.19, v5.18 - v5.18.19 are fine. An a git bisect points to:
> >>>>>>
> >>>>>> e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA")
> >>>>>>
> >>>>>> And reverting that on top of v6.3-rc2 indeed solves the problem.
> >>>>>
> >>>>>> So it seems that that commit somehow interferes with DMA based
> >>>>
> >>>>> Maybe the the extra interrupt that the tx side change will trigger somehow
> >>>>> causes the confusion on the rx side. So that would be an extra call into
> >>>>> handle_rx_dma() that could either do an extra flush or start DMA Rx that
> >>>>> would not occur w/o that tx side change.
> >>>>
> >>>> That sounds like a likely candidate for causing this yes, as said
> >>>> I'm unfamiliar with the serial-port code, but I did already suspect
> >>>> that the change was causing some extra interrupt which somehow
> >>>> interfered with the rx side.
> >>>>
> >>>>>> The issue has been seen on and the revert has been tested on
> >>>>>> the following HW:
> >>>>>>
> >>>>>> Asus T100TA
> >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
> >>>>>>
> >>>>>> Lenovo Yoga Tablet 2 1051L
> >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
> >>>>>>
> >>>>>> Lenovo Yoga Book X91F
> >>>>>> Soc: Cherry Trail UART: 8086228A WIFI: brcmfmac4356-pcie BT: BCM4356A2
> >>>>>>
> >>>>>> I have more hw which I believe is affected but these are the models
> >>>>>> where I have done detailed testing.
> >>>>>>
> >>>>>> I would be happy to test any patches, or run a kernel with some extra
> >>>>>> debugging added, just let me know what you need to help fixing this.
> > From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Subject: [PATCH] serial: 8250: Prevent starting up DMA Rx on THRI interrupt
> >
> > Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
> > connection failed due corrupted Rx payload. The problem was narrowed
> > down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs
> > despite LSR having DR bit set, which is precondition for attempting to
> > start DMA Rx.
> >
> > This problem became apparent after e8ffbb71f783 ("serial: 8250: use
> > THRE & __stop_tx also with DMA") changed Tx stopping to get triggered
> > using THRI interrupt.
> >
> > Don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
> > interrupt which has Rx related IIR value.
> >
> > Reported-by: Hans de Goede <hdegoede@redhat.com>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
> I can confirm that this fixes things for me:
>
> Tested-by: Hans de Goede <hdegoede@redhat.com>
Okay, thanks for testing.
Here's is a slightly improved debug patch which will count how many
characters are received by DMA and non-DMA rx. It should be tested
WITHOUT the fix.
I'm mostly interested in knowing if it's purely DMA Rx issue or whether
the non-DMA rx is muddling the waters too. While investigating other
issues I've seen UART_IIR_TIMEOUT (inter-character timeout) to often come
so early from UART that the tail of characters is left there to be handled
by the non-DMA path.
From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Subject: [PATCH] DEBUG: Dma rx-problem
A DEBUG PATCH, not intended for upstream inclusing.
---
drivers/tty/serial/8250/8250_dma.c | 17 ++++++++++++-----
drivers/tty/serial/8250/8250_port.c | 19 +++++++++++++++++++
include/linux/serial_8250.h | 3 +++
3 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7fa66501792d..705c26941429 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -32,13 +32,15 @@ static void __dma_tx_complete(void *param)
uart_write_wakeup(&p->port);
ret = serial8250_tx_dma(p);
- if (ret || !dma->tx_running)
+ if (ret || !dma->tx_running) {
+ p->irq_debug = -5;
serial8250_set_THRI(p);
+ }
spin_unlock_irqrestore(&p->port.lock, flags);
}
-static void __dma_rx_complete(struct uart_8250_port *p)
+static int __dma_rx_complete(struct uart_8250_port *p)
{
struct uart_8250_dma *dma = p->dma;
struct tty_port *tty_port = &p->port.state->port;
@@ -53,7 +55,7 @@ static void __dma_rx_complete(struct uart_8250_port *p)
*/
dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
if (dma_status == DMA_IN_PROGRESS)
- return;
+ return -1;
count = dma->rx_size - state.residue;
@@ -62,6 +64,8 @@ static void __dma_rx_complete(struct uart_8250_port *p)
dma->rx_running = 0;
tty_flip_buffer_push(tty_port);
+
+ return count;
}
static void dma_rx_complete(void *param)
@@ -69,10 +73,11 @@ static void dma_rx_complete(void *param)
struct uart_8250_port *p = param;
struct uart_8250_dma *dma = p->dma;
unsigned long flags;
+ int count = -1;
spin_lock_irqsave(&p->port.lock, flags);
if (dma->rx_running)
- __dma_rx_complete(p);
+ count = __dma_rx_complete(p);
/*
* Cannot be combined with the previous check because __dma_rx_complete()
@@ -81,6 +86,8 @@ static void dma_rx_complete(void *param)
if (!dma->rx_running && (serial_lsr_in(p) & UART_LSR_DR))
p->dma->rx_dma(p);
spin_unlock_irqrestore(&p->port.lock, flags);
+
+ pr_err("8250dma: rx complete %d\n", count);
}
int serial8250_tx_dma(struct uart_8250_port *p)
@@ -172,7 +179,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p)
if (dma->rx_running) {
dmaengine_pause(dma->rxchan);
- __dma_rx_complete(p);
+ p->dma_rx_count = __dma_rx_complete(p);
dmaengine_terminate_async(dma->rxchan);
}
}
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fa43df05342b..0c950ff7a8ee 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1811,6 +1811,7 @@ u16 serial8250_rx_chars(struct uart_8250_port *up, u16 lsr)
} while (lsr & (UART_LSR_DR | UART_LSR_BI));
tty_flip_buffer_push(&port->state->port);
+ up->rx_chars_count = 256 - max_count;
return lsr;
}
EXPORT_SYMBOL_GPL(serial8250_rx_chars);
@@ -1923,6 +1924,10 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
struct uart_8250_port *up = up_to_u8250p(port);
bool skip_rx = false;
unsigned long flags;
+ long irq_debug;
+ unsigned int tx_running = 0, rx_running = 0, tx_err = 0;
+ unsigned int ier;
+ int rx_count, dma_count;
u16 status;
if (iir & UART_IIR_NO_INT)
@@ -1931,6 +1936,14 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
spin_lock_irqsave(&port->lock, flags);
status = serial_lsr_in(up);
+ up->dma_rx_count = 0;
+ up->rx_chars_count = 0;
+ if (up->dma) {
+ rx_running = up->dma->rx_running;
+ tx_running = up->dma->tx_running;
+ tx_err = up->dma->tx_err;
+ }
+ ier = up->ier;
/*
* If port is stopped and there are no error conditions in the
@@ -1957,7 +1970,13 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
__stop_tx(up);
}
+ irq_debug = up->irq_debug++;
+ rx_count = up->rx_chars_count;
+ dma_count = up->dma_rx_count;
uart_unlock_and_check_sysrq_irqrestore(port, flags);
+ if (irq_debug < 0)
+ pr_err("8250irq: iir=%02x lsr+saved=%02x received=%d/%d ier=%02x dma_t/rx/err=%u/%u/%u cnt=%ld\n",
+ iir, status, dma_count, rx_count, ier, tx_running, rx_running, tx_err, irq_debug);
return 1;
}
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index 741ed4807a9c..4a0c0d549d7b 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -107,6 +107,9 @@ struct uart_8250_port {
unsigned char mcr;
unsigned char cur_iotype; /* Running I/O type */
unsigned int rpm_tx_active;
+ long irq_debug;
+ int rx_chars_count;
+ int dma_rx_count;
unsigned char canary; /* non-zero during system sleep
* if no_console_suspend
*/
--
tg: (571079f5ba93..) debug/dma-rx-problem (depends on: tty-next)
next prev parent reply other threads:[~2023-03-15 14:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-14 10:20 [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx Hans de Goede
2023-03-14 10:55 ` Ilpo Järvinen
2023-03-14 11:11 ` Hans de Goede
2023-03-14 11:48 ` Ilpo Järvinen
2023-03-14 16:04 ` Hans de Goede
2023-03-14 16:55 ` Ilpo Järvinen
2023-03-14 19:02 ` Hans de Goede
2023-03-15 14:47 ` Ilpo Järvinen [this message]
2023-03-16 20:42 ` Hans de Goede
2023-03-17 10:38 ` Ilpo Järvinen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=322489-e120-dcbf-fffc-d9df31d8c499@linux.intel.com \
--to=ilpo.jarvinen@linux.intel.com \
--cc=gregkh@linuxfoundation.org \
--cc=hdegoede@redhat.com \
--cc=jirislaby@kernel.org \
--cc=linux-serial@vger.kernel.org \
--cc=regressions@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).