regressions.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Hans de Goede <hdegoede@redhat.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jirislaby@kernel.org>,
	"open list:SERIAL DRIVERS" <linux-serial@vger.kernel.org>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx
Date: Thu, 16 Mar 2023 21:42:59 +0100	[thread overview]
Message-ID: <7ef0037f-c049-b783-b019-c8453801555e@redhat.com> (raw)
In-Reply-To: <322489-e120-dcbf-fffc-d9df31d8c499@linux.intel.com>

Hi,

On 3/15/23 15:47, Ilpo Järvinen wrote:
> On Tue, 14 Mar 2023, Hans de Goede wrote:
>> On 3/14/23 17:55, Ilpo Järvinen wrote:
>>> On Tue, 14 Mar 2023, Hans de Goede wrote:
>>>> On 3/14/23 12:48, Ilpo Järvinen wrote:
>>>>> On Tue, 14 Mar 2023, Hans de Goede wrote:
>>>>>> On 3/14/23 11:55, Ilpo Järvinen wrote:
>>>>>>> On Tue, 14 Mar 2023, Hans de Goede wrote:
>>>>>>>
>>>>>>>> I have spend the last couple of days debugging a problem with Bluetooth
>>>>>>>> adapters (HCIs) connected over an UART connection on Intel Bay Trail
>>>>>>>> and Cherry Trail devices.
>>>>>>>>
>>>>>>>> After much debugging I found out that sometimes the first byte of
>>>>>>>> a received packet (data[0]) would be overwritten with a 0 byte.
>>>>>>>>
>>>>>>>> E.g. this packet received during init of a BCM4324B3 (1) Bluetooth HCI:
>>>>>>>>
>>>>>>>> 04 0e 0a 01 79 fc 00 54 fe ff ff 00 00
>>>>>>>>
>>>>>>>> would sometimes turn into:
>>>>>>>>
>>>>>>>> 00 0e 0a 01 79 fc 00 54 fe ff ff 00 00
>>>>>>>>
>>>>>>>> Further investigation revealed that this goes away if I stop
>>>>>>>> the dw_dmac module from loading, leading to:
>>>>>>>> "dw-apb-uart 80860F0A:00: failed to request DMA"
>>>>>>>> and the UART working without DMA support.
>>>>>>>>
>>>>>>>> Testing various kernels showed that this problem was introduced
>>>>>>>> in v5.19, v5.18 - v5.18.19 are fine. An a git bisect points to:
>>>>>>>>
>>>>>>>> e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA")
>>>>>>>>
>>>>>>>> And reverting that on top of v6.3-rc2 indeed solves the problem.
>>>>>>>
>>>>>>>> So it seems that that commit somehow interferes with DMA based
>>>>>>
>>>>>>> Maybe the the extra interrupt that the tx side change will trigger somehow 
>>>>>>> causes the confusion on the rx side. So that would be an extra call into 
>>>>>>> handle_rx_dma() that could either do an extra flush or start DMA Rx that 
>>>>>>> would not occur w/o that tx side change.
>>>>>>
>>>>>> That sounds like a likely candidate for causing this yes, as said
>>>>>> I'm unfamiliar with the serial-port code, but I did already suspect
>>>>>> that the change was causing some extra interrupt which somehow
>>>>>> interfered with the rx side.
>>>>>>
>>>>>>>> The issue has been seen on and the revert has been tested on
>>>>>>>> the following HW:
>>>>>>>>
>>>>>>>> Asus T100TA
>>>>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
>>>>>>>>
>>>>>>>> Lenovo Yoga Tablet 2 1051L
>>>>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
>>>>>>>>
>>>>>>>> Lenovo Yoga Book X91F
>>>>>>>> Soc: Cherry Trail UART: 8086228A WIFI: brcmfmac4356-pcie BT: BCM4356A2
>>>>>>>>
>>>>>>>> I have more hw which I believe is affected but these are the models
>>>>>>>> where I have done detailed testing.
>>>>>>>>
>>>>>>>> I would be happy to test any patches, or run a kernel with some extra
>>>>>>>> debugging added, just let me know what you need to help fixing this.
> 
>>> From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>> Subject: [PATCH] serial: 8250: Prevent starting up DMA Rx on THRI interrupt
>>>
>>> Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
>>> connection failed due corrupted Rx payload. The problem was narrowed
>>> down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs 
>>> despite LSR having DR bit set, which is precondition for attempting to 
>>> start DMA Rx.
>>>
>>> This problem became apparent after e8ffbb71f783 ("serial: 8250: use
>>> THRE & __stop_tx also with DMA") changed Tx stopping to get triggered
>>> using THRI interrupt.
>>>
>>> Don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
>>> interrupt which has Rx related IIR value.
>>>
>>> Reported-by: Hans de Goede <hdegoede@redhat.com>
>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>
>> I can confirm that this fixes things for me:
>>
>> Tested-by: Hans de Goede <hdegoede@redhat.com>
> 
> Okay, thanks for testing.
> 
> Here's is a slightly improved debug patch which will count how many 
> characters are received by DMA and non-DMA rx. It should be tested 
> WITHOUT the fix.
> 
> I'm mostly interested in knowing if it's purely DMA Rx issue or whether 
> the non-DMA rx is muddling the waters too. While investigating other 
> issues I've seen UART_IIR_TIMEOUT (inter-character timeout) to often come 
> so early from UART that the tail of characters is left there to be handled 
> by the non-DMA path.

Ok, here is a new log:

https://fedorapeople.org/~jwrdegoede/dmesg-8250-dma-issue-2

With one successful hci_uart probe and one failed. The failed one looks interesting:

[  576.808776] 8250irq: iir=cc lsr+saved=60 received=1/12 ier=0d dma_t/rx/err=0/1/0 cnt=-4
[  576.808870] Bluetooth: hci0: Frame reassembly failed (-84)

This is the only time that debug message shows anything but 0 for data received over dma. It looks like for all these small packets dma simply never kicks in and the extra IRQ from the THRI interrupt somehow starts a DMA transfer for the one byte and for some reason that DMA transfer always reads the byte as 0.

Note that the amount of received bytes is still correct, so the 1 byte transfer by DMA replaces one byte which would be otherwise read from MMIO (I guess its MMIO?), but it has the wrong (all bits 0) content.

Regards,

Hans



  reply	other threads:[~2023-03-16 20:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-14 10:20 [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx Hans de Goede
2023-03-14 10:55 ` Ilpo Järvinen
2023-03-14 11:11   ` Hans de Goede
2023-03-14 11:48     ` Ilpo Järvinen
2023-03-14 16:04       ` Hans de Goede
2023-03-14 16:55         ` Ilpo Järvinen
2023-03-14 19:02           ` Hans de Goede
2023-03-15 14:47             ` Ilpo Järvinen
2023-03-16 20:42               ` Hans de Goede [this message]
2023-03-17 10:38                 ` Ilpo Järvinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ef0037f-c049-b783-b019-c8453801555e@redhat.com \
    --to=hdegoede@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=jirislaby@kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).