linux-usb.vger.kernel.org archive mirror
From: Hans de Goede <hdegoede@redhat.com>
To: Christoph Hellwig <hch@lst.de>, Tom Yan <tom.ty89@gmail.com>
Cc: Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb <linux-usb@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-pci@vger.kernel.org
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on thunderbolt connected XHCI controller
Date: Fri, 27 Nov 2020 13:32:16 +0100
Message-ID: <8a52e868-0ca1-55b7-5ad2-ddb0cbb5e45d@redhat.com>
In-Reply-To: <fde7e11f-5dfc-8348-c134-a21cb1116285@redhat.com>

Hi,

On 11/27/20 12:41 PM, Hans de Goede wrote:
> Hi,
> 
> On 11/24/20 11:27 AM, Christoph Hellwig wrote:
>> On Mon, Nov 23, 2020 at 03:49:09PM +0100, Hans de Goede wrote:
>>> Hi,
>>>
>>> +Cc Christoph Hellwig <hch@lst.de>
>>>
>>> Christoph, this is still an issue, so I've been looking around a bit and think this
>>> might have something to do with the dma-mapping-5.10 changes.
>>>
>>> Do you have any suggestions to debug this, or is it time to do a git bisect
>>> on this before 5.10 ships with regression?
>>
>> Given the DMAR prefix, this seems to be about using intel-iommu + bounce
>> buffering for external devices.  I can't really think of anything specific
>> in 5.10 related to that, so maybe you'll need to bisect.
>>
>> I doubt this means we are actually leaking swiotlb buffers, so while
>> I'm pretty sure we broke something in lower layers this also means
>> xhci doesn't handle swiotlb operation very gracefully in general.
> 
> I've done a git bisect, and the result is somewhat surprising. The git-bisect
> points to:
> 
> commit 558033c2828f ("uas: fix sdev->host->dma_dev")
> 
>     Use scsi_add_host_with_dma() instead of scsi_add_host().
>     
>     When the scsi request queue is initialized/allocated, hw_max_sectors is clamped
>     to the dma max mapping size. Therefore, the correct device that should be used
>     for the clamping needs to be set.
>     
>     The same clamping is still needed in uas as hw_max_sectors could be changed
>     there. The original clamping would be invalidated in such cases.
> 
> I do have a UAS drive connected to the thunderbolt-dock, so I guess that this
> change is causing the UAS driver to gobble up all available swiotlb space.

I ran some more tests and can confirm that reverting:

5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
558033c2828f "uas: fix sdev->host->dma_dev"

makes the problem go away while running a 5.10 kernel. I also tried doubling
the swiotlb size by adding swiotlb=65536 to the kernel command line, but that
does not help.

Some more observations:

1. The usb-storage driver does not cause this issue, even though it has a
very similar change.

2. The problem does not happen until I plug a UAS device into the dock.

3. The problem continues to happen even after I unplug the UAS device and
rmmod the uas module.

Observation 3 made me take a closer look at the troublesome commit. It passes
udev->bus->sysdev, which I assume is the XHCI controller itself, as the device
to scsi_add_host_with_dma(), which in turn seems to cause permanent changes
to the DMA settings of the XHCI controller. I'm not all that familiar with
the DMA APIs, but I'm getting the feeling that passing the actual XHCI controller's
device as the dma-device to scsi_add_host_with_dma() is simply the wrong thing to
do, and that the intended effects (honor the XHCI DMA limits, but do not cause
any changes to the XHCI DMA settings) should be achieved differently.

Note that if this is indeed wrong, the matching usb-storage change should
likely also be dropped.

Regards,

Hans

