linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* MT76x2U crashes XHCI driver on AMD Ryzen system
@ 2019-01-10 22:55 Rosen Penev
  2019-01-11 17:29 ` Lorenzo Bianconi
  0 siblings, 1 reply; 33+ messages in thread
From: Rosen Penev @ 2019-01-10 22:55 UTC (permalink / raw)
  To: linux-wireless; +Cc: lorenzo.bianconi

I have an ALFA AWUS036ACM and whenever I plug it in, the XHCI driver
dies. This is on an ASRock X370 motherboard. Kernel 4.20 Arch Linux.

After plugging in the adapter, the integrated wireless chip (Intel
3168) also dies. Disabling the motherboard wireless does not help.
Here are two fairly similar logs. First with Intel 3168 and the second
with it disabled in UEFI.

kernel: xhci_hcd 0000:03:00.0: xHCI host not responding to stop
endpoint command.
kernel: xhci_hcd 0000:03:00.0: xHCI host controller not responding, assume dead
kernel: xhci_hcd 0000:03:00.0: HC died; cleaning up
kernel: usb 1-9: USB disconnect, device number 2
kernel: usb 1-14: USB disconnect, device number 3
bluetoothd[506]: Endpoint unregistered: sender=:1.151
path=/MediaEndpoint/A2DPSource
bluetoothd[506]: Endpoint unregistered: sender=:1.151
path=/MediaEndpoint/A2DPSink
systemd[1]: Starting Load/Save RF Kill Switch Status...
NetworkManager[512]: <info>  [1547157527.0255] bluez5: NAP: removed
interface B0:35:9F:2D:73:1D
systemd[1]: Stopped target Bluetooth.
systemd[1]: Started Load/Save RF Kill Switch Status.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
kkuid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
kernel: audit: type=1131 audit(1547157532.035:57): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
kernel: iwlwifi 0000:27:00.0: iwl_pcie_cmdq_reclaim: Read index for
DMA queue txq id (0), index 0 is out of range [0-256] 61 60.
iwlwifi 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x00000000ff442000 flags=0x0020]
kernel: iwlwifi 0000:27:00.0: HCMD_ACTIVE already clear for command
REPLY_SF_CFG_CMD
iwlwifi 0000:27:00.0: iwl_pcie_cmdq_reclaim: Read index for DMA queue
txq id (0), index 0 is out of range [0-256] 61 60.

Folllowed by more iwlwifi errors as well as kernel crashdumps.

Without iwlwifi:

kernel: usb 1-14: new high-speed USB device number 3 using xhci_hcd
kernel: usb 1-14: New USB device found, idVendor=0e8d, idProduct=7612,
bcdDevice= 1.00
kernel: usb 1-14: New USB device strings: Mfr=2, Product=3, SerialNumber=4
usb 1-14: Product: Wireless
kernel: usb 1-14: Manufacturer: MediaTek Inc.
kernel: usb 1-14: SerialNumber: 000000000
mtp-probe[1400]: checking bus 1, device 3:
"/sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-14"
mtp-probe[1400]: bus: 1, device: 3 was not an MTP device
kernel: usb 1-14: reset high-speed USB device number 3 using xhci_hcd
kernel: mt76x2u 1-14:1.0: ASIC revision: 76120044
kernel: mt76x2u 1-14:1.0: ROM patch build: 20140408060640a
kernel: xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x00000000f6d5c400 flags=0x0000]
kernel: mt76x2u 1-14:1.0: firmware upload timed out
kernel: xhci_hcd 0000:03:00.0: xHCI host not responding to stop
endpoint command.
kernel: xhci_hcd 0000:03:00.0: xHCI host controller not responding, assume dead
kernel: xhci_hcd 0000:03:00.0: HC died; cleaning up
kernel: usb 1-9: USB disconnect, device number 2
bluetoothd[489]: Endpoint unregistered: sender=:1.150
path=/MediaEndpoint/A2DPSource
kernel: usb 1-14: USB disconnect, device number 3
NetworkManager[496]: <info>  [1547157974.3360] bluez5: NAP: removed
interface B0:35:9F:2D:73:1D
bluetoothd[489]: Endpoint unregistered: sender=:1.150
path=/MediaEndpoint/A2DPSink
systemd[1]: Starting Load/Save RF Kill Switch Status...
systemd[1]: Stopped target Bluetooth.
systemd[1]: Started Load/Save RF Kill Switch Status.
audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
kernel: audit: type=1130 audit(1547157974.336:52): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=systemd-localed comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=systemd-hostnamed comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
kernel: audit: type=1131 audit(1547157977.826:53): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=systemd-localed
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
terminal=? res=success'
audit: type=1131 audit(1547157977.826:54): pid=1 uid=0 auid=4294967295
ses=4294967295 msg='unit=systemd-hostnamed comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
kernel: audit: type=1131 audit(1547157979.346:55): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
kernel: mt76x2u 1-14:1.0: MAC RX failed to stop
kernel: mt76x2u: probe of 1-14:1.0 failed with error -5
kernel: usbcore: registered new interface driver mt76x2u
geoclue[785]: Service not used for 60 seconds. Shutting down..
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295
msg='unit=geoclue comm="systemd" exe="/usr/lib/systemd/systemd"
hostname=? addr=? terminal=? res=success'
kernel: audit: type=1131 audit(1547157990.969:56): pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=geoclue comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-10 22:55 MT76x2U crashes XHCI driver on AMD Ryzen system Rosen Penev
@ 2019-01-11 17:29 ` Lorenzo Bianconi
  2019-01-11 19:01   ` Rosen Penev
  0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2019-01-11 17:29 UTC (permalink / raw)
  To: Rosen Penev; +Cc: linux-wireless

>
> I have an ALFA AWUS036ACM and whenever I plug it in, the XHCI driver
> dies. This is on an ASRock X370 motherboard. Kernel 4.20 Arch Linux.
>
> After plugging in the adapter, the integrated wireless chip (Intel
> 3168) also dies. Disabling the motherboard wireless does not help.
> Here are two fairly similar logs. First with Intel 3168 and the second
> with it disabled in UEFI.
>

[...]

> Without iwlwifi:
>
> kernel: usb 1-14: new high-speed USB device number 3 using xhci_hcd
> kernel: usb 1-14: New USB device found, idVendor=0e8d, idProduct=7612,
> bcdDevice= 1.00
> kernel: usb 1-14: New USB device strings: Mfr=2, Product=3, SerialNumber=4
> usb 1-14: Product: Wireless
> kernel: usb 1-14: Manufacturer: MediaTek Inc.
> kernel: usb 1-14: SerialNumber: 000000000
> mtp-probe[1400]: checking bus 1, device 3:
> "/sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-14"
> mtp-probe[1400]: bus: 1, device: 3 was not an MTP device
> kernel: usb 1-14: reset high-speed USB device number 3 using xhci_hcd
> kernel: mt76x2u 1-14:1.0: ASIC revision: 76120044
> kernel: mt76x2u 1-14:1.0: ROM patch build: 20140408060640a
> kernel: xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT

Are you using the driver in a VM or directly in the host? This is an
error reported by AMD iommu driver

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-11 17:29 ` Lorenzo Bianconi
@ 2019-01-11 19:01   ` Rosen Penev
  2019-01-13 13:33     ` Lorenzo Bianconi
  0 siblings, 1 reply; 33+ messages in thread
From: Rosen Penev @ 2019-01-11 19:01 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: linux-wireless



On Jan 11, 2019, at 09:29, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:

>> 
>> I have an ALFA AWUS036ACM and whenever I plug it in, the XHCI driver
>> dies. This is on an ASRock X370 motherboard. Kernel 4.20 Arch Linux.
>> 
>> After plugging in the adapter, the integrated wireless chip (Intel
>> 3168) also dies. Disabling the motherboard wireless does not help.
>> Here are two fairly similar logs. First with Intel 3168 and the second
>> with it disabled in UEFI.
>> 
> 
> [...]
> 
>> Without iwlwifi:
>> 
>> kernel: usb 1-14: new high-speed USB device number 3 using xhci_hcd
>> kernel: usb 1-14: New USB device found, idVendor=0e8d, idProduct=7612,
>> bcdDevice= 1.00
>> kernel: usb 1-14: New USB device strings: Mfr=2, Product=3, SerialNumber=4
>> usb 1-14: Product: Wireless
>> kernel: usb 1-14: Manufacturer: MediaTek Inc.
>> kernel: usb 1-14: SerialNumber: 000000000
>> mtp-probe[1400]: checking bus 1, device 3:
>> "/sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-14"
>> mtp-probe[1400]: bus: 1, device: 3 was not an MTP device
>> kernel: usb 1-14: reset high-speed USB device number 3 using xhci_hcd
>> kernel: mt76x2u 1-14:1.0: ASIC revision: 76120044
>> kernel: mt76x2u 1-14:1.0: ROM patch build: 20140408060640a
>> kernel: xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
> 
> Are you using the driver in a VM or directly in the host? This is an
> error reported by AMD iommu driver
Direct. No VM used. This is the only peripheral causing this issue.
> 
> Regards,
> Lorenzo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-11 19:01   ` Rosen Penev
@ 2019-01-13 13:33     ` Lorenzo Bianconi
       [not found]       ` <1547404075.1582.0@smtp.gmail.com>
  0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2019-01-13 13:33 UTC (permalink / raw)
  To: Rosen Penev; +Cc: linux-wireless

>
> Direct. No VM used. This is the only peripheral causing this issue.

Is the device connected to a usb3.0 port? If so, could you please try
to connect the dongle to a 2.0 one?
Could you please double check if IOMMU is enabled?

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
       [not found]       ` <1547404075.1582.0@smtp.gmail.com>
@ 2019-01-13 19:00         ` Lorenzo Bianconi
  2019-01-14  2:20           ` Rosen Penev
  0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2019-01-13 19:00 UTC (permalink / raw)
  To: Rosen Penev; +Cc: linux-wireless

>
>
> On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
>
> Direct. No VM used. This is the only peripheral causing this issue.
>
> Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
>
> I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
>

mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)

> Could you please double check if IOMMU is enabled?
>

Have you tried to disable it? Does it make any difference?

Regards,
Lorenzo

> It is.
>
> Note that this does not happen on kernel 4.15 (Ubuntu's kernel) as mt76 lacks support.
>
> Regards, Lorenzo
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-13 19:00         ` Lorenzo Bianconi
@ 2019-01-14  2:20           ` Rosen Penev
  2019-01-14  3:13             ` Samuel Sieb
  2019-01-14  9:18             ` Lorenzo Bianconi
  0 siblings, 2 replies; 33+ messages in thread
From: Rosen Penev @ 2019-01-14  2:20 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: linux-wireless

On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> >
> >
> > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> >
> > Direct. No VM used. This is the only peripheral causing this issue.
> >
> > Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
> >
> > I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
> >
>
> mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
Tried a USB 3 port. Same result.
>
> > Could you please double check if IOMMU is enabled?
> >
>
> Have you tried to disable it? Does it make any difference?
No idea how. UEFI doesn't seem to show anything similar.

Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241
>
> Regards,
> Lorenzo
>
> > It is.
> >
> > Note that this does not happen on kernel 4.15 (Ubuntu's kernel) as mt76 lacks support.
> >
> > Regards, Lorenzo
> >
> >
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-14  2:20           ` Rosen Penev
@ 2019-01-14  3:13             ` Samuel Sieb
  2019-01-14  9:18             ` Lorenzo Bianconi
  1 sibling, 0 replies; 33+ messages in thread
From: Samuel Sieb @ 2019-01-14  3:13 UTC (permalink / raw)
  To: linux-wireless

On 1/13/19 6:20 PM, Rosen Penev wrote:
> On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
>>> Could you please double check if IOMMU is enabled?
>>>
>>
>> Have you tried to disable it? Does it make any difference?
> No idea how. UEFI doesn't seem to show anything similar.
> 
> Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241

As mentioned in that bug report, adding "iommu=off" to the kernel 
command line will disable it.  I think there's also an "iommu=soft" 
option, but I'm not sure what that does.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-14  2:20           ` Rosen Penev
  2019-01-14  3:13             ` Samuel Sieb
@ 2019-01-14  9:18             ` Lorenzo Bianconi
  2019-01-14  9:22               ` Tom Psyborg
  2019-01-14 20:06               ` Rosen Penev
  1 sibling, 2 replies; 33+ messages in thread
From: Lorenzo Bianconi @ 2019-01-14  9:18 UTC (permalink / raw)
  To: Rosen Penev; +Cc: linux-wireless

> On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > >
> > >
> > > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > Direct. No VM used. This is the only peripheral causing this issue.
> > >
> > > Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
> > >
> > > I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
> > >
> >
> > mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
> Tried a USB 3 port. Same result.
> >
> > > Could you please double check if IOMMU is enabled?
> > >
> >
> > Have you tried to disable it? Does it make any difference?
> No idea how. UEFI doesn't seem to show anything similar.
> 
> Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241

You should be able to disable iommu using GRUB_CMDLINE_LINUX in
/etc/default/grub (I guess setting iommu=off and reinstalling grub)
https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-14  9:18             ` Lorenzo Bianconi
@ 2019-01-14  9:22               ` Tom Psyborg
  2019-01-14 20:06               ` Rosen Penev
  1 sibling, 0 replies; 33+ messages in thread
From: Tom Psyborg @ 2019-01-14  9:22 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: Rosen Penev, linux-wireless

On 14/01/2019, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
>> On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
>> <lorenzo.bianconi@redhat.com> wrote:
>> >
>> > >
>> > >
>> > > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi
>> > > <lorenzo.bianconi@redhat.com> wrote:
>> > >
>> > > Direct. No VM used. This is the only peripheral causing this issue.
>> > >
>> > > Is the device connected to a usb3.0 port? If so, could you please try
>> > > to connect the dongle to a 2.0 one?
>> > >
>> > > I tried through a USB 2.0 port. Shouldn't make a difference as they
>> > > both use the xhci driver.
>> > >
>> >
>> > mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
>> Tried a USB 3 port. Same result.
>> >
>> > > Could you please double check if IOMMU is enabled?
>> > >
>> >
>> > Have you tried to disable it? Does it make any difference?
>> No idea how. UEFI doesn't seem to show anything similar.
>>
>> Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241
>
> You should be able to disable iommu using GRUB_CMDLINE_LINUX in
> /etc/default/grub (I guess setting iommu=off and reinstalling grub)
> https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB
>
> Regards,
> Lorenzo
>

updating grub, not reinstalling

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-14  9:18             ` Lorenzo Bianconi
  2019-01-14  9:22               ` Tom Psyborg
@ 2019-01-14 20:06               ` Rosen Penev
  2019-01-15  9:04                 ` Lorenzo Bianconi
  1 sibling, 1 reply; 33+ messages in thread
From: Rosen Penev @ 2019-01-14 20:06 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: linux-wireless

On Mon, Jan 14, 2019 at 1:18 AM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> > On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
> > <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > >
> > > >
> > > > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > > >
> > > > Direct. No VM used. This is the only peripheral causing this issue.
> > > >
> > > > Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
> > > >
> > > > I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
> > > >
> > >
> > > mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
> > Tried a USB 3 port. Same result.
> > >
> > > > Could you please double check if IOMMU is enabled?
> > > >
> > >
> > > Have you tried to disable it? Does it make any difference?
> > No idea how. UEFI doesn't seem to show anything similar.
> >
> > Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241
>
> You should be able to disable iommu using GRUB_CMDLINE_LINUX in
> /etc/default/grub (I guess setting iommu=off and reinstalling grub)
> https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB
Yep. Working great now. I wonder what mt76 is doing to cause the crash though...
>
> Regards,
> Lorenzo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-14 20:06               ` Rosen Penev
@ 2019-01-15  9:04                 ` Lorenzo Bianconi
  2019-02-18 14:37                   ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2019-01-15  9:04 UTC (permalink / raw)
  To: Rosen Penev; +Cc: linux-wireless

> On Mon, Jan 14, 2019 at 1:18 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > > On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
> > > <lorenzo.bianconi@redhat.com> wrote:
> > > >
> > > > >
> > > > >
> > > > > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > > > >
> > > > > Direct. No VM used. This is the only peripheral causing this issue.
> > > > >
> > > > > Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
> > > > >
> > > > > I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
> > > > >
> > > >
> > > > mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
> > > Tried a USB 3 port. Same result.
> > > >
> > > > > Could you please double check if IOMMU is enabled?
> > > > >
> > > >
> > > > Have you tried to disable it? Does it make any difference?
> > > No idea how. UEFI doesn't seem to show anything similar.
> > >
> > > Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241
> >
> > You should be able to disable iommu using GRUB_CMDLINE_LINUX in
> > /etc/default/grub (I guess setting iommu=off and reinstalling grub)
> > https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB
> Yep. Working great now. I wonder what mt76 is doing to cause the crash though...

Thanks for bisecting the issue. I think amd iommu does not support well usb scatter-gather
(used by default in mt76u). I am working on a series in order to add the possibility to
disable it.

Regards,
Lorenzo

> >
> > Regards,
> > Lorenzo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-01-15  9:04                 ` Lorenzo Bianconi
@ 2019-02-18 14:37                   ` Stanislaw Gruszka
  2019-02-18 15:15                     ` Lorenzo Bianconi
                                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-18 14:37 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Rosen Penev, linux-wireless, Samuel Sieb, Alexander Duyck, iommu,
	Joerg Roedel, linux-kernel

(cc: IOMMU & page_frag_alloc maintainers)

On Tue, Jan 15, 2019 at 10:04:01AM +0100, Lorenzo Bianconi wrote:
> > On Mon, Jan 14, 2019 at 1:18 AM Lorenzo Bianconi
> > <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > > On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
> > > > <lorenzo.bianconi@redhat.com> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > > > > >
> > > > > > Direct. No VM used. This is the only peripheral causing this issue.
> > > > > >
> > > > > > Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
> > > > > >
> > > > > > I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
> > > > > >
> > > > >
> > > > > mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
> > > > Tried a USB 3 port. Same result.
> > > > >
> > > > > > Could you please double check if IOMMU is enabled?
> > > > > >
> > > > >
> > > > > Have you tried to disable it? Does it make any difference?
> > > > No idea how. UEFI doesn't seem to show anything similar.
> > > >
> > > > Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241

FWIW: I provided some patches in the bugzilla, which were reported to
solve the problem. But I looking for confirmation if both are needed:

0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

Or problem can be solved by just one of it (either first or second).

Additionally I'm not 100% sure if

0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

is correct. So perhaps some IOMMU maintainer could look at it.

> > > You should be able to disable iommu using GRUB_CMDLINE_LINUX in
> > > /etc/default/grub (I guess setting iommu=off and reinstalling grub)
> > > https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB
> > Yep. Working great now. I wonder what mt76 is doing to cause the crash though...
> 
> Thanks for bisecting the issue. 

Lorenzo, what you mean by 'bisecting' here ? Someone did 'git bisect'
on this issue?

> I think amd iommu does not support well usb scatter-gather
> (used by default in mt76u). I am working on a series in order to add the possibility to
> disable it.

Even if that true that AMD IOMMU does not support 'well' SG (what I think
is not true) disabling SG in mt76 driver is not right solution. Right
solution would be propagate the issue to AMD IOMMU maintainers
(already CCed).

One problem in mt76 is page_frag_alloc() usage with different sizes.
page_frag_alloc() unlike like other allocators do not assure alignment
and relay on callers to provide buffers sizes that are aligned.
Unaligned buffer might then not be appropriate for DMA.

Another issue is that dma_map_sg() & dma_map_page() may require some 
constraints. I'm not sure about that and I want to clarify that with 
CCed mm maintainers. I think DMA drivers may expect sg->offset < PAGE_SIZE
for both dma_map_sg() and dma_map_page(). Additionally dma_map_page()
maight expect that offset & length specify buffer within one page.

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-18 14:37                   ` Stanislaw Gruszka
@ 2019-02-18 15:15                     ` Lorenzo Bianconi
  2019-02-18 17:01                     ` Robin Murphy
  2019-02-26 10:05                     ` Joerg Roedel
  2 siblings, 0 replies; 33+ messages in thread
From: Lorenzo Bianconi @ 2019-02-18 15:15 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Rosen Penev, linux-wireless, Samuel Sieb, Alexander Duyck, iommu,
	Joerg Roedel, linux-kernel

> (cc: IOMMU & page_frag_alloc maintainers)
> 
> On Tue, Jan 15, 2019 at 10:04:01AM +0100, Lorenzo Bianconi wrote:
> > > On Mon, Jan 14, 2019 at 1:18 AM Lorenzo Bianconi
> > > <lorenzo.bianconi@redhat.com> wrote:
> > > >
> > > > > On Sun, Jan 13, 2019 at 11:00 AM Lorenzo Bianconi
> > > > > <lorenzo.bianconi@redhat.com> wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jan 13, 2019 at 5:33 AM, Lorenzo Bianconi <lorenzo.bianconi@redhat.com> wrote:
> > > > > > >
> > > > > > > Direct. No VM used. This is the only peripheral causing this issue.
> > > > > > >
> > > > > > > Is the device connected to a usb3.0 port? If so, could you please try to connect the dongle to a 2.0 one?
> > > > > > >
> > > > > > > I tried through a USB 2.0 port. Shouldn't make a difference as they both use the xhci driver.
> > > > > > >
> > > > > >
> > > > > > mt76x2u supports scatter-gather on usb 3.0 (not on 2.0)
> > > > > Tried a USB 3 port. Same result.
> > > > > >
> > > > > > > Could you please double check if IOMMU is enabled?
> > > > > > >
> > > > > >
> > > > > > Have you tried to disable it? Does it make any difference?
> > > > > No idea how. UEFI doesn't seem to show anything similar.
> > > > >
> > > > > Similar bug report: https://bugzilla.kernel.org/show_bug.cgi?id=202241
> 
> FWIW: I provided some patches in the bugzilla, which were reported to
> solve the problem. But I looking for confirmation if both are needed:
> 
> 0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
> 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
> 
> Or problem can be solved by just one of it (either first or second).
> 
> Additionally I'm not 100% sure if
> 
> 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
> 
> is correct. So perhaps some IOMMU maintainer could look at it.
> 
> > > > You should be able to disable iommu using GRUB_CMDLINE_LINUX in
> > > > /etc/default/grub (I guess setting iommu=off and reinstalling grub)
> > > > https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB
> > > Yep. Working great now. I wonder what mt76 is doing to cause the crash though...
> > 
> > Thanks for bisecting the issue. 
> 
> Lorenzo, what you mean by 'bisecting' here ? Someone did 'git bisect'
> on this issue?
> 

Hi Stanislaw,

I was meaning 'help bisecting' the issue

> > I think amd iommu does not support well usb scatter-gather
> > (used by default in mt76u). I am working on a series in order to add the possibility to
> > disable it.
> 
> Even if that true that AMD IOMMU does not support 'well' SG (what I think
> is not true) disabling SG in mt76 driver is not right solution. Right
> solution would be propagate the issue to AMD IOMMU maintainers
> (already CCed).

I meant that AMD iommu seems to have different constraints respect to Intel
one.

Regards,
Lorenzo

> 
> One problem in mt76 is page_frag_alloc() usage with different sizes.
> page_frag_alloc() unlike like other allocators do not assure alignment
> and relay on callers to provide buffers sizes that are aligned.
> Unaligned buffer might then not be appropriate for DMA.
> 
> Another issue is that dma_map_sg() & dma_map_page() may require some 
> constraints. I'm not sure about that and I want to clarify that with 
> CCed mm maintainers. I think DMA drivers may expect sg->offset < PAGE_SIZE
> for both dma_map_sg() and dma_map_page(). Additionally dma_map_page()
> maight expect that offset & length specify buffer within one page.
> 
> Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-18 14:37                   ` Stanislaw Gruszka
  2019-02-18 15:15                     ` Lorenzo Bianconi
@ 2019-02-18 17:01                     ` Robin Murphy
  2019-02-19 11:08                       ` Stanislaw Gruszka
  2019-02-26 10:05                     ` Joerg Roedel
  2 siblings, 1 reply; 33+ messages in thread
From: Robin Murphy @ 2019-02-18 17:01 UTC (permalink / raw)
  To: Stanislaw Gruszka, Lorenzo Bianconi
  Cc: Samuel Sieb, linux-wireless, linux-kernel, iommu, Rosen Penev,
	Alexander Duyck

On 18/02/2019 14:37, Stanislaw Gruszka wrote:
[...]
> Another issue is that dma_map_sg() & dma_map_page() may require some
> constraints. I'm not sure about that and I want to clarify that with
> CCed mm maintainers. I think DMA drivers may expect sg->offset < PAGE_SIZE
> for both dma_map_sg() and dma_map_page(). Additionally dma_map_page()
> maight expect that offset & length specify buffer within one page.

Luckily, this came up a while back[1] and we seemed to reach a consensus 
that sg->offset >= PAGE_SIZE for dma_map_sg() was weird but valid. IIRC 
it was only the Intel IOMMU code which failed to handle that case 
appropriately (and which I fixed) - the AMD IOMMU code always looked 
like it should be OK, but I'm not sure I've ever seen definitive test 
results (and I don't have hardware to do so myself).

For dma_map_page(), length >= PAGE_SIZE should be perfectly valid and 
handled correctly. The offset >= PAGE_SIZE case is a bit harder to 
justify, but at the same time has less scope for the DMA API backend to 
get it wrong, so either way is likely to be OK in practice (in 
particular the AMD IOMMU code looks like it won't have a problem, since 
its map_page() implementation converts page and offset to a plain 
physical address before doing anything else).

Robin.

[1] 
https://lists.linuxfoundation.org/pipermail/iommu/2017-September/024148.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-18 17:01                     ` Robin Murphy
@ 2019-02-19 11:08                       ` Stanislaw Gruszka
  0 siblings, 0 replies; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-19 11:08 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Bianconi, Samuel Sieb, linux-wireless, linux-kernel,
	iommu, Rosen Penev, Alexander Duyck

On Mon, Feb 18, 2019 at 05:01:59PM +0000, Robin Murphy wrote:
> On 18/02/2019 14:37, Stanislaw Gruszka wrote:
> [...]
> >Another issue is that dma_map_sg() & dma_map_page() may require some
> >constraints. I'm not sure about that and I want to clarify that with
> >CCed mm maintainers. I think DMA drivers may expect sg->offset < PAGE_SIZE
> >for both dma_map_sg() and dma_map_page(). Additionally dma_map_page()
> >maight expect that offset & length specify buffer within one page.
> 
> Luckily, this came up a while back[1] and we seemed to reach a
> consensus that sg->offset >= PAGE_SIZE for dma_map_sg() was weird
> but valid. IIRC it was only the Intel IOMMU code which failed to
> handle that case appropriately (and which I fixed) - the AMD IOMMU
> code always looked like it should be OK, but I'm not sure I've ever
> seen definitive test results (and I don't have hardware to do so
> myself).

Funny that we have problems on AMD IOMMU and not with Intel IOMMU.

> For dma_map_page(), length >= PAGE_SIZE should be perfectly valid
> and handled correctly. The offset >= PAGE_SIZE case is a bit harder
> to justify, but at the same time has less scope for the DMA API
> backend to get it wrong, so either way is likely to be OK in
> practice (in particular the AMD IOMMU code looks like it won't have
> a problem, since its map_page() implementation converts page and
> offset to a plain physical address before doing anything else).

Thanks for clarify this. So my patch which do:

-		page = virt_to_head_page(data);
+		page = virt_to_page(data);
 		offset = data - page_address(page);
 		sg_set_page(&urb->sg[i], page, sglen, offset);

should not be necessary as IOMMU driver do exactly the same internally.

Are there any alignment requirement for offset for dma_map_{page,sg} ?
It will work with let say sg->offset=113 or we have make sure it is
aligned to some boundary. If so, what boundary ?

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-18 14:37                   ` Stanislaw Gruszka
  2019-02-18 15:15                     ` Lorenzo Bianconi
  2019-02-18 17:01                     ` Robin Murphy
@ 2019-02-26 10:05                     ` Joerg Roedel
  2019-02-26 10:34                       ` Stanislaw Gruszka
  2 siblings, 1 reply; 33+ messages in thread
From: Joerg Roedel @ 2019-02-26 10:05 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Mon, Feb 18, 2019 at 03:37:48PM +0100, Stanislaw Gruszka wrote:
> 0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
> 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
> 
> Or problem can be solved by just one of it (either first or second).
> 
> Additionally I'm not 100% sure if
> 
> 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
> 
> is correct. So perhaps some IOMMU maintainer could look at it.

The patch looks good, but I don't understand why it is needed. The AMD
IOMMU driver should handle sg->offset > PAGE_SIZE just fine. Can you
verify that this is the problem? I will look into that again if it turns
out there is bug in the IOMMU driver.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-26 10:05                     ` Joerg Roedel
@ 2019-02-26 10:34                       ` Stanislaw Gruszka
  2019-02-26 10:44                         ` Joerg Roedel
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-26 10:34 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> On Mon, Feb 18, 2019 at 03:37:48PM +0100, Stanislaw Gruszka wrote:
> > 0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
> > 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
> > 
> > Or problem can be solved by just one of it (either first or second).
> > 
> > Additionally I'm not 100% sure if
> > 
> > 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
> > 
> > is correct. So perhaps some IOMMU maintainer could look at it.
> 
> The patch looks good, but I don't understand why it is needed. The AMD
> IOMMU driver should handle sg->offset > PAGE_SIZE just fine. Can you
> verify that this is the problem? I will look into that again if it turns
> out there is bug in the IOMMU driver.

I'm try to get that information from bug reporter, but I can't get it so
far.

If sg->offset > PAGE_SIZE is fine then most likely we have problem with
alignment. We use page_frag_alloc() in mt76usb for buffer allocation
in scheme like this

page_frag_alloc(max_payload);	// something like 14434
page_frag_alloc(1024);
page_frag_alloc(2048)
page_frag_alloc(2048)
page_frag_alloc(2048)
...

page_frag_alloc works smart and fast way internally by allocating
fragments just but changing internal offset:

        offset = nc->offset - fragsz;
        if (unlikely(offset < 0)) {
                page = virt_to_page(nc->va);
	.
	.
	.

        }

        nc->offset = offset;
	return nc->va + offset;

but unlike other allocators like kmalloc that make effort to provide
ARCH_DMA_MINALIGN buffers, it does not care about alignment. Above
scheme of allocation in mt76usb breaks it. 

Note hat issue is with dma_map_sg(), switching to dma_map_single()
by using urb->transfer_buffer instead of urb->sg make things work
on AMD IOMMU.

Stanislaw


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-26 10:34                       ` Stanislaw Gruszka
@ 2019-02-26 10:44                         ` Joerg Roedel
  2019-02-26 11:24                           ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Joerg Roedel @ 2019-02-26 10:44 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> alignment.

The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
handles the sg->page + sg->offset calculation fine.

> Note hat issue is with dma_map_sg(), switching to dma_map_single()
> by using urb->transfer_buffer instead of urb->sg make things work
> on AMD IOMMU.

On the other hand this points to a bug in the driver, I'll look further
if I can spot something there.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-26 10:44                         ` Joerg Roedel
@ 2019-02-26 11:24                           ` Stanislaw Gruszka
  2019-02-28  9:04                             ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-26 11:24 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > alignment.
> 
> The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> handles the sg->page + sg->offset calculation fine.
> 
> > Note hat issue is with dma_map_sg(), switching to dma_map_single()
> > by using urb->transfer_buffer instead of urb->sg make things work
> > on AMD IOMMU.
> 
> On the other hand this points to a bug in the driver, I'll look further
> if I can spot something there.

I think so too. And I have done some changes that avoid strange allocation
scheme and use usb synchronous messages instead of allocating buffers
with unaligned sizes. However things work ok on Intel IOMMU and 
there is no documentation what are dma_map_sg() requirement versus
dma_map_single() which works. I think there are some unwritten
requirements and things can work on some platforms and fails on others
(different IOMMUs, no-IOMMU on some ARCHes)  

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-26 11:24                           ` Stanislaw Gruszka
@ 2019-02-28  9:04                             ` Stanislaw Gruszka
  2019-02-28 10:42                               ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-28  9:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > > alignment.
> > 
> > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > handles the sg->page + sg->offset calculation fine.
> > 
> > > Note hat issue is with dma_map_sg(), switching to dma_map_single()
> > > by using urb->transfer_buffer instead of urb->sg make things work
> > > on AMD IOMMU.
> > 
> > On the other hand this points to a bug in the driver, I'll look further
> > if I can spot something there.
> 
> I think so too. And I have done some changes that avoid strange allocation
> scheme and use usb synchronous messages instead of allocating buffers
> with unaligned sizes. However things work ok on Intel IOMMU and 
> there is no documentation what are dma_map_sg() requirement versus
> dma_map_single() which works. I think there are some unwritten
> requirements and things can work on some platforms and fails on others
> (different IOMMUs, no-IOMMU on some ARCHes)  

For the record: we have another bug report with this issue:
https://bugzilla.kernel.org/show_bug.cgi?id=202673

I provided there patch that change alignment for page_frag_alloc() and
it did not fixed the problem. So this is not alignment issue.
Now I think it could be page->refcount issue ...

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-28  9:04                             ` Stanislaw Gruszka
@ 2019-02-28 10:42                               ` Stanislaw Gruszka
  2019-02-28 12:19                                 ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-28 10:42 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > > > alignment.
> > > 
> > > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > > handles the sg->page + sg->offset calculation fine.
> > > 
> > > > Note hat issue is with dma_map_sg(), switching to dma_map_single()
> > > > by using urb->transfer_buffer instead of urb->sg make things work
> > > > on AMD IOMMU.
> > > 
> > > On the other hand this points to a bug in the driver, I'll look further
> > > if I can spot something there.
> > 
> > I think so too. And I have done some changes that avoid strange allocation
> > scheme and use usb synchronous messages instead of allocating buffers
> > with unaligned sizes. However things work ok on Intel IOMMU and 
> > there is no documentation what are dma_map_sg() requirement versus
> > dma_map_single() which works. I think there are some unwritten
> > requirements and things can work on some platforms and fails on others
> > (different IOMMUs, no-IOMMU on some ARCHes)  
> 
> For the record: we have another bug report with this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=202673
> 
> I provided there patch that change alignment for page_frag_alloc() and
> it did not fixed the problem. So this is not alignment issue.
> Now I think it could be page->refcount issue ...

I looked at the map_sg() in amd_iommu.c code and one line looks suspicious
to me, seems we can use not correctly initialized s->dma_address (should be 0,
but I think can be non-zero if SG was reused). The code also seems do
not do correct thing if there is more than one SG with multiple pages
on individual segments. Something like in below patch seems to be more
appropriate to me (not tested nor compiled).

Stanislaw

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 34c9aa76a7bd..9c8887250b82 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2517,6 +2517,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 	prot = dir2prot(direction);
 
 	/* Map all sg entries */
+	npages = 0;
 	for_each_sg(sglist, s, nelems, i) {
 		int j, pages = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
 
@@ -2524,7 +2525,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 			unsigned long bus_addr, phys_addr;
 			int ret;
 
-			bus_addr  = address + s->dma_address + (j << PAGE_SHIFT);
+			bus_addr  = address + ((npages + j) << PAGE_SHIFT);
 			phys_addr = (sg_phys(s) & PAGE_MASK) + (j << PAGE_SHIFT);
 			ret = iommu_map_page(domain, bus_addr, phys_addr, PAGE_SIZE, prot, GFP_ATOMIC);
 			if (ret)
@@ -2532,6 +2533,8 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 
 			mapped_pages += 1;
 		}
+
+		npages += mapped_pages;
 	}
 
 	/* Everything is mapped - write the right values into s->dma_address */



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-28 10:42                               ` Stanislaw Gruszka
@ 2019-02-28 12:19                                 ` Stanislaw Gruszka
  2019-02-28 13:40                                   ` Joerg Roedel
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-02-28 12:19 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Thu, Feb 28, 2019 at 11:42:24AM +0100, Stanislaw Gruszka wrote:
> On Thu, Feb 28, 2019 at 10:04:12AM +0100, Stanislaw Gruszka wrote:
> > On Tue, Feb 26, 2019 at 12:24:08PM +0100, Stanislaw Gruszka wrote:
> > > On Tue, Feb 26, 2019 at 11:44:13AM +0100, Joerg Roedel wrote:
> > > > On Tue, Feb 26, 2019 at 11:34:51AM +0100, Stanislaw Gruszka wrote:
> > > > > On Tue, Feb 26, 2019 at 11:05:36AM +0100, Joerg Roedel wrote:
> > > > > If sg->offset > PAGE_SIZE is fine then most likely we have problem with
> > > > > alignment.
> > > > 
> > > > The map_sg implementation in the AMD IOMMU driver uses sg_phys() which
> > > > handles the sg->page + sg->offset calculation fine.
> > > > 
> > > > > Note hat issue is with dma_map_sg(), switching to dma_map_single()
> > > > > by using urb->transfer_buffer instead of urb->sg make things work
> > > > > on AMD IOMMU.
> > > > 
> > > > On the other hand this points to a bug in the driver, I'll look further
> > > > if I can spot something there.
> > > 
> > > I think so too. And I have done some changes that avoid strange allocation
> > > scheme and use usb synchronous messages instead of allocating buffers
> > > with unaligned sizes. However things work ok on Intel IOMMU and 
> > > there is no documentation what are dma_map_sg() requirement versus
> > > dma_map_single() which works. I think there are some unwritten
> > > requirements and things can work on some platforms and fails on others
> > > (different IOMMUs, no-IOMMU on some ARCHes)  
> > 
> > For the record: we have another bug report with this issue:
> > https://bugzilla.kernel.org/show_bug.cgi?id=202673
> > 
> > I provided there patch that change alignment for page_frag_alloc() and
> > it did not fixed the problem. So this is not alignment issue.
> > Now I think it could be page->refcount issue ...
> 
> I looked at the map_sg() in amd_iommu.c code and one line looks suspicious
> to me, seems we can use not correctly initialized s->dma_address (should be 0,
> but I think can be non-zero if SG was reused). The code also seems do
> not do correct thing if there is more than one SG with multiple pages
> on individual segments. Something like in below patch seems to be more
> appropriate to me (not tested nor compiled).

Nevermind, the patch is wrong, s->dma_address is initalized in sg_num_pages().

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-28 12:19                                 ` Stanislaw Gruszka
@ 2019-02-28 13:40                                   ` Joerg Roedel
  2019-03-04  7:10                                     ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Joerg Roedel @ 2019-02-28 13:40 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Thu, Feb 28, 2019 at 01:19:48PM +0100, Stanislaw Gruszka wrote:
> Nevermind, the patch is wrong, s->dma_address is initalized in sg_num_pages().

Yes, it is. In sg_num_pages() the offset into the IOMMU mapping is
stored in s->dma_address, taking also the segment boundary mask into
account. map_sg() later only adds the base-address to that.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-02-28 13:40                                   ` Joerg Roedel
@ 2019-03-04  7:10                                     ` Stanislaw Gruszka
  2019-03-04  7:20                                       ` Rosen Penev
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-03-04  7:10 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Lorenzo Bianconi, Rosen Penev, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Thu, Feb 28, 2019 at 02:40:29PM +0100, Joerg Roedel wrote:
> On Thu, Feb 28, 2019 at 01:19:48PM +0100, Stanislaw Gruszka wrote:
> > Nevermind, the patch is wrong, s->dma_address is initalized in sg_num_pages().
> 
> Yes, it is. In sg_num_pages() the offset into the IOMMU mapping is
> stored in s->dma_address, taking also the segment boundary mask into
> account. map_sg() later only adds the base-address to that.

I have some more info about the issues in
https://bugzilla.kernel.org/show_bug.cgi?id=202673 

We have some bugs in mt76. Apparently we should not use
page_frag_alloc() with size bigger than PAGE_SIZE as page_frag_alloc()
can fallback to single page allocation. And also we should not make
sizes unaligned as pointed in commit:
3bed3cc4156e ("net: Do not allocate page fragments that are not skb aligned"

However after fixing that mt76usb still did not work. To make things
work we had to change rx frag size from 2048 to PAGE_SIZE and change
virt_to_head_page() to virt_to_page() when setting SG's.

I think I understand why first change was needed. If we do 2 separate
dma maps of 2 different buffers in single page i.e (PAGE + off=0
and PAGE + off=2048) it causes problem. So either map_sg() return
error which mt76usb does not handle correctly or there is issue
in AMD IOMMU because two dma maps use the same page.

But I don't understand why the second change was needed. Without
it we have issue with incorrect page->_refcount . It is somehow
related with AMD IOMMU, because on different platforms we do not
have such problems.

Joerg, could you look at this ? Thanks. 

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-03-04  7:10                                     ` Stanislaw Gruszka
@ 2019-03-04  7:20                                       ` Rosen Penev
  2019-03-11  8:43                                         ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Rosen Penev @ 2019-03-04  7:20 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Joerg Roedel, Lorenzo Bianconi, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel

On Sun, Mar 3, 2019 at 11:10 PM Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>
> On Thu, Feb 28, 2019 at 02:40:29PM +0100, Joerg Roedel wrote:
> > On Thu, Feb 28, 2019 at 01:19:48PM +0100, Stanislaw Gruszka wrote:
> > > Nevermind, the patch is wrong, s->dma_address is initalized in sg_num_pages().
> >
> > Yes, it is. In sg_num_pages() the offset into the IOMMU mapping is
> > stored in s->dma_address, taking also the segment boundary mask into
> > account. map_sg() later only adds the base-address to that.
>
> I have some more info about the issues in
> https://bugzilla.kernel.org/show_bug.cgi?id=202673
>
> We have some bugs in mt76. Apparently we should not use
> page_frag_alloc() with size bigger than PAGE_SIZE as page_frag_alloc()
> can fallback to single page allocation. And also we should not make
> sizes unaligned as pointed in commit:
> 3bed3cc4156e ("net: Do not allocate page fragments that are not skb aligned"
As a small and totally unrelated note, page_frag_alloc is only used in
mt76 and the nvme driver ;)
>
> However after fixing that mt76usb still did not work. To make things
> work we had to change rx frag size from 2048 to PAGE_SIZE and change
> virt_to_head_page() to virt_to_page() when setting SG's.
>
> I think I understand why first change was needed. If we do 2 separate
> dma maps of 2 different buffers in single page i.e (PAGE + off=0
> and PAGE + off=2048) it causes problem. So either map_sg() return
> error which mt76usb does not handle correctly or there is issue
> in AMD IOMMU because two dma maps use the same page.
>
> But I don't understand why the second change was needed. Without
> it we have issue with incorrect page->_refcount . It is somehow
> related with AMD IOMMU, because on different platforms we do not
> have such problems.
>
> Joerg, could you look at this ? Thanks.
>
> Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-03-04  7:20                                       ` Rosen Penev
@ 2019-03-11  8:43                                         ` Stanislaw Gruszka
  2019-03-11  9:03                                           ` [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE Stanislaw Gruszka
  2019-03-12  7:13                                           ` MT76x2U crashes XHCI driver on AMD Ryzen system Stanislaw Gruszka
  0 siblings, 2 replies; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-03-11  8:43 UTC (permalink / raw)
  To: Rosen Penev
  Cc: Joerg Roedel, Lorenzo Bianconi, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel, linux-nvme, jan.viktorin

On Sun, Mar 03, 2019 at 11:20:45PM -0800, Rosen Penev wrote:
> On Sun, Mar 3, 2019 at 11:10 PM Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> >
> > On Thu, Feb 28, 2019 at 02:40:29PM +0100, Joerg Roedel wrote:
> > > On Thu, Feb 28, 2019 at 01:19:48PM +0100, Stanislaw Gruszka wrote:
> > > > Nevermind, the patch is wrong, s->dma_address is initalized in sg_num_pages().
> > >
> > > Yes, it is. In sg_num_pages() the offset into the IOMMU mapping is
> > > stored in s->dma_address, taking also the segment boundary mask into
> > > account. map_sg() later only adds the base-address to that.
> >
> > I have some more info about the issues in
> > https://bugzilla.kernel.org/show_bug.cgi?id=202673
> >
> > We have some bugs in mt76. Apparently we should not use
> > page_frag_alloc() with size bigger than PAGE_SIZE as page_frag_alloc()
> > can fallback to single page allocation. And also we should not make
> > sizes unaligned as pointed in commit:
> > 3bed3cc4156e ("net: Do not allocate page fragments that are not skb aligned"
> As a small and totally unrelated note, page_frag_alloc is only used in
> mt76 and the nvme driver ;)

And there is nvme problem on AMD IOMMU:
https://bugzilla.kernel.org/show_bug.cgi?id=202665

While page_frag_alloc() should be used with cautious, at least care of
size alignment with ARCH_DMA_MINALIGN (not for IOMMU, but standard arch
dma), at this point I think some of those issues are AMD IOMMU problems.

> > However after fixing that mt76usb still did not work. To make things
> > work we had to change rx frag size from 2048 to PAGE_SIZE and change
> > virt_to_head_page() to virt_to_page() when setting SG's.
>
> > I think I understand why first change was needed. If we do 2 separate
> > dma maps of 2 different buffers in single page i.e (PAGE + off=0
> > and PAGE + off=2048) it causes problem. So either map_sg() return
> > error which mt76usb does not handle correctly or there is issue
> > in AMD IOMMU because two dma maps use the same page.

Any comment on that? Is fine or not to do 2 or more dma mappings
within the same single page on AMD IOMMU? If not, is there any
mechanism for drivers to find out about this limitation to prevent
to prepare wrong SG buffers?

> > But I don't understand why the second change was needed. Without
> > it we have issue with incorrect page->_refcount . It is somehow
> > related with AMD IOMMU, because on different platforms we do not
> > have such problems.

I think I found a bug in amd iommu code when setting sg->dma_address
with sg->offset > PAGE_SIZE. Will post fix shortly. 

Stanislaw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  2019-03-11  8:43                                         ` Stanislaw Gruszka
@ 2019-03-11  9:03                                           ` Stanislaw Gruszka
  2019-03-11 15:47                                             ` Alexander Duyck
  2019-03-12  7:13                                           ` MT76x2U crashes XHCI driver on AMD Ryzen system Stanislaw Gruszka
  1 sibling, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-03-11  9:03 UTC (permalink / raw)
  To: Rosen Penev, Joerg Roedel
  Cc: Lorenzo Bianconi, linux-wireless, Samuel Sieb, Alexander Duyck,
	iommu, linux-kernel, linux-nvme, jan.viktorin

Take into account that sg->offset can be bigger than PAGE_SIZE when
setting segment sg->dma_address. Otherwise sg->dma_address will point
at diffrent page, what makes DMA not possible with erros like this:

xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]

Additinally with wrong sg->dma_address unmap_sg will free wrong pages,
what what can cause crashes like this:

Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon  pfn:39e8b1
Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) 
 xhci_hcd(E)
Feb 28 19:27:45 kernel:  scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G    B   W   E     4.20.12-arch1-1-custom #1
Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
Feb 28 19:27:45 kernel: Call Trace:
Feb 28 19:27:45 kernel:  dump_stack+0x5c/0x80
Feb 28 19:27:45 kernel:  bad_page.cold.29+0x7f/0xb2
Feb 28 19:27:45 kernel:  __free_pages_ok+0x2c0/0x2d0
Feb 28 19:27:45 kernel:  skb_release_data+0x96/0x180
Feb 28 19:27:45 kernel:  __kfree_skb+0xe/0x20
Feb 28 19:27:45 kernel:  tcp_recvmsg+0x894/0xc60
Feb 28 19:27:45 kernel:  ? reuse_swap_page+0x120/0x340
Feb 28 19:27:45 kernel:  ? ptep_set_access_flags+0x23/0x30
Feb 28 19:27:45 kernel:  inet_recvmsg+0x5b/0x100
Feb 28 19:27:45 kernel:  __sys_recvfrom+0xc3/0x180
Feb 28 19:27:45 kernel:  ? handle_mm_fault+0x10a/0x250
Feb 28 19:27:45 kernel:  ? syscall_trace_enter+0x1d3/0x2d0
Feb 28 19:27:45 kernel:  ? __audit_syscall_exit+0x22a/0x290
Feb 28 19:27:45 kernel:  __x64_sys_recvfrom+0x24/0x30
Feb 28 19:27:45 kernel:  do_syscall_64+0x5b/0x170
Feb 28 19:27:45 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: stable@vger.kernel.org
Reported-and-tested-by: jan.viktorin@gmail.com
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 6b0760dafb3e..949621f33624 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2604,7 +2604,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 
 	/* Everything is mapped - write the right values into s->dma_address */
 	for_each_sg(sglist, s, nelems, i) {
-		s->dma_address += address + s->offset;
+		s->dma_address += address + (s->offset & ~PAGE_MASK);
 		s->dma_length   = s->length;
 	}
 
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  2019-03-11  9:03                                           ` [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE Stanislaw Gruszka
@ 2019-03-11 15:47                                             ` Alexander Duyck
  2019-03-12  7:08                                               ` Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Duyck @ 2019-03-11 15:47 UTC (permalink / raw)
  To: Stanislaw Gruszka, Rosen Penev, Joerg Roedel
  Cc: Lorenzo Bianconi, linux-wireless, Samuel Sieb, iommu,
	linux-kernel, linux-nvme, jan.viktorin

On Mon, 2019-03-11 at 10:03 +0100, Stanislaw Gruszka wrote:
> Take into account that sg->offset can be bigger than PAGE_SIZE when
> setting segment sg->dma_address. Otherwise sg->dma_address will point
> at diffrent page, what makes DMA not possible with erros like this:
> 
> xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
> xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
> xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
> xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
> xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]
> 
> Additinally with wrong sg->dma_address unmap_sg will free wrong pages,
> what what can cause crashes like this:
> 
> Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon  pfn:39e8b1
> Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
> Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
> Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
> Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
> Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E
 ) xhci_hcd(E)
> Feb 28 19:27:45 kernel:  scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
> Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G    B   W   E     4.20.12-arch1-1-custom #1
> Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
> Feb 28 19:27:45 kernel: Call Trace:
> Feb 28 19:27:45 kernel:  dump_stack+0x5c/0x80
> Feb 28 19:27:45 kernel:  bad_page.cold.29+0x7f/0xb2
> Feb 28 19:27:45 kernel:  __free_pages_ok+0x2c0/0x2d0
> Feb 28 19:27:45 kernel:  skb_release_data+0x96/0x180
> Feb 28 19:27:45 kernel:  __kfree_skb+0xe/0x20
> Feb 28 19:27:45 kernel:  tcp_recvmsg+0x894/0xc60
> Feb 28 19:27:45 kernel:  ? reuse_swap_page+0x120/0x340
> Feb 28 19:27:45 kernel:  ? ptep_set_access_flags+0x23/0x30
> Feb 28 19:27:45 kernel:  inet_recvmsg+0x5b/0x100
> Feb 28 19:27:45 kernel:  __sys_recvfrom+0xc3/0x180
> Feb 28 19:27:45 kernel:  ? handle_mm_fault+0x10a/0x250
> Feb 28 19:27:45 kernel:  ? syscall_trace_enter+0x1d3/0x2d0
> Feb 28 19:27:45 kernel:  ? __audit_syscall_exit+0x22a/0x290
> Feb 28 19:27:45 kernel:  __x64_sys_recvfrom+0x24/0x30
> Feb 28 19:27:45 kernel:  do_syscall_64+0x5b/0x170
> Feb 28 19:27:45 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Cc: stable@vger.kernel.org
> Reported-and-tested-by: jan.viktorin@gmail.com
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---
>  drivers/iommu/amd_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 6b0760dafb3e..949621f33624 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2604,7 +2604,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
>  
>  	/* Everything is mapped - write the right values into s->dma_address */
>  	for_each_sg(sglist, s, nelems, i) {
> -		s->dma_address += address + s->offset;
> +		s->dma_address += address + (s->offset & ~PAGE_MASK);
>  		s->dma_length   = s->length;
>  	}
>  

You should add a comment calling out that this is needed because the
sg_phys(s) call above this is masked with PAGE_MASK. Then this makes
much more sense. Otherwise I would have assumed you needed either the
full offset or none.

Other than that, from that I can tell the code itself looks to be
correct, but just difficult to read.

Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  2019-03-11 15:47                                             ` Alexander Duyck
@ 2019-03-12  7:08                                               ` Stanislaw Gruszka
  2019-03-12 15:18                                                 ` Alexander Duyck
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-03-12  7:08 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Rosen Penev, Joerg Roedel, Lorenzo Bianconi, linux-wireless,
	Samuel Sieb, iommu, linux-kernel, linux-nvme, jan.viktorin

On Mon, Mar 11, 2019 at 08:47:44AM -0700, Alexander Duyck wrote:
> >  drivers/iommu/amd_iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index 6b0760dafb3e..949621f33624 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -2604,7 +2604,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
> >  
> >  	/* Everything is mapped - write the right values into s->dma_address */
> >  	for_each_sg(sglist, s, nelems, i) {
> > -		s->dma_address += address + s->offset;
> > +		s->dma_address += address + (s->offset & ~PAGE_MASK);
> >  		s->dma_length   = s->length;
> >  	}
> >  
> 
> You should add a comment calling out that this is needed because the
> sg_phys(s) call above this is masked with PAGE_MASK. Then this makes
> much more sense. Otherwise I would have assumed you needed either the
> full offset or none.

Would something like this 

/*
 * Everything is mapped - write the right values into s->dma_address. 
 * Take into account s->offset can be bigger than page size and sg_phys(s)
 * address has to be aligned to page granularity.
 */

be appropriate ?

Stanislaw


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: MT76x2U crashes XHCI driver on AMD Ryzen system
  2019-03-11  8:43                                         ` Stanislaw Gruszka
  2019-03-11  9:03                                           ` [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE Stanislaw Gruszka
@ 2019-03-12  7:13                                           ` Stanislaw Gruszka
  1 sibling, 0 replies; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-03-12  7:13 UTC (permalink / raw)
  To: Rosen Penev
  Cc: Joerg Roedel, Lorenzo Bianconi, linux-wireless, Samuel Sieb,
	Alexander Duyck, iommu, linux-kernel, linux-nvme, jan.viktorin

On Mon, Mar 11, 2019 at 09:43:19AM +0100, Stanislaw Gruszka wrote:
> > > However after fixing that mt76usb still did not work. To make things
> > > work we had to change rx frag size from 2048 to PAGE_SIZE and change
> > > virt_to_head_page() to virt_to_page() when setting SG's.
> >
> > > I think I understand why first change was needed. If we do 2 separate
> > > dma maps of 2 different buffers in single page i.e (PAGE + off=0
> > > and PAGE + off=2048) it causes problem. So either map_sg() return
> > > error which mt76usb does not handle correctly or there is issue
> > > in AMD IOMMU because two dma maps use the same page.
> 
> Any comment on that? Is fine or not to do 2 or more dma mappings
> within the same single page on AMD IOMMU? If not, is there any
> mechanism for drivers to find out about this limitation to prevent
> to prepare wrong SG buffers?

FTR: it was confirmed by Jan (bug reporter) the 2 or more dma mappings
within single page works with AMD IOMMU. Most likely it was needed
previously to workaround this sg->offset problem until proper fix to
AMD IOMMU was applied.

Stanislaw 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  2019-03-12  7:08                                               ` Stanislaw Gruszka
@ 2019-03-12 15:18                                                 ` Alexander Duyck
  2019-03-13  9:03                                                   ` [PATCH v2] " Stanislaw Gruszka
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Duyck @ 2019-03-12 15:18 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Rosen Penev, Joerg Roedel, Lorenzo Bianconi, linux-wireless,
	Samuel Sieb, iommu, linux-kernel, linux-nvme, jan.viktorin

On Tue, 2019-03-12 at 08:08 +0100, Stanislaw Gruszka wrote:
> On Mon, Mar 11, 2019 at 08:47:44AM -0700, Alexander Duyck wrote:
> > >  drivers/iommu/amd_iommu.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > > index 6b0760dafb3e..949621f33624 100644
> > > --- a/drivers/iommu/amd_iommu.c
> > > +++ b/drivers/iommu/amd_iommu.c
> > > @@ -2604,7 +2604,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
> > >  
> > >  	/* Everything is mapped - write the right values into s->dma_address */
> > >  	for_each_sg(sglist, s, nelems, i) {
> > > -		s->dma_address += address + s->offset;
> > > +		s->dma_address += address + (s->offset & ~PAGE_MASK);
> > >  		s->dma_length   = s->length;
> > >  	}
> > >  
> > 
> > You should add a comment calling out that this is needed because the
> > sg_phys(s) call above this is masked with PAGE_MASK. Then this makes
> > much more sense. Otherwise I would have assumed you needed either the
> > full offset or none.
> 
> Would something like this 
> 
> /*
>  * Everything is mapped - write the right values into s->dma_address. 
>  * Take into account s->offset can be bigger than page size and sg_phys(s)
>  * address has to be aligned to page granularity.
>  */
> 
> be appropriate ?
> 
> Stanislaw
> 

No, that isn't a good description. If you take a look at the code a few
lines up you find:
	phys_addr = (sg_phys(s) & PAGE_MASK) + (j << PAGE_SHIFT);

Now if I am not mistaken the whole reason why you are having to make
the change here is because the application of PAGE_MASK in this line.
Basically what sg_phys() will do is take the address of the page,
convert it to a physical address and add the offset. However what the
mask is doing is limiting how much of that offset can be added. As a
result you have to add the remainder that was masked out. So maybe a
better comment would be something like:

/*
 * Add in the remaining piece of the scatter-gather offset that was 
 * masked out when we were determining the physical address via
 * (sg_phys(s) & PAGE_MASK) earlier.
 */




^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  2019-03-12 15:18                                                 ` Alexander Duyck
@ 2019-03-13  9:03                                                   ` Stanislaw Gruszka
  2019-03-18 10:17                                                     ` Joerg Roedel
  0 siblings, 1 reply; 33+ messages in thread
From: Stanislaw Gruszka @ 2019-03-13  9:03 UTC (permalink / raw)
  To: Alexander Duyck, Joerg Roedel
  Cc: Rosen Penev, Lorenzo Bianconi, linux-wireless, Samuel Sieb,
	iommu, linux-kernel, linux-nvme, jan.viktorin


Take into account that sg->offset can be bigger than PAGE_SIZE when
setting segment sg->dma_address. Otherwise sg->dma_address will point
at diffrent page, what makes DMA not possible with erros like this:

xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa70c0 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7040 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7080 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7100 flags=0x0020]
xhci_hcd 0000:38:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fdaa7000 flags=0x0020]

Additinally with wrong sg->dma_address unmap_sg will free wrong pages,
what what can cause crashes like this:

Feb 28 19:27:45 kernel: BUG: Bad page state in process cinnamon  pfn:39e8b1
Feb 28 19:27:45 kernel: Disabling lock debugging due to kernel taint
Feb 28 19:27:45 kernel: flags: 0x2ffff0000000000()
Feb 28 19:27:45 kernel: raw: 02ffff0000000000 0000000000000000 ffffffff00000301 0000000000000000
Feb 28 19:27:45 kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
Feb 28 19:27:45 kernel: page dumped because: nonzero _refcount
Feb 28 19:27:45 kernel: Modules linked in: ccm fuse arc4 nct6775 hwmon_vid amdgpu nls_iso8859_1 nls_cp437 edac_mce_amd vfat fat kvm_amd ccp rng_core kvm mt76x0u mt76x0_common mt76x02_usb irqbypass mt76_usb mt76x02_lib mt76 crct10dif_pclmul crc32_pclmul chash mac80211 amd_iommu_v2 ghash_clmulni_intel gpu_sched i2c_algo_bit ttm wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper snd_hda_codec_hdmi snd_hda_intel drm snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd snd_pcm cfg80211 cryptd mousedev snd_timer glue_helper pcspkr r8169 input_leds realtek agpgart libphy rfkill snd syscopyarea sysfillrect sysimgblt fb_sys_fops soundcore sp5100_tco k10temp i2c_piix4 wmi evdev gpio_amdpt pinctrl_amd mac_hid pcc_cpufreq acpi_cpufreq sg ip_tables x_tables ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) fscrypto(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) dm_mod(E) serio_raw(E) atkbd(E) libps2(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) xhci_pci(E) 
 xhci_hcd(E)
Feb 28 19:27:45 kernel:  scsi_mod(E) i8042(E) serio(E) bcache(E) crc64(E)
Feb 28 19:27:45 kernel: CPU: 2 PID: 896 Comm: cinnamon Tainted: G    B   W   E     4.20.12-arch1-1-custom #1
Feb 28 19:27:45 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P1.20 06/26/2018
Feb 28 19:27:45 kernel: Call Trace:
Feb 28 19:27:45 kernel:  dump_stack+0x5c/0x80
Feb 28 19:27:45 kernel:  bad_page.cold.29+0x7f/0xb2
Feb 28 19:27:45 kernel:  __free_pages_ok+0x2c0/0x2d0
Feb 28 19:27:45 kernel:  skb_release_data+0x96/0x180
Feb 28 19:27:45 kernel:  __kfree_skb+0xe/0x20
Feb 28 19:27:45 kernel:  tcp_recvmsg+0x894/0xc60
Feb 28 19:27:45 kernel:  ? reuse_swap_page+0x120/0x340
Feb 28 19:27:45 kernel:  ? ptep_set_access_flags+0x23/0x30
Feb 28 19:27:45 kernel:  inet_recvmsg+0x5b/0x100
Feb 28 19:27:45 kernel:  __sys_recvfrom+0xc3/0x180
Feb 28 19:27:45 kernel:  ? handle_mm_fault+0x10a/0x250
Feb 28 19:27:45 kernel:  ? syscall_trace_enter+0x1d3/0x2d0
Feb 28 19:27:45 kernel:  ? __audit_syscall_exit+0x22a/0x290
Feb 28 19:27:45 kernel:  __x64_sys_recvfrom+0x24/0x30
Feb 28 19:27:45 kernel:  do_syscall_64+0x5b/0x170
Feb 28 19:27:45 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Cc: stable@vger.kernel.org
Reported-and-tested-by: Jan Viktorin <jan.viktorin@gmail.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
v2: add comment 

 drivers/iommu/amd_iommu.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 6b0760dafb3e..696c78225dd1 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2604,7 +2604,11 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
 
 	/* Everything is mapped - write the right values into s->dma_address */
 	for_each_sg(sglist, s, nelems, i) {
-		s->dma_address += address + s->offset;
+		/* Add in the remaining piece of the scatter-gather offset that
+		 * was masked out when we were determining the physical address
+		 * via (sg_phys(s) & PAGE_MASK) earlier.
+		 */
+		s->dma_address += address + (s->offset & ~PAGE_MASK);
 		s->dma_length   = s->length;
 	}
 
-- 
2.7.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE
  2019-03-13  9:03                                                   ` [PATCH v2] " Stanislaw Gruszka
@ 2019-03-18 10:17                                                     ` Joerg Roedel
  0 siblings, 0 replies; 33+ messages in thread
From: Joerg Roedel @ 2019-03-18 10:17 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Alexander Duyck, Rosen Penev, Lorenzo Bianconi, linux-wireless,
	Samuel Sieb, iommu, linux-kernel, linux-nvme, jan.viktorin

Hi Stanislaw,

thanks a lot for looking into this and tracking it down!

On Wed, Mar 13, 2019 at 10:03:17AM +0100, Stanislaw Gruszka wrote:
> -		s->dma_address += address + s->offset;
> +		/* Add in the remaining piece of the scatter-gather offset that
> +		 * was masked out when we were determining the physical address
> +		 * via (sg_phys(s) & PAGE_MASK) earlier.
> +		 */
> +		s->dma_address += address + (s->offset & ~PAGE_MASK);
>  		s->dma_length   = s->length;

Applied the patch for v5.1 (with an added Fixes-tag and a minor coding
style change) and will send it upstream soon.

Thanks again,

       Joerg

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-03-18 10:17 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-10 22:55 MT76x2U crashes XHCI driver on AMD Ryzen system Rosen Penev
2019-01-11 17:29 ` Lorenzo Bianconi
2019-01-11 19:01   ` Rosen Penev
2019-01-13 13:33     ` Lorenzo Bianconi
     [not found]       ` <1547404075.1582.0@smtp.gmail.com>
2019-01-13 19:00         ` Lorenzo Bianconi
2019-01-14  2:20           ` Rosen Penev
2019-01-14  3:13             ` Samuel Sieb
2019-01-14  9:18             ` Lorenzo Bianconi
2019-01-14  9:22               ` Tom Psyborg
2019-01-14 20:06               ` Rosen Penev
2019-01-15  9:04                 ` Lorenzo Bianconi
2019-02-18 14:37                   ` Stanislaw Gruszka
2019-02-18 15:15                     ` Lorenzo Bianconi
2019-02-18 17:01                     ` Robin Murphy
2019-02-19 11:08                       ` Stanislaw Gruszka
2019-02-26 10:05                     ` Joerg Roedel
2019-02-26 10:34                       ` Stanislaw Gruszka
2019-02-26 10:44                         ` Joerg Roedel
2019-02-26 11:24                           ` Stanislaw Gruszka
2019-02-28  9:04                             ` Stanislaw Gruszka
2019-02-28 10:42                               ` Stanislaw Gruszka
2019-02-28 12:19                                 ` Stanislaw Gruszka
2019-02-28 13:40                                   ` Joerg Roedel
2019-03-04  7:10                                     ` Stanislaw Gruszka
2019-03-04  7:20                                       ` Rosen Penev
2019-03-11  8:43                                         ` Stanislaw Gruszka
2019-03-11  9:03                                           ` [PATCH] iommu/amd: fix sg->dma_address for sg->offset bigger than PAGE_SIZE Stanislaw Gruszka
2019-03-11 15:47                                             ` Alexander Duyck
2019-03-12  7:08                                               ` Stanislaw Gruszka
2019-03-12 15:18                                                 ` Alexander Duyck
2019-03-13  9:03                                                   ` [PATCH v2] " Stanislaw Gruszka
2019-03-18 10:17                                                     ` Joerg Roedel
2019-03-12  7:13                                           ` MT76x2U crashes XHCI driver on AMD Ryzen system Stanislaw Gruszka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).