All of lore.kernel.org
 help / color / mirror / Atom feed
* Problems using AMD eMMC with AMD IOMMU
@ 2020-05-06 21:00 Raul Rangel
  2020-05-08 15:22 ` Joerg Roedel
  0 siblings, 1 reply; 8+ messages in thread
From: Raul Rangel @ 2020-05-06 21:00 UTC (permalink / raw)
  To: linux-mmc; +Cc: murphyt7, jroedel, dianders, evgreen, Daniel Kurtz

On Tip of Tree (47cf1b422e60) I'm seeing the following IOMMU error
when enumerating my eMMC device:

[   13.097314] mmc0: SDHCI controller on PCI [0000:02:00.0] using ADMA 64-bit
[   13.104550] mmc1: SDHCI controller on ACPI [AMDI0040:00] using ADMA 64-bit
[   13.136359] mmc1: Allocated DMA: align_addr: 0x000000041ab55000,
adma_addr: 0x000000041ab55200, dma_size: e0c. <- I added this to
sdhci_setup_host
[   13.340852] mmc1: ADMA error: 0x02000000
[   13.344780] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[   13.351222] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
[   13.357669] mmc1: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000001
[   13.364118] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013
[   13.370567] mmc1: sdhci: Present:   0xf1ff0000 | Host ctl: 0x00000019
[   13.377014] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[   13.383463] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa07
[   13.389911] mmc1: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000
[   13.396360] mmc1: sdhci: Int enab:  0x03ff000b | Sig enab: 0x03ff000b
[   13.402808] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[   13.409257] mmc1: sdhci: Caps:      0x71fec8b2 | Caps_1:   0x00000577
[   13.415706] mmc1: sdhci: Cmd:       0x0000083a | Max curr: 0x00c80064
[   13.422154] mmc1: sdhci: Resp[0]:   0x00000700 | Resp[1]:  0xffffffff
[   13.428603] mmc1: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d00f00
[   13.435048] mmc1: sdhci: Host ctl2: 0x00000000
[   13.439506] mmc1: sdhci: ADMA Err:  0x00000001 | ADMA Ptr: 0x000000041ab55200
[   13.446641] mmc1: sdhci: ============================================
[   13.453077] mmc1: sdhci: 41ab55200: DMA 0x000000041a462400, LEN
0x0200, Attr=0x21
[   13.460551] mmc1: sdhci: 41ab5520c: DMA 0x0000000000000000, LEN
0x0000, Attr=0x03
[   13.468070] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:13.1
domain=0x0000 address=0x41ab55200 flags=0x0050]
[   13.478109] mmc1: error -5 whilst initialising MMC card

If I read the error correctly (not sure), it looks like the ADMA
descriptor was at 0x41ab55200. The fault was also for 0x41ab55200.

Decoding the flags:
2.5.3 IO_PAGE_FAULT Event
* PE (peripheral did not have the permissions required to perform the
transaction)
* PR (transaction was to a page marked as present (including V=1b in
DTE) or interrupt marked as remapped (RemapEn=1)).

So it looks like the eMMC controller had a page fault trying to read
the DMA descriptor. Is this a known issue? Does anyone have any
suggestions?

I little background:
We recently cherry-picked `be62dbf554c5 iommu/amd: Convert AMD iommu
driver to the dma-iommu api` into the chromiumos 5.4 tree:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2169153
This caused the eMMC page faults to start happening on boot. When we
reverted the patch from our tree the problems went away.

When I tried on ToT, I saw the same errors. So I reverted be62dbf554c5
from my ToT branch expecting the errors to go away, but they remained.
So this makes me wonder if we are missing an mmc patch or the iommu
wasn't working correctly on 5.4 before we landed be62dbf554c5. I
looked through the recent mmc patches, but didn't see anything that
stood out.

This is the driver we are using:
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/third_party/kernel/v5.4/drivers/mmc/host/sdhci-acpi.c;l=607

Any suggestions would be appreciated.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-06 21:00 Problems using AMD eMMC with AMD IOMMU Raul Rangel
@ 2020-05-08 15:22 ` Joerg Roedel
  2020-05-08 15:46   ` Raul Rangel
  0 siblings, 1 reply; 8+ messages in thread
From: Joerg Roedel @ 2020-05-08 15:22 UTC (permalink / raw)
  To: Raul Rangel; +Cc: linux-mmc, murphyt7, dianders, evgreen, Daniel Kurtz

On Wed, May 06, 2020 at 03:00:21PM -0600, Raul Rangel wrote:
> Any suggestions would be appreciated.

Two possibilities:

	1) Please try the 5 patches from this branch:
	   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=iommu/fixes

	   The fix some race-condition issues, I think it is unlikely
	   you hit them, but it is worth a test.

	2) Dma-iommu code does sg-merging, which the previous DMA-API
	   implementation did not. Can you try attached diff from Robin
	   Murphy to disable sg-merging? It that helps it is an issue in
	   the eMMC driver.

Regards,

	Joerg

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index a2e96a5fd9a7..a6b71bad518e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
  		 * - but doesn't fall at a segment boundary
  		 * - and wouldn't make the resulting output segment too long
  		 */
-		if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
+		if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
  		    (max_len - cur_len >= s_length)) {
  			/* ...then concatenate it with the previous one */
  			cur_len += s_length;
@@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
  		if (s_length + s_iova_off < s_iova_len)
  			cur_len = 0;
  	}
+	WARN_ON(count < nents);
  	return count;
  }

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-08 15:22 ` Joerg Roedel
@ 2020-05-08 15:46   ` Raul Rangel
  2020-05-08 16:59     ` Raul Rangel
  0 siblings, 1 reply; 8+ messages in thread
From: Raul Rangel @ 2020-05-08 15:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-mmc, Tom Murphy, Douglas Anderson, evgreen, Daniel Kurtz

Thanks for the suggestions. I will try them out. I did some more
digging yesterday and it looks like the device isn't getting
associated with the IOMMU. I added some extra logging.

[   11.670073] AMD-Vi: Using IVHD type 0x40
[   11.674766] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[   11.682232] AMD-Vi:        mmio-addr: 00000000fd200000
[   11.687952] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:01.0 flags: 00
[   11.695219] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.6
[   11.701267] AMD-Vi:   DEV_ALIAS_RANGE                 devid:
ff:00.0 flags: 00 devid_to: 00:14.4
[   11.709699] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.7
[   11.715224] AMD-Vi:   DEV_SPECIAL(HPET[0])           devid: 00:14.0
[   11.721327] AMD-Vi:   DEV_SPECIAL(IOAPIC[33])                devid: 00:14.0
[   11.727720] AMD-Vi:   DEV_ACPI_HID(AMDI0040[])               devid:
00:13.1 <- You can see the ACPI ID in the IVRS table.
[   11.734211] AMD-Vi: ivrs, add hid:AMDI0040, uid:, rdevid:152
[   11.913253] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[   11.920969] pci 0000:00:01.0: AMD-Vi: Init IOMMU for device
[   11.927224] pci 0000:00:01.0: Adding to iommu group 0
[   11.932910] pci 0000:00:01.0: AMD-Vi: IOMMU set to DMA
[   11.938635] pci 0000:00:01.2: AMD-Vi: Init IOMMU for device
[   11.944858] pci 0000:00:01.2: Adding to iommu group 1
[   11.950527] pci 0000:00:01.2: AMD-Vi: IOMMU set to DMA
[   11.956257] pci 0000:00:01.3: AMD-Vi: Init IOMMU for device
... More PCI
[   12.244008] pci 0000:02:00.0: Adding to iommu group 10
[   12.249784] pci 0000:02:00.0: AMD-Vi: IOMMU set to DMA
[   12.255514] pci 0000:03:00.0: AMD-Vi: Init IOMMU for device
[   12.261736] pci 0000:03:00.0: Adding to iommu group 11
[   12.267519] pci 0000:03:00.0: AMD-Vi: IOMMU set to DMA
[   12.273261] pci 0000:04:00.0: AMD-Vi: Init IOMMU for device
[   12.279568] pci 0000:04:00.0: Adding to iommu group 12
[   12.285427] pci 0000:04:00.0: Using iommu direct mapping
[   12.291336] pci 0000:04:00.0: AMD-Vi: IOMMU set to passthrough
[   12.297833] pci 0000:04:00.1: AMD-Vi: Init IOMMU for device
[   12.304087] pci 0000:04:00.1: Adding to iommu group 13
[   12.309852] pci 0000:04:00.1: AMD-Vi: IOMMU set to DMA
[   12.315576] pci 0000:04:00.2: AMD-Vi: Init IOMMU for device
... Even More PCI
[   12.415860] pci 0000:04:00.7: AMD-Vi: IOMMU set to DMA
[   12.421586] pci 0000:05:00.0: AMD-Vi: Init IOMMU for device
[   12.427815] pci 0000:05:00.0: Adding to iommu group 14
[   12.433589] pci 0000:05:00.0: AMD-Vi: IOMMU set to DMA
[   12.439583] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[   12.445687] pci 0000:00:00.2: AMD-Vi: Extended features (0x4f77ef22294ada):
[   12.453433]  PPR NX GT IA GA PC GA_vAPIC
[   12.457898] AMD-Vi: Lazy IO/TLB flushing enabled
[   12.464067] amd_uncore: AMD NB counters detected
[   12.469208] amd_uncore: AMD LLC counters detected
[   12.474453] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

Never saw the ACPI device listed.

Listing sysfs I don't see iommu either:

$ find /sys/ -iname iommu
/sys/kernel/debug/tracing/events/iommu
/sys/kernel/debug/iommu
/sys/class/iommu
/sys/devices/pci0000:00/0000:00:08.0/iommu
/sys/devices/pci0000:00/0000:00:18.3/iommu
/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/iommu
/sys/devices/pci0000:00/0000:00:01.2/iommu
/sys/devices/pci0000:00/0000:00:18.1/iommu
/sys/devices/pci0000:00/0000:00:01.0/iommu
/sys/devices/pci0000:00/0000:00:14.3/iommu
/sys/devices/pci0000:00/0000:00:01.7/0000:03:00.0/iommu
/sys/devices/pci0000:00/0000:00:01.7/iommu
/sys/devices/pci0000:00/0000:00:00.2/iommu
/sys/devices/pci0000:00/0000:00:18.6/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.7/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.5/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.3/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.1/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.6/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.4/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.2/iommu
/sys/devices/pci0000:00/0000:00:08.1/iommu
/sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0/iommu
/sys/devices/pci0000:00/0000:00:18.4/iommu
/sys/devices/pci0000:00/0000:00:01.3/0000:02:00.0/iommu
/sys/devices/pci0000:00/0000:00:01.3/iommu
/sys/devices/pci0000:00/0000:00:18.2/iommu
/sys/devices/pci0000:00/0000:00:18.0/iommu
/sys/devices/pci0000:00/0000:00:18.7/iommu
/sys/devices/pci0000:00/0000:00:14.0/iommu
/sys/devices/pci0000:00/0000:00:08.2/0000:05:00.0/iommu
/sys/devices/pci0000:00/0000:00:08.2/iommu
/sys/devices/pci0000:00/0000:00:18.5/iommu

I'm going to do a bit more tracing.

I'll post back with what I find and I'll also test the patches you provided.

Thanks

On Fri, May 8, 2020 at 9:22 AM Joerg Roedel <jroedel@suse.de> wrote:
>
> On Wed, May 06, 2020 at 03:00:21PM -0600, Raul Rangel wrote:
> > Any suggestions would be appreciated.
>
> Two possibilities:
>
>         1) Please try the 5 patches from this branch:
>            https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=iommu/fixes
>
>            The fix some race-condition issues, I think it is unlikely
>            you hit them, but it is worth a test.
>
>         2) Dma-iommu code does sg-merging, which the previous DMA-API
>            implementation did not. Can you try attached diff from Robin
>            Murphy to disable sg-merging? It that helps it is an issue in
>            the eMMC driver.
>
> Regards,
>
>         Joerg
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index a2e96a5fd9a7..a6b71bad518e 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct
> scatterlist *sg, int nents,
>                  * - but doesn't fall at a segment boundary
>                  * - and wouldn't make the resulting output segment too long
>                  */
> -               if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> +               if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
>                     (max_len - cur_len >= s_length)) {
>                         /* ...then concatenate it with the previous one */
>                         cur_len += s_length;
> @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct
> scatterlist *sg, int nents,
>                 if (s_length + s_iova_off < s_iova_len)
>                         cur_len = 0;
>         }
> +       WARN_ON(count < nents);
>         return count;
>   }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-08 15:46   ` Raul Rangel
@ 2020-05-08 16:59     ` Raul Rangel
  2020-05-08 18:53       ` Andy Shevchenko
  2020-05-08 21:09       ` Raul Rangel
  0 siblings, 2 replies; 8+ messages in thread
From: Raul Rangel @ 2020-05-08 16:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-mmc, Tom Murphy, Douglas Anderson, evgreen, Daniel Kurtz,
	andriy.shevchenko

It looks like the ACPI matching logic was changed.

With ToT we get:
```
[   13.099631] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
Checking ACPI HID Map.
[   13.108542] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
p->hid: AMDI0040, p->uid: .
[   13.117936] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id: No match!
```

If I revert [iommu/amd: Switch to use
acpi_dev_hid_uid_match()](https://github.com/torvalds/linux/commit/ae5e6c6439c3d0ac8e9c71523790ba1ff6887894)
the matching works.

```
[   13.275305] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
Checking ACPI HID Map.
[   13.284216] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
p->hid: AMDI0040, p->uid: .
[   13.293610] platform AMDI0040:00: AMD-Vi: match_hid_uid: dev->hid:
AMDI0040, dev->uid: 0.
[   13.302715] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id: Found match!
```

We can see the device is now associated with a group:

```
# find /sys/ -iname iommu
/sys/devices/platform/AMDI0040:00/iommu
```

```
# cat /sys/devices/platform/AMDI0040\:00/iommu_group/reserved_regions
0x00000000fee00000 0x00000000feefffff msi
0x000000fd00000000 0x000000ffffffffff reserved
```

It looks like we now correctly call `iommu_dma_alloc`: https://0paste.com/66855

Unfortunately I still get the DMA errors:
```
[   16.051621] mmc1: ADMA error: 0x02000000
[   16.055632] mmc1: error -5 whilst initialising MMC card
```

```
sdhci_data_irq() {
  sdhci_adma_show_error() {
    sdhci_dumpregs() {
      /* mmc1: sdhci: ============ SDHCI REGISTER DUMP =========== */
      /* mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002 */
      /* mmc1: sdhci: Blk size:  0x00007200 | Blk cnt:  0x00000001 */
      /* mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000013 */
      /* mmc1: sdhci: Present:   0xf1ff0000 | Host ctl: 0x00000019 */
      /* mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000 */
      /* mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa07 */
      /* mmc1: sdhci: Timeout:   0x0000000a | Int stat: 0x00000000 */
      /* mmc1: sdhci: Int enab:  0x03ff000b | Sig enab: 0x03ff000b */
      /* mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 */
      /* mmc1: sdhci: Caps:      0x71fec8b2 | Caps_1:   0x00000577 */
      /* mmc1: sdhci: Cmd:       0x0000083a | Max curr: 0x00c80064 */
      /* mmc1: sdhci: Resp[0]:   0x00000700 | Resp[1]:  0xffffffff */
      /* mmc1: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d00f00 */
      /* mmc1: sdhci: Host ctl2: 0x00000000 */
      /* mmc1: sdhci: ADMA Err:  0x00000001 | ADMA Ptr: 0xfffffffffffff200 */
      /* mmc1: sdhci: ============================================ */
    }
    /* mmc1: sdhci: fffffffffffff200: DMA 0xffffffffffffee00, LEN
0x0200, Attr=0x21 */
    /* mmc1: sdhci: fffffffffffff20c: DMA 0x0000000000000000, LEN
0x0000, Attr=0x03 */
  }
```

ADMA Err 0x1 = Error Fetching descriptor.

The 0xfffffffffffff200 page is only mapped once, so I don't think it's
a race condition. It's also not complaining about the data transfer,
so don't think it's SG related. Any other tips?

Thanks,
Raul

On Fri, May 8, 2020 at 9:46 AM Raul Rangel <rrangel@chromium.org> wrote:
>
> Thanks for the suggestions. I will try them out. I did some more
> digging yesterday and it looks like the device isn't getting
> associated with the IOMMU. I added some extra logging.
>
> [   11.670073] AMD-Vi: Using IVHD type 0x40
> [   11.674766] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
> [   11.682232] AMD-Vi:        mmio-addr: 00000000fd200000
> [   11.687952] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:01.0 flags: 00
> [   11.695219] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.6
> [   11.701267] AMD-Vi:   DEV_ALIAS_RANGE                 devid:
> ff:00.0 flags: 00 devid_to: 00:14.4
> [   11.709699] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.7
> [   11.715224] AMD-Vi:   DEV_SPECIAL(HPET[0])           devid: 00:14.0
> [   11.721327] AMD-Vi:   DEV_SPECIAL(IOAPIC[33])                devid: 00:14.0
> [   11.727720] AMD-Vi:   DEV_ACPI_HID(AMDI0040[])               devid:
> 00:13.1 <- You can see the ACPI ID in the IVRS table.
> [   11.734211] AMD-Vi: ivrs, add hid:AMDI0040, uid:, rdevid:152
> [   11.913253] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
> [   11.920969] pci 0000:00:01.0: AMD-Vi: Init IOMMU for device
> [   11.927224] pci 0000:00:01.0: Adding to iommu group 0
> [   11.932910] pci 0000:00:01.0: AMD-Vi: IOMMU set to DMA
> [   11.938635] pci 0000:00:01.2: AMD-Vi: Init IOMMU for device
> [   11.944858] pci 0000:00:01.2: Adding to iommu group 1
> [   11.950527] pci 0000:00:01.2: AMD-Vi: IOMMU set to DMA
> [   11.956257] pci 0000:00:01.3: AMD-Vi: Init IOMMU for device
> ... More PCI
> [   12.244008] pci 0000:02:00.0: Adding to iommu group 10
> [   12.249784] pci 0000:02:00.0: AMD-Vi: IOMMU set to DMA
> [   12.255514] pci 0000:03:00.0: AMD-Vi: Init IOMMU for device
> [   12.261736] pci 0000:03:00.0: Adding to iommu group 11
> [   12.267519] pci 0000:03:00.0: AMD-Vi: IOMMU set to DMA
> [   12.273261] pci 0000:04:00.0: AMD-Vi: Init IOMMU for device
> [   12.279568] pci 0000:04:00.0: Adding to iommu group 12
> [   12.285427] pci 0000:04:00.0: Using iommu direct mapping
> [   12.291336] pci 0000:04:00.0: AMD-Vi: IOMMU set to passthrough
> [   12.297833] pci 0000:04:00.1: AMD-Vi: Init IOMMU for device
> [   12.304087] pci 0000:04:00.1: Adding to iommu group 13
> [   12.309852] pci 0000:04:00.1: AMD-Vi: IOMMU set to DMA
> [   12.315576] pci 0000:04:00.2: AMD-Vi: Init IOMMU for device
> ... Even More PCI
> [   12.415860] pci 0000:04:00.7: AMD-Vi: IOMMU set to DMA
> [   12.421586] pci 0000:05:00.0: AMD-Vi: Init IOMMU for device
> [   12.427815] pci 0000:05:00.0: Adding to iommu group 14
> [   12.433589] pci 0000:05:00.0: AMD-Vi: IOMMU set to DMA
> [   12.439583] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
> [   12.445687] pci 0000:00:00.2: AMD-Vi: Extended features (0x4f77ef22294ada):
> [   12.453433]  PPR NX GT IA GA PC GA_vAPIC
> [   12.457898] AMD-Vi: Lazy IO/TLB flushing enabled
> [   12.464067] amd_uncore: AMD NB counters detected
> [   12.469208] amd_uncore: AMD LLC counters detected
> [   12.474453] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
>
> Never saw the ACPI device listed.
>
> Listing sysfs I don't see iommu either:
>
> $ find /sys/ -iname iommu
> /sys/kernel/debug/tracing/events/iommu
> /sys/kernel/debug/iommu
> /sys/class/iommu
> /sys/devices/pci0000:00/0000:00:08.0/iommu
> /sys/devices/pci0000:00/0000:00:18.3/iommu
> /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/iommu
> /sys/devices/pci0000:00/0000:00:01.2/iommu
> /sys/devices/pci0000:00/0000:00:18.1/iommu
> /sys/devices/pci0000:00/0000:00:01.0/iommu
> /sys/devices/pci0000:00/0000:00:14.3/iommu
> /sys/devices/pci0000:00/0000:00:01.7/0000:03:00.0/iommu
> /sys/devices/pci0000:00/0000:00:01.7/iommu
> /sys/devices/pci0000:00/0000:00:00.2/iommu
> /sys/devices/pci0000:00/0000:00:18.6/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.7/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.5/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.3/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.1/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.6/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.4/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.2/iommu
> /sys/devices/pci0000:00/0000:00:08.1/iommu
> /sys/devices/pci0000:00/0000:00:08.1/0000:04:00.0/iommu
> /sys/devices/pci0000:00/0000:00:18.4/iommu
> /sys/devices/pci0000:00/0000:00:01.3/0000:02:00.0/iommu
> /sys/devices/pci0000:00/0000:00:01.3/iommu
> /sys/devices/pci0000:00/0000:00:18.2/iommu
> /sys/devices/pci0000:00/0000:00:18.0/iommu
> /sys/devices/pci0000:00/0000:00:18.7/iommu
> /sys/devices/pci0000:00/0000:00:14.0/iommu
> /sys/devices/pci0000:00/0000:00:08.2/0000:05:00.0/iommu
> /sys/devices/pci0000:00/0000:00:08.2/iommu
> /sys/devices/pci0000:00/0000:00:18.5/iommu
>
> I'm going to do a bit more tracing.
>
> I'll post back with what I find and I'll also test the patches you provided.
>
> Thanks
>
> On Fri, May 8, 2020 at 9:22 AM Joerg Roedel <jroedel@suse.de> wrote:
> >
> > On Wed, May 06, 2020 at 03:00:21PM -0600, Raul Rangel wrote:
> > > Any suggestions would be appreciated.
> >
> > Two possibilities:
> >
> >         1) Please try the 5 patches from this branch:
> >            https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=iommu/fixes
> >
> >            The fix some race-condition issues, I think it is unlikely
> >            you hit them, but it is worth a test.
> >
> >         2) Dma-iommu code does sg-merging, which the previous DMA-API
> >            implementation did not. Can you try attached diff from Robin
> >            Murphy to disable sg-merging? It that helps it is an issue in
> >            the eMMC driver.
> >
> > Regards,
> >
> >         Joerg
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index a2e96a5fd9a7..a6b71bad518e 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct
> > scatterlist *sg, int nents,
> >                  * - but doesn't fall at a segment boundary
> >                  * - and wouldn't make the resulting output segment too long
> >                  */
> > -               if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> > +               if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> >                     (max_len - cur_len >= s_length)) {
> >                         /* ...then concatenate it with the previous one */
> >                         cur_len += s_length;
> > @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct
> > scatterlist *sg, int nents,
> >                 if (s_length + s_iova_off < s_iova_len)
> >                         cur_len = 0;
> >         }
> > +       WARN_ON(count < nents);
> >         return count;
> >   }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-08 16:59     ` Raul Rangel
@ 2020-05-08 18:53       ` Andy Shevchenko
  2020-05-08 20:31         ` Raul Rangel
  2020-05-08 21:09       ` Raul Rangel
  1 sibling, 1 reply; 8+ messages in thread
From: Andy Shevchenko @ 2020-05-08 18:53 UTC (permalink / raw)
  To: Raul Rangel
  Cc: Joerg Roedel, linux-mmc, Tom Murphy, Douglas Anderson, evgreen,
	Daniel Kurtz

On Fri, May 08, 2020 at 10:59:37AM -0600, Raul Rangel wrote:
> It looks like the ACPI matching logic was changed.
> 
> With ToT we get:
> ```
> [   13.099631] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
> Checking ACPI HID Map.
> [   13.108542] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
> p->hid: AMDI0040, p->uid: .
> [   13.117936] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id: No match!
> ```
> 
> If I revert [iommu/amd: Switch to use
> acpi_dev_hid_uid_match()](https://github.com/torvalds/linux/commit/ae5e6c6439c3d0ac8e9c71523790ba1ff6887894)
> the matching works.

Thank you for report.

It seems the _UID() in case of matching is present and not empty ("0" vs. "").
The question is why we have no UID is not there?

I will look at it later.

The quick fix looks like

-		if (acpi_dev_hid_uid_match(adev, p->hid, p->uid)) {
+		if (acpi_dev_hid_uid_match(adev, p->hid, *p->uid ? p->uid : NULL)) {

Can you test it?

> ```
> [   13.275305] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
> Checking ACPI HID Map.
> [   13.284216] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id:
> p->hid: AMDI0040, p->uid: .
> [   13.293610] platform AMDI0040:00: AMD-Vi: match_hid_uid: dev->hid:
> AMDI0040, dev->uid: 0.
> [   13.302715] platform AMDI0040:00: AMD-Vi: get_acpihid_device_id: Found match!
> ```
> 
> We can see the device is now associated with a group:
> 
> ```
> # find /sys/ -iname iommu
> /sys/devices/platform/AMDI0040:00/iommu
> ```
> 
> ```
> # cat /sys/devices/platform/AMDI0040\:00/iommu_group/reserved_regions
> 0x00000000fee00000 0x00000000feefffff msi
> 0x000000fd00000000 0x000000ffffffffff reserved
> ```

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-08 18:53       ` Andy Shevchenko
@ 2020-05-08 20:31         ` Raul Rangel
  0 siblings, 0 replies; 8+ messages in thread
From: Raul Rangel @ 2020-05-08 20:31 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Joerg Roedel, linux-mmc, Tom Murphy, Douglas Anderson, evgreen,
	Daniel Kurtz

> It seems the _UID() in case of matching is present and not empty ("0" vs. "").
> The question is why we have no UID is not there?

Looking at the IOMMU IVRS spec:
```
Unique ID Format
0= UID not present
1= UID is an integer
2= UID is a character string
```

So it's optional. I added a bit of logging here:
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/third_party/kernel/v5.4/drivers/iommu/amd_iommu_init.c;l=1353

```
[   11.730159] AMD-Vi: UID is not present.
[   11.734422] AMD-Vi:   DEV_ACPI_HID(AMDI0040[])               devid: 00:13.1
[   11.740913] AMD-Vi: ivrs, add hid:AMDI0040, uid:, rdevid:152
```

> The quick fix looks like
>
> -               if (acpi_dev_hid_uid_match(adev, p->hid, p->uid)) {
> +               if (acpi_dev_hid_uid_match(adev, p->hid, *p->uid ? p->uid : NULL)) {
>
> Can you test it?
>

Your patch works and fixes the matching. Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-08 16:59     ` Raul Rangel
  2020-05-08 18:53       ` Andy Shevchenko
@ 2020-05-08 21:09       ` Raul Rangel
  2020-05-08 21:44         ` Raul Rangel
  1 sibling, 1 reply; 8+ messages in thread
From: Raul Rangel @ 2020-05-08 21:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-mmc, Tom Murphy, Douglas Anderson, evgreen, Daniel Kurtz,
	Andy Shevchenko

> > > Two possibilities:
> > >
> > >         1) Please try the 5 patches from this branch:
> > >            https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=iommu/fixes
> > >
> > >            The fix some race-condition issues, I think it is unlikely
> > >            you hit them, but it is worth a test.
> > >
> > >         2) Dma-iommu code does sg-merging, which the previous DMA-API
> > >            implementation did not. Can you try attached diff from Robin
> > >            Murphy to disable sg-merging? It that helps it is an issue in
> > >            the eMMC driver.
> > >
> > > Regards,
> > >
> > >         Joerg
> > >
> > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > > index a2e96a5fd9a7..a6b71bad518e 100644
> > > --- a/drivers/iommu/dma-iommu.c
> > > +++ b/drivers/iommu/dma-iommu.c
> > > @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct
> > > scatterlist *sg, int nents,
> > >                  * - but doesn't fall at a segment boundary
> > >                  * - and wouldn't make the resulting output segment too long
> > >                  */
> > > -               if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> > > +               if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> > >                     (max_len - cur_len >= s_length)) {
> > >                         /* ...then concatenate it with the previous one */
> > >                         cur_len += s_length;
> > > @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct
> > > scatterlist *sg, int nents,
> > >                 if (s_length + s_iova_off < s_iova_len)
> > >                         cur_len = 0;
> > >         }
> > > +       WARN_ON(count < nents);
> > >         return count;
> > >   }

So I tried the 5 patches you listed in 1, and the patch in 2. It still
results in the same failure.

Here is a cleaner link to the trace: https://0paste.com/66912

It's pretty straight forward:

 0)               |        /* iommu_dma_alloc: sdhci-acpi AMDI0040:00:
size: 3596 */
 0)               |                              /* iommu_dma_map_sg:
sdhci-acpi AMDI0040:00: nents: 1 */
 0)               |                            /* sdhci_set_adma_addr:
dma_addr: 0xfffffffffffff200 */
 0)               |      sdhci_adma_show_error() {


I'm going to try reverting be62dbf554c5 now that we know why the ACPI
matching wasn't working.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Problems using AMD eMMC with AMD IOMMU
  2020-05-08 21:09       ` Raul Rangel
@ 2020-05-08 21:44         ` Raul Rangel
  0 siblings, 0 replies; 8+ messages in thread
From: Raul Rangel @ 2020-05-08 21:44 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: linux-mmc, Tom Murphy, Douglas Anderson, evgreen, Daniel Kurtz,
	Andy Shevchenko

So with the following CLs + the fix for the ACPI name matching:
f9d9f441a7a Revert "iommu/amd: Convert AMD iommu driver to the dma-iommu api"
f3ed2aa65e67 Revert "iommu: amd: Use generic_iommu_put_resv_regions()"
47cf1b422e60 (origin/master) Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Everything works correctly.

Here is the trace for the working scenario: https://0paste.com/66921

The big difference is the 32 bit address vs the 64 bit address.

Without be62dbf554c5 we get the following: adma_addr:
0x00000000fffff200. While with it we get 0xfffffffffffff200

I tried adding SDHCI_QUIRK2_BROKEN_64_BIT_DMA to
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/master:src/third_party/kernel/v5.4/drivers/mmc/host/sdhci-acpi.c;l=607
and it worked!

So it looks like be62dbf554c5 made it so it starts honoring
`dma_set_mask_and_coherent`. This exposed a hardware quirk that was
missing.

I'll get some patches pushed up.

On Fri, May 8, 2020 at 3:09 PM Raul Rangel <rrangel@chromium.org> wrote:
>
> > > > Two possibilities:
> > > >
> > > >         1) Please try the 5 patches from this branch:
> > > >            https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/log/?h=iommu/fixes
> > > >
> > > >            The fix some race-condition issues, I think it is unlikely
> > > >            you hit them, but it is worth a test.
> > > >
> > > >         2) Dma-iommu code does sg-merging, which the previous DMA-API
> > > >            implementation did not. Can you try attached diff from Robin
> > > >            Murphy to disable sg-merging? It that helps it is an issue in
> > > >            the eMMC driver.
> > > >
> > > > Regards,
> > > >
> > > >         Joerg
> > > >
> > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > > > index a2e96a5fd9a7..a6b71bad518e 100644
> > > > --- a/drivers/iommu/dma-iommu.c
> > > > +++ b/drivers/iommu/dma-iommu.c
> > > > @@ -779,7 +779,7 @@ static int __finalise_sg(struct device *dev, struct
> > > > scatterlist *sg, int nents,
> > > >                  * - but doesn't fall at a segment boundary
> > > >                  * - and wouldn't make the resulting output segment too long
> > > >                  */
> > > > -               if (cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> > > > +               if (0 && cur_len && !s_iova_off && (dma_addr & seg_mask) &&
> > > >                     (max_len - cur_len >= s_length)) {
> > > >                         /* ...then concatenate it with the previous one */
> > > >                         cur_len += s_length;
> > > > @@ -799,6 +799,7 @@ static int __finalise_sg(struct device *dev, struct
> > > > scatterlist *sg, int nents,
> > > >                 if (s_length + s_iova_off < s_iova_len)
> > > >                         cur_len = 0;
> > > >         }
> > > > +       WARN_ON(count < nents);
> > > >         return count;
> > > >   }
>
> So I tried the 5 patches you listed in 1, and the patch in 2. It still
> results in the same failure.
>
> Here is a cleaner link to the trace: https://0paste.com/66912
>
> It's pretty straight forward:
>
>  0)               |        /* iommu_dma_alloc: sdhci-acpi AMDI0040:00:
> size: 3596 */
>  0)               |                              /* iommu_dma_map_sg:
> sdhci-acpi AMDI0040:00: nents: 1 */
>  0)               |                            /* sdhci_set_adma_addr:
> dma_addr: 0xfffffffffffff200 */
>  0)               |      sdhci_adma_show_error() {
>
>
> I'm going to try reverting be62dbf554c5 now that we know why the ACPI
> matching wasn't working.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-05-08 21:45 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-06 21:00 Problems using AMD eMMC with AMD IOMMU Raul Rangel
2020-05-08 15:22 ` Joerg Roedel
2020-05-08 15:46   ` Raul Rangel
2020-05-08 16:59     ` Raul Rangel
2020-05-08 18:53       ` Andy Shevchenko
2020-05-08 20:31         ` Raul Rangel
2020-05-08 21:09       ` Raul Rangel
2020-05-08 21:44         ` Raul Rangel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.