AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]

* AMD-Vi: Event logged [IO_PAGE_FAULT device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]
@ 2020-12-03  6:18 Marc Smith
  2020-12-04 15:10 ` Marc Smith
  0 siblings, 1 reply; 2+ messages in thread
From: Marc Smith @ 2020-12-03  6:18 UTC (permalink / raw)
  To: iommu

Hi,

First, I must preface this email by apologizing in advance for asking
about a distro kernel (RHEL in this case); so not truly reporting this
problem and requesting a fix here (I know this should be taken up with
the vendor), rather hoping someone can give me a few hints/pointers on
where to look next for debugging this issue.

I'm using RHEL 7.8.2003 (CentOS) with a 3.10.0-1127.18.2.el7 kernel.
The systems use a Supermicro H12SSW-NT board (AMD), and we have the
IOMMU enabled along with SR-IOV. I have several virtual machines (QEMU
KVM) that run on these servers, and I'm passing PCIe end-points into
the VMs (in some cases the whole PCIe EP itself, and for some devices
I use SR-IOV and pass in the VFs to the VMs). The VM's run Linux as
their guest OS (a couple different distros).

While the servers (VMs) are idle, I don't experience any problems. But
when I start doing a lot of I/O in the virtual machines (iSCSI across
Ethernet interfaces, disk I/O via SAS HBAs that are passed into the
VM, etc.) I notice the following after some time at the host layer
("hypervisor"):
Nov 29 10:50:00 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=42:00.0 domain=0x005e address=0xfffffffdf8030000 flags=0x0008]
Nov 29 22:02:03 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=c8:02.1 domain=0x005f address=0xfffffffdf8060000 flags=0x0008]
Nov 30 02:13:54 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=42:00.0 domain=0x005e address=0xfffffffdf8020000 flags=0x0008]
Nov 30 02:28:44 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=c8:02.0 domain=0x005e address=0xfffffffdf8020000 flags=0x0008]
Nov 30 10:48:53 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=01:00.0 domain=0x005e address=0xfffffffdf8040000 flags=0x0008]
Dec  2 07:05:22 node1 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT
device=c8:03.0 domain=0x005e address=0xfffffffdf8010000 flags=0x0008]

These events happen to all PCIe devices that are passed into the VMs,
although not all at once... as you can see on the timestamps above,
they are not very frequent when under heavy load (in the log snippet
above, the system was doing a big workload over several days). For the
Ethernet devices that are passed into the VMs, I noticed that they
experience transmit hangs / resets in the virtual machines, and when
these occur, they correspond to a matching IO_PAGE_FAULT that belongs
to that PCI device.

FWIW, those NIC hangs look like this (visible in the VM guest OS):
[17879.279091] NETDEV WATCHDOG: s1p1 (bnxt_en): transmit queue 2 timed out
[17879.279111] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:447
dev_watchdog+0x121/0x17e
...
[17879.279213] bnxt_en 0000:01:09.0 s1p1: TX timeout detected,
starting reset task!
[17883.075299] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
[17883.075302] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 1
failed. rc:fffffff0 err:0
[17886.957100] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
[17886.957103] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2
failed. rc:fffffff0 err:0
[17890.843023] bnxt_en 0000:01:09.0 s1p1: Resp cmpl intr err msg: 0x51
[17890.843025] bnxt_en 0000:01:09.0 s1p1: hwrm_ring_free type 2
failed. rc:fffffff0 err:0

We see these NIC hangs in the VMs occur with both Broadcom and
Mellanox Ethernet adapters that are passed into the VMs, so I don't
think it's the NICs causing the IO_PAGE_FAULT events observed in the
hypervisor. Plus we see IO_PAGE_FAULT's for devices other than
Ethernet adapters.

I have several of these same servers (all using the same motherboard,
processor, memory, BIOS, etc.) and they all experience this behavior
with the IO_PAGE_FAULT events, so I don't believe it to be any one
faulty server / component. I guess my question is I'm not sure where
to dig/push next. Is this perhaps an issue with the BIOS/firmware on
these motherboards? Something with the chipset (AMD IOMMU)? A
colleague has suggested that even the AGESA may be related. Or should
I be focusing on the Linux kernel, the AMD IOMMU driver (software)?

I've been poking around other similar bug reports, and I see the
IO_PAGE_FAULT and NIC reset / transmit hang seem to be related in
other posts. This commit looked promising:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e50ce03976fbc8ae995a000c4b10c737467beaa

But I see RH has already back-ported it into their
3.10.0-1127.18.2.el7 kernel source. I'm open to trying a newer Linux
vanilla kernel (eg, 5.4.x) but would prefer to resolve this in the
RHEL kernel I'm using now. I'll take a look at this next, although due
to the complex nature of this hypervisor/VM setup, it's a bit tedious
to test.

Kernel messages from boot (using the amd_iommu_dump=1 parameter):
...
[    0.214395] AMD-Vi: Using IVHD type 0x11
[    0.214627] AMD-Vi: device: c0:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[    0.214628] AMD-Vi:        mmio-addr: 00000000f3700000
[    0.214634] AMD-Vi:   DEV_SELECT_RANGE_START  devid: c0:01.0 flags: 00
[    0.214635] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.6
[    0.214763] AMD-Vi:   DEV_SPECIAL(IOAPIC[241])               devid: c0:00.1
[    0.214765] AMD-Vi: device: 80:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[    0.214766] AMD-Vi:        mmio-addr: 00000000f2600000
[    0.214771] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 80:01.0 flags: 00
[    0.214772] AMD-Vi:   DEV_RANGE_END           devid: bf:1f.6
[    0.214900] AMD-Vi:   DEV_SPECIAL(IOAPIC[242])               devid: 80:00.1
[    0.214901] AMD-Vi: device: 40:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[    0.214902] AMD-Vi:        mmio-addr: 00000000b4800000
[    0.214906] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 40:01.0 flags: 00
[    0.214907] AMD-Vi:   DEV_RANGE_END           devid: 7f:1f.6
[    0.215036] AMD-Vi:   DEV_SPECIAL(IOAPIC[243])               devid: 40:00.1
[    0.215037] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000
[    0.215038] AMD-Vi:        mmio-addr: 00000000fc800000
[    0.215044] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:01.0 flags: 00
[    0.215045] AMD-Vi:   DEV_RANGE_END           devid: 3f:1f.6
[    0.215173] AMD-Vi:   DEV_ALIAS_RANGE                 devid:
ff:00.0 flags: 00 devid_to: 00:14.4
[    0.215174] AMD-Vi:   DEV_RANGE_END           devid: ff:1f.7
[    0.215179] AMD-Vi:   DEV_SPECIAL(HPET[0])           devid: 00:14.0
[    0.215180] AMD-Vi:   DEV_SPECIAL(IOAPIC[240])               devid: 00:14.0
[    0.215181] AMD-Vi:   DEV_SPECIAL(IOAPIC[244])               devid: 00:00.1
...
[    4.345723] AMD-Vi: Found IOMMU at 0000:c0:00.2 cap 0x40
[    4.345724] AMD-Vi: Extended features (0x58f77ef22294ade):
[    4.345724]  PPR X2APIC NX GT IA GA PC GA_vAPIC
[    4.345728] AMD-Vi: Found IOMMU at 0000:80:00.2 cap 0x40
[    4.345729] AMD-Vi: Extended features (0x58f77ef22294ade):
[    4.345729]  PPR X2APIC NX GT IA GA PC GA_vAPIC
[    4.345731] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40
[    4.345732] AMD-Vi: Extended features (0x58f77ef22294ade):
[    4.345733]  PPR X2APIC NX GT IA GA PC GA_vAPIC
[    4.345735] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    4.345735] AMD-Vi: Extended features (0x58f77ef22294ade):
[    4.345736]  PPR X2APIC NX GT IA GA PC GA_vAPIC
[    4.345737] AMD-Vi: Interrupt remapping enabled
[    4.345738] AMD-Vi: virtual APIC enabled
[    4.345739] AMD-Vi: X2APIC enabled
[    4.345805] pci 0000:c0:00.2: irq 26 for MSI/MSI-X
[    4.345947] pci 0000:80:00.2: irq 27 for MSI/MSI-X
[    4.346073] pci 0000:40:00.2: irq 28 for MSI/MSI-X
[    4.346208] pci 0000:00:00.2: irq 29 for MSI/MSI-X
[    4.346305] AMD-Vi: IO/TLB flush on unmap enabled
...

I have also tried using 'amd_iommu=fullflush' (as denoted in the
kernel message above) on a hunch after reviewing other user's posts
with similar IO_PAGE_FAULT events, but this doesn't seem to change
anything -- the events still occur with or without this kernel
parameter.

So, any guidance/tips/advice on how to tackle this would be greatly
appreciated. Thank you for your consideration and time!

--Marc
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 2+ messages in thread