* i915 dma faults on Xen
@ 2020-10-14 19:28 Jason Andryuk
  2020-10-14 19:37 ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Andryuk @ 2020-10-14 19:28 UTC (permalink / raw)
  To: intel-gfx, xen-devel

Hi,

Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576

I'm seeing DMA faults for the i915 graphics hardware on a Dell
Latitude 5500. These were captured when I plugged into a Dell
Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
staging and Linux 5.4.70 (and some earlier versions).

Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 39b5845000, iommu reg = ffff82c00021d000
Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg = ffff82c00021d000
Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set

They repeat. In the log attached to
https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
"Oct 14 18:41:49.056589" and continue until I unplug the dock around
"Oct 14 18:41:54.801802".

I've also seen similar messages when attaching the laptop's HDMI port
to a 4k monitor. The eDP display by itself seems okay.

I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
didn't see any errors.

This is a kernel & Xen log with drm.debug=0x1e. It also includes some
application (glass) logging when it changes resolutions, which seems to
set off the DMA faults.
5500-igfx-messages-kern-xen-glass

Running Xen with iommu=no-igfx disables the iommu for the i915
graphics and no faults are reported. However, that breaks some other
devices (Dell Latitude 7200 and 5580), giving a black screen with:

Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged!

Any suggestions welcome.

Thanks,
Jason

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen 2020-10-14 19:28 i915 dma faults on Xen Jason Andryuk @ 2020-10-14 19:37 ` Andrew Cooper 2020-10-15 11:31 ` Roger Pau Monné 0 siblings, 1 reply; 18+ messages in thread From: Andrew Cooper @ 2020-10-14 19:37 UTC (permalink / raw) To: Jason Andryuk, intel-gfx, xen-devel On 14/10/2020 20:28, Jason Andryuk wrote: > Hi, > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576 > > I'm seeing DMA faults for the i915 graphics hardware on a Dell > Latitude 5500. These were captured when I plugged into a Dell > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4 > staging and Linux 5.4.70 (and some earlier versions). > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg = > ffff82c00021d000 > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > PTE Read access is not set > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg = > ffff82c00021d000 > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > PTE Read access is not set > > They repeat. In the log attached to > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at > "Oct 14 18:41:49.056589" and continue until I unplug the dock around > "Oct 14 18:41:54.801802". > > I've also seen similar messages when attaching the laptop's HDMI port > to a 4k monitor. 
The eDP display by itself seems okay. > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and > didn't see any errors > > This is a kernel & xen log with drm.debug=0x1e. It also includes some > application (glass) logging when it changes resolutions which seems to > set off the DMA faults. 5500-igfx-messages-kern-xen-glass > > Running xen with iommu=no-igfx disables the iommu for the i915 > graphics and no faults are reported. However, that breaks some other > devices (Dell Latitude 7200 and 5580) giving a black screen with: > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed > to idle engines, declaring wedged! > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed > to initialize GPU, declaring it wedged! > > Any suggestions welcome. Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000 in the machine memory map? This smells like a missing RMRR in the ACPI tables. ~Andrew ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen 2020-10-14 19:37 ` Andrew Cooper @ 2020-10-15 11:31 ` Roger Pau Monné 2020-10-15 15:16 ` Jason Andryuk 0 siblings, 1 reply; 18+ messages in thread From: Roger Pau Monné @ 2020-10-15 11:31 UTC (permalink / raw) To: Jason Andryuk; +Cc: Andrew Cooper, intel-gfx, xen-devel On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote: > On 14/10/2020 20:28, Jason Andryuk wrote: > > Hi, > > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576 > > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell > > Latitude 5500. These were captured when I plugged into a Dell > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4 > > staging and Linux 5.4.70 (and some earlier versions). > > > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg = > > ffff82c00021d000 > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > > PTE Read access is not set > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg = > > ffff82c00021d000 > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > > PTE Read access is not set > > > > They repeat. 
In the log attached to > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around > > "Oct 14 18:41:54.801802". > > > > I've also seen similar messages when attaching the laptop's HDMI port > > to a 4k monitor. The eDP display by itself seems okay. > > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and > > didn't see any errors > > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some > > application (glass) logging when it changes resolutions which seems to > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass > > > > Running xen with iommu=no-igfx disables the iommu for the i915 > > graphics and no faults are reported. However, that breaks some other > > devices (Dell Latitude 7200 and 5580) giving a black screen with: > > > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed > > to idle engines, declaring wedged! > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed > > to initialize GPU, declaring it wedged! > > > > Any suggestions welcome. > > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000 > in the machine memory map? > > This smells like a missing RMRR in the ACPI tables. I agree. Can you paste the memory map as printed by Xen when booting, and what command line are you using to boot Xen. Have you tried adding dom0-iommu=map-inclusive to the Xen command line? Roger. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen
  2020-10-15 11:31   ` Roger Pau Monné
@ 2020-10-15 15:16     ` Jason Andryuk
  2020-10-15 16:38       ` Tamas K Lengyel
  2020-10-16 16:23       ` i915 dma faults on Xen Jason Andryuk
  0 siblings, 2 replies; 18+ messages in thread
From: Jason Andryuk @ 2020-10-15 15:16 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, intel-gfx, xen-devel

On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> > On 14/10/2020 20:28, Jason Andryuk wrote:
> > > Hi,
> > >
> > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> > >
> > > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > > Latitude 5500. These were captured when I plugged into a Dell
> > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > > staging and Linux 5.4.70 (and some earlier versions).
> > >
> > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg =
> > > ffff82c00021d000
> > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > PTE Read access is not set
> > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080
> > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg =
> > > ffff82c00021d000
> > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > PTE Read access is not set
> > >
> > > They repeat. In the log attached to
> > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > > "Oct 14 18:41:54.801802".
> > >
> > > I've also seen similar messages when attaching the laptop's HDMI port
> > > to a 4k monitor. The eDP display by itself seems okay.
> > >
> > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > > didn't see any errors
> > >
> > > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > > application (glass) logging when it changes resolutions which seems to
> > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> > >
> > > Running xen with iommu=no-igfx disables the iommu for the i915
> > > graphics and no faults are reported. However, that breaks some other
> > > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> > >
> > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed
> > > to idle engines, declaring wedged!
> > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed
> > > to initialize GPU, declaring it wedged!
> > >
> > > Any suggestions welcome.
> >
> > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000
> > in the machine memory map?

They are bogus?
End of RAM is 0x47c800000
That's:
0x047c800000
vs.
0x39b5845000
0x4238d0a000

> > This smells like a missing RMRR in the ACPI tables.
>
> I agree.
>
> Can you paste the memory map as printed by Xen when booting, and what
> command line are you using to boot Xen.

So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen

There's the memory map
(XEN) TBOOT RAM map:
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 000000000009e000 - 000000000009f000 (reserved)
(XEN) 000000000009f000 - 00000000000a0000 (usable)
(XEN) 00000000000a0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040400000 (reserved)
(XEN) 0000000040400000 - 000000007024b000 (usable)
(XEN) 000000007024b000 - 000000007024c000 (ACPI NVS)
(XEN) 000000007024c000 - 000000007024d000 (reserved)
(XEN) 000000007024d000 - 0000000077f19000 (usable)
(XEN) 0000000077f19000 - 0000000078987000 (reserved)
(XEN) 0000000078987000 - 0000000078a04000 (ACPI data)
(XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS)
(XEN) 0000000078ea3000 - 000000007acff000 (reserved)
(XEN) 000000007acff000 - 000000007ad00000 (usable)
(XEN) 000000007ad00000 - 000000007f800000 (reserved)
(XEN) 00000000f0000000 - 00000000f8000000 (reserved)
(XEN) 00000000fe000000 - 00000000fe011000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000047c800000 (usable)
(XEN) EFI memory map:
(XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f
(XEN) 000000009e000-000000009efff type=0 attr=000000000000000f
(XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f
(XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f
(XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f
(XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f
(XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f
(XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f
(XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f
(XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f
(XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f
(XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f
(XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f
(XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f
(XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f
(XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f
(XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f
(XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f
(XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f
(XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f
(XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f
(XEN) 000007022d000-000007024afff type=3 attr=000000000000000f
(XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f
(XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f
(XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f
(XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f
(XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f
(XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f
(XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f
(XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f
(XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f
(XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f
(XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f
(XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f
(XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f
(XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f
(XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f
(XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f
(XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f
(XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f
(XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f
(XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f
(XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f
(XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f
(XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f
(XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f
(XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f
(XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000
(XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f
(XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000
(XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
(XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001
(XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001
(XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001
(XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d

Command line
console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc
com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug
guest_loglvl=debug smt=0 ucode=-1 bootscrub=1
argo=yes,mac-permissive=1 iommu=force,igfx

iommu=force,igfx was to force igfx back on. I added a dmi quirk to
set no-igfx on this platform as a temporary workaround.

> Have you tried adding dom0-iommu=map-inclusive to the Xen command
> line?

I have not. I can try that tomorrow when I have access to the system again.

Thanks,
Jason

^ permalink raw reply [flat|nested] 18+ messages in thread
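The comparison made in the message above (fault addresses versus end of RAM) can be checked mechanically. What follows is only a sketch: the usable ranges are copied by hand from the "TBOOT RAM map" output, the two fault addresses come from the VT-d log lines, and the helper name `in_usable_ram` is made up for illustration rather than taken from Xen or any existing tool.

```python
# Usable RAM ranges copied from the "TBOOT RAM map" lines (start inclusive,
# end exclusive). Reserved / ACPI ranges are deliberately left out.
USABLE_RAM = [
    (0x0000000000000000, 0x0000000000060000),
    (0x0000000000068000, 0x000000000009e000),
    (0x000000000009f000, 0x00000000000a0000),
    (0x0000000000100000, 0x0000000040000000),
    (0x0000000040400000, 0x000000007024b000),
    (0x000000007024d000, 0x0000000077f19000),
    (0x000000007acff000, 0x000000007ad00000),
    (0x0000000100000000, 0x000000047c800000),
]

# Fault addresses reported by "[VT-D]DMAR:[DMA Read] ... fault addr".
FAULT_ADDRS = [0x39b5845000, 0x4238d0a000]


def in_usable_ram(addr):
    """Hypothetical helper: True if addr lies in a usable range of the map."""
    return any(start <= addr < end for start, end in USABLE_RAM)


end_of_ram = max(end for _, end in USABLE_RAM)

for addr in FAULT_ADDRS:
    where = "inside" if in_usable_ram(addr) else "outside"
    print(f"{addr:#014x} is {where} usable RAM (end of RAM {end_of_ram:#014x})")
```

Both addresses land well above the last usable range, which is consistent with the "they are bogus" reading. The script says nothing about where the addresses come from; it only rules out that they point at RAM the map describes.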
* Re: i915 dma faults on Xen 2020-10-15 15:16 ` Jason Andryuk @ 2020-10-15 16:38 ` Tamas K Lengyel 2020-10-15 17:13 ` Jason Andryuk 2020-10-16 16:23 ` i915 dma faults on Xen Jason Andryuk 1 sibling, 1 reply; 18+ messages in thread From: Tamas K Lengyel @ 2020-10-15 16:38 UTC (permalink / raw) To: Jason Andryuk; +Cc: Roger Pau Monné, Andrew Cooper, xen-devel > > Can you paste the memory map as printed by Xen when booting, and what > > command line are you using to boot Xen. > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen Unrelated comment: since tboot now has a PE build (http://hg.code.sf.net/p/tboot/code/rev/5c68f0963a78) I think it would be time for OpenXT to drop the weird efi->xen->tboot->xen flow and just do efi->tboot->xen. Only reason we did efi->xen->tboot was because tboot didn't have a PE build at the time. It's a very hackish solution that's no longer needed. Tamas ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen 2020-10-15 16:38 ` Tamas K Lengyel @ 2020-10-15 17:13 ` Jason Andryuk 2021-02-19 17:33 ` tboot UEFI and Xen (was Re: i915 dma faults on Xen) Jason Andryuk 0 siblings, 1 reply; 18+ messages in thread From: Jason Andryuk @ 2020-10-15 17:13 UTC (permalink / raw) To: Tamas K Lengyel; +Cc: Roger Pau Monné, Andrew Cooper, xen-devel On Thu, Oct 15, 2020 at 12:39 PM Tamas K Lengyel <tamas.k.lengyel@gmail.com> wrote: > > > > Can you paste the memory map as printed by Xen when booting, and what > > > command line are you using to boot Xen. > > > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen > > Unrelated comment: since tboot now has a PE build > (http://hg.code.sf.net/p/tboot/code/rev/5c68f0963a78) I think it would > be time for OpenXT to drop the weird efi->xen->tboot->xen flow and > just do efi->tboot->xen. Only reason we did efi->xen->tboot was > because tboot didn't have a PE build at the time. It's a very hackish > solution that's no longer needed. Thanks for the pointer, Tamas. If I recall correctly, there was also an issue with ExitBootServices. Do you know if that has been addressed? Depending on timing, OpenXT may just move to TrenchBoot for a DRTM solution. Regards, Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
* tboot UEFI and Xen (was Re: i915 dma faults on Xen)
  2020-10-15 17:13         ` Jason Andryuk
@ 2021-02-19 17:33           ` Jason Andryuk
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Andryuk @ 2021-02-19 17:33 UTC (permalink / raw)
  To: Tamas K Lengyel; +Cc: xen-devel

On Thu, Oct 15, 2020 at 1:13 PM Jason Andryuk <jandryuk@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 12:39 PM Tamas K Lengyel
> <tamas.k.lengyel@gmail.com> wrote:
> >
> > > > Can you paste the memory map as printed by Xen when booting, and what
> > > > command line are you using to boot Xen.
> > >
> > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen
> >
> > Unrelated comment: since tboot now has a PE build
> > (http://hg.code.sf.net/p/tboot/code/rev/5c68f0963a78) I think it would
> > be time for OpenXT to drop the weird efi->xen->tboot->xen flow and
> > just do efi->tboot->xen. Only reason we did efi->xen->tboot was
> > because tboot didn't have a PE build at the time. It's a very hackish
> > solution that's no longer needed.
>
> Thanks for the pointer, Tamas. If I recall correctly, there was also
> an issue with ExitBootServices. Do you know if that has been
> addressed?

I tested tboot's UEFI build, but it didn't boot Xen:

Fedora UEFI shim -> grub2 -> tboot.mb2 -> xen
didn't work - hung at a black screen. Power button powered off
promptly, so it didn't get far enough for something to enable ACPI.

Fedora UEFI shim -> grub2 -> tboot.mb2 -> linux
booted and showed efi stuff in /sys

Naturally this is on a laptop without a serial port.

I haven't looked into this further as it's a low priority for me at
this time.

Regards,
Jason

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen 2020-10-15 15:16 ` Jason Andryuk 2020-10-15 16:38 ` Tamas K Lengyel @ 2020-10-16 16:23 ` Jason Andryuk 2020-10-21 9:58 ` Roger Pau Monné 1 sibling, 1 reply; 18+ messages in thread From: Jason Andryuk @ 2020-10-16 16:23 UTC (permalink / raw) To: Roger Pau Monné; +Cc: Andrew Cooper, intel-gfx, xen-devel On Thu, Oct 15, 2020 at 11:16 AM Jason Andryuk <jandryuk@gmail.com> wrote: > > On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote: > > > On 14/10/2020 20:28, Jason Andryuk wrote: > > > > Hi, > > > > > > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576 > > > > > > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell > > > > Latitude 5500. These were captured when I plugged into a Dell > > > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4 > > > > staging and Linux 5.4.70 (and some earlier versions). > > > > > > > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > > > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg = > > > > ffff82c00021d000 > > > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > > > > PTE Read access is not set > > > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > > > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg = > > > > ffff82c00021d000 > > 
> > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > > > > PTE Read access is not set > > > > > > > > They repeat. In the log attached to > > > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at > > > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around > > > > "Oct 14 18:41:54.801802". > > > > > > > > I've also seen similar messages when attaching the laptop's HDMI port > > > > to a 4k monitor. The eDP display by itself seems okay. > > > > > > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and > > > > didn't see any errors > > > > > > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some > > > > application (glass) logging when it changes resolutions which seems to > > > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass > > > > > > > > Running xen with iommu=no-igfx disables the iommu for the i915 > > > > graphics and no faults are reported. However, that breaks some other > > > > devices (Dell Latitude 7200 and 5580) giving a black screen with: > > > > > > > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed > > > > to idle engines, declaring wedged! > > > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed > > > > to initialize GPU, declaring it wedged! > > > > > > > > Any suggestions welcome. > > > > > > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000 > > > in the machine memory map? > > They are bogus? > End of RAM is 0x47c800000 > Thats: > 0x047c800000 > vs. > 0x39b5845000 > 0x4238d0a000 > > > > This smells like a missing RMRR in the ACPI tables. 
The RMRRs are:
(XEN) [VT-D]Host address width 39
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D] dmaru->address = fed90000
(XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000
(XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e
(XEN) [VT-D] endpoint: 0000:00:02.0
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D] dmaru->address = fed91000
(XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000
(XEN) [VT-D]cap = d2008c40660462 ecap = f050da
(XEN) [VT-D] IOAPIC: 0000:00:1e.7
(XEN) [VT-D] MSI HPET: 0000:00:1e.6
(XEN) [VT-D] flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:14.0
(XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:16.7
(XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is reported in RMRR (78907000, 78986fff)'s scope!
(XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to devices under its scope are not PCI discoverable!

> > I agree.
> >
> > Can you paste the memory map as printed by Xen when booting, and what
> > command line are you using to boot Xen.
> > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen > > There's the memory map > (XEN) TBOOT RAM map: > (XEN) 0000000000000000 - 0000000000060000 (usable) > (XEN) 0000000000060000 - 0000000000068000 (reserved) > (XEN) 0000000000068000 - 000000000009e000 (usable) > (XEN) 000000000009e000 - 000000000009f000 (reserved) > (XEN) 000000000009f000 - 00000000000a0000 (usable) > (XEN) 00000000000a0000 - 0000000000100000 (reserved) > (XEN) 0000000000100000 - 0000000040000000 (usable) > (XEN) 0000000040000000 - 0000000040400000 (reserved) > (XEN) 0000000040400000 - 000000007024b000 (usable) > (XEN) 000000007024b000 - 000000007024c000 (ACPI NVS) > (XEN) 000000007024c000 - 000000007024d000 (reserved) > (XEN) 000000007024d000 - 0000000077f19000 (usable) > (XEN) 0000000077f19000 - 0000000078987000 (reserved) > (XEN) 0000000078987000 - 0000000078a04000 (ACPI data) > (XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS) > (XEN) 0000000078ea3000 - 000000007acff000 (reserved) > (XEN) 000000007acff000 - 000000007ad00000 (usable) > (XEN) 000000007ad00000 - 000000007f800000 (reserved) > (XEN) 00000000f0000000 - 00000000f8000000 (reserved) > (XEN) 00000000fe000000 - 00000000fe011000 (reserved) > (XEN) 00000000fec00000 - 00000000fec01000 (reserved) > (XEN) 00000000fee00000 - 00000000fee01000 (reserved) > (XEN) 00000000ff000000 - 0000000100000000 (reserved) > (XEN) 0000000100000000 - 000000047c800000 (usable) > (XEN) EFI memory map: > (XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f > (XEN) 000000009e000-000000009efff type=0 attr=000000000000000f > (XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f > (XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f > (XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f > (XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f > (XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f > (XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f > (XEN) 
000006a47e000-000006c3eefff type=2 attr=000000000000000f > (XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f > (XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f > (XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f > (XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f > (XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f > (XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f > (XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f > (XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f > (XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f > (XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f > (XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f > (XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f > (XEN) 000007022d000-000007024afff type=3 attr=000000000000000f > (XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f > (XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f > (XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f > (XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f > (XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f > (XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f > (XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f > (XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f > (XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f > (XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f > (XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f > (XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f > (XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f > (XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f > (XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f > (XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f > (XEN) 0000076f3f000-00000772affff type=7 
attr=000000000000000f > (XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f > (XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f > (XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f > (XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f > (XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f > (XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f > (XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f > (XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f > (XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000 > (XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f > (XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000 > (XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d > (XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001 > (XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001 > (XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001 > (XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d > > Command line > console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc > com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug > guest_loglvl=debug smt=0 ucode=-1 bootscrub=1 > argo=yes,mac-permissive=1 iommu=force,igfx > > iommu=force,igfx was to force igfx back on. I added a dmi quirk to > set no-igfx on this platform as a temporary workaround. > > > Have you tried adding dom0-iommu=map-inclusive to the Xen command > > line? Still seeing faults with dom0-iommu=map-inclusive. 
At a different address this time:
Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ffff82c00021d000
Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu #0 dev 0000:00:02.0 gmfn ea0c4f
Oct 16 15:58:05.110780 VM hypervisor: (XEN) root_entry[00] = 46e129001
Oct 16 15:58:05.110782 VM hypervisor: (XEN) context[10] = 2_46e128001
Oct 16 15:58:05.110785 VM hypervisor: (XEN) l4[000] = 46e11b003
Oct 16 15:58:05.110787 VM hypervisor: (XEN) l3[03a] = 0
Oct 16 15:58:05.110789 VM hypervisor: (XEN) l3[03a] not present

In the previous posting, the two faulting addresses repeated in pairs.
Here it is only this one address repeating.

I plugged and unplugged, and a different address was repeating, along
with a few other random addresses with 1 or 2 faults each. Here is
uniq -c output of the address and count pulled from the logs:
0x1ce9d6b000 2007
0x31b50d5000 1
0x1ce9d6b000 882
0x707741000 1
0x1ce9d6b000 1114
0x20d2099000 1
0x1ce9d6b000 3489
0xeb98eb000 1
0x1ce9d6b000 2430
0xeb98eb000 1
0x1ce9d6b000 1300
0x22f20bb000 1
0x1ce9d6b000 269
0x22f20bb000 1
0x1ce9d6b000 5091
0x6c99ec9000 1
0x1ce9d6b000 29
0xeb98eb000 1
0x1ce9d6b000 4599
0x6c99ec9000 1
0x1ce9d6b000 1989

In the i915 bug report, LAKSHMINARAYANA VUDUM commented "We have a
similar issue on SKL on our CI system
https://gitlab.freedesktop.org/drm/intel/-/issues/2017"

Regards,
Jason
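The print_vtd_entries dump above can be cross-checked by hand. The sketch below splits the faulting address into per-level table indices. It assumes 4 KiB pages and 9 index bits per level (the usual layout for 4-level VT-d second-level tables, not something the log states), and walk_indices is a hypothetical helper name. Under those assumptions ea0c4f000 lands in l4[000] and l3[03a], which agrees with the dump.

```python
# Assumption: 4 KiB pages, 9 index bits per level (not stated in the log).
PAGE_SHIFT = 12
LEVEL_BITS = 9


def walk_indices(addr, levels=4):
    """Return {level: index} for a page-table walk, highest level first."""
    out = {}
    for level in range(levels, 0, -1):
        shift = PAGE_SHIFT + LEVEL_BITS * (level - 1)
        out[level] = (addr >> shift) & ((1 << LEVEL_BITS) - 1)
    return out


fault_addr = 0xea0c4f000          # "fault addr" from the DMAR message
gfn = fault_addr >> PAGE_SHIFT    # frame number, printed as "gmfn" by Xen
idx = walk_indices(fault_addr)

print(f"gmfn {gfn:x}")
for level, i in idx.items():
    print(f"l{level}[{i:03x}]")
```

The walk stops at l3 in the log because that entry is 0 (not present), so the lower-level indices never come into play.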
* Re: i915 dma faults on Xen 2020-10-16 16:23 ` i915 dma faults on Xen Jason Andryuk @ 2020-10-21 9:58 ` Roger Pau Monné 2020-10-21 10:33 ` Jan Beulich 2020-10-21 12:45 ` Jason Andryuk 0 siblings, 2 replies; 18+ messages in thread From: Roger Pau Monné @ 2020-10-21 9:58 UTC (permalink / raw) To: Jason Andryuk; +Cc: Andrew Cooper, intel-gfx, xen-devel On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote: > On Thu, Oct 15, 2020 at 11:16 AM Jason Andryuk <jandryuk@gmail.com> wrote: > > > > On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > > > On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote: > > > > On 14/10/2020 20:28, Jason Andryuk wrote: > > > > > Hi, > > > > > > > > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576 > > > > > > > > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell > > > > > Latitude 5500. These were captured when I plugged into a Dell > > > > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4 > > > > > staging and Linux 5.4.70 (and some earlier versions). 
> > > > > > > > > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler > > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler > > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > > > > > Request device [0000:00:02.0] fault addr 39b5845000, iommu reg = > > > > > ffff82c00021d000 > > > > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > > > > > PTE Read access is not set > > > > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler > > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler > > > > > [i915]] *ERROR* Fault errors on pipe A: 0x00000080 > > > > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read] > > > > > Request device [0000:00:02.0] fault addr 4238d0a000, iommu reg = > > > > > ffff82c00021d000 > > > > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 - > > > > > PTE Read access is not set > > > > > > > > > > They repeat. In the log attached to > > > > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at > > > > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around > > > > > "Oct 14 18:41:54.801802". > > > > > > > > > > I've also seen similar messages when attaching the laptop's HDMI port > > > > > to a 4k monitor. The eDP display by itself seems okay. > > > > > > > > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and > > > > > didn't see any errors > > > > > > > > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some > > > > > application (glass) logging when it changes resolutions which seems to > > > > > set off the DMA faults. 
5500-igfx-messages-kern-xen-glass > > > > > > > > > > Running xen with iommu=no-igfx disables the iommu for the i915 > > > > > graphics and no faults are reported. However, that breaks some other > > > > > devices (Dell Latitude 7200 and 5580) giving a black screen with: > > > > > > > > > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 0000:00:02.0: Failed > > > > > to idle engines, declaring wedged! > > > > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 0000:00:02.0: Failed > > > > > to initialize GPU, declaring it wedged! > > > > > > > > > > Any suggestions welcome. > > > > > > > > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000 > > > > in the machine memory map? > > > > They are bogus? > > End of RAM is 0x47c800000 > > Thats: > > 0x047c800000 > > vs. > > 0x39b5845000 > > 0x4238d0a000 > > > > > > This smells like a missing RMRR in the ACPI tables. > > The RMRRs are: > (XEN) [VT-D]Host address width 39 > (XEN) [VT-D]found ACPI_DMAR_DRHD: > (XEN) [VT-D] dmaru->address = fed90000 > (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000 > (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e > (XEN) [VT-D] endpoint: 0000:00:02.0 > (XEN) [VT-D]found ACPI_DMAR_DRHD: > (XEN) [VT-D] dmaru->address = fed91000 > (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000 > (XEN) [VT-D]cap = d2008c40660462 ecap = f050da > (XEN) [VT-D] IOAPIC: 0000:00:1e.7 > (XEN) [VT-D] MSI HPET: 0000:00:1e.6 > (XEN) [VT-D] flags: INCLUDE_ALL > (XEN) [VT-D]found ACPI_DMAR_RMRR: > (XEN) [VT-D] endpoint: 0000:00:14.0 > (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff > (XEN) [VT-D]found ACPI_DMAR_RMRR: > (XEN) [VT-D] endpoint: 0000:00:02.0 > (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff > (XEN) [VT-D]found ACPI_DMAR_RMRR: > (XEN) [VT-D] endpoint: 0000:00:16.7 > (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is > reported in RMRR (78907000, 78986fff)'s scope! 
> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to This is also part of a reserved region, so should be added to the iommu page tables anyway regardless of this message. > devices under its scope are not PCI discoverable! > > > > I agree. > > > > > > Can you paste the memory map as printed by Xen when booting, and what > > > command line are you using to boot Xen. > > > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen > > > > There's the memory map > > (XEN) TBOOT RAM map: > > (XEN) 0000000000000000 - 0000000000060000 (usable) > > (XEN) 0000000000060000 - 0000000000068000 (reserved) > > (XEN) 0000000000068000 - 000000000009e000 (usable) > > (XEN) 000000000009e000 - 000000000009f000 (reserved) > > (XEN) 000000000009f000 - 00000000000a0000 (usable) > > (XEN) 00000000000a0000 - 0000000000100000 (reserved) > > (XEN) 0000000000100000 - 0000000040000000 (usable) > > (XEN) 0000000040000000 - 0000000040400000 (reserved) > > (XEN) 0000000040400000 - 000000007024b000 (usable) > > (XEN) 000000007024b000 - 000000007024c000 (ACPI NVS) > > (XEN) 000000007024c000 - 000000007024d000 (reserved) > > (XEN) 000000007024d000 - 0000000077f19000 (usable) > > (XEN) 0000000077f19000 - 0000000078987000 (reserved) > > (XEN) 0000000078987000 - 0000000078a04000 (ACPI data) > > (XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS) > > (XEN) 0000000078ea3000 - 000000007acff000 (reserved) > > (XEN) 000000007acff000 - 000000007ad00000 (usable) > > (XEN) 000000007ad00000 - 000000007f800000 (reserved) > > (XEN) 00000000f0000000 - 00000000f8000000 (reserved) > > (XEN) 00000000fe000000 - 00000000fe011000 (reserved) > > (XEN) 00000000fec00000 - 00000000fec01000 (reserved) > > (XEN) 00000000fee00000 - 00000000fee01000 (reserved) > > (XEN) 00000000ff000000 - 0000000100000000 (reserved) > > (XEN) 0000000100000000 - 000000047c800000 (usable) > > (XEN) EFI memory map: > > (XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f > > (XEN) 000000009e000-000000009efff 
type=0 attr=000000000000000f > > (XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f > > (XEN) 0000000100000-000003fffffff type=7 attr=000000000000000f > > (XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f > > (XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f > > (XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f > > (XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f > > (XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f > > (XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f > > (XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f > > (XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f > > (XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f > > (XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f > > (XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f > > (XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f > > (XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f > > (XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f > > (XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f > > (XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f > > (XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f > > (XEN) 000007022d000-000007024afff type=3 attr=000000000000000f > > (XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f > > (XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f > > (XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f > > (XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f > > (XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f > > (XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f > > (XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f > > (XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f > > (XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f > > (XEN) 
0000070318000-0000070319fff type=3 attr=000000000000000f > > (XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f > > (XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f > > (XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f > > (XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f > > (XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f > > (XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f > > (XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f > > (XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f > > (XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f > > (XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f > > (XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f > > (XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f > > (XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f > > (XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f > > (XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f > > (XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000 > > (XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f > > (XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000 > > (XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d > > (XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001 > > (XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001 > > (XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001 > > (XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d > > > > Command line > > console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc > > com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug > > guest_loglvl=debug smt=0 ucode=-1 bootscrub=1 > > argo=yes,mac-permissive=1 iommu=force,igfx > > > > iommu=force,igfx was to force igfx back on. I added a dmi quirk to > > set no-igfx on this platform as a temporary workaround. 
I assume setting no-igfx fixed the issue and the card works fine in
that case?

> > > Have you tried adding dom0-iommu=map-inclusive to the Xen command
> > > line?
>
> Still seeing faults with dom0-iommu=map-inclusive. At a different
> address this time:
> Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ffff

That's also past the end of RAM.

> 82c00021d000
> Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> PTE Read access is not set
> Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu
> #0 dev 0000:00:02.0 gmfn ea0c4f
> Oct 16 15:58:05.110780 VM hypervisor: (XEN) root_entry[00] = 46e129001
> Oct 16 15:58:05.110782 VM hypervisor: (XEN) context[10] = 2_46e128001
> Oct 16 15:58:05.110785 VM hypervisor: (XEN) l4[000] = 46e11b003
> Oct 16 15:58:05.110787 VM hypervisor: (XEN) l3[03a] = 0
> Oct 16 15:58:05.110789 VM hypervisor: (XEN) l3[03a] not present
>
> The previous posting, the two faulting addresses repeated in pairs.
> Here it is only this one address repeating.
>
> I plugged and unplugged and a different address was repeating with a
> few other random addresses with 1 or 2 faults. Here is uniq -c output
> of the address and count pulled from the logs:
> 0x1ce9d6b000 2007
> 0x31b50d5000 1
> 0x1ce9d6b000 882
> 0x707741000 1
> 0x1ce9d6b000 1114
> 0x20d2099000 1
> 0x1ce9d6b000 3489
> 0xeb98eb000 1
> 0x1ce9d6b000 2430
> 0xeb98eb000 1
> 0x1ce9d6b000 1300
> 0x22f20bb000 1
> 0x1ce9d6b000 269
> 0x22f20bb000 1
> 0x1ce9d6b000 5091
> 0x6c99ec9000 1
> 0x1ce9d6b000 29
> 0xeb98eb000 1
> 0x1ce9d6b000 4599
> 0x6c99ec9000 1
> 0x1ce9d6b000 1989

Hm, it's hard to tell what's going on. In my limited experience with
IOMMU faults on broken systems, there's a small range that initially
triggers them, and then the device goes wonky and starts accessing a
whole load of invalid addresses.

You could try adding those manually using the rmrr Xen command line
option [0]; maybe you can figure out which range(s) are missing?

Roger.

[0] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#rmrr
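To make the "past the end of RAM" observation concrete, here is a small Python sketch that compares every faulting address reported so far in the thread against the end of the last usable range in the TBOOT RAM map. It also prints page frame numbers on the assumption that the rmrr option is expressed in frames rather than byte addresses, which should be checked against [0] before use. The variable names are illustrative only.

```python
# End of the last "usable" range in the TBOOT RAM map quoted above.
END_OF_RAM = 0x47c800000
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Faulting addresses reported in the thread so far.
faults = [
    0x39b5845000, 0x4238d0a000, 0xea0c4f000, 0x1ce9d6b000, 0x31b50d5000,
    0x707741000, 0x20d2099000, 0xeb98eb000, 0x22f20bb000, 0x6c99ec9000,
]

beyond = [a for a in faults if a >= END_OF_RAM]

for a in sorted(faults):
    where = "past end of RAM" if a >= END_OF_RAM else "below end of RAM"
    print(f"{a:#014x}  pfn {a >> PAGE_SHIFT:#x}  {where}")

print(f"{len(beyond)} of {len(faults)} faulting addresses are past end of RAM")
```

Every address in the list falls above the end of RAM, and they are spread widely, which fits the description of a device reading a whole load of invalid addresses rather than one missing reserved range.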
* Re: i915 dma faults on Xen 2020-10-21 9:58 ` Roger Pau Monné @ 2020-10-21 10:33 ` Jan Beulich 2020-10-21 10:51 ` Roger Pau Monné 2020-10-21 12:45 ` Jason Andryuk 1 sibling, 1 reply; 18+ messages in thread From: Jan Beulich @ 2020-10-21 10:33 UTC (permalink / raw) To: Roger Pau Monné; +Cc: Jason Andryuk, Andrew Cooper, intel-gfx, xen-devel On 21.10.2020 11:58, Roger Pau Monné wrote: > On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote: >> The RMRRs are: >> (XEN) [VT-D]Host address width 39 >> (XEN) [VT-D]found ACPI_DMAR_DRHD: >> (XEN) [VT-D] dmaru->address = fed90000 >> (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000 >> (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e >> (XEN) [VT-D] endpoint: 0000:00:02.0 >> (XEN) [VT-D]found ACPI_DMAR_DRHD: >> (XEN) [VT-D] dmaru->address = fed91000 >> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000 >> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da >> (XEN) [VT-D] IOAPIC: 0000:00:1e.7 >> (XEN) [VT-D] MSI HPET: 0000:00:1e.6 >> (XEN) [VT-D] flags: INCLUDE_ALL >> (XEN) [VT-D]found ACPI_DMAR_RMRR: >> (XEN) [VT-D] endpoint: 0000:00:14.0 >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff >> (XEN) [VT-D]found ACPI_DMAR_RMRR: >> (XEN) [VT-D] endpoint: 0000:00:02.0 >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff >> (XEN) [VT-D]found ACPI_DMAR_RMRR: >> (XEN) [VT-D] endpoint: 0000:00:16.7 >> (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is >> reported in RMRR (78907000, 78986fff)'s scope! >> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to > > This is also part of a reserved region, so should be added to the > iommu page tables anyway regardless of this message. Could you clarify why you think so? RMRRs are tied to devices, so if a device in reality doesn't exist (and no other one uses the same range), I don't see why an IOMMU mapping would be needed (unless to work around some related firmware bug). 
Plus aiui none of the IOMMU faults actually report this range as
having been accessed.

Jan
* Re: i915 dma faults on Xen 2020-10-21 10:33 ` Jan Beulich @ 2020-10-21 10:51 ` Roger Pau Monné 0 siblings, 0 replies; 18+ messages in thread From: Roger Pau Monné @ 2020-10-21 10:51 UTC (permalink / raw) To: Jan Beulich; +Cc: Jason Andryuk, Andrew Cooper, intel-gfx, xen-devel On Wed, Oct 21, 2020 at 12:33:05PM +0200, Jan Beulich wrote: > On 21.10.2020 11:58, Roger Pau Monné wrote: > > On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote: > >> The RMRRs are: > >> (XEN) [VT-D]Host address width 39 > >> (XEN) [VT-D]found ACPI_DMAR_DRHD: > >> (XEN) [VT-D] dmaru->address = fed90000 > >> (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000 > >> (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e > >> (XEN) [VT-D] endpoint: 0000:00:02.0 > >> (XEN) [VT-D]found ACPI_DMAR_DRHD: > >> (XEN) [VT-D] dmaru->address = fed91000 > >> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000 > >> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da > >> (XEN) [VT-D] IOAPIC: 0000:00:1e.7 > >> (XEN) [VT-D] MSI HPET: 0000:00:1e.6 > >> (XEN) [VT-D] flags: INCLUDE_ALL > >> (XEN) [VT-D]found ACPI_DMAR_RMRR: > >> (XEN) [VT-D] endpoint: 0000:00:14.0 > >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff > >> (XEN) [VT-D]found ACPI_DMAR_RMRR: > >> (XEN) [VT-D] endpoint: 0000:00:02.0 > >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff > >> (XEN) [VT-D]found ACPI_DMAR_RMRR: > >> (XEN) [VT-D] endpoint: 0000:00:16.7 > >> (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is > >> reported in RMRR (78907000, 78986fff)'s scope! > >> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to > > > > This is also part of a reserved region, so should be added to the > > iommu page tables anyway regardless of this message. > > Could you clarify why you think so? 
RMRRs are tied to devices, so
> if a device in reality doesn't exist (and no other one uses the
> same range), I don't see why an IOMMU mapping would be needed
> (unless to work around some related firmware bug). Plus aiui none
> of the IOMMU faults actually report this range as having got
> accessed.

Since it's the hardware domain that gets the gfx card assigned here,
it will get any reserved regions added to the IOMMU page tables in
arch_iommu_hwdom_init. I agree it's not relevant here, since those are
not the regions reported in the IOMMU faults.

Roger.
* Re: i915 dma faults on Xen 2020-10-21 9:58 ` Roger Pau Monné 2020-10-21 10:33 ` Jan Beulich @ 2020-10-21 12:45 ` Jason Andryuk 2020-10-21 12:52 ` Jan Beulich 1 sibling, 1 reply; 18+ messages in thread From: Jason Andryuk @ 2020-10-21 12:45 UTC (permalink / raw) To: Roger Pau Monné; +Cc: Andrew Cooper, intel-gfx, xen-devel On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote: > > > > The RMRRs are: > > (XEN) [VT-D]Host address width 39 > > (XEN) [VT-D]found ACPI_DMAR_DRHD: > > (XEN) [VT-D] dmaru->address = fed90000 > > (XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00021d000 > > (XEN) [VT-D]cap = 1c0000c40660462 ecap = 19e2ff0505e > > (XEN) [VT-D] endpoint: 0000:00:02.0 > > (XEN) [VT-D]found ACPI_DMAR_DRHD: > > (XEN) [VT-D] dmaru->address = fed91000 > > (XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00021f000 > > (XEN) [VT-D]cap = d2008c40660462 ecap = f050da > > (XEN) [VT-D] IOAPIC: 0000:00:1e.7 > > (XEN) [VT-D] MSI HPET: 0000:00:1e.6 > > (XEN) [VT-D] flags: INCLUDE_ALL > > (XEN) [VT-D]found ACPI_DMAR_RMRR: > > (XEN) [VT-D] endpoint: 0000:00:14.0 > > (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff > > (XEN) [VT-D]found ACPI_DMAR_RMRR: > > (XEN) [VT-D] endpoint: 0000:00:02.0 > > (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d000000 end_addr 7f7fffff > > (XEN) [VT-D]found ACPI_DMAR_RMRR: > > (XEN) [VT-D] endpoint: 0000:00:16.7 > > (XEN) [VT-D]dmar.c:581: Non-existent device (0000:00:16.7) is > > reported in RMRR (78907000, 78986fff)'s scope! > > (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to > > This is also part of a reserved region, so should be added to the > iommu page tables anyway regardless of this message. I wonder if this is for the Intel AMT PCI device? I assumed it is disabled, but I actually can't find it listed in the BIOS configuration to verify. 
> > devices under its scope are not PCI discoverable! > > > > > > I agree. > > > > > > > > Can you paste the memory map as printed by Xen when booting, and what > > > > command line are you using to boot Xen. > > > > > > So this is OpenXT, and it's booting EFI -> xen -> tboot -> xen > > > > > > There's the memory map > > > (XEN) TBOOT RAM map: > > > (XEN) 0000000000000000 - 0000000000060000 (usable) > > > (XEN) 0000000000060000 - 0000000000068000 (reserved) > > > (XEN) 0000000000068000 - 000000000009e000 (usable) > > > (XEN) 000000000009e000 - 000000000009f000 (reserved) > > > (XEN) 000000000009f000 - 00000000000a0000 (usable) > > > (XEN) 00000000000a0000 - 0000000000100000 (reserved) > > > (XEN) 0000000000100000 - 0000000040000000 (usable) > > > (XEN) 0000000040000000 - 0000000040400000 (reserved) > > > (XEN) 0000000040400000 - 000000007024b000 (usable) > > > (XEN) 000000007024b000 - 000000007024c000 (ACPI NVS) > > > (XEN) 000000007024c000 - 000000007024d000 (reserved) > > > (XEN) 000000007024d000 - 0000000077f19000 (usable) > > > (XEN) 0000000077f19000 - 0000000078987000 (reserved) > > > (XEN) 0000000078987000 - 0000000078a04000 (ACPI data) > > > (XEN) 0000000078a04000 - 0000000078ea3000 (ACPI NVS) > > > (XEN) 0000000078ea3000 - 000000007acff000 (reserved) > > > (XEN) 000000007acff000 - 000000007ad00000 (usable) > > > (XEN) 000000007ad00000 - 000000007f800000 (reserved) > > > (XEN) 00000000f0000000 - 00000000f8000000 (reserved) > > > (XEN) 00000000fe000000 - 00000000fe011000 (reserved) > > > (XEN) 00000000fec00000 - 00000000fec01000 (reserved) > > > (XEN) 00000000fee00000 - 00000000fee01000 (reserved) > > > (XEN) 00000000ff000000 - 0000000100000000 (reserved) > > > (XEN) 0000000100000000 - 000000047c800000 (usable) > > > (XEN) EFI memory map: > > > (XEN) 0000000000000-000000009dfff type=7 attr=000000000000000f > > > (XEN) 000000009e000-000000009efff type=0 attr=000000000000000f > > > (XEN) 000000009f000-000000009ffff type=3 attr=000000000000000f > > > (XEN) 
0000000100000-000003fffffff type=7 attr=000000000000000f > > > (XEN) 0000040000000-00000403fffff type=0 attr=000000000000000f > > > (XEN) 0000040400000-000005e359fff type=7 attr=000000000000000f > > > (XEN) 000005e35a000-000005e399fff type=4 attr=000000000000000f > > > (XEN) 000005e39a000-000006a47dfff type=7 attr=000000000000000f > > > (XEN) 000006a47e000-000006c3eefff type=2 attr=000000000000000f > > > (XEN) 000006c3ef000-000006d5eefff type=1 attr=000000000000000f > > > (XEN) 000006d5ef000-000006d86cfff type=2 attr=000000000000000f > > > (XEN) 000006d86d000-000006d978fff type=1 attr=000000000000000f > > > (XEN) 000006d979000-000006dc7afff type=4 attr=000000000000000f > > > (XEN) 000006dc7b000-000006dc98fff type=3 attr=000000000000000f > > > (XEN) 000006dc99000-000006dcc7fff type=4 attr=000000000000000f > > > (XEN) 000006dcc8000-000006dccdfff type=3 attr=000000000000000f > > > (XEN) 000006dcce000-00000701a5fff type=4 attr=000000000000000f > > > (XEN) 00000701a6000-00000701c8fff type=3 attr=000000000000000f > > > (XEN) 00000701c9000-00000701edfff type=4 attr=000000000000000f > > > (XEN) 00000701ee000-0000070204fff type=3 attr=000000000000000f > > > (XEN) 0000070205000-000007022cfff type=4 attr=000000000000000f > > > (XEN) 000007022d000-000007024afff type=3 attr=000000000000000f > > > (XEN) 000007024b000-000007024bfff type=10 attr=000000000000000f > > > (XEN) 000007024c000-000007024cfff type=6 attr=800000000000000f > > > (XEN) 000007024d000-000007024dfff type=4 attr=000000000000000f > > > (XEN) 000007024e000-0000070282fff type=3 attr=000000000000000f > > > (XEN) 0000070283000-00000702c3fff type=4 attr=000000000000000f > > > (XEN) 00000702c4000-00000702c8fff type=3 attr=000000000000000f > > > (XEN) 00000702c9000-00000702defff type=4 attr=000000000000000f > > > (XEN) 00000702df000-0000070307fff type=3 attr=000000000000000f > > > (XEN) 0000070308000-0000070317fff type=4 attr=000000000000000f > > > (XEN) 0000070318000-0000070319fff type=3 attr=000000000000000f > > > 
(XEN) 000007031a000-0000070331fff type=4 attr=000000000000000f > > > (XEN) 0000070332000-0000070349fff type=3 attr=000000000000000f > > > (XEN) 000007034a000-0000070356fff type=2 attr=000000000000000f > > > (XEN) 0000070357000-0000070357fff type=7 attr=000000000000000f > > > (XEN) 0000070358000-0000070358fff type=2 attr=000000000000000f > > > (XEN) 0000070359000-0000076f3efff type=4 attr=000000000000000f > > > (XEN) 0000076f3f000-00000772affff type=7 attr=000000000000000f > > > (XEN) 00000772b0000-0000077f18fff type=3 attr=000000000000000f > > > (XEN) 0000077f19000-0000078986fff type=0 attr=000000000000000f > > > (XEN) 0000078987000-0000078a03fff type=9 attr=000000000000000f > > > (XEN) 0000078a04000-0000078ea2fff type=10 attr=000000000000000f > > > (XEN) 0000078ea3000-000007ab22fff type=6 attr=800000000000000f > > > (XEN) 000007ab23000-000007acfefff type=5 attr=800000000000000f > > > (XEN) 000007acff000-000007acfffff type=4 attr=000000000000000f > > > (XEN) 0000100000000-000047c7fffff type=7 attr=000000000000000f > > > (XEN) 00000000a0000-00000000fffff type=0 attr=0000000000000000 > > > (XEN) 000007ad00000-000007adfffff type=0 attr=070000000000000f > > > (XEN) 000007ae00000-000007f7fffff type=0 attr=0000000000000000 > > > (XEN) 00000f0000000-00000f7ffffff type=11 attr=800000000000100d > > > (XEN) 00000fe000000-00000fe010fff type=11 attr=8000000000000001 > > > (XEN) 00000fec00000-00000fec00fff type=11 attr=8000000000000001 > > > (XEN) 00000fee00000-00000fee00fff type=11 attr=8000000000000001 > > > (XEN) 00000ff000000-00000ffffffff type=11 attr=800000000000100d > > > > > > Command line > > > console=com1 dom0_mem=min:420M,max:420M,420M efi=no-rs,attr=uc > > > com1=115200,8n1,pci mbi-video vga=current flask=enforcing loglvl=debug > > > guest_loglvl=debug smt=0 ucode=-1 bootscrub=1 > > > argo=yes,mac-permissive=1 iommu=force,igfx > > > > > > iommu=force,igfx was to force igfx back on. I added a dmi quirk to > > > set no-igfx on this platform as a temporary workaround. 
>
> I assume setting no-igfx fixed the issue and the card works fine in
> that case?

Yes, it seems to work. The internal and 2 external monitors are
displaying and seem okay. If I unplug the dock with those 2 displays,
then go plug in a different dock with a different monitor, I've seen
(unclear how often) the i915 report errors configuring its "pipe", and
the built-in display (eDP) goes black. But it may recover sometimes?

> > > > Have you tried adding dom0-iommu=map-inclusive to the Xen command
> > > > line?
> >
> > Still seeing faults with dom0-iommu=map-inclusive. At a different
> > address this time:
> > Oct 16 15:58:05.110768 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [0000:00:02.0] fault addr ea0c4f000, iommu reg = ffff
>
> That's also past the end of RAM.
>
> > 82c00021d000
> > Oct 16 15:58:05.110774 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> > Oct 16 15:58:05.110777 VM hypervisor: (XEN) print_vtd_entries: iommu
> > #0 dev 0000:00:02.0 gmfn ea0c4f
> > Oct 16 15:58:05.110780 VM hypervisor: (XEN) root_entry[00] = 46e129001
> > Oct 16 15:58:05.110782 VM hypervisor: (XEN) context[10] = 2_46e128001
> > Oct 16 15:58:05.110785 VM hypervisor: (XEN) l4[000] = 46e11b003
> > Oct 16 15:58:05.110787 VM hypervisor: (XEN) l3[03a] = 0
> > Oct 16 15:58:05.110789 VM hypervisor: (XEN) l3[03a] not present
> >
> > The previous posting, the two faulting addresses repeated in pairs.
> > Here it is only this one address repeating.
> >
> > I plugged and unplugged and a different address was repeating with a
> > few other random addresses with 1 or 2 faults. Here is uniq -c output
> > of the address and count pulled from the logs:
> > 0x1ce9d6b000 2007
> > 0x31b50d5000 1
> > 0x1ce9d6b000 882
> > 0x707741000 1
> > 0x1ce9d6b000 1114
> > 0x20d2099000 1
> > 0x1ce9d6b000 3489
> > 0xeb98eb000 1
> > 0x1ce9d6b000 2430
> > 0xeb98eb000 1
> > 0x1ce9d6b000 1300
> > 0x22f20bb000 1
> > 0x1ce9d6b000 269
> > 0x22f20bb000 1
> > 0x1ce9d6b000 5091
> > 0x6c99ec9000 1
> > 0x1ce9d6b000 29
> > 0xeb98eb000 1
> > 0x1ce9d6b000 4599
> > 0x6c99ec9000 1
> > 0x1ce9d6b000 1989
>
> Hm, it's hard to tell what's going on. My limited experience with
> IOMMU faults on broken systems there's a small range that initially
> triggers those, and then the device goes wonky and starts accessing a
> whole load of invalid addresses.
>
> You could try adding those manually using the rmrr Xen command line
> option [0], maybe you can figure out which range(s) are missing?

They seem to change, so it's hard to know. Would there be harm in
adding one to cover from the end of RAM ( 0x04,7c80,0000 ) to
( 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
while leaving the IOMMU enabled?

Thanks for taking a look.

Regards,
Jason
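For a sense of scale, the arithmetic behind the proposed catch-all range can be sketched in Python. The numbers come from the thread (end of RAM from the TBOOT RAM map, "Host address width 39" from the VT-d log), and the 4 KiB page size is an assumption. Whether Xen would accept or honour such a range is not something this sketch checks.

```python
PAGE_SIZE = 4096               # assumption: 4 KiB pages
END_OF_RAM = 0x47c800000       # end of last usable range, TBOOT RAM map
PROPOSED_END = 0xffffffffff    # upper bound floated in the message above
HOST_ADDR_WIDTH = 39           # "(XEN) [VT-D]Host address width 39"

max_host_addr = (1 << HOST_ADDR_WIDTH) - 1
pages = (PROPOSED_END + 1 - END_OF_RAM) // PAGE_SIZE

print(f"proposed range:          {END_OF_RAM:#x}-{PROPOSED_END:#x}")
print(f"4 KiB pages covered:     {pages:#x}")
print(f"highest 39-bit address:  {max_host_addr:#x}")
print(f"end exceeds 39-bit width: {PROPOSED_END > max_host_addr}")
```

Note that 0xff,ffff,ffff is a 40-bit address, one bit wider than the reported host address width, while all of the faulting addresses seen so far fit within 39 bits.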
* Re: i915 dma faults on Xen
  2020-10-21 12:45 ` Jason Andryuk
@ 2020-10-21 12:52 ` Jan Beulich
  2020-10-21 13:36 ` Jason Andryuk
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2020-10-21 12:52 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Roger Pau Monné, Andrew Cooper, intel-gfx, xen-devel

On 21.10.2020 14:45, Jason Andryuk wrote:
> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote:
>> Hm, it's hard to tell what's going on. My limited experience with
>> IOMMU faults on broken systems there's a small range that initially
>> triggers those, and then the device goes wonky and starts accessing a
>> whole load of invalid addresses.
>>
>> You could try adding those manually using the rmrr Xen command line
>> option [0], maybe you can figure out which range(s) are missing?
>
> They seem to change, so it's hard to know. Would there be harm in
> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to (
> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults
> while leaving the IOMMU enabled?

While they may quieten the faults, I don't think those faults are
pointless. They indicate some problem with the software (less
likely the hardware, possibly the firmware) that you're using.
Also there's the question of what the overall behavior is going
to be when devices are permitted to access unpopulated address
ranges. I assume you did check already that no devices have their
BARs placed in that range?

Jan
* Re: i915 dma faults on Xen 2020-10-21 12:52 ` Jan Beulich @ 2020-10-21 13:36 ` Jason Andryuk 2020-10-21 13:59 ` Jan Beulich 0 siblings, 1 reply; 18+ messages in thread From: Jason Andryuk @ 2020-10-21 13:36 UTC (permalink / raw) To: Jan Beulich; +Cc: Roger Pau Monné, Andrew Cooper, intel-gfx, xen-devel On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote: > > On 21.10.2020 14:45, Jason Andryuk wrote: > > On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > >> Hm, it's hard to tell what's going on. My limited experience with > >> IOMMU faults on broken systems there's a small range that initially > >> triggers those, and then the device goes wonky and starts accessing a > >> whole load of invalid addresses. > >> > >> You could try adding those manually using the rmrr Xen command line > >> option [0], maybe you can figure out which range(s) are missing? > > > > They seem to change, so it's hard to know. Would there be harm in > > adding one to cover the end of RAM ( 0x04,7c80,0000 ) to ( > > 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults > > while leaving the IOMMU enabled? > > While they may quieten the faults, I don't think those faults are > pointless. They indicate some problem with the software (less > likely the hardware, possibly the firmware) that you're using. > Also there's the question of what the overall behavior is going > to be when devices are permitted to access unpopulated address > ranges. I assume you did check already that no devices have their > BARs placed in that range? Isn't no-igfx already letting them try to read those unpopulated addresses? Looks like all PCI BARs are below 4GB. 
The graphics ones are: 00:02.0 VGA compatible controller: Intel Corporation Device 3ea0 (rev 02) (prog-if 00 [VGA controller]) Subsystem: Dell Device 08b9 Flags: bus master, fast devsel, latency 0, IRQ 177 Memory at cb000000 (64-bit, non-prefetchable) [size=16M] Memory at 80000000 (64-bit, prefetchable) [size=256M] Yes, I agree the faults aren't pointless. I'm wondering if it's something with the i915 driver or hardware having assumptions that aren't met by Xen swiotlb. Regards, Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
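A quick way to double-check the BAR question is to compare the ranges directly. The sketch below is only an illustration: it hard-codes the two graphics BARs from the lspci output above and the catch-all range proposed earlier in the thread (end of RAM at 0x04,7c80,0000 up to 0xff,ffff,ffff). The names `IGFX_BARS`, `overlaps` and `bars_in_range` are made up for this example, and it covers the graphics device only, not every PCI device in the system.

```python
MiB = 1 << 20

# Catch-all range proposed in the thread (both bounds treated as inclusive).
PROPOSED_START = 0x04_7C80_0000
PROPOSED_END = 0xFF_FFFF_FFFF

# Graphics BARs as reported by lspci for 00:02.0: (base address, size in bytes).
IGFX_BARS = [
    (0xCB00_0000, 16 * MiB),   # 64-bit, non-prefetchable
    (0x8000_0000, 256 * MiB),  # 64-bit, prefetchable
]


def overlaps(base, size, start, end):
    """True if [base, base + size - 1] intersects the inclusive range [start, end]."""
    last = base + size - 1
    return base <= end and last >= start


def bars_in_range(bars, start, end):
    """Return the BARs that fall at least partly inside [start, end]."""
    return [(base, size) for base, size in bars if overlaps(base, size, start, end)]


if __name__ == "__main__":
    hits = bars_in_range(IGFX_BARS, PROPOSED_START, PROPOSED_END)
    print("BARs inside proposed range:", [hex(base) for base, _ in hits])
```

With the values above the list of hits is empty, which matches the observation that both graphics BARs sit below 4GB.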
* Re: i915 dma faults on Xen 2020-10-21 13:36 ` Jason Andryuk @ 2020-10-21 13:59 ` Jan Beulich 2021-02-19 17:30 ` Jason Andryuk 0 siblings, 1 reply; 18+ messages in thread From: Jan Beulich @ 2020-10-21 13:59 UTC (permalink / raw) To: Jason Andryuk; +Cc: Roger Pau Monné, Andrew Cooper, intel-gfx, xen-devel On 21.10.2020 15:36, Jason Andryuk wrote: > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote: >> >> On 21.10.2020 14:45, Jason Andryuk wrote: >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote: >>>> Hm, it's hard to tell what's going on. My limited experience with >>>> IOMMU faults on broken systems there's a small range that initially >>>> triggers those, and then the device goes wonky and starts accessing a >>>> whole load of invalid addresses. >>>> >>>> You could try adding those manually using the rmrr Xen command line >>>> option [0], maybe you can figure out which range(s) are missing? >>> >>> They seem to change, so it's hard to know. Would there be harm in >>> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to ( >>> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults >>> while leaving the IOMMU enabled? >> >> While they may quieten the faults, I don't think those faults are >> pointless. They indicate some problem with the software (less >> likely the hardware, possibly the firmware) that you're using. >> Also there's the question of what the overall behavior is going >> to be when devices are permitted to access unpopulated address >> ranges. I assume you did check already that no devices have their >> BARs placed in that range? > > Isn't no-igfx already letting them try to read those unpopulated addresses? Yes, and it is for the reason that the documentation for the option says "If specifying `no-igfx` fixes anything, please report the problem." I infer from it in particular that one better wouldn't use it for non-development purposes of whatever kind. 
Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen 2020-10-21 13:59 ` Jan Beulich @ 2021-02-19 17:30 ` Jason Andryuk 2021-02-22 10:18 ` Roger Pau Monné 0 siblings, 1 reply; 18+ messages in thread From: Jason Andryuk @ 2021-02-19 17:30 UTC (permalink / raw) To: Jan Beulich Cc: Roger Pau Monné, Andrew Cooper, intel-gfx, xen-devel, eric chanudet On Wed, Oct 21, 2020 at 9:59 AM Jan Beulich <jbeulich@suse.com> wrote: > > On 21.10.2020 15:36, Jason Andryuk wrote: > > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote: > >> > >> On 21.10.2020 14:45, Jason Andryuk wrote: > >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > >>>> Hm, it's hard to tell what's going on. My limited experience with > >>>> IOMMU faults on broken systems there's a small range that initially > >>>> triggers those, and then the device goes wonky and starts accessing a > >>>> whole load of invalid addresses. > >>>> > >>>> You could try adding those manually using the rmrr Xen command line > >>>> option [0], maybe you can figure out which range(s) are missing? > >>> > >>> They seem to change, so it's hard to know. Would there be harm in > >>> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to ( > >>> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults > >>> while leaving the IOMMU enabled? > >> > >> While they may quieten the faults, I don't think those faults are > >> pointless. They indicate some problem with the software (less > >> likely the hardware, possibly the firmware) that you're using. > >> Also there's the question of what the overall behavior is going > >> to be when devices are permitted to access unpopulated address > >> ranges. I assume you did check already that no devices have their > >> BARs placed in that range? > > > > Isn't no-igfx already letting them try to read those unpopulated addresses? 
> > Yes, and it is for the reason that the documentation for the > option says "If specifying `no-igfx` fixes anything, please > report the problem." I infer from it in particular that one > better wouldn't use it for non-development purposes of whatever > kind. I stopped seeing these DMA faults, but I didn't know what made them go away. Then when working with an older 5.4.64 kernel, I saw them again. Eric bisected down to the 5.4.y version of mainline linux commit: commit 8195400f7ea95399f721ad21f4d663a62c65036f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Oct 19 11:15:23 2020 +0100 drm/i915: Force VT'd workarounds when running as a guest OS If i915.ko is being used as a passthrough device, it does not know if the host is using intel_iommu. Mixing the iommu and gfx causes a few issues (such as scanout overfetch) which we need to workaround inside the driver, so if we detect we are running under a hypervisor, also assume the device access is being virtualised. Regards, Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
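The decision described in that commit message can be modelled in a few lines. This is a toy model based only on the commit text quoted above, not the kernel code: the function name, the parameters and the `patched` switch are all hypothetical and exist only to contrast the behaviour with and without the change.

```python
def vtd_workarounds_active(iommu_maps_gfx: bool,
                           under_hypervisor: bool,
                           patched: bool = True) -> bool:
    """Toy model: should the driver apply its VT-d workarounds?

    iommu_maps_gfx   -- the kernel itself sees an IOMMU translating the GPU
    under_hypervisor -- the kernel detects that it runs under a hypervisor
    patched          -- whether the behaviour from the quoted commit applies
    """
    if iommu_maps_gfx:
        # The kernel knows the IOMMU is in the path, so workarounds are on.
        return True
    # With the commit, running under a hypervisor is treated as
    # "assume the device access is being virtualised".
    return patched and under_hypervisor


if __name__ == "__main__":
    # Kernel under a hypervisor that does not expose the IOMMU to it.
    print("without commit:", vtd_workarounds_active(False, True, patched=False))
    print("with commit:   ", vtd_workarounds_active(False, True, patched=True))
```

In this model the only case that changes is a kernel that runs under a hypervisor without seeing the IOMMU itself, which is assumed here to be the situation of the Xen setup discussed in this thread.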
* Re: i915 dma faults on Xen 2021-02-19 17:30 ` Jason Andryuk @ 2021-02-22 10:18 ` Roger Pau Monné 2021-02-22 12:49 ` Jason Andryuk 0 siblings, 1 reply; 18+ messages in thread From: Roger Pau Monné @ 2021-02-22 10:18 UTC (permalink / raw) To: Jason Andryuk Cc: Jan Beulich, Andrew Cooper, intel-gfx, xen-devel, eric chanudet On Fri, Feb 19, 2021 at 12:30:23PM -0500, Jason Andryuk wrote: > On Wed, Oct 21, 2020 at 9:59 AM Jan Beulich <jbeulich@suse.com> wrote: > > > > On 21.10.2020 15:36, Jason Andryuk wrote: > > > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote: > > >> > > >> On 21.10.2020 14:45, Jason Andryuk wrote: > > >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > >>>> Hm, it's hard to tell what's going on. My limited experience with > > >>>> IOMMU faults on broken systems there's a small range that initially > > >>>> triggers those, and then the device goes wonky and starts accessing a > > >>>> whole load of invalid addresses. > > >>>> > > >>>> You could try adding those manually using the rmrr Xen command line > > >>>> option [0], maybe you can figure out which range(s) are missing? > > >>> > > >>> They seem to change, so it's hard to know. Would there be harm in > > >>> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to ( > > >>> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults > > >>> while leaving the IOMMU enabled? > > >> > > >> While they may quieten the faults, I don't think those faults are > > >> pointless. They indicate some problem with the software (less > > >> likely the hardware, possibly the firmware) that you're using. > > >> Also there's the question of what the overall behavior is going > > >> to be when devices are permitted to access unpopulated address > > >> ranges. I assume you did check already that no devices have their > > >> BARs placed in that range? > > > > > > Isn't no-igfx already letting them try to read those unpopulated addresses? 
> > > > Yes, and it is for the reason that the documentation for the > > option says "If specifying `no-igfx` fixes anything, please > > report the problem." I infer from it in particular that one > > better wouldn't use it for non-development purposes of whatever > > kind. > > I stopped seeing these DMA faults, but I didn't know what made them go > away. Then when working with an older 5.4.64 kernel, I saw them > again. Eric bisected down to the 5.4.y version of mainline linux > commit: > > commit 8195400f7ea95399f721ad21f4d663a62c65036f > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Mon Oct 19 11:15:23 2020 +0100 > > drm/i915: Force VT'd workarounds when running as a guest OS > > If i915.ko is being used as a passthrough device, it does not know if > the host is using intel_iommu. Mixing the iommu and gfx causes a few > issues (such as scanout overfetch) which we need to workaround inside > the driver, so if we detect we are running under a hypervisor, also > assume the device access is being virtualised. So the commit above fixes the DMA faults seen on Linux when using an i915 gfx card? Thanks for digging into this. Roger. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: i915 dma faults on Xen 2021-02-22 10:18 ` Roger Pau Monné @ 2021-02-22 12:49 ` Jason Andryuk 0 siblings, 0 replies; 18+ messages in thread From: Jason Andryuk @ 2021-02-22 12:49 UTC (permalink / raw) To: Roger Pau Monné Cc: Jan Beulich, Andrew Cooper, intel-gfx, xen-devel, eric chanudet On Mon, Feb 22, 2021 at 5:18 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Fri, Feb 19, 2021 at 12:30:23PM -0500, Jason Andryuk wrote: > > On Wed, Oct 21, 2020 at 9:59 AM Jan Beulich <jbeulich@suse.com> wrote: > > > > > > On 21.10.2020 15:36, Jason Andryuk wrote: > > > > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich <jbeulich@suse.com> wrote: > > > >> > > > >> On 21.10.2020 14:45, Jason Andryuk wrote: > > > >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > >>>> Hm, it's hard to tell what's going on. My limited experience with > > > >>>> IOMMU faults on broken systems there's a small range that initially > > > >>>> triggers those, and then the device goes wonky and starts accessing a > > > >>>> whole load of invalid addresses. > > > >>>> > > > >>>> You could try adding those manually using the rmrr Xen command line > > > >>>> option [0], maybe you can figure out which range(s) are missing? > > > >>> > > > >>> They seem to change, so it's hard to know. Would there be harm in > > > >>> adding one to cover the end of RAM ( 0x04,7c80,0000 ) to ( > > > >>> 0xff,ffff,ffff )? Maybe that would just quiet the pointless faults > > > >>> while leaving the IOMMU enabled? > > > >> > > > >> While they may quieten the faults, I don't think those faults are > > > >> pointless. They indicate some problem with the software (less > > > >> likely the hardware, possibly the firmware) that you're using. > > > >> Also there's the question of what the overall behavior is going > > > >> to be when devices are permitted to access unpopulated address > > > >> ranges. I assume you did check already that no devices have their > > > >> BARs placed in that range? 
> > > > > > > > Isn't no-igfx already letting them try to read those unpopulated addresses? > > > > > > Yes, and it is for the reason that the documentation for the > > > option says "If specifying `no-igfx` fixes anything, please > > > report the problem." I infer from it in particular that one > > > better wouldn't use it for non-development purposes of whatever > > > kind. > > > > I stopped seeing these DMA faults, but I didn't know what made them go > > away. Then when working with an older 5.4.64 kernel, I saw them > > again. Eric bisected down to the 5.4.y version of mainline linux > > commit: > > > > commit 8195400f7ea95399f721ad21f4d663a62c65036f > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Mon Oct 19 11:15:23 2020 +0100 > > > > drm/i915: Force VT'd workarounds when running as a guest OS > > > > If i915.ko is being used as a passthrough device, it does not know if > > the host is using intel_iommu. Mixing the iommu and gfx causes a few > > issues (such as scanout overfetch) which we need to workaround inside > > the driver, so if we detect we are running under a hypervisor, also > > assume the device access is being virtualised. > > So the commit above fixes the DMA faults seen on Linux when using an > i915 gfx card? Yes, DMA faults are not seen with this commit. i915 behaves differently when it detects VT-d active, and this commit sets the VT-d behavior when running under any hypervisor. Regards, Jason ^ permalink raw reply [flat|nested] 18+ messages in thread
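One way to confirm that a given kernel no longer triggers the faults is to scan a captured log for the Xen VT-d fault lines. The sketch below is a rough helper, not an official tool: the regular expression is modelled on the `(XEN) [VT-D]DMAR:[DMA Read] Request device [...] fault addr ...` lines quoted at the start of this thread, the "Write" alternative is an assumption, and `find_faults` is a name invented for this example.

```python
import re

# Pattern modelled on the fault lines quoted in the original report.
# The "Write" alternative is an assumption; only "Read" appears in the thread.
FAULT_RE = re.compile(
    r"\[VT-D\]DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<dev>[0-9a-f:.]+)\] fault addr (?P<addr>[0-9a-f]+)"
)


def find_faults(lines, device="0000:00:02.0"):
    """Return (operation, address) for every VT-d fault reported for `device`."""
    faults = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match and match.group("dev") == device:
            faults.append((match.group("op"), int(match.group("addr"), 16)))
    return faults


# Two fault lines and one unrelated line, taken from the original report.
SAMPLE = [
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] "
    "fault addr 39b5845000, iommu reg = ffff82c00021d000",
    "(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set",
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] "
    "fault addr 4238d0a000, iommu reg = ffff82c00021d000",
]

if __name__ == "__main__":
    for op, addr in find_faults(SAMPLE):
        print(f"{op} fault at {addr:#x}")
```

An empty result for a log captured while plugging in the dock would be consistent with the faults being gone. Both sample addresses lie above the 0x04,7c80,0000 end of RAM mentioned earlier in the thread.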