* RMRRs and Phantom Functions
@ 2022-04-26 17:51 Andrew Cooper
  2022-04-27  3:39 ` Tian, Kevin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Cooper @ 2022-04-26 17:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Kevin Tian, Edwin Torok

Hello,

Edvin has found a machine with some very weird properties.  It is an HP
ProLiant BL460c Gen8 with:

 \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
             +-01.0-[11]--
             +-01.1-[02]--
             +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC (be3)
             |            +-00.1  Emulex Corporation OneConnect 10Gb NIC (be3)
             |            +-00.2  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)
             |            \-00.3  Emulex Corporation OneConnect 10Gb iSCSI Initiator (be3)

yet all 4 other functions on the device periodically hit IOMMU faults
(~once every 5 mins, so definitely stats).

(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr bdf80000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr bdf80000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr bdf80000
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr bdf80000

There are several RMRRs covering these devices, with:

(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:03:00.0
(XEN) [VT-D] endpoint: 0000:01:00.0
(XEN) [VT-D] endpoint: 0000:01:00.2
(XEN) [VT-D] endpoint: 0000:04:00.0
(XEN) [VT-D] endpoint: 0000:04:00.1
(XEN) [VT-D] endpoint: 0000:04:00.2
(XEN) [VT-D] endpoint: 0000:04:00.3
(XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff

being the one relevant to these faults.  I've not manually decoded the
DMAR table because device paths are horrible to follow but there are at
least the correct number of endpoints.  The functions all have SR-IOV
(disabled) and ARI (enabled).  None have any Phantom functions described.

Specifying pci-phantom=04:00,1 does appear to work around the faults,
but it's not right, because functions 1 thru 3 aren't actually phantom.

Also, I don't see any logic which actually wires up phantom functions
like this to share RMRRs/IVMDs in IO contexts.  The faults only
disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
as I can tell.

Simply giving the RMRR via rmrr= doesn't work (presumably because it
matches no actual devices, but there's no warning), but it feels as if it
ought to.

~Andrew


* RE: RMRRs and Phantom Functions
  2022-04-26 17:51 RMRRs and Phantom Functions Andrew Cooper
@ 2022-04-27  3:39 ` Tian, Kevin
  2022-04-27  8:51   ` Andrew Cooper
  2022-04-27  6:59 ` Jan Beulich
  2022-04-27  8:03 ` Roger Pau Monné
  2 siblings, 1 reply; 10+ messages in thread
From: Tian, Kevin @ 2022-04-27  3:39 UTC (permalink / raw)
  To: Cooper, Andrew, xen-devel
  Cc: Pau Monné, Roger, Beulich, Jan, Edwin Torok

> From: Andrew Cooper <Andrew.Cooper3@citrix.com>
> Sent: Wednesday, April 27, 2022 1:52 AM
> 
> Hello,
> 
> Edvin has found a machine with some very weird properties.  It is an HP
> ProLiant BL460c Gen8 with:
> 
>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>              +-01.0-[11]--
>              +-01.1-[02]--
>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
> (be3)
>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
> (be3)
>              |            +-00.2  Emulex Corporation OneConnect 10Gb
> iSCSI Initiator (be3)
>              |            \-00.3  Emulex Corporation OneConnect 10Gb
> iSCSI Initiator (be3)
> 
> yet all 4 other functions on the device periodically hit IOMMU faults
> (~once every 5 mins, so definitely stats).
> 
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
> bdf80000
> 
> There are several RMRRs covering the these devices, with:
> 
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:03:00.0
> (XEN) [VT-D] endpoint: 0000:01:00.0
> (XEN) [VT-D] endpoint: 0000:01:00.2
> (XEN) [VT-D] endpoint: 0000:04:00.0
> (XEN) [VT-D] endpoint: 0000:04:00.1
> (XEN) [VT-D] endpoint: 0000:04:00.2
> (XEN) [VT-D] endpoint: 0000:04:00.3
> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr
> bdf92fff
> 
> being the one relevant to these faults.  I've not manually decoded the
> DMAR table because device paths are horrible to follow but there are at
> least the correct number of endpoints.  The functions all have SR-IOV
> (disabled) and ARI (enabled).  None have any Phantom functions described.
> 
> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> but it's not right, because functions 1 thru 3 aren't actually phantom.
> 
> Also, I don't see any logic which actually wires up phantom functions
> like this to share RMRRs/IVMDs in IO contexts.  The faults only
> disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
> as I can tell.
> 
> Simply giving the RMRR via rmrr= doesn't work (presumably because of no
> patching actual devices, but there's no warning), but it feels as if it
> ought to.
> 

What is the Xen version? Does it include Jan's change for per-device
quarantine?

btw it's weird why those NIC devices require RMRR in the first place...


* Re: RMRRs and Phantom Functions
  2022-04-26 17:51 RMRRs and Phantom Functions Andrew Cooper
  2022-04-27  3:39 ` Tian, Kevin
@ 2022-04-27  6:59 ` Jan Beulich
  2022-04-27 10:05   ` Andrew Cooper
  2022-04-27  8:03 ` Roger Pau Monné
  2 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2022-04-27  6:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monne, Kevin Tian, Edwin Torok, xen-devel

On 26.04.2022 19:51, Andrew Cooper wrote:
> Hello,
> 
> Edvin has found a machine with some very weird properties.  It is an HP
> ProLiant BL460c Gen8 with:
> 
>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>              +-01.0-[11]--
>              +-01.1-[02]--
>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
> (be3)
>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
> (be3)
>              |            +-00.2  Emulex Corporation OneConnect 10Gb
> iSCSI Initiator (be3)
>              |            \-00.3  Emulex Corporation OneConnect 10Gb
> iSCSI Initiator (be3)
> 
> yet all 4 other functions on the device periodically hit IOMMU faults
> (~once every 5 mins, so definitely stats).
> 
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
> bdf80000
> 
> There are several RMRRs covering the these devices, with:
> 
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:03:00.0
> (XEN) [VT-D] endpoint: 0000:01:00.0
> (XEN) [VT-D] endpoint: 0000:01:00.2
> (XEN) [VT-D] endpoint: 0000:04:00.0
> (XEN) [VT-D] endpoint: 0000:04:00.1
> (XEN) [VT-D] endpoint: 0000:04:00.2
> (XEN) [VT-D] endpoint: 0000:04:00.3
> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
> 
> being the one relevant to these faults.  I've not manually decoded the
> DMAR table because device paths are horrible to follow but there are at
> least the correct number of endpoints.  The functions all have SR-IOV
> (disabled) and ARI (enabled).  None have any Phantom functions described.
> 
> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> but it's not right, because functions 1 thru 3 aren't actually phantom.

Indeed, and I think you really mean "pci-phantom=04:00,4". I guess we
should actually refuse "pci-phantom=04:00,1" in a case like this one.
The problem is that at the point we set pdev->phantom_stride we may
not know of the other devices, yet. But I guess we could attempt a
config space read of the supposed phantom function's device/vendor
and do <whatever> if these aren't both 0xffff.
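
A minimal sketch of such a check, assuming a hypothetical config-space
accessor (cfg_read_id_fn) rather than any existing Xen interface.  None
of the names below exist in Xen, and demo_read_id() is only a stand-in
modelling this machine, with a placeholder ID value:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical accessor: returns the dword at config offset 0
 * (vendor ID in bits 15:0, device ID in bits 31:16).
 */
typedef uint32_t (*cfg_read_id_fn)(uint8_t bus, uint8_t devfn);

/* A function that does not respond reads back vendor/device as all ones. */
static bool function_present(cfg_read_id_fn read_id, uint8_t bus,
                             uint8_t devfn)
{
    return read_id(bus, devfn) != 0xffffffffu;
}

/*
 * Returns true if any function that 'stride' would declare a phantom of
 * 'devfn' actually has a populated config space.
 */
static bool stride_covers_real_function(cfg_read_id_fn read_id, uint8_t bus,
                                        uint8_t devfn, unsigned int stride)
{
    unsigned int fn;

    if ( !stride )
        return false;

    for ( fn = (devfn & 7u) + stride; fn < 8; fn += stride )
        if ( function_present(read_id, bus, (uint8_t)((devfn & ~7u) | fn)) )
            return true;

    return false;
}

/*
 * Illustrative stand-in for the machine in this thread: only 04:00.0
 * through 04:00.3 respond.  The ID value is a placeholder, not real data.
 */
static uint32_t demo_read_id(uint8_t bus, uint8_t devfn)
{
    if ( bus == 0x04 && (devfn >> 3) == 0 && (devfn & 7u) < 4 )
        return 0x12345678u;

    return 0xffffffffu;
}
```

With the stand-in accessor, stride 1 on 04:00.0 is flagged because it
would claim the real functions 1 to 3, while stride 4 is not.  Whether
to warn or reject on a positive result is left open here.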

> Also, I don't see any logic which actually wires up phantom functions
> like this to share RMRRs/IVMDs in IO contexts.

See for example deassign_device():

    while ( pdev->phantom_stride )
    {
        devfn += pdev->phantom_stride;
        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
            break;
        ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
                         pci_to_dev(pdev));
        if ( ret )
            goto out;
    }

The hook is invoked with a devfn different from pdev's, and the VT-d
function then looks up the RMRR based on pdev while populating the
context entry for the given devfn. Or at least that's how it's
intended to work.

Jan

>  The faults only
> disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
> as I can tell.
> 
> Simply giving the RMRR via rmrr= doesn't work (presumably because of no
> patching actual devices, but there's no warning), but it feels as if it
> ought to.
> 
> ~Andrew




* Re: RMRRs and Phantom Functions
  2022-04-26 17:51 RMRRs and Phantom Functions Andrew Cooper
  2022-04-27  3:39 ` Tian, Kevin
  2022-04-27  6:59 ` Jan Beulich
@ 2022-04-27  8:03 ` Roger Pau Monné
  2022-04-27 10:17   ` Andrew Cooper
  2 siblings, 1 reply; 10+ messages in thread
From: Roger Pau Monné @ 2022-04-27  8:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Kevin Tian, Edwin Torok

On Tue, Apr 26, 2022 at 05:51:32PM +0000, Andrew Cooper wrote:
> Hello,
> 
> Edvin has found a machine with some very weird properties.  It is an HP
> ProLiant BL460c Gen8 with:
> 
>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>              +-01.0-[11]--
>              +-01.1-[02]--
>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
> (be3)
>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
> (be3)
>              |            +-00.2  Emulex Corporation OneConnect 10Gb
> iSCSI Initiator (be3)
>              |            \-00.3  Emulex Corporation OneConnect 10Gb
> iSCSI Initiator (be3)
> 
> yet all 4 other functions on the device periodically hit IOMMU faults
> (~once every 5 mins, so definitely stats).
> 
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
> bdf80000
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
> bdf80000
> 
> There are several RMRRs covering the these devices, with:
> 
> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [VT-D] endpoint: 0000:03:00.0
> (XEN) [VT-D] endpoint: 0000:01:00.0
> (XEN) [VT-D] endpoint: 0000:01:00.2
> (XEN) [VT-D] endpoint: 0000:04:00.0
> (XEN) [VT-D] endpoint: 0000:04:00.1
> (XEN) [VT-D] endpoint: 0000:04:00.2
> (XEN) [VT-D] endpoint: 0000:04:00.3
> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
> 
> being the one relevant to these faults.  I've not manually decoded the
> DMAR table because device paths are horrible to follow but there are at
> least the correct number of endpoints.  The functions all have SR-IOV
> (disabled) and ARI (enabled).  None have any Phantom functions described.

According to the PCIe spec ARI capable devices must not have phantom
functions:

"With every Function in an ARI Device, the Phantom Functions Supported
field must be set to 00b. The remainder of this field description
applies only to non-ARI multi-Function devices."

> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> but it's not right, because functions 1 thru 3 aren't actually phantom.
> 
> Also, I don't see any logic which actually wires up phantom functions
> like this to share RMRRs/IVMDs in IO contexts.  The faults only
> disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
> as I can tell.

I think I'm slightly confused, so those faults only happen when the
devices are assigned to domains different than dom0?

It would seem to me that functions 4 to 7 not being recognized by Xen
should also lead to their context entries not being set up in the dom0
case, and thus the faults should also happen.

> Simply giving the RMRR via rmrr= doesn't work (presumably because of no
> patching actual devices, but there's no warning), but it feels as if it
> ought to.

Xen should likely complain that there's no matching PCI device for the
provided RMRR regions, and so they are effectively ignored.

Thanks, Roger.



* Re: RMRRs and Phantom Functions
  2022-04-27  3:39 ` Tian, Kevin
@ 2022-04-27  8:51   ` Andrew Cooper
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2022-04-27  8:51 UTC (permalink / raw)
  To: Kevin Tian, xen-devel; +Cc: Roger Pau Monne, Beulich, Jan, Edwin Torok

On 27/04/2022 04:39, Tian, Kevin wrote:
>> From: Andrew Cooper <Andrew.Cooper3@citrix.com>
>> Sent: Wednesday, April 27, 2022 1:52 AM
>>
>> Hello,
>>
>> Edvin has found a machine with some very weird properties.  It is an HP
>> ProLiant BL460c Gen8 with:
>>
>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>              +-01.0-[11]--
>>              +-01.1-[02]--
>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
>> (be3)
>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
>> (be3)
>>              |            +-00.2  Emulex Corporation OneConnect 10Gb
>> iSCSI Initiator (be3)
>>              |            \-00.3  Emulex Corporation OneConnect 10Gb
>> iSCSI Initiator (be3)
>>
>> yet all 4 other functions on the device periodically hit IOMMU faults
>> (~once every 5 mins, so definitely stats).
>>
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
>> bdf80000
>>
>> There are several RMRRs covering the these devices, with:
>>
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:03:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.0
>> (XEN) [VT-D] endpoint: 0000:04:00.1
>> (XEN) [VT-D] endpoint: 0000:04:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.3
>> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr
>> bdf92fff
>>
>> being the one relevant to these faults.  I've not manually decoded the
>> DMAR table because device paths are horrible to follow but there are at
>> least the correct number of endpoints.  The functions all have SR-IOV
>> (disabled) and ARI (enabled).  None have any Phantom functions described.
>>
>> Specifying pci-phantom=04:00,1 does appear to work around the faults,
>> but it's not right, because functions 1 thru 3 aren't actually phantom.
>>
>> Also, I don't see any logic which actually wires up phantom functions
>> like this to share RMRRs/IVMDs in IO contexts.  The faults only
>> disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
>> as I can tell.
>>
>> Simply giving the RMRR via rmrr= doesn't work (presumably because of no
>> patching actual devices, but there's no warning), but it feels as if it
>> ought to.
>>
> What is the Xen version? Does it include Jan's change for per-device
> quarantine?

It's an up-to-date XenServer, so Xen 4.13 based, but yes.

> btw it's weird why those NIC devices require RMRR in the first place...

It's stats to the BMC.  This Emulex card is part of the default
configuration of the system from HP.

~Andrew


* Re: RMRRs and Phantom Functions
  2022-04-27  6:59 ` Jan Beulich
@ 2022-04-27 10:05   ` Andrew Cooper
  2022-04-27 10:18     ` Roger Pau Monné
  2022-04-27 10:20     ` Jan Beulich
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Cooper @ 2022-04-27 10:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monne, Kevin Tian, Edwin Torok, xen-devel

On 27/04/2022 07:59, Jan Beulich wrote:
> On 26.04.2022 19:51, Andrew Cooper wrote:
>> Hello,
>>
>> Edvin has found a machine with some very weird properties.  It is an HP
>> ProLiant BL460c Gen8 with:
>>
>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>              +-01.0-[11]--
>>              +-01.1-[02]--
>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
>> (be3)
>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
>> (be3)
>>              |            +-00.2  Emulex Corporation OneConnect 10Gb
>> iSCSI Initiator (be3)
>>              |            \-00.3  Emulex Corporation OneConnect 10Gb
>> iSCSI Initiator (be3)
>>
>> yet all 4 other functions on the device periodically hit IOMMU faults
>> (~once every 5 mins, so definitely stats).
>>
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
>> bdf80000
>>
>> There are several RMRRs covering the these devices, with:
>>
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:03:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.0
>> (XEN) [VT-D] endpoint: 0000:04:00.1
>> (XEN) [VT-D] endpoint: 0000:04:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.3
>> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
>>
>> being the one relevant to these faults.  I've not manually decoded the
>> DMAR table because device paths are horrible to follow but there are at
>> least the correct number of endpoints.  The functions all have SR-IOV
>> (disabled) and ARI (enabled).  None have any Phantom functions described.
>>
>> Specifying pci-phantom=04:00,1 does appear to work around the faults,
>> but it's not right, because functions 1 thru 3 aren't actually phantom.
> Indeed, and I think you really mean "pci-phantom=04:00,4".

As a quick tangent, the cmdline docs for pci-phantom= are in desperate
need of an example and a description of how stride works.  I've got some
ideas and notes jotted down.

Do we really mean ,4 here?  What happens for function 1?

> I guess we
> should actually refuse "pci-phantom=04:00,1" in a case like this one.
> The problem is that at the point we set pdev->phantom_stride we may
> not know of the other devices, yet. But I guess we could attempt a
> config space read of the supposed phantom function's device/vendor
> and do <whatever> if these aren't both 0xffff.

At a minimum, we ought to warn when it looks like something is wonky,
but I wouldn't go as far as rejecting.

All of these options to work around firmware/system screwups are applied
to an already-non-working system, and there is absolutely no guarantee
that necessary fixes make any kind of logical sense.

~Andrew


* Re: RMRRs and Phantom Functions
  2022-04-27  8:03 ` Roger Pau Monné
@ 2022-04-27 10:17   ` Andrew Cooper
  2022-04-27 10:50     ` Roger Pau Monné
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2022-04-27 10:17 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich, Kevin Tian, Edwin Torok

On 27/04/2022 09:03, Roger Pau Monne wrote:
> On Tue, Apr 26, 2022 at 05:51:32PM +0000, Andrew Cooper wrote:
>> Hello,
>>
>> Edvin has found a machine with some very weird properties.  It is an HP
>> ProLiant BL460c Gen8 with:
>>
>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>              +-01.0-[11]--
>>              +-01.1-[02]--
>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
>> (be3)
>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
>> (be3)
>>              |            +-00.2  Emulex Corporation OneConnect 10Gb
>> iSCSI Initiator (be3)
>>              |            \-00.3  Emulex Corporation OneConnect 10Gb
>> iSCSI Initiator (be3)
>>
>> yet all 4 other functions on the device periodically hit IOMMU faults
>> (~once every 5 mins, so definitely stats).
>>
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
>> bdf80000
>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
>> bdf80000
>>
>> There are several RMRRs covering the these devices, with:
>>
>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>> (XEN) [VT-D] endpoint: 0000:03:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.0
>> (XEN) [VT-D] endpoint: 0000:01:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.0
>> (XEN) [VT-D] endpoint: 0000:04:00.1
>> (XEN) [VT-D] endpoint: 0000:04:00.2
>> (XEN) [VT-D] endpoint: 0000:04:00.3
>> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
>>
>> being the one relevant to these faults.  I've not manually decoded the
>> DMAR table because device paths are horrible to follow but there are at
>> least the correct number of endpoints.  The functions all have SR-IOV
>> (disabled) and ARI (enabled).  None have any Phantom functions described.
> According to the PCIe spec ARI capable devices must not have phantom
> functions:
>
> "With every Function in an ARI Device, the Phantom Functions Supported
> field must be set to 00b. The remainder of this field description
> applies only to non-ARI multi-Function devices."

Lovely...

>
>> Specifying pci-phantom=04:00,1 does appear to work around the faults,
>> but it's not right, because functions 1 thru 3 aren't actually phantom.
>>
>> Also, I don't see any logic which actually wires up phantom functions
>> like this to share RMRRs/IVMDs in IO contexts.  The faults only
>> disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
>> as I can tell.
> I think I'm slightly confused, so those faults only happen when the
> devices are assigned to domains different than dom0?
>
> It would seem to me that functions 4 to 7 not being recognized by Xen
> should also lead to their context entries not being setup in the dom0
> case, and thus the faults should also happen.

Functions 4 thru 7 do not exist in the system.  Their config space is
all ~0's.

As they appear to be non-existent, no IOMMU context is set up for them,
hence the DMA faults when their source id is actually used.

When specifying phantom, what we're saying is that "function $X uses $Y
as a source id too".  Or in other words, treat $Y as if it were $X.  In
a theoretical future with working IOMMU groups, this would force $X and
$Y into the same IOMMU group as they can't be separated.
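
As a rough sketch of that "treat $Y as if it were $X" relation, assuming
the stride semantics discussed elsewhere in this thread (the helper name
is made up for illustration and is not a Xen function):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if requester devfn 'req' should be treated as function
 * 'owner': either it is 'owner' itself, or it sits in the same slot at a
 * positive whole multiple of 'stride' above it.
 */
static bool source_id_aliases(uint8_t owner, uint8_t stride, uint8_t req)
{
    if ( req == owner )
        return true;

    if ( !stride || req < owner || (req >> 3) != (owner >> 3) )
        return false;

    return ((req - owner) % stride) == 0;
}
```

Under this model, with a stride of 4 a request from 04:00.4 is
attributed to 04:00.0 and one from 04:00.5 to 04:00.1.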

~Andrew


* Re: RMRRs and Phantom Functions
  2022-04-27 10:05   ` Andrew Cooper
@ 2022-04-27 10:18     ` Roger Pau Monné
  2022-04-27 10:20     ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Roger Pau Monné @ 2022-04-27 10:18 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Kevin Tian, Edwin Torok, xen-devel

On Wed, Apr 27, 2022 at 10:05:54AM +0000, Andrew Cooper wrote:
> On 27/04/2022 07:59, Jan Beulich wrote:
> > On 26.04.2022 19:51, Andrew Cooper wrote:
> >> Hello,
> >>
> >> Edvin has found a machine with some very weird properties.  It is an HP
> >> ProLiant BL460c Gen8 with:
> >>
> >>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
> >>              +-01.0-[11]--
> >>              +-01.1-[02]--
> >>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
> >> (be3)
> >>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
> >> (be3)
> >>              |            +-00.2  Emulex Corporation OneConnect 10Gb
> >> iSCSI Initiator (be3)
> >>              |            \-00.3  Emulex Corporation OneConnect 10Gb
> >> iSCSI Initiator (be3)
> >>
> >> yet all 4 other functions on the device periodically hit IOMMU faults
> >> (~once every 5 mins, so definitely stats).
> >>
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
> >> bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
> >> bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
> >> bdf80000
> >> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
> >> bdf80000
> >>
> >> There are several RMRRs covering the these devices, with:
> >>
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: 0000:03:00.0
> >> (XEN) [VT-D] endpoint: 0000:01:00.0
> >> (XEN) [VT-D] endpoint: 0000:01:00.2
> >> (XEN) [VT-D] endpoint: 0000:04:00.0
> >> (XEN) [VT-D] endpoint: 0000:04:00.1
> >> (XEN) [VT-D] endpoint: 0000:04:00.2
> >> (XEN) [VT-D] endpoint: 0000:04:00.3
> >> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
> >>
> >> being the one relevant to these faults.  I've not manually decoded the
> >> DMAR table because device paths are horrible to follow but there are at
> >> least the correct number of endpoints.  The functions all have SR-IOV
> >> (disabled) and ARI (enabled).  None have any Phantom functions described.
> >>
> >> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> >> but it's not right, because functions 1 thru 3 aren't actually phantom.
> > Indeed, and I think you really mean "pci-phantom=04:00,4".
> 
> As a quick tangent, the cmdline docs for pci-phantom= are in desperate
> need of an example and a description of how stride works.  I've got some
> ideas and notes jotted down.
> 
> Do we really mean ,4 here?  What happens for function 1?
> 
> > I guess we
> > should actually refuse "pci-phantom=04:00,1" in a case like this one.
> > The problem is that at the point we set pdev->phantom_stride we may
> > not know of the other devices, yet. But I guess we could attempt a
> > config space read of the supposed phantom function's device/vendor
> > and do <whatever> if these aren't both 0xffff.
> 
> At a minimum, we ought to warn when it looks like something is wonky,
> but I wouldn't go as far as rejecting.
> 
> All of these options to work around firmware/system screwups are applied
> to an already-non-working system, and there is absolutely no guarantee
> that necessary fixes make any kind of logical sense.

AFAICT with stride = 1 Xen will treat functions 1-7 as phantom
functions depending on function 0, which means the pdev struct won't
get updated when those phantom functions are assigned to a domain as
part of assigning function 0.  That would imply that functions 1 to 3
will be considered phantom but would also have a matching pdev that
allows them to be independently assigned to a domain; nothing good
will come out of it.

I agree with Jan that we need to explicitly reject strides that cover
functions that would otherwise be considered devices (ie: have valid
config space entries).  Or alternatively we need to remove the pdevs
for those functions now considered phantom.

Thanks, Roger.



* Re: RMRRs and Phantom Functions
  2022-04-27 10:05   ` Andrew Cooper
  2022-04-27 10:18     ` Roger Pau Monné
@ 2022-04-27 10:20     ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2022-04-27 10:20 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monne, Kevin Tian, Edwin Torok, xen-devel

On 27.04.2022 12:05, Andrew Cooper wrote:
> On 27/04/2022 07:59, Jan Beulich wrote:
>> On 26.04.2022 19:51, Andrew Cooper wrote:
>>> Hello,
>>>
>>> Edvin has found a machine with some very weird properties.  It is an HP
>>> ProLiant BL460c Gen8 with:
>>>
>>>  \-[0000:00]-+-00.0  Intel Corporation Xeon E5/Core i7 DMI2
>>>              +-01.0-[11]--
>>>              +-01.1-[02]--
>>>              +-02.0-[04]--+-00.0  Emulex Corporation OneConnect 10Gb NIC
>>> (be3)
>>>              |            +-00.1  Emulex Corporation OneConnect 10Gb NIC
>>> (be3)
>>>              |            +-00.2  Emulex Corporation OneConnect 10Gb
>>> iSCSI Initiator (be3)
>>>              |            \-00.3  Emulex Corporation OneConnect 10Gb
>>> iSCSI Initiator (be3)
>>>
>>> yet all 4 other functions on the device periodically hit IOMMU faults
>>> (~once every 5 mins, so definitely stats).
>>>
>>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.4] fault addr
>>> bdf80000
>>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.5] fault addr
>>> bdf80000
>>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.6] fault addr
>>> bdf80000
>>> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:04:00.7] fault addr
>>> bdf80000
>>>
>>> There are several RMRRs covering the these devices, with:
>>>
>>> (XEN) [VT-D]found ACPI_DMAR_RMRR:
>>> (XEN) [VT-D] endpoint: 0000:03:00.0
>>> (XEN) [VT-D] endpoint: 0000:01:00.0
>>> (XEN) [VT-D] endpoint: 0000:01:00.2
>>> (XEN) [VT-D] endpoint: 0000:04:00.0
>>> (XEN) [VT-D] endpoint: 0000:04:00.1
>>> (XEN) [VT-D] endpoint: 0000:04:00.2
>>> (XEN) [VT-D] endpoint: 0000:04:00.3
>>> (XEN) [VT-D]dmar.c:608:   RMRR region: base_addr bdf8f000 end_addr bdf92fff
>>>
>>> being the one relevant to these faults.  I've not manually decoded the
>>> DMAR table because device paths are horrible to follow but there are at
>>> least the correct number of endpoints.  The functions all have SR-IOV
>>> (disabled) and ARI (enabled).  None have any Phantom functions described.
>>>
>>> Specifying pci-phantom=04:00,1 does appear to work around the faults,
>>> but it's not right, because functions 1 thru 3 aren't actually phantom.
>> Indeed, and I think you really mean "pci-phantom=04:00,4".
> 
> As a quick tangent, the cmdline docs for pci-phantom= are in desperate
> need of an example and a description of how stride works.  I've got some
> ideas and notes jotted down.
> 
> Do we really mean ,4 here?  What happens for function 1?

With stride 4 function 1's single phantom function is function 5. With
stride 1, as you had it before, functions 1...7 would all be considered
phantom functions of function 0.
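
To illustrate, here is a standalone sketch of the stride walk, modelled
on the deassign_device() loop but not taken from Xen (the function name
and the output-array interface are made up for the example):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the devfns that would be treated as phantom functions of
 * 'devfn' for a given stride.  The walk stops as soon as it leaves the
 * slot, so at most 7 entries are written.  Returns the number written.
 */
static size_t phantom_devfns(uint8_t devfn, uint8_t stride, uint8_t out[7])
{
    size_t n = 0;
    unsigned int next = devfn;

    while ( stride )
    {
        next += stride;
        if ( (next >> 3) != (unsigned int)(devfn >> 3) )
            break;
        out[n++] = (uint8_t)next;
    }

    return n;
}
```

For 04:00.1 (devfn 1) with stride 4 this yields just devfn 5; for
04:00.0 with stride 1 it yields devfns 1 through 7.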

Jan




* Re: RMRRs and Phantom Functions
  2022-04-27 10:17   ` Andrew Cooper
@ 2022-04-27 10:50     ` Roger Pau Monné
  0 siblings, 0 replies; 10+ messages in thread
From: Roger Pau Monné @ 2022-04-27 10:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Kevin Tian, Edwin Torok

On Wed, Apr 27, 2022 at 10:17:49AM +0000, Andrew Cooper wrote:
> On 27/04/2022 09:03, Roger Pau Monne wrote:
> > On Tue, Apr 26, 2022 at 05:51:32PM +0000, Andrew Cooper wrote:
> >> Specifying pci-phantom=04:00,1 does appear to work around the faults,
> >> but it's not right, because functions 1 thru 3 aren't actually phantom.
> >>
> >> Also, I don't see any logic which actually wires up phantom functions
> >> like this to share RMRRs/IVMDs in IO contexts.  The faults only
> >> disappear as a side effect of 04:00.0 and 04:00.4 being in dom0, as far
> >> as I can tell.
> > I think I'm slightly confused, so those faults only happen when the
> > devices are assigned to domains different than dom0?
> >
> > It would seem to me that functions 4 to 7 not being recognized by Xen
> > should also lead to their context entries not being setup in the dom0
> > case, and thus the faults should also happen.
> 
> Functions 4 thru 7 do not exist in the system.  Their config space is
> all ~0's.

Yup.

> As they appear to be non-existent, no IOMMU context is set up for them,
> hence the DMA faults when their source id is actually used.

Right, somehow I read your initial description as the faults only
happening when the devices are assigned to a guest, but not when in
dom0.

Thanks, Roger.


