Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <jbeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "Paul Durrant" <paul@xen.org>, "Ian Jackson" <iwj@xenproject.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
Date: Wed, 20 Oct 2021 00:34:23 +0100	[thread overview]
Message-ID: <932476cb-9667-efaa-65e9-7dc4baa3dc7c@citrix.com> (raw)
In-Reply-To: <d5f76461-70d2-fc59-2213-99a093e3b57f@suse.com>

On 22/09/2021 15:36, Jan Beulich wrote:
> Doing this in amd_iommu_prepare() is too late for it, in particular, to
> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
> (luckily) pretty simple, (pretty importantly) without breaking
> amd_iommu_prepare()'s logic to prevent multiple processing.
>
> This involves moving table checksumming, as
> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
> we have other uses of this originally ACPI-private function elsewhere in
> the tree.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I'm afraid this breaks booting on Skylake Server.  Yes, really - I
didn't believe the bisection at first either.

From a bit of debugging, I've found:

(XEN) *** acpi_dmar_init() => -19
(XEN) *** amd_iommu_get_supported_ivhd_type() => -19

So VT-d is disabled in firmware.  Oops, but something we should cope with.

Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
exit with -ENOENT too.

It turns out ...

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>  
>  int __init acpi_ivrs_init(void)
>  {
> +    int rc;
> +
>      if ( !iommu_enable && !iommu_intremap )
>          return 0;
>  
> +    rc = amd_iommu_get_supported_ivhd_type();
> +    if ( rc < 0 )
> +        return rc;
> +    BUG_ON(!rc);
> +    ivhd_type = rc;
> +
>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>      {
>          iommu_intremap = iommu_intremap_off;
>

... we're relying on this path (now skipped) to set iommu_intremap away
from iommu_intremap_full in the "no IOMMU anywhere to be found" case.

This explains why I occasionally during failure get spew about:

(XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
[   17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
[   17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI
host supports is 64.
[   17.125198] usb usb1-port2: couldn't allocate usb_device
[  248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.

and eventually (gone 400s) get dumped in a dracut shell.

Booting with an explicit iommu=no-intremap, which clobbers
iommu_intremap during cmdline parsing, recovers the system.

This variable controls a whole lot of magic with interrupt handling.  It
should default to 0, not 2, and only become nonzero when an IOMMU is
properly established.  It also shouldn't be serving double duty as "what
the user wants" ahead of determining the system capabilities.

And not to open another can of worms, but our entire way of working
explodes if there are devices on the system not covered by an IOMMU.

~Andrew