All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378
@ 2021-09-22 14:35 Jan Beulich
  2021-09-22 14:36 ` [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Along the pieces that were determined to have security relevance
there are quite a few more fixes / improvements (or so I hope)
which were decided to not become part of the XSA itself.

1: obtain IVHD type to use earlier
2: improve (extended) feature detection
3: check IVMD ranges against host implementation limits
4: respect AtsDisabled device flag
5: pull ATS disabling earlier
6: expose errors and warnings unconditionally

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
  2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
@ 2021-09-22 14:36 ` Jan Beulich
  2021-09-28  7:12   ` Durrant, Paul
  2021-10-19 23:34   ` Andrew Cooper
  2021-09-22 14:37 ` [PATCH v8 2/6] AMD/IOMMU: improve (extended) feature detection Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Doing this in amd_iommu_prepare() is too late for it, in particular, to
be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
(luckily) pretty simple, (pretty importantly) without breaking
amd_iommu_prepare()'s logic to prevent multiple processing.

This involves moving table checksumming, as
amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
the course of doing so stop open-coding acpi_tb_checksum(), seeing that
we have other uses of this originally ACPI-private function elsewhere in
the tree.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v7: Move table checksumming.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -22,6 +22,8 @@
 
 #include <asm/io_apic.h>
 
+#include <acpi/actables.h>
+
 #include "iommu.h"
 
 /* Some helper structures, particularly to deal with ranges. */
@@ -1167,20 +1169,7 @@ static int __init parse_ivrs_table(struc
 static int __init detect_iommu_acpi(struct acpi_table_header *table)
 {
     const struct acpi_ivrs_header *ivrs_block;
-    unsigned long i;
     unsigned long length = sizeof(struct acpi_table_ivrs);
-    u8 checksum, *raw_table;
-
-    /* validate checksum: sum of entire table == 0 */
-    checksum = 0;
-    raw_table = (u8 *)table;
-    for ( i = 0; i < table->length; i++ )
-        checksum += raw_table[i];
-    if ( checksum )
-    {
-        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
-        return -ENODEV;
-    }
 
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
@@ -1317,6 +1306,15 @@ get_supported_ivhd_type(struct acpi_tabl
 {
     size_t length = sizeof(struct acpi_table_ivrs);
     const struct acpi_ivrs_header *ivrs_block, *blk = NULL;
+    uint8_t checksum;
+
+    /* Validate checksum: Sum of entire table == 0. */
+    checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
+    if ( checksum )
+    {
+        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
+        return -ENODEV;
+    }
 
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1398,15 +1398,9 @@ int __init amd_iommu_prepare(bool xt)
         goto error_out;
 
     /* Have we been here before? */
-    if ( ivhd_type )
+    if ( ivrs_bdf_entries )
         return 0;
 
-    rc = amd_iommu_get_supported_ivhd_type();
-    if ( rc < 0 )
-        goto error_out;
-    BUG_ON(!rc);
-    ivhd_type = rc;
-
     rc = amd_iommu_get_ivrs_dev_entries();
     if ( !rc )
         rc = -ENODEV;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
 
 int __init acpi_ivrs_init(void)
 {
+    int rc;
+
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
+    rc = amd_iommu_get_supported_ivhd_type();
+    if ( rc < 0 )
+        return rc;
+    BUG_ON(!rc);
+    ivhd_type = rc;
+
     if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
     {
         iommu_intremap = iommu_intremap_off;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v8 2/6] AMD/IOMMU: improve (extended) feature detection
  2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
  2021-09-22 14:36 ` [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
@ 2021-09-22 14:37 ` Jan Beulich
  2021-09-22 14:37 ` [PATCH v8 3/6] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

First of all the documentation is very clear about ACPI table data
superseding raw register data. Use raw register data only if EFRSup is
clear in the ACPI tables (which may still go too far). Additionally if
this flag is clear, the IVRS type 11H table is reserved and hence may
not be recognized.

Furthermore propagate IVRS type 10H data into the feature flags
recorded, as the full extended features field is available in type 11H
only.

Note that this also makes necessary to stop the bad practice of us
finding a type 11H IVHD entry, but still processing the type 10H one
in detect_iommu_acpi()'s invocation of amd_iommu_detect_one_acpi().

Note also that the features.raw check in amd_iommu_prepare_one() needs
replacing, now that the field can also be populated by different means.
Key IOMMUv2 availability off of IVHD type not being 10H, and then move
it a function layer up, so that it would be set only once all IOMMUs
have been successfully prepared.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v7: Re-base.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -306,6 +306,7 @@ extern struct hpet_sbdf {
     } init;
 } hpet_sbdf;
 
+extern unsigned int amd_iommu_acpi_info;
 extern int amd_iommu_min_paging_mode;
 
 extern void *shared_intremap_table;
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -1062,7 +1062,8 @@ static unsigned int __initdata nr_ivmd;
 static inline bool_t is_ivhd_block(u8 type)
 {
     return (type == ACPI_IVRS_TYPE_HARDWARE ||
-            type == ACPI_IVRS_TYPE_HARDWARE_11H);
+            ((amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP) &&
+             type == ACPI_IVRS_TYPE_HARDWARE_11H));
 }
 
 static inline bool_t is_ivmd_block(u8 type)
@@ -1176,7 +1177,7 @@ static int __init detect_iommu_acpi(stru
         ivrs_block = (struct acpi_ivrs_header *)((u8 *)table + length);
         if ( table->length < (length + ivrs_block->length) )
             return -ENODEV;
-        if ( ivrs_block->type == ACPI_IVRS_TYPE_HARDWARE &&
+        if ( ivrs_block->type == ivhd_type &&
              amd_iommu_detect_one_acpi(to_ivhd_block(ivrs_block)) != 0 )
             return -ENODEV;
         length += ivrs_block->length;
@@ -1316,6 +1317,9 @@ get_supported_ivhd_type(struct acpi_tabl
         return -ENODEV;
     }
 
+    amd_iommu_acpi_info = container_of(table, const struct acpi_table_ivrs,
+                                       header)->info;
+
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
         ivrs_block = (struct acpi_ivrs_header *)((u8 *)table + length);
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,14 +60,14 @@ void __init get_iommu_features(struct am
     const struct amd_iommu *first;
     ASSERT( iommu->mmio_base );
 
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
+    if ( !(amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP) )
     {
-        iommu->features.raw = 0;
-        return;
-    }
+        if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
+            return;
 
-    iommu->features.raw =
-        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
+        iommu->features.raw =
+            readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
+    }
 
     /* Don't log the same set of features over and over. */
     first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
@@ -164,6 +164,42 @@ int __init amd_iommu_detect_one_acpi(
     iommu->cap_offset = ivhd_block->capability_offset;
     iommu->mmio_base_phys = ivhd_block->base_address;
 
+    if ( ivhd_type != ACPI_IVRS_TYPE_HARDWARE )
+        iommu->features.raw = ivhd_block->efr_image;
+    else if ( amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP )
+    {
+        union {
+            uint32_t raw;
+            struct {
+                unsigned int xt_sup:1;
+                unsigned int nx_sup:1;
+                unsigned int gt_sup:1;
+                unsigned int glx_sup:2;
+                unsigned int ia_sup:1;
+                unsigned int ga_sup:1;
+                unsigned int he_sup:1;
+                unsigned int pas_max:5;
+                unsigned int pn_counters:4;
+                unsigned int pn_banks:6;
+                unsigned int msi_num_ppr:5;
+                unsigned int gats:2;
+                unsigned int hats:2;
+            };
+        } attr = { .raw = ivhd_block->iommu_attr };
+
+        iommu->features.flds.xt_sup = attr.xt_sup;
+        iommu->features.flds.nx_sup = attr.nx_sup;
+        iommu->features.flds.gt_sup = attr.gt_sup;
+        iommu->features.flds.glx_sup = attr.glx_sup;
+        iommu->features.flds.ia_sup = attr.ia_sup;
+        iommu->features.flds.ga_sup = attr.ga_sup;
+        iommu->features.flds.pas_max = attr.pas_max;
+        iommu->features.flds.gats = attr.gats;
+        iommu->features.flds.hats = attr.hats;
+    }
+    else if ( list_empty(&amd_iommu_head) )
+        AMD_IOMMU_DEBUG("EFRSup not set in ACPI table; will fall back to hardware\n");
+
     /* override IOMMU HT flags */
     iommu->ht_flags = ivhd_block->header.flags;
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -29,6 +29,7 @@ static bool __initdata pci_init;
 static void do_amd_iommu_irq(void *data);
 static DECLARE_SOFTIRQ_TASKLET(amd_iommu_irq_tasklet, do_amd_iommu_irq, NULL);
 
+unsigned int __read_mostly amd_iommu_acpi_info;
 unsigned int __read_mostly ivrs_bdf_entries;
 u8 __read_mostly ivhd_type;
 static struct radix_tree_root ivrs_maps;
@@ -1375,9 +1376,6 @@ static int __init amd_iommu_prepare_one(
 
     get_iommu_features(iommu);
 
-    if ( iommu->features.raw )
-        iommuv2_enabled = true;
-
     return 0;
 }
 
@@ -1419,6 +1417,9 @@ int __init amd_iommu_prepare(bool xt)
             has_xt = false;
     }
 
+    if ( ivhd_type != ACPI_IVRS_TYPE_HARDWARE )
+        iommuv2_enabled = true;
+
     for_each_amd_iommu ( iommu )
     {
         /* NB: There's no need to actually write these out right here. */
--- a/xen/include/acpi/actbl2.h
+++ b/xen/include/acpi/actbl2.h
@@ -716,6 +716,9 @@ struct acpi_table_ivrs {
 
 /* Values for Info field above */
 
+#define ACPI_IVRS_EFR_SUP           0x00000001	/* extended feature support */
+#define ACPI_IVRS_PREBOOT_DMA_REMAP 0x00000002	/* pre-boot DMA remapping in use */
+#define ACPI_IVRS_GVA_SIZE          0x000000E0	/* 3 bits, guest VA size */
 #define ACPI_IVRS_PHYSICAL_SIZE     0x00007F00	/* 7 bits, physical address size */
 #define ACPI_IVRS_VIRTUAL_SIZE      0x003F8000	/* 7 bits, virtual address size */
 #define ACPI_IVRS_ATS_RESERVED      0x00400000	/* ATS address translation range reserved */



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v8 3/6] AMD/IOMMU: check IVMD ranges against host implementation limits
  2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
  2021-09-22 14:36 ` [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
  2021-09-22 14:37 ` [PATCH v8 2/6] AMD/IOMMU: improve (extended) feature detection Jan Beulich
@ 2021-09-22 14:37 ` Jan Beulich
  2021-09-22 14:37 ` [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

When such ranges can't be represented as 1:1 mappings in page tables,
reject them as presumably bogus. Note that when we detect features late
(because of EFRSup being clear in the ACPI tables), it would be quite a
bit of work to check for (and drop) out of range IVMD ranges, so IOMMU
initialization gets failed in this case instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v7: Re-base.
v6: Re-base.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -307,6 +307,7 @@ extern struct hpet_sbdf {
 } hpet_sbdf;
 
 extern unsigned int amd_iommu_acpi_info;
+extern unsigned int amd_iommu_max_paging_mode;
 extern int amd_iommu_min_paging_mode;
 
 extern void *shared_intremap_table;
@@ -360,7 +361,7 @@ static inline int amd_iommu_get_paging_m
     while ( max_frames > PTE_PER_TABLE_SIZE )
     {
         max_frames = PTE_PER_TABLE_ALIGN(max_frames) >> PTE_PER_TABLE_SHIFT;
-        if ( ++level > 6 )
+        if ( ++level > amd_iommu_max_paging_mode )
             return -ENOMEM;
     }
 
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -378,6 +378,7 @@ static int __init parse_ivmd_device_iomm
 static int __init parse_ivmd_block(const struct acpi_ivrs_memory *ivmd_block)
 {
     unsigned long start_addr, mem_length, base, limit;
+    unsigned int addr_bits;
     bool iw = true, ir = true, exclusion = false;
 
     if ( ivmd_block->header.length < sizeof(*ivmd_block) )
@@ -394,6 +395,17 @@ static int __init parse_ivmd_block(const
     AMD_IOMMU_DEBUG("IVMD Block: type %#x phys %#lx len %#lx\n",
                     ivmd_block->header.type, start_addr, mem_length);
 
+    addr_bits = min(MASK_EXTR(amd_iommu_acpi_info, ACPI_IVRS_PHYSICAL_SIZE),
+                    MASK_EXTR(amd_iommu_acpi_info, ACPI_IVRS_VIRTUAL_SIZE));
+    if ( amd_iommu_get_paging_mode(PFN_UP(start_addr + mem_length)) < 0 ||
+         (addr_bits < BITS_PER_LONG &&
+          ((start_addr + mem_length - 1) >> addr_bits)) )
+    {
+        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not IOMMU addressable\n",
+                        start_addr, start_addr + mem_length);
+        return 0;
+    }
+
     if ( !e820_all_mapped(base, limit + PAGE_SIZE, E820_RESERVED) )
     {
         paddr_t addr;
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -67,6 +67,9 @@ void __init get_iommu_features(struct am
 
         iommu->features.raw =
             readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
+
+        if ( 4 + iommu->features.flds.hats < amd_iommu_max_paging_mode )
+            amd_iommu_max_paging_mode = 4 + iommu->features.flds.hats;
     }
 
     /* Don't log the same set of features over and over. */
@@ -200,6 +203,10 @@ int __init amd_iommu_detect_one_acpi(
     else if ( list_empty(&amd_iommu_head) )
         AMD_IOMMU_DEBUG("EFRSup not set in ACPI table; will fall back to hardware\n");
 
+    if ( (amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP) &&
+         4 + iommu->features.flds.hats < amd_iommu_max_paging_mode )
+        amd_iommu_max_paging_mode = 4 + iommu->features.flds.hats;
+
     /* override IOMMU HT flags */
     iommu->ht_flags = ivhd_block->header.flags;
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1376,6 +1376,13 @@ static int __init amd_iommu_prepare_one(
 
     get_iommu_features(iommu);
 
+    /*
+     * Late extended feature determination may cause previously mappable
+     * IVMD ranges to become unmappable.
+     */
+    if ( amd_iommu_max_paging_mode < amd_iommu_min_paging_mode )
+        return -ERANGE;
+
     return 0;
 }
 
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -254,6 +254,7 @@ int amd_iommu_alloc_root(struct domain *
     return 0;
 }
 
+unsigned int __read_mostly amd_iommu_max_paging_mode = 6;
 int __read_mostly amd_iommu_min_paging_mode = 1;
 
 static int amd_iommu_domain_init(struct domain *d)



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag
  2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (2 preceding siblings ...)
  2021-09-22 14:37 ` [PATCH v8 3/6] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
@ 2021-09-22 14:37 ` Jan Beulich
  2021-09-28  7:34   ` Durrant, Paul
  2021-09-22 14:38 ` [PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier Jan Beulich
  2021-09-22 14:38 ` [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally Jan Beulich
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

IVHD entries may specify that ATS is to be blocked for a device or range
of devices. Honor firmware telling us so.

While adding respective checks I noticed that the 2nd conditional in
amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
Add the missing part of the condition there, as no good can come from
enabling ATS on a device when the IOMMU is not capable of dealing with
ATS requests.

For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
exhibit UB.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: As an alternative to adding the missing IOMMU capability check, we
     may want to consider simply using dte->i in the 2nd conditional in
     amd_iommu_setup_domain_device().
Note that while ATS enabling/disabling gets invoked without any locks
held, the two functions should not be possible to race with one another
for any individual device (or else we'd be in trouble already, as ATS
might then get re-enabled immediately after it was disabled, with the
DTE out of sync with this setting).
---
v7: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -120,6 +120,7 @@ struct ivrs_mappings {
     uint16_t dte_requestor_id;
     bool valid:1;
     bool dte_allow_exclusion:1;
+    bool block_ats:1;
 
     /* ivhd device data settings */
     uint8_t device_flags;
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -55,8 +55,8 @@ union acpi_ivhd_device {
 };
 
 static void __init add_ivrs_mapping_entry(
-    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
-    struct amd_iommu *iommu)
+    uint16_t bdf, uint16_t alias_id, uint8_t flags, unsigned int ext_flags,
+    bool alloc_irt, struct amd_iommu *iommu)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
 
@@ -66,6 +66,7 @@ static void __init add_ivrs_mapping_entr
     ivrs_mappings[bdf].dte_requestor_id = alias_id;
 
     /* override flags for range of devices */
+    ivrs_mappings[bdf].block_ats = ext_flags & ACPI_IVHD_ATS_DISABLED;
     ivrs_mappings[bdf].device_flags = flags;
 
     /* Don't map an IOMMU by itself. */
@@ -499,7 +500,7 @@ static u16 __init parse_ivhd_device_sele
         return 0;
     }
 
-    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, false,
+    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, 0, false,
                            iommu);
 
     return sizeof(*select);
@@ -545,7 +546,7 @@ static u16 __init parse_ivhd_device_rang
     AMD_IOMMU_DEBUG(" Dev_Id Range: %#x -> %#x\n", first_bdf, last_bdf);
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
-        add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting,
+        add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting, 0,
                                false, iommu);
 
     return dev_length;
@@ -580,7 +581,7 @@ static u16 __init parse_ivhd_device_alia
 
     AMD_IOMMU_DEBUG(" Dev_Id Alias: %#x\n", alias_id);
 
-    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, true,
+    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, 0, true,
                            iommu);
 
     return dev_length;
@@ -636,7 +637,7 @@ static u16 __init parse_ivhd_device_alia
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, alias_id, range->alias.header.data_setting,
-                               true, iommu);
+                               0, true, iommu);
 
     return dev_length;
 }
@@ -661,7 +662,8 @@ static u16 __init parse_ivhd_device_exte
         return 0;
     }
 
-    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, false, iommu);
+    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting,
+                           ext->extended_data, false, iommu);
 
     return dev_length;
 }
@@ -708,7 +710,7 @@ static u16 __init parse_ivhd_device_exte
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, bdf, range->extended.header.data_setting,
-                               false, iommu);
+                               range->extended.extended_data, false, iommu);
 
     return dev_length;
 }
@@ -800,7 +802,7 @@ static u16 __init parse_ivhd_device_spec
 
     AMD_IOMMU_DEBUG("IVHD Special: %pp variety %#x handle %#x\n",
                     &PCI_SBDF2(seg, bdf), special->variety, special->handle);
-    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, true,
+    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, 0, true,
                            iommu);
 
     switch ( special->variety )
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -105,6 +105,7 @@ static int __must_check amd_iommu_setup_
     int req_id, valid = 1, rc;
     u8 bus = pdev->bus;
     struct domain_iommu *hd = dom_iommu(domain);
+    const struct ivrs_mappings *ivrs_dev;
 
     if ( QUARANTINE_SKIP(domain) )
         return 0;
@@ -122,20 +123,18 @@ static int __must_check amd_iommu_setup_
     req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
     table = iommu->dev_table.buffer;
     dte = &table[req_id];
+    ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
 
     spin_lock_irqsave(&iommu->lock, flags);
 
     if ( !dte->v || !dte->tv )
     {
-        const struct ivrs_mappings *ivrs_dev;
-
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
             dte, page_to_maddr(hd->arch.amd.root_table),
             domain->domain_id, hd->arch.amd.paging_mode, valid);
 
         /* Undo what amd_iommu_disable_domain_device() may have done. */
-        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
         if ( dte->it_root )
         {
             dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
@@ -146,6 +145,7 @@ static int __must_check amd_iommu_setup_
         dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
 
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
+             !ivrs_dev->block_ats &&
              iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
             dte->i = ats_enabled;
 
@@ -166,6 +166,8 @@ static int __must_check amd_iommu_setup_
     ASSERT(pcidevs_locked());
 
     if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
+         !ivrs_dev->block_ats &&
+         iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) &&
          !pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
     {
         if ( devfn == pdev->devfn )
--- a/xen/include/acpi/actbl2.h
+++ b/xen/include/acpi/actbl2.h
@@ -851,7 +851,7 @@ struct acpi_ivrs_device8b {
 
 /* Values for extended_data above */
 
-#define ACPI_IVHD_ATS_DISABLED      (1<<31)
+#define ACPI_IVHD_ATS_DISABLED      (1u << 31)
 
 /* Type 72: 8-byte device entry */
 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier
  2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (3 preceding siblings ...)
  2021-09-22 14:37 ` [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
@ 2021-09-22 14:38 ` Jan Beulich
  2021-09-28  7:36   ` Durrant, Paul
  2021-09-22 14:38 ` [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally Jan Beulich
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Disabling should be done in the opposite order of enabling: ATS wants to
be turned off before adjusting the DTE, just like it gets enabled only
after the DTE was suitably prepared. Note that we want ATS to be
disabled as soon as any of the DTEs involved in the handling of a device
(including phantom devices) gets adjusted respectively. For this reason
the "devfn == pdev->devfn" of the original conditional gets dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: This points out that for phantom devices the ordering in
     amd_iommu_setup_domain_device() may also not be fully suitable: ATS
     would better be enabled on the device only after all involved DTEs
     have got prepared. This would be a less straightforward change,
     though.
---
v8: New.

--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -313,6 +313,12 @@ static void amd_iommu_disable_domain_dev
     if ( QUARANTINE_SKIP(domain) )
         return;
 
+    ASSERT(pcidevs_locked());
+
+    if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
+         pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
+        disable_ats_device(pdev);
+
     BUG_ON ( iommu->dev_table.buffer == NULL );
     req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
     table = iommu->dev_table.buffer;
@@ -348,13 +354,6 @@ static void amd_iommu_disable_domain_dev
     }
     else
         spin_unlock_irqrestore(&iommu->lock, flags);
-
-    ASSERT(pcidevs_locked());
-
-    if ( devfn == pdev->devfn &&
-         pci_ats_device(iommu->seg, bus, devfn) &&
-         pci_ats_enabled(iommu->seg, bus, devfn) )
-        disable_ats_device(pdev);
 }
 
 static int reassign_device(struct domain *source, struct domain *target,



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally
  2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (4 preceding siblings ...)
  2021-09-22 14:38 ` [PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier Jan Beulich
@ 2021-09-22 14:38 ` Jan Beulich
  2021-09-28  7:42   ` Durrant, Paul
  5 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-22 14:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Making these dependent upon "iommu=debug" isn't really helpful in the
field. Where touching respective code anyway also make use of %pp and
%pd.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While I'm adding AMD_IOMMU_VERBOSE(), there aren't any uses for now.
It's not really clear to me where to draw the boundary to
AMD_IOMMU_DEBUG().

I didn't bother touching iommu_guest.c here.
---
v8: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -203,6 +203,18 @@ struct acpi_ivrs_hardware;
 
 #define DMA_32BIT_MASK  0x00000000ffffffffULL
 
+#define AMD_IOMMU_ERROR(fmt, args...) \
+    printk(XENLOG_ERR "AMD-Vi: Error: " fmt, ## args)
+
+#define AMD_IOMMU_WARN(fmt, args...) \
+    printk(XENLOG_WARNING "AMD-Vi: Warning: " fmt, ## args)
+
+#define AMD_IOMMU_VERBOSE(fmt, args...) \
+    do { \
+        if ( iommu_verbose ) \
+            printk(XENLOG_INFO "AMD-Vi: " fmt, ## args); \
+    } while ( false )
+
 #define AMD_IOMMU_DEBUG(fmt, args...) \
     do  \
     {   \
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -174,7 +174,7 @@ static int __init reserve_unity_map_for_
         if ( unity_map->addr + unity_map->length > base &&
              base + length > unity_map->addr )
         {
-            AMD_IOMMU_DEBUG("IVMD Error: overlap [%lx,%lx) vs [%lx,%lx)\n",
+            AMD_IOMMU_ERROR("IVMD: overlap [%lx,%lx) vs [%lx,%lx)\n",
                             base, base + length, unity_map->addr,
                             unity_map->addr + unity_map->length);
             return -EPERM;
@@ -248,7 +248,7 @@ static int __init register_range_for_dev
     iommu = find_iommu_for_device(seg, bdf);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("IVMD Error: No IOMMU for Dev_Id %#x!\n", bdf);
+        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x\n", bdf);
         return -ENODEV;
     }
     req = ivrs_mappings[bdf].dte_requestor_id;
@@ -318,7 +318,7 @@ static int __init parse_ivmd_device_sele
     bdf = ivmd_block->header.device_id;
     if ( bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVMD Error: Invalid Dev_Id %#x\n", bdf);
+        AMD_IOMMU_ERROR("IVMD: invalid Dev_Id %#x\n", bdf);
         return -ENODEV;
     }
 
@@ -335,16 +335,14 @@ static int __init parse_ivmd_device_rang
     first_bdf = ivmd_block->header.device_id;
     if ( first_bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVMD Error: "
-                        "Invalid Range_First Dev_Id %#x\n", first_bdf);
+        AMD_IOMMU_ERROR("IVMD: invalid Range_First Dev_Id %#x\n", first_bdf);
         return -ENODEV;
     }
 
     last_bdf = ivmd_block->aux_data;
     if ( (last_bdf >= ivrs_bdf_entries) || (last_bdf <= first_bdf) )
     {
-        AMD_IOMMU_DEBUG("IVMD Error: "
-                        "Invalid Range_Last Dev_Id %#x\n", last_bdf);
+        AMD_IOMMU_ERROR("IVMD: invalid Range_Last Dev_Id %#x\n", last_bdf);
         return -ENODEV;
     }
 
@@ -367,7 +365,7 @@ static int __init parse_ivmd_device_iomm
                                     ivmd_block->aux_data);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("IVMD Error: No IOMMU for Dev_Id %#x Cap %#x\n",
+        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x Cap %#x\n",
                         ivmd_block->header.device_id, ivmd_block->aux_data);
         return -ENODEV;
     }
@@ -384,7 +382,7 @@ static int __init parse_ivmd_block(const
 
     if ( ivmd_block->header.length < sizeof(*ivmd_block) )
     {
-        AMD_IOMMU_DEBUG("IVMD Error: Invalid Block Length!\n");
+        AMD_IOMMU_ERROR("IVMD: invalid block length\n");
         return -ENODEV;
     }
 
@@ -402,8 +400,8 @@ static int __init parse_ivmd_block(const
          (addr_bits < BITS_PER_LONG &&
           ((start_addr + mem_length - 1) >> addr_bits)) )
     {
-        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not IOMMU addressable\n",
-                        start_addr, start_addr + mem_length);
+        AMD_IOMMU_WARN("IVMD: [%lx,%lx) is not IOMMU addressable\n",
+                       start_addr, start_addr + mem_length);
         return 0;
     }
 
@@ -411,8 +409,8 @@ static int __init parse_ivmd_block(const
     {
         paddr_t addr;
 
-        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
-                        base, limit + PAGE_SIZE);
+        AMD_IOMMU_WARN("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
+                       base, limit + PAGE_SIZE);
 
         for ( addr = base; addr <= limit; addr += PAGE_SIZE )
         {
@@ -423,7 +421,7 @@ static int __init parse_ivmd_block(const
                 if ( e820_add_range(&e820, addr, addr + PAGE_SIZE,
                                     E820_RESERVED) )
                     continue;
-                AMD_IOMMU_DEBUG("IVMD Error: Page at %lx couldn't be reserved\n",
+                AMD_IOMMU_ERROR("IVMD: page at %lx couldn't be reserved\n",
                                 addr);
                 return -EIO;
             }
@@ -433,8 +431,7 @@ static int __init parse_ivmd_block(const
                            RAM_TYPE_UNUSABLE)) )
                 continue;
 
-            AMD_IOMMU_DEBUG("IVMD Error: Page at %lx can't be converted\n",
-                            addr);
+            AMD_IOMMU_ERROR("IVMD: page at %lx can't be converted\n", addr);
             return -EIO;
         }
     }
@@ -448,7 +445,7 @@ static int __init parse_ivmd_block(const
     }
     else
     {
-        AMD_IOMMU_DEBUG("IVMD Error: Invalid Flag Field!\n");
+        AMD_IOMMU_ERROR("IVMD: invalid flag field\n");
         return -ENODEV;
     }
 
@@ -471,7 +468,8 @@ static int __init parse_ivmd_block(const
                                        iw, ir, exclusion);
 
     default:
-        AMD_IOMMU_DEBUG("IVMD Error: Invalid Block Type!\n");
+        AMD_IOMMU_ERROR("IVMD: unknown block type %#x\n",
+                        ivmd_block->header.type);
         return -ENODEV;
     }
 }
@@ -481,7 +479,7 @@ static u16 __init parse_ivhd_device_padd
 {
     if ( header_length < (block_length + pad_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
@@ -496,7 +494,7 @@ static u16 __init parse_ivhd_device_sele
     bdf = select->header.id;
     if ( bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Dev_Id %#x\n", bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry Dev_Id %#x\n", bdf);
         return 0;
     }
 
@@ -515,14 +513,13 @@ static u16 __init parse_ivhd_device_rang
     dev_length = sizeof(*range);
     if ( header_length < (block_length + dev_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
     if ( range->end.header.type != ACPI_IVRS_TYPE_END )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: End_Type %#x\n",
+        AMD_IOMMU_ERROR("IVHD Error: invalid range: End_Type %#x\n",
                         range->end.header.type);
         return 0;
     }
@@ -530,16 +527,14 @@ static u16 __init parse_ivhd_device_rang
     first_bdf = range->start.header.id;
     if ( first_bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: First Dev_Id %#x\n", first_bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid range: First Dev_Id %#x\n", first_bdf);
         return 0;
     }
 
     last_bdf = range->end.header.id;
     if ( (last_bdf >= ivrs_bdf_entries) || (last_bdf <= first_bdf) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: Last Dev_Id %#x\n", last_bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid range: Last Dev_Id %#x\n", last_bdf);
         return 0;
     }
 
@@ -561,21 +556,21 @@ static u16 __init parse_ivhd_device_alia
     dev_length = sizeof(*alias);
     if ( header_length < (block_length + dev_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
     bdf = alias->header.id;
     if ( bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Dev_Id %#x\n", bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry Dev_Id %#x\n", bdf);
         return 0;
     }
 
     alias_id = alias->used_id;
     if ( alias_id >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Alias Dev_Id %#x\n", alias_id);
+        AMD_IOMMU_ERROR("IVHD: invalid Alias Dev_Id %#x\n", alias_id);
         return 0;
     }
 
@@ -597,14 +592,13 @@ static u16 __init parse_ivhd_device_alia
     dev_length = sizeof(*range);
     if ( header_length < (block_length + dev_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
     if ( range->end.header.type != ACPI_IVRS_TYPE_END )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: End_Type %#x\n",
+        AMD_IOMMU_ERROR("IVHD: invalid range: End_Type %#x\n",
                         range->end.header.type);
         return 0;
     }
@@ -612,16 +606,14 @@ static u16 __init parse_ivhd_device_alia
     first_bdf = range->alias.header.id;
     if ( first_bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: First Dev_Id %#x\n", first_bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid range: First Dev_Id %#x\n", first_bdf);
         return 0;
     }
 
     last_bdf = range->end.header.id;
     if ( last_bdf >= ivrs_bdf_entries || last_bdf <= first_bdf )
     {
-        AMD_IOMMU_DEBUG(
-            "IVHD Error: Invalid Range: Last Dev_Id %#x\n", last_bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid range: Last Dev_Id %#x\n", last_bdf);
         return 0;
     }
 
@@ -651,14 +643,14 @@ static u16 __init parse_ivhd_device_exte
     dev_length = sizeof(*ext);
     if ( header_length < (block_length + dev_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
     bdf = ext->header.id;
     if ( bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Dev_Id %#x\n", bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry Dev_Id %#x\n", bdf);
         return 0;
     }
 
@@ -677,14 +669,13 @@ static u16 __init parse_ivhd_device_exte
     dev_length = sizeof(*range);
     if ( header_length < (block_length + dev_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
     if ( range->end.header.type != ACPI_IVRS_TYPE_END )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: End_Type %#x\n",
+        AMD_IOMMU_ERROR("IVHD: invalid range: End_Type %#x\n",
                         range->end.header.type);
         return 0;
     }
@@ -692,16 +683,14 @@ static u16 __init parse_ivhd_device_exte
     first_bdf = range->extended.header.id;
     if ( first_bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: First Dev_Id %#x\n", first_bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid range: First Dev_Id %#x\n", first_bdf);
         return 0;
     }
 
     last_bdf = range->end.header.id;
     if ( (last_bdf >= ivrs_bdf_entries) || (last_bdf <= first_bdf) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: "
-                        "Invalid Range: Last Dev_Id %#x\n", last_bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid range: Last Dev_Id %#x\n", last_bdf);
         return 0;
     }
 
@@ -789,14 +778,14 @@ static u16 __init parse_ivhd_device_spec
     dev_length = sizeof(*special);
     if ( header_length < (block_length + dev_length) )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
         return 0;
     }
 
     bdf = special->used_id;
     if ( bdf >= ivrs_bdf_entries )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Dev_Id %#x\n", bdf);
+        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry Dev_Id %#x\n", bdf);
         return 0;
     }
 
@@ -844,12 +833,12 @@ static u16 __init parse_ivhd_device_spec
             {
                 if ( ioapic_sbdf[idx].bdf == bdf &&
                      ioapic_sbdf[idx].seg == seg )
-                    AMD_IOMMU_DEBUG("IVHD Warning: Duplicate IO-APIC %#x entries\n",
+                    AMD_IOMMU_WARN("IVHD: duplicate IO-APIC %#x entries\n",
                                     special->handle);
                 else
                 {
-                    printk(XENLOG_ERR "IVHD Error: Conflicting IO-APIC %#x entries\n",
-                           special->handle);
+                    AMD_IOMMU_ERROR("IVHD: conflicting IO-APIC %#x entries\n",
+                                    special->handle);
                     if ( amd_iommu_perdev_intremap )
                         return 0;
                 }
@@ -944,7 +933,7 @@ static int __init parse_ivhd_block(const
 
     if ( ivhd_block->header.length < hdr_size )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Block Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid block length\n");
         return -ENODEV;
     }
 
@@ -953,7 +942,7 @@ static int __init parse_ivhd_block(const
                                     ivhd_block->capability_offset);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: No IOMMU for Dev_Id %#x Cap %#x\n",
+        AMD_IOMMU_ERROR("IVHD: no IOMMU for Dev_Id %#x Cap %#x\n",
                         ivhd_block->header.device_id,
                         ivhd_block->capability_offset);
         return -ENODEV;
@@ -1016,7 +1005,8 @@ static int __init parse_ivhd_block(const
                 ivhd_block->header.length, block_length, iommu);
             break;
         default:
-            AMD_IOMMU_DEBUG("IVHD Error: Invalid Device Type!\n");
+            AMD_IOMMU_WARN("IVHD: unknown device type %#x\n",
+                           ivhd_device->header.type);
             dev_length = 0;
             break;
         }
@@ -1113,8 +1103,7 @@ static int __init parse_ivrs_table(struc
 
         if ( table->length < (length + ivrs_block->length) )
         {
-            AMD_IOMMU_DEBUG("IVRS Error: "
-                            "Table Length Exceeded: %#x -> %#lx\n",
+            AMD_IOMMU_ERROR("IVRS: table length exceeded: %#x -> %#lx\n",
                             table->length,
                             (length + ivrs_block->length));
             return -ENODEV;
@@ -1214,7 +1203,7 @@ static int __init get_last_bdf_ivhd(
 
     if ( ivhd_block->header.length < hdr_size )
     {
-        AMD_IOMMU_DEBUG("IVHD Error: Invalid Block Length!\n");
+        AMD_IOMMU_ERROR("IVHD: invalid block length\n");
         return -ENODEV;
     }
 
@@ -1261,7 +1250,8 @@ static int __init get_last_bdf_ivhd(
             dev_length = sizeof(ivhd_device->special);
             break;
         default:
-            AMD_IOMMU_DEBUG("IVHD Error: Invalid Device Type!\n");
+            AMD_IOMMU_WARN("IVHD: unknown device type %#x\n",
+                           ivhd_device->header.type);
             dev_length = 0;
             break;
         }
@@ -1327,7 +1317,7 @@ get_supported_ivhd_type(struct acpi_tabl
     checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
     if ( checksum )
     {
-        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
+        AMD_IOMMU_ERROR("IVRS: invalid checksum %#x\n", checksum);
         return -ENODEV;
     }
 
@@ -1340,8 +1330,7 @@ get_supported_ivhd_type(struct acpi_tabl
 
         if ( table->length < (length + ivrs_block->length) )
         {
-            AMD_IOMMU_DEBUG("IVRS Error: "
-                            "Table Length Exceeded: %#x -> %#lx\n",
+            AMD_IOMMU_ERROR("IVRS: table length exceeded: %#x -> %#lx\n",
                             table->length,
                             (length + ivrs_block->length));
             return -ENODEV;
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
@@ -291,8 +291,7 @@ void amd_iommu_flush_iotlb(u8 devfn, con
 
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("%s: Can't find iommu for %pp\n",
-                        __func__, &pdev->sbdf);
+        AMD_IOMMU_WARN("can't find IOMMU for %pp\n", &pdev->sbdf);
         return;
     }
 
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -141,21 +141,21 @@ int __init amd_iommu_detect_one_acpi(
 
     if ( ivhd_block->header.length < sizeof(*ivhd_block) )
     {
-        AMD_IOMMU_DEBUG("Invalid IVHD Block Length!\n");
+        AMD_IOMMU_ERROR("invalid IVHD block length\n");
         return -ENODEV;
     }
 
     if ( !ivhd_block->header.device_id ||
         !ivhd_block->capability_offset || !ivhd_block->base_address)
     {
-        AMD_IOMMU_DEBUG("Invalid IVHD Block!\n");
+        AMD_IOMMU_ERROR("invalid IVHD block\n");
         return -ENODEV;
     }
 
     iommu = xzalloc(struct amd_iommu);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("Error allocating amd_iommu\n");
+        AMD_IOMMU_ERROR("cannot allocate amd_iommu\n");
         return -ENOMEM;
     }
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -386,8 +386,8 @@ static void iommu_reset_log(struct amd_i
 
     if ( log_run )
     {
-        AMD_IOMMU_DEBUG("Warning: Log Run bit %d is not cleared"
-                        "before reset!\n", run_bit);
+        AMD_IOMMU_WARN("Log Run bit %d is not cleared before reset\n",
+                       run_bit);
         return;
     }
 
@@ -754,8 +754,8 @@ static bool_t __init set_iommu_interrupt
     pcidevs_unlock();
     if ( !iommu->msi.dev )
     {
-        AMD_IOMMU_DEBUG("IOMMU: no pdev for %pp\n",
-                        &PCI_SBDF2(iommu->seg, iommu->bdf));
+        AMD_IOMMU_WARN("no pdev for %pp\n",
+                       &PCI_SBDF2(iommu->seg, iommu->bdf));
         return 0;
     }
 
@@ -799,7 +799,7 @@ static bool_t __init set_iommu_interrupt
     if ( ret )
     {
         destroy_irq(irq);
-        AMD_IOMMU_DEBUG("can't request irq\n");
+        AMD_IOMMU_ERROR("can't request irq\n");
         return 0;
     }
 
@@ -992,7 +992,7 @@ static void *__init allocate_buffer(unsi
 
     if ( buffer == NULL )
     {
-        AMD_IOMMU_DEBUG("Error allocating %s\n", name);
+        AMD_IOMMU_ERROR("cannot allocate %s\n", name);
         return NULL;
     }
 
@@ -1224,7 +1224,7 @@ static int __init alloc_ivrs_mappings(u1
     ivrs_mappings = xzalloc_array(struct ivrs_mappings, ivrs_bdf_entries + 1);
     if ( ivrs_mappings == NULL )
     {
-        AMD_IOMMU_DEBUG("Error allocating IVRS Mappings table\n");
+        AMD_IOMMU_ERROR("cannot allocate IVRS Mappings table\n");
         return -ENOMEM;
     }
     IVRS_MAPPINGS_SEG(ivrs_mappings) = seg;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -377,8 +377,8 @@ void amd_iommu_ioapic_update_ire(
     iommu = find_iommu_for_device(seg, bdf);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("Fail to find iommu for ioapic device id ="
-                        " %04x:%04x\n", seg, bdf);
+        AMD_IOMMU_WARN("failed to find IOMMU for IO-APIC @ %04x:%04x\n",
+                       seg, bdf);
         __io_apic_write(apic, reg, value);
         return;
     }
@@ -747,8 +747,8 @@ bool __init iov_supports_xt(void)
         if ( !find_iommu_for_device(ioapic_sbdf[idx].seg,
                                     ioapic_sbdf[idx].bdf) )
         {
-            AMD_IOMMU_DEBUG("No IOMMU for IO-APIC %#x (ID %x)\n",
-                            apic, IO_APIC_ID(apic));
+            AMD_IOMMU_WARN("no IOMMU for IO-APIC %#x (ID %x)\n",
+                           apic, IO_APIC_ID(apic));
             return false;
         }
     }
@@ -765,14 +765,12 @@ int __init amd_setup_hpet_msi(struct msi
 
     if ( hpet_sbdf.init == HPET_NONE )
     {
-        AMD_IOMMU_DEBUG("Failed to setup HPET MSI remapping."
-                        " Missing IVRS HPET info.\n");
+        AMD_IOMMU_ERROR("failed to setup HPET MSI remapping: missing IVRS HPET info\n");
         return -ENODEV;
     }
     if ( msi_desc->hpet_id != hpet_sbdf.id )
     {
-        AMD_IOMMU_DEBUG("Failed to setup HPET MSI remapping."
-                        " Wrong HPET.\n");
+        AMD_IOMMU_ERROR("failed to setup HPET MSI remapping: wrong HPET\n");
         return -ENODEV;
     }
 
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -222,7 +222,7 @@ static int iommu_pde_from_dfn(struct dom
             table = iommu_alloc_pgtable(d);
             if ( table == NULL )
             {
-                AMD_IOMMU_DEBUG("Cannot allocate I/O page table\n");
+                AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
                 unmap_domain_page(next_table_vaddr);
                 return 1;
             }
@@ -252,7 +252,7 @@ static int iommu_pde_from_dfn(struct dom
                 table = iommu_alloc_pgtable(d);
                 if ( table == NULL )
                 {
-                    AMD_IOMMU_DEBUG("Cannot allocate I/O page table\n");
+                    AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
                     unmap_domain_page(next_table_vaddr);
                     return 1;
                 }
@@ -301,7 +301,7 @@ int amd_iommu_map_page(struct domain *d,
     if ( rc )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Root table alloc failed, dfn = %"PRI_dfn"\n",
+        AMD_IOMMU_ERROR("root table alloc failed, dfn = %"PRI_dfn"\n",
                         dfn_x(dfn));
         domain_crash(d);
         return rc;
@@ -310,7 +310,7 @@ int amd_iommu_map_page(struct domain *d,
     if ( iommu_pde_from_dfn(d, dfn_x(dfn), &pt_mfn, true) || !pt_mfn )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
+        AMD_IOMMU_ERROR("invalid IO pagetable entry dfn = %"PRI_dfn"\n",
                         dfn_x(dfn));
         domain_crash(d);
         return -EFAULT;
@@ -343,7 +343,7 @@ int amd_iommu_unmap_page(struct domain *
     if ( iommu_pde_from_dfn(d, dfn_x(dfn), &pt_mfn, false) )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
+        AMD_IOMMU_ERROR("invalid IO pagetable entry dfn = %"PRI_dfn"\n",
                         dfn_x(dfn));
         domain_crash(d);
         return -EFAULT;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -367,10 +367,8 @@ static int reassign_device(struct domain
     iommu = find_iommu_for_device(pdev->seg, bdf);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("Fail to find iommu."
-                        " %04x:%02x:%x02.%x cannot be assigned to dom%d\n",
-                        pdev->seg, pdev->bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                        target->domain_id);
+        AMD_IOMMU_WARN("failed to find IOMMU: %pp cannot be assigned to %pd\n",
+                       &pdev->sbdf, target);
         return -ENODEV;
     }
 
@@ -484,8 +482,8 @@ static int amd_iommu_add_device(u8 devfn
             return 0;
         }
 
-        AMD_IOMMU_DEBUG("No iommu for %pp; cannot be handed to d%d\n",
-                        &pdev->sbdf, pdev->domain->domain_id);
+        AMD_IOMMU_WARN("no IOMMU for %pp; cannot be handed to %pd\n",
+                        &pdev->sbdf, pdev->domain);
         return -ENODEV;
     }
 
@@ -527,9 +525,8 @@ static int amd_iommu_add_device(u8 devfn
              pdev->domain,
              ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map,
              0) )
-        AMD_IOMMU_DEBUG("%pd: unity mapping failed for %04x:%02x:%02x.%u\n",
-                        pdev->domain, pdev->seg, pdev->bus, PCI_SLOT(devfn),
-                        PCI_FUNC(devfn));
+        AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n",
+                       pdev->domain, &pdev->sbdf);
 
     return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
 }
@@ -547,7 +544,7 @@ static int amd_iommu_remove_device(u8 de
     iommu = find_iommu_for_device(pdev->seg, bdf);
     if ( !iommu )
     {
-        AMD_IOMMU_DEBUG("Fail to find iommu. %pp cannot be removed from %pd\n",
+        AMD_IOMMU_WARN("failed to find IOMMU: %pp cannot be removed from %pd\n",
                         &pdev->sbdf, pdev->domain);
         return -ENODEV;
     }
@@ -560,9 +557,8 @@ static int amd_iommu_remove_device(u8 de
     if ( amd_iommu_reserve_domain_unity_unmap(
              pdev->domain,
              ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map) )
-        AMD_IOMMU_DEBUG("%pd: unity unmapping failed for %04x:%02x:%02x.%u\n",
-                        pdev->domain, pdev->seg, pdev->bus, PCI_SLOT(devfn),
-                        PCI_FUNC(devfn));
+        AMD_IOMMU_WARN("%pd: unity unmapping failed for %pp\n",
+                       pdev->domain, &pdev->sbdf);
 
     if ( amd_iommu_perdev_intremap &&
          ivrs_mappings[bdf].dte_requestor_id == bdf &&



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
  2021-09-22 14:36 ` [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
@ 2021-09-28  7:12   ` Durrant, Paul
  2021-10-19 23:34   ` Andrew Cooper
  1 sibling, 0 replies; 17+ messages in thread
From: Durrant, Paul @ 2021-09-28  7:12 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Paul Durrant

On 22/09/2021 15:36, Jan Beulich wrote:
> Doing this in amd_iommu_prepare() is too late for it, in particular, to
> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
> (luckily) pretty simple, (pretty importantly) without breaking
> amd_iommu_prepare()'s logic to prevent multiple processing.
> 
> This involves moving table checksumming, as
> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
> we have other uses of this originally ACPI-private function elsewhere in
> the tree.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul@xen.org>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag
  2021-09-22 14:37 ` [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
@ 2021-09-28  7:34   ` Durrant, Paul
  2021-09-28  7:47     ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Durrant, Paul @ 2021-09-28  7:34 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Paul Durrant

On 22/09/2021 15:37, Jan Beulich wrote:
> IVHD entries may specify that ATS is to be blocked for a device or range
> of devices. Honor firmware telling us so.
> 
> While adding respective checks I noticed that the 2nd conditional in
> amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
> Add the missing part of the condition there, as no good can come from
> enabling ATS on a device when the IOMMU is not capable of dealing with
> ATS requests.
> 
> For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
> exhibit UB.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: As an alternative to adding the missing IOMMU capability check, we
>       may want to consider simply using dte->i in the 2nd conditional in
>       amd_iommu_setup_domain_device().
> Note that while ATS enabling/disabling gets invoked without any locks
> held, the two functions should not be possible to race with one another
> for any individual device (or else we'd be in trouble already, as ATS
> might then get re-enabled immediately after it was disabled, with the
> DTE out of sync with this setting).
> ---
> v7: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu.h
> +++ b/xen/drivers/passthrough/amd/iommu.h
> @@ -120,6 +120,7 @@ struct ivrs_mappings {
>       uint16_t dte_requestor_id;
>       bool valid:1;
>       bool dte_allow_exclusion:1;
> +    bool block_ats:1;
>   
>       /* ivhd device data settings */
>       uint8_t device_flags;
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -55,8 +55,8 @@ union acpi_ivhd_device {
>   };
>   
>   static void __init add_ivrs_mapping_entry(
> -    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
> -    struct amd_iommu *iommu)
> +    uint16_t bdf, uint16_t alias_id, uint8_t flags, unsigned int ext_flags,
> +    bool alloc_irt, struct amd_iommu *iommu)
>   {
>       struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
>   
> @@ -66,6 +66,7 @@ static void __init add_ivrs_mapping_entr
>       ivrs_mappings[bdf].dte_requestor_id = alias_id;
>   
>       /* override flags for range of devices */
> +    ivrs_mappings[bdf].block_ats = ext_flags & ACPI_IVHD_ATS_DISABLED;
>       ivrs_mappings[bdf].device_flags = flags;

I'm assuming the above indentation problem (which appears to be repeated 
in a few places below) is more to do with patch email generation rather 
than the actual code modifications so...

Reviewed-by: Paul Durrant <paul@xen.org>

>   
>       /* Don't map an IOMMU by itself. */
> @@ -499,7 +500,7 @@ static u16 __init parse_ivhd_device_sele
>           return 0;
>       }
>   
> -    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, false,
> +    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, 0, false,
>                              iommu);
>   
>       return sizeof(*select);
> @@ -545,7 +546,7 @@ static u16 __init parse_ivhd_device_rang
>       AMD_IOMMU_DEBUG(" Dev_Id Range: %#x -> %#x\n", first_bdf, last_bdf);
>   
>       for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
> -        add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting,
> +        add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting, 0,
>                                  false, iommu);
>   
>       return dev_length;
> @@ -580,7 +581,7 @@ static u16 __init parse_ivhd_device_alia
>   
>       AMD_IOMMU_DEBUG(" Dev_Id Alias: %#x\n", alias_id);
>   
> -    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, true,
> +    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, 0, true,
>                              iommu);
>   
>       return dev_length;
> @@ -636,7 +637,7 @@ static u16 __init parse_ivhd_device_alia
>   
>       for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>           add_ivrs_mapping_entry(bdf, alias_id, range->alias.header.data_setting,
> -                               true, iommu);
> +                               0, true, iommu);
>   
>       return dev_length;
>   }
> @@ -661,7 +662,8 @@ static u16 __init parse_ivhd_device_exte
>           return 0;
>       }
>   
> -    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, false, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting,
> +                           ext->extended_data, false, iommu);
>   
>       return dev_length;
>   }
> @@ -708,7 +710,7 @@ static u16 __init parse_ivhd_device_exte
>   
>       for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>           add_ivrs_mapping_entry(bdf, bdf, range->extended.header.data_setting,
> -                               false, iommu);
> +                               range->extended.extended_data, false, iommu);
>   
>       return dev_length;
>   }
> @@ -800,7 +802,7 @@ static u16 __init parse_ivhd_device_spec
>   
>       AMD_IOMMU_DEBUG("IVHD Special: %pp variety %#x handle %#x\n",
>                       &PCI_SBDF2(seg, bdf), special->variety, special->handle);
> -    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, true,
> +    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, 0, true,
>                              iommu);
>   
>       switch ( special->variety )
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -105,6 +105,7 @@ static int __must_check amd_iommu_setup_
>       int req_id, valid = 1, rc;
>       u8 bus = pdev->bus;
>       struct domain_iommu *hd = dom_iommu(domain);
> +    const struct ivrs_mappings *ivrs_dev;
>   
>       if ( QUARANTINE_SKIP(domain) )
>           return 0;
> @@ -122,20 +123,18 @@ static int __must_check amd_iommu_setup_
>       req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
>       table = iommu->dev_table.buffer;
>       dte = &table[req_id];
> +    ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
>   
>       spin_lock_irqsave(&iommu->lock, flags);
>   
>       if ( !dte->v || !dte->tv )
>       {
> -        const struct ivrs_mappings *ivrs_dev;
> -
>           /* bind DTE to domain page-tables */
>           amd_iommu_set_root_page_table(
>               dte, page_to_maddr(hd->arch.amd.root_table),
>               domain->domain_id, hd->arch.amd.paging_mode, valid);
>   
>           /* Undo what amd_iommu_disable_domain_device() may have done. */
> -        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
>           if ( dte->it_root )
>           {
>               dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> @@ -146,6 +145,7 @@ static int __must_check amd_iommu_setup_
>           dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
>   
>           if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
> +             !ivrs_dev->block_ats &&
>                iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
>               dte->i = ats_enabled;
>   
> @@ -166,6 +166,8 @@ static int __must_check amd_iommu_setup_
>       ASSERT(pcidevs_locked());
>   
>       if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
> +         !ivrs_dev->block_ats &&
> +         iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) &&
>            !pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
>       {
>           if ( devfn == pdev->devfn )
> --- a/xen/include/acpi/actbl2.h
> +++ b/xen/include/acpi/actbl2.h
> @@ -851,7 +851,7 @@ struct acpi_ivrs_device8b {
>   
>   /* Values for extended_data above */
>   
> -#define ACPI_IVHD_ATS_DISABLED      (1<<31)
> +#define ACPI_IVHD_ATS_DISABLED      (1u << 31)
>   
>   /* Type 72: 8-byte device entry */
>   
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier
  2021-09-22 14:38 ` [PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier Jan Beulich
@ 2021-09-28  7:36   ` Durrant, Paul
  0 siblings, 0 replies; 17+ messages in thread
From: Durrant, Paul @ 2021-09-28  7:36 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Paul Durrant

On 22/09/2021 15:38, Jan Beulich wrote:
> Disabling should be done in the opposite order of enabling: ATS wants to
> be turned off before adjusting the DTE, just like it gets enabled only
> after the DTE was suitably prepared. Note that we want ATS to be
> disabled as soon as any of the DTEs involved in the handling of a device
> (including phantom devices) gets adjusted respectively. For this reason
> the "devfn == pdev->devfn" of the original conditional gets dropped.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul@xen.org>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally
  2021-09-22 14:38 ` [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally Jan Beulich
@ 2021-09-28  7:42   ` Durrant, Paul
  2021-09-28  7:50     ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Durrant, Paul @ 2021-09-28  7:42 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Paul Durrant

On 22/09/2021 15:38, Jan Beulich wrote:
> Making these dependent upon "iommu=debug" isn't really helpful in the
> field. Where touching respective code anyway also make use of %pp and
> %pd.
> 
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul@xen.org>

... with one nit below...

> ---
[snip]
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -174,7 +174,7 @@ static int __init reserve_unity_map_for_
>           if ( unity_map->addr + unity_map->length > base &&
>                base + length > unity_map->addr )
>           {
> -            AMD_IOMMU_DEBUG("IVMD Error: overlap [%lx,%lx) vs [%lx,%lx)\n",
> +            AMD_IOMMU_ERROR("IVMD: overlap [%lx,%lx) vs [%lx,%lx)\n",
>                               base, base + length, unity_map->addr,
>                               unity_map->addr + unity_map->length);
>               return -EPERM;
> @@ -248,7 +248,7 @@ static int __init register_range_for_dev
>       iommu = find_iommu_for_device(seg, bdf);
>       if ( !iommu )
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: No IOMMU for Dev_Id %#x!\n", bdf);
> +        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x\n", bdf);
>           return -ENODEV;
>       }
>       req = ivrs_mappings[bdf].dte_requestor_id;
> @@ -318,7 +318,7 @@ static int __init parse_ivmd_device_sele
>       bdf = ivmd_block->header.device_id;
>       if ( bdf >= ivrs_bdf_entries )
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: Invalid Dev_Id %#x\n", bdf);
> +        AMD_IOMMU_ERROR("IVMD: invalid Dev_Id %#x\n", bdf);
>           return -ENODEV;
>       }
>   
> @@ -335,16 +335,14 @@ static int __init parse_ivmd_device_rang
>       first_bdf = ivmd_block->header.device_id;
>       if ( first_bdf >= ivrs_bdf_entries )
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: "
> -                        "Invalid Range_First Dev_Id %#x\n", first_bdf);
> +        AMD_IOMMU_ERROR("IVMD: invalid Range_First Dev_Id %#x\n", first_bdf);
>           return -ENODEV;
>       }
>   
>       last_bdf = ivmd_block->aux_data;
>       if ( (last_bdf >= ivrs_bdf_entries) || (last_bdf <= first_bdf) )
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: "
> -                        "Invalid Range_Last Dev_Id %#x\n", last_bdf);
> +        AMD_IOMMU_ERROR("IVMD: invalid Range_Last Dev_Id %#x\n", last_bdf);
>           return -ENODEV;
>       }
>   
> @@ -367,7 +365,7 @@ static int __init parse_ivmd_device_iomm
>                                       ivmd_block->aux_data);
>       if ( !iommu )
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: No IOMMU for Dev_Id %#x Cap %#x\n",
> +        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x Cap %#x\n",
>                           ivmd_block->header.device_id, ivmd_block->aux_data);
>           return -ENODEV;
>       }
> @@ -384,7 +382,7 @@ static int __init parse_ivmd_block(const
>   
>       if ( ivmd_block->header.length < sizeof(*ivmd_block) )
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: Invalid Block Length!\n");
> +        AMD_IOMMU_ERROR("IVMD: invalid block length\n");
>           return -ENODEV;
>       }
>   
> @@ -402,8 +400,8 @@ static int __init parse_ivmd_block(const
>            (addr_bits < BITS_PER_LONG &&
>             ((start_addr + mem_length - 1) >> addr_bits)) )
>       {
> -        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not IOMMU addressable\n",
> -                        start_addr, start_addr + mem_length);
> +        AMD_IOMMU_WARN("IVMD: [%lx,%lx) is not IOMMU addressable\n",
> +                       start_addr, start_addr + mem_length);
>           return 0;
>       }
>   
> @@ -411,8 +409,8 @@ static int __init parse_ivmd_block(const
>       {
>           paddr_t addr;
>   
> -        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
> -                        base, limit + PAGE_SIZE);
> +        AMD_IOMMU_WARN("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
> +                       base, limit + PAGE_SIZE);
>   
>           for ( addr = base; addr <= limit; addr += PAGE_SIZE )
>           {
> @@ -423,7 +421,7 @@ static int __init parse_ivmd_block(const
>                   if ( e820_add_range(&e820, addr, addr + PAGE_SIZE,
>                                       E820_RESERVED) )
>                       continue;
> -                AMD_IOMMU_DEBUG("IVMD Error: Page at %lx couldn't be reserved\n",
> +                AMD_IOMMU_ERROR("IVMD: page at %lx couldn't be reserved\n",
>                                   addr);
>                   return -EIO;
>               }
> @@ -433,8 +431,7 @@ static int __init parse_ivmd_block(const
>                              RAM_TYPE_UNUSABLE)) )
>                   continue;
>   
> -            AMD_IOMMU_DEBUG("IVMD Error: Page at %lx can't be converted\n",
> -                            addr);
> +            AMD_IOMMU_ERROR("IVMD: page at %lx can't be converted\n", addr);
>               return -EIO;
>           }
>       }
> @@ -448,7 +445,7 @@ static int __init parse_ivmd_block(const
>       }
>       else
>       {
> -        AMD_IOMMU_DEBUG("IVMD Error: Invalid Flag Field!\n");
> +        AMD_IOMMU_ERROR("IVMD: invalid flag field\n");
>           return -ENODEV;
>       }
>   
> @@ -471,7 +468,8 @@ static int __init parse_ivmd_block(const
>                                          iw, ir, exclusion);
>   
>       default:
> -        AMD_IOMMU_DEBUG("IVMD Error: Invalid Block Type!\n");
> +        AMD_IOMMU_ERROR("IVMD: unknown block type %#x\n",
> +                        ivmd_block->header.type);
>           return -ENODEV;
>       }
>   }
> @@ -481,7 +479,7 @@ static u16 __init parse_ivhd_device_padd
>   {
>       if ( header_length < (block_length + pad_length) )
>       {
> -        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
> +        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
>           return 0;
>       }
>   
> @@ -496,7 +494,7 @@ static u16 __init parse_ivhd_device_sele
>       bdf = select->header.id;
>       if ( bdf >= ivrs_bdf_entries )
>       {
> -        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Dev_Id %#x\n", bdf);
> +        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry Dev_Id %#x\n", bdf);
>           return 0;
>       }
>   
> @@ -515,14 +513,13 @@ static u16 __init parse_ivhd_device_rang
>       dev_length = sizeof(*range);
>       if ( header_length < (block_length + dev_length) )
>       {
> -        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
> +        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
>           return 0;
>       }
>   
>       if ( range->end.header.type != ACPI_IVRS_TYPE_END )
>       {
> -        AMD_IOMMU_DEBUG("IVHD Error: "
> -                        "Invalid Range: End_Type %#x\n",
> +        AMD_IOMMU_ERROR("IVHD Error: invalid range: End_Type %#x\n",

NIT: I guess you want to drop the 'Error' here like you did elsewhere.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag
  2021-09-28  7:34   ` Durrant, Paul
@ 2021-09-28  7:47     ` Jan Beulich
  2021-09-28  7:57       ` Durrant, Paul
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2021-09-28  7:47 UTC (permalink / raw)
  To: paul; +Cc: Andrew Cooper, xen-devel

On 28.09.2021 09:34, Durrant, Paul wrote:
> On 22/09/2021 15:37, Jan Beulich wrote:
>> IVHD entries may specify that ATS is to be blocked for a device or range
>> of devices. Honor firmware telling us so.
>>
>> While adding respective checks I noticed that the 2nd conditional in
>> amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
>> Add the missing part of the condition there, as no good can come from
>> enabling ATS on a device when the IOMMU is not capable of dealing with
>> ATS requests.
>>
>> For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
>> exhibit UB.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> TBD: As an alternative to adding the missing IOMMU capability check, we
>>       may want to consider simply using dte->i in the 2nd conditional in
>>       amd_iommu_setup_domain_device().
>> Note that while ATS enabling/disabling gets invoked without any locks
>> held, the two functions should not be possible to race with one another
>> for any individual device (or else we'd be in trouble already, as ATS
>> might then get re-enabled immediately after it was disabled, with the
>> DTE out of sync with this setting).
>> ---
>> v7: New.
>>
>> --- a/xen/drivers/passthrough/amd/iommu.h
>> +++ b/xen/drivers/passthrough/amd/iommu.h
>> @@ -120,6 +120,7 @@ struct ivrs_mappings {
>>       uint16_t dte_requestor_id;
>>       bool valid:1;
>>       bool dte_allow_exclusion:1;
>> +    bool block_ats:1;
>>   
>>       /* ivhd device data settings */
>>       uint8_t device_flags;
>> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
>> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
>> @@ -55,8 +55,8 @@ union acpi_ivhd_device {
>>   };
>>   
>>   static void __init add_ivrs_mapping_entry(
>> -    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
>> -    struct amd_iommu *iommu)
>> +    uint16_t bdf, uint16_t alias_id, uint8_t flags, unsigned int ext_flags,
>> +    bool alloc_irt, struct amd_iommu *iommu)
>>   {
>>       struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
>>   
>> @@ -66,6 +66,7 @@ static void __init add_ivrs_mapping_entr
>>       ivrs_mappings[bdf].dte_requestor_id = alias_id;
>>   
>>       /* override flags for range of devices */
>> +    ivrs_mappings[bdf].block_ats = ext_flags & ACPI_IVHD_ATS_DISABLED;
>>       ivrs_mappings[bdf].device_flags = flags;
> 
> I'm assuming the above indentation problem (which appears to be repeated 
> in a few places below) is more to do with patch email generation rather 
> than the actual code modifications so...

Hmm, I first suspected a Thunderbird regression, but checking the list
archives I don't see any corruption. I'm therefore now suspecting the
problem may be at your end ...

> Reviewed-by: Paul Durrant <paul@xen.org>

Thanks.

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally
  2021-09-28  7:42   ` Durrant, Paul
@ 2021-09-28  7:50     ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-09-28  7:50 UTC (permalink / raw)
  To: paul; +Cc: Andrew Cooper, xen-devel

On 28.09.2021 09:42, Durrant, Paul wrote:
> On 22/09/2021 15:38, Jan Beulich wrote:
>> Making these dependent upon "iommu=debug" isn't really helpful in the
>> field. Where touching respective code anyway also make use of %pp and
>> %pd.
>>
>> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Paul Durrant <paul@xen.org>

Thanks.

>> @@ -515,14 +513,13 @@ static u16 __init parse_ivhd_device_rang
>>       dev_length = sizeof(*range);
>>       if ( header_length < (block_length + dev_length) )
>>       {
>> -        AMD_IOMMU_DEBUG("IVHD Error: Invalid Device_Entry Length!\n");
>> +        AMD_IOMMU_ERROR("IVHD: invalid Device_Entry length\n");
>>           return 0;
>>       }
>>   
>>       if ( range->end.header.type != ACPI_IVRS_TYPE_END )
>>       {
>> -        AMD_IOMMU_DEBUG("IVHD Error: "
>> -                        "Invalid Range: End_Type %#x\n",
>> +        AMD_IOMMU_ERROR("IVHD Error: invalid range: End_Type %#x\n",
> 
> NIT: I guess you want to drop the 'Error' here like you did elsewhere.

Yes indeed. Fixed. Thanks for spotting.

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag
  2021-09-28  7:47     ` Jan Beulich
@ 2021-09-28  7:57       ` Durrant, Paul
  0 siblings, 0 replies; 17+ messages in thread
From: Durrant, Paul @ 2021-09-28  7:57 UTC (permalink / raw)
  To: Jan Beulich, paul; +Cc: Andrew Cooper, xen-devel

On 28/09/2021 08:47, Jan Beulich wrote:
> On 28.09.2021 09:34, Durrant, Paul wrote:
>> On 22/09/2021 15:37, Jan Beulich wrote:
>>> IVHD entries may specify that ATS is to be blocked for a device or range
>>> of devices. Honor firmware telling us so.
>>>
>>> While adding respective checks I noticed that the 2nd conditional in
>>> amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
>>> Add the missing part of the condition there, as no good can come from
>>> enabling ATS on a device when the IOMMU is not capable of dealing with
>>> ATS requests.
>>>
>>> For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
>>> exhibit UB.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> TBD: As an alternative to adding the missing IOMMU capability check, we
>>>        may want to consider simply using dte->i in the 2nd conditional in
>>>        amd_iommu_setup_domain_device().
>>> Note that while ATS enabling/disabling gets invoked without any locks
>>> held, the two functions should not be possible to race with one another
>>> for any individual device (or else we'd be in trouble already, as ATS
>>> might then get re-enabled immediately after it was disabled, with the
>>> DTE out of sync with this setting).
>>> ---
>>> v7: New.
>>>
>>> --- a/xen/drivers/passthrough/amd/iommu.h
>>> +++ b/xen/drivers/passthrough/amd/iommu.h
>>> @@ -120,6 +120,7 @@ struct ivrs_mappings {
>>>        uint16_t dte_requestor_id;
>>>        bool valid:1;
>>>        bool dte_allow_exclusion:1;
>>> +    bool block_ats:1;
>>>    
>>>        /* ivhd device data settings */
>>>        uint8_t device_flags;
>>> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
>>> @@ -55,8 +55,8 @@ union acpi_ivhd_device {
>>>    };
>>>    
>>>    static void __init add_ivrs_mapping_entry(
>>> -    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
>>> -    struct amd_iommu *iommu)
>>> +    uint16_t bdf, uint16_t alias_id, uint8_t flags, unsigned int ext_flags,
>>> +    bool alloc_irt, struct amd_iommu *iommu)
>>>    {
>>>        struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
>>>    
>>> @@ -66,6 +66,7 @@ static void __init add_ivrs_mapping_entr
>>>        ivrs_mappings[bdf].dte_requestor_id = alias_id;
>>>    
>>>        /* override flags for range of devices */
>>> +    ivrs_mappings[bdf].block_ats = ext_flags & ACPI_IVHD_ATS_DISABLED;
>>>        ivrs_mappings[bdf].device_flags = flags;
>>
>> I'm assuming the above indentation problem (which appears to be repeated
>> in a few places below) is more to do with patch email generation rather
>> than the actual code modifications so...
> 
> Hmm, I first suspected a Thunderbird regression, but checking the list
> archives I don't see any corruption. I'm therefore now suspecting the
> problem may be at your end ...

It certainly could be; I was forced to upgrade my copy of Thunderbird 
recently. It's odd that the issue is not consistent though. As long as 
the actual code looks ok though... no problem.

   Paul




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
  2021-09-22 14:36 ` [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
  2021-09-28  7:12   ` Durrant, Paul
@ 2021-10-19 23:34   ` Andrew Cooper
  2021-10-20  6:58     ` Jan Beulich
  2021-10-20  8:17     ` Jan Beulich
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Cooper @ 2021-10-19 23:34 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant, Ian Jackson, Roger Pau Monné

On 22/09/2021 15:36, Jan Beulich wrote:
> Doing this in amd_iommu_prepare() is too late for it, in particular, to
> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
> (luckily) pretty simple, (pretty importantly) without breaking
> amd_iommu_prepare()'s logic to prevent multiple processing.
>
> This involves moving table checksumming, as
> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
> we have other uses of this originally ACPI-private function elsewhere in
> the tree.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I'm afraid this breaks booting on Skylake Server.  Yes, really - I
didn't believe the bisection at first either.

From a bit of debugging, I've found:

(XEN) *** acpi_dmar_init() => -19
(XEN) *** amd_iommu_get_supported_ivhd_type() => -19

So VT-d is disabled in firmware.  Oops, but something we should cope with.

Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
exit with -ENOENT too.

It turns out ...

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>  
>  int __init acpi_ivrs_init(void)
>  {
> +    int rc;
> +
>      if ( !iommu_enable && !iommu_intremap )
>          return 0;
>  
> +    rc = amd_iommu_get_supported_ivhd_type();
> +    if ( rc < 0 )
> +        return rc;
> +    BUG_ON(!rc);
> +    ivhd_type = rc;
> +
>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>      {
>          iommu_intremap = iommu_intremap_off;
>

... we're relying on this path (now skipped) to set iommu_intremap away
from iommu_intremap_full in the "no IOMMU anywhere to be found" case.

This explains why I occasionally during failure get spew about:

(XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
[   17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
[   17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI
host supports is 64.
[   17.125198] usb usb1-port2: couldn't allocate usb_device
[  248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.

and eventually (gone 400s) get dumped in a dracut shell.

Booting with an explicit iommu=no-intremap, which clobbers
iommu_intremap during cmdline parsing, recovers the system.

This variable controls a whole lot of magic with interrupt handling.  It
should default to 0, not 2, and only become nonzero when an IOMMU is
properly established.  It also shouldn't be serving double duty as "what
the user wants" ahead of determining the system capabilities.

And not to open another can of worms, but our entire way of working
explodes if there are devices on the system not covered by an IOMMU.

~Andrew



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
  2021-10-19 23:34   ` Andrew Cooper
@ 2021-10-20  6:58     ` Jan Beulich
  2021-10-20  8:17     ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-10-20  6:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, Ian Jackson, Roger Pau Monné, xen-devel

On 20.10.2021 01:34, Andrew Cooper wrote:
> On 22/09/2021 15:36, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
>> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
>> we have other uses of this originally ACPI-private function elsewhere in
>> the tree.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm afraid this breaks booting on Skylake Server.  Yes, really - I
> didn't believe the bisection at first either.
> 
> From a bit of debugging, I've found:
> 
> (XEN) *** acpi_dmar_init() => -19
> (XEN) *** amd_iommu_get_supported_ivhd_type() => -19
> 
> So VT-d is disabled in firmware.  Oops, but something we should cope with.

I wanted to say that I definitely did test this (for a long, long
time) on Intel systems, but clearly not on one like this. I'm sure
though that I did test on IOMMU-less Intel systems, so I'm still a
bit puzzled.

> Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
> exit with -ENOENT too.
> 
> It turns out ...
> 
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>>  
>>  int __init acpi_ivrs_init(void)
>>  {
>> +    int rc;
>> +
>>      if ( !iommu_enable && !iommu_intremap )
>>          return 0;
>>  
>> +    rc = amd_iommu_get_supported_ivhd_type();
>> +    if ( rc < 0 )
>> +        return rc;
>> +    BUG_ON(!rc);
>> +    ivhd_type = rc;
>> +
>>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>>      {
>>          iommu_intremap = iommu_intremap_off;
>>
> 
> ... we're relying on this path (now skipped) to set iommu_intremap away
> from iommu_intremap_full in the "no IOMMU anywhere to be found" case.
> 
> This explains why I occasionally during failure get spew about:
> 
> (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
> [   17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
> [   17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI
> host supports is 64.
> [   17.125198] usb usb1-port2: couldn't allocate usb_device
> [  248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.
> 
> and eventually (gone 400s) get dumped in a dracut shell.
> 
> Booting with an explicit iommu=no-intremap, which clobbers
> iommu_intremap during cmdline parsing, recovers the system.
> 
> This variable controls a whole lot of magic with interrupt handling.  It
> should default to 0, not 2, and only become nonzero when an IOMMU is
> properly established.  It also shouldn't be serving double duty as "what
> the user wants" ahead of determining the system capabilities.

This would probably be too large a change at this point in time;
I'll see whether I can find something less intrusive. Unless of
course there's a patch already on xen-devel, which I didn't get
to read yet.

> And not to open another can of worms, but our entire way of working
> explodes if there are devices on the system not covered by an IOMMU.

I wouldn't be surprised, but is this something we have to expect
on non-broken systems? (I do know of broken systems giving the
appearance of uncovered devices by lacking suitable include-all
DRHD entries.)

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier
  2021-10-19 23:34   ` Andrew Cooper
  2021-10-20  6:58     ` Jan Beulich
@ 2021-10-20  8:17     ` Jan Beulich
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2021-10-20  8:17 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, Ian Jackson, Roger Pau Monné, xen-devel

On 20.10.2021 01:34, Andrew Cooper wrote:
> On 22/09/2021 15:36, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
>> the course of doing so stop open-coding acpi_tb_checksum(), seeing that
>> we have other uses of this originally ACPI-private function elsewhere in
>> the tree.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm afraid this breaks booting on Skylake Server.  Yes, really - I
> didn't believe the bisection at first either.

I'll be able to debug this, as by disabling VT-d on my Skylake I can
repro. But ...

> From a bit of debugging, I've found:
> 
> (XEN) *** acpi_dmar_init() => -19
> (XEN) *** amd_iommu_get_supported_ivhd_type() => -19
> 
> So VT-d is disabled in firmware.  Oops, but something we should cope with.
> 
> Then we fall into acpi_ivrs_init(), and take the new-in-this-patch early
> exit with -ENOENT too.
> 
> It turns out ...
> 
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
>>  
>>  int __init acpi_ivrs_init(void)
>>  {
>> +    int rc;
>> +
>>      if ( !iommu_enable && !iommu_intremap )
>>          return 0;
>>  
>> +    rc = amd_iommu_get_supported_ivhd_type();
>> +    if ( rc < 0 )
>> +        return rc;
>> +    BUG_ON(!rc);
>> +    ivhd_type = rc;
>> +
>>      if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
>>      {
>>          iommu_intremap = iommu_intremap_off;
>>
> 
> ... we're relying on this path (now skipped) to set iommu_intremap away
> from iommu_intremap_full in the "no IOMMU anywhere to be found" case.

... this picture here looks incomplete, since in iommu_hardware_setup()
we have

    if ( !iommu_enabled )
        iommu_intremap = iommu_intremap_off;

which I don't see how it could be bypassed. Booting here fails because
of the AHCI driver not being able to obtain control of the disk, but
checking in a working setup I see it use MSI, which can't possibly be
affected by an early-boot-only wrong setting of iommu_intremap. (I can
easily believe that we have early IO-APIC setup logic going wrong when
this remains mistakenly set.)

What I'd like to avoid though is to add yet another custom writing of
iommu_intremap_off in acpi_ivrs_init(). I'd prefer to find a better
place for it, so I will want to do some debugging first. If all else
fails, the setting should at least be moved into the caller of the
function - after all switching around the order of the
acpi_{dmar,ivrs}_init() calls in acpi_iommu_init() shouldn't have any
negative effect.

Jan

> This explains why I occasionally during failure get spew about:
> 
> (XEN) CPU0: No irq handler for vector 7a (IRQ -2147483648, LAPIC)
> [   17.117518] xhci_hcd 0000:00:14.0: Error while assigning device slot ID
> [   17.121114] xhci_hcd 0000:00:14.0: Max number of devices this xHCI
> host supports is 64.
> [   17.125198] usb usb1-port2: couldn't allocate usb_device
> [  248.317462] INFO: task kworker/u32:0:7 blocked for more than 120 seconds.
> 
> and eventually (gone 400s) get dumped in a dracut shell.
> 
> Booting with an explicit iommu=no-intremap, which clobbers
> iommu_intremap during cmdline parsing, recovers the system.
> 
> This variable controls a whole lot of magic with interrupt handling.  It
> should default to 0, not 2, and only become nonzero when an IOMMU is
> properly established.  It also shouldn't be serving double duty as "what
> the user wants" ahead of determining the system capabilities.
> 
> And not to open another can of worms, but our entire way of working
> explodes if there are devices on the system not covered by an IOMMU.
> 
> ~Andrew
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-10-20  8:17 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22 14:35 [PATCH v8 0/6] AMD/IOMMU: further work split from XSA-378 Jan Beulich
2021-09-22 14:36 ` [PATCH v8 1/6] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
2021-09-28  7:12   ` Durrant, Paul
2021-10-19 23:34   ` Andrew Cooper
2021-10-20  6:58     ` Jan Beulich
2021-10-20  8:17     ` Jan Beulich
2021-09-22 14:37 ` [PATCH v8 2/6] AMD/IOMMU: improve (extended) feature detection Jan Beulich
2021-09-22 14:37 ` [PATCH v8 3/6] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
2021-09-22 14:37 ` [PATCH v8 4/6] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
2021-09-28  7:34   ` Durrant, Paul
2021-09-28  7:47     ` Jan Beulich
2021-09-28  7:57       ` Durrant, Paul
2021-09-22 14:38 ` [PATCH v8 5/6] AMD/IOMMU: pull ATS disabling earlier Jan Beulich
2021-09-28  7:36   ` Durrant, Paul
2021-09-22 14:38 ` [PATCH v8 6/6] AMD/IOMMU: expose errors and warnings unconditionally Jan Beulich
2021-09-28  7:42   ` Durrant, Paul
2021-09-28  7:50     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.