* [PATCH v7] AMD/IOMMU: further work split from XSA-378
@ 2021-08-26  7:21 Jan Beulich
  2021-08-26  7:23 ` [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved Jan Beulich
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Alongside the pieces that were determined to have security relevance
there are quite a few more fixes / improvements (or so I hope) which
it was decided should not become part of the XSA itself. Hence also
why this is v7 and why several of them already have a Reviewed-by
tag. Here we go.

1: check / convert IVMD ranges for being / to be reserved
2: obtain IVHD type to use earlier
3: improve (extended) feature detection
4: check IVMD ranges against host implementation limits
5: also insert IVMD ranges into Dom0's page tables
6: provide function backing XENMEM_reserved_device_memory_map
7: add "ivmd=" command line option
8: respect AtsDisabled device flag

Jan




* [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
@ 2021-08-26  7:23 ` Jan Beulich
  2021-08-26 12:10   ` Andrew Cooper
  2021-08-26  7:23 ` [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

While the specification doesn't say so, just like for VT-d's RMRRs no
good can come from these ranges being e.g. conventional RAM or entirely
unmarked and hence usable for placing e.g. PCI device BARs. Check
whether they are, and put in some limited effort to convert to reserved.
(More advanced logic can be added if actual problems are found with this
simplistic variant.)
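
Purely for illustration (not part of the patch, and using simplified
stand-ins for Xen's RAM_TYPE_* flags and E820 helpers), the per-page
policy the hunk below implements boils down to:

    #include <stdbool.h>

    enum ram_type { RT_UNKNOWN = 0, RT_CONVENTIONAL = 1, RT_RESERVED = 2,
                    RT_ACPI = 4, RT_UNUSABLE = 8 };

    /*
     * Unmarked pages get converted to reserved (provided the E820 has room
     * for another entry); types which won't be handed out are fine as-is;
     * anything else - notably conventional RAM - is rejected.
     */
    static bool ivmd_page_ok(enum ram_type type, bool could_reserve)
    {
        if ( type == RT_UNKNOWN )
            return could_reserve;

        return (type & (RT_RESERVED | RT_ACPI | RT_UNUSABLE)) != 0;
    }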

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v7: Re-base.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -384,6 +384,38 @@ static int __init parse_ivmd_block(const
     AMD_IOMMU_DEBUG("IVMD Block: type %#x phys %#lx len %#lx\n",
                     ivmd_block->header.type, start_addr, mem_length);
 
+    if ( !e820_all_mapped(base, limit + PAGE_SIZE, E820_RESERVED) )
+    {
+        paddr_t addr;
+
+        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
+                        base, limit + PAGE_SIZE);
+
+        for ( addr = base; addr <= limit; addr += PAGE_SIZE )
+        {
+            unsigned int type = page_get_ram_type(maddr_to_mfn(addr));
+
+            if ( type == RAM_TYPE_UNKNOWN )
+            {
+                if ( e820_add_range(&e820, addr, addr + PAGE_SIZE,
+                                    E820_RESERVED) )
+                    continue;
+                AMD_IOMMU_DEBUG("IVMD Error: Page at %lx couldn't be reserved\n",
+                                addr);
+                return -EIO;
+            }
+
+            /* Types which won't be handed out are considered good enough. */
+            if ( type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
+                         RAM_TYPE_UNUSABLE) )
+                continue;
+
+            AMD_IOMMU_DEBUG("IVMD Error: Page at %lx can't be converted\n",
+                            addr);
+            return -EIO;
+        }
+    }
+
     if ( ivmd_block->header.flags & ACPI_IVMD_EXCLUSION_RANGE )
         exclusion = true;
     else if ( ivmd_block->header.flags & ACPI_IVMD_UNITY )




* [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
  2021-08-26  7:23 ` [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved Jan Beulich
@ 2021-08-26  7:23 ` Jan Beulich
  2021-08-26 12:30   ` Andrew Cooper
  2021-08-26  7:23 ` [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection Jan Beulich
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Doing this in amd_iommu_prepare() is too late for it, in particular, to
be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
(luckily) pretty simple, (pretty importantly) without breaking
amd_iommu_prepare()'s logic to prevent multiple processing.

This involves moving table checksumming, as
amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
the course of dojng so stop open-coding acpi_tb_checksum(), seeing that
we have other uses of this originally ACPI-private function elsewhere in
the tree.
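
For reference (a standalone model rather than Xen code), the rule the
shared helper implements is the usual ACPI one - the byte-wise sum of
the whole table, checksum field included, must be zero modulo 256:

    #include <stddef.h>
    #include <stdint.h>

    /* Returns 0 for a valid table, the residual sum otherwise. */
    static uint8_t acpi_table_checksum(const void *table, size_t length)
    {
        const uint8_t *p = table;
        uint8_t sum = 0;

        while ( length-- )
            sum += *p++;

        return sum;
    }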

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v7: Move table checksumming.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -22,6 +22,8 @@
 
 #include <asm/io_apic.h>
 
+#include <acpi/actables.h>
+
 #include "iommu.h"
 
 /* Some helper structures, particularly to deal with ranges. */
@@ -1150,20 +1152,7 @@ static int __init parse_ivrs_table(struc
 static int __init detect_iommu_acpi(struct acpi_table_header *table)
 {
     const struct acpi_ivrs_header *ivrs_block;
-    unsigned long i;
     unsigned long length = sizeof(struct acpi_table_ivrs);
-    u8 checksum, *raw_table;
-
-    /* validate checksum: sum of entire table == 0 */
-    checksum = 0;
-    raw_table = (u8 *)table;
-    for ( i = 0; i < table->length; i++ )
-        checksum += raw_table[i];
-    if ( checksum )
-    {
-        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
-        return -ENODEV;
-    }
 
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
@@ -1300,6 +1289,15 @@ get_supported_ivhd_type(struct acpi_tabl
 {
     size_t length = sizeof(struct acpi_table_ivrs);
     const struct acpi_ivrs_header *ivrs_block, *blk = NULL;
+    uint8_t checksum;
+
+    /* Validate checksum: Sum of entire table == 0. */
+    checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
+    if ( checksum )
+    {
+        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
+        return -ENODEV;
+    }
 
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1398,15 +1398,9 @@ int __init amd_iommu_prepare(bool xt)
         goto error_out;
 
     /* Have we been here before? */
-    if ( ivhd_type )
+    if ( ivrs_bdf_entries )
         return 0;
 
-    rc = amd_iommu_get_supported_ivhd_type();
-    if ( rc < 0 )
-        goto error_out;
-    BUG_ON(!rc);
-    ivhd_type = rc;
-
     rc = amd_iommu_get_ivrs_dev_entries();
     if ( !rc )
         rc = -ENODEV;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -179,9 +179,17 @@ static int __must_check amd_iommu_setup_
 
 int __init acpi_ivrs_init(void)
 {
+    int rc;
+
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
+    rc = amd_iommu_get_supported_ivhd_type();
+    if ( rc < 0 )
+        return rc;
+    BUG_ON(!rc);
+    ivhd_type = rc;
+
     if ( (amd_iommu_detect_acpi() !=0) || (iommu_found() == 0) )
     {
         iommu_intremap = iommu_intremap_off;




* [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
  2021-08-26  7:23 ` [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved Jan Beulich
  2021-08-26  7:23 ` [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
@ 2021-08-26  7:23 ` Jan Beulich
  2021-08-26 13:02   ` Andrew Cooper
  2021-08-26  7:24 ` [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

First of all the documentation is very clear about ACPI table data
superseding raw register data. Use raw register data only if EFRSup is
clear in the ACPI tables (which may still go too far). Additionally if
this flag is clear, the IVRS type 11H table is reserved and hence may
not be recognized.

Furthermore propagate IVRS type 10H data into the feature flags
recorded, as the full extended features field is available in type 11H
only.

Note that this also makes it necessary to stop the bad practice of us
finding a type 11H IVHD entry, but still processing the type 10H one
in detect_iommu_acpi()'s invocation of amd_iommu_detect_one_acpi().

Note also that the features.raw check in amd_iommu_prepare_one() needs
replacing, now that the field can also be populated by different means.
Key IOMMUv2 availability off of IVHD type not being 10H, and then move
it a function layer up, so that it would be set only once all IOMMUs
have been successfully prepared.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v7: Re-base.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -304,6 +304,7 @@ extern struct hpet_sbdf {
     } init;
 } hpet_sbdf;
 
+extern unsigned int amd_iommu_acpi_info;
 extern int amd_iommu_min_paging_mode;
 
 extern void *shared_intremap_table;
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -1051,7 +1051,8 @@ static void __init dump_acpi_table_heade
 static inline bool_t is_ivhd_block(u8 type)
 {
     return (type == ACPI_IVRS_TYPE_HARDWARE ||
-            type == ACPI_IVRS_TYPE_HARDWARE_11H);
+            ((amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP) &&
+             type == ACPI_IVRS_TYPE_HARDWARE_11H));
 }
 
 static inline bool_t is_ivmd_block(u8 type)
@@ -1159,7 +1160,7 @@ static int __init detect_iommu_acpi(stru
         ivrs_block = (struct acpi_ivrs_header *)((u8 *)table + length);
         if ( table->length < (length + ivrs_block->length) )
             return -ENODEV;
-        if ( ivrs_block->type == ACPI_IVRS_TYPE_HARDWARE &&
+        if ( ivrs_block->type == ivhd_type &&
              amd_iommu_detect_one_acpi(to_ivhd_block(ivrs_block)) != 0 )
             return -ENODEV;
         length += ivrs_block->length;
@@ -1299,6 +1300,9 @@ get_supported_ivhd_type(struct acpi_tabl
         return -ENODEV;
     }
 
+    amd_iommu_acpi_info = container_of(table, const struct acpi_table_ivrs,
+                                       header)->info;
+
     while ( table->length > (length + sizeof(*ivrs_block)) )
     {
         ivrs_block = (struct acpi_ivrs_header *)((u8 *)table + length);
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,14 +60,14 @@ void __init get_iommu_features(struct am
     const struct amd_iommu *first;
     ASSERT( iommu->mmio_base );
 
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
+    if ( !(amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP) )
     {
-        iommu->features.raw = 0;
-        return;
-    }
+        if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
+            return;
 
-    iommu->features.raw =
-        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
+        iommu->features.raw =
+            readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
+    }
 
     /* Don't log the same set of features over and over. */
     first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
@@ -164,6 +164,42 @@ int __init amd_iommu_detect_one_acpi(
     iommu->cap_offset = ivhd_block->capability_offset;
     iommu->mmio_base_phys = ivhd_block->base_address;
 
+    if ( ivhd_type != ACPI_IVRS_TYPE_HARDWARE )
+        iommu->features.raw = ivhd_block->efr_image;
+    else if ( amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP )
+    {
+        union {
+            uint32_t raw;
+            struct {
+                unsigned int xt_sup:1;
+                unsigned int nx_sup:1;
+                unsigned int gt_sup:1;
+                unsigned int glx_sup:2;
+                unsigned int ia_sup:1;
+                unsigned int ga_sup:1;
+                unsigned int he_sup:1;
+                unsigned int pas_max:5;
+                unsigned int pn_counters:4;
+                unsigned int pn_banks:6;
+                unsigned int msi_num_ppr:5;
+                unsigned int gats:2;
+                unsigned int hats:2;
+            };
+        } attr = { .raw = ivhd_block->iommu_attr };
+
+        iommu->features.flds.xt_sup = attr.xt_sup;
+        iommu->features.flds.nx_sup = attr.nx_sup;
+        iommu->features.flds.gt_sup = attr.gt_sup;
+        iommu->features.flds.glx_sup = attr.glx_sup;
+        iommu->features.flds.ia_sup = attr.ia_sup;
+        iommu->features.flds.ga_sup = attr.ga_sup;
+        iommu->features.flds.pas_max = attr.pas_max;
+        iommu->features.flds.gats = attr.gats;
+        iommu->features.flds.hats = attr.hats;
+    }
+    else if ( list_empty(&amd_iommu_head) )
+        AMD_IOMMU_DEBUG("EFRSup not set in ACPI table; will fall back to hardware\n");
+
     /* override IOMMU HT flags */
     iommu->ht_flags = ivhd_block->header.flags;
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -29,6 +29,7 @@ static bool __initdata pci_init;
 static void do_amd_iommu_irq(void *data);
 static DECLARE_SOFTIRQ_TASKLET(amd_iommu_irq_tasklet, do_amd_iommu_irq, NULL);
 
+unsigned int __read_mostly amd_iommu_acpi_info;
 unsigned int __read_mostly ivrs_bdf_entries;
 u8 __read_mostly ivhd_type;
 static struct radix_tree_root ivrs_maps;
@@ -1375,9 +1376,6 @@ static int __init amd_iommu_prepare_one(
 
     get_iommu_features(iommu);
 
-    if ( iommu->features.raw )
-        iommuv2_enabled = true;
-
     return 0;
 }
 
@@ -1419,6 +1417,9 @@ int __init amd_iommu_prepare(bool xt)
             has_xt = false;
     }
 
+    if ( ivhd_type != ACPI_IVRS_TYPE_HARDWARE )
+        iommuv2_enabled = true;
+
     for_each_amd_iommu ( iommu )
     {
         /* NB: There's no need to actually write these out right here. */
--- a/xen/include/acpi/actbl2.h
+++ b/xen/include/acpi/actbl2.h
@@ -716,6 +716,9 @@ struct acpi_table_ivrs {
 
 /* Values for Info field above */
 
+#define ACPI_IVRS_EFR_SUP           0x00000001	/* extended feature support */
+#define ACPI_IVRS_PREBOOT_DMA_REMAP 0x00000002	/* pre-boot DMA remapping in use */
+#define ACPI_IVRS_GVA_SIZE          0x000000E0	/* 3 bits, guest VA size */
 #define ACPI_IVRS_PHYSICAL_SIZE     0x00007F00	/* 7 bits, physical address size */
 #define ACPI_IVRS_VIRTUAL_SIZE      0x003F8000	/* 7 bits, virtual address size */
 #define ACPI_IVRS_ATS_RESERVED      0x00400000	/* ATS address translation range reserved */




* [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (2 preceding siblings ...)
  2021-08-26  7:23 ` [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection Jan Beulich
@ 2021-08-26  7:24 ` Jan Beulich
  2021-08-26 13:16   ` Andrew Cooper
  2021-08-26  7:24 ` [PATCH v7 5/8] AMD/IOMMU: also insert IVMD ranges into Dom0's page tables Jan Beulich
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

When such ranges can't be represented as 1:1 mappings in page tables,
reject them as presumably bogus. Note that when we detect features late
(because of EFRSup being clear in the ACPI tables), it would be quite a
bit of work to check for (and drop) out-of-range IVMD ranges, so IOMMU
initialization simply fails in this case instead.
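
Purely for illustration (standalone C with simplified stand-ins, not the
hypervisor code itself), the addressability check derived from the IVRS
Info field amounts to:

    #include <stdbool.h>
    #include <stdint.h>

    /* Field layout as per ACPI_IVRS_PHYSICAL_SIZE / ACPI_IVRS_VIRTUAL_SIZE. */
    #define IVRS_PA_SIZE(info)  (((info) >>  8) & 0x7f)
    #define IVRS_VA_SIZE(info)  (((info) >> 15) & 0x7f)

    /*
     * Can [start, start + len) be expressed with the IOMMU's address width?
     * (The actual code additionally rejects ranges which would need a deeper
     * page table than amd_iommu_get_paging_mode() permits.)
     */
    static bool ivmd_addressable(uint64_t start, uint64_t len, uint32_t info)
    {
        unsigned int bits = IVRS_PA_SIZE(info) < IVRS_VA_SIZE(info)
                            ? IVRS_PA_SIZE(info) : IVRS_VA_SIZE(info);

        return bits >= 64 || !((start + len - 1) >> bits);
    }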

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v7: Re-base.
v6: Re-base.
v5: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -305,6 +305,7 @@ extern struct hpet_sbdf {
 } hpet_sbdf;
 
 extern unsigned int amd_iommu_acpi_info;
+extern unsigned int amd_iommu_max_paging_mode;
 extern int amd_iommu_min_paging_mode;
 
 extern void *shared_intremap_table;
@@ -358,7 +359,7 @@ static inline int amd_iommu_get_paging_m
     while ( max_frames > PTE_PER_TABLE_SIZE )
     {
         max_frames = PTE_PER_TABLE_ALIGN(max_frames) >> PTE_PER_TABLE_SHIFT;
-        if ( ++level > 6 )
+        if ( ++level > amd_iommu_max_paging_mode )
             return -ENOMEM;
     }
 
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -370,6 +370,7 @@ static int __init parse_ivmd_device_iomm
 static int __init parse_ivmd_block(const struct acpi_ivrs_memory *ivmd_block)
 {
     unsigned long start_addr, mem_length, base, limit;
+    unsigned int addr_bits;
     bool iw = true, ir = true, exclusion = false;
 
     if ( ivmd_block->header.length < sizeof(*ivmd_block) )
@@ -386,6 +387,17 @@ static int __init parse_ivmd_block(const
     AMD_IOMMU_DEBUG("IVMD Block: type %#x phys %#lx len %#lx\n",
                     ivmd_block->header.type, start_addr, mem_length);
 
+    addr_bits = min(MASK_EXTR(amd_iommu_acpi_info, ACPI_IVRS_PHYSICAL_SIZE),
+                    MASK_EXTR(amd_iommu_acpi_info, ACPI_IVRS_VIRTUAL_SIZE));
+    if ( amd_iommu_get_paging_mode(PFN_UP(start_addr + mem_length)) < 0 ||
+         (addr_bits < BITS_PER_LONG &&
+          ((start_addr + mem_length - 1) >> addr_bits)) )
+    {
+        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not IOMMU addressable\n",
+                        start_addr, start_addr + mem_length);
+        return 0;
+    }
+
     if ( !e820_all_mapped(base, limit + PAGE_SIZE, E820_RESERVED) )
     {
         paddr_t addr;
--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -67,6 +67,9 @@ void __init get_iommu_features(struct am
 
         iommu->features.raw =
             readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
+
+        if ( 4 + iommu->features.flds.hats < amd_iommu_max_paging_mode )
+            amd_iommu_max_paging_mode = 4 + iommu->features.flds.hats;
     }
 
     /* Don't log the same set of features over and over. */
@@ -200,6 +203,10 @@ int __init amd_iommu_detect_one_acpi(
     else if ( list_empty(&amd_iommu_head) )
         AMD_IOMMU_DEBUG("EFRSup not set in ACPI table; will fall back to hardware\n");
 
+    if ( (amd_iommu_acpi_info & ACPI_IVRS_EFR_SUP) &&
+         4 + iommu->features.flds.hats < amd_iommu_max_paging_mode )
+        amd_iommu_max_paging_mode = 4 + iommu->features.flds.hats;
+
     /* override IOMMU HT flags */
     iommu->ht_flags = ivhd_block->header.flags;
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1376,6 +1376,13 @@ static int __init amd_iommu_prepare_one(
 
     get_iommu_features(iommu);
 
+    /*
+     * Late extended feature determination may cause previously mappable
+     * IVMD ranges to become unmappable.
+     */
+    if ( amd_iommu_max_paging_mode < amd_iommu_min_paging_mode )
+        return -ERANGE;
+
     return 0;
 }
 
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -254,6 +254,7 @@ int amd_iommu_alloc_root(struct domain *
     return 0;
 }
 
+unsigned int __read_mostly amd_iommu_max_paging_mode = 6;
 int __read_mostly amd_iommu_min_paging_mode = 1;
 
 static int amd_iommu_domain_init(struct domain *d)




* [PATCH v7 5/8] AMD/IOMMU: also insert IVMD ranges into Dom0's page tables
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (3 preceding siblings ...)
  2021-08-26  7:24 ` [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
@ 2021-08-26  7:24 ` Jan Beulich
  2021-08-26  7:25 ` [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map Jan Beulich
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

So far only one region would be taken care of, if it can be placed in
the exclusion range registers of the IOMMU. Take care of further ranges
as well. Seeing that we've been doing fine without this, make both
insertion and removal best effort only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v5: New.

--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -522,6 +522,14 @@ static int amd_iommu_add_device(u8 devfn
         amd_iommu_flush_device(iommu, bdf);
     }
 
+    if ( amd_iommu_reserve_domain_unity_map(
+             pdev->domain,
+             ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map,
+             0) )
+        AMD_IOMMU_DEBUG("%pd: unity mapping failed for %04x:%02x:%02x.%u\n",
+                        pdev->domain, pdev->seg, pdev->bus, PCI_SLOT(devfn),
+                        PCI_FUNC(devfn));
+
     return amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
 }
 
@@ -547,6 +555,14 @@ static int amd_iommu_remove_device(u8 de
 
     ivrs_mappings = get_ivrs_mappings(pdev->seg);
     bdf = PCI_BDF2(pdev->bus, devfn);
+
+    if ( amd_iommu_reserve_domain_unity_unmap(
+             pdev->domain,
+             ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map) )
+        AMD_IOMMU_DEBUG("%pd: unity unmapping failed for %04x:%02x:%02x.%u\n",
+                        pdev->domain, pdev->seg, pdev->bus, PCI_SLOT(devfn),
+                        PCI_FUNC(devfn));
+
     if ( amd_iommu_perdev_intremap &&
          ivrs_mappings[bdf].dte_requestor_id == bdf &&
          ivrs_mappings[bdf].intremap_table )




* [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (4 preceding siblings ...)
  2021-08-26  7:24 ` [PATCH v7 5/8] AMD/IOMMU: also insert IVMD ranges into Dom0's page tables Jan Beulich
@ 2021-08-26  7:25 ` Jan Beulich
  2021-08-26 13:24   ` Andrew Cooper
  2021-08-26  7:25 ` [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option Jan Beulich
  2021-08-26  7:26 ` [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Just like for VT-d, exclusion / unity map ranges would better be
reflected in e.g. the guest's E820 map. The reporting infrastructure
was put in place in a way still pretty much tailored to VT-d's needs;
extend get_reserved_device_memory() to allow vendor specific code to
probe whether a particular (seg,bus,dev,func) tuple would actually get
its data recorded. I admit the de-duplication of entries is quite
limited for now, but considering our trouble finding a system surfacing
_any_ IVMD, this is likely not a critical issue for this initial
implementation.
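
Purely for illustration (hypothetical names and simplified types, not the
actual Xen declarations), the extended callback contract looks like this:
a call with nr == 0 merely probes whether ranges for the given SBDF would
be recorded, and a non-zero return tells the vendor code to go ahead and
report the real ranges:

    #include <stdint.h>

    struct rdm_ctxt;   /* caller-private bookkeeping */

    static int rdm_callback(uint64_t start_pfn, uint64_t nr_pfns,
                            uint32_t sbdf, struct rdm_ctxt *ctxt)
    {
        (void)ctxt;

        if ( !nr_pfns )
            return 1;  /* probe: yes, this device's ranges are wanted */

        /* Record [start_pfn, start_pfn + nr_pfns) as reserved for sbdf. */
        (void)start_pfn;
        (void)sbdf;

        return 0;
    }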

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v7: Re-base.
v5: New.

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1042,6 +1042,9 @@ static int get_reserved_device_memory(xe
     if ( !(grdm->map.flags & XENMEM_RDM_ALL) && (sbdf != id) )
         return 0;
 
+    if ( !nr )
+        return 1;
+
     if ( grdm->used_entries < grdm->map.nr_entries )
     {
         struct xen_reserved_device_memory rdm = {
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -110,6 +110,7 @@ struct amd_iommu {
 struct ivrs_unity_map {
     bool read:1;
     bool write:1;
+    bool global:1;
     paddr_t addr;
     unsigned long length;
     struct ivrs_unity_map *next;
@@ -236,6 +237,7 @@ int amd_iommu_reserve_domain_unity_map(s
                                        unsigned int flag);
 int amd_iommu_reserve_domain_unity_unmap(struct domain *d,
                                          const struct ivrs_unity_map *map);
+int amd_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 int __must_check amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn,
                                              unsigned long page_count,
                                              unsigned int flush_flags);
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -145,7 +145,7 @@ static int __init reserve_iommu_exclusio
 
 static int __init reserve_unity_map_for_device(
     uint16_t seg, uint16_t bdf, unsigned long base,
-    unsigned long length, bool iw, bool ir)
+    unsigned long length, bool iw, bool ir, bool global)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
     struct ivrs_unity_map *unity_map = ivrs_mappings[bdf].unity_map;
@@ -164,7 +164,11 @@ static int __init reserve_unity_map_for_
          */
         if ( base == unity_map->addr && length == unity_map->length &&
              ir == unity_map->read && iw == unity_map->write )
+        {
+            if ( global )
+                unity_map->global = true;
             return 0;
+        }
 
         if ( unity_map->addr + unity_map->length > base &&
              base + length > unity_map->addr )
@@ -183,6 +187,7 @@ static int __init reserve_unity_map_for_
 
     unity_map->read = ir;
     unity_map->write = iw;
+    unity_map->global = global;
     unity_map->addr = base;
     unity_map->length = length;
     unity_map->next = ivrs_mappings[bdf].unity_map;
@@ -222,7 +227,8 @@ static int __init register_range_for_all
 
         /* reserve r/w unity-mapped page entries for devices */
         for ( bdf = rc = 0; !rc && bdf < ivrs_bdf_entries; bdf++ )
-            rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir);
+            rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir,
+                                              true);
     }
 
     return rc;
@@ -255,8 +261,10 @@ static int __init register_range_for_dev
         paddr_t length = limit + PAGE_SIZE - base;
 
         /* reserve unity-mapped page entries for device */
-        rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir) ?:
-             reserve_unity_map_for_device(seg, req, base, length, iw, ir);
+        rc = reserve_unity_map_for_device(seg, bdf, base, length, iw, ir,
+                                          false) ?:
+             reserve_unity_map_for_device(seg, req, base, length, iw, ir,
+                                          false);
     }
     else
     {
@@ -292,9 +300,9 @@ static int __init register_range_for_iom
 
         req = get_ivrs_mappings(iommu->seg)[bdf].dte_requestor_id;
         rc = reserve_unity_map_for_device(iommu->seg, bdf, base, length,
-                                          iw, ir) ?:
+                                          iw, ir, false) ?:
              reserve_unity_map_for_device(iommu->seg, req, base, length,
-                                          iw, ir);
+                                          iw, ir, false);
     }
 
     return rc;
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -467,6 +467,81 @@ int amd_iommu_reserve_domain_unity_unmap
     return rc;
 }
 
+int amd_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    unsigned int seg = 0 /* XXX */, bdf;
+    const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(seg);
+    /* At least for global entries, avoid reporting them multiple times. */
+    enum { pending, processing, done } global = pending;
+
+    for ( bdf = 0; bdf < ivrs_bdf_entries; ++bdf )
+    {
+        pci_sbdf_t sbdf = PCI_SBDF2(seg, bdf);
+        const struct ivrs_unity_map *um = ivrs_mappings[bdf].unity_map;
+        unsigned int req = ivrs_mappings[bdf].dte_requestor_id;
+        const struct amd_iommu *iommu = ivrs_mappings[bdf].iommu;
+        int rc;
+
+        if ( !iommu )
+        {
+            /* May need to trigger the workaround in find_iommu_for_device(). */
+            const struct pci_dev *pdev;
+
+            pcidevs_lock();
+            pdev = pci_get_pdev(seg, sbdf.bus, sbdf.devfn);
+            pcidevs_unlock();
+
+            if ( pdev )
+                iommu = find_iommu_for_device(seg, bdf);
+            if ( !iommu )
+                continue;
+        }
+
+        if ( func(0, 0, sbdf.sbdf, ctxt) )
+        {
+            /*
+             * When the caller processes a XENMEM_RDM_ALL request, don't report
+             * multiple times the same range(s) for perhaps many devices with
+             * the same alias ID.
+             */
+            if ( bdf != req && ivrs_mappings[req].iommu &&
+                 func(0, 0, PCI_SBDF2(seg, req).sbdf, ctxt) )
+                continue;
+
+            if ( global == pending )
+                global = processing;
+        }
+
+        if ( iommu->exclusion_enable &&
+             (iommu->exclusion_allow_all ?
+              global == processing :
+              ivrs_mappings[bdf].dte_allow_exclusion) )
+        {
+            rc = func(PFN_DOWN(iommu->exclusion_base),
+                      PFN_UP(iommu->exclusion_limit | 1) -
+                      PFN_DOWN(iommu->exclusion_base), sbdf.sbdf, ctxt);
+            if ( unlikely(rc < 0) )
+                return rc;
+        }
+
+        for ( ; um; um = um->next )
+        {
+            if ( um->global && global != processing )
+                continue;
+
+            rc = func(PFN_DOWN(um->addr), PFN_DOWN(um->length),
+                      sbdf.sbdf, ctxt);
+            if ( unlikely(rc < 0) )
+                return rc;
+        }
+
+        if ( global == processing )
+            global = done;
+    }
+
+    return 0;
+}
+
 int __init amd_iommu_quarantine_init(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -667,6 +667,7 @@ static const struct iommu_ops __initcons
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .crash_shutdown = amd_iommu_crash_shutdown,
+    .get_reserved_device_memory = amd_iommu_get_reserved_device_memory,
     .dump_page_tables = amd_dump_page_tables,
 };
 




* [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (5 preceding siblings ...)
  2021-08-26  7:25 ` [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map Jan Beulich
@ 2021-08-26  7:25 ` Jan Beulich
  2021-08-26 14:08   ` Andrew Cooper
  2021-08-26  7:26 ` [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

Just like VT-d's "rmrr=" it can be used to cover for firmware omissions.
Since systems surfacing IVMDs seem to be rare, it is also meant to allow
testing of the involved code.

Only the IVMD flavors actually understood by the IVMD parsing logic can
be generated, and for this initial implementation there's also no way to
control the flags field - unity r/w mappings are assumed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v5: New.

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -836,12 +836,12 @@ Controls for the dom0 IOMMU setup.
 
     Typically, some devices in a system use bits of RAM for communication, and
     these areas should be listed as reserved in the E820 table and identified
-    via RMRR or IVMD entries in the APCI tables, so Xen can ensure that they
+    via RMRR or IVMD entries in the ACPI tables, so Xen can ensure that they
     are identity-mapped in the IOMMU.  However, some firmware makes mistakes,
     and this option is a coarse-grain workaround for those errors.
 
     Where possible, finer grain corrections should be made with the `rmrr=`,
-    `ivrs_hpet=` or `ivrs_ioapic=` command line options.
+    `ivmd=`, `ivrs_hpet[]=`, or `ivrs_ioapic[]=` command line options.
 
     This option is disabled by default, and deprecated and intended for
     removal in future versions of Xen.  If specifying `map-inclusive` is the
@@ -1523,6 +1523,31 @@ _dom0-iommu=map-inclusive_ - using both
 > `= <integer>`
 
 ### irq_vector_map (x86)
+
+### ivmd (x86)
+> `= <start>[-<end>][=<bdf1>[-<bdf1'>][,<bdf2>[-<bdf2'>][,...]]][;<start>...]`
+
+Define IVMD-like ranges that are missing from ACPI tables along with the
+device(s) they belong to, and use them for 1:1 mapping.  End addresses can be
+omitted when exactly one page is meant.  The ranges are inclusive when start
+and end are specified.  Note that only PCI segment 0 is supported at this time,
+but it is fine to specify it explicitly.
+
+'start' and 'end' values are page numbers (not full physical addresses),
+in hexadecimal format (can optionally be preceded by "0x").
+
+Omitting the optional (range of) BDF specifiers signals that the range is to
+be applied to all devices.
+
+Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be
+reserved, and devices 0:0:1a.0...0:0:1a.3 collectively require three pages
+(0xd5d46 thru 0xd5d48) to be reserved, one usage would be:
+
+ivmd=d5d45=0:1d.0;0xd5d46-0xd5d48=0:1a.0-0:1a.3
+
+Note: grub2 requires escaping or quoting special characters, like ';', when
+multiple ranges are specified - refer to the grub2 documentation.
+
 ### ivrs_hpet[`<hpet>`] (AMD)
 > `=[<seg>:]<bus>:<device>.<func>`
 
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -1063,6 +1063,9 @@ static void __init dump_acpi_table_heade
 
 }
 
+static struct acpi_ivrs_memory __initdata user_ivmds[8];
+static unsigned int __initdata nr_ivmd;
+
 #define to_ivhd_block(hdr) \
     container_of(hdr, const struct acpi_ivrs_hardware, header)
 #define to_ivmd_block(hdr) \
@@ -1087,7 +1090,7 @@ static int __init parse_ivrs_table(struc
 {
     const struct acpi_ivrs_header *ivrs_block;
     unsigned long length;
-    unsigned int apic;
+    unsigned int apic, i;
     bool_t sb_ioapic = !iommu_intremap;
     int error = 0;
 
@@ -1122,6 +1125,12 @@ static int __init parse_ivrs_table(struc
         length += ivrs_block->length;
     }
 
+    /* Add command line specified IVMD-equivalents. */
+    if ( nr_ivmd )
+        AMD_IOMMU_DEBUG("IVMD: %u command line provided entries\n", nr_ivmd);
+    for ( i = 0; !error && i < nr_ivmd; ++i )
+        error = parse_ivmd_block(user_ivmds + i);
+
     /* Each IO-APIC must have been mentioned in the table. */
     for ( apic = 0; !error && iommu_intremap && apic < nr_ioapics; ++apic )
     {
@@ -1362,3 +1371,80 @@ int __init amd_iommu_get_supported_ivhd_
 {
     return acpi_table_parse(ACPI_SIG_IVRS, get_supported_ivhd_type);
 }
+
+/*
+ * Parse "ivmd" command line option to later add the parsed devices / regions
+ * into unity mapping lists, just like IVMDs parsed from ACPI.
+ * Format:
+ * ivmd=<start>[-<end>][=<bdf1>[-<bdf1>'][,<bdf2>[-<bdf2>'][,...]]][;<start>...]
+ */
+static int __init parse_ivmd_param(const char *s)
+{
+    do {
+        unsigned long start, end;
+        const char *cur;
+
+        if ( nr_ivmd >= ARRAY_SIZE(user_ivmds) )
+            return -E2BIG;
+
+        start = simple_strtoul(cur = s, &s, 16);
+        if ( cur == s )
+            return -EINVAL;
+
+        if ( *s == '-' )
+        {
+            end = simple_strtoul(cur = s + 1, &s, 16);
+            if ( cur == s || end < start )
+                return -EINVAL;
+        }
+        else
+            end = start;
+
+        if ( *s != '=' )
+        {
+            user_ivmds[nr_ivmd].start_address = start << PAGE_SHIFT;
+            user_ivmds[nr_ivmd].memory_length = (end - start + 1) << PAGE_SHIFT;
+            user_ivmds[nr_ivmd].header.flags = ACPI_IVMD_UNITY |
+                                               ACPI_IVMD_READ | ACPI_IVMD_WRITE;
+            user_ivmds[nr_ivmd].header.length = sizeof(*user_ivmds);
+            user_ivmds[nr_ivmd].header.type = ACPI_IVRS_TYPE_MEMORY_ALL;
+            ++nr_ivmd;
+            continue;
+        }
+
+        do {
+            unsigned int seg, bus, dev, func;
+
+            if ( nr_ivmd >= ARRAY_SIZE(user_ivmds) )
+                return -E2BIG;
+
+            s = parse_pci(s + 1, &seg, &bus, &dev, &func);
+            if ( !s || seg )
+                return -EINVAL;
+
+            user_ivmds[nr_ivmd].start_address = start << PAGE_SHIFT;
+            user_ivmds[nr_ivmd].memory_length = (end - start + 1) << PAGE_SHIFT;
+            user_ivmds[nr_ivmd].header.flags = ACPI_IVMD_UNITY |
+                                               ACPI_IVMD_READ | ACPI_IVMD_WRITE;
+            user_ivmds[nr_ivmd].header.length = sizeof(*user_ivmds);
+            user_ivmds[nr_ivmd].header.device_id = PCI_BDF(bus, dev, func);
+            user_ivmds[nr_ivmd].header.type = ACPI_IVRS_TYPE_MEMORY_ONE;
+
+            if ( *s == '-' )
+            {
+                s = parse_pci(s + 1, &seg, &bus, &dev, &func);
+                if ( !s || seg )
+                    return -EINVAL;
+
+                user_ivmds[nr_ivmd].aux_data = PCI_BDF(bus, dev, func);
+                if ( user_ivmds[nr_ivmd].aux_data <
+                     user_ivmds[nr_ivmd].header.device_id )
+                    return -EINVAL;
+                user_ivmds[nr_ivmd].header.type = ACPI_IVRS_TYPE_MEMORY_RANGE;
+            }
+        } while ( ++nr_ivmd, *s == ',' );
+    } while ( *s++ == ';' );
+
+    return s[-1] ? -EINVAL : 0;
+}
+custom_param("ivmd", parse_ivmd_param);




* [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag
  2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
                   ` (6 preceding siblings ...)
  2021-08-26  7:25 ` [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option Jan Beulich
@ 2021-08-26  7:26 ` Jan Beulich
  2021-08-26 14:27   ` Andrew Cooper
  7 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26  7:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant

IVHD entries may specify that ATS is to be blocked for a device or range
of devices. Honor firmware telling us so.

While adding respective checks I noticed that the 2nd conditional in
amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
Add the missing part of the condition there, as no good can come from
enabling ATS on a device when the IOMMU is not capable of dealing with
ATS requests.

For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
exhibit UB.
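
For reference, the UB in question (illustrative snippet, not the actual
header change): with 32-bit int, "1 << 31" shifts a set bit into the sign
position, which C leaves undefined, whereas the unsigned variant is well
defined:

    #include <stdint.h>

    /* (1 << 31)  - old expansion: undefined behaviour for 32-bit int */
    /* (1u << 31) - new expansion: well defined, yields 0x80000000u   */
    #define EXAMPLE_ATS_DISABLED  (1u << 31)

    static inline int ats_disabled(uint32_t extended_data)
    {
        return !!(extended_data & EXAMPLE_ATS_DISABLED);
    }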

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: I find the ordering in amd_iommu_disable_domain_device()
     suspicious: amd_iommu_enable_domain_device() sets up the DTE first
     and then enables ATS on the device. It would seem to me that
     disabling would better be done the other way around (disable ATS on
     device, then adjust DTE).
TBD: As an alternative to adding the missing IOMMU capability check, we
     may want to consider simply using dte->i in the 2nd conditional in
     amd_iommu_enable_domain_device().
For both of these, while ATS enabling/disabling gets invoked without any
locks held, it should not be possible for the two functions to race with
one another for any individual device (or else we'd be in trouble
already, as ATS might then get re-enabled immediately after it was
disabled, with the DTE out of sync with this setting).
---
v7: New.

--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -120,6 +120,7 @@ struct ivrs_mappings {
     uint16_t dte_requestor_id;
     bool valid:1;
     bool dte_allow_exclusion:1;
+    bool block_ats:1;
 
     /* ivhd device data settings */
     uint8_t device_flags;
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -55,8 +55,8 @@ union acpi_ivhd_device {
 };
 
 static void __init add_ivrs_mapping_entry(
-    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
-    struct amd_iommu *iommu)
+    uint16_t bdf, uint16_t alias_id, uint8_t flags, unsigned int ext_flags,
+    bool alloc_irt, struct amd_iommu *iommu)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
 
@@ -66,6 +66,7 @@ static void __init add_ivrs_mapping_entr
     ivrs_mappings[bdf].dte_requestor_id = alias_id;
 
     /* override flags for range of devices */
+    ivrs_mappings[bdf].block_ats = ext_flags & ACPI_IVHD_ATS_DISABLED;
     ivrs_mappings[bdf].device_flags = flags;
 
     /* Don't map an IOMMU by itself. */
@@ -499,7 +500,7 @@ static u16 __init parse_ivhd_device_sele
         return 0;
     }
 
-    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, false,
+    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, 0, false,
                            iommu);
 
     return sizeof(*select);
@@ -545,7 +546,7 @@ static u16 __init parse_ivhd_device_rang
     AMD_IOMMU_DEBUG(" Dev_Id Range: %#x -> %#x\n", first_bdf, last_bdf);
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
-        add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting,
+        add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting, 0,
                                false, iommu);
 
     return dev_length;
@@ -580,7 +581,7 @@ static u16 __init parse_ivhd_device_alia
 
     AMD_IOMMU_DEBUG(" Dev_Id Alias: %#x\n", alias_id);
 
-    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, true,
+    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, 0, true,
                            iommu);
 
     return dev_length;
@@ -636,7 +637,7 @@ static u16 __init parse_ivhd_device_alia
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, alias_id, range->alias.header.data_setting,
-                               true, iommu);
+                               0, true, iommu);
 
     return dev_length;
 }
@@ -661,7 +662,8 @@ static u16 __init parse_ivhd_device_exte
         return 0;
     }
 
-    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, false, iommu);
+    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting,
+                           ext->extended_data, false, iommu);
 
     return dev_length;
 }
@@ -708,7 +710,7 @@ static u16 __init parse_ivhd_device_exte
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, bdf, range->extended.header.data_setting,
-                               false, iommu);
+                               range->extended.extended_data, false, iommu);
 
     return dev_length;
 }
@@ -800,7 +802,7 @@ static u16 __init parse_ivhd_device_spec
 
     AMD_IOMMU_DEBUG("IVHD Special: %pp variety %#x handle %#x\n",
                     &PCI_SBDF2(seg, bdf), special->variety, special->handle);
-    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, true,
+    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, 0, true,
                            iommu);
 
     switch ( special->variety )
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -105,6 +105,7 @@ static int __must_check amd_iommu_setup_
     int req_id, valid = 1, rc;
     u8 bus = pdev->bus;
     struct domain_iommu *hd = dom_iommu(domain);
+    const struct ivrs_mappings *ivrs_dev;
 
     if ( QUARANTINE_SKIP(domain) )
         return 0;
@@ -122,20 +123,18 @@ static int __must_check amd_iommu_setup_
     req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
     table = iommu->dev_table.buffer;
     dte = &table[req_id];
+    ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
 
     spin_lock_irqsave(&iommu->lock, flags);
 
     if ( !dte->v || !dte->tv )
     {
-        const struct ivrs_mappings *ivrs_dev;
-
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
             dte, page_to_maddr(hd->arch.amd.root_table),
             domain->domain_id, hd->arch.amd.paging_mode, valid);
 
         /* Undo what amd_iommu_disable_domain_device() may have done. */
-        ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
         if ( dte->it_root )
         {
             dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
@@ -146,6 +145,7 @@ static int __must_check amd_iommu_setup_
         dte->sys_mgt = MASK_EXTR(ivrs_dev->device_flags, ACPI_IVHD_SYSTEM_MGMT);
 
         if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
+             !ivrs_dev->block_ats &&
              iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) )
             dte->i = ats_enabled;
 
@@ -166,6 +166,8 @@ static int __must_check amd_iommu_setup_
     ASSERT(pcidevs_locked());
 
     if ( pci_ats_device(iommu->seg, bus, pdev->devfn) &&
+         !ivrs_dev->block_ats &&
+         iommu_has_cap(iommu, PCI_CAP_IOTLB_SHIFT) &&
          !pci_ats_enabled(iommu->seg, bus, pdev->devfn) )
     {
         if ( devfn == pdev->devfn )
--- a/xen/include/acpi/actbl2.h
+++ b/xen/include/acpi/actbl2.h
@@ -851,7 +851,7 @@ struct acpi_ivrs_device8b {
 
 /* Values for extended_data above */
 
-#define ACPI_IVHD_ATS_DISABLED      (1<<31)
+#define ACPI_IVHD_ATS_DISABLED      (1u << 31)
 
 /* Type 72: 8-byte device entry */
 




* Re: [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved
  2021-08-26  7:23 ` [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved Jan Beulich
@ 2021-08-26 12:10   ` Andrew Cooper
  2021-08-26 12:31     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 12:10 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:23, Jan Beulich wrote:
> While the specification doesn't say so, just like for VT-d's RMRRs no
> good can come from these ranges being e.g. conventional RAM or entirely
> unmarked and hence usable for placing e.g. PCI device BARs. Check
> whether they are, and put in some limited effort to convert to reserved.
> (More advanced logic can be added if actual problems are found with this
> simplistic variant.)
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
> v7: Re-base.
> v5: New.
>
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -384,6 +384,38 @@ static int __init parse_ivmd_block(const
>      AMD_IOMMU_DEBUG("IVMD Block: type %#x phys %#lx len %#lx\n",
>                      ivmd_block->header.type, start_addr, mem_length);
>  
> +    if ( !e820_all_mapped(base, limit + PAGE_SIZE, E820_RESERVED) )
> +    {
> +        paddr_t addr;
> +
> +        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
> +                        base, limit + PAGE_SIZE);
> +
> +        for ( addr = base; addr <= limit; addr += PAGE_SIZE )
> +        {
> +            unsigned int type = page_get_ram_type(maddr_to_mfn(addr));
> +
> +            if ( type == RAM_TYPE_UNKNOWN )
> +            {
> +                if ( e820_add_range(&e820, addr, addr + PAGE_SIZE,
> +                                    E820_RESERVED) )
> +                    continue;
> +                AMD_IOMMU_DEBUG("IVMD Error: Page at %lx couldn't be reserved\n",
> +                                addr);
> +                return -EIO;
> +            }
> +
> +            /* Types which won't be handed out are considered good enough. */
> +            if ( type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
> +                         RAM_TYPE_UNUSABLE) )
> +                continue;
> +
> +            AMD_IOMMU_DEBUG("IVMD Error: Page at %lx can't be converted\n",
> +                            addr);

I think these print messages need to be more than just debug.  The first
one is a warning, whereas the final two are hard errors liable to impact
the correct running of the system.

Especially as you're putting them in to try and spot problem cases, they
should be visible by default for when we inevitably get bug reports to
xen-devel saying "something changed with passthrough in Xen 4.16".

~Andrew


> +            return -EIO;
> +        }
> +    }
> +
>      if ( ivmd_block->header.flags & ACPI_IVMD_EXCLUSION_RANGE )
>          exclusion = true;
>      else if ( ivmd_block->header.flags & ACPI_IVMD_UNITY )
>




* Re: [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier
  2021-08-26  7:23 ` [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
@ 2021-08-26 12:30   ` Andrew Cooper
  2021-08-26 12:33     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 12:30 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:23, Jan Beulich wrote:
> Doing this in amd_iommu_prepare() is too late for it, in particular, to
> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
> (luckily) pretty simple, (pretty importantly) without breaking
> amd_iommu_prepare()'s logic to prevent multiple processing.
>
> This involves moving table checksumming, as
> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
> the course of dojng so stop open-coding acpi_tb_checksum(), seeing that

doing.

> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -1150,20 +1152,7 @@ static int __init parse_ivrs_table(struc
>  static int __init detect_iommu_acpi(struct acpi_table_header *table)
>  {
>      const struct acpi_ivrs_header *ivrs_block;
> -    unsigned long i;
>      unsigned long length = sizeof(struct acpi_table_ivrs);
> -    u8 checksum, *raw_table;
> -
> -    /* validate checksum: sum of entire table == 0 */
> -    checksum = 0;
> -    raw_table = (u8 *)table;
> -    for ( i = 0; i < table->length; i++ )
> -        checksum += raw_table[i];
> -    if ( checksum )
> -    {
> -        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
> -        return -ENODEV;
> -    }
>  
>      while ( table->length > (length + sizeof(*ivrs_block)) )
>      {
> @@ -1300,6 +1289,15 @@ get_supported_ivhd_type(struct acpi_tabl
>  {
>      size_t length = sizeof(struct acpi_table_ivrs);
>      const struct acpi_ivrs_header *ivrs_block, *blk = NULL;
> +    uint8_t checksum;
> +
> +    /* Validate checksum: Sum of entire table == 0. */
> +    checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
> +    if ( checksum )
> +    {
> +        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);

I know you're just moving code, but this really needs to be a visible
error.  It's "I'm turning off the IOMMU because the ACPI table is bad",
which is about as serious as errors come.

~Andrew




* Re: [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved
  2021-08-26 12:10   ` Andrew Cooper
@ 2021-08-26 12:31     ` Jan Beulich
  2021-09-21  7:37       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 12:31 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Paul Durrant

On 26.08.2021 14:10, Andrew Cooper wrote:
> On 26/08/2021 08:23, Jan Beulich wrote:
>> While the specification doesn't say so, just like for VT-d's RMRRs no
>> good can come from these ranges being e.g. conventional RAM or entirely
>> unmarked and hence usable for placing e.g. PCI device BARs. Check
>> whether they are, and put in some limited effort to convert to reserved.
>> (More advanced logic can be added if actual problems are found with this
>> simplistic variant.)
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Paul Durrant <paul@xen.org>
>> ---
>> v7: Re-base.
>> v5: New.
>>
>> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
>> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
>> @@ -384,6 +384,38 @@ static int __init parse_ivmd_block(const
>>      AMD_IOMMU_DEBUG("IVMD Block: type %#x phys %#lx len %#lx\n",
>>                      ivmd_block->header.type, start_addr, mem_length);
>>  
>> +    if ( !e820_all_mapped(base, limit + PAGE_SIZE, E820_RESERVED) )
>> +    {
>> +        paddr_t addr;
>> +
>> +        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
>> +                        base, limit + PAGE_SIZE);
>> +
>> +        for ( addr = base; addr <= limit; addr += PAGE_SIZE )
>> +        {
>> +            unsigned int type = page_get_ram_type(maddr_to_mfn(addr));
>> +
>> +            if ( type == RAM_TYPE_UNKNOWN )
>> +            {
>> +                if ( e820_add_range(&e820, addr, addr + PAGE_SIZE,
>> +                                    E820_RESERVED) )
>> +                    continue;
>> +                AMD_IOMMU_DEBUG("IVMD Error: Page at %lx couldn't be reserved\n",
>> +                                addr);
>> +                return -EIO;
>> +            }
>> +
>> +            /* Types which won't be handed out are considered good enough. */
>> +            if ( type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
>> +                         RAM_TYPE_UNUSABLE) )
>> +                continue;
>> +
>> +            AMD_IOMMU_DEBUG("IVMD Error: Page at %lx can't be converted\n",
>> +                            addr);
> 
> I think these print messages need to be more than just debug.  The first
> one is a warning, whereas the final two are hard errors liable to impact
> the correct running of the system.

Well, people would observe IOMMUs not getting put in use. I was following
existing style in this regard on the assumption that in such an event
people would (be told to) enable "iommu=debug". Hence ...

> Especially as you're putting them in to try and spot problem cases, they
> should be visible by default for when we inevitably get bug reports to
> xen-devel saying "something changed with passthrough in Xen 4.16".

... I can convert to ordinary printk(), provided you're convinced the
described model isn't reasonable and introducing a logging inconsistency
is worth it.

Jan




* Re: [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier
  2021-08-26 12:30   ` Andrew Cooper
@ 2021-08-26 12:33     ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 12:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 14:30, Andrew Cooper wrote:
> On 26/08/2021 08:23, Jan Beulich wrote:
>> Doing this in amd_iommu_prepare() is too late for it, in particular, to
>> be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
>> to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
>> (luckily) pretty simple, (pretty importantly) without breaking
>> amd_iommu_prepare()'s logic to prevent multiple processing.
>>
>> This involves moving table checksumming, as
>> amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
>> now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
>> the course of dojng so stop open-coding acpi_tb_checksum(), seeing that
> 
> doing.
> 
>> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
>> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
>> @@ -1150,20 +1152,7 @@ static int __init parse_ivrs_table(struc
>>  static int __init detect_iommu_acpi(struct acpi_table_header *table)
>>  {
>>      const struct acpi_ivrs_header *ivrs_block;
>> -    unsigned long i;
>>      unsigned long length = sizeof(struct acpi_table_ivrs);
>> -    u8 checksum, *raw_table;
>> -
>> -    /* validate checksum: sum of entire table == 0 */
>> -    checksum = 0;
>> -    raw_table = (u8 *)table;
>> -    for ( i = 0; i < table->length; i++ )
>> -        checksum += raw_table[i];
>> -    if ( checksum )
>> -    {
>> -        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
>> -        return -ENODEV;
>> -    }
>>  
>>      while ( table->length > (length + sizeof(*ivrs_block)) )
>>      {
>> @@ -1300,6 +1289,15 @@ get_supported_ivhd_type(struct acpi_tabl
>>  {
>>      size_t length = sizeof(struct acpi_table_ivrs);
>>      const struct acpi_ivrs_header *ivrs_block, *blk = NULL;
>> +    uint8_t checksum;
>> +
>> +    /* Validate checksum: Sum of entire table == 0. */
>> +    checksum = acpi_tb_checksum(ACPI_CAST_PTR(uint8_t, table), table->length);
>> +    if ( checksum )
>> +    {
>> +        AMD_IOMMU_DEBUG("IVRS Error: Invalid Checksum %#x\n", checksum);
> 
> I know you're just moving code, but this really needs to be a visible
> error.  It's "I'm turning off the IOMMU because the ACPI table is bad",
> which is about as serious as errors come.

I'll wait for us settling on patch 1 in this regard, and then follow
the same model here.

Jan




* Re: [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection
  2021-08-26  7:23 ` [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection Jan Beulich
@ 2021-08-26 13:02   ` Andrew Cooper
  2021-08-26 13:13     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 13:02 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:23, Jan Beulich wrote:
> First of all the documentation is very clear about ACPI table data
> superseding raw register data. Use raw register data only if EFRSup is
> clear in the ACPI tables (which may still go too far). Additionally if
> this flag is clear, the IVRS type 11H table is reserved and hence may
> not be recognized.

The spec says:

Software Implementation Note: Information conveyed in the IVRS overrides the
corresponding information available through the IOMMU hardware registers.
System software is required to honor the ACPI settings.

This suggests that if there is an ACPI table, the hardware registers
shouldn't be followed.

Given what else is broken when there is no ACPI table, I think we can
(and should) not support this configuration.

> Furthermore propagate IVRS type 10H data into the feature flags
> recorded, as the full extended features field is available in type 11H
> only.
>
> Note that this also makes it necessary to stop the bad practice of us
> finding a type 11H IVHD entry, but still processing the type 10H one
> in detect_iommu_acpi()'s invocation of amd_iommu_detect_one_acpi().

I could have sworn I read in the spec that if 11H is present, 10H should
be ignored, but I can't actually locate a statement to this effect.

~Andrew

>
> Note also that the features.raw check in amd_iommu_prepare_one() needs
> replacing, now that the field can also be populated by different means.
> Key IOMMUv2 availability off of IVHD type not being 10H, and then move
> it a function layer up, so that it would be set only once all IOMMUs
> have been successfully prepared.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Paul Durrant <paul@xen.org>





* Re: [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection
  2021-08-26 13:02   ` Andrew Cooper
@ 2021-08-26 13:13     ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 13:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 15:02, Andrew Cooper wrote:
> On 26/08/2021 08:23, Jan Beulich wrote:
>> First of all the documentation is very clear about ACPI table data
>> superseding raw register data. Use raw register data only if EFRSup is
>> clear in the ACPI tables (which may still go too far). Additionally if
>> this flag is clear, the IVRS type 11H table is reserved and hence may
>> not be recognized.
> 
> The spec says:
> 
> Software Implementation Note: Information conveyed in the IVRS overrides the
> corresponding information available through the IOMMU hardware registers.
> System software is required to honor the ACPI settings.
> 
> This suggests that if there is an ACPI table, the hardware registers
> shouldn't be followed.
> 
> Given what else is broken when there is no ACPI table, I think we can
> (and should) not support this configuration.

Well, we don't. We do require ACPI tables. But that's not the question
here. Instead what this is about is whether the ACPI table has EFRSup
clear. This flag being clear when the register actually exists is imo
more likely a firmware bug.

Plus I didn't want to go too far in one step; I do acknowledge that
we may want to be even more strict down the road by saying "(which may
still go too far)".

>> Furthermore propagate IVRS type 10H data into the feature flags
>> recorded, as the full extended features field is available in type 11H
>> only.
>>
>> Note that this also makes it necessary to stop the bad practice of us
>> finding a type 11H IVHD entry, but still processing the type 10H one
>> in detect_iommu_acpi()'s invocation of amd_iommu_detect_one_acpi().
> 
> I could have sworn I read in the spec that if 11H is present, 10H should
> be ignored, but I can't actually locate a statement to this effect.

What I can find is "It is recommended for system software to detect
IOMMU features from the fields in the IVHD Type11h structure
information, superseding information in Type10h block and MMIO
registers."

Jan




* Re: [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits
  2021-08-26  7:24 ` [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
@ 2021-08-26 13:16   ` Andrew Cooper
  2021-08-26 14:03     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 13:16 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:24, Jan Beulich wrote:
> When such ranges can't be represented as 1:1 mappings in page tables,
> reject them as presumably bogus. Note that when we detect features late
> (because of EFRSup being clear in the ACPI tables), it would be quite a
> bit of work to check for (and drop) out of range IVMD ranges, so IOMMU
> initialization gets failed in this case instead.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Paul Durrant <paul@xen.org>

I'm not certain this is correct in combination with memory encryption.

The upper bits are the KeyID, but we shouldn't find any of those set in
an IVMD range.  I think at a minimum, we need to reduce the address
check by the stolen bits for encryption, which gives a lower bound.

~Andrew




* Re: [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map
  2021-08-26  7:25 ` [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map Jan Beulich
@ 2021-08-26 13:24   ` Andrew Cooper
  2021-08-26 14:05     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 13:24 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:25, Jan Beulich wrote:
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -467,6 +467,81 @@ int amd_iommu_reserve_domain_unity_unmap
>      return rc;
>  }
>  
> +int amd_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> +{
> +    unsigned int seg = 0 /* XXX */, bdf;

Is this XXX intended to stay?

I can't say I'm fussed about multi-segment handling, given the absence
of support on any hardware I've ever encountered.

~Andrew




* Re: [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits
  2021-08-26 13:16   ` Andrew Cooper
@ 2021-08-26 14:03     ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 14:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 15:16, Andrew Cooper wrote:
> On 26/08/2021 08:24, Jan Beulich wrote:
>> When such ranges can't be represented as 1:1 mappings in page tables,
>> reject them as presumably bogus. Note that when we detect features late
>> (because of EFRSup being clear in the ACPI tables), it would be quite a
>> bit of work to check for (and drop) out of range IVMD ranges, so IOMMU
>> initialization gets failed in this case instead.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Paul Durrant <paul@xen.org>
> 
> I'm not certain this is correct in combination with memory encryption.

Which we don't enable. I don't follow why you put this up as an
extra requirement. I'm adding checks based on IOMMU-specific data
alone. I think that's a fair and consistent step in the right
direction, no matter that there may be another step to go. Plus ...

> The upper bits are the KeyID, but we shouldn't find any of those set in
> an IVMD range.  I think at a minimum, we need to reduce the address
> check by the stolen bits for encryption, which gives a lower bound.

... aren't you asking here to consume data I'm only making
available in the still pending "x86/AMD: make HT range dynamic for
Fam17 and up", where I (now, i.e. v2) adjust paddr_bits accordingly?
Else I'm afraid I don't even know what you're talking about.
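
That paddr_bits adjustment is roughly of this shape (a sketch only, on the
assumption that the reduction is what CPUID leaf 0x8000001F reports in
EBX[11:6]):

    uint32_t ebx = cpuid_ebx(0x8000001fU);
    unsigned int reduction = (ebx >> 6) & 0x3f;  /* address bits stolen for the key ID */

    if ( reduction )
        paddr_bits -= reduction;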

Jan




* Re: [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map
  2021-08-26 13:24   ` Andrew Cooper
@ 2021-08-26 14:05     ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 14:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 15:24, Andrew Cooper wrote:
> On 26/08/2021 08:25, Jan Beulich wrote:
>> --- a/xen/drivers/passthrough/amd/iommu_map.c
>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
>> @@ -467,6 +467,81 @@ int amd_iommu_reserve_domain_unity_unmap
>>      return rc;
>>  }
>>  
>> +int amd_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
>> +{
>> +    unsigned int seg = 0 /* XXX */, bdf;
> 
> Is this XXX intended to stay?

Yes - we've already got 3 similar instances plus at least two where the
hardcoding of segment 0 is not otherwise marked.

Jan

> I can't say I'm fussed about multi-segment handling, given the absence
> of support on any hardware I've ever encountered.
> 
> ~Andrew
> 




* Re: [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option
  2021-08-26  7:25 ` [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option Jan Beulich
@ 2021-08-26 14:08   ` Andrew Cooper
  2021-08-26 14:30     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 14:08 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:25, Jan Beulich wrote:
> Just like VT-d's "rmrr=" it can be used to cover for firmware omissions.
> Since systems surfacing IVMDs seem to be rare, it is also meant to allow
> testing of the involved code.
>
> Only the IVMD flavors actually understood by the IVMD parsing logic can
> be generated, and for this initial implementation there's also no way to
> control the flags field - unity r/w mappings are assumed.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
> v5: New.
>
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -836,12 +836,12 @@ Controls for the dom0 IOMMU setup.
>  
>      Typically, some devices in a system use bits of RAM for communication, and
>      these areas should be listed as reserved in the E820 table and identified
> -    via RMRR or IVMD entries in the APCI tables, so Xen can ensure that they
> +    via RMRR or IVMD entries in the ACPI tables, so Xen can ensure that they

Oops.

> @@ -1523,6 +1523,31 @@ _dom0-iommu=map-inclusive_ - using both
>  > `= <integer>`
>  
>  ### irq_vector_map (x86)
> +
> +### ivmd (x86)
> +> `= <start>[-<end>][=<bdf1>[-<bdf1'>][,<bdf2>[-<bdf2'>][,...]]][;<start>...]`
> +
> +Define IVMD-like ranges that are missing from ACPI tables along with the
> +device(s) they belong to, and use them for 1:1 mapping.  End addresses can be
> +omitted when exactly one page is meant.  The ranges are inclusive when start
> +and end are specified.  Note that only PCI segment 0 is supported at this time,
> +but it is fine to specify it explicitly.
> +
> +'start' and 'end' values are page numbers (not full physical addresses),
> +in hexadecimal format (can optionally be preceded by "0x").
> +
> +Omitting the optional (range of) BDF specifiers signals that the range is to
> +be applied to all devices.
> +
> +Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be
> +reserved, and devices 0:0:1a.0...0:0:1a.3 collectively require three pages
> +(0xd5d46 thru 0xd5d48) to be reserved, one usage would be:
> +
> +ivmd=d5d45=0:1d.0;0xd5d46-0xd5d48=0:1a.0-0:1a.3
> +
> +Note: grub2 requires to escape or quote special characters, like ';' when
> +multiple ranges are specified - refer to the grub2 documentation.

I'm slightly concerned that we're putting in syntax which the majority
bootloader in use can't handle.

A real IVMD entry in hardware doesn't have the concept of multiple
device ranges, so I think comma ought to be the main list separator, and
I don't think we need ; at all.

~Andrew




* Re: [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag
  2021-08-26  7:26 ` [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
@ 2021-08-26 14:27   ` Andrew Cooper
  2021-08-26 14:33     ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2021-08-26 14:27 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Paul Durrant

On 26/08/2021 08:26, Jan Beulich wrote:
> IVHD entries may specify that ATS is to be blocked for a device or range
> of devices. Honor firmware telling us so.

It would be helpful if there was any explanation as to the purpose of
this bit.

It is presumably to do with SecureATS, but that works by accepting the
ATS translation and doing the pagewalk anyway.

>
> While adding respective checks I noticed that the 2nd conditional in
> amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
> Add the missing part of the condition there, as no good can come from
> enabling ATS on a device when the IOMMU is not capable of dealing with
> ATS requests.
>
> For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
> exhibit UB.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> TBD: I find the ordering in amd_iommu_disable_domain_device()
>      suspicious: amd_iommu_enable_domain_device() sets up the DTE first
>      and then enables ATS on the device. It would seem to me that
>      disabling would better be done the other way around (disable ATS on
>      device, then adjust DTE).

I'd hope that the worst which goes wrong, given the problematic order,
is a master abort.

But yes - ATS wants disabling on the device first, before the DTE is
updated.

~Andrew




* Re: [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option
  2021-08-26 14:08   ` Andrew Cooper
@ 2021-08-26 14:30     ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 14:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 16:08, Andrew Cooper wrote:
> On 26/08/2021 08:25, Jan Beulich wrote:
>> @@ -1523,6 +1523,31 @@ _dom0-iommu=map-inclusive_ - using both
>>  > `= <integer>`
>>  
>>  ### irq_vector_map (x86)
>> +
>> +### ivmd (x86)
>> +> `= <start>[-<end>][=<bdf1>[-<bdf1'>][,<bdf2>[-<bdf2'>][,...]]][;<start>...]`
>> +
>> +Define IVMD-like ranges that are missing from ACPI tables along with the
>> +device(s) they belong to, and use them for 1:1 mapping.  End addresses can be
>> +omitted when exactly one page is meant.  The ranges are inclusive when start
>> +and end are specified.  Note that only PCI segment 0 is supported at this time,
>> +but it is fine to specify it explicitly.
>> +
>> +'start' and 'end' values are page numbers (not full physical addresses),
>> +in hexadecimal format (can optionally be preceded by "0x").
>> +
>> +Omitting the optional (range of) BDF specifiers signals that the range is to
>> +be applied to all devices.
>> +
>> +Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be
>> +reserved, and devices 0:0:1a.0...0:0:1a.3 collectively require three pages
>> +(0xd5d46 thru 0xd5d48) to be reserved, one usage would be:
>> +
>> +ivmd=d5d45=0:1d.0;0xd5d46-0xd5d48=0:1a.0-0:1a.3
>> +
>> +Note: grub2 requires to escape or quote special characters, like ';' when
>> +multiple ranges are specified - refer to the grub2 documentation.
> 
> I'm slightly concerned that we're putting in syntax which the majority
> bootloader in use can't handle.

This matches RMRR handling, and I'd really like to keep the two as
similar as possible. Plus you can avoid the use of ; also by having
more than one "ivmd=" on the command line.
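
E.g. (equivalent to the ';' example in the doc, assuming the repeated options
simply accumulate):

    ivmd=d5d45=0:1d.0 ivmd=0xd5d46-0xd5d48=0:1a.0-0:1a.3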

> A real IVMD entry in hardware doesn't have the concept of multiple
> device ranges, so I think comma ought to be the main list separator, and
> I don't think we need ; at all.

Firmware would need to present two IVMD entries in such a case. On
the command line I think we should allow more compact representation.
Plus again - this is similar to "rmrr=".

Jan




* Re: [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag
  2021-08-26 14:27   ` Andrew Cooper
@ 2021-08-26 14:33     ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-08-26 14:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 16:27, Andrew Cooper wrote:
> On 26/08/2021 08:26, Jan Beulich wrote:
>> TBD: I find the ordering in amd_iommu_disable_domain_device()
>>      suspicious: amd_iommu_enable_domain_device() sets up the DTE first
>>      and then enables ATS on the device. It would seem to me that
>>      disabling would better be done the other way around (disable ATS on
>>      device, then adjust DTE).
> 
> I'd hope that the worst which goes wrong, given the problematic order,
> is a master abort.
> 
> But yes - ATS wants disabling on the device first, before the DTE is
> updated.

Okay, I'll add another patch.
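
I.e. roughly (a sketch only; helper names here are illustrative, not the
actual ones):

    /* Reverse of the enable path: */
    disable_ats_on_device(pdev);         /* 1) stop the device issuing ATS requests */
    clear_or_adjust_dte(iommu, req_id);  /* 2) only then update the DTE */
    flush_dte(iommu, req_id);            /* 3) and flush the stale DTE entry */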

Jan




* Re: [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved
  2021-08-26 12:31     ` Jan Beulich
@ 2021-09-21  7:37       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-09-21  7:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Paul Durrant, xen-devel

On 26.08.2021 14:31, Jan Beulich wrote:
> On 26.08.2021 14:10, Andrew Cooper wrote:
>> On 26/08/2021 08:23, Jan Beulich wrote:
>>> While the specification doesn't say so, just like for VT-d's RMRRs no
>>> good can come from these ranges being e.g. conventional RAM or entirely
>>> unmarked and hence usable for placing e.g. PCI device BARs. Check
>>> whether they are, and put in some limited effort to convert to reserved.
>>> (More advanced logic can be added if actual problems are found with this
>>> simplistic variant.)
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> Reviewed-by: Paul Durrant <paul@xen.org>
>>> ---
>>> v7: Re-base.
>>> v5: New.
>>>
>>> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
>>> @@ -384,6 +384,38 @@ static int __init parse_ivmd_block(const
>>>      AMD_IOMMU_DEBUG("IVMD Block: type %#x phys %#lx len %#lx\n",
>>>                      ivmd_block->header.type, start_addr, mem_length);
>>>  
>>> +    if ( !e820_all_mapped(base, limit + PAGE_SIZE, E820_RESERVED) )
>>> +    {
>>> +        paddr_t addr;
>>> +
>>> +        AMD_IOMMU_DEBUG("IVMD: [%lx,%lx) is not (entirely) in reserved memory\n",
>>> +                        base, limit + PAGE_SIZE);
>>> +
>>> +        for ( addr = base; addr <= limit; addr += PAGE_SIZE )
>>> +        {
>>> +            unsigned int type = page_get_ram_type(maddr_to_mfn(addr));
>>> +
>>> +            if ( type == RAM_TYPE_UNKNOWN )
>>> +            {
>>> +                if ( e820_add_range(&e820, addr, addr + PAGE_SIZE,
>>> +                                    E820_RESERVED) )
>>> +                    continue;
>>> +                AMD_IOMMU_DEBUG("IVMD Error: Page at %lx couldn't be reserved\n",
>>> +                                addr);
>>> +                return -EIO;
>>> +            }
>>> +
>>> +            /* Types which won't be handed out are considered good enough. */
>>> +            if ( !(type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
>>> +                           RAM_TYPE_UNUSABLE)) )
>>> +                continue;
>>> +
>>> +            AMD_IOMMU_DEBUG("IVMD Error: Page at %lx can't be converted\n",
>>> +                            addr);
>>
>> I think these print messages need to be more than just debug.  The first
>> one is a warning, whereas the final two are hard errors liable to impact
>> the correct running of the system.
> 
> Well, people would observe IOMMUs not getting put in use. I was following
> existing style in this regard on the assumption that in such an event
> people would (be told to) enable "iommu=debug". Hence ...
> 
>> Especially as you're putting them in to try and spot problem cases, they
>> should be visible by default for when we inevitably get bug reports to
>> xen-devel saying "something changed with passthrough in Xen 4.16".
> 
> ... I can convert to ordinary printk(), provided you're convinced the
> described model isn't reasonable and introducing a logging inconsistency
> is worth it.

Since I haven't heard back on any of the replies I gave to your comments,
I'm going to assume they were sufficient to address your concerns. I'm
therefore planning to put in the part of the series which has R-b perhaps
tomorrow (with the tiny adjustment(s) that I've made following your
comments, which iirc were just spelling issues). If you disagree, please
reply there.

As to the particular concern of yours towards visibility of error-like
log messages: There's e.g. a significant amount of pre-existing
'AMD_IOMMU_DEBUG("IVHD Error: ...", ...)'. Instead of introducing
inconsistencies here, I think I'll add a patch on top introducing
AMD_IOMMU_ERROR(), AMD_IOMMU_WARN(), and AMD_IOMMU_VERBOSE(), converting
present AMD_IOMMU_DEBUG() as I see fit (and then up for discussion).
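
Along the lines of (a sketch only; exact form to be settled in that patch):

    #define AMD_IOMMU_ERROR(fmt, args...)   printk(XENLOG_ERR "AMD-Vi: " fmt, ##args)
    #define AMD_IOMMU_WARN(fmt, args...)    printk(XENLOG_WARNING "AMD-Vi: " fmt, ##args)
    #define AMD_IOMMU_VERBOSE(fmt, args...) \
        do { \
            if ( iommu_verbose ) \
                printk(XENLOG_INFO "AMD-Vi: " fmt, ##args); \
        } while ( false )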

Jan




Thread overview: 24+ messages
2021-08-26  7:21 [PATCH v7] AMD/IOMMU: further work split from XSA-378 Jan Beulich
2021-08-26  7:23 ` [PATCH v7 1/8] AMD/IOMMU: check / convert IVMD ranges for being / to be reserved Jan Beulich
2021-08-26 12:10   ` Andrew Cooper
2021-08-26 12:31     ` Jan Beulich
2021-09-21  7:37       ` Jan Beulich
2021-08-26  7:23 ` [PATCH v7 2/8] AMD/IOMMU: obtain IVHD type to use earlier Jan Beulich
2021-08-26 12:30   ` Andrew Cooper
2021-08-26 12:33     ` Jan Beulich
2021-08-26  7:23 ` [PATCH v7 3/8] AMD/IOMMU: improve (extended) feature detection Jan Beulich
2021-08-26 13:02   ` Andrew Cooper
2021-08-26 13:13     ` Jan Beulich
2021-08-26  7:24 ` [PATCH v7 4/8] AMD/IOMMU: check IVMD ranges against host implementation limits Jan Beulich
2021-08-26 13:16   ` Andrew Cooper
2021-08-26 14:03     ` Jan Beulich
2021-08-26  7:24 ` [PATCH v7 5/8] AMD/IOMMU: also insert IVMD ranges into Dom0's page tables Jan Beulich
2021-08-26  7:25 ` [PATCH v7 6/8] AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map Jan Beulich
2021-08-26 13:24   ` Andrew Cooper
2021-08-26 14:05     ` Jan Beulich
2021-08-26  7:25 ` [PATCH v7 7/8] AMD/IOMMU: add "ivmd=" command line option Jan Beulich
2021-08-26 14:08   ` Andrew Cooper
2021-08-26 14:30     ` Jan Beulich
2021-08-26  7:26 ` [PATCH v7 8/8] AMD/IOMMU: respect AtsDisabled device flag Jan Beulich
2021-08-26 14:27   ` Andrew Cooper
2021-08-26 14:33     ` Jan Beulich
