* [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements
@ 2019-09-19 13:08 Jan Beulich
  2019-09-19 13:21 ` [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables Jan Beulich
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

The main goal continues to be to reduce the huge memory overhead we've
observed. Along the way a number of other issues were once again
noticed. All patches have now been tested on a Fam15 and a Fam17
system.

01: don't blindly allocate interrupt remapping tables
02: make phantom functions share interrupt remapping tables
03: x86/PCI: read maximum MSI vector count early
04: replace INTREMAP_ENTRIES
05: restrict interrupt remapping table sizes
06: tidy struct ivrs_mappings
07: allocate one device table per PCI segment
08: pre-fill all DTEs right after table allocation

Jan






* [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
@ 2019-09-19 13:21 ` Jan Beulich
  2019-09-23 15:27   ` Paul Durrant
  2019-09-25 13:23   ` Andrew Cooper
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 2/8] AMD/IOMMU: make phantom functions share " Jan Beulich
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

ACPI tables are free to list far more device coordinates than there are
actual devices. By delaying the table allocations for most cases, and
doing them only when an actual device is known to be present at a given
position, overall memory used for the tables goes down from over 500k
pages to just over 1k (on my system, given its ACPI table contents).

Tables continue to get allocated right away for special entries
(IO-APIC, HPET) as well as for alias IDs. While in the former case
that's simply because there may not be any device at a given position,
in the latter case this is to avoid having to introduce ref-counting of
table usage.

The change involves invoking
iterate_ivrs_mappings(amd_iommu_setup_device_table) a second time,
because the function now wants to be able to find PCI devices, which
isn't possible yet when IOMMU setup happens very early during x2APIC
mode setup. In this context amd_iommu_init_interrupt() gets renamed as
well.

The logic adjusting a DTE's interrupt remapping attributes also gets
changed, such that the lack of an IRT results in target-aborted rather
than un-remapped interrupts (should any occur).
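
For reference, the IntCtl encodings involved are listed below. The
FORWARDED and TRANSLATED values match the existing definitions in
amd-iommu-defs.h; that ABORTED is 0x0 is an assumption based on the
AMD IOMMU spec.

    /* DTE IntCtl field (sketch, for orientation only):
     *   0x0  IOMMU_DEV_TABLE_INT_CONTROL_ABORTED     target abort
     *   0x1  IOMMU_DEV_TABLE_INT_CONTROL_FORWARDED   forward un-remapped
     *   0x2  IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED  remap through the IRT
     */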

Note that for now phantom functions get separate IRTs allocated, as was
the case before.
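
A condensed sketch of the resulting allocation policy (simplified from
the add_ivrs_mapping_entry() hunk below):

    if ( iommu_intremap && !ivrs_mappings[alias_id].intremap_table )
    {
        if ( !amd_iommu_perdev_intremap )
            /* Global mode: attach the single shared table. */;
        else if ( alloc_irt )
            /* Special entries (IO-APIC, HPET) and alias IDs:
               allocate right away. */;
        /* Otherwise defer, until amd_iommu_add_device() or the second
           amd_iommu_setup_device_table() pass finds a real device. */
    }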

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: Acquire IOMMU lock in code added to amd_iommu_add_device(). Drop a
    pointless use of the conditional operator.
v5: New.
---
TBD: This retains prior (but suspicious) behavior of not calling
     amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
     Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
     changing.

Backporting note: This depends on b5fbe81196 "iommu / x86: move call to
scan_pci_devices() out of vendor code"!

---
 xen/drivers/passthrough/amd/iommu_acpi.c      |   65 +++++++++++++----------
 xen/drivers/passthrough/amd/iommu_init.c      |   73 ++++++++++++++++++++------
 xen/drivers/passthrough/amd/iommu_intr.c      |    4 -
 xen/drivers/passthrough/amd/iommu_map.c       |    5 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |   43 ++++++++++++++-
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2 
 6 files changed, 143 insertions(+), 49 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -53,7 +53,8 @@ union acpi_ivhd_device {
 };
 
 static void __init add_ivrs_mapping_entry(
-    u16 bdf, u16 alias_id, u8 flags, struct amd_iommu *iommu)
+    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
+    struct amd_iommu *iommu)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
 
@@ -69,27 +70,32 @@ static void __init add_ivrs_mapping_entr
     if ( iommu->bdf == bdf )
         return;
 
-    if ( !ivrs_mappings[alias_id].intremap_table )
+    /* Allocate interrupt remapping table if needed. */
+    if ( iommu_intremap && !ivrs_mappings[alias_id].intremap_table )
     {
-         /* allocate per-device interrupt remapping table */
-         if ( amd_iommu_perdev_intremap )
-             ivrs_mappings[alias_id].intremap_table =
-                 amd_iommu_alloc_intremap_table(
-                     iommu,
-                     &ivrs_mappings[alias_id].intremap_inuse);
-         else
-         {
-             if ( shared_intremap_table == NULL  )
-                 shared_intremap_table = amd_iommu_alloc_intremap_table(
-                     iommu,
-                     &shared_intremap_inuse);
-             ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
-             ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
-         }
-
-         if ( !ivrs_mappings[alias_id].intremap_table )
-             panic("No memory for %04x:%02x:%02x.%u's IRT\n", iommu->seg,
-                   PCI_BUS(alias_id), PCI_SLOT(alias_id), PCI_FUNC(alias_id));
+        if ( !amd_iommu_perdev_intremap )
+        {
+            if ( !shared_intremap_table )
+                shared_intremap_table = amd_iommu_alloc_intremap_table(
+                    iommu, &shared_intremap_inuse);
+
+            if ( !shared_intremap_table )
+                panic("No memory for shared IRT\n");
+
+            ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
+            ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
+        }
+        else if ( alloc_irt )
+        {
+            ivrs_mappings[alias_id].intremap_table =
+                amd_iommu_alloc_intremap_table(
+                    iommu, &ivrs_mappings[alias_id].intremap_inuse);
+
+            if ( !ivrs_mappings[alias_id].intremap_table )
+                panic("No memory for %04x:%02x:%02x.%u's IRT\n",
+                      iommu->seg, PCI_BUS(alias_id), PCI_SLOT(alias_id),
+                      PCI_FUNC(alias_id));
+        }
     }
 
     ivrs_mappings[alias_id].valid = true;
@@ -433,7 +439,8 @@ static u16 __init parse_ivhd_device_sele
         return 0;
     }
 
-    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, iommu);
+    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, false,
+                           iommu);
 
     return sizeof(*select);
 }
@@ -479,7 +486,7 @@ static u16 __init parse_ivhd_device_rang
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting,
-                               iommu);
+                               false, iommu);
 
     return dev_length;
 }
@@ -513,7 +520,8 @@ static u16 __init parse_ivhd_device_alia
 
     AMD_IOMMU_DEBUG(" Dev_Id Alias: %#x\n", alias_id);
 
-    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, iommu);
+    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, true,
+                           iommu);
 
     return dev_length;
 }
@@ -568,7 +576,7 @@ static u16 __init parse_ivhd_device_alia
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, alias_id, range->alias.header.data_setting,
-                               iommu);
+                               true, iommu);
 
     return dev_length;
 }
@@ -593,7 +601,7 @@ static u16 __init parse_ivhd_device_exte
         return 0;
     }
 
-    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, iommu);
+    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, false, iommu);
 
     return dev_length;
 }
@@ -640,7 +648,7 @@ static u16 __init parse_ivhd_device_exte
 
     for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
         add_ivrs_mapping_entry(bdf, bdf, range->extended.header.data_setting,
-                               iommu);
+                               false, iommu);
 
     return dev_length;
 }
@@ -733,7 +741,8 @@ static u16 __init parse_ivhd_device_spec
     AMD_IOMMU_DEBUG("IVHD Special: %04x:%02x:%02x.%u variety %#x handle %#x\n",
                     seg, PCI_BUS(bdf), PCI_SLOT(bdf), PCI_FUNC(bdf),
                     special->variety, special->handle);
-    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, iommu);
+    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, true,
+                           iommu);
 
     switch ( special->variety )
     {
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -30,6 +30,7 @@
 #include <xen/delay.h>
 
 static int __initdata nr_amd_iommus;
+static bool __initdata pci_init;
 
 static void do_amd_iommu_irq(unsigned long data);
 static DECLARE_SOFTIRQ_TASKLET(amd_iommu_irq_tasklet, do_amd_iommu_irq, 0);
@@ -1244,17 +1245,20 @@ static int __init amd_iommu_setup_device
 
     BUG_ON( (ivrs_bdf_entries == 0) );
 
-    /* allocate 'device table' on a 4K boundary */
-    device_table.alloc_size = PAGE_SIZE <<
-                              get_order_from_bytes(
-                              PAGE_ALIGN(ivrs_bdf_entries *
-                              IOMMU_DEV_TABLE_ENTRY_SIZE));
-    device_table.entries = device_table.alloc_size /
-                           IOMMU_DEV_TABLE_ENTRY_SIZE;
-
-    device_table.buffer = allocate_buffer(device_table.alloc_size,
-                                          "Device Table");
-    if  ( device_table.buffer == NULL )
+    if ( !device_table.buffer )
+    {
+        /* allocate 'device table' on a 4K boundary */
+        device_table.alloc_size = PAGE_SIZE <<
+                                  get_order_from_bytes(
+                                  PAGE_ALIGN(ivrs_bdf_entries *
+                                  IOMMU_DEV_TABLE_ENTRY_SIZE));
+        device_table.entries = device_table.alloc_size /
+                               IOMMU_DEV_TABLE_ENTRY_SIZE;
+
+        device_table.buffer = allocate_buffer(device_table.alloc_size,
+                                              "Device Table");
+    }
+    if ( !device_table.buffer )
         return -ENOMEM;
 
     /* Add device table entries */
@@ -1263,13 +1267,46 @@ static int __init amd_iommu_setup_device
         if ( ivrs_mappings[bdf].valid )
         {
             void *dte;
+            const struct pci_dev *pdev = NULL;
 
             /* add device table entry */
             dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
             iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
 
+            if ( iommu_intremap &&
+                 ivrs_mappings[bdf].dte_requestor_id == bdf &&
+                 !ivrs_mappings[bdf].intremap_table )
+            {
+                if ( !pci_init )
+                    continue;
+                pcidevs_lock();
+                pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
+                pcidevs_unlock();
+            }
+
+            if ( pdev )
+            {
+                unsigned int req_id = bdf;
+
+                do {
+                    ivrs_mappings[req_id].intremap_table =
+                        amd_iommu_alloc_intremap_table(
+                            ivrs_mappings[bdf].iommu,
+                            &ivrs_mappings[req_id].intremap_inuse);
+                    if ( !ivrs_mappings[req_id].intremap_table )
+                        return -ENOMEM;
+
+                    if ( !pdev->phantom_stride )
+                        break;
+                    req_id += pdev->phantom_stride;
+                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
+            }
+
             amd_iommu_set_intremap_table(
-                dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
+                dte,
+                ivrs_mappings[bdf].intremap_table
+                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
+                : 0,
                 iommu_intremap);
         }
     }
@@ -1402,7 +1439,8 @@ int __init amd_iommu_init(bool xt)
     if ( rc )
         goto error_out;
 
-    /* allocate and initialize a global device table shared by all iommus */
+    /* Allocate and initialize device table(s). */
+    pci_init = !xt;
     rc = iterate_ivrs_mappings(amd_iommu_setup_device_table);
     if ( rc )
         goto error_out;
@@ -1422,7 +1460,7 @@ int __init amd_iommu_init(bool xt)
         /*
          * Setting up of the IOMMU interrupts cannot occur yet at the (very
          * early) time we get here when enabling x2APIC mode. Suppress it
-         * here, and do it explicitly in amd_iommu_init_interrupt().
+         * here, and do it explicitly in amd_iommu_init_late().
          */
         rc = amd_iommu_init_one(iommu, !xt);
         if ( rc )
@@ -1436,11 +1474,16 @@ error_out:
     return rc;
 }
 
-int __init amd_iommu_init_interrupt(void)
+int __init amd_iommu_init_late(void)
 {
     struct amd_iommu *iommu;
     int rc = 0;
 
+    /* Further initialize the device table(s). */
+    pci_init = true;
+    if ( iommu_intremap )
+        rc = iterate_ivrs_mappings(amd_iommu_setup_device_table);
+
     for_each_amd_iommu ( iommu )
     {
         struct irq_desc *desc;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -789,7 +789,7 @@ void amd_iommu_read_msi_from_ire(
     }
 }
 
-int __init amd_iommu_free_intremap_table(
+int amd_iommu_free_intremap_table(
     const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
 {
     void **tblp;
@@ -814,7 +814,7 @@ int __init amd_iommu_free_intremap_table
     return 0;
 }
 
-void *__init amd_iommu_alloc_intremap_table(
+void *amd_iommu_alloc_intremap_table(
     const struct amd_iommu *iommu, unsigned long **inuse_map)
 {
     unsigned int order = intremap_table_order(iommu);
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -116,8 +116,9 @@ void __init amd_iommu_set_intremap_table
     struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
 {
     dte->it_root = intremap_ptr >> 6;
-    dte->int_tab_len = IOMMU_INTREMAP_ORDER;
-    dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+    dte->int_tab_len = intremap_ptr ? IOMMU_INTREMAP_ORDER : 0;
+    dte->int_ctl = intremap_ptr ? IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED
+                                : IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
     dte->ig = false; /* unmapped interrupts result in i/o page faults */
     dte->iv = valid;
 }
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -164,7 +164,7 @@ static int __init iov_detect(void)
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
-    if ( (init_done ? amd_iommu_init_interrupt()
+    if ( (init_done ? amd_iommu_init_late()
                     : amd_iommu_init(false)) != 0 )
     {
         printk("AMD-Vi: Error initialization\n");
@@ -428,6 +428,7 @@ static int amd_iommu_add_device(u8 devfn
 {
     struct amd_iommu *iommu;
     u16 bdf;
+    struct ivrs_mappings *ivrs_mappings;
 
     if ( !pdev->domain )
         return -EINVAL;
@@ -457,6 +458,36 @@ static int amd_iommu_add_device(u8 devfn
         return -ENODEV;
     }
 
+    ivrs_mappings = get_ivrs_mappings(pdev->seg);
+    bdf = PCI_BDF2(pdev->bus, devfn);
+    if ( !ivrs_mappings ||
+         !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
+        return -EPERM;
+
+    if ( iommu_intremap &&
+         ivrs_mappings[bdf].dte_requestor_id == bdf &&
+         !ivrs_mappings[bdf].intremap_table )
+    {
+        unsigned long flags;
+
+        ivrs_mappings[bdf].intremap_table =
+            amd_iommu_alloc_intremap_table(
+                iommu, &ivrs_mappings[bdf].intremap_inuse);
+        if ( !ivrs_mappings[bdf].intremap_table )
+            return -ENOMEM;
+
+        spin_lock_irqsave(&iommu->lock, flags);
+
+        amd_iommu_set_intremap_table(
+            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
+            virt_to_maddr(ivrs_mappings[bdf].intremap_table),
+            iommu_intremap);
+
+        amd_iommu_flush_device(iommu, bdf);
+
+        spin_unlock_irqrestore(&iommu->lock, flags);
+    }
+
     amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
     return 0;
 }
@@ -465,6 +496,8 @@ static int amd_iommu_remove_device(u8 de
 {
     struct amd_iommu *iommu;
     u16 bdf;
+    struct ivrs_mappings *ivrs_mappings;
+
     if ( !pdev->domain )
         return -EINVAL;
 
@@ -480,6 +513,14 @@ static int amd_iommu_remove_device(u8 de
     }
 
     amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev);
+
+    ivrs_mappings = get_ivrs_mappings(pdev->seg);
+    bdf = PCI_BDF2(pdev->bus, devfn);
+    if ( amd_iommu_perdev_intremap &&
+         ivrs_mappings[bdf].dte_requestor_id == bdf &&
+         ivrs_mappings[bdf].intremap_table )
+        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf]);
+
     return 0;
 }
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -50,7 +50,7 @@ void get_iommu_features(struct amd_iommu
 /* amd-iommu-init functions */
 int amd_iommu_prepare(bool xt);
 int amd_iommu_init(bool xt);
-int amd_iommu_init_interrupt(void);
+int amd_iommu_init_late(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
 int iov_adjust_irq_affinities(void);
 




* [Xen-devel] [PATCH v6 2/8] AMD/IOMMU: make phantom functions share interrupt remapping tables
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
  2019-09-19 13:21 ` [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables Jan Beulich
@ 2019-09-19 13:22 ` Jan Beulich
  2019-09-23 15:44   ` Paul Durrant
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early Jan Beulich
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

Rather than duplicating entries in amd_iommu_msi_msg_update_ire(), share
the tables. This mainly requires some care while freeing them, to avoid
freeing memory blocks twice.
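
A condensed sketch of the aliasing added below (run only when
pdev->phantom_stride is non-zero; the concrete stride is device
dependent, and phantom functions stay within the base function's slot):

    /* Base function at bdf owns the table; phantoms merely point at it. */
    for ( req_id = bdf + pdev->phantom_stride;
          PCI_SLOT(req_id) == pdev->sbdf.dev;
          req_id += pdev->phantom_stride )
    {
        ivrs_mappings[req_id].intremap_table =
            ivrs_mappings[bdf].intremap_table;
        ivrs_mappings[req_id].intremap_inuse =
            ivrs_mappings[bdf].intremap_inuse;
    }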

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v5: New.

---
 xen/drivers/passthrough/amd/iommu_init.c      |   43 +++++++++++++++---------
 xen/drivers/passthrough/amd/iommu_intr.c      |   45 +++++++++++++-------------
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |    2 -
 xen/include/asm-x86/amd-iommu.h               |    2 -
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2 -
 5 files changed, 53 insertions(+), 41 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1111,7 +1111,7 @@ static void __init amd_iommu_init_cleanu
         amd_iommu_free_intremap_table(list_first_entry(&amd_iommu_head,
                                                        struct amd_iommu,
                                                        list),
-                                      NULL);
+                                      NULL, 0);
 
     /* free amd iommu list */
     list_for_each_entry_safe ( iommu, next, &amd_iommu_head, list )
@@ -1176,7 +1176,7 @@ int iterate_ivrs_mappings(int (*handler)
 }
 
 int iterate_ivrs_entries(int (*handler)(const struct amd_iommu *,
-                                        struct ivrs_mappings *))
+                                        struct ivrs_mappings *, uint16_t bdf))
 {
     u16 seg = 0;
     int rc = 0;
@@ -1193,7 +1193,7 @@ int iterate_ivrs_entries(int (*handler)(
             const struct amd_iommu *iommu = map[bdf].iommu;
 
             if ( iommu && map[bdf].dte_requestor_id == bdf )
-                rc = handler(iommu, &map[bdf]);
+                rc = handler(iommu, &map[bdf], bdf);
         }
     } while ( !rc && ++seg );
 
@@ -1286,20 +1286,29 @@ static int __init amd_iommu_setup_device
 
             if ( pdev )
             {
-                unsigned int req_id = bdf;
-
-                do {
-                    ivrs_mappings[req_id].intremap_table =
-                        amd_iommu_alloc_intremap_table(
-                            ivrs_mappings[bdf].iommu,
-                            &ivrs_mappings[req_id].intremap_inuse);
-                    if ( !ivrs_mappings[req_id].intremap_table )
-                        return -ENOMEM;
-
-                    if ( !pdev->phantom_stride )
-                        break;
-                    req_id += pdev->phantom_stride;
-                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
+                ivrs_mappings[bdf].intremap_table =
+                    amd_iommu_alloc_intremap_table(
+                        ivrs_mappings[bdf].iommu,
+                        &ivrs_mappings[bdf].intremap_inuse);
+                if ( !ivrs_mappings[bdf].intremap_table )
+                    return -ENOMEM;
+
+                if ( pdev->phantom_stride )
+                {
+                    unsigned int req_id = bdf;
+
+                    for ( ; ; )
+                    {
+                        req_id += pdev->phantom_stride;
+                        if ( PCI_SLOT(req_id) != pdev->sbdf.dev )
+                            break;
+
+                        ivrs_mappings[req_id].intremap_table =
+                            ivrs_mappings[bdf].intremap_table;
+                        ivrs_mappings[req_id].intremap_inuse =
+                            ivrs_mappings[bdf].intremap_inuse;
+                    }
+                }
             }
 
             amd_iommu_set_intremap_table(
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -711,33 +711,20 @@ int amd_iommu_msi_msg_update_ire(
 
     if ( msi_desc->remap_index >= 0 && !msg )
     {
-        do {
-            update_intremap_entry_from_msi_msg(iommu, bdf, nr,
-                                               &msi_desc->remap_index,
-                                               NULL, NULL);
-            if ( !pdev || !pdev->phantom_stride )
-                break;
-            bdf += pdev->phantom_stride;
-        } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );
+        update_intremap_entry_from_msi_msg(iommu, bdf, nr,
+                                           &msi_desc->remap_index,
+                                           NULL, NULL);
 
         for ( i = 0; i < nr; ++i )
             msi_desc[i].remap_index = -1;
-        if ( pdev )
-            bdf = PCI_BDF2(pdev->bus, pdev->devfn);
     }
 
     if ( !msg )
         return 0;
 
-    do {
-        rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
-                                                &msi_desc->remap_index,
-                                                msg, &data);
-        if ( rc || !pdev || !pdev->phantom_stride )
-            break;
-        bdf += pdev->phantom_stride;
-    } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );
-
+    rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
+                                            &msi_desc->remap_index,
+                                            msg, &data);
     if ( !rc )
     {
         for ( i = 1; i < nr; ++i )
@@ -790,12 +777,27 @@ void amd_iommu_read_msi_from_ire(
 }
 
 int amd_iommu_free_intremap_table(
-    const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
+    const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping,
+    uint16_t bdf)
 {
     void **tblp;
 
     if ( ivrs_mapping )
     {
+        unsigned int i;
+
+        /*
+         * PCI device phantom functions use the same tables as their "base"
+         * function: Look ahead to zap the pointers.
+         */
+        for ( i = 1; PCI_FUNC(bdf + i) && bdf + i < ivrs_bdf_entries; ++i )
+            if ( ivrs_mapping[i].intremap_table ==
+                 ivrs_mapping->intremap_table )
+            {
+                ivrs_mapping[i].intremap_table = NULL;
+                ivrs_mapping[i].intremap_inuse = NULL;
+            }
+
         XFREE(ivrs_mapping->intremap_inuse);
         tblp = &ivrs_mapping->intremap_table;
     }
@@ -934,7 +936,8 @@ static void dump_intremap_table(const st
 }
 
 static int dump_intremap_mapping(const struct amd_iommu *iommu,
-                                 struct ivrs_mappings *ivrs_mapping)
+                                 struct ivrs_mappings *ivrs_mapping,
+                                 uint16_t unused)
 {
     unsigned long flags;
 
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -519,7 +519,7 @@ static int amd_iommu_remove_device(u8 de
     if ( amd_iommu_perdev_intremap &&
          ivrs_mappings[bdf].dte_requestor_id == bdf &&
          ivrs_mappings[bdf].intremap_table )
-        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf]);
+        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf], bdf);
 
     return 0;
 }
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -131,7 +131,7 @@ extern u8 ivhd_type;
 struct ivrs_mappings *get_ivrs_mappings(u16 seg);
 int iterate_ivrs_mappings(int (*)(u16 seg, struct ivrs_mappings *));
 int iterate_ivrs_entries(int (*)(const struct amd_iommu *,
-                                 struct ivrs_mappings *));
+                                 struct ivrs_mappings *, uint16_t));
 
 /* iommu tables in guest space */
 struct mmio_reg {
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -101,7 +101,7 @@ int amd_iommu_setup_ioapic_remapping(voi
 void *amd_iommu_alloc_intremap_table(
     const struct amd_iommu *, unsigned long **);
 int amd_iommu_free_intremap_table(
-    const struct amd_iommu *, struct ivrs_mappings *);
+    const struct amd_iommu *, struct ivrs_mappings *, uint16_t);
 void amd_iommu_ioapic_update_ire(
     unsigned int apic, unsigned int reg, unsigned int value);
 unsigned int amd_iommu_read_ioapic_from_ire(




* [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
  2019-09-19 13:21 ` [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables Jan Beulich
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 2/8] AMD/IOMMU: make phantom functions share " Jan Beulich
@ 2019-09-19 13:22 ` Jan Beulich
  2019-09-23 14:22   ` Roger Pau Monné
                     ` (2 more replies)
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 4/8] AMD/IOMMU: replace INTREMAP_ENTRIES Jan Beulich
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Suravee Suthikulpanit, Roger Pau Monné

Rather than doing this every time we set up interrupts for a device
anew (and in several places), fill this invariant field right after
allocating struct pci_dev.
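
A minimal sketch of the caching added to alloc_pdev() below (the decode
via multi_msi_capable() follows the PCI spec's Multiple Message Capable
field, bits 3:1 of Message Control):

    /* Read the invariant maximum vector count once, at allocation time. */
    pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn),
                              PCI_FUNC(devfn), PCI_CAP_ID_MSI);
    if ( pos )
    {
        uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));

        pdev->msi_maxvec = multi_msi_capable(ctrl); /* 1 << ((ctrl >> 1) & 7) */
    }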

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: New.

---
 xen/arch/x86/msi.c            |   13 +++++--------
 xen/drivers/passthrough/pci.c |   10 ++++++++++
 xen/drivers/vpci/msi.c        |    9 ++++-----
 xen/include/xen/pci.h         |    3 ++-
 xen/include/xen/vpci.h        |    6 ++----
 5 files changed, 23 insertions(+), 18 deletions(-)

--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -664,7 +664,7 @@ static int msi_capability_init(struct pc
 {
     struct msi_desc *entry;
     int pos;
-    unsigned int i, maxvec, mpos;
+    unsigned int i, mpos;
     u16 control, seg = dev->seg;
     u8 bus = dev->bus;
     u8 slot = PCI_SLOT(dev->devfn);
@@ -675,9 +675,8 @@ static int msi_capability_init(struct pc
     if ( !pos )
         return -ENODEV;
     control = pci_conf_read16(dev->sbdf, msi_control_reg(pos));
-    maxvec = multi_msi_capable(control);
-    if ( nvec > maxvec )
-        return maxvec;
+    if ( nvec > dev->msi_maxvec )
+        return dev->msi_maxvec;
     control &= ~PCI_MSI_FLAGS_QSIZE;
     multi_msi_enable(control, nvec);
 
@@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
 
         /* All MSIs are unmasked by default, Mask them all */
         maskbits = pci_conf_read32(dev->sbdf, mpos);
-        maskbits |= ~(u32)0 >> (32 - maxvec);
+        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
         pci_conf_write32(dev->sbdf, mpos, maskbits);
     }
     list_add_tail(&entry->list, &dev->msi_list);
@@ -1284,7 +1283,6 @@ int pci_msi_conf_write_intercept(struct
     entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI);
     if ( entry && entry->msi_attrib.maskbit )
     {
-        uint16_t cntl;
         uint32_t unused;
         unsigned int nvec = entry->msi.nvec;
 
@@ -1297,8 +1295,7 @@ int pci_msi_conf_write_intercept(struct
         if ( reg < entry->msi.mpos || reg >= entry->msi.mpos + 4 || size != 4 )
             return -EACCES;
 
-        cntl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
-        unused = ~(uint32_t)0 >> (32 - multi_msi_capable(cntl));
+        unused = ~(uint32_t)0 >> (32 - pdev->msi_maxvec);
         for ( pos = 0; pos < nvec; ++pos, ++entry )
         {
             entry->msi_attrib.guest_masked =
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -340,6 +340,16 @@ static struct pci_dev *alloc_pdev(struct
     pdev->domain = NULL;
     INIT_LIST_HEAD(&pdev->msi_list);
 
+
+    pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+                              PCI_CAP_ID_MSI);
+    if ( pos )
+    {
+        uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
+
+        pdev->msi_maxvec = multi_msi_capable(ctrl);
+    }
+
     pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                               PCI_CAP_ID_MSIX);
     if ( pos )
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -27,7 +27,7 @@ static uint32_t control_read(const struc
 {
     const struct vpci_msi *msi = data;
 
-    return MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK) |
+    return MASK_INSR(fls(pdev->msi_maxvec) - 1, PCI_MSI_FLAGS_QMASK) |
            MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
            (msi->enabled ? PCI_MSI_FLAGS_ENABLE : 0) |
            (msi->masking ? PCI_MSI_FLAGS_MASKBIT : 0) |
@@ -40,7 +40,7 @@ static void control_write(const struct p
     struct vpci_msi *msi = data;
     unsigned int vectors = min_t(uint8_t,
                                  1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
-                                 msi->max_vectors);
+                                 pdev->msi_maxvec);
     bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
 
     /*
@@ -215,8 +215,7 @@ static int init_msi(struct pci_dev *pdev
      * FIXME: I've only been able to test this code with devices using a single
      * MSI interrupt and no mask register.
      */
-    pdev->vpci->msi->max_vectors = multi_msi_capable(control);
-    ASSERT(pdev->vpci->msi->max_vectors <= 32);
+    ASSERT(pdev->msi_maxvec <= 32);
 
     /* The multiple message enable is 0 after reset (1 message enabled). */
     pdev->vpci->msi->vectors = 1;
@@ -298,7 +297,7 @@ void vpci_dump_msi(void)
                 if ( msi->masking )
                     printk(" mask=%08x", msi->mask);
                 printk(" vectors max: %u enabled: %u\n",
-                       msi->max_vectors, msi->vectors);
+                       pdev->msi_maxvec, msi->vectors);
 
                 vpci_msi_arch_print(msi);
             }
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -94,7 +94,8 @@ struct pci_dev {
         pci_sbdf_t sbdf;
     };
 
-    u8 phantom_stride;
+    uint8_t msi_maxvec;
+    uint8_t phantom_stride;
 
     nodeid_t node; /* NUMA node */
 
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -99,14 +99,12 @@ struct vpci {
         uint32_t mask;
         /* Data. */
         uint16_t data;
-        /* Maximum number of vectors supported by the device. */
-        uint8_t max_vectors : 6;
+        /* Number of vectors configured. */
+        uint8_t vectors     : 6;
         /* Supports per-vector masking? */
         bool masking        : 1;
         /* 64-bit address capable? */
         bool address64      : 1;
-        /* Number of vectors configured. */
-        uint8_t vectors     : 6;
         /* Enabled? */
         bool enabled        : 1;
         /* Arch-specific data. */




* [Xen-devel] [PATCH v6 4/8] AMD/IOMMU: replace INTREMAP_ENTRIES
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
                   ` (2 preceding siblings ...)
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early Jan Beulich
@ 2019-09-19 13:23 ` Jan Beulich
  2019-09-23 16:13   ` Paul Durrant
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes Jan Beulich
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

Prepare for the number of entries to be lower than the maximum
possible, by separating checks against the maximum size from ones
against the actual size. For caller-side simplicity have
alloc_intremap_entry() return the maximum possible value upon
allocation failure, rather than the first just-out-of-bounds one.

Have the involved functions take all the subsequently needed arguments
here already, to reduce code churn in the patch actually making the
allocation size dynamic.
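
Callers then check uniformly against the maximum, e.g. (as in the MSI
path below):

    offset = alloc_intremap_entry(iommu, bdf, nr);
    if ( offset >= INTREMAP_MAX_ENTRIES )
    {
        /* Allocation failed - no free slot in this device's table. */
        spin_unlock_irqrestore(lock, flags);
        return -ENOSPC;
    }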

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v5: New.

---
 xen/drivers/passthrough/amd/iommu_intr.c      |   93 +++++++++++++++-----------
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2 
 2 files changed, 59 insertions(+), 36 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -69,7 +69,7 @@ union irte_cptr {
     const union irte128 *ptr128;
 } __transparent__;
 
-#define INTREMAP_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
+#define INTREMAP_MAX_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
 
 struct ioapic_sbdf ioapic_sbdf[MAX_IO_APICS];
 struct hpet_sbdf hpet_sbdf;
@@ -83,8 +83,20 @@ static void dump_intremap_tables(unsigne
 static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
 {
     return iommu->ctrl.ga_en
-           ? get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte128))
-           : get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte32));
+           ? get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte128))
+           : get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte32));
+}
+
+unsigned int amd_iommu_intremap_table_order(
+    const void *irt, const struct amd_iommu *iommu)
+{
+    return IOMMU_INTREMAP_ORDER;
+}
+
+static unsigned int intremap_table_entries(
+    const void *irt, const struct amd_iommu *iommu)
+{
+    return 1u << amd_iommu_intremap_table_order(irt, iommu);
 }
 
 unsigned int ioapic_id_to_index(unsigned int apic_id)
@@ -122,20 +134,24 @@ static int get_intremap_requestor_id(int
     return get_ivrs_mappings(seg)[bdf].dte_requestor_id;
 }
 
-static unsigned int alloc_intremap_entry(int seg, int bdf, unsigned int nr)
+static unsigned int alloc_intremap_entry(const struct amd_iommu *iommu,
+                                         unsigned int bdf, unsigned int nr)
 {
-    unsigned long *inuse = get_ivrs_mappings(seg)[bdf].intremap_inuse;
-    unsigned int slot = find_first_zero_bit(inuse, INTREMAP_ENTRIES);
+    const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
+    unsigned long *inuse = ivrs_mappings[bdf].intremap_inuse;
+    unsigned int nr_ents =
+        intremap_table_entries(ivrs_mappings[bdf].intremap_table, iommu);
+    unsigned int slot = find_first_zero_bit(inuse, nr_ents);
 
     for ( ; ; )
     {
         unsigned int end;
 
-        if ( slot >= INTREMAP_ENTRIES )
+        if ( slot >= nr_ents )
             break;
-        end = find_next_bit(inuse, INTREMAP_ENTRIES, slot + 1);
-        if ( end > INTREMAP_ENTRIES )
-            end = INTREMAP_ENTRIES;
+        end = find_next_bit(inuse, nr_ents, slot + 1);
+        if ( end > nr_ents )
+            end = nr_ents;
         slot = (slot + nr - 1) & ~(nr - 1);
         if ( slot + nr <= end )
         {
@@ -144,12 +160,12 @@ static unsigned int alloc_intremap_entry
             break;
         }
         slot = (end + nr) & ~(nr - 1);
-        if ( slot >= INTREMAP_ENTRIES )
+        if ( slot >= nr_ents )
             break;
-        slot = find_next_zero_bit(inuse, INTREMAP_ENTRIES, slot);
+        slot = find_next_zero_bit(inuse, nr_ents, slot);
     }
 
-    return slot;
+    return slot < nr_ents ? slot : INTREMAP_MAX_ENTRIES;
 }
 
 static union irte_ptr get_intremap_entry(const struct amd_iommu *iommu,
@@ -159,7 +175,7 @@ static union irte_ptr get_intremap_entry
         .ptr = get_ivrs_mappings(iommu->seg)[bdf].intremap_table
     };
 
-    ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
+    ASSERT(table.ptr && (index < intremap_table_entries(table.ptr, iommu)));
 
     if ( iommu->ctrl.ga_en )
         table.ptr128 += index;
@@ -279,10 +295,10 @@ static int update_intremap_entry_from_io
     spin_lock_irqsave(lock, flags);
 
     offset = *index;
-    if ( offset >= INTREMAP_ENTRIES )
+    if ( offset >= INTREMAP_MAX_ENTRIES )
     {
-        offset = alloc_intremap_entry(iommu->seg, req_id, 1);
-        if ( offset >= INTREMAP_ENTRIES )
+        offset = alloc_intremap_entry(iommu, req_id, 1);
+        if ( offset >= INTREMAP_MAX_ENTRIES )
         {
             spin_unlock_irqrestore(lock, flags);
             rte->mask = 1;
@@ -400,8 +416,8 @@ int __init amd_iommu_setup_ioapic_remapp
             }
 
             spin_lock_irqsave(lock, flags);
-            offset = alloc_intremap_entry(seg, req_id, 1);
-            BUG_ON(offset >= INTREMAP_ENTRIES);
+            offset = alloc_intremap_entry(iommu, req_id, 1);
+            BUG_ON(offset >= INTREMAP_MAX_ENTRIES);
             entry = get_intremap_entry(iommu, req_id, offset);
             update_intremap_entry(iommu, entry, vector,
                                   delivery_mode, dest_mode, dest);
@@ -476,7 +492,7 @@ void amd_iommu_ioapic_update_ire(
         *(((u32 *)&new_rte) + 1) = value;
     }
 
-    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
+    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_MAX_ENTRIES )
     {
         ASSERT(saved_mask);
 
@@ -548,7 +564,7 @@ unsigned int amd_iommu_read_ioapic_from_
         return val;
 
     offset = ioapic_sbdf[idx].pin_2_idx[pin];
-    if ( offset >= INTREMAP_ENTRIES )
+    if ( offset >= INTREMAP_MAX_ENTRIES )
         return val;
 
     seg = ioapic_sbdf[idx].seg;
@@ -561,8 +577,8 @@ unsigned int amd_iommu_read_ioapic_from_
 
     if ( !(reg & 1) )
     {
-        ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
-        val &= ~(INTREMAP_ENTRIES - 1);
+        ASSERT(offset == (val & (INTREMAP_MAX_ENTRIES - 1)));
+        val &= ~(INTREMAP_MAX_ENTRIES - 1);
         /* The IntType fields match for both formats. */
         val |= MASK_INSR(entry.ptr32->flds.int_type,
                          IO_APIC_REDIR_DELIV_MODE_MASK);
@@ -622,11 +638,11 @@ static int update_intremap_entry_from_ms
         dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
 
     offset = *remap_index;
-    if ( offset >= INTREMAP_ENTRIES )
+    if ( offset >= INTREMAP_MAX_ENTRIES )
     {
         ASSERT(nr);
-        offset = alloc_intremap_entry(iommu->seg, bdf, nr);
-        if ( offset >= INTREMAP_ENTRIES )
+        offset = alloc_intremap_entry(iommu, bdf, nr);
+        if ( offset >= INTREMAP_MAX_ENTRIES )
         {
             spin_unlock_irqrestore(lock, flags);
             return -ENOSPC;
@@ -654,7 +670,7 @@ static int update_intremap_entry_from_ms
     update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
-    *data = (msg->data & ~(INTREMAP_ENTRIES - 1)) | offset;
+    *data = (msg->data & ~(INTREMAP_MAX_ENTRIES - 1)) | offset;
 
     /*
      * In some special cases, a pci-e device(e.g SATA controller in IDE mode)
@@ -738,7 +754,7 @@ int amd_iommu_msi_msg_update_ire(
 void amd_iommu_read_msi_from_ire(
     struct msi_desc *msi_desc, struct msi_msg *msg)
 {
-    unsigned int offset = msg->data & (INTREMAP_ENTRIES - 1);
+    unsigned int offset = msg->data & (INTREMAP_MAX_ENTRIES - 1);
     const struct pci_dev *pdev = msi_desc->dev;
     u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
     u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
@@ -758,7 +774,7 @@ void amd_iommu_read_msi_from_ire(
         offset |= nr;
     }
 
-    msg->data &= ~(INTREMAP_ENTRIES - 1);
+    msg->data &= ~(INTREMAP_MAX_ENTRIES - 1);
     /* The IntType fields match for both formats. */
     msg->data |= MASK_INSR(entry.ptr32->flds.int_type,
                            MSI_DATA_DELIVERY_MODE_MASK);
@@ -824,8 +840,9 @@ void *amd_iommu_alloc_intremap_table(
 
     if ( tb )
     {
-        *inuse_map = xzalloc_array(unsigned long,
-                                   BITS_TO_LONGS(INTREMAP_ENTRIES));
+        unsigned int nr = intremap_table_entries(tb, iommu);
+
+        *inuse_map = xzalloc_array(unsigned long, BITS_TO_LONGS(nr));
         if ( *inuse_map )
             memset(tb, 0, PAGE_SIZE << order);
         else
@@ -869,6 +886,7 @@ bool __init iov_supports_xt(void)
 
 int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
 {
+    const struct amd_iommu *iommu;
     spinlock_t *lock;
     unsigned long flags;
     int rc = 0;
@@ -886,12 +904,15 @@ int __init amd_setup_hpet_msi(struct msi
         return -ENODEV;
     }
 
+    iommu = find_iommu_for_device(hpet_sbdf.seg, hpet_sbdf.bdf);
+    if ( !iommu )
+        return -ENXIO;
+
     lock = get_intremap_lock(hpet_sbdf.seg, hpet_sbdf.bdf);
     spin_lock_irqsave(lock, flags);
 
-    msi_desc->remap_index = alloc_intremap_entry(hpet_sbdf.seg,
-                                                 hpet_sbdf.bdf, 1);
-    if ( msi_desc->remap_index >= INTREMAP_ENTRIES )
+    msi_desc->remap_index = alloc_intremap_entry(iommu, hpet_sbdf.bdf, 1);
+    if ( msi_desc->remap_index >= INTREMAP_MAX_ENTRIES )
     {
         msi_desc->remap_index = -1;
         rc = -ENXIO;
@@ -906,12 +927,12 @@ static void dump_intremap_table(const st
                                 union irte_cptr tbl,
                                 const struct ivrs_mappings *ivrs_mapping)
 {
-    unsigned int count;
+    unsigned int count, nr = intremap_table_entries(tbl.ptr, iommu);
 
     if ( !tbl.ptr )
         return;
 
-    for ( count = 0; count < INTREMAP_ENTRIES; count++ )
+    for ( count = 0; count < nr; count++ )
     {
         if ( iommu->ctrl.ga_en
              ? !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1]
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -102,6 +102,8 @@ void *amd_iommu_alloc_intremap_table(
     const struct amd_iommu *, unsigned long **);
 int amd_iommu_free_intremap_table(
     const struct amd_iommu *, struct ivrs_mappings *, uint16_t);
+unsigned int amd_iommu_intremap_table_order(
+    const void *irt, const struct amd_iommu *iommu);
 void amd_iommu_ioapic_update_ire(
     unsigned int apic, unsigned int reg, unsigned int value);
 unsigned int amd_iommu_read_ioapic_from_ire(




* [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
                   ` (3 preceding siblings ...)
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 4/8] AMD/IOMMU: replace INTREMAP_ENTRIES Jan Beulich
@ 2019-09-19 13:23 ` Jan Beulich
  2019-09-23 16:22   ` Paul Durrant
  2019-09-25 13:40   ` Andrew Cooper
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings Jan Beulich
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

There's no point setting up tables with more space than a PCI device can
use. For both MSI and MSI-X we can determine how many interrupts could
be set up at most. Tables allocated during ACPI table parsing, however,
will (for now at least) continue to be set up to have maximum size.

Note that, unless we wanted to use sub-page allocations here, there's
no point checking whether both MSI and MSI-X are supported by a device -
an order-0 allocation fits the dual case in any event, no matter that
the MSI-X vector count may be smaller than the MSI one.
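
To make the size arithmetic concrete (a worked example; the int_tab_len
formula is the one introduced in iommu_intr.c below, and the IRTE sizes
follow the irte128/irte32 formats):

    order-0, GA mode:  int_tab_len = 0 + 12 - 4 = 8   ->  2^8  = 256 IRTEs of 16 bytes
    order-0, no GA:    int_tab_len = 0 + 12 - 2 = 10  ->  2^10 = 1024 IRTEs of 4 bytes

Since MSI allows at most 32 vectors, any table sized for a device's
MSI-X entry count (and hence at least a page) also covers its MSI
needs.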

On my Rome system this reduces space needed from just over 1k pages to
about 125.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: Don't allocate any IRT at all when device is neither MSI-X nor
    MSI-capable. Re-base over changes earlier in this series.
v5: New.

---
 xen/drivers/passthrough/amd/iommu_acpi.c      |    4 +-
 xen/drivers/passthrough/amd/iommu_init.c      |   13 ++++-----
 xen/drivers/passthrough/amd/iommu_intr.c      |   36 +++++++++++++++-----------
 xen/drivers/passthrough/amd/iommu_map.c       |   20 ++++++++++----
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |   18 +++++++------
 xen/include/asm-x86/hvm/svm/amd-iommu-defs.h  |    3 --
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    5 ++-
 7 files changed, 59 insertions(+), 40 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -77,7 +77,7 @@ static void __init add_ivrs_mapping_entr
         {
             if ( !shared_intremap_table )
                 shared_intremap_table = amd_iommu_alloc_intremap_table(
-                    iommu, &shared_intremap_inuse);
+                    iommu, &shared_intremap_inuse, 0);
 
             if ( !shared_intremap_table )
                 panic("No memory for shared IRT\n");
@@ -89,7 +89,7 @@ static void __init add_ivrs_mapping_entr
         {
             ivrs_mappings[alias_id].intremap_table =
                 amd_iommu_alloc_intremap_table(
-                    iommu, &ivrs_mappings[alias_id].intremap_inuse);
+                    iommu, &ivrs_mappings[alias_id].intremap_inuse, 0);
 
             if ( !ivrs_mappings[alias_id].intremap_table )
                 panic("No memory for %04x:%02x:%02x.%u's IRT\n",
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1284,12 +1284,14 @@ static int __init amd_iommu_setup_device
                 pcidevs_unlock();
             }
 
-            if ( pdev )
+            if ( pdev && (pdev->msix || pdev->msi_maxvec) )
             {
                 ivrs_mappings[bdf].intremap_table =
                     amd_iommu_alloc_intremap_table(
                         ivrs_mappings[bdf].iommu,
-                        &ivrs_mappings[bdf].intremap_inuse);
+                        &ivrs_mappings[bdf].intremap_inuse,
+                        pdev->msix ? pdev->msix->nr_entries
+                                   : pdev->msi_maxvec);
                 if ( !ivrs_mappings[bdf].intremap_table )
                     return -ENOMEM;
 
@@ -1312,11 +1314,8 @@ static int __init amd_iommu_setup_device
             }
 
             amd_iommu_set_intremap_table(
-                dte,
-                ivrs_mappings[bdf].intremap_table
-                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
-                : 0,
-                iommu_intremap);
+                dte, ivrs_mappings[bdf].intremap_table,
+                ivrs_mappings[bdf].iommu, iommu_intremap);
         }
     }
 
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -69,7 +69,8 @@ union irte_cptr {
     const union irte128 *ptr128;
 } __transparent__;
 
-#define INTREMAP_MAX_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
+#define INTREMAP_MAX_ORDER   0xB
+#define INTREMAP_MAX_ENTRIES (1 << INTREMAP_MAX_ORDER)
 
 struct ioapic_sbdf ioapic_sbdf[MAX_IO_APICS];
 struct hpet_sbdf hpet_sbdf;
@@ -80,17 +81,13 @@ unsigned int nr_ioapic_sbdf;
 
 static void dump_intremap_tables(unsigned char key);
 
-static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
-{
-    return iommu->ctrl.ga_en
-           ? get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte128))
-           : get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte32));
-}
+#define intremap_page_order(irt) PFN_ORDER(virt_to_page(irt))
 
 unsigned int amd_iommu_intremap_table_order(
     const void *irt, const struct amd_iommu *iommu)
 {
-    return IOMMU_INTREMAP_ORDER;
+    return intremap_page_order(irt) + PAGE_SHIFT -
+           (iommu->ctrl.ga_en ? 4 : 2);
 }
 
 static unsigned int intremap_table_entries(
@@ -825,7 +822,10 @@ int amd_iommu_free_intremap_table(
 
     if ( *tblp )
     {
-        __free_amd_iommu_tables(*tblp, intremap_table_order(iommu));
+        unsigned int order = intremap_page_order(*tblp);
+
+        intremap_page_order(*tblp) = 0;
+        __free_amd_iommu_tables(*tblp, order);
         *tblp = NULL;
     }
 
@@ -833,15 +833,23 @@ int amd_iommu_free_intremap_table(
 }
 
 void *amd_iommu_alloc_intremap_table(
-    const struct amd_iommu *iommu, unsigned long **inuse_map)
+    const struct amd_iommu *iommu, unsigned long **inuse_map, unsigned int nr)
 {
-    unsigned int order = intremap_table_order(iommu);
-    void *tb = __alloc_amd_iommu_tables(order);
+    unsigned int order;
+    void *tb;
 
+    if ( !nr )
+        nr = INTREMAP_MAX_ENTRIES;
+
+    order = iommu->ctrl.ga_en
+            ? get_order_from_bytes(nr * sizeof(union irte128))
+            : get_order_from_bytes(nr * sizeof(union irte32));
+
+    tb = __alloc_amd_iommu_tables(order);
     if ( tb )
     {
-        unsigned int nr = intremap_table_entries(tb, iommu);
-
+        intremap_page_order(tb) = order;
+        nr = intremap_table_entries(tb, iommu);
         *inuse_map = xzalloc_array(unsigned long, BITS_TO_LONGS(nr));
         if ( *inuse_map )
             memset(tb, 0, PAGE_SIZE << order);
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -113,12 +113,22 @@ void amd_iommu_set_root_page_table(struc
 }
 
 void __init amd_iommu_set_intremap_table(
-    struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
+    struct amd_iommu_dte *dte, const void *ptr,
+    const struct amd_iommu *iommu, bool valid)
 {
-    dte->it_root = intremap_ptr >> 6;
-    dte->int_tab_len = intremap_ptr ? IOMMU_INTREMAP_ORDER : 0;
-    dte->int_ctl = intremap_ptr ? IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED
-                                : IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
+    if ( ptr )
+    {
+        dte->it_root = virt_to_maddr(ptr) >> 6;
+        dte->int_tab_len = amd_iommu_intremap_table_order(ptr, iommu);
+        dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
+    }
+    else
+    {
+        dte->it_root = 0;
+        dte->int_tab_len = 0;
+        dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
+    }
+
     dte->ig = false; /* unmapped interrupts result in i/o page faults */
     dte->iv = valid;
 }
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -470,18 +470,22 @@ static int amd_iommu_add_device(u8 devfn
     {
         unsigned long flags;
 
-        ivrs_mappings[bdf].intremap_table =
-            amd_iommu_alloc_intremap_table(
-                iommu, &ivrs_mappings[bdf].intremap_inuse);
-        if ( !ivrs_mappings[bdf].intremap_table )
-            return -ENOMEM;
+        if ( pdev->msix || pdev->msi_maxvec )
+        {
+            ivrs_mappings[bdf].intremap_table =
+                amd_iommu_alloc_intremap_table(
+                    iommu, &ivrs_mappings[bdf].intremap_inuse,
+                    pdev->msix ? pdev->msix->nr_entries
+                               : pdev->msi_maxvec);
+            if ( !ivrs_mappings[bdf].intremap_table )
+                return -ENOMEM;
+        }
 
         spin_lock_irqsave(&iommu->lock, flags);
 
         amd_iommu_set_intremap_table(
             iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
-            virt_to_maddr(ivrs_mappings[bdf].intremap_table),
-            iommu_intremap);
+            ivrs_mappings[bdf].intremap_table, iommu, iommu_intremap);
 
         amd_iommu_flush_device(iommu, bdf);
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -107,9 +107,6 @@
 #define IOMMU_DEV_TABLE_INT_CONTROL_FORWARDED	0x1
 #define IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED	0x2
 
-/* For now, we always allocate the maximum: 2048 entries. */
-#define IOMMU_INTREMAP_ORDER			0xB
-
 struct amd_iommu_dte {
     /* 0 - 63 */
     bool v:1;
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -72,7 +72,8 @@ int __must_check amd_iommu_flush_iotlb_a
 /* device table functions */
 int get_dma_requestor_id(uint16_t seg, uint16_t bdf);
 void amd_iommu_set_intremap_table(struct amd_iommu_dte *dte,
-                                  uint64_t intremap_ptr,
+                                  const void *ptr,
+                                  const struct amd_iommu *iommu,
                                   bool valid);
 void amd_iommu_set_root_page_table(struct amd_iommu_dte *dte,
 				   uint64_t root_ptr, uint16_t domain_id,
@@ -99,7 +100,7 @@ struct amd_iommu *find_iommu_for_device(
 bool iov_supports_xt(void);
 int amd_iommu_setup_ioapic_remapping(void);
 void *amd_iommu_alloc_intremap_table(
-    const struct amd_iommu *, unsigned long **);
+    const struct amd_iommu *, unsigned long **, unsigned int nr);
 int amd_iommu_free_intremap_table(
     const struct amd_iommu *, struct ivrs_mappings *, uint16_t);
 unsigned int amd_iommu_intremap_table_order(




* [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
                   ` (4 preceding siblings ...)
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes Jan Beulich
@ 2019-09-19 13:24 ` Jan Beulich
  2019-09-23 16:25   ` Paul Durrant
  2019-09-25 13:41   ` Andrew Cooper
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment Jan Beulich
  2019-09-19 13:25 ` [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation Jan Beulich
  7 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

Move the device flags field up into an unused hole, thus shrinking
overall structure size by 8 bytes. Use bool and uint<N>_t as
appropriate. Drop pointless (redundant) initializations.
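
For illustration, the re-arranged leading fields pack into a single
8-byte unit (offsets assume x86-64 padding rules):

    struct ivrs_mappings {
        uint16_t dte_requestor_id;  /* bytes 0-1 */
        bool valid;                 /* byte  2 */
        bool dte_allow_exclusion;   /* byte  3 */
        bool unity_map_enable;      /* byte  4 */
        bool write_permission;      /* byte  5 */
        bool read_permission;       /* byte  6 */
        uint8_t device_flags;       /* byte  7; at the structure's tail it
                                       previously cost 8 bytes of padding */
        /* ... remaining fields unchanged ... */
    };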

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: New.

---
 xen/drivers/passthrough/amd/iommu_acpi.c |    6 +++---
 xen/drivers/passthrough/amd/iommu_init.c |    6 ------
 xen/include/asm-x86/amd-iommu.h          |   17 +++++++++--------
 3 files changed, 12 insertions(+), 17 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -165,7 +165,7 @@ static void __init reserve_unity_map_for
     /* extend r/w permissions and keep aggregate */
     ivrs_mappings[bdf].write_permission = iw;
     ivrs_mappings[bdf].read_permission = ir;
-    ivrs_mappings[bdf].unity_map_enable = IOMMU_CONTROL_ENABLED;
+    ivrs_mappings[bdf].unity_map_enable = true;
     ivrs_mappings[bdf].addr_range_start = base;
     ivrs_mappings[bdf].addr_range_length = length;
 }
@@ -242,8 +242,8 @@ static int __init register_exclusion_ran
     if ( limit >= iommu_top  )
     {
         reserve_iommu_exclusion_range(iommu, base, limit);
-        ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_ENABLED;
-        ivrs_mappings[req].dte_allow_exclusion = IOMMU_CONTROL_ENABLED;
+        ivrs_mappings[bdf].dte_allow_exclusion = true;
+        ivrs_mappings[req].dte_allow_exclusion = true;
     }
 
     return 0;
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1222,12 +1222,6 @@ static int __init alloc_ivrs_mappings(u1
     for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
     {
         ivrs_mappings[bdf].dte_requestor_id = bdf;
-        ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_DISABLED;
-        ivrs_mappings[bdf].unity_map_enable = IOMMU_CONTROL_DISABLED;
-        ivrs_mappings[bdf].iommu = NULL;
-
-        ivrs_mappings[bdf].intremap_table = NULL;
-        ivrs_mappings[bdf].device_flags = 0;
 
         if ( amd_iommu_perdev_intremap )
             spin_lock_init(&ivrs_mappings[bdf].intremap_lock);
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -106,12 +106,16 @@ struct amd_iommu {
 };
 
 struct ivrs_mappings {
-    u16 dte_requestor_id;
-    u8 dte_allow_exclusion;
-    u8 unity_map_enable;
-    u8 write_permission;
-    u8 read_permission;
+    uint16_t dte_requestor_id;
     bool valid;
+    bool dte_allow_exclusion;
+    bool unity_map_enable;
+    bool write_permission;
+    bool read_permission;
+
+    /* ivhd device data settings */
+    uint8_t device_flags;
+
     unsigned long addr_range_start;
     unsigned long addr_range_length;
     struct amd_iommu *iommu;
@@ -120,9 +124,6 @@ struct ivrs_mappings {
     void *intremap_table;
     unsigned long *intremap_inuse;
     spinlock_t intremap_lock;
-
-    /* ivhd device data settings */
-    u8 device_flags;
 };
 
 extern unsigned int ivrs_bdf_entries;


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
                   ` (5 preceding siblings ...)
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings Jan Beulich
@ 2019-09-19 13:24 ` Jan Beulich
  2019-09-23 16:30   ` Paul Durrant
  2019-09-25 14:19   ` Andrew Cooper
  2019-09-19 13:25 ` [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation Jan Beulich
  7 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

Having a single device table for all segments can't possibly be right.
(Even worse, the symbol wasn't static despite being used in just one
source file.) Attach each device table to its respective segment's
IVRS mappings array instead.
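
A minimal sketch of the scheme (assuming Xen's xzalloc_array() and the
IVRS_MAPPINGS_* macros introduced below; this is not the actual
allocation code):

/* One element beyond the last valid BDF acts as a sentinel whose
 * otherwise unused fields hold per-segment data. */
struct ivrs_mappings *m = xzalloc_array(struct ivrs_mappings,
                                        ivrs_bdf_entries + 1);

if ( m )
    IVRS_MAPPINGS_SEG(m) = seg;     /* reuses dte_requestor_id */

/* Per-segment device table, once allocated: */
struct amd_iommu_dte *devtab = IVRS_MAPPINGS_DEVTAB(m);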

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: New.

---
 xen/drivers/passthrough/amd/iommu_init.c |   81 ++++++++++++++++---------------
 1 file changed, 43 insertions(+), 38 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -39,7 +39,6 @@ unsigned int __read_mostly ivrs_bdf_entr
 u8 __read_mostly ivhd_type;
 static struct radix_tree_root ivrs_maps;
 LIST_HEAD_READ_MOSTLY(amd_iommu_head);
-struct table_struct device_table;
 bool_t iommuv2_enabled;
 
 static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
@@ -989,6 +988,12 @@ static void disable_iommu(struct amd_iom
     spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
+static unsigned int __init dt_alloc_size(void)
+{
+    return PAGE_SIZE << get_order_from_bytes(ivrs_bdf_entries *
+                                             IOMMU_DEV_TABLE_ENTRY_SIZE);
+}
+
 static void __init deallocate_buffer(void *buf, uint32_t sz)
 {
     int order = 0;
@@ -999,12 +1004,6 @@ static void __init deallocate_buffer(voi
     }
 }
 
-static void __init deallocate_device_table(struct table_struct *table)
-{
-    deallocate_buffer(table->buffer, table->alloc_size);
-    table->buffer = NULL;
-}
-
 static void __init deallocate_ring_buffer(struct ring_buffer *ring_buf)
 {
     deallocate_buffer(ring_buf->buffer, ring_buf->alloc_size);
@@ -1068,8 +1067,29 @@ static void * __init allocate_ppr_log(st
                                 IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
 }
 
+/*
+ * Within ivrs_mappings[] we allocate an extra array element to store
+ * - segment number,
+ * - device table.
+ */
+#define IVRS_MAPPINGS_SEG(m) (m)[ivrs_bdf_entries].dte_requestor_id
+#define IVRS_MAPPINGS_DEVTAB(m) (m)[ivrs_bdf_entries].intremap_table
+
+static void __init free_ivrs_mapping(void *ptr)
+{
+    const struct ivrs_mappings *ivrs_mappings = ptr;
+
+    if ( IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
+        deallocate_buffer(IVRS_MAPPINGS_DEVTAB(ivrs_mappings),
+                          dt_alloc_size());
+
+    xfree(ptr);
+}
+
 static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
 {
+    const struct ivrs_mappings *ivrs_mappings;
+
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
 
@@ -1082,13 +1102,15 @@ static int __init amd_iommu_init_one(str
     if ( intr && !set_iommu_interrupt_handler(iommu) )
         goto error_out;
 
-    /* To make sure that device_table.buffer has been successfully allocated */
-    if ( device_table.buffer == NULL )
+    /* Make sure that the device table has been successfully allocated. */
+    ivrs_mappings = get_ivrs_mappings(iommu->seg);
+    if ( !IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
         goto error_out;
 
-    iommu->dev_table.alloc_size = device_table.alloc_size;
-    iommu->dev_table.entries = device_table.entries;
-    iommu->dev_table.buffer = device_table.buffer;
+    iommu->dev_table.alloc_size = dt_alloc_size();
+    iommu->dev_table.entries = iommu->dev_table.alloc_size /
+                               IOMMU_DEV_TABLE_ENTRY_SIZE;
+    iommu->dev_table.buffer = IVRS_MAPPINGS_DEVTAB(ivrs_mappings);
 
     enable_iommu(iommu);
     printk("AMD-Vi: IOMMU %d Enabled.\n", nr_amd_iommus );
@@ -1135,11 +1157,8 @@ static void __init amd_iommu_init_cleanu
         xfree(iommu);
     }
 
-    /* free device table */
-    deallocate_device_table(&device_table);
-
-    /* free ivrs_mappings[] */
-    radix_tree_destroy(&ivrs_maps, xfree);
+    /* Free ivrs_mappings[] and their device tables. */
+    radix_tree_destroy(&ivrs_maps, free_ivrs_mapping);
 
     iommu_enabled = 0;
     iommu_hwdom_passthrough = false;
@@ -1147,12 +1166,6 @@ static void __init amd_iommu_init_cleanu
     iommuv2_enabled = 0;
 }
 
-/*
- * We allocate an extra array element to store the segment number
- * (and in the future perhaps other global information).
- */
-#define IVRS_MAPPINGS_SEG(m) m[ivrs_bdf_entries].dte_requestor_id
-
 struct ivrs_mappings *get_ivrs_mappings(u16 seg)
 {
     return radix_tree_lookup(&ivrs_maps, seg);
@@ -1235,24 +1248,18 @@ static int __init alloc_ivrs_mappings(u1
 static int __init amd_iommu_setup_device_table(
     u16 seg, struct ivrs_mappings *ivrs_mappings)
 {
+    struct amd_iommu_dte *dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings);
     unsigned int bdf;
 
     BUG_ON( (ivrs_bdf_entries == 0) );
 
-    if ( !device_table.buffer )
+    if ( !dt )
     {
         /* allocate 'device table' on a 4K boundary */
-        device_table.alloc_size = PAGE_SIZE <<
-                                  get_order_from_bytes(
-                                  PAGE_ALIGN(ivrs_bdf_entries *
-                                  IOMMU_DEV_TABLE_ENTRY_SIZE));
-        device_table.entries = device_table.alloc_size /
-                               IOMMU_DEV_TABLE_ENTRY_SIZE;
-
-        device_table.buffer = allocate_buffer(device_table.alloc_size,
-                                              "Device Table");
+        dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings) =
+            allocate_buffer(dt_alloc_size(), "Device Table");
     }
-    if ( !device_table.buffer )
+    if ( !dt )
         return -ENOMEM;
 
     /* Add device table entries */
@@ -1260,12 +1267,10 @@ static int __init amd_iommu_setup_device
     {
         if ( ivrs_mappings[bdf].valid )
         {
-            void *dte;
             const struct pci_dev *pdev = NULL;
 
             /* add device table entry */
-            dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
-            iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
+            iommu_dte_add_device_entry(&dt[bdf], &ivrs_mappings[bdf]);
 
             if ( iommu_intremap &&
                  ivrs_mappings[bdf].dte_requestor_id == bdf &&
@@ -1308,7 +1313,7 @@ static int __init amd_iommu_setup_device
             }
 
             amd_iommu_set_intremap_table(
-                dte, ivrs_mappings[bdf].intremap_table,
+                &dt[bdf], ivrs_mappings[bdf].intremap_table,
                 ivrs_mappings[bdf].iommu, iommu_intremap);
         }
     }
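
As a worked example of dt_alloc_size(), a standalone sketch assuming 4k
pages, the architectural 32-byte DTE size, and a segment exposing all
0x10000 BDFs:

#include <stdio.h>

int main(void)
{
    unsigned long page_size = 4096, bytes = 0x10000UL * 32; /* 2 MiB */
    unsigned int order = 0;

    /* Mimics get_order_from_bytes(). */
    while ( (page_size << order) < bytes )
        ++order;

    /* Prints "order 9 -> 2097152 bytes per segment". */
    printf("order %u -> %lu bytes per segment\n", order, page_size << order);
    return 0;
}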


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation
  2019-09-19 13:08 [Xen-devel] [PATCH v6 0/8] AMD IOMMU: further improvements Jan Beulich
                   ` (6 preceding siblings ...)
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment Jan Beulich
@ 2019-09-19 13:25 ` Jan Beulich
  2019-09-23 16:33   ` Paul Durrant
  2019-09-25 14:59   ` Andrew Cooper
  7 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-19 13:25 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

Make sure we don't leave any DTEs through which unexpected requests
would be passed untranslated. Set V and IV right away (with all other
fields left as zero), relying on the V and/or IV bits getting cleared
only by amd_iommu_set_root_page_table() and
amd_iommu_set_intremap_table() under special pass-through circumstances.
Switch back to these initial settings in
amd_iommu_disable_domain_device().
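
Spelled out as a sketch (field names as in struct amd_iommu_dte; the
abort semantics follow the code comment added below):

/* A freshly pre-filled DTE has only V and IV set. With TV, GV, I and
 * EX false, and the SysMgt/IoCtl/IntCtl fields at their all-zero
 * "aborted" encodings, every kind of request from a not-yet-configured
 * device gets aborted instead of being forwarded untranslated. */
struct amd_iommu_dte dte = { .v = true, .iv = true };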

Take the liberty and also make the latter function static, constifying
its first parameter at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v6: New.

---
 xen/drivers/passthrough/amd/iommu_init.c    |   22 +++++++++++++++++++---
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   20 ++++++++++++++++----
 2 files changed, 35 insertions(+), 7 deletions(-)

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1255,12 +1255,28 @@ static int __init amd_iommu_setup_device
 
     if ( !dt )
     {
+        unsigned int size = dt_alloc_size();
+
         /* allocate 'device table' on a 4K boundary */
         dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings) =
-            allocate_buffer(dt_alloc_size(), "Device Table");
+            allocate_buffer(size, "Device Table");
+        if ( !dt )
+            return -ENOMEM;
+
+        /*
+         * Prefill every DTE such that all kinds of requests will get aborted.
+         * Besides the two bits set to true below this builds upon
+         * IOMMU_DEV_TABLE_SYS_MGT_DMA_ABORTED,
+         * IOMMU_DEV_TABLE_IO_CONTROL_ABORTED, as well as
+         * IOMMU_DEV_TABLE_INT_CONTROL_ABORTED all being zero, and us also
+         * wanting at least TV, GV, I, and EX set to false.
+         */
+        for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
+        {
+            dt[bdf].v = true;
+            dt[bdf].iv = true;
+        }
     }
-    if ( !dt )
-        return -ENOMEM;
 
     /* Add device table entries */
     for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -267,9 +267,9 @@ static void __hwdom_init amd_iommu_hwdom
     setup_hwdom_pci_devices(d, amd_iommu_add_device);
 }
 
-void amd_iommu_disable_domain_device(struct domain *domain,
-                                     struct amd_iommu *iommu,
-                                     u8 devfn, struct pci_dev *pdev)
+static void amd_iommu_disable_domain_device(const struct domain *domain,
+                                            struct amd_iommu *iommu,
+                                            uint8_t devfn, struct pci_dev *pdev)
 {
     struct amd_iommu_dte *table, *dte;
     unsigned long flags;
@@ -284,9 +284,21 @@ void amd_iommu_disable_domain_device(str
     spin_lock_irqsave(&iommu->lock, flags);
     if ( dte->tv || dte->v )
     {
+        /* See the comment in amd_iommu_setup_device_table(). */
+        dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
+        smp_wmb();
+        dte->iv = true;
         dte->tv = false;
-        dte->v = false;
+        dte->gv = false;
         dte->i = false;
+        dte->ex = false;
+        dte->sa = false;
+        dte->se = false;
+        dte->sd = false;
+        dte->sys_mgt = IOMMU_DEV_TABLE_SYS_MGT_DMA_ABORTED;
+        dte->ioctl = IOMMU_DEV_TABLE_IO_CONTROL_ABORTED;
+        smp_wmb();
+        dte->v = true;
 
         amd_iommu_flush_device(iommu, req_id);
 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early Jan Beulich
@ 2019-09-23 14:22   ` Roger Pau Monné
  2019-09-23 14:41     ` Jan Beulich
  2019-09-23 15:57   ` Paul Durrant
  2019-09-25 13:26   ` Andrew Cooper
  2 siblings, 1 reply; 39+ messages in thread
From: Roger Pau Monné @ 2019-09-23 14:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Suravee Suthikulpanit, Andrew Cooper

On Thu, Sep 19, 2019 at 03:22:54PM +0200, Jan Beulich wrote:
> Rather than doing this every time we set up interrupts for a device
> anew (and then in several places) fill this invariant field right after
> allocating struct pci_dev.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

LGTM:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just one nit below.

> ---
> v6: New.
> 
> ---
>  xen/arch/x86/msi.c            |   13 +++++--------
>  xen/drivers/passthrough/pci.c |   10 ++++++++++
>  xen/drivers/vpci/msi.c        |    9 ++++-----
>  xen/include/xen/pci.h         |    3 ++-
>  xen/include/xen/vpci.h        |    6 ++----
>  5 files changed, 23 insertions(+), 18 deletions(-)
> 
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -664,7 +664,7 @@ static int msi_capability_init(struct pc
>  {
>      struct msi_desc *entry;
>      int pos;
> -    unsigned int i, maxvec, mpos;
> +    unsigned int i, mpos;
>      u16 control, seg = dev->seg;
>      u8 bus = dev->bus;
>      u8 slot = PCI_SLOT(dev->devfn);
> @@ -675,9 +675,8 @@ static int msi_capability_init(struct pc
>      if ( !pos )
>          return -ENODEV;
>      control = pci_conf_read16(dev->sbdf, msi_control_reg(pos));
> -    maxvec = multi_msi_capable(control);
> -    if ( nvec > maxvec )
> -        return maxvec;
> +    if ( nvec > dev->msi_maxvec )
> +        return dev->msi_maxvec;
>      control &= ~PCI_MSI_FLAGS_QSIZE;
>      multi_msi_enable(control, nvec);
>  
> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
>  
>          /* All MSIs are unmasked by default, Mask them all */
>          maskbits = pci_conf_read32(dev->sbdf, mpos);
> -        maskbits |= ~(u32)0 >> (32 - maxvec);
> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);

GENMASK would be slightly easier to parse IMO (here and below).
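
For reference, the expression worked through for a small count
(illustrative value only):

uint32_t mask = ~(uint32_t)0 >> (32 - 4);   /* msi_maxvec == 4 */
/*   ~(uint32_t)0            == 0xffffffff
 *   0xffffffff >> (32 - 4)  == 0x0000000f
 * i.e. a mask of the low maxvec bits, matching an inclusive
 * GENMASK(3, 0). */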

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-23 14:22   ` Roger Pau Monné
@ 2019-09-23 14:41     ` Jan Beulich
  2019-09-23 15:18       ` Roger Pau Monné
  2019-09-25 13:31       ` Andrew Cooper
  0 siblings, 2 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-23 14:41 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Wei Liu, Suravee Suthikulpanit, xen-devel

On 23.09.2019 16:22, Roger Pau Monné  wrote:
> On Thu, Sep 19, 2019 at 03:22:54PM +0200, Jan Beulich wrote:
>> Rather than doing this every time we set up interrupts for a device
>> anew (and then in several places) fill this invariant field right after
>> allocating struct pci_dev.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> LGTM:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> Just one nit below.
> 
>> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
>>  
>>          /* All MSIs are unmasked by default, Mask them all */
>>          maskbits = pci_conf_read32(dev->sbdf, mpos);
>> -        maskbits |= ~(u32)0 >> (32 - maxvec);
>> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
> 
> GENMASK would be slightly easier to parse IMO (here and below).

Besides this being an unrelated change, I'm afraid I'm going to
object to use of a macro where it's unclear what its parameters
mean: Even the example in xen/bitops.h is so confusing that I
can't tell whether "h" is meant to be exclusive or inclusive
(looks like the latter is intended). To me the two parameters
also look reversed - I'd expect "low" first and "high" second.
(ISTR having voiced reservations against its introduction, and
I'm happy to see that it's used in Arm code only.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-23 14:41     ` Jan Beulich
@ 2019-09-23 15:18       ` Roger Pau Monné
  2019-09-23 15:26         ` Jan Beulich
  2019-09-25 13:31       ` Andrew Cooper
  1 sibling, 1 reply; 39+ messages in thread
From: Roger Pau Monné @ 2019-09-23 15:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Suravee Suthikulpanit, xen-devel

On Mon, Sep 23, 2019 at 04:41:01PM +0200, Jan Beulich wrote:
> On 23.09.2019 16:22, Roger Pau Monné  wrote:
> > On Thu, Sep 19, 2019 at 03:22:54PM +0200, Jan Beulich wrote:
> >> Rather than doing this every time we set up interrupts for a device
> >> anew (and then in several places) fill this invariant field right after
> >> allocating struct pci_dev.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > LGTM:
> > 
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks.
> 
> > Just one nit below.
> > 
> >> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
> >>  
> >>          /* All MSIs are unmasked by default, Mask them all */
> >>          maskbits = pci_conf_read32(dev->sbdf, mpos);
> >> -        maskbits |= ~(u32)0 >> (32 - maxvec);
> >> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
> > 
> > GENMASK would be slightly easier to parse IMO (here and below).
> 
> Besides this being an unrelated change, I'm afraid I'm going to
> object to use of a macro where it's unclear what its parameters
> mean: Even the example in xen/bitops.h is so confusing that I
> can't tell whether "h" is meant to be exclusive or inclusive
> (looks like the latter is intended). To me the two parameters
> also look reversed - I'd expect "low" first and "high" second.
> (ISTR having voiced reservations against its introduction, and
> I'm happy to see that it's used in Arm code only.)

I'm not especially thrilled to switch to GENMASK, but would you be
willing to change u32 to uint32_t?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-23 15:18       ` Roger Pau Monné
@ 2019-09-23 15:26         ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-23 15:26 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Wei Liu, Suravee Suthikulpanit, xen-devel

On 23.09.2019 17:18, Roger Pau Monné  wrote:
> On Mon, Sep 23, 2019 at 04:41:01PM +0200, Jan Beulich wrote:
>> On 23.09.2019 16:22, Roger Pau Monné  wrote:
>>> On Thu, Sep 19, 2019 at 03:22:54PM +0200, Jan Beulich wrote:
>>>> Rather than doing this every time we set up interrupts for a device
>>>> anew (and then in several places) fill this invariant field right after
>>>> allocating struct pci_dev.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> LGTM:
>>>
>>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Thanks.
>>
>>> Just one nit below.
>>>
>>>> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
>>>>  
>>>>          /* All MSIs are unmasked by default, Mask them all */
>>>>          maskbits = pci_conf_read32(dev->sbdf, mpos);
>>>> -        maskbits |= ~(u32)0 >> (32 - maxvec);
>>>> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
>>>
>>> GENMASK would be slightly easier to parse IMO (here and below).
>>
>> Besides this being an unrelated change, I'm afraid I'm going to
>> object to use of a macro where it's unclear what its parameters
>> mean: Even the example in xen/bitops.h is so confusing that I
>> can't tell whether "h" is meant to be exclusive or inclusive
>> (looks like the latter is intended). To me the two parameters
>> also look reversed - I'd expect "low" first and "high" second.
>> (ISTR having voiced reservations against its introduction, and
>> I'm happy to see that it's used in Arm code only.)
> 
> I'm not especially thrilled to switch to GENMASK, but would you be
> willing to change u32 to uint32_t?

Noticing your remark's context, I've done that change already (and
I don't know why I missed doing so right away).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables
  2019-09-19 13:21 ` [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables Jan Beulich
@ 2019-09-23 15:27   ` Paul Durrant
  2019-09-25 13:23   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 15:27 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:22
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables
> 
> ACPI tables are free to list far more device coordinates than there are
> actual devices. By delaying the table allocations for most cases, and
> doing them only when an actual device is known to be present at a given
> position, overall memory used for the tables goes down from over 500k
> pages to just over 1k (on my system having such ACPI table contents).
> 
> Tables continue to get allocated right away for special entries
> (IO-APIC, HPET) as well as for alias IDs. While in the former case
> that's simply because there may not be any device at a given position,
> in the latter case this is to avoid having to introduce ref-counting of
> table usage.
> 
> The change involves invoking
> iterate_ivrs_mappings(amd_iommu_setup_device_table) a second time,
> because the function now wants to be able to find PCI devices, which
> isn't possible yet when IOMMU setup happens very early during x2APIC
> mode setup. In this context amd_iommu_init_interrupt() gets renamed as
> well.
> 
> The logic adjusting a DTE's interrupt remapping attributes also gets
> changed, such that the lack of an IRT would result in target aborted
> rather than not remapped interrupts (should any occur).
> 
> Note that for now phantom functions get separate IRTs allocated, as was
> the case before.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
> v6: Acquire IOMMU lock in code added to amd_iommu_add_device(). Drop a
>     pointless use of the conditional operator.
> v5: New.
> ---
> TBD: This retains prior (but suspicious) behavior of not calling
>      amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
>      Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
>      changing.
> 
> Backporting note: This depends on b5fbe81196 "iommu / x86: move call to
> scan_pci_devices() out of vendor code"!
> 
> ---
>  xen/drivers/passthrough/amd/iommu_acpi.c      |   65 +++++++++++++----------
>  xen/drivers/passthrough/amd/iommu_init.c      |   73 ++++++++++++++++++++------
>  xen/drivers/passthrough/amd/iommu_intr.c      |    4 -
>  xen/drivers/passthrough/amd/iommu_map.c       |    5 +
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   43 ++++++++++++++-
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2
>  6 files changed, 143 insertions(+), 49 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -53,7 +53,8 @@ union acpi_ivhd_device {
>  };
> 
>  static void __init add_ivrs_mapping_entry(
> -    u16 bdf, u16 alias_id, u8 flags, struct amd_iommu *iommu)
> +    uint16_t bdf, uint16_t alias_id, uint8_t flags, bool alloc_irt,
> +    struct amd_iommu *iommu)
>  {
>      struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
> 
> @@ -69,27 +70,32 @@ static void __init add_ivrs_mapping_entr
>      if ( iommu->bdf == bdf )
>          return;
> 
> -    if ( !ivrs_mappings[alias_id].intremap_table )
> +    /* Allocate interrupt remapping table if needed. */
> +    if ( iommu_intremap && !ivrs_mappings[alias_id].intremap_table )
>      {
> -         /* allocate per-device interrupt remapping table */
> -         if ( amd_iommu_perdev_intremap )
> -             ivrs_mappings[alias_id].intremap_table =
> -                 amd_iommu_alloc_intremap_table(
> -                     iommu,
> -                     &ivrs_mappings[alias_id].intremap_inuse);
> -         else
> -         {
> -             if ( shared_intremap_table == NULL  )
> -                 shared_intremap_table = amd_iommu_alloc_intremap_table(
> -                     iommu,
> -                     &shared_intremap_inuse);
> -             ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
> -             ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
> -         }
> -
> -         if ( !ivrs_mappings[alias_id].intremap_table )
> -             panic("No memory for %04x:%02x:%02x.%u's IRT\n", iommu->seg,
> -                   PCI_BUS(alias_id), PCI_SLOT(alias_id), PCI_FUNC(alias_id));
> +        if ( !amd_iommu_perdev_intremap )
> +        {
> +            if ( !shared_intremap_table )
> +                shared_intremap_table = amd_iommu_alloc_intremap_table(
> +                    iommu, &shared_intremap_inuse);
> +
> +            if ( !shared_intremap_table )
> +                panic("No memory for shared IRT\n");
> +
> +            ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
> +            ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
> +        }
> +        else if ( alloc_irt )
> +        {
> +            ivrs_mappings[alias_id].intremap_table =
> +                amd_iommu_alloc_intremap_table(
> +                    iommu, &ivrs_mappings[alias_id].intremap_inuse);
> +
> +            if ( !ivrs_mappings[alias_id].intremap_table )
> +                panic("No memory for %04x:%02x:%02x.%u's IRT\n",
> +                      iommu->seg, PCI_BUS(alias_id), PCI_SLOT(alias_id),
> +                      PCI_FUNC(alias_id));
> +        }
>      }
> 
>      ivrs_mappings[alias_id].valid = true;
> @@ -433,7 +439,8 @@ static u16 __init parse_ivhd_device_sele
>          return 0;
>      }
> 
> -    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, select->header.data_setting, false,
> +                           iommu);
> 
>      return sizeof(*select);
>  }
> @@ -479,7 +486,7 @@ static u16 __init parse_ivhd_device_rang
> 
>      for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>          add_ivrs_mapping_entry(bdf, bdf, range->start.header.data_setting,
> -                               iommu);
> +                               false, iommu);
> 
>      return dev_length;
>  }
> @@ -513,7 +520,8 @@ static u16 __init parse_ivhd_device_alia
> 
>      AMD_IOMMU_DEBUG(" Dev_Id Alias: %#x\n", alias_id);
> 
> -    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, alias_id, alias->header.data_setting, true,
> +                           iommu);
> 
>      return dev_length;
>  }
> @@ -568,7 +576,7 @@ static u16 __init parse_ivhd_device_alia
> 
>      for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>          add_ivrs_mapping_entry(bdf, alias_id, range->alias.header.data_setting,
> -                               iommu);
> +                               true, iommu);
> 
>      return dev_length;
>  }
> @@ -593,7 +601,7 @@ static u16 __init parse_ivhd_device_exte
>          return 0;
>      }
> 
> -    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, ext->header.data_setting, false, iommu);
> 
>      return dev_length;
>  }
> @@ -640,7 +648,7 @@ static u16 __init parse_ivhd_device_exte
> 
>      for ( bdf = first_bdf; bdf <= last_bdf; bdf++ )
>          add_ivrs_mapping_entry(bdf, bdf, range->extended.header.data_setting,
> -                               iommu);
> +                               false, iommu);
> 
>      return dev_length;
>  }
> @@ -733,7 +741,8 @@ static u16 __init parse_ivhd_device_spec
>      AMD_IOMMU_DEBUG("IVHD Special: %04x:%02x:%02x.%u variety %#x handle %#x\n",
>                      seg, PCI_BUS(bdf), PCI_SLOT(bdf), PCI_FUNC(bdf),
>                      special->variety, special->handle);
> -    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, iommu);
> +    add_ivrs_mapping_entry(bdf, bdf, special->header.data_setting, true,
> +                           iommu);
> 
>      switch ( special->variety )
>      {
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -30,6 +30,7 @@
>  #include <xen/delay.h>
> 
>  static int __initdata nr_amd_iommus;
> +static bool __initdata pci_init;
> 
>  static void do_amd_iommu_irq(unsigned long data);
>  static DECLARE_SOFTIRQ_TASKLET(amd_iommu_irq_tasklet, do_amd_iommu_irq, 0);
> @@ -1244,17 +1245,20 @@ static int __init amd_iommu_setup_device
> 
>      BUG_ON( (ivrs_bdf_entries == 0) );
> 
> -    /* allocate 'device table' on a 4K boundary */
> -    device_table.alloc_size = PAGE_SIZE <<
> -                              get_order_from_bytes(
> -                              PAGE_ALIGN(ivrs_bdf_entries *
> -                              IOMMU_DEV_TABLE_ENTRY_SIZE));
> -    device_table.entries = device_table.alloc_size /
> -                           IOMMU_DEV_TABLE_ENTRY_SIZE;
> -
> -    device_table.buffer = allocate_buffer(device_table.alloc_size,
> -                                          "Device Table");
> -    if  ( device_table.buffer == NULL )
> +    if ( !device_table.buffer )
> +    {
> +        /* allocate 'device table' on a 4K boundary */
> +        device_table.alloc_size = PAGE_SIZE <<
> +                                  get_order_from_bytes(
> +                                  PAGE_ALIGN(ivrs_bdf_entries *
> +                                  IOMMU_DEV_TABLE_ENTRY_SIZE));
> +        device_table.entries = device_table.alloc_size /
> +                               IOMMU_DEV_TABLE_ENTRY_SIZE;
> +
> +        device_table.buffer = allocate_buffer(device_table.alloc_size,
> +                                              "Device Table");
> +    }
> +    if ( !device_table.buffer )
>          return -ENOMEM;
> 
>      /* Add device table entries */
> @@ -1263,13 +1267,46 @@ static int __init amd_iommu_setup_device
>          if ( ivrs_mappings[bdf].valid )
>          {
>              void *dte;
> +            const struct pci_dev *pdev = NULL;
> 
>              /* add device table entry */
>              dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
>              iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
> 
> +            if ( iommu_intremap &&
> +                 ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +                 !ivrs_mappings[bdf].intremap_table )
> +            {
> +                if ( !pci_init )
> +                    continue;
> +                pcidevs_lock();
> +                pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
> +                pcidevs_unlock();
> +            }
> +
> +            if ( pdev )
> +            {
> +                unsigned int req_id = bdf;
> +
> +                do {
> +                    ivrs_mappings[req_id].intremap_table =
> +                        amd_iommu_alloc_intremap_table(
> +                            ivrs_mappings[bdf].iommu,
> +                            &ivrs_mappings[req_id].intremap_inuse);
> +                    if ( !ivrs_mappings[req_id].intremap_table )
> +                        return -ENOMEM;
> +
> +                    if ( !pdev->phantom_stride )
> +                        break;
> +                    req_id += pdev->phantom_stride;
> +                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
> +            }
> +
>              amd_iommu_set_intremap_table(
> -                dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
> +                dte,
> +                ivrs_mappings[bdf].intremap_table
> +                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
> +                : 0,
>                  iommu_intremap);
>          }
>      }
> @@ -1402,7 +1439,8 @@ int __init amd_iommu_init(bool xt)
>      if ( rc )
>          goto error_out;
> 
> -    /* allocate and initialize a global device table shared by all iommus */
> +    /* Allocate and initialize device table(s). */
> +    pci_init = !xt;
>      rc = iterate_ivrs_mappings(amd_iommu_setup_device_table);
>      if ( rc )
>          goto error_out;
> @@ -1422,7 +1460,7 @@ int __init amd_iommu_init(bool xt)
>          /*
>           * Setting up of the IOMMU interrupts cannot occur yet at the (very
>           * early) time we get here when enabling x2APIC mode. Suppress it
> -         * here, and do it explicitly in amd_iommu_init_interrupt().
> +         * here, and do it explicitly in amd_iommu_init_late().
>           */
>          rc = amd_iommu_init_one(iommu, !xt);
>          if ( rc )
> @@ -1436,11 +1474,16 @@ error_out:
>      return rc;
>  }
> 
> -int __init amd_iommu_init_interrupt(void)
> +int __init amd_iommu_init_late(void)
>  {
>      struct amd_iommu *iommu;
>      int rc = 0;
> 
> +    /* Further initialize the device table(s). */
> +    pci_init = true;
> +    if ( iommu_intremap )
> +        rc = iterate_ivrs_mappings(amd_iommu_setup_device_table);
> +
>      for_each_amd_iommu ( iommu )
>      {
>          struct irq_desc *desc;
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -789,7 +789,7 @@ void amd_iommu_read_msi_from_ire(
>      }
>  }
> 
> -int __init amd_iommu_free_intremap_table(
> +int amd_iommu_free_intremap_table(
>      const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
>  {
>      void **tblp;
> @@ -814,7 +814,7 @@ int __init amd_iommu_free_intremap_table
>      return 0;
>  }
> 
> -void *__init amd_iommu_alloc_intremap_table(
> +void *amd_iommu_alloc_intremap_table(
>      const struct amd_iommu *iommu, unsigned long **inuse_map)
>  {
>      unsigned int order = intremap_table_order(iommu);
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -116,8 +116,9 @@ void __init amd_iommu_set_intremap_table
>      struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
>  {
>      dte->it_root = intremap_ptr >> 6;
> -    dte->int_tab_len = IOMMU_INTREMAP_ORDER;
> -    dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +    dte->int_tab_len = intremap_ptr ? IOMMU_INTREMAP_ORDER : 0;
> +    dte->int_ctl = intremap_ptr ? IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED
> +                                : IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
>      dte->ig = false; /* unmapped interrupts result in i/o page faults */
>      dte->iv = valid;
>  }
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -164,7 +164,7 @@ static int __init iov_detect(void)
>      if ( !iommu_enable && !iommu_intremap )
>          return 0;
> 
> -    if ( (init_done ? amd_iommu_init_interrupt()
> +    if ( (init_done ? amd_iommu_init_late()
>                      : amd_iommu_init(false)) != 0 )
>      {
>          printk("AMD-Vi: Error initialization\n");
> @@ -428,6 +428,7 @@ static int amd_iommu_add_device(u8 devfn
>  {
>      struct amd_iommu *iommu;
>      u16 bdf;
> +    struct ivrs_mappings *ivrs_mappings;
> 
>      if ( !pdev->domain )
>          return -EINVAL;
> @@ -457,6 +458,36 @@ static int amd_iommu_add_device(u8 devfn
>          return -ENODEV;
>      }
> 
> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
> +    bdf = PCI_BDF2(pdev->bus, devfn);
> +    if ( !ivrs_mappings ||
> +         !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
> +        return -EPERM;
> +
> +    if ( iommu_intremap &&
> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +         !ivrs_mappings[bdf].intremap_table )
> +    {
> +        unsigned long flags;
> +
> +        ivrs_mappings[bdf].intremap_table =
> +            amd_iommu_alloc_intremap_table(
> +                iommu, &ivrs_mappings[bdf].intremap_inuse);
> +        if ( !ivrs_mappings[bdf].intremap_table )
> +            return -ENOMEM;
> +
> +        spin_lock_irqsave(&iommu->lock, flags);
> +
> +        amd_iommu_set_intremap_table(
> +            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
> +            virt_to_maddr(ivrs_mappings[bdf].intremap_table),
> +            iommu_intremap);
> +
> +        amd_iommu_flush_device(iommu, bdf);
> +
> +        spin_unlock_irqrestore(&iommu->lock, flags);
> +    }
> +
>      amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev);
>      return 0;
>  }
> @@ -465,6 +496,8 @@ static int amd_iommu_remove_device(u8 de
>  {
>      struct amd_iommu *iommu;
>      u16 bdf;
> +    struct ivrs_mappings *ivrs_mappings;
> +
>      if ( !pdev->domain )
>          return -EINVAL;
> 
> @@ -480,6 +513,14 @@ static int amd_iommu_remove_device(u8 de
>      }
> 
>      amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev);
> +
> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
> +    bdf = PCI_BDF2(pdev->bus, devfn);
> +    if ( amd_iommu_perdev_intremap &&
> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
> +         ivrs_mappings[bdf].intremap_table )
> +        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf]);
> +
>      return 0;
>  }
> 
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -50,7 +50,7 @@ void get_iommu_features(struct amd_iommu
>  /* amd-iommu-init functions */
>  int amd_iommu_prepare(bool xt);
>  int amd_iommu_init(bool xt);
> -int amd_iommu_init_interrupt(void);
> +int amd_iommu_init_late(void);
>  int amd_iommu_update_ivrs_mapping_acpi(void);
>  int iov_adjust_irq_affinities(void);
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 2/8] AMD/IOMMU: make phantom functions share interrupt remapping tables
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 2/8] AMD/IOMMU: make phantom functions share " Jan Beulich
@ 2019-09-23 15:44   ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 15:44 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:22
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 2/8] AMD/IOMMU: make phantom functions share interrupt remapping tables
> 
> Rather than duplicating entries in amd_iommu_msi_msg_update_ire(), share
> the tables. This mainly requires some care while freeing them, to avoid
> freeing memory blocks twice.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
> v5: New.
> 
> ---
>  xen/drivers/passthrough/amd/iommu_init.c      |   43 +++++++++++++++---------
>  xen/drivers/passthrough/amd/iommu_intr.c      |   45 +++++++++++++-------------
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |    2 -
>  xen/include/asm-x86/amd-iommu.h               |    2 -
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2 -
>  5 files changed, 53 insertions(+), 41 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1111,7 +1111,7 @@ static void __init amd_iommu_init_cleanu
>          amd_iommu_free_intremap_table(list_first_entry(&amd_iommu_head,
>                                                         struct amd_iommu,
>                                                         list),
> -                                      NULL);
> +                                      NULL, 0);
> 
>      /* free amd iommu list */
>      list_for_each_entry_safe ( iommu, next, &amd_iommu_head, list )
> @@ -1176,7 +1176,7 @@ int iterate_ivrs_mappings(int (*handler)
>  }
> 
>  int iterate_ivrs_entries(int (*handler)(const struct amd_iommu *,
> -                                        struct ivrs_mappings *))
> +                                        struct ivrs_mappings *, uint16_t bdf))
>  {
>      u16 seg = 0;
>      int rc = 0;
> @@ -1193,7 +1193,7 @@ int iterate_ivrs_entries(int (*handler)(
>              const struct amd_iommu *iommu = map[bdf].iommu;
> 
>              if ( iommu && map[bdf].dte_requestor_id == bdf )
> -                rc = handler(iommu, &map[bdf]);
> +                rc = handler(iommu, &map[bdf], bdf);
>          }
>      } while ( !rc && ++seg );
> 
> @@ -1286,20 +1286,29 @@ static int __init amd_iommu_setup_device
> 
>              if ( pdev )
>              {
> -                unsigned int req_id = bdf;
> -
> -                do {
> -                    ivrs_mappings[req_id].intremap_table =
> -                        amd_iommu_alloc_intremap_table(
> -                            ivrs_mappings[bdf].iommu,
> -                            &ivrs_mappings[req_id].intremap_inuse);
> -                    if ( !ivrs_mappings[req_id].intremap_table )
> -                        return -ENOMEM;
> -
> -                    if ( !pdev->phantom_stride )
> -                        break;
> -                    req_id += pdev->phantom_stride;
> -                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
> +                ivrs_mappings[bdf].intremap_table =
> +                    amd_iommu_alloc_intremap_table(
> +                        ivrs_mappings[bdf].iommu,
> +                        &ivrs_mappings[bdf].intremap_inuse);
> +                if ( !ivrs_mappings[bdf].intremap_table )
> +                    return -ENOMEM;
> +
> +                if ( pdev->phantom_stride )
> +                {
> +                    unsigned int req_id = bdf;
> +
> +                    for ( ; ; )
> +                    {
> +                        req_id += pdev->phantom_stride;
> +                        if ( PCI_SLOT(req_id) != pdev->sbdf.dev )
> +                            break;
> +
> +                        ivrs_mappings[req_id].intremap_table =
> +                            ivrs_mappings[bdf].intremap_table;
> +                        ivrs_mappings[req_id].intremap_inuse =
> +                            ivrs_mappings[bdf].intremap_inuse;
> +                    }
> +                }
>              }
> 
>              amd_iommu_set_intremap_table(
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -711,33 +711,20 @@ int amd_iommu_msi_msg_update_ire(
> 
>      if ( msi_desc->remap_index >= 0 && !msg )
>      {
> -        do {
> -            update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> -                                               &msi_desc->remap_index,
> -                                               NULL, NULL);
> -            if ( !pdev || !pdev->phantom_stride )
> -                break;
> -            bdf += pdev->phantom_stride;
> -        } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );
> +        update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> +                                           &msi_desc->remap_index,
> +                                           NULL, NULL);
> 
>          for ( i = 0; i < nr; ++i )
>              msi_desc[i].remap_index = -1;
> -        if ( pdev )
> -            bdf = PCI_BDF2(pdev->bus, pdev->devfn);
>      }
> 
>      if ( !msg )
>          return 0;
> 
> -    do {
> -        rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> -                                                &msi_desc->remap_index,
> -                                                msg, &data);
> -        if ( rc || !pdev || !pdev->phantom_stride )
> -            break;
> -        bdf += pdev->phantom_stride;
> -    } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );
> -
> +    rc = update_intremap_entry_from_msi_msg(iommu, bdf, nr,
> +                                            &msi_desc->remap_index,
> +                                            msg, &data);
>      if ( !rc )
>      {
>          for ( i = 1; i < nr; ++i )
> @@ -790,12 +777,27 @@ void amd_iommu_read_msi_from_ire(
>  }
> 
>  int amd_iommu_free_intremap_table(
> -    const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
> +    const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping,
> +    uint16_t bdf)
>  {
>      void **tblp;
> 
>      if ( ivrs_mapping )
>      {
> +        unsigned int i;
> +
> +        /*
> +         * PCI device phantom functions use the same tables as their "base"
> +         * function: Look ahead to zap the pointers.
> +         */
> +        for ( i = 1; PCI_FUNC(bdf + i) && bdf + i < ivrs_bdf_entries; ++i )
> +            if ( ivrs_mapping[i].intremap_table ==
> +                 ivrs_mapping->intremap_table )
> +            {
> +                ivrs_mapping[i].intremap_table = NULL;
> +                ivrs_mapping[i].intremap_inuse = NULL;
> +            }
> +
>          XFREE(ivrs_mapping->intremap_inuse);
>          tblp = &ivrs_mapping->intremap_table;
>      }
> @@ -934,7 +936,8 @@ static void dump_intremap_table(const st
>  }
> 
>  static int dump_intremap_mapping(const struct amd_iommu *iommu,
> -                                 struct ivrs_mappings *ivrs_mapping)
> +                                 struct ivrs_mappings *ivrs_mapping,
> +                                 uint16_t unused)
>  {
>      unsigned long flags;
> 
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -519,7 +519,7 @@ static int amd_iommu_remove_device(u8 de
>      if ( amd_iommu_perdev_intremap &&
>           ivrs_mappings[bdf].dte_requestor_id == bdf &&
>           ivrs_mappings[bdf].intremap_table )
> -        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf]);
> +        amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf], bdf);
> 
>      return 0;
>  }
> --- a/xen/include/asm-x86/amd-iommu.h
> +++ b/xen/include/asm-x86/amd-iommu.h
> @@ -131,7 +131,7 @@ extern u8 ivhd_type;
>  struct ivrs_mappings *get_ivrs_mappings(u16 seg);
>  int iterate_ivrs_mappings(int (*)(u16 seg, struct ivrs_mappings *));
>  int iterate_ivrs_entries(int (*)(const struct amd_iommu *,
> -                                 struct ivrs_mappings *));
> +                                 struct ivrs_mappings *, uint16_t));
> 
>  /* iommu tables in guest space */
>  struct mmio_reg {
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -101,7 +101,7 @@ int amd_iommu_setup_ioapic_remapping(voi
>  void *amd_iommu_alloc_intremap_table(
>      const struct amd_iommu *, unsigned long **);
>  int amd_iommu_free_intremap_table(
> -    const struct amd_iommu *, struct ivrs_mappings *);
> +    const struct amd_iommu *, struct ivrs_mappings *, uint16_t);
>  void amd_iommu_ioapic_update_ire(
>      unsigned int apic, unsigned int reg, unsigned int value);
>  unsigned int amd_iommu_read_ioapic_from_ire(
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early Jan Beulich
  2019-09-23 14:22   ` Roger Pau Monné
@ 2019-09-23 15:57   ` Paul Durrant
  2019-09-23 16:00     ` Jan Beulich
  2019-09-25 13:26   ` Andrew Cooper
  2 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 15:57 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel
  Cc: Andrew Cooper, Suravee Suthikulpanit, Wei Liu, Roger Pau Monne

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:23
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei Liu <wl@xen.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Roger Pau Monne <roger.pau@citrix.com>
> Subject: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
> 
> Rather than doing this every time we set up interrupts for a device
> anew (and then in several places) fill this invariant field right after
> allocating struct pci_dev.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

...with one nit...

> ---
> v6: New.
> 
> ---
>  xen/arch/x86/msi.c            |   13 +++++--------
>  xen/drivers/passthrough/pci.c |   10 ++++++++++
>  xen/drivers/vpci/msi.c        |    9 ++++-----
>  xen/include/xen/pci.h         |    3 ++-
>  xen/include/xen/vpci.h        |    6 ++----
>  5 files changed, 23 insertions(+), 18 deletions(-)
> 
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -664,7 +664,7 @@ static int msi_capability_init(struct pc
>  {
>      struct msi_desc *entry;
>      int pos;
> -    unsigned int i, maxvec, mpos;
> +    unsigned int i, mpos;
>      u16 control, seg = dev->seg;
>      u8 bus = dev->bus;
>      u8 slot = PCI_SLOT(dev->devfn);
> @@ -675,9 +675,8 @@ static int msi_capability_init(struct pc
>      if ( !pos )
>          return -ENODEV;
>      control = pci_conf_read16(dev->sbdf, msi_control_reg(pos));
> -    maxvec = multi_msi_capable(control);
> -    if ( nvec > maxvec )
> -        return maxvec;
> +    if ( nvec > dev->msi_maxvec )
> +        return dev->msi_maxvec;
>      control &= ~PCI_MSI_FLAGS_QSIZE;
>      multi_msi_enable(control, nvec);
> 
> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
> 
>          /* All MSIs are unmasked by default, Mask them all */
>          maskbits = pci_conf_read32(dev->sbdf, mpos);
> -        maskbits |= ~(u32)0 >> (32 - maxvec);
> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
>          pci_conf_write32(dev->sbdf, mpos, maskbits);
>      }
>      list_add_tail(&entry->list, &dev->msi_list);
> @@ -1284,7 +1283,6 @@ int pci_msi_conf_write_intercept(struct
>      entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI);
>      if ( entry && entry->msi_attrib.maskbit )
>      {
> -        uint16_t cntl;
>          uint32_t unused;
>          unsigned int nvec = entry->msi.nvec;
> 
> @@ -1297,8 +1295,7 @@ int pci_msi_conf_write_intercept(struct
>          if ( reg < entry->msi.mpos || reg >= entry->msi.mpos + 4 || size != 4 )
>              return -EACCES;
> 
> -        cntl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
> -        unused = ~(uint32_t)0 >> (32 - multi_msi_capable(cntl));
> +        unused = ~(uint32_t)0 >> (32 - pdev->msi_maxvec);
>          for ( pos = 0; pos < nvec; ++pos, ++entry )
>          {
>              entry->msi_attrib.guest_masked =
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -340,6 +340,16 @@ static struct pci_dev *alloc_pdev(struct
>      pdev->domain = NULL;
>      INIT_LIST_HEAD(&pdev->msi_list);
> 
> +

Stray blank line here by the looks of it.

> +    pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> +                              PCI_CAP_ID_MSI);
> +    if ( pos )
> +    {
> +        uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
> +
> +        pdev->msi_maxvec = multi_msi_capable(ctrl);
> +    }
> +
>      pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>                                PCI_CAP_ID_MSIX);
>      if ( pos )
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -27,7 +27,7 @@ static uint32_t control_read(const struc
>  {
>      const struct vpci_msi *msi = data;
> 
> -    return MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK) |
> +    return MASK_INSR(fls(pdev->msi_maxvec) - 1, PCI_MSI_FLAGS_QMASK) |
>             MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE) |
>             (msi->enabled ? PCI_MSI_FLAGS_ENABLE : 0) |
>             (msi->masking ? PCI_MSI_FLAGS_MASKBIT : 0) |
> @@ -40,7 +40,7 @@ static void control_write(const struct p
>      struct vpci_msi *msi = data;
>      unsigned int vectors = min_t(uint8_t,
>                                   1u << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE),
> -                                 msi->max_vectors);
> +                                 pdev->msi_maxvec);
>      bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
> 
>      /*
> @@ -215,8 +215,7 @@ static int init_msi(struct pci_dev *pdev
>       * FIXME: I've only been able to test this code with devices using a single
>       * MSI interrupt and no mask register.
>       */
> -    pdev->vpci->msi->max_vectors = multi_msi_capable(control);
> -    ASSERT(pdev->vpci->msi->max_vectors <= 32);
> +    ASSERT(pdev->msi_maxvec <= 32);
> 
>      /* The multiple message enable is 0 after reset (1 message enabled). */
>      pdev->vpci->msi->vectors = 1;
> @@ -298,7 +297,7 @@ void vpci_dump_msi(void)
>                  if ( msi->masking )
>                      printk(" mask=%08x", msi->mask);
>                  printk(" vectors max: %u enabled: %u\n",
> -                       msi->max_vectors, msi->vectors);
> +                       pdev->msi_maxvec, msi->vectors);
> 
>                  vpci_msi_arch_print(msi);
>              }
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -94,7 +94,8 @@ struct pci_dev {
>          pci_sbdf_t sbdf;
>      };
> 
> -    u8 phantom_stride;
> +    uint8_t msi_maxvec;
> +    uint8_t phantom_stride;
> 
>      nodeid_t node; /* NUMA node */
> 
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -99,14 +99,12 @@ struct vpci {
>          uint32_t mask;
>          /* Data. */
>          uint16_t data;
> -        /* Maximum number of vectors supported by the device. */
> -        uint8_t max_vectors : 6;
> +        /* Number of vectors configured. */
> +        uint8_t vectors     : 6;
>          /* Supports per-vector masking? */
>          bool masking        : 1;
>          /* 64-bit address capable? */
>          bool address64      : 1;
> -        /* Number of vectors configured. */
> -        uint8_t vectors     : 6;
>          /* Enabled? */
>          bool enabled        : 1;
>          /* Arch-specific data. */
> 
> 
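
As an aside, for anyone cross-checking the fls()-based encoding in
control_read() above: the MSI control word stores vector counts as log2
values (Multiple Message Capable in bits 3:1, Multiple Message Enable in
bits 6:4). A quick worked example, relying only on the usual MASK_INSR()
semantics of shifting a value into position under its mask:

    /* device with pdev->msi_maxvec == 16 */
    unsigned int mmc = fls(16) - 1;                        /* 4, i.e. log2(16) */
    uint32_t qmask = MASK_INSR(mmc, PCI_MSI_FLAGS_QMASK);  /* 0x8: bits 3:1 = 100b */

i.e. a guest reading the control register sees "up to 16 vectors
supported", now sourced from pdev->msi_maxvec rather than the vPCI copy.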


* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-23 15:57   ` Paul Durrant
@ 2019-09-23 16:00     ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-23 16:00 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, Roger Pau Monne, Wei Liu, Suravee Suthikulpanit, Andrew Cooper

On 23.09.2019 17:57, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>> Sent: 19 September 2019 14:23
>> To: xen-devel@lists.xenproject.org
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei Liu <wl@xen.org>; Suravee Suthikulpanit
>> <suravee.suthikulpanit@amd.com>; Roger Pau Monne <roger.pau@citrix.com>
>> Subject: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
>>
>> Rather than doing this every time we set up interrupts for a device
> anew (and then in several places), fill this invariant field right after
>> allocating struct pci_dev.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

Thanks.

>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -340,6 +340,16 @@ static struct pci_dev *alloc_pdev(struct
>>      pdev->domain = NULL;
>>      INIT_LIST_HEAD(&pdev->msi_list);
>>
>> +
> 
> Stray blank line here by the looks of it.

Oh, indeed - dropped.

Jan

>> +    pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>> +                              PCI_CAP_ID_MSI);
>> +    if ( pos )
>> +    {
>> +        uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
>> +
>> +        pdev->msi_maxvec = multi_msi_capable(ctrl);
>> +    }
>> +
>>      pos = pci_find_cap_offset(pseg->nr, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>                                PCI_CAP_ID_MSIX);
>>      if ( pos )
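
For completeness, the decode relied upon here is just the inverse of the
encoding done on the vPCI side: multi_msi_capable() extracts the MMC
field and turns it back into a vector count. A sketch of the equivalent
open-coded logic (illustrative only; the real macro lives in msi.h):

    uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
    unsigned int mmc = (ctrl & PCI_MSI_FLAGS_QMASK) >> 1;  /* MMC, bits 3:1 */
    unsigned int maxvec = 1u << mmc;                       /* 1, 2, 4, ..., 32 */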



* Re: [Xen-devel] [PATCH v6 4/8] AMD/IOMMU: replace INTREMAP_ENTRIES
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 4/8] AMD/IOMMU: replace INTREMAP_ENTRIES Jan Beulich
@ 2019-09-23 16:13   ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 16:13 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:23
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 4/8] AMD/IOMMU: replace INTREMAP_ENTRIES
> 
> Prepare for the number of entries to not be the maximum possible, by
> separating checks against maximum size from ones against actual size.
> For caller-side simplicity have alloc_intremap_entry() return the
> maximum possible value upon allocation failure, rather than the first
> just out-of-bounds one.
> 
> Have the involved functions take all the subsequently needed
> arguments here already, to reduce code churn in the patch actually
> making the allocation size dynamic.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
> v5: New.
> 
> ---
>  xen/drivers/passthrough/amd/iommu_intr.c      |   93 +++++++++++++++-----------
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    2
>  2 files changed, 59 insertions(+), 36 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -69,7 +69,7 @@ union irte_cptr {
>      const union irte128 *ptr128;
>  } __transparent__;
> 
> -#define INTREMAP_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
> +#define INTREMAP_MAX_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
> 
>  struct ioapic_sbdf ioapic_sbdf[MAX_IO_APICS];
>  struct hpet_sbdf hpet_sbdf;
> @@ -83,8 +83,20 @@ static void dump_intremap_tables(unsigne
>  static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
>  {
>      return iommu->ctrl.ga_en
> -           ? get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte128))
> -           : get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte32));
> +           ? get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte128))
> +           : get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte32));
> +}
> +
> +unsigned int amd_iommu_intremap_table_order(
> +    const void *irt, const struct amd_iommu *iommu)
> +{
> +    return IOMMU_INTREMAP_ORDER;
> +}
> +
> +static unsigned int intremap_table_entries(
> +    const void *irt, const struct amd_iommu *iommu)
> +{
> +    return 1u << amd_iommu_intremap_table_order(irt, iommu);
>  }
> 
>  unsigned int ioapic_id_to_index(unsigned int apic_id)
> @@ -122,20 +134,24 @@ static int get_intremap_requestor_id(int
>      return get_ivrs_mappings(seg)[bdf].dte_requestor_id;
>  }
> 
> -static unsigned int alloc_intremap_entry(int seg, int bdf, unsigned int nr)
> +static unsigned int alloc_intremap_entry(const struct amd_iommu *iommu,
> +                                         unsigned int bdf, unsigned int nr)
>  {
> -    unsigned long *inuse = get_ivrs_mappings(seg)[bdf].intremap_inuse;
> -    unsigned int slot = find_first_zero_bit(inuse, INTREMAP_ENTRIES);
> +    const struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(iommu->seg);
> +    unsigned long *inuse = ivrs_mappings[bdf].intremap_inuse;
> +    unsigned int nr_ents =
> +        intremap_table_entries(ivrs_mappings[bdf].intremap_table, iommu);
> +    unsigned int slot = find_first_zero_bit(inuse, nr_ents);
> 
>      for ( ; ; )
>      {
>          unsigned int end;
> 
> -        if ( slot >= INTREMAP_ENTRIES )
> +        if ( slot >= nr_ents )
>              break;
> -        end = find_next_bit(inuse, INTREMAP_ENTRIES, slot + 1);
> -        if ( end > INTREMAP_ENTRIES )
> -            end = INTREMAP_ENTRIES;
> +        end = find_next_bit(inuse, nr_ents, slot + 1);
> +        if ( end > nr_ents )
> +            end = nr_ents;
>          slot = (slot + nr - 1) & ~(nr - 1);
>          if ( slot + nr <= end )
>          {
> @@ -144,12 +160,12 @@ static unsigned int alloc_intremap_entry
>              break;
>          }
>          slot = (end + nr) & ~(nr - 1);
> -        if ( slot >= INTREMAP_ENTRIES )
> +        if ( slot >= nr_ents )
>              break;
> -        slot = find_next_zero_bit(inuse, INTREMAP_ENTRIES, slot);
> +        slot = find_next_zero_bit(inuse, nr_ents, slot);
>      }
> 
> -    return slot;
> +    return slot < nr_ents ? slot : INTREMAP_MAX_ENTRIES;
>  }
> 
>  static union irte_ptr get_intremap_entry(const struct amd_iommu *iommu,
> @@ -159,7 +175,7 @@ static union irte_ptr get_intremap_entry
>          .ptr = get_ivrs_mappings(iommu->seg)[bdf].intremap_table
>      };
> 
> -    ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
> +    ASSERT(table.ptr && (index < intremap_table_entries(table.ptr, iommu)));
> 
>      if ( iommu->ctrl.ga_en )
>          table.ptr128 += index;
> @@ -279,10 +295,10 @@ static int update_intremap_entry_from_io
>      spin_lock_irqsave(lock, flags);
> 
>      offset = *index;
> -    if ( offset >= INTREMAP_ENTRIES )
> +    if ( offset >= INTREMAP_MAX_ENTRIES )
>      {
> -        offset = alloc_intremap_entry(iommu->seg, req_id, 1);
> -        if ( offset >= INTREMAP_ENTRIES )
> +        offset = alloc_intremap_entry(iommu, req_id, 1);
> +        if ( offset >= INTREMAP_MAX_ENTRIES )
>          {
>              spin_unlock_irqrestore(lock, flags);
>              rte->mask = 1;
> @@ -400,8 +416,8 @@ int __init amd_iommu_setup_ioapic_remapp
>              }
> 
>              spin_lock_irqsave(lock, flags);
> -            offset = alloc_intremap_entry(seg, req_id, 1);
> -            BUG_ON(offset >= INTREMAP_ENTRIES);
> +            offset = alloc_intremap_entry(iommu, req_id, 1);
> +            BUG_ON(offset >= INTREMAP_MAX_ENTRIES);
>              entry = get_intremap_entry(iommu, req_id, offset);
>              update_intremap_entry(iommu, entry, vector,
>                                    delivery_mode, dest_mode, dest);
> @@ -476,7 +492,7 @@ void amd_iommu_ioapic_update_ire(
>          *(((u32 *)&new_rte) + 1) = value;
>      }
> 
> -    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
> +    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_MAX_ENTRIES )
>      {
>          ASSERT(saved_mask);
> 
> @@ -548,7 +564,7 @@ unsigned int amd_iommu_read_ioapic_from_
>          return val;
> 
>      offset = ioapic_sbdf[idx].pin_2_idx[pin];
> -    if ( offset >= INTREMAP_ENTRIES )
> +    if ( offset >= INTREMAP_MAX_ENTRIES )
>          return val;
> 
>      seg = ioapic_sbdf[idx].seg;
> @@ -561,8 +577,8 @@ unsigned int amd_iommu_read_ioapic_from_
> 
>      if ( !(reg & 1) )
>      {
> -        ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
> -        val &= ~(INTREMAP_ENTRIES - 1);
> +        ASSERT(offset == (val & (INTREMAP_MAX_ENTRIES - 1)));
> +        val &= ~(INTREMAP_MAX_ENTRIES - 1);
>          /* The IntType fields match for both formats. */
>          val |= MASK_INSR(entry.ptr32->flds.int_type,
>                           IO_APIC_REDIR_DELIV_MODE_MASK);
> @@ -622,11 +638,11 @@ static int update_intremap_entry_from_ms
>          dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
> 
>      offset = *remap_index;
> -    if ( offset >= INTREMAP_ENTRIES )
> +    if ( offset >= INTREMAP_MAX_ENTRIES )
>      {
>          ASSERT(nr);
> -        offset = alloc_intremap_entry(iommu->seg, bdf, nr);
> -        if ( offset >= INTREMAP_ENTRIES )
> +        offset = alloc_intremap_entry(iommu, bdf, nr);
> +        if ( offset >= INTREMAP_MAX_ENTRIES )
>          {
>              spin_unlock_irqrestore(lock, flags);
>              return -ENOSPC;
> @@ -654,7 +670,7 @@ static int update_intremap_entry_from_ms
>      update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
>      spin_unlock_irqrestore(lock, flags);
> 
> -    *data = (msg->data & ~(INTREMAP_ENTRIES - 1)) | offset;
> +    *data = (msg->data & ~(INTREMAP_MAX_ENTRIES - 1)) | offset;
> 
>      /*
>       * In some special cases, a pci-e device(e.g SATA controller in IDE mode)
> @@ -738,7 +754,7 @@ int amd_iommu_msi_msg_update_ire(
>  void amd_iommu_read_msi_from_ire(
>      struct msi_desc *msi_desc, struct msi_msg *msg)
>  {
> -    unsigned int offset = msg->data & (INTREMAP_ENTRIES - 1);
> +    unsigned int offset = msg->data & (INTREMAP_MAX_ENTRIES - 1);
>      const struct pci_dev *pdev = msi_desc->dev;
>      u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
>      u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
> @@ -758,7 +774,7 @@ void amd_iommu_read_msi_from_ire(
>          offset |= nr;
>      }
> 
> -    msg->data &= ~(INTREMAP_ENTRIES - 1);
> +    msg->data &= ~(INTREMAP_MAX_ENTRIES - 1);
>      /* The IntType fields match for both formats. */
>      msg->data |= MASK_INSR(entry.ptr32->flds.int_type,
>                             MSI_DATA_DELIVERY_MODE_MASK);
> @@ -824,8 +840,9 @@ void *amd_iommu_alloc_intremap_table(
> 
>      if ( tb )
>      {
> -        *inuse_map = xzalloc_array(unsigned long,
> -                                   BITS_TO_LONGS(INTREMAP_ENTRIES));
> +        unsigned int nr = intremap_table_entries(tb, iommu);
> +
> +        *inuse_map = xzalloc_array(unsigned long, BITS_TO_LONGS(nr));
>          if ( *inuse_map )
>              memset(tb, 0, PAGE_SIZE << order);
>          else
> @@ -869,6 +886,7 @@ bool __init iov_supports_xt(void)
> 
>  int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
>  {
> +    const struct amd_iommu *iommu;
>      spinlock_t *lock;
>      unsigned long flags;
>      int rc = 0;
> @@ -886,12 +904,15 @@ int __init amd_setup_hpet_msi(struct msi
>          return -ENODEV;
>      }
> 
> +    iommu = find_iommu_for_device(hpet_sbdf.seg, hpet_sbdf.bdf);
> +    if ( !iommu )
> +        return -ENXIO;
> +
>      lock = get_intremap_lock(hpet_sbdf.seg, hpet_sbdf.bdf);
>      spin_lock_irqsave(lock, flags);
> 
> -    msi_desc->remap_index = alloc_intremap_entry(hpet_sbdf.seg,
> -                                                 hpet_sbdf.bdf, 1);
> -    if ( msi_desc->remap_index >= INTREMAP_ENTRIES )
> +    msi_desc->remap_index = alloc_intremap_entry(iommu, hpet_sbdf.bdf, 1);
> +    if ( msi_desc->remap_index >= INTREMAP_MAX_ENTRIES )
>      {
>          msi_desc->remap_index = -1;
>          rc = -ENXIO;
> @@ -906,12 +927,12 @@ static void dump_intremap_table(const st
>                                  union irte_cptr tbl,
>                                  const struct ivrs_mappings *ivrs_mapping)
>  {
> -    unsigned int count;
> +    unsigned int count, nr = intremap_table_entries(tbl.ptr, iommu);
> 
>      if ( !tbl.ptr )
>          return;
> 
> -    for ( count = 0; count < INTREMAP_ENTRIES; count++ )
> +    for ( count = 0; count < nr; count++ )
>      {
>          if ( iommu->ctrl.ga_en
>               ? !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1]
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -102,6 +102,8 @@ void *amd_iommu_alloc_intremap_table(
>      const struct amd_iommu *, unsigned long **);
>  int amd_iommu_free_intremap_table(
>      const struct amd_iommu *, struct ivrs_mappings *, uint16_t);
> +unsigned int amd_iommu_intremap_table_order(
> +    const void *irt, const struct amd_iommu *iommu);
>  void amd_iommu_ioapic_update_ire(
>      unsigned int apic, unsigned int reg, unsigned int value);
>  unsigned int amd_iommu_read_ioapic_from_ire(
> 
> 
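
One note for other readers: the caller-side simplification mentioned in
the description comes down to a single sentinel - whatever a table's
actual size, allocation failure is reported as INTREMAP_MAX_ENTRIES, so
callers need just one comparison. A sketch of the resulting pattern (a
hypothetical caller, condensed from the hunks above):

    offset = alloc_intremap_entry(iommu, bdf, nr);
    if ( offset >= INTREMAP_MAX_ENTRIES )  /* also covers undersized tables */
        return -ENOSPC;
    entry = get_intremap_entry(iommu, bdf, offset);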


* Re: [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes Jan Beulich
@ 2019-09-23 16:22   ` Paul Durrant
  2019-09-25 13:40   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 16:22 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:24
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes
> 
> There's no point setting up tables with more space than a PCI device can
> use. For both MSI and MSI-X we can determine how many interrupts could
> be set up at most. Tables allocated during ACPI table parsing, however,
> will (for now at least) continue to be set up to have maximum size.
> 
> Note that until we would want to use sub-page allocations here there's
> no point checking whether both MSI and MSI-X are supported by a device -
> an order-0 allocation will fit the dual case in any event, no matter
> that the MSI-X vector count may be smaller than the MSI one.
> 
> On my Rome system this reduces space needed from just over 1k pages to
> about 125.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
> v6: Don't allocate any IRT at all when the device is neither MSI-X nor
>     MSI-capable. Re-base over changes earlier in this series.
> v5: New.
> 
> ---
>  xen/drivers/passthrough/amd/iommu_acpi.c      |    4 +-
>  xen/drivers/passthrough/amd/iommu_init.c      |   13 ++++-----
>  xen/drivers/passthrough/amd/iommu_intr.c      |   36 +++++++++++++++-----------
>  xen/drivers/passthrough/amd/iommu_map.c       |   20 ++++++++++----
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   18 +++++++------
>  xen/include/asm-x86/hvm/svm/amd-iommu-defs.h  |    3 --
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |    5 ++-
>  7 files changed, 59 insertions(+), 40 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -77,7 +77,7 @@ static void __init add_ivrs_mapping_entr
>          {
>              if ( !shared_intremap_table )
>                  shared_intremap_table = amd_iommu_alloc_intremap_table(
> -                    iommu, &shared_intremap_inuse);
> +                    iommu, &shared_intremap_inuse, 0);
> 
>              if ( !shared_intremap_table )
>                  panic("No memory for shared IRT\n");
> @@ -89,7 +89,7 @@ static void __init add_ivrs_mapping_entr
>          {
>              ivrs_mappings[alias_id].intremap_table =
>                  amd_iommu_alloc_intremap_table(
> -                    iommu, &ivrs_mappings[alias_id].intremap_inuse);
> +                    iommu, &ivrs_mappings[alias_id].intremap_inuse, 0);
> 
>              if ( !ivrs_mappings[alias_id].intremap_table )
>                  panic("No memory for %04x:%02x:%02x.%u's IRT\n",
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1284,12 +1284,14 @@ static int __init amd_iommu_setup_device
>                  pcidevs_unlock();
>              }
> 
> -            if ( pdev )
> +            if ( pdev && (pdev->msix || pdev->msi_maxvec) )
>              {
>                  ivrs_mappings[bdf].intremap_table =
>                      amd_iommu_alloc_intremap_table(
>                          ivrs_mappings[bdf].iommu,
> -                        &ivrs_mappings[bdf].intremap_inuse);
> +                        &ivrs_mappings[bdf].intremap_inuse,
> +                        pdev->msix ? pdev->msix->nr_entries
> +                                   : pdev->msi_maxvec);
>                  if ( !ivrs_mappings[bdf].intremap_table )
>                      return -ENOMEM;
> 
> @@ -1312,11 +1314,8 @@ static int __init amd_iommu_setup_device
>              }
> 
>              amd_iommu_set_intremap_table(
> -                dte,
> -                ivrs_mappings[bdf].intremap_table
> -                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
> -                : 0,
> -                iommu_intremap);
> +                dte, ivrs_mappings[bdf].intremap_table,
> +                ivrs_mappings[bdf].iommu, iommu_intremap);
>          }
>      }
> 
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -69,7 +69,8 @@ union irte_cptr {
>      const union irte128 *ptr128;
>  } __transparent__;
> 
> -#define INTREMAP_MAX_ENTRIES (1 << IOMMU_INTREMAP_ORDER)
> +#define INTREMAP_MAX_ORDER   0xB
> +#define INTREMAP_MAX_ENTRIES (1 << INTREMAP_MAX_ORDER)
> 
>  struct ioapic_sbdf ioapic_sbdf[MAX_IO_APICS];
>  struct hpet_sbdf hpet_sbdf;
> @@ -80,17 +81,13 @@ unsigned int nr_ioapic_sbdf;
> 
>  static void dump_intremap_tables(unsigned char key);
> 
> -static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
> -{
> -    return iommu->ctrl.ga_en
> -           ? get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte128))
> -           : get_order_from_bytes(INTREMAP_MAX_ENTRIES * sizeof(union irte32));
> -}
> +#define intremap_page_order(irt) PFN_ORDER(virt_to_page(irt))
> 
>  unsigned int amd_iommu_intremap_table_order(
>      const void *irt, const struct amd_iommu *iommu)
>  {
> -    return IOMMU_INTREMAP_ORDER;
> +    return intremap_page_order(irt) + PAGE_SHIFT -
> +           (iommu->ctrl.ga_en ? 4 : 2);
>  }
> 
>  static unsigned int intremap_table_entries(
> @@ -825,7 +822,10 @@ int amd_iommu_free_intremap_table(
> 
>      if ( *tblp )
>      {
> -        __free_amd_iommu_tables(*tblp, intremap_table_order(iommu));
> +        unsigned int order = intremap_page_order(*tblp);
> +
> +        intremap_page_order(*tblp) = 0;
> +        __free_amd_iommu_tables(*tblp, order);
>          *tblp = NULL;
>      }
> 
> @@ -833,15 +833,23 @@ int amd_iommu_free_intremap_table(
>  }
> 
>  void *amd_iommu_alloc_intremap_table(
> -    const struct amd_iommu *iommu, unsigned long **inuse_map)
> +    const struct amd_iommu *iommu, unsigned long **inuse_map, unsigned int nr)
>  {
> -    unsigned int order = intremap_table_order(iommu);
> -    void *tb = __alloc_amd_iommu_tables(order);
> +    unsigned int order;
> +    void *tb;
> 
> +    if ( !nr )
> +        nr = INTREMAP_MAX_ENTRIES;
> +
> +    order = iommu->ctrl.ga_en
> +            ? get_order_from_bytes(nr * sizeof(union irte128))
> +            : get_order_from_bytes(nr * sizeof(union irte32));
> +
> +    tb = __alloc_amd_iommu_tables(order);
>      if ( tb )
>      {
> -        unsigned int nr = intremap_table_entries(tb, iommu);
> -
> +        intremap_page_order(tb) = order;
> +        nr = intremap_table_entries(tb, iommu);
>          *inuse_map = xzalloc_array(unsigned long, BITS_TO_LONGS(nr));
>          if ( *inuse_map )
>              memset(tb, 0, PAGE_SIZE << order);
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -113,12 +113,22 @@ void amd_iommu_set_root_page_table(struc
>  }
> 
>  void __init amd_iommu_set_intremap_table(
> -    struct amd_iommu_dte *dte, uint64_t intremap_ptr, bool valid)
> +    struct amd_iommu_dte *dte, const void *ptr,
> +    const struct amd_iommu *iommu, bool valid)
>  {
> -    dte->it_root = intremap_ptr >> 6;
> -    dte->int_tab_len = intremap_ptr ? IOMMU_INTREMAP_ORDER : 0;
> -    dte->int_ctl = intremap_ptr ? IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED
> -                                : IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
> +    if ( ptr )
> +    {
> +        dte->it_root = virt_to_maddr(ptr) >> 6;
> +        dte->int_tab_len = amd_iommu_intremap_table_order(ptr, iommu);
> +        dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED;
> +    }
> +    else
> +    {
> +        dte->it_root = 0;
> +        dte->int_tab_len = 0;
> +        dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
> +    }
> +
>      dte->ig = false; /* unmapped interrupts result in i/o page faults */
>      dte->iv = valid;
>  }
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -470,18 +470,22 @@ static int amd_iommu_add_device(u8 devfn
>      {
>          unsigned long flags;
> 
> -        ivrs_mappings[bdf].intremap_table =
> -            amd_iommu_alloc_intremap_table(
> -                iommu, &ivrs_mappings[bdf].intremap_inuse);
> -        if ( !ivrs_mappings[bdf].intremap_table )
> -            return -ENOMEM;
> +        if ( pdev->msix || pdev->msi_maxvec )
> +        {
> +            ivrs_mappings[bdf].intremap_table =
> +                amd_iommu_alloc_intremap_table(
> +                    iommu, &ivrs_mappings[bdf].intremap_inuse,
> +                    pdev->msix ? pdev->msix->nr_entries
> +                               : pdev->msi_maxvec);
> +            if ( !ivrs_mappings[bdf].intremap_table )
> +                return -ENOMEM;
> +        }
> 
>          spin_lock_irqsave(&iommu->lock, flags);
> 
>          amd_iommu_set_intremap_table(
>              iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
> -            virt_to_maddr(ivrs_mappings[bdf].intremap_table),
> -            iommu_intremap);
> +            ivrs_mappings[bdf].intremap_table, iommu, iommu_intremap);
> 
>          amd_iommu_flush_device(iommu, bdf);
> 
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -107,9 +107,6 @@
>  #define IOMMU_DEV_TABLE_INT_CONTROL_FORWARDED	0x1
>  #define IOMMU_DEV_TABLE_INT_CONTROL_TRANSLATED	0x2
> 
> -/* For now, we always allocate the maximum: 2048 entries. */
> -#define IOMMU_INTREMAP_ORDER			0xB
> -
>  struct amd_iommu_dte {
>      /* 0 - 63 */
>      bool v:1;
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
> @@ -72,7 +72,8 @@ int __must_check amd_iommu_flush_iotlb_a
>  /* device table functions */
>  int get_dma_requestor_id(uint16_t seg, uint16_t bdf);
>  void amd_iommu_set_intremap_table(struct amd_iommu_dte *dte,
> -                                  uint64_t intremap_ptr,
> +                                  const void *ptr,
> +                                  const struct amd_iommu *iommu,
>                                    bool valid);
>  void amd_iommu_set_root_page_table(struct amd_iommu_dte *dte,
>  				   uint64_t root_ptr, uint16_t domain_id,
> @@ -99,7 +100,7 @@ struct amd_iommu *find_iommu_for_device(
>  bool iov_supports_xt(void);
>  int amd_iommu_setup_ioapic_remapping(void);
>  void *amd_iommu_alloc_intremap_table(
> -    const struct amd_iommu *, unsigned long **);
> +    const struct amd_iommu *, unsigned long **, unsigned int nr);
>  int amd_iommu_free_intremap_table(
>      const struct amd_iommu *, struct ivrs_mappings *, uint16_t);
>  unsigned int amd_iommu_intremap_table_order(
> 
> 
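
To make the sizing concrete (my own numbers, not taken from the patch):
in GA mode an IRTE is 16 bytes, so e.g. a device advertising up to 32
MSI vectors needs only 32 * sizeof(union irte128) = 512 bytes.
Allocations remain page-granular, hence:

    order = get_order_from_bytes(32 * sizeof(union irte128)); /* 512 bytes -> 0 */
    len = order + PAGE_SHIFT - 4;    /* 0 + 12 - 4 = 8, i.e. DTE int_tab_len */
    /* one page, 2^8 = 256 entries, vs the old fixed 8 pages / 2048 entries */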


* Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings Jan Beulich
@ 2019-09-23 16:25   ` Paul Durrant
  2019-09-24  9:01     ` Jan Beulich
  2019-09-25 13:41   ` Andrew Cooper
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 16:25 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:24
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
> 
> Move the device flags field up into an unused hole, thus shrinking
> overall structure size by 8 bytes. Use bool and uint<N>_t as
> appropriate. Drop pointless (redundant) initializations.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

...although I wonder...

> ---
> v6: New.
> 
> ---
>  xen/drivers/passthrough/amd/iommu_acpi.c |    6 +++---
>  xen/drivers/passthrough/amd/iommu_init.c |    6 ------
>  xen/include/asm-x86/amd-iommu.h          |   17 +++++++++--------
>  3 files changed, 12 insertions(+), 17 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -165,7 +165,7 @@ static void __init reserve_unity_map_for
>      /* extend r/w permissioms and keep aggregate */
>      ivrs_mappings[bdf].write_permission = iw;
>      ivrs_mappings[bdf].read_permission = ir;
> -    ivrs_mappings[bdf].unity_map_enable = IOMMU_CONTROL_ENABLED;
> +    ivrs_mappings[bdf].unity_map_enable = true;
>      ivrs_mappings[bdf].addr_range_start = base;
>      ivrs_mappings[bdf].addr_range_length = length;
>  }
> @@ -242,8 +242,8 @@ static int __init register_exclusion_ran
>      if ( limit >= iommu_top  )
>      {
>          reserve_iommu_exclusion_range(iommu, base, limit);
> -        ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_ENABLED;
> -        ivrs_mappings[req].dte_allow_exclusion = IOMMU_CONTROL_ENABLED;
> +        ivrs_mappings[bdf].dte_allow_exclusion = true;
> +        ivrs_mappings[req].dte_allow_exclusion = true;
>      }
> 
>      return 0;
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1222,12 +1222,6 @@ static int __init alloc_ivrs_mappings(u1
>      for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
>      {
>          ivrs_mappings[bdf].dte_requestor_id = bdf;
> -        ivrs_mappings[bdf].dte_allow_exclusion = IOMMU_CONTROL_DISABLED;
> -        ivrs_mappings[bdf].unity_map_enable = IOMMU_CONTROL_DISABLED;
> -        ivrs_mappings[bdf].iommu = NULL;
> -
> -        ivrs_mappings[bdf].intremap_table = NULL;
> -        ivrs_mappings[bdf].device_flags = 0;
> 
>          if ( amd_iommu_perdev_intremap )
>              spin_lock_init(&ivrs_mappings[bdf].intremap_lock);
> --- a/xen/include/asm-x86/amd-iommu.h
> +++ b/xen/include/asm-x86/amd-iommu.h
> @@ -106,12 +106,16 @@ struct amd_iommu {
>  };
> 
>  struct ivrs_mappings {
> -    u16 dte_requestor_id;
> -    u8 dte_allow_exclusion;
> -    u8 unity_map_enable;
> -    u8 write_permission;
> -    u8 read_permission;
> +    uint16_t dte_requestor_id;
>      bool valid;
> +    bool dte_allow_exclusion;
> +    bool unity_map_enable;
> +    bool write_permission;
> +    bool read_permission;

Could you shrink this even more by using a bit-field instead of this sequence of bools?

> +
> +    /* ivhd device data settings */
> +    uint8_t device_flags;
> +
>      unsigned long addr_range_start;
>      unsigned long addr_range_length;
>      struct amd_iommu *iommu;
> @@ -120,9 +124,6 @@ struct ivrs_mappings {
>      void *intremap_table;
>      unsigned long *intremap_inuse;
>      spinlock_t intremap_lock;
> -
> -    /* ivhd device data settings */
> -    u8 device_flags;
>  };
> 
>  extern unsigned int ivrs_bdf_entries;
> 
> 
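
For what it's worth, the 8-byte saving can be seen by simply counting
offsets on a 64-bit build (my arithmetic, assuming 1-byte bools and
natural alignment):

    /* before:                             after:
     * [0:1]  dte_requestor_id             [0:1] dte_requestor_id
     * [2:5]  four u8 flags                [2:6] five bools
     * [6]    valid, [7] padding           [7]   device_flags
     * ...                                 ...   (no tail padding needed)
     * [last] device_flags + 7 pad bytes
     */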


* Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment Jan Beulich
@ 2019-09-23 16:30   ` Paul Durrant
  2019-09-24  9:10     ` Jan Beulich
  2019-09-25 14:19   ` Andrew Cooper
  1 sibling, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 16:30 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:25
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
> 
> Having a single device table for all segments can't possibly be right.

The copy of the spec. I have says (on page 253: Fixed-Length IVHD Blocks) that IVHD entries must have a segment group of 0, so can't the code just require iommu->seg == 0?

  Paul

> (Even worse, the symbol wasn't static despite being used in just one
> source file.) Attach the device tables to their respective IVRS mapping
> ones.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v6: New.
> 
> ---
>  xen/drivers/passthrough/amd/iommu_init.c |   81 ++++++++++++++++---------------
>  1 file changed, 43 insertions(+), 38 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -39,7 +39,6 @@ unsigned int __read_mostly ivrs_bdf_entr
>  u8 __read_mostly ivhd_type;
>  static struct radix_tree_root ivrs_maps;
>  LIST_HEAD_READ_MOSTLY(amd_iommu_head);
> -struct table_struct device_table;
>  bool_t iommuv2_enabled;
> 
>  static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
> @@ -989,6 +988,12 @@ static void disable_iommu(struct amd_iom
>      spin_unlock_irqrestore(&iommu->lock, flags);
>  }
> 
> +static unsigned int __init dt_alloc_size(void)
> +{
> +    return PAGE_SIZE << get_order_from_bytes(ivrs_bdf_entries *
> +                                             IOMMU_DEV_TABLE_ENTRY_SIZE);
> +}
> +
>  static void __init deallocate_buffer(void *buf, uint32_t sz)
>  {
>      int order = 0;
> @@ -999,12 +1004,6 @@ static void __init deallocate_buffer(voi
>      }
>  }
> 
> -static void __init deallocate_device_table(struct table_struct *table)
> -{
> -    deallocate_buffer(table->buffer, table->alloc_size);
> -    table->buffer = NULL;
> -}
> -
>  static void __init deallocate_ring_buffer(struct ring_buffer *ring_buf)
>  {
>      deallocate_buffer(ring_buf->buffer, ring_buf->alloc_size);
> @@ -1068,8 +1067,29 @@ static void * __init allocate_ppr_log(st
>                                  IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
>  }
> 
> +/*
> + * Within ivrs_mappings[] we allocate an extra array element to store
> + * - segment number,
> + * - device table.
> + */
> +#define IVRS_MAPPINGS_SEG(m) (m)[ivrs_bdf_entries].dte_requestor_id
> +#define IVRS_MAPPINGS_DEVTAB(m) (m)[ivrs_bdf_entries].intremap_table
> +
> +static void __init free_ivrs_mapping(void *ptr)
> +{
> +    const struct ivrs_mappings *ivrs_mappings = ptr;
> +
> +    if ( IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
> +        deallocate_buffer(IVRS_MAPPINGS_DEVTAB(ivrs_mappings),
> +                          dt_alloc_size());
> +
> +    xfree(ptr);
> +}
> +
>  static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
>  {
> +    const struct ivrs_mappings *ivrs_mappings;
> +
>      if ( allocate_cmd_buffer(iommu) == NULL )
>          goto error_out;
> 
> @@ -1082,13 +1102,15 @@ static int __init amd_iommu_init_one(str
>      if ( intr && !set_iommu_interrupt_handler(iommu) )
>          goto error_out;
> 
> -    /* To make sure that device_table.buffer has been successfully allocated */
> -    if ( device_table.buffer == NULL )
> +    /* Make sure that the device table has been successfully allocated. */
> +    ivrs_mappings = get_ivrs_mappings(iommu->seg);
> +    if ( !IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
>          goto error_out;
> 
> -    iommu->dev_table.alloc_size = device_table.alloc_size;
> -    iommu->dev_table.entries = device_table.entries;
> -    iommu->dev_table.buffer = device_table.buffer;
> +    iommu->dev_table.alloc_size = dt_alloc_size();
> +    iommu->dev_table.entries = iommu->dev_table.alloc_size /
> +                               IOMMU_DEV_TABLE_ENTRY_SIZE;
> +    iommu->dev_table.buffer = IVRS_MAPPINGS_DEVTAB(ivrs_mappings);
> 
>      enable_iommu(iommu);
>      printk("AMD-Vi: IOMMU %d Enabled.\n", nr_amd_iommus );
> @@ -1135,11 +1157,8 @@ static void __init amd_iommu_init_cleanu
>          xfree(iommu);
>      }
> 
> -    /* free device table */
> -    deallocate_device_table(&device_table);
> -
> -    /* free ivrs_mappings[] */
> -    radix_tree_destroy(&ivrs_maps, xfree);
> +    /* Free ivrs_mappings[] and their device tables. */
> +    radix_tree_destroy(&ivrs_maps, free_ivrs_mapping);
> 
>      iommu_enabled = 0;
>      iommu_hwdom_passthrough = false;
> @@ -1147,12 +1166,6 @@ static void __init amd_iommu_init_cleanu
>      iommuv2_enabled = 0;
>  }
> 
> -/*
> - * We allocate an extra array element to store the segment number
> - * (and in the future perhaps other global information).
> - */
> -#define IVRS_MAPPINGS_SEG(m) m[ivrs_bdf_entries].dte_requestor_id
> -
>  struct ivrs_mappings *get_ivrs_mappings(u16 seg)
>  {
>      return radix_tree_lookup(&ivrs_maps, seg);
> @@ -1235,24 +1248,18 @@ static int __init alloc_ivrs_mappings(u1
>  static int __init amd_iommu_setup_device_table(
>      u16 seg, struct ivrs_mappings *ivrs_mappings)
>  {
> +    struct amd_iommu_dte *dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings);
>      unsigned int bdf;
> 
>      BUG_ON( (ivrs_bdf_entries == 0) );
> 
> -    if ( !device_table.buffer )
> +    if ( !dt )
>      {
>          /* allocate 'device table' on a 4K boundary */
> -        device_table.alloc_size = PAGE_SIZE <<
> -                                  get_order_from_bytes(
> -                                  PAGE_ALIGN(ivrs_bdf_entries *
> -                                  IOMMU_DEV_TABLE_ENTRY_SIZE));
> -        device_table.entries = device_table.alloc_size /
> -                               IOMMU_DEV_TABLE_ENTRY_SIZE;
> -
> -        device_table.buffer = allocate_buffer(device_table.alloc_size,
> -                                              "Device Table");
> +        dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings) =
> +            allocate_buffer(dt_alloc_size(), "Device Table");
>      }
> -    if ( !device_table.buffer )
> +    if ( !dt )
>          return -ENOMEM;
> 
>      /* Add device table entries */
> @@ -1260,12 +1267,10 @@ static int __init amd_iommu_setup_device
>      {
>          if ( ivrs_mappings[bdf].valid )
>          {
> -            void *dte;
>              const struct pci_dev *pdev = NULL;
> 
>              /* add device table entry */
> -            dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
> -            iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
> +            iommu_dte_add_device_entry(&dt[bdf], &ivrs_mappings[bdf]);
> 
>              if ( iommu_intremap &&
>                   ivrs_mappings[bdf].dte_requestor_id == bdf &&
> @@ -1308,7 +1313,7 @@ static int __init amd_iommu_setup_device
>              }
> 
>              amd_iommu_set_intremap_table(
> -                dte, ivrs_mappings[bdf].intremap_table,
> +                &dt[bdf], ivrs_mappings[bdf].intremap_table,
>                  ivrs_mappings[bdf].iommu, iommu_intremap);
>          }
>      }
> 
> 
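
Spelling out the lookup path this results in (a sketch; with a full
segment ivrs_bdf_entries can reach 0x10000, making each device table
0x10000 * 32 bytes = 2MiB):

    struct ivrs_mappings *m = get_ivrs_mappings(iommu->seg);  /* radix tree lookup */
    struct amd_iommu_dte *dt = IVRS_MAPPINGS_DEVTAB(m);       /* per-segment table */
    struct amd_iommu_dte *dte = &dt[bdf];                     /* 32-byte entries */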


* Re: [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation
  2019-09-19 13:25 ` [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation Jan Beulich
@ 2019-09-23 16:33   ` Paul Durrant
  2019-09-25 14:59   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2019-09-23 16:33 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> Sent: 19 September 2019 14:25
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Subject: [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation
> 
> Make sure we don't leave any DTEs through which unexpected requests
> would be passed untranslated. Set V and IV right away (with
> all other fields left as zero), relying on the V and/or IV bits
> getting cleared only by amd_iommu_set_root_page_table() and
> amd_iommu_set_intremap_table() under special pass-through circumstances.
> Switch back to initial settings in amd_iommu_disable_domain_device().
> 
> Take the liberty and also make the latter function static, constifying
> its first parameter at the same time.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
> v6: New.
> 
> ---
>  xen/drivers/passthrough/amd/iommu_init.c    |   22 +++++++++++++++++++---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |   20 ++++++++++++++++----
>  2 files changed, 35 insertions(+), 7 deletions(-)
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1255,12 +1255,28 @@ static int __init amd_iommu_setup_device
> 
>      if ( !dt )
>      {
> +        unsigned int size = dt_alloc_size();
> +
>          /* allocate 'device table' on a 4K boundary */
>          dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings) =
> -            allocate_buffer(dt_alloc_size(), "Device Table");
> +            allocate_buffer(size, "Device Table");
> +        if ( !dt )
> +            return -ENOMEM;
> +
> +        /*
> +         * Prefill every DTE such that all kinds of requests will get aborted.
> +         * Besides the two bits set to true below this builds upon
> +         * IOMMU_DEV_TABLE_SYS_MGT_DMA_ABORTED,
> +         * IOMMU_DEV_TABLE_IO_CONTROL_ABORTED, as well as
> +         * IOMMU_DEV_TABLE_INT_CONTROL_ABORTED all being zero, and us also
> +         * wanting at least TV, GV, I, and EX set to false.
> +         */
> +        for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
> +        {
> +            dt[bdf].v = true;
> +            dt[bdf].iv = true;
> +        }
>      }
> -    if ( !dt )
> -        return -ENOMEM;
> 
>      /* Add device table entries */
>      for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -267,9 +267,9 @@ static void __hwdom_init amd_iommu_hwdom
>      setup_hwdom_pci_devices(d, amd_iommu_add_device);
>  }
> 
> -void amd_iommu_disable_domain_device(struct domain *domain,
> -                                     struct amd_iommu *iommu,
> -                                     u8 devfn, struct pci_dev *pdev)
> +static void amd_iommu_disable_domain_device(const struct domain *domain,
> +                                            struct amd_iommu *iommu,
> +                                            uint8_t devfn, struct pci_dev *pdev)
>  {
>      struct amd_iommu_dte *table, *dte;
>      unsigned long flags;
> @@ -284,9 +284,21 @@ void amd_iommu_disable_domain_device(str
>      spin_lock_irqsave(&iommu->lock, flags);
>      if ( dte->tv || dte->v )
>      {
> +        /* See the comment in amd_iommu_setup_device_table(). */
> +        dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
> +        smp_wmb();
> +        dte->iv = true;
>          dte->tv = false;
> -        dte->v = false;
> +        dte->gv = false;
>          dte->i = false;
> +        dte->ex = false;
> +        dte->sa = false;
> +        dte->se = false;
> +        dte->sd = false;
> +        dte->sys_mgt = IOMMU_DEV_TABLE_SYS_MGT_DMA_ABORTED;
> +        dte->ioctl = IOMMU_DEV_TABLE_IO_CONTROL_ABORTED;
> +        smp_wmb();
> +        dte->v = true;
> 
>          amd_iommu_flush_device(iommu, req_id);
> 
> 
> 
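
The ordering, as I read it (an annotated sketch of the hunk above, not
new logic):

    dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
    smp_wmb();       /* abort setting visible before IV re-enables ... */
    dte->iv = true;  /* ... interpretation of the interrupt fields */
    /* tv/gv/i/ex/sa/se/sd cleared, sys_mgt/ioctl set to ABORTED */
    smp_wmb();       /* full abort configuration visible ... */
    dte->v = true;   /* ... before the DTE is marked valid again */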


* Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-23 16:25   ` Paul Durrant
@ 2019-09-24  9:01     ` Jan Beulich
  2019-09-24  9:08       ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2019-09-24  9:01 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Suravee Suthikulpanit, Andrew Cooper

On 23.09.2019 18:25, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>> Sent: 19 September 2019 14:24
>> To: xen-devel@lists.xenproject.org
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> Subject: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
>>
>> Move the device flags field up into an unused hole, thus shrinking
>> overall structure size by 8 bytes. Use bool and uint<N>_t as
>> appropriate. Drop pointless (redundant) initializations.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

Thanks.

> ...although I wonder...
> 
>> --- a/xen/include/asm-x86/amd-iommu.h
>> +++ b/xen/include/asm-x86/amd-iommu.h
>> @@ -106,12 +106,16 @@ struct amd_iommu {
>>  };
>>
>>  struct ivrs_mappings {
>> -    u16 dte_requestor_id;
>> -    u8 dte_allow_exclusion;
>> -    u8 unity_map_enable;
>> -    u8 write_permission;
>> -    u8 read_permission;
>> +    uint16_t dte_requestor_id;
>>      bool valid;
>> +    bool dte_allow_exclusion;
>> +    bool unity_map_enable;
>> +    bool write_permission;
>> +    bool read_permission;
> 
> Could you shrink this even more by using a bit-field instead of this sequence of bools?

Indeed I had been considering this. Besides the fact that making
such a move simply didn't look to fit other things here very well
when introducing the "valid" flag in an earlier path, and then
also not here, do you realize though that this wouldn't shrink
the structure's size right now (i.e. it would only be potentially
reducing future growth)? This was my main argument against going
this further step; let me know what you think.

Jan



* Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-24  9:01     ` Jan Beulich
@ 2019-09-24  9:08       ` Paul Durrant
  2019-09-24  9:13         ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2019-09-24  9:08 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel, Suravee Suthikulpanit, Andrew Cooper



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 24 September 2019 10:02
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: xen-devel@lists.xenproject.org; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>
> Subject: Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
> 
> On 23.09.2019 18:25, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >> Sent: 19 September 2019 14:24
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>
> >> Subject: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
> >>
> >> Move the device flags field up into an unused hole, thus shrinking
> >> overall structure size by 8 bytes. Use bool and uint<N>_t as
> >> appropriate. Drop pointless (redundant) initializations.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> 
> Thanks.
> 
> > ...although I wonder...
> >
> >> --- a/xen/include/asm-x86/amd-iommu.h
> >> +++ b/xen/include/asm-x86/amd-iommu.h
> >> @@ -106,12 +106,16 @@ struct amd_iommu {
> >>  };
> >>
> >>  struct ivrs_mappings {
> >> -    u16 dte_requestor_id;
> >> -    u8 dte_allow_exclusion;
> >> -    u8 unity_map_enable;
> >> -    u8 write_permission;
> >> -    u8 read_permission;
> >> +    uint16_t dte_requestor_id;
> >>      bool valid;
> >> +    bool dte_allow_exclusion;
> >> +    bool unity_map_enable;
> >> +    bool write_permission;
> >> +    bool read_permission;
> >
> > Could you shrink this even more by using a bit-field instead of this sequence of bools?
> 
> Indeed I had been considering this. Besides the fact that making
> such a move simply didn't look to fit other things here very well
> when introducing the "valid" flag in an earlier path, and then
> also not here, do you realize though that this wouldn't shrink
> the structure's size right now (i.e. it would only be potentially
> reducing future growth)?

Yes, I'd failed to note the 'unsigned long' afterwards, but...

> This was my main argument against going
> this further step; let me know what you think.
> 

I still think a pre-emptive squash into a uint8_t bit-field followed by the device_flags field would give the struct a nice 32-bit hole for potential future use.
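
To make that concrete, the layout I have in mind would be along these
lines (a sketch only, field names as in your patch):

    struct ivrs_mappings {
        uint16_t dte_requestor_id;
        bool valid:1;
        bool dte_allow_exclusion:1;
        bool unity_map_enable:1;
        bool write_permission:1;
        bool read_permission:1;
        /* ivhd device data settings */
        uint8_t device_flags;
        /* 32-bit hole here, available for future use */
        unsigned long addr_range_start;
        unsigned long addr_range_length;
        struct amd_iommu *iommu;
        /* per-device interrupt remapping table */
        void *intremap_table;
        unsigned long *intremap_inuse;
        spinlock_t intremap_lock;
    };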

  Paul

> Jan


* Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-23 16:30   ` Paul Durrant
@ 2019-09-24  9:10     ` Jan Beulich
  2019-09-24  9:20       ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2019-09-24  9:10 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

On 23.09.2019 18:30, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>> Sent: 19 September 2019 14:25
>> To: xen-devel@lists.xenproject.org
>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> Subject: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
>>
>> Having a single device table for all segments can't possibly be right.
> 
> The copy of the spec. I have says (on page 253: Fixed-Length IVHD
> Blocks) that IVHD entries must have a segment group of 0, so can't
> the code just require iommu->seg == 0?

The wording in my version is "At this time, only PCI Segment Group 0 is
supported." This suggests to me that it is not a good idea to have logic
baked in that depends on this remaining true. I realize though that there
are more places than just this one where we (have to) assume segment 0
(all in iommu_acpi.c, and all marked with an XXX comment).

Jan



* Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-24  9:08       ` Paul Durrant
@ 2019-09-24  9:13         ` Jan Beulich
  2019-09-24  9:18           ` Paul Durrant
  0 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2019-09-24  9:13 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Suravee Suthikulpanit, xen-devel

On 24.09.2019 11:08, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 24 September 2019 10:02
>>
>> On 23.09.2019 18:25, Paul Durrant wrote:
>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>>>> Sent: 19 September 2019 14:24
>>>>
>>>> --- a/xen/include/asm-x86/amd-iommu.h
>>>> +++ b/xen/include/asm-x86/amd-iommu.h
>>>> @@ -106,12 +106,16 @@ struct amd_iommu {
>>>>  };
>>>>
>>>>  struct ivrs_mappings {
>>>> -    u16 dte_requestor_id;
>>>> -    u8 dte_allow_exclusion;
>>>> -    u8 unity_map_enable;
>>>> -    u8 write_permission;
>>>> -    u8 read_permission;
>>>> +    uint16_t dte_requestor_id;
>>>>      bool valid;
>>>> +    bool dte_allow_exclusion;
>>>> +    bool unity_map_enable;
>>>> +    bool write_permission;
>>>> +    bool read_permission;
>>>
>>> Could you shrink this even more by using a bit-field instead of this sequence of bools?
>>
>> Indeed I had been considering this. Besides the fact that making
>> such a move simply didn't look to fit other things here very well
>> when introducing the "valid" flag in an earlier path, and then
>> also not here, do you realize though that this wouldn't shrink
>> the structure's size right now (i.e. it would only be potentially
>> reducing future growth)?
> 
> Yes, I'd failed to note the 'unsigned long' afterwards, but...
> 
>> This was my main argument against going
>> this further step; let me know what you think.
>>
> 
> I still think a pre-emptive squash into a uint8_t bit-field followed
> by the device_flags field would give the struct a nice 32-bit hole
> for potential future use.

Okay, will do then. I take it your R-b can remain in place with this
extra change.

Jan



* Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-24  9:13         ` Jan Beulich
@ 2019-09-24  9:18           ` Paul Durrant
  0 siblings, 0 replies; 39+ messages in thread
From: Paul Durrant @ 2019-09-24  9:18 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: Andrew Cooper, Suravee Suthikulpanit, xen-devel

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 24 September 2019 10:13
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: SuraveeSuthikulpanit <suravee.suthikulpanit@amd.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>;
> xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
> 
> On 24.09.2019 11:08, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 24 September 2019 10:02
> >>
> >> On 23.09.2019 18:25, Paul Durrant wrote:
> >>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >>>> Sent: 19 September 2019 14:24
> >>>>
> >>>> --- a/xen/include/asm-x86/amd-iommu.h
> >>>> +++ b/xen/include/asm-x86/amd-iommu.h
> >>>> @@ -106,12 +106,16 @@ struct amd_iommu {
> >>>>  };
> >>>>
> >>>>  struct ivrs_mappings {
> >>>> -    u16 dte_requestor_id;
> >>>> -    u8 dte_allow_exclusion;
> >>>> -    u8 unity_map_enable;
> >>>> -    u8 write_permission;
> >>>> -    u8 read_permission;
> >>>> +    uint16_t dte_requestor_id;
> >>>>      bool valid;
> >>>> +    bool dte_allow_exclusion;
> >>>> +    bool unity_map_enable;
> >>>> +    bool write_permission;
> >>>> +    bool read_permission;
> >>>
> >>> Could you shrink this even more by using a bit-field instead of this sequence of bools?
> >>
> >> Indeed I had been considering this. Besides the fact that making
> >> such a move simply didn't look to fit other things here very well
> >> when introducing the "valid" flag in an earlier path, and then
> >> also not here, do you realize though that this wouldn't shrink
> >> the structure's size right now (i.e. it would only be potentially
> >> reducing future growth)?
> >
> > Yes, I'd failed to note the 'unsigned long' afterwards, but...
> >
> >> This was my main argument against going
> >> this further step; let me know what you think.
> >>
> >
> > I still think a pre-emptive squash into a uint8_t bit-field followed
> > by the device_flags field would give the struct a nice 32-bit hole
> > for potential future use.
> 
> Okay, will do then. I take it your R-b can remain in place with this
> extra change.

Absolutely. Thanks,

  Paul

> 
> Jan

* Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-24  9:10     ` Jan Beulich
@ 2019-09-24  9:20       ` Paul Durrant
  2019-09-24  9:30         ` Jan Beulich
  0 siblings, 1 reply; 39+ messages in thread
From: Paul Durrant @ 2019-09-24  9:20 UTC (permalink / raw)
  To: 'Jan Beulich', xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 24 September 2019 10:10
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>
> Subject: Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
> 
> On 23.09.2019 18:30, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
> >> Sent: 19 September 2019 14:25
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>
> >> Subject: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
> >>
> >> Having a single device table for all segments can't possibly be right.
> >
> > The copy of the spec. I have says (on page 253: Fixed-Length IVHD
> > Blocks) that IVHD entries must have a segment group of 0, so can't
> > the code just require iommu->seg == 0?
> 
> The wording in my version is "At this time, only PCI Segment Group 0 is
> supported." This suggests to me that it is not a good idea to have logic
> baked in that depends on this remaining true. I realize though that there
> are more places than just this one where we (have to) assume segment 0
> (all in iommu_acpi.c, and all marked with an XXX comment).
> 

Ok. Fair enough. I just wasn't sure it was worth doing this change at the moment; but it doesn't hurt, so you can add my R-b.

  Paul

> Jan

* Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-24  9:20       ` Paul Durrant
@ 2019-09-24  9:30         ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-24  9:30 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Andrew Cooper, Suravee Suthikulpanit

On 24.09.2019 11:20, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 24 September 2019 10:10
>> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
>> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>
>> Subject: Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
>>
>> On 23.09.2019 18:30, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Jan Beulich
>>>> Sent: 19 September 2019 14:25
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Suravee Suthikulpanit
>> <suravee.suthikulpanit@amd.com>
>>>> Subject: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
>>>>
>>>> Having a single device table for all segments can't possibly be right.
>>>
>>> The copy of the spec. I have says (on page 253: Fixed-Length IVHD
>>> Blocks) that IVHD entries must have a segment group of 0, so can't
>>> the code just require iommu->seg == 0?
>>
>> The wording in my version is "At this time, only PCI Segment Group 0 is
>> supported." This suggests to me that it is not a good idea to have logic
>> baked in that depends on this remaining true. I realize though that there
>> are more places than just this one where we (have to) assume segment 0
>> (all in iommu_acpi.c, and all marked with an XXX comment).
>>
> 
> Ok. Fair enough. I just wasn't sure it was worth doing this change at the
> moment;

The thing that I found really disgusting was the global (i.e. not even
static) variable "device_table". And simply making it static would still
have left things in too ugly a state for my taste.

> but it doesn't hurt, so you can add my R-b.

Thanks.

Jan


* Re: [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables
  2019-09-19 13:21 ` [Xen-devel] [PATCH v6 1/8] AMD/IOMMU: don't blindly allocate interrupt remapping tables Jan Beulich
  2019-09-23 15:27   ` Paul Durrant
@ 2019-09-25 13:23   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 13:23 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Suravee Suthikulpanit

On 19/09/2019 14:21, Jan Beulich wrote:
> ACPI tables are free to list far more device coordinates than there are
> actual devices. By delaying the table allocations for most cases, and
> doing them only when an actual device is known to be present at a given
> position, overall memory used for the tables goes down from over 500k
> pages to just over 1k (on my system having such ACPI table contents).
>
> Tables continue to get allocated right away for special entries
> (IO-APIC, HPET) as well as for alias IDs. While in the former case
> that's simply because there may not be any device at a given position,
> in the latter case this is to avoid having to introduce ref-counting of
> table usage.
>
> The change involves invoking
> iterate_ivrs_mappings(amd_iommu_setup_device_table) a second time,
> because the function now wants to be able to find PCI devices, which
> isn't possible yet when IOMMU setup happens very early during x2APIC
> mode setup. In this context amd_iommu_init_interrupt() gets renamed as
> well.
>
> The logic adjusting a DTE's interrupt remapping attributes also gets
> changed, such that the lack of an IRT would result in target aborted
> rather than not remapped interrupts (should any occur).

non-remapped or un-remapped would be slightly less awkward grammar here.

> Note that for now phantom functions get separate IRTs allocated, as was
> the case before.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-19 13:22 ` [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early Jan Beulich
  2019-09-23 14:22   ` Roger Pau Monné
  2019-09-23 15:57   ` Paul Durrant
@ 2019-09-25 13:26   ` Andrew Cooper
  2 siblings, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 13:26 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Wei Liu, Suravee Suthikulpanit, Roger Pau Monné

On 19/09/2019 14:22, Jan Beulich wrote:
> Rather than doing this every time we set up interrupts for a device
> anew (and then in several places) fill this invariant field right after
> allocating struct pci_dev.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
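
For reference, the pattern the description refers to -- reading the MSI
"Multiple Message Capable" field once, at struct pci_dev allocation --
looks roughly like this (a sketch using existing Xen helpers; the exact
placement inside alloc_pdev() is an assumption):

    unsigned int pos = pci_find_cap_offset(pdev->seg, pdev->bus,
                                           PCI_SLOT(pdev->devfn),
                                           PCI_FUNC(pdev->devfn),
                                           PCI_CAP_ID_MSI);

    if ( pos )
    {
        uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));

        pdev->msi_maxvec = multi_msi_capable(ctrl);
    }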


* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-23 14:41     ` Jan Beulich
  2019-09-23 15:18       ` Roger Pau Monné
@ 2019-09-25 13:31       ` Andrew Cooper
  2019-09-25 13:34         ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 13:31 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, Wei Liu, Suravee Suthikulpanit

On 23/09/2019 15:41, Jan Beulich wrote:
> On 23.09.2019 16:22, Roger Pau Monné wrote:
>> On Thu, Sep 19, 2019 at 03:22:54PM +0200, Jan Beulich wrote:
>>> Rather than doing this every time we set up interrupts for a device
>>> anew (and then in several places) fill this invariant field right after
>>> allocating struct pci_dev.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> LGTM:
>>
>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> Thanks.
>
>> Just one nit below.
>>
>>> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
>>>  
>>>          /* All MSIs are unmasked by default, Mask them all */
>>>          maskbits = pci_conf_read32(dev->sbdf, mpos);
>>> -        maskbits |= ~(u32)0 >> (32 - maxvec);
>>> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
>> GENMASK would be slightly easier to parse IMO (here and below).
> Besides this being an unrelated change, I'm afraid I'm going to
> object to use of a macro where it's unclear what its parameters
> mean: Even the example in xen/bitops.h is so confusing that I
> can't tell whether "h" is meant to be exclusive or inclusive
> (looks like the latter is intended). To me the two parameters
> also look reversed - I'd expect "low" first and "high" second.
> (ISTR having voiced reservations against its introduction, and
> I'm happy to see that it's used in Arm code only.)

Furthermore, Linux is having enough problems with it that they were
considering abolishing it entirely.

Getting the two numbers the wrong way around gets you a mask of 0.  It
is a very unsafe macro.
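
To make the hazard concrete, a minimal sketch (GENMASK32 here stands in
for the inclusive-bounds style of macro under discussion; it is not an
existing Xen macro):

    #include <assert.h>
    #include <stdint.h>

    /* Inclusive high/low bit positions, as in the style being debated. */
    #define GENMASK32(h, l) ((~(uint32_t)0 >> (31 - (h))) & (~(uint32_t)0 << (l)))

    void demo(unsigned int maxvec) /* maxvec in [1, 32] */
    {
        /* The open-coded form from the patch: low "maxvec" bits set. */
        uint32_t mask = ~(uint32_t)0 >> (32 - maxvec);

        assert(mask == GENMASK32(maxvec - 1, 0));   /* intended order */
        if ( maxvec > 1 )
            assert(GENMASK32(0, maxvec - 1) == 0);  /* swapped: silently 0 */
    }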

-1 to any use in Xen, even in the ARM code.  (I realise this is not my
call, but this clearly expresses my opinion about it.)

~Andrew


* Re: [Xen-devel] [PATCH v6 3/8] x86/PCI: read maximum MSI vector count early
  2019-09-25 13:31       ` Andrew Cooper
@ 2019-09-25 13:34         ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-25 13:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Wei Liu, Suravee Suthikulpanit, Roger Pau Monné

On 25.09.2019 15:31, Andrew Cooper wrote:
> On 23/09/2019 15:41, Jan Beulich wrote:
>> On 23.09.2019 16:22, Roger Pau Monné wrote:
>>> On Thu, Sep 19, 2019 at 03:22:54PM +0200, Jan Beulich wrote:
>>>> Rather than doing this every time we set up interrupts for a device
>>>> anew (and then in several places) fill this invariant field right after
>>>> allocating struct pci_dev.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> LGTM:
>>>
>>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>> Thanks.
>>
>>> Just one nit below.
>>>
>>>> @@ -711,7 +710,7 @@ static int msi_capability_init(struct pc
>>>>  
>>>>          /* All MSIs are unmasked by default, Mask them all */
>>>>          maskbits = pci_conf_read32(dev->sbdf, mpos);
>>>> -        maskbits |= ~(u32)0 >> (32 - maxvec);
>>>> +        maskbits |= ~(u32)0 >> (32 - dev->msi_maxvec);
>>> GENMASK would be slightly easier to parse IMO (here and below).
>> Besides this being an unrelated change, I'm afraid I'm going to
>> object to use of a macro where it's unclear what its parameters
>> mean: Even the example in xen/bitops.h is so confusing that I
>> can't tell whether "h" is meant to be exclusive or inclusive
>> (looks like the latter is intended). To me the two parameters
>> also look reversed - I'd expect "low" first and "high" second.
>> (ISTR having voiced reservations against its introduction, and
>> I'm happy to see that it's used in Arm code only.)
> 
> Furthermore, Linux is having enough problems with it that they were
> considering abolishing it entirely.
> 
> Getting the two numbers the wrong way around gets you a mask of 0.  It
> is a very unsafe macro.

Where, having looked at it some when replying to Roger, it seemed
to me that it would have been quite simple to make the macro
tolerate either order.
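
Something along these lines would do (a sketch only, not an existing
macro; min/max as in xen/kernel.h, with the usual double-evaluation
caveat of such macros):

    /* Accept the two inclusive bit positions in either order. */
    #define GENMASK_ANY(a, b) \
        ((~(uint32_t)0 >> (31 - max(a, b))) & (~(uint32_t)0 << min(a, b)))

GENMASK_ANY(0, maxvec - 1) and GENMASK_ANY(maxvec - 1, 0) then produce
the same mask.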

Jan


* Re: [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes
  2019-09-19 13:23 ` [Xen-devel] [PATCH v6 5/8] AMD/IOMMU: restrict interrupt remapping table sizes Jan Beulich
  2019-09-23 16:22   ` Paul Durrant
@ 2019-09-25 13:40   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 13:40 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Suravee Suthikulpanit

On 19/09/2019 14:23, Jan Beulich wrote:
> There's no point setting up tables with more space than a PCI device can
> use. For both MSI and MSI-X we can determine how many interrupts could
> be set up at most. Tables allocated during ACPI table parsing, however,
> will (for now at least) continue to be set up to have maximum size.
>
> Note that until we would want to use sub-page allocations here there's
> no point checking whether both MSI and MSI-X are supported by a device -
> an order-0 allocation will fit the dual case in any event, no matter
> that the MSI-X vector count may be smaller than the MSI one.
>
> On my Rome system this reduces space needed from just over 1k pages to
> about 125.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 6/8] AMD/IOMMU: tidy struct ivrs_mappings Jan Beulich
  2019-09-23 16:25   ` Paul Durrant
@ 2019-09-25 13:41   ` Andrew Cooper
  1 sibling, 0 replies; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 13:41 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Suravee Suthikulpanit

On 19/09/2019 14:24, Jan Beulich wrote:
> Move the device flags field up into an unused hole, thus shrinking
> overall structure size by 8 bytes. Use bool and uint<N>_t as
> appropriate. Drop pointless (redundant) initializations.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-19 13:24 ` [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment Jan Beulich
  2019-09-23 16:30   ` Paul Durrant
@ 2019-09-25 14:19   ` Andrew Cooper
  2019-09-25 14:40     ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 14:19 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Suravee Suthikulpanit

On 19/09/2019 14:24, Jan Beulich wrote:
> Having a single device table for all segments can't possibly be right.

That depends on your point of view.  Given that there aren't AMD systems
(or really, x86 systems in general) with segments other than zero, a
single device table is reasonable, even if not overly forward-compatible.

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1068,8 +1067,29 @@ static void * __init allocate_ppr_log(st
>                                  IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
>  }
>  
> +/*
> + * Within ivrs_mappings[] we allocate an extra array element to store
> + * - segment number,
> + * - device table.
> + */
> +#define IVRS_MAPPINGS_SEG(m) (m)[ivrs_bdf_entries].dte_requestor_id
> +#define IVRS_MAPPINGS_DEVTAB(m) (m)[ivrs_bdf_entries].intremap_table
> +
> +static void __init free_ivrs_mapping(void *ptr)

A pointer to this function gets stashed in a non-init radix tree, and
gets invoked by a non-init function (radix_tree_destroy).  It shouldn't
be __init.

> +{
> +    const struct ivrs_mappings *ivrs_mappings = ptr;
> +
> +    if ( IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
> +        deallocate_buffer(IVRS_MAPPINGS_DEVTAB(ivrs_mappings),
> +                          dt_alloc_size());
> +
> +    xfree(ptr);
> +}
> +
>  static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
>  {
> +    const struct ivrs_mappings *ivrs_mappings;
> +
>      if ( allocate_cmd_buffer(iommu) == NULL )
>          goto error_out;
>  
> @@ -1082,13 +1102,15 @@ static int __init amd_iommu_init_one(str
>      if ( intr && !set_iommu_interrupt_handler(iommu) )
>          goto error_out;
>  
> -    /* To make sure that device_table.buffer has been successfully allocated */
> -    if ( device_table.buffer == NULL )
> +    /* Make sure that the device table has been successfully allocated. */
> +    ivrs_mappings = get_ivrs_mappings(iommu->seg);
> +    if ( !IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )

This isn't safe.  get_ivrs_mappings() returns NULL on failure, which
IVRS_MAPPINGS_DEVTAB() dereferences to find the device table pointer.
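
The defensive form this concern implies would be (a sketch; Jan's
follow-up below argues the NULL case is unreachable on this path):

    ivrs_mappings = get_ivrs_mappings(iommu->seg);
    if ( !ivrs_mappings || !IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
        goto error_out;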

~Andrew

>          goto error_out;
>  
> -    iommu->dev_table.alloc_size = device_table.alloc_size;
> -    iommu->dev_table.entries = device_table.entries;
> -    iommu->dev_table.buffer = device_table.buffer;
> +    iommu->dev_table.alloc_size = dt_alloc_size();
> +    iommu->dev_table.entries = iommu->dev_table.alloc_size /
> +                               IOMMU_DEV_TABLE_ENTRY_SIZE;
> +    iommu->dev_table.buffer = IVRS_MAPPINGS_DEVTAB(ivrs_mappings);
>  
>      enable_iommu(iommu);
>      printk("AMD-Vi: IOMMU %d Enabled.\n", nr_amd_iommus );
>



* Re: [Xen-devel] [PATCH v6 7/8] AMD/IOMMU: allocate one device table per PCI segment
  2019-09-25 14:19   ` Andrew Cooper
@ 2019-09-25 14:40     ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-25 14:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Suravee Suthikulpanit

On 25.09.2019 16:19, Andrew Cooper wrote:
> On 19/09/2019 14:24, Jan Beulich wrote:
>> Having a single device table for all segments can't possibly be right.
> 
> That depends on your point of view.  Given that there aren't AMD systems
> (or really, x86 systems in general)

You keep saying this, and I can only keep repeating that a couple
of years ago I did end up testing (and making work) Xen on an SGI
system with (iirc) three segments.

> with segments other than zero, a
> single device table is reasonable, even if not overly forward-compatible.

Well, I see what you mean, but my use of plural in "segments" is
intended to mean potentially multiple of them.

>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>> @@ -1068,8 +1067,29 @@ static void * __init allocate_ppr_log(st
>>                                  IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
>>  }
>>  
>> +/*
>> + * Within ivrs_mappings[] we allocate an extra array element to store
>> + * - segment number,
>> + * - device table.
>> + */
>> +#define IVRS_MAPPINGS_SEG(m) (m)[ivrs_bdf_entries].dte_requestor_id
>> +#define IVRS_MAPPINGS_DEVTAB(m) (m)[ivrs_bdf_entries].intremap_table
>> +
>> +static void __init free_ivrs_mapping(void *ptr)
> 
> A pointer to this function gets stashed in a non-init radix tree, and
> gets invoked by a non-init function (radix_tree_destroy).  It shouldn't
> be __init.

I don't see the stashing part, and I don't see an issue at all with
passing the function pointer to radix_tree_destroy(): This function
invocation is itself in an __init function (which gets called upon
initialization errors).

>> @@ -1082,13 +1102,15 @@ static int __init amd_iommu_init_one(str
>>      if ( intr && !set_iommu_interrupt_handler(iommu) )
>>          goto error_out;
>>  
>> -    /* To make sure that device_table.buffer has been successfully allocated */
>> -    if ( device_table.buffer == NULL )
>> +    /* Make sure that the device table has been successfully allocated. */
>> +    ivrs_mappings = get_ivrs_mappings(iommu->seg);
>> +    if ( !IVRS_MAPPINGS_DEVTAB(ivrs_mappings) )
> 
> This isn't safe.  get_ivrs_mappings() returns NULL on failure, which
> IVRS_MAPPINGS_DEVTAB() dereferences to find the device table pointer.

get_ivrs_mappings() can't return NULL here - if the setting up of
that table has failed (in amd_iommu_prepare_one()), we wouldn't make
it here.

Jan


* Re: [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation
  2019-09-19 13:25 ` [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation Jan Beulich
  2019-09-23 16:33   ` Paul Durrant
@ 2019-09-25 14:59   ` Andrew Cooper
  2019-09-26 13:29     ` Jan Beulich
  1 sibling, 1 reply; 39+ messages in thread
From: Andrew Cooper @ 2019-09-25 14:59 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Suravee Suthikulpanit

On 19/09/2019 14:25, Jan Beulich wrote:
> Make sure we don't leave any DTEs through which unexpected requests
> would be passed untranslated. Set V and IV right away (with
> all other fields left as zero), relying on the V and/or IV bits
> getting cleared only by amd_iommu_set_root_page_table() and
> amd_iommu_set_intremap_table() under special pass-through circumstances.
> Switch back to initial settings in amd_iommu_disable_domain_device().
>
> Take the liberty and also make the latter function static, constifying
> its first parameter at the same time.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v6: New.
>
> ---
>  xen/drivers/passthrough/amd/iommu_init.c    |   22 +++++++++++++++++++---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |   20 ++++++++++++++++----
>  2 files changed, 35 insertions(+), 7 deletions(-)
>
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1255,12 +1255,28 @@ static int __init amd_iommu_setup_device
>  
>      if ( !dt )
>      {
> +        unsigned int size = dt_alloc_size();
> +
>          /* allocate 'device table' on a 4K boundary */
>          dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings) =
> -            allocate_buffer(dt_alloc_size(), "Device Table");
> +            allocate_buffer(size, "Device Table");
> +        if ( !dt )
> +            return -ENOMEM;
> +
> +        /*
> +         * Prefill every DTE such that all kinds of requests will get aborted.
> +         * Besides the two bits set to true below this builds upon
> +         * IOMMU_DEV_TABLE_SYS_MGT_DMA_ABORTED,
> +         * IOMMU_DEV_TABLE_IO_CONTROL_ABORTED, as well as
> +         * IOMMU_DEV_TABLE_INT_CONTROL_ABORTED all being zero, and us also
> +         * wanting at least TV, GV, I, and EX set to false.
> +         */
> +        for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
> +        {
> +            dt[bdf].v = true;
> +            dt[bdf].iv = true;
> +        }

The DT-BAR is generally 2MB in size.  It is inconvenient that we first
zero it, then walk it a second time setting two bits.

I'm not sure that allocate_buffer() needs to zero memory for any of its
callers.  The command ring writes a full entry at once, and the IOMMU
writes full event/page logs at once.

Dropping the memset() and changing this to be a loop of structure
assignments would be rather more efficient.
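
Concretely, the shape being suggested is roughly (a sketch, assuming
allocate_buffer() gains a way to skip its memset()):

    for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
        dt[bdf] = (struct amd_iommu_dte){ .v = true, .iv = true };

so each 32-byte entry is written exactly once.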

~Andrew


* Re: [Xen-devel] [PATCH v6 8/8] AMD/IOMMU: pre-fill all DTEs right after table allocation
  2019-09-25 14:59   ` Andrew Cooper
@ 2019-09-26 13:29     ` Jan Beulich
  0 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2019-09-26 13:29 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Suravee Suthikulpanit

On 25.09.2019 16:59, Andrew Cooper wrote:
> On 19/09/2019 14:25, Jan Beulich wrote:
>> Make sure we don't leave any DTEs through which unexpected requests
>> would be passed untranslated. Set V and IV right away (with
>> all other fields left as zero), relying on the V and/or IV bits
>> getting cleared only by amd_iommu_set_root_page_table() and
>> amd_iommu_set_intremap_table() under special pass-through circumstances.
>> Switch back to initial settings in amd_iommu_disable_domain_device().
>>
>> Take the liberty and also make the latter function static, constifying
>> its first parameter at the same time.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v6: New.
>>
>> ---
>>  xen/drivers/passthrough/amd/iommu_init.c    |   22 +++++++++++++++++++---
>>  xen/drivers/passthrough/amd/pci_amd_iommu.c |   20 ++++++++++++++++----
>>  2 files changed, 35 insertions(+), 7 deletions(-)
>>
>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>> @@ -1255,12 +1255,28 @@ static int __init amd_iommu_setup_device
>>  
>>      if ( !dt )
>>      {
>> +        unsigned int size = dt_alloc_size();
>> +
>>          /* allocate 'device table' on a 4K boundary */
>>          dt = IVRS_MAPPINGS_DEVTAB(ivrs_mappings) =
>> -            allocate_buffer(dt_alloc_size(), "Device Table");
>> +            allocate_buffer(size, "Device Table");
>> +        if ( !dt )
>> +            return -ENOMEM;
>> +
>> +        /*
>> +         * Prefill every DTE such that all kinds of requests will get aborted.
>> +         * Besides the two bits set to true below this builds upon
>> +         * IOMMU_DEV_TABLE_SYS_MGT_DMA_ABORTED,
>> +         * IOMMU_DEV_TABLE_IO_CONTROL_ABORTED, as well as
>> +         * IOMMU_DEV_TABLE_INT_CONTROL_ABORTED all being zero, and us also
>> +         * wanting at least TV, GV, I, and EX set to false.
>> +         */
>> +        for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
>> +        {
>> +            dt[bdf].v = true;
>> +            dt[bdf].iv = true;
>> +        }
> 
> The DT-BAR is generally 2MB in size.  It is inconvenient that we first
> zero it, then walk it a second time setting two bits.
> 
> I'm not sure that allocate_buffer() needs to zero memory for any of its
> callers.  The command ring writes a full entry at once, and the IOMMU
> writes full event/page logs at once.

But in the latter case we need the zeroing to work around errata 732
and 733; see parse_{event,ppr}_log_entry().

> Dropping the memset() and changing this to be a loop of structure
> assignments would be rather more efficient.

I'll add a boolean parameter to allocate_buffer().
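
One possible shape for that change (a sketch; the parameter name and the
error path are assumptions, not the final form):

    static void *__init allocate_buffer(unsigned long alloc_size,
                                        const char *name, bool clear)
    {
        void *buf = alloc_xenheap_pages(get_order_from_bytes(alloc_size), 0);

        if ( !buf )
        {
            AMD_IOMMU_DEBUG("Error allocating %s\n", name);
            return NULL;
        }

        if ( clear )
            memset(buf, 0, alloc_size);

        return buf;
    }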

Jan

