* [PATCH 0/6] IOMMU cleanup
@ 2020-07-24 16:46 Paul Durrant
  2020-07-24 16:46 ` [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

From: Paul Durrant <pdurrant@amazon.com>

Patches accumulated in my queue during the 4.14 freeze...

Paul Durrant (6):
  x86/iommu: re-arrange arch_iommu to separate common fields...
  x86/iommu: add common page-table allocator
  iommu: remove iommu_lookup_page() and the lookup_page() method...
  remove remaining uses of iommu_legacy_map/unmap
  iommu: remove the share_p2m operation
  iommu: stop calling IOMMU page tables 'p2m tables'

 xen/arch/x86/mm.c                           |  22 ++-
 xen/arch/x86/mm/p2m-ept.c                   |  22 ++-
 xen/arch/x86/mm/p2m-pt.c                    |  17 +-
 xen/arch/x86/mm/p2m.c                       |  31 ++-
 xen/arch/x86/tboot.c                        |   4 +-
 xen/arch/x86/x86_64/mm.c                    |  27 ++-
 xen/common/grant_table.c                    |  36 +++-
 xen/common/memory.c                         |   7 +-
 xen/drivers/passthrough/amd/iommu.h         |  18 +-
 xen/drivers/passthrough/amd/iommu_guest.c   |   8 +-
 xen/drivers/passthrough/amd/iommu_map.c     |  22 +--
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 109 +++--------
 xen/drivers/passthrough/iommu.c             |  91 +--------
 xen/drivers/passthrough/vtd/iommu.c         | 206 ++++++--------------
 xen/drivers/passthrough/x86/iommu.c         |  42 +++-
 xen/include/asm-x86/iommu.h                 |  33 +++-
 xen/include/xen/iommu.h                     |  35 +---
 17 files changed, 303 insertions(+), 427 deletions(-)

-- 
2.20.1




* [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
@ 2020-07-24 16:46 ` Paul Durrant
  2020-07-24 17:29   ` Andrew Cooper
  2020-07-24 16:46 ` [PATCH 2/6] x86/iommu: add common page-table allocator Paul Durrant
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant,
	Lukasz Hawrylko, Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

... from those specific to VT-d or AMD IOMMU, and put the latter in a union.

There is no functional change in this patch, although the initialization of
the 'mapped_rmrrs' list occurs slightly later in iommu_domain_init() since
it is now done (correctly) in VT-d specific code rather than in general x86
code.
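
To illustrate the intended layout, accesses roughly take the following shape
after this change (sketch only, not part of the patch; the struct and field
names match the diff below, while dump_iommu_state() and the is_vtd flag are
made up purely as stand-ins for the relevant vendor check):

    /* Sketch only: vendor-specific state now lives under a union member. */
    static void dump_iommu_state(const struct domain_iommu *hd, bool is_vtd)
    {
        if ( is_vtd )
            printk("pgd_maddr = %#"PRIx64", agaw = %d\n",
                   hd->arch.vtd.pgd_maddr, hd->arch.vtd.agaw);
        else
            printk("root table = %#"PRIx64", paging mode = %d\n",
                   page_to_maddr(hd->arch.amd_iommu.root_table),
                   hd->arch.amd_iommu.paging_mode);
    }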

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/tboot.c                        |  4 +-
 xen/drivers/passthrough/amd/iommu_guest.c   |  8 ++--
 xen/drivers/passthrough/amd/iommu_map.c     | 14 +++---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 35 +++++++-------
 xen/drivers/passthrough/vtd/iommu.c         | 53 +++++++++++----------
 xen/drivers/passthrough/x86/iommu.c         |  1 -
 xen/include/asm-x86/iommu.h                 | 27 +++++++----
 7 files changed, 78 insertions(+), 64 deletions(-)

diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 320e06f129..e66b0940c4 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -230,8 +230,8 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
         {
             const struct domain_iommu *dio = dom_iommu(d);
 
-            update_iommu_mac(&ctx, dio->arch.pgd_maddr,
-                             agaw_to_level(dio->arch.agaw));
+            update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr,
+                             agaw_to_level(dio->arch.vtd.agaw));
         }
     }
 
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 014a72a54b..26819e82a8 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -50,12 +50,12 @@ static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
 
 static inline struct guest_iommu *domain_iommu(struct domain *d)
 {
-    return dom_iommu(d)->arch.g_iommu;
+    return dom_iommu(d)->arch.amd_iommu.g_iommu;
 }
 
 static inline struct guest_iommu *vcpu_iommu(struct vcpu *v)
 {
-    return dom_iommu(v->domain)->arch.g_iommu;
+    return dom_iommu(v->domain)->arch.amd_iommu.g_iommu;
 }
 
 static void guest_iommu_enable(struct guest_iommu *iommu)
@@ -823,7 +823,7 @@ int guest_iommu_init(struct domain* d)
     guest_iommu_reg_init(iommu);
     iommu->mmio_base = ~0ULL;
     iommu->domain = d;
-    hd->arch.g_iommu = iommu;
+    hd->arch.amd_iommu.g_iommu = iommu;
 
     tasklet_init(&iommu->cmd_buffer_tasklet, guest_iommu_process_command, d);
 
@@ -845,5 +845,5 @@ void guest_iommu_destroy(struct domain *d)
     tasklet_kill(&iommu->cmd_buffer_tasklet);
     xfree(iommu);
 
-    dom_iommu(d)->arch.g_iommu = NULL;
+    dom_iommu(d)->arch.amd_iommu.g_iommu = NULL;
 }
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 93e96cd69c..06c564968c 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -180,8 +180,8 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
     struct page_info *table;
     const struct domain_iommu *hd = dom_iommu(d);
 
-    table = hd->arch.root_table;
-    level = hd->arch.paging_mode;
+    table = hd->arch.amd_iommu.root_table;
+    level = hd->arch.amd_iommu.paging_mode;
 
     BUG_ON( table == NULL || level < 1 || level > 6 );
 
@@ -325,7 +325,7 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
-    if ( !hd->arch.root_table )
+    if ( !hd->arch.amd_iommu.root_table )
     {
         spin_unlock(&hd->arch.mapping_lock);
         return 0;
@@ -450,7 +450,7 @@ int __init amd_iommu_quarantine_init(struct domain *d)
     unsigned int level = amd_iommu_get_paging_mode(end_gfn);
     struct amd_iommu_pte *table;
 
-    if ( hd->arch.root_table )
+    if ( hd->arch.amd_iommu.root_table )
     {
         ASSERT_UNREACHABLE();
         return 0;
@@ -458,11 +458,11 @@ int __init amd_iommu_quarantine_init(struct domain *d)
 
     spin_lock(&hd->arch.mapping_lock);
 
-    hd->arch.root_table = alloc_amd_iommu_pgtable();
-    if ( !hd->arch.root_table )
+    hd->arch.amd_iommu.root_table = alloc_amd_iommu_pgtable();
+    if ( !hd->arch.amd_iommu.root_table )
         goto out;
 
-    table = __map_domain_page(hd->arch.root_table);
+    table = __map_domain_page(hd->arch.amd_iommu.root_table);
     while ( level )
     {
         struct page_info *pg;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 8d6309cc8c..c386dc4387 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -92,7 +92,8 @@ static void amd_iommu_setup_domain_device(
     u8 bus = pdev->bus;
     const struct domain_iommu *hd = dom_iommu(domain);
 
-    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
+    BUG_ON( !hd->arch.amd_iommu.root_table ||
+            !hd->arch.amd_iommu.paging_mode ||
             !iommu->dev_table.buffer );
 
     if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
@@ -111,8 +112,8 @@ static void amd_iommu_setup_domain_device(
 
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
-            dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
-            hd->arch.paging_mode, valid);
+            dte, page_to_maddr(hd->arch.amd_iommu.root_table),
+            domain->domain_id, hd->arch.amd_iommu.paging_mode, valid);
 
         /* Undo what amd_iommu_disable_domain_device() may have done. */
         ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
@@ -132,8 +133,8 @@ static void amd_iommu_setup_domain_device(
                         "root table = %#"PRIx64", "
                         "domain = %d, paging mode = %d\n",
                         req_id, pdev->type,
-                        page_to_maddr(hd->arch.root_table),
-                        domain->domain_id, hd->arch.paging_mode);
+                        page_to_maddr(hd->arch.amd_iommu.root_table),
+                        domain->domain_id, hd->arch.amd_iommu.paging_mode);
     }
 
     spin_unlock_irqrestore(&iommu->lock, flags);
@@ -207,10 +208,10 @@ static int iov_enable_xt(void)
 
 int amd_iommu_alloc_root(struct domain_iommu *hd)
 {
-    if ( unlikely(!hd->arch.root_table) )
+    if ( unlikely(!hd->arch.amd_iommu.root_table) )
     {
-        hd->arch.root_table = alloc_amd_iommu_pgtable();
-        if ( !hd->arch.root_table )
+        hd->arch.amd_iommu.root_table = alloc_amd_iommu_pgtable();
+        if ( !hd->arch.amd_iommu.root_table )
             return -ENOMEM;
     }
 
@@ -240,7 +241,7 @@ static int amd_iommu_domain_init(struct domain *d)
      *   physical address space we give it, but this isn't known yet so use 4
      *   unilaterally.
      */
-    hd->arch.paging_mode = amd_iommu_get_paging_mode(
+    hd->arch.amd_iommu.paging_mode = amd_iommu_get_paging_mode(
         is_hvm_domain(d)
         ? 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT)
         : get_upper_mfn_bound() + 1);
@@ -306,7 +307,7 @@ static void amd_iommu_disable_domain_device(const struct domain *domain,
         AMD_IOMMU_DEBUG("Disable: device id = %#x, "
                         "domain = %d, paging mode = %d\n",
                         req_id,  domain->domain_id,
-                        dom_iommu(domain)->arch.paging_mode);
+                        dom_iommu(domain)->arch.amd_iommu.paging_mode);
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
 
@@ -422,10 +423,11 @@ static void deallocate_iommu_page_tables(struct domain *d)
     struct domain_iommu *hd = dom_iommu(d);
 
     spin_lock(&hd->arch.mapping_lock);
-    if ( hd->arch.root_table )
+    if ( hd->arch.amd_iommu.root_table )
     {
-        deallocate_next_page_table(hd->arch.root_table, hd->arch.paging_mode);
-        hd->arch.root_table = NULL;
+        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
+                                   hd->arch.amd_iommu.paging_mode);
+        hd->arch.amd_iommu.root_table = NULL;
     }
     spin_unlock(&hd->arch.mapping_lock);
 }
@@ -605,11 +607,12 @@ static void amd_dump_p2m_table(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
 
-    if ( !hd->arch.root_table )
+    if ( !hd->arch.amd_iommu.root_table )
         return;
 
-    printk("p2m table has %d levels\n", hd->arch.paging_mode);
-    amd_dump_p2m_table_level(hd->arch.root_table, hd->arch.paging_mode, 0, 0);
+    printk("p2m table has %d levels\n", hd->arch.amd_iommu.paging_mode);
+    amd_dump_p2m_table_level(hd->arch.amd_iommu.root_table,
+                             hd->arch.amd_iommu.paging_mode, 0, 0);
 }
 
 static const struct iommu_ops __initconstrel _iommu_ops = {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 01dc444771..ac1373fb99 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -257,20 +257,20 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu, u8 bus)
 static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
 {
     struct domain_iommu *hd = dom_iommu(domain);
-    int addr_width = agaw_to_width(hd->arch.agaw);
+    int addr_width = agaw_to_width(hd->arch.vtd.agaw);
     struct dma_pte *parent, *pte = NULL;
-    int level = agaw_to_level(hd->arch.agaw);
+    int level = agaw_to_level(hd->arch.vtd.agaw);
     int offset;
     u64 pte_maddr = 0;
 
     addr &= (((u64)1) << addr_width) - 1;
     ASSERT(spin_is_locked(&hd->arch.mapping_lock));
-    if ( !hd->arch.pgd_maddr &&
+    if ( !hd->arch.vtd.pgd_maddr &&
          (!alloc ||
-          ((hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) == 0)) )
+          ((hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) == 0)) )
         goto out;
 
-    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
+    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
     while ( level > 1 )
     {
         offset = address_level_offset(addr, level);
@@ -593,7 +593,7 @@ static int __must_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
     {
         iommu = drhd->iommu;
 
-        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
+        if ( !test_bit(iommu->index, &hd->arch.vtd.iommu_bitmap) )
             continue;
 
         flush_dev_iotlb = !!find_ats_dev_drhd(iommu);
@@ -1281,7 +1281,10 @@ void __init iommu_free(struct acpi_drhd_unit *drhd)
 
 static int intel_iommu_domain_init(struct domain *d)
 {
-    dom_iommu(d)->arch.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+    struct domain_iommu *hd = dom_iommu(d);
+
+    hd->arch.vtd.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+    INIT_LIST_HEAD(&hd->arch.vtd.mapped_rmrrs);
 
     return 0;
 }
@@ -1381,10 +1384,10 @@ int domain_context_mapping_one(
         spin_lock(&hd->arch.mapping_lock);
 
         /* Ensure we have pagetables allocated down to leaf PTE. */
-        if ( hd->arch.pgd_maddr == 0 )
+        if ( hd->arch.vtd.pgd_maddr == 0 )
         {
             addr_to_dma_page_maddr(domain, 0, 1);
-            if ( hd->arch.pgd_maddr == 0 )
+            if ( hd->arch.vtd.pgd_maddr == 0 )
             {
             nomem:
                 spin_unlock(&hd->arch.mapping_lock);
@@ -1395,7 +1398,7 @@ int domain_context_mapping_one(
         }
 
         /* Skip top levels of page tables for 2- and 3-level DRHDs. */
-        pgd_maddr = hd->arch.pgd_maddr;
+        pgd_maddr = hd->arch.vtd.pgd_maddr;
         for ( agaw = level_to_agaw(4);
               agaw != level_to_agaw(iommu->nr_pt_levels);
               agaw-- )
@@ -1449,7 +1452,7 @@ int domain_context_mapping_one(
     if ( rc > 0 )
         rc = 0;
 
-    set_bit(iommu->index, &hd->arch.iommu_bitmap);
+    set_bit(iommu->index, &hd->arch.vtd.iommu_bitmap);
 
     unmap_vtd_domain_page(context_entries);
 
@@ -1727,7 +1730,7 @@ static int domain_context_unmap(struct domain *domain, u8 devfn,
     {
         int iommu_domid;
 
-        clear_bit(iommu->index, &dom_iommu(domain)->arch.iommu_bitmap);
+        clear_bit(iommu->index, &dom_iommu(domain)->arch.vtd.iommu_bitmap);
 
         iommu_domid = domain_iommu_domid(domain, iommu);
         if ( iommu_domid == -1 )
@@ -1752,7 +1755,7 @@ static void iommu_domain_teardown(struct domain *d)
     if ( list_empty(&acpi_drhd_units) )
         return;
 
-    list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.mapped_rmrrs, list )
+    list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.vtd.mapped_rmrrs, list )
     {
         list_del(&mrmrr->list);
         xfree(mrmrr);
@@ -1764,8 +1767,9 @@ static void iommu_domain_teardown(struct domain *d)
         return;
 
     spin_lock(&hd->arch.mapping_lock);
-    iommu_free_pagetable(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw));
-    hd->arch.pgd_maddr = 0;
+    iommu_free_pagetable(hd->arch.vtd.pgd_maddr,
+                         agaw_to_level(hd->arch.vtd.agaw));
+    hd->arch.vtd.pgd_maddr = 0;
     spin_unlock(&hd->arch.mapping_lock);
 }
 
@@ -1905,7 +1909,7 @@ static void iommu_set_pgd(struct domain *d)
     mfn_t pgd_mfn;
 
     pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-    dom_iommu(d)->arch.pgd_maddr =
+    dom_iommu(d)->arch.vtd.pgd_maddr =
         pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
 }
 
@@ -1925,7 +1929,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
      * No need to acquire hd->arch.mapping_lock: Both insertion and removal
      * get done while holding pcidevs_lock.
      */
-    list_for_each_entry( mrmrr, &hd->arch.mapped_rmrrs, list )
+    list_for_each_entry( mrmrr, &hd->arch.vtd.mapped_rmrrs, list )
     {
         if ( mrmrr->base == rmrr->base_address &&
              mrmrr->end == rmrr->end_address )
@@ -1972,7 +1976,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
     mrmrr->base = rmrr->base_address;
     mrmrr->end = rmrr->end_address;
     mrmrr->count = 1;
-    list_add_tail(&mrmrr->list, &hd->arch.mapped_rmrrs);
+    list_add_tail(&mrmrr->list, &hd->arch.vtd.mapped_rmrrs);
 
     return 0;
 }
@@ -2671,8 +2675,9 @@ static void vtd_dump_p2m_table(struct domain *d)
         return;
 
     hd = dom_iommu(d);
-    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.agaw));
-    vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
+    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
+    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
+                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
 }
 
 static int __init intel_iommu_quarantine_init(struct domain *d)
@@ -2683,7 +2688,7 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
     unsigned int level = agaw_to_level(agaw);
     int rc;
 
-    if ( hd->arch.pgd_maddr )
+    if ( hd->arch.vtd.pgd_maddr )
     {
         ASSERT_UNREACHABLE();
         return 0;
@@ -2691,11 +2696,11 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
 
     spin_lock(&hd->arch.mapping_lock);
 
-    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
-    if ( !hd->arch.pgd_maddr )
+    hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
+    if ( !hd->arch.vtd.pgd_maddr )
         goto out;
 
-    parent = map_vtd_domain_page(hd->arch.pgd_maddr);
+    parent = map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
     while ( level )
     {
         uint64_t maddr;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 3d7670e8c6..a12109a1de 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -139,7 +139,6 @@ int arch_iommu_domain_init(struct domain *d)
     struct domain_iommu *hd = dom_iommu(d);
 
     spin_lock_init(&hd->arch.mapping_lock);
-    INIT_LIST_HEAD(&hd->arch.mapped_rmrrs);
 
     return 0;
 }
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 6c9d5e5632..a7add5208c 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
 
 struct arch_iommu
 {
-    u64 pgd_maddr;                 /* io page directory machine address */
-    spinlock_t mapping_lock;            /* io page table lock */
-    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
-    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
-    struct list_head mapped_rmrrs;
-
-    /* amd iommu support */
-    int paging_mode;
-    struct page_info *root_table;
-    struct guest_iommu *g_iommu;
+    spinlock_t mapping_lock; /* io page table lock */
+
+    union {
+        /* Intel VT-d */
+        struct {
+            u64 pgd_maddr; /* io page directory machine address */
+            int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+            u64 iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
+            struct list_head mapped_rmrrs;
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            int paging_mode;
+            struct page_info *root_table;
+            struct guest_iommu *g_iommu;
+        } amd_iommu;
+    };
 };
 
 extern struct iommu_ops iommu_ops;
-- 
2.20.1




* [PATCH 2/6] x86/iommu: add common page-table allocator
  2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
  2020-07-24 16:46 ` [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
@ 2020-07-24 16:46 ` Paul Durrant
  2020-07-24 18:24   ` Andrew Cooper
  2020-07-24 16:46 ` [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method Paul Durrant
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant, Jan Beulich,
	Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

Instead of having separate page table allocation functions in VT-d and AMD
IOMMU code, use a common allocation function in the general x86 code.
Also, rather than walking the page tables and using a tasklet to free them
during iommu_domain_destroy(), add allocated page table pages to a list and
then simply walk the list to free them. This saves ~90 lines of code overall.

NOTE: There is no need to clear and sync PTEs during teardown since the per-
      device root entries will have already been cleared (when devices were
      de-assigned) so the page tables can no longer be accessed by the IOMMU.
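
In outline, the common allocator and the simplified teardown look roughly
like this (condensed sketch of the pattern only; the full implementation in
the x86/iommu.c hunk below additionally honours NUMA placement and the
sync_cache hook):

    /* Condensed sketch: allocate a zeroed page and track it per-domain. */
    struct page_info *iommu_alloc_pgtable(struct domain *d)
    {
        struct domain_iommu *hd = dom_iommu(d);
        struct page_info *pg = alloc_domheap_page(NULL, 0);

        if ( !pg )
            return NULL;

        clear_domain_page(page_to_mfn(pg));

        spin_lock(&hd->arch.pgtables.lock);
        page_list_add(pg, &hd->arch.pgtables.list);
        spin_unlock(&hd->arch.pgtables.lock);

        return pg;
    }

    /* Teardown no longer walks the tables; it simply drains the list. */
    void arch_iommu_domain_destroy(struct domain *d)
    {
        struct domain_iommu *hd = dom_iommu(d);
        struct page_info *pg;

        while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
            free_domheap_page(pg);
    }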

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
 xen/drivers/passthrough/amd/iommu.h         | 18 +----
 xen/drivers/passthrough/amd/iommu_map.c     | 10 +--
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 74 +++------------------
 xen/drivers/passthrough/iommu.c             | 23 -------
 xen/drivers/passthrough/vtd/iommu.c         | 51 ++------------
 xen/drivers/passthrough/x86/iommu.c         | 41 ++++++++++++
 xen/include/asm-x86/iommu.h                 |  6 ++
 xen/include/xen/iommu.h                     |  5 --
 8 files changed, 68 insertions(+), 160 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 3489c2a015..e2d174f3b4 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -226,7 +226,7 @@ int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
                                     unsigned int *flush_flags);
 int __must_check amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
                                       unsigned int *flush_flags);
-int __must_check amd_iommu_alloc_root(struct domain_iommu *hd);
+int __must_check amd_iommu_alloc_root(struct domain *d);
 int amd_iommu_reserve_domain_unity_map(struct domain *domain,
                                        paddr_t phys_addr, unsigned long size,
                                        int iw, int ir);
@@ -356,22 +356,6 @@ static inline int amd_iommu_get_paging_mode(unsigned long max_frames)
     return level;
 }
 
-static inline struct page_info *alloc_amd_iommu_pgtable(void)
-{
-    struct page_info *pg = alloc_domheap_page(NULL, 0);
-
-    if ( pg )
-        clear_domain_page(page_to_mfn(pg));
-
-    return pg;
-}
-
-static inline void free_amd_iommu_pgtable(struct page_info *pg)
-{
-    if ( pg )
-        free_domheap_page(pg);
-}
-
 static inline void *__alloc_amd_iommu_tables(unsigned int order)
 {
     return alloc_xenheap_pages(order, 0);
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 06c564968c..d54cbf1cb9 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -217,7 +217,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
             mfn = next_table_mfn;
 
             /* allocate lower level page table */
-            table = alloc_amd_iommu_pgtable();
+            table = iommu_alloc_pgtable(d);
             if ( table == NULL )
             {
                 AMD_IOMMU_DEBUG("Cannot allocate I/O page table\n");
@@ -248,7 +248,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
 
             if ( next_table_mfn == 0 )
             {
-                table = alloc_amd_iommu_pgtable();
+                table = iommu_alloc_pgtable(d);
                 if ( table == NULL )
                 {
                     AMD_IOMMU_DEBUG("Cannot allocate I/O page table\n");
@@ -286,7 +286,7 @@ int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
-    rc = amd_iommu_alloc_root(hd);
+    rc = amd_iommu_alloc_root(d);
     if ( rc )
     {
         spin_unlock(&hd->arch.mapping_lock);
@@ -458,7 +458,7 @@ int __init amd_iommu_quarantine_init(struct domain *d)
 
     spin_lock(&hd->arch.mapping_lock);
 
-    hd->arch.amd_iommu.root_table = alloc_amd_iommu_pgtable();
+    hd->arch.amd_iommu.root_table = iommu_alloc_pgtable(d);
     if ( !hd->arch.amd_iommu.root_table )
         goto out;
 
@@ -473,7 +473,7 @@ int __init amd_iommu_quarantine_init(struct domain *d)
          * page table pages, and the resulting allocations are always
          * zeroed.
          */
-        pg = alloc_amd_iommu_pgtable();
+        pg = iommu_alloc_pgtable(d);
         if ( !pg )
             break;
 
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index c386dc4387..fd9b1e7bd5 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -206,11 +206,13 @@ static int iov_enable_xt(void)
     return 0;
 }
 
-int amd_iommu_alloc_root(struct domain_iommu *hd)
+int amd_iommu_alloc_root(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
+
     if ( unlikely(!hd->arch.amd_iommu.root_table) )
     {
-        hd->arch.amd_iommu.root_table = alloc_amd_iommu_pgtable();
+        hd->arch.amd_iommu.root_table = iommu_alloc_pgtable(d);
         if ( !hd->arch.amd_iommu.root_table )
             return -ENOMEM;
     }
@@ -218,12 +220,13 @@ int amd_iommu_alloc_root(struct domain_iommu *hd)
     return 0;
 }
 
-static int __must_check allocate_domain_resources(struct domain_iommu *hd)
+static int __must_check allocate_domain_resources(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
     int rc;
 
     spin_lock(&hd->arch.mapping_lock);
-    rc = amd_iommu_alloc_root(hd);
+    rc = amd_iommu_alloc_root(d);
     spin_unlock(&hd->arch.mapping_lock);
 
     return rc;
@@ -255,7 +258,7 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
 {
     const struct amd_iommu *iommu;
 
-    if ( allocate_domain_resources(dom_iommu(d)) )
+    if ( allocate_domain_resources(d) )
         BUG();
 
     for_each_amd_iommu ( iommu )
@@ -324,7 +327,6 @@ static int reassign_device(struct domain *source, struct domain *target,
 {
     struct amd_iommu *iommu;
     int bdf, rc;
-    struct domain_iommu *t = dom_iommu(target);
 
     bdf = PCI_BDF2(pdev->bus, pdev->devfn);
     iommu = find_iommu_for_device(pdev->seg, bdf);
@@ -345,7 +347,7 @@ static int reassign_device(struct domain *source, struct domain *target,
         pdev->domain = target;
     }
 
-    rc = allocate_domain_resources(t);
+    rc = allocate_domain_resources(target);
     if ( rc )
         return rc;
 
@@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
     return reassign_device(pdev->domain, d, devfn, pdev);
 }
 
-static void deallocate_next_page_table(struct page_info *pg, int level)
-{
-    PFN_ORDER(pg) = level;
-    spin_lock(&iommu_pt_cleanup_lock);
-    page_list_add_tail(pg, &iommu_pt_cleanup_list);
-    spin_unlock(&iommu_pt_cleanup_lock);
-}
-
-static void deallocate_page_table(struct page_info *pg)
-{
-    struct amd_iommu_pte *table_vaddr;
-    unsigned int index, level = PFN_ORDER(pg);
-
-    PFN_ORDER(pg) = 0;
-
-    if ( level <= 1 )
-    {
-        free_amd_iommu_pgtable(pg);
-        return;
-    }
-
-    table_vaddr = __map_domain_page(pg);
-
-    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
-    {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
-
-        if ( pde->mfn && pde->next_level && pde->pr )
-        {
-            /* We do not support skip levels yet */
-            ASSERT(pde->next_level == level - 1);
-            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
-                                       pde->next_level);
-        }
-    }
-
-    unmap_domain_page(table_vaddr);
-    free_amd_iommu_pgtable(pg);
-}
-
-static void deallocate_iommu_page_tables(struct domain *d)
-{
-    struct domain_iommu *hd = dom_iommu(d);
-
-    spin_lock(&hd->arch.mapping_lock);
-    if ( hd->arch.amd_iommu.root_table )
-    {
-        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
-                                   hd->arch.amd_iommu.paging_mode);
-        hd->arch.amd_iommu.root_table = NULL;
-    }
-    spin_unlock(&hd->arch.mapping_lock);
-}
-
-
 static void amd_iommu_domain_destroy(struct domain *d)
 {
-    deallocate_iommu_page_tables(d);
+    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
     amd_iommu_flush_all_pages(d);
 }
 
@@ -627,7 +574,6 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .unmap_page = amd_iommu_unmap_page,
     .iotlb_flush = amd_iommu_flush_iotlb_pages,
     .iotlb_flush_all = amd_iommu_flush_iotlb_all,
-    .free_page_table = deallocate_page_table,
     .reassign_device = reassign_device,
     .get_device_group_id = amd_iommu_group_id,
     .enable_x2apic = iov_enable_xt,
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 1d644844ab..dad4088531 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -49,10 +49,6 @@ bool_t __read_mostly amd_iommu_perdev_intremap = 1;
 
 DEFINE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 
-DEFINE_SPINLOCK(iommu_pt_cleanup_lock);
-PAGE_LIST_HEAD(iommu_pt_cleanup_list);
-static struct tasklet iommu_pt_cleanup_tasklet;
-
 static int __init parse_iommu_param(const char *s)
 {
     const char *ss;
@@ -226,7 +222,6 @@ static void iommu_teardown(struct domain *d)
     struct domain_iommu *hd = dom_iommu(d);
 
     hd->platform_ops->teardown(d);
-    tasklet_schedule(&iommu_pt_cleanup_tasklet);
 }
 
 void iommu_domain_destroy(struct domain *d)
@@ -366,23 +361,6 @@ int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
     return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags);
 }
 
-static void iommu_free_pagetables(void *unused)
-{
-    do {
-        struct page_info *pg;
-
-        spin_lock(&iommu_pt_cleanup_lock);
-        pg = page_list_remove_head(&iommu_pt_cleanup_list);
-        spin_unlock(&iommu_pt_cleanup_lock);
-        if ( !pg )
-            return;
-        iommu_vcall(iommu_get_ops(), free_page_table, pg);
-    } while ( !softirq_pending(smp_processor_id()) );
-
-    tasklet_schedule_on_cpu(&iommu_pt_cleanup_tasklet,
-                            cpumask_cycle(smp_processor_id(), &cpu_online_map));
-}
-
 int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
                       unsigned int flush_flags)
 {
@@ -506,7 +484,6 @@ int __init iommu_setup(void)
 #ifndef iommu_intremap
         printk("Interrupt remapping %sabled\n", iommu_intremap ? "en" : "dis");
 #endif
-        tasklet_init(&iommu_pt_cleanup_tasklet, iommu_free_pagetables, NULL);
     }
 
     return rc;
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index ac1373fb99..40834e2e7a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -279,13 +279,16 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
         pte_maddr = dma_pte_addr(*pte);
         if ( !pte_maddr )
         {
+            struct page_info *pg;
+
             if ( !alloc )
                 break;
 
-            pte_maddr = alloc_pgtable_maddr(1, hd->node);
-            if ( !pte_maddr )
+            pg = iommu_alloc_pgtable(domain);
+            if ( !pg )
                 break;
 
+            pte_maddr = page_to_maddr(pg);
             dma_set_pte_addr(*pte, pte_maddr);
 
             /*
@@ -675,45 +678,6 @@ static void dma_pte_clear_one(struct domain *domain, uint64_t addr,
     unmap_vtd_domain_page(page);
 }
 
-static void iommu_free_pagetable(u64 pt_maddr, int level)
-{
-    struct page_info *pg = maddr_to_page(pt_maddr);
-
-    if ( pt_maddr == 0 )
-        return;
-
-    PFN_ORDER(pg) = level;
-    spin_lock(&iommu_pt_cleanup_lock);
-    page_list_add_tail(pg, &iommu_pt_cleanup_list);
-    spin_unlock(&iommu_pt_cleanup_lock);
-}
-
-static void iommu_free_page_table(struct page_info *pg)
-{
-    unsigned int i, next_level = PFN_ORDER(pg) - 1;
-    u64 pt_maddr = page_to_maddr(pg);
-    struct dma_pte *pt_vaddr, *pte;
-
-    PFN_ORDER(pg) = 0;
-    pt_vaddr = (struct dma_pte *)map_vtd_domain_page(pt_maddr);
-
-    for ( i = 0; i < PTE_NUM; i++ )
-    {
-        pte = &pt_vaddr[i];
-        if ( !dma_pte_present(*pte) )
-            continue;
-
-        if ( next_level >= 1 )
-            iommu_free_pagetable(dma_pte_addr(*pte), next_level);
-
-        dma_clear_pte(*pte);
-        iommu_sync_cache(pte, sizeof(struct dma_pte));
-    }
-
-    unmap_vtd_domain_page(pt_vaddr);
-    free_pgtable_maddr(pt_maddr);
-}
-
 static int iommu_set_root_entry(struct vtd_iommu *iommu)
 {
     u32 sts;
@@ -1766,11 +1730,7 @@ static void iommu_domain_teardown(struct domain *d)
     if ( iommu_use_hap_pt(d) )
         return;
 
-    spin_lock(&hd->arch.mapping_lock);
-    iommu_free_pagetable(hd->arch.vtd.pgd_maddr,
-                         agaw_to_level(hd->arch.vtd.agaw));
     hd->arch.vtd.pgd_maddr = 0;
-    spin_unlock(&hd->arch.mapping_lock);
 }
 
 static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -2751,7 +2711,6 @@ static struct iommu_ops __initdata vtd_ops = {
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
     .lookup_page = intel_iommu_lookup_page,
-    .free_page_table = iommu_free_page_table,
     .reassign_device = reassign_device_ownership,
     .get_device_group_id = intel_iommu_group_id,
     .enable_x2apic = intel_iommu_enable_eim,
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index a12109a1de..b3c7da0fe2 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
 
     spin_lock_init(&hd->arch.mapping_lock);
 
+    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
+    spin_lock_init(&hd->arch.pgtables.lock);
+
     return 0;
 }
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
+    struct page_info *pg;
+
+    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
+        free_domheap_page(pg);
 }
 
 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
         return;
 }
 
+struct page_info *iommu_alloc_pgtable(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+#ifdef CONFIG_NUMA
+    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
+        0 : MEMF_node(hd->node);
+#else
+    unsigned int memflags = 0;
+#endif
+    struct page_info *pg;
+    void *p;
+
+    BUG_ON(!iommu_enabled);
+
+    pg = alloc_domheap_page(NULL, memflags);
+    if ( !pg )
+        return NULL;
+
+    p = __map_domain_page(pg);
+    clear_page(p);
+
+    if ( hd->platform_ops->sync_cache )
+        iommu_vcall(hd->platform_ops, sync_cache, p, PAGE_SIZE);
+
+    unmap_domain_page(p);
+
+    spin_lock(&hd->arch.pgtables.lock);
+    page_list_add(pg, &hd->arch.pgtables.list);
+    spin_unlock(&hd->arch.pgtables.lock);
+
+    return pg;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index a7add5208c..280515966c 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -46,6 +46,10 @@ typedef uint64_t daddr_t;
 struct arch_iommu
 {
     spinlock_t mapping_lock; /* io page table lock */
+    struct {
+        struct page_list_head list;
+        spinlock_t lock;
+    } pgtables;
 
     union {
         /* Intel VT-d */
@@ -131,6 +135,8 @@ int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
         iommu_vcall(ops, sync_cache, addr, size);       \
 })
 
+struct page_info * __must_check iommu_alloc_pgtable(struct domain *d);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 3272874958..51c29180a4 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -263,8 +263,6 @@ struct iommu_ops {
     int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
                                     unsigned int *flags);
 
-    void (*free_page_table)(struct page_info *);
-
 #ifdef CONFIG_X86
     int (*enable_x2apic)(void);
     void (*disable_x2apic)(void);
@@ -381,9 +379,6 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
  */
 DECLARE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 
-extern struct spinlock iommu_pt_cleanup_lock;
-extern struct page_list_head iommu_pt_cleanup_list;
-
 #endif /* _IOMMU_H_ */
 
 /*
-- 
2.20.1




* [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
  2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
  2020-07-24 16:46 ` [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
  2020-07-24 16:46 ` [PATCH 2/6] x86/iommu: add common page-table allocator Paul Durrant
@ 2020-07-24 16:46 ` Paul Durrant
  2020-07-24 18:38   ` Andrew Cooper
  2020-07-24 16:46 ` [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian, Jan Beulich

From: Paul Durrant <pdurrant@amazon.com>

... from iommu_ops.

This patch is essentially a reversion of dd93d54f "vtd: add lookup_page method
to iommu_ops". The code was intended to be used by a patch that has long
since been abandoned. Therefore it is dead code and can be removed.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/drivers/passthrough/iommu.c     | 11 --------
 xen/drivers/passthrough/vtd/iommu.c | 41 -----------------------------
 xen/include/xen/iommu.h             |  5 ----
 3 files changed, 57 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index dad4088531..327df17c5d 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -350,17 +350,6 @@ int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
     return rc;
 }
 
-int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                      unsigned int *flags)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-
-    if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page )
-        return -EOPNOTSUPP;
-
-    return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags);
-}
-
 int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
                       unsigned int flush_flags)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 40834e2e7a..149d7122c3 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1808,46 +1808,6 @@ static int __must_check intel_iommu_unmap_page(struct domain *d, dfn_t dfn,
     return 0;
 }
 
-static int intel_iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                   unsigned int *flags)
-{
-    struct domain_iommu *hd = dom_iommu(d);
-    struct dma_pte *page, val;
-    u64 pg_maddr;
-
-    /*
-     * If VT-d shares EPT page table or if the domain is the hardware
-     * domain and iommu_passthrough is set then pass back the dfn.
-     */
-    if ( iommu_use_hap_pt(d) ||
-         (iommu_hwdom_passthrough && is_hardware_domain(d)) )
-        return -EOPNOTSUPP;
-
-    spin_lock(&hd->arch.mapping_lock);
-
-    pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 0);
-    if ( !pg_maddr )
-    {
-        spin_unlock(&hd->arch.mapping_lock);
-        return -ENOENT;
-    }
-
-    page = map_vtd_domain_page(pg_maddr);
-    val = page[dfn_x(dfn) & LEVEL_MASK];
-
-    unmap_vtd_domain_page(page);
-    spin_unlock(&hd->arch.mapping_lock);
-
-    if ( !dma_pte_present(val) )
-        return -ENOENT;
-
-    *mfn = maddr_to_mfn(dma_pte_addr(val));
-    *flags = dma_pte_read(val) ? IOMMUF_readable : 0;
-    *flags |= dma_pte_write(val) ? IOMMUF_writable : 0;
-
-    return 0;
-}
-
 static int __init vtd_ept_page_compatible(struct vtd_iommu *iommu)
 {
     u64 ept_cap, vtd_cap = iommu->cap;
@@ -2710,7 +2670,6 @@ static struct iommu_ops __initdata vtd_ops = {
     .teardown = iommu_domain_teardown,
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
-    .lookup_page = intel_iommu_lookup_page,
     .reassign_device = reassign_device_ownership,
     .get_device_group_id = intel_iommu_group_id,
     .enable_x2apic = intel_iommu_enable_eim,
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 51c29180a4..271bd8e546 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -158,9 +158,6 @@ int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
 int __must_check iommu_legacy_unmap(struct domain *d, dfn_t dfn,
                                     unsigned int page_order);
 
-int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                   unsigned int *flags);
-
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                                    unsigned int page_count,
                                    unsigned int flush_flags);
@@ -260,8 +257,6 @@ struct iommu_ops {
                                  unsigned int *flush_flags);
     int __must_check (*unmap_page)(struct domain *d, dfn_t dfn,
                                    unsigned int *flush_flags);
-    int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                    unsigned int *flags);
 
 #ifdef CONFIG_X86
     int (*enable_x2apic)(void);
-- 
2.20.1




* [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap
  2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
                   ` (2 preceding siblings ...)
  2020-07-24 16:46 ` [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method Paul Durrant
@ 2020-07-24 16:46 ` Paul Durrant
  2020-07-26  8:36   ` Jan Beulich
  2020-07-24 16:46 ` [PATCH 5/6] iommu: remove the share_p2m operation Paul Durrant
  2020-07-24 16:46 ` [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
  5 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

The 'legacy' functions do implicit flushing, so amend the callers to do the
appropriate flushing explicitly.

Unfortunately, because of the structure of the P2M code, we cannot remove
the per-CPU 'iommu_dont_flush_iotlb' global and the optimization it
facilitates. It is now checked directly in iommu_iotlb_flush(). Also, it is
now declared as bool (rather than bool_t) and setting/clearing it are no
longer pointlessly gated on is_iommu_enabled() returning true. (Arguably
it is also pointless to gate the call to iommu_iotlb_flush() on that
condition - since it is a no-op in that case - but the if clause allows
the scope of a stack variable to be restricted).
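
The typical caller pattern thus becomes roughly the following (illustrative
sketch only; example_map_one() is a made-up helper just to show the shape,
and the real call sites are in the mm, p2m and grant table hunks below):

    /* Sketch: map/unmap accumulate flush_flags, the caller flushes. */
    static int example_map_one(struct domain *d, dfn_t dfn, mfn_t mfn)
    {
        unsigned int flush_flags = 0;
        int rc, err;

        rc = iommu_map(d, dfn, mfn, PAGE_ORDER_4K,
                       IOMMUF_readable | IOMMUF_writable, &flush_flags);

        /* Flush regardless of the mapping result; keep the first error. */
        err = iommu_iotlb_flush(d, dfn, 1, flush_flags);
        if ( !rc )
            rc = err;

        return rc;
    }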

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/mm.c               | 22 +++++++++++++++-----
 xen/arch/x86/mm/p2m-ept.c       | 22 +++++++++++++-------
 xen/arch/x86/mm/p2m-pt.c        | 17 +++++++++++----
 xen/arch/x86/mm/p2m.c           | 28 ++++++++++++++++++-------
 xen/arch/x86/x86_64/mm.c        | 27 ++++++++++++++++++------
 xen/common/grant_table.c        | 36 +++++++++++++++++++++++++-------
 xen/common/memory.c             |  7 +++----
 xen/drivers/passthrough/iommu.c | 37 +--------------------------------
 xen/include/xen/iommu.h         | 20 +++++-------------
 9 files changed, 123 insertions(+), 93 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 82bc676553..8a5658b97a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2446,10 +2446,16 @@ static int cleanup_page_mappings(struct page_info *page)
 
         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
-            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), PAGE_ORDER_4K);
+            unsigned int flush_flags = 0;
+            int err;
 
+            err = iommu_unmap(d, _dfn(mfn), PAGE_ORDER_4K, &flush_flags);
             if ( !rc )
-                rc = rc2;
+                rc = err;
+
+            err = iommu_iotlb_flush(d, _dfn(mfn), 1, flush_flags);
+            if ( !rc )
+                rc = err;
         }
 
         if ( likely(!is_special_page(page)) )
@@ -2971,13 +2977,19 @@ static int _get_page_type(struct page_info *page, unsigned long type,
         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
             mfn_t mfn = page_to_mfn(page);
+            dfn_t dfn = _dfn(mfn_x(mfn));
+            unsigned int flush_flags = 0;
+            int err;
 
             if ( (x & PGT_type_mask) == PGT_writable_page )
-                rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), PAGE_ORDER_4K);
+                rc = iommu_unmap(d, dfn, PAGE_ORDER_4K, &flush_flags);
             else
-                rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, PAGE_ORDER_4K,
-                                      IOMMUF_readable | IOMMUF_writable);
+                rc = iommu_map(d, dfn, mfn, PAGE_ORDER_4K,
+                               IOMMUF_readable | IOMMUF_writable, &flush_flags);
 
+            err = iommu_iotlb_flush(d, dfn, 1, flush_flags);
+            if ( !rc )
+                rc = err;
             if ( unlikely(rc) )
             {
                 _put_page_type(page, 0, NULL);
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index b8154a7ecc..d71c949b35 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -842,15 +842,21 @@ out:
     if ( rc == 0 && p2m_is_hostp2m(p2m) &&
          need_modify_vtd_table )
     {
-        if ( iommu_use_hap_pt(d) )
-            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order),
-                                   (iommu_flags ? IOMMU_FLUSHF_added : 0) |
-                                   (vtd_pte_present ? IOMMU_FLUSHF_modified
-                                                    : 0));
-        else if ( need_iommu_pt_sync(d) )
+        unsigned int flush_flags = 0;
+        int err;
+
+        if ( need_iommu_pt_sync(d) )
             rc = iommu_flags ?
-                iommu_legacy_map(d, _dfn(gfn), mfn, order, iommu_flags) :
-                iommu_legacy_unmap(d, _dfn(gfn), order);
+                iommu_map(d, _dfn(gfn), mfn, order, iommu_flags, &flush_flags) :
+                iommu_unmap(d, _dfn(gfn), order, &flush_flags);
+        else if ( iommu_use_hap_pt(d) )
+            flush_flags =
+                (iommu_flags ? IOMMU_FLUSHF_added : 0) |
+                (vtd_pte_present ? IOMMU_FLUSHF_modified : 0);
+
+        err = iommu_iotlb_flush(d, _dfn(gfn), 1u << order, flush_flags);
+        if ( !rc )
+            rc = err;
     }
 
     unmap_domain_page(table);
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index badb26bc34..c48245cfe4 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -678,10 +678,19 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
 
     if ( need_iommu_pt_sync(p2m->domain) &&
          (iommu_old_flags != iommu_pte_flags || old_mfn != mfn_x(mfn)) )
-        rc = iommu_pte_flags
-             ? iommu_legacy_map(d, _dfn(gfn), mfn, page_order,
-                                iommu_pte_flags)
-             : iommu_legacy_unmap(d, _dfn(gfn), page_order);
+    {
+        unsigned int flush_flags = 0;
+        int err;
+
+        rc = iommu_pte_flags ?
+            iommu_map(d, _dfn(gfn), mfn, page_order, iommu_pte_flags,
+                      &flush_flags) :
+            iommu_unmap(d, _dfn(gfn), page_order, &flush_flags);
+
+        err = iommu_iotlb_flush(d, _dfn(gfn), 1u << page_order, flush_flags);
+        if ( !rc )
+            rc = err;
+    }
 
     /*
      * Free old intermediate tables if necessary.  This has to be the
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index db7bde0230..c5f52a4118 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1350,10 +1350,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn_l,
 
     if ( !paging_mode_translate(p2m->domain) )
     {
-        if ( !is_iommu_enabled(d) )
-            return 0;
-        return iommu_legacy_map(d, _dfn(gfn_l), _mfn(gfn_l), PAGE_ORDER_4K,
-                                IOMMUF_readable | IOMMUF_writable);
+        unsigned int flush_flags = 0;
+        int err;
+
+        ret = iommu_map(d, _dfn(gfn_l), _mfn(gfn_l), PAGE_ORDER_4K,
+                        IOMMUF_readable | IOMMUF_writable, &flush_flags);
+
+        err = iommu_iotlb_flush(d, _dfn(gfn_l), 1, flush_flags);
+        if ( !ret )
+            ret = err;
+
+        return ret;
     }
 
     gfn_lock(p2m, gfn, 0);
@@ -1441,9 +1448,16 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
 
     if ( !paging_mode_translate(d) )
     {
-        if ( !is_iommu_enabled(d) )
-            return 0;
-        return iommu_legacy_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K);
+        unsigned int flush_flags = 0;
+        int err;
+
+        ret = iommu_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K, &flush_flags);
+
+        err = iommu_iotlb_flush(d, _dfn(gfn_l), 1, flush_flags);
+        if ( !ret )
+            ret = err;
+
+        return ret;
     }
 
     gfn_lock(p2m, gfn, 0);
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 102079a801..3e0bff228e 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1413,21 +1413,36 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
          !iommu_use_hap_pt(hardware_domain) &&
          !need_iommu_pt_sync(hardware_domain) )
     {
+        unsigned int flush_flags = 0;
+        bool failed = false;
+        unsigned int n;
+
         for ( i = spfn; i < epfn; i++ )
-            if ( iommu_legacy_map(hardware_domain, _dfn(i), _mfn(i),
-                                  PAGE_ORDER_4K,
-                                  IOMMUF_readable | IOMMUF_writable) )
+            if ( iommu_map(hardware_domain, _dfn(i), _mfn(i),
+                           PAGE_ORDER_4K, IOMMUF_readable | IOMMUF_writable,
+                           &flush_flags) )
                 break;
         if ( i != epfn )
         {
+            failed = true;
+
             while (i-- > old_max)
                 /* If statement to satisfy __must_check. */
-                if ( iommu_legacy_unmap(hardware_domain, _dfn(i),
-                                        PAGE_ORDER_4K) )
+                if ( iommu_unmap(hardware_domain, _dfn(i), PAGE_ORDER_4K,
+                                 &flush_flags) )
                     continue;
+        }
 
-            goto destroy_m2p;
+        for ( i = spfn; i < epfn; i += n )
+        {
+            n = epfn - i; /* may truncate */
+
+            /* If statement to satisfy __must_check. */
+            if ( iommu_iotlb_flush(hardware_domain, _dfn(i), n, flush_flags) )
+                continue;
         }
+        if ( failed )
+            goto destroy_m2p;
     }
 
     /* We can't revert any more */
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 9f0cae52c0..bc2b5000cf 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1225,11 +1225,25 @@ map_grant_ref(
             kind = IOMMUF_readable;
         else
             kind = 0;
-        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind) )
+        if ( kind )
         {
-            double_gt_unlock(lgt, rgt);
-            rc = GNTST_general_error;
-            goto undo_out;
+            dfn_t dfn = _dfn(mfn_x(mfn));
+            unsigned int flush_flags = 0;
+            int err;
+
+            err = iommu_map(ld, dfn, mfn, 0, kind, &flush_flags);
+            if ( err )
+                rc = GNTST_general_error;
+
+            err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
+            if ( err )
+                rc = GNTST_general_error;
+
+            if ( rc != GNTST_okay )
+            {
+                double_gt_unlock(lgt, rgt);
+                goto undo_out;
+            }
         }
     }
 
@@ -1473,21 +1487,27 @@ unmap_common(
     if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
     {
         unsigned int kind;
+        dfn_t dfn = _dfn(mfn_x(op->mfn));
+        unsigned int flush_flags = 0;
         int err = 0;
 
         double_gt_lock(lgt, rgt);
 
         kind = mapkind(lgt, rd, op->mfn);
         if ( !kind )
-            err = iommu_legacy_unmap(ld, _dfn(mfn_x(op->mfn)), 0);
+            err = iommu_unmap(ld, dfn, 0, &flush_flags);
         else if ( !(kind & MAPKIND_WRITE) )
-            err = iommu_legacy_map(ld, _dfn(mfn_x(op->mfn)), op->mfn, 0,
-                                   IOMMUF_readable);
+            err = iommu_map(ld, dfn, op->mfn, 0, IOMMUF_readable,
+                            &flush_flags);
 
-        double_gt_unlock(lgt, rgt);
+        if ( err )
+            rc = GNTST_general_error;
 
+        err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
         if ( err )
             rc = GNTST_general_error;
+
+        double_gt_unlock(lgt, rgt);
     }
 
     /* If just unmapped a writable mapping, mark as dirtied */
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..fedbd9019e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -824,8 +824,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
     xatp->gpfn += start;
     xatp->size -= start;
 
-    if ( is_iommu_enabled(d) )
-       this_cpu(iommu_dont_flush_iotlb) = 1;
+    this_cpu(iommu_dont_flush_iotlb) = true;
 
     while ( xatp->size > done )
     {
@@ -845,12 +844,12 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
         }
     }
 
+    this_cpu(iommu_dont_flush_iotlb) = false;
+
     if ( is_iommu_enabled(d) )
     {
         int ret;
 
-        this_cpu(iommu_dont_flush_iotlb) = 0;
-
         ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
                                 IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
         if ( unlikely(ret) && rc >= 0 )
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 327df17c5d..f32d8e25a8 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -277,24 +277,6 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
     return rc;
 }
 
-int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                     unsigned int page_order, unsigned int flags)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) )
-    {
-        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
-                                    flush_flags);
-
-        if ( !rc )
-            rc = err;
-    }
-
-    return rc;
-}
-
 int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
                 unsigned int *flush_flags)
 {
@@ -333,23 +315,6 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
     return rc;
 }
 
-int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) )
-    {
-        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
-                                    flush_flags);
-
-        if ( !rc )
-            rc = err;
-    }
-
-    return rc;
-}
-
 int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
                       unsigned int flush_flags)
 {
@@ -357,7 +322,7 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
     int rc;
 
     if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
-         !page_count || !flush_flags )
+         !page_count || !flush_flags || this_cpu(iommu_dont_flush_iotlb) )
         return 0;
 
     if ( dfn_eq(dfn, INVALID_DFN) )
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 271bd8e546..ec639ba128 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -151,13 +151,6 @@ int __must_check iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
 int __must_check iommu_unmap(struct domain *d, dfn_t dfn,
                              unsigned int page_order,
                              unsigned int *flush_flags);
-
-int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                                  unsigned int page_order,
-                                  unsigned int flags);
-int __must_check iommu_legacy_unmap(struct domain *d, dfn_t dfn,
-                                    unsigned int page_order);
-
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                                    unsigned int page_count,
                                    unsigned int flush_flags);
@@ -364,15 +357,12 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
- * avoid unecessary iotlb_flush in the low level IOMMU code.
- *
- * iommu_map_page/iommu_unmap_page must flush the iotlb but somethimes
- * this operation can be really expensive. This flag will be set by the
- * caller to notify the low level IOMMU code to avoid the iotlb flushes.
- * iommu_iotlb_flush/iommu_iotlb_flush_all will be explicitly called by
- * the caller.
+ * avoid unnecessary IOMMU flushing while updating the P2M.
+ * Setting the value to true will cause iommu_iotlb_flush() to return without
+ * actually performing a flush. A batch flush must therefore be done by the
+ * calling code after setting the value back to false.
  */
-DECLARE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
+DECLARE_PER_CPU(bool, iommu_dont_flush_iotlb);
 
 #endif /* _IOMMU_H_ */
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/6] iommu: remove the share_p2m operation
  2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
                   ` (3 preceding siblings ...)
  2020-07-24 16:46 ` [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
@ 2020-07-24 16:46 ` Paul Durrant
  2020-07-24 19:00   ` Andrew Cooper
  2020-07-26  8:50   ` Jan Beulich
  2020-07-24 16:46 ` [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
  5 siblings, 2 replies; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant, George Dunlap,
	Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

Sharing of HAP tables is VT-d specific so the operation is never defined for
AMD IOMMU. There's also no need to pro-actively set vtd.pgd_maddr when using
shared EPT as it is straightforward to simply define a helper function to
return the appropriate value in the shared and non-shared cases.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/x86/mm/p2m.c               |  3 --
 xen/drivers/passthrough/iommu.c     |  8 -----
 xen/drivers/passthrough/vtd/iommu.c | 55 ++++++++++++++---------------
 xen/include/xen/iommu.h             |  3 --
 4 files changed, 27 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c5f52a4118..95b5055648 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -726,9 +726,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
     p2m->phys_table = pagetable_from_mfn(top_mfn);
 
-    if ( hap_enabled(d) )
-        iommu_share_p2m_table(d);
-
     p2m_unlock(p2m);
     return 0;
 }
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index f32d8e25a8..6a3803ff2c 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -478,14 +478,6 @@ int iommu_do_domctl(
     return ret;
 }
 
-void iommu_share_p2m_table(struct domain* d)
-{
-    ASSERT(hap_enabled(d));
-
-    if ( iommu_use_hap_pt(d) )
-        iommu_get_ops()->share_p2m(d);
-}
-
 void iommu_crash_shutdown(void)
 {
     if ( !iommu_crash_disable )
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 149d7122c3..d09ca3fb3d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -313,6 +313,26 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
     return pte_maddr;
 }
 
+static u64 domain_pgd_maddr(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
+
+    if ( iommu_use_hap_pt(d) )
+    {
+        mfn_t pgd_mfn =
+            pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
+
+        return pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
+    }
+
+    if ( !hd->arch.vtd.pgd_maddr )
+        addr_to_dma_page_maddr(d, 0, 1);
+
+    return hd->arch.vtd.pgd_maddr;
+}
+
 static void iommu_flush_write_buffer(struct vtd_iommu *iommu)
 {
     u32 val;
@@ -1347,22 +1367,17 @@ int domain_context_mapping_one(
     {
         spin_lock(&hd->arch.mapping_lock);
 
-        /* Ensure we have pagetables allocated down to leaf PTE. */
-        if ( hd->arch.vtd.pgd_maddr == 0 )
+        pgd_maddr = domain_pgd_maddr(domain);
+        if ( !pgd_maddr )
         {
-            addr_to_dma_page_maddr(domain, 0, 1);
-            if ( hd->arch.vtd.pgd_maddr == 0 )
-            {
-            nomem:
-                spin_unlock(&hd->arch.mapping_lock);
-                spin_unlock(&iommu->lock);
-                unmap_vtd_domain_page(context_entries);
-                return -ENOMEM;
-            }
+        nomem:
+            spin_unlock(&hd->arch.mapping_lock);
+            spin_unlock(&iommu->lock);
+            unmap_vtd_domain_page(context_entries);
+            return -ENOMEM;
         }
 
         /* Skip top levels of page tables for 2- and 3-level DRHDs. */
-        pgd_maddr = hd->arch.vtd.pgd_maddr;
         for ( agaw = level_to_agaw(4);
               agaw != level_to_agaw(iommu->nr_pt_levels);
               agaw-- )
@@ -1727,9 +1742,6 @@ static void iommu_domain_teardown(struct domain *d)
 
     ASSERT(is_iommu_enabled(d));
 
-    if ( iommu_use_hap_pt(d) )
-        return;
-
     hd->arch.vtd.pgd_maddr = 0;
 }
 
@@ -1821,18 +1833,6 @@ static int __init vtd_ept_page_compatible(struct vtd_iommu *iommu)
            (ept_has_1gb(ept_cap) && opt_hap_1gb) <= cap_sps_1gb(vtd_cap);
 }
 
-/*
- * set VT-d page table directory to EPT table if allowed
- */
-static void iommu_set_pgd(struct domain *d)
-{
-    mfn_t pgd_mfn;
-
-    pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-    dom_iommu(d)->arch.vtd.pgd_maddr =
-        pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
-}
-
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
                                  const struct acpi_rmrr_unit *rmrr,
                                  u32 flag)
@@ -2682,7 +2682,6 @@ static struct iommu_ops __initdata vtd_ops = {
     .adjust_irq_affinities = adjust_vtd_irq_affinities,
     .suspend = vtd_suspend,
     .resume = vtd_resume,
-    .share_p2m = iommu_set_pgd,
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = iommu_flush_iotlb_pages,
     .iotlb_flush_all = iommu_flush_iotlb_all,
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index ec639ba128..cc122fd10b 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -266,7 +266,6 @@ struct iommu_ops {
 
     int __must_check (*suspend)(void);
     void (*resume)(void);
-    void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
     int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
                                     unsigned int page_count,
@@ -343,8 +342,6 @@ void iommu_resume(void);
 void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
-void iommu_share_p2m_table(struct domain *d);
-
 #ifdef CONFIG_HAS_PCI
 int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
                         XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
                   ` (4 preceding siblings ...)
  2020-07-24 16:46 ` [PATCH 5/6] iommu: remove the share_p2m operation Paul Durrant
@ 2020-07-24 16:46 ` Paul Durrant
  2020-07-24 19:08   ` Andrew Cooper
  5 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 16:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul Durrant, Kevin Tian, Jan Beulich, Paul Durrant

From: Paul Durrant <pdurrant@amazon.com>

It's confusing and not consistent with the terminology introduced with 'dfn_t'.
Just call them IOMMU page tables.

Also remove a pointless check of the 'acpi_drhd_units' list in
vtd_dump_page_table_level(). If the list is empty then IOMMU mappings would
not have been enabled for the domain in the first place.

NOTE: There's also a bit of cosmetic clean-up in iommu_dump_page_tables()
      to make use of %pd.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 16 +++++++-------
 xen/drivers/passthrough/iommu.c             | 12 +++++------
 xen/drivers/passthrough/vtd/iommu.c         | 24 +++++++++------------
 xen/include/xen/iommu.h                     |  2 +-
 4 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index fd9b1e7bd5..112bd3eb37 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -499,8 +499,8 @@ static int amd_iommu_group_id(u16 seg, u8 bus, u8 devfn)
 
 #include <asm/io_apic.h>
 
-static void amd_dump_p2m_table_level(struct page_info* pg, int level, 
-                                     paddr_t gpa, int indent)
+static void amd_dump_page_table_level(struct page_info* pg, int level,
+                                      paddr_t gpa, int indent)
 {
     paddr_t address;
     struct amd_iommu_pte *table_vaddr;
@@ -537,7 +537,7 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
 
         address = gpa + amd_offset_level_address(index, level);
         if ( pde->next_level >= 1 )
-            amd_dump_p2m_table_level(
+            amd_dump_page_table_level(
                 mfn_to_page(_mfn(pde->mfn)), pde->next_level,
                 address, indent + 1);
         else
@@ -550,16 +550,16 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
     unmap_domain_page(table_vaddr);
 }
 
-static void amd_dump_p2m_table(struct domain *d)
+static void amd_dump_page_tables(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
 
     if ( !hd->arch.amd_iommu.root_table )
         return;
 
-    printk("p2m table has %d levels\n", hd->arch.amd_iommu.paging_mode);
-    amd_dump_p2m_table_level(hd->arch.amd_iommu.root_table,
-                             hd->arch.amd_iommu.paging_mode, 0, 0);
+    printk("table has %d levels\n", hd->arch.amd_iommu.paging_mode);
+    amd_dump_page_table_level(hd->arch.amd_iommu.root_table,
+                              hd->arch.amd_iommu.paging_mode, 0, 0);
 }
 
 static const struct iommu_ops __initconstrel _iommu_ops = {
@@ -586,7 +586,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .crash_shutdown = amd_iommu_crash_shutdown,
-    .dump_p2m_table = amd_dump_p2m_table,
+    .dump_page_tables = amd_dump_page_tables,
 };
 
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 6a3803ff2c..5bc190bf98 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -22,7 +22,7 @@
 #include <xen/keyhandler.h>
 #include <xsm/xsm.h>
 
-static void iommu_dump_p2m_table(unsigned char key);
+static void iommu_dump_page_tables(unsigned char key);
 
 unsigned int __read_mostly iommu_dev_iotlb_timeout = 1000;
 integer_param("iommu_dev_iotlb_timeout", iommu_dev_iotlb_timeout);
@@ -212,7 +212,7 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
     if ( !is_iommu_enabled(d) )
         return;
 
-    register_keyhandler('o', &iommu_dump_p2m_table, "dump iommu p2m table", 0);
+    register_keyhandler('o', &iommu_dump_page_tables, "dump iommu page tables", 0);
 
     hd->platform_ops->hwdom_init(d);
 }
@@ -513,7 +513,7 @@ bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
     return is_iommu_enabled(d) && test_bit(feature, dom_iommu(d)->features);
 }
 
-static void iommu_dump_p2m_table(unsigned char key)
+static void iommu_dump_page_tables(unsigned char key)
 {
     struct domain *d;
     const struct iommu_ops *ops;
@@ -535,12 +535,12 @@ static void iommu_dump_p2m_table(unsigned char key)
 
         if ( iommu_use_hap_pt(d) )
         {
-            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d->domain_id);
+            printk("%pd: IOMMU page tables shared with MMU\n", d);
             continue;
         }
 
-        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
-        ops->dump_p2m_table(d);
+        printk("%pd: IOMMU page tables: \n", d);
+        ops->dump_page_tables(d);
     }
 
     rcu_read_unlock(&domlist_read_lock);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d09ca3fb3d..82d7eb6224 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2545,8 +2545,8 @@ static void vtd_resume(void)
     }
 }
 
-static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa, 
-                                     int indent)
+static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
+                                      int indent)
 {
     paddr_t address;
     int i;
@@ -2575,8 +2575,8 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
 
         address = gpa + offset_level_address(i, level);
         if ( next_level >= 1 ) 
-            vtd_dump_p2m_table_level(dma_pte_addr(*pte), next_level, 
-                                     address, indent + 1);
+            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
+                                      address, indent + 1);
         else
             printk("%*sdfn: %08lx mfn: %08lx\n",
                    indent, "",
@@ -2587,17 +2587,13 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
     unmap_vtd_domain_page(pt_vaddr);
 }
 
-static void vtd_dump_p2m_table(struct domain *d)
+static void vtd_dump_page_tables(struct domain *d)
 {
-    const struct domain_iommu *hd;
+    const struct domain_iommu *hd = dom_iommu(d);
 
-    if ( list_empty(&acpi_drhd_units) )
-        return;
-
-    hd = dom_iommu(d);
-    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
-    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
-                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
+    printk("table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
+    vtd_dump_page_table_level(hd->arch.vtd.pgd_maddr,
+                              agaw_to_level(hd->arch.vtd.agaw), 0, 0);
 }
 
 static int __init intel_iommu_quarantine_init(struct domain *d)
@@ -2686,7 +2682,7 @@ static struct iommu_ops __initdata vtd_ops = {
     .iotlb_flush = iommu_flush_iotlb_pages,
     .iotlb_flush_all = iommu_flush_iotlb_all,
     .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
-    .dump_p2m_table = vtd_dump_p2m_table,
+    .dump_page_tables = vtd_dump_page_tables,
 };
 
 const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index cc122fd10b..804b283c1b 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -272,7 +272,7 @@ struct iommu_ops {
                                     unsigned int flush_flags);
     int __must_check (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
-    void (*dump_p2m_table)(struct domain *d);
+    void (*dump_page_tables)(struct domain *d);
 
 #ifdef CONFIG_HAS_DEVICE_TREE
     /*
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-07-24 16:46 ` [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
@ 2020-07-24 17:29   ` Andrew Cooper
  2020-07-24 18:49     ` Paul Durrant
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2020-07-24 17:29 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Kevin Tian, Wei Liu, Paul Durrant, Lukasz Hawrylko, Jan Beulich,
	Roger Pau Monné

On 24/07/2020 17:46, Paul Durrant wrote:
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index 6c9d5e5632..a7add5208c 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
>  
>  struct arch_iommu
>  {
> -    u64 pgd_maddr;                 /* io page directory machine address */
> -    spinlock_t mapping_lock;            /* io page table lock */
> -    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> -    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
> -    struct list_head mapped_rmrrs;
> -
> -    /* amd iommu support */
> -    int paging_mode;
> -    struct page_info *root_table;
> -    struct guest_iommu *g_iommu;
> +    spinlock_t mapping_lock; /* io page table lock */
> +
> +    union {
> +        /* Intel VT-d */
> +        struct {
> +            u64 pgd_maddr; /* io page directory machine address */
> +            int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
> +            u64 iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
> +            struct list_head mapped_rmrrs;
> +        } vtd;
> +        /* AMD IOMMU */
> +        struct {
> +            int paging_mode;
> +            struct page_info *root_table;
> +            struct guest_iommu *g_iommu;
> +        } amd_iommu;
> +    };

The naming split here is weird.

Ideally we'd have struct {vtd,amd}_iommu in appropriate headers, and
this would be simply

union {
    struct vtd_iommu vtd;
    struct amd_iommu amd;
};

If this isn't trivial to arrange, can we at least s/amd_iommu/amd/ here ?

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] x86/iommu: add common page-table allocator
  2020-07-24 16:46 ` [PATCH 2/6] x86/iommu: add common page-table allocator Paul Durrant
@ 2020-07-24 18:24   ` Andrew Cooper
  2020-07-26  8:26     ` Jan Beulich
  2020-07-27  9:37     ` Durrant, Paul
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2020-07-24 18:24 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Wei Liu, Paul Durrant, Kevin Tian, Jan Beulich, Roger Pau Monné

On 24/07/2020 17:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> Instead of having separate page table allocation functions in VT-d and AMD
> IOMMU code, use a common allocation function in the general x86 code.
> Also, rather than walking the page tables and using a tasklet to free them
> during iommu_domain_destroy(), add allocated page table pages to a list and
> then simply walk the list to free them. This saves ~90 lines of code overall.
>
> NOTE: There is no need to clear and sync PTEs during teardown since the per-
>       device root entries will have already been cleared (when devices were
>       de-assigned) so the page tables can no longer be accessed by the IOMMU.
>
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Oh wow - I don't have any polite words for how that code was organised
before.

Instead of discussing the ~90 lines saved, what about mentioning the
removal of a global bottleneck (the tasklet), or expanding on the removal
of redundant TLB/cache maintenance?

> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index c386dc4387..fd9b1e7bd5 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
>      return reassign_device(pdev->domain, d, devfn, pdev);
>  }
>  
> -static void deallocate_next_page_table(struct page_info *pg, int level)
> -{
> -    PFN_ORDER(pg) = level;
> -    spin_lock(&iommu_pt_cleanup_lock);
> -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> -    spin_unlock(&iommu_pt_cleanup_lock);
> -}
> -
> -static void deallocate_page_table(struct page_info *pg)
> -{
> -    struct amd_iommu_pte *table_vaddr;
> -    unsigned int index, level = PFN_ORDER(pg);
> -
> -    PFN_ORDER(pg) = 0;
> -
> -    if ( level <= 1 )
> -    {
> -        free_amd_iommu_pgtable(pg);
> -        return;
> -    }
> -
> -    table_vaddr = __map_domain_page(pg);
> -
> -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> -    {
> -        struct amd_iommu_pte *pde = &table_vaddr[index];
> -
> -        if ( pde->mfn && pde->next_level && pde->pr )
> -        {
> -            /* We do not support skip levels yet */
> -            ASSERT(pde->next_level == level - 1);
> -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> -                                       pde->next_level);
> -        }
> -    }
> -
> -    unmap_domain_page(table_vaddr);
> -    free_amd_iommu_pgtable(pg);
> -}
> -
> -static void deallocate_iommu_page_tables(struct domain *d)
> -{
> -    struct domain_iommu *hd = dom_iommu(d);
> -
> -    spin_lock(&hd->arch.mapping_lock);
> -    if ( hd->arch.amd_iommu.root_table )
> -    {
> -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> -                                   hd->arch.amd_iommu.paging_mode);

I really need to dust off my patch fixing up several bits of dubious
logic, including the name "paging_mode" which is actually simply the
number of levels.

At this point, it will probably be best to get this series in first, and
for me to rebase.

> -        hd->arch.amd_iommu.root_table = NULL;
> -    }
> -    spin_unlock(&hd->arch.mapping_lock);
> -}
> -
> -
>  static void amd_iommu_domain_destroy(struct domain *d)
>  {
> -    deallocate_iommu_page_tables(d);
> +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
>      amd_iommu_flush_all_pages(d);

Per your NOTE:, shouldn't this flush call be dropped?

> diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> index a12109a1de..b3c7da0fe2 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>  
>      spin_lock_init(&hd->arch.mapping_lock);
>  
> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> +    spin_lock_init(&hd->arch.pgtables.lock);
> +
>      return 0;
>  }
>  
>  void arch_iommu_domain_destroy(struct domain *d)
>  {
> +    struct domain_iommu *hd = dom_iommu(d);
> +    struct page_info *pg;
> +
> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> +        free_domheap_page(pg);

Some of those 90 lines saved were the logic to not suffer a watchdog
timeout here.

To do it like this, it needs plumbing into the relinquish resources path.
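
For illustration only, a rough sketch of the kind of preemptible free loop
the relinquish-resources path would want (the helper name and the use of
general_preempt_check() are my assumptions, not something in this series):

int iommu_free_pgtables(struct domain *d)
{
    struct domain_iommu *hd = dom_iommu(d);
    struct page_info *pg;
    unsigned int done = 0;

    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
    {
        free_domheap_page(pg);

        /* Sketch: drop out periodically so the caller can restart us. */
        if ( !(++done & 0xff) && general_preempt_check() )
            return -ERESTART;
    }

    return 0;
}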

>  }
>  
>  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>          return;
>  }
>  
> +struct page_info *iommu_alloc_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +#ifdef CONFIG_NUMA
> +    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> +        0 : MEMF_node(hd->node);
> +#else
> +    unsigned int memflags = 0;
> +#endif
> +    struct page_info *pg;
> +    void *p;

The memflags code is very awkward.  How about initialising it to 0, and
having:

#ifdef CONFIG_NUMA
    if ( hd->node != NUMA_NO_NODE )
        memflags = MEMF_node(hd->node);
#endif

here?

> +
> +    BUG_ON(!iommu_enabled);

Is this really necessary?  Can we plausibly end up in this function
otherwise?


Overall, I wonder if this patch would be better split into several.  One
which introduces the common alloc/free implementation, two which switch
the VT-d and AMD implementations over, and possibly one clean-up at the end?

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
  2020-07-24 16:46 ` [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method Paul Durrant
@ 2020-07-24 18:38   ` Andrew Cooper
  2020-07-24 18:53     ` Paul Durrant
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2020-07-24 18:38 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant, Kevin Tian, Jan Beulich

On 24/07/2020 17:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> ... from iommu_ops.
>
> This patch is essentially a reversion of dd93d54f "vtd: add lookup_page method
> to iommu_ops". The code was intended to be used by a patch that has long-
> since been abandoned. Therefore it is dead code and can be removed.

And by this, you mean the work that you only partially upstreamed, with
the remainder of the feature still very much in use by XenServer?

Please don't go breaking in-use things, simply because we're fixing
Xen's IOMMU mess one large XSA at a time...

As far as I can tell, this patch doesn't interact with any others in the
series.

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-07-24 17:29   ` Andrew Cooper
@ 2020-07-24 18:49     ` Paul Durrant
  2020-07-26  8:13       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 18:49 UTC (permalink / raw)
  To: 'Andrew Cooper', xen-devel
  Cc: 'Kevin Tian', 'Wei Liu', 'Paul Durrant',
	'Lukasz Hawrylko', 'Jan Beulich',
	'Roger Pau Monné'

> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 24 July 2020 18:29
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>; Jan Beulich
> <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: Re: [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> > index 6c9d5e5632..a7add5208c 100644
> > --- a/xen/include/asm-x86/iommu.h
> > +++ b/xen/include/asm-x86/iommu.h
> > @@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
> >
> >  struct arch_iommu
> >  {
> > -    u64 pgd_maddr;                 /* io page directory machine address */
> > -    spinlock_t mapping_lock;            /* io page table lock */
> > -    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> > -    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
> > -    struct list_head mapped_rmrrs;
> > -
> > -    /* amd iommu support */
> > -    int paging_mode;
> > -    struct page_info *root_table;
> > -    struct guest_iommu *g_iommu;
> > +    spinlock_t mapping_lock; /* io page table lock */
> > +
> > +    union {
> > +        /* Intel VT-d */
> > +        struct {
> > +            u64 pgd_maddr; /* io page directory machine address */
> > +            int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
> > +            u64 iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
> > +            struct list_head mapped_rmrrs;
> > +        } vtd;
> > +        /* AMD IOMMU */
> > +        struct {
> > +            int paging_mode;
> > +            struct page_info *root_table;
> > +            struct guest_iommu *g_iommu;
> > +        } amd_iommu;
> > +    };
> 
> The naming split here is weird.
> 
> Ideally we'd have struct {vtd,amd}_iommu in appropriate headers, and
> this would be simply
> 
> union {
>     struct vtd_iommu vtd;
>     struct amd_iommu amd;
> };
> 
> If this isn't trivial to arrange, can we at least s/amd_iommu/amd/ here ?

I was in two minds. I tried to look for a TLA for the AMD IOMMU and 'amd' seemed a little too nondescript. I don't really mind though if there's a strong preference to shorten it.
I can certainly try moving the struct definitions into the implementation headers.

  Paul

> 
> ~Andrew



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
  2020-07-24 18:38   ` Andrew Cooper
@ 2020-07-24 18:53     ` Paul Durrant
  2020-07-26  8:27       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Durrant @ 2020-07-24 18:53 UTC (permalink / raw)
  To: 'Andrew Cooper', xen-devel
  Cc: 'Paul Durrant', 'Kevin Tian', 'Jan Beulich'

> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 24 July 2020 19:39
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Kevin Tian <kevin.tian@intel.com>; Jan Beulich
> <jbeulich@suse.com>
> Subject: Re: [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > ... from iommu_ops.
> >
> > This patch is essentially a reversion of dd93d54f "vtd: add lookup_page method
> > to iommu_ops". The code was intended to be used by a patch that has long-
> > since been abandoned. Therefore it is dead code and can be removed.
> 
> And by this, you mean the work that you only partial unstreamed, with
> the remainder of the feature still very much in use by XenServer?
> 

I thought we basically decided to bin the original PV IOMMU idea though? 

> Please don't go breaking in-use things, simply because we're fixing
> Xen's IOMMU mess once large XSA at a time...
> 
> As far as I can tell, this patch doesn't interact with any others in the
> series.
> 

I can leave it, but I still don't think anything other than current XenServer will ever use it... so it really ought to just be in the downstream patch queue IMO.

  Paul

> ~Andrew



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] iommu: remove the share_p2m operation
  2020-07-24 16:46 ` [PATCH 5/6] iommu: remove the share_p2m operation Paul Durrant
@ 2020-07-24 19:00   ` Andrew Cooper
  2020-07-29  8:43     ` Durrant, Paul
  2020-07-26  8:50   ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2020-07-24 19:00 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Kevin Tian, Wei Liu, Paul Durrant, George Dunlap, Jan Beulich,
	Roger Pau Monné

On 24/07/2020 17:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> Sharing of HAP tables is VT-d specific so the operation is never defined for
> AMD IOMMU.

It's not VT-d specific, and Xen really did use to share on AMD.

In fact, a remnant of this logic is still present in the form of the
comment for p2m_type_t explaining why p2m_ram_rw needs to be 0.

That said, I agree it shouldn't be a hook, because it is statically
known at domain create time based on flags and hardware properties.

>  There's also no need to pro-actively set vtd.pgd_maddr when using
> shared EPT as it is straightforward to simply define a helper function to
> return the appropriate value in the shared and non-shared cases.

It occurs to me that vtd.pgd_maddr and amd.root_table really are the
same thing, and furthermore are common to all IOMMU implementations on
any architecture.  Is it worth trying to make them common, rather than
continuing to let each implementation handle them differently?
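
Purely as a sketch (the field and helper names here are invented, not
anything in the series): a common 'root_pg' in struct domain_iommu replacing
vtd.pgd_maddr / amd_iommu.root_table could then be exposed as

static inline paddr_t iommu_root_maddr(const struct domain_iommu *hd)
{
    /* Sketch only: assumes every implementation keeps its root in root_pg. */
    return hd->root_pg ? page_to_maddr(hd->root_pg) : 0;
}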

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-07-24 16:46 ` [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
@ 2020-07-24 19:08   ` Andrew Cooper
  2020-07-27 10:13     ` Durrant, Paul
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2020-07-24 19:08 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant, Kevin Tian, Jan Beulich

On 24/07/2020 17:46, Paul Durrant wrote:
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 6a3803ff2c..5bc190bf98 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -535,12 +535,12 @@ static void iommu_dump_p2m_table(unsigned char key)
>  
>          if ( iommu_use_hap_pt(d) )
>          {
> -            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d->domain_id);
> +            printk("%pd: IOMMU page tables shared with MMU\n", d);

Change MMU to CPU?  MMU is very ambiguous in this context.

>              continue;
>          }
>  
> -        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
> -        ops->dump_p2m_table(d);
> +        printk("%pd: IOMMU page tables: \n", d);

Drop the trailing whitespace?

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-07-24 18:49     ` Paul Durrant
@ 2020-07-26  8:13       ` Jan Beulich
  2020-07-27  9:30         ` Durrant, Paul
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2020-07-26  8:13 UTC (permalink / raw)
  To: paul
  Cc: 'Kevin Tian', 'Wei Liu', 'Andrew Cooper',
	'Paul Durrant', 'Lukasz Hawrylko',
	xen-devel, 'Roger Pau Monné'

On 24.07.2020 20:49, Paul Durrant wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> Sent: 24 July 2020 18:29
>>
>> On 24/07/2020 17:46, Paul Durrant wrote:
>>> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
>>> index 6c9d5e5632..a7add5208c 100644
>>> --- a/xen/include/asm-x86/iommu.h
>>> +++ b/xen/include/asm-x86/iommu.h
>>> @@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
>>>
>>>  struct arch_iommu
>>>  {
>>> -    u64 pgd_maddr;                 /* io page directory machine address */
>>> -    spinlock_t mapping_lock;            /* io page table lock */
>>> -    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
>>> -    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
>>> -    struct list_head mapped_rmrrs;
>>> -
>>> -    /* amd iommu support */
>>> -    int paging_mode;
>>> -    struct page_info *root_table;
>>> -    struct guest_iommu *g_iommu;
>>> +    spinlock_t mapping_lock; /* io page table lock */
>>> +
>>> +    union {
>>> +        /* Intel VT-d */
>>> +        struct {
>>> +            u64 pgd_maddr; /* io page directory machine address */
>>> +            int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
>>> +            u64 iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
>>> +            struct list_head mapped_rmrrs;
>>> +        } vtd;
>>> +        /* AMD IOMMU */
>>> +        struct {
>>> +            int paging_mode;
>>> +            struct page_info *root_table;
>>> +            struct guest_iommu *g_iommu;
>>> +        } amd_iommu;
>>> +    };
>>
>> The naming split here is weird.
>>
>> Ideally we'd have struct {vtd,amd}_iommu in appropriate headers, and
>> this would be simply
>>
>> union {
>>     struct vtd_iommu vtd;
>>     struct amd_iommu amd;
>> };
>>
>> If this isn't trivial to arrange, can we at least s/amd_iommu/amd/ here ?
> 
> I was in two minds. I tried to look for a TLA for the AMD IOMMU and 'amd' seemed a little too non-descript. I don't really mind though if there's a strong preference to shorted it.

+1 for shortening in some way. Even amd_vi would already be better imo,
albeit I'm with Andrew and would think just amd is fine here (and
matches how things are in the file system structure).

While at it, may I ask that you also switch the plain "int" fields to
"unsigned int" - I think that's doable for both of them.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] x86/iommu: add common page-table allocator
  2020-07-24 18:24   ` Andrew Cooper
@ 2020-07-26  8:26     ` Jan Beulich
  2020-07-27  9:37     ` Durrant, Paul
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2020-07-26  8:26 UTC (permalink / raw)
  To: Andrew Cooper, Paul Durrant
  Cc: xen-devel, Paul Durrant, Kevin Tian, Wei Liu, Roger Pau Monné

On 24.07.2020 20:24, Andrew Cooper wrote:
> On 24/07/2020 17:46, Paul Durrant wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>>  
>>      spin_lock_init(&hd->arch.mapping_lock);
>>  
>> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
>> +    spin_lock_init(&hd->arch.pgtables.lock);
>> +
>>      return 0;
>>  }
>>  
>>  void arch_iommu_domain_destroy(struct domain *d)
>>  {
>> +    struct domain_iommu *hd = dom_iommu(d);
>> +    struct page_info *pg;
>> +
>> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>> +        free_domheap_page(pg);
> 
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
> 
> To do it like this, it needs plumbing into the relinquish resources path.

And indeed this is possible now only because we don't destroy page
tables for still running domains anymore. Maybe the description
should also make this connection.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
  2020-07-24 18:53     ` Paul Durrant
@ 2020-07-26  8:27       ` Jan Beulich
  2020-07-27  9:58         ` Durrant, Paul
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2020-07-26  8:27 UTC (permalink / raw)
  To: paul
  Cc: 'Andrew Cooper', 'Paul Durrant',
	'Kevin Tian',
	xen-devel

On 24.07.2020 20:53, Paul Durrant wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> Sent: 24 July 2020 19:39
>>
>> On 24/07/2020 17:46, Paul Durrant wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> ... from iommu_ops.
>>>
>>> This patch is essentially a reversion of dd93d54f "vtd: add lookup_page method
>>> to iommu_ops". The code was intended to be used by a patch that has long-
>>> since been abandoned. Therefore it is dead code and can be removed.
>>
>> And by this, you mean the work that you only partial unstreamed, with
>> the remainder of the feature still very much in use by XenServer?
>>
> 
> I thought we basically decided to bin the original PV IOMMU idea though? 

Did we? It's the first time I hear of it, I think.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap
  2020-07-24 16:46 ` [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
@ 2020-07-26  8:36   ` Jan Beulich
  2020-07-29  8:12     ` Durrant, Paul
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2020-07-26  8:36 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jun Nakajima, xen-devel, Roger Pau Monné

On 24.07.2020 18:46, Paul Durrant wrote:
> ---
>  xen/arch/x86/mm.c               | 22 +++++++++++++++-----
>  xen/arch/x86/mm/p2m-ept.c       | 22 +++++++++++++-------
>  xen/arch/x86/mm/p2m-pt.c        | 17 +++++++++++----
>  xen/arch/x86/mm/p2m.c           | 28 ++++++++++++++++++-------
>  xen/arch/x86/x86_64/mm.c        | 27 ++++++++++++++++++------
>  xen/common/grant_table.c        | 36 +++++++++++++++++++++++++-------
>  xen/common/memory.c             |  7 +++----
>  xen/drivers/passthrough/iommu.c | 37 +--------------------------------
>  xen/include/xen/iommu.h         | 20 +++++-------------
>  9 files changed, 123 insertions(+), 93 deletions(-)

Overall more code. I wonder whether a map-and-flush function (named
differently than the current ones) wouldn't still be worthwhile to
have.
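
Something along these lines, say (the name is made up, merely to illustrate
the shape; it's not code from this series):

int iommu_map_and_flush(struct domain *d, dfn_t dfn, mfn_t mfn,
                        unsigned int page_order, unsigned int flags)
{
    unsigned int flush_flags = 0;
    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
    /* No iommu_dont_flush_iotlb check needed here - with this series it
     * lives inside iommu_iotlb_flush() itself. */
    int err = iommu_iotlb_flush(d, dfn, 1u << page_order, flush_flags);

    return rc ?: err;
}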

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -1225,11 +1225,25 @@ map_grant_ref(
>              kind = IOMMUF_readable;
>          else
>              kind = 0;
> -        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind) )
> +        if ( kind )
>          {
> -            double_gt_unlock(lgt, rgt);
> -            rc = GNTST_general_error;
> -            goto undo_out;
> +            dfn_t dfn = _dfn(mfn_x(mfn));
> +            unsigned int flush_flags = 0;
> +            int err;
> +
> +            err = iommu_map(ld, dfn, mfn, 0, kind, &flush_flags);
> +            if ( err )
> +                rc = GNTST_general_error;
> +
> +            err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
> +            if ( err )
> +                rc = GNTST_general_error;
> +
> +            if ( rc != GNTST_okay )
> +            {
> +                double_gt_unlock(lgt, rgt);
> +                goto undo_out;
> +            }
>          }

The mapping needs to happen with at least ld's lock held, yes. But
is the same true also for the flushing? Can't (not necessarily
right in this change) the flush be pulled out of the function and
instead done once per batch that got processed?
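
I.e. roughly this shape - illustrative only, since plumbing flush_flags out
of map_grant_ref() into the batch loop is not something this patch does:

unsigned int flush_flags = 0;

for ( i = 0; i < count; i++ )
{
    ...
    /* Per-op: accumulate flush requirements, don't flush yet. */
    err = iommu_map(ld, dfn, mfn, 0, kind, &flush_flags);
    ...
}

/* One flush for the whole processed batch, outside the grant locks. */
if ( flush_flags )
    err = iommu_iotlb_flush_all(ld, flush_flags);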

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] iommu: remove the share_p2m operation
  2020-07-24 16:46 ` [PATCH 5/6] iommu: remove the share_p2m operation Paul Durrant
  2020-07-24 19:00   ` Andrew Cooper
@ 2020-07-26  8:50   ` Jan Beulich
  2020-07-29  8:45     ` Durrant, Paul
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2020-07-26  8:50 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant, George Dunlap,
	xen-devel, Roger Pau Monné

On 24.07.2020 18:46, Paul Durrant wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -313,6 +313,26 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
>      return pte_maddr;
>  }
>  
> +static u64 domain_pgd_maddr(struct domain *d)

uint64_t please.

> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +
> +    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> +
> +    if ( iommu_use_hap_pt(d) )
> +    {
> +        mfn_t pgd_mfn =
> +            pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> +
> +        return pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
> +    }
> +
> +    if ( !hd->arch.vtd.pgd_maddr )
> +        addr_to_dma_page_maddr(d, 0, 1);
> +
> +    return hd->arch.vtd.pgd_maddr;
> +}
> +
>  static void iommu_flush_write_buffer(struct vtd_iommu *iommu)
>  {
>      u32 val;
> @@ -1347,22 +1367,17 @@ int domain_context_mapping_one(
>      {
>          spin_lock(&hd->arch.mapping_lock);
>  
> -        /* Ensure we have pagetables allocated down to leaf PTE. */
> -        if ( hd->arch.vtd.pgd_maddr == 0 )
> +        pgd_maddr = domain_pgd_maddr(domain);
> +        if ( !pgd_maddr )
>          {
> -            addr_to_dma_page_maddr(domain, 0, 1);
> -            if ( hd->arch.vtd.pgd_maddr == 0 )
> -            {
> -            nomem:
> -                spin_unlock(&hd->arch.mapping_lock);
> -                spin_unlock(&iommu->lock);
> -                unmap_vtd_domain_page(context_entries);
> -                return -ENOMEM;
> -            }
> +        nomem:
> +            spin_unlock(&hd->arch.mapping_lock);
> +            spin_unlock(&iommu->lock);
> +            unmap_vtd_domain_page(context_entries);
> +            return -ENOMEM;
>          }

This renders all calls bogus in shared mode - the function, if
it ended up getting called nevertheless, would then still alloc
the root table. Therefore I'd like to suggest that at least all
its callers get an explicit check. That's really just
dma_pte_clear_one() as it looks.
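
I.e. something along these lines at the top of dma_pte_clear_one() (my
wording of the suggestion, not code from the series):

    /* Page tables are not ours to edit when they are shared with EPT. */
    ASSERT(!iommu_use_hap_pt(domain));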

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-07-26  8:13       ` Jan Beulich
@ 2020-07-27  9:30         ` Durrant, Paul
  0 siblings, 0 replies; 27+ messages in thread
From: Durrant, Paul @ 2020-07-27  9:30 UTC (permalink / raw)
  To: Jan Beulich, paul
  Cc: 'Kevin Tian', 'Wei Liu', 'Andrew Cooper',
	'Lukasz Hawrylko',
	xen-devel, 'Roger Pau Monné'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 26 July 2020 09:14
> To: paul@xen.org
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; xen-devel@lists.xenproject.org; Durrant, Paul
> <pdurrant@amazon.co.uk>; 'Lukasz Hawrylko' <lukasz.hawrylko@linux.intel.com>; 'Wei Liu' <wl@xen.org>;
> 'Roger Pau Monné' <roger.pau@citrix.com>; 'Kevin Tian' <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields...
> 
> On 24.07.2020 20:49, Paul Durrant wrote:
> >> From: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Sent: 24 July 2020 18:29
> >>
> >> On 24/07/2020 17:46, Paul Durrant wrote:
> >>> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> >>> index 6c9d5e5632..a7add5208c 100644
> >>> --- a/xen/include/asm-x86/iommu.h
> >>> +++ b/xen/include/asm-x86/iommu.h
> >>> @@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
> >>>
> >>>  struct arch_iommu
> >>>  {
> >>> -    u64 pgd_maddr;                 /* io page directory machine address */
> >>> -    spinlock_t mapping_lock;            /* io page table lock */
> >>> -    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> >>> -    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
> >>> -    struct list_head mapped_rmrrs;
> >>> -
> >>> -    /* amd iommu support */
> >>> -    int paging_mode;
> >>> -    struct page_info *root_table;
> >>> -    struct guest_iommu *g_iommu;
> >>> +    spinlock_t mapping_lock; /* io page table lock */
> >>> +
> >>> +    union {
> >>> +        /* Intel VT-d */
> >>> +        struct {
> >>> +            u64 pgd_maddr; /* io page directory machine address */
> >>> +            int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
> >>> +            u64 iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
> >>> +            struct list_head mapped_rmrrs;
> >>> +        } vtd;
> >>> +        /* AMD IOMMU */
> >>> +        struct {
> >>> +            int paging_mode;
> >>> +            struct page_info *root_table;
> >>> +            struct guest_iommu *g_iommu;
> >>> +        } amd_iommu;
> >>> +    };
> >>
> >> The naming split here is weird.
> >>
> >> Ideally we'd have struct {vtd,amd}_iommu in appropriate headers, and
> >> this would be simply
> >>
> >> union {
> >>     struct vtd_iommu vtd;
> >>     struct amd_iommu amd;
> >> };
> >>
> >> If this isn't trivial to arrange, can we at least s/amd_iommu/amd/ here ?
> >
> > I was in two minds. I tried to look for a TLA for the AMD IOMMU and 'amd' seemed a little too non-
> descript. I don't really mind though if there's a strong preference to shorted it.
> 
> +1 for shortening in some way. Even amd_vi would already be better imo,
> albeit I'm with Andrew and would think just amd is fine here (and
> matches how things are in the file system structure).
> 

OK, I'll shorten to 'amd'.

> While at it, may I ask that you also switch the plain "int" fields to
> "unsigned int" - I think that's doable for both of them.
> 

Sure, I can do that.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2/6] x86/iommu: add common page-table allocator
  2020-07-24 18:24   ` Andrew Cooper
  2020-07-26  8:26     ` Jan Beulich
@ 2020-07-27  9:37     ` Durrant, Paul
  2020-07-27 19:41       ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Durrant, Paul @ 2020-07-27  9:37 UTC (permalink / raw)
  To: Andrew Cooper, Paul Durrant, xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Roger Pau Monné

> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 24 July 2020 19:24
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Jan Beulich <jbeulich@suse.com>; Kevin Tian
> <kevin.tian@intel.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: RE: [EXTERNAL] [PATCH 2/6] x86/iommu: add common page-table allocator
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > Instead of having separate page table allocation functions in VT-d and AMD
> > IOMMU code, use a common allocation function in the general x86 code.
> > Also, rather than walking the page tables and using a tasklet to free them
> > during iommu_domain_destroy(), add allocated page table pages to a list and
> > then simply walk the list to free them. This saves ~90 lines of code overall.
> >
> > NOTE: There is no need to clear and sync PTEs during teardown since the per-
> >       device root entries will have already been cleared (when devices were
> >       de-assigned) so the page tables can no longer be accessed by the IOMMU.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Oh wow - I don't have any polite words for how that code was organised
> before.
> 
> Instead of discussing the ~90 lines saved, what about the removal of a
> global bottleneck (the tasklet) or expand on the removal of redundant
> TLB/cache maintenance?
> 

Ok.

> > diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > index c386dc4387..fd9b1e7bd5 100644
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -378,64 +380,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
> >      return reassign_device(pdev->domain, d, devfn, pdev);
> >  }
> >
> > -static void deallocate_next_page_table(struct page_info *pg, int level)
> > -{
> > -    PFN_ORDER(pg) = level;
> > -    spin_lock(&iommu_pt_cleanup_lock);
> > -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> > -    spin_unlock(&iommu_pt_cleanup_lock);
> > -}
> > -
> > -static void deallocate_page_table(struct page_info *pg)
> > -{
> > -    struct amd_iommu_pte *table_vaddr;
> > -    unsigned int index, level = PFN_ORDER(pg);
> > -
> > -    PFN_ORDER(pg) = 0;
> > -
> > -    if ( level <= 1 )
> > -    {
> > -        free_amd_iommu_pgtable(pg);
> > -        return;
> > -    }
> > -
> > -    table_vaddr = __map_domain_page(pg);
> > -
> > -    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
> > -    {
> > -        struct amd_iommu_pte *pde = &table_vaddr[index];
> > -
> > -        if ( pde->mfn && pde->next_level && pde->pr )
> > -        {
> > -            /* We do not support skip levels yet */
> > -            ASSERT(pde->next_level == level - 1);
> > -            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
> > -                                       pde->next_level);
> > -        }
> > -    }
> > -
> > -    unmap_domain_page(table_vaddr);
> > -    free_amd_iommu_pgtable(pg);
> > -}
> > -
> > -static void deallocate_iommu_page_tables(struct domain *d)
> > -{
> > -    struct domain_iommu *hd = dom_iommu(d);
> > -
> > -    spin_lock(&hd->arch.mapping_lock);
> > -    if ( hd->arch.amd_iommu.root_table )
> > -    {
> > -        deallocate_next_page_table(hd->arch.amd_iommu.root_table,
> > -                                   hd->arch.amd_iommu.paging_mode);
> 
> I really need to dust off my patch fixing up several bits of dubious
> logic, including the name "paging_mode" which is actually simply the
> number of levels.
> 
> At this point, it will probably be best to get this series in first, and
> for me to rebase.
> 

Ok.

> > -        hd->arch.amd_iommu.root_table = NULL;
> > -    }
> > -    spin_unlock(&hd->arch.mapping_lock);
> > -}
> > -
> > -
> >  static void amd_iommu_domain_destroy(struct domain *d)
> >  {
> > -    deallocate_iommu_page_tables(d);
> > +    dom_iommu(d)->arch.amd_iommu.root_table = NULL;
> >      amd_iommu_flush_all_pages(d);
> 
> Per your NOTE:, shouldn't this flush call be dropped?
> 

Indeed it should.

> > diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> > index a12109a1de..b3c7da0fe2 100644
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
> >
> >      spin_lock_init(&hd->arch.mapping_lock);
> >
> > +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
> > +    spin_lock_init(&hd->arch.pgtables.lock);
> > +
> >      return 0;
> >  }
> >
> >  void arch_iommu_domain_destroy(struct domain *d)
> >  {
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    struct page_info *pg;
> > +
> > +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> > +        free_domheap_page(pg);
> 
> Some of those 90 lines saved were the logic to not suffer a watchdog
> timeout here.
> 
> To do it like this, it needs plumbing into the relinquish resources path.
> 

Ok. It does look like there could be other potentially lengthy destruction done off the back of the RCU call. Ought we to have a restartable domain_destroy()?
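
E.g. (sketch only - the stage name and helper are invented to show the sort
of hook I have in mind) x86's domain_relinquish_resources() could grow:

    PROGRESS(iommu_pagetables):
        ret = iommu_free_pgtables(d); /* hypothetical preemptible helper */
        if ( ret )
            return ret;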

> >  }
> >
> >  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> > @@ -257,6 +265,39 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >          return;
> >  }
> >
> > +struct page_info *iommu_alloc_pgtable(struct domain *d)
> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +#ifdef CONFIG_NUMA
> > +    unsigned int memflags = (hd->node == NUMA_NO_NODE) ?
> > +        0 : MEMF_node(hd->node);
> > +#else
> > +    unsigned int memflags = 0;
> > +#endif
> > +    struct page_info *pg;
> > +    void *p;
> 
> The memflags code is very awkward.  How about initialising it to 0, and
> having:
> 
> #ifdef CONFIG_NUMA
>     if ( hd->node != NUMA_NO_NODE )
>         memflags = MEMF_node(hd->node);
> #endif
> 
> here?
> 

Sure.

> > +
> > +    BUG_ON(!iommu_enabled);
> 
> Is this really necessary?  Can we plausibly end up in this function
> otherwise?
> 

Not really; I'll drop it.

> 
> Overall, I wonder if this patch would better be split into several.  One
> which introduces the common alloc/free implementation, two which switch
> the VT-d and AMD implementations over, and possibly one clean-up on the end?
> 

Ok, if you feel the patch is too large as-is then I'll split as you suggest.

  Paul

> ~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
  2020-07-26  8:27       ` Jan Beulich
@ 2020-07-27  9:58         ` Durrant, Paul
  0 siblings, 0 replies; 27+ messages in thread
From: Durrant, Paul @ 2020-07-27  9:58 UTC (permalink / raw)
  To: Jan Beulich, paul
  Cc: 'Andrew Cooper', 'Kevin Tian', xen-devel

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 26 July 2020 09:28
> To: paul@xen.org
> Cc: 'Andrew Cooper' <andrew.cooper3@citrix.com>; xen-devel@lists.xenproject.org; Durrant, Paul
> <pdurrant@amazon.co.uk>; 'Kevin Tian' <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method...
> 
> On 24.07.2020 20:53, Paul Durrant wrote:
> >> From: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Sent: 24 July 2020 19:39
> >>
> >> On 24/07/2020 17:46, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@amazon.com>
> >>>
> >>> ... from iommu_ops.
> >>>
> >>> This patch is essentially a reversion of dd93d54f "vtd: add lookup_page method
> >>> to iommu_ops". The code was intended to be used by a patch that has long-
> >>> since been abandoned. Therefore it is dead code and can be removed.
> >>
> >> And by this, you mean the work that you only partial unstreamed, with
> >> the remainder of the feature still very much in use by XenServer?
> >>
> >
> > I thought we basically decided to bin the original PV IOMMU idea though?
> 
> Did we? It's the first time I hear of it, I think.
> 

I circulated a doc. ages ago, while I was still at Citrix: https://docs.google.com/document/d/12-z6JD41J_oNrCg_c0yAxGWg5ADBQ8_bSiP_NH6Hqwo/edit?usp=sharing

In there I propose that we don't follow the original idea of keeping a single set of per-domain tables, but instead have a set of tables (or IOMMU contexts) for groups of devices. 'Context 0' is the current set of static 1:1 tables, while other contexts are manipulated by hypercall, so in this plan I don't envisage needing to look up mappings in the tables in this way... but I guess I can't rule it out.

 Paul 



* RE: [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-07-24 19:08   ` Andrew Cooper
@ 2020-07-27 10:13     ` Durrant, Paul
  0 siblings, 0 replies; 27+ messages in thread
From: Durrant, Paul @ 2020-07-27 10:13 UTC (permalink / raw)
  To: Andrew Cooper, Paul Durrant, xen-devel; +Cc: Kevin Tian, Jan Beulich

> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 24 July 2020 20:09
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Jan Beulich <jbeulich@suse.com>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables'
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> > index 6a3803ff2c..5bc190bf98 100644
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -535,12 +535,12 @@ static void iommu_dump_p2m_table(unsigned char key)
> >
> >          if ( iommu_use_hap_pt(d) )
> >          {
> > -            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d->domain_id);
> > +            printk("%pd: IOMMU page tables shared with MMU\n", d);
> 
> Change MMU to CPU?  MMU is very ambiguous in this context.
> 

Actually I could push this into the VT-d code and just say something like 'shared EPT is enabled'. Would that be less ambiguous?

> >              continue;
> >          }
> >
> > -        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
> > -        ops->dump_p2m_table(d);
> > +        printk("%pd: IOMMU page tables: \n", d);
> 
> Drop the trailing whitespace?
> 

Sure.

  Paul

> ~Andrew


* Re: [PATCH 2/6] x86/iommu: add common page-table allocator
  2020-07-27  9:37     ` Durrant, Paul
@ 2020-07-27 19:41       ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2020-07-27 19:41 UTC (permalink / raw)
  To: Durrant, Paul
  Cc: Kevin Tian, Wei Liu, Paul Durrant, Andrew Cooper, xen-devel,
	Roger Pau Monné

On 27.07.2020 11:37, Durrant, Paul wrote:
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> Sent: 24 July 2020 19:24
>>
>> On 24/07/2020 17:46, Paul Durrant wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -140,11 +140,19 @@ int arch_iommu_domain_init(struct domain *d)
>>>
>>>       spin_lock_init(&hd->arch.mapping_lock);
>>>
>>> +    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
>>> +    spin_lock_init(&hd->arch.pgtables.lock);
>>> +
>>>       return 0;
>>>   }
>>>
>>>   void arch_iommu_domain_destroy(struct domain *d)
>>>   {
>>> +    struct domain_iommu *hd = dom_iommu(d);
>>> +    struct page_info *pg;
>>> +
>>> +    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>>> +        free_domheap_page(pg);
>>
>> Some of those 90 lines saved were the logic to not suffer a watchdog
>> timeout here.
>>
>> To do it like this, it needs plumbing into the relinquish resources path.
>>
> 
> Ok. It does look like there could be other potentially lengthy destruction done off the back of the RCU call. Ought we to have the ability to make domain_destroy() restartable?

I don't see how this would be (easily) feasible. Instead - why do
page tables not get cleaned up already at relinquish_resources time?

Jan



* RE: [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap
  2020-07-26  8:36   ` Jan Beulich
@ 2020-07-29  8:12     ` Durrant, Paul
  0 siblings, 0 replies; 27+ messages in thread
From: Durrant, Paul @ 2020-07-29  8:12 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	xen-devel, Roger Pau Monné

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 26 July 2020 09:36
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul <pdurrant@amazon.co.uk>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; George
> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall
> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap
> 
> On 24.07.2020 18:46, Paul Durrant wrote:
> > ---
> >  xen/arch/x86/mm.c               | 22 +++++++++++++++-----
> >  xen/arch/x86/mm/p2m-ept.c       | 22 +++++++++++++-------
> >  xen/arch/x86/mm/p2m-pt.c        | 17 +++++++++++----
> >  xen/arch/x86/mm/p2m.c           | 28 ++++++++++++++++++-------
> >  xen/arch/x86/x86_64/mm.c        | 27 ++++++++++++++++++------
> >  xen/common/grant_table.c        | 36 +++++++++++++++++++++++++-------
> >  xen/common/memory.c             |  7 +++----
> >  xen/drivers/passthrough/iommu.c | 37 +--------------------------------
> >  xen/include/xen/iommu.h         | 20 +++++-------------
> >  9 files changed, 123 insertions(+), 93 deletions(-)
> 
> Overall more code. I wonder whether a map-and-flush function (named
> differently than the current ones) wouldn't still be worthwhile to
> have.

Agreed, but an extra 30 lines is not huge. I'd still like to keep map/unmap and flush separate, but I'll see if I can reduce the added lines.
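
If a combined helper does turn out to be worthwhile, it needn't be much more than the sketch below (the name and exact signature are placeholders):

static int iommu_map_and_flush(struct domain *d, dfn_t dfn, mfn_t mfn,
                               unsigned int page_order, unsigned int flags)
{
    unsigned int flush_flags = 0;
    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
    int err = iommu_iotlb_flush(d, dfn, 1u << page_order, flush_flags);

    /* The flush is attempted even if the map partially failed. */
    return rc ?: err;
}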

> 
> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -1225,11 +1225,25 @@ map_grant_ref(
> >              kind = IOMMUF_readable;
> >          else
> >              kind = 0;
> > -        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind) )
> > +        if ( kind )
> >          {
> > -            double_gt_unlock(lgt, rgt);
> > -            rc = GNTST_general_error;
> > -            goto undo_out;
> > +            dfn_t dfn = _dfn(mfn_x(mfn));
> > +            unsigned int flush_flags = 0;
> > +            int err;
> > +
> > +            err = iommu_map(ld, dfn, mfn, 0, kind, &flush_flags);
> > +            if ( err )
> > +                rc = GNTST_general_error;
> > +
> > +            err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
> > +            if ( err )
> > +                rc = GNTST_general_error;
> > +
> > +            if ( rc != GNTST_okay )
> > +            {
> > +                double_gt_unlock(lgt, rgt);
> > +                goto undo_out;
> > +            }
> >          }
> 
> The mapping needs to happen with at least ld's lock held, yes. But
> is the same true also for the flushing? Can't (not necessarily
> right in this change) the flush be pulled out of the function and
> instead done once per batch that got processed?
> 

True, the locks need not be held across the flush. I'll have a look at batching too.
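
The rough shape I have in mind for the batching is something like this (illustrative only; iommu_iotlb_flush_all() is assumed here purely for brevity, and a real version would flush just the DFNs actually touched):

/* Per entry, called with the grant-table locks held; no flush here. */
static int grant_iommu_map_one(struct domain *ld, mfn_t mfn,
                               unsigned int kind,
                               unsigned int *flush_flags)
{
    return kind ? iommu_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind, flush_flags)
                : 0;
}

/* Once per batch, called after the locks have been dropped. */
static int grant_iommu_flush(struct domain *ld, unsigned int flush_flags)
{
    return flush_flags ? iommu_iotlb_flush_all(ld, flush_flags) : 0;
}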

  Paul

> Jan


* RE: [PATCH 5/6] iommu: remove the share_p2m operation
  2020-07-24 19:00   ` Andrew Cooper
@ 2020-07-29  8:43     ` Durrant, Paul
  0 siblings, 0 replies; 27+ messages in thread
From: Durrant, Paul @ 2020-07-29  8:43 UTC (permalink / raw)
  To: Andrew Cooper, Paul Durrant, xen-devel
  Cc: Wei Liu, Kevin Tian, George Dunlap, Jan Beulich, Roger Pau Monné

> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 24 July 2020 20:01
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Jan Beulich <jbeulich@suse.com>; George Dunlap
> <george.dunlap@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH 5/6] iommu: remove the share_p2m operation
> 
> On 24/07/2020 17:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > Sharing of HAP tables is VT-d specific so the operation is never defined for
> > AMD IOMMU.
> 
> It's not VT-d specific, and Xen really did used to share on AMD.
> 

Oh, I never thought that ever worked.

> In fact, a remnant of this logic is still present in the form of the
> comment for p2m_type_t explaining why p2m_ram_rw needs to be 0.
> 
> That said, I agree it shouldn't be a hook, because it is statically
> known at domain create time based on flags and hardware properties.
> 

Well, for VT-d that may well change in future ;-)

> >  There's also no need to pro-actively set vtd.pgd_maddr when using
> > shared EPT as it is straightforward to simply define a helper function to
> > return the appropriate value in the shared and non-shared cases.
> 
> It occurs to me that vtd.pgd_maddr and amd.root_table really are the
> same thing, and furthermore are common to all IOMMU implementations on
> any architecture.  Is it worth trying to make them common, rather than
> continuing to let each implementation handle them differently?

I looked at this. The problem really lies in terminology. The 'root table' in the VT-d code refers to the single root table, which IIRC is called the device table in the AMD IOMMU code. So, whilst I could convert both to use a single common field, I don't think it's really that valuable to do so, since it's likely to make the code slightly more confusing to read (for someone expecting the names to tally with the spec).

  Paul

> 
> ~Andrew


* RE: [PATCH 5/6] iommu: remove the share_p2m operation
  2020-07-26  8:50   ` Jan Beulich
@ 2020-07-29  8:45     ` Durrant, Paul
  0 siblings, 0 replies; 27+ messages in thread
From: Durrant, Paul @ 2020-07-29  8:45 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, George Dunlap, xen-devel,
	Roger Pau Monné

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 26 July 2020 09:50
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul <pdurrant@amazon.co.uk>; Andrew Cooper
> <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Kevin Tian <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH 5/6] iommu: remove the share_p2m operation
> 
> On 24.07.2020 18:46, Paul Durrant wrote:
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -313,6 +313,26 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
> >      return pte_maddr;
> >  }
> >
> > +static u64 domain_pgd_maddr(struct domain *d)
> 
> uint64_t please.
> 

Ok.

> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +
> > +    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> > +
> > +    if ( iommu_use_hap_pt(d) )
> > +    {
> > +        mfn_t pgd_mfn =
> > +            pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> > +
> > +        return pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
> > +    }
> > +
> > +    if ( !hd->arch.vtd.pgd_maddr )
> > +        addr_to_dma_page_maddr(d, 0, 1);
> > +
> > +    return hd->arch.vtd.pgd_maddr;
> > +}
> > +
> >  static void iommu_flush_write_buffer(struct vtd_iommu *iommu)
> >  {
> >      u32 val;
> > @@ -1347,22 +1367,17 @@ int domain_context_mapping_one(
> >      {
> >          spin_lock(&hd->arch.mapping_lock);
> >
> > -        /* Ensure we have pagetables allocated down to leaf PTE. */
> > -        if ( hd->arch.vtd.pgd_maddr == 0 )
> > +        pgd_maddr = domain_pgd_maddr(domain);
> > +        if ( !pgd_maddr )
> >          {
> > -            addr_to_dma_page_maddr(domain, 0, 1);
> > -            if ( hd->arch.vtd.pgd_maddr == 0 )
> > -            {
> > -            nomem:
> > -                spin_unlock(&hd->arch.mapping_lock);
> > -                spin_unlock(&iommu->lock);
> > -                unmap_vtd_domain_page(context_entries);
> > -                return -ENOMEM;
> > -            }
> > +        nomem:
> > +            spin_unlock(&hd->arch.mapping_lock);
> > +            spin_unlock(&iommu->lock);
> > +            unmap_vtd_domain_page(context_entries);
> > +            return -ENOMEM;
> >          }
> 
> This renders all calls bogus in shared mode - the function, if
> it ended up getting called nevertheless, would then still alloc
> the root table. Therefore I'd like to suggest that at least all
> its callers get an explicit check. That's really just
> dma_pte_clear_one() as it looks.
> 

Ok. The nomem label is slightly awkward anyway, so I think I'll move this code out into a separate function and re-factor it.
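
As for the explicit check, something along these lines at the relevant call sites is what I have in mind (the helper name is a placeholder):

static bool domain_has_own_pgtables(const struct domain *d)
{
    /*
     * With shared EPT there is no separate set of IOMMU page tables, so
     * callers that would otherwise allocate or walk them should bail out
     * early rather than letting addr_to_dma_page_maddr() allocate one.
     */
    return !iommu_use_hap_pt(d);
}

dma_pte_clear_one() would then start with an early return when this yields false.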

  Paul

> Jan


end of thread

Thread overview: 27+ messages
2020-07-24 16:46 [PATCH 0/6] IOMMU cleanup Paul Durrant
2020-07-24 16:46 ` [PATCH 1/6] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
2020-07-24 17:29   ` Andrew Cooper
2020-07-24 18:49     ` Paul Durrant
2020-07-26  8:13       ` Jan Beulich
2020-07-27  9:30         ` Durrant, Paul
2020-07-24 16:46 ` [PATCH 2/6] x86/iommu: add common page-table allocator Paul Durrant
2020-07-24 18:24   ` Andrew Cooper
2020-07-26  8:26     ` Jan Beulich
2020-07-27  9:37     ` Durrant, Paul
2020-07-27 19:41       ` Jan Beulich
2020-07-24 16:46 ` [PATCH 3/6] iommu: remove iommu_lookup_page() and the lookup_page() method Paul Durrant
2020-07-24 18:38   ` Andrew Cooper
2020-07-24 18:53     ` Paul Durrant
2020-07-26  8:27       ` Jan Beulich
2020-07-27  9:58         ` Durrant, Paul
2020-07-24 16:46 ` [PATCH 4/6] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
2020-07-26  8:36   ` Jan Beulich
2020-07-29  8:12     ` Durrant, Paul
2020-07-24 16:46 ` [PATCH 5/6] iommu: remove the share_p2m operation Paul Durrant
2020-07-24 19:00   ` Andrew Cooper
2020-07-29  8:43     ` Durrant, Paul
2020-07-26  8:50   ` Jan Beulich
2020-07-29  8:45     ` Durrant, Paul
2020-07-24 16:46 ` [PATCH 6/6] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
2020-07-24 19:08   ` Andrew Cooper
2020-07-27 10:13     ` Durrant, Paul
