* [PATCH v4 00/14] IOMMU cleanup
@ 2020-08-04 13:41 Paul Durrant
  2020-08-04 13:41 ` [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
                   ` (13 more replies)
  0 siblings, 14 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Lukasz Hawrylko, Jan Beulich, Volodymyr Babchuk,
	Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

v4:
 - Added three more patches to convert root_entry, context_entry and
   dma_pte to bit fields.

Paul Durrant (14):
  x86/iommu: re-arrange arch_iommu to separate common fields...
  x86/iommu: add common page-table allocator
  x86/iommu: convert VT-d code to use new page table allocator
  x86/iommu: convert AMD IOMMU code to use new page table allocator
  iommu: remove unused iommu_ops method and tasklet
  iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  iommu: make map, unmap and flush all take both an order and a count
  remove remaining uses of iommu_legacy_map/unmap
  common/grant_table: batch flush I/O TLB
  iommu: remove the share_p2m operation
  iommu: stop calling IOMMU page tables 'p2m tables'
  vtd: use a bit field for root_entry
  vtd: use a bit field for context_entry
  vtd: use a bit field for dma_pte

 xen/arch/arm/p2m.c                          |   2 +-
 xen/arch/x86/domain.c                       |   9 +-
 xen/arch/x86/mm.c                           |  21 +-
 xen/arch/x86/mm/p2m-ept.c                   |  20 +-
 xen/arch/x86/mm/p2m-pt.c                    |  15 +-
 xen/arch/x86/mm/p2m.c                       |  29 ++-
 xen/arch/x86/tboot.c                        |   4 +-
 xen/arch/x86/x86_64/mm.c                    |  27 +-
 xen/common/grant_table.c                    | 142 +++++++----
 xen/common/memory.c                         |   9 +-
 xen/drivers/passthrough/amd/iommu.h         |  20 +-
 xen/drivers/passthrough/amd/iommu_guest.c   |   8 +-
 xen/drivers/passthrough/amd/iommu_map.c     |  26 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 110 +++-----
 xen/drivers/passthrough/arm/ipmmu-vmsa.c    |   2 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/iommu.c             | 118 ++-------
 xen/drivers/passthrough/vtd/iommu.c         | 269 +++++++++-----------
 xen/drivers/passthrough/vtd/iommu.h         | 153 ++++++-----
 xen/drivers/passthrough/vtd/utils.c         |  10 +-
 xen/drivers/passthrough/vtd/x86/ats.c       |  27 +-
 xen/drivers/passthrough/x86/iommu.c         |  54 +++-
 xen/include/asm-x86/iommu.h                 |  34 ++-
 xen/include/xen/iommu.h                     |  37 +--
 24 files changed, 585 insertions(+), 563 deletions(-)
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: Wei Liu <wl@xen.org>
-- 
2.20.1




* [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
@ 2020-08-04 13:41 ` Paul Durrant
  2020-08-14  6:14   ` Tian, Kevin
  2020-08-04 13:41 ` [PATCH v4 02/14] x86/iommu: add common page-table allocator Paul Durrant
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant,
	Lukasz Hawrylko, Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

... from those specific to VT-d or AMD IOMMU, and put the latter in a union.

There is no functional change in this patch, although the initialization of
the 'mapped_rmrrs' list occurs slightly later in iommu_domain_init() since
it is now done (correctly) in VT-d specific code rather than in general x86
code.

NOTE: I have not combined the AMD IOMMU 'root_table' and VT-d 'pgd_maddr'
      fields even though they perform essentially the same function. The
      concept of 'root table' in the VT-d code is different from that in the
      AMD code, so attempting to use a common name will probably only serve
      to confuse the reader.
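
For orientation, a minimal sketch (illustrative only, not part of the patch)
of how the fields are reached after the re-arrangement; only the union member
matching the platform IOMMU in use is meaningful:

    /* Sketch: dump the re-arranged per-domain IOMMU state (VT-d shown). */
    static void dump_arch_iommu(const struct domain *d)
    {
        const struct domain_iommu *hd = dom_iommu(d);

        /* Common state stays directly in struct arch_iommu. */
        printk("mapping_lock at %p\n", &hd->arch.mapping_lock);

        /* Vendor-specific state now lives in the union. */
        printk("VT-d: pgd_maddr %#"PRIx64", agaw %u\n",
               hd->arch.vtd.pgd_maddr, hd->arch.vtd.agaw);

        /* The AMD equivalents are hd->arch.amd.root_table and
           hd->arch.amd.paging_mode. */
    }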

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v4:
 - Fix format specifier as requested by Jan

v2:
 - s/amd_iommu/amd
 - Definitions still left inline as re-arrangement into implementation
   headers is non-trivial
 - Also s/u64/uint64_t and s/int/unsigned int
---
 xen/arch/x86/tboot.c                        |  4 +-
 xen/drivers/passthrough/amd/iommu_guest.c   |  8 ++--
 xen/drivers/passthrough/amd/iommu_map.c     | 14 +++---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 35 +++++++-------
 xen/drivers/passthrough/vtd/iommu.c         | 53 +++++++++++----------
 xen/drivers/passthrough/x86/iommu.c         |  1 -
 xen/include/asm-x86/iommu.h                 | 27 +++++++----
 7 files changed, 78 insertions(+), 64 deletions(-)

diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 320e06f129..e66b0940c4 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -230,8 +230,8 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
         {
             const struct domain_iommu *dio = dom_iommu(d);
 
-            update_iommu_mac(&ctx, dio->arch.pgd_maddr,
-                             agaw_to_level(dio->arch.agaw));
+            update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr,
+                             agaw_to_level(dio->arch.vtd.agaw));
         }
     }
 
diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
index 014a72a54b..30b7353cd6 100644
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -50,12 +50,12 @@ static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
 
 static inline struct guest_iommu *domain_iommu(struct domain *d)
 {
-    return dom_iommu(d)->arch.g_iommu;
+    return dom_iommu(d)->arch.amd.g_iommu;
 }
 
 static inline struct guest_iommu *vcpu_iommu(struct vcpu *v)
 {
-    return dom_iommu(v->domain)->arch.g_iommu;
+    return dom_iommu(v->domain)->arch.amd.g_iommu;
 }
 
 static void guest_iommu_enable(struct guest_iommu *iommu)
@@ -823,7 +823,7 @@ int guest_iommu_init(struct domain* d)
     guest_iommu_reg_init(iommu);
     iommu->mmio_base = ~0ULL;
     iommu->domain = d;
-    hd->arch.g_iommu = iommu;
+    hd->arch.amd.g_iommu = iommu;
 
     tasklet_init(&iommu->cmd_buffer_tasklet, guest_iommu_process_command, d);
 
@@ -845,5 +845,5 @@ void guest_iommu_destroy(struct domain *d)
     tasklet_kill(&iommu->cmd_buffer_tasklet);
     xfree(iommu);
 
-    dom_iommu(d)->arch.g_iommu = NULL;
+    dom_iommu(d)->arch.amd.g_iommu = NULL;
 }
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 93e96cd69c..47b4472e8a 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -180,8 +180,8 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
     struct page_info *table;
     const struct domain_iommu *hd = dom_iommu(d);
 
-    table = hd->arch.root_table;
-    level = hd->arch.paging_mode;
+    table = hd->arch.amd.root_table;
+    level = hd->arch.amd.paging_mode;
 
     BUG_ON( table == NULL || level < 1 || level > 6 );
 
@@ -325,7 +325,7 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
-    if ( !hd->arch.root_table )
+    if ( !hd->arch.amd.root_table )
     {
         spin_unlock(&hd->arch.mapping_lock);
         return 0;
@@ -450,7 +450,7 @@ int __init amd_iommu_quarantine_init(struct domain *d)
     unsigned int level = amd_iommu_get_paging_mode(end_gfn);
     struct amd_iommu_pte *table;
 
-    if ( hd->arch.root_table )
+    if ( hd->arch.amd.root_table )
     {
         ASSERT_UNREACHABLE();
         return 0;
@@ -458,11 +458,11 @@ int __init amd_iommu_quarantine_init(struct domain *d)
 
     spin_lock(&hd->arch.mapping_lock);
 
-    hd->arch.root_table = alloc_amd_iommu_pgtable();
-    if ( !hd->arch.root_table )
+    hd->arch.amd.root_table = alloc_amd_iommu_pgtable();
+    if ( !hd->arch.amd.root_table )
         goto out;
 
-    table = __map_domain_page(hd->arch.root_table);
+    table = __map_domain_page(hd->arch.amd.root_table);
     while ( level )
     {
         struct page_info *pg;
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 5f5f4a2eac..09a05f9d75 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -91,7 +91,8 @@ static void amd_iommu_setup_domain_device(
     u8 bus = pdev->bus;
     const struct domain_iommu *hd = dom_iommu(domain);
 
-    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
+    BUG_ON( !hd->arch.amd.root_table ||
+            !hd->arch.amd.paging_mode ||
             !iommu->dev_table.buffer );
 
     if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
@@ -110,8 +111,8 @@ static void amd_iommu_setup_domain_device(
 
         /* bind DTE to domain page-tables */
         amd_iommu_set_root_page_table(
-            dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
-            hd->arch.paging_mode, valid);
+            dte, page_to_maddr(hd->arch.amd.root_table),
+            domain->domain_id, hd->arch.amd.paging_mode, valid);
 
         /* Undo what amd_iommu_disable_domain_device() may have done. */
         ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
@@ -131,8 +132,8 @@ static void amd_iommu_setup_domain_device(
                         "root table = %#"PRIx64", "
                         "domain = %d, paging mode = %d\n",
                         req_id, pdev->type,
-                        page_to_maddr(hd->arch.root_table),
-                        domain->domain_id, hd->arch.paging_mode);
+                        page_to_maddr(hd->arch.amd.root_table),
+                        domain->domain_id, hd->arch.amd.paging_mode);
     }
 
     spin_unlock_irqrestore(&iommu->lock, flags);
@@ -206,10 +207,10 @@ static int iov_enable_xt(void)
 
 int amd_iommu_alloc_root(struct domain_iommu *hd)
 {
-    if ( unlikely(!hd->arch.root_table) )
+    if ( unlikely(!hd->arch.amd.root_table) )
     {
-        hd->arch.root_table = alloc_amd_iommu_pgtable();
-        if ( !hd->arch.root_table )
+        hd->arch.amd.root_table = alloc_amd_iommu_pgtable();
+        if ( !hd->arch.amd.root_table )
             return -ENOMEM;
     }
 
@@ -239,7 +240,7 @@ static int amd_iommu_domain_init(struct domain *d)
      *   physical address space we give it, but this isn't known yet so use 4
      *   unilaterally.
      */
-    hd->arch.paging_mode = amd_iommu_get_paging_mode(
+    hd->arch.amd.paging_mode = amd_iommu_get_paging_mode(
         is_hvm_domain(d)
         ? 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT)
         : get_upper_mfn_bound() + 1);
@@ -305,7 +306,7 @@ static void amd_iommu_disable_domain_device(const struct domain *domain,
         AMD_IOMMU_DEBUG("Disable: device id = %#x, "
                         "domain = %d, paging mode = %d\n",
                         req_id,  domain->domain_id,
-                        dom_iommu(domain)->arch.paging_mode);
+                        dom_iommu(domain)->arch.amd.paging_mode);
     }
     spin_unlock_irqrestore(&iommu->lock, flags);
 
@@ -420,10 +421,11 @@ static void deallocate_iommu_page_tables(struct domain *d)
     struct domain_iommu *hd = dom_iommu(d);
 
     spin_lock(&hd->arch.mapping_lock);
-    if ( hd->arch.root_table )
+    if ( hd->arch.amd.root_table )
     {
-        deallocate_next_page_table(hd->arch.root_table, hd->arch.paging_mode);
-        hd->arch.root_table = NULL;
+        deallocate_next_page_table(hd->arch.amd.root_table,
+                                   hd->arch.amd.paging_mode);
+        hd->arch.amd.root_table = NULL;
     }
     spin_unlock(&hd->arch.mapping_lock);
 }
@@ -598,11 +600,12 @@ static void amd_dump_p2m_table(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
 
-    if ( !hd->arch.root_table )
+    if ( !hd->arch.amd.root_table )
         return;
 
-    printk("p2m table has %d levels\n", hd->arch.paging_mode);
-    amd_dump_p2m_table_level(hd->arch.root_table, hd->arch.paging_mode, 0, 0);
+    printk("p2m table has %u levels\n", hd->arch.amd.paging_mode);
+    amd_dump_p2m_table_level(hd->arch.amd.root_table,
+                             hd->arch.amd.paging_mode, 0, 0);
 }
 
 static const struct iommu_ops __initconstrel _iommu_ops = {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index deaeab095d..94e0455a4d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -257,20 +257,20 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu, u8 bus)
 static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
 {
     struct domain_iommu *hd = dom_iommu(domain);
-    int addr_width = agaw_to_width(hd->arch.agaw);
+    int addr_width = agaw_to_width(hd->arch.vtd.agaw);
     struct dma_pte *parent, *pte = NULL;
-    int level = agaw_to_level(hd->arch.agaw);
+    int level = agaw_to_level(hd->arch.vtd.agaw);
     int offset;
     u64 pte_maddr = 0;
 
     addr &= (((u64)1) << addr_width) - 1;
     ASSERT(spin_is_locked(&hd->arch.mapping_lock));
-    if ( !hd->arch.pgd_maddr &&
+    if ( !hd->arch.vtd.pgd_maddr &&
          (!alloc ||
-          ((hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) == 0)) )
+          ((hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) == 0)) )
         goto out;
 
-    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
+    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
     while ( level > 1 )
     {
         offset = address_level_offset(addr, level);
@@ -593,7 +593,7 @@ static int __must_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
     {
         iommu = drhd->iommu;
 
-        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
+        if ( !test_bit(iommu->index, &hd->arch.vtd.iommu_bitmap) )
             continue;
 
         flush_dev_iotlb = !!find_ats_dev_drhd(iommu);
@@ -1278,7 +1278,10 @@ void __init iommu_free(struct acpi_drhd_unit *drhd)
 
 static int intel_iommu_domain_init(struct domain *d)
 {
-    dom_iommu(d)->arch.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+    struct domain_iommu *hd = dom_iommu(d);
+
+    hd->arch.vtd.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+    INIT_LIST_HEAD(&hd->arch.vtd.mapped_rmrrs);
 
     return 0;
 }
@@ -1375,10 +1378,10 @@ int domain_context_mapping_one(
         spin_lock(&hd->arch.mapping_lock);
 
         /* Ensure we have pagetables allocated down to leaf PTE. */
-        if ( hd->arch.pgd_maddr == 0 )
+        if ( hd->arch.vtd.pgd_maddr == 0 )
         {
             addr_to_dma_page_maddr(domain, 0, 1);
-            if ( hd->arch.pgd_maddr == 0 )
+            if ( hd->arch.vtd.pgd_maddr == 0 )
             {
             nomem:
                 spin_unlock(&hd->arch.mapping_lock);
@@ -1389,7 +1392,7 @@ int domain_context_mapping_one(
         }
 
         /* Skip top levels of page tables for 2- and 3-level DRHDs. */
-        pgd_maddr = hd->arch.pgd_maddr;
+        pgd_maddr = hd->arch.vtd.pgd_maddr;
         for ( agaw = level_to_agaw(4);
               agaw != level_to_agaw(iommu->nr_pt_levels);
               agaw-- )
@@ -1443,7 +1446,7 @@ int domain_context_mapping_one(
     if ( rc > 0 )
         rc = 0;
 
-    set_bit(iommu->index, &hd->arch.iommu_bitmap);
+    set_bit(iommu->index, &hd->arch.vtd.iommu_bitmap);
 
     unmap_vtd_domain_page(context_entries);
 
@@ -1714,7 +1717,7 @@ static int domain_context_unmap(struct domain *domain, u8 devfn,
     {
         int iommu_domid;
 
-        clear_bit(iommu->index, &dom_iommu(domain)->arch.iommu_bitmap);
+        clear_bit(iommu->index, &dom_iommu(domain)->arch.vtd.iommu_bitmap);
 
         iommu_domid = domain_iommu_domid(domain, iommu);
         if ( iommu_domid == -1 )
@@ -1739,7 +1742,7 @@ static void iommu_domain_teardown(struct domain *d)
     if ( list_empty(&acpi_drhd_units) )
         return;
 
-    list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.mapped_rmrrs, list )
+    list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.vtd.mapped_rmrrs, list )
     {
         list_del(&mrmrr->list);
         xfree(mrmrr);
@@ -1751,8 +1754,9 @@ static void iommu_domain_teardown(struct domain *d)
         return;
 
     spin_lock(&hd->arch.mapping_lock);
-    iommu_free_pagetable(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw));
-    hd->arch.pgd_maddr = 0;
+    iommu_free_pagetable(hd->arch.vtd.pgd_maddr,
+                         agaw_to_level(hd->arch.vtd.agaw));
+    hd->arch.vtd.pgd_maddr = 0;
     spin_unlock(&hd->arch.mapping_lock);
 }
 
@@ -1892,7 +1896,7 @@ static void iommu_set_pgd(struct domain *d)
     mfn_t pgd_mfn;
 
     pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-    dom_iommu(d)->arch.pgd_maddr =
+    dom_iommu(d)->arch.vtd.pgd_maddr =
         pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
 }
 
@@ -1912,7 +1916,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
      * No need to acquire hd->arch.mapping_lock: Both insertion and removal
      * get done while holding pcidevs_lock.
      */
-    list_for_each_entry( mrmrr, &hd->arch.mapped_rmrrs, list )
+    list_for_each_entry( mrmrr, &hd->arch.vtd.mapped_rmrrs, list )
     {
         if ( mrmrr->base == rmrr->base_address &&
              mrmrr->end == rmrr->end_address )
@@ -1959,7 +1963,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
     mrmrr->base = rmrr->base_address;
     mrmrr->end = rmrr->end_address;
     mrmrr->count = 1;
-    list_add_tail(&mrmrr->list, &hd->arch.mapped_rmrrs);
+    list_add_tail(&mrmrr->list, &hd->arch.vtd.mapped_rmrrs);
 
     return 0;
 }
@@ -2657,8 +2661,9 @@ static void vtd_dump_p2m_table(struct domain *d)
         return;
 
     hd = dom_iommu(d);
-    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.agaw));
-    vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
+    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
+    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
+                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
 }
 
 static int __init intel_iommu_quarantine_init(struct domain *d)
@@ -2669,7 +2674,7 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
     unsigned int level = agaw_to_level(agaw);
     int rc;
 
-    if ( hd->arch.pgd_maddr )
+    if ( hd->arch.vtd.pgd_maddr )
     {
         ASSERT_UNREACHABLE();
         return 0;
@@ -2677,11 +2682,11 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
 
     spin_lock(&hd->arch.mapping_lock);
 
-    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
-    if ( !hd->arch.pgd_maddr )
+    hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
+    if ( !hd->arch.vtd.pgd_maddr )
         goto out;
 
-    parent = map_vtd_domain_page(hd->arch.pgd_maddr);
+    parent = map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
     while ( level )
     {
         uint64_t maddr;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 3d7670e8c6..a12109a1de 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -139,7 +139,6 @@ int arch_iommu_domain_init(struct domain *d)
     struct domain_iommu *hd = dom_iommu(d);
 
     spin_lock_init(&hd->arch.mapping_lock);
-    INIT_LIST_HEAD(&hd->arch.mapped_rmrrs);
 
     return 0;
 }
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 6c9d5e5632..8ce97c981f 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
 
 struct arch_iommu
 {
-    u64 pgd_maddr;                 /* io page directory machine address */
-    spinlock_t mapping_lock;            /* io page table lock */
-    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
-    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses */
-    struct list_head mapped_rmrrs;
-
-    /* amd iommu support */
-    int paging_mode;
-    struct page_info *root_table;
-    struct guest_iommu *g_iommu;
+    spinlock_t mapping_lock; /* io page table lock */
+
+    union {
+        /* Intel VT-d */
+        struct {
+            uint64_t pgd_maddr; /* io page directory machine address */
+            unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+            uint64_t iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
+            struct list_head mapped_rmrrs;
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            unsigned int paging_mode;
+            struct page_info *root_table;
+            struct guest_iommu *g_iommu;
+        } amd;
+    };
 };
 
 extern struct iommu_ops iommu_ops;
-- 
2.20.1




* [PATCH v4 02/14] x86/iommu: add common page-table allocator
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
  2020-08-04 13:41 ` [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
@ 2020-08-04 13:41 ` Paul Durrant
  2020-08-05 15:39   ` Jan Beulich
  2020-08-04 13:41 ` [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator Paul Durrant
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul Durrant, Wei Liu, Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

Instead of having separate page table allocation functions in VT-d and AMD
IOMMU code, we could use a common allocation function in the general x86 code.

This patch adds a new allocation function, iommu_alloc_pgtable(), for this
purpose. The function adds the page table pages to a list. The pages in this
list are then freed by iommu_free_pgtables(), which is called by
domain_relinquish_resources() after PCI devices have been de-assigned.
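
As a rough usage sketch (hypothetical caller, not taken from this series;
example_alloc_level() is a made-up name, whereas iommu_alloc_pgtable() and
page_to_maddr() are real helpers):

    /* Allocate one IOMMU page-table page for domain 'd'. */
    static int example_alloc_level(struct domain *d, uint64_t *maddr)
    {
        struct page_info *pg = iommu_alloc_pgtable(d);

        if ( !pg )
            return -ENOMEM;

        /* The page is already zeroed and cache-synced by the allocator. */
        *maddr = page_to_maddr(pg);

        return 0;
    }

No explicit free is needed on the driver side: the pages accumulate on the
per-domain list and are released by iommu_free_pgtables() from
domain_relinquish_resources(), which may return -ERESTART and be re-invoked.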

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v4:
 - Remove space between '*' and '__must_check'
 - Reduce frequency of pre-empt check during table freeing
 - Fix parentheses formatting

v2:
 - This is split out from a larger patch of the same name in v1
---
 xen/arch/x86/domain.c               |  9 ++++-
 xen/drivers/passthrough/x86/iommu.c | 51 +++++++++++++++++++++++++++++
 xen/include/asm-x86/iommu.h         |  7 ++++
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f8084dc9e3..d1ecc7b83b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2153,7 +2153,8 @@ int domain_relinquish_resources(struct domain *d)
         d->arch.rel_priv = PROG_ ## x; /* Fallthrough */ case PROG_ ## x
 
         enum {
-            PROG_paging = 1,
+            PROG_iommu_pagetables = 1,
+            PROG_paging,
             PROG_vcpu_pagetables,
             PROG_shared,
             PROG_xen,
@@ -2168,6 +2169,12 @@ int domain_relinquish_resources(struct domain *d)
         if ( ret )
             return ret;
 
+    PROGRESS(iommu_pagetables):
+
+        ret = iommu_free_pgtables(d);
+        if ( ret )
+            return ret;
+
     PROGRESS(paging):
 
         /* Tear down paging-assistance stuff. */
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index a12109a1de..aea07e47c4 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -140,6 +140,9 @@ int arch_iommu_domain_init(struct domain *d)
 
     spin_lock_init(&hd->arch.mapping_lock);
 
+    INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list);
+    spin_lock_init(&hd->arch.pgtables.lock);
+
     return 0;
 }
 
@@ -257,6 +260,54 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
         return;
 }
 
+int iommu_free_pgtables(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    struct page_info *pg;
+    unsigned int done = 0;
+
+    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
+    {
+        free_domheap_page(pg);
+
+        if ( !(++done & 0xff) && general_preempt_check() )
+            return -ERESTART;
+    }
+
+    return 0;
+}
+
+struct page_info *iommu_alloc_pgtable(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    unsigned int memflags = 0;
+    struct page_info *pg;
+    void *p;
+
+#ifdef CONFIG_NUMA
+    if ( hd->node != NUMA_NO_NODE )
+        memflags = MEMF_node(hd->node);
+#endif
+
+    pg = alloc_domheap_page(NULL, memflags);
+    if ( !pg )
+        return NULL;
+
+    p = __map_domain_page(pg);
+    clear_page(p);
+
+    if ( hd->platform_ops->sync_cache )
+        iommu_vcall(hd->platform_ops, sync_cache, p, PAGE_SIZE);
+
+    unmap_domain_page(p);
+
+    spin_lock(&hd->arch.pgtables.lock);
+    page_list_add(pg, &hd->arch.pgtables.list);
+    spin_unlock(&hd->arch.pgtables.lock);
+
+    return pg;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 8ce97c981f..970eb06ffa 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -46,6 +46,10 @@ typedef uint64_t daddr_t;
 struct arch_iommu
 {
     spinlock_t mapping_lock; /* io page table lock */
+    struct {
+        struct page_list_head list;
+        spinlock_t lock;
+    } pgtables;
 
     union {
         /* Intel VT-d */
@@ -131,6 +135,9 @@ int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
         iommu_vcall(ops, sync_cache, addr, size);       \
 })
 
+int __must_check iommu_free_pgtables(struct domain *d);
+struct page_info *__must_check iommu_alloc_pgtable(struct domain *d);
+
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
  * Local variables:
-- 
2.20.1




* [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
  2020-08-04 13:41 ` [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
  2020-08-04 13:41 ` [PATCH v4 02/14] x86/iommu: add common page-table allocator Paul Durrant
@ 2020-08-04 13:41 ` Paul Durrant
  2020-08-14  6:41   ` Tian, Kevin
  2020-08-04 13:41 ` [PATCH v4 04/14] x86/iommu: convert AMD IOMMU " Paul Durrant
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian, Jan Beulich

From: Paul Durrant <pdurrant@amazon.com>

This patch converts the VT-d code to use the new IOMMU page table allocator
function. This allows all the freeing code to be removed (since it is now
handled by the general x86 code), which reduces TLB and cache thrashing as
well as shortening the code.

The scope of the mapping_lock in intel_iommu_quarantine_init() has also been
increased slightly; it should have always covered accesses to
'arch.vtd.pgd_maddr'.

NOTE: The common IOMMU needs a slight modification to avoid scheduling the
      cleanup tasklet if the free_page_table() method is not present (since
      the tasklet will unconditionally call it).

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v2:
 - New in v2 (split from "add common page-table allocator")
---
 xen/drivers/passthrough/iommu.c     |   6 +-
 xen/drivers/passthrough/vtd/iommu.c | 101 ++++++++++------------------
 2 files changed, 39 insertions(+), 68 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 1d644844ab..2b1db8022c 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -225,8 +225,10 @@ static void iommu_teardown(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
 
-    hd->platform_ops->teardown(d);
-    tasklet_schedule(&iommu_pt_cleanup_tasklet);
+    iommu_vcall(hd->platform_ops, teardown, d);
+
+    if ( hd->platform_ops->free_page_table )
+        tasklet_schedule(&iommu_pt_cleanup_tasklet);
 }
 
 void iommu_domain_destroy(struct domain *d)
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 94e0455a4d..607e8b5e65 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -265,10 +265,15 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
 
     addr &= (((u64)1) << addr_width) - 1;
     ASSERT(spin_is_locked(&hd->arch.mapping_lock));
-    if ( !hd->arch.vtd.pgd_maddr &&
-         (!alloc ||
-          ((hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) == 0)) )
-        goto out;
+    if ( !hd->arch.vtd.pgd_maddr )
+    {
+        struct page_info *pg;
+
+        if ( !alloc || !(pg = iommu_alloc_pgtable(domain)) )
+            goto out;
+
+        hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
+    }
 
     parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
     while ( level > 1 )
@@ -279,13 +284,16 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
         pte_maddr = dma_pte_addr(*pte);
         if ( !pte_maddr )
         {
+            struct page_info *pg;
+
             if ( !alloc )
                 break;
 
-            pte_maddr = alloc_pgtable_maddr(1, hd->node);
-            if ( !pte_maddr )
+            pg = iommu_alloc_pgtable(domain);
+            if ( !pg )
                 break;
 
+            pte_maddr = page_to_maddr(pg);
             dma_set_pte_addr(*pte, pte_maddr);
 
             /*
@@ -675,45 +683,6 @@ static void dma_pte_clear_one(struct domain *domain, uint64_t addr,
     unmap_vtd_domain_page(page);
 }
 
-static void iommu_free_pagetable(u64 pt_maddr, int level)
-{
-    struct page_info *pg = maddr_to_page(pt_maddr);
-
-    if ( pt_maddr == 0 )
-        return;
-
-    PFN_ORDER(pg) = level;
-    spin_lock(&iommu_pt_cleanup_lock);
-    page_list_add_tail(pg, &iommu_pt_cleanup_list);
-    spin_unlock(&iommu_pt_cleanup_lock);
-}
-
-static void iommu_free_page_table(struct page_info *pg)
-{
-    unsigned int i, next_level = PFN_ORDER(pg) - 1;
-    u64 pt_maddr = page_to_maddr(pg);
-    struct dma_pte *pt_vaddr, *pte;
-
-    PFN_ORDER(pg) = 0;
-    pt_vaddr = (struct dma_pte *)map_vtd_domain_page(pt_maddr);
-
-    for ( i = 0; i < PTE_NUM; i++ )
-    {
-        pte = &pt_vaddr[i];
-        if ( !dma_pte_present(*pte) )
-            continue;
-
-        if ( next_level >= 1 )
-            iommu_free_pagetable(dma_pte_addr(*pte), next_level);
-
-        dma_clear_pte(*pte);
-        iommu_sync_cache(pte, sizeof(struct dma_pte));
-    }
-
-    unmap_vtd_domain_page(pt_vaddr);
-    free_pgtable_maddr(pt_maddr);
-}
-
 static int iommu_set_root_entry(struct vtd_iommu *iommu)
 {
     u32 sts;
@@ -1748,16 +1717,7 @@ static void iommu_domain_teardown(struct domain *d)
         xfree(mrmrr);
     }
 
-    ASSERT(is_iommu_enabled(d));
-
-    if ( iommu_use_hap_pt(d) )
-        return;
-
-    spin_lock(&hd->arch.mapping_lock);
-    iommu_free_pagetable(hd->arch.vtd.pgd_maddr,
-                         agaw_to_level(hd->arch.vtd.agaw));
     hd->arch.vtd.pgd_maddr = 0;
-    spin_unlock(&hd->arch.mapping_lock);
 }
 
 static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -2669,23 +2629,28 @@ static void vtd_dump_p2m_table(struct domain *d)
 static int __init intel_iommu_quarantine_init(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
+    struct page_info *pg;
     struct dma_pte *parent;
     unsigned int agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
     unsigned int level = agaw_to_level(agaw);
-    int rc;
+    int rc = 0;
+
+    spin_lock(&hd->arch.mapping_lock);
 
     if ( hd->arch.vtd.pgd_maddr )
     {
         ASSERT_UNREACHABLE();
-        return 0;
+        goto out;
     }
 
-    spin_lock(&hd->arch.mapping_lock);
+    pg = iommu_alloc_pgtable(d);
 
-    hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
-    if ( !hd->arch.vtd.pgd_maddr )
+    rc = -ENOMEM;
+    if ( !pg )
         goto out;
 
+    hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
+
     parent = map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
     while ( level )
     {
@@ -2697,10 +2662,12 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
          * page table pages, and the resulting allocations are always
          * zeroed.
          */
-        maddr = alloc_pgtable_maddr(1, hd->node);
-        if ( !maddr )
-            break;
+        pg = iommu_alloc_pgtable(d);
+
+        if ( !pg )
+            goto out;
 
+        maddr = page_to_maddr(pg);
         for ( offset = 0; offset < PTE_NUM; offset++ )
         {
             struct dma_pte *pte = &parent[offset];
@@ -2716,13 +2683,16 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
     }
     unmap_vtd_domain_page(parent);
 
+    rc = 0;
+
  out:
     spin_unlock(&hd->arch.mapping_lock);
 
-    rc = iommu_flush_iotlb_all(d);
+    if ( !rc )
+        rc = iommu_flush_iotlb_all(d);
 
-    /* Pages leaked in failure case */
-    return level ? -ENOMEM : rc;
+    /* Pages may be leaked in failure case */
+    return rc;
 }
 
 static struct iommu_ops __initdata vtd_ops = {
@@ -2737,7 +2707,6 @@ static struct iommu_ops __initdata vtd_ops = {
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
     .lookup_page = intel_iommu_lookup_page,
-    .free_page_table = iommu_free_page_table,
     .reassign_device = reassign_device_ownership,
     .get_device_group_id = intel_iommu_group_id,
     .enable_x2apic = intel_iommu_enable_eim,
-- 
2.20.1




* [PATCH v4 04/14] x86/iommu: convert AMD IOMMU code to use new page table allocator
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (2 preceding siblings ...)
  2020-08-04 13:41 ` [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator Paul Durrant
@ 2020-08-04 13:41 ` Paul Durrant
  2020-08-04 13:42 ` [PATCH v4 05/14] iommu: remove unused iommu_ops method and tasklet Paul Durrant
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich

From: Paul Durrant <pdurrant@amazon.com>

This patch converts the AMD IOMMU code to use the new page table allocator
function. This allows all the freeing code to be removed (since it is now
handled by the general x86 code), which reduces TLB and cache thrashing as
well as shortening the code.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v2:
  - New in v2 (split from "add common page-table allocator")
---
 xen/drivers/passthrough/amd/iommu.h         | 18 +----
 xen/drivers/passthrough/amd/iommu_map.c     | 10 +--
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 75 +++------------------
 3 files changed, 16 insertions(+), 87 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 3489c2a015..e2d174f3b4 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -226,7 +226,7 @@ int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
                                     unsigned int *flush_flags);
 int __must_check amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
                                       unsigned int *flush_flags);
-int __must_check amd_iommu_alloc_root(struct domain_iommu *hd);
+int __must_check amd_iommu_alloc_root(struct domain *d);
 int amd_iommu_reserve_domain_unity_map(struct domain *domain,
                                        paddr_t phys_addr, unsigned long size,
                                        int iw, int ir);
@@ -356,22 +356,6 @@ static inline int amd_iommu_get_paging_mode(unsigned long max_frames)
     return level;
 }
 
-static inline struct page_info *alloc_amd_iommu_pgtable(void)
-{
-    struct page_info *pg = alloc_domheap_page(NULL, 0);
-
-    if ( pg )
-        clear_domain_page(page_to_mfn(pg));
-
-    return pg;
-}
-
-static inline void free_amd_iommu_pgtable(struct page_info *pg)
-{
-    if ( pg )
-        free_domheap_page(pg);
-}
-
 static inline void *__alloc_amd_iommu_tables(unsigned int order)
 {
     return alloc_xenheap_pages(order, 0);
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 47b4472e8a..54b991294a 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -217,7 +217,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
             mfn = next_table_mfn;
 
             /* allocate lower level page table */
-            table = alloc_amd_iommu_pgtable();
+            table = iommu_alloc_pgtable(d);
             if ( table == NULL )
             {
                 AMD_IOMMU_DEBUG("Cannot allocate I/O page table\n");
@@ -248,7 +248,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
 
             if ( next_table_mfn == 0 )
             {
-                table = alloc_amd_iommu_pgtable();
+                table = iommu_alloc_pgtable(d);
                 if ( table == NULL )
                 {
                     AMD_IOMMU_DEBUG("Cannot allocate I/O page table\n");
@@ -286,7 +286,7 @@ int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
-    rc = amd_iommu_alloc_root(hd);
+    rc = amd_iommu_alloc_root(d);
     if ( rc )
     {
         spin_unlock(&hd->arch.mapping_lock);
@@ -458,7 +458,7 @@ int __init amd_iommu_quarantine_init(struct domain *d)
 
     spin_lock(&hd->arch.mapping_lock);
 
-    hd->arch.amd.root_table = alloc_amd_iommu_pgtable();
+    hd->arch.amd.root_table = iommu_alloc_pgtable(d);
     if ( !hd->arch.amd.root_table )
         goto out;
 
@@ -473,7 +473,7 @@ int __init amd_iommu_quarantine_init(struct domain *d)
          * page table pages, and the resulting allocations are always
          * zeroed.
          */
-        pg = alloc_amd_iommu_pgtable();
+        pg = iommu_alloc_pgtable(d);
         if ( !pg )
             break;
 
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 09a05f9d75..3390c22cf3 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -205,11 +205,13 @@ static int iov_enable_xt(void)
     return 0;
 }
 
-int amd_iommu_alloc_root(struct domain_iommu *hd)
+int amd_iommu_alloc_root(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
+
     if ( unlikely(!hd->arch.amd.root_table) )
     {
-        hd->arch.amd.root_table = alloc_amd_iommu_pgtable();
+        hd->arch.amd.root_table = iommu_alloc_pgtable(d);
         if ( !hd->arch.amd.root_table )
             return -ENOMEM;
     }
@@ -217,12 +219,13 @@ int amd_iommu_alloc_root(struct domain_iommu *hd)
     return 0;
 }
 
-static int __must_check allocate_domain_resources(struct domain_iommu *hd)
+static int __must_check allocate_domain_resources(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
     int rc;
 
     spin_lock(&hd->arch.mapping_lock);
-    rc = amd_iommu_alloc_root(hd);
+    rc = amd_iommu_alloc_root(d);
     spin_unlock(&hd->arch.mapping_lock);
 
     return rc;
@@ -254,7 +257,7 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
 {
     const struct amd_iommu *iommu;
 
-    if ( allocate_domain_resources(dom_iommu(d)) )
+    if ( allocate_domain_resources(d) )
         BUG();
 
     for_each_amd_iommu ( iommu )
@@ -323,7 +326,6 @@ static int reassign_device(struct domain *source, struct domain *target,
 {
     struct amd_iommu *iommu;
     int bdf, rc;
-    struct domain_iommu *t = dom_iommu(target);
 
     bdf = PCI_BDF2(pdev->bus, pdev->devfn);
     iommu = find_iommu_for_device(pdev->seg, bdf);
@@ -344,7 +346,7 @@ static int reassign_device(struct domain *source, struct domain *target,
         pdev->domain = target;
     }
 
-    rc = allocate_domain_resources(t);
+    rc = allocate_domain_resources(target);
     if ( rc )
         return rc;
 
@@ -376,65 +378,9 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
     return reassign_device(pdev->domain, d, devfn, pdev);
 }
 
-static void deallocate_next_page_table(struct page_info *pg, int level)
-{
-    PFN_ORDER(pg) = level;
-    spin_lock(&iommu_pt_cleanup_lock);
-    page_list_add_tail(pg, &iommu_pt_cleanup_list);
-    spin_unlock(&iommu_pt_cleanup_lock);
-}
-
-static void deallocate_page_table(struct page_info *pg)
-{
-    struct amd_iommu_pte *table_vaddr;
-    unsigned int index, level = PFN_ORDER(pg);
-
-    PFN_ORDER(pg) = 0;
-
-    if ( level <= 1 )
-    {
-        free_amd_iommu_pgtable(pg);
-        return;
-    }
-
-    table_vaddr = __map_domain_page(pg);
-
-    for ( index = 0; index < PTE_PER_TABLE_SIZE; index++ )
-    {
-        struct amd_iommu_pte *pde = &table_vaddr[index];
-
-        if ( pde->mfn && pde->next_level && pde->pr )
-        {
-            /* We do not support skip levels yet */
-            ASSERT(pde->next_level == level - 1);
-            deallocate_next_page_table(mfn_to_page(_mfn(pde->mfn)),
-                                       pde->next_level);
-        }
-    }
-
-    unmap_domain_page(table_vaddr);
-    free_amd_iommu_pgtable(pg);
-}
-
-static void deallocate_iommu_page_tables(struct domain *d)
-{
-    struct domain_iommu *hd = dom_iommu(d);
-
-    spin_lock(&hd->arch.mapping_lock);
-    if ( hd->arch.amd.root_table )
-    {
-        deallocate_next_page_table(hd->arch.amd.root_table,
-                                   hd->arch.amd.paging_mode);
-        hd->arch.amd.root_table = NULL;
-    }
-    spin_unlock(&hd->arch.mapping_lock);
-}
-
-
 static void amd_iommu_domain_destroy(struct domain *d)
 {
-    deallocate_iommu_page_tables(d);
-    amd_iommu_flush_all_pages(d);
+    dom_iommu(d)->arch.amd.root_table = NULL;
 }
 
 static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
@@ -620,7 +566,6 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .unmap_page = amd_iommu_unmap_page,
     .iotlb_flush = amd_iommu_flush_iotlb_pages,
     .iotlb_flush_all = amd_iommu_flush_iotlb_all,
-    .free_page_table = deallocate_page_table,
     .reassign_device = reassign_device,
     .get_device_group_id = amd_iommu_group_id,
     .enable_x2apic = iov_enable_xt,
-- 
2.20.1




* [PATCH v4 05/14] iommu: remove unused iommu_ops method and tasklet
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (3 preceding siblings ...)
  2020-08-04 13:41 ` [PATCH v4 04/14] x86/iommu: convert AMD IOMMU " Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-04 13:42 ` [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail Paul Durrant
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Jan Beulich

From: Paul Durrant <pdurrant@amazon.com>

The VT-d and AMD IOMMU implementations both use the general x86 IOMMU page
table allocator, and ARM always shares page tables with the CPU. Hence there
is no need to retain the free_page_table() method or the tasklet that invokes
it.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---

v2:
  - New in v2 (split from "add common page-table allocator")
---
 xen/drivers/passthrough/iommu.c | 25 -------------------------
 xen/include/xen/iommu.h         |  2 --
 2 files changed, 27 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 2b1db8022c..660dc5deb2 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -49,10 +49,6 @@ bool_t __read_mostly amd_iommu_perdev_intremap = 1;
 
 DEFINE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 
-DEFINE_SPINLOCK(iommu_pt_cleanup_lock);
-PAGE_LIST_HEAD(iommu_pt_cleanup_list);
-static struct tasklet iommu_pt_cleanup_tasklet;
-
 static int __init parse_iommu_param(const char *s)
 {
     const char *ss;
@@ -226,9 +222,6 @@ static void iommu_teardown(struct domain *d)
     struct domain_iommu *hd = dom_iommu(d);
 
     iommu_vcall(hd->platform_ops, teardown, d);
-
-    if ( hd->platform_ops->free_page_table )
-        tasklet_schedule(&iommu_pt_cleanup_tasklet);
 }
 
 void iommu_domain_destroy(struct domain *d)
@@ -368,23 +361,6 @@ int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
     return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags);
 }
 
-static void iommu_free_pagetables(void *unused)
-{
-    do {
-        struct page_info *pg;
-
-        spin_lock(&iommu_pt_cleanup_lock);
-        pg = page_list_remove_head(&iommu_pt_cleanup_list);
-        spin_unlock(&iommu_pt_cleanup_lock);
-        if ( !pg )
-            return;
-        iommu_vcall(iommu_get_ops(), free_page_table, pg);
-    } while ( !softirq_pending(smp_processor_id()) );
-
-    tasklet_schedule_on_cpu(&iommu_pt_cleanup_tasklet,
-                            cpumask_cycle(smp_processor_id(), &cpu_online_map));
-}
-
 int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
                       unsigned int flush_flags)
 {
@@ -508,7 +484,6 @@ int __init iommu_setup(void)
 #ifndef iommu_intremap
         printk("Interrupt remapping %sabled\n", iommu_intremap ? "en" : "dis");
 #endif
-        tasklet_init(&iommu_pt_cleanup_tasklet, iommu_free_pagetables, NULL);
     }
 
     return rc;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 3272874958..1831dc66b0 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -263,8 +263,6 @@ struct iommu_ops {
     int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
                                     unsigned int *flags);
 
-    void (*free_page_table)(struct page_info *);
-
 #ifdef CONFIG_X86
     int (*enable_x2apic)(void);
     void (*disable_x2apic)(void);
-- 
2.20.1




* [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (4 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 05/14] iommu: remove unused iommu_ops method and tasklet Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-05 16:06   ` Jan Beulich
                     ` (2 more replies)
  2020-08-04 13:42 ` [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count Paul Durrant
                   ` (7 subsequent siblings)
  13 siblings, 3 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Jan Beulich

From: Paul Durrant <pdurrant@amazon.com>

This patch adds a full I/O TLB flush to the error paths of iommu_map() and
iommu_unmap().

Without this change, callers need constructs such as:

rc = iommu_map/unmap(...)
err = iommu_flush(...)
if ( !rc )
  rc = err;

With this change, it can be simplified to:

rc = iommu_map/unmap(...)
if ( !rc )
  rc = iommu_flush(...)

because, if the map or unmap fails, the flush will be unnecessary. This saves
a stack variable and generally makes the call sites tidier.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>

v2:
 - New in v2
---
 xen/drivers/passthrough/iommu.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 660dc5deb2..e2c0193a09 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -274,6 +274,10 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
         break;
     }
 
+    /* Something went wrong: flush everything and clear the flush flags. */
+    if ( unlikely(rc) && !iommu_iotlb_flush_all(d, *flush_flags) )
+        *flush_flags = 0;
+
     return rc;
 }
 
@@ -283,14 +287,8 @@ int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
     unsigned int flush_flags = 0;
     int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
 
-    if ( !this_cpu(iommu_dont_flush_iotlb) )
-    {
-        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
-                                    flush_flags);
-
-        if ( !rc )
-            rc = err;
-    }
+    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
+        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
 
     return rc;
 }
@@ -330,6 +328,10 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
         }
     }
 
+    /* Something went wrong: flush everything and clear the flush flags. */
+    if ( unlikely(rc) && !iommu_iotlb_flush_all(d, *flush_flags) )
+        *flush_flags = 0;
+
     return rc;
 }
 
@@ -338,14 +340,8 @@ int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
     unsigned int flush_flags = 0;
     int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
 
-    if ( !this_cpu(iommu_dont_flush_iotlb) )
-    {
-        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
-                                    flush_flags);
-
-        if ( !rc )
-            rc = err;
-    }
+    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
+        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
 
     return rc;
 }
-- 
2.20.1




* [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (5 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06  9:57   ` Jan Beulich
  2020-08-04 13:42 ` [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jan Beulich, Volodymyr Babchuk, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

At the moment iommu_map() and iommu_unmap() take a page order but not a
count, whereas iommu_iotlb_flush() takes a count but not a page order.
This patch simply makes them consistent with each other.
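
For reference, the reworked interfaces, reconstructed from the hunks below
(the exact declarations live in xen/include/xen/iommu.h and are not quoted
here), look roughly like:

    int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                  unsigned int page_order, unsigned int page_count,
                  unsigned int flags, unsigned int *flush_flags);

    int iommu_unmap(struct domain *d, dfn_t dfn,
                    unsigned int page_order, unsigned int page_count,
                    unsigned int *flush_flags);

    /* Flush 'page_count' regions of 2^page_order pages starting at dfn,
       e.g. a single order-aligned mapping: */
    rc = iommu_iotlb_flush(d, dfn, page_order, 1, flush_flags);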

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>

v2:
 - New in v2
---
 xen/arch/arm/p2m.c                       |  2 +-
 xen/arch/x86/mm/p2m-ept.c                |  2 +-
 xen/common/memory.c                      |  4 +--
 xen/drivers/passthrough/amd/iommu.h      |  2 +-
 xen/drivers/passthrough/amd/iommu_map.c  |  4 +--
 xen/drivers/passthrough/arm/ipmmu-vmsa.c |  2 +-
 xen/drivers/passthrough/arm/smmu.c       |  2 +-
 xen/drivers/passthrough/iommu.c          | 31 ++++++++++++------------
 xen/drivers/passthrough/vtd/iommu.c      |  4 +--
 xen/drivers/passthrough/x86/iommu.c      |  2 +-
 xen/include/xen/iommu.h                  |  9 ++++---
 11 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index ce59f2b503..71f4a78425 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1061,7 +1061,7 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
             flush_flags |= IOMMU_FLUSHF_added;
 
         rc = iommu_iotlb_flush(p2m->domain, _dfn(gfn_x(sgfn)),
-                               1UL << page_order, flush_flags);
+                               page_order, 1, flush_flags);
     }
     else
         rc = 0;
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index b8154a7ecc..b2ac912cde 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -843,7 +843,7 @@ out:
          need_modify_vtd_table )
     {
         if ( iommu_use_hap_pt(d) )
-            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order),
+            rc = iommu_iotlb_flush(d, _dfn(gfn), order, 1,
                                    (iommu_flags ? IOMMU_FLUSHF_added : 0) |
                                    (vtd_pte_present ? IOMMU_FLUSHF_modified
                                                     : 0));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..8de334ff10 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -851,12 +851,12 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
-        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
+        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), 0, done,
                                 IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
-        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
+        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), 0, done,
                                 IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index e2d174f3b4..f1f0415469 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -231,7 +231,7 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
                                        paddr_t phys_addr, unsigned long size,
                                        int iw, int ir);
 int __must_check amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn,
-                                             unsigned int page_count,
+                                             unsigned long page_count,
                                              unsigned int flush_flags);
 int __must_check amd_iommu_flush_iotlb_all(struct domain *d);
 
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 54b991294a..0cb948d114 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -351,7 +351,7 @@ int amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
     return 0;
 }
 
-static unsigned long flush_count(unsigned long dfn, unsigned int page_count,
+static unsigned long flush_count(unsigned long dfn, unsigned long page_count,
                                  unsigned int order)
 {
     unsigned long start = dfn >> order;
@@ -362,7 +362,7 @@ static unsigned long flush_count(unsigned long dfn, unsigned int page_count,
 }
 
 int amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn,
-                                unsigned int page_count,
+                                unsigned long page_count,
                                 unsigned int flush_flags)
 {
     unsigned long dfn_l = dfn_x(dfn);
diff --git a/xen/drivers/passthrough/arm/ipmmu-vmsa.c b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
index b2a65dfaaf..346165c3fa 100644
--- a/xen/drivers/passthrough/arm/ipmmu-vmsa.c
+++ b/xen/drivers/passthrough/arm/ipmmu-vmsa.c
@@ -945,7 +945,7 @@ static int __must_check ipmmu_iotlb_flush_all(struct domain *d)
 }
 
 static int __must_check ipmmu_iotlb_flush(struct domain *d, dfn_t dfn,
-                                          unsigned int page_count,
+                                          unsigned long page_count,
                                           unsigned int flush_flags)
 {
     ASSERT(flush_flags);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 94662a8501..06f9bda47d 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2534,7 +2534,7 @@ static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
 }
 
 static int __must_check arm_smmu_iotlb_flush(struct domain *d, dfn_t dfn,
-					     unsigned int page_count,
+					     unsigned long page_count,
 					     unsigned int flush_flags)
 {
 	ASSERT(flush_flags);
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index e2c0193a09..568a4a5661 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -235,8 +235,8 @@ void iommu_domain_destroy(struct domain *d)
 }
 
 int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-              unsigned int page_order, unsigned int flags,
-              unsigned int *flush_flags)
+              unsigned int page_order, unsigned int page_count,
+              unsigned int flags, unsigned int *flush_flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     unsigned long i;
@@ -248,7 +248,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
     ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
     ASSERT(IS_ALIGNED(mfn_x(mfn), (1ul << page_order)));
 
-    for ( i = 0; i < (1ul << page_order); i++ )
+    for ( i = 0; i < ((unsigned long)page_count << page_order); i++ )
     {
         rc = iommu_call(hd->platform_ops, map_page, d, dfn_add(dfn, i),
                         mfn_add(mfn, i), flags, flush_flags);
@@ -285,16 +285,16 @@ int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                      unsigned int page_order, unsigned int flags)
 {
     unsigned int flush_flags = 0;
-    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
+    int rc = iommu_map(d, dfn, mfn, page_order, 1, flags, &flush_flags);
 
     if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
-        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
+        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), 1, flush_flags);
 
     return rc;
 }
 
 int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
-                unsigned int *flush_flags)
+                unsigned int page_count, unsigned int *flush_flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     unsigned long i;
@@ -305,7 +305,7 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
 
     ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
 
-    for ( i = 0; i < (1ul << page_order); i++ )
+    for ( i = 0; i < ((unsigned long)page_count << page_order); i++ )
     {
         int err = iommu_call(hd->platform_ops, unmap_page, d, dfn_add(dfn, i),
                              flush_flags);
@@ -338,10 +338,10 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
 int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
 {
     unsigned int flush_flags = 0;
-    int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
+    int rc = iommu_unmap(d, dfn, page_order, 1, &flush_flags);
 
     if ( !this_cpu(iommu_dont_flush_iotlb) && ! rc )
-        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
+        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), 1, flush_flags);
 
     return rc;
 }
@@ -357,8 +357,8 @@ int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
     return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags);
 }
 
-int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
-                      unsigned int flush_flags)
+int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_order,
+                      unsigned int page_count, unsigned int flush_flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     int rc;
@@ -370,14 +370,15 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_count,
     if ( dfn_eq(dfn, INVALID_DFN) )
         return -EINVAL;
 
-    rc = iommu_call(hd->platform_ops, iotlb_flush, d, dfn, page_count,
-                    flush_flags);
+    rc = iommu_call(hd->platform_ops, iotlb_flush, d, dfn,
+                    (unsigned long)page_count << page_order, flush_flags);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU IOTLB flush failed: %d, dfn %"PRI_dfn", page count %u flags %x\n",
-                   d->domain_id, rc, dfn_x(dfn), page_count, flush_flags);
+                   "d%d: IOMMU IOTLB flush failed: %d, dfn %"PRI_dfn", page order %u, page count %u flags %x\n",
+                   d->domain_id, rc, dfn_x(dfn), page_order, page_count,
+                   flush_flags);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 607e8b5e65..68cf0e535a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -584,7 +584,7 @@ static int __must_check iommu_flush_all(void)
 
 static int __must_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
                                           bool_t dma_old_pte_present,
-                                          unsigned int page_count)
+                                          unsigned long page_count)
 {
     struct domain_iommu *hd = dom_iommu(d);
     struct acpi_drhd_unit *drhd;
@@ -632,7 +632,7 @@ static int __must_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
 
 static int __must_check iommu_flush_iotlb_pages(struct domain *d,
                                                 dfn_t dfn,
-                                                unsigned int page_count,
+                                                unsigned long page_count,
                                                 unsigned int flush_flags)
 {
     ASSERT(page_count && !dfn_eq(dfn, INVALID_DFN));
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index aea07e47c4..dba6c9d642 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -244,7 +244,7 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
         else if ( paging_mode_translate(d) )
             rc = set_identity_p2m_entry(d, pfn, p2m_access_rw, 0);
         else
-            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), PAGE_ORDER_4K,
+            rc = iommu_map(d, _dfn(pfn), _mfn(pfn), PAGE_ORDER_4K, 1,
                            IOMMUF_readable | IOMMUF_writable, &flush_flags);
 
         if ( rc )
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 1831dc66b0..d9c2e764aa 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -146,10 +146,10 @@ enum
 #define IOMMU_FLUSHF_modified (1u << _IOMMU_FLUSHF_modified)
 
 int __must_check iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                           unsigned int page_order, unsigned int flags,
-                           unsigned int *flush_flags);
+                           unsigned int page_order, unsigned int page_count,
+                           unsigned int flags, unsigned int *flush_flags);
 int __must_check iommu_unmap(struct domain *d, dfn_t dfn,
-                             unsigned int page_order,
+                             unsigned int page_order, unsigned int page_count,
                              unsigned int *flush_flags);
 
 int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
@@ -162,6 +162,7 @@ int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
                                    unsigned int *flags);
 
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
+                                   unsigned int page_order,
                                    unsigned int page_count,
                                    unsigned int flush_flags);
 int __must_check iommu_iotlb_flush_all(struct domain *d,
@@ -281,7 +282,7 @@ struct iommu_ops {
     void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
     int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
-                                    unsigned int page_count,
+                                    unsigned long page_count,
                                     unsigned int flush_flags);
     int __must_check (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (6 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 10:28   ` Jan Beulich
  2020-08-04 13:42 ` [PATCH v4 09/14] common/grant_table: batch flush I/O TLB Paul Durrant
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

The 'legacy' functions do implicit flushing, so amend the callers to do
the appropriate flushing themselves.

Unfortunately, because of the structure of the P2M code, we cannot remove
the per-CPU 'iommu_dont_flush_iotlb' global and the optimization it
facilitates. It is now checked directly in iommu_iotlb_flush(). Also, it is
now declared as bool (rather than bool_t) and setting/clearing it are no
longer pointlessly gated on is_iommu_enabled() returning true. (Arguably
it is also pointless to gate the call to iommu_iotlb_flush() on that
condition - since it is a no-op in that case - but the if clause allows
the scope of a stack variable to be restricted).
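
As a rough sketch of the replacement calling pattern (taken from the p2m
changes below; 'd', 'gfn' and 'mfn' stand for the usual domain and frame
arguments), a caller that previously relied on iommu_legacy_map() now does:

    unsigned int flush_flags = 0;
    int rc;

    /* The map itself no longer flushes; it only accumulates flush flags. */
    rc = iommu_map(d, _dfn(gfn), mfn, PAGE_ORDER_4K, 1,
                   IOMMUF_readable | IOMMUF_writable, &flush_flags);

    /* The caller is now responsible for the explicit IOTLB flush. */
    if ( !rc )
        rc = iommu_iotlb_flush(d, _dfn(gfn), PAGE_ORDER_4K, 1, flush_flags);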

NOTE: The code in memory_add() now fails if the number of pages passed to
      a single call overflows an unsigned int. I don't believe this will
      ever happen in practice.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v3:
 - Same as v2; elected to implement batch flushing in the grant table code as
   a subsequent patch

v2:
 - Shorten the diff (mainly because of a prior patch introducing automatic
   flush-on-fail into iommu_map() and iommu_unmap())
---
 xen/arch/x86/mm.c               | 21 +++++++++++++++-----
 xen/arch/x86/mm/p2m-ept.c       | 20 +++++++++++--------
 xen/arch/x86/mm/p2m-pt.c        | 15 +++++++++++----
 xen/arch/x86/mm/p2m.c           | 26 ++++++++++++++++++-------
 xen/arch/x86/x86_64/mm.c        | 27 +++++++++++++-------------
 xen/common/grant_table.c        | 34 ++++++++++++++++++++++++---------
 xen/common/memory.c             |  5 +++--
 xen/drivers/passthrough/iommu.c | 25 +-----------------------
 xen/include/xen/iommu.h         | 21 +++++---------------
 9 files changed, 106 insertions(+), 88 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 638f6bf580..062af1f684 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2446,10 +2446,16 @@ static int cleanup_page_mappings(struct page_info *page)
 
         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
-            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), PAGE_ORDER_4K);
+            unsigned int flush_flags = 0;
+            int err;
 
+            err = iommu_unmap(d, _dfn(mfn), PAGE_ORDER_4K, 1, &flush_flags);
             if ( !rc )
-                rc = rc2;
+                rc = err;
+
+            err = iommu_iotlb_flush(d, _dfn(mfn), PAGE_ORDER_4K, 1, flush_flags);
+            if ( !rc )
+                rc = err;
         }
 
         if ( likely(!is_special_page(page)) )
@@ -2971,12 +2977,17 @@ static int _get_page_type(struct page_info *page, unsigned long type,
         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
             mfn_t mfn = page_to_mfn(page);
+            dfn_t dfn = _dfn(mfn_x(mfn));
+            unsigned int flush_flags = 0;
 
             if ( (x & PGT_type_mask) == PGT_writable_page )
-                rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), PAGE_ORDER_4K);
+                rc = iommu_unmap(d, dfn, PAGE_ORDER_4K, 1, &flush_flags);
             else
-                rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, PAGE_ORDER_4K,
-                                      IOMMUF_readable | IOMMUF_writable);
+                rc = iommu_map(d, dfn, mfn, PAGE_ORDER_4K, 1,
+                               IOMMUF_readable | IOMMUF_writable, &flush_flags);
+
+            if ( !rc )
+                rc = iommu_iotlb_flush(d, dfn, PAGE_ORDER_4K, 1, flush_flags);
 
             if ( unlikely(rc) )
             {
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index b2ac912cde..e38b0bf95c 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -842,15 +842,19 @@ out:
     if ( rc == 0 && p2m_is_hostp2m(p2m) &&
          need_modify_vtd_table )
     {
-        if ( iommu_use_hap_pt(d) )
-            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order), 1,
-                                   (iommu_flags ? IOMMU_FLUSHF_added : 0) |
-                                   (vtd_pte_present ? IOMMU_FLUSHF_modified
-                                                    : 0));
-        else if ( need_iommu_pt_sync(d) )
+        unsigned int flush_flags = 0;
+
+        if ( need_iommu_pt_sync(d) )
             rc = iommu_flags ?
-                iommu_legacy_map(d, _dfn(gfn), mfn, order, iommu_flags) :
-                iommu_legacy_unmap(d, _dfn(gfn), order);
+                iommu_map(d, _dfn(gfn), mfn, order, 1, iommu_flags, &flush_flags) :
+                iommu_unmap(d, _dfn(gfn), order, 1, &flush_flags);
+        else if ( iommu_use_hap_pt(d) )
+            flush_flags =
+                (iommu_flags ? IOMMU_FLUSHF_added : 0) |
+                (vtd_pte_present ? IOMMU_FLUSHF_modified : 0);
+
+        if ( !rc )
+            rc = iommu_iotlb_flush(d, _dfn(gfn), order, 1, flush_flags);
     }
 
     unmap_domain_page(table);
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index badb26bc34..3c0901b56c 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -678,10 +678,17 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
 
     if ( need_iommu_pt_sync(p2m->domain) &&
          (iommu_old_flags != iommu_pte_flags || old_mfn != mfn_x(mfn)) )
-        rc = iommu_pte_flags
-             ? iommu_legacy_map(d, _dfn(gfn), mfn, page_order,
-                                iommu_pte_flags)
-             : iommu_legacy_unmap(d, _dfn(gfn), page_order);
+    {
+        unsigned int flush_flags = 0;
+
+        rc = iommu_pte_flags ?
+            iommu_map(d, _dfn(gfn), mfn, page_order, 1, iommu_pte_flags,
+                      &flush_flags) :
+            iommu_unmap(d, _dfn(gfn), page_order, 1, &flush_flags);
+
+        if ( !rc )
+            rc = iommu_iotlb_flush(d, _dfn(gfn), page_order, 1, flush_flags);
+    }
 
     /*
      * Free old intermediate tables if necessary.  This has to be the
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index db7bde0230..9f8b9bc5fd 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1350,10 +1350,15 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn_l,
 
     if ( !paging_mode_translate(p2m->domain) )
     {
-        if ( !is_iommu_enabled(d) )
-            return 0;
-        return iommu_legacy_map(d, _dfn(gfn_l), _mfn(gfn_l), PAGE_ORDER_4K,
-                                IOMMUF_readable | IOMMUF_writable);
+        unsigned int flush_flags = 0;
+
+        ret = iommu_map(d, _dfn(gfn_l), _mfn(gfn_l), PAGE_ORDER_4K, 1,
+                        IOMMUF_readable | IOMMUF_writable, &flush_flags);
+        if ( !ret )
+            ret = iommu_iotlb_flush(d, _dfn(gfn_l), PAGE_ORDER_4K, 1,
+                                    flush_flags);
+
+        return ret;
     }
 
     gfn_lock(p2m, gfn, 0);
@@ -1441,9 +1446,16 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
 
     if ( !paging_mode_translate(d) )
     {
-        if ( !is_iommu_enabled(d) )
-            return 0;
-        return iommu_legacy_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K);
+        unsigned int flush_flags = 0;
+        int err;
+
+        ret = iommu_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K, 1, &flush_flags);
+
+        err = iommu_iotlb_flush(d, _dfn(gfn_l), PAGE_ORDER_4K, 1, flush_flags);
+        if ( !ret )
+            ret = err;
+
+        return ret;
     }
 
     gfn_lock(p2m, gfn, 0);
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 102079a801..02684bcf9d 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1413,21 +1413,22 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
          !iommu_use_hap_pt(hardware_domain) &&
          !need_iommu_pt_sync(hardware_domain) )
     {
-        for ( i = spfn; i < epfn; i++ )
-            if ( iommu_legacy_map(hardware_domain, _dfn(i), _mfn(i),
-                                  PAGE_ORDER_4K,
-                                  IOMMUF_readable | IOMMUF_writable) )
-                break;
-        if ( i != epfn )
-        {
-            while (i-- > old_max)
-                /* If statement to satisfy __must_check. */
-                if ( iommu_legacy_unmap(hardware_domain, _dfn(i),
-                                        PAGE_ORDER_4K) )
-                    continue;
+        unsigned int flush_flags = 0;
+        unsigned int n = epfn - spfn;
+        int rc;
 
+        ret = -EOVERFLOW;
+        if ( spfn + n != epfn )
+            goto destroy_m2p;
+
+        rc = iommu_map(hardware_domain, _dfn(i), _mfn(i),
+                       PAGE_ORDER_4K, n, IOMMUF_readable | IOMMUF_writable,
+                       &flush_flags);
+        if ( !rc )
+            rc = iommu_iotlb_flush(hardware_domain, _dfn(i), PAGE_ORDER_4K, n,
+                                       flush_flags);
+        if ( rc )
             goto destroy_m2p;
-        }
     }
 
     /* We can't revert any more */
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 9f0cae52c0..d6526bca12 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1225,11 +1225,23 @@ map_grant_ref(
             kind = IOMMUF_readable;
         else
             kind = 0;
-        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind) )
+        if ( kind )
         {
-            double_gt_unlock(lgt, rgt);
-            rc = GNTST_general_error;
-            goto undo_out;
+            dfn_t dfn = _dfn(mfn_x(mfn));
+            unsigned int flush_flags = 0;
+            int err;
+
+            err = iommu_map(ld, dfn, mfn, 0, 1, kind, &flush_flags);
+            if ( !err )
+                err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
+            if ( err )
+                rc = GNTST_general_error;
+
+            if ( rc != GNTST_okay )
+            {
+                double_gt_unlock(lgt, rgt);
+                goto undo_out;
+            }
         }
     }
 
@@ -1473,21 +1485,25 @@ unmap_common(
     if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
     {
         unsigned int kind;
+        dfn_t dfn = _dfn(mfn_x(op->mfn));
+        unsigned int flush_flags = 0;
         int err = 0;
 
         double_gt_lock(lgt, rgt);
 
         kind = mapkind(lgt, rd, op->mfn);
         if ( !kind )
-            err = iommu_legacy_unmap(ld, _dfn(mfn_x(op->mfn)), 0);
+            err = iommu_unmap(ld, dfn, 0, 1, &flush_flags);
         else if ( !(kind & MAPKIND_WRITE) )
-            err = iommu_legacy_map(ld, _dfn(mfn_x(op->mfn)), op->mfn, 0,
-                                   IOMMUF_readable);
-
-        double_gt_unlock(lgt, rgt);
+            err = iommu_map(ld, dfn, op->mfn, 0, 1, IOMMUF_readable,
+                            &flush_flags);
 
+        if ( !err )
+            err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
         if ( err )
             rc = GNTST_general_error;
+
+        double_gt_unlock(lgt, rgt);
     }
 
     /* If just unmapped a writable mapping, mark as dirtied */
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8de334ff10..2891bef57b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -824,8 +824,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
     xatp->gpfn += start;
     xatp->size -= start;
 
-    if ( is_iommu_enabled(d) )
-       this_cpu(iommu_dont_flush_iotlb) = 1;
+    this_cpu(iommu_dont_flush_iotlb) = true;
 
     while ( xatp->size > done )
     {
@@ -845,6 +844,8 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
         }
     }
 
+    this_cpu(iommu_dont_flush_iotlb) = false;
+
     if ( is_iommu_enabled(d) )
     {
         int ret;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 568a4a5661..ab44c332bb 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -281,18 +281,6 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
     return rc;
 }
 
-int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                     unsigned int page_order, unsigned int flags)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_map(d, dfn, mfn, page_order, 1, flags, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
-        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), 1, flush_flags);
-
-    return rc;
-}
-
 int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
                 unsigned int page_count, unsigned int *flush_flags)
 {
@@ -335,17 +323,6 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
     return rc;
 }
 
-int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_unmap(d, dfn, page_order, 1, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) && ! rc )
-        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), 1, flush_flags);
-
-    return rc;
-}
-
 int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
                       unsigned int *flags)
 {
@@ -364,7 +341,7 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_order,
     int rc;
 
     if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
-         !page_count || !flush_flags )
+         !page_count || !flush_flags || this_cpu(iommu_dont_flush_iotlb) )
         return 0;
 
     if ( dfn_eq(dfn, INVALID_DFN) )
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index d9c2e764aa..b7e5d3da09 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -151,16 +151,8 @@ int __must_check iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
 int __must_check iommu_unmap(struct domain *d, dfn_t dfn,
                              unsigned int page_order, unsigned int page_count,
                              unsigned int *flush_flags);
-
-int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                                  unsigned int page_order,
-                                  unsigned int flags);
-int __must_check iommu_legacy_unmap(struct domain *d, dfn_t dfn,
-                                    unsigned int page_order);
-
 int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
                                    unsigned int *flags);
-
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                                    unsigned int page_order,
                                    unsigned int page_count,
@@ -370,15 +362,12 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
- * avoid unecessary iotlb_flush in the low level IOMMU code.
- *
- * iommu_map_page/iommu_unmap_page must flush the iotlb but somethimes
- * this operation can be really expensive. This flag will be set by the
- * caller to notify the low level IOMMU code to avoid the iotlb flushes.
- * iommu_iotlb_flush/iommu_iotlb_flush_all will be explicitly called by
- * the caller.
+ * avoid unnecessary IOMMU flushing while updating the P2M.
+ * Setting the value to true will cause iommu_iotlb_flush() to return without
+ * actually performing a flush. A batch flush must therefore be done by the
+ * calling code after setting the value back to false.
  */
-DECLARE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
+DECLARE_PER_CPU(bool, iommu_dont_flush_iotlb);
 
 extern struct spinlock iommu_pt_cleanup_lock;
 extern struct page_list_head iommu_pt_cleanup_list;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 09/14] common/grant_table: batch flush I/O TLB
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (7 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 11:49   ` Jan Beulich
  2020-08-04 13:42 ` [PATCH v4 10/14] iommu: remove the share_p2m operation Paul Durrant
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Paul Durrant, Ian Jackson, George Dunlap, Jan Beulich

From: Paul Durrant <pdurrant@amazon.com>

This patch avoids calling iommu_iotlb_flush() for each individual GNTTABOP and
instead calls iommu_iotlb_flush_all() at the end of the hypercall. This
should mean batched map/unmap operations perform better, but may be slightly
detrimental to the performance of singleton operations.
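
A minimal sketch of the resulting shape of gnttab_map_grant_ref() (guest
copy-in/out and error handling omitted; 'op' and 'count' as in the existing
code):

    unsigned int i, flush_flags = 0;
    int rc = 0, err;

    for ( i = 0; i < count; i++ )
    {
        /* ... copy the guest's op in, as before ... */
        map_grant_ref(&op, &flush_flags); /* only accumulates flush flags */
        /* ... copy the status back out, as before ... */
    }

    /* One flush at the end of the batch replaces the per-op flushes. */
    err = iommu_iotlb_flush_all(current->domain, flush_flags);
    if ( !rc )
        rc = err;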

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>

v3:
 - New in v3
---
 xen/common/grant_table.c | 132 ++++++++++++++++++++++++---------------
 1 file changed, 80 insertions(+), 52 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index d6526bca12..f382e0be52 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -979,7 +979,7 @@ static unsigned int mapkind(
 
 static void
 map_grant_ref(
-    struct gnttab_map_grant_ref *op)
+    struct gnttab_map_grant_ref *op, unsigned int *flush_flags)
 {
     struct domain *ld, *rd, *owner = NULL;
     struct grant_table *lgt, *rgt;
@@ -1228,17 +1228,12 @@ map_grant_ref(
         if ( kind )
         {
             dfn_t dfn = _dfn(mfn_x(mfn));
-            unsigned int flush_flags = 0;
             int err;
 
-            err = iommu_map(ld, dfn, mfn, 0, 1, kind, &flush_flags);
-            if ( !err )
-                err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
+            err = iommu_map(ld, dfn, mfn, 0, 1, kind, flush_flags);
             if ( err )
-                rc = GNTST_general_error;
-
-            if ( rc != GNTST_okay )
             {
+                rc = GNTST_general_error;
                 double_gt_unlock(lgt, rgt);
                 goto undo_out;
             }
@@ -1322,6 +1317,8 @@ gnttab_map_grant_ref(
 {
     int i;
     struct gnttab_map_grant_ref op;
+    unsigned int flush_flags = 0;
+    int err, rc = 0;
 
     for ( i = 0; i < count; i++ )
     {
@@ -1329,20 +1326,30 @@ gnttab_map_grant_ref(
             return i;
 
         if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
-            return -EFAULT;
+        {
+            rc = -EFAULT;
+            break;
+        }
 
-        map_grant_ref(&op);
+        map_grant_ref(&op, &flush_flags);
 
         if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
-            return -EFAULT;
+        {
+            rc = -EFAULT;
+            break;
+        }
     }
 
-    return 0;
+    err = iommu_iotlb_flush_all(current->domain, flush_flags);
+    if ( !rc )
+        rc = err;
+
+    return rc;
 }
 
 static void
 unmap_common(
-    struct gnttab_unmap_common *op)
+    struct gnttab_unmap_common *op, unsigned int *flush_flags)
 {
     domid_t          dom;
     struct domain   *ld, *rd;
@@ -1486,20 +1493,16 @@ unmap_common(
     {
         unsigned int kind;
         dfn_t dfn = _dfn(mfn_x(op->mfn));
-        unsigned int flush_flags = 0;
         int err = 0;
 
         double_gt_lock(lgt, rgt);
 
         kind = mapkind(lgt, rd, op->mfn);
         if ( !kind )
-            err = iommu_unmap(ld, dfn, 0, 1, &flush_flags);
+            err = iommu_unmap(ld, dfn, 0, 1, flush_flags);
         else if ( !(kind & MAPKIND_WRITE) )
             err = iommu_map(ld, dfn, op->mfn, 0, 1, IOMMUF_readable,
-                            &flush_flags);
-
-        if ( !err )
-            err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
+                            flush_flags);
         if ( err )
             rc = GNTST_general_error;
 
@@ -1600,8 +1603,8 @@ unmap_common_complete(struct gnttab_unmap_common *op)
 
 static void
 unmap_grant_ref(
-    struct gnttab_unmap_grant_ref *op,
-    struct gnttab_unmap_common *common)
+    struct gnttab_unmap_grant_ref *op, struct gnttab_unmap_common *common,
+    unsigned int *flush_flags)
 {
     common->host_addr = op->host_addr;
     common->dev_bus_addr = op->dev_bus_addr;
@@ -1613,7 +1616,7 @@ unmap_grant_ref(
     common->rd = NULL;
     common->mfn = INVALID_MFN;
 
-    unmap_common(common);
+    unmap_common(common, flush_flags);
     op->status = common->status;
 }
 
@@ -1622,31 +1625,50 @@ static long
 gnttab_unmap_grant_ref(
     XEN_GUEST_HANDLE_PARAM(gnttab_unmap_grant_ref_t) uop, unsigned int count)
 {
-    int i, c, partial_done, done = 0;
+    struct domain *currd = current->domain;
     struct gnttab_unmap_grant_ref op;
     struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
+    int rc = 0;
 
     while ( count != 0 )
     {
+        unsigned int i, c, partial_done = 0, done = 0;
+        unsigned int flush_flags = 0;
+        int err;
+
         c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
-        partial_done = 0;
 
         for ( i = 0; i < c; i++ )
         {
             if ( unlikely(__copy_from_guest(&op, uop, 1)) )
-                goto fault;
-            unmap_grant_ref(&op, &common[i]);
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            unmap_grant_ref(&op, &common[i], &flush_flags);
             ++partial_done;
+
             if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
-                goto fault;
+            {
+                rc = -EFAULT;
+                break;
+            }
+
             guest_handle_add_offset(uop, 1);
         }
 
-        gnttab_flush_tlb(current->domain);
+        gnttab_flush_tlb(currd);
+        err = iommu_iotlb_flush_all(currd, flush_flags);
+        if ( !rc )
+            rc = err;
 
         for ( i = 0; i < partial_done; i++ )
             unmap_common_complete(&common[i]);
 
+        if ( rc )
+            break;
+
         count -= c;
         done += c;
 
@@ -1654,20 +1676,14 @@ gnttab_unmap_grant_ref(
             return done;
     }
 
-    return 0;
-
-fault:
-    gnttab_flush_tlb(current->domain);
-
-    for ( i = 0; i < partial_done; i++ )
-        unmap_common_complete(&common[i]);
-    return -EFAULT;
+    return rc;
 }
 
 static void
 unmap_and_replace(
     struct gnttab_unmap_and_replace *op,
-    struct gnttab_unmap_common *common)
+    struct gnttab_unmap_common *common,
+    unsigned int *flush_flags)
 {
     common->host_addr = op->host_addr;
     common->new_addr = op->new_addr;
@@ -1679,7 +1695,7 @@ unmap_and_replace(
     common->rd = NULL;
     common->mfn = INVALID_MFN;
 
-    unmap_common(common);
+    unmap_common(common, flush_flags);
     op->status = common->status;
 }
 
@@ -1687,31 +1703,50 @@ static long
 gnttab_unmap_and_replace(
     XEN_GUEST_HANDLE_PARAM(gnttab_unmap_and_replace_t) uop, unsigned int count)
 {
-    int i, c, partial_done, done = 0;
+    struct domain *currd = current->domain;
     struct gnttab_unmap_and_replace op;
     struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
+    int rc = 0;
 
     while ( count != 0 )
     {
+        unsigned int i, c, partial_done = 0, done = 0;
+        unsigned int flush_flags = 0;
+        int err;
+
         c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
-        partial_done = 0;
 
         for ( i = 0; i < c; i++ )
         {
             if ( unlikely(__copy_from_guest(&op, uop, 1)) )
-                goto fault;
-            unmap_and_replace(&op, &common[i]);
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            unmap_and_replace(&op, &common[i], &flush_flags);
             ++partial_done;
+
             if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
-                goto fault;
+            {
+                rc = -EFAULT;
+                break;
+            }
+
             guest_handle_add_offset(uop, 1);
         }
 
-        gnttab_flush_tlb(current->domain);
+        gnttab_flush_tlb(currd);
+        err = iommu_iotlb_flush_all(currd, flush_flags);
+        if ( !rc )
+            rc = err;
 
         for ( i = 0; i < partial_done; i++ )
             unmap_common_complete(&common[i]);
 
+        if ( rc )
+            break;
+
         count -= c;
         done += c;
 
@@ -1719,14 +1754,7 @@ gnttab_unmap_and_replace(
             return done;
     }
 
-    return 0;
-
-fault:
-    gnttab_flush_tlb(current->domain);
-
-    for ( i = 0; i < partial_done; i++ )
-        unmap_common_complete(&common[i]);
-    return -EFAULT;
+    return rc;
 }
 
 static int
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 10/14] iommu: remove the share_p2m operation
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (8 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 09/14] common/grant_table: batch flush I/O TLB Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 12:18   ` Jan Beulich
  2020-08-14  7:04   ` Tian, Kevin
  2020-08-04 13:42 ` [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
                   ` (3 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant, George Dunlap,
	Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

Sharing of HAP tables is now VT-d specific, so the operation is no longer
defined for the AMD IOMMU. There is also no need to proactively set
vtd.pgd_maddr when using shared EPT, as it is straightforward to define a
helper function that returns the appropriate value in both the shared and
non-shared cases.
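
A condensed sketch of the resulting call site in domain_context_mapping_one()
(locking and cleanup simplified relative to the hunk below):

    spin_lock(&hd->arch.mapping_lock);

    /* EPT root when page tables are shared, IOMMU-owned PGD otherwise. */
    pgd_maddr = domain_pgd_maddr(domain, iommu);
    if ( !pgd_maddr )
    {
        spin_unlock(&hd->arch.mapping_lock);
        return -ENOMEM; /* the real code also drops iommu->lock and unmaps */
    }

    context_set_address_root(*context, pgd_maddr);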

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>

v2:
  - Put the PGD level adjust into the helper function too, since it is
    irrelevant in the shared EPT case
---
 xen/arch/x86/mm/p2m.c               |  3 -
 xen/drivers/passthrough/iommu.c     |  8 ---
 xen/drivers/passthrough/vtd/iommu.c | 90 ++++++++++++++++-------------
 xen/include/xen/iommu.h             |  3 -
 4 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 9f8b9bc5fd..3bd8d83d23 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -726,9 +726,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
     p2m->phys_table = pagetable_from_mfn(top_mfn);
 
-    if ( hap_enabled(d) )
-        iommu_share_p2m_table(d);
-
     p2m_unlock(p2m);
     return 0;
 }
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index ab44c332bb..7464f10d1c 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -498,14 +498,6 @@ int iommu_do_domctl(
     return ret;
 }
 
-void iommu_share_p2m_table(struct domain* d)
-{
-    ASSERT(hap_enabled(d));
-
-    if ( iommu_use_hap_pt(d) )
-        iommu_get_ops()->share_p2m(d);
-}
-
 void iommu_crash_shutdown(void)
 {
     if ( !iommu_crash_disable )
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 68cf0e535a..a532d9e88c 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -318,6 +318,48 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
     return pte_maddr;
 }
 
+static uint64_t domain_pgd_maddr(struct domain *d, struct vtd_iommu *iommu)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    uint64_t pgd_maddr;
+    unsigned int agaw;
+
+    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
+
+    if ( iommu_use_hap_pt(d) )
+    {
+        mfn_t pgd_mfn =
+            pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
+
+        return pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
+    }
+
+    if ( !hd->arch.vtd.pgd_maddr )
+    {
+        addr_to_dma_page_maddr(d, 0, 1);
+
+        if ( !hd->arch.vtd.pgd_maddr )
+            return 0;
+    }
+
+    pgd_maddr = hd->arch.vtd.pgd_maddr;
+
+    /* Skip top levels of page tables for 2- and 3-level DRHDs. */
+    for ( agaw = level_to_agaw(4);
+          agaw != level_to_agaw(iommu->nr_pt_levels);
+          agaw-- )
+    {
+        struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
+
+        pgd_maddr = dma_pte_addr(*p);
+        unmap_vtd_domain_page(p);
+        if ( !pgd_maddr )
+            return 0;
+    }
+
+    return pgd_maddr;
+}
+
 static void iommu_flush_write_buffer(struct vtd_iommu *iommu)
 {
     u32 val;
@@ -1286,7 +1328,7 @@ int domain_context_mapping_one(
     struct context_entry *context, *context_entries;
     u64 maddr, pgd_maddr;
     u16 seg = iommu->drhd->segment;
-    int agaw, rc, ret;
+    int rc, ret;
     bool_t flush_dev_iotlb;
 
     ASSERT(pcidevs_locked());
@@ -1340,37 +1382,18 @@ int domain_context_mapping_one(
     if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
     {
         context_set_translation_type(*context, CONTEXT_TT_PASS_THRU);
-        agaw = level_to_agaw(iommu->nr_pt_levels);
     }
     else
     {
         spin_lock(&hd->arch.mapping_lock);
 
-        /* Ensure we have pagetables allocated down to leaf PTE. */
-        if ( hd->arch.vtd.pgd_maddr == 0 )
+        pgd_maddr = domain_pgd_maddr(domain, iommu);
+        if ( !pgd_maddr )
         {
-            addr_to_dma_page_maddr(domain, 0, 1);
-            if ( hd->arch.vtd.pgd_maddr == 0 )
-            {
-            nomem:
-                spin_unlock(&hd->arch.mapping_lock);
-                spin_unlock(&iommu->lock);
-                unmap_vtd_domain_page(context_entries);
-                return -ENOMEM;
-            }
-        }
-
-        /* Skip top levels of page tables for 2- and 3-level DRHDs. */
-        pgd_maddr = hd->arch.vtd.pgd_maddr;
-        for ( agaw = level_to_agaw(4);
-              agaw != level_to_agaw(iommu->nr_pt_levels);
-              agaw-- )
-        {
-            struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
-            pgd_maddr = dma_pte_addr(*p);
-            unmap_vtd_domain_page(p);
-            if ( pgd_maddr == 0 )
-                goto nomem;
+            spin_unlock(&hd->arch.mapping_lock);
+            spin_unlock(&iommu->lock);
+            unmap_vtd_domain_page(context_entries);
+            return -ENOMEM;
         }
 
         context_set_address_root(*context, pgd_maddr);
@@ -1389,7 +1412,7 @@ int domain_context_mapping_one(
         return -EFAULT;
     }
 
-    context_set_address_width(*context, agaw);
+    context_set_address_width(*context, level_to_agaw(iommu->nr_pt_levels));
     context_set_fault_enable(*context);
     context_set_present(*context);
     iommu_sync_cache(context, sizeof(struct context_entry));
@@ -1848,18 +1871,6 @@ static int __init vtd_ept_page_compatible(struct vtd_iommu *iommu)
            (ept_has_1gb(ept_cap) && opt_hap_1gb) <= cap_sps_1gb(vtd_cap);
 }
 
-/*
- * set VT-d page table directory to EPT table if allowed
- */
-static void iommu_set_pgd(struct domain *d)
-{
-    mfn_t pgd_mfn;
-
-    pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-    dom_iommu(d)->arch.vtd.pgd_maddr =
-        pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
-}
-
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
                                  const struct acpi_rmrr_unit *rmrr,
                                  u32 flag)
@@ -2719,7 +2730,6 @@ static struct iommu_ops __initdata vtd_ops = {
     .adjust_irq_affinities = adjust_vtd_irq_affinities,
     .suspend = vtd_suspend,
     .resume = vtd_resume,
-    .share_p2m = iommu_set_pgd,
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = iommu_flush_iotlb_pages,
     .iotlb_flush_all = iommu_flush_iotlb_all,
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b7e5d3da09..1f25d2082f 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -271,7 +271,6 @@ struct iommu_ops {
 
     int __must_check (*suspend)(void);
     void (*resume)(void);
-    void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
     int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
                                     unsigned long page_count,
@@ -348,8 +347,6 @@ void iommu_resume(void);
 void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
-void iommu_share_p2m_table(struct domain *d);
-
 #ifdef CONFIG_HAS_PCI
 int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
                         XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (9 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 10/14] iommu: remove the share_p2m operation Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 12:23   ` Jan Beulich
  2020-08-14  7:12   ` Tian, Kevin
  2020-08-04 13:42 ` [PATCH v4 12/14] vtd: use a bit field for root_entry Paul Durrant
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul Durrant, Kevin Tian, Jan Beulich, Paul Durrant

From: Paul Durrant <pdurrant@amazon.com>

Referring to IOMMU page tables as 'p2m tables' is confusing and not
consistent with the terminology introduced with 'dfn_t'. Just call them
IOMMU page tables.

Also remove a pointless check of the 'acpi_drhd_units' list in
vtd_dump_page_table_level(). If the list is empty then IOMMU mappings would
not have been enabled for the domain in the first place.

NOTE: All calls to printk() have also been removed from
      iommu_dump_page_tables(); the implementation-specific code is now
      responsible for all output.
      The check for the global 'iommu_enabled' has also been replaced by an
      ASSERT since iommu_dump_page_tables() is not registered as a key handler
      unless IOMMU mappings are enabled.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul@xen.org>
Cc: Kevin Tian <kevin.tian@intel.com>

v2:
 - Moved all output into implementation specific code
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 16 ++++++-------
 xen/drivers/passthrough/iommu.c             | 21 ++++-------------
 xen/drivers/passthrough/vtd/iommu.c         | 26 +++++++++++----------
 xen/include/xen/iommu.h                     |  2 +-
 4 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 3390c22cf3..be578607b1 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -491,8 +491,8 @@ static int amd_iommu_group_id(u16 seg, u8 bus, u8 devfn)
 
 #include <asm/io_apic.h>
 
-static void amd_dump_p2m_table_level(struct page_info* pg, int level, 
-                                     paddr_t gpa, int indent)
+static void amd_dump_page_table_level(struct page_info* pg, int level,
+                                      paddr_t gpa, int indent)
 {
     paddr_t address;
     struct amd_iommu_pte *table_vaddr;
@@ -529,7 +529,7 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
 
         address = gpa + amd_offset_level_address(index, level);
         if ( pde->next_level >= 1 )
-            amd_dump_p2m_table_level(
+            amd_dump_page_table_level(
                 mfn_to_page(_mfn(pde->mfn)), pde->next_level,
                 address, indent + 1);
         else
@@ -542,16 +542,16 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
     unmap_domain_page(table_vaddr);
 }
 
-static void amd_dump_p2m_table(struct domain *d)
+static void amd_dump_page_tables(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
 
     if ( !hd->arch.amd.root_table )
         return;
 
-    printk("p2m table has %u levels\n", hd->arch.amd.paging_mode);
-    amd_dump_p2m_table_level(hd->arch.amd.root_table,
-                             hd->arch.amd.paging_mode, 0, 0);
+    printk("AMD IOMMU table has %u levels\n", hd->arch.amd.paging_mode);
+    amd_dump_page_table_level(hd->arch.amd.root_table,
+                              hd->arch.amd.paging_mode, 0, 0);
 }
 
 static const struct iommu_ops __initconstrel _iommu_ops = {
@@ -578,7 +578,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .crash_shutdown = amd_iommu_crash_shutdown,
-    .dump_p2m_table = amd_dump_p2m_table,
+    .dump_page_tables = amd_dump_page_tables,
 };
 
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 7464f10d1c..0f468379e1 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -22,7 +22,7 @@
 #include <xen/keyhandler.h>
 #include <xsm/xsm.h>
 
-static void iommu_dump_p2m_table(unsigned char key);
+static void iommu_dump_page_tables(unsigned char key);
 
 unsigned int __read_mostly iommu_dev_iotlb_timeout = 1000;
 integer_param("iommu_dev_iotlb_timeout", iommu_dev_iotlb_timeout);
@@ -212,7 +212,7 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
     if ( !is_iommu_enabled(d) )
         return;
 
-    register_keyhandler('o', &iommu_dump_p2m_table, "dump iommu p2m table", 0);
+    register_keyhandler('o', &iommu_dump_page_tables, "dump iommu page tables", 0);
 
     hd->platform_ops->hwdom_init(d);
 }
@@ -533,16 +533,12 @@ bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
     return is_iommu_enabled(d) && test_bit(feature, dom_iommu(d)->features);
 }
 
-static void iommu_dump_p2m_table(unsigned char key)
+static void iommu_dump_page_tables(unsigned char key)
 {
     struct domain *d;
     const struct iommu_ops *ops;
 
-    if ( !iommu_enabled )
-    {
-        printk("IOMMU not enabled!\n");
-        return;
-    }
+    ASSERT(iommu_enabled);
 
     ops = iommu_get_ops();
 
@@ -553,14 +549,7 @@ static void iommu_dump_p2m_table(unsigned char key)
         if ( is_hardware_domain(d) || !is_iommu_enabled(d) )
             continue;
 
-        if ( iommu_use_hap_pt(d) )
-        {
-            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d->domain_id);
-            continue;
-        }
-
-        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
-        ops->dump_p2m_table(d);
+        ops->dump_page_tables(d);
     }
 
     rcu_read_unlock(&domlist_read_lock);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a532d9e88c..f8da4fe0e7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2582,8 +2582,8 @@ static void vtd_resume(void)
     }
 }
 
-static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa, 
-                                     int indent)
+static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
+                                      int indent)
 {
     paddr_t address;
     int i;
@@ -2612,8 +2612,8 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
 
         address = gpa + offset_level_address(i, level);
         if ( next_level >= 1 ) 
-            vtd_dump_p2m_table_level(dma_pte_addr(*pte), next_level, 
-                                     address, indent + 1);
+            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
+                                      address, indent + 1);
         else
             printk("%*sdfn: %08lx mfn: %08lx\n",
                    indent, "",
@@ -2624,17 +2624,19 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
     unmap_vtd_domain_page(pt_vaddr);
 }
 
-static void vtd_dump_p2m_table(struct domain *d)
+static void vtd_dump_page_tables(struct domain *d)
 {
-    const struct domain_iommu *hd;
+    const struct domain_iommu *hd = dom_iommu(d);
 
-    if ( list_empty(&acpi_drhd_units) )
+    if ( iommu_use_hap_pt(d) )
+    {
+        printk("VT-D sharing EPT table\n");
         return;
+    }
 
-    hd = dom_iommu(d);
-    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
-    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
-                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
+    printk("VT-D table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
+    vtd_dump_page_table_level(hd->arch.vtd.pgd_maddr,
+                              agaw_to_level(hd->arch.vtd.agaw), 0, 0);
 }
 
 static int __init intel_iommu_quarantine_init(struct domain *d)
@@ -2734,7 +2736,7 @@ static struct iommu_ops __initdata vtd_ops = {
     .iotlb_flush = iommu_flush_iotlb_pages,
     .iotlb_flush_all = iommu_flush_iotlb_all,
     .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
-    .dump_p2m_table = vtd_dump_p2m_table,
+    .dump_page_tables = vtd_dump_page_tables,
 };
 
 const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 1f25d2082f..23e884f54b 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -277,7 +277,7 @@ struct iommu_ops {
                                     unsigned int flush_flags);
     int __must_check (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
-    void (*dump_p2m_table)(struct domain *d);
+    void (*dump_page_tables)(struct domain *d);
 
 #ifdef CONFIG_HAS_DEVICE_TREE
     /*
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 12/14] vtd: use a bit field for root_entry
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (10 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 12:34   ` Jan Beulich
  2020-08-14  7:17   ` Tian, Kevin
  2020-08-04 13:42 ` [PATCH v4 13/14] vtd: use a bit field for context_entry Paul Durrant
  2020-08-04 13:42 ` [PATCH v4 14/14] vtd: use a bit field for dma_pte Paul Durrant
  13 siblings, 2 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian

From: Paul Durrant <pdurrant@amazon.com>

This makes the code a little easier to read and also makes it more consistent
with iremap_entry.

Also take the opportunity to tidy up the implementation of device_in_domain().
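
As a sketch of the new accessors in use ('maddr' being a 4k-aligned machine
address, as in bus_to_context_maddr() below):

    struct root_entry *root = &root_entries[bus];

    set_root_ctp(*root, maddr);  /* stores maddr >> PAGE_SHIFT_4K in the 52-bit ctp field */
    set_root_present(*root);     /* sets bit 0 */

    maddr = root_ctp(*root);     /* yields ctp << PAGE_SHIFT_4K again */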

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v4:
 - New in v4
---
 xen/drivers/passthrough/vtd/iommu.c   |  4 ++--
 xen/drivers/passthrough/vtd/iommu.h   | 33 ++++++++++++++++-----------
 xen/drivers/passthrough/vtd/utils.c   |  4 ++--
 xen/drivers/passthrough/vtd/x86/ats.c | 27 ++++++++++++----------
 4 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index f8da4fe0e7..76025f6ccd 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -245,11 +245,11 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu, u8 bus)
             unmap_vtd_domain_page(root_entries);
             return 0;
         }
-        set_root_value(*root, maddr);
+        set_root_ctp(*root, maddr);
         set_root_present(*root);
         iommu_sync_cache(root, sizeof(struct root_entry));
     }
-    maddr = (u64) get_context_addr(*root);
+    maddr = root_ctp(*root);
     unmap_vtd_domain_page(root_entries);
     return maddr;
 }
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 216791b3d6..031ac5f66c 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -184,21 +184,28 @@
 #define dma_frcd_source_id(c) (c & 0xffff)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
-/*
- * 0: Present
- * 1-11: Reserved
- * 12-63: Context Ptr (12 - (haw-1))
- * 64-127: Reserved
- */
 struct root_entry {
-    u64    val;
-    u64    rsvd1;
+    union {
+        __uint128_t val;
+        struct { uint64_t lo, hi; };
+        struct {
+            /* 0 - 63 */
+            uint64_t p:1;
+            uint64_t reserved0:11;
+            uint64_t ctp:52;
+
+            /* 64 - 127 */
+            uint64_t reserved1;
+        };
+    };
 };
-#define root_present(root)    ((root).val & 1)
-#define set_root_present(root) do {(root).val |= 1;} while(0)
-#define get_context_addr(root) ((root).val & PAGE_MASK_4K)
-#define set_root_value(root, value) \
-    do {(root).val |= ((value) & PAGE_MASK_4K);} while(0)
+
+#define root_present(r) (r).p
+#define set_root_present(r) do { (r).p = 1; } while (0)
+
+#define root_ctp(r) ((r).ctp << PAGE_SHIFT_4K)
+#define set_root_ctp(r, val) \
+    do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
 
 struct context_entry {
     u64 lo;
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 4febcf506d..4c85242894 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -112,7 +112,7 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
         return;
     }
 
-    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].val);
+    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].lo);
     if ( !root_present(root_entry[bus]) )
     {
         unmap_vtd_domain_page(root_entry);
@@ -120,7 +120,7 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
         return;
     }
 
-    val = root_entry[bus].val;
+    val = root_ctp(root_entry[bus]);
     unmap_vtd_domain_page(root_entry);
     ctxt_entry = map_vtd_domain_page(val);
     if ( ctxt_entry == NULL )
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index 04d702b1d6..8369415dcc 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
 static bool device_in_domain(const struct vtd_iommu *iommu,
                              const struct pci_dev *pdev, uint16_t did)
 {
-    struct root_entry *root_entry;
-    struct context_entry *ctxt_entry = NULL;
+    struct root_entry *root_entry, *root_entries = NULL;
+    struct context_entry *context_entry, *context_entries = NULL;
     unsigned int tt;
     bool found = false;
 
@@ -85,25 +85,28 @@ static bool device_in_domain(const struct vtd_iommu *iommu,
         return false;
     }
 
-    root_entry = map_vtd_domain_page(iommu->root_maddr);
-    if ( !root_present(root_entry[pdev->bus]) )
+    root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
+    root_entry = &root_entries[pdev->bus];
+    if ( !root_present(*root_entry) )
         goto out;
 
-    ctxt_entry = map_vtd_domain_page(root_entry[pdev->bus].val);
-    if ( context_domain_id(ctxt_entry[pdev->devfn]) != did )
+    context_entries = map_vtd_domain_page(root_ctp(*root_entry));
+    context_entry = &context_entries[pdev->devfn];
+    if ( context_domain_id(*context_entry) != did )
         goto out;
 
-    tt = context_translation_type(ctxt_entry[pdev->devfn]);
+    tt = context_translation_type(*context_entry);
     if ( tt != CONTEXT_TT_DEV_IOTLB )
         goto out;
 
     found = true;
-out:
-    if ( root_entry )
-        unmap_vtd_domain_page(root_entry);
 
-    if ( ctxt_entry )
-        unmap_vtd_domain_page(ctxt_entry);
+ out:
+    if ( root_entries )
+        unmap_vtd_domain_page(root_entries);
+
+    if ( context_entries )
+        unmap_vtd_domain_page(context_entries);
 
     return found;
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 13/14] vtd: use a bit field for context_entry
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (11 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 12/14] vtd: use a bit field for root_entry Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 12:46   ` Jan Beulich
  2020-08-14  7:19   ` Tian, Kevin
  2020-08-04 13:42 ` [PATCH v4 14/14] vtd: use a bit field for dma_pte Paul Durrant
  13 siblings, 2 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian

From: Paul Durrant <pdurrant@amazon.com>

This removes the need for much shifting, masking and several magic numbers.
On the whole it makes the code quite a bit more readable.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v4:
 - New in v4
---
 xen/drivers/passthrough/vtd/iommu.c   |  8 ++--
 xen/drivers/passthrough/vtd/iommu.h   | 65 +++++++++++++++++----------
 xen/drivers/passthrough/vtd/utils.c   |  6 +--
 xen/drivers/passthrough/vtd/x86/ats.c |  2 +-
 4 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 76025f6ccd..766d33058e 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -86,8 +86,6 @@ static int domain_iommu_domid(struct domain *d,
     return -1;
 }
 
-#define DID_FIELD_WIDTH 16
-#define DID_HIGH_OFFSET 8
 static int context_set_domain_id(struct context_entry *context,
                                  struct domain *d,
                                  struct vtd_iommu *iommu)
@@ -121,7 +119,7 @@ static int context_set_domain_id(struct context_entry *context,
     }
 
     set_bit(i, iommu->domid_bitmap);
-    context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET;
+    context_set_did(*context, i);
     return 0;
 }
 
@@ -135,7 +133,7 @@ static int context_get_domain_id(struct context_entry *context,
     {
         nr_dom = cap_ndoms(iommu->cap);
 
-        dom_index = context_domain_id(*context);
+        dom_index = context_did(*context);
 
         if ( dom_index < nr_dom && iommu->domid_map )
             domid = iommu->domid_map[dom_index];
@@ -1396,7 +1394,7 @@ int domain_context_mapping_one(
             return -ENOMEM;
         }
 
-        context_set_address_root(*context, pgd_maddr);
+        context_set_slptp(*context, pgd_maddr);
         if ( ats_enabled && ecap_dev_iotlb(iommu->ecap) )
             context_set_translation_type(*context, CONTEXT_TT_DEV_IOTLB);
         else
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 031ac5f66c..509d13918a 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -199,6 +199,7 @@ struct root_entry {
         };
     };
 };
+#define ROOT_ENTRY_NR (PAGE_SIZE_4K / sizeof(struct root_entry))
 
 #define root_present(r) (r).p
 #define set_root_present(r) do { (r).p = 1; } while (0)
@@ -208,35 +209,53 @@ struct root_entry {
     do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
 
 struct context_entry {
-    u64 lo;
-    u64 hi;
+    union {
+        __uint128_t val;
+        struct { uint64_t lo, hi; };
+        struct {
+            /* 0 - 63 */
+            uint64_t p:1;
+            uint64_t fpd:1;
+            uint64_t tt:2;
+            uint64_t reserved0:8;
+            uint64_t slptp:52;
+
+            /* 64 - 127 */
+            uint64_t aw:3;
+            uint64_t ignored:4;
+            uint64_t reserved1:1;
+            uint64_t did:16;
+            uint64_t reserved2:40;
+        };
+    };
 };
-#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
-#define context_present(c) ((c).lo & 1)
-#define context_fault_disable(c) (((c).lo >> 1) & 1)
-#define context_translation_type(c) (((c).lo >> 2) & 3)
-#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
-#define context_address_width(c) ((c).hi &  7)
-#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
-
-#define context_set_present(c) do {(c).lo |= 1;} while(0)
-#define context_clear_present(c) do {(c).lo &= ~1;} while(0)
-#define context_set_fault_enable(c) \
-    do {(c).lo &= (((u64)-1) << 2) | 1;} while(0)
-
-#define context_set_translation_type(c, val) do { \
-        (c).lo &= (((u64)-1) << 4) | 3; \
-        (c).lo |= (val & 3) << 2; \
-    } while(0)
+
+#define context_present(c) (c).p
+#define context_set_present(c) do { (c).p = 1; } while (0)
+#define context_clear_present(c) do { (c).p = 0; } while (0)
+
+#define context_fault_disable(c) (c).fpd
+#define context_set_fault_enable(c) do { (c).fpd = 1; } while (0)
+
+#define context_translation_type(c) (c).tt
+#define context_set_translation_type(c, val) do { (c).tt = val; } while (0)
 #define CONTEXT_TT_MULTI_LEVEL 0
 #define CONTEXT_TT_DEV_IOTLB   1
 #define CONTEXT_TT_PASS_THRU   2
 
-#define context_set_address_root(c, val) \
-    do {(c).lo &= 0xfff; (c).lo |= (val) & PAGE_MASK_4K ;} while(0)
+#define context_slptp(c) ((c).slptp << PAGE_SHIFT_4K)
+#define context_set_slptp(c, val) \
+    do { (c).slptp = (val) >> PAGE_SHIFT_4K; } while (0)
+
+#define context_address_width(c) (c).aw
 #define context_set_address_width(c, val) \
-    do {(c).hi &= 0xfffffff8; (c).hi |= (val) & 7;} while(0)
-#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while(0)
+    do { (c).aw = (val); } while (0)
+
+#define context_did(c) (c).did
+#define context_set_did(c, val) \
+    do { (c).did = (val); } while (0)
+
+#define context_clear_entry(c) do { (c).val = 0; } while (0)
 
 /* page table handling */
 #define LEVEL_STRIDE       (9)
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 4c85242894..eae0c43269 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -129,9 +129,8 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
         return;
     }
 
-    val = ctxt_entry[devfn].lo;
-    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n",
-           devfn, ctxt_entry[devfn].hi, val);
+    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n", devfn,
+           ctxt_entry[devfn].hi, ctxt_entry[devfn].lo);
     if ( !context_present(ctxt_entry[devfn]) )
     {
         unmap_vtd_domain_page(ctxt_entry);
@@ -140,6 +139,7 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
     }
 
     level = agaw_to_level(context_address_width(ctxt_entry[devfn]));
+    val = context_slptp(ctxt_entry[devfn]);
     unmap_vtd_domain_page(ctxt_entry);
     if ( level != VTD_PAGE_TABLE_LEVEL_3 &&
          level != VTD_PAGE_TABLE_LEVEL_4)
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index 8369415dcc..a7bbd3198a 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -92,7 +92,7 @@ static bool device_in_domain(const struct vtd_iommu *iommu,
 
     context_entries = map_vtd_domain_page(root_ctp(*root_entry));
     context_entry = &context_entries[pdev->devfn];
-    if ( context_domain_id(*context_entry) != did )
+    if ( context_did(*context_entry) != did )
         goto out;
 
     tt = context_translation_type(*context_entry);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v4 14/14] vtd: use a bit field for dma_pte
  2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
                   ` (12 preceding siblings ...)
  2020-08-04 13:42 ` [PATCH v4 13/14] vtd: use a bit field for context_entry Paul Durrant
@ 2020-08-04 13:42 ` Paul Durrant
  2020-08-06 12:53   ` Jan Beulich
  13 siblings, 1 reply; 43+ messages in thread
From: Paul Durrant @ 2020-08-04 13:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian

From: Paul Durrant <pdurrant@amazon.com>

As with a prior patch for context_entry, this removes the need for much
shifting, masking and several magic numbers.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v4:
 - New in v4
---
 xen/drivers/passthrough/vtd/iommu.c |  9 ++---
 xen/drivers/passthrough/vtd/iommu.h | 55 +++++++++++++++++------------
 2 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 766d33058e..2d60cebe67 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1772,13 +1772,14 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
     old = *pte;
 
     dma_set_pte_addr(new, mfn_to_maddr(mfn));
-    dma_set_pte_prot(new,
-                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
-                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
+    if ( flags & IOMMUF_readable )
+        dma_set_pte_readable(new);
+    if ( flags & IOMMUF_writable )
+        dma_set_pte_writable(new);
 
     /* Set the SNP on leaf page table if Snoop Control available */
     if ( iommu_snoop )
-        dma_set_pte_snp(new);
+        dma_set_pte_snoop(new);
 
     if ( old.val == new.val )
     {
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 509d13918a..017286b0e1 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -283,29 +283,40 @@ struct context_entry {
  * 12-63: Host physical address
  */
 struct dma_pte {
-    u64 val;
+    union {
+        uint64_t val;
+        struct {
+            uint64_t r:1;
+            uint64_t w:1;
+            uint64_t reserved0:1;
+            uint64_t ignored0:4;
+            uint64_t ps:1;
+            uint64_t ignored1:3;
+            uint64_t snp:1;
+            uint64_t addr:52;
+        };
+    };
 };
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-#define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
-#define DMA_PTE_SP   (1 << 7)
-#define DMA_PTE_SNP  (1 << 11)
-#define dma_clear_pte(p)    do {(p).val = 0;} while(0)
-#define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
-#define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
-#define dma_set_pte_superpage(p) do {(p).val |= DMA_PTE_SP;} while(0)
-#define dma_set_pte_snp(p)  do {(p).val |= DMA_PTE_SNP;} while(0)
-#define dma_set_pte_prot(p, prot) do { \
-        (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
-    } while (0)
-#define dma_pte_prot(p) ((p).val & DMA_PTE_PROT)
-#define dma_pte_read(p) (dma_pte_prot(p) & DMA_PTE_READ)
-#define dma_pte_write(p) (dma_pte_prot(p) & DMA_PTE_WRITE)
-#define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
-#define dma_set_pte_addr(p, addr) do {\
-            (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
-#define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
-#define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
+
+#define dma_pte_read(p) ((p).r)
+#define dma_set_pte_readable(p) do { (p).r = 1; } while (0)
+
+#define dma_pte_write(p) ((p).w)
+#define dma_set_pte_writable(p) do { (p).w = 1; } while (0)
+
+#define dma_pte_addr(p) ((p).addr << PAGE_SHIFT_4K)
+#define dma_set_pte_addr(p, val) \
+    do { (p).addr =  (val) >> PAGE_SHIFT_4K; } while (0)
+
+#define dma_pte_present(p) ((p).r || (p).w)
+
+#define dma_pte_superpage(p) ((p).ps)
+#define dma_set_pte_superpage(p) do { (p).ps = 1; } while (0)
+
+#define dma_pte_snoop(p) ((p).snp)
+#define dma_set_pte_snoop(p)  do { (p).snp = 1; } while (0)
+
+#define dma_clear_pte(p) do { (p).val = 0; } while (0)
 
 /* interrupt remap entry */
 struct iremap_entry {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 02/14] x86/iommu: add common page-table allocator
  2020-08-04 13:41 ` [PATCH v4 02/14] x86/iommu: add common page-table allocator Paul Durrant
@ 2020-08-05 15:39   ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-05 15:39 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, Paul Durrant, Roger Pau Monné, Wei Liu, Andrew Cooper

On 04.08.2020 15:41, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Instead of having separate page table allocation functions in VT-d and AMD
> IOMMU code, we could use a common allocation function in the general x86 code.
> 
> This patch adds a new allocation function, iommu_alloc_pgtable(), for this
> purpose. The function adds the page table pages to a list. The pages in this
> list are then freed by iommu_free_pgtables(), which is called by
> domain_relinquish_resources() after PCI devices have been de-assigned.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  2020-08-04 13:42 ` [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail Paul Durrant
@ 2020-08-05 16:06   ` Jan Beulich
  2020-08-05 16:18     ` Paul Durrant
  2020-08-06 11:41   ` Jan Beulich
  2020-08-14  6:53   ` Tian, Kevin
  2 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2020-08-05 16:06 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant

On 04.08.2020 15:42, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This patch adds a full I/O TLB flush to the error paths of iommu_map() and
> iommu_unmap().
> 
> Without this change callers need constructs such as:
> 
> rc = iommu_map/unmap(...)
> err = iommu_flush(...)
> if ( !rc )
>   rc = err;
> 
> With this change, it can be simplified to:
> 
> rc = iommu_map/unmap(...)
> if ( !rc )
>   rc = iommu_flush(...)
> 
> because, if the map or unmap fails the flush will be unnecessary. This saves
> a stack variable and generally makes the call sites tidier.

I appreciate the intent of tidier code, but I wonder whether this
flushing doesn't go a little too far: There's a need to flush in
general when multiple pages were to be (un)mapped, and there was
at least partial success. Hence e.g. in the order == 0 case I
don't see why any flushing would be needed. Granted errors aren't
commonly expected, but anyway.
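
For illustration only (a sketch, not part of the posted series), the
restriction could take a shape along these lines, assuming the order
parameter is still named page_order here and with the flush-flag handling
corrected as noted below:

    /*
     * Hypothetical: only fall back to a full flush when a multi-page (i.e.
     * potentially partially successful) operation failed.
     */
    if ( unlikely(rc) && page_order &&
         !iommu_iotlb_flush_all(d, *flush_flags) )
        *flush_flags = 0;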

> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -274,6 +274,10 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>          break;
>      }
>  
> +    /* Something went wrong so flush everything and clear flush flags */
> +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )

Both here and in the unmap path, did you get the return value
of iommu_iotlb_flush_all() the wrong way round (i.e. isn't there
a missing ! )?

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  2020-08-05 16:06   ` Jan Beulich
@ 2020-08-05 16:18     ` Paul Durrant
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-05 16:18 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel, 'Paul Durrant'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 August 2020 17:06
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>
> Subject: Re: [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > This patch adds a full I/O TLB flush to the error paths of iommu_map() and
> > iommu_unmap().
> >
> > Without this change callers need constructs such as:
> >
> > rc = iommu_map/unmap(...)
> > err = iommu_flush(...)
> > if ( !rc )
> >   rc = err;
> >
> > With this change, it can be simplified to:
> >
> > rc = iommu_map/unmap(...)
> > if ( !rc )
> >   rc = iommu_flush(...)
> >
> > because, if the map or unmap fails the flush will be unnecessary. This saves
> > a stack variable and generally makes the call sites tidier.
> 
> I appreciate the intent of tidier code, but I wonder whether this
> flushing doesn't go a little too far: There's a need to flush in
> general when multiple pages were to be (un)mapped, and there was
> at least partial success. Hence e.g. in the order == 0 case I
> don't see why any flushing would be needed. Granted errors aren't
> commonly expected, but anyway.
> 

Yes, I wasn't really worried about optimizing the error case, but I can avoid unnecessary flushing in the order 0 case.

> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -274,6 +274,10 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
> >          break;
> >      }
> >
> > +    /* Something went wrong so flush everything and clear flush flags */
> > +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> 
> Both here and in the unmap path, did you get the return value
> of iommu_iotlb_flush_all() the wrong way round (i.e. isn't there
> a missing ! )?
> 

Yes, I think you're right. I'll need to re-work anyway to avoid the flush in the order 0 case.

  Paul

> Jan



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count
  2020-08-04 13:42 ` [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count Paul Durrant
@ 2020-08-06  9:57   ` Jan Beulich
  2020-08-11 11:00     ` Durrant, Paul
  2020-08-14  6:57     ` Tian, Kevin
  0 siblings, 2 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-06  9:57 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jun Nakajima, xen-devel, Volodymyr Babchuk, Roger Pau Monné

On 04.08.2020 15:42, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> At the moment iommu_map() and iommu_unmap() take a page order but not a
> count, whereas iommu_iotlb_flush() takes a count but not a page order.
> This patch simply makes them consistent with each other.

Why can't we do with just a count, where order gets worked out by
functions knowing how to / wanting to deal with higher order pages?
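
By way of illustration only (names and widths here are assumptions, not
what the series posts), a count-only interface might read:

    int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                  unsigned long page_count, unsigned int flags,
                  unsigned int *flush_flags);
    int iommu_unmap(struct domain *d, dfn_t dfn, unsigned long page_count,
                    unsigned int *flush_flags);
    int iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                          unsigned long page_count, unsigned int flush_flags);

with each implementation free to recognise suitably aligned/sized
sub-ranges and use large pages internally.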

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -843,7 +843,7 @@ out:
>           need_modify_vtd_table )
>      {
>          if ( iommu_use_hap_pt(d) )
> -            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order),
> +            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order), 1,

Forgot to drop the "1 << "? (There are then I think two more instances
further down.)

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -851,12 +851,12 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
>  
>          this_cpu(iommu_dont_flush_iotlb) = 0;
>  
> -        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
> +        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), 0, done,

Arguments wrong way round? (This risk of inverting their order is
one of the primary reasons why I think we want just a count.) I'm
also uncertain about the use of 0 vs PAGE_ORDER_4K here.

>                                  IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
>          if ( unlikely(ret) && rc >= 0 )
>              rc = ret;
>  
> -        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
> +        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), 0, done,

Same here then.

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap
  2020-08-04 13:42 ` [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
@ 2020-08-06 10:28   ` Jan Beulich
  2020-08-12  9:36     ` [EXTERNAL] " Paul Durrant
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 10:28 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Paul Durrant, Ian Jackson, George Dunlap,
	Jun Nakajima, xen-devel, Roger Pau Monné

On 04.08.2020 15:42, Paul Durrant wrote:
> The 'legacy' functions do implicit flushing so amend the callers to do the
> appropriate flushing.
> 
> Unfortunately, because of the structure of the P2M code, we cannot remove
> the per-CPU 'iommu_dont_flush_iotlb' global and the optimization it
> facilitates. It is now checked directly in iommu_iotlb_flush(). Also, it is
> now declared as bool (rather than bool_t) and setting/clearing it are no
> longer pointlessly gated on is_iommu_enabled() returning true. (Arguably
> it is also pointless to gate the call to iommu_iotlb_flush() on that
> condition - since it is a no-op in that case - but the if clause allows
> the scope of a stack variable to be restricted).
> 
> NOTE: The code in memory_add() now fails if the number of pages passed to
>       a single call overflows an unsigned int. I don't believe this will
>       ever happen in practice.

I.e. you don't think adding 16Tb of memory in one go is possible?
I wouldn't bet on that ...

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2446,10 +2446,16 @@ static int cleanup_page_mappings(struct page_info *page)
>  
>          if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
>          {
> -            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), PAGE_ORDER_4K);
> +            unsigned int flush_flags = 0;
> +            int err;
>  
> +            err = iommu_unmap(d, _dfn(mfn), PAGE_ORDER_4K, 1, &flush_flags);
>              if ( !rc )
> -                rc = rc2;
> +                rc = err;
> +
> +            err = iommu_iotlb_flush(d, _dfn(mfn), PAGE_ORDER_4K, 1, flush_flags);
> +            if ( !rc )
> +                rc = err;
>          }

Wasn't the earlier change to add flushing in the error case to
allow to simplify code like this to

        if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
        {
            unsigned int flush_flags = 0;
            int err;

            err = iommu_unmap(d, _dfn(mfn), PAGE_ORDER_4K, 1, &flush_flags);
            if ( !err )
                err = iommu_iotlb_flush(d, _dfn(mfn), PAGE_ORDER_4K, 1, flush_flags);
            if ( !rc )
                rc = err;
        }

?

> @@ -1441,9 +1446,16 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
>  
>      if ( !paging_mode_translate(d) )
>      {
> -        if ( !is_iommu_enabled(d) )
> -            return 0;
> -        return iommu_legacy_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K);
> +        unsigned int flush_flags = 0;
> +        int err;
> +
> +        ret = iommu_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K, 1, &flush_flags);
> +
> +        err = iommu_iotlb_flush(d, _dfn(gfn_l), PAGE_ORDER_4K, 1, flush_flags);
> +        if ( !ret )
> +            ret = err;
> +
> +        return ret;
>      }

Similarly here then.

> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1413,21 +1413,22 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>           !iommu_use_hap_pt(hardware_domain) &&
>           !need_iommu_pt_sync(hardware_domain) )
>      {
> -        for ( i = spfn; i < epfn; i++ )
> -            if ( iommu_legacy_map(hardware_domain, _dfn(i), _mfn(i),
> -                                  PAGE_ORDER_4K,
> -                                  IOMMUF_readable | IOMMUF_writable) )
> -                break;
> -        if ( i != epfn )
> -        {
> -            while (i-- > old_max)
> -                /* If statement to satisfy __must_check. */
> -                if ( iommu_legacy_unmap(hardware_domain, _dfn(i),
> -                                        PAGE_ORDER_4K) )
> -                    continue;
> +        unsigned int flush_flags = 0;
> +        unsigned int n = epfn - spfn;
> +        int rc;
>  
> +        ret = -EOVERFLOW;
> +        if ( spfn + n != epfn )
> +            goto destroy_m2p;
> +
> +        rc = iommu_map(hardware_domain, _dfn(i), _mfn(i),
> +                       PAGE_ORDER_4K, n, IOMMUF_readable | IOMMUF_writable,
> +                       &flush_flags);
> +        if ( !rc )
> +            rc = iommu_iotlb_flush(hardware_domain, _dfn(i), PAGE_ORDER_4K, n,
> +                                       flush_flags);
> +        if ( rc )
>              goto destroy_m2p;
> -        }
>      }

Did you mean to use "ret" here instead of introducing "rc"?

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -1225,11 +1225,23 @@ map_grant_ref(
>              kind = IOMMUF_readable;
>          else
>              kind = 0;
> -        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind) )
> +        if ( kind )
>          {
> -            double_gt_unlock(lgt, rgt);
> -            rc = GNTST_general_error;
> -            goto undo_out;
> +            dfn_t dfn = _dfn(mfn_x(mfn));
> +            unsigned int flush_flags = 0;
> +            int err;
> +
> +            err = iommu_map(ld, dfn, mfn, 0, 1, kind, &flush_flags);
> +            if ( !err )
> +                err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);

Question of 0 vs PAGE_ORDER_4K again.

> @@ -1473,21 +1485,25 @@ unmap_common(
>      if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
>      {
>          unsigned int kind;
> +        dfn_t dfn = _dfn(mfn_x(op->mfn));
> +        unsigned int flush_flags = 0;
>          int err = 0;
>  
>          double_gt_lock(lgt, rgt);
>  
>          kind = mapkind(lgt, rd, op->mfn);
>          if ( !kind )
> -            err = iommu_legacy_unmap(ld, _dfn(mfn_x(op->mfn)), 0);
> +            err = iommu_unmap(ld, dfn, 0, 1, &flush_flags);
>          else if ( !(kind & MAPKIND_WRITE) )
> -            err = iommu_legacy_map(ld, _dfn(mfn_x(op->mfn)), op->mfn, 0,
> -                                   IOMMUF_readable);
> -
> -        double_gt_unlock(lgt, rgt);
> +            err = iommu_map(ld, dfn, op->mfn, 0, 1, IOMMUF_readable,
> +                            &flush_flags);
>  
> +        if ( !err )
> +            err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
>          if ( err )
>              rc = GNTST_general_error;
> +
> +        double_gt_unlock(lgt, rgt);
>      }

While moving the unlock ahead of the flush would be somewhat troublesome
in the map case, it seems straightforward here. Even if this gets further
adjusted by a later patch, it should imo be done here - the later patch
may also go in much later.
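
I.e. (a sketch of the reordering, derived from the hunk quoted above):

        kind = mapkind(lgt, rd, op->mfn);
        if ( !kind )
            err = iommu_unmap(ld, dfn, 0, 1, &flush_flags);
        else if ( !(kind & MAPKIND_WRITE) )
            err = iommu_map(ld, dfn, op->mfn, 0, 1, IOMMUF_readable,
                            &flush_flags);

        double_gt_unlock(lgt, rgt);

        if ( !err )
            err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
        if ( err )
            rc = GNTST_general_error;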

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -824,8 +824,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
>      xatp->gpfn += start;
>      xatp->size -= start;
>  
> -    if ( is_iommu_enabled(d) )
> -       this_cpu(iommu_dont_flush_iotlb) = 1;
> +    this_cpu(iommu_dont_flush_iotlb) = true;

Just like you replace the original instance here, ...

> @@ -845,6 +844,8 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
>          }
>      }
>  
> +    this_cpu(iommu_dont_flush_iotlb) = false;
> +
>      if ( is_iommu_enabled(d) )
>      {
>          int ret;

... I'm sure you meant to also remove the original instance from
down below here.

> @@ -364,7 +341,7 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_order,
>      int rc;
>  
>      if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
> -         !page_count || !flush_flags )
> +         !page_count || !flush_flags || this_cpu(iommu_dont_flush_iotlb) )
>          return 0;

The patch description ought to assure the safety of this change: So
far, despite the flag being set, callers of iommu_iotlb_flush() (which
may be unaware of the flag's state) did get what they asked for.
The change relies on there not being any such uses.

> @@ -370,15 +362,12 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
>  
>  /*
>   * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
> - * avoid unecessary iotlb_flush in the low level IOMMU code.
> - *
> - * iommu_map_page/iommu_unmap_page must flush the iotlb but somethimes
> - * this operation can be really expensive. This flag will be set by the
> - * caller to notify the low level IOMMU code to avoid the iotlb flushes.
> - * iommu_iotlb_flush/iommu_iotlb_flush_all will be explicitly called by
> - * the caller.
> + * avoid unecessary IOMMU flushing while updating the P2M.

Correct the spelling of "unnecessary" at the same time?

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  2020-08-04 13:42 ` [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail Paul Durrant
  2020-08-05 16:06   ` Jan Beulich
@ 2020-08-06 11:41   ` Jan Beulich
  2020-08-14  6:53   ` Tian, Kevin
  2 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 11:41 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant

On 04.08.2020 15:42, Paul Durrant wrote:
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -274,6 +274,10 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>          break;
>      }
>  
> +    /* Something went wrong so flush everything and clear flush flags */
> +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> +        flush_flags = 0;

Noticed only while looking at patch 9: There's also an indirection
missing both here and ...

> @@ -330,6 +328,10 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned int page_order,
>          }
>      }
>  
> +    /* Something went wrong so flush everything and clear flush flags */
> +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> +        flush_flags = 0;

... here.
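
Putting both points together, the error path would presumably end up along
the lines of (sketch only):

    /* Something went wrong so flush everything and clear flush flags */
    if ( unlikely(rc) && !iommu_iotlb_flush_all(d, *flush_flags) )
        *flush_flags = 0;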

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 09/14] common/grant_table: batch flush I/O TLB
  2020-08-04 13:42 ` [PATCH v4 09/14] common/grant_table: batch flush I/O TLB Paul Durrant
@ 2020-08-06 11:49   ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 11:49 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Paul Durrant, Ian Jackson, George Dunlap, xen-devel

On 04.08.2020 15:42, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This patch avoids calling iommu_iotlb_flush() for each individual GNTTABOP and
> insteads calls iommu_iotlb_flush_all() at the end of the hypercall. This
> should mean batched map/unmap operations perform better but may be slightly
> detrimental to singleton performance.

I would strongly suggest keeping singleton operations doing single-DFN flushes.

> @@ -1329,20 +1326,30 @@ gnttab_map_grant_ref(
>              return i;

This one line is part of a path which you can't bypass as far as flushing
is concerned. In this regard the description is also slightly misleading:
It's not just "at the end of the hypercall" when flushing needs doing,
but also on every preemption.

>          if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
> -            return -EFAULT;
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
>  
> -        map_grant_ref(&op);
> +        map_grant_ref(&op, &flush_flags);
>  
>          if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
> -            return -EFAULT;
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
>      }
>  
> -    return 0;
> +    err = iommu_iotlb_flush_all(current->domain, flush_flags);
> +    if ( !rc )
> +        rc = err;

Not sure how important it is to retain performance upon errors: Strictly
speaking there's no need to flush when i == 0 and rc != 0.
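
A sketch of that refinement (hypothetical):

    /* If the very first op already failed, nothing was mapped or unmapped. */
    if ( i )
    {
        err = iommu_iotlb_flush_all(current->domain, flush_flags);
        if ( !rc )
            rc = err;
    }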

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 10/14] iommu: remove the share_p2m operation
  2020-08-04 13:42 ` [PATCH v4 10/14] iommu: remove the share_p2m operation Paul Durrant
@ 2020-08-06 12:18   ` Jan Beulich
  2020-08-14  7:04   ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 12:18 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Wei Liu, Andrew Cooper, Paul Durrant, George Dunlap,
	xen-devel, Roger Pau Monné

On 04.08.2020 15:42, Paul Durrant wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -318,6 +318,48 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
>      return pte_maddr;
>  }
>  
> +static uint64_t domain_pgd_maddr(struct domain *d, struct vtd_iommu *iommu)

The 2nd param can be const, and I wonder whether it wouldn't better be
named e.g. "vtd". Then again all you're after is iommu->nr_pt_levels,
so maybe the caller would better pass in that value (removing the
appearance of there being some further dependency about the specific
IOMMU's properties)?
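
I.e. (a sketch of the suggested signature only):

    static uint64_t domain_pgd_maddr(struct domain *d,
                                     unsigned int nr_pt_levels);

with the caller passing iommu->nr_pt_levels explicitly.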

> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    uint64_t pgd_maddr;
> +    unsigned int agaw;
> +
> +    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> +
> +    if ( iommu_use_hap_pt(d) )
> +    {
> +        mfn_t pgd_mfn =
> +            pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> +
> +        return pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));

Why the pagetable -> MFN -> pagetable -> paddr transformation? I.e. just

        return pagetable_get_paddr(p2m_get_pagetable(p2m_get_hostp2m(d)));

? Oh, I've now realized that's how the old code was written.

> +    }
> +
> +    if ( !hd->arch.vtd.pgd_maddr )
> +    {
> +        addr_to_dma_page_maddr(d, 0, 1);
> +
> +        if ( !hd->arch.vtd.pgd_maddr )
> +            return 0;
> +    }
> +
> +    pgd_maddr = hd->arch.vtd.pgd_maddr;
> +
> +    /* Skip top levels of page tables for 2- and 3-level DRHDs. */
> +    for ( agaw = level_to_agaw(4);
> +          agaw != level_to_agaw(iommu->nr_pt_levels);
> +          agaw-- )
> +    {
> +        struct dma_pte *p = map_vtd_domain_page(pgd_maddr);

const?

> +
> +        pgd_maddr = dma_pte_addr(*p);
> +        unmap_vtd_domain_page(p);
> +        if ( !pgd_maddr )
> +            return 0;
> +    }
> +
> +    return pgd_maddr;
> +}

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-08-04 13:42 ` [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
@ 2020-08-06 12:23   ` Jan Beulich
  2020-08-14  7:12   ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 12:23 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant, Kevin Tian, Andrew Cooper

On 04.08.2020 15:42, Paul Durrant wrote:
> @@ -553,14 +549,7 @@ static void iommu_dump_p2m_table(unsigned char key)
>          if ( is_hardware_domain(d) || !is_iommu_enabled(d) )
>              continue;
>  
> -        if ( iommu_use_hap_pt(d) )
> -        {
> -            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d->domain_id);
> -            continue;
> -        }
> -
> -        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);

This (importantish) information was lost.
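
E.g. (a sketch only, wording assumed) the generic dump loop could retain a
per-domain note such as:

        if ( iommu_use_hap_pt(d) )
            printk("\ndomain%d IOMMU page tables shared with MMU\n",
                   d->domain_id);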

> @@ -2624,17 +2624,19 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
>      unmap_vtd_domain_page(pt_vaddr);
>  }
>  
> -static void vtd_dump_p2m_table(struct domain *d)
> +static void vtd_dump_page_tables(struct domain *d)
>  {
> -    const struct domain_iommu *hd;
> +    const struct domain_iommu *hd = dom_iommu(d);
>  
> -    if ( list_empty(&acpi_drhd_units) )
> +    if ( iommu_use_hap_pt(d) )
> +    {
> +        printk("VT-D sharing EPT table\n");
>          return;
> +    }
>  
> -    hd = dom_iommu(d);
> -    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
> -    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
> -                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
> +    printk("VT-D table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));

I think it's commonly VT-d (a mixture of case).

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 12/14] vtd: use a bit field for root_entry
  2020-08-04 13:42 ` [PATCH v4 12/14] vtd: use a bit field for root_entry Paul Durrant
@ 2020-08-06 12:34   ` Jan Beulich
  2020-08-12 13:13     ` Durrant, Paul
  2020-08-14  7:17   ` Tian, Kevin
  1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 12:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant, Kevin Tian

On 04.08.2020 15:42, Paul Durrant wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -184,21 +184,28 @@
>  #define dma_frcd_source_id(c) (c & 0xffff)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>  
> -/*
> - * 0: Present
> - * 1-11: Reserved
> - * 12-63: Context Ptr (12 - (haw-1))
> - * 64-127: Reserved
> - */
>  struct root_entry {
> -    u64    val;
> -    u64    rsvd1;
> +    union {
> +        __uint128_t val;

I couldn't find a use of this field, and I also can't foresee any.
Could it be left out?

> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            uint64_t p:1;

bool?

> +            uint64_t reserved0:11;
> +            uint64_t ctp:52;
> +
> +            /* 64 - 127 */
> +            uint64_t reserved1;
> +        };
> +    };
>  };
> -#define root_present(root)    ((root).val & 1)
> -#define set_root_present(root) do {(root).val |= 1;} while(0)
> -#define get_context_addr(root) ((root).val & PAGE_MASK_4K)
> -#define set_root_value(root, value) \
> -    do {(root).val |= ((value) & PAGE_MASK_4K);} while(0)
> +
> +#define root_present(r) (r).p
> +#define set_root_present(r) do { (r).p = 1; } while (0)

And then "true" here?

> +#define root_ctp(r) ((r).ctp << PAGE_SHIFT_4K)
> +#define set_root_ctp(r, val) \
> +    do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)

For documentation purposes, can the 2nd macro param be named maddr
or some such?
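
I.e. (sketch):

    #define set_root_ctp(r, maddr) \
        do { (r).ctp = ((maddr) >> PAGE_SHIFT_4K); } while (0)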

> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
>  static bool device_in_domain(const struct vtd_iommu *iommu,
>                               const struct pci_dev *pdev, uint16_t did)
>  {
> -    struct root_entry *root_entry;
> -    struct context_entry *ctxt_entry = NULL;
> +    struct root_entry *root_entry, *root_entries = NULL;
> +    struct context_entry *context_entry, *context_entries = NULL;

Just like root_entry, root_entries doesn't look to need an initializer.
I'm unconvinced anyway that you now need two variables each:
unmap_vtd_domain_page() does quite fine with the low 12 bits not all
being zero, afaict.
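
For illustration (a sketch, assuming unmap_vtd_domain_page() really does
ignore the low 12 bits):

    root_entry = map_vtd_domain_page(iommu->root_maddr);
    root_entry += pdev->bus;
    ...
    unmap_vtd_domain_page(root_entry); /* page offset is irrelevant here */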

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 13/14] vtd: use a bit field for context_entry
  2020-08-04 13:42 ` [PATCH v4 13/14] vtd: use a bit field for context_entry Paul Durrant
@ 2020-08-06 12:46   ` Jan Beulich
  2020-08-12 13:47     ` Paul Durrant
  2020-08-14  7:19   ` Tian, Kevin
  1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 12:46 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant, Kevin Tian

On 04.08.2020 15:42, Paul Durrant wrote:
> @@ -208,35 +209,53 @@ struct root_entry {
>      do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
>  
>  struct context_entry {
> -    u64 lo;
> -    u64 hi;
> +    union {
> +        __uint128_t val;
> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            uint64_t p:1;
> +            uint64_t fpd:1;
> +            uint64_t tt:2;
> +            uint64_t reserved0:8;
> +            uint64_t slptp:52;
> +
> +            /* 64 - 127 */
> +            uint64_t aw:3;
> +            uint64_t ignored:4;
> +            uint64_t reserved1:1;
> +            uint64_t did:16;
> +            uint64_t reserved2:40;
> +        };
> +    };
>  };
> -#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
> -#define context_present(c) ((c).lo & 1)
> -#define context_fault_disable(c) (((c).lo >> 1) & 1)
> -#define context_translation_type(c) (((c).lo >> 2) & 3)
> -#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
> -#define context_address_width(c) ((c).hi &  7)
> -#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
> -
> -#define context_set_present(c) do {(c).lo |= 1;} while(0)
> -#define context_clear_present(c) do {(c).lo &= ~1;} while(0)
> -#define context_set_fault_enable(c) \
> -    do {(c).lo &= (((u64)-1) << 2) | 1;} while(0)
> -
> -#define context_set_translation_type(c, val) do { \
> -        (c).lo &= (((u64)-1) << 4) | 3; \
> -        (c).lo |= (val & 3) << 2; \
> -    } while(0)
> +
> +#define context_present(c) (c).p
> +#define context_set_present(c) do { (c).p = 1; } while (0)
> +#define context_clear_present(c) do { (c).p = 0; } while (0)
> +
> +#define context_fault_disable(c) (c).fpd
> +#define context_set_fault_enable(c) do { (c).fpd = 1; } while (0)
> +
> +#define context_translation_type(c) (c).tt
> +#define context_set_translation_type(c, val) do { (c).tt = val; } while (0)
>  #define CONTEXT_TT_MULTI_LEVEL 0
>  #define CONTEXT_TT_DEV_IOTLB   1
>  #define CONTEXT_TT_PASS_THRU   2
>  
> -#define context_set_address_root(c, val) \
> -    do {(c).lo &= 0xfff; (c).lo |= (val) & PAGE_MASK_4K ;} while(0)
> +#define context_slptp(c) ((c).slptp << PAGE_SHIFT_4K)
> +#define context_set_slptp(c, val) \
> +    do { (c).slptp = (val) >> PAGE_SHIFT_4K; } while (0)

Presumably "slptp" is in line with the doc, but "address_root" is
quite a bit more readable. I wonder if I could talk you into
restoring the old (or some similar) names.

More generally, and more so here than perhaps already on the previous
patch - are these helper macros useful to have anymore?

> +#define context_address_width(c) (c).aw
>  #define context_set_address_width(c, val) \
> -    do {(c).hi &= 0xfffffff8; (c).hi |= (val) & 7;} while(0)
> -#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while(0)
> +    do { (c).aw = (val); } while (0)
> +
> +#define context_did(c) (c).did
> +#define context_set_did(c, val) \
> +    do { (c).did = (val); } while (0)
> +
> +#define context_clear_entry(c) do { (c).val = 0; } while (0)

While this is in line with previous code, I'm concerned:
domain_context_unmap_one() has

    context_clear_present(*context);
    context_clear_entry(*context);

No barrier means no guarantee of ordering. I'd drop clear_present()
here and make clear_entry() properly ordered. This, I think, will at
the same time render the __uint128_t field unused and hence
unnecessary again.

Also comments given on the previous patch apply respectively here.

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 14/14] vtd: use a bit field for dma_pte
  2020-08-04 13:42 ` [PATCH v4 14/14] vtd: use a bit field for dma_pte Paul Durrant
@ 2020-08-06 12:53   ` Jan Beulich
  2020-08-12 13:49     ` Paul Durrant
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2020-08-06 12:53 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Paul Durrant, Kevin Tian

On 04.08.2020 15:42, Paul Durrant wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1772,13 +1772,14 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
>      old = *pte;
>  
>      dma_set_pte_addr(new, mfn_to_maddr(mfn));
> -    dma_set_pte_prot(new,
> -                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> -                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> +    if ( flags & IOMMUF_readable )
> +        dma_set_pte_readable(new);
> +    if ( flags & IOMMUF_writable )
> +        dma_set_pte_writable(new);
>  
>      /* Set the SNP on leaf page table if Snoop Control available */
>      if ( iommu_snoop )
> -        dma_set_pte_snp(new);
> +        dma_set_pte_snoop(new);

Perhaps simply use an initializer:

    new = (struct dma_pte){
            .r = !!(flags & IOMMUF_readable),
            .w = !!(flags & IOMMUF_writable),
            .snp = iommu_snoop,
            .addr = mfn_x(mfn),
        };

? This also points out that the "addr" field isn't really an address,
and hence may want renaming.

Again comments on the two earlier patches apply here respectively (or
else part of the suggestion above isn't going to work as is).

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count
  2020-08-06  9:57   ` Jan Beulich
@ 2020-08-11 11:00     ` Durrant, Paul
  2020-08-14  6:57     ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Durrant, Paul @ 2020-08-11 11:00 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Andrew Cooper, Ian Jackson, George Dunlap, Jun Nakajima,
	xen-devel, Volodymyr Babchuk, Roger Pau Monné

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 August 2020 10:57
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul <pdurrant@amazon.co.uk>; Jun Nakajima
> <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>;
> Stefano Stabellini <sstabellini@kernel.org>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: RE: [EXTERNAL] [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a
> count
> 
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > At the moment iommu_map() and iommu_unmap() take a page order but not a
> > count, whereas iommu_iotlb_flush() takes a count but not a page order.
> > This patch simply makes them consistent with each other.
> 
> Why can't we do with just a count, where order gets worked out by
> functions knowing how to / wanting to deal with higher order pages?
> 

Yes, that may well be better. The order of the CPU mappings isn't really relevant in cases where the IOMMU uses different page orders. I'll just move everything over to a page count.

  Paul

> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -843,7 +843,7 @@ out:
> >           need_modify_vtd_table )
> >      {
> >          if ( iommu_use_hap_pt(d) )
> > -            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order),
> > +            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order), 1,
> 
> Forgot to drop the "1 << "? (There are then I think two more instances
> further down.)
> 
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -851,12 +851,12 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
> >
> >          this_cpu(iommu_dont_flush_iotlb) = 0;
> >
> > -        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
> > +        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), 0, done,
> 
> Arguments wrong way round? (This risk of inverting their order is
> one of the primary reasons why I think we want just a count.) I'm
> also uncertain about the use of 0 vs PAGE_ORDER_4K here.
> 
> >                                  IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
> >          if ( unlikely(ret) && rc >= 0 )
> >              rc = ret;
> >
> > -        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
> > +        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), 0, done,
> 
> Same here then.
> 
> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [EXTERNAL] [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap
  2020-08-06 10:28   ` Jan Beulich
@ 2020-08-12  9:36     ` Paul Durrant
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-12  9:36 UTC (permalink / raw)
  To: 'Jan Beulich', 'Paul Durrant'
  Cc: xen-devel, 'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'George Dunlap',
	'Ian Jackson', 'Julien Grall',
	'Stefano Stabellini', 'Jun Nakajima',
	'Kevin Tian'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 August 2020 11:29
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul <pdurrant@amazon.co.uk>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>; George
> Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall
> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap
> 
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > The 'legacy' functions do implicit flushing so amend the callers to do the
> > appropriate flushing.
> >
> > Unfortunately, because of the structure of the P2M code, we cannot remove
> > the per-CPU 'iommu_dont_flush_iotlb' global and the optimization it
> > facilitates. It is now checked directly in iommu_iotlb_flush(). Also, it is
> > now declared as bool (rather than bool_t) and setting/clearing it are no
> > longer pointlessly gated on is_iommu_enabled() returning true. (Arguably
> > it is also pointless to gate the call to iommu_iotlb_flush() on that
> > condition - since it is a no-op in that case - but the if clause allows
> > the scope of a stack variable to be restricted).
> >
> > NOTE: The code in memory_add() now fails if the number of pages passed to
> >       a single call overflows an unsigned int. I don't believe this will
> >       ever happen in practice.
> 
> I.e. you don't think adding 16Tb of memory in one go is possible?
> I wouldn't bet on that ...
> 

I've re-worked previous patches to use unsigned long so I don't need this restriction any more.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2446,10 +2446,16 @@ static int cleanup_page_mappings(struct page_info *page)
> >
> >          if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
> >          {
> > -            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), PAGE_ORDER_4K);
> > +            unsigned int flush_flags = 0;
> > +            int err;
> >
> > +            err = iommu_unmap(d, _dfn(mfn), PAGE_ORDER_4K, 1, &flush_flags);
> >              if ( !rc )
> > -                rc = rc2;
> > +                rc = err;
> > +
> > +            err = iommu_iotlb_flush(d, _dfn(mfn), PAGE_ORDER_4K, 1, flush_flags);
> > +            if ( !rc )
> > +                rc = err;
> >          }
> 
> Wasn't the earlier change to add flushing in the error case to
> allow to simplify code like this to
> 
>         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
>         {
>             unsigned int flush_flags = 0;
>             int err;
> 
>             err = iommu_unmap(d, _dfn(mfn), PAGE_ORDER_4K, 1, &flush_flags);
>             if ( !err )
>                 err = iommu_iotlb_flush(d, _dfn(mfn), PAGE_ORDER_4K, 1, flush_flags);
>             if ( !rc )
>                 rc = err;
>         }
> 
> ?

Yes.

> 
> > @@ -1441,9 +1446,16 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
> >
> >      if ( !paging_mode_translate(d) )
> >      {
> > -        if ( !is_iommu_enabled(d) )
> > -            return 0;
> > -        return iommu_legacy_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K);
> > +        unsigned int flush_flags = 0;
> > +        int err;
> > +
> > +        ret = iommu_unmap(d, _dfn(gfn_l), PAGE_ORDER_4K, 1, &flush_flags);
> > +
> > +        err = iommu_iotlb_flush(d, _dfn(gfn_l), PAGE_ORDER_4K, 1, flush_flags);
> > +        if ( !ret )
> > +            ret = err;
> > +
> > +        return ret;
> >      }
> 
> Similarly here then.
> 

Yes.

> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -1413,21 +1413,22 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
> >           !iommu_use_hap_pt(hardware_domain) &&
> >           !need_iommu_pt_sync(hardware_domain) )
> >      {
> > -        for ( i = spfn; i < epfn; i++ )
> > -            if ( iommu_legacy_map(hardware_domain, _dfn(i), _mfn(i),
> > -                                  PAGE_ORDER_4K,
> > -                                  IOMMUF_readable | IOMMUF_writable) )
> > -                break;
> > -        if ( i != epfn )
> > -        {
> > -            while (i-- > old_max)
> > -                /* If statement to satisfy __must_check. */
> > -                if ( iommu_legacy_unmap(hardware_domain, _dfn(i),
> > -                                        PAGE_ORDER_4K) )
> > -                    continue;
> > +        unsigned int flush_flags = 0;
> > +        unsigned int n = epfn - spfn;
> > +        int rc;
> >
> > +        ret = -EOVERFLOW;
> > +        if ( spfn + n != epfn )
> > +            goto destroy_m2p;
> > +
> > +        rc = iommu_map(hardware_domain, _dfn(i), _mfn(i),
> > +                       PAGE_ORDER_4K, n, IOMMUF_readable | IOMMUF_writable,
> > +                       &flush_flags);
> > +        if ( !rc )
> > +            rc = iommu_iotlb_flush(hardware_domain, _dfn(i), PAGE_ORDER_4K, n,
> > +                                       flush_flags);
> > +        if ( rc )
> >              goto destroy_m2p;
> > -        }
> >      }
> 
> Did you mean to use "ret" here instead of introducing "rc"?
> 

The previous code did not set ret in the case of an iommu op failure but that does appear to be a mistake. I will use ret, as you suggest, but I will call it out in the commit description too.

> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -1225,11 +1225,23 @@ map_grant_ref(
> >              kind = IOMMUF_readable;
> >          else
> >              kind = 0;
> > -        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind) )
> > +        if ( kind )
> >          {
> > -            double_gt_unlock(lgt, rgt);
> > -            rc = GNTST_general_error;
> > -            goto undo_out;
> > +            dfn_t dfn = _dfn(mfn_x(mfn));
> > +            unsigned int flush_flags = 0;
> > +            int err;
> > +
> > +            err = iommu_map(ld, dfn, mfn, 0, 1, kind, &flush_flags);
> > +            if ( !err )
> > +                err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
> 
> Question of 0 vs PAGE_ORDER_4K again.
> 
> > @@ -1473,21 +1485,25 @@ unmap_common(
> >      if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
> >      {
> >          unsigned int kind;
> > +        dfn_t dfn = _dfn(mfn_x(op->mfn));
> > +        unsigned int flush_flags = 0;
> >          int err = 0;
> >
> >          double_gt_lock(lgt, rgt);
> >
> >          kind = mapkind(lgt, rd, op->mfn);
> >          if ( !kind )
> > -            err = iommu_legacy_unmap(ld, _dfn(mfn_x(op->mfn)), 0);
> > +            err = iommu_unmap(ld, dfn, 0, 1, &flush_flags);
> >          else if ( !(kind & MAPKIND_WRITE) )
> > -            err = iommu_legacy_map(ld, _dfn(mfn_x(op->mfn)), op->mfn, 0,
> > -                                   IOMMUF_readable);
> > -
> > -        double_gt_unlock(lgt, rgt);
> > +            err = iommu_map(ld, dfn, op->mfn, 0, 1, IOMMUF_readable,
> > +                            &flush_flags);
> >
> > +        if ( !err )
> > +            err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
> >          if ( err )
> >              rc = GNTST_general_error;
> > +
> > +        double_gt_unlock(lgt, rgt);
> >      }
> 
> While moving the unlock ahead of the flush would be somewhat troublesome
> in the map case, it seems straightforward here. Even if this gets further
> adjusted by a later patch, it should imo be done here - the later patch
> may also go in much later.
> 

Ok.
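
Concretely, the reordering I have in mind is something like this (a sketch only; the exact form will be in the next version):

    kind = mapkind(lgt, rd, op->mfn);
    if ( !kind )
        err = iommu_unmap(ld, dfn, 0, 1, &flush_flags);
    else if ( !(kind & MAPKIND_WRITE) )
        err = iommu_map(ld, dfn, op->mfn, 0, 1, IOMMUF_readable,
                        &flush_flags);

    /* The flush itself doesn't need the grant table locks... */
    double_gt_unlock(lgt, rgt);

    /* ... so issue it (and check the overall result) after dropping them. */
    if ( !err )
        err = iommu_iotlb_flush(ld, dfn, 0, 1, flush_flags);
    if ( err )
        rc = GNTST_general_error;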

> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -824,8 +824,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
> >      xatp->gpfn += start;
> >      xatp->size -= start;
> >
> > -    if ( is_iommu_enabled(d) )
> > -       this_cpu(iommu_dont_flush_iotlb) = 1;
> > +    this_cpu(iommu_dont_flush_iotlb) = true;
> 
> Just like you replace the original instance here, ...
> 
> > @@ -845,6 +844,8 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
> >          }
> >      }
> >
> > +    this_cpu(iommu_dont_flush_iotlb) = false;
> > +
> >      if ( is_iommu_enabled(d) )
> >      {
> >          int ret;
> 
> ... I'm sure you meant to also remove the original instance from
> down below here.

I did indeed. Thanks for spotting.

> 
> > @@ -364,7 +341,7 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned int page_order,
> >      int rc;
> >
> >      if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
> > -         !page_count || !flush_flags )
> > +         !page_count || !flush_flags || this_cpu(iommu_dont_flush_iotlb) )
> >          return 0;
> 
> The patch description ought to assure the safety of this change: so
> far, even with the flag set, callers of iommu_iotlb_flush() (which
> may be unaware of the flag's state) did get what they asked for.
> The change relies on there not being any such uses.
> 

Ok, I'll call it out.

> > @@ -370,15 +362,12 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
> >
> >  /*
> >   * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
> > - * avoid unecessary iotlb_flush in the low level IOMMU code.
> > - *
> > - * iommu_map_page/iommu_unmap_page must flush the iotlb but somethimes
> > - * this operation can be really expensive. This flag will be set by the
> > - * caller to notify the low level IOMMU code to avoid the iotlb flushes.
> > - * iommu_iotlb_flush/iommu_iotlb_flush_all will be explicitly called by
> > - * the caller.
> > + * avoid unecessary IOMMU flushing while updating the P2M.
> 
> Correct the spelling of "unnecessary" at the same time?
> 

Oh yes. Will do.

  Paul

> Jan



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 12/14] vtd: use a bit field for root_entry
  2020-08-06 12:34   ` Jan Beulich
@ 2020-08-12 13:13     ` Durrant, Paul
  2020-08-18  8:27       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Durrant, Paul @ 2020-08-12 13:13 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant; +Cc: xen-devel, Kevin Tian

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 August 2020 13:34
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul <pdurrant@amazon.co.uk>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: RE: [EXTERNAL] [PATCH v4 12/14] vtd: use a bit field for root_entry
> 
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -184,21 +184,28 @@
> >  #define dma_frcd_source_id(c) (c & 0xffff)
> >  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
> >
> > -/*
> > - * 0: Present
> > - * 1-11: Reserved
> > - * 12-63: Context Ptr (12 - (haw-1))
> > - * 64-127: Reserved
> > - */
> >  struct root_entry {
> > -    u64    val;
> > -    u64    rsvd1;
> > +    union {
> > +        __uint128_t val;
> 
> I couldn't find a use of this field, and I also can't foresee any.
> Could it be left out?

Yes, probably.

> 
> > +        struct { uint64_t lo, hi; };
> > +        struct {
> > +            /* 0 - 63 */
> > +            uint64_t p:1;
> 
> bool?
> 

I'd prefer not to. One of the points of using a bit field (at least from my PoV) is that it makes referring back to the spec much easier: using uint64_t types consistently means the bit widths can be straightforwardly summed to give the bit offsets stated in the spec.
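
To illustrate (purely for this discussion, not part of the patch), annotating the low half of the root entry shows how the widths sum directly to the spec's bit positions:

    struct {
        uint64_t p:1;          /* bit 0        */
        uint64_t reserved0:11; /* bits 1 - 11  */
        uint64_t ctp:52;       /* bits 12 - 63 */
    };

A bool for 'p' would certainly read more naturally, but keeping every member uint64_t keeps that arithmetic trivial to check against the spec.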

> > +            uint64_t reserved0:11;
> > +            uint64_t ctp:52;
> > +
> > +            /* 64 - 127 */
> > +            uint64_t reserved1;
> > +        };
> > +    };
> >  };
> > -#define root_present(root)    ((root).val & 1)
> > -#define set_root_present(root) do {(root).val |= 1;} while(0)
> > -#define get_context_addr(root) ((root).val & PAGE_MASK_4K)
> > -#define set_root_value(root, value) \
> > -    do {(root).val |= ((value) & PAGE_MASK_4K);} while(0)
> > +
> > +#define root_present(r) (r).p
> > +#define set_root_present(r) do { (r).p = 1; } while (0)
> 
> And then "true" here?
> 
> > +#define root_ctp(r) ((r).ctp << PAGE_SHIFT_4K)
> > +#define set_root_ctp(r, val) \
> > +    do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
> 
> For documentation purposes, can the 2nd macro param be named maddr
> or some such?
> 

Sure.

> > --- a/xen/drivers/passthrough/vtd/x86/ats.c
> > +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> > @@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
> >  static bool device_in_domain(const struct vtd_iommu *iommu,
> >                               const struct pci_dev *pdev, uint16_t did)
> >  {
> > -    struct root_entry *root_entry;
> > -    struct context_entry *ctxt_entry = NULL;
> > +    struct root_entry *root_entry, *root_entries = NULL;
> > +    struct context_entry *context_entry, *context_entries = NULL;
> 
> Just like root_entry, root_entries doesn't look to need an initializer.
> I'm unconvinced anyway that you now need two variables each:
> unmap_vtd_domain_page() does quite fine with the low 12 bits not all
> being zero, afaict.

Not passing a page-aligned address into something that unmaps a page seems a little fragile, e.g. if someone happened to add a check in future. I'll see if I can drop the initializer though.
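
To spell out why I used two variables each, the shape I have in mind is roughly the following (a sketch from memory, so names are approximate):

    root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
    root_entry = &root_entries[pdev->bus];   /* not page aligned */
    ...
    unmap_vtd_domain_page(root_entries);     /* always page aligned */

i.e. the '*_entries' pointer keeps hold of the page-aligned mapping, and that is what gets passed back to unmap_vtd_domain_page().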

  Paul

> 
> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 13/14] vtd: use a bit field for context_entry
  2020-08-06 12:46   ` Jan Beulich
@ 2020-08-12 13:47     ` Paul Durrant
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-12 13:47 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: xen-devel, 'Paul Durrant', 'Kevin Tian'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 August 2020 13:47
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: Re: [PATCH v4 13/14] vtd: use a bit field for context_entry
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > @@ -208,35 +209,53 @@ struct root_entry {
> >      do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
> >
> >  struct context_entry {
> > -    u64 lo;
> > -    u64 hi;
> > +    union {
> > +        __uint128_t val;
> > +        struct { uint64_t lo, hi; };
> > +        struct {
> > +            /* 0 - 63 */
> > +            uint64_t p:1;
> > +            uint64_t fpd:1;
> > +            uint64_t tt:2;
> > +            uint64_t reserved0:8;
> > +            uint64_t slptp:52;
> > +
> > +            /* 64 - 127 */
> > +            uint64_t aw:3;
> > +            uint64_t ignored:4;
> > +            uint64_t reserved1:1;
> > +            uint64_t did:16;
> > +            uint64_t reserved2:40;
> > +        };
> > +    };
> >  };
> > -#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
> > -#define context_present(c) ((c).lo & 1)
> > -#define context_fault_disable(c) (((c).lo >> 1) & 1)
> > -#define context_translation_type(c) (((c).lo >> 2) & 3)
> > -#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
> > -#define context_address_width(c) ((c).hi &  7)
> > -#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
> > -
> > -#define context_set_present(c) do {(c).lo |= 1;} while(0)
> > -#define context_clear_present(c) do {(c).lo &= ~1;} while(0)
> > -#define context_set_fault_enable(c) \
> > -    do {(c).lo &= (((u64)-1) << 2) | 1;} while(0)
> > -
> > -#define context_set_translation_type(c, val) do { \
> > -        (c).lo &= (((u64)-1) << 4) | 3; \
> > -        (c).lo |= (val & 3) << 2; \
> > -    } while(0)
> > +
> > +#define context_present(c) (c).p
> > +#define context_set_present(c) do { (c).p = 1; } while (0)
> > +#define context_clear_present(c) do { (c).p = 0; } while (0)
> > +
> > +#define context_fault_disable(c) (c).fpd
> > +#define context_set_fault_enable(c) do { (c).fpd = 1; } while (0)
> > +
> > +#define context_translation_type(c) (c).tt
> > +#define context_set_translation_type(c, val) do { (c).tt = val; } while (0)
> >  #define CONTEXT_TT_MULTI_LEVEL 0
> >  #define CONTEXT_TT_DEV_IOTLB   1
> >  #define CONTEXT_TT_PASS_THRU   2
> >
> > -#define context_set_address_root(c, val) \
> > -    do {(c).lo &= 0xfff; (c).lo |= (val) & PAGE_MASK_4K ;} while(0)
> > +#define context_slptp(c) ((c).slptp << PAGE_SHIFT_4K)
> > +#define context_set_slptp(c, val) \
> > +    do { (c).slptp = (val) >> PAGE_SHIFT_4K; } while (0)
> 
> Presumably "slptp" is in line with the doc, but "address_root" is
> quite a bit more readable. I wonder if I could talk you into
> restoring the old (or some similar) names.

The problem with 'root' in the VT-d code is that it is ambiguous between this case and manipulations of 'root entries', which is why I moved away from it. The spec refers to 'slptptr' but I shortened it to 'slptp' for consistency with the root entry's 'ctp'... I should really use the name from the spec, though.
I will also add a comment above the macro stating what the 'slptptr' is.
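
i.e. something along the lines of (a sketch only; exact naming and wording to be settled in the next version):

    /*
     * Second Level Page Translation Pointer (SLPTPTR in the VT-d spec.):
     * host physical address of the base of the second-level page table
     * used to translate DMA addresses for this context.
     */
    #define context_slptptr(c) ((c).slptptr << PAGE_SHIFT_4K)
    #define context_set_slptptr(c, maddr) \
        do { (c).slptptr = (maddr) >> PAGE_SHIFT_4K; } while (0)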

> 
> More generally, and more so here than perhaps already on the previous
> patch - are these helper macros useful to have anymore?
> 

They are certainly less useful now. I was worried that ditching them would cause the patches to balloon in size, but maybe they won't... I'll see.
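
Most of them now just collapse to a direct member access anyway, e.g. (illustration only, with 'context' being a struct context_entry pointer):

    context_set_present(*context);      /* ...is just... */
    context->p = 1;

    context_set_did(*context, domid);   /* ...is just... */
    context->did = domid;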

> > +#define context_address_width(c) (c).aw
> >  #define context_set_address_width(c, val) \
> > -    do {(c).hi &= 0xfffffff8; (c).hi |= (val) & 7;} while(0)
> > -#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while(0)
> > +    do { (c).aw = (val); } while (0)
> > +
> > +#define context_did(c) (c).did
> > +#define context_set_did(c, val) \
> > +    do { (c).did = (val); } while (0)
> > +
> > +#define context_clear_entry(c) do { (c).val = 0; } while (0)
> 
> While this is in line with previous code, I'm concerned:
> domain_context_unmap_one() has
> 
>     context_clear_present(*context);
>     context_clear_entry(*context);
> 
> No barrier means no guarantee of ordering. I'd drop clear_present()
> here and make clear_entry() properly ordered. This, I think, will at
> the same time render the __uint128_t field unused and hence
> unnecessary again.

I'd prefer to keep both with a barrier, particularly if I get rid of the macros.
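
i.e. roughly (a sketch only; whether iommu_sync_cache() alone provides a strong enough ordering guarantee here still needs checking):

    context_clear_present(*context);
    /* Make sure P=0 reaches memory before the rest of the entry is zapped. */
    iommu_sync_cache(context, sizeof(struct context_entry));
    context_clear_entry(*context);
    iommu_sync_cache(context, sizeof(struct context_entry));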

  Paul

> 
> Also comments given on the previous patch apply respectively here.
> 
> Jan



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 14/14] vtd: use a bit field for dma_pte
  2020-08-06 12:53   ` Jan Beulich
@ 2020-08-12 13:49     ` Paul Durrant
  0 siblings, 0 replies; 43+ messages in thread
From: Paul Durrant @ 2020-08-12 13:49 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: xen-devel, 'Paul Durrant', 'Kevin Tian'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 06 August 2020 13:54
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Kevin Tian
> <kevin.tian@intel.com>
> Subject: Re: [PATCH v4 14/14] vtd: use a bit field for dma_pte
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1772,13 +1772,14 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
> >      old = *pte;
> >
> >      dma_set_pte_addr(new, mfn_to_maddr(mfn));
> > -    dma_set_pte_prot(new,
> > -                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> > -                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> > +    if ( flags & IOMMUF_readable )
> > +        dma_set_pte_readable(new);
> > +    if ( flags & IOMMUF_writable )
> > +        dma_set_pte_writable(new);
> >
> >      /* Set the SNP on leaf page table if Snoop Control available */
> >      if ( iommu_snoop )
> > -        dma_set_pte_snp(new);
> > +        dma_set_pte_snoop(new);
> 
> Perhaps simply use an initializer:
> 
>     new = (struct dma_pte){
>             .r = flags & IOMMUF_readable,
>             .w = flags & IOMMUF_writable,
>             .snp = iommu_snoop,
>             .addr = mfn_x(mfn),
>         };
> 
> ? This also points out that the "addr" field isn't really an address,
> and hence may want renaming.

If I am getting rid of the macros then this makes more sense.

  Paul

> 
> Again, comments on the two earlier patches apply here respectively (or
> else part of the suggestion above isn't going to work as is).
> 
> Jan



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields...
  2020-08-04 13:41 ` [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
@ 2020-08-14  6:14   ` Tian, Kevin
  0 siblings, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  6:14 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Paul Durrant, Jan Beulich, Lukasz Hawrylko, Andrew Cooper,
	Wei Liu, Roger Pau Monné

> From: Paul Durrant <paul@xen.org>
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> ... from those specific to VT-d or AMD IOMMU, and put the latter in a union.
> 
> There is no functional change in this patch, although the initialization of
> the 'mapped_rmrrs' list occurs slightly later in iommu_domain_init() since
> it is now done (correctly) in VT-d specific code rather than in general x86
> code.
> 
> NOTE: I have not combined the AMD IOMMU 'root_table' and VT-d
> 'pgd_maddr'
>       fields even though they perform essentially the same function. The
>       concept of 'root table' in the VT-d code is different from that in the
>       AMD code so attempting to use a common name will probably only serve
>       to confuse the reader.

"root table" in VT-d is an architecture definition in spec. But I didn't
see the same term in AMD IOMMU spec (rev3.00). Instead, it mentions
I/O page tables in many places. It gave me the impression that 'root
table' in AMD code is just a software term, thus replacing it with a
a common name makes more sense.

But even if it's the right thing to do, it could come as a separate patch, thus:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
> Cc: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wl@xen.org>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v4:
>  - Fix format specifier as requested by Jan
> 
> v2:
>  - s/amd_iommu/amd
>  - Definitions still left inline as re-arrangement into implementation
>    headers is non-trivial
>  - Also s/u64/uint64_t and s/int/unsigned int
> ---
>  xen/arch/x86/tboot.c                        |  4 +-
>  xen/drivers/passthrough/amd/iommu_guest.c   |  8 ++--
>  xen/drivers/passthrough/amd/iommu_map.c     | 14 +++---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c | 35 +++++++-------
>  xen/drivers/passthrough/vtd/iommu.c         | 53 +++++++++++----------
>  xen/drivers/passthrough/x86/iommu.c         |  1 -
>  xen/include/asm-x86/iommu.h                 | 27 +++++++----
>  7 files changed, 78 insertions(+), 64 deletions(-)
> 
> diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
> index 320e06f129..e66b0940c4 100644
> --- a/xen/arch/x86/tboot.c
> +++ b/xen/arch/x86/tboot.c
> @@ -230,8 +230,8 @@ static void tboot_gen_domain_integrity(const uint8_t
> key[TB_KEY_SIZE],
>          {
>              const struct domain_iommu *dio = dom_iommu(d);
> 
> -            update_iommu_mac(&ctx, dio->arch.pgd_maddr,
> -                             agaw_to_level(dio->arch.agaw));
> +            update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr,
> +                             agaw_to_level(dio->arch.vtd.agaw));
>          }
>      }
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_guest.c
> b/xen/drivers/passthrough/amd/iommu_guest.c
> index 014a72a54b..30b7353cd6 100644
> --- a/xen/drivers/passthrough/amd/iommu_guest.c
> +++ b/xen/drivers/passthrough/amd/iommu_guest.c
> @@ -50,12 +50,12 @@ static uint16_t guest_bdf(struct domain *d, uint16_t
> machine_bdf)
> 
>  static inline struct guest_iommu *domain_iommu(struct domain *d)
>  {
> -    return dom_iommu(d)->arch.g_iommu;
> +    return dom_iommu(d)->arch.amd.g_iommu;
>  }
> 
>  static inline struct guest_iommu *vcpu_iommu(struct vcpu *v)
>  {
> -    return dom_iommu(v->domain)->arch.g_iommu;
> +    return dom_iommu(v->domain)->arch.amd.g_iommu;
>  }
> 
>  static void guest_iommu_enable(struct guest_iommu *iommu)
> @@ -823,7 +823,7 @@ int guest_iommu_init(struct domain* d)
>      guest_iommu_reg_init(iommu);
>      iommu->mmio_base = ~0ULL;
>      iommu->domain = d;
> -    hd->arch.g_iommu = iommu;
> +    hd->arch.amd.g_iommu = iommu;
> 
>      tasklet_init(&iommu->cmd_buffer_tasklet,
> guest_iommu_process_command, d);
> 
> @@ -845,5 +845,5 @@ void guest_iommu_destroy(struct domain *d)
>      tasklet_kill(&iommu->cmd_buffer_tasklet);
>      xfree(iommu);
> 
> -    dom_iommu(d)->arch.g_iommu = NULL;
> +    dom_iommu(d)->arch.amd.g_iommu = NULL;
>  }
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c
> b/xen/drivers/passthrough/amd/iommu_map.c
> index 93e96cd69c..47b4472e8a 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -180,8 +180,8 @@ static int iommu_pde_from_dfn(struct domain *d,
> unsigned long dfn,
>      struct page_info *table;
>      const struct domain_iommu *hd = dom_iommu(d);
> 
> -    table = hd->arch.root_table;
> -    level = hd->arch.paging_mode;
> +    table = hd->arch.amd.root_table;
> +    level = hd->arch.amd.paging_mode;
> 
>      BUG_ON( table == NULL || level < 1 || level > 6 );
> 
> @@ -325,7 +325,7 @@ int amd_iommu_unmap_page(struct domain *d,
> dfn_t dfn,
> 
>      spin_lock(&hd->arch.mapping_lock);
> 
> -    if ( !hd->arch.root_table )
> +    if ( !hd->arch.amd.root_table )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
>          return 0;
> @@ -450,7 +450,7 @@ int __init amd_iommu_quarantine_init(struct
> domain *d)
>      unsigned int level = amd_iommu_get_paging_mode(end_gfn);
>      struct amd_iommu_pte *table;
> 
> -    if ( hd->arch.root_table )
> +    if ( hd->arch.amd.root_table )
>      {
>          ASSERT_UNREACHABLE();
>          return 0;
> @@ -458,11 +458,11 @@ int __init amd_iommu_quarantine_init(struct
> domain *d)
> 
>      spin_lock(&hd->arch.mapping_lock);
> 
> -    hd->arch.root_table = alloc_amd_iommu_pgtable();
> -    if ( !hd->arch.root_table )
> +    hd->arch.amd.root_table = alloc_amd_iommu_pgtable();
> +    if ( !hd->arch.amd.root_table )
>          goto out;
> 
> -    table = __map_domain_page(hd->arch.root_table);
> +    table = __map_domain_page(hd->arch.amd.root_table);
>      while ( level )
>      {
>          struct page_info *pg;
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 5f5f4a2eac..09a05f9d75 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -91,7 +91,8 @@ static void amd_iommu_setup_domain_device(
>      u8 bus = pdev->bus;
>      const struct domain_iommu *hd = dom_iommu(domain);
> 
> -    BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
> +    BUG_ON( !hd->arch.amd.root_table ||
> +            !hd->arch.amd.paging_mode ||
>              !iommu->dev_table.buffer );
> 
>      if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
> @@ -110,8 +111,8 @@ static void amd_iommu_setup_domain_device(
> 
>          /* bind DTE to domain page-tables */
>          amd_iommu_set_root_page_table(
> -            dte, page_to_maddr(hd->arch.root_table), domain->domain_id,
> -            hd->arch.paging_mode, valid);
> +            dte, page_to_maddr(hd->arch.amd.root_table),
> +            domain->domain_id, hd->arch.amd.paging_mode, valid);
> 
>          /* Undo what amd_iommu_disable_domain_device() may have done.
> */
>          ivrs_dev = &get_ivrs_mappings(iommu->seg)[req_id];
> @@ -131,8 +132,8 @@ static void amd_iommu_setup_domain_device(
>                          "root table = %#"PRIx64", "
>                          "domain = %d, paging mode = %d\n",
>                          req_id, pdev->type,
> -                        page_to_maddr(hd->arch.root_table),
> -                        domain->domain_id, hd->arch.paging_mode);
> +                        page_to_maddr(hd->arch.amd.root_table),
> +                        domain->domain_id, hd->arch.amd.paging_mode);
>      }
> 
>      spin_unlock_irqrestore(&iommu->lock, flags);
> @@ -206,10 +207,10 @@ static int iov_enable_xt(void)
> 
>  int amd_iommu_alloc_root(struct domain_iommu *hd)
>  {
> -    if ( unlikely(!hd->arch.root_table) )
> +    if ( unlikely(!hd->arch.amd.root_table) )
>      {
> -        hd->arch.root_table = alloc_amd_iommu_pgtable();
> -        if ( !hd->arch.root_table )
> +        hd->arch.amd.root_table = alloc_amd_iommu_pgtable();
> +        if ( !hd->arch.amd.root_table )
>              return -ENOMEM;
>      }
> 
> @@ -239,7 +240,7 @@ static int amd_iommu_domain_init(struct domain *d)
>       *   physical address space we give it, but this isn't known yet so use 4
>       *   unilaterally.
>       */
> -    hd->arch.paging_mode = amd_iommu_get_paging_mode(
> +    hd->arch.amd.paging_mode = amd_iommu_get_paging_mode(
>          is_hvm_domain(d)
>          ? 1ul << (DEFAULT_DOMAIN_ADDRESS_WIDTH - PAGE_SHIFT)
>          : get_upper_mfn_bound() + 1);
> @@ -305,7 +306,7 @@ static void
> amd_iommu_disable_domain_device(const struct domain *domain,
>          AMD_IOMMU_DEBUG("Disable: device id = %#x, "
>                          "domain = %d, paging mode = %d\n",
>                          req_id,  domain->domain_id,
> -                        dom_iommu(domain)->arch.paging_mode);
> +                        dom_iommu(domain)->arch.amd.paging_mode);
>      }
>      spin_unlock_irqrestore(&iommu->lock, flags);
> 
> @@ -420,10 +421,11 @@ static void deallocate_iommu_page_tables(struct
> domain *d)
>      struct domain_iommu *hd = dom_iommu(d);
> 
>      spin_lock(&hd->arch.mapping_lock);
> -    if ( hd->arch.root_table )
> +    if ( hd->arch.amd.root_table )
>      {
> -        deallocate_next_page_table(hd->arch.root_table, hd-
> >arch.paging_mode);
> -        hd->arch.root_table = NULL;
> +        deallocate_next_page_table(hd->arch.amd.root_table,
> +                                   hd->arch.amd.paging_mode);
> +        hd->arch.amd.root_table = NULL;
>      }
>      spin_unlock(&hd->arch.mapping_lock);
>  }
> @@ -598,11 +600,12 @@ static void amd_dump_p2m_table(struct domain
> *d)
>  {
>      const struct domain_iommu *hd = dom_iommu(d);
> 
> -    if ( !hd->arch.root_table )
> +    if ( !hd->arch.amd.root_table )
>          return;
> 
> -    printk("p2m table has %d levels\n", hd->arch.paging_mode);
> -    amd_dump_p2m_table_level(hd->arch.root_table, hd-
> >arch.paging_mode, 0, 0);
> +    printk("p2m table has %u levels\n", hd->arch.amd.paging_mode);
> +    amd_dump_p2m_table_level(hd->arch.amd.root_table,
> +                             hd->arch.amd.paging_mode, 0, 0);
>  }
> 
>  static const struct iommu_ops __initconstrel _iommu_ops = {
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index deaeab095d..94e0455a4d 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -257,20 +257,20 @@ static u64 bus_to_context_maddr(struct
> vtd_iommu *iommu, u8 bus)
>  static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int
> alloc)
>  {
>      struct domain_iommu *hd = dom_iommu(domain);
> -    int addr_width = agaw_to_width(hd->arch.agaw);
> +    int addr_width = agaw_to_width(hd->arch.vtd.agaw);
>      struct dma_pte *parent, *pte = NULL;
> -    int level = agaw_to_level(hd->arch.agaw);
> +    int level = agaw_to_level(hd->arch.vtd.agaw);
>      int offset;
>      u64 pte_maddr = 0;
> 
>      addr &= (((u64)1) << addr_width) - 1;
>      ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> -    if ( !hd->arch.pgd_maddr &&
> +    if ( !hd->arch.vtd.pgd_maddr &&
>           (!alloc ||
> -          ((hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) == 0)) )
> +          ((hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) ==
> 0)) )
>          goto out;
> 
> -    parent = (struct dma_pte *)map_vtd_domain_page(hd->arch.pgd_maddr);
> +    parent = (struct dma_pte *)map_vtd_domain_page(hd-
> >arch.vtd.pgd_maddr);
>      while ( level > 1 )
>      {
>          offset = address_level_offset(addr, level);
> @@ -593,7 +593,7 @@ static int __must_check iommu_flush_iotlb(struct
> domain *d, dfn_t dfn,
>      {
>          iommu = drhd->iommu;
> 
> -        if ( !test_bit(iommu->index, &hd->arch.iommu_bitmap) )
> +        if ( !test_bit(iommu->index, &hd->arch.vtd.iommu_bitmap) )
>              continue;
> 
>          flush_dev_iotlb = !!find_ats_dev_drhd(iommu);
> @@ -1278,7 +1278,10 @@ void __init iommu_free(struct acpi_drhd_unit
> *drhd)
> 
>  static int intel_iommu_domain_init(struct domain *d)
>  {
> -    dom_iommu(d)->arch.agaw =
> width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
> +    struct domain_iommu *hd = dom_iommu(d);
> +
> +    hd->arch.vtd.agaw =
> width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
> +    INIT_LIST_HEAD(&hd->arch.vtd.mapped_rmrrs);
> 
>      return 0;
>  }
> @@ -1375,10 +1378,10 @@ int domain_context_mapping_one(
>          spin_lock(&hd->arch.mapping_lock);
> 
>          /* Ensure we have pagetables allocated down to leaf PTE. */
> -        if ( hd->arch.pgd_maddr == 0 )
> +        if ( hd->arch.vtd.pgd_maddr == 0 )
>          {
>              addr_to_dma_page_maddr(domain, 0, 1);
> -            if ( hd->arch.pgd_maddr == 0 )
> +            if ( hd->arch.vtd.pgd_maddr == 0 )
>              {
>              nomem:
>                  spin_unlock(&hd->arch.mapping_lock);
> @@ -1389,7 +1392,7 @@ int domain_context_mapping_one(
>          }
> 
>          /* Skip top levels of page tables for 2- and 3-level DRHDs. */
> -        pgd_maddr = hd->arch.pgd_maddr;
> +        pgd_maddr = hd->arch.vtd.pgd_maddr;
>          for ( agaw = level_to_agaw(4);
>                agaw != level_to_agaw(iommu->nr_pt_levels);
>                agaw-- )
> @@ -1443,7 +1446,7 @@ int domain_context_mapping_one(
>      if ( rc > 0 )
>          rc = 0;
> 
> -    set_bit(iommu->index, &hd->arch.iommu_bitmap);
> +    set_bit(iommu->index, &hd->arch.vtd.iommu_bitmap);
> 
>      unmap_vtd_domain_page(context_entries);
> 
> @@ -1714,7 +1717,7 @@ static int domain_context_unmap(struct domain
> *domain, u8 devfn,
>      {
>          int iommu_domid;
> 
> -        clear_bit(iommu->index, &dom_iommu(domain)-
> >arch.iommu_bitmap);
> +        clear_bit(iommu->index, &dom_iommu(domain)-
> >arch.vtd.iommu_bitmap);
> 
>          iommu_domid = domain_iommu_domid(domain, iommu);
>          if ( iommu_domid == -1 )
> @@ -1739,7 +1742,7 @@ static void iommu_domain_teardown(struct
> domain *d)
>      if ( list_empty(&acpi_drhd_units) )
>          return;
> 
> -    list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.mapped_rmrrs, list )
> +    list_for_each_entry_safe ( mrmrr, tmp, &hd->arch.vtd.mapped_rmrrs,
> list )
>      {
>          list_del(&mrmrr->list);
>          xfree(mrmrr);
> @@ -1751,8 +1754,9 @@ static void iommu_domain_teardown(struct
> domain *d)
>          return;
> 
>      spin_lock(&hd->arch.mapping_lock);
> -    iommu_free_pagetable(hd->arch.pgd_maddr, agaw_to_level(hd-
> >arch.agaw));
> -    hd->arch.pgd_maddr = 0;
> +    iommu_free_pagetable(hd->arch.vtd.pgd_maddr,
> +                         agaw_to_level(hd->arch.vtd.agaw));
> +    hd->arch.vtd.pgd_maddr = 0;
>      spin_unlock(&hd->arch.mapping_lock);
>  }
> 
> @@ -1892,7 +1896,7 @@ static void iommu_set_pgd(struct domain *d)
>      mfn_t pgd_mfn;
> 
>      pgd_mfn =
> pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> -    dom_iommu(d)->arch.pgd_maddr =
> +    dom_iommu(d)->arch.vtd.pgd_maddr =
>          pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
>  }
> 
> @@ -1912,7 +1916,7 @@ static int rmrr_identity_mapping(struct domain *d,
> bool_t map,
>       * No need to acquire hd->arch.mapping_lock: Both insertion and removal
>       * get done while holding pcidevs_lock.
>       */
> -    list_for_each_entry( mrmrr, &hd->arch.mapped_rmrrs, list )
> +    list_for_each_entry( mrmrr, &hd->arch.vtd.mapped_rmrrs, list )
>      {
>          if ( mrmrr->base == rmrr->base_address &&
>               mrmrr->end == rmrr->end_address )
> @@ -1959,7 +1963,7 @@ static int rmrr_identity_mapping(struct domain *d,
> bool_t map,
>      mrmrr->base = rmrr->base_address;
>      mrmrr->end = rmrr->end_address;
>      mrmrr->count = 1;
> -    list_add_tail(&mrmrr->list, &hd->arch.mapped_rmrrs);
> +    list_add_tail(&mrmrr->list, &hd->arch.vtd.mapped_rmrrs);
> 
>      return 0;
>  }
> @@ -2657,8 +2661,9 @@ static void vtd_dump_p2m_table(struct domain *d)
>          return;
> 
>      hd = dom_iommu(d);
> -    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.agaw));
> -    vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd-
> >arch.agaw), 0, 0);
> +    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
> +    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
> +                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
>  }
> 
>  static int __init intel_iommu_quarantine_init(struct domain *d)
> @@ -2669,7 +2674,7 @@ static int __init intel_iommu_quarantine_init(struct
> domain *d)
>      unsigned int level = agaw_to_level(agaw);
>      int rc;
> 
> -    if ( hd->arch.pgd_maddr )
> +    if ( hd->arch.vtd.pgd_maddr )
>      {
>          ASSERT_UNREACHABLE();
>          return 0;
> @@ -2677,11 +2682,11 @@ static int __init
> intel_iommu_quarantine_init(struct domain *d)
> 
>      spin_lock(&hd->arch.mapping_lock);
> 
> -    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> -    if ( !hd->arch.pgd_maddr )
> +    hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> +    if ( !hd->arch.vtd.pgd_maddr )
>          goto out;
> 
> -    parent = map_vtd_domain_page(hd->arch.pgd_maddr);
> +    parent = map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
>      while ( level )
>      {
>          uint64_t maddr;
> diff --git a/xen/drivers/passthrough/x86/iommu.c
> b/xen/drivers/passthrough/x86/iommu.c
> index 3d7670e8c6..a12109a1de 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -139,7 +139,6 @@ int arch_iommu_domain_init(struct domain *d)
>      struct domain_iommu *hd = dom_iommu(d);
> 
>      spin_lock_init(&hd->arch.mapping_lock);
> -    INIT_LIST_HEAD(&hd->arch.mapped_rmrrs);
> 
>      return 0;
>  }
> diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
> index 6c9d5e5632..8ce97c981f 100644
> --- a/xen/include/asm-x86/iommu.h
> +++ b/xen/include/asm-x86/iommu.h
> @@ -45,16 +45,23 @@ typedef uint64_t daddr_t;
> 
>  struct arch_iommu
>  {
> -    u64 pgd_maddr;                 /* io page directory machine address */
> -    spinlock_t mapping_lock;            /* io page table lock */
> -    int agaw;     /* adjusted guest address width, 0 is level 2 30-bit */
> -    u64 iommu_bitmap;              /* bitmap of iommu(s) that the domain uses
> */
> -    struct list_head mapped_rmrrs;
> -
> -    /* amd iommu support */
> -    int paging_mode;
> -    struct page_info *root_table;
> -    struct guest_iommu *g_iommu;
> +    spinlock_t mapping_lock; /* io page table lock */
> +
> +    union {
> +        /* Intel VT-d */
> +        struct {
> +            uint64_t pgd_maddr; /* io page directory machine address */
> +            unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit
> */
> +            uint64_t iommu_bitmap; /* bitmap of iommu(s) that the domain
> uses */
> +            struct list_head mapped_rmrrs;
> +        } vtd;
> +        /* AMD IOMMU */
> +        struct {
> +            unsigned int paging_mode;
> +            struct page_info *root_table;
> +            struct guest_iommu *g_iommu;
> +        } amd;
> +    };
>  };
> 
>  extern struct iommu_ops iommu_ops;
> --
> 2.20.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator
  2020-08-04 13:41 ` [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator Paul Durrant
@ 2020-08-14  6:41   ` Tian, Kevin
  2020-08-14  7:16     ` Durrant, Paul
  0 siblings, 1 reply; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  6:41 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant, Jan Beulich

> From: Paul Durrant <paul@xen.org>
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This patch converts the VT-d code to use the new IOMMU page table
> allocator
> function. This allows all the free-ing code to be removed (since it is now
> handled by the general x86 code) which reduces TLB and cache thrashing as
> well
> as shortening the code.
> 
> The scope of the mapping_lock in intel_iommu_quarantine_init() has also
> been
> increased slightly; it should have always covered accesses to
> 'arch.vtd.pgd_maddr'.
> 
> NOTE: The common IOMMU needs a slight modification to avoid scheduling
> the
>       cleanup tasklet if the free_page_table() method is not present (since
>       the tasklet will unconditionally call it).
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v2:
>  - New in v2 (split from "add common page-table allocator")
> ---
>  xen/drivers/passthrough/iommu.c     |   6 +-
>  xen/drivers/passthrough/vtd/iommu.c | 101 ++++++++++------------------
>  2 files changed, 39 insertions(+), 68 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/iommu.c
> b/xen/drivers/passthrough/iommu.c
> index 1d644844ab..2b1db8022c 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -225,8 +225,10 @@ static void iommu_teardown(struct domain *d)
>  {
>      struct domain_iommu *hd = dom_iommu(d);
> 
> -    hd->platform_ops->teardown(d);
> -    tasklet_schedule(&iommu_pt_cleanup_tasklet);
> +    iommu_vcall(hd->platform_ops, teardown, d);
> +
> +    if ( hd->platform_ops->free_page_table )
> +        tasklet_schedule(&iommu_pt_cleanup_tasklet);
>  }
> 
>  void iommu_domain_destroy(struct domain *d)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 94e0455a4d..607e8b5e65 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -265,10 +265,15 @@ static u64 addr_to_dma_page_maddr(struct
> domain *domain, u64 addr, int alloc)
> 
>      addr &= (((u64)1) << addr_width) - 1;
>      ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> -    if ( !hd->arch.vtd.pgd_maddr &&
> -         (!alloc ||
> -          ((hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node)) ==
> 0)) )
> -        goto out;
> +    if ( !hd->arch.vtd.pgd_maddr )
> +    {
> +        struct page_info *pg;
> +
> +        if ( !alloc || !(pg = iommu_alloc_pgtable(domain)) )
> +            goto out;
> +
> +        hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
> +    }
> 
>      parent = (struct dma_pte *)map_vtd_domain_page(hd-
> >arch.vtd.pgd_maddr);
>      while ( level > 1 )
> @@ -279,13 +284,16 @@ static u64 addr_to_dma_page_maddr(struct
> domain *domain, u64 addr, int alloc)
>          pte_maddr = dma_pte_addr(*pte);
>          if ( !pte_maddr )
>          {
> +            struct page_info *pg;
> +
>              if ( !alloc )
>                  break;
> 
> -            pte_maddr = alloc_pgtable_maddr(1, hd->node);
> -            if ( !pte_maddr )
> +            pg = iommu_alloc_pgtable(domain);
> +            if ( !pg )
>                  break;
> 
> +            pte_maddr = page_to_maddr(pg);
>              dma_set_pte_addr(*pte, pte_maddr);
> 
>              /*
> @@ -675,45 +683,6 @@ static void dma_pte_clear_one(struct domain
> *domain, uint64_t addr,
>      unmap_vtd_domain_page(page);
>  }
> 
> -static void iommu_free_pagetable(u64 pt_maddr, int level)
> -{
> -    struct page_info *pg = maddr_to_page(pt_maddr);
> -
> -    if ( pt_maddr == 0 )
> -        return;
> -
> -    PFN_ORDER(pg) = level;
> -    spin_lock(&iommu_pt_cleanup_lock);
> -    page_list_add_tail(pg, &iommu_pt_cleanup_list);
> -    spin_unlock(&iommu_pt_cleanup_lock);
> -}
> -
> -static void iommu_free_page_table(struct page_info *pg)
> -{
> -    unsigned int i, next_level = PFN_ORDER(pg) - 1;
> -    u64 pt_maddr = page_to_maddr(pg);
> -    struct dma_pte *pt_vaddr, *pte;
> -
> -    PFN_ORDER(pg) = 0;
> -    pt_vaddr = (struct dma_pte *)map_vtd_domain_page(pt_maddr);
> -
> -    for ( i = 0; i < PTE_NUM; i++ )
> -    {
> -        pte = &pt_vaddr[i];
> -        if ( !dma_pte_present(*pte) )
> -            continue;
> -
> -        if ( next_level >= 1 )
> -            iommu_free_pagetable(dma_pte_addr(*pte), next_level);
> -
> -        dma_clear_pte(*pte);
> -        iommu_sync_cache(pte, sizeof(struct dma_pte));

I didn't see a sync_cache in the new iommu_free_pgtables(). Is that
intended (i.e. the original flush was meaningless) or was it overlooked?

Thanks
Kevin

> -    }
> -
> -    unmap_vtd_domain_page(pt_vaddr);
> -    free_pgtable_maddr(pt_maddr);
> -}
> -
>  static int iommu_set_root_entry(struct vtd_iommu *iommu)
>  {
>      u32 sts;
> @@ -1748,16 +1717,7 @@ static void iommu_domain_teardown(struct
> domain *d)
>          xfree(mrmrr);
>      }
> 
> -    ASSERT(is_iommu_enabled(d));
> -
> -    if ( iommu_use_hap_pt(d) )
> -        return;
> -
> -    spin_lock(&hd->arch.mapping_lock);
> -    iommu_free_pagetable(hd->arch.vtd.pgd_maddr,
> -                         agaw_to_level(hd->arch.vtd.agaw));
>      hd->arch.vtd.pgd_maddr = 0;
> -    spin_unlock(&hd->arch.mapping_lock);
>  }
> 
>  static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
> @@ -2669,23 +2629,28 @@ static void vtd_dump_p2m_table(struct domain
> *d)
>  static int __init intel_iommu_quarantine_init(struct domain *d)
>  {
>      struct domain_iommu *hd = dom_iommu(d);
> +    struct page_info *pg;
>      struct dma_pte *parent;
>      unsigned int agaw =
> width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
>      unsigned int level = agaw_to_level(agaw);
> -    int rc;
> +    int rc = 0;
> +
> +    spin_lock(&hd->arch.mapping_lock);
> 
>      if ( hd->arch.vtd.pgd_maddr )
>      {
>          ASSERT_UNREACHABLE();
> -        return 0;
> +        goto out;
>      }
> 
> -    spin_lock(&hd->arch.mapping_lock);
> +    pg = iommu_alloc_pgtable(d);
> 
> -    hd->arch.vtd.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
> -    if ( !hd->arch.vtd.pgd_maddr )
> +    rc = -ENOMEM;
> +    if ( !pg )
>          goto out;
> 
> +    hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
> +
>      parent = map_vtd_domain_page(hd->arch.vtd.pgd_maddr);
>      while ( level )
>      {
> @@ -2697,10 +2662,12 @@ static int __init
> intel_iommu_quarantine_init(struct domain *d)
>           * page table pages, and the resulting allocations are always
>           * zeroed.
>           */
> -        maddr = alloc_pgtable_maddr(1, hd->node);
> -        if ( !maddr )
> -            break;
> +        pg = iommu_alloc_pgtable(d);
> +
> +        if ( !pg )
> +            goto out;
> 
> +        maddr = page_to_maddr(pg);
>          for ( offset = 0; offset < PTE_NUM; offset++ )
>          {
>              struct dma_pte *pte = &parent[offset];
> @@ -2716,13 +2683,16 @@ static int __init
> intel_iommu_quarantine_init(struct domain *d)
>      }
>      unmap_vtd_domain_page(parent);
> 
> +    rc = 0;
> +
>   out:
>      spin_unlock(&hd->arch.mapping_lock);
> 
> -    rc = iommu_flush_iotlb_all(d);
> +    if ( !rc )
> +        rc = iommu_flush_iotlb_all(d);
> 
> -    /* Pages leaked in failure case */
> -    return level ? -ENOMEM : rc;
> +    /* Pages may be leaked in failure case */
> +    return rc;
>  }
> 
>  static struct iommu_ops __initdata vtd_ops = {
> @@ -2737,7 +2707,6 @@ static struct iommu_ops __initdata vtd_ops = {
>      .map_page = intel_iommu_map_page,
>      .unmap_page = intel_iommu_unmap_page,
>      .lookup_page = intel_iommu_lookup_page,
> -    .free_page_table = iommu_free_page_table,
>      .reassign_device = reassign_device_ownership,
>      .get_device_group_id = intel_iommu_group_id,
>      .enable_x2apic = intel_iommu_enable_eim,
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  2020-08-04 13:42 ` [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail Paul Durrant
  2020-08-05 16:06   ` Jan Beulich
  2020-08-06 11:41   ` Jan Beulich
@ 2020-08-14  6:53   ` Tian, Kevin
  2020-08-14  7:19     ` Durrant, Paul
  2 siblings, 1 reply; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  6:53 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant, Jan Beulich

> From: Paul Durrant
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This patch adds a full I/O TLB flush to the error paths of iommu_map() and
> iommu_unmap().
> 
> Without this change callers need constructs such as:
> 
> rc = iommu_map/unmap(...)
> err = iommu_flush(...)
> if ( !rc )
>   rc = err;
> 
> With this change, it can be simplified to:
> 
> rc = iommu_map/unmap(...)
> if ( !rc )
>   rc = iommu_flush(...)
> 
> because, if the map or unmap fails the flush will be unnecessary. This saves

This statement differs from the actual change made in iommu_map()...

> a stack variable and generally makes the call sites tidier.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> 
> v2:
>  - New in v2
> ---
>  xen/drivers/passthrough/iommu.c | 28 ++++++++++++----------------
>  1 file changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/iommu.c
> b/xen/drivers/passthrough/iommu.c
> index 660dc5deb2..e2c0193a09 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -274,6 +274,10 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t
> mfn,
>          break;
>      }
> 
> +    /* Something went wrong so flush everything and clear flush flags */
> +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> +        flush_flags = 0;
> +

... earlier you said the flush is unnecessary if the map fails, but here you
actually still need to flush everything. So isn't this just moving the
error-path flush inside the map function?

Thanks
Kevin

>      return rc;
>  }
> 
> @@ -283,14 +287,8 @@ int iommu_legacy_map(struct domain *d, dfn_t dfn,
> mfn_t mfn,
>      unsigned int flush_flags = 0;
>      int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
> 
> -    if ( !this_cpu(iommu_dont_flush_iotlb) )
> -    {
> -        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
> -                                    flush_flags);
> -
> -        if ( !rc )
> -            rc = err;
> -    }
> +    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
> +        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
> 
>      return rc;
>  }
> @@ -330,6 +328,10 @@ int iommu_unmap(struct domain *d, dfn_t dfn,
> unsigned int page_order,
>          }
>      }
> 
> +    /* Something went wrong so flush everything and clear flush flags */
> +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> +        flush_flags = 0;
> +
>      return rc;
>  }
> 
> @@ -338,14 +340,8 @@ int iommu_legacy_unmap(struct domain *d, dfn_t
> dfn, unsigned int page_order)
>      unsigned int flush_flags = 0;
>      int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
> 
> -    if ( !this_cpu(iommu_dont_flush_iotlb) )
> -    {
> -        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
> -                                    flush_flags);
> -
> -        if ( !rc )
> -            rc = err;
> -    }
> +    if ( !this_cpu(iommu_dont_flush_iotlb) && ! rc )
> +        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
> 
>      return rc;
>  }
> --
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count
  2020-08-06  9:57   ` Jan Beulich
  2020-08-11 11:00     ` Durrant, Paul
@ 2020-08-14  6:57     ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  6:57 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: xen-devel, Paul Durrant, Nakajima, Jun, Andrew Cooper,
	George Dunlap, Wei Liu, Roger Pau Monné,
	Ian Jackson, Julien Grall, Stefano Stabellini, Volodymyr Babchuk

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, August 6, 2020 5:57 PM
> 
> On 04.08.2020 15:42, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > At the moment iommu_map() and iommu_unmap() take a page order but
> not a
> > count, whereas iommu_iotlb_flush() takes a count but not a page order.
> > This patch simply makes them consistent with each other.
> 
> Why can't we do with just a count, where order gets worked out by
> functions knowing how to / wanting to deal with higher order pages?

Agreed. In particular, the new map/unmap code looks odd when taking both
an order and a count as parameters.
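
For the sake of discussion, the count-only shape would presumably be something like (hypothetical signatures, not what is in this series):

    int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                  unsigned long page_count, unsigned int flags,
                  unsigned int *flush_flags);
    int iommu_unmap(struct domain *d, dfn_t dfn, unsigned long page_count,
                    unsigned int *flush_flags);
    int iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                          unsigned long page_count, unsigned int flush_flags);

with a caller of, say, a 2M region passing 1ul << PAGE_ORDER_2M as the count and the implementation working out for itself whether a superpage mapping can be used.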

Thanks
Kevin

> 
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -843,7 +843,7 @@ out:
> >           need_modify_vtd_table )
> >      {
> >          if ( iommu_use_hap_pt(d) )
> > -            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order),
> > +            rc = iommu_iotlb_flush(d, _dfn(gfn), (1u << order), 1,
> 
> Forgot to drop the "1 << "? (There are then I think two more instances
> further down.)
> 
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -851,12 +851,12 @@ int xenmem_add_to_physmap(struct domain *d,
> struct xen_add_to_physmap *xatp,
> >
> >          this_cpu(iommu_dont_flush_iotlb) = 0;
> >
> > -        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
> > +        ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), 0, done,
> 
> Arguments wrong way round? (This risk of inverting their order is
> one of the primary reasons why I think we want just a count.) I'm
> also uncertain about the use of 0 vs PAGE_ORDER_4K here.
> 
> >                                  IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
> >          if ( unlikely(ret) && rc >= 0 )
> >              rc = ret;
> >
> > -        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
> > +        ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), 0, done,
> 
> Same here then.
> 
> Jan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 10/14] iommu: remove the share_p2m operation
  2020-08-04 13:42 ` [PATCH v4 10/14] iommu: remove the share_p2m operation Paul Durrant
  2020-08-06 12:18   ` Jan Beulich
@ 2020-08-14  7:04   ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  7:04 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Paul Durrant, Jan Beulich, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné

> From: Paul Durrant <paul@xen.org>
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> Sharing of HAP tables is now VT-d specific so the operation is never defined
> for AMD IOMMU any more. There's also no need to pro-actively set
> vtd.pgd_maddr
> when using shared EPT as it is straightforward to simply define a helper
> function to return the appropriate value in the shared and non-shared cases.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Wei Liu <wl@xen.org>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v2:
>   - Put the PGD level adjust into the helper function too, since it is
>     irrelevant in the shared EPT case
> ---
>  xen/arch/x86/mm/p2m.c               |  3 -
>  xen/drivers/passthrough/iommu.c     |  8 ---
>  xen/drivers/passthrough/vtd/iommu.c | 90 ++++++++++++++++-------------
>  xen/include/xen/iommu.h             |  3 -
>  4 files changed, 50 insertions(+), 54 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 9f8b9bc5fd..3bd8d83d23 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -726,9 +726,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> 
>      p2m->phys_table = pagetable_from_mfn(top_mfn);
> 
> -    if ( hap_enabled(d) )
> -        iommu_share_p2m_table(d);
> -
>      p2m_unlock(p2m);
>      return 0;
>  }
> diff --git a/xen/drivers/passthrough/iommu.c
> b/xen/drivers/passthrough/iommu.c
> index ab44c332bb..7464f10d1c 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -498,14 +498,6 @@ int iommu_do_domctl(
>      return ret;
>  }
> 
> -void iommu_share_p2m_table(struct domain* d)
> -{
> -    ASSERT(hap_enabled(d));
> -
> -    if ( iommu_use_hap_pt(d) )
> -        iommu_get_ops()->share_p2m(d);
> -}
> -
>  void iommu_crash_shutdown(void)
>  {
>      if ( !iommu_crash_disable )
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 68cf0e535a..a532d9e88c 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -318,6 +318,48 @@ static u64 addr_to_dma_page_maddr(struct
> domain *domain, u64 addr, int alloc)
>      return pte_maddr;
>  }
> 
> +static uint64_t domain_pgd_maddr(struct domain *d, struct vtd_iommu
> *iommu)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    uint64_t pgd_maddr;
> +    unsigned int agaw;
> +
> +    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
> +
> +    if ( iommu_use_hap_pt(d) )
> +    {
> +        mfn_t pgd_mfn =
> +            pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> +
> +        return pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
> +    }
> +
> +    if ( !hd->arch.vtd.pgd_maddr )
> +    {
> +        addr_to_dma_page_maddr(d, 0, 1);
> +
> +        if ( !hd->arch.vtd.pgd_maddr )
> +            return 0;
> +    }
> +
> +    pgd_maddr = hd->arch.vtd.pgd_maddr;
> +
> +    /* Skip top levels of page tables for 2- and 3-level DRHDs. */
> +    for ( agaw = level_to_agaw(4);
> +          agaw != level_to_agaw(iommu->nr_pt_levels);
> +          agaw-- )
> +    {
> +        struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
> +
> +        pgd_maddr = dma_pte_addr(*p);
> +        unmap_vtd_domain_page(p);
> +        if ( !pgd_maddr )
> +            return 0;
> +    }
> +
> +    return pgd_maddr;
> +}
> +
>  static void iommu_flush_write_buffer(struct vtd_iommu *iommu)
>  {
>      u32 val;
> @@ -1286,7 +1328,7 @@ int domain_context_mapping_one(
>      struct context_entry *context, *context_entries;
>      u64 maddr, pgd_maddr;
>      u16 seg = iommu->drhd->segment;
> -    int agaw, rc, ret;
> +    int rc, ret;
>      bool_t flush_dev_iotlb;
> 
>      ASSERT(pcidevs_locked());
> @@ -1340,37 +1382,18 @@ int domain_context_mapping_one(
>      if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
>      {
>          context_set_translation_type(*context, CONTEXT_TT_PASS_THRU);
> -        agaw = level_to_agaw(iommu->nr_pt_levels);
>      }
>      else
>      {
>          spin_lock(&hd->arch.mapping_lock);
> 
> -        /* Ensure we have pagetables allocated down to leaf PTE. */
> -        if ( hd->arch.vtd.pgd_maddr == 0 )
> +        pgd_maddr = domain_pgd_maddr(domain, iommu);
> +        if ( !pgd_maddr )
>          {
> -            addr_to_dma_page_maddr(domain, 0, 1);
> -            if ( hd->arch.vtd.pgd_maddr == 0 )
> -            {
> -            nomem:
> -                spin_unlock(&hd->arch.mapping_lock);
> -                spin_unlock(&iommu->lock);
> -                unmap_vtd_domain_page(context_entries);
> -                return -ENOMEM;
> -            }
> -        }
> -
> -        /* Skip top levels of page tables for 2- and 3-level DRHDs. */
> -        pgd_maddr = hd->arch.vtd.pgd_maddr;
> -        for ( agaw = level_to_agaw(4);
> -              agaw != level_to_agaw(iommu->nr_pt_levels);
> -              agaw-- )
> -        {
> -            struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
> -            pgd_maddr = dma_pte_addr(*p);
> -            unmap_vtd_domain_page(p);
> -            if ( pgd_maddr == 0 )
> -                goto nomem;
> +            spin_unlock(&hd->arch.mapping_lock);
> +            spin_unlock(&iommu->lock);
> +            unmap_vtd_domain_page(context_entries);
> +            return -ENOMEM;
>          }
> 
>          context_set_address_root(*context, pgd_maddr);
> @@ -1389,7 +1412,7 @@ int domain_context_mapping_one(
>          return -EFAULT;
>      }
> 
> -    context_set_address_width(*context, agaw);
> +    context_set_address_width(*context, level_to_agaw(iommu-
> >nr_pt_levels));
>      context_set_fault_enable(*context);
>      context_set_present(*context);
>      iommu_sync_cache(context, sizeof(struct context_entry));
> @@ -1848,18 +1871,6 @@ static int __init vtd_ept_page_compatible(struct
> vtd_iommu *iommu)
>             (ept_has_1gb(ept_cap) && opt_hap_1gb) <= cap_sps_1gb(vtd_cap);
>  }
> 
> -/*
> - * set VT-d page table directory to EPT table if allowed
> - */
> -static void iommu_set_pgd(struct domain *d)
> -{
> -    mfn_t pgd_mfn;
> -
> -    pgd_mfn =
> pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
> -    dom_iommu(d)->arch.vtd.pgd_maddr =
> -        pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
> -}
> -
>  static int rmrr_identity_mapping(struct domain *d, bool_t map,
>                                   const struct acpi_rmrr_unit *rmrr,
>                                   u32 flag)
> @@ -2719,7 +2730,6 @@ static struct iommu_ops __initdata vtd_ops = {
>      .adjust_irq_affinities = adjust_vtd_irq_affinities,
>      .suspend = vtd_suspend,
>      .resume = vtd_resume,
> -    .share_p2m = iommu_set_pgd,
>      .crash_shutdown = vtd_crash_shutdown,
>      .iotlb_flush = iommu_flush_iotlb_pages,
>      .iotlb_flush_all = iommu_flush_iotlb_all,
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index b7e5d3da09..1f25d2082f 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -271,7 +271,6 @@ struct iommu_ops {
> 
>      int __must_check (*suspend)(void);
>      void (*resume)(void);
> -    void (*share_p2m)(struct domain *d);
>      void (*crash_shutdown)(void);
>      int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
>                                      unsigned long page_count,
> @@ -348,8 +347,6 @@ void iommu_resume(void);
>  void iommu_crash_shutdown(void);
>  int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
> 
> -void iommu_share_p2m_table(struct domain *d);
> -
>  #ifdef CONFIG_HAS_PCI
>  int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
>                          XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
> --
> 2.20.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-08-04 13:42 ` [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
  2020-08-06 12:23   ` Jan Beulich
@ 2020-08-14  7:12   ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  7:12 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant, Jan Beulich, Andrew Cooper

> From: Paul Durrant <paul@xen.org>
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> It's confusing and not consistent with the terminology introduced with 'dfn_t'.
> Just call them IOMMU page tables.
> 
> Also remove a pointless check of the 'acpi_drhd_units' list in
> vtd_dump_page_table_level(). If the list is empty then IOMMU mappings
> would
> not have been enabled for the domain in the first place.
> 
> NOTE: All calls to printk() have also been removed from
>       iommu_dump_page_tables(); the implementation specific code is now
>       responsible for all output.
>       The check for the global 'iommu_enabled' has also been replaced by an
>       ASSERT since iommu_dump_page_tables() is not registered as a key
> handler
>       unless IOMMU mappings are enabled.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

with Jan's comments addressed:
	Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Paul Durrant <paul@xen.org>
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v2:
>  - Moved all output into implementation specific code
> ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c | 16 ++++++-------
>  xen/drivers/passthrough/iommu.c             | 21 ++++-------------
>  xen/drivers/passthrough/vtd/iommu.c         | 26 +++++++++++----------
>  xen/include/xen/iommu.h                     |  2 +-
>  4 files changed, 28 insertions(+), 37 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 3390c22cf3..be578607b1 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -491,8 +491,8 @@ static int amd_iommu_group_id(u16 seg, u8 bus, u8
> devfn)
> 
>  #include <asm/io_apic.h>
> 
> -static void amd_dump_p2m_table_level(struct page_info* pg, int level,
> -                                     paddr_t gpa, int indent)
> +static void amd_dump_page_table_level(struct page_info* pg, int level,
> +                                      paddr_t gpa, int indent)
>  {
>      paddr_t address;
>      struct amd_iommu_pte *table_vaddr;
> @@ -529,7 +529,7 @@ static void amd_dump_p2m_table_level(struct
> page_info* pg, int level,
> 
>          address = gpa + amd_offset_level_address(index, level);
>          if ( pde->next_level >= 1 )
> -            amd_dump_p2m_table_level(
> +            amd_dump_page_table_level(
>                  mfn_to_page(_mfn(pde->mfn)), pde->next_level,
>                  address, indent + 1);
>          else
> @@ -542,16 +542,16 @@ static void amd_dump_p2m_table_level(struct
> page_info* pg, int level,
>      unmap_domain_page(table_vaddr);
>  }
> 
> -static void amd_dump_p2m_table(struct domain *d)
> +static void amd_dump_page_tables(struct domain *d)
>  {
>      const struct domain_iommu *hd = dom_iommu(d);
> 
>      if ( !hd->arch.amd.root_table )
>          return;
> 
> -    printk("p2m table has %u levels\n", hd->arch.amd.paging_mode);
> -    amd_dump_p2m_table_level(hd->arch.amd.root_table,
> -                             hd->arch.amd.paging_mode, 0, 0);
> +    printk("AMD IOMMU table has %u levels\n", hd-
> >arch.amd.paging_mode);
> +    amd_dump_page_table_level(hd->arch.amd.root_table,
> +                              hd->arch.amd.paging_mode, 0, 0);
>  }
> 
>  static const struct iommu_ops __initconstrel _iommu_ops = {
> @@ -578,7 +578,7 @@ static const struct iommu_ops __initconstrel
> _iommu_ops = {
>      .suspend = amd_iommu_suspend,
>      .resume = amd_iommu_resume,
>      .crash_shutdown = amd_iommu_crash_shutdown,
> -    .dump_p2m_table = amd_dump_p2m_table,
> +    .dump_page_tables = amd_dump_page_tables,
>  };
> 
>  static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
> diff --git a/xen/drivers/passthrough/iommu.c
> b/xen/drivers/passthrough/iommu.c
> index 7464f10d1c..0f468379e1 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -22,7 +22,7 @@
>  #include <xen/keyhandler.h>
>  #include <xsm/xsm.h>
> 
> -static void iommu_dump_p2m_table(unsigned char key);
> +static void iommu_dump_page_tables(unsigned char key);
> 
>  unsigned int __read_mostly iommu_dev_iotlb_timeout = 1000;
>  integer_param("iommu_dev_iotlb_timeout", iommu_dev_iotlb_timeout);
> @@ -212,7 +212,7 @@ void __hwdom_init iommu_hwdom_init(struct
> domain *d)
>      if ( !is_iommu_enabled(d) )
>          return;
> 
> -    register_keyhandler('o', &iommu_dump_p2m_table, "dump iommu p2m table", 0);
> +    register_keyhandler('o', &iommu_dump_page_tables, "dump iommu page tables", 0);
> 
>      hd->platform_ops->hwdom_init(d);
>  }
> @@ -533,16 +533,12 @@ bool_t iommu_has_feature(struct domain *d,
> enum iommu_feature feature)
>      return is_iommu_enabled(d) && test_bit(feature, dom_iommu(d)->features);
>  }
> 
> -static void iommu_dump_p2m_table(unsigned char key)
> +static void iommu_dump_page_tables(unsigned char key)
>  {
>      struct domain *d;
>      const struct iommu_ops *ops;
> 
> -    if ( !iommu_enabled )
> -    {
> -        printk("IOMMU not enabled!\n");
> -        return;
> -    }
> +    ASSERT(iommu_enabled);
> 
>      ops = iommu_get_ops();
> 
> @@ -553,14 +549,7 @@ static void iommu_dump_p2m_table(unsigned char
> key)
>          if ( is_hardware_domain(d) || !is_iommu_enabled(d) )
>              continue;
> 
> -        if ( iommu_use_hap_pt(d) )
> -        {
> -            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d-
> >domain_id);
> -            continue;
> -        }
> -
> -        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
> -        ops->dump_p2m_table(d);
> +        ops->dump_page_tables(d);
>      }
> 
>      rcu_read_unlock(&domlist_read_lock);
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index a532d9e88c..f8da4fe0e7 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2582,8 +2582,8 @@ static void vtd_resume(void)
>      }
>  }
> 
> -static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t
> gpa,
> -                                     int indent)
> +static void vtd_dump_page_table_level(paddr_t pt_maddr, int level,
> paddr_t gpa,
> +                                      int indent)
>  {
>      paddr_t address;
>      int i;
> @@ -2612,8 +2612,8 @@ static void vtd_dump_p2m_table_level(paddr_t
> pt_maddr, int level, paddr_t gpa,
> 
>          address = gpa + offset_level_address(i, level);
>          if ( next_level >= 1 )
> -            vtd_dump_p2m_table_level(dma_pte_addr(*pte), next_level,
> -                                     address, indent + 1);
> +            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
> +                                      address, indent + 1);
>          else
>              printk("%*sdfn: %08lx mfn: %08lx\n",
>                     indent, "",
> @@ -2624,17 +2624,19 @@ static void vtd_dump_p2m_table_level(paddr_t
> pt_maddr, int level, paddr_t gpa,
>      unmap_vtd_domain_page(pt_vaddr);
>  }
> 
> -static void vtd_dump_p2m_table(struct domain *d)
> +static void vtd_dump_page_tables(struct domain *d)
>  {
> -    const struct domain_iommu *hd;
> +    const struct domain_iommu *hd = dom_iommu(d);
> 
> -    if ( list_empty(&acpi_drhd_units) )
> +    if ( iommu_use_hap_pt(d) )
> +    {
> +        printk("VT-D sharing EPT table\n");
>          return;
> +    }
> 
> -    hd = dom_iommu(d);
> -    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
> -    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
> -                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
> +    printk("VT-D table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
> +    vtd_dump_page_table_level(hd->arch.vtd.pgd_maddr,
> +                              agaw_to_level(hd->arch.vtd.agaw), 0, 0);
>  }
> 
>  static int __init intel_iommu_quarantine_init(struct domain *d)
> @@ -2734,7 +2736,7 @@ static struct iommu_ops __initdata vtd_ops = {
>      .iotlb_flush = iommu_flush_iotlb_pages,
>      .iotlb_flush_all = iommu_flush_iotlb_all,
>      .get_reserved_device_memory =
> intel_iommu_get_reserved_device_memory,
> -    .dump_p2m_table = vtd_dump_p2m_table,
> +    .dump_page_tables = vtd_dump_page_tables,
>  };
> 
>  const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 1f25d2082f..23e884f54b 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -277,7 +277,7 @@ struct iommu_ops {
>                                      unsigned int flush_flags);
>      int __must_check (*iotlb_flush_all)(struct domain *d);
>      int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
> -    void (*dump_p2m_table)(struct domain *d);
> +    void (*dump_page_tables)(struct domain *d);
> 
>  #ifdef CONFIG_HAS_DEVICE_TREE
>      /*
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator
  2020-08-14  6:41   ` Tian, Kevin
@ 2020-08-14  7:16     ` Durrant, Paul
  0 siblings, 0 replies; 43+ messages in thread
From: Durrant, Paul @ 2020-08-14  7:16 UTC (permalink / raw)
  To: Tian, Kevin, Paul Durrant, xen-devel; +Cc: Jan Beulich

> -----Original Message-----
[snip]
> > -static void iommu_free_page_table(struct page_info *pg)
> > -{
> > -    unsigned int i, next_level = PFN_ORDER(pg) - 1;
> > -    u64 pt_maddr = page_to_maddr(pg);
> > -    struct dma_pte *pt_vaddr, *pte;
> > -
> > -    PFN_ORDER(pg) = 0;
> > -    pt_vaddr = (struct dma_pte *)map_vtd_domain_page(pt_maddr);
> > -
> > -    for ( i = 0; i < PTE_NUM; i++ )
> > -    {
> > -        pte = &pt_vaddr[i];
> > -        if ( !dma_pte_present(*pte) )
> > -            continue;
> > -
> > -        if ( next_level >= 1 )
> > -            iommu_free_pagetable(dma_pte_addr(*pte), next_level);
> > -
> > -        dma_clear_pte(*pte);
> > -        iommu_sync_cache(pte, sizeof(struct dma_pte));
> 
> I didn't see sync_cache in the new iommu_free_pgtables. Is it intended
> (i.e. original flush is meaningless) or overlooked?
> 

The original v1 combined patch had the comment:

NOTE: There is no need to clear and sync PTEs during teardown since the per-
      device root entries will have already been cleared (when devices were
      de-assigned) so the page tables can no longer be accessed by the IOMMU.

I should have included that note in this one. I'll fix in v5.
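
For illustration only (a hypothetical sketch, not the actual code in the series, and it
assumes the common allocator keeps its page-table pages on a per-domain list), the
teardown path that note refers to then boils down to:

/*
 * Hypothetical sketch only; structure and field names are illustrative.
 * No dma_clear_pte()/iommu_sync_cache() is needed here because the
 * per-device root/context entries were already cleared when the devices
 * were de-assigned, so the IOMMU can no longer walk these tables.
 */
static void example_free_pgtables(struct domain_iommu *hd)
{
    struct page_info *pg;

    while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) != NULL )
        free_domheap_page(pg);
}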

  Paul

> Thanks
> Kevin


^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 12/14] vtd: use a bit field for root_entry
  2020-08-04 13:42 ` [PATCH v4 12/14] vtd: use a bit field for root_entry Paul Durrant
  2020-08-06 12:34   ` Jan Beulich
@ 2020-08-14  7:17   ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  7:17 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant

> From: Paul Durrant <paul@xen.org>
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This makes the code a little easier to read and also makes it more consistent
> with iremap_entry.

I feel the original readability is slightly better, as ctp is less obvious than
set_root_value, get_context_addr, etc.
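
e.g. (purely illustrative) the existing names could simply wrap the new field:

/* Illustrative only: familiar accessor names on top of the bit field. */
#define get_context_addr(root)      ((uint64_t)(root).ctp << PAGE_SHIFT_4K)
#define set_root_value(root, value) \
    do { (root).ctp = (value) >> PAGE_SHIFT_4K; } while (0)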

Thanks
Kevin

> 
> Also take the opportunity to tidy up the implementation of
> device_in_domain().
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v4:
>  - New in v4
> ---
>  xen/drivers/passthrough/vtd/iommu.c   |  4 ++--
>  xen/drivers/passthrough/vtd/iommu.h   | 33 ++++++++++++++++-----------
>  xen/drivers/passthrough/vtd/utils.c   |  4 ++--
>  xen/drivers/passthrough/vtd/x86/ats.c | 27 ++++++++++++----------
>  4 files changed, 39 insertions(+), 29 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index f8da4fe0e7..76025f6ccd 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -245,11 +245,11 @@ static u64 bus_to_context_maddr(struct
> vtd_iommu *iommu, u8 bus)
>              unmap_vtd_domain_page(root_entries);
>              return 0;
>          }
> -        set_root_value(*root, maddr);
> +        set_root_ctp(*root, maddr);
>          set_root_present(*root);
>          iommu_sync_cache(root, sizeof(struct root_entry));
>      }
> -    maddr = (u64) get_context_addr(*root);
> +    maddr = root_ctp(*root);
>      unmap_vtd_domain_page(root_entries);
>      return maddr;
>  }
> diff --git a/xen/drivers/passthrough/vtd/iommu.h
> b/xen/drivers/passthrough/vtd/iommu.h
> index 216791b3d6..031ac5f66c 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -184,21 +184,28 @@
>  #define dma_frcd_source_id(c) (c & 0xffff)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
> 
> -/*
> - * 0: Present
> - * 1-11: Reserved
> - * 12-63: Context Ptr (12 - (haw-1))
> - * 64-127: Reserved
> - */
>  struct root_entry {
> -    u64    val;
> -    u64    rsvd1;
> +    union {
> +        __uint128_t val;
> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            uint64_t p:1;
> +            uint64_t reserved0:11;
> +            uint64_t ctp:52;
> +
> +            /* 64 - 127 */
> +            uint64_t reserved1;
> +        };
> +    };
>  };
> -#define root_present(root)    ((root).val & 1)
> -#define set_root_present(root) do {(root).val |= 1;} while(0)
> -#define get_context_addr(root) ((root).val & PAGE_MASK_4K)
> -#define set_root_value(root, value) \
> -    do {(root).val |= ((value) & PAGE_MASK_4K);} while(0)
> +
> +#define root_present(r) (r).p
> +#define set_root_present(r) do { (r).p = 1; } while (0)
> +
> +#define root_ctp(r) ((r).ctp << PAGE_SHIFT_4K)
> +#define set_root_ctp(r, val) \
> +    do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
> 
>  struct context_entry {
>      u64 lo;
> diff --git a/xen/drivers/passthrough/vtd/utils.c
> b/xen/drivers/passthrough/vtd/utils.c
> index 4febcf506d..4c85242894 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -112,7 +112,7 @@ void print_vtd_entries(struct vtd_iommu *iommu, int
> bus, int devfn, u64 gmfn)
>          return;
>      }
> 
> -    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].val);
> +    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].lo);
>      if ( !root_present(root_entry[bus]) )
>      {
>          unmap_vtd_domain_page(root_entry);
> @@ -120,7 +120,7 @@ void print_vtd_entries(struct vtd_iommu *iommu, int
> bus, int devfn, u64 gmfn)
>          return;
>      }
> 
> -    val = root_entry[bus].val;
> +    val = root_ctp(root_entry[bus]);
>      unmap_vtd_domain_page(root_entry);
>      ctxt_entry = map_vtd_domain_page(val);
>      if ( ctxt_entry == NULL )
> diff --git a/xen/drivers/passthrough/vtd/x86/ats.c
> b/xen/drivers/passthrough/vtd/x86/ats.c
> index 04d702b1d6..8369415dcc 100644
> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct
> acpi_drhd_unit *drhd)
>  static bool device_in_domain(const struct vtd_iommu *iommu,
>                               const struct pci_dev *pdev, uint16_t did)
>  {
> -    struct root_entry *root_entry;
> -    struct context_entry *ctxt_entry = NULL;
> +    struct root_entry *root_entry, *root_entries = NULL;
> +    struct context_entry *context_entry, *context_entries = NULL;
>      unsigned int tt;
>      bool found = false;
> 
> @@ -85,25 +85,28 @@ static bool device_in_domain(const struct
> vtd_iommu *iommu,
>          return false;
>      }
> 
> -    root_entry = map_vtd_domain_page(iommu->root_maddr);
> -    if ( !root_present(root_entry[pdev->bus]) )
> +    root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
> +    root_entry = &root_entries[pdev->bus];
> +    if ( !root_present(*root_entry) )
>          goto out;
> 
> -    ctxt_entry = map_vtd_domain_page(root_entry[pdev->bus].val);
> -    if ( context_domain_id(ctxt_entry[pdev->devfn]) != did )
> +    context_entries = map_vtd_domain_page(root_ctp(*root_entry));
> +    context_entry = &context_entries[pdev->devfn];
> +    if ( context_domain_id(*context_entry) != did )
>          goto out;
> 
> -    tt = context_translation_type(ctxt_entry[pdev->devfn]);
> +    tt = context_translation_type(*context_entry);
>      if ( tt != CONTEXT_TT_DEV_IOTLB )
>          goto out;
> 
>      found = true;
> -out:
> -    if ( root_entry )
> -        unmap_vtd_domain_page(root_entry);
> 
> -    if ( ctxt_entry )
> -        unmap_vtd_domain_page(ctxt_entry);
> + out:
> +    if ( root_entries )
> +        unmap_vtd_domain_page(root_entries);
> +
> +    if ( context_entries )
> +        unmap_vtd_domain_page(context_entries);
> 
>      return found;
>  }
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
  2020-08-14  6:53   ` Tian, Kevin
@ 2020-08-14  7:19     ` Durrant, Paul
  0 siblings, 0 replies; 43+ messages in thread
From: Durrant, Paul @ 2020-08-14  7:19 UTC (permalink / raw)
  To: Tian, Kevin, Paul Durrant, xen-devel; +Cc: Jan Beulich

> -----Original Message-----
> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: 14 August 2020 07:53
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Jan Beulich <jbeulich@suse.com>
> Subject: RE: [EXTERNAL] [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> > From: Paul Durrant
> > Sent: Tuesday, August 4, 2020 9:42 PM
> >
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > This patch adds a full I/O TLB flush to the error paths of iommu_map() and
> > iommu_unmap().
> >
> > Without this change callers need constructs such as:
> >
> > rc = iommu_map/unmap(...)
> > err = iommu_flush(...)
> > if ( !rc )
> >   rc = err;
> >
> > With this change, it can be simplified to:
> >
> > rc = iommu_map/unmap(...)
> > if ( !rc )
> >   rc = iommu_flush(...)
> >
> > because, if the map or unmap fails the flush will be unnecessary. This saves
> 
> this statement is different from change in iommu_map...
> 
> > a stack variable and generally makes the call sites tidier.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> >
> > v2:
> >  - New in v2
> > ---
> >  xen/drivers/passthrough/iommu.c | 28 ++++++++++++----------------
> >  1 file changed, 12 insertions(+), 16 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/iommu.c
> > b/xen/drivers/passthrough/iommu.c
> > index 660dc5deb2..e2c0193a09 100644
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -274,6 +274,10 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t
> > mfn,
> >          break;
> >      }
> >
> > +    /* Something went wrong so flush everything and clear flush flags */
> > +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> > +        flush_flags = 0;
> > +
> 
> ... earlier you said flush is unnecessary if map fails. But here actually you
> still need to flush everything so it's just sort of moving error-path flush
> within the map function?

Yes, that's actually what's happening. The language in the comment is ambiguous I guess. I'll modify it to say

"because, if the map or unmap fails an explicit flush will be unnecessary."

Hopefully that is clearer.
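
In other words, a caller then reduces to something like the sketch below (illustrative
only, using the signatures from this series; the function name is made up):

static int example_map_and_flush(struct domain *d, dfn_t dfn, mfn_t mfn,
                                 unsigned int page_order, unsigned int flags)
{
    unsigned int flush_flags = 0;
    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);

    /*
     * If iommu_map() failed it has already flushed everything itself,
     * so only the success path needs an explicit flush here.
     */
    if ( !rc )
        rc = iommu_iotlb_flush(d, dfn, 1u << page_order, flush_flags);

    return rc;
}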

  Paul

> 
> Thanks
> Kevin
> 
> >      return rc;
> >  }
> >
> > @@ -283,14 +287,8 @@ int iommu_legacy_map(struct domain *d, dfn_t dfn,
> > mfn_t mfn,
> >      unsigned int flush_flags = 0;
> >      int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);
> >
> > -    if ( !this_cpu(iommu_dont_flush_iotlb) )
> > -    {
> > -        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
> > -                                    flush_flags);
> > -
> > -        if ( !rc )
> > -            rc = err;
> > -    }
> > +    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
> > +        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
> >
> >      return rc;
> >  }
> > @@ -330,6 +328,10 @@ int iommu_unmap(struct domain *d, dfn_t dfn,
> > unsigned int page_order,
> >          }
> >      }
> >
> > +    /* Something went wrong so flush everything and clear flush flags */
> > +    if ( unlikely(rc) && iommu_iotlb_flush_all(d, *flush_flags) )
> > +        flush_flags = 0;
> > +
> >      return rc;
> >  }
> >
> > @@ -338,14 +340,8 @@ int iommu_legacy_unmap(struct domain *d, dfn_t
> > dfn, unsigned int page_order)
> >      unsigned int flush_flags = 0;
> >      int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
> >
> > -    if ( !this_cpu(iommu_dont_flush_iotlb) )
> > -    {
> > -        int err = iommu_iotlb_flush(d, dfn, (1u << page_order),
> > -                                    flush_flags);
> > -
> > -        if ( !rc )
> > -            rc = err;
> > -    }
> > +    if ( !this_cpu(iommu_dont_flush_iotlb) && ! rc )
> > +        rc = iommu_iotlb_flush(d, dfn, (1u << page_order), flush_flags);
> >
> >      return rc;
> >  }
> > --
> > 2.20.1
> >



^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v4 13/14] vtd: use a bit field for context_entry
  2020-08-04 13:42 ` [PATCH v4 13/14] vtd: use a bit field for context_entry Paul Durrant
  2020-08-06 12:46   ` Jan Beulich
@ 2020-08-14  7:19   ` Tian, Kevin
  1 sibling, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2020-08-14  7:19 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant

> From: Paul Durrant <paul@xen.org>
> Sent: Tuesday, August 4, 2020 9:42 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This removes the need for much shifting, masking and several magic
> numbers.
> On the whole it makes the code quite a bit more readable.

similarly, I feel the readability is worse in places, e.g. 'slptp'. We may use a
bit field to define the structure, but the existing accessor names could be kept
as they are...
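
Something like the below, for instance (purely illustrative; names taken from the
quoted hunk):

/* Illustrative only: keep the existing accessor names as thin wrappers
 * over the new bit-field members. */
#define context_address_root(c)      ((uint64_t)(c).slptp << PAGE_SHIFT_4K)
#define context_set_address_root(c, val) \
    do { (c).slptp = (val) >> PAGE_SHIFT_4K; } while (0)
#define context_domain_id(c)         (c).did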

Thanks
kevin

> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v4:
>  - New in v4
> ---
>  xen/drivers/passthrough/vtd/iommu.c   |  8 ++--
>  xen/drivers/passthrough/vtd/iommu.h   | 65 +++++++++++++++++----------
>  xen/drivers/passthrough/vtd/utils.c   |  6 +--
>  xen/drivers/passthrough/vtd/x86/ats.c |  2 +-
>  4 files changed, 49 insertions(+), 32 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 76025f6ccd..766d33058e 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -86,8 +86,6 @@ static int domain_iommu_domid(struct domain *d,
>      return -1;
>  }
> 
> -#define DID_FIELD_WIDTH 16
> -#define DID_HIGH_OFFSET 8
>  static int context_set_domain_id(struct context_entry *context,
>                                   struct domain *d,
>                                   struct vtd_iommu *iommu)
> @@ -121,7 +119,7 @@ static int context_set_domain_id(struct
> context_entry *context,
>      }
> 
>      set_bit(i, iommu->domid_bitmap);
> -    context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET;
> +    context_set_did(*context, i);
>      return 0;
>  }
> 
> @@ -135,7 +133,7 @@ static int context_get_domain_id(struct
> context_entry *context,
>      {
>          nr_dom = cap_ndoms(iommu->cap);
> 
> -        dom_index = context_domain_id(*context);
> +        dom_index = context_did(*context);
> 
>          if ( dom_index < nr_dom && iommu->domid_map )
>              domid = iommu->domid_map[dom_index];
> @@ -1396,7 +1394,7 @@ int domain_context_mapping_one(
>              return -ENOMEM;
>          }
> 
> -        context_set_address_root(*context, pgd_maddr);
> +        context_set_slptp(*context, pgd_maddr);
>          if ( ats_enabled && ecap_dev_iotlb(iommu->ecap) )
>              context_set_translation_type(*context, CONTEXT_TT_DEV_IOTLB);
>          else
> diff --git a/xen/drivers/passthrough/vtd/iommu.h
> b/xen/drivers/passthrough/vtd/iommu.h
> index 031ac5f66c..509d13918a 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -199,6 +199,7 @@ struct root_entry {
>          };
>      };
>  };
> +#define ROOT_ENTRY_NR (PAGE_SIZE_4K / sizeof(struct root_entry))
> 
>  #define root_present(r) (r).p
>  #define set_root_present(r) do { (r).p = 1; } while (0)
> @@ -208,35 +209,53 @@ struct root_entry {
>      do { (r).ctp = ((val) >> PAGE_SHIFT_4K); } while (0)
> 
>  struct context_entry {
> -    u64 lo;
> -    u64 hi;
> +    union {
> +        __uint128_t val;
> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            uint64_t p:1;
> +            uint64_t fpd:1;
> +            uint64_t tt:2;
> +            uint64_t reserved0:8;
> +            uint64_t slptp:52;
> +
> +            /* 64 - 127 */
> +            uint64_t aw:3;
> +            uint64_t ignored:4;
> +            uint64_t reserved1:1;
> +            uint64_t did:16;
> +            uint64_t reserved2:40;
> +        };
> +    };
>  };
> -#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
> -#define context_present(c) ((c).lo & 1)
> -#define context_fault_disable(c) (((c).lo >> 1) & 1)
> -#define context_translation_type(c) (((c).lo >> 2) & 3)
> -#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
> -#define context_address_width(c) ((c).hi &  7)
> -#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
> -
> -#define context_set_present(c) do {(c).lo |= 1;} while(0)
> -#define context_clear_present(c) do {(c).lo &= ~1;} while(0)
> -#define context_set_fault_enable(c) \
> -    do {(c).lo &= (((u64)-1) << 2) | 1;} while(0)
> -
> -#define context_set_translation_type(c, val) do { \
> -        (c).lo &= (((u64)-1) << 4) | 3; \
> -        (c).lo |= (val & 3) << 2; \
> -    } while(0)
> +
> +#define context_present(c) (c).p
> +#define context_set_present(c) do { (c).p = 1; } while (0)
> +#define context_clear_present(c) do { (c).p = 0; } while (0)
> +
> +#define context_fault_disable(c) (c).fpd
> +#define context_set_fault_enable(c) do { (c).fpd = 1; } while (0)
> +
> +#define context_translation_type(c) (c).tt
> +#define context_set_translation_type(c, val) do { (c).tt = val; } while (0)
>  #define CONTEXT_TT_MULTI_LEVEL 0
>  #define CONTEXT_TT_DEV_IOTLB   1
>  #define CONTEXT_TT_PASS_THRU   2
> 
> -#define context_set_address_root(c, val) \
> -    do {(c).lo &= 0xfff; (c).lo |= (val) & PAGE_MASK_4K ;} while(0)
> +#define context_slptp(c) ((c).slptp << PAGE_SHIFT_4K)
> +#define context_set_slptp(c, val) \
> +    do { (c).slptp = (val) >> PAGE_SHIFT_4K; } while (0)
> +
> +#define context_address_width(c) (c).aw
>  #define context_set_address_width(c, val) \
> -    do {(c).hi &= 0xfffffff8; (c).hi |= (val) & 7;} while(0)
> -#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while(0)
> +    do { (c).aw = (val); } while (0)
> +
> +#define context_did(c) (c).did
> +#define context_set_did(c, val) \
> +    do { (c).did = (val); } while (0)
> +
> +#define context_clear_entry(c) do { (c).val = 0; } while (0)
> 
>  /* page table handling */
>  #define LEVEL_STRIDE       (9)
> diff --git a/xen/drivers/passthrough/vtd/utils.c
> b/xen/drivers/passthrough/vtd/utils.c
> index 4c85242894..eae0c43269 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -129,9 +129,8 @@ void print_vtd_entries(struct vtd_iommu *iommu, int
> bus, int devfn, u64 gmfn)
>          return;
>      }
> 
> -    val = ctxt_entry[devfn].lo;
> -    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n",
> -           devfn, ctxt_entry[devfn].hi, val);
> +    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n", devfn,
> +           ctxt_entry[devfn].hi, ctxt_entry[devfn].lo);
>      if ( !context_present(ctxt_entry[devfn]) )
>      {
>          unmap_vtd_domain_page(ctxt_entry);
> @@ -140,6 +139,7 @@ void print_vtd_entries(struct vtd_iommu *iommu, int
> bus, int devfn, u64 gmfn)
>      }
> 
>      level = agaw_to_level(context_address_width(ctxt_entry[devfn]));
> +    val = context_slptp(ctxt_entry[devfn]);
>      unmap_vtd_domain_page(ctxt_entry);
>      if ( level != VTD_PAGE_TABLE_LEVEL_3 &&
>           level != VTD_PAGE_TABLE_LEVEL_4)
> diff --git a/xen/drivers/passthrough/vtd/x86/ats.c
> b/xen/drivers/passthrough/vtd/x86/ats.c
> index 8369415dcc..a7bbd3198a 100644
> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -92,7 +92,7 @@ static bool device_in_domain(const struct vtd_iommu
> *iommu,
> 
>      context_entries = map_vtd_domain_page(root_ctp(*root_entry));
>      context_entry = &context_entries[pdev->devfn];
> -    if ( context_domain_id(*context_entry) != did )
> +    if ( context_did(*context_entry) != did )
>          goto out;
> 
>      tt = context_translation_type(*context_entry);
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v4 12/14] vtd: use a bit field for root_entry
  2020-08-12 13:13     ` Durrant, Paul
@ 2020-08-18  8:27       ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2020-08-18  8:27 UTC (permalink / raw)
  To: Durrant, Paul, Paul Durrant; +Cc: xen-devel, Kevin Tian

On 12.08.2020 15:13, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 06 August 2020 13:34
>>
>> On 04.08.2020 15:42, Paul Durrant wrote:
>>> --- a/xen/drivers/passthrough/vtd/iommu.h
>>> +++ b/xen/drivers/passthrough/vtd/iommu.h
>>> @@ -184,21 +184,28 @@
>>>  #define dma_frcd_source_id(c) (c & 0xffff)
>>>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>>>
>>> -/*
>>> - * 0: Present
>>> - * 1-11: Reserved
>>> - * 12-63: Context Ptr (12 - (haw-1))
>>> - * 64-127: Reserved
>>> - */
>>>  struct root_entry {
>>> -    u64    val;
>>> -    u64    rsvd1;
>>> +    union {
>>> +        __uint128_t val;
>>
>> I couldn't find a use of this field, and I also can't foresee any.
>> Could it be left out?
> 
> Yes, probably.
> 
>>
>>> +        struct { uint64_t lo, hi; };
>>> +        struct {
>>> +            /* 0 - 63 */
>>> +            uint64_t p:1;
>>
>> bool?
>>
> 
> I'd prefer not to. One of the points of using a bit field (at least from my PoV) is that it makes referring back to the spec. much easier, by using uint64_t types consistently and hence using bit widths that can be straightforwardly summed to give the bit offsets stated in the spec.

We've gone the suggested route for earlier struct conversions on
the AMD side, so I think we should follow suit here. See e.g.
struct amd_iommu_dte or union amd_iommu_control.
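
For comparison, the two styles look roughly like this (purely illustrative, with the
widths taken from the quoted hunk; the AMD variant differs only in how single-bit
fields are declared, and the real layouts/names are of course in the headers):

/* Style used in this patch: uniform uint64_t widths summing to the
 * spec's bit offsets. */
struct root_entry_example_a {
    uint64_t p:1;
    uint64_t reserved0:11;
    uint64_t ctp:52;
    uint64_t reserved1;
};

/* Style used for the AMD conversions (cf. struct amd_iommu_dte):
 * single-bit flags declared as bool. */
struct root_entry_example_b {
    bool p:1;
    uint64_t reserved0:11;
    uint64_t ctp:52;
    uint64_t reserved1;
};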

>>> --- a/xen/drivers/passthrough/vtd/x86/ats.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
>>> @@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
>>>  static bool device_in_domain(const struct vtd_iommu *iommu,
>>>                               const struct pci_dev *pdev, uint16_t did)
>>>  {
>>> -    struct root_entry *root_entry;
>>> -    struct context_entry *ctxt_entry = NULL;
>>> +    struct root_entry *root_entry, *root_entries = NULL;
>>> +    struct context_entry *context_entry, *context_entries = NULL;
>>
>> Just like root_entry, root_entries doesn't look to need an initializer.
>> I'm unconvinced anyway that you now need two variables each:
>> unmap_vtd_domain_page() does quite fine with the low 12 bits not all
>> being zero, afaict.
> 
> Not passing a page aligned address into something that unmaps a page seems a little bit fragile though, e.g. if someone happened to add a check in future.

There are quite a few existing callers passing a non-page-aligned
address into unmap_domain_page(). I don't see why having one more
instance would cause any kind of issue.
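
For instance (sketch only; error handling and the context-entry lookup are elided,
and the function is made up), the single-variable form would be:

static bool example_root_present(const struct vtd_iommu *iommu, uint8_t bus)
{
    struct root_entry *root_entry = map_vtd_domain_page(iommu->root_maddr);
    bool present = root_present(root_entry[bus]);

    /* The low 12 bits of the pointer may be non-zero; that's fine. */
    unmap_vtd_domain_page(&root_entry[bus]);

    return present;
}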

Jan


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-08-18  8:27 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04 13:41 [PATCH v4 00/14] IOMMU cleanup Paul Durrant
2020-08-04 13:41 ` [PATCH v4 01/14] x86/iommu: re-arrange arch_iommu to separate common fields Paul Durrant
2020-08-14  6:14   ` Tian, Kevin
2020-08-04 13:41 ` [PATCH v4 02/14] x86/iommu: add common page-table allocator Paul Durrant
2020-08-05 15:39   ` Jan Beulich
2020-08-04 13:41 ` [PATCH v4 03/14] x86/iommu: convert VT-d code to use new page table allocator Paul Durrant
2020-08-14  6:41   ` Tian, Kevin
2020-08-14  7:16     ` Durrant, Paul
2020-08-04 13:41 ` [PATCH v4 04/14] x86/iommu: convert AMD IOMMU " Paul Durrant
2020-08-04 13:42 ` [PATCH v4 05/14] iommu: remove unused iommu_ops method and tasklet Paul Durrant
2020-08-04 13:42 ` [PATCH v4 06/14] iommu: flush I/O TLB if iommu_map() or iommu_unmap() fail Paul Durrant
2020-08-05 16:06   ` Jan Beulich
2020-08-05 16:18     ` Paul Durrant
2020-08-06 11:41   ` Jan Beulich
2020-08-14  6:53   ` Tian, Kevin
2020-08-14  7:19     ` Durrant, Paul
2020-08-04 13:42 ` [PATCH v4 07/14] iommu: make map, unmap and flush all take both an order and a count Paul Durrant
2020-08-06  9:57   ` Jan Beulich
2020-08-11 11:00     ` Durrant, Paul
2020-08-14  6:57     ` Tian, Kevin
2020-08-04 13:42 ` [PATCH v4 08/14] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
2020-08-06 10:28   ` Jan Beulich
2020-08-12  9:36     ` [EXTERNAL] " Paul Durrant
2020-08-04 13:42 ` [PATCH v4 09/14] common/grant_table: batch flush I/O TLB Paul Durrant
2020-08-06 11:49   ` Jan Beulich
2020-08-04 13:42 ` [PATCH v4 10/14] iommu: remove the share_p2m operation Paul Durrant
2020-08-06 12:18   ` Jan Beulich
2020-08-14  7:04   ` Tian, Kevin
2020-08-04 13:42 ` [PATCH v4 11/14] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
2020-08-06 12:23   ` Jan Beulich
2020-08-14  7:12   ` Tian, Kevin
2020-08-04 13:42 ` [PATCH v4 12/14] vtd: use a bit field for root_entry Paul Durrant
2020-08-06 12:34   ` Jan Beulich
2020-08-12 13:13     ` Durrant, Paul
2020-08-18  8:27       ` Jan Beulich
2020-08-14  7:17   ` Tian, Kevin
2020-08-04 13:42 ` [PATCH v4 13/14] vtd: use a bit field for context_entry Paul Durrant
2020-08-06 12:46   ` Jan Beulich
2020-08-12 13:47     ` Paul Durrant
2020-08-14  7:19   ` Tian, Kevin
2020-08-04 13:42 ` [PATCH v4 14/14] vtd: use a bit field for dma_pte Paul Durrant
2020-08-06 12:53   ` Jan Beulich
2020-08-12 13:49     ` Paul Durrant
