* [PATCH v10 0/7] IOMMU cleanup
@ 2020-11-20 13:24 Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Jun Nakajima, Kevin Tian,
	Roger Pau Monné,
	Stefano Stabellini, Wei Liu

From: Paul Durrant <pdurrant@amazon.com>

This is the remainder of the cleanup series deferred until XSA-346 and
XSA-347 were publicly disclosed.

Paul Durrant (7):
  remove remaining uses of iommu_legacy_map/unmap
  common/grant_table: batch flush I/O TLB
  iommu: remove the share_p2m operation
  iommu: stop calling IOMMU page tables 'p2m tables'
  vtd: use a bit field for root_entry
  vtd: use a bit field for context_entry
  vtd: use a bit field for dma_pte

 xen/arch/x86/mm.c                           |  26 ++-
 xen/arch/x86/mm/p2m-ept.c                   |  20 +-
 xen/arch/x86/mm/p2m-pt.c                    |  16 +-
 xen/arch/x86/mm/p2m.c                       |  28 ++-
 xen/arch/x86/x86_64/mm.c                    |  20 +-
 xen/common/grant_table.c                    | 208 ++++++++++++------
 xen/common/memory.c                         |   6 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  20 +-
 xen/drivers/passthrough/iommu.c             |  52 +----
 xen/drivers/passthrough/vtd/extern.h        |   2 +-
 xen/drivers/passthrough/vtd/iommu.c         | 220 +++++++++++---------
 xen/drivers/passthrough/vtd/iommu.h         | 113 ++++------
 xen/drivers/passthrough/vtd/utils.c         |  22 +-
 xen/drivers/passthrough/vtd/x86/ats.c       |  29 +--
 xen/drivers/passthrough/vtd/x86/vtd.c       |   2 +-
 xen/include/xen/iommu.h                     |  26 +--
 16 files changed, 429 insertions(+), 381 deletions(-)
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <jgrall@amazon.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Wei Liu <wl@xen.org>
-- 
2.20.1



* [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-27 14:39   ` Jan Beulich
  2020-11-20 13:24 ` [PATCH v10 2/7] common/grant_table: batch flush I/O TLB Paul Durrant
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Julien Grall, Kevin Tian, Jan Beulich,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Jun Nakajima

From: Paul Durrant <pdurrant@amazon.com>

The 'legacy' functions do implicit flushing, so amend the callers to do the
appropriate flushing explicitly.

Unfortunately, because of the structure of the P2M code, we cannot remove
the per-CPU 'iommu_dont_flush_iotlb' global and the optimization it
facilitates. Checking of this flag is now done only in the relevant callers
of iommu_iotlb_flush(). Also, 'iommu_dont_flush_iotlb' is now declared as
bool (rather than bool_t) and setting/clearing it is no longer pointlessly
gated on is_iommu_enabled() returning true. (Arguably it is also pointless
to gate the call to iommu_iotlb_flush() on that condition, since it is a
no-op in that case, but the if statement allows the scope of a stack
variable to be restricted.)

NOTE: The code in memory_add() now sets 'ret' if iommu_map() or
      iommu_iotlb_flush() fails.
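
For reference, the explicit pattern replacing iommu_legacy_map() is
essentially the body of the now-deleted wrapper (a minimal sketch; error
handling at the individual call sites varies):

    unsigned int flush_flags = 0;
    int rc = iommu_map(d, dfn, mfn, page_count, flags, &flush_flags);

    if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
        rc = iommu_iotlb_flush(d, dfn, page_count, flush_flags);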

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Julien Grall <jgrall@amazon.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>

v10:
 - Re-base

v9:
 - Moved check of 'iommu_dont_flush_iotlb' out of iommu_iotlb_flush() and
   into callers to avoid re-introducing a problem on Arm
 - Dropped Jan's R-b due to change

v6:
 - Fix formatting problem in memory_add()

v5:
 - Re-base
 - Removed failure case on overflow of unsigned int as it is no longer
   necessary

v3:
 - Same as v2; elected to implement batch flushing in the grant table code as
   a subsequent patch

v2:
 - Shorten the diff (mainly because of a prior patch introducing automatic
   flush-on-fail into iommu_map() and iommu_unmap())
---
 xen/arch/x86/mm.c               | 26 +++++++++++++++++++-------
 xen/arch/x86/mm/p2m-ept.c       | 20 ++++++++++++--------
 xen/arch/x86/mm/p2m-pt.c        | 16 ++++++++++++----
 xen/arch/x86/mm/p2m.c           | 25 +++++++++++++++++--------
 xen/arch/x86/x86_64/mm.c        | 20 +++++++-------------
 xen/common/grant_table.c        | 29 ++++++++++++++++++++++-------
 xen/common/memory.c             |  6 +++---
 xen/drivers/passthrough/iommu.c | 23 -----------------------
 xen/include/xen/iommu.h         | 21 +++++----------------
 9 files changed, 97 insertions(+), 89 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5a50339284c7..bb5f504b84e2 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2489,10 +2489,16 @@ static int cleanup_page_mappings(struct page_info *page)
 
         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
-            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), 1u << PAGE_ORDER_4K);
+            unsigned int flush_flags = 0;
+            int err;
+
+            err = iommu_unmap(d, _dfn(mfn), 1ul << PAGE_ORDER_4K, &flush_flags);
+            if ( !err && !this_cpu(iommu_dont_flush_iotlb) )
+                err = iommu_iotlb_flush(d, _dfn(mfn), 1ul << PAGE_ORDER_4K,
+                                        flush_flags);
 
             if ( !rc )
-                rc = rc2;
+                rc = err;
         }
 
         if ( likely(!is_special_page(page)) )
@@ -3014,14 +3020,20 @@ static int _get_page_type(struct page_info *page, unsigned long type,
         if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
         {
             mfn_t mfn = page_to_mfn(page);
+            dfn_t dfn = _dfn(mfn_x(mfn));
+            unsigned int flush_flags = 0;
 
             if ( (x & PGT_type_mask) == PGT_writable_page )
-                rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)),
-                                        1ul << PAGE_ORDER_4K);
+                rc = iommu_unmap(d, dfn, 1ul << PAGE_ORDER_4K, &flush_flags);
             else
-                rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn,
-                                      1ul << PAGE_ORDER_4K,
-                                      IOMMUF_readable | IOMMUF_writable);
+            {
+                rc = iommu_map(d, dfn, mfn, 1ul << PAGE_ORDER_4K,
+                               IOMMUF_readable | IOMMUF_writable, &flush_flags);
+            }
+
+            if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
+                rc = iommu_iotlb_flush(d, dfn, 1ul << PAGE_ORDER_4K,
+                                       flush_flags);
 
             if ( unlikely(rc) )
             {
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 975ab403f235..c04a30eecc65 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -842,15 +842,19 @@ out:
     if ( rc == 0 && p2m_is_hostp2m(p2m) &&
          need_modify_vtd_table )
     {
-        if ( iommu_use_hap_pt(d) && !this_cpu(iommu_dont_flush_iotlb) )
-            rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order,
-                                   (iommu_flags ? IOMMU_FLUSHF_added : 0) |
-                                   (vtd_pte_present ? IOMMU_FLUSHF_modified
-                                                    : 0));
-        else if ( need_iommu_pt_sync(d) )
+        unsigned int flush_flags = 0;
+
+        if ( need_iommu_pt_sync(d) )
             rc = iommu_flags ?
-                iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags) :
-                iommu_legacy_unmap(d, _dfn(gfn), 1ul << order);
+                iommu_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags,
+                          &flush_flags) :
+                iommu_unmap(d, _dfn(gfn), 1ul << order, &flush_flags);
+        else if ( iommu_use_hap_pt(d) )
+            flush_flags = (iommu_flags ? IOMMU_FLUSHF_added : 0) |
+                          (vtd_pte_present ? IOMMU_FLUSHF_modified : 0);
+
+        if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
+            rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order, flush_flags);
     }
 
     unmap_domain_page(table);
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 5fa0d30ce7d2..d8ffc6f7e078 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -741,10 +741,18 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
 
     if ( need_iommu_pt_sync(p2m->domain) &&
          (iommu_old_flags != iommu_pte_flags || old_mfn != mfn_x(mfn)) )
-        rc = iommu_pte_flags
-             ? iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << page_order,
-                                iommu_pte_flags)
-             : iommu_legacy_unmap(d, _dfn(gfn), 1ul << page_order);
+    {
+        unsigned int flush_flags = 0;
+
+        rc = iommu_pte_flags ?
+            iommu_map(d, _dfn(gfn), mfn, 1ul << page_order, iommu_pte_flags,
+                      &flush_flags) :
+            iommu_unmap(d, _dfn(gfn), 1ul << page_order, &flush_flags);
+
+        if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
+            rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << page_order,
+                                   flush_flags);
+    }
 
     /*
      * Free old intermediate tables if necessary.  This has to be the
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d9cc1856bb80..8ee33b25ca72 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1351,11 +1351,15 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn_l,
 
     if ( !paging_mode_translate(p2m->domain) )
     {
-        if ( !is_iommu_enabled(d) )
-            return 0;
-        return iommu_legacy_map(d, _dfn(gfn_l), _mfn(gfn_l),
-                                1ul << PAGE_ORDER_4K,
-                                IOMMUF_readable | IOMMUF_writable);
+        unsigned int flush_flags = 0;
+
+        ret = iommu_map(d, _dfn(gfn_l), _mfn(gfn_l), 1ul << PAGE_ORDER_4K,
+                        IOMMUF_readable | IOMMUF_writable, &flush_flags);
+        if ( !ret )
+            ret = iommu_iotlb_flush(d, _dfn(gfn_l), 1ul << PAGE_ORDER_4K,
+                                    flush_flags);
+
+        return ret;
     }
 
     gfn_lock(p2m, gfn, 0);
@@ -1443,9 +1447,14 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
 
     if ( !paging_mode_translate(d) )
     {
-        if ( !is_iommu_enabled(d) )
-            return 0;
-        return iommu_legacy_unmap(d, _dfn(gfn_l), 1ul << PAGE_ORDER_4K);
+        unsigned int flush_flags = 0;
+
+        ret = iommu_unmap(d, _dfn(gfn_l), 1ul << PAGE_ORDER_4K, &flush_flags);
+        if ( !ret )
+            ret = iommu_iotlb_flush(d, _dfn(gfn_l), 1ul << PAGE_ORDER_4K,
+                                    flush_flags);
+
+        return ret;
     }
 
     gfn_lock(p2m, gfn, 0);
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index bce1561e1a80..7e9d16544915 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1284,21 +1284,15 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
          !iommu_use_hap_pt(hardware_domain) &&
          !need_iommu_pt_sync(hardware_domain) )
     {
-        for ( i = spfn; i < epfn; i++ )
-            if ( iommu_legacy_map(hardware_domain, _dfn(i), _mfn(i),
-                                  1ul << PAGE_ORDER_4K,
-                                  IOMMUF_readable | IOMMUF_writable) )
-                break;
-        if ( i != epfn )
-        {
-            while (i-- > old_max)
-                /* If statement to satisfy __must_check. */
-                if ( iommu_legacy_unmap(hardware_domain, _dfn(i),
-                                        1ul << PAGE_ORDER_4K) )
-                    continue;
+        unsigned int flush_flags = 0;
+        unsigned long n = epfn - spfn;
 
+        ret = iommu_map(hardware_domain, _dfn(spfn), _mfn(spfn), n,
+                        IOMMUF_readable | IOMMUF_writable, &flush_flags);
+        if ( !ret )
+            ret = iommu_iotlb_flush(hardware_domain, _dfn(spfn), n, flush_flags);
+        if ( ret )
             goto destroy_m2p;
-        }
     }
 
     /* We can't revert any more */
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index a5d3ed8bdaac..beb6b2d40d68 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1225,11 +1225,22 @@ map_grant_ref(
             kind = IOMMUF_readable;
         else
             kind = 0;
-        if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 1, kind) )
+
+        if ( kind )
         {
-            double_gt_unlock(lgt, rgt);
-            rc = GNTST_general_error;
-            goto undo_out;
+            dfn_t dfn = _dfn(mfn_x(mfn));
+            unsigned int flush_flags = 0;
+            int err;
+
+            err = iommu_map(ld, dfn, mfn, 1, kind, &flush_flags);
+            if ( !err )
+                err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
+            if ( err )
+            {
+                double_gt_unlock(lgt, rgt);
+                rc = GNTST_general_error;
+                goto undo_out;
+            }
         }
     }
 
@@ -1473,19 +1484,23 @@ unmap_common(
     if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
     {
         unsigned int kind;
+        dfn_t dfn = _dfn(mfn_x(op->mfn));
+        unsigned int flush_flags = 0;
         int err = 0;
 
         double_gt_lock(lgt, rgt);
 
         kind = mapkind(lgt, rd, op->mfn);
         if ( !kind )
-            err = iommu_legacy_unmap(ld, _dfn(mfn_x(op->mfn)), 1);
+            err = iommu_unmap(ld, dfn, 1, &flush_flags);
         else if ( !(kind & MAPKIND_WRITE) )
-            err = iommu_legacy_map(ld, _dfn(mfn_x(op->mfn)), op->mfn, 1,
-                                   IOMMUF_readable);
+            err = iommu_map(ld, dfn, op->mfn, 1, IOMMUF_readable,
+                            &flush_flags);
 
         double_gt_unlock(lgt, rgt);
 
+        if ( !err )
+            err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
         if ( err )
             rc = GNTST_general_error;
     }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index df85b550a1b1..14137c68393c 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -836,8 +836,8 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 
     if ( is_iommu_enabled(d) )
     {
-       this_cpu(iommu_dont_flush_iotlb) = 1;
-       extra.ppage = &pages[0];
+        this_cpu(iommu_dont_flush_iotlb) = true;
+        extra.ppage = &pages[0];
     }
 
     while ( xatp->size > done )
@@ -867,7 +867,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
         int ret;
         unsigned int i;
 
-        this_cpu(iommu_dont_flush_iotlb) = 0;
+        this_cpu(iommu_dont_flush_iotlb) = false;
 
         ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
                                 IOMMU_FLUSHF_modified);
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 87f9a857bbae..a9da4d2b0645 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -282,18 +282,6 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
     return rc;
 }
 
-int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                     unsigned long page_count, unsigned int flags)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_map(d, dfn, mfn, page_count, flags, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
-        rc = iommu_iotlb_flush(d, dfn, page_count, flush_flags);
-
-    return rc;
-}
-
 int iommu_unmap(struct domain *d, dfn_t dfn, unsigned long page_count,
                 unsigned int *flush_flags)
 {
@@ -338,17 +326,6 @@ int iommu_unmap(struct domain *d, dfn_t dfn, unsigned long page_count,
     return rc;
 }
 
-int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned long page_count)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_unmap(d, dfn, page_count, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
-        rc = iommu_iotlb_flush(d, dfn, page_count, flush_flags);
-
-    return rc;
-}
-
 int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
                       unsigned int *flags)
 {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 191021870fed..244a11b9b494 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -151,16 +151,8 @@ int __must_check iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
 int __must_check iommu_unmap(struct domain *d, dfn_t dfn,
                              unsigned long page_count,
                              unsigned int *flush_flags);
-
-int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                                  unsigned long page_count,
-                                  unsigned int flags);
-int __must_check iommu_legacy_unmap(struct domain *d, dfn_t dfn,
-                                    unsigned long page_count);
-
 int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
                                    unsigned int *flags);
-
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                                    unsigned long page_count,
                                    unsigned int flush_flags);
@@ -368,15 +360,12 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
- * avoid unecessary iotlb_flush in the low level IOMMU code.
- *
- * iommu_map_page/iommu_unmap_page must flush the iotlb but somethimes
- * this operation can be really expensive. This flag will be set by the
- * caller to notify the low level IOMMU code to avoid the iotlb flushes.
- * iommu_iotlb_flush/iommu_iotlb_flush_all will be explicitly called by
- * the caller.
+ * avoid unnecessary IOMMU flushing while updating the P2M.
+ * Setting the value to true will cause iommu_iotlb_flush() to return without
+ * actually performing a flush. A batch flush must therefore be done by the
+ * calling code after setting the value back to false.
  */
-DECLARE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
+DECLARE_PER_CPU(bool, iommu_dont_flush_iotlb);
 
 extern struct spinlock iommu_pt_cleanup_lock;
 extern struct page_list_head iommu_pt_cleanup_list;
-- 
2.20.1



* [PATCH v10 2/7] common/grant_table: batch flush I/O TLB
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 3/7] iommu: remove the share_p2m operation Paul Durrant
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Jan Beulich, Julien Grall, Wei Liu, Andrew Cooper,
	George Dunlap, Ian Jackson, Stefano Stabellini

From: Paul Durrant <pdurrant@amazon.com>

This patch avoids calling iommu_iotlb_flush() for each individual GNTTABOP and
instead calls iommu_iotlb_flush_all() at the end of a batch. This should mean
non-singleton map/unmap operations perform better.

NOTE: A batch is the number of operations done before a preemption check and,
      in the case of unmap, a TLB flush.
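
As a rough sketch (guest copying and error handling elided), each batch in
gnttab_map_grant_ref() accumulates flush flags and issues a single flush at
its end, with per-op flushing retained for singleton batches:

    unsigned int c = min_t(unsigned int, count, GNTTAB_MAP_BATCH_SIZE);
    unsigned int i, flush_flags = 0;

    for ( i = 0; i < c; i++ )
        map_grant_ref(&op, c == 1, &flush_flags);

    if ( c > 1 )
        rc = iommu_iotlb_flush_all(current->domain, flush_flags);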

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Wei Liu <wl@xen.org>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>

v6:
 - Fix spelling of 'preemption'
 - Drop unneeded 'currd' stack variable

v5:
 - Add batching to gnttab_map_grant_ref() to handle flushing before the
   preemption check
 - Maintain per-op flushing in the case of singletons

v3:
 - New in v3
---
 xen/common/grant_table.c | 199 ++++++++++++++++++++++++++-------------
 1 file changed, 133 insertions(+), 66 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index beb6b2d40d68..1e3d7a2d33cb 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -241,7 +241,13 @@ struct gnttab_unmap_common {
     grant_ref_t ref;
 };
 
-/* Number of unmap operations that are done between each tlb flush */
+/* Number of map operations that are done between each preemption check */
+#define GNTTAB_MAP_BATCH_SIZE 32
+
+/*
+ * Number of unmap operations that are done between each tlb flush and
+ * preemption check.
+ */
 #define GNTTAB_UNMAP_BATCH_SIZE 32
 
 
@@ -979,7 +985,7 @@ static unsigned int mapkind(
 
 static void
 map_grant_ref(
-    struct gnttab_map_grant_ref *op)
+    struct gnttab_map_grant_ref *op, bool do_flush, unsigned int *flush_flags)
 {
     struct domain *ld, *rd, *owner = NULL;
     struct grant_table *lgt, *rgt;
@@ -1229,12 +1235,11 @@ map_grant_ref(
         if ( kind )
         {
             dfn_t dfn = _dfn(mfn_x(mfn));
-            unsigned int flush_flags = 0;
             int err;
 
-            err = iommu_map(ld, dfn, mfn, 1, kind, &flush_flags);
-            if ( !err )
-                err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
+            err = iommu_map(ld, dfn, mfn, 1, kind, flush_flags);
+            if ( do_flush && !err )
+                err = iommu_iotlb_flush(ld, dfn, 1, *flush_flags);
             if ( err )
             {
                 double_gt_unlock(lgt, rgt);
@@ -1319,29 +1324,59 @@ static long
 gnttab_map_grant_ref(
     XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) uop, unsigned int count)
 {
-    int i;
-    struct gnttab_map_grant_ref op;
+    unsigned int done = 0;
+    int rc = 0;
 
-    for ( i = 0; i < count; i++ )
+    while ( count )
     {
-        if ( i && hypercall_preempt_check() )
-            return i;
+        unsigned int i, c = min_t(unsigned int, count, GNTTAB_MAP_BATCH_SIZE);
+        unsigned int flush_flags = 0;
 
-        if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
-            return -EFAULT;
+        for ( i = 0; i < c; i++ )
+        {
+            struct gnttab_map_grant_ref op;
 
-        map_grant_ref(&op);
+            if ( unlikely(__copy_from_guest(&op, uop, 1)) )
+            {
+                rc = -EFAULT;
+                break;
+            }
 
-        if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
-            return -EFAULT;
+            map_grant_ref(&op, c == 1, &flush_flags);
+
+            if ( unlikely(__copy_to_guest(uop, &op, 1)) )
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            guest_handle_add_offset(uop, 1);
+        }
+
+        if ( c > 1 )
+        {
+            int err = iommu_iotlb_flush_all(current->domain, flush_flags);
+
+            if ( !rc )
+                rc = err;
+        }
+
+        if ( rc )
+            break;
+
+        count -= c;
+        done += c;
+
+        if ( count && hypercall_preempt_check() )
+            return done;
     }
 
-    return 0;
+    return rc;
 }
 
 static void
 unmap_common(
-    struct gnttab_unmap_common *op)
+    struct gnttab_unmap_common *op, bool do_flush, unsigned int *flush_flags)
 {
     domid_t          dom;
     struct domain   *ld, *rd;
@@ -1485,22 +1520,20 @@ unmap_common(
     {
         unsigned int kind;
         dfn_t dfn = _dfn(mfn_x(op->mfn));
-        unsigned int flush_flags = 0;
         int err = 0;
 
         double_gt_lock(lgt, rgt);
 
         kind = mapkind(lgt, rd, op->mfn);
         if ( !kind )
-            err = iommu_unmap(ld, dfn, 1, &flush_flags);
+            err = iommu_unmap(ld, dfn, 1, flush_flags);
         else if ( !(kind & MAPKIND_WRITE) )
-            err = iommu_map(ld, dfn, op->mfn, 1, IOMMUF_readable,
-                            &flush_flags);
+            err = iommu_map(ld, dfn, op->mfn, 1, IOMMUF_readable, flush_flags);
 
         double_gt_unlock(lgt, rgt);
 
-        if ( !err )
-            err = iommu_iotlb_flush(ld, dfn, 1, flush_flags);
+        if ( do_flush && !err )
+            err = iommu_iotlb_flush(ld, dfn, 1, *flush_flags);
         if ( err )
             rc = GNTST_general_error;
     }
@@ -1599,8 +1632,8 @@ unmap_common_complete(struct gnttab_unmap_common *op)
 
 static void
 unmap_grant_ref(
-    struct gnttab_unmap_grant_ref *op,
-    struct gnttab_unmap_common *common)
+    struct gnttab_unmap_grant_ref *op, struct gnttab_unmap_common *common,
+    bool do_flush, unsigned int *flush_flags)
 {
     common->host_addr = op->host_addr;
     common->dev_bus_addr = op->dev_bus_addr;
@@ -1612,7 +1645,7 @@ unmap_grant_ref(
     common->rd = NULL;
     common->mfn = INVALID_MFN;
 
-    unmap_common(common);
+    unmap_common(common, do_flush, flush_flags);
     op->status = common->status;
 }
 
@@ -1621,31 +1654,55 @@ static long
 gnttab_unmap_grant_ref(
     XEN_GUEST_HANDLE_PARAM(gnttab_unmap_grant_ref_t) uop, unsigned int count)
 {
-    int i, c, partial_done, done = 0;
-    struct gnttab_unmap_grant_ref op;
-    struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
+    struct domain *currd = current->domain;
+    unsigned int done = 0;
+    int rc = 0;
 
-    while ( count != 0 )
+    while ( count )
     {
-        c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
-        partial_done = 0;
+        struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
+        unsigned int i, c, partial_done = 0;
+        unsigned int flush_flags = 0;
+
+        c = min_t(unsigned int, count, GNTTAB_UNMAP_BATCH_SIZE);
 
         for ( i = 0; i < c; i++ )
         {
+            struct gnttab_unmap_grant_ref op;
+
             if ( unlikely(__copy_from_guest(&op, uop, 1)) )
-                goto fault;
-            unmap_grant_ref(&op, &common[i]);
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            unmap_grant_ref(&op, &common[i], c == 1, &flush_flags);
             ++partial_done;
+
             if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
-                goto fault;
+            {
+                rc = -EFAULT;
+                break;
+            }
+
             guest_handle_add_offset(uop, 1);
         }
 
-        gnttab_flush_tlb(current->domain);
+        gnttab_flush_tlb(currd);
+        if ( c > 1 )
+        {
+            int err = iommu_iotlb_flush_all(currd, flush_flags);
+
+            if ( !rc )
+                rc = err;
+        }
 
         for ( i = 0; i < partial_done; i++ )
             unmap_common_complete(&common[i]);
 
+        if ( rc )
+            break;
+
         count -= c;
         done += c;
 
@@ -1653,20 +1710,13 @@ gnttab_unmap_grant_ref(
             return done;
     }
 
-    return 0;
-
-fault:
-    gnttab_flush_tlb(current->domain);
-
-    for ( i = 0; i < partial_done; i++ )
-        unmap_common_complete(&common[i]);
-    return -EFAULT;
+    return rc;
 }
 
 static void
 unmap_and_replace(
-    struct gnttab_unmap_and_replace *op,
-    struct gnttab_unmap_common *common)
+    struct gnttab_unmap_and_replace *op, struct gnttab_unmap_common *common,
+    bool do_flush, unsigned int *flush_flags)
 {
     common->host_addr = op->host_addr;
     common->new_addr = op->new_addr;
@@ -1678,7 +1728,7 @@ unmap_and_replace(
     common->rd = NULL;
     common->mfn = INVALID_MFN;
 
-    unmap_common(common);
+    unmap_common(common, do_flush, flush_flags);
     op->status = common->status;
 }
 
@@ -1686,31 +1736,55 @@ static long
 gnttab_unmap_and_replace(
     XEN_GUEST_HANDLE_PARAM(gnttab_unmap_and_replace_t) uop, unsigned int count)
 {
-    int i, c, partial_done, done = 0;
-    struct gnttab_unmap_and_replace op;
-    struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
+    struct domain *currd = current->domain;
+    unsigned int done = 0;
+    int rc = 0;
 
-    while ( count != 0 )
+    while ( count )
     {
-        c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
-        partial_done = 0;
+        struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
+        unsigned int i, c, partial_done = 0;
+        unsigned int flush_flags = 0;
+
+        c = min_t(unsigned int, count, GNTTAB_UNMAP_BATCH_SIZE);
 
         for ( i = 0; i < c; i++ )
         {
+            struct gnttab_unmap_and_replace op;
+
             if ( unlikely(__copy_from_guest(&op, uop, 1)) )
-                goto fault;
-            unmap_and_replace(&op, &common[i]);
+            {
+                rc = -EFAULT;
+                break;
+            }
+
+            unmap_and_replace(&op, &common[i], c == 1, &flush_flags);
             ++partial_done;
+
             if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
-                goto fault;
+            {
+                rc = -EFAULT;
+                break;
+            }
+
             guest_handle_add_offset(uop, 1);
         }
 
-        gnttab_flush_tlb(current->domain);
+        gnttab_flush_tlb(currd);
+        if ( c > 1 )
+        {
+            int err = iommu_iotlb_flush_all(currd, flush_flags);
+
+            if ( !rc )
+                rc = err;
+        }
 
         for ( i = 0; i < partial_done; i++ )
             unmap_common_complete(&common[i]);
 
+        if ( rc )
+            break;
+
         count -= c;
         done += c;
 
@@ -1718,14 +1792,7 @@ gnttab_unmap_and_replace(
             return done;
     }
 
-    return 0;
-
-fault:
-    gnttab_flush_tlb(current->domain);
-
-    for ( i = 0; i < partial_done; i++ )
-        unmap_common_complete(&common[i]);
-    return -EFAULT;
+    return rc;
 }
 
 static int
-- 
2.20.1



* [PATCH v10 3/7] iommu: remove the share_p2m operation
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 2/7] common/grant_table: batch flush I/O TLB Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 4/7] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Jan Beulich, Kevin Tian, Andrew Cooper,
	George Dunlap, Wei Liu, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

Sharing of HAP tables is now VT-d specific, so the operation is no longer
defined for the AMD IOMMU. There's also no need to proactively set
vtd.pgd_maddr when using shared EPT, as it is straightforward to define a
helper function that returns the appropriate value in both the shared and
non-shared cases.

NOTE: This patch also modifies unmap_vtd_domain_page() to take a const
      pointer since the only thing it calls, unmap_domain_page(), also takes
      a const pointer.
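
In the shared-EPT case the new domain_pgd_maddr() helper simply returns the
host P2M root; a sketch of that branch (taken from the patch below):

    if ( iommu_use_hap_pt(d) )
    {
        pagetable_t pgt = p2m_get_pagetable(p2m_get_hostp2m(d));

        return pagetable_get_paddr(pgt);
    }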

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v6:
 - Adjust code to return P2M paddr
 - Add removed comment back in

v5:
 - Pass 'nr_pt_levels' into domain_pgd_maddr() directly

v2:
 - Put the PGD level adjust into the helper function too, since it is
   irrelevant in the shared EPT case
---
 xen/arch/x86/mm/p2m.c                 |  3 -
 xen/drivers/passthrough/iommu.c       |  8 ---
 xen/drivers/passthrough/vtd/extern.h  |  2 +-
 xen/drivers/passthrough/vtd/iommu.c   | 90 +++++++++++++++------------
 xen/drivers/passthrough/vtd/x86/vtd.c |  2 +-
 xen/include/xen/iommu.h               |  3 -
 6 files changed, 52 insertions(+), 56 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 8ee33b25ca72..34e37a9b1b5d 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -727,9 +727,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
     p2m->phys_table = pagetable_from_mfn(top_mfn);
 
-    if ( hap_enabled(d) )
-        iommu_share_p2m_table(d);
-
     p2m_unlock(p2m);
     return 0;
 }
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index a9da4d2b0645..90748062e5bd 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -500,14 +500,6 @@ int iommu_do_domctl(
     return ret;
 }
 
-void iommu_share_p2m_table(struct domain* d)
-{
-    ASSERT(hap_enabled(d));
-
-    if ( iommu_use_hap_pt(d) )
-        iommu_get_ops()->share_p2m(d);
-}
-
 void iommu_crash_shutdown(void)
 {
     if ( !iommu_crash_disable )
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index ad6c5f907b8c..19a908ab4f71 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -72,7 +72,7 @@ void flush_all_cache(void);
 uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node);
 void free_pgtable_maddr(u64 maddr);
 void *map_vtd_domain_page(u64 maddr);
-void unmap_vtd_domain_page(void *va);
+void unmap_vtd_domain_page(const void *va);
 int domain_context_mapping_one(struct domain *domain, struct vtd_iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index f6c4021fd698..a76e60c99a58 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -318,6 +318,48 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
     return pte_maddr;
 }
 
+static uint64_t domain_pgd_maddr(struct domain *d, unsigned int nr_pt_levels)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    uint64_t pgd_maddr;
+    unsigned int agaw;
+
+    ASSERT(spin_is_locked(&hd->arch.mapping_lock));
+
+    if ( iommu_use_hap_pt(d) )
+    {
+        pagetable_t pgt = p2m_get_pagetable(p2m_get_hostp2m(d));
+
+        return pagetable_get_paddr(pgt);
+    }
+
+    if ( !hd->arch.vtd.pgd_maddr )
+    {
+        /* Ensure we have pagetables allocated down to leaf PTE. */
+        addr_to_dma_page_maddr(d, 0, 1);
+
+        if ( !hd->arch.vtd.pgd_maddr )
+            return 0;
+    }
+
+    pgd_maddr = hd->arch.vtd.pgd_maddr;
+
+    /* Skip top levels of page tables for 2- and 3-level DRHDs. */
+    for ( agaw = level_to_agaw(4);
+          agaw != level_to_agaw(nr_pt_levels);
+          agaw-- )
+    {
+        const struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
+
+        pgd_maddr = dma_pte_addr(*p);
+        unmap_vtd_domain_page(p);
+        if ( !pgd_maddr )
+            return 0;
+    }
+
+    return pgd_maddr;
+}
+
 static void iommu_flush_write_buffer(struct vtd_iommu *iommu)
 {
     u32 val;
@@ -1286,7 +1328,7 @@ int domain_context_mapping_one(
     struct context_entry *context, *context_entries;
     u64 maddr, pgd_maddr;
     u16 seg = iommu->drhd->segment;
-    int agaw, rc, ret;
+    int rc, ret;
     bool_t flush_dev_iotlb;
 
     ASSERT(pcidevs_locked());
@@ -1340,37 +1382,18 @@ int domain_context_mapping_one(
     if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
     {
         context_set_translation_type(*context, CONTEXT_TT_PASS_THRU);
-        agaw = level_to_agaw(iommu->nr_pt_levels);
     }
     else
     {
         spin_lock(&hd->arch.mapping_lock);
 
-        /* Ensure we have pagetables allocated down to leaf PTE. */
-        if ( hd->arch.vtd.pgd_maddr == 0 )
+        pgd_maddr = domain_pgd_maddr(domain, iommu->nr_pt_levels);
+        if ( !pgd_maddr )
         {
-            addr_to_dma_page_maddr(domain, 0, 1);
-            if ( hd->arch.vtd.pgd_maddr == 0 )
-            {
-            nomem:
-                spin_unlock(&hd->arch.mapping_lock);
-                spin_unlock(&iommu->lock);
-                unmap_vtd_domain_page(context_entries);
-                return -ENOMEM;
-            }
-        }
-
-        /* Skip top levels of page tables for 2- and 3-level DRHDs. */
-        pgd_maddr = hd->arch.vtd.pgd_maddr;
-        for ( agaw = level_to_agaw(4);
-              agaw != level_to_agaw(iommu->nr_pt_levels);
-              agaw-- )
-        {
-            struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
-            pgd_maddr = dma_pte_addr(*p);
-            unmap_vtd_domain_page(p);
-            if ( pgd_maddr == 0 )
-                goto nomem;
+            spin_unlock(&hd->arch.mapping_lock);
+            spin_unlock(&iommu->lock);
+            unmap_vtd_domain_page(context_entries);
+            return -ENOMEM;
         }
 
         context_set_address_root(*context, pgd_maddr);
@@ -1389,7 +1412,7 @@ int domain_context_mapping_one(
         return -EFAULT;
     }
 
-    context_set_address_width(*context, agaw);
+    context_set_address_width(*context, level_to_agaw(iommu->nr_pt_levels));
     context_set_fault_enable(*context);
     context_set_present(*context);
     iommu_sync_cache(context, sizeof(struct context_entry));
@@ -1848,18 +1871,6 @@ static int __init vtd_ept_page_compatible(struct vtd_iommu *iommu)
            (ept_has_1gb(ept_cap) && opt_hap_1gb) <= cap_sps_1gb(vtd_cap);
 }
 
-/*
- * set VT-d page table directory to EPT table if allowed
- */
-static void iommu_set_pgd(struct domain *d)
-{
-    mfn_t pgd_mfn;
-
-    pgd_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-    dom_iommu(d)->arch.vtd.pgd_maddr =
-        pagetable_get_paddr(pagetable_from_mfn(pgd_mfn));
-}
-
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
                                  const struct acpi_rmrr_unit *rmrr,
                                  u32 flag)
@@ -2718,7 +2729,6 @@ static struct iommu_ops __initdata vtd_ops = {
     .adjust_irq_affinities = adjust_vtd_irq_affinities,
     .suspend = vtd_suspend,
     .resume = vtd_resume,
-    .share_p2m = iommu_set_pgd,
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = iommu_flush_iotlb_pages,
     .iotlb_flush_all = iommu_flush_iotlb_all,
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index bbe358dc36c7..6681dccd6970 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -42,7 +42,7 @@ void *map_vtd_domain_page(u64 maddr)
     return map_domain_page(_mfn(paddr_to_pfn(maddr)));
 }
 
-void unmap_vtd_domain_page(void *va)
+void unmap_vtd_domain_page(const void *va)
 {
     unmap_domain_page(va);
 }
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 244a11b9b494..236c55af8921 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -269,7 +269,6 @@ struct iommu_ops {
 
     int __must_check (*suspend)(void);
     void (*resume)(void);
-    void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
     int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
                                     unsigned long page_count,
@@ -346,8 +345,6 @@ void iommu_resume(void);
 void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
-void iommu_share_p2m_table(struct domain *d);
-
 #ifdef CONFIG_HAS_PCI
 int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
                         XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
-- 
2.20.1



* [PATCH v10 4/7] iommu: stop calling IOMMU page tables 'p2m tables'
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
                   ` (2 preceding siblings ...)
  2020-11-20 13:24 ` [PATCH v10 3/7] iommu: remove the share_p2m operation Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-20 13:24 ` [PATCH v10 5/7] vtd: use a bit field for root_entry Paul Durrant
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Jan Beulich, Kevin Tian, Andrew Cooper

From: Paul Durrant <pdurrant@amazon.com>

It's confusing and not consistent with the terminology introduced with 'dfn_t'.
Just call them IOMMU page tables.

Also remove a pointless check of the 'acpi_drhd_units' list in
vtd_dump_page_tables(). If the list were empty then IOMMU mappings would
not have been enabled for the domain in the first place.

NOTE: All calls to printk() have also been removed from
      iommu_dump_page_tables(); the implementation specific code is now
      responsible for all output.
      The check for the global 'iommu_enabled' has also been replaced by an
      ASSERT since iommu_dump_page_tables() is not registered as a key handler
      unless IOMMU mappings are enabled.
      Error messages are now prefixed with the name of the function.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

v6:
 - Cosmetic adjustment
 - Drop use of __func__

v5:
 - Make sure domain id is in the output
 - Use VTDPREFIX in output for consistency

v2:
 - Moved all output into implementation specific code
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 20 +++++++-------
 xen/drivers/passthrough/iommu.c             | 21 ++++-----------
 xen/drivers/passthrough/vtd/iommu.c         | 30 ++++++++++++---------
 xen/include/xen/iommu.h                     |  2 +-
 4 files changed, 33 insertions(+), 40 deletions(-)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 64c1fca7b5b6..42b5a5a9bec4 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -494,8 +494,8 @@ static int amd_iommu_group_id(u16 seg, u8 bus, u8 devfn)
 
 #include <asm/io_apic.h>
 
-static void amd_dump_p2m_table_level(struct page_info* pg, int level, 
-                                     paddr_t gpa, int indent)
+static void amd_dump_page_table_level(struct page_info *pg, int level,
+                                      paddr_t gpa, int indent)
 {
     paddr_t address;
     const union amd_iommu_pte *table_vaddr;
@@ -507,7 +507,7 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
     table_vaddr = __map_domain_page(pg);
     if ( table_vaddr == NULL )
     {
-        printk("Failed to map IOMMU domain page %"PRIpaddr"\n", 
+        printk("AMD IOMMU failed to map domain page %"PRIpaddr"\n",
                 page_to_maddr(pg));
         return;
     }
@@ -524,7 +524,7 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
 
         if ( pde->next_level && (pde->next_level != (level - 1)) )
         {
-            printk("IOMMU p2m table error. next_level = %d, expected %d\n",
+            printk("AMD IOMMU table error. next_level = %d, expected %d\n",
                    pde->next_level, level - 1);
 
             continue;
@@ -532,7 +532,7 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
 
         address = gpa + amd_offset_level_address(index, level);
         if ( pde->next_level >= 1 )
-            amd_dump_p2m_table_level(
+            amd_dump_page_table_level(
                 mfn_to_page(_mfn(pde->mfn)), pde->next_level,
                 address, indent + 1);
         else
@@ -545,16 +545,16 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
     unmap_domain_page(table_vaddr);
 }
 
-static void amd_dump_p2m_table(struct domain *d)
+static void amd_dump_page_tables(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
 
     if ( !hd->arch.amd.root_table )
         return;
 
-    printk("p2m table has %u levels\n", hd->arch.amd.paging_mode);
-    amd_dump_p2m_table_level(hd->arch.amd.root_table,
-                             hd->arch.amd.paging_mode, 0, 0);
+    printk("AMD IOMMU %pd table has %u levels\n", d, hd->arch.amd.paging_mode);
+    amd_dump_page_table_level(hd->arch.amd.root_table,
+                              hd->arch.amd.paging_mode, 0, 0);
 }
 
 static const struct iommu_ops __initconstrel _iommu_ops = {
@@ -580,7 +580,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .crash_shutdown = amd_iommu_crash_shutdown,
-    .dump_p2m_table = amd_dump_p2m_table,
+    .dump_page_tables = amd_dump_page_tables,
 };
 
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 90748062e5bd..8fae77b59375 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -22,7 +22,7 @@
 #include <xen/keyhandler.h>
 #include <xsm/xsm.h>
 
-static void iommu_dump_p2m_table(unsigned char key);
+static void iommu_dump_page_tables(unsigned char key);
 
 unsigned int __read_mostly iommu_dev_iotlb_timeout = 1000;
 integer_param("iommu_dev_iotlb_timeout", iommu_dev_iotlb_timeout);
@@ -212,7 +212,7 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
     if ( !is_iommu_enabled(d) )
         return;
 
-    register_keyhandler('o', &iommu_dump_p2m_table, "dump iommu p2m table", 0);
+    register_keyhandler('o', &iommu_dump_page_tables, "dump iommu page tables", 0);
 
     hd->platform_ops->hwdom_init(d);
 }
@@ -535,16 +535,12 @@ bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
     return is_iommu_enabled(d) && test_bit(feature, dom_iommu(d)->features);
 }
 
-static void iommu_dump_p2m_table(unsigned char key)
+static void iommu_dump_page_tables(unsigned char key)
 {
     struct domain *d;
     const struct iommu_ops *ops;
 
-    if ( !iommu_enabled )
-    {
-        printk("IOMMU not enabled!\n");
-        return;
-    }
+    ASSERT(iommu_enabled);
 
     ops = iommu_get_ops();
 
@@ -555,14 +551,7 @@ static void iommu_dump_p2m_table(unsigned char key)
         if ( is_hardware_domain(d) || !is_iommu_enabled(d) )
             continue;
 
-        if ( iommu_use_hap_pt(d) )
-        {
-            printk("\ndomain%d IOMMU p2m table shared with MMU: \n", d->domain_id);
-            continue;
-        }
-
-        printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
-        ops->dump_p2m_table(d);
+        ops->dump_page_tables(d);
     }
 
     rcu_read_unlock(&domlist_read_lock);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a76e60c99a58..d136fe36883b 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2582,8 +2582,8 @@ static void vtd_resume(void)
     }
 }
 
-static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa, 
-                                     int indent)
+static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
+                                      int indent)
 {
     paddr_t address;
     int i;
@@ -2596,7 +2596,8 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
     pt_vaddr = map_vtd_domain_page(pt_maddr);
     if ( pt_vaddr == NULL )
     {
-        printk("Failed to map VT-D domain page %"PRIpaddr"\n", pt_maddr);
+        printk(VTDPREFIX " failed to map domain page %"PRIpaddr"\n",
+               pt_maddr);
         return;
     }
 
@@ -2612,8 +2613,8 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
 
         address = gpa + offset_level_address(i, level);
         if ( next_level >= 1 ) 
-            vtd_dump_p2m_table_level(dma_pte_addr(*pte), next_level, 
-                                     address, indent + 1);
+            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
+                                      address, indent + 1);
         else
             printk("%*sdfn: %08lx mfn: %08lx\n",
                    indent, "",
@@ -2624,17 +2625,20 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
     unmap_vtd_domain_page(pt_vaddr);
 }
 
-static void vtd_dump_p2m_table(struct domain *d)
+static void vtd_dump_page_tables(struct domain *d)
 {
-    const struct domain_iommu *hd;
+    const struct domain_iommu *hd = dom_iommu(d);
 
-    if ( list_empty(&acpi_drhd_units) )
+    if ( iommu_use_hap_pt(d) )
+    {
+        printk(VTDPREFIX " %pd sharing EPT table\n", d);
         return;
+    }
 
-    hd = dom_iommu(d);
-    printk("p2m table has %d levels\n", agaw_to_level(hd->arch.vtd.agaw));
-    vtd_dump_p2m_table_level(hd->arch.vtd.pgd_maddr,
-                             agaw_to_level(hd->arch.vtd.agaw), 0, 0);
+    printk(VTDPREFIX" %pd table has %d levels\n", d,
+           agaw_to_level(hd->arch.vtd.agaw));
+    vtd_dump_page_table_level(hd->arch.vtd.pgd_maddr,
+                              agaw_to_level(hd->arch.vtd.agaw), 0, 0);
 }
 
 static int __init intel_iommu_quarantine_init(struct domain *d)
@@ -2733,7 +2737,7 @@ static struct iommu_ops __initdata vtd_ops = {
     .iotlb_flush = iommu_flush_iotlb_pages,
     .iotlb_flush_all = iommu_flush_iotlb_all,
     .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
-    .dump_p2m_table = vtd_dump_p2m_table,
+    .dump_page_tables = vtd_dump_page_tables,
 };
 
 const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 236c55af8921..1a369c97c956 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -275,7 +275,7 @@ struct iommu_ops {
                                     unsigned int flush_flags);
     int __must_check (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
-    void (*dump_p2m_table)(struct domain *d);
+    void (*dump_page_tables)(struct domain *d);
 
 #ifdef CONFIG_HAS_DEVICE_TREE
     /*
-- 
2.20.1



* [PATCH v10 5/7] vtd: use a bit field for root_entry
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
                   ` (3 preceding siblings ...)
  2020-11-20 13:24 ` [PATCH v10 4/7] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-27 15:11   ` Jan Beulich
  2020-11-30  3:06   ` Tian, Kevin
  2020-11-20 13:24 ` [PATCH v10 6/7] vtd: use a bit field for context_entry Paul Durrant
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian

From: Paul Durrant <pdurrant@amazon.com>

This makes the code a little easier to read and also makes it more consistent
with iremap_entry.

Also take the opportunity to tidy up the implementation of device_in_domain().
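
To illustrate, setting up a root entry changes from macro-based masking to
direct field access, with a barrier added to ensure the context-table
pointer is visible before the present bit is set (a sketch, mirroring the
patch below):

    /* Before: */
    set_root_value(*root, maddr);
    set_root_present(*root);

    /* After: */
    root->ctp = paddr_to_pfn(maddr);
    smp_wmb();          /* ctp must be visible before p */
    root->p = true;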

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v10:
 - Small tweaks requested by Jan
 - Remove macros in favour of direct field access
 - Add missing barrier

v4:
 - New in v4
---
 xen/drivers/passthrough/vtd/iommu.c   |  9 +++++----
 xen/drivers/passthrough/vtd/iommu.h   | 25 ++++++++++++-------------
 xen/drivers/passthrough/vtd/utils.c   |  6 +++---
 xen/drivers/passthrough/vtd/x86/ats.c | 27 +++++++++++++++------------
 4 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d136fe36883b..1a038541f0a3 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -237,7 +237,7 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu, u8 bus)
     ASSERT(spin_is_locked(&iommu->lock));
     root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
     root = &root_entries[bus];
-    if ( !root_present(*root) )
+    if ( !root->p )
     {
         maddr = alloc_pgtable_maddr(1, iommu->node);
         if ( maddr == 0 )
@@ -245,11 +245,12 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu, u8 bus)
             unmap_vtd_domain_page(root_entries);
             return 0;
         }
-        set_root_value(*root, maddr);
-        set_root_present(*root);
+        root->ctp = paddr_to_pfn(maddr);
+        smp_wmb();
+        root->p = true;
         iommu_sync_cache(root, sizeof(struct root_entry));
     }
-    maddr = (u64) get_context_addr(*root);
+    maddr = pfn_to_paddr(root->ctp);
     unmap_vtd_domain_page(root_entries);
     return maddr;
 }
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 216791b3d634..b14628eec260 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -184,21 +184,20 @@
 #define dma_frcd_source_id(c) (c & 0xffff)
 #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
 
-/*
- * 0: Present
- * 1-11: Reserved
- * 12-63: Context Ptr (12 - (haw-1))
- * 64-127: Reserved
- */
 struct root_entry {
-    u64    val;
-    u64    rsvd1;
+    union {
+        struct { uint64_t lo, hi; };
+        struct {
+            /* 0 - 63 */
+            bool p:1;
+            unsigned int reserved0:11;
+            uint64_t ctp:52;
+
+            /* 64 - 127 */
+            uint64_t reserved1;
+        };
+    };
 };
-#define root_present(root)    ((root).val & 1)
-#define set_root_present(root) do {(root).val |= 1;} while(0)
-#define get_context_addr(root) ((root).val & PAGE_MASK_4K)
-#define set_root_value(root, value) \
-    do {(root).val |= ((value) & PAGE_MASK_4K);} while(0)
 
 struct context_entry {
     u64 lo;
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 4febcf506d8a..5f25a86a535c 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -112,15 +112,15 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
         return;
     }
 
-    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].val);
-    if ( !root_present(root_entry[bus]) )
+    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].lo);
+    if ( !root_entry[bus].p )
     {
         unmap_vtd_domain_page(root_entry);
         printk("    root_entry[%02x] not present\n", bus);
         return;
     }
 
-    val = root_entry[bus].val;
+    val = pfn_to_paddr(root_entry[bus].ctp);
     unmap_vtd_domain_page(root_entry);
     ctxt_entry = map_vtd_domain_page(val);
     if ( ctxt_entry == NULL )
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index 04d702b1d6b1..fec969ef75bb 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
 static bool device_in_domain(const struct vtd_iommu *iommu,
                              const struct pci_dev *pdev, uint16_t did)
 {
-    struct root_entry *root_entry;
-    struct context_entry *ctxt_entry = NULL;
+    struct root_entry *root_entry, *root_entries;
+    struct context_entry *context_entry, *context_entries = NULL;
     unsigned int tt;
     bool found = false;
 
@@ -85,25 +85,28 @@ static bool device_in_domain(const struct vtd_iommu *iommu,
         return false;
     }
 
-    root_entry = map_vtd_domain_page(iommu->root_maddr);
-    if ( !root_present(root_entry[pdev->bus]) )
+    root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
+    root_entry = &root_entries[pdev->bus];
+    if ( !root_entry->p )
         goto out;
 
-    ctxt_entry = map_vtd_domain_page(root_entry[pdev->bus].val);
-    if ( context_domain_id(ctxt_entry[pdev->devfn]) != did )
+    context_entries = map_vtd_domain_page(root_entry->ctp);
+    context_entry = &context_entries[pdev->devfn];
+    if ( context_domain_id(*context_entry) != did )
         goto out;
 
-    tt = context_translation_type(ctxt_entry[pdev->devfn]);
+    tt = context_translation_type(*context_entry);
     if ( tt != CONTEXT_TT_DEV_IOTLB )
         goto out;
 
     found = true;
-out:
-    if ( root_entry )
-        unmap_vtd_domain_page(root_entry);
 
-    if ( ctxt_entry )
-        unmap_vtd_domain_page(ctxt_entry);
+ out:
+    if ( root_entries )
+        unmap_vtd_domain_page(root_entries);
+
+    if ( context_entries )
+        unmap_vtd_domain_page(context_entries);
 
     return found;
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 6/7] vtd: use a bit field for context_entry
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
                   ` (4 preceding siblings ...)
  2020-11-20 13:24 ` [PATCH v10 5/7] vtd: use a bit field for root_entry Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-27 15:32   ` Jan Beulich
  2020-11-30  3:10   ` Tian, Kevin
  2020-11-20 13:24 ` [PATCH v10 7/7] vtd: use a bit field for dma_pte Paul Durrant
  2020-11-27 16:21 ` [PATCH v10 0/7] IOMMU cleanup Jan Beulich
  7 siblings, 2 replies; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian

From: Paul Durrant <pdurrant@amazon.com>

This removes the need for much shifting, masking and several magic numbers.
On the whole it makes the code quite a bit more readable.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v10:
 - Remove macros in favour of direct field access
 - Adjust field types
 - Add missing barriers

v4:
 - New in v4
---
 xen/drivers/passthrough/vtd/iommu.c   | 36 +++++++++++----------
 xen/drivers/passthrough/vtd/iommu.h   | 45 +++++++++++++--------------
 xen/drivers/passthrough/vtd/utils.c   | 10 +++---
 xen/drivers/passthrough/vtd/x86/ats.c |  6 ++--
 4 files changed, 47 insertions(+), 50 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1a038541f0a3..fdb472ad6515 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -86,8 +86,6 @@ static int domain_iommu_domid(struct domain *d,
     return -1;
 }
 
-#define DID_FIELD_WIDTH 16
-#define DID_HIGH_OFFSET 8
 static int context_set_domain_id(struct context_entry *context,
                                  struct domain *d,
                                  struct vtd_iommu *iommu)
@@ -121,21 +119,22 @@ static int context_set_domain_id(struct context_entry *context,
     }
 
     set_bit(i, iommu->domid_bitmap);
-    context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET;
+    context->did = i;
+
     return 0;
 }
 
 static int context_get_domain_id(struct context_entry *context,
                                  struct vtd_iommu *iommu)
 {
-    unsigned long dom_index, nr_dom;
     int domid = -1;
 
     if (iommu && context)
     {
-        nr_dom = cap_ndoms(iommu->cap);
+        unsigned long dom_index, nr_dom;
 
-        dom_index = context_domain_id(*context);
+        nr_dom = cap_ndoms(iommu->cap);
+        dom_index = context->did;
 
         if ( dom_index < nr_dom && iommu->domid_map )
             domid = iommu->domid_map[dom_index];
@@ -1338,7 +1337,7 @@ int domain_context_mapping_one(
     context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
     context = &context_entries[devfn];
 
-    if ( context_present(*context) )
+    if ( context->p )
     {
         int res = 0;
 
@@ -1382,7 +1381,7 @@ int domain_context_mapping_one(
 
     if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
     {
-        context_set_translation_type(*context, CONTEXT_TT_PASS_THRU);
+        context->tt = CONTEXT_TT_PASS_THRU;
     }
     else
     {
@@ -1397,11 +1396,11 @@ int domain_context_mapping_one(
             return -ENOMEM;
         }
 
-        context_set_address_root(*context, pgd_maddr);
+        context->slptptr = paddr_to_pfn(pgd_maddr);
         if ( ats_enabled && ecap_dev_iotlb(iommu->ecap) )
-            context_set_translation_type(*context, CONTEXT_TT_DEV_IOTLB);
+            context->tt = CONTEXT_TT_DEV_IOTLB;
         else
-            context_set_translation_type(*context, CONTEXT_TT_MULTI_LEVEL);
+            context->tt = CONTEXT_TT_MULTI_LEVEL;
 
         spin_unlock(&hd->arch.mapping_lock);
     }
@@ -1413,9 +1412,10 @@ int domain_context_mapping_one(
         return -EFAULT;
     }
 
-    context_set_address_width(*context, level_to_agaw(iommu->nr_pt_levels));
-    context_set_fault_enable(*context);
-    context_set_present(*context);
+    context->aw = level_to_agaw(iommu->nr_pt_levels);
+    context->fpd = false;
+    smp_wmb();
+    context->p = true;
     iommu_sync_cache(context, sizeof(struct context_entry));
     spin_unlock(&iommu->lock);
 
@@ -1567,17 +1567,19 @@ int domain_context_unmap_one(
     context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
     context = &context_entries[devfn];
 
-    if ( !context_present(*context) )
+    if ( !context->p )
     {
         spin_unlock(&iommu->lock);
         unmap_vtd_domain_page(context_entries);
         return 0;
     }
 
-    context_clear_present(*context);
-    context_clear_entry(*context);
+    context->p = false;
+    smp_wmb();
     iommu_sync_cache(context, sizeof(struct context_entry));
 
+    context->val = 0; /* No need to sync; present bit is already cleared */
+
     iommu_domid= domain_iommu_domid(domain, iommu);
     if ( iommu_domid == -1 )
     {
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index b14628eec260..33b1abf98526 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -198,37 +198,34 @@ struct root_entry {
         };
     };
 };
+#define ROOT_ENTRY_NR (PAGE_SIZE_4K / sizeof(struct root_entry))
 
 struct context_entry {
-    u64 lo;
-    u64 hi;
-};
-#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
-#define context_present(c) ((c).lo & 1)
-#define context_fault_disable(c) (((c).lo >> 1) & 1)
-#define context_translation_type(c) (((c).lo >> 2) & 3)
-#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
-#define context_address_width(c) ((c).hi &  7)
-#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
+    union {
+        __uint128_t val;
+        struct { uint64_t lo, hi; };
+        struct {
+            /* 0 - 63 */
+            bool p:1;
+            bool fpd:1;
+            uint64_t tt:2;
 
-#define context_set_present(c) do {(c).lo |= 1;} while(0)
-#define context_clear_present(c) do {(c).lo &= ~1;} while(0)
-#define context_set_fault_enable(c) \
-    do {(c).lo &= (((u64)-1) << 2) | 1;} while(0)
-
-#define context_set_translation_type(c, val) do { \
-        (c).lo &= (((u64)-1) << 4) | 3; \
-        (c).lo |= (val & 3) << 2; \
-    } while(0)
 #define CONTEXT_TT_MULTI_LEVEL 0
 #define CONTEXT_TT_DEV_IOTLB   1
 #define CONTEXT_TT_PASS_THRU   2
 
-#define context_set_address_root(c, val) \
-    do {(c).lo &= 0xfff; (c).lo |= (val) & PAGE_MASK_4K ;} while(0)
-#define context_set_address_width(c, val) \
-    do {(c).hi &= 0xfffffff8; (c).hi |= (val) & 7;} while(0)
-#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while(0)
+            unsigned int reserved0:8;
+            uint64_t slptptr:52;
+
+            /* 64 - 127 */
+            unsigned int aw:3;
+            unsigned int ignored:4;
+            unsigned int reserved1:1;
+            unsigned int did:16;
+            uint64_t reserved2:40;
+        };
+    };
+};
 
 /* page table handling */
 #define LEVEL_STRIDE       (9)
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 5f25a86a535c..4bca160bc663 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -129,17 +129,17 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
         return;
     }
 
-    val = ctxt_entry[devfn].lo;
-    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n",
-           devfn, ctxt_entry[devfn].hi, val);
-    if ( !context_present(ctxt_entry[devfn]) )
+    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n", devfn,
+           ctxt_entry[devfn].hi, ctxt_entry[devfn].lo);
+    if ( !ctxt_entry[devfn].p )
     {
         unmap_vtd_domain_page(ctxt_entry);
         printk("    ctxt_entry[%02x] not present\n", devfn);
         return;
     }
 
-    level = agaw_to_level(context_address_width(ctxt_entry[devfn]));
+    level = agaw_to_level(ctxt_entry[devfn].aw);
+    val = pfn_to_paddr(ctxt_entry[devfn].slptptr);
     unmap_vtd_domain_page(ctxt_entry);
     if ( level != VTD_PAGE_TABLE_LEVEL_3 &&
          level != VTD_PAGE_TABLE_LEVEL_4)
diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
index fec969ef75bb..cb057ced3cf7 100644
--- a/xen/drivers/passthrough/vtd/x86/ats.c
+++ b/xen/drivers/passthrough/vtd/x86/ats.c
@@ -76,7 +76,6 @@ static bool device_in_domain(const struct vtd_iommu *iommu,
 {
     struct root_entry *root_entry, *root_entries;
     struct context_entry *context_entry, *context_entries = NULL;
-    unsigned int tt;
     bool found = false;
 
     if ( unlikely(!iommu->root_maddr) )
@@ -92,11 +91,10 @@ static bool device_in_domain(const struct vtd_iommu *iommu,
 
     context_entries = map_vtd_domain_page(root_entry->ctp);
     context_entry = &context_entries[pdev->devfn];
-    if ( context_domain_id(*context_entry) != did )
+    if ( context_entry->did != did )
         goto out;
 
-    tt = context_translation_type(*context_entry);
-    if ( tt != CONTEXT_TT_DEV_IOTLB )
+    if ( context_entry->tt != CONTEXT_TT_DEV_IOTLB )
         goto out;
 
     found = true;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v10 7/7] vtd: use a bit field for dma_pte
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
                   ` (5 preceding siblings ...)
  2020-11-20 13:24 ` [PATCH v10 6/7] vtd: use a bit field for context_entry Paul Durrant
@ 2020-11-20 13:24 ` Paul Durrant
  2020-11-27 16:02   ` Jan Beulich
  2020-11-27 16:21 ` [PATCH v10 0/7] IOMMU cleanup Jan Beulich
  7 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2020-11-20 13:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant, Kevin Tian

From: Paul Durrant <pdurrant@amazon.com>

As with a prior patch for context_entry, this removes the need for much
shifting, masking and several magic numbers.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>

v10:
 - Remove macros in favour of direct field access
 - Adjust field types
 - Use write_atomic() to update the live PTE

v4:
 - New in v4
---
 xen/drivers/passthrough/vtd/iommu.c | 61 +++++++++++++++--------------
 xen/drivers/passthrough/vtd/iommu.h | 43 ++++++--------------
 xen/drivers/passthrough/vtd/utils.c |  6 +--
 3 files changed, 47 insertions(+), 63 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index fdb472ad6515..2389b9fc708d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -281,7 +281,7 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
         offset = address_level_offset(addr, level);
         pte = &parent[offset];
 
-        pte_maddr = dma_pte_addr(*pte);
+        pte_maddr = pfn_to_paddr(pte->addr);
         if ( !pte_maddr )
         {
             struct page_info *pg;
@@ -294,14 +294,14 @@ static u64 addr_to_dma_page_maddr(struct domain *domain, u64 addr, int alloc)
                 break;
 
             pte_maddr = page_to_maddr(pg);
-            dma_set_pte_addr(*pte, pte_maddr);
+            pte->addr = paddr_to_pfn(pte_maddr);
+            smp_wmb();
 
             /*
              * high level table always sets r/w, last level
              * page table control read/write
              */
-            dma_set_pte_readable(*pte);
-            dma_set_pte_writable(*pte);
+            pte->r = pte->w = true;
             iommu_sync_cache(pte, sizeof(struct dma_pte));
         }
 
@@ -351,7 +351,7 @@ static uint64_t domain_pgd_maddr(struct domain *d, unsigned int nr_pt_levels)
     {
         const struct dma_pte *p = map_vtd_domain_page(pgd_maddr);
 
-        pgd_maddr = dma_pte_addr(*p);
+        pgd_maddr = pfn_to_paddr(p->addr);
         unmap_vtd_domain_page(p);
         if ( !pgd_maddr )
             return 0;
@@ -709,20 +709,23 @@ static void dma_pte_clear_one(struct domain *domain, uint64_t addr,
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
     pte = page + address_level_offset(addr, 1);
 
-    if ( !dma_pte_present(*pte) )
+    if ( !pte->r && !pte->w )
     {
         spin_unlock(&hd->arch.mapping_lock);
         unmap_vtd_domain_page(page);
         return;
     }
 
-    dma_clear_pte(*pte);
-    *flush_flags |= IOMMU_FLUSHF_modified;
+    pte->r = pte->w = false;
+    smp_wmb();
+    pte->val = 0;
 
     spin_unlock(&hd->arch.mapping_lock);
     iommu_sync_cache(pte, sizeof(struct dma_pte));
 
     unmap_vtd_domain_page(page);
+
+    *flush_flags |= IOMMU_FLUSHF_modified;
 }
 
 static int iommu_set_root_entry(struct vtd_iommu *iommu)
@@ -1751,7 +1754,7 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
                                              unsigned int *flush_flags)
 {
     struct domain_iommu *hd = dom_iommu(d);
-    struct dma_pte *page, *pte, old, new = {};
+    struct dma_pte *page, *pte, old, new;
     u64 pg_maddr;
     int rc = 0;
 
@@ -1775,15 +1778,12 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
     pte = &page[dfn_x(dfn) & LEVEL_MASK];
     old = *pte;
-
-    dma_set_pte_addr(new, mfn_to_maddr(mfn));
-    dma_set_pte_prot(new,
-                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
-                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
-
-    /* Set the SNP on leaf page table if Snoop Control available */
-    if ( iommu_snoop )
-        dma_set_pte_snp(new);
+    new = (struct dma_pte){
+        .r = flags & IOMMUF_readable,
+        .w = flags & IOMMUF_writable,
+        .snp = iommu_snoop,
+        .addr = mfn_x(mfn),
+    };
 
     if ( old.val == new.val )
     {
@@ -1792,14 +1792,14 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
         return 0;
     }
 
-    *pte = new;
+    write_atomic(&pte->val, new.val);
+    spin_unlock(&hd->arch.mapping_lock);
 
     iommu_sync_cache(pte, sizeof(struct dma_pte));
-    spin_unlock(&hd->arch.mapping_lock);
     unmap_vtd_domain_page(page);
 
     *flush_flags |= IOMMU_FLUSHF_added;
-    if ( dma_pte_present(old) )
+    if ( old.r || old.w )
         *flush_flags |= IOMMU_FLUSHF_modified;
 
     return rc;
@@ -1851,12 +1851,12 @@ static int intel_iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
     unmap_vtd_domain_page(page);
     spin_unlock(&hd->arch.mapping_lock);
 
-    if ( !dma_pte_present(val) )
+    if ( !val.r && !val.w )
         return -ENOENT;
 
-    *mfn = maddr_to_mfn(dma_pte_addr(val));
-    *flags = dma_pte_read(val) ? IOMMUF_readable : 0;
-    *flags |= dma_pte_write(val) ? IOMMUF_writable : 0;
+    *mfn = _mfn(val.addr);
+    *flags = val.r ? IOMMUF_readable : 0;
+    *flags |= val.w ? IOMMUF_writable : 0;
 
     return 0;
 }
@@ -2611,18 +2611,18 @@ static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
             process_pending_softirqs();
 
         pte = &pt_vaddr[i];
-        if ( !dma_pte_present(*pte) )
+        if ( !pte->r && !pte->w )
             continue;
 
         address = gpa + offset_level_address(i, level);
         if ( next_level >= 1 ) 
-            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
+            vtd_dump_page_table_level(pfn_to_paddr(pte->addr), next_level,
                                       address, indent + 1);
         else
             printk("%*sdfn: %08lx mfn: %08lx\n",
                    indent, "",
                    (unsigned long)(address >> PAGE_SHIFT_4K),
-                   (unsigned long)(dma_pte_addr(*pte) >> PAGE_SHIFT_4K));
+                   (unsigned long)(pte->addr));
     }
 
     unmap_vtd_domain_page(pt_vaddr);
@@ -2690,8 +2690,9 @@ static int __init intel_iommu_quarantine_init(struct domain *d)
         {
             struct dma_pte *pte = &parent[offset];
 
-            dma_set_pte_addr(*pte, maddr);
-            dma_set_pte_readable(*pte);
+            pte->addr = paddr_to_pfn(maddr);
+            smp_wmb();
+            pte->r = 1;
         }
         iommu_sync_cache(parent, PAGE_SIZE);
 
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 33b1abf98526..1b6123e0c947 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -244,38 +244,21 @@ struct context_entry {
 #define level_size(l) (1 << level_to_offset_bits(l))
 #define align_to_level(addr, l) ((addr + level_size(l) - 1) & level_mask(l))
 
-/*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-11: available
- * 12-63: Host physcial address
- */
 struct dma_pte {
-    u64 val;
+    union {
+        uint64_t val;
+        struct {
+            bool r:1;
+            bool w:1;
+            unsigned int reserved0:1;
+            unsigned int ignored0:4;
+            bool ps:1;
+            unsigned int ignored1:3;
+            bool snp:1;
+            uint64_t addr:52;
+        };
+    };
 };
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-#define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
-#define DMA_PTE_SP   (1 << 7)
-#define DMA_PTE_SNP  (1 << 11)
-#define dma_clear_pte(p)    do {(p).val = 0;} while(0)
-#define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
-#define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
-#define dma_set_pte_superpage(p) do {(p).val |= DMA_PTE_SP;} while(0)
-#define dma_set_pte_snp(p)  do {(p).val |= DMA_PTE_SNP;} while(0)
-#define dma_set_pte_prot(p, prot) do { \
-        (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
-    } while (0)
-#define dma_pte_prot(p) ((p).val & DMA_PTE_PROT)
-#define dma_pte_read(p) (dma_pte_prot(p) & DMA_PTE_READ)
-#define dma_pte_write(p) (dma_pte_prot(p) & DMA_PTE_WRITE)
-#define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
-#define dma_set_pte_addr(p, addr) do {\
-            (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
-#define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
-#define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
 
 /* interrupt remap entry */
 struct iremap_entry {
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index 4bca160bc663..a78b02e6edc4 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -161,14 +161,14 @@ void print_vtd_entries(struct vtd_iommu *iommu, int bus, int devfn, u64 gmfn)
         unmap_vtd_domain_page(l);
         printk("    l%u[%03x] = %"PRIx64"\n", level, l_index, pte.val);
 
-        if ( !dma_pte_present(pte) )
+        if ( !pte.r && !pte.w )
         {
             printk("    l%u[%03x] not present\n", level, l_index);
             break;
         }
-        if ( dma_pte_superpage(pte) )
+        if ( pte.ps )
             break;
-        val = dma_pte_addr(pte);
+        val = pfn_to_paddr(pte.addr);
     } while ( --level );
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap
  2020-11-20 13:24 ` [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
@ 2020-11-27 14:39   ` Jan Beulich
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2020-11-27 14:39 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paul Durrant, Julien Grall, Kevin Tian, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Jun Nakajima,
	xen-devel

On 20.11.2020 14:24, Paul Durrant wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2489,10 +2489,16 @@ static int cleanup_page_mappings(struct page_info *page)
>  
>          if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
>          {
> -            int rc2 = iommu_legacy_unmap(d, _dfn(mfn), 1u << PAGE_ORDER_4K);
> +            unsigned int flush_flags = 0;
> +            int err;
> +
> +            err = iommu_unmap(d, _dfn(mfn), 1ul << PAGE_ORDER_4K, &flush_flags);
> +            if ( !err && !this_cpu(iommu_dont_flush_iotlb) )
> +                err = iommu_iotlb_flush(d, _dfn(mfn), 1ul << PAGE_ORDER_4K,
> +                                        flush_flags);

As was the subject of XSA-346, honoring the flag on a path leading to
the freeing of a page _before_ the delayed flush actually
happens is wrong. Luckily the first of the two patches for
that XSA arranged for you never being able to observe the flag
set, so the check here is simply pointless. But it should
still be removed for documentation purposes.
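
I.e., purely as an (untested) sketch reusing the names from this hunk,
the tail could then collapse to:

            err = iommu_unmap(d, _dfn(mfn), 1ul << PAGE_ORDER_4K, &flush_flags);
            if ( !err )
                err = iommu_iotlb_flush(d, _dfn(mfn), 1ul << PAGE_ORDER_4K,
                                        flush_flags);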

> @@ -3014,14 +3020,20 @@ static int _get_page_type(struct page_info *page, unsigned long type,
>          if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) )
>          {
>              mfn_t mfn = page_to_mfn(page);
> +            dfn_t dfn = _dfn(mfn_x(mfn));
> +            unsigned int flush_flags = 0;
>  
>              if ( (x & PGT_type_mask) == PGT_writable_page )
> -                rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)),
> -                                        1ul << PAGE_ORDER_4K);
> +                rc = iommu_unmap(d, dfn, 1ul << PAGE_ORDER_4K, &flush_flags);
>              else
> -                rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn,
> -                                      1ul << PAGE_ORDER_4K,
> -                                      IOMMUF_readable | IOMMUF_writable);
> +            {
> +                rc = iommu_map(d, dfn, mfn, 1ul << PAGE_ORDER_4K,
> +                               IOMMUF_readable | IOMMUF_writable, &flush_flags);
> +            }
> +
> +            if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
> +                rc = iommu_iotlb_flush(d, dfn, 1ul << PAGE_ORDER_4K,
> +                                       flush_flags);

Along these lines here - at least the unmapping needs to be
followed by a flush before the page can assume its new role.
Yet again I don't think the flag can ever be observed true
here, first and foremost because of the is_pv_domain() in
the surrounding if(). While the check could be made
conditional upon the prior operation having been a map, I
think it's again easier to simply delete the dead check.
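
I.e. (again only an untested sketch, with the names from this hunk):

            if ( !rc )
                rc = iommu_iotlb_flush(d, dfn, 1ul << PAGE_ORDER_4K,
                                       flush_flags);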

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -842,15 +842,19 @@ out:
>      if ( rc == 0 && p2m_is_hostp2m(p2m) &&
>           need_modify_vtd_table )
>      {
> -        if ( iommu_use_hap_pt(d) && !this_cpu(iommu_dont_flush_iotlb) )
> -            rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order,
> -                                   (iommu_flags ? IOMMU_FLUSHF_added : 0) |
> -                                   (vtd_pte_present ? IOMMU_FLUSHF_modified
> -                                                    : 0));
> -        else if ( need_iommu_pt_sync(d) )
> +        unsigned int flush_flags = 0;
> +
> +        if ( need_iommu_pt_sync(d) )
>              rc = iommu_flags ?
> -                iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags) :
> -                iommu_legacy_unmap(d, _dfn(gfn), 1ul << order);
> +                iommu_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags,
> +                          &flush_flags) :
> +                iommu_unmap(d, _dfn(gfn), 1ul << order, &flush_flags);
> +        else if ( iommu_use_hap_pt(d) )
> +            flush_flags = (iommu_flags ? IOMMU_FLUSHF_added : 0) |
> +                          (vtd_pte_present ? IOMMU_FLUSHF_modified : 0);

Is there a particular reason you inverted the order of the
iommu_use_hap_pt() and need_iommu_pt_sync() checks here?
The common (default) case for VT-x / VT-d / EPT is going to
be shared page tables, so I think this should remain the
path getting away with just one evaluation of a conditional.
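
I.e. something like this (untested; merely swapping the two arms of the
hunk above):

        if ( iommu_use_hap_pt(d) )
            flush_flags = (iommu_flags ? IOMMU_FLUSHF_added : 0) |
                          (vtd_pte_present ? IOMMU_FLUSHF_modified : 0);
        else if ( need_iommu_pt_sync(d) )
            rc = iommu_flags ?
                iommu_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags,
                          &flush_flags) :
                iommu_unmap(d, _dfn(gfn), 1ul << order, &flush_flags);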

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -836,8 +836,8 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
>  
>      if ( is_iommu_enabled(d) )
>      {
> -       this_cpu(iommu_dont_flush_iotlb) = 1;
> -       extra.ppage = &pages[0];
> +        this_cpu(iommu_dont_flush_iotlb) = true;
> +        extra.ppage = &pages[0];
>      }

Is the respective part of the description ("no longer
pointlessly gated on is_iommu_enabled() returning true") stale?

> @@ -368,15 +360,12 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
>  
>  /*
>   * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
> - * avoid unecessary iotlb_flush in the low level IOMMU code.
> - *
> - * iommu_map_page/iommu_unmap_page must flush the iotlb but somethimes
> - * this operation can be really expensive. This flag will be set by the
> - * caller to notify the low level IOMMU code to avoid the iotlb flushes.
> - * iommu_iotlb_flush/iommu_iotlb_flush_all will be explicitly called by
> - * the caller.
> + * avoid unnecessary IOMMU flushing while updating the P2M.
> + * Setting the value to true will cause iommu_iotlb_flush() to return without
> + * actually performing a flush. A batch flush must therefore be done by the
> + * calling code after setting the value back to false.

I guess this too was in need of updating with the v9 changes?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 5/7] vtd: use a bit field for root_entry
  2020-11-20 13:24 ` [PATCH v10 5/7] vtd: use a bit field for root_entry Paul Durrant
@ 2020-11-27 15:11   ` Jan Beulich
  2020-11-30  3:06   ` Tian, Kevin
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2020-11-27 15:11 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Paul Durrant, Kevin Tian, xen-devel

On 20.11.2020 14:24, Paul Durrant wrote:
> @@ -85,25 +85,28 @@ static bool device_in_domain(const struct vtd_iommu *iommu,
>          return false;
>      }
>  
> -    root_entry = map_vtd_domain_page(iommu->root_maddr);
> -    if ( !root_present(root_entry[pdev->bus]) )
> +    root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);

Why the cast, the more that ...

> +    root_entry = &root_entries[pdev->bus];
> +    if ( !root_entry->p )
>          goto out;
>  
> -    ctxt_entry = map_vtd_domain_page(root_entry[pdev->bus].val);
> -    if ( context_domain_id(ctxt_entry[pdev->devfn]) != did )
> +    context_entries = map_vtd_domain_page(root_entry->ctp);

... you have none here? With this dropped
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 6/7] vtd: use a bit field for context_entry
  2020-11-20 13:24 ` [PATCH v10 6/7] vtd: use a bit field for context_entry Paul Durrant
@ 2020-11-27 15:32   ` Jan Beulich
  2020-11-30  3:10   ` Tian, Kevin
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2020-11-27 15:32 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Paul Durrant, Kevin Tian, xen-devel

On 20.11.2020 14:24, Paul Durrant wrote:
> @@ -121,21 +119,22 @@ static int context_set_domain_id(struct context_entry *context,
>      }
>  
>      set_bit(i, iommu->domid_bitmap);
> -    context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET;
> +    context->did = i;
> +
>      return 0;
>  }
>  
>  static int context_get_domain_id(struct context_entry *context,
>                                   struct vtd_iommu *iommu)
>  {
> -    unsigned long dom_index, nr_dom;
>      int domid = -1;
>  
>      if (iommu && context)
>      {
> -        nr_dom = cap_ndoms(iommu->cap);
> +        unsigned long dom_index, nr_dom;

unsigned int will do here.

> -        dom_index = context_domain_id(*context);
> +        nr_dom = cap_ndoms(iommu->cap);
> +        dom_index = context->did;

These could also become the initializers of the variables now.
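
I.e. (sketch):

        unsigned int nr_dom = cap_ndoms(iommu->cap);
        unsigned int dom_index = context->did;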

> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -198,37 +198,34 @@ struct root_entry {
>          };
>      };
>  };
> +#define ROOT_ENTRY_NR (PAGE_SIZE_4K / sizeof(struct root_entry))
>  
>  struct context_entry {
> -    u64 lo;
> -    u64 hi;
> -};
> -#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
> -#define context_present(c) ((c).lo & 1)
> -#define context_fault_disable(c) (((c).lo >> 1) & 1)
> -#define context_translation_type(c) (((c).lo >> 2) & 3)
> -#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
> -#define context_address_width(c) ((c).hi &  7)
> -#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
> +    union {
> +        __uint128_t val;
> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            bool p:1;
> +            bool fpd:1;
> +            uint64_t tt:2;

unsigned int

With these taken care of
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 7/7] vtd: use a bit field for dma_pte
  2020-11-20 13:24 ` [PATCH v10 7/7] vtd: use a bit field for dma_pte Paul Durrant
@ 2020-11-27 16:02   ` Jan Beulich
  2020-11-30  5:29     ` Tian, Kevin
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2020-11-27 16:02 UTC (permalink / raw)
  To: Paul Durrant, Kevin Tian; +Cc: Paul Durrant, xen-devel

On 20.11.2020 14:24, Paul Durrant wrote:
> @@ -709,20 +709,23 @@ static void dma_pte_clear_one(struct domain *domain, uint64_t addr,
>      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
>      pte = page + address_level_offset(addr, 1);
>  
> -    if ( !dma_pte_present(*pte) )
> +    if ( !pte->r && !pte->w )

I think dma_pte_present() wants to stay, so we would have to touch
only one place when adding support for x.
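
E.g. re-expressed on top of the new bit field (sketch):

#define dma_pte_present(p) ((p).r || (p).w)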

>      {
>          spin_unlock(&hd->arch.mapping_lock);
>          unmap_vtd_domain_page(page);
>          return;
>      }
>  
> -    dma_clear_pte(*pte);
> -    *flush_flags |= IOMMU_FLUSHF_modified;
> +    pte->r = pte->w = false;
> +    smp_wmb();
> +    pte->val = 0;
>  
>      spin_unlock(&hd->arch.mapping_lock);
>      iommu_sync_cache(pte, sizeof(struct dma_pte));

Just as an observation - in an earlier patch I think there was a
code sequence having these the other way around. I think we want
to settle on one way of doing this (flush then unlock, or unlock
then flush). Kevin?

> @@ -1775,15 +1778,12 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
>      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
>      pte = &page[dfn_x(dfn) & LEVEL_MASK];
>      old = *pte;
> -
> -    dma_set_pte_addr(new, mfn_to_maddr(mfn));
> -    dma_set_pte_prot(new,
> -                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> -                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> -
> -    /* Set the SNP on leaf page table if Snoop Control available */
> -    if ( iommu_snoop )
> -        dma_set_pte_snp(new);
> +    new = (struct dma_pte){
> +        .r = flags & IOMMUF_readable,
> +        .w = flags & IOMMUF_writable,
> +        .snp = iommu_snoop,
> +        .addr = mfn_x(mfn),
> +    };

We still haven't settled on a newer gcc baseline, so this kind of
initializer is still not allowed (as in: will break the build) for
struct-s with unnamed sub-struct-s / sub-union-s.
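
An alternative that older gcc accepts would be to keep the empty
initializer and assign the fields individually - untested sketch:

    struct dma_pte new = {};

    new.r = flags & IOMMUF_readable;
    new.w = flags & IOMMUF_writable;
    new.snp = iommu_snoop;
    new.addr = mfn_x(mfn);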

> @@ -2611,18 +2611,18 @@ static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
>              process_pending_softirqs();
>  
>          pte = &pt_vaddr[i];
> -        if ( !dma_pte_present(*pte) )
> +        if ( !pte->r && !pte->w )
>              continue;
>  
>          address = gpa + offset_level_address(i, level);
>          if ( next_level >= 1 ) 
> -            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
> +            vtd_dump_page_table_level(pfn_to_paddr(pte->addr), next_level,
>                                        address, indent + 1);
>          else
>              printk("%*sdfn: %08lx mfn: %08lx\n",
>                     indent, "",
>                     (unsigned long)(address >> PAGE_SHIFT_4K),
> -                   (unsigned long)(dma_pte_addr(*pte) >> PAGE_SHIFT_4K));
> +                   (unsigned long)(pte->addr));

Could you also drop the no longer needed pair of parentheses. I
further suspect the cast isn't needed (anymore?). (Otoh I think
I recall oddities with gcc's printf()-style format checking and
direct passing of bitfields. But if that's a problem, I think
one of the earlier ones already introduced such an issue. So
perhaps we can wait until someone actually confirms there is an
issue - quite likely this someone would be me anyway.)

> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -244,38 +244,21 @@ struct context_entry {
>  #define level_size(l) (1 << level_to_offset_bits(l))
>  #define align_to_level(addr, l) ((addr + level_size(l) - 1) & level_mask(l))
>  
> -/*
> - * 0: readable
> - * 1: writable
> - * 2-6: reserved
> - * 7: super page
> - * 8-11: available
> - * 12-63: Host physcial address
> - */
>  struct dma_pte {
> -    u64 val;
> +    union {
> +        uint64_t val;
> +        struct {
> +            bool r:1;
> +            bool w:1;
> +            unsigned int reserved0:1;
> +            unsigned int ignored0:4;
> +            bool ps:1;
> +            unsigned int ignored1:3;
> +            bool snp:1;
> +            uint64_t addr:52;

As per the doc I look at this extends only to bit 51 at most.
Above are 11 ignored bits and (in leaf entries) the TM one.

Considering the differences between leaf and intermediate
entries, perhaps leaf-only fields could gain a brief comment?
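
I.e. the tail of the struct might then become something like this
(untested sketch; the 'ignored' field names are made up here):

            uint64_t addr:40;
            unsigned int ignored2:10;
            bool tm:1;                  /* leaf entries only */
            unsigned int ignored3:1;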

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v10 0/7] IOMMU cleanup
  2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
                   ` (6 preceding siblings ...)
  2020-11-20 13:24 ` [PATCH v10 7/7] vtd: use a bit field for dma_pte Paul Durrant
@ 2020-11-27 16:21 ` Jan Beulich
  2020-11-27 16:32   ` Durrant, Paul
  7 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2020-11-27 16:21 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Paul Durrant, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Jun Nakajima, Kevin Tian, Roger Pau Monné,
	Stefano Stabellini, Wei Liu, xen-devel

On 20.11.2020 14:24, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This is the remainder of the cleanup series deferred until XSA-346 and
> XSA-347 were publicly disclosed.
> 
> Paul Durrant (7):
>   remove remaining uses of iommu_legacy_map/unmap
>   common/grant_table: batch flush I/O TLB
>   iommu: remove the share_p2m operation
>   iommu: stop calling IOMMU page tables 'p2m tables'

Are the latter two patches dependent upon the former two, or could
they go in independently?

Jan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v10 0/7] IOMMU cleanup
  2020-11-27 16:21 ` [PATCH v10 0/7] IOMMU cleanup Jan Beulich
@ 2020-11-27 16:32   ` Durrant, Paul
  0 siblings, 0 replies; 19+ messages in thread
From: Durrant, Paul @ 2020-11-27 16:32 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Grall, Julien,
	Jun Nakajima, Kevin Tian, Roger Pau Monné,
	Stefano Stabellini, Wei Liu, xen-devel

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 27 November 2020 16:22
> To: Paul Durrant <paul@xen.org>
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Grall, Julien
> <jgrall@amazon.co.uk>; Jun Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Roger
> Pau Monné <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>;
> xen-devel@lists.xenproject.org
> Subject: RE: [EXTERNAL] [PATCH v10 0/7] IOMMU cleanup
> 
> CAUTION: This email originated from outside of the organization. Do not click links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 20.11.2020 14:24, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > This is the remainder of the cleanup series deferred until XSA-346 and
> > XSA-347 were publicly disclosed.
> >
> > Paul Durrant (7):
> >   remove remaining uses of iommu_legacy_map/unmap
> >   common/grant_table: batch flush I/O TLB
> >   iommu: remove the share_p2m operation
> >   iommu: stop calling IOMMU page tables 'p2m tables'
> 
> Are the latter two patches dependent upon the former two, or could
> they go in independently?
> 

Not really. They should be able to go in ahead of the other two.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v10 5/7] vtd: use a bit field for root_entry
  2020-11-20 13:24 ` [PATCH v10 5/7] vtd: use a bit field for root_entry Paul Durrant
  2020-11-27 15:11   ` Jan Beulich
@ 2020-11-30  3:06   ` Tian, Kevin
  2020-11-30  9:45     ` Jan Beulich
  1 sibling, 1 reply; 19+ messages in thread
From: Tian, Kevin @ 2020-11-30  3:06 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant

> From: Paul Durrant <paul@xen.org>
> Sent: Friday, November 20, 2020 9:25 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This makes the code a little easier to read and also makes it more consistent
> with iremap_entry.
> 
> Also take the opportunity to tidy up the implementation of
> device_in_domain().
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: <kevin.tian@intel.com>

> ---
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v10:
>  - Small tweaks requested by Jan
>  - Remove macros in favour of direct field access
>  - Add missing barrier
> 
> v4:
>  - New in v4
> ---
>  xen/drivers/passthrough/vtd/iommu.c   |  9 +++++----
>  xen/drivers/passthrough/vtd/iommu.h   | 25 ++++++++++++-------------
>  xen/drivers/passthrough/vtd/utils.c   |  6 +++---
>  xen/drivers/passthrough/vtd/x86/ats.c | 27 +++++++++++++++------------
>  4 files changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index d136fe36883b..1a038541f0a3 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -237,7 +237,7 @@ static u64 bus_to_context_maddr(struct vtd_iommu
> *iommu, u8 bus)
>      ASSERT(spin_is_locked(&iommu->lock));
>      root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
>      root = &root_entries[bus];
> -    if ( !root_present(*root) )
> +    if ( !root->p )
>      {
>          maddr = alloc_pgtable_maddr(1, iommu->node);
>          if ( maddr == 0 )
> @@ -245,11 +245,12 @@ static u64 bus_to_context_maddr(struct
> vtd_iommu *iommu, u8 bus)
>              unmap_vtd_domain_page(root_entries);
>              return 0;
>          }
> -        set_root_value(*root, maddr);
> -        set_root_present(*root);
> +        root->ctp = paddr_to_pfn(maddr);
> +        smp_wmb();
> +        root->p = true;
>          iommu_sync_cache(root, sizeof(struct root_entry));
>      }
> -    maddr = (u64) get_context_addr(*root);
> +    maddr = pfn_to_paddr(root->ctp);
>      unmap_vtd_domain_page(root_entries);
>      return maddr;
>  }
> diff --git a/xen/drivers/passthrough/vtd/iommu.h
> b/xen/drivers/passthrough/vtd/iommu.h
> index 216791b3d634..b14628eec260 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -184,21 +184,20 @@
>  #define dma_frcd_source_id(c) (c & 0xffff)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
> 
> -/*
> - * 0: Present
> - * 1-11: Reserved
> - * 12-63: Context Ptr (12 - (haw-1))
> - * 64-127: Reserved
> - */
>  struct root_entry {
> -    u64    val;
> -    u64    rsvd1;
> +    union {
> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            bool p:1;
> +            unsigned int reserved0:11;
> +            uint64_t ctp:52;
> +
> +            /* 64 - 127 */
> +            uint64_t reserved1;
> +        };
> +    };
>  };
> -#define root_present(root)    ((root).val & 1)
> -#define set_root_present(root) do {(root).val |= 1;} while(0)
> -#define get_context_addr(root) ((root).val & PAGE_MASK_4K)
> -#define set_root_value(root, value) \
> -    do {(root).val |= ((value) & PAGE_MASK_4K);} while(0)
> 
>  struct context_entry {
>      u64 lo;
> diff --git a/xen/drivers/passthrough/vtd/utils.c
> b/xen/drivers/passthrough/vtd/utils.c
> index 4febcf506d8a..5f25a86a535c 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -112,15 +112,15 @@ void print_vtd_entries(struct vtd_iommu *iommu,
> int bus, int devfn, u64 gmfn)
>          return;
>      }
> 
> -    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].val);
> -    if ( !root_present(root_entry[bus]) )
> +    printk("    root_entry[%02x] = %"PRIx64"\n", bus, root_entry[bus].lo);
> +    if ( !root_entry[bus].p )
>      {
>          unmap_vtd_domain_page(root_entry);
>          printk("    root_entry[%02x] not present\n", bus);
>          return;
>      }
> 
> -    val = root_entry[bus].val;
> +    val = pfn_to_paddr(root_entry[bus].ctp);
>      unmap_vtd_domain_page(root_entry);
>      ctxt_entry = map_vtd_domain_page(val);
>      if ( ctxt_entry == NULL )
> diff --git a/xen/drivers/passthrough/vtd/x86/ats.c
> b/xen/drivers/passthrough/vtd/x86/ats.c
> index 04d702b1d6b1..fec969ef75bb 100644
> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -74,8 +74,8 @@ int ats_device(const struct pci_dev *pdev, const struct
> acpi_drhd_unit *drhd)
>  static bool device_in_domain(const struct vtd_iommu *iommu,
>                               const struct pci_dev *pdev, uint16_t did)
>  {
> -    struct root_entry *root_entry;
> -    struct context_entry *ctxt_entry = NULL;
> +    struct root_entry *root_entry, *root_entries;
> +    struct context_entry *context_entry, *context_entries = NULL;
>      unsigned int tt;
>      bool found = false;
> 
> @@ -85,25 +85,28 @@ static bool device_in_domain(const struct
> vtd_iommu *iommu,
>          return false;
>      }
> 
> -    root_entry = map_vtd_domain_page(iommu->root_maddr);
> -    if ( !root_present(root_entry[pdev->bus]) )
> +    root_entries = (struct root_entry *)map_vtd_domain_page(iommu->root_maddr);
> +    root_entry = &root_entries[pdev->bus];
> +    if ( !root_entry->p )
>          goto out;
> 
> -    ctxt_entry = map_vtd_domain_page(root_entry[pdev->bus].val);
> -    if ( context_domain_id(ctxt_entry[pdev->devfn]) != did )
> +    context_entries = map_vtd_domain_page(root_entry->ctp);
> +    context_entry = &context_entries[pdev->devfn];
> +    if ( context_domain_id(*context_entry) != did )
>          goto out;
> 
> -    tt = context_translation_type(ctxt_entry[pdev->devfn]);
> +    tt = context_translation_type(*context_entry);
>      if ( tt != CONTEXT_TT_DEV_IOTLB )
>          goto out;
> 
>      found = true;
> -out:
> -    if ( root_entry )
> -        unmap_vtd_domain_page(root_entry);
> 
> -    if ( ctxt_entry )
> -        unmap_vtd_domain_page(ctxt_entry);
> + out:
> +    if ( root_entries )
> +        unmap_vtd_domain_page(root_entries);
> +
> +    if ( context_entries )
> +        unmap_vtd_domain_page(context_entries);
> 
>      return found;
>  }
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v10 6/7] vtd: use a bit field for context_entry
  2020-11-20 13:24 ` [PATCH v10 6/7] vtd: use a bit field for context_entry Paul Durrant
  2020-11-27 15:32   ` Jan Beulich
@ 2020-11-30  3:10   ` Tian, Kevin
  1 sibling, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2020-11-30  3:10 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Paul Durrant

> From: Paul Durrant <paul@xen.org>
> Sent: Friday, November 20, 2020 9:25 PM
> 
> From: Paul Durrant <pdurrant@amazon.com>
> 
> This removes the need for much shifting, masking and several magic
> numbers.
> On the whole it makes the code quite a bit more readable.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
> Cc: Kevin Tian <kevin.tian@intel.com>
> 
> v10:
>  - Remove macros in favour of direct field access
>  - Adjust field types
>  - Add missing barriers
> 
> v4:
>  - New in v4
> ---
>  xen/drivers/passthrough/vtd/iommu.c   | 36 +++++++++++----------
>  xen/drivers/passthrough/vtd/iommu.h   | 45 +++++++++++++--------------
>  xen/drivers/passthrough/vtd/utils.c   | 10 +++---
>  xen/drivers/passthrough/vtd/x86/ats.c |  6 ++--
>  4 files changed, 47 insertions(+), 50 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 1a038541f0a3..fdb472ad6515 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -86,8 +86,6 @@ static int domain_iommu_domid(struct domain *d,
>      return -1;
>  }
> 
> -#define DID_FIELD_WIDTH 16
> -#define DID_HIGH_OFFSET 8
>  static int context_set_domain_id(struct context_entry *context,
>                                   struct domain *d,
>                                   struct vtd_iommu *iommu)
> @@ -121,21 +119,22 @@ static int context_set_domain_id(struct
> context_entry *context,
>      }
> 
>      set_bit(i, iommu->domid_bitmap);
> -    context->hi |= (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OFFSET;
> +    context->did = i;
> +
>      return 0;
>  }
> 
>  static int context_get_domain_id(struct context_entry *context,
>                                   struct vtd_iommu *iommu)
>  {
> -    unsigned long dom_index, nr_dom;
>      int domid = -1;
> 
>      if (iommu && context)
>      {
> -        nr_dom = cap_ndoms(iommu->cap);
> +        unsigned long dom_index, nr_dom;
> 
> -        dom_index = context_domain_id(*context);
> +        nr_dom = cap_ndoms(iommu->cap);
> +        dom_index = context->did;
> 
>          if ( dom_index < nr_dom && iommu->domid_map )
>              domid = iommu->domid_map[dom_index];
> @@ -1338,7 +1337,7 @@ int domain_context_mapping_one(
>      context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
>      context = &context_entries[devfn];
> 
> -    if ( context_present(*context) )
> +    if ( context->p )
>      {
>          int res = 0;
> 
> @@ -1382,7 +1381,7 @@ int domain_context_mapping_one(
> 
>      if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
>      {
> -        context_set_translation_type(*context, CONTEXT_TT_PASS_THRU);
> +        context->tt = CONTEXT_TT_PASS_THRU;
>      }
>      else
>      {
> @@ -1397,11 +1396,11 @@ int domain_context_mapping_one(
>              return -ENOMEM;
>          }
> 
> -        context_set_address_root(*context, pgd_maddr);
> +        context->slptptr = paddr_to_pfn(pgd_maddr);
>          if ( ats_enabled && ecap_dev_iotlb(iommu->ecap) )
> -            context_set_translation_type(*context, CONTEXT_TT_DEV_IOTLB);
> +            context->tt = CONTEXT_TT_DEV_IOTLB;
>          else
> -            context_set_translation_type(*context, CONTEXT_TT_MULTI_LEVEL);
> +            context->tt = CONTEXT_TT_MULTI_LEVEL;
> 
>          spin_unlock(&hd->arch.mapping_lock);
>      }
> @@ -1413,9 +1412,10 @@ int domain_context_mapping_one(
>          return -EFAULT;
>      }
> 
> -    context_set_address_width(*context, level_to_agaw(iommu->nr_pt_levels));
> -    context_set_fault_enable(*context);
> -    context_set_present(*context);
> +    context->aw = level_to_agaw(iommu->nr_pt_levels);
> +    context->fpd = false;
> +    smp_wmb();
> +    context->p = true;
>      iommu_sync_cache(context, sizeof(struct context_entry));
>      spin_unlock(&iommu->lock);
> 
> @@ -1567,17 +1567,19 @@ int domain_context_unmap_one(
>      context_entries = (struct context_entry *)map_vtd_domain_page(maddr);
>      context = &context_entries[devfn];
> 
> -    if ( !context_present(*context) )
> +    if ( !context->p )
>      {
>          spin_unlock(&iommu->lock);
>          unmap_vtd_domain_page(context_entries);
>          return 0;
>      }
> 
> -    context_clear_present(*context);
> -    context_clear_entry(*context);
> +    context->p = false;
> +    smp_wmb();
>      iommu_sync_cache(context, sizeof(struct context_entry));
> 
> +    context->val = 0; /* No need to sync; present bit is already cleared */
> +
>      iommu_domid= domain_iommu_domid(domain, iommu);
>      if ( iommu_domid == -1 )
>      {
> diff --git a/xen/drivers/passthrough/vtd/iommu.h
> b/xen/drivers/passthrough/vtd/iommu.h
> index b14628eec260..33b1abf98526 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -198,37 +198,34 @@ struct root_entry {
>          };
>      };
>  };
> +#define ROOT_ENTRY_NR (PAGE_SIZE_4K / sizeof(struct root_entry))
> 
>  struct context_entry {
> -    u64 lo;
> -    u64 hi;
> -};
> -#define ROOT_ENTRY_NR (PAGE_SIZE_4K/sizeof(struct root_entry))
> -#define context_present(c) ((c).lo & 1)
> -#define context_fault_disable(c) (((c).lo >> 1) & 1)
> -#define context_translation_type(c) (((c).lo >> 2) & 3)
> -#define context_address_root(c) ((c).lo & PAGE_MASK_4K)
> -#define context_address_width(c) ((c).hi &  7)
> -#define context_domain_id(c) (((c).hi >> 8) & ((1 << 16) - 1))
> +    union {
> +        __uint128_t val;
> +        struct { uint64_t lo, hi; };
> +        struct {
> +            /* 0 - 63 */
> +            bool p:1;
> +            bool fpd:1;
> +            uint64_t tt:2;
> 
> -#define context_set_present(c) do {(c).lo |= 1;} while(0)
> -#define context_clear_present(c) do {(c).lo &= ~1;} while(0)
> -#define context_set_fault_enable(c) \
> -    do {(c).lo &= (((u64)-1) << 2) | 1;} while(0)
> -
> -#define context_set_translation_type(c, val) do { \
> -        (c).lo &= (((u64)-1) << 4) | 3; \
> -        (c).lo |= (val & 3) << 2; \
> -    } while(0)
>  #define CONTEXT_TT_MULTI_LEVEL 0
>  #define CONTEXT_TT_DEV_IOTLB   1
>  #define CONTEXT_TT_PASS_THRU   2
> 
> -#define context_set_address_root(c, val) \
> -    do {(c).lo &= 0xfff; (c).lo |= (val) & PAGE_MASK_4K ;} while(0)
> -#define context_set_address_width(c, val) \
> -    do {(c).hi &= 0xfffffff8; (c).hi |= (val) & 7;} while(0)
> -#define context_clear_entry(c) do {(c).lo = 0; (c).hi = 0;} while(0)
> +            unsigned int reserved0:8;
> +            uint64_t slptptr:52;
> +
> +            /* 64 - 127 */
> +            unsigned int aw:3;
> +            unsigned int ignored:4;
> +            unsigned int reserved1:1;
> +            unsigned int did:16;
> +            uint64_t reserved2:40;
> +        };
> +    };
> +};
> 
>  /* page table handling */
>  #define LEVEL_STRIDE       (9)
> diff --git a/xen/drivers/passthrough/vtd/utils.c
> b/xen/drivers/passthrough/vtd/utils.c
> index 5f25a86a535c..4bca160bc663 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -129,17 +129,17 @@ void print_vtd_entries(struct vtd_iommu *iommu,
> int bus, int devfn, u64 gmfn)
>          return;
>      }
> 
> -    val = ctxt_entry[devfn].lo;
> -    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n",
> -           devfn, ctxt_entry[devfn].hi, val);
> -    if ( !context_present(ctxt_entry[devfn]) )
> +    printk("    context[%02x] = %"PRIx64"_%"PRIx64"\n", devfn,
> +           ctxt_entry[devfn].hi, ctxt_entry[devfn].lo);
> +    if ( !ctxt_entry[devfn].p )
>      {
>          unmap_vtd_domain_page(ctxt_entry);
>          printk("    ctxt_entry[%02x] not present\n", devfn);
>          return;
>      }
> 
> -    level = agaw_to_level(context_address_width(ctxt_entry[devfn]));
> +    level = agaw_to_level(ctxt_entry[devfn].aw);
> +    val = pfn_to_paddr(ctxt_entry[devfn].slptptr);
>      unmap_vtd_domain_page(ctxt_entry);
>      if ( level != VTD_PAGE_TABLE_LEVEL_3 &&
>           level != VTD_PAGE_TABLE_LEVEL_4)
> diff --git a/xen/drivers/passthrough/vtd/x86/ats.c
> b/xen/drivers/passthrough/vtd/x86/ats.c
> index fec969ef75bb..cb057ced3cf7 100644
> --- a/xen/drivers/passthrough/vtd/x86/ats.c
> +++ b/xen/drivers/passthrough/vtd/x86/ats.c
> @@ -76,7 +76,6 @@ static bool device_in_domain(const struct vtd_iommu
> *iommu,
>  {
>      struct root_entry *root_entry, *root_entries;
>      struct context_entry *context_entry, *context_entries = NULL;
> -    unsigned int tt;
>      bool found = false;
> 
>      if ( unlikely(!iommu->root_maddr) )
> @@ -92,11 +91,10 @@ static bool device_in_domain(const struct
> vtd_iommu *iommu,
> 
>      context_entries = map_vtd_domain_page(root_entry->ctp);
>      context_entry = &context_entries[pdev->devfn];
> -    if ( context_domain_id(*context_entry) != did )
> +    if ( context_entry->did != did )
>          goto out;
> 
> -    tt = context_translation_type(*context_entry);
> -    if ( tt != CONTEXT_TT_DEV_IOTLB )
> +    if ( context_entry->tt != CONTEXT_TT_DEV_IOTLB )
>          goto out;
> 
>      found = true;
> --
> 2.20.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH v10 7/7] vtd: use a bit field for dma_pte
  2020-11-27 16:02   ` Jan Beulich
@ 2020-11-30  5:29     ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2020-11-30  5:29 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant; +Cc: Paul Durrant, xen-devel

> From: Beulich
> Sent: Saturday, November 28, 2020 12:02 AM
> 
> On 20.11.2020 14:24, Paul Durrant wrote:
> > @@ -709,20 +709,23 @@ static void dma_pte_clear_one(struct domain
> *domain, uint64_t addr,
> >      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> >      pte = page + address_level_offset(addr, 1);
> >
> > -    if ( !dma_pte_present(*pte) )
> > +    if ( !pte->r && !pte->w )
> 
> I think dma_pte_present() wants to stay, so we would have to touch
> only one place when adding support for x.
> 
> >      {
> >          spin_unlock(&hd->arch.mapping_lock);
> >          unmap_vtd_domain_page(page);
> >          return;
> >      }
> >
> > -    dma_clear_pte(*pte);
> > -    *flush_flags |= IOMMU_FLUSHF_modified;
> > +    pte->r = pte->w = false;
> > +    smp_wmb();
> > +    pte->val = 0;
> >
> >      spin_unlock(&hd->arch.mapping_lock);
> >      iommu_sync_cache(pte, sizeof(struct dma_pte));
> 
> Just as an observation - in an earlier patch I think there was a
> code sequence having these the other way around. I think we want
> to settle one one way of doing this (flush then unlock, or unlock
> then flush). Kevin?
> 

Agree. Generally speaking 'unlock then flush' is preferred, since the
spinlock doesn't protect the IOMMU anyway.
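
I.e. the context_entry hunks earlier in the series would move the cache
sync past the unlock as well, along these lines (untested sketch):

    smp_wmb();
    context->p = true;
    spin_unlock(&iommu->lock);

    iommu_sync_cache(context, sizeof(struct context_entry));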

> > @@ -1775,15 +1778,12 @@ static int __must_check
> intel_iommu_map_page(struct domain *d, dfn_t dfn,
> >      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> >      pte = &page[dfn_x(dfn) & LEVEL_MASK];
> >      old = *pte;
> > -
> > -    dma_set_pte_addr(new, mfn_to_maddr(mfn));
> > -    dma_set_pte_prot(new,
> > -                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> > -                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> > -
> > -    /* Set the SNP on leaf page table if Snoop Control available */
> > -    if ( iommu_snoop )
> > -        dma_set_pte_snp(new);
> > +    new = (struct dma_pte){
> > +        .r = flags & IOMMUF_readable,
> > +        .w = flags & IOMMUF_writable,
> > +        .snp = iommu_snoop,
> > +        .addr = mfn_x(mfn),
> > +    };
> 
> We still haven't settled on a newer gcc baseline, so this kind of
> initializer is still not allowed (as in: will break the build) for
> struct-s with unnamed sub-struct-s / sub-union-s.
> 
> > @@ -2611,18 +2611,18 @@ static void
> vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
> >              process_pending_softirqs();
> >
> >          pte = &pt_vaddr[i];
> > -        if ( !dma_pte_present(*pte) )
> > +        if ( !pte->r && !pte->w )
> >              continue;
> >
> >          address = gpa + offset_level_address(i, level);
> >          if ( next_level >= 1 )
> > -            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
> > +            vtd_dump_page_table_level(pfn_to_paddr(pte->addr), next_level,
> >                                        address, indent + 1);
> >          else
> >              printk("%*sdfn: %08lx mfn: %08lx\n",
> >                     indent, "",
> >                     (unsigned long)(address >> PAGE_SHIFT_4K),
> > -                   (unsigned long)(dma_pte_addr(*pte) >> PAGE_SHIFT_4K));
> > +                   (unsigned long)(pte->addr));
> 
> Could you also drop the no-longer-needed pair of parentheses? I
> further suspect the cast isn't needed (anymore?). (Otoh I think
> I recall oddities with gcc's printf()-style format checking and
> direct passing of bitfields. But if that's a problem, I think
> one of the earlier ones already introduced such an issue. So
> perhaps we can wait until someone actually confirms there is an
> issue - quite likely this someone would be me anyway.)
> 
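[For illustration, the printk() with the parentheses dropped but the cast
kept until the bitfield/format-checking interaction is confirmed — an
assumption, not the committed form:

    printk("%*sdfn: %08lx mfn: %08lx\n",
           indent, "",
           (unsigned long)(address >> PAGE_SHIFT_4K),
           (unsigned long)pte->addr);
]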
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -244,38 +244,21 @@ struct context_entry {
> >  #define level_size(l) (1 << level_to_offset_bits(l))
> >  #define align_to_level(addr, l) ((addr + level_size(l) - 1) & level_mask(l))
> >
> > -/*
> > - * 0: readable
> > - * 1: writable
> > - * 2-6: reserved
> > - * 7: super page
> > - * 8-11: available
> > - * 12-63: Host physical address
> > - */
> >  struct dma_pte {
> > -    u64 val;
> > +    union {
> > +        uint64_t val;
> > +        struct {
> > +            bool r:1;
> > +            bool w:1;
> > +            unsigned int reserved0:1;

The 'X' bit is ignored, rather than reserved, when execute requests are
not reported or are disabled.

Thanks
Kevin

> > +            unsigned int ignored0:4;
> > +            bool ps:1;
> > +            unsigned int ignored1:3;
> > +            bool snp:1;
> > +            uint64_t addr:52;
> 
> As per the doc I'm looking at, this extends only to bit 51 at most.
> Above are 11 ignored bits and (in leaf entries) the TM one.
> 
> Considering the differences between leaf and intermediate
> entries, perhaps leaf-only fields could gain a brief comment?
> 
> Jan
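
[Folding both review comments above into the quoted hunk, a corrected
layout could look like the sketch below. The bit positions follow a
reading of the VT-d specification, so treat the exact widths as
assumptions; leaf-only fields are commented as suggested:

    struct dma_pte {
        union {
            uint64_t val;
            struct {
                bool r:1;
                bool w:1;
                bool x:1;                 /* ignored, not reserved, when
                                             execute requests are off */
                unsigned int ignored0:4;
                bool ps:1;                /* leaf only: superpage */
                unsigned int ignored1:3;
                bool snp:1;               /* leaf only: snoop */
                uint64_t addr:40;         /* bits 12-51 */
                unsigned int ignored2:10;
                bool tm:1;                /* leaf only: transient mapping */
                unsigned int ignored3:1;
            };
        };
    };
]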



* Re: [PATCH v10 5/7] vtd: use a bit field for root_entry
  2020-11-30  3:06   ` Tian, Kevin
@ 2020-11-30  9:45     ` Jan Beulich
  2020-12-01  5:14       ` Tian, Kevin
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2020-11-30  9:45 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Paul Durrant, Paul Durrant, xen-devel

On 30.11.2020 04:06, Tian, Kevin wrote:
>> From: Paul Durrant <paul@xen.org>
>> Sent: Friday, November 20, 2020 9:25 PM
>>
>> From: Paul Durrant <pdurrant@amazon.com>
>>
>> This makes the code a little easier to read and also makes it more consistent
>> with iremap_entry.
>>
>> Also take the opportunity to tidy up the implementation of
>> device_in_domain().
>>
>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Reviewed-by: <kevin.tian@intel.com>

Besides this looking a little odd (can be easily fixed of course)
I wonder whether both here and for patch 6 you had seen my requests
for smallish changes, and whether you meant to override those, or
whether your R-b will continue to apply with them made.

Jan



* RE: [PATCH v10 5/7] vtd: use a bit field for root_entry
  2020-11-30  9:45     ` Jan Beulich
@ 2020-12-01  5:14       ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2020-12-01  5:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Paul Durrant, Paul Durrant, xen-devel

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, November 30, 2020 5:46 PM
> 
> On 30.11.2020 04:06, Tian, Kevin wrote:
> >> From: Paul Durrant <paul@xen.org>
> >> Sent: Friday, November 20, 2020 9:25 PM
> >>
> >> From: Paul Durrant <pdurrant@amazon.com>
> >>
> >> This makes the code a little easier to read and also makes it more
> consistent
> >> with iremap_entry.
> >>
> >> Also take the opportunity to tidy up the implementation of
> >> device_in_domain().
> >>
> >> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> >
> > Reviewed-by: <kevin.tian@intel.com>
> 
> Besides this looking a little odd (can be easily fixed of course)
> I wonder whether both here and for patch 6 you had seen my requests
> for smallish changes, and whether you meant to override those, or
> whether your R-b will continue to apply with them made.
> 

Let my R-b continue to apply. Those are small changes.

Thanks
Kevin


Thread overview: 19+ messages
2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
2020-11-20 13:24 ` [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
2020-11-27 14:39   ` Jan Beulich
2020-11-20 13:24 ` [PATCH v10 2/7] common/grant_table: batch flush I/O TLB Paul Durrant
2020-11-20 13:24 ` [PATCH v10 3/7] iommu: remove the share_p2m operation Paul Durrant
2020-11-20 13:24 ` [PATCH v10 4/7] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
2020-11-20 13:24 ` [PATCH v10 5/7] vtd: use a bit field for root_entry Paul Durrant
2020-11-27 15:11   ` Jan Beulich
2020-11-30  3:06   ` Tian, Kevin
2020-11-30  9:45     ` Jan Beulich
2020-12-01  5:14       ` Tian, Kevin
2020-11-20 13:24 ` [PATCH v10 6/7] vtd: use a bit field for context_entry Paul Durrant
2020-11-27 15:32   ` Jan Beulich
2020-11-30  3:10   ` Tian, Kevin
2020-11-20 13:24 ` [PATCH v10 7/7] vtd: use a bit field for dma_pte Paul Durrant
2020-11-27 16:02   ` Jan Beulich
2020-11-30  5:29     ` Tian, Kevin
2020-11-27 16:21 ` [PATCH v10 0/7] IOMMU cleanup Jan Beulich
2020-11-27 16:32   ` Durrant, Paul
