* [PATCH 0/7] paravirtual IOMMU interface
@ 2018-02-12 10:47 Paul Durrant
From: Paul Durrant @ 2018-02-12 10:47 UTC
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jun Nakajima,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, Jan Beulich, Daniel De Graaf,
	Suravee Suthikulpanit

The idea of a paravirtual IOMMU interface was last discussed on xen-devel
more than two years ago, culminating in a draft specification [1]. An RFC
patch series with an implementation was also posted, but was never
followed through.

In this patch series I have tried to simplify the interface, and have
therefore moved away from the draft specification.

Patches #1 - #3 in the series introduce 'bus frame numbers' into Xen (frame
numbers relating to the IOMMU rather than the MMU). The modifications are
in common code and so affect ARM as well as x86.

Patch #4 adds a prerequisite method in iommu_ops and an implementation
for VT-d. I have not done an implementation for AMD IOMMUs as my test
hardware is Intel-based, but one may be added in the future.

Patches #5 - #7 introduce the new 'iommu_op' hypercall with sub-operations
to query ranges reserved in the IOMMU, map and unmap pages, and flush the
IOTLB.

For testing purposes, I have implemented patches for a Linux PV dom0 to
set up a 1:1 BFN:GFN mapping and use normal swiotlb DMA operations rather
than xen-swiotlb.
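
As an illustration only (this is not code from the series; the structure
and field names below are guesses pending the public header introduced in
patch #5), a dom0 loop establishing such a 1:1 mapping might look like:

    /*
     * Hypothetical sketch: map bfn == gfn for a range of frames via
     * the new hypercall. xen_iommu_op, XEN_IOMMUOP_map_page and the
     * field layout are illustrative assumptions; the real interface is
     * defined by xen/include/public/iommu_op.h in this series.
     */
    unsigned long gfn;
    int rc = 0;

    for ( gfn = start; gfn < end; gfn++ )
    {
        struct xen_iommu_op op = {
            .op = XEN_IOMMUOP_map_page,    /* assumed sub-op name */
            .u.map_page.bfn = gfn,         /* 1:1: bfn == gfn */
            .u.map_page.gfn = gfn,
        };

        rc = HYPERVISOR_iommu_op(&op, 1);  /* assumed guest wrapper */
        if ( rc )
            break;
    }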

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg01428.html

Paul Durrant (7):
  iommu: introduce the concept of BFN...
  iommu: make use of type-safe BFN and MFN in exported functions
  iommu: push use of type-safe BFN and MFN into iommu_ops
  vtd: add lookup_page method to iommu_ops
  public / x86: introduce __HYPERCALL_iommu_op
  x86: add iommu_op to query reserved ranges
  x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB

 tools/flask/policy/modules/xen.if             |   1 +
 xen/arch/arm/p2m.c                            |   3 +-
 xen/arch/x86/Makefile                         |   1 +
 xen/arch/x86/hvm/hypercall.c                  |   1 +
 xen/arch/x86/hypercall.c                      |   1 +
 xen/arch/x86/iommu_op.c                       | 476 ++++++++++++++++++++++++++
 xen/arch/x86/mm.c                             |   7 +-
 xen/arch/x86/mm/p2m-ept.c                     |   8 +-
 xen/arch/x86/mm/p2m-pt.c                      |   8 +-
 xen/arch/x86/mm/p2m.c                         |  15 +-
 xen/arch/x86/pv/hypercall.c                   |   1 +
 xen/arch/x86/x86_64/mm.c                      |   5 +-
 xen/common/grant_table.c                      |  10 +-
 xen/common/memory.c                           |   4 +-
 xen/drivers/passthrough/amd/iommu_cmd.c       |  18 +-
 xen/drivers/passthrough/amd/iommu_map.c       |  85 ++---
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |   4 +-
 xen/drivers/passthrough/arm/smmu.c            |  22 +-
 xen/drivers/passthrough/iommu.c               |  28 +-
 xen/drivers/passthrough/vtd/iommu.c           |  76 +++-
 xen/drivers/passthrough/vtd/iommu.h           |   2 +
 xen/drivers/passthrough/vtd/x86/vtd.c         |   3 +-
 xen/drivers/passthrough/x86/iommu.c           |   2 +-
 xen/include/Makefile                          |   2 +
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   8 +-
 xen/include/public/iommu_op.h                 | 127 +++++++
 xen/include/public/xen.h                      |   1 +
 xen/include/xen/hypercall.h                   |  12 +
 xen/include/xen/iommu.h                       |  42 ++-
 xen/include/xlat.lst                          |   5 +
 xen/include/xsm/dummy.h                       |   6 +
 xen/include/xsm/xsm.h                         |   6 +
 xen/xsm/dummy.c                               |   1 +
 xen/xsm/flask/hooks.c                         |   6 +
 xen/xsm/flask/policy/access_vectors           |   2 +
 35 files changed, 868 insertions(+), 131 deletions(-)
 create mode 100644 xen/arch/x86/iommu_op.c
 create mode 100644 xen/include/public/iommu_op.h
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>

-- 
2.11.0



* [PATCH 1/7] iommu: introduce the concept of BFN...
From: Paul Durrant @ 2018-02-12 10:47 UTC
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Suravee Suthikulpanit,
	Julien Grall, Paul Durrant, Jan Beulich

...meaning 'bus frame number' i.e. a frame number mapped in the IOMMU
rather than the MMU.

This patch is a largely cosmetic change that substitutes the terms 'bfn'
and 'baddr' for 'gfn' and 'gaddr' in all the places where the frame number
or address relates to the IOMMU rather than the MMU.

The only non-cosmetic part is the introduction of a type-safe declaration
of bfn_t and a definition of INVALID_BFN, to make the substitution of
gfn_x(INVALID_GFN) mechanical. A subsequent patch will actually convert
code to make use of the new type.
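
(For readers unfamiliar with Xen's typesafe machinery: with type checking
enabled, TYPE_SAFE(unsigned long, bfn) expands to roughly the following,
a sketch of the pattern rather than a verbatim copy of xen/typesafe.h:

    typedef struct { unsigned long bfn; } bfn_t;

    /* wrap a raw value */
    static inline bfn_t _bfn(unsigned long n) { return (bfn_t){ n }; }

    /* unwrap back to a raw value */
    static inline unsigned long bfn_x(bfn_t b) { return b.bfn; }

so a gfn_t can no longer be passed where a bfn_t is expected without an
explicit, and therefore visible, conversion.)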

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/drivers/passthrough/amd/iommu_cmd.c     | 18 +++----
 xen/drivers/passthrough/amd/iommu_map.c     | 76 ++++++++++++++---------------
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  2 +-
 xen/drivers/passthrough/arm/smmu.c          | 14 +++---
 xen/drivers/passthrough/iommu.c             | 24 ++++-----
 xen/drivers/passthrough/vtd/iommu.c         | 30 ++++++------
 xen/include/xen/iommu.h                     | 16 +++---
 7 files changed, 92 insertions(+), 88 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthrough/amd/iommu_cmd.c
index 08247fa354..ff55b389a0 100644
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
@@ -284,7 +284,7 @@ void invalidate_iommu_all(struct amd_iommu *iommu)
 }
 
 void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev,
-                           uint64_t gaddr, unsigned int order)
+                           uint64_t baddr, unsigned int order)
 {
     unsigned long flags;
     struct amd_iommu *iommu;
@@ -315,12 +315,12 @@ void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev,
 
     /* send INVALIDATE_IOTLB_PAGES command */
     spin_lock_irqsave(&iommu->lock, flags);
-    invalidate_iotlb_pages(iommu, maxpend, 0, queueid, gaddr, req_id, order);
+    invalidate_iotlb_pages(iommu, maxpend, 0, queueid, baddr, req_id, order);
     flush_command_buffer(iommu);
     spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
-static void amd_iommu_flush_all_iotlbs(struct domain *d, uint64_t gaddr,
+static void amd_iommu_flush_all_iotlbs(struct domain *d, uint64_t baddr,
                                        unsigned int order)
 {
     struct pci_dev *pdev;
@@ -333,7 +333,7 @@ static void amd_iommu_flush_all_iotlbs(struct domain *d, uint64_t gaddr,
         u8 devfn = pdev->devfn;
 
         do {
-            amd_iommu_flush_iotlb(devfn, pdev, gaddr, order);
+            amd_iommu_flush_iotlb(devfn, pdev, baddr, order);
             devfn += pdev->phantom_stride;
         } while ( devfn != pdev->devfn &&
                   PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
@@ -342,7 +342,7 @@ static void amd_iommu_flush_all_iotlbs(struct domain *d, uint64_t gaddr,
 
 /* Flush iommu cache after p2m changes. */
 static void _amd_iommu_flush_pages(struct domain *d,
-                                   uint64_t gaddr, unsigned int order)
+                                   uint64_t baddr, unsigned int order)
 {
     unsigned long flags;
     struct amd_iommu *iommu;
@@ -352,13 +352,13 @@ static void _amd_iommu_flush_pages(struct domain *d,
     for_each_amd_iommu ( iommu )
     {
         spin_lock_irqsave(&iommu->lock, flags);
-        invalidate_iommu_pages(iommu, gaddr, dom_id, order);
+        invalidate_iommu_pages(iommu, baddr, dom_id, order);
         flush_command_buffer(iommu);
         spin_unlock_irqrestore(&iommu->lock, flags);
     }
 
     if ( ats_enabled )
-        amd_iommu_flush_all_iotlbs(d, gaddr, order);
+        amd_iommu_flush_all_iotlbs(d, baddr, order);
 }
 
 void amd_iommu_flush_all_pages(struct domain *d)
@@ -367,9 +367,9 @@ void amd_iommu_flush_all_pages(struct domain *d)
 }
 
 void amd_iommu_flush_pages(struct domain *d,
-                           unsigned long gfn, unsigned int order)
+                           unsigned long bfn, unsigned int order)
 {
-    _amd_iommu_flush_pages(d, (uint64_t) gfn << PAGE_SHIFT, order);
+    _amd_iommu_flush_pages(d, (uint64_t) bfn << PAGE_SHIFT, order);
 }
 
 void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf)
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index fd2327d3e5..09d29ef026 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -35,12 +35,12 @@ static unsigned int pfn_to_pde_idx(unsigned long pfn, unsigned int level)
     return idx;
 }
 
-void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long gfn)
+void clear_iommu_pte_present(unsigned long l1_mfn, unsigned long bfn)
 {
     u64 *table, *pte;
 
     table = map_domain_page(_mfn(l1_mfn));
-    pte = table + pfn_to_pde_idx(gfn, IOMMU_PAGING_MODE_LEVEL_1);
+    pte = table + pfn_to_pde_idx(bfn, IOMMU_PAGING_MODE_LEVEL_1);
     *pte = 0;
     unmap_domain_page(table);
 }
@@ -104,7 +104,7 @@ static bool_t set_iommu_pde_present(u32 *pde, unsigned long next_mfn,
     return need_flush;
 }
 
-static bool_t set_iommu_pte_present(unsigned long pt_mfn, unsigned long gfn, 
+static bool_t set_iommu_pte_present(unsigned long pt_mfn, unsigned long bfn,
                                     unsigned long next_mfn, int pde_level, 
                                     bool_t iw, bool_t ir)
 {
@@ -114,7 +114,7 @@ static bool_t set_iommu_pte_present(unsigned long pt_mfn, unsigned long gfn,
 
     table = map_domain_page(_mfn(pt_mfn));
 
-    pde = (u32*)(table + pfn_to_pde_idx(gfn, pde_level));
+    pde = (u32*)(table + pfn_to_pde_idx(bfn, pde_level));
 
     need_flush = set_iommu_pde_present(pde, next_mfn, 
                                        IOMMU_PAGING_MODE_LEVEL_0, iw, ir);
@@ -331,7 +331,7 @@ static void set_pde_count(u64 *pde, unsigned int count)
  * otherwise increase pde count if mfn is contigous with mfn - 1
  */
 static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
-                                  unsigned long gfn, unsigned long mfn,
+                                  unsigned long bfn, unsigned long mfn,
                                   unsigned int merge_level)
 {
     unsigned int pde_count, next_level;
@@ -347,7 +347,7 @@ static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
 
     /* get pde at merge level */
     table = map_domain_page(_mfn(pt_mfn));
-    pde = table + pfn_to_pde_idx(gfn, merge_level);
+    pde = table + pfn_to_pde_idx(bfn, merge_level);
 
     /* get page table of next level */
     ntable_maddr = amd_iommu_get_next_table_from_pte((u32*)pde);
@@ -362,7 +362,7 @@ static int iommu_update_pde_count(struct domain *d, unsigned long pt_mfn,
     mask = (1ULL<< (PTE_PER_TABLE_SHIFT * next_level)) - 1;
 
     if ( ((first_mfn & mask) == 0) &&
-         (((gfn & mask) | first_mfn) == mfn) )
+         (((bfn & mask) | first_mfn) == mfn) )
     {
         pde_count = get_pde_count(*pde);
 
@@ -387,7 +387,7 @@ out:
 }
 
 static int iommu_merge_pages(struct domain *d, unsigned long pt_mfn,
-                             unsigned long gfn, unsigned int flags,
+                             unsigned long bfn, unsigned int flags,
                              unsigned int merge_level)
 {
     u64 *table, *pde, *ntable;
@@ -398,7 +398,7 @@ static int iommu_merge_pages(struct domain *d, unsigned long pt_mfn,
     ASSERT( spin_is_locked(&hd->arch.mapping_lock) && pt_mfn );
 
     table = map_domain_page(_mfn(pt_mfn));
-    pde = table + pfn_to_pde_idx(gfn, merge_level);
+    pde = table + pfn_to_pde_idx(bfn, merge_level);
 
     /* get first mfn */
     ntable_mfn = amd_iommu_get_next_table_from_pte((u32*)pde) >> PAGE_SHIFT;
@@ -436,7 +436,7 @@ static int iommu_merge_pages(struct domain *d, unsigned long pt_mfn,
  * {Re, un}mapping super page frames causes re-allocation of io
  * page tables.
  */
-static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn, 
+static int iommu_pde_from_bfn(struct domain *d, unsigned long pfn,
                               unsigned long pt_mfn[])
 {
     u64 *pde, *next_table_vaddr;
@@ -477,11 +477,11 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
              next_table_mfn != 0 )
         {
             int i;
-            unsigned long mfn, gfn;
+            unsigned long mfn, bfn;
             unsigned int page_sz;
 
             page_sz = 1 << (PTE_PER_TABLE_SHIFT * (next_level - 1));
-            gfn =  pfn & ~((1 << (PTE_PER_TABLE_SHIFT * next_level)) - 1);
+            bfn =  pfn & ~((1 << (PTE_PER_TABLE_SHIFT * next_level)) - 1);
             mfn = next_table_mfn;
 
             /* allocate lower level page table */
@@ -499,10 +499,10 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
 
             for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
             {
-                set_iommu_pte_present(next_table_mfn, gfn, mfn, next_level,
+                set_iommu_pte_present(next_table_mfn, bfn, mfn, next_level,
                                       !!IOMMUF_writable, !!IOMMUF_readable);
                 mfn += page_sz;
-                gfn += page_sz;
+                bfn += page_sz;
              }
 
             amd_iommu_flush_all_pages(d);
@@ -540,7 +540,7 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
     return 0;
 }
 
-static int update_paging_mode(struct domain *d, unsigned long gfn)
+static int update_paging_mode(struct domain *d, unsigned long bfn)
 {
     u16 bdf;
     void *device_entry;
@@ -554,13 +554,13 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
     unsigned long old_root_mfn;
     struct domain_iommu *hd = dom_iommu(d);
 
-    if ( gfn == gfn_x(INVALID_GFN) )
+    if ( bfn == bfn_x(INVALID_BFN) )
         return -EADDRNOTAVAIL;
-    ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
+    ASSERT(!(bfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
 
     level = hd->arch.paging_mode;
     old_root = hd->arch.root_table;
-    offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
+    offset = bfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
 
     ASSERT(spin_is_locked(&hd->arch.mapping_lock) && is_hvm_domain(d));
 
@@ -631,7 +631,7 @@ static int update_paging_mode(struct domain *d, unsigned long gfn)
     return 0;
 }
 
-int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
+int amd_iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
                        unsigned int flags)
 {
     bool_t need_flush = 0;
@@ -651,34 +651,34 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     if ( rc )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Root table alloc failed, gfn = %lx\n", gfn);
+        AMD_IOMMU_DEBUG("Root table alloc failed, bfn = %lx\n", bfn);
         domain_crash(d);
         return rc;
     }
 
     /* Since HVM domain is initialized with 2 level IO page table,
-     * we might need a deeper page table for lager gfn now */
+     * we might need a deeper page table for lager bfn now */
     if ( is_hvm_domain(d) )
     {
-        if ( update_paging_mode(d, gfn) )
+        if ( update_paging_mode(d, bfn) )
         {
             spin_unlock(&hd->arch.mapping_lock);
-            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            AMD_IOMMU_DEBUG("Update page mode failed bfn = %lx\n", bfn);
             domain_crash(d);
             return -EFAULT;
         }
     }
 
-    if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_bfn(d, bfn, pt_mfn) || (pt_mfn[1] == 0) )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
+        AMD_IOMMU_DEBUG("Invalid IO pagetable entry bfn = %lx\n", bfn);
         domain_crash(d);
         return -EFAULT;
     }
 
     /* Install 4k mapping first */
-    need_flush = set_iommu_pte_present(pt_mfn[1], gfn, mfn, 
+    need_flush = set_iommu_pte_present(pt_mfn[1], bfn, mfn,
                                        IOMMU_PAGING_MODE_LEVEL_1,
                                        !!(flags & IOMMUF_writable),
                                        !!(flags & IOMMUF_readable));
@@ -690,7 +690,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     /* 4K mapping for PV guests never changes, 
      * no need to flush if we trust non-present bits */
     if ( is_hvm_domain(d) )
-        amd_iommu_flush_pages(d, gfn, 0);
+        amd_iommu_flush_pages(d, bfn, 0);
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
           merge_level <= hd->arch.paging_mode; merge_level++ )
@@ -698,15 +698,15 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
         if ( pt_mfn[merge_level] == 0 )
             break;
         if ( !iommu_update_pde_count(d, pt_mfn[merge_level],
-                                     gfn, mfn, merge_level) )
+                                     bfn, mfn, merge_level) )
             break;
 
-        if ( iommu_merge_pages(d, pt_mfn[merge_level], gfn, 
+        if ( iommu_merge_pages(d, pt_mfn[merge_level], bfn,
                                flags, merge_level) )
         {
             spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Merge iommu page failed at level %d, "
-                            "gfn = %lx mfn = %lx\n", merge_level, gfn, mfn);
+                            "bfn = %lx mfn = %lx\n", merge_level, bfn, mfn);
             domain_crash(d);
             return -EFAULT;
         }
@@ -720,7 +720,7 @@ out:
     return 0;
 }
 
-int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
+int amd_iommu_unmap_page(struct domain *d, unsigned long bfn)
 {
     unsigned long pt_mfn[7];
     struct domain_iommu *hd = dom_iommu(d);
@@ -739,34 +739,34 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long gfn)
     }
 
     /* Since HVM domain is initialized with 2 level IO page table,
-     * we might need a deeper page table for lager gfn now */
+     * we might need a deeper page table for lager bfn now */
     if ( is_hvm_domain(d) )
     {
-        int rc = update_paging_mode(d, gfn);
+        int rc = update_paging_mode(d, bfn);
 
         if ( rc )
         {
             spin_unlock(&hd->arch.mapping_lock);
-            AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
+            AMD_IOMMU_DEBUG("Update page mode failed bfn = %lx\n", bfn);
             if ( rc != -EADDRNOTAVAIL )
                 domain_crash(d);
             return rc;
         }
     }
 
-    if ( iommu_pde_from_gfn(d, gfn, pt_mfn) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_bfn(d, bfn, pt_mfn) || (pt_mfn[1] == 0) )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Invalid IO pagetable entry gfn = %lx\n", gfn);
+        AMD_IOMMU_DEBUG("Invalid IO pagetable entry bfn = %lx\n", bfn);
         domain_crash(d);
         return -EFAULT;
     }
 
     /* mark PTE as 'page not present' */
-    clear_iommu_pte_present(pt_mfn[1], gfn);
+    clear_iommu_pte_present(pt_mfn[1], bfn);
     spin_unlock(&hd->arch.mapping_lock);
 
-    amd_iommu_flush_pages(d, gfn, 0);
+    amd_iommu_flush_pages(d, bfn, 0);
 
     return 0;
 }
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 12d2695b89..d608631e6e 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -578,7 +578,7 @@ static void amd_dump_p2m_table_level(struct page_info* pg, int level,
                 maddr_to_page(next_table_maddr), next_level,
                 address, indent + 1);
         else
-            printk("%*sgfn: %08lx  mfn: %08lx\n",
+            printk("%*sbfn: %08lx  mfn: %08lx\n",
                    indent, "",
                    (unsigned long)PFN_DOWN(address),
                    (unsigned long)PFN_DOWN(next_table_maddr));
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 74c09b0991..3605e20afd 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2551,7 +2551,7 @@ static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
 }
 
 static int __must_check arm_smmu_iotlb_flush(struct domain *d,
-                                             unsigned long gfn,
+                                             unsigned long bfn,
                                              unsigned int page_count)
 {
 	/* ARM SMMU v1 doesn't have flush by VMA and VMID */
@@ -2737,7 +2737,7 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
 	xfree(xen_domain);
 }
 
-static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn,
+static int __must_check arm_smmu_map_page(struct domain *d, unsigned long bfn,
 			unsigned long mfn, unsigned int flags)
 {
 	p2m_type_t t;
@@ -2748,10 +2748,10 @@ static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn,
 	 * protected by an IOMMU, Xen needs to add a 1:1 mapping in the domain
 	 * p2m to allow DMA request to work.
 	 * This is only valid when the domain is directed mapped. Hence this
-	 * function should only be used by gnttab code with gfn == mfn.
+	 * function should only be used by gnttab code with bfn == mfn.
 	 */
 	BUG_ON(!is_domain_direct_mapped(d));
-	BUG_ON(mfn != gfn);
+	BUG_ON(mfn != bfn);
 
 	/* We only support readable and writable flags */
 	if (!(flags & (IOMMUF_readable | IOMMUF_writable)))
@@ -2763,10 +2763,10 @@ static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn,
 	 * The function guest_physmap_add_entry replaces the current mapping
 	 * if there is already one...
 	 */
-	return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), 0, t);
+	return guest_physmap_add_entry(d, _gfn(bfn), _mfn(mfn), 0, t);
 }
 
-static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long gfn)
+static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long bfn)
 {
 	/*
 	 * This function should only be used by gnttab code when the domain
@@ -2775,7 +2775,7 @@ static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long gfn)
 	if ( !is_domain_direct_mapped(d) )
 		return -EINVAL;
 
-	return guest_physmap_remove_page(d, _gfn(gfn), _mfn(gfn), 0);
+	return guest_physmap_remove_page(d, _gfn(bfn), _mfn(bfn), 0);
 }
 
 static const struct iommu_ops arm_smmu_iommu_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 1aecf7cf34..df7c22f39c 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -255,7 +255,7 @@ void iommu_domain_destroy(struct domain *d)
     arch_iommu_domain_destroy(d);
 }
 
-int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
+int iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
                    unsigned int flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
@@ -264,13 +264,13 @@ int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    rc = hd->platform_ops->map_page(d, gfn, mfn, flags);
+    rc = hd->platform_ops->map_page(d, bfn, mfn, flags);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU mapping gfn %#lx to mfn %#lx failed: %d\n",
-                   d->domain_id, gfn, mfn, rc);
+                   "d%d: IOMMU mapping bfn %#lx to mfn %#lx failed: %d\n",
+                   d->domain_id, bfn, mfn, rc);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
@@ -279,7 +279,7 @@ int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
     return rc;
 }
 
-int iommu_unmap_page(struct domain *d, unsigned long gfn)
+int iommu_unmap_page(struct domain *d, unsigned long bfn)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     int rc;
@@ -287,13 +287,13 @@ int iommu_unmap_page(struct domain *d, unsigned long gfn)
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    rc = hd->platform_ops->unmap_page(d, gfn);
+    rc = hd->platform_ops->unmap_page(d, bfn);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU unmapping gfn %#lx failed: %d\n",
-                   d->domain_id, gfn, rc);
+                   "d%d: IOMMU unmapping bfn %#lx failed: %d\n",
+                   d->domain_id, bfn, rc);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
@@ -319,7 +319,7 @@ static void iommu_free_pagetables(unsigned long unused)
                             cpumask_cycle(smp_processor_id(), &cpu_online_map));
 }
 
-int iommu_iotlb_flush(struct domain *d, unsigned long gfn,
+int iommu_iotlb_flush(struct domain *d, unsigned long bfn,
                       unsigned int page_count)
 {
     const struct domain_iommu *hd = dom_iommu(d);
@@ -328,13 +328,13 @@ int iommu_iotlb_flush(struct domain *d, unsigned long gfn,
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush )
         return 0;
 
-    rc = hd->platform_ops->iotlb_flush(d, gfn, page_count);
+    rc = hd->platform_ops->iotlb_flush(d, bfn, page_count);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU IOTLB flush failed: %d, gfn %#lx, page count %u\n",
-                   d->domain_id, rc, gfn, page_count);
+                   "d%d: IOMMU IOTLB flush failed: %d, bfn %#lx, page count %u\n",
+                   d->domain_id, rc, bfn, page_count);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index daaed0abbd..18752819a7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -585,7 +585,7 @@ static int __must_check iommu_flush_all(void)
 }
 
 static int __must_check iommu_flush_iotlb(struct domain *d,
-                                          unsigned long gfn,
+                                          unsigned long bfn,
                                           bool_t dma_old_pte_present,
                                           unsigned int page_count)
 {
@@ -612,12 +612,12 @@ static int __must_check iommu_flush_iotlb(struct domain *d,
         if ( iommu_domid == -1 )
             continue;
 
-        if ( page_count != 1 || gfn == gfn_x(INVALID_GFN) )
+        if ( page_count != 1 || bfn == bfn_x(INVALID_BFN) )
             rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
                                        0, flush_dev_iotlb);
         else
             rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
-                                       (paddr_t)gfn << PAGE_SHIFT_4K,
+                                       (paddr_t)bfn << PAGE_SHIFT_4K,
                                        PAGE_ORDER_4K,
                                        !dma_old_pte_present,
                                        flush_dev_iotlb);
@@ -633,15 +633,15 @@ static int __must_check iommu_flush_iotlb(struct domain *d,
 }
 
 static int __must_check iommu_flush_iotlb_pages(struct domain *d,
-                                                unsigned long gfn,
+                                                unsigned long bfn,
                                                 unsigned int page_count)
 {
-    return iommu_flush_iotlb(d, gfn, 1, page_count);
+    return iommu_flush_iotlb(d, bfn, 1, page_count);
 }
 
 static int __must_check iommu_flush_iotlb_all(struct domain *d)
 {
-    return iommu_flush_iotlb(d, gfn_x(INVALID_GFN), 0, 0);
+    return iommu_flush_iotlb(d, bfn_x(INVALID_BFN), 0, 0);
 }
 
 /* clear one page's page table */
@@ -1761,7 +1761,7 @@ static void iommu_domain_teardown(struct domain *d)
 }
 
 static int __must_check intel_iommu_map_page(struct domain *d,
-                                             unsigned long gfn,
+                                             unsigned long bfn,
                                              unsigned long mfn,
                                              unsigned int flags)
 {
@@ -1780,14 +1780,14 @@ static int __must_check intel_iommu_map_page(struct domain *d,
 
     spin_lock(&hd->arch.mapping_lock);
 
-    pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn << PAGE_SHIFT_4K, 1);
+    pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)bfn << PAGE_SHIFT_4K, 1);
     if ( pg_maddr == 0 )
     {
         spin_unlock(&hd->arch.mapping_lock);
         return -ENOMEM;
     }
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
-    pte = page + (gfn & LEVEL_MASK);
+    pte = page + (bfn & LEVEL_MASK);
     old = *pte;
     dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
     dma_set_pte_prot(new,
@@ -1811,22 +1811,22 @@ static int __must_check intel_iommu_map_page(struct domain *d,
     unmap_vtd_domain_page(page);
 
     if ( !this_cpu(iommu_dont_flush_iotlb) )
-        rc = iommu_flush_iotlb(d, gfn, dma_pte_present(old), 1);
+        rc = iommu_flush_iotlb(d, bfn, dma_pte_present(old), 1);
 
     return rc;
 }
 
 static int __must_check intel_iommu_unmap_page(struct domain *d,
-                                               unsigned long gfn)
+                                               unsigned long bfn)
 {
     /* Do nothing if hardware domain and iommu supports pass thru. */
     if ( iommu_passthrough && is_hardware_domain(d) )
         return 0;
 
-    return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K);
+    return dma_pte_clear_one(d, (paddr_t)bfn << PAGE_SHIFT_4K);
 }
 
-int iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
+int iommu_pte_flush(struct domain *d, u64 bfn, u64 *pte,
                     int order, int present)
 {
     struct acpi_drhd_unit *drhd;
@@ -1850,7 +1850,7 @@ int iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
             continue;
 
         rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
-                                   (paddr_t)gfn << PAGE_SHIFT_4K,
+                                   (paddr_t)bfn << PAGE_SHIFT_4K,
                                    order, !present, flush_dev_iotlb);
         if ( rc > 0 )
         {
@@ -2620,7 +2620,7 @@ static void vtd_dump_p2m_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
             vtd_dump_p2m_table_level(dma_pte_addr(*pte), next_level, 
                                      address, indent + 1);
         else
-            printk("%*sgfn: %08lx mfn: %08lx\n",
+            printk("%*sbfn: %08lx mfn: %08lx\n",
                    indent, "",
                    (unsigned long)(address >> PAGE_SHIFT_4K),
                    (unsigned long)(dma_pte_addr(*pte) >> PAGE_SHIFT_4K));
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 33c8b221dc..de1c581cdd 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -23,11 +23,15 @@
 #include <xen/page-defs.h>
 #include <xen/spinlock.h>
 #include <xen/pci.h>
+#include <xen/typesafe.h>
 #include <public/hvm/ioreq.h>
 #include <public/domctl.h>
 #include <asm/device.h>
 #include <asm/iommu.h>
 
+TYPE_SAFE(unsigned long, bfn);
+#define INVALID_BFN      _bfn(~0UL)
+
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
@@ -60,9 +64,9 @@ void iommu_teardown(struct domain *d);
 #define IOMMUF_readable  (1u<<_IOMMUF_readable)
 #define _IOMMUF_writable 1
 #define IOMMUF_writable  (1u<<_IOMMUF_writable)
-int __must_check iommu_map_page(struct domain *d, unsigned long gfn,
+int __must_check iommu_map_page(struct domain *d, unsigned long bfn,
                                 unsigned long mfn, unsigned int flags);
-int __must_check iommu_unmap_page(struct domain *d, unsigned long gfn);
+int __must_check iommu_unmap_page(struct domain *d, unsigned long bfn);
 
 enum iommu_feature
 {
@@ -152,9 +156,9 @@ struct iommu_ops {
 #endif /* HAS_PCI */
 
     void (*teardown)(struct domain *d);
-    int __must_check (*map_page)(struct domain *d, unsigned long gfn,
+    int __must_check (*map_page)(struct domain *d, unsigned long bfn,
                                  unsigned long mfn, unsigned int flags);
-    int __must_check (*unmap_page)(struct domain *d, unsigned long gfn);
+    int __must_check (*unmap_page)(struct domain *d, unsigned long bfn);
     void (*free_page_table)(struct page_info *);
 #ifdef CONFIG_X86
     void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value);
@@ -165,7 +169,7 @@ struct iommu_ops {
     void (*resume)(void);
     void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
-    int __must_check (*iotlb_flush)(struct domain *d, unsigned long gfn,
+    int __must_check (*iotlb_flush)(struct domain *d, unsigned long bfn,
                                     unsigned int page_count);
     int __must_check (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
@@ -187,7 +191,7 @@ int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
 int iommu_do_domctl(struct xen_domctl *, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
 
-int __must_check iommu_iotlb_flush(struct domain *d, unsigned long gfn,
+int __must_check iommu_iotlb_flush(struct domain *d, unsigned long bfn,
                                    unsigned int page_count);
 int __must_check iommu_iotlb_flush_all(struct domain *d);
 
-- 
2.11.0



* [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
From: Paul Durrant @ 2018-02-12 10:47 UTC
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Jun Nakajima,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Paul Durrant, Jan Beulich

This patch modifies the declaration of the entry points to the IOMMU
sub-system to use bfn_t and mfn_t in place of unsigned long. A subsequent
patch will similarly modify the methods in the iommu_ops structure.

NOTE: Since (with this patch applied) bfn_t is now in use, the patch also
      introduces the 'cscope/grep fodder' to allow the type declaration to
      be easily found.
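
To make the effect concrete, a typical call site (in the spirit of the
grant table and p2m changes below) goes from passing raw unsigned longs
to explicit wrapping at the boundary:

    /* before: raw integers, easy to transpose by accident */
    err = iommu_map_page(ld, frame, frame, IOMMUF_readable);

    /* after: type-checked wrapping makes each role explicit */
    err = iommu_map_page(ld, _bfn(frame), _mfn(frame), IOMMUF_readable);

    /* unwrapping only where a raw value is genuinely needed */
    printk("bfn %"PRI_bfn"\n", bfn_x(bfn));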

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/arch/arm/p2m.c                    |  3 ++-
 xen/arch/x86/mm.c                     |  7 +++----
 xen/arch/x86/mm/p2m-ept.c             |  8 +++++---
 xen/arch/x86/mm/p2m-pt.c              |  8 ++++----
 xen/arch/x86/mm/p2m.c                 | 15 +++++++++------
 xen/arch/x86/x86_64/mm.c              |  5 +++--
 xen/common/grant_table.c              | 10 ++++++----
 xen/common/memory.c                   |  4 ++--
 xen/drivers/passthrough/iommu.c       | 25 ++++++++++++-------------
 xen/drivers/passthrough/vtd/x86/vtd.c |  3 ++-
 xen/include/xen/iommu.h               | 23 +++++++++++++++++++----
 11 files changed, 67 insertions(+), 44 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 65e8b9c6ea..25e9af6b05 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -957,7 +957,8 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
 
     if ( need_iommu(p2m->domain) &&
          (lpae_valid(orig_pte) || lpae_valid(*entry)) )
-        rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order);
+        rc = iommu_iotlb_flush(p2m->domain, _bfn(gfn_x(sgfn)),
+                               1UL << page_order);
     else
         rc = 0;
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 35f204369b..69ce57914b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, unsigned long type,
         struct domain *d = page_get_owner(page);
         if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
         {
-            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
+            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
 
             if ( (x & PGT_type_mask) == PGT_writable_page )
-                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
+                iommu_ret = iommu_unmap_page(d, bfn);
             else if ( type == PGT_writable_page )
-                iommu_ret = iommu_map_page(d, gfn_x(gfn),
-                                           mfn_x(page_to_mfn(page)),
+                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
                                            IOMMUF_readable|IOMMUF_writable);
         }
     }
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 66dbb3e83a..e1ebd25e57 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -873,12 +873,14 @@ out:
             if ( iommu_flags )
                 for ( i = 0; i < (1 << order); i++ )
                 {
-                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+                    rc = iommu_map_page(d, _bfn(gfn + i), mfn_add(mfn, i),
+                                        iommu_flags);
                     if ( unlikely(rc) )
                     {
                         while ( i-- )
                             /* If statement to satisfy __must_check. */
-                            if ( iommu_unmap_page(p2m->domain, gfn + i) )
+                            if ( iommu_unmap_page(p2m->domain,
+                                                  _bfn(gfn + i)) )
                                 continue;
 
                         break;
@@ -887,7 +889,7 @@ out:
             else
                 for ( i = 0; i < (1 << order); i++ )
                 {
-                    ret = iommu_unmap_page(d, gfn + i);
+                    ret = iommu_unmap_page(d, _bfn(gfn + i));
                     if ( !rc )
                         rc = ret;
                 }
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index ad6f9ef10d..0e6392a959 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -696,13 +696,13 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
         else if ( iommu_pte_flags )
             for ( i = 0; i < (1UL << page_order); i++ )
             {
-                rc = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
-                                    iommu_pte_flags);
+                rc = iommu_map_page(p2m->domain, _bfn(gfn + i),
+                                    mfn_add(mfn, i), iommu_pte_flags);
                 if ( unlikely(rc) )
                 {
                     while ( i-- )
                         /* If statement to satisfy __must_check. */
-                        if ( iommu_unmap_page(p2m->domain, gfn + i) )
+                        if ( iommu_unmap_page(p2m->domain, _bfn(gfn + i)) )
                             continue;
 
                     break;
@@ -711,7 +711,7 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
         else
             for ( i = 0; i < (1UL << page_order); i++ )
             {
-                int ret = iommu_unmap_page(p2m->domain, gfn + i);
+                int ret = iommu_unmap_page(p2m->domain, _bfn(gfn + i));
 
                 if ( !rc )
                     rc = ret;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index dccd1425b4..115956bcec 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -722,7 +722,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn_l, unsigned long mfn,
         {
             for ( i = 0; i < (1 << page_order); i++ )
             {
-                int ret = iommu_unmap_page(p2m->domain, mfn + i);
+                int ret = iommu_unmap_page(p2m->domain, _bfn(mfn + i));
 
                 if ( !rc )
                     rc = ret;
@@ -781,14 +781,14 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
         {
             for ( i = 0; i < (1 << page_order); i++ )
             {
-                rc = iommu_map_page(d, mfn_x(mfn_add(mfn, i)),
-                                    mfn_x(mfn_add(mfn, i)),
+                rc = iommu_map_page(d, _bfn(mfn_x(mfn) + i),
+                                    mfn_add(mfn, i),
                                     IOMMUF_readable|IOMMUF_writable);
                 if ( rc != 0 )
                 {
                     while ( i-- > 0 )
                         /* If statement to satisfy __must_check. */
-                        if ( iommu_unmap_page(d, mfn_x(mfn_add(mfn, i))) )
+                        if ( iommu_unmap_page(d, _bfn(mfn_x(mfn) + i)) )
                             continue;
 
                     return rc;
@@ -1164,7 +1164,9 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn_l,
     {
         if ( !need_iommu(d) )
             return 0;
-        return iommu_map_page(d, gfn_l, gfn_l, IOMMUF_readable|IOMMUF_writable);
+
+        return iommu_map_page(d, _bfn(gfn_l), _mfn(gfn_l),
+                              IOMMUF_readable|IOMMUF_writable);
     }
 
     gfn_lock(p2m, gfn, 0);
@@ -1254,7 +1256,8 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
     {
         if ( !need_iommu(d) )
             return 0;
-        return iommu_unmap_page(d, gfn_l);
+
+        return iommu_unmap_page(d, _bfn(gfn_l));
     }
 
     gfn_lock(p2m, gfn, 0);
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 9b37da6698..5af3164b8d 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1428,13 +1428,14 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
     {
         for ( i = spfn; i < epfn; i++ )
-            if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
+            if ( iommu_map_page(hardware_domain, _bfn(i), _mfn(i),
+                                IOMMUF_readable|IOMMUF_writable) )
                 break;
         if ( i != epfn )
         {
             while (i-- > old_max)
                 /* If statement to satisfy __must_check. */
-                if ( iommu_unmap_page(hardware_domain, i) )
+                if ( iommu_unmap_page(hardware_domain, _bfn(i)) )
                     continue;
 
             goto destroy_m2p;
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 48c547930c..97dc371f4b 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1108,13 +1108,14 @@ map_grant_ref(
              !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
         {
             if ( !(kind & MAPKIND_WRITE) )
-                err = iommu_map_page(ld, frame, frame,
+                err = iommu_map_page(ld, _bfn(frame), _mfn(frame),
                                      IOMMUF_readable|IOMMUF_writable);
         }
         else if ( act_pin && !old_pin )
         {
             if ( !kind )
-                err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
+                err = iommu_map_page(ld, _bfn(frame), _mfn(frame),
+                                     IOMMUF_readable);
         }
         if ( err )
         {
@@ -1376,9 +1377,10 @@ unmap_common(
 
         kind = mapkind(lgt, rd, op->frame);
         if ( !kind )
-            err = iommu_unmap_page(ld, op->frame);
+            err = iommu_unmap_page(ld, _bfn(op->frame));
         else if ( !(kind & MAPKIND_WRITE) )
-            err = iommu_map_page(ld, op->frame, op->frame, IOMMUF_readable);
+            err = iommu_map_page(ld, _bfn(op->frame), _mfn(op->frame),
+                                 IOMMUF_readable);
 
         double_gt_unlock(lgt, rgt);
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 59d23a2a98..5f9152a817 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -823,11 +823,11 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
-        ret = iommu_iotlb_flush(d, xatp->idx - done, done);
+        ret = iommu_iotlb_flush(d, _bfn(xatp->idx - done), done);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
-        ret = iommu_iotlb_flush(d, xatp->gpfn - done, done);
+        ret = iommu_iotlb_flush(d, _bfn(xatp->gpfn - done), done);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
     }
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index df7c22f39c..b25d9e3707 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -255,7 +255,7 @@ void iommu_domain_destroy(struct domain *d)
     arch_iommu_domain_destroy(d);
 }
 
-int iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
+int iommu_map_page(struct domain *d, bfn_t bfn, mfn_t mfn,
                    unsigned int flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
@@ -264,13 +264,13 @@ int iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    rc = hd->platform_ops->map_page(d, bfn, mfn, flags);
+    rc = hd->platform_ops->map_page(d, bfn_x(bfn), mfn_x(mfn), flags);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU mapping bfn %#lx to mfn %#lx failed: %d\n",
-                   d->domain_id, bfn, mfn, rc);
+                   "d%d: IOMMU mapping bfn %"PRI_bfn" to mfn %"PRI_mfn" failed: %d\n",
+                   d->domain_id, bfn_x(bfn), mfn_x(mfn), rc);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
@@ -279,7 +279,7 @@ int iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
     return rc;
 }
 
-int iommu_unmap_page(struct domain *d, unsigned long bfn)
+int iommu_unmap_page(struct domain *d, bfn_t bfn)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     int rc;
@@ -287,13 +287,13 @@ int iommu_unmap_page(struct domain *d, unsigned long bfn)
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    rc = hd->platform_ops->unmap_page(d, bfn);
+    rc = hd->platform_ops->unmap_page(d, bfn_x(bfn));
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU unmapping bfn %#lx failed: %d\n",
-                   d->domain_id, bfn, rc);
+                   "d%d: IOMMU unmapping bfn %"PRI_bfn" failed: %d\n",
+                   d->domain_id, bfn_x(bfn), rc);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
@@ -319,8 +319,7 @@ static void iommu_free_pagetables(unsigned long unused)
                             cpumask_cycle(smp_processor_id(), &cpu_online_map));
 }
 
-int iommu_iotlb_flush(struct domain *d, unsigned long bfn,
-                      unsigned int page_count)
+int iommu_iotlb_flush(struct domain *d, bfn_t bfn, unsigned int page_count)
 {
     const struct domain_iommu *hd = dom_iommu(d);
     int rc;
@@ -328,13 +327,13 @@ int iommu_iotlb_flush(struct domain *d, unsigned long bfn,
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush )
         return 0;
 
-    rc = hd->platform_ops->iotlb_flush(d, bfn, page_count);
+    rc = hd->platform_ops->iotlb_flush(d, bfn_x(bfn), page_count);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
             printk(XENLOG_ERR
-                   "d%d: IOMMU IOTLB flush failed: %d, bfn %#lx, page count %u\n",
-                   d->domain_id, rc, bfn, page_count);
+                   "d%d: IOMMU IOTLB flush failed: %d, bfn %"PRI_bfn", page count %u\n",
+                   d->domain_id, rc, bfn_x(bfn), page_count);
 
         if ( !is_hardware_domain(d) )
             domain_crash(d);
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index 88a60b3307..16f900f451 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -143,7 +143,8 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
         tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
         for ( j = 0; j < tmp; j++ )
         {
-            int ret = iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
+            int ret = iommu_map_page(d, _bfn(pfn * tmp + j),
+                                     _mfn(pfn * tmp + j),
                                      IOMMUF_readable|IOMMUF_writable);
 
             if ( !rc )
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index de1c581cdd..3d19918301 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -24,14 +24,29 @@
 #include <xen/spinlock.h>
 #include <xen/pci.h>
 #include <xen/typesafe.h>
+#include <xen/mm.h>
 #include <public/hvm/ioreq.h>
 #include <public/domctl.h>
 #include <asm/device.h>
 #include <asm/iommu.h>
 
 TYPE_SAFE(unsigned long, bfn);
+#define PRI_bfn          "05lx"
 #define INVALID_BFN      _bfn(~0UL)
 
+/*
+ * The definitions below are purely for the benefit of grep/cscope. The
+ * real definitions come from the TYPE_SAFE macro above.
+ */
+#ifndef bfn_t
+#define bfn_t
+#define _bfn
+#define bfn_x
+#undef bfn_t
+#undef _bfn
+#undef bfn_x
+#endif
+
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
@@ -64,9 +79,9 @@ void iommu_teardown(struct domain *d);
 #define IOMMUF_readable  (1u<<_IOMMUF_readable)
 #define _IOMMUF_writable 1
 #define IOMMUF_writable  (1u<<_IOMMUF_writable)
-int __must_check iommu_map_page(struct domain *d, unsigned long bfn,
-                                unsigned long mfn, unsigned int flags);
-int __must_check iommu_unmap_page(struct domain *d, unsigned long bfn);
+int __must_check iommu_map_page(struct domain *d, bfn_t bfn,
+                                mfn_t mfn, unsigned int flags);
+int __must_check iommu_unmap_page(struct domain *d, bfn_t bfn);
 
 enum iommu_feature
 {
@@ -191,7 +206,7 @@ int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
 int iommu_do_domctl(struct xen_domctl *, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
 
-int __must_check iommu_iotlb_flush(struct domain *d, unsigned long bfn,
+int __must_check iommu_iotlb_flush(struct domain *d, bfn_t bfn,
                                    unsigned int page_count);
 int __must_check iommu_iotlb_flush_all(struct domain *d);
 
-- 
2.11.0



* [PATCH 3/7] iommu: push use of type-safe BFN and MFN into iommu_ops
From: Paul Durrant @ 2018-02-12 10:47 UTC
  To: xen-devel
  Cc: Andrew Cooper, Kevin Tian, Paul Durrant, Jan Beulich,
	Suravee Suthikulpanit

This patch modifies the methods in struct iommu_ops to use type-safe BFN
and MFN. This follows on from the prior patch that modified the functions
exported in xen/iommu.h.
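
The resulting shape, using the AMD hunks below as the example: the ops
methods take bfn_t/mfn_t directly, and unwrap with bfn_x()/mfn_x() only
when calling internal helpers that have not (yet) been converted:

    int amd_iommu_map_page(struct domain *d, bfn_t bfn, mfn_t mfn,
                           unsigned int flags);

    /* inside the implementation, at the boundary of an unconverted
     * helper: */
    rc = update_paging_mode(d, bfn_x(bfn));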

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/drivers/passthrough/amd/iommu_map.c       | 45 ++++++++++++++++-----------
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |  2 +-
 xen/drivers/passthrough/arm/smmu.c            | 20 +++++++-----
 xen/drivers/passthrough/iommu.c               |  9 +++---
 xen/drivers/passthrough/vtd/iommu.c           | 27 ++++++++--------
 xen/drivers/passthrough/x86/iommu.c           |  2 +-
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  8 ++---
 xen/include/xen/iommu.h                       | 13 +++++---
 8 files changed, 72 insertions(+), 54 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 09d29ef026..bcc11c27cc 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -631,7 +631,7 @@ static int update_paging_mode(struct domain *d, unsigned long bfn)
     return 0;
 }
 
-int amd_iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
+int amd_iommu_map_page(struct domain *d, bfn_t bfn, mfn_t mfn,
                        unsigned int flags)
 {
     bool_t need_flush = 0;
@@ -651,7 +651,8 @@ int amd_iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
     if ( rc )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Root table alloc failed, bfn = %lx\n", bfn);
+        AMD_IOMMU_DEBUG("Root table alloc failed, bfn = %"PRI_bfn"\n",
+                        bfn_x(bfn));
         domain_crash(d);
         return rc;
     }
@@ -660,25 +661,27 @@ int amd_iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
      * we might need a deeper page table for lager bfn now */
     if ( is_hvm_domain(d) )
     {
-        if ( update_paging_mode(d, bfn) )
+        if ( update_paging_mode(d, bfn_x(bfn)) )
         {
             spin_unlock(&hd->arch.mapping_lock);
-            AMD_IOMMU_DEBUG("Update page mode failed bfn = %lx\n", bfn);
+            AMD_IOMMU_DEBUG("Update page mode failed bfn = %"PRI_bfn"\n",
+                            bfn_x(bfn));
             domain_crash(d);
             return -EFAULT;
         }
     }
 
-    if ( iommu_pde_from_bfn(d, bfn, pt_mfn) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_bfn(d, bfn_x(bfn), pt_mfn) || (pt_mfn[1] == 0) )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Invalid IO pagetable entry bfn = %lx\n", bfn);
+        AMD_IOMMU_DEBUG("Invalid IO pagetable entry bfn = %"PRI_bfn"\n",
+                        bfn_x(bfn));
         domain_crash(d);
         return -EFAULT;
     }
 
     /* Install 4k mapping first */
-    need_flush = set_iommu_pte_present(pt_mfn[1], bfn, mfn,
+    need_flush = set_iommu_pte_present(pt_mfn[1], bfn_x(bfn), mfn_x(mfn),
                                        IOMMU_PAGING_MODE_LEVEL_1,
                                        !!(flags & IOMMUF_writable),
                                        !!(flags & IOMMUF_readable));
@@ -690,7 +693,7 @@ int amd_iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
     /* 4K mapping for PV guests never changes, 
      * no need to flush if we trust non-present bits */
     if ( is_hvm_domain(d) )
-        amd_iommu_flush_pages(d, bfn, 0);
+        amd_iommu_flush_pages(d, bfn_x(bfn), 0);
 
     for ( merge_level = IOMMU_PAGING_MODE_LEVEL_2;
           merge_level <= hd->arch.paging_mode; merge_level++ )
@@ -698,15 +701,16 @@ int amd_iommu_map_page(struct domain *d, unsigned long bfn, unsigned long mfn,
         if ( pt_mfn[merge_level] == 0 )
             break;
         if ( !iommu_update_pde_count(d, pt_mfn[merge_level],
-                                     bfn, mfn, merge_level) )
+                                     bfn_x(bfn), mfn_x(mfn), merge_level) )
             break;
 
-        if ( iommu_merge_pages(d, pt_mfn[merge_level], bfn,
+        if ( iommu_merge_pages(d, pt_mfn[merge_level], bfn_x(bfn),
                                flags, merge_level) )
         {
             spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Merge iommu page failed at level %d, "
-                            "bfn = %lx mfn = %lx\n", merge_level, bfn, mfn);
+                            "bfn = %"PRI_bfn" mfn = %"PRI_mfn"\n",
+                            merge_level, bfn_x(bfn), mfn_x(mfn));
             domain_crash(d);
             return -EFAULT;
         }
@@ -720,7 +724,7 @@ out:
     return 0;
 }
 
-int amd_iommu_unmap_page(struct domain *d, unsigned long bfn)
+int amd_iommu_unmap_page(struct domain *d, bfn_t bfn)
 {
     unsigned long pt_mfn[7];
     struct domain_iommu *hd = dom_iommu(d);
@@ -742,31 +746,33 @@ int amd_iommu_unmap_page(struct domain *d, unsigned long bfn)
      * we might need a deeper page table for larger bfn now */
     if ( is_hvm_domain(d) )
     {
-        int rc = update_paging_mode(d, bfn);
+        int rc = update_paging_mode(d, bfn_x(bfn));
 
         if ( rc )
         {
             spin_unlock(&hd->arch.mapping_lock);
-            AMD_IOMMU_DEBUG("Update page mode failed bfn = %lx\n", bfn);
+            AMD_IOMMU_DEBUG("Update page mode failed bfn = %"PRI_bfn"\n",
+                            bfn_x(bfn));
             if ( rc != -EADDRNOTAVAIL )
                 domain_crash(d);
             return rc;
         }
     }
 
-    if ( iommu_pde_from_bfn(d, bfn, pt_mfn) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_bfn(d, bfn_x(bfn), pt_mfn) || (pt_mfn[1] == 0) )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        AMD_IOMMU_DEBUG("Invalid IO pagetable entry bfn = %lx\n", bfn);
+        AMD_IOMMU_DEBUG("Invalid IO pagetable entry bfn = %"PRI_bfn"\n",
+                        bfn_x(bfn));
         domain_crash(d);
         return -EFAULT;
     }
 
     /* mark PTE as 'page not present' */
-    clear_iommu_pte_present(pt_mfn[1], bfn);
+    clear_iommu_pte_present(pt_mfn[1], bfn_x(bfn));
     spin_unlock(&hd->arch.mapping_lock);
 
-    amd_iommu_flush_pages(d, bfn, 0);
+    amd_iommu_flush_pages(d, bfn_x(bfn), 0);
 
     return 0;
 }
@@ -787,7 +793,8 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
     gfn = phys_addr >> PAGE_SHIFT;
     for ( i = 0; i < npages; i++ )
     {
-        rt = amd_iommu_map_page(domain, gfn +i, gfn +i, flags);
+        rt = amd_iommu_map_page(domain, _bfn(gfn + i), _mfn(gfn + i),
+                                flags);
         if ( rt != 0 )
             return rt;
     }
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index d608631e6e..eea22c3d0d 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -271,7 +271,7 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
              */
             if ( mfn_valid(_mfn(pfn)) )
             {
-                int ret = amd_iommu_map_page(d, pfn, pfn,
+                int ret = amd_iommu_map_page(d, _bfn(pfn), _mfn(pfn),
                                              IOMMUF_readable|IOMMUF_writable);
 
                 if ( !rc )
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 3605e20afd..7c02335532 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2551,7 +2551,7 @@ static int __must_check arm_smmu_iotlb_flush_all(struct domain *d)
 }
 
 static int __must_check arm_smmu_iotlb_flush(struct domain *d,
-                                             unsigned long bfn,
+                                             bfn_t bfn,
                                              unsigned int page_count)
 {
 	/* ARM SMMU v1 doesn't have flush by VMA and VMID */
@@ -2737,10 +2737,11 @@ static void arm_smmu_iommu_domain_teardown(struct domain *d)
 	xfree(xen_domain);
 }
 
-static int __must_check arm_smmu_map_page(struct domain *d, unsigned long bfn,
-			unsigned long mfn, unsigned int flags)
+static int __must_check arm_smmu_map_page(struct domain *d, bfn_t bfn,
+			mfn_t mfn, unsigned int flags)
 {
 	p2m_type_t t;
+	gfn_t gfn = _gfn(bfn_x(bfn));
 
 	/*
 	 * Grant mappings can be used for DMA requests. The dev_bus_addr
@@ -2751,7 +2752,7 @@ static int __must_check arm_smmu_map_page(struct domain *d, unsigned long bfn,
 	 * function should only be used by gnttab code with bfn == mfn.
 	 */
 	BUG_ON(!is_domain_direct_mapped(d));
-	BUG_ON(mfn != bfn);
+	BUG_ON(mfn_x(mfn) != bfn_x(bfn));
 
 	/* We only support readable and writable flags */
 	if (!(flags & (IOMMUF_readable | IOMMUF_writable)))
@@ -2763,19 +2764,22 @@ static int __must_check arm_smmu_map_page(struct domain *d, unsigned long bfn,
 	 * The function guest_physmap_add_entry replaces the current mapping
 	 * if there is already one...
 	 */
-	return guest_physmap_add_entry(d, _gfn(bfn), _mfn(mfn), 0, t);
+	return guest_physmap_add_entry(d, gfn, mfn, 0, t);
 }
 
-static int __must_check arm_smmu_unmap_page(struct domain *d, unsigned long bfn)
+static int __must_check arm_smmu_unmap_page(struct domain *d, bfn_t bfn)
 {
-	/*
+	gfn_t gfn = _gfn(bfn_x(bfn));
+	mfn_t mfn = _mfn(bfn_x(bfn));
+
+	/*
 	 * This function should only be used by gnttab code when the domain
 	 * is direct mapped
 	 */
 	if ( !is_domain_direct_mapped(d) )
 		return -EINVAL;
 
-	return guest_physmap_remove_page(d, _gfn(bfn), _mfn(bfn), 0);
+	return guest_physmap_remove_page(d, gfn, mfn, 0);
 }
 
 static const struct iommu_ops arm_smmu_iommu_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index b25d9e3707..7de830f6ce 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -194,7 +194,8 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
                   == PGT_writable_page) )
                 mapping |= IOMMUF_writable;
 
-            ret = hd->platform_ops->map_page(d, gfn, mfn, mapping);
+            ret = hd->platform_ops->map_page(d, _bfn(gfn), _mfn(mfn),
+                                             mapping);
             if ( !rc )
                 rc = ret;
 
@@ -264,7 +265,7 @@ int iommu_map_page(struct domain *d, bfn_t bfn, mfn_t mfn,
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    rc = hd->platform_ops->map_page(d, bfn_x(bfn), mfn_x(mfn), flags);
+    rc = hd->platform_ops->map_page(d, bfn, mfn, flags);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
@@ -287,7 +288,7 @@ int iommu_unmap_page(struct domain *d, bfn_t bfn)
     if ( !iommu_enabled || !hd->platform_ops )
         return 0;
 
-    rc = hd->platform_ops->unmap_page(d, bfn_x(bfn));
+    rc = hd->platform_ops->unmap_page(d, bfn);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
@@ -327,7 +328,7 @@ int iommu_iotlb_flush(struct domain *d, bfn_t bfn, unsigned int page_count)
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush )
         return 0;
 
-    rc = hd->platform_ops->iotlb_flush(d, bfn_x(bfn), page_count);
+    rc = hd->platform_ops->iotlb_flush(d, bfn, page_count);
     if ( unlikely(rc) )
     {
         if ( !d->is_shutting_down && printk_ratelimit() )
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 18752819a7..a27529412a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -585,7 +585,7 @@ static int __must_check iommu_flush_all(void)
 }
 
 static int __must_check iommu_flush_iotlb(struct domain *d,
-                                          unsigned long bfn,
+                                          bfn_t bfn,
                                           bool_t dma_old_pte_present,
                                           unsigned int page_count)
 {
@@ -612,12 +612,12 @@ static int __must_check iommu_flush_iotlb(struct domain *d,
         if ( iommu_domid == -1 )
             continue;
 
-        if ( page_count != 1 || bfn == bfn_x(INVALID_BFN) )
+        if ( page_count != 1 || bfn_eq(bfn, INVALID_BFN) )
             rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
                                        0, flush_dev_iotlb);
         else
             rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
-                                       (paddr_t)bfn << PAGE_SHIFT_4K,
+                                       (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K,
                                        PAGE_ORDER_4K,
                                        !dma_old_pte_present,
                                        flush_dev_iotlb);
@@ -633,7 +633,7 @@ static int __must_check iommu_flush_iotlb(struct domain *d,
 }
 
 static int __must_check iommu_flush_iotlb_pages(struct domain *d,
-                                                unsigned long bfn,
+                                                bfn_t bfn,
                                                 unsigned int page_count)
 {
     return iommu_flush_iotlb(d, bfn, 1, page_count);
@@ -641,7 +641,7 @@ static int __must_check iommu_flush_iotlb_pages(struct domain *d,
 
 static int __must_check iommu_flush_iotlb_all(struct domain *d)
 {
-    return iommu_flush_iotlb(d, bfn_x(INVALID_BFN), 0, 0);
+    return iommu_flush_iotlb(d, INVALID_BFN, 0, 0);
 }
 
 /* clear one page's page table */
@@ -676,7 +676,8 @@ static int __must_check dma_pte_clear_one(struct domain *domain, u64 addr)
     iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
 
     if ( !this_cpu(iommu_dont_flush_iotlb) )
-        rc = iommu_flush_iotlb_pages(domain, addr >> PAGE_SHIFT_4K, 1);
+        rc = iommu_flush_iotlb_pages(domain, _bfn(addr >> PAGE_SHIFT_4K),
+                                     1);
 
     unmap_vtd_domain_page(page);
 
@@ -1761,8 +1762,7 @@ static void iommu_domain_teardown(struct domain *d)
 }
 
 static int __must_check intel_iommu_map_page(struct domain *d,
-                                             unsigned long bfn,
-                                             unsigned long mfn,
+                                             bfn_t bfn, mfn_t mfn,
                                              unsigned int flags)
 {
     struct domain_iommu *hd = dom_iommu(d);
@@ -1780,16 +1780,17 @@ static int __must_check intel_iommu_map_page(struct domain *d,
 
     spin_lock(&hd->arch.mapping_lock);
 
-    pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)bfn << PAGE_SHIFT_4K, 1);
+    pg_maddr =
+        addr_to_dma_page_maddr(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K, 1);
     if ( pg_maddr == 0 )
     {
         spin_unlock(&hd->arch.mapping_lock);
         return -ENOMEM;
     }
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
-    pte = page + (bfn & LEVEL_MASK);
+    pte = page + (bfn_x(bfn) & LEVEL_MASK);
     old = *pte;
-    dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
+    dma_set_pte_addr(new, (paddr_t)mfn_x(mfn) << PAGE_SHIFT_4K);
     dma_set_pte_prot(new,
                      ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
                      ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
@@ -1817,13 +1818,13 @@ static int __must_check intel_iommu_map_page(struct domain *d,
 }
 
 static int __must_check intel_iommu_unmap_page(struct domain *d,
-                                               unsigned long bfn)
+                                               bfn_t bfn)
 {
     /* Do nothing if hardware domain and iommu supports pass thru. */
     if ( iommu_passthrough && is_hardware_domain(d) )
         return 0;
 
-    return dma_pte_clear_one(d, (paddr_t)bfn << PAGE_SHIFT_4K);
+    return dma_pte_clear_one(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K);
 }
 
 int iommu_pte_flush(struct domain *d, u64 bfn, u64 *pte,
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 0253823173..5e221fa6ff 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -65,7 +65,7 @@ int arch_iommu_populate_page_table(struct domain *d)
             {
                 ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
                 BUG_ON(SHARED_M2P(gfn));
-                rc = hd->platform_ops->map_page(d, gfn, mfn,
+                rc = hd->platform_ops->map_page(d, _bfn(gfn), _mfn(mfn),
                                                 IOMMUF_readable |
                                                 IOMMUF_writable);
             }
diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
index 99bc21c7b3..dce9ed6b83 100644
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -52,9 +52,9 @@ int amd_iommu_init(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
 
 /* mapping functions */
-int __must_check amd_iommu_map_page(struct domain *d, unsigned long gfn,
-                                    unsigned long mfn, unsigned int flags);
-int __must_check amd_iommu_unmap_page(struct domain *d, unsigned long gfn);
+int __must_check amd_iommu_map_page(struct domain *d, bfn_t bfn,
+                                    mfn_t mfn, unsigned int flags);
+int __must_check amd_iommu_unmap_page(struct domain *d, bfn_t bfn);
 u64 amd_iommu_get_next_table_from_pte(u32 *entry);
 int __must_check amd_iommu_alloc_root(struct domain_iommu *hd);
 int amd_iommu_reserve_domain_unity_map(struct domain *domain,
@@ -77,7 +77,7 @@ void iommu_dte_set_guest_cr3(u32 *dte, u16 dom_id, u64 gcr3,
 
 /* send cmd to iommu */
 void amd_iommu_flush_all_pages(struct domain *d);
-void amd_iommu_flush_pages(struct domain *d, unsigned long gfn,
+void amd_iommu_flush_pages(struct domain *d, unsigned long bfn,
                            unsigned int order);
 void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev,
                            uint64_t gaddr, unsigned int order);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 3d19918301..fd6f6fb05a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -47,6 +47,11 @@ TYPE_SAFE(unsigned long, bfn);
 #undef bfn_x
 #endif
 
+static inline bool_t bfn_eq(bfn_t x, bfn_t y)
+{
+    return bfn_x(x) == bfn_x(y);
+}
+
 extern bool_t iommu_enable, iommu_enabled;
 extern bool_t force_iommu, iommu_verbose;
 extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
@@ -171,9 +176,9 @@ struct iommu_ops {
 #endif /* HAS_PCI */
 
     void (*teardown)(struct domain *d);
-    int __must_check (*map_page)(struct domain *d, unsigned long bfn,
-                                 unsigned long mfn, unsigned int flags);
-    int __must_check (*unmap_page)(struct domain *d, unsigned long bfn);
+    int __must_check (*map_page)(struct domain *d, bfn_t bfn, mfn_t mfn,
+                                 unsigned int flags);
+    int __must_check (*unmap_page)(struct domain *d, bfn_t bfn);
     void (*free_page_table)(struct page_info *);
 #ifdef CONFIG_X86
     void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value);
@@ -184,7 +189,7 @@ struct iommu_ops {
     void (*resume)(void);
     void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
-    int __must_check (*iotlb_flush)(struct domain *d, unsigned long bfn,
+    int __must_check (*iotlb_flush)(struct domain *d, bfn_t bfn,
                                     unsigned int page_count);
     int __must_check (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
-- 
2.11.0



* [PATCH 4/7] vtd: add lookup_page method to iommu_ops
  2018-02-12 10:47 [PATCH 0/7] paravirtual IOMMU interface Paul Durrant
                   ` (2 preceding siblings ...)
  2018-02-12 10:47 ` [PATCH 3/7] iommu: push use of type-safe BFN and MFN into iommu_ops Paul Durrant
@ 2018-02-12 10:47 ` Paul Durrant
  2018-03-15 16:54   ` Jan Beulich
  2018-02-12 10:47 ` [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op Paul Durrant
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-12 10:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Kevin Tian, Paul Durrant, Jan Beulich

This patch adds a new method to the VT-d IOMMU implementation to find the
MFN currently mapped by the specified BFN. This functionality will be used
by a subsequent patch.
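
A minimal sketch of how a hypervisor-side caller might consume the new
method (illustrative only; error handling beyond the return code is
elided):

    mfn_t mfn;
    unsigned int flags;

    /* rc == 0 on success; mfn and flags are only meaningful then */
    int rc = ops->lookup_page(d, bfn, &mfn, &flags);

    if ( !rc && (flags & IOMMUF_writable) )
        ; /* bfn currently maps mfn with write permission */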

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 39 +++++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.h |  2 ++
 xen/include/xen/iommu.h             |  2 ++
 3 files changed, 43 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a27529412a..bc4fc36d5f 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1827,6 +1827,44 @@ static int __must_check intel_iommu_unmap_page(struct domain *d,
     return dma_pte_clear_one(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K);
 }
 
+static int intel_iommu_lookup_page(struct domain *d, bfn_t bfn, mfn_t *mfn,
+                                   unsigned int *flags)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    struct dma_pte *page = NULL, *pte = NULL, val;
+    u64 pg_maddr;
+
+    spin_lock(&hd->arch.mapping_lock);
+
+    pg_maddr =
+        addr_to_dma_page_maddr(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K, 1);
+    if ( pg_maddr == 0 )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return -ENOMEM;
+    }
+    page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
+    pte = page + (bfn_x(bfn) & LEVEL_MASK);
+    val = *pte;
+    if ( !dma_pte_present(val) ) {
+        unmap_vtd_domain_page(page);
+        spin_unlock(&hd->arch.mapping_lock);
+        return -ENOMEM;
+    }
+    unmap_vtd_domain_page(page);
+    spin_unlock(&hd->arch.mapping_lock);
+
+    *mfn = _mfn(dma_get_pte_addr(val) >> PAGE_SHIFT_4K);
+
+    *flags = 0;
+    if ( dma_get_pte_prot(val) & DMA_PTE_READ )
+        *flags |= IOMMUF_readable;
+    if ( dma_get_pte_prot(val) & DMA_PTE_WRITE )
+        *flags |= IOMMUF_writable;
+
+    return 0;
+}
+
 int iommu_pte_flush(struct domain *d, u64 bfn, u64 *pte,
                     int order, int present)
 {
@@ -2652,6 +2690,7 @@ const struct iommu_ops intel_iommu_ops = {
     .teardown = iommu_domain_teardown,
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
+    .lookup_page = intel_iommu_lookup_page,
     .free_page_table = iommu_free_page_table,
     .reassign_device = reassign_device_ownership,
     .get_device_group_id = intel_iommu_group_id,
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 72c1a2e3cd..5eda66868e 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -272,9 +272,11 @@ struct dma_pte {
 #define dma_set_pte_prot(p, prot) do { \
         (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
     } while (0)
+#define dma_get_pte_prot(p) ((p).val & DMA_PTE_PROT)
 #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
 #define dma_set_pte_addr(p, addr) do {\
             (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
+#define dma_get_pte_addr(p) ((p).val & PAGE_MASK_4K)
 #define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
 #define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index fd6f6fb05a..40099e8f32 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -179,6 +179,8 @@ struct iommu_ops {
     int __must_check (*map_page)(struct domain *d, bfn_t bfn, mfn_t mfn,
                                  unsigned int flags);
     int __must_check (*unmap_page)(struct domain *d, bfn_t bfn);
+    int __must_check (*lookup_page)(struct domain *d, bfn_t bfn, mfn_t *mfn,
+                                    unsigned int *flags);
     void (*free_page_table)(struct page_info *);
 #ifdef CONFIG_X86
     void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value);
-- 
2.11.0



* [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-12 10:47 [PATCH 0/7] paravirtual IOMMU interface Paul Durrant
                   ` (3 preceding siblings ...)
  2018-02-12 10:47 ` [PATCH 4/7] vtd: add lookup_page method to iommu_ops Paul Durrant
@ 2018-02-12 10:47 ` Paul Durrant
  2018-02-13  6:43   ` Tian, Kevin
  2018-03-16 12:25   ` Jan Beulich
  2018-02-12 10:47 ` [PATCH 6/7] x86: add iommu_op to query reserved ranges Paul Durrant
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-12 10:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Paul Durrant, Jan Beulich,
	Daniel De Graaf

This patch introduces the boilerplate for a new hypercall to allow a
domain to control IOMMU mappings for its own pages.

Whilst there is duplication of code between the native and compat entry
points, which appears ripe for some form of combination, I think it is
better to maintain the separation as-is because the compat entry point
will necessarily gain complexity in subsequent patches.

NOTE: This hypercall is only implemented for x86 and is currently
      restricted by XSM to dom0 since it could be used to cause IOMMU
      faults which may bring down a host.
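
For illustration, a guest would invoke the new hypercall along these
lines, where HYPERVISOR_iommu_op() is a hypothetical guest-side wrapper
around hypercall number __HYPERVISOR_iommu_op (no sub-ops exist yet, so
the only possible per-op outcome is -EOPNOTSUPP):

    xen_iommu_op_t op = {
        .op = 0, /* no valid sub-op is defined by this patch */
    };
    long rc = HYPERVISOR_iommu_op(&op, 1);

    /* rc reports batch-level errors; op.status reports per-op errors */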

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
---
 tools/flask/policy/modules/xen.if   |   1 +
 xen/arch/x86/Makefile               |   1 +
 xen/arch/x86/hvm/hypercall.c        |   1 +
 xen/arch/x86/hypercall.c            |   1 +
 xen/arch/x86/iommu_op.c             | 169 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/hypercall.c         |   1 +
 xen/include/Makefile                |   2 +
 xen/include/public/iommu_op.h       |  55 ++++++++++++
 xen/include/public/xen.h            |   1 +
 xen/include/xen/hypercall.h         |  12 +++
 xen/include/xlat.lst                |   1 +
 xen/include/xsm/dummy.h             |   6 ++
 xen/include/xsm/xsm.h               |   6 ++
 xen/xsm/dummy.c                     |   1 +
 xen/xsm/flask/hooks.c               |   6 ++
 xen/xsm/flask/policy/access_vectors |   2 +
 16 files changed, 266 insertions(+)
 create mode 100644 xen/arch/x86/iommu_op.c
 create mode 100644 xen/include/public/iommu_op.h

diff --git a/tools/flask/policy/modules/xen.if b/tools/flask/policy/modules/xen.if
index 459880bb01..5a1d447afd 100644
--- a/tools/flask/policy/modules/xen.if
+++ b/tools/flask/policy/modules/xen.if
@@ -59,6 +59,7 @@ define(`create_domain_common', `
 	allow $1 $2:grant setup;
 	allow $1 $2:hvm { getparam hvmctl sethvmc
 			setparam nested altp2mhvm altp2mhvm_op dm };
+	allow $1 $2:resource control_iommu;
 ')
 
 # create_domain(priv, target)
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index d903b7abb9..df3fdc1beb 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -33,6 +33,7 @@ obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
 obj-y += hypercall.o
 obj-y += i387.o
 obj-y += i8259.o
+obj-y += iommu_op.o
 obj-y += io_apic.o
 obj-$(CONFIG_LIVEPATCH) += alternative.o livepatch.o
 obj-y += msi.o
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 5742dd1797..df96019103 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -132,6 +132,7 @@ static const hypercall_table_t hvm_hypercall_table[] = {
     COMPAT_CALL(mmuext_op),
     HYPERCALL(xenpmu_op),
     COMPAT_CALL(dm_op),
+    COMPAT_CALL(iommu_op),
     HYPERCALL(arch_1)
 };
 
diff --git a/xen/arch/x86/hypercall.c b/xen/arch/x86/hypercall.c
index 90e88c1d2c..045753e702 100644
--- a/xen/arch/x86/hypercall.c
+++ b/xen/arch/x86/hypercall.c
@@ -68,6 +68,7 @@ const hypercall_args_t hypercall_args_table[NR_hypercalls] =
     ARGS(xenpmu_op, 2),
     ARGS(dm_op, 3),
     ARGS(mca, 1),
+    ARGS(iommu_op, 2),
     ARGS(arch_1, 1),
 };
 
diff --git a/xen/arch/x86/iommu_op.c b/xen/arch/x86/iommu_op.c
new file mode 100644
index 0000000000..edd8a384b3
--- /dev/null
+++ b/xen/arch/x86/iommu_op.c
@@ -0,0 +1,169 @@
+/******************************************************************************
+ * x86/iommu_op.c
+ *
+ * Paravirtualised IOMMU functionality
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2018 Citrix Systems Inc
+ */
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+
+static bool can_control_iommu(void)
+{
+    struct domain *currd = current->domain;
+
+    /*
+     * IOMMU mappings cannot be manipulated if:
+     * - the IOMMU is not enabled or,
+     * - the IOMMU is passed through or,
+     * - shared EPT configured or,
+     * - Xen is maintaining an identity map.
+     */
+    if ( !iommu_enabled || iommu_passthrough ||
+         iommu_use_hap_pt(currd) || need_iommu(currd) )
+        return false;
+
+    return true;
+}
+
+static void iommu_op(xen_iommu_op_t *op)
+{
+    switch ( op->op )
+    {
+    default:
+        op->status = -EOPNOTSUPP;
+        break;
+    }
+}
+
+long do_iommu_op(XEN_GUEST_HANDLE_PARAM(xen_iommu_op_t) uops,
+                 unsigned int count)
+{
+    unsigned int i;
+    int rc;
+
+    rc = xsm_iommu_op(XSM_PRIV, current->domain);
+    if ( rc )
+        return rc;
+
+    if ( !can_control_iommu() )
+        return -EACCES;
+
+    for ( i = 0; i < count; i++ )
+    {
+        xen_iommu_op_t op;
+
+        if ( ((i & 0xff) == 0xff) && hypercall_preempt_check() )
+        {
+            rc = i;
+            break;
+        }
+
+        if ( copy_from_guest_offset(&op, uops, i, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        iommu_op(&op);
+
+        if ( copy_to_guest_offset(uops, i, &op, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+    }
+
+    if ( rc > 0 )
+    {
+        ASSERT(rc < count);
+        guest_handle_add_offset(uops, rc);
+        count -= rc;
+
+        rc = hypercall_create_continuation(__HYPERVISOR_iommu_op,
+                                           "hi", uops, count);
+    }
+
+    return rc;
+}
+
+int compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t) uops,
+                    unsigned int count)
+{
+    unsigned int i;
+    int rc;
+
+    rc = xsm_iommu_op(XSM_PRIV, current->domain);
+    if ( rc )
+        return rc;
+
+    if ( !can_control_iommu() )
+        return -EACCES;
+
+    for ( i = 0; i < count; i++ )
+    {
+        compat_iommu_op_t cmp;
+        xen_iommu_op_t nat;
+
+        if ( ((i & 0xff) == 0xff) && hypercall_preempt_check() )
+        {
+            rc = i;
+            break;
+        }
+
+        if ( copy_from_guest_offset(&cmp, uops, i, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        XLAT_iommu_op(&nat, &cmp);
+
+        iommu_op(&nat);
+
+        XLAT_iommu_op(&cmp, &nat);
+
+        if ( copy_to_guest_offset(uops, i, &cmp, 1) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+    }
+
+    if ( rc > 0 )
+    {
+        ASSERT(rc < count);
+        guest_handle_add_offset(uops, rc);
+        count -= rc;
+
+        rc = hypercall_create_continuation(__HYPERVISOR_iommu_op,
+                                           "hi", uops, count);
+    }
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/pv/hypercall.c b/xen/arch/x86/pv/hypercall.c
index bbc3011d1a..d23f9af42f 100644
--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -80,6 +80,7 @@ const hypercall_table_t pv_hypercall_table[] = {
     HYPERCALL(xenpmu_op),
     COMPAT_CALL(dm_op),
     HYPERCALL(mca),
+    COMPAT_CALL(iommu_op),
     HYPERCALL(arch_1),
 };
 
diff --git a/xen/include/Makefile b/xen/include/Makefile
index 19066a33a0..ac3d6e5aef 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -11,6 +11,7 @@ headers-y := \
     compat/features.h \
     compat/grant_table.h \
     compat/kexec.h \
+    compat/iommu_op.h \
     compat/memory.h \
     compat/nmi.h \
     compat/physdev.h \
@@ -28,6 +29,7 @@ headers-$(CONFIG_X86)     += compat/arch-x86/xen.h
 headers-$(CONFIG_X86)     += compat/arch-x86/xen-$(compat-arch-y).h
 headers-$(CONFIG_X86)     += compat/hvm/hvm_vcpu.h
 headers-$(CONFIG_X86)     += compat/hvm/dm_op.h
+headers-$(CONFIG_X86)     += compat/iommu_op.h
 headers-y                 += compat/arch-$(compat-arch-y).h compat/pmu.h compat/xlat.h
 headers-$(CONFIG_FLASK)   += compat/xsm/flask_op.h
 
diff --git a/xen/include/public/iommu_op.h b/xen/include/public/iommu_op.h
new file mode 100644
index 0000000000..202cb63fb5
--- /dev/null
+++ b/xen/include/public/iommu_op.h
@@ -0,0 +1,55 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2018 Citrix Systems Inc
+ */
+
+#ifndef __XEN_PUBLIC_IOMMU_OP_H__
+#define __XEN_PUBLIC_IOMMU_OP_H__
+
+#include "xen.h"
+
+struct xen_iommu_op {
+    uint16_t op;
+    uint16_t flags; /* op specific flags */
+    int32_t status; /* op completion status: */
+                    /* 0 for success, otherwise negative errno */
+};
+typedef struct xen_iommu_op xen_iommu_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_iommu_op_t);
+
+/* ` enum neg_errnoval
+ * ` HYPERVISOR_iommu_op(xen_iommu_op_t ops[],
+ * `                     unsigned int count)
+ * `
+ *
+ * @ops points to an array of @count xen_iommu_op structures.
+ */
+
+#endif /* __XEN_PUBLIC_IOMMU_OP_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 308109f176..4200264411 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -121,6 +121,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
 #define __HYPERVISOR_dm_op                41
+#define __HYPERVISOR_iommu_op             42
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index cc99aea57d..8cb62b7d65 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -16,6 +16,7 @@
 #include <public/version.h>
 #include <public/pmu.h>
 #include <public/hvm/dm_op.h>
+#include <public/iommu_op.h>
 #include <asm/hypercall.h>
 #include <xsm/xsm.h>
 
@@ -148,6 +149,10 @@ do_dm_op(
     unsigned int nr_bufs,
     XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
 
+extern long
+do_iommu_op(XEN_GUEST_HANDLE_PARAM(xen_iommu_op_t) ops,
+            unsigned int count);
+
 #ifdef CONFIG_COMPAT
 
 extern int
@@ -205,6 +210,13 @@ compat_dm_op(
     unsigned int nr_bufs,
     XEN_GUEST_HANDLE_PARAM(void) bufs);
 
+#include <compat/iommu_op.h>
+
+DEFINE_XEN_GUEST_HANDLE(compat_iommu_op_t);
+extern int
+compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t) ops,
+                unsigned int count);
+
 #endif
 
 void arch_get_xen_caps(xen_capabilities_info_t *info);
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 3690b97d5d..7409759084 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -76,6 +76,7 @@
 ?	vcpu_hvm_context		hvm/hvm_vcpu.h
 ?	vcpu_hvm_x86_32			hvm/hvm_vcpu.h
 ?	vcpu_hvm_x86_64			hvm/hvm_vcpu.h
+!	iommu_op			iommu_op.h
 ?	kexec_exec			kexec.h
 !	kexec_image			kexec.h
 !	kexec_range			kexec.h
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index d6ddadcafd..69431c88cd 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -701,6 +701,12 @@ static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_iommu_op(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+
 #endif /* CONFIG_X86 */
 
 #include <public/version.h>
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index e3912bcc9d..9d95a4e5bb 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -178,6 +178,7 @@ struct xsm_operations {
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
     int (*dm_op) (struct domain *d);
+    int (*iommu_op) (struct domain *d);
 #endif
     int (*xen_version) (uint32_t cmd);
 };
@@ -685,6 +686,11 @@ static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
     return xsm_ops->dm_op(d);
 }
 
+static inline int xsm_iommu_op(xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->iommu_op(d);
+}
+
 #endif /* CONFIG_X86 */
 
 static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 479b103614..21cefdee74 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -155,6 +155,7 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
     set_to_dummy_if_null(ops, dm_op);
+    set_to_dummy_if_null(ops, iommu_op);
 #endif
     set_to_dummy_if_null(ops, xen_version);
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 1802d8dfe6..1bef3269c4 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1670,6 +1670,11 @@ static int flask_dm_op(struct domain *d)
     return current_has_perm(d, SECCLASS_HVM, HVM__DM);
 }
 
+static int flask_iommu_op(struct domain *d)
+{
+    return current_has_perm(d, SECCLASS_RESOURCE, RESOURCE__CONTROL_IOMMU);
+}
+
 #endif /* CONFIG_X86 */
 
 static int flask_xen_version (uint32_t op)
@@ -1843,6 +1848,7 @@ static struct xsm_operations flask_ops = {
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
     .dm_op = flask_dm_op,
+    .iommu_op = flask_iommu_op,
 #endif
     .xen_version = flask_xen_version,
 };
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 89b99966bb..190017dfc3 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -471,6 +471,8 @@ class resource
 # checked for PHYSDEVOP_setup_gsi (target IRQ)
 # checked for PHYSDEVOP_pci_mmcfg_reserved (target xen_t)
     setup
+# checked for IOMMU_OP
+    control_iommu
 }
 
 # Class security describes the FLASK security server itself; these operations
-- 
2.11.0



* [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-12 10:47 [PATCH 0/7] paravirtual IOMMU interface Paul Durrant
                   ` (4 preceding siblings ...)
  2018-02-12 10:47 ` [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op Paul Durrant
@ 2018-02-12 10:47 ` Paul Durrant
  2018-02-13  6:51   ` Tian, Kevin
                     ` (2 more replies)
  2018-02-12 10:47 ` [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB Paul Durrant
  2018-02-13  6:21 ` [PATCH 0/7] paravirtual IOMMU interface Tian, Kevin
  7 siblings, 3 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-12 10:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Paul Durrant, Jan Beulich

Certain areas of memory, such as RMRRs, must be mapped 1:1
(i.e. BFN == MFN) through the IOMMU.

This patch adds an iommu_op to allow these ranges to be queried.
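
The expected usage is a two-call pattern: query with no buffer to learn
the number of reserved regions, then allocate and re-query. A sketch,
again assuming a hypothetical guest-side HYPERVISOR_iommu_op() wrapper:

    xen_iommu_op_t op = {
        .op = XEN_IOMMUOP_query_reserved,
        /* nr_entries == 0 and a NULL regions handle: just get the count */
    };

    HYPERVISOR_iommu_op(&op, 1);
    if ( op.status == -ENOBUFS )
    {
        /*
         * op.u.query_reserved.nr_entries now holds the real count:
         * allocate that many xen_iommu_reserved_region_t entries, set
         * the regions handle and nr_entries, and issue the op again.
         */
    }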

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/iommu_op.c       | 121 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/iommu_op.h |  35 ++++++++++++
 xen/include/xlat.lst          |   2 +
 3 files changed, 158 insertions(+)

diff --git a/xen/arch/x86/iommu_op.c b/xen/arch/x86/iommu_op.c
index edd8a384b3..ac81b98b7a 100644
--- a/xen/arch/x86/iommu_op.c
+++ b/xen/arch/x86/iommu_op.c
@@ -22,6 +22,58 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xen/iommu.h>
+
+struct get_rdm_ctxt {
+    unsigned int max_entries;
+    unsigned int nr_entries;
+    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
+};
+
+static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
+{
+    struct get_rdm_ctxt *ctxt = arg;
+
+    if ( ctxt->nr_entries < ctxt->max_entries )
+    {
+        xen_iommu_reserved_region_t region = {
+            .start_bfn = start,
+            .nr_frames = nr,
+        };
+
+        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
+                                  1) )
+            return -EFAULT;
+    }
+
+    ctxt->nr_entries++;
+
+    return 1;
+}
+
+static int iommuop_query_reserved(struct xen_iommu_op_query_reserved *op)
+{
+    struct get_rdm_ctxt ctxt = {
+        .max_entries = op->nr_entries,
+        .regions = op->regions,
+    };
+    int rc;
+
+    if ( op->pad != 0 )
+        return -EINVAL;
+
+    rc = iommu_get_reserved_device_memory(get_rdm, &ctxt);
+    if ( rc )
+        return rc;
+
+    /* Pass back the actual number of reserved regions */
+    op->nr_entries = ctxt.nr_entries;
+
+    if ( ctxt.nr_entries > ctxt.max_entries )
+        return -ENOBUFS;
+
+    return 0;
+}
 
 static bool can_control_iommu(void)
 {
@@ -45,6 +97,10 @@ static void iommu_op(xen_iommu_op_t *op)
 {
     switch ( op->op )
     {
+    case XEN_IOMMUOP_query_reserved:
+        op->status = iommuop_query_reserved(&op->u.query_reserved);
+        break;
+
     default:
         op->status = -EOPNOTSUPP;
         break;
@@ -119,6 +175,8 @@ int compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t) uops,
     {
         compat_iommu_op_t cmp;
         xen_iommu_op_t nat;
+        unsigned int u;
+        int32_t status;
 
         if ( ((i & 0xff) == 0xff) && hypercall_preempt_check() )
         {
@@ -132,12 +190,75 @@ int compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t) uops,
             break;
         }
 
+        /*
+         * The xlat magic doesn't quite know how to handle the union so
+         * we need to fix things up here.
+         */
+#define XLAT_iommu_op_u_query_reserved XEN_IOMMUOP_query_reserved
+        u = cmp.op;
+
+#define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
+        do \
+        { \
+            if ( !compat_handle_is_null((_s_)->regions) ) \
+            { \
+                unsigned int *nr_entries = COMPAT_ARG_XLAT_VIRT_BASE; \
+                xen_iommu_reserved_region_t *regions = \
+                    (void *)(nr_entries + 1); \
+                \
+                if ( sizeof(*nr_entries) + \
+                     (sizeof(*regions) * (_s_)->nr_entries) > \
+                     COMPAT_ARG_XLAT_SIZE ) \
+                    return -E2BIG; \
+                \
+                *nr_entries = (_s_)->nr_entries; \
+                set_xen_guest_handle((_d_)->regions, regions); \
+            } \
+            else \
+                set_xen_guest_handle((_d_)->regions, NULL); \
+        } while (false)
+
         XLAT_iommu_op(&nat, &cmp);
 
+#undef XLAT_iommu_op_query_reserved_HNDL_regions
+
         iommu_op(&nat);
 
+        status = nat.status;
+
+#define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
+        do \
+        { \
+            if ( !compat_handle_is_null((_d_)->regions) ) \
+            { \
+                unsigned int *nr_entries = COMPAT_ARG_XLAT_VIRT_BASE; \
+                xen_iommu_reserved_region_t *regions = \
+                    (void *)(nr_entries + 1); \
+                unsigned int j; \
+                \
+                for ( j = 0; \
+                      j < min_t(unsigned int, (_d_)->nr_entries, \
+                                *nr_entries); \
+                      j++ ) \
+                { \
+                    compat_iommu_reserved_region_t region; \
+                    \
+                    XLAT_iommu_reserved_region(&region, &regions[j]); \
+                    \
+                    if ( __copy_to_compat_offset((_d_)->regions, j, \
+                                                 &region, 1) ) \
+                        status = -EFAULT; \
+                } \
+            } \
+        } while (false)
+
         XLAT_iommu_op(&cmp, &nat);
 
+        /* The status will have been modified if copy_to_compat() failed */
+        cmp.status = status;
+
+#undef XLAT_iommu_op_query_reserved_HNDL_regions
+
         if ( copy_to_guest_offset(uops, i, &cmp, 1) )
         {
             rc = -EFAULT;
diff --git a/xen/include/public/iommu_op.h b/xen/include/public/iommu_op.h
index 202cb63fb5..24b8b9e0cc 100644
--- a/xen/include/public/iommu_op.h
+++ b/xen/include/public/iommu_op.h
@@ -25,11 +25,46 @@
 
 #include "xen.h"
 
+typedef unsigned long xen_bfn_t;
+
+/* Structure describing a single region reserved in the IOMMU */
+struct xen_iommu_reserved_region {
+    xen_bfn_t start_bfn;
+    unsigned int nr_frames;
+    unsigned int pad;
+};
+typedef struct xen_iommu_reserved_region xen_iommu_reserved_region_t;
+DEFINE_XEN_GUEST_HANDLE(xen_iommu_reserved_region_t);
+
+/*
+ * XEN_IOMMUOP_query_reserved: Query ranges reserved in the IOMMU.
+ */
+#define XEN_IOMMUOP_query_reserved 1
+
+struct xen_iommu_op_query_reserved {
+    /*
+     * IN/OUT - On entry this is the number of entries available
+     *          in the regions array below.
+     *          On exit this is the actual number of reserved regions.
+     */
+    unsigned int nr_entries;
+    unsigned int pad;
+    /*
+     * OUT - This array is populated with reserved regions. If it is
+     *       not sufficiently large then available entries are populated,
+     *       but the op status code will be set to -ENOBUFS.
+     */
+    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
+};
+
 struct xen_iommu_op {
     uint16_t op;
     uint16_t flags; /* op specific flags */
     int32_t status; /* op completion status: */
                     /* 0 for success otherwise, negative errno */
+    union {
+        struct xen_iommu_op_query_reserved query_reserved;
+    } u;
 };
 typedef struct xen_iommu_op xen_iommu_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_iommu_op_t);
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 7409759084..a2070b6d7d 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -76,6 +76,8 @@
 ?	vcpu_hvm_context		hvm/hvm_vcpu.h
 ?	vcpu_hvm_x86_32			hvm/hvm_vcpu.h
 ?	vcpu_hvm_x86_64			hvm/hvm_vcpu.h
+!	iommu_reserved_region		iommu_op.h
+!	iommu_op_query_reserved		iommu_op.h
 !	iommu_op			iommu_op.h
 ?	kexec_exec			kexec.h
 !	kexec_image			kexec.h
-- 
2.11.0



* [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-12 10:47 [PATCH 0/7] paravirtual IOMMU interface Paul Durrant
                   ` (5 preceding siblings ...)
  2018-02-12 10:47 ` [PATCH 6/7] x86: add iommu_op to query reserved ranges Paul Durrant
@ 2018-02-12 10:47 ` Paul Durrant
  2018-02-13  6:55   ` Tian, Kevin
  2018-03-19 15:11   ` Jan Beulich
  2018-02-13  6:21 ` [PATCH 0/7] paravirtual IOMMU interface Tian, Kevin
  7 siblings, 2 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-12 10:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Paul Durrant, Jan Beulich

This patch adds iommu_ops to allow a domain with control_iommu privilege
to map and unmap pages from any guest over which it has mapping privilege
in the IOMMU.

These operations implicitly disable IOTLB flushing so that the caller can
batch operations and then explicitly flush the IOTLB using the iommu_op
also added by this patch.
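
Taken together, a typical batch maps one or more pages and ends with an
explicit flush, e.g. (a sketch assuming the hypothetical guest-side
HYPERVISOR_iommu_op() wrapper from earlier, with bfn and gfn chosen by
the caller):

    xen_iommu_op_t ops[2] = {
        {
            .op = XEN_IOMMUOP_map,
            .u.map = { .bfn = bfn, .gfn = gfn, .domid = DOMID_SELF },
        },
        { .op = XEN_IOMMUOP_flush },
    };

    HYPERVISOR_iommu_op(ops, 2);

Since map and unmap run with iommu_dont_flush_iotlb set, no new mapping
is guaranteed to be visible to a device until the flush op completes
successfully.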

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/iommu_op.c       | 186 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/iommu_op.h |  37 +++++++++
 xen/include/xlat.lst          |   2 +
 3 files changed, 225 insertions(+)

diff --git a/xen/arch/x86/iommu_op.c b/xen/arch/x86/iommu_op.c
index ac81b98b7a..b10c916279 100644
--- a/xen/arch/x86/iommu_op.c
+++ b/xen/arch/x86/iommu_op.c
@@ -24,6 +24,174 @@
 #include <xen/hypercall.h>
 #include <xen/iommu.h>
 
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
+#undef page_to_mfn
+#define page_to_mfn(page) _mfn(__page_to_mfn(page))
+
+struct check_rdm_ctxt {
+    bfn_t bfn;
+};
+
+static int check_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
+{
+    struct check_rdm_ctxt *ctxt = arg;
+
+    if ( bfn_x(ctxt->bfn) >= start &&
+         bfn_x(ctxt->bfn) < start + nr )
+        return -EINVAL;
+
+    return 1;
+}
+
+static int iommuop_map(struct xen_iommu_op_map *op, unsigned int flags)
+{
+    struct domain *d, *od, *currd = current->domain;
+    struct domain_iommu *iommu = dom_iommu(currd);
+    const struct iommu_ops *ops = iommu->platform_ops;
+    domid_t domid = op->domid;
+    gfn_t gfn = _gfn(op->gfn);
+    bfn_t bfn = _bfn(op->bfn);
+    mfn_t mfn;
+    struct check_rdm_ctxt ctxt = {
+        .bfn = bfn,
+    };
+    p2m_type_t p2mt;
+    p2m_query_t p2mq;
+    struct page_info *page;
+    unsigned int prot;
+    int rc;
+
+    if ( op->pad0 != 0 || op->pad1 != 0 )
+        return -EINVAL;
+
+    /*
+     * Both map_page and lookup_page operations must be implemented.
+     * The lookup_page method is not used here but is relied upon by
+     * iommuop_unmap() to drop the page reference taken here.
+     */
+    if ( !ops->map_page || !ops->lookup_page )
+        return -ENOSYS;
+
+    /* Check whether the specified BFN falls in a reserved region */
+    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
+    if ( rc )
+        return rc;
+
+    d = rcu_lock_domain_by_any_id(domid);
+    if ( !d )
+        return -ESRCH;
+
+    p2mq = (flags & XEN_IOMMUOP_map_readonly) ?
+        P2M_UNSHARE : P2M_ALLOC;
+    page = get_page_from_gfn(d, gfn_x(gfn), &p2mt, p2mq);
+
+    rc = -ENOENT;
+    if ( !page )
+        goto unlock;
+
+    if ( p2m_is_paged(p2mt) )
+    {
+        p2m_mem_paging_populate(d, gfn_x(gfn));
+        goto release;
+    }
+
+    if ( (p2mq & P2M_UNSHARE) && p2m_is_shared(p2mt) )
+        goto release;
+
+    /*
+     * Make sure the page is RAM and, if it is read-only, that the
+     * read-only flag is present.
+     */
+    rc = -EPERM;
+    if ( !p2m_is_any_ram(p2mt) ||
+         (p2m_is_readonly(p2mt) && !(flags & XEN_IOMMUOP_map_readonly)) )
+        goto release;
+
+    /*
+     * If the calling domain does not own the page then make sure it
+     * has mapping privilege over the page owner.
+     */
+    od = page_get_owner(page);
+    if ( od != currd )
+    {
+        rc = xsm_domain_memory_map(XSM_TARGET, od);
+        if ( rc )
+            goto release;
+    }
+
+    prot = IOMMUF_readable;
+    if ( !(flags & XEN_IOMMUOP_map_readonly) )
+        prot |= IOMMUF_writable;
+
+    mfn = page_to_mfn(page);
+
+    rc = 0;
+    if ( !ops->map_page(currd, bfn, mfn, prot) )
+        goto unlock; /* keep the page ref */
+
+    rc = -EIO;
+
+ release:
+    put_page(page);
+
+ unlock:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+static int iommuop_unmap(struct xen_iommu_op_unmap *op)
+{
+    struct domain *currd = current->domain;
+    struct domain_iommu *iommu = dom_iommu(currd);
+    const struct iommu_ops *ops = iommu->platform_ops;
+    bfn_t bfn = _bfn(op->bfn);
+    mfn_t mfn;
+    struct check_rdm_ctxt ctxt = {
+        .bfn = bfn,
+    };
+    unsigned int flags;
+    struct page_info *page;
+    int rc;
+
+    /*
+     * Both unmap_page and lookup_page operations must be implemented.
+     */
+    if ( !ops->unmap_page || !ops->lookup_page )
+        return -ENOSYS;
+
+    /* Check whether the specified BFN falls in a reserved region */
+    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
+    if ( rc )
+        return rc;
+
+    if ( ops->lookup_page(currd, bfn, &mfn, &flags) ||
+         !mfn_valid(mfn) )
+        return -ENOENT;
+
+    page = mfn_to_page(mfn);
+
+    if ( ops->unmap_page(currd, bfn) )
+        return -EIO;
+
+    put_page(page);
+    return 0;
+}
+
+static int iommuop_flush(void)
+{
+    struct domain *currd = current->domain;
+    struct domain_iommu *iommu = dom_iommu(currd);
+    const struct iommu_ops *ops = iommu->platform_ops;
+
+    if ( ops->iotlb_flush_all(currd) )
+        return -EIO;
+
+    return 0;
+}
+
 struct get_rdm_ctxt {
     unsigned int max_entries;
     unsigned int nr_entries;
@@ -101,6 +269,22 @@ static void iommu_op(xen_iommu_op_t *op)
         op->status = iommuop_query_reserved(&op->u.query_reserved);
         break;
 
+    case XEN_IOMMUOP_map:
+        this_cpu(iommu_dont_flush_iotlb) = 1;
+        op->status = iommuop_map(&op->u.map, op->flags);
+        this_cpu(iommu_dont_flush_iotlb) = 0;
+        break;
+
+    case XEN_IOMMUOP_unmap:
+        this_cpu(iommu_dont_flush_iotlb) = 1;
+        op->status = iommuop_unmap(&op->u.unmap);
+        this_cpu(iommu_dont_flush_iotlb) = 0;
+        break;
+
+    case XEN_IOMMUOP_flush:
+        op->status = iommuop_flush();
+        break;
+
     default:
         op->status = -EOPNOTSUPP;
         break;
@@ -195,6 +379,8 @@ int compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t) uops,
          * we need to fix things up here.
          */
 #define XLAT_iommu_op_u_query_reserved XEN_IOMMUOP_query_reserved
+#define XLAT_iommu_op_u_map XEN_IOMMUOP_map
+#define XLAT_iommu_op_u_unmap XEN_IOMMUOP_unmap
         u = cmp.op;
 
 #define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
diff --git a/xen/include/public/iommu_op.h b/xen/include/public/iommu_op.h
index 24b8b9e0cc..9a782603de 100644
--- a/xen/include/public/iommu_op.h
+++ b/xen/include/public/iommu_op.h
@@ -57,13 +57,50 @@ struct xen_iommu_op_query_reserved {
     XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
 };
 
+/*
+ * XEN_IOMMUOP_map: Map a page in the IOMMU.
+ */
+#define XEN_IOMMUOP_map 2
+
+struct xen_iommu_op_map {
+    /* IN - The IOMMU frame number which will hold the new mapping */
+    xen_bfn_t bfn;
+    /* IN - The guest frame number of the page to be mapped */
+    xen_pfn_t gfn;
+    /* IN - The domid of the guest */
+    domid_t domid;
+    unsigned short pad0;
+    unsigned int pad1;
+};
+
+/*
+ * XEN_IOMMUOP_unmap: Remove a mapping in the IOMMU.
+ */
+#define XEN_IOMMUOP_unmap 3
+
+struct xen_iommu_op_unmap {
+    /* IN - The IOMMU frame number holding the mapping to be cleared */
+    xen_bfn_t bfn;
+};
+
+/*
+ * XEN_IOMMUOP_flush: Flush the IOMMU TLB.
+ */
+#define XEN_IOMMUOP_flush 4
+
 struct xen_iommu_op {
     uint16_t op;
     uint16_t flags; /* op specific flags */
+
+#define _XEN_IOMMUOP_map_readonly 0
+#define XEN_IOMMUOP_map_readonly (1 << (_XEN_IOMMUOP_map_readonly))
+
     int32_t status; /* op completion status: */
                     /* 0 for success, otherwise negative errno */
     union {
         struct xen_iommu_op_query_reserved query_reserved;
+        struct xen_iommu_op_map map;
+        struct xen_iommu_op_unmap unmap;
     } u;
 };
 typedef struct xen_iommu_op xen_iommu_op_t;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index a2070b6d7d..dddafc7422 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -78,6 +78,8 @@
 ?	vcpu_hvm_x86_64			hvm/hvm_vcpu.h
 !	iommu_reserved_region		iommu_op.h
 !	iommu_op_query_reserved		iommu_op.h
+!	iommu_op_map			iommu_op.h
+!	iommu_op_unmap			iommu_op.h
 !	iommu_op			iommu_op.h
 ?	kexec_exec			kexec.h
 !	kexec_image			kexec.h
-- 
2.11.0



* Re: [PATCH 0/7] paravirtual IOMMU interface
  2018-02-12 10:47 [PATCH 0/7] paravirtual IOMMU interface Paul Durrant
                   ` (6 preceding siblings ...)
  2018-02-12 10:47 ` [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB Paul Durrant
@ 2018-02-13  6:21 ` Tian, Kevin
  2018-02-13  9:18   ` Paul Durrant
  7 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-13  6:21 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Nakajima, Jun, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Jan Beulich, Daniel De Graaf, Suravee Suthikulpanit

> From: Paul Durrant [mailto:paul.durrant@citrix.com]
> Sent: Monday, February 12, 2018 6:47 PM
> 
> The idea of a paravirtual IOMMU interface was last discussed on xen-devel
> more than two years ago and narrowed down on a draft specification [1].
> There was also an RFC patch series posted with an implementation,
> however
> this was never followed through.
> 
> In this patch series I have tried to simplify the interface and therefore
> have moved away from the draft specification.

Could you send out an updated spec?

> 
> Patches #1 - #3 in the series introduce 'bus frame numbers' into Xen (frame
> numbers relating to the IOMMU rather than the MMU). The modifications
> are
> in common code and so affect ARM as well as x86.
> 
> Patch #4 adds a pre-requisite method in iommu_ops and an
> implementation
> for VT-d. I have not done an implmentation for AMD IOMMUs as my test
> hard-
> ware is Intel based, but one may be added in future.
> 
> Patches #5 - #7 introduce the new 'iommu_op' hypercall with sub-
> operations
> to query ranges reserved in the IOMMU, map and unmap pages, and flush
> the
> IOTLB.
> 
> For testing purposes, I have implemented patches to a Linux PV dom0 to
> set
> up a 1:1 BFN:GFN mapping and use normal swiotlb dma operations rather
> then xen-swiotlb.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2016-
> 02/msg01428.html
> 
> Paul Durrant (7):
>   iommu: introduce the concept of BFN...
>   iommu: make use of type-safe BFN and MFN in exported functions
>   iommu: push use of type-safe BFN and MFN into iommu_ops
>   vtd: add lookup_page method to iommu_ops
>   public / x86: introduce __HYPERCALL_iommu_op
>   x86: add iommu_op to query reserved ranges
>   x86: add iommu_ops to map and unmap pages, and also to flush the
> IOTLB
> 
>  tools/flask/policy/modules/xen.if             |   1 +
>  xen/arch/arm/p2m.c                            |   3 +-
>  xen/arch/x86/Makefile                         |   1 +
>  xen/arch/x86/hvm/hypercall.c                  |   1 +
>  xen/arch/x86/hypercall.c                      |   1 +
>  xen/arch/x86/iommu_op.c                       | 476
> ++++++++++++++++++++++++++
>  xen/arch/x86/mm.c                             |   7 +-
>  xen/arch/x86/mm/p2m-ept.c                     |   8 +-
>  xen/arch/x86/mm/p2m-pt.c                      |   8 +-
>  xen/arch/x86/mm/p2m.c                         |  15 +-
>  xen/arch/x86/pv/hypercall.c                   |   1 +
>  xen/arch/x86/x86_64/mm.c                      |   5 +-
>  xen/common/grant_table.c                      |  10 +-
>  xen/common/memory.c                           |   4 +-
>  xen/drivers/passthrough/amd/iommu_cmd.c       |  18 +-
>  xen/drivers/passthrough/amd/iommu_map.c       |  85 ++---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   4 +-
>  xen/drivers/passthrough/arm/smmu.c            |  22 +-
>  xen/drivers/passthrough/iommu.c               |  28 +-
>  xen/drivers/passthrough/vtd/iommu.c           |  76 +++-
>  xen/drivers/passthrough/vtd/iommu.h           |   2 +
>  xen/drivers/passthrough/vtd/x86/vtd.c         |   3 +-
>  xen/drivers/passthrough/x86/iommu.c           |   2 +-
>  xen/include/Makefile                          |   2 +
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   8 +-
>  xen/include/public/iommu_op.h                 | 127 +++++++
>  xen/include/public/xen.h                      |   1 +
>  xen/include/xen/hypercall.h                   |  12 +
>  xen/include/xen/iommu.h                       |  42 ++-
>  xen/include/xlat.lst                          |   5 +
>  xen/include/xsm/dummy.h                       |   6 +
>  xen/include/xsm/xsm.h                         |   6 +
>  xen/xsm/dummy.c                               |   1 +
>  xen/xsm/flask/hooks.c                         |   6 +
>  xen/xsm/flask/policy/access_vectors           |   2 +
>  35 files changed, 868 insertions(+), 131 deletions(-)
>  create mode 100644 xen/arch/x86/iommu_op.c
>  create mode 100644 xen/include/public/iommu_op.h
> ---
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Wei Liu <wei.liu2@citrix.com>
> 
> --
> 2.11.0



* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-12 10:47 ` [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op Paul Durrant
@ 2018-02-13  6:43   ` Tian, Kevin
  2018-02-13  9:22     ` Paul Durrant
  2018-03-16 12:25   ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-13  6:43 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Jan Beulich, Daniel De Graaf

> From: Paul Durrant
> Sent: Monday, February 12, 2018 6:47 PM
> 
> This patch introduces the boilerplate for a new hypercall to allow a
> domain to control IOMMU mappings for its own pages.
> Whilst there is duplication of code between the native and compat entry
> points which appears ripe for some form of combination, I think it is
> better to maintain the separation as-is because the compat entry point
> will necessarily gain complexity in subsequent patches.
> 
> NOTE: This hypercall is only implemented for x86 and is currently
>       restricted by XSM to dom0 since it could be used to cause IOMMU
>       faults which may bring down a host.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
[...]
> +
> +
> +static bool can_control_iommu(void)
> +{
> +    struct domain *currd = current->domain;
> +
> +    /*
> +     * IOMMU mappings cannot be manipulated if:
> +     * - the IOMMU is not enabled or,
> +     * - the IOMMU is passed through or,
> +     * - shared EPT configured or,
> +     * - Xen is maintaining an identity map.

"for dom0"

> +     */
> +    if ( !iommu_enabled || iommu_passthrough ||
> +         iommu_use_hap_pt(currd) || need_iommu(currd) )

I guess it's clearer to directly check iommu_dom0_strict here

> +        return false;
> +
> +    return true;
> +}



* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-12 10:47 ` [PATCH 6/7] x86: add iommu_op to query reserved ranges Paul Durrant
@ 2018-02-13  6:51   ` Tian, Kevin
  2018-02-13  9:25     ` Paul Durrant
  2018-03-19 14:10   ` Jan Beulich
  2018-03-19 15:13   ` Jan Beulich
  2 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-13  6:51 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Jan Beulich

> From: Paul Durrant
> Sent: Monday, February 12, 2018 6:47 PM
> 
> Certain areas of memory, such as RMRRs, must be mapped 1:1
> (i.e. BFN == MFN) through the IOMMU.
> 
> This patch adds an iommu_op to allow these ranges to be queried.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/iommu_op.c       | 121
> ++++++++++++++++++++++++++++++++++++++++++
>  xen/include/public/iommu_op.h |  35 ++++++++++++
>  xen/include/xlat.lst          |   2 +
>  3 files changed, 158 insertions(+)
> 
> diff --git a/xen/arch/x86/iommu_op.c b/xen/arch/x86/iommu_op.c
> index edd8a384b3..ac81b98b7a 100644
> --- a/xen/arch/x86/iommu_op.c
> +++ b/xen/arch/x86/iommu_op.c
> @@ -22,6 +22,58 @@
>  #include <xen/event.h>
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
> +#include <xen/iommu.h>
> +
> +struct get_rdm_ctxt {
> +    unsigned int max_entries;
> +    unsigned int nr_entries;
> +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> +};
> +
> +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> +{
> +    struct get_rdm_ctxt *ctxt = arg;
> +
> +    if ( ctxt->nr_entries < ctxt->max_entries )
> +    {
> +        xen_iommu_reserved_region_t region = {
> +            .start_bfn = start,
> +            .nr_frames = nr,
> +        };
> +
> +        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
> +                                  1) )
> +            return -EFAULT;

RMRR entries are device specific. It's why an 'id' (i.e. sbdf) field
is introduced for such a check.

> +    }
> +
> +    ctxt->nr_entries++;
> +
> +    return 1;
> +}
> +
> +static int iommuop_query_reserved(struct
> xen_iommu_op_query_reserved *op)

I didn't get why we cannot reuse existing XENMEM_reserved_
device_memory_map?

Thanks
Kevin

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-12 10:47 ` [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB Paul Durrant
@ 2018-02-13  6:55   ` Tian, Kevin
  2018-02-13  9:55     ` Paul Durrant
  2018-03-19 15:11   ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-13  6:55 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Jan Beulich

> From: Paul Durrant
> Sent: Monday, February 12, 2018 6:47 PM
> 
> This patch adds iommu_ops to allow a domain with control_iommu
> privilege
> to map and unmap pages from any guest over which it has mapping
> privilege
> in the IOMMU.
> These operations implicitly disable IOTLB flushing so that the caller can
> batch operations and then explicitly flush the IOTLB using the iommu_op
> also added by this patch.

given that the last discussion was 2yrs ago and you said the actual
implementation has deviated from the original spec, it'd be difficult to judge
whether the current change is sufficient or just a 1st step. Could you
summarize what has changed from the last spec, and also any further tasks
on your TODO list?

at least the bare map/unmap operations definitely do not meet XenGT's requirements...

Thanks
kevin

* Re: [PATCH 0/7] paravirtual IOMMU interface
  2018-02-13  6:21 ` [PATCH 0/7] paravirtual IOMMU interface Tian, Kevin
@ 2018-02-13  9:18   ` Paul Durrant
  0 siblings, 0 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-13  9:18 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Nakajima, Jun, Andrew Cooper,
	Tim (Xen.org),
	George Dunlap, Julien Grall, Jan Beulich, Ian Jackson,
	Daniel De Graaf, Suravee Suthikulpanit

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 13 February 2018 06:21
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Daniel De Graaf
> <dgdegra@tycho.nsa.gov>; George Dunlap <George.Dunlap@citrix.com>;
> Ian Jackson <Ian.Jackson@citrix.com>; Jan Beulich <jbeulich@suse.com>;
> Julien Grall <julien.grall@arm.com>; Nakajima, Jun
> <jun.nakajima@intel.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>;
> Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>; Tim (Xen.org)
> <tim@xen.org>; Wei Liu <wei.liu2@citrix.com>
> Subject: RE: [PATCH 0/7] paravirtual IOMMU interface
> 
> > From: Paul Durrant [mailto:paul.durrant@citrix.com]
> > Sent: Monday, February 12, 2018 6:47 PM
> >
> > The idea of a paravirtual IOMMU interface was last discussed on xen-devel
> > more than two years ago and narrowed down on a draft specification [1].
> > There was also an RFC patch series posted with an implementation,
> > however
> > this was never followed through.
> >
> > In this patch series I have tried to simplify the interface and therefore
> > have moved away from the draft specification.
> 
> Could you send out an updated spec?
> 

I'll have to write one, but I agree it is probably worthwhile for the record. The intention is the same as it was when the old spec was written, but I hope this implementation is less complex (though it may not yet be fully complete).
In the meantime I hope each patch is sufficiently small to be reasonably self-explanatory.

Cheers,

  Paul

> >
> > Patches #1 - #3 in the series introduce 'bus frame numbers' into Xen (frame
> > numbers relating to the IOMMU rather than the MMU). The modifications
> > are
> > in common code and so affect ARM as well as x86.
> >
> > Patch #4 adds a pre-requisite method in iommu_ops and an
> > implementation
> > for VT-d. I have not done an implmentation for AMD IOMMUs as my test
> > hard-
> > ware is Intel based, but one may be added in future.
> >
> > Patches #5 - #7 introduce the new 'iommu_op' hypercall with sub-
> > operations
> > to query ranges reserved in the IOMMU, map and unmap pages, and flush
> > the
> > IOTLB.
> >
> > For testing purposes, I have implemented patches to a Linux PV dom0 to
> > set
> > up a 1:1 BFN:GFN mapping and use normal swiotlb dma operations rather
> > then xen-swiotlb.
> >
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2016-
> > 02/msg01428.html
> >
> > Paul Durrant (7):
> >   iommu: introduce the concept of BFN...
> >   iommu: make use of type-safe BFN and MFN in exported functions
> >   iommu: push use of type-safe BFN and MFN into iommu_ops
> >   vtd: add lookup_page method to iommu_ops
> >   public / x86: introduce __HYPERCALL_iommu_op
> >   x86: add iommu_op to query reserved ranges
> >   x86: add iommu_ops to map and unmap pages, and also to flush the
> > IOTLB
> >
> >  tools/flask/policy/modules/xen.if             |   1 +
> >  xen/arch/arm/p2m.c                            |   3 +-
> >  xen/arch/x86/Makefile                         |   1 +
> >  xen/arch/x86/hvm/hypercall.c                  |   1 +
> >  xen/arch/x86/hypercall.c                      |   1 +
> >  xen/arch/x86/iommu_op.c                       | 476
> > ++++++++++++++++++++++++++
> >  xen/arch/x86/mm.c                             |   7 +-
> >  xen/arch/x86/mm/p2m-ept.c                     |   8 +-
> >  xen/arch/x86/mm/p2m-pt.c                      |   8 +-
> >  xen/arch/x86/mm/p2m.c                         |  15 +-
> >  xen/arch/x86/pv/hypercall.c                   |   1 +
> >  xen/arch/x86/x86_64/mm.c                      |   5 +-
> >  xen/common/grant_table.c                      |  10 +-
> >  xen/common/memory.c                           |   4 +-
> >  xen/drivers/passthrough/amd/iommu_cmd.c       |  18 +-
> >  xen/drivers/passthrough/amd/iommu_map.c       |  85 ++---
> >  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   4 +-
> >  xen/drivers/passthrough/arm/smmu.c            |  22 +-
> >  xen/drivers/passthrough/iommu.c               |  28 +-
> >  xen/drivers/passthrough/vtd/iommu.c           |  76 +++-
> >  xen/drivers/passthrough/vtd/iommu.h           |   2 +
> >  xen/drivers/passthrough/vtd/x86/vtd.c         |   3 +-
> >  xen/drivers/passthrough/x86/iommu.c           |   2 +-
> >  xen/include/Makefile                          |   2 +
> >  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   8 +-
> >  xen/include/public/iommu_op.h                 | 127 +++++++
> >  xen/include/public/xen.h                      |   1 +
> >  xen/include/xen/hypercall.h                   |  12 +
> >  xen/include/xen/iommu.h                       |  42 ++-
> >  xen/include/xlat.lst                          |   5 +
> >  xen/include/xsm/dummy.h                       |   6 +
> >  xen/include/xsm/xsm.h                         |   6 +
> >  xen/xsm/dummy.c                               |   1 +
> >  xen/xsm/flask/hooks.c                         |   6 +
> >  xen/xsm/flask/policy/access_vectors           |   2 +
> >  35 files changed, 868 insertions(+), 131 deletions(-)
> >  create mode 100644 xen/arch/x86/iommu_op.c
> >  create mode 100644 xen/include/public/iommu_op.h
> > ---
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > Cc: George Dunlap <george.dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Julien Grall <julien.grall@arm.com>
> > Cc: Jun Nakajima <jun.nakajima@intel.com>
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> > Cc: Tim Deegan <tim@xen.org>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> >
> > --
> > 2.11.0



* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-13  6:43   ` Tian, Kevin
@ 2018-02-13  9:22     ` Paul Durrant
  2018-02-23  5:17       ` Tian, Kevin
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-13  9:22 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 13 February 2018 06:43
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> __HYPERCALL_iommu_op
> 
> > From: Paul Durrant
> > Sent: Monday, February 12, 2018 6:47 PM
> >
> > This patch introduces the boilerplate for a new hypercall to allow a
> > domain to control IOMMU mappings for its own pages.
> > Whilst there is duplication of code between the native and compat entry
> > points which appears ripe for some form of combination, I think it is
> > better to maintain the separation as-is because the compat entry point
> > will necessarily gain complexity in subsequent patches.
> >
> > NOTE: This hypercall is only implemented for x86 and is currently
> >       restricted by XSM to dom0 since it could be used to cause IOMMU
> >       faults which may bring down a host.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> [...]
> > +
> > +
> > +static bool can_control_iommu(void)
> > +{
> > +    struct domain *currd = current->domain;
> > +
> > +    /*
> > +     * IOMMU mappings cannot be manipulated if:
> > +     * - the IOMMU is not enabled or,
> > +     * - the IOMMU is passed through or,
> > +     * - shared EPT configured or,
> > +     * - Xen is maintaining an identity map.
> 
> "for dom0"
> 
> > +     */
> > +    if ( !iommu_enabled || iommu_passthrough ||
> > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> 
> I guess it's clearer to directly check iommu_dom0_strict here

Well, the problem with that is that it totally ties this interface to dom0. Whilst, in practice, that is the case at the moment (because of the xsm check) I do want to leave the potential to allow other PV domains to control their IOMMU mappings, if that makes sense in future.

  Paul

> 
> > +        return false;
> > +
> > +    return true;
> > +}
> 


* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-13  6:51   ` Tian, Kevin
@ 2018-02-13  9:25     ` Paul Durrant
  2018-02-23  5:23       ` Tian, Kevin
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-13  9:25 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 13 February 2018 06:52
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>
> Subject: RE: [Xen-devel] [PATCH 6/7] x86: add iommu_op to query reserved
> ranges
> 
> > From: Paul Durrant
> > Sent: Monday, February 12, 2018 6:47 PM
> >
> > Certain areas of memory, such as RMRRs, must be mapped 1:1
> > (i.e. BFN == MFN) through the IOMMU.
> >
> > This patch adds an iommu_op to allow these ranges to be queried.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Tim Deegan <tim@xen.org>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  xen/arch/x86/iommu_op.c       | 121
> > ++++++++++++++++++++++++++++++++++++++++++
> >  xen/include/public/iommu_op.h |  35 ++++++++++++
> >  xen/include/xlat.lst          |   2 +
> >  3 files changed, 158 insertions(+)
> >
> > diff --git a/xen/arch/x86/iommu_op.c b/xen/arch/x86/iommu_op.c
> > index edd8a384b3..ac81b98b7a 100644
> > --- a/xen/arch/x86/iommu_op.c
> > +++ b/xen/arch/x86/iommu_op.c
> > @@ -22,6 +22,58 @@
> >  #include <xen/event.h>
> >  #include <xen/guest_access.h>
> >  #include <xen/hypercall.h>
> > +#include <xen/iommu.h>
> > +
> > +struct get_rdm_ctxt {
> > +    unsigned int max_entries;
> > +    unsigned int nr_entries;
> > +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> > +};
> > +
> > +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> > +{
> > +    struct get_rdm_ctxt *ctxt = arg;
> > +
> > +    if ( ctxt->nr_entries < ctxt->max_entries )
> > +    {
> > +        xen_iommu_reserved_region_t region = {
> > +            .start_bfn = start,
> > +            .nr_frames = nr,
> > +        };
> > +
> > +        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
> > +                                  1) )
> > +            return -EFAULT;
> 
> RMRR entries are device specific. It's why an 'id' (i.e. sbdf) field
> is introduced for such a check.

What I want here is the union of all RMRRs for all devices in the domain. I believe that is what the code will currently query, but I could be wrong.
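
For illustration, the calling pattern intended for the query op is the two-call sequence sketched below. This is a sketch only: it assumes the public op struct carries max_entries/nr_entries fields mirroring the get_rdm_ctxt above, and HYPERVISOR_iommu_op() is a placeholder for whatever guest-side hypercall wrapper ends up being used.

    /* Sketch only, not part of the series. */
    xen_iommu_op_t op = { .op = XEN_IOMMUOP_query_reserved };
    xen_iommu_reserved_region_t *regions;
    int rc;

    op.u.query_reserved.max_entries = 0;     /* first call: count only */
    rc = HYPERVISOR_iommu_op(&op, 1);
    if ( rc || op.status )
        return rc ?: op.status;

    regions = calloc(op.u.query_reserved.nr_entries, sizeof(*regions));
    if ( !regions )
        return -ENOMEM;

    set_xen_guest_handle(op.u.query_reserved.regions, regions);
    op.u.query_reserved.max_entries = op.u.query_reserved.nr_entries;
    rc = HYPERVISOR_iommu_op(&op, 1);        /* second call: fetch regions */

Each returned region then carries a start_bfn/nr_frames pair describing one range that must remain mapped 1:1.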

> 
> > +    }
> > +
> > +    ctxt->nr_entries++;
> > +
> > +    return 1;
> > +}
> > +
> > +static int iommuop_query_reserved(struct
> > xen_iommu_op_query_reserved *op)
> 
> I didn't get why we cannot reuse existing XENMEM_reserved_
> device_memory_map?
> 

This hypercall is not intended to be tools-only. That one is, unless I misread the #ifdefs.

  Paul

> Thanks
> Kevin

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-13  6:55   ` Tian, Kevin
@ 2018-02-13  9:55     ` Paul Durrant
  2018-02-23  5:35       ` Tian, Kevin
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-13  9:55 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 13 February 2018 06:56
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>
> Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> unmap pages, and also to flush the IOTLB
> 
> > From: Paul Durrant
> > Sent: Monday, February 12, 2018 6:47 PM
> >
> > This patch adds iommu_ops to allow a domain with control_iommu
> > privilege
> > to map and unmap pages from any guest over which it has mapping
> > privilege
> > in the IOMMU.
> > These operations implicitly disable IOTLB flushing so that the caller can
> > batch operations and then explicitly flush the IOTLB using the iommu_op
> > also added by this patch.
> 
> given that the last discussion was 2yrs ago and you said the actual
> implementation has deviated from the original spec, it'd be difficult to judge
> whether the current change is sufficient or just a 1st step. Could you
> summarize what has changed from the last spec, and also any further tasks
> on your TODO list?

Kevin,

The main changes are:

- there is no op to query mapping capability... instead the hypercall will fail with -EACCES
- there is no longer an option to avoid reference counting map and unmap operations
- there are no longer separate ops for mapping local and remote pages (DOMID_SELF should be passed to the map op for local pages), and ops always deal with GFNs not MFNs
  - also I have dropped the idea of a global m2b map, so...
  - it is now going to be the responsibility of the code running in the mapping domain to track what it has mapped [1]
- there is no illusion that pages other than 4k are supported at the moment
- the flush operation is now explicit

[1] this would be an issue if the interface becomes usable for anything other than dom0 as we'd also need something in Xen to release the page refs if the domain was forcibly destroyed, but I think the m2b was the wrong solution since it necessitates a full scan of *host* RAM on any domain destruction

The main item on my TODO list is to implement a new IOREQ to allow invalidation of specific guest pages. Think of the current 'invalidate map cache' as a global flush... I need a specific flush so that a decrease_reservation hypercall issued by a guest can instead tell emulators exactly which pages are being removed from the guest. It is then the emulators' responsibilities to unmap those pages if they had them mapped (either through MMU or IOMMU) which then drop page refs and actually allow the pages to be recycled.

I will, of course, need to come up with more Linux code to test all this, which will eventually lead to kernel and user APIs to allow emulators running in dom0 to IOMMU map guest pages.
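
To make the batch-then-explicit-flush semantics concrete, a caller might look something like the sketch below (HYPERVISOR_iommu_op() is again a placeholder wrapper, and per-op status checking is elided):

    /* Sketch only: map a batch of local pages, then issue one explicit
     * IOTLB flush; the map ops themselves do not flush. */
    static int iommu_map_batch(const xen_bfn_t *bfn, const xen_pfn_t *gfn,
                               unsigned int nr)
    {
        xen_iommu_op_t ops[nr + 1];
        unsigned int i;

        for ( i = 0; i < nr; i++ )
            ops[i] = (xen_iommu_op_t) {
                .op = XEN_IOMMUOP_map,
                .u.map = {
                    .bfn = bfn[i],
                    .gfn = gfn[i],
                    .domid = DOMID_SELF, /* local pages */
                },
            };

        ops[nr] = (xen_iommu_op_t) { .op = XEN_IOMMUOP_flush };

        return HYPERVISOR_iommu_op(ops, nr + 1);
    }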

> 
> at least the bare map/unmap operations definitely do not meet XenGT's
> requirements...
> 

What aspect of the hypercall interface does not meet XenGT's requirements? It would be good to know now so that I can make any necessary adjustments in v2.

Cheers,

  Paul

> Thanks
> kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-13  9:22     ` Paul Durrant
@ 2018-02-23  5:17       ` Tian, Kevin
  2018-02-23  9:41         ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-23  5:17 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Tuesday, February 13, 2018 5:23 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 13 February 2018 06:43
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > __HYPERCALL_iommu_op
> >
> > > From: Paul Durrant
> > > Sent: Monday, February 12, 2018 6:47 PM
> > >
> > > This patch introduces the boilerplate for a new hypercall to allow a
> > > domain to control IOMMU mappings for its own pages.
> > > Whilst there is duplication of code between the native and compat
> entry
> > > points which appears ripe for some form of combination, I think it is
> > > better to maintain the separation as-is because the compat entry point
> > > will necessarily gain complexity in subsequent patches.
> > >
> > > NOTE: This hypercall is only implemented for x86 and is currently
> > >       restricted by XSM to dom0 since it could be used to cause IOMMU
> > >       faults which may bring down a host.
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > [...]
> > > +
> > > +
> > > +static bool can_control_iommu(void)
> > > +{
> > > +    struct domain *currd = current->domain;
> > > +
> > > +    /*
> > > +     * IOMMU mappings cannot be manipulated if:
> > > +     * - the IOMMU is not enabled or,
> > > +     * - the IOMMU is passed through or,
> > > +     * - shared EPT configured or,
> > > +     * - Xen is maintaining an identity map.
> >
> > "for dom0"
> >
> > > +     */
> > > +    if ( !iommu_enabled || iommu_passthrough ||
> > > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> >
> > I guess it's clearer to directly check iommu_dom0_strict here
> 
> Well, the problem with that is that it totally ties this interface to dom0.
> Whilst, in practice, that is the case at the moment (because of the xsm
> check) I do want to leave the potential to allow other PV domains to control
> their IOMMU mappings, if that makes sense in future.
> 

First, it's inconsistent with the comment - "Xen is maintaining
an identity map" only applies to dom0.

Second, I'm afraid !need_iommu is not an accurate condition to represent
a PV domain. What if the IOMMU is also enabled for future PV domains?

Thanks
Kevin

* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-13  9:25     ` Paul Durrant
@ 2018-02-23  5:23       ` Tian, Kevin
  2018-02-23  9:02         ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-23  5:23 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Tuesday, February 13, 2018 5:25 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 13 February 2018 06:52
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>
> > Subject: RE: [Xen-devel] [PATCH 6/7] x86: add iommu_op to query
> reserved
> > ranges
> >
> > > From: Paul Durrant
> > > Sent: Monday, February 12, 2018 6:47 PM
> > >
> > > Certain areas of memory, such as RMRRs, must be mapped 1:1
> > > (i.e. BFN == MFN) through the IOMMU.
> > >
> > > This patch adds an iommu_op to allow these ranges to be queried.
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > ---
> > > Cc: Jan Beulich <jbeulich@suse.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > > Cc: Tim Deegan <tim@xen.org>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > >  xen/arch/x86/iommu_op.c       | 121
> > > ++++++++++++++++++++++++++++++++++++++++++
> > >  xen/include/public/iommu_op.h |  35 ++++++++++++
> > >  xen/include/xlat.lst          |   2 +
> > >  3 files changed, 158 insertions(+)
> > >
> > > diff --git a/xen/arch/x86/iommu_op.c b/xen/arch/x86/iommu_op.c
> > > index edd8a384b3..ac81b98b7a 100644
> > > --- a/xen/arch/x86/iommu_op.c
> > > +++ b/xen/arch/x86/iommu_op.c
> > > @@ -22,6 +22,58 @@
> > >  #include <xen/event.h>
> > >  #include <xen/guest_access.h>
> > >  #include <xen/hypercall.h>
> > > +#include <xen/iommu.h>
> > > +
> > > +struct get_rdm_ctxt {
> > > +    unsigned int max_entries;
> > > +    unsigned int nr_entries;
> > > +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> > > +};
> > > +
> > > +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> > > +{
> > > +    struct get_rdm_ctxt *ctxt = arg;
> > > +
> > > +    if ( ctxt->nr_entries < ctxt->max_entries )
> > > +    {
> > > +        xen_iommu_reserved_region_t region = {
> > > +            .start_bfn = start,
> > > +            .nr_frames = nr,
> > > +        };
> > > +
> > > +        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
> > > +                                  1) )
> > > +            return -EFAULT;
> >
> > RMRR entries are device specific. It's why an 'id' (i.e. sbdf) field
> > is introduced for such a check.
> 
> What I want here is the union of all RMRRs for all devices in the domain. I
> believe that is what the code will currently query, but I could be wrong.

RMRR is per-device. I'm not sure why we would want to apply the
restriction to every device if it's not related.

> 
> >
> > > +    }
> > > +
> > > +    ctxt->nr_entries++;
> > > +
> > > +    return 1;
> > > +}
> > > +
> > > +static int iommuop_query_reserved(struct
> > > xen_iommu_op_query_reserved *op)
> >
> > I didn't get why we cannot reuse existing XENMEM_reserved_
> > device_memory_map?
> >
> 
> This hypercall is not intended to be tools-only. That one is, unless I misread
> the #ifdefs.
> 

I didn't realize that. Curious how Xen enforces such a tools-only policy? What
would happen if it were called from the Dom0 kernel? I just don't feel good about
creating a new interface for a duplicated purpose...

Thanks
Kevin

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-13  9:55     ` Paul Durrant
@ 2018-02-23  5:35       ` Tian, Kevin
  2018-02-23  9:35         ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-23  5:35 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Tuesday, February 13, 2018 5:56 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 13 February 2018 06:56
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>
> > Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> > unmap pages, and also to flush the IOTLB
> >
> > > From: Paul Durrant
> > > Sent: Monday, February 12, 2018 6:47 PM
> > >
> > > This patch adds iommu_ops to allow a domain with control_iommu
> > > privilege
> > > to map and unmap pages from any guest over which it has mapping
> > > privilege
> > > in the IOMMU.
> > > These operations implicitly disable IOTLB flushing so that the caller can
> > > batch operations and then explicitly flush the IOTLB using the
> iommu_op
> > > also added by this patch.
> >
> > given that the last discussion was 2yrs ago and you said the actual
> > implementation has deviated from the original spec, it'd be difficult to
> > judge whether the current change is sufficient or just a 1st step. Could you
> > summarize what has changed from the last spec, and also any further tasks
> > on your TODO list?
> 
> Kevin,
> 
> The main changes are:
> 
> - there is no op to query mapping capability... instead the hypercall will fail
> with -EACCES
> - there is no longer an option to avoid reference counting map and unmap
> operations
> - there are no longer separate ops for mapping local and remote pages
> (DOMID_SELF should be passed to the map op for local pages), and ops
> always deal with GFNs not MFNs
>   - also I have dropped the idea of a global m2b map, so...
>   - it is now going to be the responsibility of the code running in the
> mapping domain to track what it has mapped [1]
> - there is no illusion that pages other than 4k are supported at the moment
> - the flush operation is now explicit
> 
> [1] this would be an issue if the interface becomes usable for anything
> other than dom0 as we'd also need something in Xen to release the page
> refs if the domain was forcibly destroyed, but I think the m2b was the
> wrong solution since it necessitates a full scan of *host* RAM on any
> domain destruction
> 
> The main item on my TODO list is to implement a new IOREQ to allow
> invalidation of specific guest pages. Think of the current 'invalidate map
> cache' as a global flush... I need a specific flush so that a
> decrease_reservation hypercall issued by a guest can instead tell emulators
> exactly which pages are being removed from the guest. It is then the emulators'
> responsibilities to unmap those pages if they had them mapped (either
> through MMU or IOMMU) which then drop page refs and actually allow the
> pages to be recycled.
> 
> I will, of course, need to come up with more Linux code to test all this,
> which will eventually lead to kernel and user APIs to allow emulators
> running in dom0 to IOMMU map guest pages.

Thanks for the elaboration. I didn't find the original proposal. Can you
attach it or point me to a link?

> 
> >
> > at least the bare map/unmap operations definitely do not meet XenGT's
> > requirements...
> >
> 
> What aspect of the hypercall interface does not meet XenGT's
> > requirements? It would be good to know now so that I can make any
> necessary adjustments in v2.
> 

XenGT needs to replace GFNs with BFNs in the shadow GPU page table
for a given domain. Previously, iirc, there was a query interface for that
purpose, since the mapping was managed by the hypervisor. Based on the above
description (e.g. dropping the m2b map), did you intend to let the Dom0
pvIOMMU driver manage all related mapping information, so that GVT-g just
consults the pvIOMMU driver for that purpose?

Thanks
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-23  5:23       ` Tian, Kevin
@ 2018-02-23  9:02         ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-02-23  9:02 UTC (permalink / raw)
  To: Kevin Tian
  Cc: Stefano Stabellini, Wei Liu, AndrewCooper, Tim (Xen.org),
	George Dunlap, Paul Durrant, xen-devel, Ian Jackson

>>> On 23.02.18 at 06:23, <kevin.tian@intel.com> wrote:
>>  From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
>> Sent: Tuesday, February 13, 2018 5:25 PM
>> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
>> > Sent: 13 February 2018 06:52
>> > > From: Paul Durrant
>> > > Sent: Monday, February 12, 2018 6:47 PM
>> > > +    }
>> > > +
>> > > +    ctxt->nr_entries++;
>> > > +
>> > > +    return 1;
>> > > +}
>> > > +
>> > > +static int iommuop_query_reserved(struct
>> > > xen_iommu_op_query_reserved *op)
>> >
>> > I didn't get why we cannot reuse existing XENMEM_reserved_
>> > device_memory_map?
>> >
>> 
>> This hypercall is not intended to be tools-only. That one is, unless I misread
>> the #ifdefs.
>> 
> 
> I didn't realize that. Curious how Xen enforces such a tools-only policy? What
> would happen if it were called from the Dom0 kernel? I just don't feel good about
> creating a new interface for a duplicated purpose...

It's not enforced for Dom0; Dom0 (including its kernel) is trusted.
How would Xen know whether a request came from user land
(through the privcmd driver) or directly from some kernel component?

Jan



* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-23  5:35       ` Tian, Kevin
@ 2018-02-23  9:35         ` Paul Durrant
  2018-02-24  3:01           ` Tian, Kevin
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-23  9:35 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

[-- Attachment #1: Type: text/plain, Size: 5084 bytes --]

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 23 February 2018 05:36
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>
> Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> unmap pages, and also to flush the IOTLB
> 
> > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > Sent: Tuesday, February 13, 2018 5:56 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > Sent: 13 February 2018 06:56
> > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > devel@lists.xenproject.org
> > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > <jbeulich@suse.com>
> > > Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> > > unmap pages, and also to flush the IOTLB
> > >
> > > > From: Paul Durrant
> > > > Sent: Monday, February 12, 2018 6:47 PM
> > > >
> > > > This patch adds iommu_ops to allow a domain with control_iommu
> > > > privilege
> > > > to map and unmap pages from any guest over which it has mapping
> > > > privilege
> > > > in the IOMMU.
> > > > These operations implicitly disable IOTLB flushing so that the caller can
> > > > batch operations and then explicitly flush the IOTLB using the
> > iommu_op
> > > > also added by this patch.
> > >
> > > given that the last discussion was 2yrs ago and you said the actual
> > > implementation has deviated from the original spec, it'd be difficult to
> > > judge whether the current change is sufficient or just a 1st step. Could
> > > you summarize what has changed from the last spec, and also any further
> > > tasks on your TODO list?
> >
> > Kevin,
> >
> > The main changes are:
> >
> > - there is no op to query mapping capability... instead the hypercall will fail
> > with -EACCES
> > - there is no longer an option to avoid reference counting map and unmap
> > operations
> > - there are no longer separate ops for mapping local and remote pages
> > (DOMID_SELF should be passed to the map op for local pages), and ops
> > always deal with GFNs not MFNs
> >   - also I have dropped the idea of a global m2b map, so...
> >   - it is now going to be the responsibility of the code running in the
> > mapping domain to track what it has mapped [1]
> > - there is no illusion that pages other than 4k are supported at the moment
> > - the flush operation is now explicit
> >
> > [1] this would be an issue if the interface becomes usable for anything
> > other than dom0 as we'd also need something in Xen to release the page
> > refs if the domain was forcibly destroyed, but I think the m2b was the
> > wrong solution since it necessitates a full scan of *host* RAM on any
> > domain destruction
> >
> > The main item on my TODO list is to implement a new IOREQ to allow
> > invalidation of specific guest pages. Think of the current 'invalidate map
> > cache' as a global flush... I need a specific flush so that a
> > decrease_reservation hypercall issued by a guest can instead tell emulators
> > exactly which pages are being removed from the guest. It is then the
> emulators'
> > responsibilities to unmap those pages if they had them mapped (either
> > through MMU or IOMMU) which then drop page refs and actually allow the
> > pages to be recycled.
> >
> > I will, of course, need to come up with more Linux code to test all this,
> > which will eventually lead to kernel and user APIs to allow emulators
> > running in dom0 to IOMMU map guest pages.
> 
> Thanks for the elaboration. I didn't find the original proposal. Can you
> attach it or point me to a link?
> 

FWIW, I've attached Malcolm's original for reference.

> >
> > >
> > > at least the bare map/unmap operations definitely do not meet XenGT's
> > > requirements...
> > >
> >
> > What aspect of the hypercall interface does not meet XenGT's
> > requirements? It would be good to know now so that I can make any
> > necessary adjustments in v2.
> >
> 
> XenGT needs to replace GFNs with BFNs in the shadow GPU page table
> for a given domain.

I assume XenGT would be dynamically mapping the GFN at this point...

> Previously, iirc, there was a query interface for that
> purpose, since the mapping was managed by the hypervisor. Based on the above
> description (e.g. dropping the m2b map), did you intend to let the Dom0
> pvIOMMU driver manage all related mapping information, so that GVT-g just
> consults the pvIOMMU driver for that purpose?
> 

...so my plan is that the dom0 API picks a BFN, does the mapping, and then passes the BFN back to the caller.

> Thanks
> Kevin

[-- Attachment #2: PV-IOMMU.TXT --]
[-- Type: text/plain, Size: 32555 bytes --]

% Xen PV IOMMU interface
% Malcolm Crossley <<malcolm.crossley@xxxxxxxxxx>>
  Paul Durrant <<paul.durrant@xxxxxxxxxx>>
% Draft D

Introduction
============

Revision History
----------------

--------------------------------------------------------------------
Version  Date         Changes
-------  -----------  ----------------------------------------------
Draft A  10 Apr 2014  Initial draft.

Draft B  12 Jun 2015  Second draft.

Draft C  26 Jun 2015  Third draft.

Draft D  09 Feb 2016  Fourth draft.
--------------------------------------------------------------------

Background
==========

Linux kernel SWIOTLB
--------------------

Xen PV guests use a Pseudophysical Frame Number (PFN) address space which is
decoupled from the host Machine Frame Number(MFN) address space.

PV guest hardware drivers are aware of the PFN address space only and
assume that if PFN addresses are contiguous then the hardware addresses would
be contiguous as well. The decoupling between PFN and MFN address spaces means
PFN and MFN addresses may not be contiguous across page boundaries and thus a
buffer allocated in PFN address space which spans a page boundary may not be
contiguous in MFN address space.

PV hardware drivers cannot tolerate this behaviour and so a special
"bounce buffer" region is used to hide this issue from the drivers.

A bounce buffer region is a special part of the PFN address space which has
been made to be contiguous in both PFN and MFN address spaces. When a driver
requests that a buffer which spans a page boundary be made available for
hardware to read, the core operating system code copies the buffer into a
temporarily reserved part of the bounce buffer region and then returns the
MFN address of that reserved part back to the driver. The driver then
instructs the hardware to read the copy of the buffer in the bounce buffer.
Similarly, if the driver requests that a buffer be made available for
hardware to write to, a region of the bounce buffer is first reserved; after
the hardware completes writing, the reserved region of the bounce buffer is
copied back to the originally allocated buffer.

The overhead of memory copies to/from the bounce buffer region is high
and damages performance. Furthermore, there is a risk that the fixed-size
bounce buffer region will become exhausted and it will not be possible to
return a hardware address back to the driver. The Linux kernel drivers do not
tolerate this failure and so the kernel is forced to crash, as an
unrecoverable error has occurred.

Input/Output Memory Management Units (IOMMU) allow for an inbound address
mapping to be created from the I/O Bus address space (typically PCI) to
the machine frame number address space. IOMMUs typically use a page table
mechanism to manage the mappings and therefore can create mappings of page size
granularity or larger.

The I/O Bus address space will be referred to as the Bus Frame Number (BFN)
address space for the rest of this document.


Mediated Pass-through Emulators
-------------------------------

Mediated Pass-through emulators allow guest domains to interact with
hardware devices via emulator mediation. The emulator runs in a domain separate
to the guest domain and it is used to enforce security of guest access to the
hardware devices and isolation of different guests accessing the same hardware
device.

The emulator requires a mechanism to map guest addresses to a bus address that
the hardware devices can access.


Clarification of GFN and BFN fields for different guest types
-------------------------------------------------------------
The definition of a Guest Frame Number (GFN) varies depending on the guest type.

The diagram below details the memory accesses originating from the CPU, per guest type:

      HVM guest                              PV guest

         (VA)                                   (VA)
          |                                      |
         MMU                                    MMU
          |                                      |
         (GFN)                                   |
          |                                      | (GFN)
     HAP a.k.a EPT/NPT                           |
          |                                      |
         (MFN)                                  (MFN)
          |                                      |
         RAM                                    RAM

For PV guests GFN is equal to MFN for a single page but not for a contiguous
range of pages.

Bus Frame Numbers (BFN) refer to the address presented on the physical bus
before being translated by the IOMMU.

The diagram below details memory accesses originating from a physical device.

    Physical Device
          |
        (BFN)
          |
       IOMMU-PT
          |
        (MFN)
          |
         RAM



Purpose
=======

1. Allow Xen guests to create/modify/destroy IOMMU mappings for
hardware devices that the PV guest has access to. This enables the PV guest to
program a bus address space mapping which matches its GFN mapping. Once a 1:1
mapping of PFN to bus address space is created, a bounce buffer
region is no longer required for the I/O devices connected to the IOMMU.

2. Allow for Xen guests to lookup/create/modify/destroy IOMMU mappings for
guest memory of domains the calling Xen guest has sufficient privilege over.
This enables domains to provide mediated hardware acceleration to other
guest domains.
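
As an informal illustration of purpose 1, a PV guest could establish a 1:1 BFN
mapping with a loop along the following lines. The precise subop argument
layout is specified separately, so the field names and the pv_iommu_op()
wrapper here are illustrative only:

    /* Illustrative sketch: make bus addresses match pseudo-physical
     * addresses so that no bounce buffering is needed. */
    static int map_identity(unsigned long max_pfn)
    {
        unsigned long pfn;

        for ( pfn = 0; pfn < max_pfn; pfn++ )
        {
            struct pv_iommu_op op = {
                .subop_id = IOMMUOP_map_page,
                .u.map_page = {
                    .bfn = pfn, /* choose BFN == PFN... */
                    .gfn = pfn, /* ...for the guest frame backing that PFN */
                },
            };

            if ( pv_iommu_op(&op, 1) )
                return -1; /* fall back to bounce buffering */
        }

        return 0;
    }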


General principles for PV IOMMU interface
=========================================

There are two different usage models for the BFN address space of a calling
guest based upon the two purposes specified in the section above.

A calling guest may use their BFN address space for only one of the purposes
detailed above and so the PV IOMMU interface has a subop per usage model.
Furthermore, IOMMU mapping of foreign domain memory is more complex than
IOMMU mapping of local domain memory, and separating the subops allows the
complexity to be split in the implementation.

The PV IOMMU design allows the calling domain to control its BFN memory map.
Thus the design also assigns it the responsibility of ensuring that a BFN
address mapped for local domain memory is not reused for foreign domain
memory without an explicit unmap of the BFN address first. This simplifies
the usage of the API, and the extra overhead for calling domains should be
minimal as they should already be tracking BFN address space usage.


Emulator usage of PV IOMMU interface
====================================

Emulators which require bus address mappings of guest RAM must first determine
whether it is possible for the domain to control the bus addresses itself.

An IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If this
flag is set then the emulator may specify the BFN address it wishes guest RAM to
be mapped to via the IOMMUOP_map_foreign_page subop. If the flag is not set
then the emulator must use BFN addresses supplied by Xen via the
IOMMUOP_lookup_foreign_page subop.

Operating systems which use the IOMMUOP_map_page subop are expected to provide a
common interface for emulators to use. Otherwise emulators will not be aware
of existing BFN mappings created by the operating system and will get failed
subops due to conflicts in the BFN address space for the domain.

Emulators should unmap unused GFN mappings as often as possible using
IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
quickly and efficiently.

Emulators should conform to the ballooning behaviour described in the section
"IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that guest
domains are able to effectively balloon memory out and in.

Emulators must unmap any active BFN mappings when they shutdown.

IOMMUOP_*_foreign_page interactions with guest domain ballooning
================================================================

Guest domains can balloon out a set of GFN mappings at any time and render the
BFN to GFN mapping invalid.

When a BFN to GFN mapping becomes invalid, Xen will issue a buffered I/O request
of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with the now invalid
BFN address in the data field. If the buffered I/O request ring is full then a
standard (synchronous) I/O request of type IOREQ_TYPE_INVALIDATE will be issued
to the affected IOREQ server with the just-invalidated BFN address in the data
field.
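
For illustration, an emulator might react to such a request as sketched below.
Only IOREQ_TYPE_INVALIDATE, the use of the data field and the subop come from
this design; the hypercall wrapper and handler shape are assumptions:

    /* Sketch: emulator handling of an IOREQ_TYPE_INVALIDATE request.
     * pv_iommu_hypercall() is an assumed wrapper around the iommu_op
     * hypercall; the handler signature is illustrative only. */
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);

    void handle_invalidate(const ioreq_t *req, ioservid_t my_ioserver)
    {
        struct pv_iommu_op op = {
            .subop_id = IOMMUOP_unmap_foreign_page,
            .u.unmap_foreign_page = {
                .bfn      = req->data,   /* BFN invalidated by the balloon */
                .ioserver = my_ioserver,
            },
        };

        /* Drops the emulator's reference so the MFN can be freed. */
        pv_iommu_hypercall(&op, 1);
    }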

The BFN mappings cannot simply be unmapped at the point of the balloon
hypercall, otherwise a malicious guest could deliberately balloon out a GFN
address in use by an emulator and trigger IOMMU faults for the domains with
BFN mappings.

For hosts with no IOMMU support: The affected emulator(s) must explicitly
issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so
that the references to the underlying MFN are removed and the MFN can be freed
back to the Xen memory allocator.

For hosts with IOMMU support:
If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
IOMMUOP_map_foreign_page subop then the affected emulator(s) must explicitly
issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address so
that the references to the underlying MFN are removed.

If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
IOMMUOP_map_foreign_page subop by all emulators with mappings of that GFN, then
the BFN mapping will be swapped to point at a scratch MFN page, and all BFN
references to the invalid MFN will be removed by Xen after the BFN mapping has
been updated to point at the scratch MFN page.

The rationale for swapping the BFN mapping to point at scratch pages is to
enable guest domains to balloon quickly without requiring hypercall(s) from
emulators.

Not all BFN mappings can be swapped without potentially causing problems for
the hardware itself (command rings etc.), so the IOMMUOP_swap_mfn flag is used
to allow per-BFN control of Xen's ballooning behaviour.


PV IOMMU interactions with self ballooning
==========================================

A guest should clear any IOMMU mappings it has of its own pages before
releasing a page back to Xen. The guest will also need to re-establish IOMMU
mappings after repopulating a page with the populate_physmap hypercall.
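
A sketch of this ordering is shown below. pv_iommu_hypercall(),
balloon_release_page() and balloon_populate_page() are assumed stand-ins for
the guest's hypercall wrapper and its existing XENMEM_decrease_reservation /
XENMEM_populate_physmap paths; the IOMMU_OP_* masks are assumed definitions
of the flag bits listed later:

    /* Sketch of the ordering a self-ballooning guest must observe. */
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);
    extern void balloon_release_page(uint64_t gfn);
    extern void balloon_populate_page(uint64_t gfn);

    void balloon_out_page(uint64_t gfn, uint64_t bfn)
    {
        struct pv_iommu_op op = {
            .subop_id = IOMMUOP_unmap_page,
            .u.unmap_page.bfn = bfn,
        };

        pv_iommu_hypercall(&op, 1);     /* 1: clear the IOMMU mapping...  */
        balloon_release_page(gfn);      /* 2: ...then release the page    */
    }

    void balloon_in_page(uint64_t gfn, uint64_t bfn)
    {
        struct pv_iommu_op op = {
            .subop_id = IOMMUOP_map_page,
            .flags = IOMMU_OP_readable | IOMMU_OP_writeable,
            .u.map_page = { .bfn = bfn, .gfn = gfn },
        };

        balloon_populate_page(gfn);     /* 1: repopulate the GFN...       */
        pv_iommu_hypercall(&op, 1);     /* 2: ...then restore the mapping */
    }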

PV guests must clear any IOMMU mappings before pinning page table pages
because the IOMMU mappings will take a writable reference count and this will
prevent page table pinning.


Security Implications of allowing domain IOMMU control
======================================================

Xen currently allows I/O devices attached to the hardware domain to have
direct access to all of the MFN address space (except Xen hypervisor memory
regions), provided the Xen IOMMU option dom0-strict is not enabled.

The PV IOMMU feature provides the same level of access to MFN address space
and the feature is not enabled when the Xen IOMMU option dom0-strict is
enabled. Therefore security is not degraded by the PV IOMMU feature.

Domains with physical device(s) assigned which are not hardware domains are only
allowed to map their own GFNs or GFNs for domain(s) they have privilege over.


PV IOMMU interactions with grant map/unmap operations
=====================================================

Grant map operations return a physical-device-accessible address (BFN) if the
GNTMAP_device_map flag is set. This operation currently returns the MFN for PV
guests, which may conflict with the BFN address space the guest uses if PV
IOMMU map support is available to the guest.

This design proposes to allow the calling domain to control the BFN address that
a grant map operation uses.

This can be achieved by specifying that the dev_bus_addr field in the
gnttab_map_grant_ref structure is used as an input parameter instead of the
output parameter it currently is.

Only PAGE_SIZE aligned addresses are allowed for the dev_bus_addr input
parameter.

The revised structure is shown below for convenience.

    struct gnttab_map_grant_ref {
        /* IN parameters. */
        uint64_t host_addr;
        uint32_t flags;               /* GNTMAP_* */
        grant_ref_t ref;
        domid_t  dom;
        /* OUT parameters. */
        int16_t  status;              /* => enum grant_status */
        grant_handle_t handle;
        /* IN/OUT parameters */
        uint64_t dev_bus_addr;
    };


The grant map operation would then behave similarly to the IOMMUOP_map_page
subop for the creation of the IOMMU mapping.

The grant unmap operation would then behave similarly to the IOMMUOP_unmap_page
subop for the removal of the IOMMU mapping.

A new grant map flag would be used to indicate that the domain is requesting
the dev_bus_addr field be used as an input parameter.


    #define _GNTMAP_request_bfn_map      (6)
    #define GNTMAP_request_bfn_map   (1<<_GNTMAP_request_bfn_map)
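
A sketch of a grant mapping request under the proposed semantics is shown
below; chosen_bfn is assumed to come from the caller's own BFN space tracking
and must be page aligned:

    /* Sketch: requesting a specific bus address for a grant mapping under
     * the proposed GNTMAP_request_bfn_map semantics. */
    void map_grant_at_bfn(uint64_t host_va, grant_ref_t gref,
                          domid_t remote_domid, uint64_t chosen_bfn)
    {
        struct gnttab_map_grant_ref map = {
            .host_addr    = host_va,
            .flags        = GNTMAP_host_map | GNTMAP_device_map |
                            GNTMAP_request_bfn_map,
            .ref          = gref,
            .dom          = remote_domid,
            .dev_bus_addr = chosen_bfn << PAGE_SHIFT,  /* now an IN field */
        };

        HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &map, 1);
        /* map.status and map.handle are returned exactly as before. */
    }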


Xen PV-IOMMU Architecture
=========================

The Xen architecture consists of a new hypercall interface and changes to the
grant map interface.

The existing IOMMU mappings setup at domain creation time will be preserved so
that PV domains unaware of this feature will continue to function with no
changes required.

Memory ballooning will be supported by taking an additional reference on the
MFN backing the GFN for each successful IOMMU mapping created.

An M2B tracking structure will be used to ensure all references to an MFN can
be located efficiently.

Xen PV IOMMU hypercall interface
--------------------------------
A two-argument hypercall interface (do_iommu_op).

    ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)

First argument: a guest handle pointing to an array of `struct pv_iommu_op`.

Second argument: an unsigned integer count of `struct pv_iommu_op` elements in
the array.

Definition of `struct pv_iommu_op`:

    struct pv_iommu_op {

        uint16_t subop_id;
        uint16_t flags;
        int32_t status;

        union {
            struct {
                uint64_t bfn;
                uint64_t gfn;
            } map_page;

            struct {
                uint64_t bfn;
            } unmap_page;

            struct {
                uint64_t bfn;
                uint64_t gfn;
                uint16_t domid;
                ioservid_t ioserver;
            } map_foreign_page;

            struct {
                uint64_t bfn;
                uint64_t gfn;
                uint16_t domid;
                ioservid_t ioserver;
            } lookup_foreign_page;

            struct {
                uint64_t bfn;
                ioservid_t ioserver;
            } unmap_foreign_page;
        } u;
    };

Definition of PV IOMMU subops:

    #define IOMMUOP_query_caps            1
    #define IOMMUOP_map_page              2
    #define IOMMUOP_unmap_page            3
    #define IOMMUOP_map_foreign_page      4
    #define IOMMUOP_lookup_foreign_page   5
    #define IOMMUOP_unmap_foreign_page    6


Design considerations for hypercall op
--------------------------------------
IOMMU map/unmap operations can be slow and can involve flushing the IOMMU TLB
to ensure the I/O device uses the updated mappings.

The op has been designed to take an array of operations and a count as
parameters. This allows for easily implemented hypercall continuations to be
used and allows for batches of IOMMU operations to be submitted before flushing
the IOMMU TLB.

The `subop_id` to be used for a particular element is encoded into the element
itself. This allows for map and unmap operations to be performed in one
hypercall and for the IOMMU TLB flushing optimisations to still be applied.

The hypercall will ensure that the required IOMMU TLB flushes are applied
before returning to the guest, via either hypercall completion or a hypercall
continuation.
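
As an illustration of the batching this enables, the sketch below replaces one
mapping with another in a single hypercall, incurring a single flush.
pv_iommu_hypercall() and the IOMMU_OP_* masks are assumed definitions; the
subops and fields are those defined above:

    /* Sketch: batching mixed subops so the IOTLB is flushed only once. */
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);

    void remap_bfn(uint64_t old_bfn, uint64_t new_bfn, uint64_t gfn)
    {
        struct pv_iommu_op ops[2] = {
            {
                .subop_id = IOMMUOP_unmap_page,
                .u.unmap_page.bfn = old_bfn,
            },
            {
                .subop_id = IOMMUOP_map_page,
                .flags = IOMMU_OP_readable | IOMMU_OP_writeable,
                .u.map_page = { .bfn = new_bfn, .gfn = gfn },
            },
        };

        /* One hypercall, one IOMMU TLB flush before returning to the guest. */
        pv_iommu_hypercall(ops, 2);
    }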

IOMMUOP_query_caps
------------------

This subop queries the runtime capabilities of the PV-IOMMU interface for the
specific calling domain. This subop uses `struct pv_iommu_op` directly.

------------------------------------------------------------------------------
Field          Purpose
-----          ---------------------------------------------------------------
`flags`        [out] This field details the IOMMUOP capabilities.

`status`       [out] Status of this op, op specific values listed below
------------------------------------------------------------------------------

Defined bits for flags field:

------------------------------------------------------------------------------
Name                        Bit                Definition
----                       ------     ----------------------------------
IOMMU_QUERY_map_cap          0        IOMMUOP_map_page or
                                      IOMMUOP_map_foreign_page can be used
                                      for the calling domain

IOMMU_QUERY_map_all_mfns     1        IOMMUOP_map_page subop can map any MFN
                                      not used by Xen

Reserved for future use     2-9                   n/a

IOMMU_page_order           10-15      Returns maximum possible page order for
                                      all other IOMMUOP subops
------------------------------------------------------------------------------

Defined values for query_caps subop status field:

Value   Reason
------  ----------------------------------------------------------
0       subop successfully returned
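
For illustration, a caller might probe the interface as sketched below. The
wrapper is an assumption; bit 0 (IOMMU_QUERY_map_cap) and bits 10-15
(IOMMU_page_order) are from the table above:

    /* Sketch: start-of-day capability probe. Returns non-zero if the
     * caller may pick its own BFN addresses; *max_order receives the
     * IOMMU_page_order value from bits 10-15 of the flags field. */
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);

    int pv_iommu_can_map(unsigned int *max_order)
    {
        struct pv_iommu_op op = { .subop_id = IOMMUOP_query_caps };

        if (pv_iommu_hypercall(&op, 1) || op.status)
            return 0;

        *max_order = (op.flags >> 10) & 0x3f;   /* IOMMU_page_order */
        return !!(op.flags & 1);                /* IOMMU_QUERY_map_cap */
    }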

IOMMUOP_map_page
----------------
This subop uses `struct map_page` part of the `struct pv_iommu_op`.

If IOMMU dom0-strict mode is NOT enabled then the hardware domain is
allowed to map all GFNs except those backed by Xen-owned MFNs; otherwise the
hardware domain is only allowed to map GFNs which it owns.

Additionally, if IOMMU dom0-strict mode is NOT enabled then the hardware
domain is allowed to map GFNs without taking a reference to the MFN backing
the GFN, by setting the IOMMU_MAP_OP_no_ref_cnt flag.

Every successful pv_iommu_op will result in an additional page reference being
taken on the MFN backing the GFN except for the condition detailed above.

If the map_op flags indicate a writeable mapping is required then a writeable
page type reference will be taken otherwise a standard page reference will be
taken.

All the following conditions are required to be true for PV IOMMU map
subop to succeed:

1. IOMMU detected and supported by Xen
2. The domain has IOMMU controlled hardware allocated to it
3. If the domain is the hardware domain, the following Xen IOMMU options are
   NOT enabled: dom0-passthrough

This subop's usage of the `struct pv_iommu_op` and `struct map_page` fields
is detailed below:

------------------------------------------------------------------------------
Field          Purpose
-----          ---------------------------------------------------------------
`bfn`          [in]  Bus address frame number (BFN) to be mapped to the
                     specified gfn below

`gfn`          [in]  Guest address frame number for DOMID_SELF

`flags`        [in]  Flags for signalling type of IOMMU mapping to be created,
                     Flags can be combined.

`status`       [out] Mapping status of this op, op specific values listed below
------------------------------------------------------------------------------

Defined bits for flags field:

Name                        Bit                Definition
----                       -----      ----------------------------------
IOMMU_OP_readable            0        Create readable IOMMU mapping
IOMMU_OP_writeable           1        Create writeable IOMMU mapping
IOMMU_MAP_OP_no_ref_cnt      2        IOMMU mapping does not take a reference to
                                      MFN backing BFN mapping
IOMMU_MAP_OP_add_m2b         3        Wildcard M2B mapping added for
                                      lookup_foreign_page to use
Reserved for future use     4-9                   n/a
IOMMU_page_order            10-15     Page order to be used for both gfn and bfn

Defined values for map_page subop status field:

Value   Reason
------  ----------------------------------------------------------------------
0       subop successfully returned
-EIO    IOMMU unit returned error when attempting to map BFN to GFN.
-EPERM  GFN could not be mapped because the GFN belongs to Xen.
-EPERM  Domain is not the hardware domain and GFN does not belong to domain
-EPERM  Domain is the hardware domain, IOMMU dom0-strict mode is enabled and
        GFN does not belong to domain
-EACCES BFN address conflicts with RMRR regions for devices attached to
        DOMID_SELF
-ENOSPC Page order is too large for either BFN, GFN or IOMMU unit
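
As an example of the flags encoding, the sketch below maps a 2MB (order 9)
region with BFN == GFN. The order encoding in bits 10-15 of flags is from the
table above; the wrapper and mask-style flag definitions are assumptions:

    /* Sketch: mapping a contiguous 2MB (order 9) region at BFN == GFN. */
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);

    int map_identity_2mb(uint64_t gfn)
    {
        struct pv_iommu_op op = {
            .subop_id = IOMMUOP_map_page,
            .flags = IOMMU_OP_readable | IOMMU_OP_writeable |
                     (9 << 10),                 /* IOMMU_page_order = 9 */
            .u.map_page = { .bfn = gfn, .gfn = gfn },
        };

        pv_iommu_hypercall(&op, 1);
        return op.status;                       /* 0, -EIO, -EPERM, ... */
    }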

IOMMUOP_unmap_page
------------------
This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.

This subop's usage of the `struct pv_iommu_op` and `struct unmap_page` fields
is detailed below:

--------------------------------------------------------------------
Field          Purpose
-----          -----------------------------------------------------
`bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF

`flags`        [in] Flags for signalling page order of unmap operation

`status`       [out] Mapping status of this unmap operation, 0 indicates success
--------------------------------------------------------------------

Defined bits for flags field:

Name                        Bit                Definition
----                       -----      ----------------------------------
IOMMU_UNMAP_OP_remove_m2b    0        Wildcard M2B mapping removed for
                                      lookup_foreign_page use
Reserved for future use     1-9                   n/a
IOMMU_page_order            10-15     Page order to be used for bfn


Defined values for unmap_page subop status field:

Error code  Reason
----------  ------------------------------------------------------------
0            subop successfully returned
-EIO         IOMMU unit returned error when attempting to unmap BFN.
-ENOSPC      Page order is too large for either BFN address or IOMMU unit
------------------------------------------------------------------------


IOMMUOP_map_foreign_page
------------------------
This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.

It is not valid to use a domid representing the calling domain.

The hypercall will only succeed if the calling domain has sufficient privilege
over the specified domid.

The M2B mechanism maps an MFN to (BFN, domid, ioserver) tuples.

Each successful subop will add to the M2B if there was not an existing identical
M2B entry.

Every new M2B entry will take a reference to the MFN backing the GFN.

All the following conditions are required to be true for PV IOMMU map_foreign
subop to succeed:

1. IOMMU detected and supported by Xen
2. The domain has IOMMU controlled hardware allocated to it
3. The domain is the hardware_domain and the following Xen IOMMU options are
   NOT enabled: dom0-passthrough


This subop's usage of the `struct pv_iommu_op` and `struct map_foreign_page`
fields is detailed below:

--------------------------------------------------------------------
Field          Purpose
-----          -----------------------------------------------------
`domid`        [in] The domain id for which the gfn field applies

`ioserver`     [in] IOREQ server id associated with mapping

`bfn`          [in] Bus address frame number for gfn address

`gfn`          [in] Guest address frame number

`flags`        [in] Details the status of the BFN mapping

`status`       [out] status of this subop, 0 indicates success
--------------------------------------------------------------------

Defined bits for flags field:

Name                         Bit                Definition
----                        -----      ----------------------------------
IOMMUOP_readable              0        BFN IOMMU mapping is readable
IOMMUOP_writeable             1        BFN IOMMU mapping is writeable
IOMMUOP_swap_mfn              2        BFN IOMMU mapping can be safely
                                       swapped to scratch page
Reserved for future use      3-9       Reserved flag bits should be 0
IOMMU_page_order            10-15      Page order to be used for both gfn and bfn

Defined values for map_foreign_page subop status field:

Error code  Reason
----------  ------------------------------------------------------------
0            subop successfully returned
-EIO         IOMMU unit returned error when attempting to map BFN to GFN.
-EPERM       Calling domain does not have sufficient privilege over domid
-EPERM       GFN could not be mapped because the GFN belongs to Xen.
-EPERM       domid maps to DOMID_SELF
-EACCES      BFN address conflicts with RMRR regions for devices attached to
             DOMID_SELF
-ENODEV      Provided ioserver id is not valid
-ENXIO       Provided domid id is not valid
-ENXIO       Provided GFN address is not valid
-ENOSPC      Page order is too large for either BFN, GFN or IOMMU unit
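
An illustrative use by an emulator is sketched below, opting in to
scratch-page swapping so the guest can balloon the GFN out without an unmap
hypercall from the emulator. The wrapper and mask-style flag definitions are
assumptions:

    /* Sketch: an emulator mapping a guest GFN at a BFN it chose itself. */
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);

    int map_guest_page(domid_t domid, ioservid_t ioserver,
                       uint64_t gfn, uint64_t bfn)
    {
        struct pv_iommu_op op = {
            .subop_id = IOMMUOP_map_foreign_page,
            .flags = IOMMUOP_readable | IOMMUOP_writeable | IOMMUOP_swap_mfn,
            .u.map_foreign_page = {
                .bfn      = bfn,
                .gfn      = gfn,
                .domid    = domid,
                .ioserver = ioserver,
            },
        };

        pv_iommu_hypercall(&op, 1);
        return op.status;
    }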

IOMMUOP_lookup_foreign_page
---------------------------
This subop uses `struct lookup_foreign_page` part of the `struct pv_iommu_op`.

This subop looks up a BFN mapping for an ioserver + gfn + target domid
combination.

The hypercall will only succeed if the calling domain has sufficient privilege
over the specified domid.

If a 1:1 mapping of BFN to MFN exists then an M2B entry is added and a
reference is taken to the underlying MFN. If an existing mapping is present
then the BFN is returned and no additional references will be taken to the
underlying MFN.

A 1:1 mapping will exist if there is no IOMMU support or if the PV hardware
domain was booted in dom0-relaxed mode or in dom0-passthrough mode.

If there is no IOMMU support then the MFN is returned in the BFN field (that is
the only valid bus address for the GFN + domid combination).

Each successful subop will add to the M2B if there was not an existing identical
M2B entry.

Every new M2B entry will take a reference to the MFN backing the GFN.

This subop's usage of the `struct pv_iommu_op` and `struct lookup_foreign_page`
fields is detailed below:

--------------------------------------------------------------------
Field          Purpose
-----          -----------------------------------------------------
`domid`        [in] The domain id for which the gfn field applies

`ioserver`     [in] IOREQ server id associated with mapping

`bfn`          [out] Bus address frame number for gfn address

`gfn`          [in] Guest address frame number

`flags`        [out] Details the status of the BFN mapping

`status`       [out] status of this subop, 0 indicates success
--------------------------------------------------------------------

Defined bits for flags field:

Name                         Bit                Definition
----                        -----      ----------------------------------
IOMMUOP_readable              0        Returned BFN IOMMU mapping is readable
IOMMUOP_writeable             1        Returned BFN IOMMU mapping is writeable
Reserved for future use      2-9       Reserved flag bits should be 0
IOMMU_page_order            10-15      Returns maximum possible page order for
                                       all other IOMMUOP subops

Defined values for lookup_foreign_page subop status field:

Error code  Reason
----------  ------------------------------------------------------------
0            subop successfully returned
-EPERM       Calling domain does not have sufficient privilege over domid
-ENOENT      There is no available BFN for provided GFN + domid combination
-ENODEV      Provided ioserver id is not valid
-ENXIO       Provided domid id is not valid
-ENXIO       Provided GFN address is not valid


IOMMUOP_unmap_foreign_page
--------------------------
This subop uses `struct unmap_foreign_page` part of the `struct pv_iommu_op`.

It only allows BFNs acquired via IOMMUOP_map_foreign_page or
IOMMUOP_lookup_foreign_page to be unmapped. If an attempt is made to unmap a
BFN mapped via IOMMUOP_map_page then the subop will fail.

The subop will perform a B2M lookup (IO page table walk) for the calling domain
and then index the M2B using the returned MFN. This is safe because a particular
BFN mapping can only map to one MFN for a particular calling domain.

This subop's usage of the `struct pv_iommu_op` and `struct unmap_foreign_page`
fields is detailed below:

-----------------------------------------------------------------------
Field          Purpose
-----          --------------------------------------------------------
`ioserver`     [in] IOREQ server id associated with mapping

`bfn`          [in] Bus address frame number for gfn address

`flags`        [in] Flags for signalling page order of unmap operation

`status`       [out] status of this subop, 0 indicates success
-----------------------------------------------------------------------

Defined bits for flags field:

Name                        Bit                Definition
----                        -----     ----------------------------------
Reserved for future use     0-9                   n/a
IOMMU_page_order            10-15     Page order to be used for bfn unmapping

Defined values for unmap_foreign_page subop status field:

Error code  Reason
----------  ------------------------------------------------------------
0            subop successfully returned
-ENOENT      An M2B entry was not found for the specified input parameters.


Linux kernel architecture
=========================

The Linux kernel will use the PV-IOMMU hypercalls to map its PFN address
space into the IOMMU. It will map the PFNs to the IOMMU address space using
a 1:1 mapping; it does this by programming a BFN to GFN mapping which matches
the PFN to GFN mapping.

The native SWIOTLB will be used to handle devices which cannot DMA to all of
the kernel's PFN address space.

An interface shall be provided for emulator usage of IOMMUOP_*_foreign_page
subops which will allow the Linux kernel to centrally manage that domain's BFN
resource and ensure there are no unexpected conflicts.

Kernel Map Foreign GFN to BFN interface
---------------------------------------

An array of 'count' elements of `struct pv_iommu_op` will be passed to the
mapping function.

    int map_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)

The calling function will use the `struct map_foreign_page` inside the `struct
pv_iommu_op` and will fill in the domid, gfn and ioserver fields.

The kernel function will reuse the passed-in struct pv_iommu_op for the
hypercall and will set the subop_id field based on the IOMMU_QUERY_map_cap
capability.

If IOMMU_QUERY_map_cap is set then the kernel will allocate a suitable BFN
address, set the bfn field in the op to this address and set the subop_id to
IOMMUOP_map_foreign_page. It will do this for all 'ops' and then issue the
hypercall.

If IOMMU_QUERY_map_cap is NOT set then the kernel will set the subop_id
to IOMMUOP_lookup_foreign_page in all 'ops' and then issue the hypercall.

The calling function should check the status field in each op and if the
status field is 0 then it can use the returned BFN address in each op.
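
A minimal sketch of this logic is shown below; have_map_cap, bfn_alloc() and
pv_iommu_hypercall() are assumed helpers (a cached query_caps result, the
kernel's BFN allocator and a hypercall wrapper respectively):

    /* Minimal sketch of the mapping interface described above. */
    extern int have_map_cap;
    extern uint64_t bfn_alloc(void);
    extern long pv_iommu_hypercall(struct pv_iommu_op *ops, unsigned int count);

    int map_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)
    {
        int i;

        /* Callers have already filled domid, gfn and ioserver in each op. */
        for (i = 0; i < count; i++) {
            if (have_map_cap) {
                ops[i].subop_id = IOMMUOP_map_foreign_page;
                ops[i].u.map_foreign_page.bfn = bfn_alloc();
            } else {
                ops[i].subop_id = IOMMUOP_lookup_foreign_page;
            }
        }

        /* Callers must check each op's status; 0 means the bfn is usable. */
        return pv_iommu_hypercall(ops, count);
    }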


Kernel Unmap Foreign GFN to BFN interface
-----------------------------------------

An array of 'count' elements of `struct pv_iommu_op` will be passed to the
unmapping function.

    int unmap_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)

The calling function will use the `struct unmap_foreign_page` inside the `struct
pv_iommu_op` and will fill in the bfn field.

The kernel function will set the subop_id field to IOMMUOP_unmap_foreign_page
in each op and then issue the hypercall.

The calling function should check the status field in each op and if the
status field is 0 then the BFN has been successfully unmapped.
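
A matching sketch for the unmap path, under the same assumptions:

    /* Minimal sketch of the unmapping interface described above. */
    int unmap_foreign_gfn_to_bfn(int count, struct pv_iommu_op *ops)
    {
        int i;

        /* Callers have already filled the bfn field in each op. */
        for (i = 0; i < count; i++)
            ops[i].subop_id = IOMMUOP_unmap_foreign_page;

        /* A status of 0 in an op means that BFN has been unmapped. */
        return pv_iommu_hypercall(ops, count);
    }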


* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-23  5:17       ` Tian, Kevin
@ 2018-02-23  9:41         ` Paul Durrant
  2018-02-24  2:57           ` Tian, Kevin
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-23  9:41 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 23 February 2018 05:17
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> __HYPERCALL_iommu_op
> 
> > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > Sent: Tuesday, February 13, 2018 5:23 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > Sent: 13 February 2018 06:43
> > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > devel@lists.xenproject.org
> > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > __HYPERCALL_iommu_op
> > >
> > > > From: Paul Durrant
> > > > Sent: Monday, February 12, 2018 6:47 PM
> > > >
> > > > This patch introduces the boilerplate for a new hypercall to allow a
> > > > domain to control IOMMU mappings for its own pages.
> > > > Whilst there is duplication of code between the native and compat
> > entry
> > > > points which appears ripe for some form of combination, I think it is
> > > > better to maintain the separation as-is because the compat entry point
> > > > will necessarily gain complexity in subsequent patches.
> > > >
> > > > NOTE: This hypercall is only implemented for x86 and is currently
> > > >       restricted by XSM to dom0 since it could be used to cause IOMMU
> > > >       faults which may bring down a host.
> > > >
> > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > [...]
> > > > +
> > > > +
> > > > +static bool can_control_iommu(void)
> > > > +{
> > > > +    struct domain *currd = current->domain;
> > > > +
> > > > +    /*
> > > > +     * IOMMU mappings cannot be manipulated if:
> > > > +     * - the IOMMU is not enabled or,
> > > > +     * - the IOMMU is passed through or,
> > > > +     * - shared EPT configured or,
> > > > +     * - Xen is maintaining an identity map.
> > >
> > > "for dom0"
> > >
> > > > +     */
> > > > +    if ( !iommu_enabled || iommu_passthrough ||
> > > > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> > >
> > > I guess it's clearer to directly check iommu_dom0_strict here
> >
> > Well, the problem with that is that it totally ties this interface to dom0.
> > Whilst, in practice, that is the case at the moment (because of the xsm
> > check) I do want to leave the potential to allow other PV domains to control
> > their IOMMU mappings, if that make sense in future.
> >
> 
> first it's inconsistent from the comments - "Xen is maintaining
> an identity map" which only applies to dom0.

That's not true. If I assign a PCI device to an HVM domain, for instance, then need_iommu() is true for that domain and indeed Xen maintains a 1:1 BFN:GFN map for that domain.

> 
> second I'm afraid !need_iommu is not an accurate condition to represent
> PV domain. what about iommu also enabled for future PV domains?
> 

I don't quite follow... need_iommu is a per-domain flag, set for dom0 when in strict mode, set for others when passing through a device. Either way, if Xen is maintaining the IOMMU pagetables then it is clearly unsafe for the domain to also be messing with them.

  Cheers,

    Paul

> Thanks
> Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-23  9:41         ` Paul Durrant
@ 2018-02-24  2:57           ` Tian, Kevin
  2018-02-26  9:57             ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-24  2:57 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Friday, February 23, 2018 5:41 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 23 February 2018 05:17
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > __HYPERCALL_iommu_op
> >
> > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > Sent: Tuesday, February 13, 2018 5:23 PM
> > >
> > > > -----Original Message-----
> > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > Sent: 13 February 2018 06:43
> > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > devel@lists.xenproject.org
> > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > > __HYPERCALL_iommu_op
> > > >
> > > > > From: Paul Durrant
> > > > > Sent: Monday, February 12, 2018 6:47 PM
> > > > >
> > > > > This patch introduces the boilerplate for a new hypercall to allow a
> > > > > domain to control IOMMU mappings for its own pages.
> > > > > Whilst there is duplication of code between the native and compat
> > > entry
> > > > > points which appears ripe for some form of combination, I think it is
> > > > > better to maintain the separation as-is because the compat entry
> point
> > > > > will necessarily gain complexity in subsequent patches.
> > > > >
> > > > > NOTE: This hypercall is only implemented for x86 and is currently
> > > > >       restricted by XSM to dom0 since it could be used to cause
> IOMMU
> > > > >       faults which may bring down a host.
> > > > >
> > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > > [...]
> > > > > +
> > > > > +
> > > > > +static bool can_control_iommu(void)
> > > > > +{
> > > > > +    struct domain *currd = current->domain;
> > > > > +
> > > > > +    /*
> > > > > +     * IOMMU mappings cannot be manipulated if:
> > > > > +     * - the IOMMU is not enabled or,
> > > > > +     * - the IOMMU is passed through or,
> > > > > +     * - shared EPT configured or,
> > > > > +     * - Xen is maintaining an identity map.
> > > >
> > > > "for dom0"
> > > >
> > > > > +     */
> > > > > +    if ( !iommu_enabled || iommu_passthrough ||
> > > > > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> > > >
> > > > I guess it's clearer to directly check iommu_dom0_strict here
> > >
> > > Well, the problem with that is that it totally ties this interface to dom0.
> > > Whilst, in practice, that is the case at the moment (because of the xsm
> > > check) I do want to leave the potential to allow other PV domains to
> control
> > > their IOMMU mappings, if that make sense in future.
> > >
> >
> > first it's inconsistent from the comments - "Xen is maintaining
> > an identity map" which only applies to dom0.
> 
> That's not true. If I assign a PCI device to an HVM domain, for instance,
> then need_iommu() is true for that domain and indeed Xen maintains a 1:1
> BFN:GFN map for that domain.
> 
> >
> > second I'm afraid !need_iommu is not an accurate condition to represent
> > PV domain. what about iommu also enabled for future PV domains?
> >
> 
> I don't quite follow... need_iommu is a per-domain flag, set for dom0 when
> in strict mode, set for others when passing through a device. Either way, if
> Xen is maintaining the IOMMU pagetables then it is clearly unsafe for the
> domain to also be messing with them.
> 

I don't think it's a mess. Xen always maintains the IOMMU pagetables
in a way that guest expects:

1) for dom0 (w/o pvIOMMU) in strict mode, it's MFN:MFN identity mapping
2) for dom0 (w/ pvIOMMU), it's BFN:MFN mapping
3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN 
4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN

(from IOMMU p.o.v we can always call all 4 categories as BFN:MFN. 
I deliberately separate them from usage p.o.v, where 'BFN'
represents the cases where guest explicitly manages a new address
space - different from physical address space in its mind)

there is an address space switch in 2) and 4) before and after
enabling vIOMMU.

above is why I didn’t follow the assumption that "Xen is maintaining 
an identity map" is identical to need_iommu.

Thanks
Kevin

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-23  9:35         ` Paul Durrant
@ 2018-02-24  3:01           ` Tian, Kevin
  2018-02-26  9:38             ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-24  3:01 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Friday, February 23, 2018 5:35 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 23 February 2018 05:36
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>
> > Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> > unmap pages, and also to flush the IOTLB
> >
> > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > Sent: Tuesday, February 13, 2018 5:56 PM
> > >
> > > > -----Original Message-----
> > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > Sent: 13 February 2018 06:56
> > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > devel@lists.xenproject.org
> > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > > <jbeulich@suse.com>
> > > > Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> > > > unmap pages, and also to flush the IOTLB
> > > >
> > > > > From: Paul Durrant
> > > > > Sent: Monday, February 12, 2018 6:47 PM
> > > > >
> > > > > This patch adds iommu_ops to allow a domain with control_iommu
> > > > > privilege
> > > > > to map and unmap pages from any guest over which it has mapping
> > > > > privilege
> > > > > in the IOMMU.
> > > > > These operations implicitly disable IOTLB flushing so that the caller
> can
> > > > > batch operations and then explicitly flush the IOTLB using the
> > > iommu_op
> > > > > also added by this patch.
> > > >
> > > > given that last discussion is 2yrs ago and you said actual
> implementation
> > > > already biased from original spec, it'd be difficult to judge whether
> > > current
> > > > change is sufficient or just 1st step. Could you summarize what have
> > > > been changed from last spec, and also any further tasks in your TODO
> > list?
> > >
> > > Kevin,
> > >
> > > The main changes are:
> > >
> > > - there is no op to query mapping capability... instead the hypercall will
> fail
> > > with -EACCES
> > > - there is no longer an option to avoid reference counting map and
> unmap
> > > operations
> > > - there are no longer separate ops for mapping local and remote pages
> > > (DOMID_SELF should be passed to the map op for local pages), and ops
> > > always deal with GFNs not MFNs
> > >   - also I have dropped the idea of a global m2b map, so...
> > >   - it is now going to be the responsibility of the code running in the
> > > mapping domain to track what it has mapped [1]
> > > - there is no illusion that pages other 4k are supported at the moment
> > > - the flush operation is now explicit
> > >
> > > [1] this would be an issue if the interface becomes usable for anything
> > > other than dom0 as we'd also need something in Xen to release the
> page
> > > refs if the domain was forcibly destroyed, but I think the m2b was the
> > > wrong solution since it necessitates a full scan of *host* RAM on any
> > > domain destruction
> > >
> > > The main item on my TODO list is to implement a new IOREQ to allow
> > > invalidation of specific guest pages. Think of the current 'invalidate map
> > > cache' as a global flush... I need a specific flush so that a
> > > decrease_reservation hypercall issued by a guest can instead tell
> emulators
> > > exactly which pages are being removed from guest. It is then the
> > emulators'
> > > responsibilities to unmap those pages if they had them mapped (either
> > > through MMU or IOMMU) which then drop page refs and actually allow
> the
> > > pages to be recycled.
> > >
> > > I will, of course, need to come up with more Linux code to test all this,
> > > which will eventually lead to kernel and user APIs to allow emulators
> > > running in dom0 to IOMMU map guest pages.
> >
> > Thanks for elaboration. I didn't find original proposal. Can you
> > attach or point me to a link?
> >
> 
> FWIW, I've attached Malcolm's original for reference.

Thanks. I'll have a read.

> 
> > >
> > > >
> > > > at least just map/unmap operations definitely not meet XenGT
> > > > requirement...
> > > >
> > >
> > > What aspect of the hypercall interface does not meet XenGT's
> > > requirements? It would be good to know now then I can make any
> > > necessary adjustments in v2.
> > >
> >
> > XenGT needs to replace GFN with BFN into shadow GPU page table
> > for a given domain.
> 
> I assume xengt would be dynamically mapping the gfn at this point...
> 
> > Previously iirc there is a query interface for such
> > purpose, since the mapping is managed by hypervisor. Based on above
> > description (e.g. m2b), did you intend to let Dom0 pvIOMMU driver
> > manage all related mapping information thus GVT-g just consults
> > pvIOMMU driver for such purpose?
> >
> 
> ...so my plan is that the dom0 API picks a bfn, does the mapping and then
> passes the bfn back to the caller.
> 

A curious question. How to pass the domid from XenGT to pvIOMMU
driver so the latter knows whether it's a local or remote mapping? 
Ideally pvIOMMU driver is registered to Linux IOMMU core layer,
of which all existing API wrappers are only for local mapping today...

Thanks
Kevin

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-24  3:01           ` Tian, Kevin
@ 2018-02-26  9:38             ` Paul Durrant
  0 siblings, 0 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-26  9:38 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 24 February 2018 03:02
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>
> Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> unmap pages, and also to flush the IOTLB
[snip]
> 
> >
> > ...so my plan is that the dom0 API picks a bfn, does the mapping and then
> > passes the bfn back to the caller.
> >
> 
> A curious question. How to pass the domid from XenGT to pvIOMMU
> driver so the latter knows whether it's a local or remote mapping?
> Ideally pvIOMMU driver is registered to Linux IOMMU core layer,
> of which all existing API wrappers are only for local mapping today...
> 

It may well be that, for remote mapping, that layer is just not appropriate. I'll look into that when I get to that stage.

Cheers,

    Paul

> Thanks
> Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-24  2:57           ` Tian, Kevin
@ 2018-02-26  9:57             ` Paul Durrant
  2018-02-26 11:55               ` Tian, Kevin
  2018-02-27  5:05               ` Tian, Kevin
  0 siblings, 2 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-26  9:57 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: 24 February 2018 02:57
> To: Paul Durrant <Paul.Durrant@citrix.com>; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> __HYPERCALL_iommu_op
> 
> > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > Sent: Friday, February 23, 2018 5:41 PM
> >
> > > -----Original Message-----
> > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > Sent: 23 February 2018 05:17
> > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > devel@lists.xenproject.org
> > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > __HYPERCALL_iommu_op
> > >
> > > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > > Sent: Tuesday, February 13, 2018 5:23 PM
> > > >
> > > > > -----Original Message-----
> > > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > > Sent: 13 February 2018 06:43
> > > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > > devel@lists.xenproject.org
> > > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > > > __HYPERCALL_iommu_op
> > > > >
> > > > > > From: Paul Durrant
> > > > > > Sent: Monday, February 12, 2018 6:47 PM
> > > > > >
> > > > > > This patch introduces the boilerplate for a new hypercall to allow a
> > > > > > domain to control IOMMU mappings for its own pages.
> > > > > > Whilst there is duplication of code between the native and compat
> > > > entry
> > > > > > points which appears ripe for some form of combination, I think it is
> > > > > > better to maintain the separation as-is because the compat entry
> > point
> > > > > > will necessarily gain complexity in subsequent patches.
> > > > > >
> > > > > > NOTE: This hypercall is only implemented for x86 and is currently
> > > > > >       restricted by XSM to dom0 since it could be used to cause
> > IOMMU
> > > > > >       faults which may bring down a host.
> > > > > >
> > > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > > > [...]
> > > > > > +
> > > > > > +
> > > > > > +static bool can_control_iommu(void)
> > > > > > +{
> > > > > > +    struct domain *currd = current->domain;
> > > > > > +
> > > > > > +    /*
> > > > > > +     * IOMMU mappings cannot be manipulated if:
> > > > > > +     * - the IOMMU is not enabled or,
> > > > > > +     * - the IOMMU is passed through or,
> > > > > > +     * - shared EPT configured or,
> > > > > > +     * - Xen is maintaining an identity map.
> > > > >
> > > > > "for dom0"
> > > > >
> > > > > > +     */
> > > > > > +    if ( !iommu_enabled || iommu_passthrough ||
> > > > > > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> > > > >
> > > > > I guess it's clearer to directly check iommu_dom0_strict here
> > > >
> > > > Well, the problem with that is that it totally ties this interface to dom0.
> > > > Whilst, in practice, that is the case at the moment (because of the xsm
> > > > check) I do want to leave the potential to allow other PV domains to
> > control
> > > > their IOMMU mappings, if that make sense in future.
> > > >
> > >
> > > first it's inconsistent from the comments - "Xen is maintaining
> > > an identity map" which only applies to dom0.
> >
> > That's not true. If I assign a PCI device to an HVM domain, for instance,
> > then need_iommu() is true for that domain and indeed Xen maintains a 1:1
> > BFN:GFN map for that domain.
> >
> > >
> > > second I'm afraid !need_iommu is not an accurate condition to represent
> > > PV domain. what about iommu also enabled for future PV domains?
> > >
> >
> > I don't quite follow... need_iommu is a per-domain flag, set for dom0 when
> > in strict mode, set for others when passing through a device. Either way, if
> > Xen is maintaining the IOMMU pagetables then it is clearly unsafe for the
> > domain to also be messing with them.
> >
> 
> I don't think it's a mess. Xen always maintains the IOMMU pagetables
> in a way that guest expects:
> 

I'll define some terms to try to avoid confusion...

- where the IOMMU code in Xen maintains a map such that BFN == MFN, let’s call this an 'identity MFN map'
- where the IOMMU code in Xen *initially programmes* the IOMMU with an identity MFN map for the whole host, let's call this a 'host map'
- where the IOMMU code in Xen maintains a map such that BFN == GFN, let's call this an 'identity GFN map'
- where the IOMMU code in Xen *initially programmes* the IOMMU with an identity GFN map for the guest, let's call this a 'guest map'

> 1) for dom0 (w/o pvIOMMU) in strict mode, it's MFN:MFN identity mapping

Without strict mode, a host map is set up for dom0, otherwise it is an identity MFN map. In both cases the xen-swiotlb driver is used in Linux as there is no difference from its point of view.

> 2) for dom0 (w/ pvIOMMU), it's BFN:MFN mapping

With PV-IOMMU there is also a host map but since a host map is only initialized and not maintained (i.e. nothing happens when pages are removed from or added to dom0) then it is safe for dom0 to control the IOMMU mappings as it will not conflict with anything Xen is doing.

> 3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN

I have not been following virtual VTd closely but, yes, as it stands *when h/w is passed through* the guest gets an identity GFN map otherwise it gets no map at all.

> 4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN
> 

With virtual VTd I'd expect there would be a guest map and then the guest would get the same level of control over the IOMMU that PV-IOMMU allows for a PV domain but, of course, such control is as-yet unsafe for guests since an IOMMU fault can cause a host crash.

> (from IOMMU p.o.v we can always call all 4 categories as BFN:MFN.
> I deliberately separate them from usage p.o.v, where 'BFN'
> represents the cases where guest explicitly manages a new address
> space - different from physical address space in its mind)
> 
> there is an address space switch in 2) and 4) before and after
> enabling vIOMMU.

Is there? The initial mapping in 2 is the same as 1, and the initial mapping in 4 is the same as 3.

> 
> above is why I didn’t follow the assumption that "Xen is maintaining
> an identity map" is identical to need_iommu.
> 

The crucial point is that in cases 2 and 4 Xen is not *maintaining* any map so need_iommu(d) should be false and hence the domain can control its own mappings without interfering with what Xen is doing internally.

Does that help clarify?

Cheers,

  Paul

> Thanks
> Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-26  9:57             ` Paul Durrant
@ 2018-02-26 11:55               ` Tian, Kevin
  2018-02-27  5:05               ` Tian, Kevin
  1 sibling, 0 replies; 68+ messages in thread
From: Tian, Kevin @ 2018-02-26 11:55 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Monday, February 26, 2018 5:57 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 24 February 2018 02:57
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > __HYPERCALL_iommu_op
> >
> > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > Sent: Friday, February 23, 2018 5:41 PM
> > >
> > > > -----Original Message-----
> > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > Sent: 23 February 2018 05:17
> > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > devel@lists.xenproject.org
> > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > > __HYPERCALL_iommu_op
> > > >
> > > > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > > > Sent: Tuesday, February 13, 2018 5:23 PM
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > > > Sent: 13 February 2018 06:43
> > > > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > > > devel@lists.xenproject.org
> > > > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > > > <wei.liu2@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>;
> > > > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan
> Beulich
> > > > > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > > > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > > > > __HYPERCALL_iommu_op
> > > > > >
> > > > > > > From: Paul Durrant
> > > > > > > Sent: Monday, February 12, 2018 6:47 PM
> > > > > > >
> > > > > > > This patch introduces the boilerplate for a new hypercall to allow
> a
> > > > > > > domain to control IOMMU mappings for its own pages.
> > > > > > > Whilst there is duplication of code between the native and
> compat
> > > > > entry
> > > > > > > points which appears ripe for some form of combination, I think
> it is
> > > > > > > better to maintain the separation as-is because the compat entry
> > > point
> > > > > > > will necessarily gain complexity in subsequent patches.
> > > > > > >
> > > > > > > NOTE: This hypercall is only implemented for x86 and is currently
> > > > > > >       restricted by XSM to dom0 since it could be used to cause
> > > IOMMU
> > > > > > >       faults which may bring down a host.
> > > > > > >
> > > > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > > > > [...]
> > > > > > > +
> > > > > > > +
> > > > > > > +static bool can_control_iommu(void)
> > > > > > > +{
> > > > > > > +    struct domain *currd = current->domain;
> > > > > > > +
> > > > > > > +    /*
> > > > > > > +     * IOMMU mappings cannot be manipulated if:
> > > > > > > +     * - the IOMMU is not enabled or,
> > > > > > > +     * - the IOMMU is passed through or,
> > > > > > > +     * - shared EPT configured or,
> > > > > > > +     * - Xen is maintaining an identity map.
> > > > > >
> > > > > > "for dom0"
> > > > > >
> > > > > > > +     */
> > > > > > > +    if ( !iommu_enabled || iommu_passthrough ||
> > > > > > > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> > > > > >
> > > > > > I guess it's clearer to directly check iommu_dom0_strict here
> > > > >
> > > > > Well, the problem with that is that it totally ties this interface to
> dom0.
> > > > > Whilst, in practice, that is the case at the moment (because of the
> xsm
> > > > > check) I do want to leave the potential to allow other PV domains to
> > > control
> > > > > their IOMMU mappings, if that make sense in future.
> > > > >
> > > >
> > > > first it's inconsistent from the comments - "Xen is maintaining
> > > > an identity map" which only applies to dom0.
> > >
> > > That's not true. If I assign a PCI device to an HVM domain, for instance,
> > > then need_iommu() is true for that domain and indeed Xen maintains a
> 1:1
> > > BFN:GFN map for that domain.
> > >
> > > >
> > > > second I'm afraid !need_iommu is not an accurate condition to
> represent
> > > > PV domain. what about iommu also enabled for future PV domains?
> > > >
> > >
> > > I don't quite follow... need_iommu is a per-domain flag, set for dom0
> when
> > > in strict mode, set for others when passing through a device. Either way,
> if
> > > Xen is maintaining the IOMMU pagetables then it is clearly unsafe for
> the
> > > domain to also be messing with them.
> > >
> >
> > I don't think it's a mess. Xen always maintains the IOMMU pagetables
> > in a way that guest expects:
> >
> 
> I'll define some terms to try to avoid confusion...
> 
> - where the IOMMU code in Xen maintains a map such that BFN == MFN,
> let’s call this an 'identity MFN map'
> - where the IOMMU code in Xen *initially programmes* the IOMMU with
> an identity MFN map for the whole host, let's call this a 'host map'
> - where the IOMMU code in Xen maintains a map such that BFN == GFN,
> let's call this an 'identity GFN map'
> - where the IOMMU code in Xen *initially programmes* the IOMMU with
> an identity GFN map for the guest, let's call this a 'guest map'
> 
> > 1) for dom0 (w/o pvIOMMU) in strict mode, it's MFN:MFN identity
> > mapping
> 
> Without strict mode, a host map is set up for dom0, otherwise it is an
> identity MFN map. In both cases the xen-swiotlb driver is used in Linux
> as there is no difference from its point of view.
> 
> > 2) for dom0 (w/ pvIOMMU), it's BFN:MFN mapping
> 
> With PV-IOMMU there is also a host map but since a host map is only
> initialized and not maintained (i.e. nothing happens when pages are
> removed from or added to dom0) then it is safe for dom0 to control the
> IOMMU mappings as it will not conflict with anything Xen is doing.
> 
> > 3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN
> 
> I have not been following virtual VTd closely but, yes, as it stands *when
> h/w is passed through* the guest gets an identity GFN map otherwise it
> gets no map at all.
> 
> > 4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN
> >
> 
> With virtual VTd I'd expect there would be a guest map and then the guest
> would get the same level of control over the IOMMU that PV-IOMMU
> allows for a PV domain but, of course, such control is as-yet unsafe for
> guests since an IOMMU fault can cause a host crash.
> 
> > (from IOMMU p.o.v we can always call all 4 categories as BFN:MFN.
> > I deliberately separate them from usage p.o.v, where 'BFN'
> > represents the cases where guest explicitly manages a new address
> > space - different from physical address space in its mind)
> >
> > there is an address space switch in 2) and 4) before and after
> > enabling vIOMMU.
> 
> Is there? The initial mapping in 2 is the same as 1, and the initial mapping in
> 4 is the same as 3.
> 
> >
> > above is why I didn’t follow the assumption that "Xen is maintaining
> > an identity map" is identical to need_iommu.
> >
> 
> The crucial point is that in cases 2 and 4 Xen is not *maintaining* any map
> so need_iommu(d) should be false and hence the domain can control its
> own mappings without interfering with what Xen is doing internally.
> 
> Does that help clarify?
> 

Well, now I can see where the confusion comes from. What I described
earlier was the actual mapping relationship in the IOMMU page table.
It's also a sort of map maintained by Xen... :-)

Thanks
Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-26  9:57             ` Paul Durrant
  2018-02-26 11:55               ` Tian, Kevin
@ 2018-02-27  5:05               ` Tian, Kevin
  2018-02-27  9:32                 ` Paul Durrant
  1 sibling, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-27  5:05 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Monday, February 26, 2018 5:57 PM
> 
> > -----Original Message-----
> > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > Sent: 24 February 2018 02:57
> > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> devel@lists.xenproject.org
> > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > __HYPERCALL_iommu_op
> >
> > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > Sent: Friday, February 23, 2018 5:41 PM
> > >
> > > > -----Original Message-----
> > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > Sent: 23 February 2018 05:17
> > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > devel@lists.xenproject.org
> > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>;
> > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan Beulich
> > > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > > __HYPERCALL_iommu_op
> > > >
> > > > > From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> > > > > Sent: Tuesday, February 13, 2018 5:23 PM
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Tian, Kevin [mailto:kevin.tian@intel.com]
> > > > > > Sent: 13 February 2018 06:43
> > > > > > To: Paul Durrant <Paul.Durrant@citrix.com>; xen-
> > > > > devel@lists.xenproject.org
> > > > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
> > > > > > <wei.liu2@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>;
> > > > > > Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> > > > > > <Ian.Jackson@citrix.com>; Tim (Xen.org) <tim@xen.org>; Jan
> Beulich
> > > > > > <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> > > > > > Subject: RE: [Xen-devel] [PATCH 5/7] public / x86: introduce
> > > > > > __HYPERCALL_iommu_op
> > > > > >
> > > > > > > From: Paul Durrant
> > > > > > > Sent: Monday, February 12, 2018 6:47 PM
> > > > > > >
> > > > > > > This patch introduces the boilerplate for a new hypercall to
> > > > > > > allow a domain to control IOMMU mappings for its own pages.
> > > > > > > Whilst there is duplication of code between the native and
> > > > > > > compat entry points which appears ripe for some form of
> > > > > > > combination, I think it is better to maintain the separation
> > > > > > > as-is because the compat entry point will necessarily gain
> > > > > > > complexity in subsequent patches.
> > > > > > >
> > > > > > > NOTE: This hypercall is only implemented for x86 and is
> > > > > > >       currently restricted by XSM to dom0 since it could be
> > > > > > >       used to cause IOMMU faults which may bring down a host.
> > > > > > >
> > > > > > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > > > > [...]
> > > > > > > +
> > > > > > > +
> > > > > > > +static bool can_control_iommu(void)
> > > > > > > +{
> > > > > > > +    struct domain *currd = current->domain;
> > > > > > > +
> > > > > > > +    /*
> > > > > > > +     * IOMMU mappings cannot be manipulated if:
> > > > > > > +     * - the IOMMU is not enabled or,
> > > > > > > +     * - the IOMMU is passed through or,
> > > > > > > +     * - shared EPT configured or,
> > > > > > > +     * - Xen is maintaining an identity map.
> > > > > >
> > > > > > "for dom0"
> > > > > >
> > > > > > > +     */
> > > > > > > +    if ( !iommu_enabled || iommu_passthrough ||
> > > > > > > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> > > > > >
> > > > > > I guess it's clearer to directly check iommu_dom0_strict here
> > > > >
> > > > > Well, the problem with that is that it totally ties this interface
> > > > > to dom0. Whilst, in practice, that is the case at the moment
> > > > > (because of the XSM check) I do want to leave the potential to
> > > > > allow other PV domains to control their IOMMU mappings, if that
> > > > > makes sense in future.
> > > > >
> > > >
> > > > First, it's inconsistent with the comment - "Xen is maintaining
> > > > an identity map" - which only applies to dom0.
> > >
> > > That's not true. If I assign a PCI device to an HVM domain, for
> > > instance, then need_iommu() is true for that domain and indeed Xen
> > > maintains a 1:1 BFN:GFN map for that domain.
> > >
> > > >
> > > > Second, I'm afraid !need_iommu is not an accurate condition to
> > > > represent a PV domain. What about the IOMMU also being enabled
> > > > for future PV domains?
> > > >
> > >
> > > I don't quite follow... need_iommu is a per-domain flag, set for
> > > dom0 when in strict mode, set for others when passing through a
> > > device. Either way, if Xen is maintaining the IOMMU pagetables then
> > > it is clearly unsafe for the domain to also be messing with them.
> > >
> >
> > I don't think it's a mess. Xen always maintains the IOMMU pagetables
> > in a way that the guest expects:
> >
> 
> I'll define some terms to try to avoid confusion...
> 
> - where the IOMMU code in Xen maintains a map such that BFN == MFN,
> let’s call this an 'identity MFN map'
> - where the IOMMU code in Xen *initially programmes* the IOMMU with
> an identity MFN map for the whole host, let's call this a 'host map'
> - where the IOMMU code in Xen maintains a map such that BFN == GFN,
> let's call this an 'identity GFN map'
> - where the IOMMU code in Xen *initially programmes* the IOMMU with
> an identity GFN map for the guest, let's call this a 'guest map'

Can you introduce a name for such a mapping? Then when you describe
identity mapping in a future version, people can immediately get the
actual meaning. At least to me, I always think first about the mapping
in the actual IOMMU page table, which is always a BFN->MFN mapping
(where the definition of BFN varies between usages).

> 
> > 1) for dom0 (w/o pvIOMMU) in strict mode, it's MFN:MFN identity
> > mapping
> 
> Without strict mode, a host map is set up for dom0, otherwise it is an
> identity MFN map. In both cases the xen-swiotlb driver is used in Linux
> as there is no difference from its point of view.
> 
> > 2) for dom0 (w/ pvIOMMU), it's BFN:MFN mapping
> 
> With PV-IOMMU there is also a host map but since a host map is only
> initialized and not maintained (i.e. nothing happens when pages are
> removed from or added to dom0) then it is safe for dom0 to control the
> IOMMU mappings as it will not conflict with anything Xen is doing.

What do you mean by not maintained? The host map will be programmed
into the IOMMU page table before launching Dom0, since the hypervisor
doesn't know whether a pvIOMMU driver will be launched. Later the
pvIOMMU driver is loaded and issues hypercalls to control its own
mapping, and the hypervisor then switches the IOMMU page table from the
host map to the new one, which is the same logic as virtual VTd for an
HVM guest. That is what I call an address space switch.

> 
> > 3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN
> 
> I have not been following virtual VTd closely but, yes, as it stands *when
> h/w is passed through* the guest gets an identity GFN map otherwise it
> gets no map at all.
> 
> > 4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN
> >
> 
> With virtual VTd I'd expect there would be a guest map and then the guest
> would get the same level of control over the IOMMU that PV-IOMMU
> allows for a PV domain but, of course, such control is as-yet unsafe for
> guests since an IOMMU fault can cause a host crash.

I'm not sure why you call it unsafe. Even today with any passthrough
device (w/o virtual VTd exposed), a bad guest driver can always cause
DMA access to an invalid GPA and thus cause an IOMMU fault. Adding
virtual VTd doesn't change any security aspect here.

> 
> > (from IOMMU p.o.v we can always call all 4 categories as BFN:MFN.
> > I deliberately separate them from usage p.o.v, where 'BFN'
> > represents the cases where guest explicitly manages a new address
> > space - different from physical address space in its mind)
> >
> > there is an address space switch in 2) and 4) before and after
> > enabling vIOMMU.
> 
> Is there? The initial mapping in 2 is the same as 1, and the initial mapping in
> 4 is the same as 3.
> 
> >
> > above is why I didn’t follow the assumption that "Xen is maintaining
> > an identity map" is identical to need_iommu.
> >
> 
> The crucial point is that in cases 2 and 4 Xen is not *maintaining* any map
> so need_iommu(d) should be false and hence the domain can control its
> own mappings without interfering with what Xen is doing internally.
> 
> Does that help clarify?
> 

Again, the above description is really confusing as you don't specify
which mapping is referred to here.

Thanks
Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-27  5:05               ` Tian, Kevin
@ 2018-02-27  9:32                 ` Paul Durrant
  2018-02-28  2:53                   ` Tian, Kevin
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-02-27  9:32 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> -----Original Message-----
[snip]
> > I'll define some terms to try to avoid confusion...
> >
> > - where the IOMMU code in Xen maintains a map such that BFN == MFN,
> > let’s call this an 'identity MFN map'
> > - where the IOMMU code in Xen *initially programmes* the IOMMU with
> > an identity MFN map for the whole host, let's call this a 'host map'
> > - where the IOMMU code in Xen maintains a map such that BFN == GFN,
> > let's call this an 'identity GFN map'
> > - where the IOMMU code in Xen *initially programmes* the IOMMU with
> > an identity GFN map for the guest, let's call this a 'guest map'
> 
> Can you introduce a name for such a mapping? Then when you describe
> identity mapping in a future version, people can immediately get the
> actual meaning. At least to me, I always think first about the mapping
> in the actual IOMMU page table, which is always a BFN->MFN mapping
> (where the definition of BFN varies between usages).
> 

My point is that there are two notional types of identity map: one where BFN == MFN and one where BFN == GFN. Then there is whether Xen maintains the map, or just programmes it at domain create and thereafter leaves it alone.

> >
> > > 1) for dom0 (w/o pvIOMMU) in strict mode, it's MFN:MFN identity
> > > mapping
> >
> > Without strict mode, a host map is set up for dom0, otherwise it is an
> > identity MFN map. In both cases the xen-swiotlb driver is used in
> > Linux as there is no difference from its point of view.
> >
> > > 2) for dom0 (w/ pvIOMMU), it's BFN:MFN mapping
> >
> > With PV-IOMMU there is also a host map but since a host map is only
> > initialized and not maintained (i.e. nothing happens when pages are
> > removed from or added to dom0) then it is safe for dom0 to control the
> > IOMMU mappings as it will not conflict with anything Xen is doing.
> 
> What do you mean by not maintained?

By 'maintained' I mean that, when the P2M of the guest is modified, Xen will adjust the IOMMU mappings accordingly.
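To illustrate (a sketch only, not code from the series): where Xen
maintains the mapping, a P2M update carries a matching IOMMU update,
gated on that per-domain flag, along these lines:

/*
 * Illustrative fragment of a P2M update path: if Xen maintains the
 * IOMMU mapping for this domain, mirror the P2M change into the IOMMU
 * page table; otherwise leave the IOMMU tables alone.
 */
if ( need_iommu(d) )
{
    if ( mfn_valid(mfn) )       /* entry being added or changed */
        rc = iommu_map_page(d, _bfn(gfn_x(gfn)), mfn,
                            IOMMUF_readable | IOMMUF_writable);
    else                        /* entry being removed */
        rc = iommu_unmap_page(d, _bfn(gfn_x(gfn)));
}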

> The host map will be programmed
> into the IOMMU page table before launching Dom0, since the hypervisor
> doesn't know whether a pvIOMMU driver will be launched. Later the
> pvIOMMU driver is loaded and issues hypercalls to control its own
> mapping, and the hypervisor then switches the IOMMU page table from the
> host map to the new one, which is the same logic as virtual VTd for an
> HVM guest. That is what I call an address space switch.

But that is not what happens. If need_iommu() is false then Xen will have programmed a mapping (BFN == MFN in the case of dom0), but will not touch it after that. Whether the domain (dom0 in this case) chooses to modify those mappings after that is up to the domain.... but it is free to do so because Xen will not dynamically adjust the mapping should the P2M change.
With PV-IOMMU there is no 'big switch'; Xen does nothing more than set up the initial mapping and then respond to the individual map/unmap hypercalls that the domain may or may not issue.
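As a rough sketch of the intended usage (all names here are illustrative
only - they are not the public interface as posted in this series):

/*
 * Hypothetical dom0 pvIOMMU driver fragment. There is no 'switch'
 * operation: the driver simply issues individual map (or unmap)
 * sub-operations as and when it needs them, and Xen validates each one.
 * Structure and constant names are assumptions for illustration.
 */
struct xen_iommu_op op = {
    .op = XEN_IOMMUOP_map,              /* assumed sub-op number */
    .u.map = {
        .bfn = bfn,                     /* bus frame to map */
        .gfn = gfn,                     /* frame to map it to */
        .flags = XEN_IOMMUOP_map_readable |
                 XEN_IOMMUOP_map_writeable,
    },
};
rc = HYPERVISOR_iommu_op(&op, 1);       /* assumed guest-side wrapper */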

> 
> >
> > > 3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN
> >
> > I have not been following virtual VTd closely but, yes, as it stands *when
> > h/w is passed through* the guest gets an identity GFN map otherwise it
> > gets no map at all.
> >
> > > 4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN
> > >
> >
> > With virtual VTd I'd expect there would be a guest map and then the guest
> > would get the same level of control over the IOMMU that PV-IOMMU
> > allows for a PV domain but, of course, such control is as-yet unsafe for
> > guests since an IOMMU fault can cause a host crash.
> 
> I'm not sure why you call it unsafe. Even today with any passthrough
> device (w/o virtual VTd exposed), a bad guest driver can always cause
> DMA access to an invalid GPA and thus cause an IOMMU fault. Adding
> virtual VTd doesn't change any security aspect here.

That's not entirely true. Xen could easily fill the IOMMU with a BFN == GFN mapping for valid GFNs and then program all the other BFNs to point at a scratch page and thus avoid any possibility of an IOMMU fault caused by an in-guest driver mis-programming a device. As soon as Xen gives the domain control over its own mappings then it can no longer ensure that all BFNs map to something valid.
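A sketch of that scheme, for illustration (scratch_mfn() is an assumed
helper returning a per-domain scratch page, not existing code):

/*
 * Treat 'unmap' as 'map to a scratch page' so that every BFN always
 * points at something valid and a mis-programmed device cannot raise
 * an IOMMU fault.
 */
static int unmap_to_scratch(struct domain *d, bfn_t bfn)
{
    return iommu_map_page(d, bfn, scratch_mfn(d),
                          IOMMUF_readable | IOMMUF_writable);
}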

> 
> >
> > > (from IOMMU p.o.v we can always call all 4 categories as BFN:MFN.
> > > I deliberately separate them from usage p.o.v, where 'BFN'
> > > represents the cases where guest explicitly manages a new address
> > > space - different from physical address space in its mind)
> > >
> > > there is an address space switch in 2) and 4) before and after
> > > enabling vIOMMU.
> >
> > Is there? The initial mapping in 2 is the same as 1, and the initial mapping in
> > 4 is the same as 3.
> >
> > >
> > > above is why I didn’t follow the assumption that "Xen is maintaining
> > > an identity map" is identical to need_iommu.
> > >
> >
> > The crucial point is that in cases 2 and 4 Xen is not *maintaining* any map
> > so need_iommu(d) should be false and hence the domain can control its
> > own mappings without interfering with what Xen is doing internally.
> >
> > Does that help clarify?
> >
> 
> Again, the above description is really confusing as you don't specify
> which mapping is referred to here.
> 

That's because the actual mapping is irrelevant here. Do you now understand the difference between Xen setting up an initial mapping and Xen maintaining that mapping (by keeping it synchronized with the P2M)? That's what the need_iommu(d) flag is all about.... it has nothing to do with whether the mapping is identity MFN or identity GFN, or something different.

  Cheers,

    Paul

> Thanks
> Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-27  9:32                 ` Paul Durrant
@ 2018-02-28  2:53                   ` Tian, Kevin
  2018-02-28  8:55                     ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Tian, Kevin @ 2018-02-28  2:53 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> From: Paul Durrant [mailto:Paul.Durrant@citrix.com]
> Sent: Tuesday, February 27, 2018 5:33 PM
> 
> > -----Original Message-----
> [snip]
> > > I'll define some terms to try to avoid confusion...
> > >
> > > - where the IOMMU code in Xen maintains a map such that BFN == MFN,
> > > let’s call this an 'identity MFN map'
> > > - where the IOMMU code in Xen *initially programmes* the IOMMU with
> > > an identity MFN map for the whole host, let's call this a 'host map'
> > > - where the IOMMU code in Xen maintains a map such that BFN == GFN,
> > > let's call this an 'identity GFN map'
> > > - where the IOMMU code in Xen *initially programmes* the IOMMU with
> > > an identity GFN map for the guest, let's call this a 'guest map'
> >
> > Can you introduce a name for such a mapping? Then when you describe
> > identity mapping in a future version, people can immediately get the
> > actual meaning. At least to me, I always think first about the mapping
> > in the actual IOMMU page table, which is always a BFN->MFN mapping
> > (where the definition of BFN varies between usages).
> >
> 
> My point is that there are two notional types of identity map: one where
> BFN == MFN and one where BFN == GFN. Then there is whether Xen
> maintains the map, or just programmes it at domain create and thereafter
> leaves it alone.
> 
> > >
> > > > 1) for dom0 (w/o pvIOMMU) in strict mode, it's MFN:MFN identity
> > > > mapping
> > >
> > > Without strict mode, a host map is set up for dom0, otherwise it is
> > > an identity MFN map. In both cases the xen-swiotlb driver is used in
> > > Linux as there is no difference from its point of view.
> > >
> > > > 2) for dom0 (w/ pvIOMMU), it's BFN:MFN mapping
> > >
> > > With PV-IOMMU there is also a host map but since a host map is only
> > > initialized and not maintained (i.e. nothing happens when pages are
> > > removed from or added to dom0) then it is safe for dom0 to control the
> > > IOMMU mappings as it will not conflict with anything Xen is doing.
> >
> > What do you mean by not maintained?
> 
> By 'maintained' I mean that, when the P2M of the guest is modified, Xen
> will adjust the IOMMU mappings accordingly.
> 
> > The host map will be programmed
> > into the IOMMU page table before launching Dom0, since the hypervisor
> > doesn't know whether a pvIOMMU driver will be launched. Later the
> > pvIOMMU driver is loaded and issues hypercalls to control its own
> > mapping, and the hypervisor then switches the IOMMU page table from
> > the host map to the new one, which is the same logic as virtual VTd
> > for an HVM guest. That is what I call an address space switch.
> 
> But that is not what happens. If need_iommu() is false then Xen will have
> programmed a mapping (BFN == MFN in the case of dom0), but will not
> touch it after that. Whether the domain (dom0 in this case) chooses to
> modify those mappings after that is up to the domain.... but it is free to do
> so because Xen will not dynamically adjust the mapping should the P2M
> change.
> With PV-IOMMU there is no 'big switch'; Xen does nothing more than set
> up the initial mapping and then respond to the individual map/unmap
> hypercalls that the domain may or may not issue.

I would prefer Xen to do an ownership switch, i.e. clear all initial
mappings before serving pvIOMMU requests. Otherwise the pvIOMMU driver
needs to unmap the whole address space itself before serving any
map/unmap requests from other drivers, which is counterintuitive to what
a normal IOMMU driver would do (just initialize an empty page table).

> 
> >
> > >
> > > > 3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN
> > >
> > > I have not been following virtual VTd closely but, yes, as it stands *when
> > > h/w is passed through* the guest gets an identity GFN map otherwise it
> > > gets no map at all.
> > >
> > > > 4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN
> > > >
> > >
> > > With virtual VTd I'd expect there would be a guest map and then the
> guest
> > > would get the same level of control over the IOMMU that PV-IOMMU
> > > allows for a PV domain but, of course, such control is as-yet unsafe for
> > > guests since an IOMMU fault can cause a host crash.
> >
> > I'm not sure why you call it unsafe. Even today with any passthrough
> > device (w/o virtual VTd exposed), a bad guest driver can always cause
> > DMA access to an invalid GPA and thus cause an IOMMU fault. Adding
> > virtual VTd doesn't change any security aspect here.
> 
> That's not entirely true. Xen could easily fill the IOMMU with a BFN == GFN
> mapping for valid GFNs and then program all the other BFNs to point at a
> scratch page and thus avoid any possibility of an IOMMU fault caused by an
> in-guest driver mis-programming a device. As soon as Xen gives the domain
> control over its own mappings then it can no longer ensure that all BFNs
> map to something valid.

Please note Xen never gives the domain control of the actual IOMMU page
table. With either pvIOMMU or virtual VTd, the map/unmap operations are
always validated by Xen and then reflected in the IOMMU page table. In
this regard, nothing prevents Xen from doing a similar trick -
programming invalid BFNs to point at a scratch page, same as for GFNs,
and later replacing them with the guest-expected mapping upon a
map/unmap request.

There is no architectural difference between w/ and w/o virtual VTd.
Same for pvIOMMU.

> 
> >
> > >
> > > > (from IOMMU p.o.v we can always call all 4 categories as BFN:MFN.
> > > > I deliberately separate them from usage p.o.v, where 'BFN'
> > > > represents the cases where guest explicitly manages a new address
> > > > space - different from physical address space in its mind)
> > > >
> > > > there is an address space switch in 2) and 4) before and after
> > > > enabling vIOMMU.
> > >
> > > Is there? The initial mapping in 2 is the same as 1, and the initial
> mapping in
> > > 4 is the same as 3.
> > >
> > > >
> > > > above is why I didn’t follow the assumption that "Xen is maintaining
> > > > an identity map" is identical to need_iommu.
> > > >
> > >
> > > The crucial point is that in cases 2 and 4 Xen is not *maintaining*
> > > any map so need_iommu(d) should be false and hence the domain can
> > > control its own mappings without interfering with what Xen is doing
> > > internally.
> > >
> > > Does that help clarify?
> > >
> >
> > Again, the above description is really confusing as you don't specify
> > which mapping is referred to here.
> >
> 
> That's because the actual mapping is irrelevant here. Do you now
> understand the difference between Xen setting up an initial mapping and
> Xen maintaining that mapping (by keeping it synchronized with the P2M)?
> That's what the need_iommu(d) flag is all about.... it has nothing to do with
> whether the mapping is identity MFN or identity GFN, or something
> different.
> 

Though I understand the distinction you are describing, saying "Xen is
maintaining an identity map" without any qualification explaining what
'identity' refers to did generate confusion. In an IOMMU context,
identity mapping w/o any qualification imo always refers to the IOMMU
page table by default. If you intend it to mean something different,
then please elaborate in the code comment.

Thanks
Kevin

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-28  2:53                   ` Tian, Kevin
@ 2018-02-28  8:55                     ` Paul Durrant
  0 siblings, 0 replies; 68+ messages in thread
From: Paul Durrant @ 2018-02-28  8:55 UTC (permalink / raw)
  To: Kevin Tian, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, Ian Jackson, Daniel De Graaf

> -----Original Message-----
[snip]
> >
> > But that is not what happens. If need_iommu() is false then Xen will have
> > programmed a mapping (BFN == MFN in the case of dom0), but will not
> > touch it after that. Whether the domain (dom0 in this case) chooses to
> > modify those mappings after that is up to the domain.... but it is free to do
> > so because Xen will not dynamically adjust the mapping should the P2M
> > change.
> > With PV-IOMMU there is no 'big switch'; Xen does nothing more than set
> > up the initial mapping and then respond to the individual map/unmap
> > hypercalls that the domain may or may not issue.
> 
> I would prefer Xen to do an ownership switch, i.e. clear all initial
> mappings before serving pvIOMMU requests. Otherwise the pvIOMMU driver
> needs to unmap the whole address space itself before serving any
> map/unmap requests from other drivers, which is counterintuitive to what
> a normal IOMMU driver would do (just initialize an empty page table).
> 

I don't think there is any need. I have code happily running that simply overwrites the existing mappings. Nothing in the lower layers requires an IOMMU PTE to be clear before it is written, so there is no need for an explicit unmap. There is also danger in clearing the existing mappings as I have discovered... some hosts, including my test host, have undeclared RMRRs inside some of the E820 reserved regions so completely removing the identity MFN map (leaving only RMRRs) causes the machine to lock up. Simply allowing dom0 to write an identity GFN map over the top avoids this problem... but I can also see a case for starting from a 'clean' IOMMU.
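For illustration, the overwrite amounts to no more than this (a sketch;
new_mfn stands for whatever mapping the domain requested):

/* Overwriting is simply a map of an already-present BFN: the PTE is
 * rewritten in place, with no intermediate unmap and hence no window
 * in which the entry is clear. */
rc = iommu_map_page(d, bfn, new_mfn,
                    IOMMUF_readable | IOMMUF_writable);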

> >
> > >
> > > >
> > > > > 3) for HVM (w/o virtual VTd) with passthrough device, it's GFN:MFN
> > > >
> > > > I have not been following virtual VTd closely but, yes, as it stands
> *when
> > > > h/w is passed through* the guest gets an identity GFN map otherwise it
> > > > gets no map at all.
> > > >
> > > > > 4) for HVM (w/ virtual VTd) with passthrough device, it's BFN:MFN
> > > > >
> > > >
> > > > With virtual VTd I'd expect there would be a guest map and then the
> > guest
> > > > would get the same level of control over the IOMMU that PV-IOMMU
> > > > allows for a PV domain but, of course, such control is as-yet unsafe for
> > > > guests since an IOMMU fault can cause a host crash.
> > >
> > > I'm not sure why you call it unsafe. Even today with any passthrough
> > > device (w/o virtual VTd exposed), a bad guest driver can always cause
> > > DMA access to an invalid GPA and thus cause an IOMMU fault. Adding
> > > virtual VTd doesn't change any security aspect here.
> >
> > That's not entirely true. Xen could easily fill the IOMMU with a BFN == GFN
> > mapping for valid GFNs and then program all the other BFNs to point at a
> > scratch page and thus avoid any possibility of an IOMMU fault caused by an
> > in-guest driver mis-programming a device. As soon as Xen gives the domain
> > control over its own mappings then it can no longer ensure that all BFNs
> > map to something valid.
> 
> Please note Xen never gives the domain control of the actual IOMMU page
> table. With either pvIOMMU or virtual VTd, the map/unmap operations are
> always validated by Xen and then reflected in the IOMMU page table. In
> this regard, nothing prevents Xen from doing a similar trick -
> programming invalid BFNs to point at a scratch page, same as for GFNs,
> and later replacing them with the guest-expected mapping upon a
> map/unmap request.
> 
> There is no architectural difference between w/ and w/o virtual VTd.
> Same for pvIOMMU.
> 

Yes, true. If Xen always treated unmap operations as 'map to a scratch page' then faults could be avoided. Perhaps it would be better to do that rather than actually clearing PTEs.

> >
> > >
> > > >
> > > > > (from IOMMU p.o.v we can always call all 4 categories as BFN:MFN.
> > > > > I deliberately separate them from usage p.o.v, where 'BFN'
> > > > > represents the cases where guest explicitly manages a new address
> > > > > space - different from physical address space in its mind)
> > > > >
> > > > > there is an address space switch in 2) and 4) before and after
> > > > > enabling vIOMMU.
> > > >
> > > > Is there? The initial mapping in 2 is the same as 1, and the initial
> > mapping in
> > > > 4 is the same as 3.
> > > >
> > > > >
> > > > > above is why I didn’t follow the assumption that "Xen is maintaining
> > > > > an identity map" is identical to need_iommu.
> > > > >
> > > >
> > > > The crucial point is that in cases 2 and 4 Xen is not
> > > > *maintaining* any map so need_iommu(d) should be false and hence
> > > > the domain can control its own mappings without interfering with
> > > > what Xen is doing internally.
> > > >
> > > > Does that help clarify?
> > > >
> > >
> > > Again, the above description is really confusing as you don't specify
> > > which mapping is referred to here.
> > >
> >
> > That's because the actual mapping is irrelevant here. Do you now
> > understand the difference between Xen setting up an initial mapping and
> > Xen maintaining that mapping (by keeping it synchronized with the P2M)?
> > That's what the need_iommu(d) flag is all about.... it has nothing to do with
> > whether the mapping is identity MFN or identity GFN, or something
> > different.
> >
> 
> Though I understand the distinction you are describing, saying "Xen is
> maintaining an identity map" without any qualification explaining what
> 'identity' refers to did generate confusion. In an IOMMU context,
> identity mapping w/o any qualification imo always refers to the IOMMU
> page table by default. If you intend it to mean something different,
> then please elaborate in the code comment.
> 

Ok, I will expand the comment to fully explain.
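Something along these lines, perhaps (the wording is only a sketch at
this point, not the final comment):

/*
 * The domain cannot control its own IOMMU mappings if:
 * - the IOMMU is not enabled or,
 * - the IOMMU is in passthrough mode or,
 * - the IOMMU page tables are shared with the EPT or,
 * - Xen is *maintaining* the IOMMU page tables, i.e. keeping them
 *   synchronized with the P2M (need_iommu(d)). Note that this is
 *   independent of which identity map (BFN == MFN or BFN == GFN) was
 *   initially programmed; only a map that is set up once at domain
 *   creation and thereafter left alone is safe for the domain to
 *   modify.
 */
if ( !iommu_enabled || iommu_passthrough ||
     iommu_use_hap_pt(currd) || need_iommu(currd) )
    return false;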

  Cheers,

    Paul

> Thanks
> Kevin

* Re: [PATCH 1/7] iommu: introduce the concept of BFN...
  2018-02-12 10:47 ` [PATCH 1/7] iommu: introduce the concept of BFN Paul Durrant
@ 2018-03-15 13:39   ` Jan Beulich
  2018-03-16 10:31     ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-15 13:39 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Kevin Tian,
	Suravee Suthikulpanit

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> @@ -367,9 +367,9 @@ void amd_iommu_flush_all_pages(struct domain *d)
>  }
>  
>  void amd_iommu_flush_pages(struct domain *d,
> -                           unsigned long gfn, unsigned int order)
> +                           unsigned long bfn, unsigned int order)
>  {
> -    _amd_iommu_flush_pages(d, (uint64_t) gfn << PAGE_SHIFT, order);
> +    _amd_iommu_flush_pages(d, (uint64_t) bfn << PAGE_SHIFT, order);
>  }

I assume you've simply used sed or the like to do the replacements,
but we prefer to make style corrections at the same time when
already touching a line: There's a stray space after the cast here,
and really this wants to be bfn_to_baddr() (which then also
shouldn't use the MMU's PAGE_SHIFT).
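I.e. the touched line would then read something like (a sketch,
assuming such a helper):

/* Sketch, assuming a bfn_to_baddr() helper as suggested: */
_amd_iommu_flush_pages(d, bfn_to_baddr(bfn), order);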

> @@ -651,34 +651,34 @@ int amd_iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn,
>      if ( rc )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
> -        AMD_IOMMU_DEBUG("Root table alloc failed, gfn = %lx\n", gfn);
> +        AMD_IOMMU_DEBUG("Root table alloc failed, bfn = %lx\n", bfn);
>          domain_crash(d);
>          return rc;
>      }
>  
>      /* Since HVM domain is initialized with 2 level IO page table,
> -     * we might need a deeper page table for lager gfn now */
> +     * we might need a deeper page table for lager bfn now */

Similarly here: Mind making this say "larger" (or "wider")? There's at
least one more instance further down.

> @@ -2763,10 +2763,10 @@ static int __must_check arm_smmu_map_page(struct domain *d, unsigned long gfn,
>  	 * The function guest_physmap_add_entry replaces the current mapping
>  	 * if there is already one...
>  	 */
> -	return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), 0, t);
> +	return guest_physmap_add_entry(d, _gfn(bfn), _mfn(mfn), 0, t);

Hmm, a very bad change, but I presume unavoidable. I'd prefer it if
such changes could at least be accompanied by a comment clarifying why
this mix of address spaces is correct in the specific case.

> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -23,11 +23,15 @@
>  #include <xen/page-defs.h>
>  #include <xen/spinlock.h>
>  #include <xen/pci.h>
> +#include <xen/typesafe.h>
>  #include <public/hvm/ioreq.h>
>  #include <public/domctl.h>
>  #include <asm/device.h>
>  #include <asm/iommu.h>
>  
> +TYPE_SAFE(unsigned long, bfn);
> +#define INVALID_BFN      _bfn(~0UL)

Please accompany this by grep fodder (like the others have) and
perhaps also PRI_bfn. And while the type definition logically belongs
here, you will also want to add bfn_t with a description of its
purpose into the comment at the top of xen/mm.h. I guess you'll
need to replace / amend "host" in the MFN description there at the
same time.

I ask for this in particular because the description saying "mapped
in the IOMMU rather than the MMU" is ambiguous: Is it the input
frame number, or the output one (and things are even more
complicated when IOMMUs do two stages of translation). That in
turn affects whether I'd consider correct some of the changes
done elsewhere in this patch.
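For concreteness, the equivalent of the existing pattern would look
roughly like this (a sketch, assuming bfn follows the mfn_t/gfn_t
convention):

TYPE_SAFE(unsigned long, bfn);
#define PRI_bfn "05lx"
#define INVALID_BFN _bfn(~0UL)

#ifndef bfn_t
/* Grep fodder: bfn_t, _bfn() and bfn_x() are defined above. */
#define bfn_t
#undef bfn_t
#endif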

Jan



* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-02-12 10:47 ` [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions Paul Durrant
@ 2018-03-15 15:44   ` Jan Beulich
  2018-03-16 10:26     ` Paul Durrant
  2018-07-10 14:29     ` George Dunlap
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2018-03-15 15:44 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Jun Nakajima, xen-devel

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> This patch modifies the declaration of the entry points to the IOMMU
> sub-system to use bfn_t and mfn_t in place of unsigned long. A subsequent
> patch will similarly modify the methods in the iommu_ops structure.
> 
> NOTE: Since (with this patch applied) bfn_t is now in use, the patch also
>       introduces the 'cscope/grep fodder' to allow the type declaration to
>       be easily found.

Ah, here we go. But I continue to think this belongs in patch 1.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, unsigned long type,
>          struct domain *d = page_get_owner(page);
>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>          {
> -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
> +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>  
>              if ( (x & PGT_type_mask) == PGT_writable_page )
> -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
> +                iommu_ret = iommu_unmap_page(d, bfn);
>              else if ( type == PGT_writable_page )
> -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
> -                                           mfn_x(page_to_mfn(page)),
> +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),

Along the lines of what I've said earlier about mixing address spaces,
this would perhaps not so much need a comment (it's a 1:1 mapping
after all), but rather making it more obvious that it's a 1:1 mapping.
This in particular would mean to me to latch page_to_mfn(page) into
a (neutrally named, e.g. "frame") local variable, and use the result in
a way that makes it obvious, especially on the "map" path, that this
really requests a 1:1 mapping. By implication from the 1:1 mapping
it'll then (hopefully) be clear to the reader that which exact name
space is used doesn't really matter.
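E.g. something like (a sketch only; the flags are as in the original
code):

mfn_t frame = page_to_mfn(page);
/* The BFN is derived from the frame itself, i.e. a 1:1 request. */
bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(frame)));

if ( (x & PGT_type_mask) == PGT_writable_page )
    iommu_ret = iommu_unmap_page(d, bfn);
else if ( type == PGT_writable_page )
    iommu_ret = iommu_map_page(d, bfn, frame,
                               IOMMUF_readable | IOMMUF_writable);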

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -873,12 +873,14 @@ out:
>              if ( iommu_flags )
>                  for ( i = 0; i < (1 << order); i++ )
>                  {
> -                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +                    rc = iommu_map_page(d, _bfn(gfn + i), mfn_add(mfn, i),
> +                                        iommu_flags);
>                      if ( unlikely(rc) )
>                      {
>                          while ( i-- )
>                              /* If statement to satisfy __must_check. */
> -                            if ( iommu_unmap_page(p2m->domain, gfn + i) )
> +                            if ( iommu_unmap_page(p2m->domain,
> +                                                  _bfn(gfn + i)) )

The fundamental issue of mixed address spaces continues ...

> @@ -781,14 +781,14 @@ guest_physmap_add_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
>          {
>              for ( i = 0; i < (1 << page_order); i++ )
>              {
> -                rc = iommu_map_page(d, mfn_x(mfn_add(mfn, i)),
> -                                    mfn_x(mfn_add(mfn, i)),
> +                rc = iommu_map_page(d, _bfn(mfn_x(mfn) + i),
> +                                    mfn_add(mfn, i),

Please check whether some line wrapping can now be avoided, like
apparently here.

> @@ -1164,7 +1164,9 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn_l,
>      {
>          if ( !need_iommu(d) )
>              return 0;
> -        return iommu_map_page(d, gfn_l, gfn_l, IOMMUF_readable|IOMMUF_writable);
> +
> +        return iommu_map_page(d, _bfn(gfn_l), _mfn(gfn_l),
> +                              IOMMUF_readable|IOMMUF_writable);

Please add spaces around | as you touch this (also elsewhere).

> @@ -1254,7 +1256,8 @@ int clear_identity_p2m_entry(struct domain *d, unsigned long gfn_l)
>      {
>          if ( !need_iommu(d) )
>              return 0;
> -        return iommu_unmap_page(d, gfn_l);
> +
> +        return iommu_unmap_page(d, _bfn(gfn_l));
>      }

No real need for the extra blank line here, as this isn't the main return
point.

Jan


* Re: [PATCH 3/7] iommu: push use of type-safe BFN and MFN into iommu_ops
  2018-02-12 10:47 ` [PATCH 3/7] iommu: push use of type-safe BFN and MFN into iommu_ops Paul Durrant
@ 2018-03-15 16:15   ` Jan Beulich
  2018-03-16 10:22     ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-15 16:15 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Kevin Tian, Suravee Suthikulpanit, xen-devel

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> @@ -612,12 +612,12 @@ static int __must_check iommu_flush_iotlb(struct domain *d,
>          if ( iommu_domid == -1 )
>              continue;
>  
> -        if ( page_count != 1 || bfn == bfn_x(INVALID_BFN) )
> +        if ( page_count != 1 || bfn_eq(bfn, INVALID_BFN) )
>              rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
>                                         0, flush_dev_iotlb);
>          else
>              rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
> -                                       (paddr_t)bfn << PAGE_SHIFT_4K,
> +                                       (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K,

At the latest at this point you'll need to introduce bfn_to_baddr(). I
also have a hard time seeing how this can then validly be cast to
paddr_t.

> @@ -676,7 +676,8 @@ static int __must_check dma_pte_clear_one(struct domain *domain, u64 addr)
>      iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
>  
>      if ( !this_cpu(iommu_dont_flush_iotlb) )
> -        rc = iommu_flush_iotlb_pages(domain, addr >> PAGE_SHIFT_4K, 1);
> +        rc = iommu_flush_iotlb_pages(domain, _bfn(addr >> PAGE_SHIFT_4K),

And baddr_to_bfn().
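I.e. something like (a sketch; the shift used should be the IOMMU's
granularity, not the MMU's):

#define bfn_to_baddr(bfn)  ((paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K)
#define baddr_to_bfn(addr) _bfn((addr) >> PAGE_SHIFT_4K)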

Jan



* Re: [PATCH 4/7] vtd: add lookup_page method to iommu_ops
  2018-02-12 10:47 ` [PATCH 4/7] vtd: add lookup_page method to iommu_ops Paul Durrant
@ 2018-03-15 16:54   ` Jan Beulich
  2018-03-16 10:19     ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-15 16:54 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Kevin Tian

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> This patch adds a new method to the VT-d IOMMU implementation to find the
> MFN currently mapped by the specified BFN. This functionality will be used
> by a subsequent patch.

How come this is VT-d only? The same is going to be needed at least
for the AMD IOMMU. And if you don't do it for ARM, then the hook
should be x86-specific for the time being.

> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1827,6 +1827,44 @@ static int __must_check intel_iommu_unmap_page(struct domain *d,
>      return dma_pte_clear_one(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K);
>  }
>  
> +static int intel_iommu_lookup_page(struct domain *d, bfn_t bfn, mfn_t *mfn,
> +                                   unsigned int *flags)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +    struct dma_pte *page = NULL, *pte = NULL, val;

Pointless initializers.

> +    u64 pg_maddr;
> +
> +    spin_lock(&hd->arch.mapping_lock);

Depending on how frequently this is going to be used, this lock
may need to become an r/w one.

> +    pg_maddr =
> +        addr_to_dma_page_maddr(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K, 1);

Why do you request table allocation here? Lookups shouldn't
normally alter the tables. Also this wants better line wrapping.
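Presumably the lookup wants to pass 0 for the allocation argument and
treat a missing table as "nothing mapped", e.g. (sketch):

/* Sketch: look up without allocating intermediate page tables. */
pg_maddr = addr_to_dma_page_maddr(d,
                                  (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K,
                                  0 /* no allocation on lookup */);
if ( pg_maddr == 0 )
{
    spin_unlock(&hd->arch.mapping_lock);
    return -ENOENT; /* arguably more apt than -ENOMEM here */
}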

> +    if ( pg_maddr == 0 )
> +    {
> +        spin_unlock(&hd->arch.mapping_lock);
> +        return -ENOMEM;
> +    }
> +    page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);

Pointless cast.

> +    pte = page + (bfn_x(bfn) & LEVEL_MASK);
> +    val = *pte;
> +    if (!dma_pte_present(val)) {

Style (also more below).

> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -272,9 +272,11 @@ struct dma_pte {
>  #define dma_set_pte_prot(p, prot) do { \
>          (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
>      } while (0)
> +#define dma_get_pte_prot(p) ((p).val & DMA_PTE_PROT)
>  #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
>  #define dma_set_pte_addr(p, addr) do {\
>              (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
> +#define dma_get_pte_addr(p) ((p).val & PAGE_MASK_4K)

Why is dma_pte_addr() not good enough?

Overall this looks very much like Malcolm's original implementation;
I'm not sure dropping his authorship / S-o-b is a valid thing to do.

Jan



* Re: [PATCH 4/7] vtd: add lookup_page method to iommu_ops
  2018-03-15 16:54   ` Jan Beulich
@ 2018-03-16 10:19     ` Paul Durrant
  2018-03-16 10:28       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-16 10:19 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel, Kevin Tian

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 15 March 2018 16:54
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Kevin Tian <kevin.tian@intel.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH 4/7] vtd: add lookup_page method to iommu_ops
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > This patch adds a new method to the VT-d IOMMU implementation to
> > find the MFN currently mapped by the specified BFN. This
> > functionality will be used by a subsequent patch.
> 
> How come this is VT-d only? The same is going to be needed at least
> for the AMD IOMMU. And if you don't do it for ARM, then the hook
> should be x86-specific for the time being.

I only have VT-d h/w to test on so it seemed prudent to keep it limited to that. I did look at doing a speculative implementation for AMD but it was not sufficiently obvious to give me confidence.
I don't see any particular reason to keep the hook arch specific though... it would just create code churn later, assuming someone wants to do PV-IOMMU for ARM.

> 
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1827,6 +1827,44 @@ static int __must_check
> intel_iommu_unmap_page(struct domain *d,
> >      return dma_pte_clear_one(d, (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K);
> >  }
> >
> > +static int intel_iommu_lookup_page(struct domain *d, bfn_t bfn, mfn_t
> *mfn,
> > +                                   unsigned int *flags)
> > +{
> > +    struct domain_iommu *hd = dom_iommu(d);
> > +    struct dma_pte *page = NULL, *pte = NULL, val;
> 
> Pointless initializers.
> 
> > +    u64 pg_maddr;
> > +
> > +    spin_lock(&hd->arch.mapping_lock);
> 
> Depending on how frequently this is going to be used, this lock
> may need to become an r/w one.
> 
> > +    pg_maddr =
> > +        addr_to_dma_page_maddr(d, (paddr_t)bfn_x(bfn) <<
> PAGE_SHIFT_4K, 1);
> 
> Why do you request table allocation here? Lookups shouldn't
> normally alter the tables. Also this wants better line wrapping.
> 
> > +    if ( pg_maddr == 0 )
> > +    {
> > +        spin_unlock(&hd->arch.mapping_lock);
> > +        return -ENOMEM;
> > +    }
> > +    page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> 
> Pointless cast.
> 
> > +    pte = page + (bfn_x(bfn) & LEVEL_MASK);
> > +    val = *pte;
> > +    if (!dma_pte_present(val)) {
> 
> Style (also more below).
> 
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -272,9 +272,11 @@ struct dma_pte {
> >  #define dma_set_pte_prot(p, prot) do { \
> >          (p).val = ((p).val & ~DMA_PTE_PROT) | ((prot) & DMA_PTE_PROT); \
> >      } while (0)
> > +#define dma_get_pte_prot(p) ((p).val & DMA_PTE_PROT)
> >  #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
> >  #define dma_set_pte_addr(p, addr) do {\
> >              (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
> > +#define dma_get_pte_addr(p) ((p).val & PAGE_MASK_4K)
> 
> Why is dma_pte_addr() not good enough?

I guess it probably is... not sure why Malcolm felt the need to add this... possibly concern over the AND with PADDR_MASK... but that looks like the right thing to do. I'll drop it in v2.

> 
> Overall this looks very much like Malcolm's original implementation;
> I'm not sure dropping his authorship / S-o-b is a valid thing to do.
> 

Yes, there's probably a little too much cut'n'paste from Malcolm's original. After some discussions with Andy Cooper I think I'm going to re-work things a bit in v2 anyway so Malcolm's s-o-b is likely to become moot at that point.

> Jan



* Re: [PATCH 3/7] iommu: push use of type-safe BFN and MFN into iommu_ops
  2018-03-15 16:15   ` Jan Beulich
@ 2018-03-16 10:22     ` Paul Durrant
  0 siblings, 0 replies; 68+ messages in thread
From: Paul Durrant @ 2018-03-16 10:22 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Andrew Cooper, Kevin Tian, Suravee Suthikulpanit, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 15 March 2018 16:16
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; Kevin Tian <kevin.tian@intel.com>;
> xen-devel@lists.xenproject.org
> Subject: Re: [PATCH 3/7] iommu: push use of type-safe BFN and MFN into
> iommu_ops
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > @@ -612,12 +612,12 @@ static int __must_check iommu_flush_iotlb(struct
> domain *d,
> >          if ( iommu_domid == -1 )
> >              continue;
> >
> > -        if ( page_count != 1 || bfn == bfn_x(INVALID_BFN) )
> > +        if ( page_count != 1 || bfn_eq(bfn, INVALID_BFN) )
> >              rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
> >                                         0, flush_dev_iotlb);
> >          else
> >              rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
> > -                                       (paddr_t)bfn << PAGE_SHIFT_4K,
> > +                                       (paddr_t)bfn_x(bfn) << PAGE_SHIFT_4K,
> 
> At the latest at this point you'll need to introduce bfn_to_baddr(). I
> also have a hard time seeing how this can then validly be cast to
> paddr_t.
> 

Well, it does look a little bogus... adding a bfn_to_baddr() does indeed sound like the best idea.

> > @@ -676,7 +676,8 @@ static int __must_check dma_pte_clear_one(struct
> domain *domain, u64 addr)
> >      iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
> >
> >      if ( !this_cpu(iommu_dont_flush_iotlb) )
> > -        rc = iommu_flush_iotlb_pages(domain, addr >> PAGE_SHIFT_4K, 1);
> > +        rc = iommu_flush_iotlb_pages(domain, _bfn(addr >>
> PAGE_SHIFT_4K),
> 
> And baddr_to_bfn().
> 

Sure.

  Paul

> Jan



* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-03-15 15:44   ` Jan Beulich
@ 2018-03-16 10:26     ` Paul Durrant
  2018-07-10 14:29     ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: Paul Durrant @ 2018-03-16 10:26 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Tim (Xen.org),
	George Dunlap, Julien Grall, Jun Nakajima, xen-devel,
	Ian Jackson

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 15 March 2018 15:45
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Julien Grall <julien.grall@arm.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; George
> Dunlap <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>;
> Jun Nakajima <jun.nakajima@intel.com>; Kevin Tian
> <kevin.tian@intel.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in
> exported functions
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > This patch modifies the declaration of the entry points to the IOMMU
> > sub-system to use bfn_t and mfn_t in place of unsigned long. A
> subsequent
> > patch will similarly modify the methods in the iommu_ops structure.
> >
> > NOTE: Since (with this patch applied) bfn_t is now in use, the patch also
> >       introduces the 'cscope/grep fodder' to allow the type declaration to
> >       be easily found.
> 
> Ah, here we go. But I continue to think this belongs in patch 1.
> 

Ok. I debated it with myself when I wrote the original patches. I'll move the relevant hunks.

> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info
> *page, unsigned long type,
> >          struct domain *d = page_get_owner(page);
> >          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >          {
> > -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
> > +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
> >
> >              if ( (x & PGT_type_mask) == PGT_writable_page )
> > -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
> > +                iommu_ret = iommu_unmap_page(d, bfn);
> >              else if ( type == PGT_writable_page )
> > -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
> > -                                           mfn_x(page_to_mfn(page)),
> > +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
> 
> Along the lines of what I've said earlier about mixing address spaces,
> this would perhaps not so much need a comment (it's a 1:1 mapping
> after all), but rather making it more obvious that it's a 1:1 mapping.
> This in particular would mean to me to latch page_to_mfn(page) into
> a (neutrally named, e.g. "frame") local variable, and use the result in
> a way that makes it obvious, especially on the "map" path, that this
> really requests a 1:1 mapping. By implication from the 1:1 mapping
> it'll then (hopefully) be clear to the reader that which exact name
> space is used doesn't really matter.

Ok, I'll re-phrase things in v2.

> 
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -873,12 +873,14 @@ out:
> >              if ( iommu_flags )
> >                  for ( i = 0; i < (1 << order); i++ )
> >                  {
> > -                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> > +                    rc = iommu_map_page(d, _bfn(gfn + i), mfn_add(mfn, i),
> > +                                        iommu_flags);
> >                      if ( unlikely(rc) )
> >                      {
> >                          while ( i-- )
> >                              /* If statement to satisfy __must_check. */
> > -                            if ( iommu_unmap_page(p2m->domain, gfn + i) )
> > +                            if ( iommu_unmap_page(p2m->domain,
> > +                                                  _bfn(gfn + i)) )
> 
> The fundamental issue of mixed address spaces continues ...
> 

I'll add an appropriately named stack variable.
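
For example (a sketch; the variable name is assumed):

    /* BFN == GFN here because the domain's IOMMU mappings mirror
     * the p2m. */
    bfn_t bfn = _bfn(gfn + i);

    rc = iommu_map_page(d, bfn, mfn_add(mfn, i), iommu_flags);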

> > @@ -781,14 +781,14 @@ guest_physmap_add_entry(struct domain *d,
> gfn_t gfn, mfn_t mfn,
> >          {
> >              for ( i = 0; i < (1 << page_order); i++ )
> >              {
> > -                rc = iommu_map_page(d, mfn_x(mfn_add(mfn, i)),
> > -                                    mfn_x(mfn_add(mfn, i)),
> > +                rc = iommu_map_page(d, _bfn(mfn_x(mfn) + i),
> > +                                    mfn_add(mfn, i),
> 
> Please check whether some line wrapping can now be avoided, like
> apparently here.
> 

Ok.

> > @@ -1164,7 +1164,9 @@ int set_identity_p2m_entry(struct domain *d,
> unsigned long gfn_l,
> >      {
> >          if ( !need_iommu(d) )
> >              return 0;
> > -        return iommu_map_page(d, gfn_l, gfn_l,
> IOMMUF_readable|IOMMUF_writable);
> > +
> > +        return iommu_map_page(d, _bfn(gfn_l), _mfn(gfn_l),
> > +                              IOMMUF_readable|IOMMUF_writable);
> 
> Please add spaces around | as you touch this (also elsewhere).
> 

Ok.

> > @@ -1254,7 +1256,8 @@ int clear_identity_p2m_entry(struct domain *d,
> unsigned long gfn_l)
> >      {
> >          if ( !need_iommu(d) )
> >              return 0;
> > -        return iommu_unmap_page(d, gfn_l);
> > +
> > +        return iommu_unmap_page(d, _bfn(gfn_l));
> >      }
> 
> No real need for the extra blank line here, as this isn't the main return
> point.
> 

Ok.

  Paul

> Jan


* Re: [PATCH 4/7] vtd: add lookup_page method to iommu_ops
  2018-03-16 10:19     ` Paul Durrant
@ 2018-03-16 10:28       ` Jan Beulich
  2018-03-16 10:41         ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-16 10:28 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, Kevin Tian

>>> On 16.03.18 at 11:19, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 15 March 2018 16:54
>> 
>> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>> > This patch adds a new method to the VT-d IOMMU implementation to find
>> the
>> > MFN currently mapped by the specified BFN. This functionality will be used
>> > by a subsequent patch.
>> 
>> How come this is VT-d only? The same is going to be needed at least
>> for the AMD IOMMU. And if you don't do it for ARM, then the hook
>> should be x86-specific for the time being.
> 
> I only have VT-d h/w to test on so it seemed prudent to keep it limited to 
> that. I did look at doing a speculative implementation for AMD but it was not 
> sufficiently obvious to give me confidence.
> I don't see any particular reason to keep the hook arch specific though... 
> it would just create code churn later, assuming someone wants to do PV-IOMMU 
> for ARM.

Well, the primary concern - as you're certainly aware - is the
dangling NULL pointer resulting from the lack of those other
implementations.

Jan



* Re: [PATCH 1/7] iommu: introduce the concept of BFN...
  2018-03-15 13:39   ` Jan Beulich
@ 2018-03-16 10:31     ` Paul Durrant
  2018-03-16 10:39       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-16 10:31 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Kevin Tian,
	Suravee Suthikulpanit

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 15 March 2018 13:40
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>; Julien Grall
> <julien.grall@arm.com>; Kevin Tian <kevin.tian@intel.com>; Stefano
> Stabellini <sstabellini@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH 1/7] iommu: introduce the concept of BFN...
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > @@ -367,9 +367,9 @@ void amd_iommu_flush_all_pages(struct domain
> *d)
> >  }
> >
> >  void amd_iommu_flush_pages(struct domain *d,
> > -                           unsigned long gfn, unsigned int order)
> > +                           unsigned long bfn, unsigned int order)
> >  {
> > -    _amd_iommu_flush_pages(d, (uint64_t) gfn << PAGE_SHIFT, order);
> > +    _amd_iommu_flush_pages(d, (uint64_t) bfn << PAGE_SHIFT, order);
> >  }
> 
> I assume you've simply used sed or alike to do the replacements,
> but we prefer to make style corrections at the same time when
> already touching a line: There's a stray space after the cast here,
> and really this wants to be bfn_to_baddr() (which then also
> shouldn't use the MMU's PAGE_SHIFT).
> 

I guess I'll add IOMMU_PAGE_SHIFT/MASK definitions and use those in a new bfn_to_baddr()/baddr_to_bfn() pair.
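
A rough sketch of what that pair could look like (the granularity
value and exact form are assumptions, not final code):

    #define IOMMU_PAGE_SHIFT 12
    #define IOMMU_PAGE_SIZE  (1ul << IOMMU_PAGE_SHIFT)
    #define IOMMU_PAGE_MASK  (~(IOMMU_PAGE_SIZE - 1))

    #define bfn_to_baddr(bfn)  ((uint64_t)bfn_x(bfn) << IOMMU_PAGE_SHIFT)
    #define baddr_to_bfn(addr) _bfn((addr) >> IOMMU_PAGE_SHIFT)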

> > @@ -651,34 +651,34 @@ int amd_iommu_map_page(struct domain *d,
> unsigned long gfn, unsigned long mfn,
> >      if ( rc )
> >      {
> >          spin_unlock(&hd->arch.mapping_lock);
> > -        AMD_IOMMU_DEBUG("Root table alloc failed, gfn = %lx\n", gfn);
> > +        AMD_IOMMU_DEBUG("Root table alloc failed, bfn = %lx\n", bfn);
> >          domain_crash(d);
> >          return rc;
> >      }
> >
> >      /* Since HVM domain is initialized with 2 level IO page table,
> > -     * we might need a deeper page table for lager gfn now */
> > +     * we might need a deeper page table for lager bfn now */
> 
> Similarly here: Mind making this say "larger" (or "wider")? There's at
> least one more instance further down.
> 

Sure.

> > @@ -2763,10 +2763,10 @@ static int __must_check
> arm_smmu_map_page(struct domain *d, unsigned long gfn,
> >  	 * The function guest_physmap_add_entry replaces the current
> mapping
> >  	 * if there is already one...
> >  	 */
> > -	return guest_physmap_add_entry(d, _gfn(gfn), _mfn(mfn), 0, t);
> > +	return guest_physmap_add_entry(d, _gfn(bfn), _mfn(mfn), 0, t);
> 
> Hmm, a very bad change, but I presume unavoidable. I'd prefer it if
> such changes could at least be accompanied by a comment clarifying
> why this mix of address spaces is correct in the specific case.
> 

I'll add such a comment stating the 1:1 mapping.
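
Something along these lines, perhaps (the wording is a guess):

    /*
     * The SMMU shares page tables with the CPU and the domain runs
     * with a 1:1 BFN:GFN mapping, so treating the BFN as a GFN here
     * is correct by construction.
     */
    return guest_physmap_add_entry(d, _gfn(bfn), _mfn(mfn), 0, t);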

> > --- a/xen/include/xen/iommu.h
> > +++ b/xen/include/xen/iommu.h
> > @@ -23,11 +23,15 @@
> >  #include <xen/page-defs.h>
> >  #include <xen/spinlock.h>
> >  #include <xen/pci.h>
> > +#include <xen/typesafe.h>
> >  #include <public/hvm/ioreq.h>
> >  #include <public/domctl.h>
> >  #include <asm/device.h>
> >  #include <asm/iommu.h>
> >
> > +TYPE_SAFE(unsigned long, bfn);
> > +#define INVALID_BFN      _bfn(~0UL)
> 
> Please accompany this by a grep fodder (like the others have) and
> perhaps also PRI_bfn. And while the type definition logically belongs
> here, you will also want to add bfn_t with a description of its
> purpose into the comment at the top of xen/mm.h. I guess you'll
> need to replace / amend "host" in the MFN description there at the
> same time.
> 

Should I move the TYPE_SAFE evaluation to xen/mm.h then? If I leave it here then I'll presumably need some ifdef hackery in mm.h if you want me to define bfn_t there too.
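
For reference, the requested 'grep fodder' would presumably mirror the
existing mfn_t/gfn_t pattern; a sketch:

    TYPE_SAFE(unsigned long, bfn);
    #define PRI_bfn     "05lx"
    #define INVALID_BFN _bfn(~0UL)

    #ifndef bfn_t
    #define bfn_t /* Grep fodder: bfn_t, _bfn() and bfn_x() are defined above */
    #undef bfn_t
    #endif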

> I ask for this in particular because the description saying "mapped
> in the IOMMU rather than the MMU" is ambiguous: Is it the input
> frame number, or the output one (and things are even more
> complicated when IOMMUs do two stages of translation). That in
> turn affects whether I'd consider correct some of the changes
> done elsewhere in this patch.
> 

Ok.

  Paul

> Jan



* Re: [PATCH 1/7] iommu: introduce the concept of BFN...
  2018-03-16 10:31     ` Paul Durrant
@ 2018-03-16 10:39       ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-03-16 10:39 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Kevin Tian,
	Suravee Suthikulpanit

>>> On 16.03.18 at 11:31, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 15 March 2018 13:40
>> 
>> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>> > --- a/xen/include/xen/iommu.h
>> > +++ b/xen/include/xen/iommu.h
>> > @@ -23,11 +23,15 @@
>> >  #include <xen/page-defs.h>
>> >  #include <xen/spinlock.h>
>> >  #include <xen/pci.h>
>> > +#include <xen/typesafe.h>
>> >  #include <public/hvm/ioreq.h>
>> >  #include <public/domctl.h>
>> >  #include <asm/device.h>
>> >  #include <asm/iommu.h>
>> >
>> > +TYPE_SAFE(unsigned long, bfn);
>> > +#define INVALID_BFN      _bfn(~0UL)
>> 
>> Please accompany this by a grep fodder (like the others have) and
>> perhaps also PRI_bfn. And while the type definition logically belongs
>> here, you will also want to add bfn_t with a description of its
>> purpose into the comment at the top of xen/mm.h. I guess you'll
>> need to replace / amend "host" in the MFN description there at the
>> same time.
> 
> Should I move the TYPE_SAFE evaluation to xen/mm.h then? If I leave it here 
> then I'll presumably need some ifdef hackery in mm.h if you want be to define 
> bfn_t there too.

No, I don't think the type definition itself needs to go there. If
anything, add a comment there saying that bfn_t lives in iommu.h.

Jan



* Re: [PATCH 4/7] vtd: add lookup_page method to iommu_ops
  2018-03-16 10:28       ` Jan Beulich
@ 2018-03-16 10:41         ` Paul Durrant
  0 siblings, 0 replies; 68+ messages in thread
From: Paul Durrant @ 2018-03-16 10:41 UTC (permalink / raw)
  To: 'Jan Beulich'; +Cc: xen-devel, Kevin Tian

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 16 March 2018 10:29
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Kevin Tian <kevin.tian@intel.com>; xen-devel@lists.xenproject.org
> Subject: RE: [PATCH 4/7] vtd: add lookup_page method to iommu_ops
> 
> >>> On 16.03.18 at 11:19, <Paul.Durrant@citrix.com> wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 15 March 2018 16:54
> >>
> >> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> >> > This patch adds a new method to the VT-d IOMMU implementation to
> find
> >> the
> >> > MFN currently mapped by the specified BFN. This functionality will be
> used
> >> > by a subsequent patch.
> >>
> >> How come this is VT-d only? The same is going to be needed at least
> >> for the AMD IOMMU. And if you don't do it for ARM, then the hook
> >> should be x86-specific for the time being.
> >
> > I only have VT-d h/w to test on so it seemed prudent to keep it limited to
> > that. I did look at doing a speculative implementation for AMD but it was
> not
> > sufficiently obvious to give me confidence.
> > I don't see any particular reason to keep the hook arch specific though...
> > it would just create code churn later, assuming someone wants to do PV-
> IOMMU
> > for ARM.
> 
> Well, the primary concern - as you're certainly aware - is the
> dangling NULL pointer resulting from the lack of those other
> implementations.
> 

Sure. One of the changes I was going to make was to the definitions of iommu_map_page() and iommu_unmap_page() (in passthrough/iommu.c). I'll try to make them more useful by removing the in-built domain_crash() (pushing that out to the callers in the p2m and grant code as appropriate) and then add an iommu_lookup_page() that checks the op's existence before attempting to call it.
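
A minimal sketch of the lookup wrapper being described (the exact
signature is an assumption):

    int iommu_lookup_page(struct domain *d, bfn_t bfn, mfn_t *mfn,
                          unsigned int *flags)
    {
        const struct domain_iommu *hd = dom_iommu(d);

        if ( !iommu_enabled || !hd->platform_ops ||
             !hd->platform_ops->lookup_page )
            return -EOPNOTSUPP;

        return hd->platform_ops->lookup_page(d, bfn, mfn, flags);
    }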

  Paul

> Jan



* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-02-12 10:47 ` [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op Paul Durrant
  2018-02-13  6:43   ` Tian, Kevin
@ 2018-03-16 12:25   ` Jan Beulich
  2018-06-07 11:42     ` Paul Durrant
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-16 12:25 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel, Daniel De Graaf

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -33,6 +33,7 @@ obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
>  obj-y += hypercall.o
>  obj-y += i387.o
>  obj-y += i8259.o
> +obj-y += iommu_op.o

As mentioned in other contexts, I'd prefer if we stopped using
underscores in places where dashes (or other separators not
usable in C identifiers) are fine.

> --- /dev/null
> +++ b/xen/arch/x86/iommu_op.c
> @@ -0,0 +1,169 @@
> +/******************************************************************************
> + * x86/iommu_op.c
> + *
> + * Paravirtualised IOMMU functionality
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (C) 2018 Citrix Systems Inc
> + */
> +
> +#include <xen/event.h>
> +#include <xen/guest_access.h>
> +#include <xen/hypercall.h>
> +
> +static bool can_control_iommu(void)
> +{
> +    struct domain *currd = current->domain;
> +
> +    /*
> +     * IOMMU mappings cannot be manipulated if:
> +     * - the IOMMU is not enabled or,
> +     * - the IOMMU is passed through or,

"is passed through" isn't really a proper description of what
iommu_passthrough means, I'm afraid. The description of the
option says "Control whether to disable DMA remapping for
Dom0." Perhaps "is bypassed"? But then it would be better
to qualify the check with is_hardware_domain(), despite you
restricting things to Dom0 for now anyway.

> +     * - shared EPT configured or,
> +     * - Xen is maintaining an identity map.

Is this meant to describe ...

> +     */
> +    if ( !iommu_enabled || iommu_passthrough ||
> +         iommu_use_hap_pt(currd) || need_iommu(currd) )

... need_iommu() here? How is that implying an identity map?

> +        return false;
> +
> +    return true;

> Please make this a single return statement (with the expression as
> operand).

> +long do_iommu_op(XEN_GUEST_HANDLE_PARAM(xen_iommu_op_t) uops,
> +                 unsigned int count)
> +{
> +    unsigned int i;
> +    int rc;
> +
> +    rc = xsm_iommu_op(XSM_PRIV, current->domain);
> +    if ( rc )
> +        return rc;
> +
> +    if ( !can_control_iommu() )
> +        return -EACCES;
> +
> +    for ( i = 0; i < count; i++ )
> +    {
> +        xen_iommu_op_t op;
> +
> +        if ( ((i & 0xff) == 0xff) && hypercall_preempt_check() )
> +        {
> +            rc = i;

For this to be correct for large enough values of "count", rc needs
to have long type.

> +            break;
> +        }
> +
> +        if ( copy_from_guest_offset(&op, uops, i, 1) )
> +        {
> +            rc = -EFAULT;
> +            break;
> +        }
> +
> +        iommu_op(&op);
> +
> +        if ( copy_to_guest_offset(uops, i, &op, 1) )

__copy_to_guest_offset()

> Also, do you really need to copy back anything other than the status?

> --- /dev/null
> +++ b/xen/include/public/iommu_op.h
> @@ -0,0 +1,55 @@
> +/*
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to
> + * deal in the Software without restriction, including without limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
> + * sell copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright (C) 2018 Citrix Systems Inc
> + */
> +
> +#ifndef __XEN_PUBLIC_IOMMU_OP_H__
> +#define __XEN_PUBLIC_IOMMU_OP_H__

Please can you avoid introducing further name space violations
into the public headers?

> +#include "xen.h"
> +
> +struct xen_iommu_op {
> +    uint16_t op;
> +    uint16_t flags; /* op specific flags */
> +    int32_t status; /* op completion status: */
> +                    /* 0 for success otherwise, negative errno */
> +};

Peeking at patch 6, you need to add the union and a large enough
placeholder here right away, so that the struct size won't change
with future additions.

Jan


* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-12 10:47 ` [PATCH 6/7] x86: add iommu_op to query reserved ranges Paul Durrant
  2018-02-13  6:51   ` Tian, Kevin
@ 2018-03-19 14:10   ` Jan Beulich
  2018-03-19 15:13     ` Paul Durrant
  2018-03-19 15:13   ` Jan Beulich
  2 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-19 14:10 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/iommu_op.c
> +++ b/xen/arch/x86/iommu_op.c
> @@ -22,6 +22,58 @@
>  #include <xen/event.h>
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
> +#include <xen/iommu.h>
> +
> +struct get_rdm_ctxt {
> +    unsigned int max_entries;
> +    unsigned int nr_entries;
> +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> +};
> +
> +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)

uint32_t please in new code.

> +static int iommuop_query_reserved(struct xen_iommu_op_query_reserved *op)
> +{
> +    struct get_rdm_ctxt ctxt = {
> +        .max_entries = op->nr_entries,
> +        .regions = op->regions,
> +    };
> +    int rc;
> +
> +    if (op->pad != 0)

Missing blanks. Perhaps also drop the " != 0".

> +        return -EINVAL;
> +
> +    rc = iommu_get_reserved_device_memory(get_rdm, &ctxt);
> +    if ( rc )
> +        return rc;
> +
> +    /* Pass back the actual number of reserved regions */
> +    op->nr_entries = ctxt.nr_entries;
> +
> +    if ( ctxt.nr_entries > ctxt.max_entries )
> +        return -ENOBUFS;

Perhaps unless the handle is null?

> @@ -132,12 +190,75 @@ int compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t) uops,
>              break;
>          }
>  
> +        /*
> +         * The xlat magic doesn't quite know how to handle the union so
> +         * we need to fix things up here.
> +         */

That's quite sad, as this is the second instance in a relatively short
period of time. We really should see whether the translation code
can't be adjusted suitably.

> +#define XLAT_iommu_op_u_query_reserved XEN_IOMMUOP_query_reserved
> +        u = cmp.op;
> +
> +#define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
> +        do \
> +        { \
> +            if ( !compat_handle_is_null((_s_)->regions) ) \

In the context of the earlier missing null handle check I find this
a little surprising (but correct).

> +            { \
> +                unsigned int *nr_entries = COMPAT_ARG_XLAT_VIRT_BASE; \
> +                xen_iommu_reserved_region_t *regions = \
> +                    (void *)(nr_entries + 1); \
> +                \
> +                if ( sizeof(*nr_entries) + \
> +                     (sizeof(*regions) * (_s_)->nr_entries) > \
> +                     COMPAT_ARG_XLAT_SIZE ) \
> +                    return -E2BIG; \
> +                \
> +                *nr_entries = (_s_)->nr_entries; \
> +                set_xen_guest_handle((_d_)->regions, regions); \

I don't understand why nr_entries has to be a pointer into the
translation area. Can't this be a simple local variable?

> +            } \
> +            else \
> +                set_xen_guest_handle((_d_)->regions, NULL); \
> +        } while (false)
> +
>          XLAT_iommu_op(&nat, &cmp);
>  
> +#undef XLAT_iommu_op_query_reserved_HNDL_regions
> +
>          iommu_op(&nat);
>  
> +        status = nat.status;
> +
> +#define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
> +        do \
> +        { \
> +            if ( !compat_handle_is_null((_d_)->regions) ) \
> +            { \
> +                unsigned int *nr_entries = COMPAT_ARG_XLAT_VIRT_BASE; \
> +                xen_iommu_reserved_region_t *regions = \
> +                    (void *)(nr_entries + 1); \
> +                unsigned int j; \

Without any i in an outer scope, using j is a little unusual (but of
course okay).

> +                \
> +                for ( j = 0; \
> +                      j < min_t(unsigned int, (_d_)->nr_entries, \
> +                                *nr_entries); \

Do you really need min_t() here (rather than the more safe min())?

> +                      j++ ) \
> +                { \
> +                    compat_iommu_reserved_region_t region; \
> +                    \
> +                    XLAT_iommu_reserved_region(&region, &regions[j]); \
> +                    \
> +                    if ( __copy_to_compat_offset((_d_)->regions, j, \
> +                                                 &region, 1) ) \

If you use the __-prefixed variant here, where's the address
validity check?

> --- a/xen/include/public/iommu_op.h
> +++ b/xen/include/public/iommu_op.h
> @@ -25,11 +25,46 @@
>  
>  #include "xen.h"
>  
> +typedef unsigned long xen_bfn_t;

Is this suitable for e.g. ARM, who don't use unsigned long for e.g.
xen_pfn_t? Is there in fact any reason not to re-use the generic
xen_pfn_t here (also see your get_rdm() above)? Otoh this is an
opportunity to not widen the problem of limited addressability in
32-bit guests - the type could be 64-bit wide across the board.

> +struct xen_iommu_reserved_region {
> +    xen_bfn_t start_bfn;
> +    unsigned int nr_frames;
> +    unsigned int pad;

Fixed width types (i.e. uint32_t) in the public interface please.
Also, this not being the main MMU, page granularity needs to be
specified somehow (also for the conversion between xen_bfn_t
and a bus address).

> +struct xen_iommu_op_query_reserved {
> +    /*
> +     * IN/OUT - On entries this is the number of entries available
> +     *          in the regions array below.
> +     *          On exit this is the actual number of reserved regions.
> +     */
> +    unsigned int nr_entries;
> +    unsigned int pad;

Same here.

Jan


* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-02-12 10:47 ` [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB Paul Durrant
  2018-02-13  6:55   ` Tian, Kevin
@ 2018-03-19 15:11   ` Jan Beulich
  2018-03-19 15:34     ` Paul Durrant
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-19 15:11 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> This patch adds iommu_ops to allow a domain with control_iommu privilege
> to map and unmap pages from any guest over which it has mapping privilege
> in the IOMMU.
> These operations implicitly disable IOTLB flushing so that the caller can
> batch operations and then explicitly flush the IOTLB using the iommu_op
> also added by this patch.

Can't this be abused for unmaps?

> --- a/xen/arch/x86/iommu_op.c
> +++ b/xen/arch/x86/iommu_op.c
> @@ -24,6 +24,174 @@
>  #include <xen/hypercall.h>
>  #include <xen/iommu.h>
>  
> +/* Override macros from asm/page.h to make them work with mfn_t */
> +#undef mfn_to_page
> +#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
> +#undef page_to_mfn
> +#define page_to_mfn(page) _mfn(__page_to_mfn(page))

> I guess with Julien's series this needs to go away, but it looks
> like that series hasn't landed yet.

> +struct check_rdm_ctxt {
> +    bfn_t bfn;
> +};
> +
> +static int check_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)

uint32_t

> +{
> +    struct check_rdm_ctxt *ctxt = arg;
> +
> +    if ( bfn_x(ctxt->bfn) >= start &&
> +         bfn_x(ctxt->bfn) < start + nr )
> +        return -EINVAL;

Something more distinguishable than EINVAL would certainly be
nice here. Also how come this check does not depend on the
domain? Only RMRRs of devices owned by a domain are relevant
in the BFN range (unless I still didn't fully understand how BFN is
meant to be different from GFN and MFN).

> +static int iommuop_map(struct xen_iommu_op_map *op, unsigned int flags)
> +{
> +    struct domain *d, *od, *currd = current->domain;
> +    struct domain_iommu *iommu = dom_iommu(currd);
> +    const struct iommu_ops *ops = iommu->platform_ops;
> +    domid_t domid = op->domid;
> +    gfn_t gfn = _gfn(op->gfn);
> +    bfn_t bfn = _bfn(op->bfn);
> +    mfn_t mfn;
> +    struct check_rdm_ctxt ctxt = {
> +        .bfn = bfn,
> +    };
> +    p2m_type_t p2mt;
> +    p2m_query_t p2mq;
> +    struct page_info *page;
> +    unsigned int prot;
> +    int rc;
> +
> +    if (op->pad0 != 0 || op->pad1 != 0)

Missing blanks again (and please again consider dropping the " != 0").

> +        return -EINVAL;
> +
> +    /*
> +     * Both map_page and lookup_page operations must be implemented.
> +     * The lookup_page method is not used here but is relied upon by
> +     * iommuop_unmap() to drop the page reference taken here.
> +     */
> +    if ( !ops->map_page || !ops->lookup_page )
> +        return -ENOSYS;

EOPNOTSUPP (also further down)

Also how about the unmap hook? If that's not implemented, how
would the page ref obtained below ever be dropped again? Or
you may need to re-order the unmap side code.

> +    /* Check whether the specified BFN falls in a reserved region */
> +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
> +    if ( rc )
> +        return rc;
> +
> +    d = rcu_lock_domain_by_any_id(domid);
> +    if ( !d )
> +        return -ESRCH;
> +
> +    p2mq = (flags & XEN_IOMMUOP_map_readonly) ?
> +        P2M_UNSHARE : P2M_ALLOC;

Isn't this the wrong way round?

> +    page = get_page_from_gfn(d, gfn_x(gfn), &p2mt, p2mq);
> +
> +    rc = -ENOENT;
> +    if ( !page )
> +        goto unlock;
> +
> +    if ( p2m_is_paged(p2mt) )
> +    {
> +        p2m_mem_paging_populate(d, gfn_x(gfn));
> +        goto release;
> +    }
> +
> +    if ( (p2mq & P2M_UNSHARE) && p2m_is_shared(p2mt) )
> +        goto release;

Same for this check then?

> +    /*
> +     * Make sure the page is RAM and, if it is read-only, that the
> +     * read-only flag is present.
> +     */
> +    rc = -EPERM;
> +    if ( !p2m_is_any_ram(p2mt) ||
> +         (p2m_is_readonly(p2mt) && !(flags & XEN_IOMMUOP_map_readonly)) )
> +        goto release;

Don't you also need to obtain a PGT_writable reference in the
"not r/o" case?

> +    /*
> +     * If the calling domain does not own the page then make sure it
> +     * has mapping privilege over the page owner.
> +     */
> +    od = page_get_owner(page);
> +    if ( od != currd )
> +    {
> +        rc = xsm_domain_memory_map(XSM_TARGET, od);
> +        if ( rc )
> +            goto release;
> +    }

With XSM_TARGET I don't see the point of the if() around here.
Perhaps simply

        rc = xsm_domain_memory_map(XSM_TARGET, page_get_owner(page));

?

> +static int iommuop_unmap(struct xen_iommu_op_unmap *op)
> +{
> +    struct domain *currd = current->domain;
> +    struct domain_iommu *iommu = dom_iommu(currd);
> +    const struct iommu_ops *ops = iommu->platform_ops;
> +    bfn_t bfn = _bfn(op->bfn);
> +    mfn_t mfn;
> +    struct check_rdm_ctxt ctxt = {
> +        .bfn = bfn,
> +    };
> +    unsigned int flags;
> +    struct page_info *page;
> +    int rc;
> +
> +    /*
> +     * Both unmap_page and lookup_page operations must be implemented.
> +     */

Single line comment (there are more below).

> +    if ( !ops->unmap_page || !ops->lookup_page )
> +        return -ENOSYS;
> +
> +    /* Check whether the specified BFN falls in a reserved region */
> +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
> +    if ( rc )
> +        return rc;
> +
> +    if ( ops->lookup_page(currd, bfn, &mfn, &flags) ||
> +         !mfn_valid(mfn) )
> +        return -ENOENT;
> +
> +    page = mfn_to_page(mfn);
> +
> +    if ( ops->unmap_page(currd, bfn) )
> +        return -EIO;

How are you making sure this is a mapping that was established via
the map op? Without that this can be (ab)used to ...

> +    put_page(page);

... underflow the refcount of a page.

> +    return 0;

Blank line above here please.

> @@ -101,6 +269,22 @@ static void iommu_op(xen_iommu_op_t *op)
>          op->status = iommuop_query_reserved(&op->u.query_reserved);
>          break;
>  
> +    case XEN_IOMMUOP_map:
> +        this_cpu(iommu_dont_flush_iotlb) = 1;
> +        op->status = iommuop_map(&op->u.map, op->flags);
> +        this_cpu(iommu_dont_flush_iotlb) = 0;

true/false would be better in new code, even if the type of the
variable is still bool_t.

> --- a/xen/include/public/iommu_op.h
> +++ b/xen/include/public/iommu_op.h
> @@ -57,13 +57,50 @@ struct xen_iommu_op_query_reserved {
>      XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
>  };
>  
> +/*
> + * XEN_IOMMUOP_map: Map a page in the IOMMU.
> + */
> +#define XEN_IOMMUOP_map 2
> +
> +struct xen_iommu_op_map {
> +    /* IN - The IOMMU frame number which will hold the new mapping */
> +    xen_bfn_t bfn;
> +    /* IN - The guest frame number of the page to be mapped */
> +    xen_pfn_t gfn;
> +    /* IN - The domid of the guest */

"... owning the page"

> +    domid_t domid;
> +    unsigned short pad0;
> +    unsigned int pad1;
> +};

> No built-in batching here? Also, fixed-width types again please.

> +/*
> + * XEN_IOMMUOP_flush: Flush the IOMMU TLB.
> + */
> +#define XEN_IOMMUOP_flush 4

No inputs here at all makes this a rather simple interface, but makes
single-page updates quite expensive.

>  struct xen_iommu_op {
>      uint16_t op;
>      uint16_t flags; /* op specific flags */
> +
> +#define _XEN_IOMMUOP_map_readonly 0
> +#define XEN_IOMMUOP_map_readonly (1 << (_XEN_IOMMUOP_map_readonly))

Perhaps better have this next to the map op?

Jan


* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-02-12 10:47 ` [PATCH 6/7] x86: add iommu_op to query reserved ranges Paul Durrant
  2018-02-13  6:51   ` Tian, Kevin
  2018-03-19 14:10   ` Jan Beulich
@ 2018-03-19 15:13   ` Jan Beulich
  2018-03-19 15:36     ` Paul Durrant
  2 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-19 15:13 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, xen-devel

>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> --- a/xen/arch/x86/iommu_op.c
> +++ b/xen/arch/x86/iommu_op.c
> @@ -22,6 +22,58 @@
>  #include <xen/event.h>
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
> +#include <xen/iommu.h>
> +
> +struct get_rdm_ctxt {
> +    unsigned int max_entries;
> +    unsigned int nr_entries;
> +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> +};
> +
> +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> +{
> +    struct get_rdm_ctxt *ctxt = arg;
> +
> +    if ( ctxt->nr_entries < ctxt->max_entries )
> +    {
> +        xen_iommu_reserved_region_t region = {
> +            .start_bfn = start,
> +            .nr_frames = nr,
> +        };
> +
> +        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
> +                                  1) )
> +            return -EFAULT;
> +    }
> +
> +    ctxt->nr_entries++;
> +
> +    return 1;
> +}
> +
> +static int iommuop_query_reserved(struct xen_iommu_op_query_reserved *op)
> +{
> +    struct get_rdm_ctxt ctxt = {
> +        .max_entries = op->nr_entries,
> +        .regions = op->regions,
> +    };
> +    int rc;
> +
> +    if (op->pad != 0)
> +        return -EINVAL;
> +
> +    rc = iommu_get_reserved_device_memory(get_rdm, &ctxt);
> +    if ( rc )
> +        return rc;
> +
> +    /* Pass back the actual number of reserved regions */
> +    op->nr_entries = ctxt.nr_entries;
> +
> +    if ( ctxt.nr_entries > ctxt.max_entries )
> +        return -ENOBUFS;
> +
> +    return 0;
> +}

One more note here: As it looks we can only hope there won't be
too many RMRRs, as the number of entries that can be requested
here is basically unbounded.

Jan



* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-03-19 14:10   ` Jan Beulich
@ 2018-03-19 15:13     ` Paul Durrant
  2018-03-19 16:30       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-19 15:13 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 19 March 2018 14:10
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org; Konrad Rzeszutek
> Wilk <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/iommu_op.c
> > +++ b/xen/arch/x86/iommu_op.c
> > @@ -22,6 +22,58 @@
> >  #include <xen/event.h>
> >  #include <xen/guest_access.h>
> >  #include <xen/hypercall.h>
> > +#include <xen/iommu.h>
> > +
> > +struct get_rdm_ctxt {
> > +    unsigned int max_entries;
> > +    unsigned int nr_entries;
> > +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> > +};
> > +
> > +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> 
> uint32_t please in new code.

Ok.

> 
> > +static int iommuop_query_reserved(struct
> xen_iommu_op_query_reserved *op)
> > +{
> > +    struct get_rdm_ctxt ctxt = {
> > +        .max_entries = op->nr_entries,
> > +        .regions = op->regions,
> > +    };
> > +    int rc;
> > +
> > +    if (op->pad != 0)
> 
> Missing blanks. Perhaps also drop the " != 0".
> 

Indeed.

> > +        return -EINVAL;
> > +
> > +    rc = iommu_get_reserved_device_memory(get_rdm, &ctxt);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    /* Pass back the actual number of reserved regions */
> > +    op->nr_entries = ctxt.nr_entries;
> > +
> > +    if ( ctxt.nr_entries > ctxt.max_entries )
> > +        return -ENOBUFS;
> 
> Perhaps unless the handle is null?
> 

Hmm. I'll re-work my Linux code and try that.
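
I.e. the exit path might become something like (sketch):

    /* Pass back the actual number of reserved regions */
    op->nr_entries = ctxt.nr_entries;

    /* A NULL handle is taken as a request for the entry count only */
    if ( !guest_handle_is_null(ctxt.regions) &&
         ctxt.nr_entries > ctxt.max_entries )
        return -ENOBUFS;

    return 0;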

> > @@ -132,12 +190,75 @@ int
> compat_iommu_op(XEN_GUEST_HANDLE_PARAM(compat_iommu_op_t)
> uops,
> >              break;
> >          }
> >
> > +        /*
> > +         * The xlat magic doesn't quite know how to handle the union so
> > +         * we need to fix things up here.
> > +         */
> 
> That's quite sad, as this is the second instance in a relatively short
> period of time. We really should see whether the translation code
> can't be adjusted suitably.
> 
> > +#define XLAT_iommu_op_u_query_reserved
> XEN_IOMMUOP_query_reserved
> > +        u = cmp.op;
> > +
> > +#define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
> > +        do \
> > +        { \
> > +            if ( !compat_handle_is_null((_s_)->regions) ) \
> 
> In the context of the earlier missing null handle check I find this
> a little surprising (but correct).
> 
> > +            { \
> > +                unsigned int *nr_entries = COMPAT_ARG_XLAT_VIRT_BASE; \
> > +                xen_iommu_reserved_region_t *regions = \
> > +                    (void *)(nr_entries + 1); \
> > +                \
> > +                if ( sizeof(*nr_entries) + \
> > +                     (sizeof(*regions) * (_s_)->nr_entries) > \
> > +                     COMPAT_ARG_XLAT_SIZE ) \
> > +                    return -E2BIG; \
> > +                \
> > +                *nr_entries = (_s_)->nr_entries; \
> > +                set_xen_guest_handle((_d_)->regions, regions); \
> 
> I don't understand why nr_entries has to be a pointer into the
> translation area. Can't this be a simple local variable?
> 

Probably. On the face of it, it looks like a stack variable should be fine. I'll check.

> > +            } \
> > +            else \
> > +                set_xen_guest_handle((_d_)->regions, NULL); \
> > +        } while (false)
> > +
> >          XLAT_iommu_op(&nat, &cmp);
> >
> > +#undef XLAT_iommu_op_query_reserved_HNDL_regions
> > +
> >          iommu_op(&nat);
> >
> > +        status = nat.status;
> > +
> > +#define XLAT_iommu_op_query_reserved_HNDL_regions(_d_, _s_) \
> > +        do \
> > +        { \
> > +            if ( !compat_handle_is_null((_d_)->regions) ) \
> > +            { \
> > +                unsigned int *nr_entries = COMPAT_ARG_XLAT_VIRT_BASE; \
> > +                xen_iommu_reserved_region_t *regions = \
> > +                    (void *)(nr_entries + 1); \
> > +                unsigned int j; \
> 
> Without any i in an outer scope, using j is a little unusual (but of
> course okay).
> 

Oh, that may have been a hangover from a previous incarnation of the code. I'll change it if there's no clash.

> > +                \
> > +                for ( j = 0; \
> > +                      j < min_t(unsigned int, (_d_)->nr_entries, \
> > +                                *nr_entries); \
> 
> Do you really need min_t() here (rather than the more safe min())?
> 

I've been asked to preferentially use min_t() before (although I don't think it was by you) so I'm not sure what the expectation is. I'm happy to use min().

> > +                      j++ ) \
> > +                { \
> > +                    compat_iommu_reserved_region_t region; \
> > +                    \
> > +                    XLAT_iommu_reserved_region(&region, &regions[j]); \
> > +                    \
> > +                    if ( __copy_to_compat_offset((_d_)->regions, j, \
> > +                                                 &region, 1) ) \
> 
> If you use the __-prefixed variant here, where's the address
> validity check?
> 

I thought it was validated on the way in but maybe I missed that.

> > --- a/xen/include/public/iommu_op.h
> > +++ b/xen/include/public/iommu_op.h
> > @@ -25,11 +25,46 @@
> >
> >  #include "xen.h"
> >
> > +typedef unsigned long xen_bfn_t;
> 
> Is this suitable for e.g. ARM, who don't use unsigned long for e.g.
> xen_pfn_t? Is there in fact any reason not to re-use the generic
> xen_pfn_t here (also see your get_rdm() above)? Otoh this is an
> opportunity to not widen the problem of limited addressability in
> 32-bit guests - the type could be 64-bit wide across the board.
> 

A fixed 64-bit type should mean I can lose the compat code so I'd be happy with that.
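
I.e. simply (sketch):

    typedef uint64_t xen_bfn_t;

which would keep the ABI identical for 32-bit and 64-bit guests.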

> > +struct xen_iommu_reserved_region {
> > +    xen_bfn_t start_bfn;
> > +    unsigned int nr_frames;
> > +    unsigned int pad;
> 
> Fixed width types (i.e. uint32_t) in the public interface please.
> Also, this not being the main MMU, page granularity needs to be
> specified somehow (also for the conversion between xen_bfn_t
> and a bus address).
> 

Do you think it would be better to have a separate query call to get the IOMMU page size back, or are you anticipating heterogeneous ranges (in which case I'm going to need to adjust the map and unmap functions to allow for that)?

  Paul

> > +struct xen_iommu_op_query_reserved {
> > +    /*
> > +     * IN/OUT - On entry this is the number of entries available
> > +     *          in the regions array below.
> > +     *          On exit this is the actual number of reserved regions.
> > +     */
> > +    unsigned int nr_entries;
> > +    unsigned int pad;
> 
> Same here.
> 
> Jan


* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-03-19 15:11   ` Jan Beulich
@ 2018-03-19 15:34     ` Paul Durrant
  2018-03-19 16:49       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-19 15:34 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 19 March 2018 15:12
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org; Konrad Rzeszutek
> Wilk <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and
> also to flush the IOTLB
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > This patch adds iommu_ops to allow a domain with control_iommu
> privilege
> > to map and unmap pages from any guest over which it has mapping
> privilege
> > in the IOMMU.
> > These operations implicitly disable IOTLB flushing so that the caller can
> > batch operations and then explicitly flush the IOTLB using the iommu_op
> > also added by this patch.
> 
> Can't this be abused for unmaps?

Hmm. I think we're ok. The calls just play with the CPU local flush disable flag so they should only disable anything resulting from the current hypercall. Manipulation of other IOMMU page tables (on behalf of other domains) should not be affected AFAICT. I'll double check though.

> 
> > --- a/xen/arch/x86/iommu_op.c
> > +++ b/xen/arch/x86/iommu_op.c
> > @@ -24,6 +24,174 @@
> >  #include <xen/hypercall.h>
> >  #include <xen/iommu.h>
> >
> > +/* Override macros from asm/page.h to make them work with mfn_t */
> > +#undef mfn_to_page
> > +#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
> > +#undef page_to_mfn
> > +#define page_to_mfn(page) _mfn(__page_to_mfn(page))
> 
> I guess with Julien's series this needs to go away, but it looks
> like that series hasn't landed yet.
> 

Yes, I'll remove this once that happens.

> > +struct check_rdm_ctxt {
> > +    bfn_t bfn;
> > +};
> > +
> > +static int check_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> 
> uint32_t
> 

Yep.

> > +{
> > +    struct check_rdm_ctxt *ctxt = arg;
> > +
> > +    if ( bfn_x(ctxt->bfn) >= start &&
> > +         bfn_x(ctxt->bfn) < start + nr )
> > +        return -EINVAL;
> 
> Something more distinguishable than EINVAL would certainly be
> nice here. Also how come this check does not depend on the
> domain? Only RMRRs of devices owned by a domain are relevant
> in the BFN range (unless I still didn't fully understand how BFN is
> meant to be different from GFN and MFN).
> 

I thought that the reserved range check was only for the current domain's mappings (optionally limited to a single initiator), but I could be wrong. I'll check.

> > +static int iommuop_map(struct xen_iommu_op_map *op, unsigned int
> flags)
> > +{
> > +    struct domain *d, *od, *currd = current->domain;
> > +    struct domain_iommu *iommu = dom_iommu(currd);
> > +    const struct iommu_ops *ops = iommu->platform_ops;
> > +    domid_t domid = op->domid;
> > +    gfn_t gfn = _gfn(op->gfn);
> > +    bfn_t bfn = _bfn(op->bfn);
> > +    mfn_t mfn;
> > +    struct check_rdm_ctxt ctxt = {
> > +        .bfn = bfn,
> > +    };
> > +    p2m_type_t p2mt;
> > +    p2m_query_t p2mq;
> > +    struct page_info *page;
> > +    unsigned int prot;
> > +    int rc;
> > +
> > +    if (op->pad0 != 0 || op->pad1 != 0)
> 
> Missing blanks again (and please again consider dropping the " != 0").
> 
> > +        return -EINVAL;
> > +
> > +    /*
> > +     * Both map_page and lookup_page operations must be implemented.
> > +     * The lookup_page method is not used here but is relied upon by
> > +     * iommuop_unmap() to drop the page reference taken here.
> > +     */
> > +    if ( !ops->map_page || !ops->lookup_page )
> > +        return -ENOSYS;
> 
> EOPNOTSUPP (also further down)
> 

I wanted the 'not implemented' case to be distinct from the 'not supported because of some configuration detail' case, which is why I chose ENOSYS. I'll change it if you don't think that matters though.

> Also how about the unmap hook? If that's not implemented, how
> would the page ref obtained below ever be dropped again? Or
> you may need to re-order the unmap side code.

Ok. I'll just check for all map, unmap and lookup in both cases.
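
I.e. the combined check would be (sketch):

    if ( !ops->map_page || !ops->unmap_page || !ops->lookup_page )
        return -EOPNOTSUPP;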

> 
> > +    /* Check whether the specified BFN falls in a reserved region */
> > +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    d = rcu_lock_domain_by_any_id(domid);
> > +    if ( !d )
> > +        return -ESRCH;
> > +
> > +    p2mq = (flags & XEN_IOMMUOP_map_readonly) ?
> > +        P2M_UNSHARE : P2M_ALLOC;
> 
> Isn't this the wrong way round?
> 

I don't think so. If we're doing a readonly mapping then the page should not be forcibly populated, right?

> > +    page = get_page_from_gfn(d, gfn_x(gfn), &p2mt, p2mq);
> > +
> > +    rc = -ENOENT;
> > +    if ( !page )
> > +        goto unlock;
> > +
> > +    if ( p2m_is_paged(p2mt) )
> > +    {
> > +        p2m_mem_paging_populate(d, gfn_x(gfn));
> > +        goto release;
> > +    }
> > +
> > +    if ( (p2mq & P2M_UNSHARE) && p2m_is_shared(p2mt) )
> > +        goto release;
> 
> Same for this check then?
> 

I'm confused.

> > +    /*
> > +     * Make sure the page is RAM and, if it is read-only, that the
> > +     * read-only flag is present.
> > +     */
> > +    rc = -EPERM;
> > +    if ( !p2m_is_any_ram(p2mt) ||
> > +         (p2m_is_readonly(p2mt) && !(flags &
> XEN_IOMMUOP_map_readonly)) )
> > +        goto release;
> 
> Don't you also need to obtain a PGT_writable reference in the
> "not r/o" case?
> 

I'll check the logic again.

> > +    /*
> > +     * If the calling domain does not own the page then make sure it
> > +     * has mapping privilege over the page owner.
> > +     */
> > +    od = page_get_owner(page);
> > +    if ( od != currd )
> > +    {
> > +        rc = xsm_domain_memory_map(XSM_TARGET, od);
> > +        if ( rc )
> > +            goto release;
> > +    }
> 
> With XSM_TARGET I don't see the point of the if() around here.
> Perhaps simply
> 
>         rc = xsm_domain_memory_map(XSM_TARGET,
> page_get_owner(page));
> 
> ?

I wasn't sure the test was valid if the current domain was the owner.

> 
> > +static int iommuop_unmap(struct xen_iommu_op_unmap *op)
> > +{
> > +    struct domain *currd = current->domain;
> > +    struct domain_iommu *iommu = dom_iommu(currd);
> > +    const struct iommu_ops *ops = iommu->platform_ops;
> > +    bfn_t bfn = _bfn(op->bfn);
> > +    mfn_t mfn;
> > +    struct check_rdm_ctxt ctxt = {
> > +        .bfn = bfn,
> > +    };
> > +    unsigned int flags;
> > +    struct page_info *page;
> > +    int rc;
> > +
> > +    /*
> > +     * Both unmap_page and lookup_page operations must be
> implemented.
> > +     */
> 
> Single line comment (there are more below).
> 

Ok.

> > +    if ( !ops->unmap_page || !ops->lookup_page )
> > +        return -ENOSYS;
> > +
> > +    /* Check whether the specified BFN falls in a reserved region */
> > +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( ops->lookup_page(currd, bfn, &mfn, &flags) ||
> > +         !mfn_valid(mfn) )
> > +        return -ENOENT;
> > +
> > +    page = mfn_to_page(mfn);
> > +
> > +    if ( ops->unmap_page(currd, bfn) )
> > +        return -EIO;
> 
> How are you making sure this is a mapping that was established via
> the map op? Without that this can be (ab)used to ...
> 
> > +    put_page(page);
> 
> ... underflow the refcount of a page.
> 

Yes, I guess I need to ensure that only non-RAM (i.e. RMRR and E820 reserved areas) are mapped through the IOMMU or this could indeed be abused.

> > +    return 0;
> 
> Blank line above here please.
> 

Ok.

> > @@ -101,6 +269,22 @@ static void iommu_op(xen_iommu_op_t *op)
> >          op->status = iommuop_query_reserved(&op->u.query_reserved);
> >          break;
> >
> > +    case XEN_IOMMUOP_map:
> > +        this_cpu(iommu_dont_flush_iotlb) = 1;
> > +        op->status = iommuop_map(&op->u.map, op->flags);
> > +        this_cpu(iommu_dont_flush_iotlb) = 0;
> 
> true/false would be better in new code, even if the type of the
> variable is still bool_t.

Ok.

> 
> > --- a/xen/include/public/iommu_op.h
> > +++ b/xen/include/public/iommu_op.h
> > @@ -57,13 +57,50 @@ struct xen_iommu_op_query_reserved {
> >      XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> >  };
> >
> > +/*
> > + * XEN_IOMMUOP_map: Map a page in the IOMMU.
> > + */
> > +#define XEN_IOMMUOP_map 2
> > +
> > +struct xen_iommu_op_map {
> > +    /* IN - The IOMMU frame number which will hold the new mapping */
> > +    xen_bfn_t bfn;
> > +    /* IN - The guest frame number of the page to be mapped */
> > +    xen_pfn_t gfn;
> > +    /* IN - The domid of the guest */
> 
> "... owning the page"
> 

Not necessarily. If the page has been grant- or foreign-mapped by the domain then I need this to work.

> > +    domid_t domid;
> > +    unsigned short pad0;
> > +    unsigned int pad1;
> > +};
> 
> No built-in batching here? Also, fixed-width types again please.
> 

It's a multi-op hypercall so the batching is done at that level.

> > +/*
> > + * XEN_IOMMUOP_flush: Flush the IOMMU TLB.
> > + */
> > +#define XEN_IOMMUOP_flush 4
> 
> No inputs here at all makes this a rather simple interface, but makes
> single-page updates quite expensive.
> 

Ok. I guess I could make the flush more specific even if the underlying implementation doesn't support that granularity.
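
A flush op taking a range might then look like (field layout purely
illustrative):

    struct xen_iommu_op_flush {
        /* IN - IOMMU frame number at the start of the range to flush */
        xen_bfn_t bfn;
        /* IN - Number of frames to flush (0 means flush everything) */
        uint32_t nr_frames;
        uint32_t pad;
    };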

> >  struct xen_iommu_op {
> >      uint16_t op;
> >      uint16_t flags; /* op specific flags */
> > +
> > +#define _XEN_IOMMUOP_map_readonly 0
> > +#define XEN_IOMMUOP_map_readonly (1 <<
> (_XEN_IOMMUOP_map_readonly))
> 
> Perhaps better have this next to the map op?
> 

Ok.

  Paul

> Jan


* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-03-19 15:13   ` Jan Beulich
@ 2018-03-19 15:36     ` Paul Durrant
  2018-03-19 16:31       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-19 15:36 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 19 March 2018 15:14
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org; Konrad Rzeszutek
> Wilk <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/iommu_op.c
> > +++ b/xen/arch/x86/iommu_op.c
> > @@ -22,6 +22,58 @@
> >  #include <xen/event.h>
> >  #include <xen/guest_access.h>
> >  #include <xen/hypercall.h>
> > +#include <xen/iommu.h>
> > +
> > +struct get_rdm_ctxt {
> > +    unsigned int max_entries;
> > +    unsigned int nr_entries;
> > +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
> > +};
> > +
> > +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
> > +{
> > +    struct get_rdm_ctxt *ctxt = arg;
> > +
> > +    if ( ctxt->nr_entries < ctxt->max_entries )
> > +    {
> > +        xen_iommu_reserved_region_t region = {
> > +            .start_bfn = start,
> > +            .nr_frames = nr,
> > +        };
> > +
> > +        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
> > +                                  1) )
> > +            return -EFAULT;
> > +    }
> > +
> > +    ctxt->nr_entries++;
> > +
> > +    return 1;
> > +}
> > +
> > +static int iommuop_query_reserved(struct
> xen_iommu_op_query_reserved *op)
> > +{
> > +    struct get_rdm_ctxt ctxt = {
> > +        .max_entries = op->nr_entries,
> > +        .regions = op->regions,
> > +    };
> > +    int rc;
> > +
> > +    if (op->pad != 0)
> > +        return -EINVAL;
> > +
> > +    rc = iommu_get_reserved_device_memory(get_rdm, &ctxt);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    /* Pass back the actual number of reserved regions */
> > +    op->nr_entries = ctxt.nr_entries;
> > +
> > +    if ( ctxt.nr_entries > ctxt.max_entries )
> > +        return -ENOBUFS;
> > +
> > +    return 0;
> > +}
> 
> One more note here: As it looks we can only hope there won't be
> too many RMRRs, as the number of entries that can be requested
> here is basically unbounded.
> 

The caller has to be able to allocate a large enough buffer but, yes, there is no explicit limit. I'll add pre-empt checks.

  Paul

> Jan



* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-03-19 15:13     ` Paul Durrant
@ 2018-03-19 16:30       ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-03-19 16:30 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

>>> On 19.03.18 at 16:13, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 19 March 2018 14:10
>> 
>> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>> > +                for ( j = 0; \
>> > +                      j < min_t(unsigned int, (_d_)->nr_entries, \
>> > +                                *nr_entries); \
>> 
>> Do you really need min_t() here (rather than the more safe min())?
>> 
> 
> I've been asked to preferentially use min_t() before (although I don't think 
> it was by you) so I'm not sure what the expectation is. I'm happy to use 
> min().

I'd be curious who that was and why. The type check in min() makes
it preferable to use whenever the two types are (supposed to be)
compatible.
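
By way of illustration (not from the patch):

  unsigned int  nr    = 16;
  unsigned long limit = 8;
  unsigned int  x;

  x = min(nr, nr / 2);                 /* fine: identical types */
  x = min(nr, limit);                  /* complains at build time: types differ */
  x = min_t(unsigned int, nr, limit);  /* compiles, silently truncating 'limit' */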

>> > +struct xen_iommu_reserved_region {
>> > +    xen_bfn_t start_bfn;
>> > +    unsigned int nr_frames;
>> > +    unsigned int pad;
>> 
>> Fixed width types (i.e. uint32_t) in the public interface please.
>> Also, this not being the main MMU, page granularity needs to be
>> specified somehow (also for the conversion between xen_bfn_t
>> and a bus address).
>> 
> 
> Do you think it would be better to have a separate query call to get the 
> IOMMU page size back, or are you anticipating heterogeneous ranges (in which 
> case I'm going to need to adjust the map and unmap functions to allow for 
> that)?

Fundamentally (on x86) I can't see why we couldn't eventually
permit 2M and 1G mappings to be established this way. For
RMRRs I don't know how large they can get. I think I've seen
larger than single pages ones for graphics devices, but I don't
recall how big they were, or whether they were suitable aligned
to allow large page mappings.

But we should also have ARM (and ideally make this abstract
enough to even fit other architectures) in mind. Also remember
that someone already has a series somewhere to extend the
iommu_{,un}map_page() functions with an order parameter.
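
For reference, that order-extended interface would presumably take a
shape like (a sketch, not the actual series):

  int iommu_map_page(struct domain *d, bfn_t bfn, mfn_t mfn,
                     unsigned int order, unsigned int flags);
  int iommu_unmap_page(struct domain *d, bfn_t bfn, unsigned int order);

with order 0 being a 4k page, 9 a 2M superpage and 18 a 1G one on x86.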

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 6/7] x86: add iommu_op to query reserved ranges
  2018-03-19 15:36     ` Paul Durrant
@ 2018-03-19 16:31       ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-03-19 16:31 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

>>> On 19.03.18 at 16:36, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 19 March 2018 15:14
>> 
>> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>> > --- a/xen/arch/x86/iommu_op.c
>> > +++ b/xen/arch/x86/iommu_op.c
>> > @@ -22,6 +22,58 @@
>> >  #include <xen/event.h>
>> >  #include <xen/guest_access.h>
>> >  #include <xen/hypercall.h>
>> > +#include <xen/iommu.h>
>> > +
>> > +struct get_rdm_ctxt {
>> > +    unsigned int max_entries;
>> > +    unsigned int nr_entries;
>> > +    XEN_GUEST_HANDLE(xen_iommu_reserved_region_t) regions;
>> > +};
>> > +
>> > +static int get_rdm(xen_pfn_t start, xen_ulong_t nr, u32 id, void *arg)
>> > +{
>> > +    struct get_rdm_ctxt *ctxt = arg;
>> > +
>> > +    if ( ctxt->nr_entries < ctxt->max_entries )
>> > +    {
>> > +        xen_iommu_reserved_region_t region = {
>> > +            .start_bfn = start,
>> > +            .nr_frames = nr,
>> > +        };
>> > +
>> > +        if ( copy_to_guest_offset(ctxt->regions, ctxt->nr_entries, &region,
>> > +                                  1) )
>> > +            return -EFAULT;
>> > +    }
>> > +
>> > +    ctxt->nr_entries++;
>> > +
>> > +    return 1;
>> > +}
>> > +
>> > +static int iommuop_query_reserved(struct
>> xen_iommu_op_query_reserved *op)
>> > +{
>> > +    struct get_rdm_ctxt ctxt = {
>> > +        .max_entries = op->nr_entries,
>> > +        .regions = op->regions,
>> > +    };
>> > +    int rc;
>> > +
>> > +    if ( op->pad != 0 )
>> > +        return -EINVAL;
>> > +
>> > +    rc = iommu_get_reserved_device_memory(get_rdm, &ctxt);
>> > +    if ( rc )
>> > +        return rc;
>> > +
>> > +    /* Pass back the actual number of reserved regions */
>> > +    op->nr_entries = ctxt.nr_entries;
>> > +
>> > +    if ( ctxt.nr_entries > ctxt.max_entries )
>> > +        return -ENOBUFS;
>> > +
>> > +    return 0;
>> > +}
>> 
>> One more note here: As it looks we can only hope there won't be
>> too many RMRRs, as the number of entries that can be requested
>> here is basically unbounded.
>> 
> 
> The caller has to be able to allocate a buffer large enough but, yes, there 
> is no explicit limit. I'll add pre-empt checks.

Thing is - preempt check probably won't be easy with the way
iommu_get_reserved_device_memory() works.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-03-19 15:34     ` Paul Durrant
@ 2018-03-19 16:49       ` Jan Beulich
  2018-03-19 16:57         ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-19 16:49 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

>>> On 19.03.18 at 16:34, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 19 March 2018 15:12
>> 
>> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>> > This patch adds iommu_ops to allow a domain with control_iommu
>> privilege
>> > to map and unmap pages from any guest over which it has mapping
>> privilege
>> > in the IOMMU.
>> > These operations implicitly disable IOTLB flushing so that the caller can
>> > batch operations and then explicitly flush the IOTLB using the iommu_op
>> > also added by this patch.
>> 
>> Can't this be abused for unmaps?
> 
> Hmm. I think we're ok. The calls just play with the CPU local flush disable 
> flag so they should only disable anything resulting from the current 
> hypercall. Manipulation of other IOMMU page tables (on behalf of other 
> domains) should not be affected AFAICT. I'll double check though.

Just think about the caller doing an unmap (which drops the page
ref) but never doing a flush. If the dropped ref was the last one,
the page will be freed before the caller even has a chance to issue
a flush.
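
One way to close that window (purely a sketch; the per-domain
'iommu_deferred_pages' list is invented for illustration) would be to
defer the put_page() until the explicit flush:

  static int iommuop_unmap_one(struct domain *currd,
                               const struct iommu_ops *ops,
                               bfn_t bfn, struct page_info *page)
  {
      if ( ops->unmap_page(currd, bfn) )
          return -EIO;

      /* Do NOT put_page() here: park the page until the IOTLB flush,
         so a stale cached translation can never target a freed page. */
      page_list_add_tail(page, &currd->iommu_deferred_pages);

      return 0;
  }

  static int iommuop_flush(struct domain *currd)
  {
      struct page_info *page;
      int rc = iommu_iotlb_flush_all(currd);

      if ( rc )
          return rc;

      /* Only now is it safe to drop the references taken at map time. */
      while ( (page = page_list_remove_head(&currd->iommu_deferred_pages)) )
          put_page(page);

      return 0;
  }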

>> > +    /*
>> > +     * Both map_page and lookup_page operations must be implemented.
>> > +     * The lookup_page method is not used here but is relied upon by
>> > +     * iommuop_unmap() to drop the page reference taken here.
>> > +     */
>> > +    if ( !ops->map_page || !ops->lookup_page )
>> > +        return -ENOSYS;
>> 
>> EOPNOTSUPP (also further down)
>> 
> 
> I wanted the 'not implemented' case to be distinct from the 'not supported 
> because of some configuration detail' case, which is why I chose ENOSYS. I'll 
> change it if you don't think that matters though.

Distinguishing those two cases is perhaps indeed worthwhile, but
for ENOSYS we had the discussion multiple times, and I think we've
finally converged to this being intended to only be returned for
out of range hypercall numbers (not even sub-function ones). Of
course there continue to be many violators, but we'll try to not
allow in new ones.
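
Concretely, for the hunk above that would mean something like:

  /* -ENOSYS is reserved for unknown hypercall numbers; a present
     hypercall whose implementation lacks the needed hooks reports
     -EOPNOTSUPP instead. */
  if ( !ops->map_page || !ops->unmap_page || !ops->lookup_page )
      return -EOPNOTSUPP;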

>> Also how about the unmap hook? If that's not implemented, how
>> would the page ref obtained below ever be dropped again? Or
>> you may need to re-order the unmap side code.
> 
> Ok. I'll just check for all map, unmap and lookup in both cases.

Well, the unmap path probably doesn't need to check the map
hook.

>> > +    /* Check whether the specified BFN falls in a reserved region */
>> > +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
>> > +    if ( rc )
>> > +        return rc;
>> > +
>> > +    d = rcu_lock_domain_by_any_id(domid);
>> > +    if ( !d )
>> > +        return -ESRCH;
>> > +
>> > +    p2mq = (flags & XEN_IOMMUOP_map_readonly) ?
>> > +        P2M_UNSHARE : P2M_ALLOC;
>> 
>> Isn't this the wrong way round?
>> 
> 
> I don't think so. If we're doing a readonly mapping then the page should not 
> be forcibly populated, right?

I view it the other way around - no matter what mapping, the
page should be populated. If it's a writable one, an existing
page also needs to be unshared.

>> > +    page = get_page_from_gfn(d, gfn_x(gfn), &p2mt, p2mq);
>> > +
>> > +    rc = -ENOENT;
>> > +    if ( !page )
>> > +        goto unlock;
>> > +
>> > +    if ( p2m_is_paged(p2mt) )
>> > +    {
>> > +        p2m_mem_paging_populate(d, gfn_x(gfn));
>> > +        goto release;
>> > +    }
>> > +
>> > +    if ( (p2mq & P2M_UNSHARE) && p2m_is_shared(p2mt) )
>> > +        goto release;
>> 
>> Same for this check then?
>> 
> 
> I'm confused.

Actually, if you request UNSHARE, you'll get back a shared type
only together with NULL for the page. See e.g. get_paged_frame()
in common/grant_table.c. There you'll also find an example of the
inverted use of the request types compared to what you have.
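
I.e., modelled on get_paged_frame(), something like:

  /* Writable mappings must unshare; read-only mappings merely need
     the page populated. */
  p2mq = (flags & XEN_IOMMUOP_map_readonly) ? P2M_ALLOC : P2M_UNSHARE;

  page = get_page_from_gfn(d, gfn_x(gfn), &p2mt, p2mq);
  if ( !page )  /* also covers a shared type coming back with no page */
      return -ENOENT;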

>> > +    if ( !ops->unmap_page || !ops->lookup_page )
>> > +        return -ENOSYS;
>> > +
>> > +    /* Check whether the specified BFN falls in a reserved region */
>> > +    rc = iommu_get_reserved_device_memory(check_rdm, &ctxt);
>> > +    if ( rc )
>> > +        return rc;
>> > +
>> > +    if ( ops->lookup_page(currd, bfn, &mfn, &flags) ||
>> > +         !mfn_valid(mfn) )
>> > +        return -ENOENT;
>> > +
>> > +    page = mfn_to_page(mfn);
>> > +
>> > +    if ( ops->unmap_page(currd, bfn) )
>> > +        return -EIO;
>> 
>> How are you making sure this is a mapping that was established via
>> the map op? Without that this can be (ab)used to ...
>> 
>> > +    put_page(page);
>> 
>> ... underflow the refcount of a page.
>> 
> 
> Yes, I guess I need to ensure that only non-RAM (i.e. RMRR and E820 reserved 
> areas) are mapped through the IOMMU or this could indeed be abused.

Now I'm confused - then you don't need to deal with struct page_info
and page references at all. Nor would you need to call
get_page_from_gfn() and check p2m_is_any_ram(). Also - what use
would the interface be if you couldn't map any RAM?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-03-19 16:49       ` Jan Beulich
@ 2018-03-19 16:57         ` Paul Durrant
  2018-03-20  8:11           ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-19 16:57 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, xen-devel, Ian Jackson

> -----Original Message-----
[snip]
> >> How are you making sure this is a mapping that was established via
> >> the map op? Without that this can be (ab)used to ...
> >>
> >> > +    put_page(page);
> >>
> >> ... underflow the refcount of a page.
> >>
> >
> > Yes, I guess I need to ensure that only non-RAM (i.e. RMRR and E820
> reserved
> > areas) are mapped through the IOMMU or this could indeed be abused.
> 
> Now I'm confused - then you don't need to deal with struct page_info
> and page references at all. Nor would you need to call
> get_page_from_gfn() and check p2m_is_any_ram(). Also - what use
> would the interface be if you couldn't map any RAM?
> 

Sorry to confuse...

What I meant was that safety (against underflow) is predicated on iommu_lookup_page() failing if the mapping was not established through an iommu op hypercall. So, the only things that should be valid in the iommu (and hence that iommu_lookup_page() would succeed for) at the point where the guest starts to boot must all fall within reserved regions, so that they are ruled out by the earlier check.

  Paul
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-03-19 16:57         ` Paul Durrant
@ 2018-03-20  8:11           ` Jan Beulich
  2018-03-20  9:32             ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-03-20  8:11 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

>>> On 19.03.18 at 17:57, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
> [snip]
>> >> How are you making sure this is a mapping that was established via
>> >> the map op? Without that this can be (ab)used to ...
>> >>
>> >> > +    put_page(page);
>> >>
>> >> ... underflow the refcount of a page.
>> >>
>> >
>> > Yes, I guess I need to ensure that only non-RAM (i.e. RMRR and E820
>> reserved
>> > areas) are mapped through the IOMMU or this could indeed be abused.
>> 
>> Now I'm confused - then you don't need to deal with struct page_info
>> and page references at all. Nor would you need to call
>> get_page_from_gfn() and check p2m_is_any_ram(). Also - what use
>> would the interface be if you couldn't map any RAM?
>> 
> 
> Sorry to confuse...
> 
> What I meant was that safety (against underflow) is predicated on 
> iommu_lookup_page() failing if the mapping was not established through an 
> iommu op hypercall. So, the only things that should be valid in the iommu 
> (and hence that iommu_lookup_page() would succeed for) at the point where the 
> guest starts to boot must all fall within reserved regions, so that they are 
> ruled out by the earlier check.

Ah, I see. What I don't see is how you want to arrange for that.
The tool stack wouldn't know ahead of time whether the guest
wants to use the PV IOMMU interfaces, would it? IOW rather than
guaranteeing said state at start of guest, shouldn't you blow away
all non-special mappings the first time a PV IOMMU request is made?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-03-20  8:11           ` Jan Beulich
@ 2018-03-20  9:32             ` Paul Durrant
  2018-03-20  9:49               ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-03-20  9:32 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 20 March 2018 08:12
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel@lists.xenproject.org; Tim (Xen.org) <tim@xen.org>
> Subject: RE: [Xen-devel] [PATCH 7/7] x86: add iommu_ops to map and
> unmap pages, and also to flush the IOTLB
> 
> >>> On 19.03.18 at 17:57, <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> > [snip]
> >> >> How are you making sure this is a mapping that was established via
> >> >> the map op? Without that this can be (ab)used to ...
> >> >>
> >> >> > +    put_page(page);
> >> >>
> >> >> ... underflow the refcount of a page.
> >> >>
> >> >
> >> > Yes, I guess I need to ensure that only non-RAM (i.e. RMRR and E820
> >> reserved
> >> > areas) are mapped through the IOMMU or this could indeed be abused.
> >>
> >> Now I'm confused - then you don't need to deal with struct page_info
> >> and page references at all. Nor would you need to call
> >> get_page_from_gfn() and check p2m_is_any_ram(). Also - what use
> >> would the interface be if you couldn't map any RAM?
> >>
> >
> > Sorry to confuse...
> >
> > What I meant was that safety (against underflow) is predicated on
> > iommu_lookup_page() failing if the mapping was not established through
> an
> > iommu op hypercall. So, the only things that should be valid in the iommu
> > (and hence that iommu_lookup_page() would succeed for) at the point
> where the
> > guest starts to boot must all fall within reserved regions, so that they are
> > ruled out by the earlier check.
> 
> Ah, I see. What I don't see is how you want to arrange for that.
> The tool stack wouldn't know ahead of time whether the guest
> wants to use the PV IOMMU interfaces, would it? IOW rather than
> guaranteeing said state at start of guest, shouldn't you blow away
> all non-special mappings the first time a PV IOMMU request is made?
> 

I suspect we want both. Kevin suggested a 'big switch' when the domain boots, in which I could blow away all non-reserved mappings. But, for performance's sake, I think it would also be worth a Xen command line option to avoid populating the IOMMU mappings for dom0 in the first place (so when it pulls the 'big switch' it's a no-op). Non-aware dom0s will, of course, probably fail to boot, but whoever is setting the command line for Xen should know what their dom0 is capable of. As for other domains, it may be worth adding a similar domain creation option to the toolstack, but that could be done at a later date.
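
Roughly (every name below is invented; nothing like this exists in the
series yet):

  /* Hypothetical 'big switch': the first PV IOMMU request tears down
     all mappings outside the reserved regions that the query op
     reports, after which the domain manages its own BFN space. */
  static int iommuop_enable(struct domain *currd)
  {
      if ( currd->iommu_pv_enabled )              /* made-up flag */
          return 0;

      iommu_clear_unreserved_mappings(currd);     /* made-up helper */
      currd->iommu_pv_enabled = true;

      return 0;
  }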

  Paul

> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB
  2018-03-20  9:32             ` Paul Durrant
@ 2018-03-20  9:49               ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-03-20  9:49 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel

>>> On 20.03.18 at 10:32, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 20 March 2018 08:12
>> 
>> >>> On 19.03.18 at 17:57, <Paul.Durrant@citrix.com> wrote:
>> > What I meant was that safety (against underflow) is predicated on
>> > iommu_lookup_page() failing if the mapping was not established through
>> an
>> > iommu op hypercall. So, the only things that should be valid in the iommu
>> > (and hence that iommu_lookup_page() would succeed for) at the point
>> where the
>> > guest starts to boot must all fall within reserved regions, so that they are
>> > ruled out by the earlier check.
>> 
>> Ah, I see. What I don't see is how you want to arrange for that.
>> The tool stack wouldn't know ahead of time whether the guest
>> wants to use the PV IOMMU interfaces, would it? IOW rather than
>> guaranteeing said state at start of guest, shouldn't you blow away
>> all non-special mappings the first time a PV IOMMU request is made?
>> 
> 
> I suspect we want both. Kevin suggested a 'big switch' when the domain 
> boots, in which I could blow away all non-reserved mappings. But, for 
> performance's sake, I think it would also be worth a Xen command line option to 
> avoid populating the IOMMU mappings for dom0 in the first place (so when it 
> pulls the 'big switch' it's a no-op). Non-aware dom0s will, of course, probably 
> fail to boot, but whoever is setting the command line for Xen should know what 
> their dom0 is capable of. As for other domains, it may be worth adding a 
> similar domain creation option to the toolstack, but that could be done at a 
> later date.

Oh, yes, options to avoid the entire teardown are certainly going
to be a good thing to have.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-03-16 12:25   ` Jan Beulich
@ 2018-06-07 11:42     ` Paul Durrant
  2018-06-07 13:21       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-06-07 11:42 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Ian Jackson, xen-devel, Daniel De Graaf

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 16 March 2018 12:25
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei Liu
> <wei.liu2@citrix.com>; George Dunlap <George.Dunlap@citrix.com>; Ian
> Jackson <Ian.Jackson@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel@lists.xenproject.org; Konrad Rzeszutek
> Wilk <konrad.wilk@oracle.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>;
> Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
> 
> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/Makefile
> > +++ b/xen/arch/x86/Makefile
> > @@ -33,6 +33,7 @@ obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
> >  obj-y += hypercall.o
> >  obj-y += i387.o
> >  obj-y += i8259.o
> > +obj-y += iommu_op.o
> 
> As mentioned in other contexts, I'd prefer if we stopped using
> underscores in places where dashes (or other separators not
> usable in C identifiers) are fine.
> 

I don't see any guidance in CODING_STYLE or elsewhere, and also the majority of the codebase seems to prefer using underscores in module names. Personally I'd prefer new code remain consistent.

> > --- /dev/null
> > +++ b/xen/arch/x86/iommu_op.c
> > @@ -0,0 +1,169 @@
> >
> +/*********************************************************
> *********************
> > + * x86/iommu_op.c
> > + *
> > + * Paravirtualised IOMMU functionality
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + * Copyright (C) 2018 Citrix Systems Inc
> > + */
> > +
> > +#include <xen/event.h>
> > +#include <xen/guest_access.h>
> > +#include <xen/hypercall.h>
> > +
> > +static bool can_control_iommu(void)
> > +{
> > +    struct domain *currd = current->domain;
> > +
> > +    /*
> > +     * IOMMU mappings cannot be manipulated if:
> > +     * - the IOMMU is not enabled or,
> > +     * - the IOMMU is passed through or,
> 
> "is passed through" isn't really a proper description of what
> iommu_passthrough means, I'm afraid. The description of the
> option says "Control whether to disable DMA remapping for
> Dom0." Perhaps "is bypassed"? But then it would be better
> to qualify the check with is_hardware_domain(), despite you
> restricting things to Dom0 for now anyway.
> 

I think I'm going to add a hypercall for a domain to enable PV IOMMU at start of day, so I'll re-work all this in a separate patch.

> > +     * - shared EPT configured or,
> > +     * - Xen is maintaining an identity map.
> 
> Is this meant to describe ...
> 
> > +     */
> > +    if ( !iommu_enabled || iommu_passthrough ||
> > +         iommu_use_hap_pt(currd) || need_iommu(currd) )
> 
> ... need_iommu() here? How is that implying an identity map?
> 
> > +        return false;
> > +
> > +    return true;
> 
> Please make this a single return statement (with the expression as
> operand).
> 
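Something like this, presumably:

  static bool can_control_iommu(void)
  {
      struct domain *currd = current->domain;

      /* The same four conditions as above, folded into a single return. */
      return iommu_enabled && !iommu_passthrough &&
             !iommu_use_hap_pt(currd) && !need_iommu(currd);
  }
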
> > +long do_iommu_op(XEN_GUEST_HANDLE_PARAM(xen_iommu_op_t)
> uops,
> > +                 unsigned int count)
> > +{
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    rc = xsm_iommu_op(XSM_PRIV, current->domain);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    if ( !can_control_iommu() )
> > +        return -EACCES;
> > +
> > +    for ( i = 0; i < count; i++ )
> > +    {
> > +        xen_iommu_op_t op;
> > +
> > +        if ( ((i & 0xff) == 0xff) && hypercall_preempt_check() )
> > +        {
> > +            rc = i;
> 
> For this to be correct for large enough values of "count", rc needs
> to have long type.

Yes, it does indeed.
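
I.e. (sketch):

  long do_iommu_op(XEN_GUEST_HANDLE_PARAM(xen_iommu_op_t) uops,
                   unsigned int count)
  {
      unsigned int i;
      long rc = 0;   /* long, so 'rc = i' cannot truncate for large count */

      /* ... per-op loop as in the hunk above ... */

      return rc;
  }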

> 
> > +            break;
> > +        }
> > +
> > +        if ( copy_from_guest_offset(&op, uops, i, 1) )
> > +        {
> > +            rc = -EFAULT;
> > +            break;
> > +        }
> > +
> > +        iommu_op(&op);
> > +
> > +        if ( copy_to_guest_offset(uops, i, &op, 1) )
> 
> __copy_to_guest_offset()
> 
> Also do you really need to copy back other than the status?

At this stage, no. I'll restrict it here and it can expand later if need be.
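
So the loop body becomes something like (sketch; the continuation
bookkeeping needed for the advanced handle is elided):

  for ( i = 0; i < count; i++ )
  {
      xen_iommu_op_t op;

      if ( copy_from_guest(&op, uops, 1) )
      {
          rc = -EFAULT;
          break;
      }

      iommu_op(&op);

      /* Copy back only the status field, not the whole structure. */
      if ( __copy_field_to_guest(uops, &op, status) )
      {
          rc = -EFAULT;
          break;
      }

      guest_handle_add_offset(uops, 1);
  }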

> 
> > --- /dev/null
> > +++ b/xen/include/public/iommu_op.h
> > @@ -0,0 +1,55 @@
> > +/*
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> copy
> > + * of this software and associated documentation files (the "Software"),
> to
> > + * deal in the Software without restriction, including without limitation the
> > + * rights to use, copy, modify, merge, publish, distribute, sublicense,
> and/or
> > + * sell copies of the Software, and to permit persons to whom the
> Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included
> in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> KIND, EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
> EVENT SHALL THE
> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
> DAMAGES OR OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
> OR OTHER
> > + * DEALINGS IN THE SOFTWARE.
> > + *
> > + * Copyright (C) 2018 Citrix Systems Inc
> > + */
> > +
> > +#ifndef __XEN_PUBLIC_IOMMU_OP_H__
> > +#define __XEN_PUBLIC_IOMMU_OP_H__
> 
> Please can you avoid introducing further name space violations
> into the public headers?

I assume you mean the leading '__'? Again, I chose the name based on consistency with other code and I'd prefer to remain consistent. Could you explain why having a leading '__' is problematic?

  Paul

> 
> > +#include "xen.h"
> > +
> > +struct xen_iommu_op {
> > +    uint16_t op;
> > +    uint16_t flags; /* op specific flags */
> > +    int32_t status; /* op completion status: */
> > +                    /* 0 for success, otherwise negative errno */
> > +};
> 
> Peeking at patch 6, you need to add the union and a large enough
> placeholder here right away, so that the struct size won't change
> with future additions.
> 
> Jan
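
A fixed-size layout along those lines might be (all sizes purely
illustrative):

  struct xen_iommu_op {
      uint16_t op;
      uint16_t flags;   /* op specific flags */
      int32_t status;   /* op completion status: 0 or negative errno */
      union {
          struct xen_iommu_op_query_reserved query_reserved;
          /* future sub-op structures are added here... */
          uint8_t pad[56];  /* ...under a placeholder fixing the size */
      } u;
  };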

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-06-07 11:42     ` Paul Durrant
@ 2018-06-07 13:21       ` Jan Beulich
  2018-06-07 13:45         ` George Dunlap
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2018-06-07 13:21 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan,
	george.dunlap, Ian Jackson, xen-devel, Daniel de Graaf

>>> On 07.06.18 at 13:42, <Paul.Durrant@citrix.com> wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 16 March 2018 12:25
>> >>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>> > --- a/xen/arch/x86/Makefile
>> > +++ b/xen/arch/x86/Makefile
>> > @@ -33,6 +33,7 @@ obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
>> >  obj-y += hypercall.o
>> >  obj-y += i387.o
>> >  obj-y += i8259.o
>> > +obj-y += iommu_op.o
>> 
>> As mentioned in other contexts, I'd prefer if we stopped using
>> underscores in places where dashes (or other separators not
>> usable in C identifiers) are fine.
> 
> I don't see any guidance in CODING_STYLE or elsewhere, and also the majority 
> of the codebase seems to prefer using underscores in module names. Personally 
> I'd prefer new code remain consistent.

The lack of statement to this effect is why I've said "I'd prefer". See
alternative-asm.h, x86-defns.h, or x86-vendors.h for _recent_
examples of moving into the other direction. On all keyboards I've
seen or used, an underscore requires two keys to be pressed, while
a dash takes only one. This isn't much for an individual instance, but
it sums up. It's the same reason why I'm advocating against the use
of underscores in new command line option names.

In the end, looking at the history of typography, I think the underscore
is a relatively late (and presumably artificial) addition; in particular I
don't recall mechanical typewriters even having a key for it. Its
use as a visual separator is necessary in e.g. programming
languages, as the dash commonly designates the "minus" operator
there. Extending such naming to non-identifiers (file system names
and command line options are just prominent examples) is simply
misguided imo.

>> > --- /dev/null
>> > +++ b/xen/include/public/iommu_op.h
>> > @@ -0,0 +1,55 @@
>> > +/*
>> > + * Permission is hereby granted, free of charge, to any person obtaining a
>> copy
>> > + * of this software and associated documentation files (the "Software"),
>> to
>> > + * deal in the Software without restriction, including without limitation the
>> > + * rights to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or
>> > + * sell copies of the Software, and to permit persons to whom the
>> Software is
>> > + * furnished to do so, subject to the following conditions:
>> > + *
>> > + * The above copyright notice and this permission notice shall be included
>> in
>> > + * all copies or substantial portions of the Software.
>> > + *
>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
>> KIND, EXPRESS OR
>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>> EVENT SHALL THE
>> > + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
>> DAMAGES OR OTHER
>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> OTHERWISE, ARISING
>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
>> OR OTHER
>> > + * DEALINGS IN THE SOFTWARE.
>> > + *
>> > + * Copyright (C) 2018 Citrix Systems Inc
>> > + */
>> > +
>> > +#ifndef __XEN_PUBLIC_IOMMU_OP_H__
>> > +#define __XEN_PUBLIC_IOMMU_OP_H__
>> 
>> Please can you avoid introducing further name space violations
>> into the public headers?
> 
> I assume you mean the leading '__'? Again, I chose the name based on 
> consistency with other code and I'd prefer to remain consistent. Could you 
> explain why having a leading '__' is problematic?

Names starting with double underscores are reserved (as are, btw,
names starting with a single underscore and an upper case letter).
While it's unlikely for a compiler to ever want to use
__XEN_PUBLIC_IOMMU_OP_H__ for its internal purposes, we couldn't
validly complain if one did.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-06-07 13:21       ` Jan Beulich
@ 2018-06-07 13:45         ` George Dunlap
  2018-06-07 14:06           ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: George Dunlap @ 2018-06-07 13:45 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan,
	Ian Jackson, xen-devel, Daniel de Graaf

On 06/07/2018 02:21 PM, Jan Beulich wrote:
>>>> On 07.06.18 at 13:42, <Paul.Durrant@citrix.com> wrote:
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: 16 March 2018 12:25
>>>>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
>>>> --- a/xen/arch/x86/Makefile
>>>> +++ b/xen/arch/x86/Makefile
>>>> @@ -33,6 +33,7 @@ obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
>>>>  obj-y += hypercall.o
>>>>  obj-y += i387.o
>>>>  obj-y += i8259.o
>>>> +obj-y += iommu_op.o
>>>
>>> As mentioned in other contexts, I'd prefer if we stopped using
>>> underscores in places where dashes (or other separators not
>>> usable in C identifiers) are fine.
>>
>> I don't see any guidance in CODING_STYLE or elsewhere, and also the majority 
>> of the codebase seems to prefer using underscores in module names. Personally 
>> I'd prefer new code remain consistent.
> 
> The lack of statement to this effect is why I've said "I'd prefer". See
> alternative-asm.h, x86-defns.h, or x86-vendors.h for _recent_
> examples of moving into the other direction. On all keyboards I've
> seen or used, an underscore requires two keys to be pressed, while
> a dash takes only one. This isn't much for an individual instance, but
> it sums up. It's the same reason why I'm advocating against the use
> of underscores in new command line option names.
> 
> In the end, looking at the history of typography, I think the underscore
> is a relatively late (and presumably artificial) addition; in particular I
> don't recall mechanical typewriters even having a key for it.

<pedantic>The mechanical typewriters I learned on had an underscore to
allow you to go back and underline words.</pedantic>

> Its
> use as a visual separator is necessary in e.g. programming
> languages, as the dash commonly designates the "minus" operator
> there. Extending such naming to non-identifiers (file system names
> and command line options are just prominent examples) is simply
> misguided imo.

Well in any case, maybe this should be discussed in a patch to
CODING_STYLE, rather than in the middle of a patch series about
something completely different.

>>>> +#ifndef __XEN_PUBLIC_IOMMU_OP_H__
>>>> +#define __XEN_PUBLIC_IOMMU_OP_H__
>>>
>>> Please can you avoid introducing further name space violations
>>> into the public headers?
>>
>> I assume you mean the leading '__'? Again, I chose the name based on 
>> consistency with other code and I'd prefer to remain consistent. Could you 
>> explain why having a leading '__' is problematic?
> 
> Names starting with double underscores are reserved (as are, btw,
> names starting with a single underscore and an upper case letter).
> While it's unlikely for a compiler to ever want to use
> __XEN_PUBLIC_IOMMU_OP_H__ for its internal purposes, we couldn't
> validly complain if one did.

I'm with Jan on this one.  At the moment I'm not sure about using dashes
instead of underscores for filenames, but in this case the extra
underscores at the beginning and end are redundant; the "XEN_..._H" is
sufficient to make the contents unique.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-06-07 13:45         ` George Dunlap
@ 2018-06-07 14:06           ` Paul Durrant
  2018-06-07 14:21             ` Ian Jackson
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-06-07 14:06 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	Ian Jackson, xen-devel, Daniel de Graaf

> -----Original Message-----
> From: George Dunlap [mailto:george.dunlap@citrix.com]
> Sent: 07 June 2018 14:45
> To: Jan Beulich <JBeulich@suse.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>; Ian Jackson
> <Ian.Jackson@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Stefano Stabellini
> <sstabellini@kernel.org>; xen-devel <xen-devel@lists.xenproject.org>;
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Daniel de Graaf
> <dgdegra@tycho.nsa.gov>; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
> 
> On 06/07/2018 02:21 PM, Jan Beulich wrote:
> >>>> On 07.06.18 at 13:42, <Paul.Durrant@citrix.com> wrote:
> >>> From: Jan Beulich [mailto:JBeulich@suse.com]
> >>> Sent: 16 March 2018 12:25
> >>>>>> On 12.02.18 at 11:47, <paul.durrant@citrix.com> wrote:
> >>>> --- a/xen/arch/x86/Makefile
> >>>> +++ b/xen/arch/x86/Makefile
> >>>> @@ -33,6 +33,7 @@ obj-$(CONFIG_CRASH_DEBUG) += gdbstub.o
> >>>>  obj-y += hypercall.o
> >>>>  obj-y += i387.o
> >>>>  obj-y += i8259.o
> >>>> +obj-y += iommu_op.o
> >>>
> >>> As mentioned in other contexts, I'd prefer if we stopped using
> >>> underscores in places where dashes (or other separators not
> >>> usable in C identifiers) are fine.
> >>
> >> I don't see any guidance in CODING_STYLE or elsewhere, and also the
> majority
> >> of the codebase seems to prefer using underscores in module names.
> Personally
> >> I'd prefer new code remain consistent.
> >
> > The lack of statement to this effect is why I've said "I'd prefer". See
> > alternative-asm.h, x86-defns.h, or x86-vendors.h for _recent_
> > examples of moving into the other direction. On all keyboards I've
> > seen or used, an underscore requires two keys to be pressed, while
> > a dash takes only one. This isn't much for an individual instance, but
> > it sums up. It's the same reason why I'm advocating against the use
> > of underscores in new command line option names.
> >
> > In the end, looking at the history of typography, I think the underscore
> > is a relatively late (and presumably artificial) addition; in particular I
> > don't recall mechanical typewriters even having a key for it.
> 
> <pedantic>The mechanical typewriters I learned on had an underscore to
> allow you to go back and underline words.</pedantic>
> 
> > Its
> > use as a visual separator is necessary in e.g. programming
> > languages, as the dash commonly designates the "minus" operator
> > there. Extending such naming to non-identifiers (file system names
> > and command line options are just prominent examples) is simply
> > misguided imo.
> 
> Well in any case, maybe this should be discussed in a patch to
> CODING_STYLE, rather than in the middle of a patch series about
> something completely different.
> 
> >>>> +#ifndef __XEN_PUBLIC_IOMMU_OP_H__
> >>>> +#define __XEN_PUBLIC_IOMMU_OP_H__
> >>>
> >>> Please can you avoid introducing further name space violations
> >>> into the public headers?
> >>
> >> I assume you mean the leading '__'? Again, I chose the name based on
> >> consistency with other code and I'd prefer to remain consistent. Could
> you
> >> explain why having a leading '__' is problematic?
> >
> > Names starting with double underscores are reserved (as are, btw,
> > names starting with a single underscore and an upper case letter).
> > While it's unlikely for a compiler to ever want to use
> > __XEN_PUBLIC_IOMMU_OP_H__ for its internal purposes, we couldn't
> > validly complain if one did.
> 
> I'm with Jan on this one.  At the moment I'm not sure about using dashes
> instead of underscores for filenames, but in this case the extra
> underscores at the beginning and end are redundant; the "XEN_..._H" is
> sufficient to make the contents unique.
> 

FWIW Linux appears to use a single '_' prefix and no suffix.

  Paul

>  -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-06-07 14:06           ` Paul Durrant
@ 2018-06-07 14:21             ` Ian Jackson
  2018-06-07 15:21               ` Paul Durrant
  0 siblings, 1 reply; 68+ messages in thread
From: Ian Jackson @ 2018-06-07 14:21 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, xen-devel, Daniel de Graaf

Paul Durrant writes ("RE: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op"):
> FWIW Linux appears to use a single '_' prefix and no suffix.

This practice of scattering underscores about, apparently at random,
is baffling to me.

It doesn't look like most of the people who do it are aware of the
rules.  For example, #defining any identifier starting __ is a licence
to the compiler to stuff demons up your nose.

We should do this:
  #define XEN_PUBLIC_IOMMU_OP_H
which is (i) not in any of the compiler's namespaces (ii) has XEN_ at
the beginning so we can justify thinking that it won't clash with
anyone else's identifiers (iii) will never clash with any of our own
because it ends in _H.
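
I.e. in full:

  #ifndef XEN_PUBLIC_IOMMU_OP_H
  #define XEN_PUBLIC_IOMMU_OP_H

  /* ... public interface ... */

  #endif /* XEN_PUBLIC_IOMMU_OP_H */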

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-06-07 14:21             ` Ian Jackson
@ 2018-06-07 15:21               ` Paul Durrant
  2018-06-07 15:41                 ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Paul Durrant @ 2018-06-07 15:21 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim (Xen.org),
	George Dunlap, Jan Beulich, xen-devel, Daniel de Graaf

> -----Original Message-----
> From: Ian Jackson [mailto:ian.jackson@citrix.com]
> Sent: 07 June 2018 15:21
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: George Dunlap <George.Dunlap@citrix.com>; Jan Beulich
> <JBeulich@suse.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei
> Liu <wei.liu2@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel <xen-devel@lists.xenproject.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Daniel de Graaf <dgdegra@tycho.nsa.gov>; Tim
> (Xen.org) <tim@xen.org>
> Subject: RE: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
> 
> Paul Durrant writes ("RE: [PATCH 5/7] public / x86: introduce
> __HYPERCALL_iommu_op"):
> > FWIW Linux appears to use a single '_' prefix and no suffix.
> 
> This practice of scattering underscores about, apparently at random,
> is baffling to me.
> 
> It doesn't look like most of the people who do it are aware of the
> rules.  For example, #defining any identifier starting __ is a licence
> to the compiler to stuff demons up your nose.
> 
> We should do this:
>   #define XEN_PUBLIC_IOMMU_OP_H
> which is (i) not in any of the compiler's namespaces (ii) has XEN_ at
> the beginning so we can justify thinking that it won't clash with
> anyone else's identifiers (iii) will never clash with any of our own
> because it ends in _H.
> 

Sounds ok to me. Patch to CODING_STYLE?

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
  2018-06-07 15:21               ` Paul Durrant
@ 2018-06-07 15:41                 ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-06-07 15:41 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Tim Deegan,
	george.dunlap, Ian Jackson, xen-devel, Daniel de Graaf

>>> On 07.06.18 at 17:21, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Ian Jackson [mailto:ian.jackson@citrix.com]
>> Sent: 07 June 2018 15:21
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> Cc: George Dunlap <George.Dunlap@citrix.com>; Jan Beulich
>> <JBeulich@suse.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Wei
>> Liu <wei.liu2@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
>> devel <xen-devel@lists.xenproject.org>; Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com>; Daniel de Graaf <dgdegra@tycho.nsa.gov>; Tim
>> (Xen.org) <tim@xen.org>
>> Subject: RE: [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op
>> 
>> Paul Durrant writes ("RE: [PATCH 5/7] public / x86: introduce
>> __HYPERCALL_iommu_op"):
>> > FWIW Linux appears to use a single '_' prefix and no suffix.
>> 
>> This practice of scattering underscores about, apparently at random,
>> is baffling to me.
>> 
>> It doesn't look like most of the people who do it are aware of the
>> rules.  For example, #defining any identifier starting __ is a licence
>> to the compiler to stuff demons up your nose.
>> 
>> We should do this:
>>   #define XEN_PUBLIC_IOMMU_OP_H
>> which is (i) not in any of the compiler's namespaces (ii) has XEN_ at
>> the beginning so we can justify thinking that it won't clash with
>> anyone else's identifiers (iii) will never clash with any of our own
>> because it ends in _H.
>> 
> 
> Sounds ok to me. Patch to CODING_STYLE?

I've sent a patch already, but that's addressing the other aspect. I
don't think CODING_STYLE needs to talk about conforming with
language standards. To reduce the risk of misguiding people, I agree
we should see about reducing the number of existing violations. We've
been doing this for quite a while, but the process is rather slow going,
largely because so far we mostly change things we need to touch
anyway.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-03-15 15:44   ` Jan Beulich
  2018-03-16 10:26     ` Paul Durrant
@ 2018-07-10 14:29     ` George Dunlap
  2018-07-10 14:34       ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: George Dunlap @ 2018-07-10 14:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Paul Durrant,
	Jun Nakajima, xen-devel

On Thu, Mar 15, 2018 at 3:44 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, unsigned long type,
>>          struct domain *d = page_get_owner(page);
>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>          {
>> -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>> +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>
>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>> -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
>> +                iommu_ret = iommu_unmap_page(d, bfn);
>>              else if ( type == PGT_writable_page )
>> -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
>> -                                           mfn_x(page_to_mfn(page)),
>> +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
>
> Along the lines of what I've said earlier about mixing address spaces,
> this would perhaps not so much need a comment (it's a 1:1 mapping
> after all), but rather making more obvious that it's a 1:1 mapping.
> This in particular would mean to me to latch page_to_mfn(page) into
> a (neutrally named, e.g. "frame") local variable, and use the result in
> a way that makes obviously especially on the "map" path that this
> really requests a 1:1 mapping. By implication from the 1:1 mapping
> it'll then (hopefully) be clear to the reader that which exact name
> space is used doesn't really matter.

I'm sorry, I don't think this is a good idea.

First of all, it doesn't communicate what you think it does.  What
having an extra variable communicates is, "I am calculating an extra
value that will be used somewhere".  When I saw the "intermediate"
variables all over the place, I didn't immediately think "abstract
space because there's a 1-1 mapping", I was simply confused.

On the other hand, it is obvious to me that if you 1) have different
kinds of variables (gfn_t, bfn_t, &c) and 2) you cast one from the
other doing some math, that you're carefully changing address spaces;
and that if you do _bfn(gfn), that you know you have a 1-1 mapping --
or at least, you'd very much better have one, or you're doing
something wrong.

"Documenting" something by introducing random extra unused variables
isn't a good idea.  Either people will waste time trying to verify
that they're not used a different way, or people will become
conditioned to the idea that they're not changing, and will overlook
bugs introduced when the variables actually do change.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-07-10 14:29     ` George Dunlap
@ 2018-07-10 14:34       ` Jan Beulich
  2018-07-10 14:37         ` Andrew Cooper
  2018-07-10 14:58         ` George Dunlap
  0 siblings, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2018-07-10 14:34 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Paul Durrant,
	Jun Nakajima, xen-devel

>>> On 10.07.18 at 16:29, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Mar 15, 2018 at 3:44 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, 
> unsigned long type,
>>>          struct domain *d = page_get_owner(page);
>>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>>          {
>>> -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>> +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>
>>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>>> -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
>>> +                iommu_ret = iommu_unmap_page(d, bfn);
>>>              else if ( type == PGT_writable_page )
>>> -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
>>> -                                           mfn_x(page_to_mfn(page)),
>>> +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
>>
>> Along the lines of what I've said earlier about mixing address spaces,
>> this would perhaps not so much need a comment (it's a 1:1 mapping
>> after all), but rather making more obvious that it's a 1:1 mapping.
>> This in particular would mean to me to latch page_to_mfn(page) into
>> a (neutrally named, e.g. "frame") local variable, and use the result in
>> a way that makes it obvious, especially on the "map" path, that this
>> really requests a 1:1 mapping. By implication from the 1:1 mapping
>> it'll then (hopefully) be clear to the reader that which exact name
>> space is used doesn't really matter.
> 
> I'm sorry, I don't think this is a good idea.
> 
> First of all, it doesn't communicate what you think it does.  What
> having an extra variable communicates is, "I am calculating an extra
> value that will be used somewhere".  When I saw the "intermediate"
> variables all over the place, I didn't immediately think "abstract
> space because there's a 1-1 mapping", I was simply confused.
> 
> On the other hand, it is obvious to me that if you 1) have different
> kinds of variables (gfn_t, bfn_t, &c) and 2) you cast one from the
> other doing some math, that you're carefully changing address spaces;
> and that if you do _bfn(gfn), that you know you have a 1-1 mapping --
> or at least, you'd very much better have one, or you're doing
> something wrong.

Okay - differing opinions, what do you do. To me an expression like
_bfn(gfn) looks buggy. And iirc we've had bugs of this kind in the
past, which would then contradict your "carefully changing address
spaces" assumption.

As said in the other reply, something like
	iommu_map_page(..., _bfn(frame), frame, ...)
makes pretty clear that a 1:1 mapping is wanted.
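
For concreteness, the two styles side by side (d, page and flags as in
the hunk above; argument lists abbreviated):

  /* Latch the frame into a neutrally named variable, making the 1:1
     request explicit at the call site: */
  mfn_t frame = page_to_mfn(page);

  rc = iommu_map_page(d, _bfn(mfn_x(frame)), frame, flags);

  /* Versus converting between the typesafe spaces in place, with a
     comment carrying the justification: */
  rc = iommu_map_page(d, _bfn(mfn_x(page_to_mfn(page))),
                      page_to_mfn(page), flags); /* 1:1 mapping here */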

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-07-10 14:34       ` Jan Beulich
@ 2018-07-10 14:37         ` Andrew Cooper
  2018-07-10 14:58         ` George Dunlap
  1 sibling, 0 replies; 68+ messages in thread
From: Andrew Cooper @ 2018-07-10 14:37 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Tim Deegan, Ian Jackson,
	Julien Grall, Paul Durrant, Jun Nakajima, xen-devel

On 10/07/18 15:34, Jan Beulich wrote:
>>>> On 10.07.18 at 16:29, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Mar 15, 2018 at 3:44 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, 
>> unsigned long type,
>>>>          struct domain *d = page_get_owner(page);
>>>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>>>          {
>>>> -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>> +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>>
>>>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>>>> -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
>>>> +                iommu_ret = iommu_unmap_page(d, bfn);
>>>>              else if ( type == PGT_writable_page )
>>>> -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
>>>> -                                           mfn_x(page_to_mfn(page)),
>>>> +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
>>> Along the lines of what I've said earlier about mixing address spaces,
>>> this would perhaps not so much need a comment (it's a 1:1 mapping
>>> after all), but rather making more obvious that it's a 1:1 mapping.
>>> This in particular would mean to me to latch page_to_mfn(page) into
>>> a (neutrally named, e.g. "frame") local variable, and use the result in
>>> a way that makes it obvious, especially on the "map" path, that this
>>> really requests a 1:1 mapping. By implication from the 1:1 mapping
>>> it'll then (hopefully) be clear to the reader that which exact name
>>> space is used doesn't really matter.
>> I'm sorry, I don't think this is a good idea.
>>
>> First of all, it doesn't communicate what you think it does.  What
>> having an extra variable communicates is, "I am calculating an extra
>> value that will be used somewhere".  When I saw the "intermediate"
>> variables all over the place, I didn't immediately think "abstract
>> space because there's a 1-1 mapping", I was simply confused.
>>
>> On the other hand, it is obvious to me that if you 1) have different
>> kinds of variables (gfn_t, bfn_t, &c) and 2) you cast one from the
>> other doing some math, that you're carefully changing address spaces;
>> and that if you do _bfn(gfn), that you know you have a 1-1 mapping --
>> or at least, you'd very much better have one, or you're doing
>> something wrong.
> Okay - differing opinions, what do you do. To me an expression like
> _bfn(gfn) looks buggy. And iirc we've had bugs of this kind in the
> past, which would then contradict your "carefully changing address
> spaces" assumption.
>
> As said in the other reply, something like
> 	iommu_map_page(..., _bfn(frame), frame, ...)
> makes pretty clear that a 1:1 mapping is wanted.

TBH, I think _bfn(gfn) is better, but any such mixing of address spaces
needs a comment explaining the correctness, even if it is a short /* 1:1
mapping here */.

Indirecting through an unsigned long (particularly one with a generic
name) is a misuse of the typesafe interface, because all it does is
serve to confuse the reader.
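
As a rough sketch (toy code, following the argument order of the
quoted hunk rather than the series' exact API), the contrast is:

  /* Explicit, commented crossing of address spaces: */
  bfn_t bfn = _bfn(gfn_x(gfn)); /* 1:1 mapping here */
  rc = iommu_map_page(d, bfn, mfn, flags);

  /* Indirected through a generically-named unsigned long: */
  unsigned long frame = gfn_x(gfn);
  rc = iommu_map_page(d, _bfn(frame), _mfn(frame), flags);

The first form keeps every conversion greppable via _bfn(); the second
performs the same conversion, just obscured by the extra variable.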

~Andrew


* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-07-10 14:34       ` Jan Beulich
  2018-07-10 14:37         ` Andrew Cooper
@ 2018-07-10 14:58         ` George Dunlap
  2018-07-10 15:19           ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: George Dunlap @ 2018-07-10 14:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Paul Durrant,
	Jun Nakajima, xen-devel

On Tue, Jul 10, 2018 at 3:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 10.07.18 at 16:29, <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Mar 15, 2018 at 3:44 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, unsigned long type,
>>>>          struct domain *d = page_get_owner(page);
>>>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>>>          {
>>>> -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>> +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>>
>>>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>>>> -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
>>>> +                iommu_ret = iommu_unmap_page(d, bfn);
>>>>              else if ( type == PGT_writable_page )
>>>> -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
>>>> -                                           mfn_x(page_to_mfn(page)),
>>>> +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
>>>
>>> Along the lines of what I've said earlier about mixing address spaces,
>>> this would perhaps not so much need a comment (it's a 1:1 mapping
>>> after all), but rather making more obvious that it's a 1:1 mapping.
>>> This in particular would mean to me to latch page_to_mfn(page) into
>>> a (neutrally named, e.g. "frame") local variable, and use the result in
>>> a way that makes it obvious, especially on the "map" path, that this
>>> really requests a 1:1 mapping. By implication from the 1:1 mapping
>>> it'll then (hopefully) be clear to the reader that the exact name
>>> space used doesn't really matter.
>>
>> I'm sorry, I don't think this is a good idea.
>>
>> First of all, it doesn't communicate what you think it does.  What
>> having an extra variable communicates is, "I am calculating an extra
>> value that will be used somewhere".  When I saw the "intermediate"
>> variables all over the place, I didn't immediately think "abstract
>> space because there's a 1-1 mapping", I was simply confused.
>>
>> On the other hand, it is obvious to me that if you 1) have different
>> kinds of variables (gfn_t, bfn_t, &c) and 2) you cast one from the
>> other doing some math, that you're carefully changing address spaces;
>> and that if you do _bfn(gfn), you know you have a 1-1 mapping --
>> or at least, you'd better have one, or you're doing
>> something wrong.
>
> Okay - differing opinions, what do you do. To me an expression like
> _bfn(gfn) looks buggy. And iirc we've had bugs of this kind in the
> past, which would then contradict your "carefully changing address
> spaces" assumption.

To me it looks the same as
  unsigned long x;
  char *s;
  [do something to calculate x]
  s = (char *)x;

Obviously that sort of casting in C has sharp edges, so you need to be careful.
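
To make the analogy concrete, here is a toy version (not the actual
Xen definitions) of why the wrapper types have fewer sharp edges than
the raw cast:

  typedef struct { unsigned long g; } gfn_t;
  typedef struct { unsigned long b; } bfn_t;

  static inline gfn_t _gfn(unsigned long g) { return (gfn_t){ g }; }
  static inline bfn_t _bfn(unsigned long b) { return (bfn_t){ b }; }
  static inline unsigned long gfn_x(gfn_t g) { return g.g; }
  static inline unsigned long bfn_x(bfn_t b) { return b.b; }

  void f(gfn_t gfn)
  {
      /* bfn_t bfn = gfn;  <- rejected outright by the compiler */
      bfn_t bfn = _bfn(gfn_x(gfn)); /* deliberate, visible crossing */
      (void)bfn;
  }

The accidental assignment can't even compile, while the deliberate
crossing stands out in review.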

Bugs can happen in any sort of code -- would the bug you have in mind
actually have been prevented with the use of an extra variable?

> As said in the other reply, something like
>         iommu_map_page(..., _bfn(frame), frame, ...)
> makes it pretty clear that a 1:1 mapping is wanted.

I just don't see how this is supposed to catch more bugs than
    /* gfns are mapped 1:1 with mfns */
    iommu_map_page(..., _bfn(gfn), gfn, ...)

There may be some places where having an intermediate variable might
make things clearer, but there are an awful lot of places in Paul's
patches where the code just looks like this:
  unsigned long frame = bfn;
  gfn_t gfn = _gfn(frame);
  mfn_t mfn = _mfn(frame);

Which just seems really pointless.
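
A sketch of the direct alternative (assuming bfn is the typesafe
bfn_t, with the 1:1 relationship commented as discussed above):

  /* bfn == gfn == mfn under the 1:1 mapping */
  gfn_t gfn = _gfn(bfn_x(bfn));
  mfn_t mfn = _mfn(bfn_x(bfn));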

 -George


* Re: [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions
  2018-07-10 14:58         ` George Dunlap
@ 2018-07-10 15:19           ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2018-07-10 15:19 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Paul Durrant,
	Jun Nakajima, xen-devel

>>> On 10.07.18 at 16:58, <George.Dunlap@eu.citrix.com> wrote:
> On Tue, Jul 10, 2018 at 3:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 10.07.18 at 16:29, <George.Dunlap@eu.citrix.com> wrote:
>>> On Thu, Mar 15, 2018 at 3:44 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> --- a/xen/arch/x86/mm.c
>>>>> +++ b/xen/arch/x86/mm.c
>>>>> @@ -2676,13 +2676,12 @@ static int _get_page_type(struct page_info *page, unsigned long type,
>>>>>          struct domain *d = page_get_owner(page);
>>>>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>>>>          {
>>>>> -            gfn_t gfn = _gfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>>> +            bfn_t bfn = _bfn(mfn_to_gmfn(d, mfn_x(page_to_mfn(page))));
>>>>>
>>>>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>>>>> -                iommu_ret = iommu_unmap_page(d, gfn_x(gfn));
>>>>> +                iommu_ret = iommu_unmap_page(d, bfn);
>>>>>              else if ( type == PGT_writable_page )
>>>>> -                iommu_ret = iommu_map_page(d, gfn_x(gfn),
>>>>> -                                           mfn_x(page_to_mfn(page)),
>>>>> +                iommu_ret = iommu_map_page(d, bfn, page_to_mfn(page),
>>>>
>>>> Along the lines of what I've said earlier about mixing address spaces,
>>>> this would perhaps not so much need a comment (it's a 1:1 mapping
>>>> after all), but rather making more obvious that it's a 1:1 mapping.
>>>> This in particular would mean to me to latch page_to_mfn(page) into
>>>> a (neutrally named, e.g. "frame") local variable, and use the result in
>>>> a way that makes it obvious, especially on the "map" path, that this
>>>> really requests a 1:1 mapping. By implication from the 1:1 mapping
>>>> it'll then (hopefully) be clear to the reader that the exact name
>>>> space used doesn't really matter.
>>>
>>> I'm sorry, I don't think this is a good idea.
>>>
>>> First of all, it doesn't communicate what you think it does.  What
>>> having an extra variable communicates is, "I am calculating an extra
>>> value that will be used somewhere".  When I saw the "intermediate"
>>> variables all over the place, I didn't immediately think "abstract
>>> space because there's a 1-1 mapping", I was simply confused.
>>>
>>> On the other hand, it is obvious to me that if you 1) have different
>>> kinds of variables (gfn_t, bfn_t, &c) and 2) you cast one from the
>>> other doing some math, that you're carefully changing address spaces;
>>> and that if you do _bfn(gfn), you know you have a 1-1 mapping --
>>> or at least, you'd better have one, or you're doing
>>> something wrong.
>>
>> Okay - differing opinions, what do you do. To me an expression like
>> _bfn(gfn) looks buggy. And iirc we've had bugs of this kind in the
>> past, which would then contradict your "carefully changing address
>> spaces" assumption.
> 
> To me it looks the same as
>   unsigned long x;
>   char *s;
>   [do something to calculate x]
>   s = (char *)x;
> 
> Obviously that sort of casting in C has sharp edges, so you need to be 
> careful.

Right, and I object to casts wherever I can.

> Bugs can happen in any sort of code -- would the bug you have in mind
> actually have been prevented with the use of an extra variable?

How can I tell? Maybe, maybe not.

>> As said in the other reply, something like
>>         iommu_map_page(..., _bfn(frame), frame, ...)
>> makes it pretty clear that a 1:1 mapping is wanted.
> 
> I just don't see how this is supposed to catch more bugs than
>     /* gfns are mapped 1:1 with mfns */
>     iommu_map_page(..., _bfn(gfn), gfn, ...)

Well, with the comment it probably doesn't matter how the
variable(s) is/are named.

> There may be some places where having an intermediate variable might
> make things clearer, but there are an awful lot of places in Paul's
> patches where the code just looks like this:
>   unsigned long frame = bfn;
>   gfn_t gfn = _gfn(frame);
>   mfn_t mfn = _mfn(frame);
> 
> Which just seems really pointless.

This indeed looks to be going too far.

Jan



end of thread (newest: 2018-07-10 15:19 UTC)

Thread overview: 68+ messages
2018-02-12 10:47 [PATCH 0/7] paravirtual IOMMU interface Paul Durrant
2018-02-12 10:47 ` [PATCH 1/7] iommu: introduce the concept of BFN Paul Durrant
2018-03-15 13:39   ` Jan Beulich
2018-03-16 10:31     ` Paul Durrant
2018-03-16 10:39       ` Jan Beulich
2018-02-12 10:47 ` [PATCH 2/7] iommu: make use of type-safe BFN and MFN in exported functions Paul Durrant
2018-03-15 15:44   ` Jan Beulich
2018-03-16 10:26     ` Paul Durrant
2018-07-10 14:29     ` George Dunlap
2018-07-10 14:34       ` Jan Beulich
2018-07-10 14:37         ` Andrew Cooper
2018-07-10 14:58         ` George Dunlap
2018-07-10 15:19           ` Jan Beulich
2018-02-12 10:47 ` [PATCH 3/7] iommu: push use of type-safe BFN and MFN into iommu_ops Paul Durrant
2018-03-15 16:15   ` Jan Beulich
2018-03-16 10:22     ` Paul Durrant
2018-02-12 10:47 ` [PATCH 4/7] vtd: add lookup_page method to iommu_ops Paul Durrant
2018-03-15 16:54   ` Jan Beulich
2018-03-16 10:19     ` Paul Durrant
2018-03-16 10:28       ` Jan Beulich
2018-03-16 10:41         ` Paul Durrant
2018-02-12 10:47 ` [PATCH 5/7] public / x86: introduce __HYPERCALL_iommu_op Paul Durrant
2018-02-13  6:43   ` Tian, Kevin
2018-02-13  9:22     ` Paul Durrant
2018-02-23  5:17       ` Tian, Kevin
2018-02-23  9:41         ` Paul Durrant
2018-02-24  2:57           ` Tian, Kevin
2018-02-26  9:57             ` Paul Durrant
2018-02-26 11:55               ` Tian, Kevin
2018-02-27  5:05               ` Tian, Kevin
2018-02-27  9:32                 ` Paul Durrant
2018-02-28  2:53                   ` Tian, Kevin
2018-02-28  8:55                     ` Paul Durrant
2018-03-16 12:25   ` Jan Beulich
2018-06-07 11:42     ` Paul Durrant
2018-06-07 13:21       ` Jan Beulich
2018-06-07 13:45         ` George Dunlap
2018-06-07 14:06           ` Paul Durrant
2018-06-07 14:21             ` Ian Jackson
2018-06-07 15:21               ` Paul Durrant
2018-06-07 15:41                 ` Jan Beulich
2018-02-12 10:47 ` [PATCH 6/7] x86: add iommu_op to query reserved ranges Paul Durrant
2018-02-13  6:51   ` Tian, Kevin
2018-02-13  9:25     ` Paul Durrant
2018-02-23  5:23       ` Tian, Kevin
2018-02-23  9:02         ` Jan Beulich
2018-03-19 14:10   ` Jan Beulich
2018-03-19 15:13     ` Paul Durrant
2018-03-19 16:30       ` Jan Beulich
2018-03-19 15:13   ` Jan Beulich
2018-03-19 15:36     ` Paul Durrant
2018-03-19 16:31       ` Jan Beulich
2018-02-12 10:47 ` [PATCH 7/7] x86: add iommu_ops to map and unmap pages, and also to flush the IOTLB Paul Durrant
2018-02-13  6:55   ` Tian, Kevin
2018-02-13  9:55     ` Paul Durrant
2018-02-23  5:35       ` Tian, Kevin
2018-02-23  9:35         ` Paul Durrant
2018-02-24  3:01           ` Tian, Kevin
2018-02-26  9:38             ` Paul Durrant
2018-03-19 15:11   ` Jan Beulich
2018-03-19 15:34     ` Paul Durrant
2018-03-19 16:49       ` Jan Beulich
2018-03-19 16:57         ` Paul Durrant
2018-03-20  8:11           ` Jan Beulich
2018-03-20  9:32             ` Paul Durrant
2018-03-20  9:49               ` Jan Beulich
2018-02-13  6:21 ` [PATCH 0/7] paravirtual IOMMU interface Tian, Kevin
2018-02-13  9:18   ` Paul Durrant
