xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Check VT-d Device-TLB flush error
@ 2016-03-17  6:54 Quan Xu
  2016-03-17  6:54 ` [PATCH 1/2] IOMMU/MMU: Adjust top level functions for " Quan Xu
  2016-03-17  6:54 ` [PATCH 2/2] IOMMU/MMU: Adjust low " Quan Xu
  0 siblings, 2 replies; 42+ messages in thread
From: Quan Xu @ 2016-03-17  6:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Quan Xu

this patch set is a prereq patch set for Patch:'VT-d Device-TLB flush issue'.
Current code would be panic(), when VT-d Device-TLB flush timed out. the panic()
is going to be eliminated, so we must check all kinds of error and all the way up
the call trees.

Quan Xu (2):
  IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.


This patch set was patch#1/patch#2 of 'VT-d Device-TLB flush issue' v6 patch sets.
--Changes in this version:
   * Rebase against efc904263f483026672a5cdf3ccf9be8c4fdaa31.
   * Make a reasonable attempt at splitting things, adjusting top level functions first and then
     working the way down to leaf ones.
   * Remove some pointless initializers (Compiler helps me check them).
   * Log error and don't return error value for hardware_domain init and crashed system shutdown.
   * when to populate iommu page table for domu, try to tear down the iommu page table for iommu
     iotlb flush error.
   * when the flush_iotlb_qi() return value is positive, All we need are:
     -call iommu_flush_write_buffer() only when rc > 0
     -return zero from this function when rc is positive (or 'rc = 0' after call iommu_flush_write_buffer()).
   * Fix v4 "VT-d Device-TLB flush issue" unaddressed issue:
     http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg01555.html

 xen/arch/x86/acpi/power.c                     |  14 ++-
 xen/arch/x86/mm.c                             |  13 +--
 xen/arch/x86/mm/p2m-ept.c                     |  12 ++-
 xen/arch/x86/mm/p2m-pt.c                      |  12 ++-
 xen/common/grant_table.c                      |   5 +-
 xen/common/memory.c                           |   5 +-
 xen/drivers/passthrough/amd/iommu_init.c      |  12 ++-
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |   2 +-
 xen/drivers/passthrough/arm/smmu.c            |  10 ++-
 xen/drivers/passthrough/iommu.c               |  25 ++++--
 xen/drivers/passthrough/vtd/extern.h          |   2 +-
 xen/drivers/passthrough/vtd/iommu.c           | 120 ++++++++++++++++++--------
 xen/drivers/passthrough/vtd/quirks.c          |  26 +++---
 xen/drivers/passthrough/vtd/x86/vtd.c         |   7 +-
 xen/drivers/passthrough/x86/iommu.c           |   6 +-
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
 xen/include/asm-x86/iommu.h                   |   2 +-
 xen/include/xen/iommu.h                       |  12 +--
 18 files changed, 199 insertions(+), 88 deletions(-)

-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 [PATCH 0/2] Check VT-d Device-TLB flush error Quan Xu
@ 2016-03-17  6:54 ` Quan Xu
  2016-03-17  7:32   ` Tian, Kevin
                     ` (2 more replies)
  2016-03-17  6:54 ` [PATCH 2/2] IOMMU/MMU: Adjust low " Quan Xu
  1 sibling, 3 replies; 42+ messages in thread
From: Quan Xu @ 2016-03-17  6:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, George Dunlap, Liu Jinsong,
	Dario Faggioli, Jun Nakajima, Andrew Cooper, Quan Xu, Feng Wu

Current code would be panic(), when VT-d Device-TLB flush timed out.
the panic() is going to be eliminated, so we must check all kinds of
error and all the way up the call trees.

Signed-off-by: Quan Xu <quan.xu@intel.com>

CC: Jan Beulich <jbeulich@suse.com>
CC: Liu Jinsong <jinsong.liu@alibaba-inc.com>
CC: Keir Fraser <keir@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/x86/acpi/power.c             | 14 +++++++++++++-
 xen/arch/x86/mm.c                     | 13 ++++++++-----
 xen/arch/x86/mm/p2m-ept.c             | 10 +++++++++-
 xen/arch/x86/mm/p2m-pt.c              | 12 ++++++++++--
 xen/common/grant_table.c              |  5 +++--
 xen/common/memory.c                   |  5 +++--
 xen/drivers/passthrough/iommu.c       | 16 +++++++++++-----
 xen/drivers/passthrough/vtd/x86/vtd.c |  7 +++++--
 xen/drivers/passthrough/x86/iommu.c   |  6 +++++-
 xen/include/xen/iommu.h               |  6 +++---
 10 files changed, 70 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 2885e31..50edf3f 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -45,6 +45,8 @@ void do_suspend_lowlevel(void);
 
 static int device_power_down(void)
 {
+    int err;
+
     console_suspend();
 
     time_suspend();
@@ -53,11 +55,21 @@ static int device_power_down(void)
 
     ioapic_suspend();
 
-    iommu_suspend();
+    err = iommu_suspend();
+    if ( err )
+        goto iommu_suspend_error;
 
     lapic_suspend();
 
     return 0;
+
+iommu_suspend_error:
+    ioapic_resume();
+    i8259A_resume();
+    time_resume();
+    console_resume();
+
+    return err;
 }
 
 static void device_power_up(void)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c997b53..526548e 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
                            int preemptible)
 {
     unsigned long nx, x, y = page->u.inuse.type_info;
-    int rc = 0;
+    int rc = 0, ret = 0;
 
     ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
 
@@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type,
         if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
         {
             if ( (x & PGT_type_mask) == PGT_writable_page )
-                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
+                ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
             else if ( type == PGT_writable_page )
-                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
-                               page_to_mfn(page),
-                               IOMMUF_readable|IOMMUF_writable);
+                ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
+                                     page_to_mfn(page),
+                                     IOMMUF_readable|IOMMUF_writable);
         }
     }
 
@@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type,
     if ( (x & PGT_partial) && !(nx & PGT_partial) )
         put_page(page);
 
+    if ( !rc )
+        rc = ret;
+
     return rc;
 }
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 3cb6868..f9bcce7 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -830,7 +830,15 @@ out:
         {
             if ( iommu_flags )
                 for ( i = 0; i < (1 << order); i++ )
-                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+                {
+                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
+                    if ( rc )
+                    {
+                        while ( i-- > 0 )
+                            iommu_unmap_page(d, gfn + i);
+                        break;
+                    }
+                }
             else
                 for ( i = 0; i < (1 << order); i++ )
                     iommu_unmap_page(d, gfn + i);
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 3d80612..c33b753 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -680,8 +680,16 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         }
         else if ( iommu_pte_flags )
             for ( i = 0; i < (1UL << page_order); i++ )
-                iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
-                               iommu_pte_flags);
+            {
+                rc = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
+                                    iommu_pte_flags);
+                if ( rc )
+                {
+                    while ( i-- > 0 )
+                        iommu_unmap_page(p2m->domain, gfn + i);
+                    break;
+                }
+            }
         else
             for ( i = 0; i < (1UL << page_order); i++ )
                 iommu_unmap_page(p2m->domain, gfn + i);
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 8b22299..b410ffc 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
             {
                 nr_gets++;
                 (void)get_page(pg, rd);
-                if ( !(op->flags & GNTMAP_readonly) )
-                    get_page_type(pg, PGT_writable_page);
+                if ( !(op->flags & GNTMAP_readonly) &&
+                     !get_page_type(pg, PGT_writable_page) )
+                        goto could_not_pin;
             }
         }
     }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index ef57219..543647d 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain *d,
     if ( need_iommu(d) )
     {
         this_cpu(iommu_dont_flush_iotlb) = 0;
-        iommu_iotlb_flush(d, xatp->idx - done, done);
-        iommu_iotlb_flush(d, xatp->gpfn - done, done);
+        rc = iommu_iotlb_flush(d, xatp->idx - done, done);
+        if ( !rc )
+            rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
     }
 #endif
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index b64676f..29efbfe 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -277,24 +277,28 @@ static void iommu_free_pagetables(unsigned long unused)
                             cpumask_cycle(smp_processor_id(), &cpu_online_map));
 }
 
-void iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count)
+int iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush )
-        return;
+        return 0;
 
     hd->platform_ops->iotlb_flush(d, gfn, page_count);
+
+    return 0;
 }
 
-void iommu_iotlb_flush_all(struct domain *d)
+int iommu_iotlb_flush_all(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush_all )
-        return;
+        return 0;
 
     hd->platform_ops->iotlb_flush_all(d);
+
+    return 0;
 }
 
 int __init iommu_setup(void)
@@ -368,11 +372,13 @@ int iommu_do_domctl(
     return ret;
 }
 
-void iommu_suspend()
+int iommu_suspend()
 {
     const struct iommu_ops *ops = iommu_get_ops();
     if ( iommu_enabled )
         ops->suspend();
+
+    return 0;
 }
 
 void iommu_share_p2m_table(struct domain* d)
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index c0d6aab..e5ab10a 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
 
         tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
         for ( j = 0; j < tmp; j++ )
-            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
-                           IOMMUF_readable|IOMMUF_writable);
+            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
+                                IOMMUF_readable|IOMMUF_writable) )
+                printk(XENLOG_G_ERR
+                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
+                       pfn * tmp + j, pfn * tmp + j);
 
         if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
             process_pending_softirqs();
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 8cbb655..d8e3c8f 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
     this_cpu(iommu_dont_flush_iotlb) = 0;
 
     if ( !rc )
-        iommu_iotlb_flush_all(d);
+    {
+        rc = iommu_iotlb_flush_all(d);
+        if ( rc )
+            iommu_teardown(d);
+    }
     else if ( rc != -ERESTART )
         iommu_teardown(d);
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8217cb7..d6d489a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -167,7 +167,7 @@ struct iommu_ops {
     void (*dump_p2m_table)(struct domain *d);
 };
 
-void iommu_suspend(void);
+int iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
@@ -182,8 +182,8 @@ int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
 int iommu_do_domctl(struct xen_domctl *, struct domain *d,
                     XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
 
-void iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count);
-void iommu_iotlb_flush_all(struct domain *d);
+int iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count);
+int iommu_iotlb_flush_all(struct domain *d);
 
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 [PATCH 0/2] Check VT-d Device-TLB flush error Quan Xu
  2016-03-17  6:54 ` [PATCH 1/2] IOMMU/MMU: Adjust top level functions for " Quan Xu
@ 2016-03-17  6:54 ` Quan Xu
  2016-03-17  7:37   ` Tian, Kevin
                     ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Quan Xu @ 2016-03-17  6:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Keir Fraser, Jun Nakajima, George Dunlap,
	Andrew Cooper, Dario Faggioli, Julien Grall, Stefano Stabellini,
	Jan Beulich, Quan Xu, Feng Wu, Suravee Suthikulpanit

Current code would be panic(), when VT-d Device-TLB flush timed out.
the panic() is going to be eliminated, so we must check all kinds of
error and all the way up the call trees.

Signed-off-by: Quan Xu <quan.xu@intel.com>

CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Feng Wu <feng.wu@intel.com>
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/x86/mm/p2m-ept.c                     |   2 +-
 xen/drivers/passthrough/amd/iommu_init.c      |  12 ++-
 xen/drivers/passthrough/amd/pci_amd_iommu.c   |   2 +-
 xen/drivers/passthrough/arm/smmu.c            |  10 ++-
 xen/drivers/passthrough/iommu.c               |  17 ++--
 xen/drivers/passthrough/vtd/extern.h          |   2 +-
 xen/drivers/passthrough/vtd/iommu.c           | 120 ++++++++++++++++++--------
 xen/drivers/passthrough/vtd/quirks.c          |  26 +++---
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
 xen/include/asm-x86/iommu.h                   |   2 +-
 xen/include/xen/iommu.h                       |   6 +-
 11 files changed, 133 insertions(+), 68 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index f9bcce7..fa6c710 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -825,7 +825,7 @@ out:
          need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
-            iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present);
+            rc = iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present);
         else
         {
             if ( iommu_flags )
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 4536106..8bb47b2 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1339,12 +1339,14 @@ static void invalidate_all_devices(void)
     iterate_ivrs_mappings(_invalidate_all_devices);
 }
 
-void amd_iommu_suspend(void)
+int amd_iommu_suspend(void)
 {
     struct amd_iommu *iommu;
 
     for_each_amd_iommu ( iommu )
         disable_iommu(iommu);
+
+    return 0;
 }
 
 void amd_iommu_resume(void)
@@ -1368,3 +1370,11 @@ void amd_iommu_resume(void)
         invalidate_all_domain_pages();
     }
 }
+
+void amd_iommu_crash_shutdown(void)
+{
+    struct amd_iommu *iommu;
+
+    for_each_amd_iommu ( iommu )
+        disable_iommu(iommu);
+}
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index c8ee8dc..fb8e816 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -628,6 +628,6 @@ const struct iommu_ops amd_iommu_ops = {
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .share_p2m = amd_iommu_share_p2m,
-    .crash_shutdown = amd_iommu_suspend,
+    .crash_shutdown = amd_iommu_crash_shutdown,
     .dump_p2m_table = amd_dump_p2m_table,
 };
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 62da087..36c2c83 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2540,7 +2540,7 @@ static int force_stage = 2;
  */
 static u32 platform_features = ARM_SMMU_FEAT_COHERENT_WALK;
 
-static void arm_smmu_iotlb_flush_all(struct domain *d)
+static int arm_smmu_iotlb_flush_all(struct domain *d)
 {
 	struct arm_smmu_xen_domain *smmu_domain = domain_hvm_iommu(d)->arch.priv;
 	struct iommu_domain *cfg;
@@ -2557,13 +2557,15 @@ static void arm_smmu_iotlb_flush_all(struct domain *d)
 		arm_smmu_tlb_inv_context(cfg->priv);
 	}
 	spin_unlock(&smmu_domain->lock);
+
+    return 0;
 }
 
-static void arm_smmu_iotlb_flush(struct domain *d, unsigned long gfn,
-                                 unsigned int page_count)
+static int arm_smmu_iotlb_flush(struct domain *d, unsigned long gfn,
+                                unsigned int page_count)
 {
     /* ARM SMMU v1 doesn't have flush by VMA and VMID */
-    arm_smmu_iotlb_flush_all(d);
+    return arm_smmu_iotlb_flush_all(d);
 }
 
 static struct iommu_domain *arm_smmu_get_domain(struct domain *d,
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 29efbfe..08bb6f7 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -182,7 +182,11 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
                  ((page->u.inuse.type_info & PGT_type_mask)
                   == PGT_writable_page) )
                 mapping |= IOMMUF_writable;
-            hd->platform_ops->map_page(d, gfn, mfn, mapping);
+            if ( hd->platform_ops->map_page(d, gfn, mfn, mapping) )
+                printk(XENLOG_G_ERR
+                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
+                       gfn, mfn);
+
             if ( !(i++ & 0xfffff) )
                 process_pending_softirqs();
         }
@@ -284,9 +288,7 @@ int iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_cou
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush )
         return 0;
 
-    hd->platform_ops->iotlb_flush(d, gfn, page_count);
-
-    return 0;
+    return hd->platform_ops->iotlb_flush(d, gfn, page_count);
 }
 
 int iommu_iotlb_flush_all(struct domain *d)
@@ -296,9 +298,7 @@ int iommu_iotlb_flush_all(struct domain *d)
     if ( !iommu_enabled || !hd->platform_ops || !hd->platform_ops->iotlb_flush_all )
         return 0;
 
-    hd->platform_ops->iotlb_flush_all(d);
-
-    return 0;
+    return hd->platform_ops->iotlb_flush_all(d);
 }
 
 int __init iommu_setup(void)
@@ -375,8 +375,9 @@ int iommu_do_domctl(
 int iommu_suspend()
 {
     const struct iommu_ops *ops = iommu_get_ops();
+
     if ( iommu_enabled )
-        ops->suspend();
+        return ops->suspend();
 
     return 0;
 }
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index cbe0286..d4d37c3 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -91,7 +91,7 @@ int is_igd_vt_enabled_quirk(void);
 void platform_quirks_init(void);
 void vtd_ops_preamble_quirk(struct iommu* iommu);
 void vtd_ops_postamble_quirk(struct iommu* iommu);
-void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map);
+int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map);
 void pci_vtd_quirk(const struct pci_dev *);
 bool_t platform_supports_intremap(void);
 bool_t platform_supports_x2apic(void);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 5ad25dc..4f8b63a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -542,11 +542,12 @@ static int iommu_flush_iotlb_psi(
     return status;
 }
 
-static void iommu_flush_all(void)
+static int iommu_flush_all(void)
 {
     struct acpi_drhd_unit *drhd;
     struct iommu *iommu;
     int flush_dev_iotlb;
+    int rc = 0;
 
     flush_all_cache();
     for_each_drhd_unit ( drhd )
@@ -554,11 +555,24 @@ static void iommu_flush_all(void)
         iommu = drhd->iommu;
         iommu_flush_context_global(iommu, 0);
         flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
-        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
+        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
+
+        if ( rc > 0 )
+        {
+            iommu_flush_write_buffer(iommu);
+            rc = 0;
+        }
+        else if ( rc < 0 )
+        {
+            printk(XENLOG_G_ERR "IOMMU: IOMMU flush all failed.\n");
+            break;
+        }
     }
+
+    return rc;
 }
 
-static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
+static int __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
         int dma_old_pte_present, unsigned int page_count)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
@@ -566,6 +580,7 @@ static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
     struct iommu *iommu;
     int flush_dev_iotlb;
     int iommu_domid;
+    int rc = 0;
 
     /*
      * No need pcideves_lock here because we have flush
@@ -584,37 +599,40 @@ static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
             continue;
 
         if ( page_count != 1 || gfn == INVALID_GFN )
-        {
-            if ( iommu_flush_iotlb_dsi(iommu, iommu_domid,
-                        0, flush_dev_iotlb) )
-                iommu_flush_write_buffer(iommu);
-        }
+            rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
+                                       0, flush_dev_iotlb);
         else
+            rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
+                                       (paddr_t)gfn << PAGE_SHIFT_4K, 0,
+                                       !dma_old_pte_present,
+                                       flush_dev_iotlb);
+        if ( rc > 0 )
         {
-            if ( iommu_flush_iotlb_psi(iommu, iommu_domid,
-                        (paddr_t)gfn << PAGE_SHIFT_4K, PAGE_ORDER_4K,
-                        !dma_old_pte_present, flush_dev_iotlb) )
-                iommu_flush_write_buffer(iommu);
+            iommu_flush_write_buffer(iommu);
+            rc = 0;
         }
     }
+
+    return rc;
 }
 
-static void intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count)
+static int intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn, unsigned int page_count)
 {
-    __intel_iommu_iotlb_flush(d, gfn, 1, page_count);
+    return __intel_iommu_iotlb_flush(d, gfn, 1, page_count);
 }
 
-static void intel_iommu_iotlb_flush_all(struct domain *d)
+static int intel_iommu_iotlb_flush_all(struct domain *d)
 {
-    __intel_iommu_iotlb_flush(d, INVALID_GFN, 0, 0);
+    return __intel_iommu_iotlb_flush(d, INVALID_GFN, 0, 0);
 }
 
 /* clear one page's page table */
-static void dma_pte_clear_one(struct domain *domain, u64 addr)
+static int dma_pte_clear_one(struct domain *domain, u64 addr)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(domain);
     struct dma_pte *page = NULL, *pte = NULL;
     u64 pg_maddr;
+    int rc = 0;
 
     spin_lock(&hd->arch.mapping_lock);
     /* get last level pte */
@@ -622,7 +640,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
     if ( pg_maddr == 0 )
     {
         spin_unlock(&hd->arch.mapping_lock);
-        return;
+        return -ENOMEM;
     }
 
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
@@ -632,7 +650,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
     {
         spin_unlock(&hd->arch.mapping_lock);
         unmap_vtd_domain_page(page);
-        return;
+        return 0;
     }
 
     dma_clear_pte(*pte);
@@ -640,9 +658,11 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
     iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
 
     if ( !this_cpu(iommu_dont_flush_iotlb) )
-        __intel_iommu_iotlb_flush(domain, addr >> PAGE_SHIFT_4K, 1, 1);
+        rc = __intel_iommu_iotlb_flush(domain, addr >> PAGE_SHIFT_4K, 1, 1);
 
     unmap_vtd_domain_page(page);
+
+    return rc;
 }
 
 static void iommu_free_pagetable(u64 pt_maddr, int level)
@@ -1281,6 +1301,7 @@ int domain_context_mapping_one(
     u64 maddr, pgd_maddr;
     u16 seg = iommu->intel->drhd->segment;
     int agaw;
+    int rc = 0;
 
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
@@ -1400,7 +1421,14 @@ int domain_context_mapping_one(
     else
     {
         int flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
-        iommu_flush_iotlb_dsi(iommu, 0, 1, flush_dev_iotlb);
+
+        rc = iommu_flush_iotlb_dsi(iommu, 0, 1, flush_dev_iotlb);
+
+        if ( rc > 0 )
+        {
+            iommu_flush_write_buffer(iommu);
+            rc = 0;
+        }
     }
 
     set_bit(iommu->index, &hd->arch.iommu_bitmap);
@@ -1410,7 +1438,7 @@ int domain_context_mapping_one(
     if ( !seg )
         me_wifi_quirk(domain, bus, devfn, MAP_ME_PHANTOM_FUNC);
 
-    return 0;
+    return rc;
 }
 
 static int domain_context_mapping(
@@ -1505,6 +1533,7 @@ int domain_context_unmap_one(
     struct context_entry *context, *context_entries;
     u64 maddr;
     int iommu_domid;
+    int rc = 0;
 
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
@@ -1539,7 +1568,14 @@ int domain_context_unmap_one(
     else
     {
         int flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
-        iommu_flush_iotlb_dsi(iommu, iommu_domid, 0, flush_dev_iotlb);
+
+        rc = iommu_flush_iotlb_dsi(iommu, iommu_domid, 0, flush_dev_iotlb);
+
+        if ( rc > 0 )
+        {
+            iommu_flush_write_buffer(iommu);
+            rc = 0;
+        }
     }
 
     spin_unlock(&iommu->lock);
@@ -1548,7 +1584,7 @@ int domain_context_unmap_one(
     if ( !iommu->intel->drhd->segment )
         me_wifi_quirk(domain, bus, devfn, UNMAP_ME_PHANTOM_FUNC);
 
-    return 0;
+    return rc;
 }
 
 static int domain_context_unmap(
@@ -1738,7 +1774,7 @@ static int intel_iommu_map_page(
     unmap_vtd_domain_page(page);
 
     if ( !this_cpu(iommu_dont_flush_iotlb) )
-        __intel_iommu_iotlb_flush(d, gfn, dma_pte_present(old), 1);
+        return __intel_iommu_iotlb_flush(d, gfn, dma_pte_present(old), 1);
 
     return 0;
 }
@@ -1749,19 +1785,18 @@ static int intel_iommu_unmap_page(struct domain *d, unsigned long gfn)
     if ( iommu_passthrough && is_hardware_domain(d) )
         return 0;
 
-    dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K);
-
-    return 0;
+    return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K);
 }
 
-void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
-                     int order, int present)
+int iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
+                    int order, int present)
 {
     struct acpi_drhd_unit *drhd;
     struct iommu *iommu = NULL;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     int flush_dev_iotlb;
     int iommu_domid;
+    int rc = 0;
 
     iommu_flush_cache_entry(pte, sizeof(struct dma_pte));
 
@@ -1775,11 +1810,17 @@ void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte,
         iommu_domid= domain_iommu_domid(d, iommu);
         if ( iommu_domid == -1 )
             continue;
-        if ( iommu_flush_iotlb_psi(iommu, iommu_domid,
+        rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
                                    (paddr_t)gfn << PAGE_SHIFT_4K,
-                                   order, !present, flush_dev_iotlb) )
+                                   order, !present, flush_dev_iotlb);
+        if ( rc > 0 )
+        {
             iommu_flush_write_buffer(iommu);
+            rc = 0;
+        }
     }
+
+    return rc;
 }
 
 static int __init vtd_ept_page_compatible(struct iommu *iommu)
@@ -2099,8 +2140,8 @@ static int init_vtd_hw(void)
             return -EIO;
         }
     }
-    iommu_flush_all();
-    return 0;
+
+    return iommu_flush_all();
 }
 
 static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
@@ -2389,16 +2430,19 @@ static int intel_iommu_group_id(u16 seg, u8 bus, u8 devfn)
 }
 
 static u32 iommu_state[MAX_IOMMUS][MAX_IOMMU_REGS];
-static void vtd_suspend(void)
+static int vtd_suspend(void)
 {
     struct acpi_drhd_unit *drhd;
     struct iommu *iommu;
     u32    i;
+    int rc;
 
     if ( !iommu_enabled )
-        return;
+        return 0;
 
-    iommu_flush_all();
+    rc = iommu_flush_all();
+    if ( rc )
+        return rc;
 
     for_each_drhd_unit ( drhd )
     {
@@ -2427,6 +2471,8 @@ static void vtd_suspend(void)
         if ( !iommu_intremap && iommu_qinval )
             disable_qinval(iommu);
     }
+
+    return 0;
 }
 
 static void vtd_crash_shutdown(void)
diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough/vtd/quirks.c
index 49df41d..a230b68 100644
--- a/xen/drivers/passthrough/vtd/quirks.c
+++ b/xen/drivers/passthrough/vtd/quirks.c
@@ -332,10 +332,11 @@ void __init platform_quirks_init(void)
  * assigning Intel integrated wifi device to a guest.
  */
 
-static void map_me_phantom_function(struct domain *domain, u32 dev, int map)
+static int map_me_phantom_function(struct domain *domain, u32 dev, int map)
 {
     struct acpi_drhd_unit *drhd;
     struct pci_dev *pdev;
+    int rc;
 
     /* find ME VT-d engine base on a real ME device */
     pdev = pci_get_pdev(0, 0, PCI_DEVFN(dev, 0));
@@ -343,23 +344,26 @@ static void map_me_phantom_function(struct domain *domain, u32 dev, int map)
 
     /* map or unmap ME phantom function */
     if ( map )
-        domain_context_mapping_one(domain, drhd->iommu, 0,
-                                   PCI_DEVFN(dev, 7), NULL);
+        rc = domain_context_mapping_one(domain, drhd->iommu, 0,
+                                        PCI_DEVFN(dev, 7), NULL);
     else
-        domain_context_unmap_one(domain, drhd->iommu, 0,
-                                 PCI_DEVFN(dev, 7));
+        rc = domain_context_unmap_one(domain, drhd->iommu, 0,
+                                      PCI_DEVFN(dev, 7));
+
+    return rc;
 }
 
-void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
+int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
 {
     u32 id;
+    int rc = 0;
 
     id = pci_conf_read32(0, 0, 0, 0, 0);
     if ( IS_CTG(id) )
     {
         /* quit if ME does not exist */
         if ( pci_conf_read32(0, 0, 3, 0, 0) == 0xffffffff )
-            return;
+            return -ENOENT;
 
         /* if device is WLAN device, map ME phantom device 0:3.7 */
         id = pci_conf_read32(0, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), 0);
@@ -373,7 +377,7 @@ void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
             case 0x423b8086:
             case 0x423c8086:
             case 0x423d8086:
-                map_me_phantom_function(domain, 3, map);
+                rc = map_me_phantom_function(domain, 3, map);
                 break;
             default:
                 break;
@@ -383,7 +387,7 @@ void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
     {
         /* quit if ME does not exist */
         if ( pci_conf_read32(0, 0, 22, 0, 0) == 0xffffffff )
-            return;
+            return -ENOENT;
 
         /* if device is WLAN device, map ME phantom device 0:22.7 */
         id = pci_conf_read32(0, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), 0);
@@ -399,12 +403,14 @@ void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
             case 0x42388086:        /* Puma Peak */
             case 0x422b8086:
             case 0x422c8086:
-                map_me_phantom_function(domain, 22, map);
+                rc = map_me_phantom_function(domain, 22, map);
                 break;
             default:
                 break;
         }
     }
+
+    return rc;
 }
 
 void pci_vtd_quirk(const struct pci_dev *pdev)
diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
index 9c51172..f540fc8 100644
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -119,7 +119,7 @@ extern unsigned long *shared_intremap_inuse;
 
 /* power management support */
 void amd_iommu_resume(void);
-void amd_iommu_suspend(void);
+int amd_iommu_suspend(void);
 void amd_iommu_crash_shutdown(void);
 
 /* guest iommu support */
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index f22b3a5..790701e 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -26,7 +26,7 @@ int iommu_setup_hpet_msi(struct msi_desc *);
 
 /* While VT-d specific, this must get declared in a generic header. */
 int adjust_vtd_irq_affinities(void);
-void iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, int order, int present);
+int iommu_pte_flush(struct domain *d, u64 gfn, u64 *pte, int order, int present);
 bool_t iommu_supports_eim(void);
 int iommu_enable_x2apic_IR(void);
 void iommu_disable_x2apic_IR(void);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index d6d489a..8710dfe 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -157,12 +157,12 @@ struct iommu_ops {
     unsigned int (*read_apic_from_ire)(unsigned int apic, unsigned int reg);
     int (*setup_hpet_msi)(struct msi_desc *);
 #endif /* CONFIG_X86 */
-    void (*suspend)(void);
+    int (*suspend)(void);
     void (*resume)(void);
     void (*share_p2m)(struct domain *d);
     void (*crash_shutdown)(void);
-    void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
-    void (*iotlb_flush_all)(struct domain *d);
+    int (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
+    int (*iotlb_flush_all)(struct domain *d);
     int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
-- 
1.9.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 ` [PATCH 1/2] IOMMU/MMU: Adjust top level functions for " Quan Xu
@ 2016-03-17  7:32   ` Tian, Kevin
  2016-03-17  7:58     ` Jan Beulich
  2016-03-17 12:30   ` George Dunlap
  2016-03-17 17:14   ` Jan Beulich
  2 siblings, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2016-03-17  7:32 UTC (permalink / raw)
  To: Xu, Quan, xen-devel
  Cc: Keir Fraser, Jan Beulich, George Dunlap, Liu Jinsong,
	Dario Faggioli, Nakajima, Jun, Andrew Cooper, Wu, Feng

> From: Xu, Quan
> Sent: Thursday, March 17, 2016 2:55 PM
> 
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c
> index c0d6aab..e5ab10a 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
> 
>          tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
>          for ( j = 0; j < tmp; j++ )
> -            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> -                           IOMMUF_readable|IOMMUF_writable);
> +            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> +                                IOMMUF_readable|IOMMUF_writable) )
> +                printk(XENLOG_G_ERR
> +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> +                       pfn * tmp + j, pfn * tmp + j);
> 
>          if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
>              process_pending_softirqs();

Hi, Quan, this patch looks good to me in general. Just double confirm one
thing. Above doesn't handle error from iommu_map_page, while only
throwing out warning. Not sure whether it has been discussed before
as the agreement or not.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 ` [PATCH 2/2] IOMMU/MMU: Adjust low " Quan Xu
@ 2016-03-17  7:37   ` Tian, Kevin
  2016-03-18  2:30     ` Xu, Quan
  2016-03-17 15:31   ` George Dunlap
  2016-03-18 10:20   ` Jan Beulich
  2 siblings, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2016-03-17  7:37 UTC (permalink / raw)
  To: Xu, Quan, xen-devel
  Cc: Nakajima, Jun, Keir Fraser, Jan Beulich, George Dunlap,
	Andrew Cooper, Dario Faggioli, Julien Grall, Stefano Stabellini,
	Suravee Suthikulpanit, Wu, Feng

> From: Xu, Quan
> Sent: Thursday, March 17, 2016 2:55 PM
> 
> Current code would be panic(), when VT-d Device-TLB flush timed out.
> the panic() is going to be eliminated, so we must check all kinds of
> error and all the way up the call trees.

sorry that I'm unclear what is the criteria of defining high level and
low level functions in two patches. Could you elaborate? Once I thought
you may mean common code and vendor specific code, however...

> 
> Signed-off-by: Quan Xu <quan.xu@intel.com>
> 
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Julien Grall <julien.grall@arm.com>
> CC: Feng Wu <feng.wu@intel.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c                     |   2 +-
>  xen/drivers/passthrough/amd/iommu_init.c      |  12 ++-
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   2 +-
>  xen/drivers/passthrough/arm/smmu.c            |  10 ++-
>  xen/drivers/passthrough/iommu.c               |  17 ++--
>  xen/drivers/passthrough/vtd/extern.h          |   2 +-
>  xen/drivers/passthrough/vtd/iommu.c           | 120
> ++++++++++++++++++--------
>  xen/drivers/passthrough/vtd/quirks.c          |  26 +++---
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
>  xen/include/asm-x86/iommu.h                   |   2 +-
>  xen/include/xen/iommu.h                       |   6 +-

Above you have general passthrough/iommu.c though most
of others are vendor specific.

Then in PATCH [1/2], you have:

xen/arch/x86/acpi/power.c             | 14 +++++++++++++-
 xen/arch/x86/mm.c                     | 13 ++++++++-----
 xen/arch/x86/mm/p2m-ept.c             | 10 +++++++++-
 xen/arch/x86/mm/p2m-pt.c              | 12 ++++++++++--
 xen/common/grant_table.c              |  5 +++--
 xen/common/memory.c                   |  5 +++--
 xen/drivers/passthrough/iommu.c       | 16 +++++++++++-----
 xen/drivers/passthrough/vtd/x86/vtd.c |  7 +++++--
 xen/drivers/passthrough/x86/iommu.c   |  6 +++++-
 xen/include/xen/iommu.h               |  6 +++---

They are also mixed.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17  7:32   ` Tian, Kevin
@ 2016-03-17  7:58     ` Jan Beulich
  2016-03-17  8:00       ` Tian, Kevin
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-17  7:58 UTC (permalink / raw)
  To: Kevin Tian
  Cc: Feng Wu, Quan Xu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 17.03.16 at 08:32, <kevin.tian@intel.com> wrote:
>>  From: Xu, Quan
>> Sent: Thursday, March 17, 2016 2:55 PM
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
>> 
>>          tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
>>          for ( j = 0; j < tmp; j++ )
>> -            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
>> -                           IOMMUF_readable|IOMMUF_writable);
>> +            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
>> +                                IOMMUF_readable|IOMMUF_writable) )
>> +                printk(XENLOG_G_ERR
>> +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
>> +                       pfn * tmp + j, pfn * tmp + j);
>> 
>>          if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
>>              process_pending_softirqs();
> 
> Hi, Quan, this patch looks good to me in general. Just double confirm one
> thing. Above doesn't handle error from iommu_map_page, while only
> throwing out warning. Not sure whether it has been discussed before
> as the agreement or not.

For code paths involved in preparing the hardware domain only
I had specifically asked to handle such in a best effort manner,
instead of explicitly rendering a system unbootable.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17  7:58     ` Jan Beulich
@ 2016-03-17  8:00       ` Tian, Kevin
  0 siblings, 0 replies; 42+ messages in thread
From: Tian, Kevin @ 2016-03-17  8:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wu, Feng, Xu, Quan, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Nakajima, Jun, Andrew Cooper, Keir Fraser

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, March 17, 2016 3:59 PM
> 
> >>> On 17.03.16 at 08:32, <kevin.tian@intel.com> wrote:
> >>  From: Xu, Quan
> >> Sent: Thursday, March 17, 2016 2:55 PM
> >> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >> @@ -140,8 +140,11 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain
> *d)
> >>
> >>          tmp = 1 << (PAGE_SHIFT - PAGE_SHIFT_4K);
> >>          for ( j = 0; j < tmp; j++ )
> >> -            iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> >> -                           IOMMUF_readable|IOMMUF_writable);
> >> +            if ( iommu_map_page(d, pfn * tmp + j, pfn * tmp + j,
> >> +                                IOMMUF_readable|IOMMUF_writable) )
> >> +                printk(XENLOG_G_ERR
> >> +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> >> +                       pfn * tmp + j, pfn * tmp + j);
> >>
> >>          if (!(i & (0xfffff >> (PAGE_SHIFT - PAGE_SHIFT_4K))))
> >>              process_pending_softirqs();
> >
> > Hi, Quan, this patch looks good to me in general. Just double confirm one
> > thing. Above doesn't handle error from iommu_map_page, while only
> > throwing out warning. Not sure whether it has been discussed before
> > as the agreement or not.
> 
> For code paths involved in preparing the hardware domain only
> I had specifically asked to handle such in a best effort manner,
> instead of explicitly rendering a system unbootable.
> 

OK, good to know that.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 ` [PATCH 1/2] IOMMU/MMU: Adjust top level functions for " Quan Xu
  2016-03-17  7:32   ` Tian, Kevin
@ 2016-03-17 12:30   ` George Dunlap
  2016-03-17 12:33     ` George Dunlap
  2016-03-18  7:54     ` Xu, Quan
  2016-03-17 17:14   ` Jan Beulich
  2 siblings, 2 replies; 42+ messages in thread
From: George Dunlap @ 2016-03-17 12:30 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Keir Fraser, Jun Nakajima, Liu Jinsong,
	Dario Faggioli, xen-devel, Jan Beulich, Andrew Cooper, Feng Wu

On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index c997b53..526548e 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>                             int preemptible)
>  {
>      unsigned long nx, x, y = page->u.inuse.type_info;
> -    int rc = 0;
> +    int rc = 0, ret = 0;
>
>      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
>
> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>          {
>              if ( (x & PGT_type_mask) == PGT_writable_page )
> -                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
> +                ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>              else if ( type == PGT_writable_page )
> -                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> -                               page_to_mfn(page),
> -                               IOMMUF_readable|IOMMUF_writable);
> +                ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
> +                                     page_to_mfn(page),
> +                                     IOMMUF_readable|IOMMUF_writable);
>          }
>      }
>
> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>      if ( (x & PGT_partial) && !(nx & PGT_partial) )
>          put_page(page);
>
> +    if ( !rc )
> +        rc = ret;
> +

What's this about?  If the iommu_[un]map_page() operation times out,
we still go through with calling alloc_page_type(); and if
alloc_page_type() fails we return its failure value, but if it
succeeds, we return the error from iommu_[un]map_page()?

>      return rc;
>  }
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 3cb6868..f9bcce7 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -830,7 +830,15 @@ out:
>          {
>              if ( iommu_flags )
>                  for ( i = 0; i < (1 << order); i++ )
> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +                {
> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +                    if ( rc )
> +                    {
> +                        while ( i-- > 0 )
> +                            iommu_unmap_page(d, gfn + i);

This won't unmap gfn+0 (since it will break out when i == 0 without
calling unmap).

If i were signed, we could make the conditional ">= 0"; but
unfortunately it's unsigned, so you'll have to do something more
complicated.

While we're at it, is it worth adding "unlikely()" to the if() condition here?

> +                        break;
> +                    }
> +                }
>              else
>                  for ( i = 0; i < (1 << order); i++ )
>                      iommu_unmap_page(d, gfn + i);
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index 3d80612..c33b753 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -680,8 +680,16 @@ p2m_pt_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>          }
>          else if ( iommu_pte_flags )
>              for ( i = 0; i < (1UL << page_order); i++ )
> -                iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> -                               iommu_pte_flags);
> +            {
> +                rc = iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> +                                    iommu_pte_flags);
> +                if ( rc )
> +                {
> +                    while ( i-- > 0 )
> +                        iommu_unmap_page(p2m->domain, gfn + i);

Same with gfn+0.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17 12:30   ` George Dunlap
@ 2016-03-17 12:33     ` George Dunlap
  2016-03-18  3:19       ` Xu, Quan
  2016-03-18  7:54     ` Xu, Quan
  1 sibling, 1 reply; 42+ messages in thread
From: George Dunlap @ 2016-03-17 12:33 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Keir Fraser, Jun Nakajima, Liu Jinsong,
	Dario Faggioli, xen-devel, Jan Beulich, Andrew Cooper, Feng Wu

On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
>> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
>> index c997b53..526548e 100644
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>>                             int preemptible)
>>  {
>>      unsigned long nx, x, y = page->u.inuse.type_info;
>> -    int rc = 0;
>> +    int rc = 0, ret = 0;
>>
>>      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
>>
>> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>>          {
>>              if ( (x & PGT_type_mask) == PGT_writable_page )
>> -                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>> +                ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>>              else if ( type == PGT_writable_page )
>> -                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>> -                               page_to_mfn(page),
>> -                               IOMMUF_readable|IOMMUF_writable);
>> +                ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>> +                                     page_to_mfn(page),
>> +                                     IOMMUF_readable|IOMMUF_writable);
>>          }
>>      }
>>
>> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page, unsigned long type,
>>      if ( (x & PGT_partial) && !(nx & PGT_partial) )
>>          put_page(page);
>>
>> +    if ( !rc )
>> +        rc = ret;
>> +
>
> What's this about?  If the iommu_[un]map_page() operation times out,
> we still go through with calling alloc_page_type(); and if
> alloc_page_type() fails we return its failure value, but if it
> succeeds, we return the error from iommu_[un]map_page()?
>
>>      return rc;
>>  }
>>
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index 3cb6868..f9bcce7 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -830,7 +830,15 @@ out:
>>          {
>>              if ( iommu_flags )
>>                  for ( i = 0; i < (1 << order); i++ )
>> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>> +                {
>> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>> +                    if ( rc )
>> +                    {
>> +                        while ( i-- > 0 )
>> +                            iommu_unmap_page(d, gfn + i);
>
> This won't unmap gfn+0 (since it will break out when i == 0 without
> calling unmap).

Oh, no it won't, because the decrement is postfix.

For us mere mortals, I'd appreciate a comment here like this:

/* Postfix operator means we will call unmap with i == 0 */

Thanks,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 ` [PATCH 2/2] IOMMU/MMU: Adjust low " Quan Xu
  2016-03-17  7:37   ` Tian, Kevin
@ 2016-03-17 15:31   ` George Dunlap
  2016-03-18  6:57     ` Xu, Quan
  2016-03-18 10:20   ` Jan Beulich
  2 siblings, 1 reply; 42+ messages in thread
From: George Dunlap @ 2016-03-17 15:31 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Stefano Stabellini,
	Jun Nakajima, Feng Wu, Suravee Suthikulpanit

On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> Current code would be panic(), when VT-d Device-TLB flush timed out.
> the panic() is going to be eliminated, so we must check all kinds of
> error and all the way up the call trees.
>
> Signed-off-by: Quan Xu <quan.xu@intel.com>
>
> CC: Jun Nakajima <jun.nakajima@intel.com>
> CC: Kevin Tian <kevin.tian@intel.com>
> CC: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Julien Grall <julien.grall@arm.com>
> CC: Feng Wu <feng.wu@intel.com>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c                     |   2 +-
>  xen/drivers/passthrough/amd/iommu_init.c      |  12 ++-
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   2 +-
>  xen/drivers/passthrough/arm/smmu.c            |  10 ++-
>  xen/drivers/passthrough/iommu.c               |  17 ++--
>  xen/drivers/passthrough/vtd/extern.h          |   2 +-
>  xen/drivers/passthrough/vtd/iommu.c           | 120 ++++++++++++++++++--------
>  xen/drivers/passthrough/vtd/quirks.c          |  26 +++---
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
>  xen/include/asm-x86/iommu.h                   |   2 +-
>  xen/include/xen/iommu.h                       |   6 +-
>  11 files changed, 133 insertions(+), 68 deletions(-)
>
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index f9bcce7..fa6c710 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -825,7 +825,7 @@ out:
>           need_modify_vtd_table )
>      {
>          if ( iommu_hap_pt_share )
> -            iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present);
> +            rc = iommu_pte_flush(d, gfn, &ept_entry->epte, order, vtd_pte_present);

So this sort changes the meaning of the "rc" check near the end of the
function, when we check whether we want to update altp2m.

As it happens, I *think* it doesn't matter, because you can't have
altp2m and passthrough enabled at the same time, right?

If so, this at least merits a comment above the altp2m check; something like:

"NB that if altp2m is enabled, rc cannot be non-zero here due to
iommu_pte_flush, since you can't have altp2m and pass-through enabled
at the same time."

If you *can* have both altp2m and pass-through, then we need to make
sure that the altp2m gets updated when the hostp2m is updated, even if
the iommu flush fails.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 ` [PATCH 1/2] IOMMU/MMU: Adjust top level functions for " Quan Xu
  2016-03-17  7:32   ` Tian, Kevin
  2016-03-17 12:30   ` George Dunlap
@ 2016-03-17 17:14   ` Jan Beulich
  2016-03-28  3:33     ` Xu, Quan
  2 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-17 17:14 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> @@ -53,11 +55,21 @@ static int device_power_down(void)
>  
>      ioapic_suspend();
>  
> -    iommu_suspend();
> +    err = iommu_suspend();
> +    if ( err )
> +        goto iommu_suspend_error;
>  
>      lapic_suspend();
>  
>      return 0;
> +
> +iommu_suspend_error:

Labels indented by at least one space please.

> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -830,7 +830,15 @@ out:
>          {
>              if ( iommu_flags )
>                  for ( i = 0; i < (1 << order); i++ )
> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +                {
> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
> +                    if ( rc )
> +                    {
> +                        while ( i-- > 0 )
> +                            iommu_unmap_page(d, gfn + i);
> +                        break;
> +                    }
> +                }
>              else
>                  for ( i = 0; i < (1 << order); i++ )
>                      iommu_unmap_page(d, gfn + i);

Earlier on in the PV mm code you also checked iommu_unmap_page()'s
return code - why not here (and also in p2m-pt.c)?

Also I'm quite unhappy about the inconsistent state you leave things
in: You unmap from the IOMMU, return an error, but leave the EPT
entry in place.

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>              {
>                  nr_gets++;
>                  (void)get_page(pg, rd);
> -                if ( !(op->flags & GNTMAP_readonly) )
> -                    get_page_type(pg, PGT_writable_page);
> +                if ( !(op->flags & GNTMAP_readonly) &&
> +                     !get_page_type(pg, PGT_writable_page) )
> +                        goto could_not_pin;

This needs explanation, as it doesn't look related to what your actual
goal is: If an error was possible here, I think this would be a security
issue. However, as also kind of documented by the explicitly ignored
return value from get_page(), it is my understanding there here we
only obtain an _extra_ reference.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain *d,
>      if ( need_iommu(d) )
>      {
>          this_cpu(iommu_dont_flush_iotlb) = 0;
> -        iommu_iotlb_flush(d, xatp->idx - done, done);
> -        iommu_iotlb_flush(d, xatp->gpfn - done, done);
> +        rc = iommu_iotlb_flush(d, xatp->idx - done, done);
> +        if ( !rc )
> +            rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
>      }

And the pattern repeats - you now return an error, but you don't
roll back the now failed operation. But wait - maybe that intended:
Are you meaning to crash the guest in such cases (somewhere
deep in the flush code)? If so, I think that's fine, but you
absolutely would need to say so in the commit message.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct domain *d)
>      this_cpu(iommu_dont_flush_iotlb) = 0;
>  
>      if ( !rc )
> -        iommu_iotlb_flush_all(d);
> +    {
> +        rc = iommu_iotlb_flush_all(d);
> +        if ( rc )
> +            iommu_teardown(d);
> +    }
>      else if ( rc != -ERESTART )
>          iommu_teardown(d);

Why can't you just use the existing call to iommu_teardown(), by
simply deleting the "else"?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-17  7:37   ` Tian, Kevin
@ 2016-03-18  2:30     ` Xu, Quan
  2016-03-18  8:06       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-18  2:30 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nakajima, Jun, Keir Fraser, Jan Beulich, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Julien Grall,
	Stefano Stabellini, Suravee Suthikulpanit, Wu, Feng

On March 17, 2016 3:38pm, Tian, Kevin <kevin.tian@intel.com> wrote:
> > From: Xu, Quan
> > Sent: Thursday, March 17, 2016 2:55 PM
> >
> > Current code would be panic(), when VT-d Device-TLB flush timed out.
> > the panic() is going to be eliminated, so we must check all kinds of
> > error and all the way up the call trees.
> 
> sorry that I'm unclear what is the criteria of defining high level and low level
> functions in two patches. Could you elaborate? Once I thought you may mean
> common code and vendor specific code, however...
> 

In this patch set, it is adjusting top level functions of the call tree first, and working my way down to leaf ones.
I tried to define that top level is mainly about MMU, and the low level is mainly about IOMMU. Mixed things are in
Consideration of compiling and simplification. For high level of these call trees, IMO it is a reasonable attempt at splitting things.


> >  xen/arch/x86/mm/p2m-ept.c                     |   2 +-
> >  xen/drivers/passthrough/amd/iommu_init.c      |  12 ++-
> >  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   2 +-
> >  xen/drivers/passthrough/arm/smmu.c            |  10 ++-
> >  xen/drivers/passthrough/iommu.c               |  17 ++--
> >  xen/drivers/passthrough/vtd/extern.h          |   2 +-
> >  xen/drivers/passthrough/vtd/iommu.c           | 120
> > ++++++++++++++++++--------
> >  xen/drivers/passthrough/vtd/quirks.c          |  26 +++---
> >  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
> >  xen/include/asm-x86/iommu.h                   |   2 +-
> >  xen/include/xen/iommu.h                       |   6 +-
> 
> Above you have general passthrough/iommu.c though most of others are
> vendor specific.
> 


 .e.g, as the 'struct iommu_ops' is common structure for arm/amd/intel.


> Then in PATCH [1/2], you have:
> 
> xen/arch/x86/acpi/power.c             | 14 +++++++++++++-
>  xen/arch/x86/mm.c                     | 13 ++++++++-----
>  xen/arch/x86/mm/p2m-ept.c             | 10 +++++++++-
>  xen/arch/x86/mm/p2m-pt.c              | 12 ++++++++++--
>  xen/common/grant_table.c              |  5 +++--
>  xen/common/memory.c                   |  5 +++--
>  xen/drivers/passthrough/iommu.c       | 16 +++++++++++-----
>  xen/drivers/passthrough/vtd/x86/vtd.c |  7 +++++--
>  xen/drivers/passthrough/x86/iommu.c   |  6 +++++-
>  xen/include/xen/iommu.h               |  6 +++---
> 
> They are also mixed.
> 

It is in Consideration of compiling and simplification. 
 e.g. for this call tree,  ...--iommu_suspend()--device_power_down()--...
 device_power_down() is in -xen/arch/x86/acpi/power.c
 iommu_suspend() is in -xen/drivers/passthrough/iommu.c

when I tried to return error code from iommu_suspend(), which is with 'void' annotation.
I need change it from 'void' to 'int', then it is unavoidable to mix things.

Any good idea? To be honest, I am very tired to at splitting things like this :).


Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17 12:33     ` George Dunlap
@ 2016-03-18  3:19       ` Xu, Quan
  2016-03-18  8:09         ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-18  3:19 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Keir Fraser, Nakajima, Jun, Liu Jinsong,
	Dario Faggioli, xen-devel, Jan Beulich, Andrew Cooper, Wu, Feng

On March 17, 2016 8:34pm, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> >> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> >> c997b53..526548e 100644
> >> --- a/xen/arch/x86/mm.c
> >> +++ b/xen/arch/x86/mm.c
> >> @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >>                             int preemptible)  {
> >>      unsigned long nx, x, y = page->u.inuse.type_info;
> >> -    int rc = 0;
> >> +    int rc = 0, ret = 0;
> >>
> >>      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >>
> >> @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >>          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >>          {
> >>              if ( (x & PGT_type_mask) == PGT_writable_page )
> >> -                iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >> +                ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >>              else if ( type == PGT_writable_page )
> >> -                iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> >> -                               page_to_mfn(page),
> >> -
> IOMMUF_readable|IOMMUF_writable);
> >> +                ret = iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> >> +                                     page_to_mfn(page),
> >> +
> IOMMUF_readable|IOMMUF_writable);
> >>          }
> >>      }
> >>
> >> @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >>      if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >>          put_page(page);
> >>
> >> +    if ( !rc )
> >> +        rc = ret;
> >> +
> >
> > What's this about?  If the iommu_[un]map_page() operation times out,
> > we still go through with calling alloc_page_type(); and if
> > alloc_page_type() fails we return its failure value, but if it
> > succeeds, we return the error from iommu_[un]map_page()?
> >
> >>      return rc;
> >>  }
> >>
> >> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> >> index 3cb6868..f9bcce7 100644
> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> @@ -830,7 +830,15 @@ out:
> >>          {
> >>              if ( iommu_flags )
> >>                  for ( i = 0; i < (1 << order); i++ )
> >> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> >> +                {
> >> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> >> +                    if ( rc )
> >> +                    {
> >> +                        while ( i-- > 0 )
> >> +                            iommu_unmap_page(d, gfn + i);
> >
> > This won't unmap gfn+0 (since it will break out when i == 0 without
> > calling unmap).
> 
> Oh, no it won't, because the decrement is postfix.
> 
> For us mere mortals, I'd appreciate a comment here like this:
> 
> /* Postfix operator means we will call unmap with i == 0 */
> 
Agreed.
For these 2 points, to summarize:
   - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
   - adding a comment:
        /* Postfix operator means we will call unmap with i == 0 */


Right?

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-17 15:31   ` George Dunlap
@ 2016-03-18  6:57     ` Xu, Quan
  0 siblings, 0 replies; 42+ messages in thread
From: Xu, Quan @ 2016-03-18  6:57 UTC (permalink / raw)
  To: George Dunlap, Andrew Cooper
  Cc: Tian, Kevin, Keir Fraser, Jan Beulich, Dario Faggioli, xen-devel,
	Julien Grall, Stefano Stabellini, Nakajima, Jun, Wu, Feng,
	Suravee Suthikulpanit

On March 17, 2016 11:31pm, <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> > Current code would be panic(), when VT-d Device-TLB flush timed out.
> > the panic() is going to be eliminated, so we must check all kinds of
> > error and all the way up the call trees.
> >
> > Signed-off-by: Quan Xu <quan.xu@intel.com>
> >
> > CC: Jun Nakajima <jun.nakajima@intel.com>
> > CC: Kevin Tian <kevin.tian@intel.com>
> > CC: George Dunlap <george.dunlap@eu.citrix.com>
> > CC: Keir Fraser <keir@xen.org>
> > CC: Jan Beulich <jbeulich@suse.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> > CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> > CC: Julien Grall <julien.grall@arm.com>
> > CC: Feng Wu <feng.wu@intel.com>
> > CC: Dario Faggioli <dario.faggioli@citrix.com>
> > ---
> >  xen/arch/x86/mm/p2m-ept.c                     |   2 +-
> >  xen/drivers/passthrough/amd/iommu_init.c      |  12 ++-
> >  xen/drivers/passthrough/amd/pci_amd_iommu.c   |   2 +-
> >  xen/drivers/passthrough/arm/smmu.c            |  10 ++-
> >  xen/drivers/passthrough/iommu.c               |  17 ++--
> >  xen/drivers/passthrough/vtd/extern.h          |   2 +-
> >  xen/drivers/passthrough/vtd/iommu.c           | 120
> ++++++++++++++++++--------
> >  xen/drivers/passthrough/vtd/quirks.c          |  26 +++---
> >  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   2 +-
> >  xen/include/asm-x86/iommu.h                   |   2 +-
> >  xen/include/xen/iommu.h                       |   6 +-
> >  11 files changed, 133 insertions(+), 68 deletions(-)
> >
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index f9bcce7..fa6c710 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -825,7 +825,7 @@ out:
> >           need_modify_vtd_table )
> >      {
> >          if ( iommu_hap_pt_share )
> > -            iommu_pte_flush(d, gfn, &ept_entry->epte, order,
> vtd_pte_present);
> > +            rc = iommu_pte_flush(d, gfn, &ept_entry->epte, order,
> > + vtd_pte_present);
> 
> So this sort changes the meaning of the "rc" check near the end of the function,
> when we check whether we want to update altp2m.
> 
> As it happens, I *think* it doesn't matter, because you can't have altp2m and
> passthrough enabled at the same time, right?


I think it is _not_. 
I have gone through the altp2m design/design discussion. It looks Xen _can_ have altp2m and passthrough enabled at the same time.
Also you can refer to http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg01963.html
Andrew, could you help me double check it?

> 
> If so, this at least merits a comment above the altp2m check; something like:
> 
> "NB that if altp2m is enabled, rc cannot be non-zero here due to
> iommu_pte_flush, since you can't have altp2m and pass-through enabled at the
> same time."
> 
> If you *can* have both altp2m and pass-through, then we need to make sure
> that the altp2m gets updated when the hostp2m is updated, even if the iommu
> flush fails.
> 
I think it _can_, and let's wait for the above check first.

Quan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17 12:30   ` George Dunlap
  2016-03-17 12:33     ` George Dunlap
@ 2016-03-18  7:54     ` Xu, Quan
  2016-03-18  8:19       ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-18  7:54 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Keir Fraser, Nakajima, Jun, Liu Jinsong,
	Dario Faggioli, xen-devel, Jan Beulich, Andrew Cooper, Wu, Feng

On March 17, 2016 8:30pm, <dunlapg@gmail.com> wrote:
> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> > c997b53..526548e 100644
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page,
> unsigned long type,
> >                             int preemptible)  {
> >      unsigned long nx, x, y = page->u.inuse.type_info;
> > -    int rc = 0;
> > +    int rc = 0, ret = 0;
> >
> >      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >
> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> *page, unsigned long type,
> >          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >          {
> >              if ( (x & PGT_type_mask) == PGT_writable_page )
> > -                iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> > +                ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >              else if ( type == PGT_writable_page )
> > -                iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> > -                               page_to_mfn(page),
> > -
> IOMMUF_readable|IOMMUF_writable);
> > +                ret = iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> > +                                     page_to_mfn(page),
> > +
> IOMMUF_readable|IOMMUF_writable);
> >          }
> >      }
> >
> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page,
> unsigned long type,
> >      if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >          put_page(page);
> >
> > +    if ( !rc )
> > +        rc = ret;
> > +
> 
> What's this about?  If the iommu_[un]map_page() operation times out,
> we still go through with calling alloc_page_type(); and if
> alloc_page_type() fails we return its failure value, but if it
> succeeds, we return the error from iommu_[un]map_page()?
> 
Yes.
To be honest, to me, it is tricky too. 
I found it is not right to return directly if the iommu_[un]map_page() operation times out.

 """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"

Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-18  2:30     ` Xu, Quan
@ 2016-03-18  8:06       ` Jan Beulich
  2016-03-21  5:01         ` Tian, Kevin
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-18  8:06 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Stefano Stabellini,
	SuraveeSuthikulpanit, Keir Fraser

>>> On 18.03.16 at 03:30, <quan.xu@intel.com> wrote:
> Any good idea? To be honest, I am very tired to at splitting things like 
> this :).

I understand that this is a tedious task; the code should have been
propagating errors from the beginning.

The most natural way of splitting things would be to go function by
function, top level ones first, and leaf ones last, one function per
patch (maybe pairs of functions, as in the map/unmap case). Such
model would be problematic (almost) only when there's recursion at
some point, which I don't think would be the case anywhere here.
As mentioned before, the __must_check annotation is a great help
to not miss any callers - both to you as you put things together and
to the reviewers to be ascertained that nothing was missed.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  3:19       ` Xu, Quan
@ 2016-03-18  8:09         ` Jan Beulich
  2016-03-24  6:45           ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-18  8:09 UTC (permalink / raw)
  To: George Dunlap, Quan Xu
  Cc: Kevin Tian, Feng Wu, Liu Jinsong, DarioFaggioli, xen-devel,
	Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 18.03.16 at 04:19, <quan.xu@intel.com> wrote:
> On March 17, 2016 8:34pm, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
>> <George.Dunlap@eu.citrix.com> wrote:
>> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
>> >> --- a/xen/arch/x86/mm/p2m-ept.c
>> >> +++ b/xen/arch/x86/mm/p2m-ept.c
>> >> @@ -830,7 +830,15 @@ out:
>> >>          {
>> >>              if ( iommu_flags )
>> >>                  for ( i = 0; i < (1 << order); i++ )
>> >> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
>> iommu_flags);
>> >> +                {
>> >> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
>> iommu_flags);
>> >> +                    if ( rc )
>> >> +                    {
>> >> +                        while ( i-- > 0 )
>> >> +                            iommu_unmap_page(d, gfn + i);
>> >
>> > This won't unmap gfn+0 (since it will break out when i == 0 without
>> > calling unmap).
>> 
>> Oh, no it won't, because the decrement is postfix.
>> 
>> For us mere mortals, I'd appreciate a comment here like this:
>> 
>> /* Postfix operator means we will call unmap with i == 0 */
>> 
> Agreed.
> For these 2 points, to summarize:
>    - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
>    - adding a comment:
>         /* Postfix operator means we will call unmap with i == 0 */

To be honest, I'm opposed to the addition of such comments.
See also the parallel discussion rooted at
http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01779.html

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  7:54     ` Xu, Quan
@ 2016-03-18  8:19       ` Jan Beulich
  2016-03-18  9:09         ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-18  8:19 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, DarioFaggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 18.03.16 at 08:54, <quan.xu@intel.com> wrote:
> On March 17, 2016 8:30pm, <dunlapg@gmail.com> wrote:
>> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
>> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
>> > c997b53..526548e 100644
>> > --- a/xen/arch/x86/mm.c
>> > +++ b/xen/arch/x86/mm.c
>> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info *page,
>> unsigned long type,
>> >                             int preemptible)  {
>> >      unsigned long nx, x, y = page->u.inuse.type_info;
>> > -    int rc = 0;
>> > +    int rc = 0, ret = 0;
>> >
>> >      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
>> >
>> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
>> *page, unsigned long type,
>> >          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>> >          {
>> >              if ( (x & PGT_type_mask) == PGT_writable_page )
>> > -                iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>> > +                ret = iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>> >              else if ( type == PGT_writable_page )
>> > -                iommu_map_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)),
>> > -                               page_to_mfn(page),
>> > -
>> IOMMUF_readable|IOMMUF_writable);
>> > +                ret = iommu_map_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)),
>> > +                                     page_to_mfn(page),
>> > +
>> IOMMUF_readable|IOMMUF_writable);
>> >          }
>> >      }
>> >
>> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info *page,
>> unsigned long type,
>> >      if ( (x & PGT_partial) && !(nx & PGT_partial) )
>> >          put_page(page);
>> >
>> > +    if ( !rc )
>> > +        rc = ret;
>> > +
>> 
>> What's this about?  If the iommu_[un]map_page() operation times out,
>> we still go through with calling alloc_page_type(); and if
>> alloc_page_type() fails we return its failure value, but if it
>> succeeds, we return the error from iommu_[un]map_page()?
>> 
> Yes.
> To be honest, to me, it is tricky too. 
> I found it is not right to return directly if the iommu_[un]map_page() 
> operation times out.
> 
>  """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
> Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"

What strange a question: Of course it does.

As you can infer form the reply I sent yesterday, you first need to
settle on an abstract model: What do failures mean? If, in the flush
case, a timeout is going to kill the domain anyway, then error
handling can be lighter weight than if you mean to try to keep the
domain running. Of course in this context you also should not forget
that iommu_map_page() could already return errors prior to your
changes (most notably -ENOMEM, but at least the AMD side also
produces others, with -EFAULT generally being accompanied by
domain_crash()). As mentioned elsewhere - it seems extremely
bogus that these errors weren't check for from the beginning.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  8:19       ` Jan Beulich
@ 2016-03-18  9:09         ` Xu, Quan
  2016-03-18  9:29           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-18  9:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong, DarioFaggioli,
	xen-devel, Nakajima, Jun, Andrew Cooper, Keir Fraser

On March 18, 2016 4:20pm, <JBeulich@suse.com> wrote:
> >>> On 18.03.16 at 08:54, <quan.xu@intel.com> wrote:
> > On March 17, 2016 8:30pm, <dunlapg@gmail.com> wrote:
> >> On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> >> > diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index
> >> > c997b53..526548e 100644
> >> > --- a/xen/arch/x86/mm.c
> >> > +++ b/xen/arch/x86/mm.c
> >> > @@ -2467,7 +2467,7 @@ static int __get_page_type(struct page_info
> >> > *page,
> >> unsigned long type,
> >> >                             int preemptible)  {
> >> >      unsigned long nx, x, y = page->u.inuse.type_info;
> >> > -    int rc = 0;
> >> > +    int rc = 0, ret = 0;
> >> >
> >> >      ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
> >> >
> >> > @@ -2578,11 +2578,11 @@ static int __get_page_type(struct page_info
> >> *page, unsigned long type,
> >> >          if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >> >          {
> >> >              if ( (x & PGT_type_mask) == PGT_writable_page )
> >> > -                iommu_unmap_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)));
> >> > +                ret = iommu_unmap_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)));
> >> >              else if ( type == PGT_writable_page )
> >> > -                iommu_map_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)),
> >> > -                               page_to_mfn(page),
> >> > -
> >> IOMMUF_readable|IOMMUF_writable);
> >> > +                ret = iommu_map_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)),
> >> > +                                     page_to_mfn(page),
> >> > +
> >> IOMMUF_readable|IOMMUF_writable);
> >> >          }
> >> >      }
> >> >
> >> > @@ -2599,6 +2599,9 @@ static int __get_page_type(struct page_info
> >> > *page,
> >> unsigned long type,
> >> >      if ( (x & PGT_partial) && !(nx & PGT_partial) )
> >> >          put_page(page);
> >> >
> >> > +    if ( !rc )
> >> > +        rc = ret;
> >> > +
> >>
> >> What's this about?  If the iommu_[un]map_page() operation times out,
> >> we still go through with calling alloc_page_type(); and if
> >> alloc_page_type() fails we return its failure value, but if it
> >> succeeds, we return the error from iommu_[un]map_page()?
> >>
> > Yes.
> > To be honest, to me, it is tricky too.
> > I found it is not right to return directly if the iommu_[un]map_page()
> > operation times out.
> >
> >  """if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )"""
> > Does IOMMU support pv domain? If not, we'd better remove the " if(...){...}"
> 
> What strange a question: Of course it does.
> 

Jan, thanks. 
To be honest, this stupid question was always in my mind.

> As you can infer form the reply I sent yesterday, you first need to settle on an
> abstract model: What do failures mean? If, in the flush case, a timeout is going
> to kill the domain anyway, then error handling can be lighter weight than if you
> mean to try to keep the domain running. Of course in this context you also
> should not forget that iommu_map_page() could already return errors prior to
> your changes (most notably -ENOMEM, but at least the AMD side also produces
> others, with -EFAULT generally being accompanied by domain_crash()). As
> mentioned elsewhere - it seems extremely bogus that these errors weren't
> check for from the beginning.
> 
Jan, I am not familiar/confident enough to this single point, could you tell me more how to fix it?
Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  9:09         ` Xu, Quan
@ 2016-03-18  9:29           ` Jan Beulich
  2016-03-18  9:38             ` Dario Faggioli
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-18  9:29 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, DarioFaggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 18.03.16 at 10:09, <quan.xu@intel.com> wrote:
> On March 18, 2016 4:20pm, <JBeulich@suse.com> wrote:
>> As you can infer form the reply I sent yesterday, you first need to settle on an
>> abstract model: What do failures mean? If, in the flush case, a timeout is going
>> to kill the domain anyway, then error handling can be lighter weight than if you
>> mean to try to keep the domain running. Of course in this context you also
>> should not forget that iommu_map_page() could already return errors prior to
>> your changes (most notably -ENOMEM, but at least the AMD side also produces
>> others, with -EFAULT generally being accompanied by domain_crash()). As
>> mentioned elsewhere - it seems extremely bogus that these errors weren't
>> check for from the beginning.
>> 
> Jan, I am not familiar/confident enough to this single point, could you tell 
> me more how to fix it?

Not sure what exactly you're asking for: As said, we first need to
settle on an abstract model. Do we want IOMMU mapping failures
to be fatal to the domain (perhaps with the exception of the
hardware one)? I think we do, and for the hardware domain we'd
do things on a best effort basis (always erring on the side of
unmapping). Which would probably mean crashing the domain
could be centralized in iommu_{,un}map_page(). How much roll
back would then still be needed in callers of these functions for
the hardware domain's sake would need to be seen.

So before you start coing, give others (namely but not limited to
VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
to voice differing opinions.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  9:29           ` Jan Beulich
@ 2016-03-18  9:38             ` Dario Faggioli
  2016-03-18  9:48               ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Dario Faggioli @ 2016-03-18  9:38 UTC (permalink / raw)
  To: Jan Beulich, Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, xen-devel,
	Jun Nakajima, Andrew Cooper, Keir Fraser


[-- Attachment #1.1: Type: text/plain, Size: 1423 bytes --]

On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> > 
> Not sure what exactly you're asking for: As said, we first need to
> settle on an abstract model. Do we want IOMMU mapping failures
> to be fatal to the domain (perhaps with the exception of the
> hardware one)? I think we do, and for the hardware domain we'd
> do things on a best effort basis (always erring on the side of
> unmapping). Which would probably mean crashing the domain
> could be centralized in iommu_{,un}map_page(). How much roll
> back would then still be needed in callers of these functions for
> the hardware domain's sake would need to be seen.
> 
> So before you start coing, give others (namely but not limited to
> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
> to voice differing opinions.
>
I'm nothing of the above but, FWIW, the behavior Jan described
(crashing the domain for all domains but the hardware domain) was
indeed the intended plan for this series, as far as I understood from
talking to people and looking at previous email conversations and
submissions.

And it looks to me like it is a sane plan.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  9:38             ` Dario Faggioli
@ 2016-03-18  9:48               ` Jan Beulich
  2016-03-21  6:18                 ` Tian, Kevin
  2016-03-24  9:02                 ` Xu, Quan
  0 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2016-03-18  9:48 UTC (permalink / raw)
  To: Dario Faggioli, Quan Xu
  Cc: KevinTian, Feng Wu, George Dunlap, Liu Jinsong, xen-devel,
	Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote:
> On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
>> > 
>> Not sure what exactly you're asking for: As said, we first need to
>> settle on an abstract model. Do we want IOMMU mapping failures
>> to be fatal to the domain (perhaps with the exception of the
>> hardware one)? I think we do, and for the hardware domain we'd
>> do things on a best effort basis (always erring on the side of
>> unmapping). Which would probably mean crashing the domain
>> could be centralized in iommu_{,un}map_page(). How much roll
>> back would then still be needed in callers of these functions for
>> the hardware domain's sake would need to be seen.
>> 
>> So before you start coing, give others (namely but not limited to
>> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
>> to voice differing opinions.
>>
> I'm nothing of the above but,

Don't you fall under "but not limited to"  ;-) ?

> FWIW, the behavior Jan described
> (crashing the domain for all domains but the hardware domain) was
> indeed the intended plan for this series, as far as I understood from
> talking to people and looking at previous email conversations and
> submissions.

That was taking only the flush timeout as an error source into account.
Now that we see that the lack of error handling pre-exists, we can't
just extend that intended model to also cover those other error
reasons without at least having given people a chance to object.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-17  6:54 ` [PATCH 2/2] IOMMU/MMU: Adjust low " Quan Xu
  2016-03-17  7:37   ` Tian, Kevin
  2016-03-17 15:31   ` George Dunlap
@ 2016-03-18 10:20   ` Jan Beulich
  2016-03-25  9:27     ` Xu, Quan
  2 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-18 10:20 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Stefano Stabellini,
	Suravee Suthikulpanit, Keir Fraser

>>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1339,12 +1339,14 @@ static void invalidate_all_devices(void)
>      iterate_ivrs_mappings(_invalidate_all_devices);
>  }
>  
> -void amd_iommu_suspend(void)
> +int amd_iommu_suspend(void)
>  {
>      struct amd_iommu *iommu;
>  
>      for_each_amd_iommu ( iommu )
>          disable_iommu(iommu);
> +
> +    return 0;
>  }
>  
>  void amd_iommu_resume(void)
> @@ -1368,3 +1370,11 @@ void amd_iommu_resume(void)
>          invalidate_all_domain_pages();
>      }
>  }
> +
> +void amd_iommu_crash_shutdown(void)
> +{
> +    struct amd_iommu *iommu;
> +
> +    for_each_amd_iommu ( iommu )
> +        disable_iommu(iommu);
> +}

One of the two should clearly call the other - no need to have the
same code twice.

> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -182,7 +182,11 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
>                   ((page->u.inuse.type_info & PGT_type_mask)
>                    == PGT_writable_page) )
>                  mapping |= IOMMUF_writable;
> -            hd->platform_ops->map_page(d, gfn, mfn, mapping);
> +            if ( hd->platform_ops->map_page(d, gfn, mfn, mapping) )
> +                printk(XENLOG_G_ERR
> +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx) failed.\n",
> +                       gfn, mfn);
> +

Printing one message here is certainly necessary, but what if the
failure repeats for very many pages? Also %#lx instead of 0x%lx
please, and a blank before the opening parenthesis.

> @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
>          iommu = drhd->iommu;
>          iommu_flush_context_global(iommu, 0);
>          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> +        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> +
> +        if ( rc > 0 )
> +        {
> +            iommu_flush_write_buffer(iommu);

Why is this needed all of the sudden? (Note that if you did a
more fine grained split, it might also be easier for you to note/
explain all the not directly related changes in the respective
commit messages. Unless of course they fix actual bugs, in
which case they should be split out anyway; such individual
fixes would also likely have a much faster route to commit,
relieving you earlier from the burden of at least some of the
changes you have to carry and re-base.)

> +            rc = 0;
> +        }
> +        else if ( rc < 0 )
> +        {
> +            printk(XENLOG_G_ERR "IOMMU: IOMMU flush all failed.\n");
> +            break;
> +        }

Is a log message really advisable here?

> -static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,
> +static int __intel_iommu_iotlb_flush(struct domain *d, unsigned long gfn,

While I'm not VT-d maintainer, I think changes like this would be a
good opportunity to also drop the stray double underscores: You
need to touch all callers anyway.

> @@ -584,37 +599,40 @@ static void __intel_iommu_iotlb_flush(struct domain *d, 
> unsigned long gfn,
>              continue;
>  
>          if ( page_count != 1 || gfn == INVALID_GFN )
> -        {
> -            if ( iommu_flush_iotlb_dsi(iommu, iommu_domid,
> -                        0, flush_dev_iotlb) )
> -                iommu_flush_write_buffer(iommu);
> -        }
> +            rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
> +                                       0, flush_dev_iotlb);
>          else
> +            rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
> +                                       (paddr_t)gfn << PAGE_SHIFT_4K, 0,
> +                                       !dma_old_pte_present,
> +                                       flush_dev_iotlb);
> +        if ( rc > 0 )
>          {
> -            if ( iommu_flush_iotlb_psi(iommu, iommu_domid,
> -                        (paddr_t)gfn << PAGE_SHIFT_4K, PAGE_ORDER_4K,

Note how this used PAGE_ORDER_4K so far?

> -                        !dma_old_pte_present, flush_dev_iotlb) )
> -                iommu_flush_write_buffer(iommu);
> +            iommu_flush_write_buffer(iommu);

Same question again: Why is this all of the sudden needed on
both paths?

> @@ -622,7 +640,7 @@ static void dma_pte_clear_one(struct domain *domain, u64 addr)
>      if ( pg_maddr == 0 )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
> -        return;
> +        return -ENOMEM;
>      }

addr_to_dma_page_maddr() gets called with "alloc" being false, so
there can't be any memory allocation failure here. There simply is
nothing to do in this case.

> -void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
> +int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
>  {
>      u32 id;
> +    int rc = 0;
>  
>      id = pci_conf_read32(0, 0, 0, 0, 0);
>      if ( IS_CTG(id) )
>      {
>          /* quit if ME does not exist */
>          if ( pci_conf_read32(0, 0, 3, 0, 0) == 0xffffffff )
> -            return;
> +            return -ENOENT;

Is this really an error? IOW, do all systems which satisfy IS_CTG()
have such a device?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-18  8:06       ` Jan Beulich
@ 2016-03-21  5:01         ` Tian, Kevin
  0 siblings, 0 replies; 42+ messages in thread
From: Tian, Kevin @ 2016-03-21  5:01 UTC (permalink / raw)
  To: Jan Beulich, Xu, Quan
  Cc: Wu, Feng, Nakajima, Jun, George Dunlap, Andrew Cooper,
	Dario Faggioli, xen-devel, Julien Grall, Stefano Stabellini,
	SuraveeSuthikulpanit, Keir Fraser

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, March 18, 2016 4:06 PM
> 
> >>> On 18.03.16 at 03:30, <quan.xu@intel.com> wrote:
> > Any good idea? To be honest, I am very tired to at splitting things like
> > this :).
> 
> I understand that this is a tedious task; the code should have been
> propagating errors from the beginning.
> 
> The most natural way of splitting things would be to go function by
> function, top level ones first, and leaf ones last, one function per
> patch (maybe pairs of functions, as in the map/unmap case). Such
> model would be problematic (almost) only when there's recursion at
> some point, which I don't think would be the case anywhere here.
> As mentioned before, the __must_check annotation is a great help
> to not miss any callers - both to you as you put things together and
> to the reviewers to be ascertained that nothing was missed.
> 

Agree with this suggestion. It also allows different maintainers
to focus on changes they really need to care about.

Thanks
Kevin


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  9:48               ` Jan Beulich
@ 2016-03-21  6:18                 ` Tian, Kevin
  2016-03-21 12:22                   ` Jan Beulich
  2016-03-24  9:02                 ` Xu, Quan
  1 sibling, 1 reply; 42+ messages in thread
From: Tian, Kevin @ 2016-03-21  6:18 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli, Xu, Quan
  Cc: Wu, Feng, George Dunlap, Liu Jinsong, xen-devel, Nakajima, Jun,
	Andrew Cooper, Keir Fraser

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, March 18, 2016 5:49 PM
> 
> >>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote:
> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> >> >
> >> Not sure what exactly you're asking for: As said, we first need to
> >> settle on an abstract model. Do we want IOMMU mapping failures
> >> to be fatal to the domain (perhaps with the exception of the
> >> hardware one)? I think we do, and for the hardware domain we'd
> >> do things on a best effort basis (always erring on the side of
> >> unmapping). Which would probably mean crashing the domain
> >> could be centralized in iommu_{,un}map_page(). How much roll
> >> back would then still be needed in callers of these functions for
> >> the hardware domain's sake would need to be seen.
> >>
> >> So before you start coing, give others (namely but not limited to
> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
> >> to voice differing opinions.
> >>
> > I'm nothing of the above but,
> 
> Don't you fall under "but not limited to"  ;-) ?
> 
> > FWIW, the behavior Jan described
> > (crashing the domain for all domains but the hardware domain) was
> > indeed the intended plan for this series, as far as I understood from
> > talking to people and looking at previous email conversations and
> > submissions.
> 
> That was taking only the flush timeout as an error source into account.
> Now that we see that the lack of error handling pre-exists, we can't
> just extend that intended model to also cover those other error
> reasons without at least having given people a chance to object.
> 

It makes sense so I'm OK with this behavior change. 

Then regarding to Quan's next version (if nobody disagrees), is it better
to first describe related callers and then reach agreement which caller
still needs error handling for hardware domain, before Quan starts to
change actual code (at least based on current discussion Quan didn't
have thorough understanding of such necessity yet)?

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-21  6:18                 ` Tian, Kevin
@ 2016-03-21 12:22                   ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2016-03-21 12:22 UTC (permalink / raw)
  To: Dario Faggioli, Kevin Tian, Quan Xu
  Cc: Feng Wu, George Dunlap, Liu Jinsong, xen-devel, Jun Nakajima,
	Andrew Cooper, Keir Fraser

>>> On 21.03.16 at 07:18, <kevin.tian@intel.com> wrote:
>>  From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Friday, March 18, 2016 5:49 PM
>> 
>> >>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote:
>> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
>> >> >
>> >> Not sure what exactly you're asking for: As said, we first need to
>> >> settle on an abstract model. Do we want IOMMU mapping failures
>> >> to be fatal to the domain (perhaps with the exception of the
>> >> hardware one)? I think we do, and for the hardware domain we'd
>> >> do things on a best effort basis (always erring on the side of
>> >> unmapping). Which would probably mean crashing the domain
>> >> could be centralized in iommu_{,un}map_page(). How much roll
>> >> back would then still be needed in callers of these functions for
>> >> the hardware domain's sake would need to be seen.
>> >>
>> >> So before you start coing, give others (namely but not limited to
>> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance
>> >> to voice differing opinions.
>> >>
>> > I'm nothing of the above but,
>> 
>> Don't you fall under "but not limited to"  ;-) ?
>> 
>> > FWIW, the behavior Jan described
>> > (crashing the domain for all domains but the hardware domain) was
>> > indeed the intended plan for this series, as far as I understood from
>> > talking to people and looking at previous email conversations and
>> > submissions.
>> 
>> That was taking only the flush timeout as an error source into account.
>> Now that we see that the lack of error handling pre-exists, we can't
>> just extend that intended model to also cover those other error
>> reasons without at least having given people a chance to object.
>> 
> 
> It makes sense so I'm OK with this behavior change. 
> 
> Then regarding to Quan's next version (if nobody disagrees), is it better
> to first describe related callers and then reach agreement which caller
> still needs error handling for hardware domain, before Quan starts to
> change actual code (at least based on current discussion Quan didn't
> have thorough understanding of such necessity yet)?

Well, ideally we'd indeed reach agreement before starting to write
or change code. However, in a case like this where error handling
was just ignored in pre-existing code, I think the easiest way to get
a complete picture is to see what places / paths need changing. I.e.
here it may indeed be better to start from the patches. Whether that
means posting the patches in patch form, or - prior to posting them -
extracting what was learned into a textual description is a different
aspect (and I'd leave it to Quan to pick the route more suitable to him).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  8:09         ` Jan Beulich
@ 2016-03-24  6:45           ` Xu, Quan
  0 siblings, 0 replies; 42+ messages in thread
From: Xu, Quan @ 2016-03-24  6:45 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Tian, Kevin, Wu, Feng, Liu Jinsong, DarioFaggioli, xen-devel,
	Nakajima, Jun, Andrew Cooper, Keir Fraser

On March 18, 2016 4:10pm, <JBeulich@suse.com> wrote:
> >>> On 18.03.16 at 04:19, <quan.xu@intel.com> wrote:
> > On March 17, 2016 8:34pm, George Dunlap <George.Dunlap@eu.citrix.com>
> wrote:
> >> On Thu, Mar 17, 2016 at 12:30 PM, George Dunlap
> >> <George.Dunlap@eu.citrix.com> wrote:
> >> > On Thu, Mar 17, 2016 at 6:54 AM, Quan Xu <quan.xu@intel.com> wrote:
> >> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> >> @@ -830,7 +830,15 @@ out:
> >> >>          {
> >> >>              if ( iommu_flags )
> >> >>                  for ( i = 0; i < (1 << order); i++ )
> >> >> -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> >> iommu_flags);
> >> >> +                {
> >> >> +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) +
> >> >> + i,
> >> iommu_flags);
> >> >> +                    if ( rc )
> >> >> +                    {
> >> >> +                        while ( i-- > 0 )
> >> >> +                            iommu_unmap_page(d, gfn + i);
> >> >
> >> > This won't unmap gfn+0 (since it will break out when i == 0 without
> >> > calling unmap).
> >>
> >> Oh, no it won't, because the decrement is postfix.
> >>
> >> For us mere mortals, I'd appreciate a comment here like this:
> >>
> >> /* Postfix operator means we will call unmap with i == 0 */
> >>
> > Agreed.
> > For these 2 points, to summarize:
> >    - adding "unlikely()" to the if() condition, e.g. if ( unlikely(rc) )
> >    - adding a comment:
> >         /* Postfix operator means we will call unmap with i == 0 */
> 
> To be honest, I'm opposed to the addition of such comments.
> See also the parallel discussion rooted at
> http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg01779.html
> 

Reading the Follow-Ups email, it looks a pretty common cleanup pattern.
now I don't fully get this point, but I would try to follow this pattern.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-18  9:48               ` Jan Beulich
  2016-03-21  6:18                 ` Tian, Kevin
@ 2016-03-24  9:02                 ` Xu, Quan
  2016-03-24  9:58                   ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-24  9:02 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong, xen-devel,
	Nakajima, Jun, Andrew Cooper, Keir Fraser

On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
> >>> On 18.03.16 at 10:38, <dario.faggioli@citrix.com> wrote:
> > On Fri, 2016-03-18 at 03:29 -0600, Jan Beulich wrote:
> >> >
> >> Not sure what exactly you're asking for: As said, we first need to
> >> settle on an abstract model. Do we want IOMMU mapping failures to be
> >> fatal to the domain (perhaps with the exception of the hardware one)?
> >> I think we do, and for the hardware domain we'd do things on a best
> >> effort basis (always erring on the side of unmapping). Which would
> >> probably mean crashing the domain could be centralized in
> >> iommu_{,un}map_page(). How much roll back would then still be needed
> >> in callers of these functions for the hardware domain's sake would
> >> need to be seen.
> >>
> >> So before you start coing, give others (namely but not limited to
> >> VT-d, AMD IOMMU, other x86, and x86/mm maintainers) a chance to voice
> >> differing opinions.
> >>
> > FWIW, the behavior Jan described
> > (crashing the domain for all domains but the hardware domain) was
> > indeed the intended plan for this series, as far as I understood from
> > talking to people and looking at previous email conversations and
> > submissions.
> 
> That was taking only the flush timeout as an error source into account.
> Now that we see that the lack of error handling pre-exists, we can't just extend
> that intended model to also cover those other error reasons without at least
> having given people a chance to object.
> 

For this abstract model, 
I assume we are on the same page for the precondition:
If Device-TLB flush timed out, we would hide the target ATS device and crash the domain owning this ATS device. 
If impacted domain is hardware domain, just throw out a warning.

Then IMO,
   1. Try the best to return error code.
   2. Log error and don't return error value for hardware_domain init or crashed system shutdown.
   3. For iommu_{,un}map_page(), we'd better fix it as a normal error, as the error is not only from iommu flush, .e.g, '-ENOMEM'.
     So, we need to {,un}map from the IOMMU, return an error, and roll back the failed operation( .e.g, unmap EPT).
   4. for the rest, we may return an error, but don't roll back the failed operation, and we need to analysis the different condition.

Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-24  9:02                 ` Xu, Quan
@ 2016-03-24  9:58                   ` Jan Beulich
  2016-03-24 14:12                     ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-24  9:58 UTC (permalink / raw)
  To: Dario Faggioli, Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, xen-devel,
	Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 24.03.16 at 10:02, <quan.xu@intel.com> wrote:
> On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
>> That was taking only the flush timeout as an error source into account.
>> Now that we see that the lack of error handling pre-exists, we can't just extend
>> that intended model to also cover those other error reasons without at least
>> having given people a chance to object.
>> 
> 
> For this abstract model, 
> I assume we are on the same page for the precondition:
> If Device-TLB flush timed out, we would hide the target ATS device and crash 
> the domain owning this ATS device. 
> If impacted domain is hardware domain, just throw out a warning.
> 
> Then IMO,
>    1. Try the best to return error code.
>    2. Log error and don't return error value for hardware_domain init or 
> crashed system shutdown.
>    3. For iommu_{,un}map_page(), we'd better fix it as a normal error, as 
> the error is not only from iommu flush, .e.g, '-ENOMEM'.
>      So, we need to {,un}map from the IOMMU, return an error, and roll back 
> the failed operation( .e.g, unmap EPT).

Well, if that possible in a provably correct way, then sure. But be
clear - when the failure occurs while unmapping, unmapping the
EPT entry obviously can't be the solution, you'd need a true
roll back. And of course you should keep in mind what happens to
the guest if such an operation fails: If you can be certain it'll crash
because of this later on anyway, you're likely better off crashing
it right away (such that the reason for the crash is at least obvious).

Jan

>    4. for the rest, we may return an error, but don't roll back the failed 
> operation, and we need to analysis the different condition.
> 
> Quan




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-24  9:58                   ` Jan Beulich
@ 2016-03-24 14:12                     ` Xu, Quan
  2016-03-24 14:37                       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-24 14:12 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong, xen-devel,
	Nakajima, Jun, Andrew Cooper, Keir Fraser

On March 24, 2016 5:59pm, <JBeulich@suse.com> wrote:
> >>> On 24.03.16 at 10:02, <quan.xu@intel.com> wrote:
> > On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:


> >    3. For iommu_{,un}map_page(), we'd better fix it as a normal error,
> > as the error is not only from iommu flush, .e.g, '-ENOMEM'.
> >      So, we need to {,un}map from the IOMMU, return an error, and roll
> > back the failed operation( .e.g, unmap EPT).
> 
> Well, if that possible in a provably correct way, then sure. But be clear - when
> the failure occurs while unmapping, unmapping the EPT entry obviously can't be
> the solution, 

I hope we discuss about the same point as bellow?:
   ept_set_entry()
   {
    ....
          if ( iommu_flags )
                for ( i = 0; i < (1 << order); i++ )
                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
            else
                for ( i = 0; i < (1 << order); i++ )
                    iommu_unmap_page(d, gfn + i);
    ....
   }


> you'd need a true roll back.

Does it refer to as:
If the old entry is present, we need to write back the old entry? 

> And of course you should keep in mind
> what happens to the guest if such an operation fails: If you can be certain it'll
> crash because of this later on anyway, you're likely better off crashing it right
> away (such that the reason for the crash is at least obvious).
> 

I think, for iommu_{,un}map_page(), it would be not aware that the domain is going to crash.
As mentioned,  the error is not only from iommu flush.
We need to fix it one by one. fortunately, it is limited.




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-24 14:12                     ` Xu, Quan
@ 2016-03-24 14:37                       ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2016-03-24 14:37 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 24.03.16 at 15:12, <quan.xu@intel.com> wrote:
> On March 24, 2016 5:59pm, <JBeulich@suse.com> wrote:
>> >>> On 24.03.16 at 10:02, <quan.xu@intel.com> wrote:
>> > On March 18, 2016 5:49pm, <JBeulich@suse.com> wrote:
> 
> 
>> >    3. For iommu_{,un}map_page(), we'd better fix it as a normal error,
>> > as the error is not only from iommu flush, .e.g, '-ENOMEM'.
>> >      So, we need to {,un}map from the IOMMU, return an error, and roll
>> > back the failed operation( .e.g, unmap EPT).
>> 
>> Well, if that possible in a provably correct way, then sure. But be clear - when
>> the failure occurs while unmapping, unmapping the EPT entry obviously can't be
>> the solution, 
> 
> I hope we discuss about the same point as bellow?:
>    ept_set_entry()
>    {
>     ....
>           if ( iommu_flags )
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>             else
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_unmap_page(d, gfn + i);
>     ....
>    }
> 
> 
>> you'd need a true roll back.
> 
> Does it refer to as:
> If the old entry is present, we need to write back the old entry? 

Yes, but that's not as simple as it sounds: You also need to take
care of the splitting of super pages which may have occurred.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-18 10:20   ` Jan Beulich
@ 2016-03-25  9:27     ` Xu, Quan
  2016-03-29  7:36       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-25  9:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Nakajima, Jun, George Dunlap,
	Andrew Cooper, Dario Faggioli, xen-devel, Julien Grall,
	Stefano Stabellini, Suravee Suthikulpanit, Keir Fraser

On March 18, 2016 6:20pm, <JBeulich@suse.com> wrote:
> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> > --- a/xen/drivers/passthrough/amd/iommu_init.c
> > +++ b/xen/drivers/passthrough/amd/iommu_init.c
> > @@ -1339,12 +1339,14 @@ static void invalidate_all_devices(void)
> >      iterate_ivrs_mappings(_invalidate_all_devices);
> >  }
> >
> > -void amd_iommu_suspend(void)
> > +int amd_iommu_suspend(void)
> >  {
> >      struct amd_iommu *iommu;
> >
> >      for_each_amd_iommu ( iommu )
> >          disable_iommu(iommu);
> > +
> > +    return 0;
> >  }
> >
> >  void amd_iommu_resume(void)
> > @@ -1368,3 +1370,11 @@ void amd_iommu_resume(void)
> >          invalidate_all_domain_pages();
> >      }
> >  }
> > +
> > +void amd_iommu_crash_shutdown(void)
> > +{
> > +    struct amd_iommu *iommu;
> > +
> > +    for_each_amd_iommu ( iommu )
> > +        disable_iommu(iommu);
> > +}
> 
> One of the two should clearly call the other - no need to have the same code
> twice.
> 

Good idea.

> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -182,7 +182,11 @@ void __hwdom_init iommu_hwdom_init(struct
> domain *d)
> >                   ((page->u.inuse.type_info & PGT_type_mask)
> >                    == PGT_writable_page) )
> >                  mapping |= IOMMUF_writable;
> > -            hd->platform_ops->map_page(d, gfn, mfn, mapping);
> > +            if ( hd->platform_ops->map_page(d, gfn, mfn, mapping) )
> > +                printk(XENLOG_G_ERR
> > +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx)
> failed.\n",
> > +                       gfn, mfn);
> > +
> 
> Printing one message here is certainly necessary, but what if the failure repeats
> for very many pages? 

Yes, to me, it is ok, but I am open to your suggestion.

> Also %#lx instead of 0x%lx please, and a blank before the
> opening parenthesis.
> 
OK, just check it:

..
"IOMMU: Map page gfn: %#lx (mfn: %#lx) failed.\n"
..

Right?


> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
> >          iommu = drhd->iommu;
> >          iommu_flush_context_global(iommu, 0);
> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> > +        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> > +
> > +        if ( rc > 0 )
> > +        {
> > +            iommu_flush_write_buffer(iommu);
> 
> Why is this needed all of the sudden?

As there may be multiple IOMMUs. .e.g, there are 2 IOMMUs in my machine, and I can find the following log message:
"""
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
"""
__iiuc__, iommu_flush_write_buffer() is per IOMMU, so It should be called to flush every IOMMU.



> (Note that if you did a more fine grained
> split, it might also be easier for you to note/ explain all the not directly related
> changes in the respective commit messages. Unless of course they fix actual
> bugs, in which case they should be split out anyway; such individual fixes would
> also likely have a much faster route to commit, relieving you earlier from the
> burden of at least some of the changes you have to carry and re-base.)
> 
> > +            rc = 0;
> > +        }
> > +        else if ( rc < 0 )
> > +        {
> > +            printk(XENLOG_G_ERR "IOMMU: IOMMU flush all failed.\n");
> > +            break;
> > +        }
> 
> Is a log message really advisable here?
> 

To me, It looks tricky too. I was struggling to make decision. For scheme B, I would try to do as below:

if ( iommu_flush_all() )
    printk("... nnn ...");

but there are 4 function calls, if so, to me, it looks redundant.

Or, could I ignore the print out for iommu_flush_all() failed?



> > -static void __intel_iommu_iotlb_flush(struct domain *d, unsigned long
> > gfn,
> > +static int __intel_iommu_iotlb_flush(struct domain *d, unsigned long
> > +gfn,
> 
> While I'm not VT-d maintainer, I think changes like this would be a good
> opportunity to also drop the stray double underscores: You need to touch all
> callers anyway.
> 

I think this is optional.


> > @@ -584,37 +599,40 @@ static void __intel_iommu_iotlb_flush(struct
> > domain *d, unsigned long gfn,
> >              continue;
> >
> >          if ( page_count != 1 || gfn == INVALID_GFN )
> > -        {
> > -            if ( iommu_flush_iotlb_dsi(iommu, iommu_domid,
> > -                        0, flush_dev_iotlb) )
> > -                iommu_flush_write_buffer(iommu);
> > -        }
> > +            rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
> > +                                       0, flush_dev_iotlb);
> >          else
> > +            rc = iommu_flush_iotlb_psi(iommu, iommu_domid,
> > +                                       (paddr_t)gfn <<
> PAGE_SHIFT_4K, 0,
> > +                                       !dma_old_pte_present,
> > +                                       flush_dev_iotlb);
> > +        if ( rc > 0 )
> >          {
> > -            if ( iommu_flush_iotlb_psi(iommu, iommu_domid,
> > -                        (paddr_t)gfn << PAGE_SHIFT_4K,
> PAGE_ORDER_4K,
> 
> Note how this used PAGE_ORDER_4K so far?

Sorry, this is a rebasing mistake.

> 
> > -                        !dma_old_pte_present, flush_dev_iotlb) )
> > -                iommu_flush_write_buffer(iommu);
> > +            iommu_flush_write_buffer(iommu);
> 
> Same question again: Why is this all of the sudden needed on both paths?
> 

The same as above question. Hold on first.


> > @@ -622,7 +640,7 @@ static void dma_pte_clear_one(struct domain
> *domain, u64 addr)
> >      if ( pg_maddr == 0 )
> >      {
> >          spin_unlock(&hd->arch.mapping_lock);
> > -        return;
> > +        return -ENOMEM;
> >      }
> 
> addr_to_dma_page_maddr() gets called with "alloc" being false, so there can't
> be any memory allocation failure here. There simply is nothing to do in this
> case.
> 

I copy it from iommu_map_page().

Good, then the error of iommu_unmap_page() looks only from flush (the crash is at least obvious), then error handling can be lighter weight--
We may return an error, but don't roll back the failed operation.
Right?

> > -void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
> > +int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
> >  {
> >      u32 id;
> > +    int rc = 0;
> >
> >      id = pci_conf_read32(0, 0, 0, 0, 0);
> >      if ( IS_CTG(id) )
> >      {
> >          /* quit if ME does not exist */
> >          if ( pci_conf_read32(0, 0, 3, 0, 0) == 0xffffffff )
> > -            return;
> > +            return -ENOENT;
> 
> Is this really an error? IOW, do all systems which satisfy IS_CTG() have such a
> device?
> 
To be honest, I didn't know much about me_wifi_quirk.
Now, IMO I don't need to deal with me_wifi_quirk().

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-17 17:14   ` Jan Beulich
@ 2016-03-28  3:33     ` Xu, Quan
  2016-03-29  7:20       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-03-28  3:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, xen-devel, Nakajima, Jun, Andrew Cooper,
	Keir Fraser

On March 18, 2016 1:15am, <JBeulich@suse.com> wrote:
> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> > @@ -53,11 +55,21 @@ static int device_power_down(void)
> >
> >      ioapic_suspend();
> >
> > -    iommu_suspend();
> > +    err = iommu_suspend();
> > +    if ( err )
> > +        goto iommu_suspend_error;
> >
> >      lapic_suspend();
> >
> >      return 0;
> > +
> > +iommu_suspend_error:
> 
> Labels indented by at least one space please.
> 

Good, I wasn't aware of it.


> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -830,7 +830,15 @@ out:
> >          {
> >              if ( iommu_flags )
> >                  for ( i = 0; i < (1 << order); i++ )
> > -                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> > +                {
> > +                    rc = iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
> > +                    if ( rc )
> > +                    {
> > +                        while ( i-- > 0 )
> > +                            iommu_unmap_page(d, gfn + i);
> > +                        break;
> > +                    }
> > +                }
> >              else
> >                  for ( i = 0; i < (1 << order); i++ )
> >                      iommu_unmap_page(d, gfn + i);
> 
> Earlier on in the PV mm code you also checked iommu_unmap_page()'s return
> code - why not here (and also in p2m-pt.c)?
> 
> Also I'm quite unhappy about the inconsistent state you leave things
> in: You unmap from the IOMMU, return an error, but leave the EPT entry in
> place.
> 

As I mentioned for the abstract model,
     For iommu_{,un}map_page(), we'd better fix it as a normal error, as the error is not only from iommu flush, .e.g, '-ENOMEM'.
     So, we need to {,un}map from the IOMMU, return an error, and roll back the failed operation. 

For iommu_unmap_page, it is still under discussion in another thread. I think we could hold it on, waiting for another discussion.


> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
> >              {
> >                  nr_gets++;
> >                  (void)get_page(pg, rd);
> > -                if ( !(op->flags & GNTMAP_readonly) )
> > -                    get_page_type(pg, PGT_writable_page);
> > +                if ( !(op->flags & GNTMAP_readonly) &&
> > +                     !get_page_type(pg, PGT_writable_page) )
> > +                        goto could_not_pin;
> 
> This needs explanation, as it doesn't look related to what your actual goal is: If
> an error was possible here, I think this would be a security issue. However, as
> also kind of documented by the explicitly ignored return value from get_page(),
> it is my understanding there here we only obtain an _extra_ reference.
> 

For this point, I inferred from:
map_vcpu_info()
{
...
    if ( !get_page_type(page, PGT_writable_page) )
    {
        put_page(page);
        return -EINVAL;
    }
...
}
, then for get_page_type(), I think the return value:
     0 -- error, 
     1-- right.

So if get_page_type() is failed, we should goto could_not_pin.

btw, there is another issue in the call path:
    iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---


I tried to return iommu_{,un}map_page() error code in __get_page_type(), is it right?

> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -678,8 +678,9 @@ static int xenmem_add_to_physmap(struct domain
> *d,
> >      if ( need_iommu(d) )
> >      {
> >          this_cpu(iommu_dont_flush_iotlb) = 0;
> > -        iommu_iotlb_flush(d, xatp->idx - done, done);
> > -        iommu_iotlb_flush(d, xatp->gpfn - done, done);
> > +        rc = iommu_iotlb_flush(d, xatp->idx - done, done);
> > +        if ( !rc )
> > +            rc = iommu_iotlb_flush(d, xatp->gpfn - done, done);
> >      }
> 
> And the pattern repeats - you now return an error, but you don't roll back the
> now failed operation. But wait - maybe that intended:
> Are you meaning to crash the guest in such cases (somewhere deep in the flush
> code)? If so, I think that's fine, but you absolutely would need to say so in the
> commit message.
> 

Yes, I should enhance the commit message.

> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
> domain *d)
> >      this_cpu(iommu_dont_flush_iotlb) = 0;
> >
> >      if ( !rc )
> > -        iommu_iotlb_flush_all(d);
> > +    {
> > +        rc = iommu_iotlb_flush_all(d);
> > +        if ( rc )
> > +            iommu_teardown(d);
> > +    }
> >      else if ( rc != -ERESTART )
> >          iommu_teardown(d);
> 
> Why can't you just use the existing call to iommu_teardown(), by simply deleting
> the "else"?
> 

Just check it, could I modify it as below:
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain *d)

     if ( !rc )
         iommu_iotlb_flush_all(d);
-    else if ( rc != -ERESTART )
+
+    if ( rc != -ERESTART )
         iommu_teardown(d);


Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-28  3:33     ` Xu, Quan
@ 2016-03-29  7:20       ` Jan Beulich
  2016-03-30  2:28         ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-29  7:20 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 28.03.16 at 05:33, <quan.xu@intel.com> wrote:
> On March 18, 2016 1:15am, <JBeulich@suse.com> wrote:
>> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
>> > --- a/xen/common/grant_table.c
>> > +++ b/xen/common/grant_table.c
>> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>> >              {
>> >                  nr_gets++;
>> >                  (void)get_page(pg, rd);
>> > -                if ( !(op->flags & GNTMAP_readonly) )
>> > -                    get_page_type(pg, PGT_writable_page);
>> > +                if ( !(op->flags & GNTMAP_readonly) &&
>> > +                     !get_page_type(pg, PGT_writable_page) )
>> > +                        goto could_not_pin;
>> 
>> This needs explanation, as it doesn't look related to what your actual goal is: If
>> an error was possible here, I think this would be a security issue. However, as
>> also kind of documented by the explicitly ignored return value from get_page(),
>> it is my understanding there here we only obtain an _extra_ reference.
>> 
> 
> For this point, I inferred from:
> map_vcpu_info()
> {
> ...
>     if ( !get_page_type(page, PGT_writable_page) )
>     {
>         put_page(page);
>         return -EINVAL;
>     }
> ...
> }
> , then for get_page_type(), I think the return value:
>      0 -- error, 
>      1-- right.
> 
> So if get_page_type() is failed, we should goto could_not_pin.

Did you read my reply at all? The explanation I'm expecting here is
why error checking is all of the sudden needed _at all_.

> btw, there is another issue in the call path:
>     iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> 
> 
> I tried to return iommu_{,un}map_page() error code in __get_page_type(), is 
> it right?

If the operation got fully rolled back - yes. Whether fully rolling back
is feasible there though is - see the respective discussion - an open
question.

>> > --- a/xen/drivers/passthrough/x86/iommu.c
>> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
>> domain *d)
>> >      this_cpu(iommu_dont_flush_iotlb) = 0;
>> >
>> >      if ( !rc )
>> > -        iommu_iotlb_flush_all(d);
>> > +    {
>> > +        rc = iommu_iotlb_flush_all(d);
>> > +        if ( rc )
>> > +            iommu_teardown(d);
>> > +    }
>> >      else if ( rc != -ERESTART )
>> >          iommu_teardown(d);
>> 
>> Why can't you just use the existing call to iommu_teardown(), by simply 
> deleting
>> the "else"?
>> 
> 
> Just check it, could I modify it as below:
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain *d)
> 
>      if ( !rc )
>          iommu_iotlb_flush_all(d);
> -    else if ( rc != -ERESTART )
> +
> +    if ( rc != -ERESTART )
>          iommu_teardown(d);

Clearly not - not only are you losing the return value of
iommu_iotlb_flush_all() now, you would then also call
iommu_teardown() in the "success" case. My comment was
related to code structure, yet you seem to have taken it
literally.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-25  9:27     ` Xu, Quan
@ 2016-03-29  7:36       ` Jan Beulich
  2016-04-11  3:09         ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-03-29  7:36 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	DarioFaggioli, xen-devel, Julien Grall, Stefano Stabellini,
	Suravee Suthikulpanit, Keir Fraser

>>> On 25.03.16 at 10:27, <quan.xu@intel.com> wrote:
> On March 18, 2016 6:20pm, <JBeulich@suse.com> wrote:
>> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
>> > --- a/xen/drivers/passthrough/iommu.c
>> > +++ b/xen/drivers/passthrough/iommu.c
>> > @@ -182,7 +182,11 @@ void __hwdom_init iommu_hwdom_init(struct
>> domain *d)
>> >                   ((page->u.inuse.type_info & PGT_type_mask)
>> >                    == PGT_writable_page) )
>> >                  mapping |= IOMMUF_writable;
>> > -            hd->platform_ops->map_page(d, gfn, mfn, mapping);
>> > +            if ( hd->platform_ops->map_page(d, gfn, mfn, mapping) )
>> > +                printk(XENLOG_G_ERR
>> > +                       "IOMMU: Map page gfn: 0x%lx(mfn: 0x%lx)
>> failed.\n",
>> > +                       gfn, mfn);
>> > +
>> 
>> Printing one message here is certainly necessary, but what if the failure 
> repeats
>> for very many pages? 
> 
> Yes, to me, it is ok, but I am open to your suggestion.
> 
>> Also %#lx instead of 0x%lx please, and a blank before the
>> opening parenthesis.
>> 
> OK, just check it:
> 
> ..
> "IOMMU: Map page gfn: %#lx (mfn: %#lx) failed.\n"
> ..
> 
> Right?

Almost: Generally no full stop in log messages please. And I think
the word "page" is redundant here. As rule of thumb: Log
messages should give as much as possible useful information (which
includes the requirement for distinct ones to be distinguishable in
resulting logs) with as little as possible verbosity.

>> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
>> >          iommu = drhd->iommu;
>> >          iommu_flush_context_global(iommu, 0);
>> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
>> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
>> > +        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
>> > +
>> > +        if ( rc > 0 )
>> > +        {
>> > +            iommu_flush_write_buffer(iommu);
>> 
>> Why is this needed all of the sudden?
> 
> As there may be multiple IOMMUs. .e.g, there are 2 IOMMUs in my machine, and 
> I can find the following log message:
> """
> (XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
> (XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
> """
> __iiuc__, iommu_flush_write_buffer() is per IOMMU, so It should be called to 
> flush every IOMMU.

For one what you say suggests that right now this is being done
for some (one?) IOMMU(s), which I don't see being the case. And
then what you say _still_ doesn't say _why_ this is now needed all
of the sudden. If, in the course of doing your re-work here, you
find pre-existing issues with the code, please split the necessary
fixes out of your re-work and submit them separately with proper
explanations in their commit messages.

>> > +            rc = 0;
>> > +        }
>> > +        else if ( rc < 0 )
>> > +        {
>> > +            printk(XENLOG_G_ERR "IOMMU: IOMMU flush all failed.\n");
>> > +            break;
>> > +        }
>> 
>> Is a log message really advisable here?
>> 
> 
> To me, It looks tricky too. I was struggling to make decision. For scheme B, 
> I would try to do as below:
> 
> if ( iommu_flush_all() )
>     printk("... nnn ...");
> 
> but there are 4 function calls, if so, to me, it looks redundant.
> 
> Or, could I ignore the print out for iommu_flush_all() failed?

Directing the question back is not helpful: You should know better
than me why you had added a log message there in the first place.
And it is this "why" which is to judge about whether it should stay
there, move elsewhere, or get dropped altogether.

>> > @@ -622,7 +640,7 @@ static void dma_pte_clear_one(struct domain
>> *domain, u64 addr)
>> >      if ( pg_maddr == 0 )
>> >      {
>> >          spin_unlock(&hd->arch.mapping_lock);
>> > -        return;
>> > +        return -ENOMEM;
>> >      }
>> 
>> addr_to_dma_page_maddr() gets called with "alloc" being false, so there can't
>> be any memory allocation failure here. There simply is nothing to do in this
>> case.
>> 
> 
> I copy it from iommu_map_page().
> 
> Good, then the error of iommu_unmap_page() looks only from flush (the crash 
> is at least obvious), then error handling can be lighter weight--
> We may return an error, but don't roll back the failed operation.
> Right?

I don't think so, and I can only re-iterate: There can't be any error
here, so there's no error code to forward up the call tree. IOW the
"pg_maddr == 0" case simply means "nothing to do" here.

>> > -void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
>> > +int me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map)
>> >  {
>> >      u32 id;
>> > +    int rc = 0;
>> >
>> >      id = pci_conf_read32(0, 0, 0, 0, 0);
>> >      if ( IS_CTG(id) )
>> >      {
>> >          /* quit if ME does not exist */
>> >          if ( pci_conf_read32(0, 0, 3, 0, 0) == 0xffffffff )
>> > -            return;
>> > +            return -ENOENT;
>> 
>> Is this really an error? IOW, do all systems which satisfy IS_CTG() have such a
>> device?
>> 
> To be honest, I didn't know much about me_wifi_quirk.

In such a case - how about asking the maintainers, who are your
colleagues? And that of course after having looked at the history
in an attempt to gain some understanding.

> Now, IMO I don't need to deal with me_wifi_quirk().

Well, you clearly can't ignore it.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-29  7:20       ` Jan Beulich
@ 2016-03-30  2:28         ` Xu, Quan
  2016-03-30  2:35           ` Xu, Quan
  2016-03-30  8:05           ` Jan Beulich
  0 siblings, 2 replies; 42+ messages in thread
From: Xu, Quan @ 2016-03-30  2:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, xen-devel, Nakajima, Jun, Andrew Cooper,
	Keir Fraser

On March 29, 2016 3:21pm, <JBeulich@suse.com> wrote:
> >>> On 28.03.16 at 05:33, <quan.xu@intel.com> wrote:
> > On March 18, 2016 1:15am, <JBeulich@suse.com> wrote:
> >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> >> > --- a/xen/common/grant_table.c
> >> > +++ b/xen/common/grant_table.c
> >> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
> >> >              {
> >> >                  nr_gets++;
> >> >                  (void)get_page(pg, rd);
> >> > -                if ( !(op->flags & GNTMAP_readonly) )
> >> > -                    get_page_type(pg, PGT_writable_page);
> >> > +                if ( !(op->flags & GNTMAP_readonly) &&
> >> > +                     !get_page_type(pg, PGT_writable_page) )
> >> > +                        goto could_not_pin;
> >>
> >> This needs explanation, as it doesn't look related to what your
> >> actual goal is: If an error was possible here, I think this would be
> >> a security issue. However, as also kind of documented by the
> >> explicitly ignored return value from get_page(), it is my understanding there
> here we only obtain an _extra_ reference.
> >>
> >
> > For this point, I inferred from:
> > map_vcpu_info()
> > {
> > ...
> >     if ( !get_page_type(page, PGT_writable_page) )
> >     {
> >         put_page(page);
> >         return -EINVAL;
> >     }
> > ...
> > }
> > , then for get_page_type(), I think the return value:
> >      0 -- error,
> >      1-- right.
> >
> > So if get_page_type() is failed, we should goto could_not_pin.
> 
> Did you read my reply at all? The explanation I'm expecting here is why error
> checking is all of the sudden needed _at all_.
> 

Sorry for my stupid reply.
As in this version, before the open discussion, I try to return the iommu_{,un}map_page() error in this call tree:
           iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
then, in this point, I try to deal with this iommu_{,un}map_page() error.

> > btw, there is another issue in the call path:
> >     iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> >
> >
> > I tried to return iommu_{,un}map_page() error code in
> > __get_page_type(), is it right?
> 
> If the operation got fully rolled back - yes. Whether fully rolling back is feasible
> there though is - see the respective discussion - an open question.
> 

For the open question, does it refer to as below:

"""
As said, we first need
to settle on an abstract model. Do we want IOMMU mapping
failures to be fatal to the domain (perhaps with the exception
of the hardware one)? I think we do, and for the hardware domain
we'd do things on a best effort basis (always erring on the side
of unmapping). Which would probably mean crashing the domain
could be centralized in iommu_{,un}map_page(). How much roll
back would then still be needed in callers of these functions
for the hardware domain's sake would need to be seen.
"""

I hope it is yes. I read all of your emails again and again, I found I did get the point until this Monday.
I am summarizing it and would send out in a new thread.


> >> > --- a/xen/drivers/passthrough/x86/iommu.c
> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
> >> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
> >> domain *d)
> >> >      this_cpu(iommu_dont_flush_iotlb) = 0;
> >> >
> >> >      if ( !rc )
> >> > -        iommu_iotlb_flush_all(d);
> >> > +    {
> >> > +        rc = iommu_iotlb_flush_all(d);
> >> > +        if ( rc )
> >> > +            iommu_teardown(d);
> >> > +    }
> >> >      else if ( rc != -ERESTART )
> >> >          iommu_teardown(d);
> >>
> >> Why can't you just use the existing call to iommu_teardown(), by
> >> simply
> > deleting
> >> the "else"?
> >>
> >
> > Just check it, could I modify it as below:
> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain
> > *d)
> >
> >      if ( !rc )
> >          iommu_iotlb_flush_all(d);
> > -    else if ( rc != -ERESTART )
> > +
> > +    if ( rc != -ERESTART )
> >          iommu_teardown(d);
> 
> Clearly not - not only are you losing the return value of
> iommu_iotlb_flush_all() now, you would then also call
> iommu_teardown() in the "success" case. My comment was related to code
> structure, yet you seem to have taken it literally.
> 

Then, what about this one:
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -104,8 +104,9 @@ int arch_iommu_populate_page_table(struct domain *d)
     this_cpu(iommu_dont_flush_iotlb) = 0;

     if ( !rc )
-        iommu_iotlb_flush_all(d);
-    else if ( rc != -ERESTART )
+        rc = iommu_iotlb_flush_all(d);
+
+    if ( !rc && rc != -ERESTART )
         iommu_teardown(d);


IMO, my original modification is correct and redundant with 2 'iommu_teardown()'..
If this is still the correct one, could you help me send out the correct one?

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-30  2:28         ` Xu, Quan
@ 2016-03-30  2:35           ` Xu, Quan
  2016-03-30  8:05           ` Jan Beulich
  1 sibling, 0 replies; 42+ messages in thread
From: Xu, Quan @ 2016-03-30  2:35 UTC (permalink / raw)
  To: Xu, Quan, Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, xen-devel, Nakajima, Jun, Andrew Cooper,
	Keir Fraser

On March 30, 2016 10:28am, Xu, Quan <quan.xu@intel.com> wrote:
> If this is still the correct one, could you help me send out the correct one?
> 
Sorry, a typo:
If this is still _not_ the correct one, could you help me send out the correct one?

Quan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] IOMMU/MMU: Adjust top level functions for VT-d Device-TLB flush error.
  2016-03-30  2:28         ` Xu, Quan
  2016-03-30  2:35           ` Xu, Quan
@ 2016-03-30  8:05           ` Jan Beulich
  1 sibling, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2016-03-30  8:05 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 30.03.16 at 04:28, <quan.xu@intel.com> wrote:
> On March 29, 2016 3:21pm, <JBeulich@suse.com> wrote:
>> >>> On 28.03.16 at 05:33, <quan.xu@intel.com> wrote:
>> > On March 18, 2016 1:15am, <JBeulich@suse.com> wrote:
>> >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
>> >> > --- a/xen/common/grant_table.c
>> >> > +++ b/xen/common/grant_table.c
>> >> > @@ -932,8 +932,9 @@ __gnttab_map_grant_ref(
>> >> >              {
>> >> >                  nr_gets++;
>> >> >                  (void)get_page(pg, rd);
>> >> > -                if ( !(op->flags & GNTMAP_readonly) )
>> >> > -                    get_page_type(pg, PGT_writable_page);
>> >> > +                if ( !(op->flags & GNTMAP_readonly) &&
>> >> > +                     !get_page_type(pg, PGT_writable_page) )
>> >> > +                        goto could_not_pin;
>> >>
>> >> This needs explanation, as it doesn't look related to what your
>> >> actual goal is: If an error was possible here, I think this would be
>> >> a security issue. However, as also kind of documented by the
>> >> explicitly ignored return value from get_page(), it is my understanding there
>> here we only obtain an _extra_ reference.
>> >>
>> >
>> > For this point, I inferred from:
>> > map_vcpu_info()
>> > {
>> > ...
>> >     if ( !get_page_type(page, PGT_writable_page) )
>> >     {
>> >         put_page(page);
>> >         return -EINVAL;
>> >     }
>> > ...
>> > }
>> > , then for get_page_type(), I think the return value:
>> >      0 -- error,
>> >      1-- right.
>> >
>> > So if get_page_type() is failed, we should goto could_not_pin.
>> 
>> Did you read my reply at all? The explanation I'm expecting here is why 
> error
>> checking is all of the sudden needed _at all_.
>> 
> 
> Sorry for my stupid reply.
> As in this version, before the open discussion, I try to return the 
> iommu_{,un}map_page() error in this call tree:
>            iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
> then, in this point, I try to deal with this iommu_{,un}map_page() error.

I still don't get it: We're talking about a get_page_type() invocation
that previously was known to never fail (or at least so we hope,
based on the existing code). What I'm expecting as an explanation
is why this "cannot fail" state is not true any longer. And while
sorting this out, please pay particular attention to the limited set of
cases where __get_page_type() calls iommu_{,un}map_page() in
the first place.

>> > btw, there is another issue in the call path:
>> >     iommu_{,un}map_page() -- __get_page_type() -- get_page_type()---
>> >
>> >
>> > I tried to return iommu_{,un}map_page() error code in
>> > __get_page_type(), is it right?
>> 
>> If the operation got fully rolled back - yes. Whether fully rolling back is feasible
>> there though is - see the respective discussion - an open question.
>> 
> 
> For the open question, does it refer to as below:

Partly.

> """
> As said, we first need
> to settle on an abstract model. Do we want IOMMU mapping
> failures to be fatal to the domain (perhaps with the exception
> of the hardware one)? I think we do, and for the hardware domain
> we'd do things on a best effort basis (always erring on the side
> of unmapping). Which would probably mean crashing the domain
> could be centralized in iommu_{,un}map_page(). How much roll
> back would then still be needed in callers of these functions
> for the hardware domain's sake would need to be seen.
> """
> 
> I hope it is yes.

It is not clear to me what part of the above this is meant to refer to.
Perhaps this is meant to answer the question in the 2nd sentence,
but I think this really ought to take a little more than "yes".

>> >> > --- a/xen/drivers/passthrough/x86/iommu.c
>> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> >> > @@ -104,7 +104,11 @@ int arch_iommu_populate_page_table(struct
>> >> domain *d)
>> >> >      this_cpu(iommu_dont_flush_iotlb) = 0;
>> >> >
>> >> >      if ( !rc )
>> >> > -        iommu_iotlb_flush_all(d);
>> >> > +    {
>> >> > +        rc = iommu_iotlb_flush_all(d);
>> >> > +        if ( rc )
>> >> > +            iommu_teardown(d);
>> >> > +    }
>> >> >      else if ( rc != -ERESTART )
>> >> >          iommu_teardown(d);
>> >>
>> >> Why can't you just use the existing call to iommu_teardown(), by
>> >> simply
>> > deleting
>> >> the "else"?
>> >>
>> >
>> > Just check it, could I modify it as below:
>> > --- a/xen/drivers/passthrough/x86/iommu.c
>> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> > @@ -105,7 +105,8 @@ int arch_iommu_populate_page_table(struct domain
>> > *d)
>> >
>> >      if ( !rc )
>> >          iommu_iotlb_flush_all(d);
>> > -    else if ( rc != -ERESTART )
>> > +
>> > +    if ( rc != -ERESTART )
>> >          iommu_teardown(d);
>> 
>> Clearly not - not only are you losing the return value of
>> iommu_iotlb_flush_all() now, you would then also call
>> iommu_teardown() in the "success" case. My comment was related to code
>> structure, yet you seem to have taken it literally.
>> 
> 
> Then, what about this one:
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -104,8 +104,9 @@ int arch_iommu_populate_page_table(struct domain *d)
>      this_cpu(iommu_dont_flush_iotlb) = 0;
> 
>      if ( !rc )
> -        iommu_iotlb_flush_all(d);
> -    else if ( rc != -ERESTART )
> +        rc = iommu_iotlb_flush_all(d);
> +
> +    if ( !rc && rc != -ERESTART )
>          iommu_teardown(d);
> 
> 
> IMO, my original modification is correct and redundant with 2 
> 'iommu_teardown()'..
> If this is still the correct one, could you help me send out the correct 
> one?

The above looks right to me.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-03-29  7:36       ` Jan Beulich
@ 2016-04-11  3:09         ` Xu, Quan
  2016-04-11  3:27           ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-04-11  3:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Suravee Suthikulpanit, George Dunlap,
	Andrew Cooper, DarioFaggioli, xen-devel, Julien Grall,
	Stefano Stabellini, Nakajima, Jun, Keir Fraser

On March 29, 2016 3:37pm, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 25.03.16 at 10:27, <quan.xu@intel.com> wrote:
> > On March 18, 2016 6:20pm, <JBeulich@suse.com> wrote:
> >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> >> > --- a/xen/drivers/passthrough/iommu.c
> >> > +++ b/xen/drivers/passthrough/iommu.c
> >> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
> >> >          iommu = drhd->iommu;
> >> >          iommu_flush_context_global(iommu, 0);
> >> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> >> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> >> > +        rc = iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> >> > +
> >> > +        if ( rc > 0 )
> >> > +        {
> >> > +            iommu_flush_write_buffer(iommu);
> >>
> >> Why is this needed all of the sudden?
> >
> > As there may be multiple IOMMUs. .e.g, there are 2 IOMMUs in my
> > machine, and I can find the following log message:
> > """
> > (XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
> > (XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
> > """
> > __iiuc__, iommu_flush_write_buffer() is per IOMMU, so It should be
> > called to flush every IOMMU.
> 
> For one what you say suggests that right now this is being done for some (one?)
> IOMMU(s), which I don't see being the case. And then what you say _still_
> doesn't say _why_ this is now needed all of the sudden. If, in the course of
> doing your re-work here, you find pre-existing issues with the code, please split
> the necessary fixes out of your re-work and submit them separately with proper
> explanations in their commit messages.
> 

I find out it is no need modification for this function.
I overlooked the parameter 'flush_non_present_entry', which is '0' to call iommu_flush_iotlb_global().
At the bottom of the call tree,
(Existing code)
>>
In flush->iotlb():
{
....
    if ( flush_non_present_entry )
    {
        if ( !cap_caching_mode(iommu->cap) )
            return 1;
        else
            did = 0;
    }

....
}
<<


it is impossible to return '1', which is used to indicate caller needs to flush cache.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-04-11  3:09         ` Xu, Quan
@ 2016-04-11  3:27           ` Xu, Quan
  2016-04-11 16:34             ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: Xu, Quan @ 2016-04-11  3:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Nakajima, Jun, George Dunlap,
	Andrew Cooper, DarioFaggioli, xen-devel, Julien Grall,
	Stefano Stabellini, Suravee Suthikulpanit, Keir Fraser

On April 11, 2016 11:10am, <quan.xu@intel.com> wrote:
> On March 29, 2016 3:37pm, Jan Beulich <JBeulich@suse.com> wrote:
> > >>> On 25.03.16 at 10:27, <quan.xu@intel.com> wrote:
> > > On March 18, 2016 6:20pm, <JBeulich@suse.com> wrote:
> > >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> > >> > --- a/xen/drivers/passthrough/iommu.c
> > >> > +++ b/xen/drivers/passthrough/iommu.c
> > >> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
> > >> >          iommu = drhd->iommu;
> > >> >          iommu_flush_context_global(iommu, 0);
> > >> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> > >> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> > >> > +        rc = iommu_flush_iotlb_global(iommu, 0,
> > >> > + flush_dev_iotlb);
> > >> > +
> > >> > +        if ( rc > 0 )
> > >> > +        {
> > >> > +            iommu_flush_write_buffer(iommu);
> > >>
> > >> Why is this needed all of the sudden?
> > >
> > > As there may be multiple IOMMUs. .e.g, there are 2 IOMMUs in my
> > > machine, and I can find the following log message:
> > > """
> > > (XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
> > > (XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
> > > """
> > > __iiuc__, iommu_flush_write_buffer() is per IOMMU, so It should be
> > > called to flush every IOMMU.
> >
> > For one what you say suggests that right now this is being done for
> > some (one?) IOMMU(s), which I don't see being the case. And then what
> > you say _still_ doesn't say _why_ this is now needed all of the
> > sudden. If, in the course of doing your re-work here, you find
> > pre-existing issues with the code, please split the necessary fixes
> > out of your re-work and submit them separately with proper explanations in
> their commit messages.
> >
> 
> I find out it is no need modification for this function.
Sorry, this modification refers to as:
"
+        if ( rc > 0 )
+        {
+            iommu_flush_write_buffer(iommu);
+            rc = 0;
+        }
"

at least errors need to be propagated.


Quan

> I overlooked the parameter 'flush_non_present_entry', which is '0' to call
> iommu_flush_iotlb_global().
> At the bottom of the call tree,
> (Existing code)
> >>
> In flush->iotlb():
> {
> ....
>     if ( flush_non_present_entry )
>     {
>         if ( !cap_caching_mode(iommu->cap) )
>             return 1;
>         else
>             did = 0;
>     }
> 
> ....
> }
> <<
> 
> 
> it is impossible to return '1', which is used to indicate caller needs to flush
> cache.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-04-11  3:27           ` Xu, Quan
@ 2016-04-11 16:34             ` Jan Beulich
  2016-04-12  1:09               ` Xu, Quan
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2016-04-11 16:34 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, Jun Nakajima, George Dunlap, Andrew Cooper,
	DarioFaggioli, xen-devel, Julien Grall, StefanoStabellini,
	Suravee Suthikulpanit, Keir Fraser

>>> On 11.04.16 at 05:27, <quan.xu@intel.com> wrote:
> On April 11, 2016 11:10am, <quan.xu@intel.com> wrote:
>> On March 29, 2016 3:37pm, Jan Beulich <JBeulich@suse.com> wrote:
>> > >>> On 25.03.16 at 10:27, <quan.xu@intel.com> wrote:
>> > > On March 18, 2016 6:20pm, <JBeulich@suse.com> wrote:
>> > >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
>> > >> > --- a/xen/drivers/passthrough/iommu.c
>> > >> > +++ b/xen/drivers/passthrough/iommu.c
>> > >> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
>> > >> >          iommu = drhd->iommu;
>> > >> >          iommu_flush_context_global(iommu, 0);
>> > >> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
>> > >> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
>> > >> > +        rc = iommu_flush_iotlb_global(iommu, 0,
>> > >> > + flush_dev_iotlb);
>> > >> > +
>> > >> > +        if ( rc > 0 )
>> > >> > +        {
>> > >> > +            iommu_flush_write_buffer(iommu);
>> > >>
>> > >> Why is this needed all of the sudden?
>> > >
>> > > As there may be multiple IOMMUs. .e.g, there are 2 IOMMUs in my
>> > > machine, and I can find the following log message:
>> > > """
>> > > (XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
>> > > (XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
>> > > """
>> > > __iiuc__, iommu_flush_write_buffer() is per IOMMU, so It should be
>> > > called to flush every IOMMU.
>> >
>> > For one what you say suggests that right now this is being done for
>> > some (one?) IOMMU(s), which I don't see being the case. And then what
>> > you say _still_ doesn't say _why_ this is now needed all of the
>> > sudden. If, in the course of doing your re-work here, you find
>> > pre-existing issues with the code, please split the necessary fixes
>> > out of your re-work and submit them separately with proper explanations in
>> their commit messages.
>> >
>> 
>> I find out it is no need modification for this function.
> Sorry, this modification refers to as:
> "
> +        if ( rc > 0 )
> +        {
> +            iommu_flush_write_buffer(iommu);
> +            rc = 0;
> +        }
> "
> 
> at least errors need to be propagated.

How does error propagation correlate with suddenly calling
iommu_flush_write_buffer()?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] IOMMU/MMU: Adjust low level functions for VT-d Device-TLB flush error.
  2016-04-11 16:34             ` Jan Beulich
@ 2016-04-12  1:09               ` Xu, Quan
  0 siblings, 0 replies; 42+ messages in thread
From: Xu, Quan @ 2016-04-12  1:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, Nakajima, Jun, George Dunlap,
	Andrew Cooper, DarioFaggioli, xen-devel, Julien Grall,
	StefanoStabellini, Suravee Suthikulpanit, Keir Fraser

On April 12, 2016 12:35am, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 11.04.16 at 05:27, <quan.xu@intel.com> wrote:
> > On April 11, 2016 11:10am, <quan.xu@intel.com> wrote:
> >> On March 29, 2016 3:37pm, Jan Beulich <JBeulich@suse.com> wrote:
> >> > >>> On 25.03.16 at 10:27, <quan.xu@intel.com> wrote:
> >> > > On March 18, 2016 6:20pm, <JBeulich@suse.com> wrote:
> >> > >> >>> On 17.03.16 at 07:54, <quan.xu@intel.com> wrote:
> >> > >> > --- a/xen/drivers/passthrough/iommu.c
> >> > >> > +++ b/xen/drivers/passthrough/iommu.c
> >> > >> > @@ -554,11 +555,24 @@ static void iommu_flush_all(void)
> >> > >> >          iommu = drhd->iommu;
> >> > >> >          iommu_flush_context_global(iommu, 0);
> >> > >> >          flush_dev_iotlb = find_ats_dev_drhd(iommu) ? 1 : 0;
> >> > >> > -        iommu_flush_iotlb_global(iommu, 0, flush_dev_iotlb);
> >> > >> > +        rc = iommu_flush_iotlb_global(iommu, 0,
> >> > >> > + flush_dev_iotlb);
> >> > >> > +
> >> > >> > +        if ( rc > 0 )
> >> > >> > +        {
> >> > >> > +            iommu_flush_write_buffer(iommu);
> >> > >>
> >> > >> Why is this needed all of the sudden?
> >> > >
> >> > > As there may be multiple IOMMUs. .e.g, there are 2 IOMMUs in my
> >> > > machine, and I can find the following log message:
> >> > > """
> >> > > (XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
> >> > > (XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
> >> > > """
> >> > > __iiuc__, iommu_flush_write_buffer() is per IOMMU, so It should
> >> > > be called to flush every IOMMU.
> >> >
> >> > For one what you say suggests that right now this is being done for
> >> > some (one?) IOMMU(s), which I don't see being the case. And then
> >> > what you say _still_ doesn't say _why_ this is now needed all of
> >> > the sudden. If, in the course of doing your re-work here, you find
> >> > pre-existing issues with the code, please split the necessary fixes
> >> > out of your re-work and submit them separately with proper
> >> > explanations in
> >> their commit messages.
> >> >
> >>
> >> I find out it is no need modification for this function.
> > Sorry, this modification refers to as:
> > "
> > +        if ( rc > 0 )
> > +        {
> > +            iommu_flush_write_buffer(iommu);
> > +            rc = 0;
> > +        }
> > "
> >

My bad description, what I mean is that I will drop above modification, then..

> > at least errors need to be propagated.
> 
> How does error propagation correlate with suddenly calling
> iommu_flush_write_buffer()?
> 

iommu_flush_write_buffer() is no longer called suddenly in this function.

Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-04-12  1:09 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17  6:54 [PATCH 0/2] Check VT-d Device-TLB flush error Quan Xu
2016-03-17  6:54 ` [PATCH 1/2] IOMMU/MMU: Adjust top level functions for " Quan Xu
2016-03-17  7:32   ` Tian, Kevin
2016-03-17  7:58     ` Jan Beulich
2016-03-17  8:00       ` Tian, Kevin
2016-03-17 12:30   ` George Dunlap
2016-03-17 12:33     ` George Dunlap
2016-03-18  3:19       ` Xu, Quan
2016-03-18  8:09         ` Jan Beulich
2016-03-24  6:45           ` Xu, Quan
2016-03-18  7:54     ` Xu, Quan
2016-03-18  8:19       ` Jan Beulich
2016-03-18  9:09         ` Xu, Quan
2016-03-18  9:29           ` Jan Beulich
2016-03-18  9:38             ` Dario Faggioli
2016-03-18  9:48               ` Jan Beulich
2016-03-21  6:18                 ` Tian, Kevin
2016-03-21 12:22                   ` Jan Beulich
2016-03-24  9:02                 ` Xu, Quan
2016-03-24  9:58                   ` Jan Beulich
2016-03-24 14:12                     ` Xu, Quan
2016-03-24 14:37                       ` Jan Beulich
2016-03-17 17:14   ` Jan Beulich
2016-03-28  3:33     ` Xu, Quan
2016-03-29  7:20       ` Jan Beulich
2016-03-30  2:28         ` Xu, Quan
2016-03-30  2:35           ` Xu, Quan
2016-03-30  8:05           ` Jan Beulich
2016-03-17  6:54 ` [PATCH 2/2] IOMMU/MMU: Adjust low " Quan Xu
2016-03-17  7:37   ` Tian, Kevin
2016-03-18  2:30     ` Xu, Quan
2016-03-18  8:06       ` Jan Beulich
2016-03-21  5:01         ` Tian, Kevin
2016-03-17 15:31   ` George Dunlap
2016-03-18  6:57     ` Xu, Quan
2016-03-18 10:20   ` Jan Beulich
2016-03-25  9:27     ` Xu, Quan
2016-03-29  7:36       ` Jan Beulich
2016-04-11  3:09         ` Xu, Quan
2016-04-11  3:27           ` Xu, Quan
2016-04-11 16:34             ` Jan Beulich
2016-04-12  1:09               ` Xu, Quan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).