* [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping for PVH Dom0
From: Roger Pau Monne @ 2017-08-11 16:43 UTC
  To: xen-devel

Hello,

Currently iommu_inclusive_mapping is not working for PVH Dom0. This patch
series allows using it for a PVH Dom0, which seems to be required in order
to boot on older boxes.

Git branch can be found at:

git://xenbits.xen.org/people/royger/xen.git iommu_inclusive_v2

Thanks, Roger.


* [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
From: Roger Pau Monne @ 2017-08-11 16:43 UTC
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

They are emulated by Xen, so they must not be mapped into Dom0 p2m.
Introduce a helper function to add the MMCFG areas to the list of
denied iomem regions for PVH Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since RFC:
 - Introduce as helper instead of exposing the internal mmcfg
   variables to the Dom0 builder.
---
 xen/arch/x86/dom0_build.c         |  4 ++++
 xen/arch/x86/x86_64/mmconfig_64.c | 21 +++++++++++++++++++++
 xen/include/xen/pci.h             |  2 ++
 3 files changed, 27 insertions(+)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 0c125e61eb..3e0910d779 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
             rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+    /* For PVH prevent access to the MMCFG areas. */
+    if ( dom0_pvh )
+        rc |= pci_mmcfg_set_domain_permissions(d);
+
     return rc;
 }
 
diff --git a/xen/arch/x86/x86_64/mmconfig_64.c b/xen/arch/x86/x86_64/mmconfig_64.c
index e84a67dfc4..271fad407f 100644
--- a/xen/arch/x86/x86_64/mmconfig_64.c
+++ b/xen/arch/x86/x86_64/mmconfig_64.c
@@ -15,6 +15,8 @@
 #include <xen/pci_regs.h>
 #include <xen/iommu.h>
 #include <xen/rangeset.h>
+#include <xen/sched.h>
+#include <xen/iocap.h>
 
 #include "mmconfig.h"
 
@@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
            cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
 }
 
+int pci_mmcfg_set_domain_permissions(struct domain *d)
+{
+    unsigned int idx;
+    int rc = 0;
+
+    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
+    {
+        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
+        unsigned long start = PFN_DOWN(cfg->address) +
+                              PCI_BDF(cfg->start_bus_number, 0, 0);
+        unsigned long end = PFN_DOWN(cfg->address) +
+                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
+
+        rc |= iomem_deny_access(d, start, end);
+    }
+
+    return rc;
+}
+
 bool_t pci_mmcfg_decode(unsigned long mfn, unsigned int *seg,
                         unsigned int *bdf)
 {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 59b6e8a81c..ea6a66b248 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -170,4 +170,6 @@ int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
 void msixtbl_pt_unregister(struct domain *, struct pirq *);
 void msixtbl_pt_cleanup(struct domain *d);
 
+int pci_mmcfg_set_domain_permissions(struct domain *d);
+
 #endif /* __XEN_PCI_H__ */
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area
From: Roger Pau Monne @ 2017-08-11 16:43 UTC
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

This is emulated by Xen and must not be mapped into PVH Dom0 p2m.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/dom0_build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 3e0910d779..804efee1a9 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -402,7 +402,7 @@ int __init dom0_setup_permissions(struct domain *d)
     for ( i = 0; i < nr_ioapics; i++ )
     {
         mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
-        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        if ( dom0_pvh || !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
             rc |= iomem_deny_access(d, mfn, mfn);
     }
     /* MSI range. */
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
From: Roger Pau Monne @ 2017-08-11 16:43 UTC
  To: xen-devel; +Cc: Kevin Tian, Roger Pau Monne

On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
trying to boot a PVH Dom0 will freeze the box completely, up to the point that
not even the watchdog works. The freeze happens exactly when enabling DMA
remapping in the IOMMU; the last line seen is:

(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000

In order to work around this (which seems to be caused by a lack of proper
RMRR entries, plus the IOMMU being unable to generate faults and freezing the
entire system), add a PVH-specific implementation of iommu_inclusive_mapping
that maps non-RAM, non-unusable regions into the Dom0 p2m. Note that care is
taken to not map device MMIO regions that Xen is emulating, like the local
APIC or the IO APIC.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Kevin Tian <kevin.tian@intel.com>
---
 xen/drivers/passthrough/vtd/extern.h  |  1 +
 xen/drivers/passthrough/vtd/iommu.c   |  2 ++
 xen/drivers/passthrough/vtd/x86/vtd.c | 39 +++++++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index fb7edfaef9..0eaf8956ff 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -100,5 +100,6 @@ bool_t platform_supports_intremap(void);
 bool_t platform_supports_x2apic(void);
 
 void vtd_set_hwdom_mapping(struct domain *d);
+void vtd_set_pvh_hwdom_mapping(struct domain *d);
 
 #endif // _VTD_EXTERN_H_
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index daaed0abbd..8ed28defe2 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1303,6 +1303,8 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
         /* Set up 1:1 page table for hardware domain. */
         vtd_set_hwdom_mapping(d);
     }
+    else if ( is_hvm_domain(d) )
+        vtd_set_pvh_hwdom_mapping(d);
 
     setup_hwdom_pci_devices(d, setup_hwdom_device);
     setup_hwdom_rmrr(d);
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index 88a60b3307..79c9b0526f 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -21,10 +21,12 @@
 #include <xen/softirq.h>
 #include <xen/domain_page.h>
 #include <asm/paging.h>
+#include <xen/iocap.h>
 #include <xen/iommu.h>
 #include <xen/irq.h>
 #include <xen/numa.h>
 #include <asm/fixmap.h>
+#include <asm/p2m.h>
 #include <asm/setup.h>
 #include "../iommu.h"
 #include "../dmar.h"
@@ -159,3 +161,40 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
     }
 }
 
+void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
+{
+    unsigned long pfn;
+
+    BUG_ON(!is_hardware_domain(d));
+
+    if ( !iommu_inclusive_mapping )
+        return;
+
+    /* NB: the low 1MB is already mapped in pvh_setup_p2m. */
+    for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
+    {
+        p2m_access_t a;
+        int rc;
+
+        if ( !(pfn & 0xfff) )
+            process_pending_softirqs();
+
+        /* Skip RAM, ACPI and unusable regions. */
+        if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
+             page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
+             page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
+             !iomem_access_permitted(d, pfn, pfn) )
+            continue;
+
+        ASSERT(!xen_in_range(pfn));
+
+        a = rangeset_contains_range(mmio_ro_ranges, pfn, pfn) ? p2m_access_r
+                                                              : p2m_access_rw;
+        rc = set_identity_p2m_entry(d, pfn, a, 0);
+        if ( rc )
+           printk(XENLOG_WARNING VTDPREFIX
+                  " d%d: IOMMU mapping failed pfn %#lx: %d\n",
+                  d->domain_id, pfn, rc);
+    }
+}
+
-- 
2.11.0 (Apple Git-81)



* [PATCH v2 4/4] x86/dom0: re-order DMA remapping enabling for PVH Dom0
From: Roger Pau Monne @ 2017-08-11 16:43 UTC
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Make sure the reserved regions are set up before enabling DMA remapping in
the IOMMU, by calling dom0_setup_permissions before iommu_hwdom_init. Also,
in order to work around the IOMMU issues seen on pre-Haswell Intel hardware,
as described in patch "introduce a PVH implementation of
iommu_inclusive_mapping", make sure DMA remapping is enabled after
populating the Dom0 p2m.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since RFC:
 - Expand commit message to reference patch #3.
---
 xen/arch/x86/hvm/dom0_build.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 020c355faf..0e7d06be95 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -605,13 +605,6 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
         return rc;
     }
 
-    rc = dom0_setup_permissions(d);
-    if ( rc )
-    {
-        panic("Unable to setup Dom0 permissions: %d\n", rc);
-        return rc;
-    }
-
     update_domain_wallclock_time(d);
 
     clear_bit(_VPF_down, &v->pause_flags);
@@ -1059,7 +1052,12 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
 
     printk("** Building a PVH Dom0 **\n");
 
-    iommu_hwdom_init(d);
+    rc = dom0_setup_permissions(d);
+    if ( rc )
+    {
+        printk("Unable to setup Dom0 permissions: %d\n", rc);
+        return rc;
+    }
 
     rc = pvh_setup_p2m(d);
     if ( rc )
@@ -1068,6 +1066,8 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    iommu_hwdom_init(d);
+
     rc = pvh_load_kernel(d, image, image_headroom, initrd, bootstrap_map(image),
                          cmdline, &entry, &start_info);
     if ( rc )
-- 
2.11.0 (Apple Git-81)



* Re: [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping for PVH Dom0
From: Tian, Kevin @ 2017-08-17  3:10 UTC
  To: Roger Pau Monne, xen-devel

> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> Hello,
> 
> Currently iommu_inclusive_mapping is not working for PVH Dom0, this

Not working for all platforms, or only on older boxes? The subject indicates
the former, while the description seems to suggest the latter...

> patch series allows using it for a PVH Dom0, which seems to be required
> in order to boot on older boxes.
> 
> Git branch can be found at:
> 
> git://xenbits.xen.org/people/royger/xen.git iommu_inclusive_v2
> 
> Thanks, Roger.
> 

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
From: Tian, Kevin @ 2017-08-17  3:12 UTC
  To: Roger Pau Monne, xen-devel; +Cc: Andrew Cooper, Jan Beulich

> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> Introduce a helper function to add the MMCFG areas to the list of
> denied iomem regions for PVH Dom0.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

This patch is a general fix, not just for inclusive mapping. Please send
it separately.

> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Changes since RFC:
>  - Introduce as helper instead of exposing the internal mmcfg
>    variables to the Dom0 builder.
> ---
>  xen/arch/x86/dom0_build.c         |  4 ++++
>  xen/arch/x86/x86_64/mmconfig_64.c | 21 +++++++++++++++++++++
>  xen/include/xen/pci.h             |  2 ++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 0c125e61eb..3e0910d779 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>      }
> 
> +    /* For PVH prevent access to the MMCFG areas. */
> +    if ( dom0_pvh )
> +        rc |= pci_mmcfg_set_domain_permissions(d);
> +
>      return rc;
>  }
> 
> diff --git a/xen/arch/x86/x86_64/mmconfig_64.c b/xen/arch/x86/x86_64/mmconfig_64.c
> index e84a67dfc4..271fad407f 100644
> --- a/xen/arch/x86/x86_64/mmconfig_64.c
> +++ b/xen/arch/x86/x86_64/mmconfig_64.c
> @@ -15,6 +15,8 @@
>  #include <xen/pci_regs.h>
>  #include <xen/iommu.h>
>  #include <xen/rangeset.h>
> +#include <xen/sched.h>
> +#include <xen/iocap.h>
> 
>  #include "mmconfig.h"
> 
> @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
>             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>  }
> 
> +int pci_mmcfg_set_domain_permissions(struct domain *d)
> +{
> +    unsigned int idx;
> +    int rc = 0;
> +
> +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> +    {
> +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> +        unsigned long start = PFN_DOWN(cfg->address) +
> +                              PCI_BDF(cfg->start_bus_number, 0, 0);
> +        unsigned long end = PFN_DOWN(cfg->address) +
> +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
> +
> +        rc |= iomem_deny_access(d, start, end);
> +    }
> +
> +    return rc;
> +}
> +
>  bool_t pci_mmcfg_decode(unsigned long mfn, unsigned int *seg,
>                          unsigned int *bdf)
>  {
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..ea6a66b248 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -170,4 +170,6 @@ int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
>  void msixtbl_pt_unregister(struct domain *, struct pirq *);
>  void msixtbl_pt_cleanup(struct domain *d);
> 
> +int pci_mmcfg_set_domain_permissions(struct domain *d);
> +
>  #endif /* __XEN_PCI_H__ */
> --
> 2.11.0 (Apple Git-81)
> 
> 

* Re: [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area
From: Tian, Kevin @ 2017-08-17  3:12 UTC
  To: Roger Pau Monne, xen-devel; +Cc: Andrew Cooper, Jan Beulich

> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> This is emulated by Xen and must not be mapped into PVH Dom0 p2m.

Same comment as for the previous one. Please send it separately.

> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/dom0_build.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 3e0910d779..804efee1a9 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -402,7 +402,7 @@ int __init dom0_setup_permissions(struct domain *d)
>      for ( i = 0; i < nr_ioapics; i++ )
>      {
>          mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> -        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> +        if ( dom0_pvh || !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
>              rc |= iomem_deny_access(d, mfn, mfn);
>      }
>      /* MSI range. */
> --
> 2.11.0 (Apple Git-81)
> 
> 

* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
From: Tian, Kevin @ 2017-08-17  3:28 UTC
  To: Roger Pau Monne, xen-devel

> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> trying to boot a PVH Dom0 will freeze the box completely, up to the point
> that not even the watchdog works. The freeze happens exactly when enabling
> the DMA remapping in the IOMMU, the last line seen is:
> 
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> 
> In order to workaround this (which seems to be a lack of proper RMRR entries,

Since you position this patch as a 'workaround', what is the side effect
of such a workaround? Do you want to restrict the workaround to old
boxes only?

It would also be better to put a comment in the code, so others can
understand why PVH requires its own path when reading the code.

> plus the IOMMU being unable to generate faults and freezing the entire
> system) add a PVH specific implementation of iommu_inclusive_mapping,
> that maps non-RAM, non-unusable regions into Dom0 p2m. Note that care is
> taken to not map device MMIO regions that Xen is emulating, like the
> local APIC or the IO APIC.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Kevin Tian <kevin.tian@intel.com>
> ---
>  xen/drivers/passthrough/vtd/extern.h  |  1 +
>  xen/drivers/passthrough/vtd/iommu.c   |  2 ++
>  xen/drivers/passthrough/vtd/x86/vtd.c | 39 +++++++++++++++++++++++++++++++++++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
> index fb7edfaef9..0eaf8956ff 100644
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -100,5 +100,6 @@ bool_t platform_supports_intremap(void);
>  bool_t platform_supports_x2apic(void);
> 
>  void vtd_set_hwdom_mapping(struct domain *d);
> +void vtd_set_pvh_hwdom_mapping(struct domain *d);
> 
>  #endif // _VTD_EXTERN_H_
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index daaed0abbd..8ed28defe2 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1303,6 +1303,8 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
>          /* Set up 1:1 page table for hardware domain. */
>          vtd_set_hwdom_mapping(d);
>      }
> +    else if ( is_hvm_domain(d) )
> +        vtd_set_pvh_hwdom_mapping(d);

Can you elaborate a bit here? Current condition is:

    if ( !iommu_passthrough && !need_iommu(d) )
    {
        /* Set up 1:1 page table for hardware domain. */
        vtd_set_hwdom_mapping(d);
    }

So you assume that for PVH the above condition will never be true?

> 
>      setup_hwdom_pci_devices(d, setup_hwdom_device);
>      setup_hwdom_rmrr(d);
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
> index 88a60b3307..79c9b0526f 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -21,10 +21,12 @@
>  #include <xen/softirq.h>
>  #include <xen/domain_page.h>
>  #include <asm/paging.h>
> +#include <xen/iocap.h>
>  #include <xen/iommu.h>
>  #include <xen/irq.h>
>  #include <xen/numa.h>
>  #include <asm/fixmap.h>
> +#include <asm/p2m.h>
>  #include <asm/setup.h>
>  #include "../iommu.h"
>  #include "../dmar.h"
> @@ -159,3 +161,40 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
>      }
>  }
> 
> +void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
> +{
> +    unsigned long pfn;
> +
> +    BUG_ON(!is_hardware_domain(d));
> +
> +    if ( !iommu_inclusive_mapping )
> +        return;
> +
> +    /* NB: the low 1MB is already mapped in pvh_setup_p2m. */
> +    for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
> +    {
> +        p2m_access_t a;
> +        int rc;
> +
> +        if ( !(pfn & 0xfff) )
> +            process_pending_softirqs();
> +
> +        /* Skip RAM, ACPI and unusable regions. */
> +        if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
> +             page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
> +             page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
> +             !iomem_access_permitted(d, pfn, pfn) )
> +            continue;

I'm a bit confused here. So you only handle RESERVED memory
type here, which doesn't match the definition of inclusive mapping.

/*
 * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
 * 1:1 iommu mappings except xen and unusable regions.
 */

there must be some background which I missed...

> +
> +        ASSERT(!xen_in_range(pfn));
> +
> +        a = rangeset_contains_range(mmio_ro_ranges, pfn, pfn) ? p2m_access_r
> +                                                              : p2m_access_rw;
> +        rc = set_identity_p2m_entry(d, pfn, a, 0);
> +        if ( rc )
> +           printk(XENLOG_WARNING VTDPREFIX
> +                  " d%d: IOMMU mapping failed pfn %#lx: %d\n",
> +                  d->domain_id, pfn, rc);
> +    }
> +}
> +
> --
> 2.11.0 (Apple Git-81)


* Re: [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping for PVH Dom0
From: Roger Pau Monne @ 2017-08-17  9:28 UTC
  To: Tian, Kevin; +Cc: xen-devel

On Thu, Aug 17, 2017 at 03:10:44AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne
> > Sent: Saturday, August 12, 2017 12:43 AM
> > 
> > Hello,
> > 
> > Currently iommu_inclusive_mapping is not working for PVH Dom0, this
> 
> Not working for all platforms, or only on older boxes? The subject indicates
> the former, while the description seems to suggest the latter...

It would probably be best to write it as:

"Currently iommu_inclusive_mapping is not implemented for PVH Dom0",
rather than "not working".

iommu_inclusive_mapping is only used when !need_iommu(d), and PVH Dom0
requires an iommu, so this condition is never going to be true.
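
For reference, the relevant guard in intel_iommu_hwdom_init() (also
quoted by Kevin in his reply to patch 3) is:

    if ( !iommu_passthrough && !need_iommu(d) )
    {
        /* Set up 1:1 page table for hardware domain. */
        vtd_set_hwdom_mapping(d);
    }

and since need_iommu(d) always holds for a PVH Dom0, the inclusive
mapping path is never reached there.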

Roger.


* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
From: Roger Pau Monne @ 2017-08-17  9:32 UTC
  To: Tian, Kevin; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Thu, Aug 17, 2017 at 03:12:02AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne
> > Sent: Saturday, August 12, 2017 12:43 AM
> > 
> > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> > Introduce a helper function to add the MMCFG areas to the list of
> > denied iomem regions for PVH Dom0.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> this patch is a general fix, not just for inclusive mapping. please send
> it separately.

Hm, not really.

PV Dom0 should have access to the MMCFG areas; PVH Dom0 shouldn't,
because they will be emulated by Xen.

So far MMCFG areas are not mapped into PVH Dom0 p2m, but they will be
once iommu_inclusive_mapping is implemented for PVH Dom0. So I
consider this a preparatory change before enabling
iommu_inclusive_mapping for PVH, rather than a fix. It would be a
fix if iommu_inclusive_mapping were already enabled for PVH Dom0.

Roger.


* Re: [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area
From: Roger Pau Monne @ 2017-08-17  9:35 UTC
  To: Tian, Kevin; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Thu, Aug 17, 2017 at 03:12:45AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne
> > Sent: Saturday, August 12, 2017 12:43 AM
> > 
> > This is emulated by Xen and must not be mapped into PVH Dom0 p2m.
> 
> same comment as previous one. please send it separately.

This will only be mapped once iommu_inclusive_mapping is available for
PVH Dom0, which is what patch #3 does. It's not a bugfix, because the
bug it would fix doesn't exist yet.

Roger.


* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
From: Roger Pau Monne @ 2017-08-17  9:39 UTC
  To: Tian, Kevin; +Cc: xen-devel

On Thu, Aug 17, 2017 at 03:28:29AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > Sent: Saturday, August 12, 2017 12:43 AM
> > 
> > On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> > trying to boot a PVH Dom0 will freeze the box completely, up to the point
> > that not even the watchdog works. The freeze happens exactly when enabling
> > the DMA remapping in the IOMMU, the last line seen is:
> > 
> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> > 
> > In order to workaround this (which seems to be a lack of proper RMRR entries,
> 
> Since you position this patch as a 'workaround', what is the side effect
> of such a workaround?

The side effect is that Xen basically maps everything below 4GB into the
PVH Dom0 p2m. It's fairly similar to what's already done for a PV Dom0.

> Do you want to restrict the workaround to old boxes only?

Hm, I don't think so, all the more so because I don't think it's feasible
to identify the broken boxes from Xen's PoV.

> It would also be better to put a comment in the code, so others can
> understand why PVH requires its own path when reading the code.
> 
> > plus the IOMMU being unable to generate faults and freezing the entire
> > system) add a PVH specific implementation of iommu_inclusive_mapping,
> > that maps non-RAM, non-unusable regions into Dom0 p2m. Note that care is
> > taken to not map device MMIO regions that Xen is emulating, like the
> > local APIC or the IO APIC.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > ---
> >  xen/drivers/passthrough/vtd/extern.h  |  1 +
> >  xen/drivers/passthrough/vtd/iommu.c   |  2 ++
> >  xen/drivers/passthrough/vtd/x86/vtd.c | 39 +++++++++++++++++++++++++++++++++++
> >  3 files changed, 42 insertions(+)
> > 
> > diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
> > index fb7edfaef9..0eaf8956ff 100644
> > --- a/xen/drivers/passthrough/vtd/extern.h
> > +++ b/xen/drivers/passthrough/vtd/extern.h
> > @@ -100,5 +100,6 @@ bool_t platform_supports_intremap(void);
> >  bool_t platform_supports_x2apic(void);
> > 
> >  void vtd_set_hwdom_mapping(struct domain *d);
> > +void vtd_set_pvh_hwdom_mapping(struct domain *d);
> > 
> >  #endif // _VTD_EXTERN_H_
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> > index daaed0abbd..8ed28defe2 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -1303,6 +1303,8 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
> >          /* Set up 1:1 page table for hardware domain. */
> >          vtd_set_hwdom_mapping(d);
> >      }
> > +    else if ( is_hvm_domain(d) )
> > +        vtd_set_pvh_hwdom_mapping(d);
> 
> Can you elaborate a bit here? Current condition is:
> 
>     if ( !iommu_passthrough && !need_iommu(d) )
>     {
>         /* Set up 1:1 page table for hardware domain. */
>         vtd_set_hwdom_mapping(d);
>     }
> 
> So you assume that for PVH the above condition will never be true?

No, PVH Dom0 always requires an iommu, so the above condition will
never be true for a PVH Dom0.

> > 
> >      setup_hwdom_pci_devices(d, setup_hwdom_device);
> >      setup_hwdom_rmrr(d);
> > diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
> > index 88a60b3307..79c9b0526f 100644
> > --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> > +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> > @@ -21,10 +21,12 @@
> >  #include <xen/softirq.h>
> >  #include <xen/domain_page.h>
> >  #include <asm/paging.h>
> > +#include <xen/iocap.h>
> >  #include <xen/iommu.h>
> >  #include <xen/irq.h>
> >  #include <xen/numa.h>
> >  #include <asm/fixmap.h>
> > +#include <asm/p2m.h>
> >  #include <asm/setup.h>
> >  #include "../iommu.h"
> >  #include "../dmar.h"
> > @@ -159,3 +161,40 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
> >      }
> >  }
> > 
> > +void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
> > +{
> > +    unsigned long pfn;
> > +
> > +    BUG_ON(!is_hardware_domain(d));
> > +
> > +    if ( !iommu_inclusive_mapping )
> > +        return;
> > +
> > +    /* NB: the low 1MB is already mapped in pvh_setup_p2m. */
> > +    for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
> > +    {
> > +        p2m_access_t a;
> > +        int rc;
> > +
> > +        if ( !(pfn & 0xfff) )
> > +            process_pending_softirqs();
> > +
> > +        /* Skip RAM, ACPI and unusable regions. */
> > +        if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
> > +             page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
> > +             page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
> > +             !iomem_access_permitted(d, pfn, pfn) )
> > +            continue;
> 
> I'm a bit confused here. So you only handle RESERVED memory
> type here, which doesn't match the definition of inclusive mapping.
> 
> /*
>  * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
>  * 1:1 iommu mappings except xen and unusable regions.
>  */
> 
> there must be some background which I missed...

Right, RAM and ACPI regions are already mapped by the Dom0 builder, so
the only things left are reserved regions not used by Xen.

I can expand the comment above to say:

/*
 * Skip RAM, ACPI and unusable regions because they have been already
 * mapped by the PVH Dom0 builder.
 */

Does that seem better?

Roger.


* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
From: Jan Beulich @ 2017-08-22 12:26 UTC
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> Introduce a helper function to add the MMCFG areas to the list of
> denied iomem regions for PVH Dom0.

"They are" or "They are going to be"?

> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>      }
>  
> +    /* For PVH prevent access to the MMCFG areas. */
> +    if ( dom0_pvh )
> +        rc |= pci_mmcfg_set_domain_permissions(d);

What about ones reported by Dom0 later on? Which then raises the
question whether ...

> @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
>             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>  }
>  
> +int pci_mmcfg_set_domain_permissions(struct domain *d)
> +{
> +    unsigned int idx;
> +    int rc = 0;
> +
> +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> +    {
> +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> +        unsigned long start = PFN_DOWN(cfg->address) +
> +                              PCI_BDF(cfg->start_bus_number, 0, 0);
> +        unsigned long end = PFN_DOWN(cfg->address) +
> +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
> +
> +        rc |= iomem_deny_access(d, start, end);

... this shouldn't be made unnecessary by, other than PV Dom0,
starting out with no I/O memory being made accessible (i.e.
white listing just like we decided we would do for other
properties for PVH).

Additionally, while using |= was fine in the code that
dom0_setup_permissions() was broken out from, there and here it's not
really appropriate, unless we want to continue to bake in the
assumption that either (a) iomem_deny_access() can only ever
return a single error indicator or (b) the callers only care about
the value being (non-)zero.
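
A minimal sketch of the alternative - names and placement hypothetical,
just to illustrate keeping the first real error code:

    int err = iomem_deny_access(d, start, end);

    if ( err && !rc )
        rc = err; /* preserve the first failure's error code */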

Jan



* Re: [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area
From: Jan Beulich @ 2017-08-22 12:28 UTC
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> This is emulated by Xen and must not be mapped into PVH Dom0 p2m.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

If we stay with black listing MMIO ranges
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
From: Jan Beulich @ 2017-08-22 12:31 UTC
  To: Roger Pau Monne, Kevin Tian; +Cc: xen-devel

>>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> trying to boot a PVH Dom0 will freeze the box completely, up to the point that
> not even the watchdog works. The freeze happens exactly when enabling the DMA
> remapping in the IOMMU, the last line seen is:
> 
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> 
> In order to workaround this (which seems to be a lack of proper RMRR entries,
> plus the IOMMU being unable to generate faults and freezing the entire system)
> add a PVH specific implementation of iommu_inclusive_mapping, that maps
> non-RAM, non-unusable regions into Dom0 p2m. Note that care is taken to not map
> device MMIO regions that Xen is emulating, like the local APIC or the IO APIC.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I don't mean to object to the patch, but it certainly would be helpful
to understand the behavior a little better, in particular also to
perhaps be able to derive what RMRRs are missing (which could
then be added via command line option instead of this all-or-nothing
approach). Kevin, could you perhaps help here?

Jan


* Re: [PATCH v2 4/4] x86/dom0: re-order DMA remapping enabling for PVH Dom0
From: Jan Beulich @ 2017-08-22 12:37 UTC
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> Make sure the reserved regions are setup before enabling the DMA
> remapping in the IOMMU, by calling dom0_setup_permissions before
> iommu_hwdom_init.

I can't match up this part with ...

> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -605,13 +605,6 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
>          return rc;
>      }
>  
> -    rc = dom0_setup_permissions(d);
> -    if ( rc )
> -    {
> -        panic("Unable to setup Dom0 permissions: %d\n", rc);
> -        return rc;
> -    }
> -
>      update_domain_wallclock_time(d);
>  
>      clear_bit(_VPF_down, &v->pause_flags);
> @@ -1059,7 +1052,12 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
>  
>      printk("** Building a PVH Dom0 **\n");
>  
> -    iommu_hwdom_init(d);
> +    rc = dom0_setup_permissions(d);
> +    if ( rc )
> +    {
> +        printk("Unable to setup Dom0 permissions: %d\n", rc);
> +        return rc;
> +    }
>  
>      rc = pvh_setup_p2m(d);
>      if ( rc )
> @@ -1068,6 +1066,8 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
>          return rc;
>      }
>  
> +    iommu_hwdom_init(d);

... you not changing the relative order between these two function
calls. As to the other half I'm inclined to also wait for better
understanding of what's going on here, as said in reply to patch 3.

Jan



* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
From: Roger Pau Monne @ 2017-08-22 13:54 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> > Introduce a helper function to add the MMCFG areas to the list of
> > denied iomem regions for PVH Dom0.
> 
> "They are" or "They are going to be"?

This started as a series on top of vPCI, but I think it has a chance
of getting in before vPCI. I will change it.

> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >      }
> >  
> > +    /* For PVH prevent access to the MMCFG areas. */
> > +    if ( dom0_pvh )
> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> 
> What about ones reported by Dom0 later on? Which then raises the
> question whether ...

This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
But since you propose to do white listing, I guess it doesn't matter
that much anymore.
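
(Roughly, and reusing the PFN computation from this patch, that handler
could deny the newly reported area to the hardware domain - an untested
sketch, with the field names taken from struct physdev_pci_mmcfg_reserved:

    rc = iomem_deny_access(hardware_domain,
                           PFN_DOWN(info.address) +
                           PCI_BDF(info.start_bus, 0, 0),
                           PFN_DOWN(info.address) +
                           PCI_BDF(info.end_bus, ~0, ~0));
)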

> > @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
> >             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
> >  }
> >  
> > +int pci_mmcfg_set_domain_permissions(struct domain *d)
> > +{
> > +    unsigned int idx;
> > +    int rc = 0;
> > +
> > +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> > +    {
> > +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> > +        unsigned long start = PFN_DOWN(cfg->address) +
> > +                              PCI_BDF(cfg->start_bus_number, 0, 0);
> > +        unsigned long end = PFN_DOWN(cfg->address) +
> > +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
> > +
> > +        rc |= iomem_deny_access(d, start, end);
> 
> ... this shouldn't be made unnecessary by, other than PV Dom0,
> starting out with no I/O memory being made accessible (i.e.
> white listing just like we decided we would do for other
> properties for PVH).

So would you like to switch to this white listing mode even for PV
Dom0, or just for PVH?

Should reserved regions and holes be added to it? Maybe only reserved
regions?

> Additionally, while using |= was fine in the code that
> dom0_setup_permissions() was broken out from, there and here it's not
> really appropriate, unless we want to continue to bake in the
> assumption that either (a) iomem_deny_access() can only ever
> return a single error indicator or (b) the callers only care about
> the value being (non-)zero.

Right, I can fix that.

Thanks, Roger.


* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
From: Roger Pau Monne @ 2017-08-22 14:01 UTC
  To: Jan Beulich; +Cc: xen-devel, Kevin Tian

On Tue, Aug 22, 2017 at 06:31:27AM -0600, Jan Beulich wrote:
> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> > trying to boot a PVH Dom0 will freeze the box completely, up to the point that
> > not even the watchdog works. The freeze happens exactly when enabling the DMA
> > remapping in the IOMMU, the last line seen is:
> > 
> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> > 
> > In order to workaround this (which seems to be a lack of proper RMRR entries,
> > plus the IOMMU being unable to generate faults and freezing the entire system)
> > add a PVH specific implementation of iommu_inclusive_mapping, that maps
> > non-RAM, non-unusable regions into Dom0 p2m. Note that care is taken to not map
> > device MMIO regions that Xen is emulating, like the local APIC or the IO APIC.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I don't mean to object to the patch, but it certainly would be helpful
> to understand the behavior a little better, in particular also to
> perhaps be able to derive what RMRRs are missing (which could
> then be added via command line option instead of this all-or-nothing
> approach). Kevin, could you perhaps help here?

I tried that, but since the system freezes completely I have no idea
what's missing. It's quite clear to me that it's related to the IOMMU
and its inability to properly generate a fault, but beyond that
I have no other clue.

Roger.


* Re: [PATCH v2 4/4] x86/dom0: re-order DMA remapping enabling for PVH Dom0
From: Roger Pau Monne @ 2017-08-22 14:05 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Aug 22, 2017 at 06:37:15AM -0600, Jan Beulich wrote:
> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > Make sure the reserved regions are setup before enabling the DMA
> > remapping in the IOMMU, by calling dom0_setup_permissions before
> > iommu_hwdom_init.
> 
> I can't match up this part with ...
> 
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -605,13 +605,6 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
> >          return rc;
> >      }
> >  
> > -    rc = dom0_setup_permissions(d);
> > -    if ( rc )
> > -    {
> > -        panic("Unable to setup Dom0 permissions: %d\n", rc);
> > -        return rc;
> > -    }
> > -
> >      update_domain_wallclock_time(d);
> >  
> >      clear_bit(_VPF_down, &v->pause_flags);
> > @@ -1059,7 +1052,12 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
> >  
> >      printk("** Building a PVH Dom0 **\n");
> >  
> > -    iommu_hwdom_init(d);
> > +    rc = dom0_setup_permissions(d);
> > +    if ( rc )
> > +    {
> > +        printk("Unable to setup Dom0 permissions: %d\n", rc);
> > +        return rc;
> > +    }
> >  
> >      rc = pvh_setup_p2m(d);
> >      if ( rc )
> > @@ -1068,6 +1066,8 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
> >          return rc;
> >      }
> >  
> > +    iommu_hwdom_init(d);
> 
> ... you not changing the relative order between these two function
> calls. As to the other half I'm inclined to also wait for better
> understanding of what's going on here, as said in reply to patch 3.

Why not?

dom0_setup_permissions was called from pvh_setup_cpus, while
iommu_hwdom_init was the first function called in
dom0_construct_pvh.

After this patch dom0_setup_permissions is always called before
iommu_hwdom_init.

Roger.


* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
From: Jan Beulich @ 2017-08-23  8:16 UTC
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
>> > Introduce a helper function to add the MMCFG areas to the list of
>> > denied iomem regions for PVH Dom0.
>> 
>> "They are" or "They are going to be"?
> 
> This started as a series on top of vPCI, but I think it has a chance
> of getting in before vPCI. I will change it.

I guessed this would be the reason, but while reviewing the vPCI
series you've said somewhere that functionality from the series here
would be implied.

>> > --- a/xen/arch/x86/dom0_build.c
>> > +++ b/xen/arch/x86/dom0_build.c
>> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> >      }
>> >  
>> > +    /* For PVH prevent access to the MMCFG areas. */
>> > +    if ( dom0_pvh )
>> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>> 
>> What about ones reported by Dom0 later on? Which then raises the
>> question whether ...
> 
> This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
> But since you propose to do white listing, I guess it doesn't matter
> that much anymore.

Well, a fundamental question is whether white listing would work in
the first place. I could see room for severe problems e.g. with ACPI
methods wanting to access MMIO that's not described by any PCI
devices' BARs. Typically that would be regions in the chipset which
firmware is responsible for configuring/managing, the addresses of
which can be found/set in custom config space registers.

>> > @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
>> >             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>> >  }
>> >  
>> > +int pci_mmcfg_set_domain_permissions(struct domain *d)
>> > +{
>> > +    unsigned int idx;
>> > +    int rc = 0;
>> > +
>> > +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
>> > +    {
>> > +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
>> > +        unsigned long start = PFN_DOWN(cfg->address) +
>> > +                              PCI_BDF(cfg->start_bus_number, 0, 0);
>> > +        unsigned long end = PFN_DOWN(cfg->address) +
>> > +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
>> > +
>> > +        rc |= iomem_deny_access(d, start, end);
>> 
>> ... this shouldn't be made unnecessary by, other than PV Dom0,
>> starting out with no I/O memory being made accessible (i.e.
>> white listing just like we decided we would do for other
>> properties for PVH).
> 
> So would you like to switch to this white listing mode even for PV
> Dom0, or just for PVH?

No, I certainly don't think we should touch PV here.

> Should reserved regions and holes be added to it? Maybe only reserved
> regions?

See above - reserved regions may be the minimum that needs to be
added, but then again we can't be certain all BIOSes properly
report everything in use by the chipset/firmware as reserved. Otoh
they're called reserved because no-one outside of the firmware
should touch them.

Jan



* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
From: Jan Beulich @ 2017-08-23  8:18 UTC
  To: Roger Pau Monne; +Cc: xen-devel, Kevin Tian

>>> On 22.08.17 at 16:01, <roger.pau@citrix.com> wrote:
> On Tue, Aug 22, 2017 at 06:31:27AM -0600, Jan Beulich wrote:
>> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> > On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
>> > trying to boot a PVH Dom0 will freeze the box completely, up to the point that
>> > not even the watchdog works. The freeze happens exactly when enabling the DMA
>> > remapping in the IOMMU, the last line seen is:
>> > 
>> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
>> > 
>> > In order to workaround this (which seems to be a lack of proper RMRR entries,
>> > plus the IOMMU being unable to generate faults and freezing the entire system)
>> > add a PVH specific implementation of iommu_inclusive_mapping, that maps
>> > non-RAM, non-unusable regions into Dom0 p2m. Note that care is taken to not map
>> > device MMIO regions that Xen is emulating, like the local APIC or the IO APIC.
>> > 
>> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> 
>> I don't mean to object to the patch, but it certainly would be helpful
>> to understand the behavior a little better, in particular also to
>> perhaps be able to derive what RMRRs are missing (which could
then be added via command line option instead of this all-or-nothing
>> approach). Kevin, could you perhaps help here?
> 
> I tried that, but since the system freezes completely I have no idea
> what's missing. It's quite clear to me that it's related to the IOMMU
> and its inability to properly generate a fault, but further than that
> I have no other clue.

Hence my request for Kevin to help (perhaps indirectly by pulling
in other Intel folks). Someone being able to check what the chipset
actually does or being able to observe what's going on in a logic
analyzer should be able to explain the observed behavior.

Jan


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 4/4] x86/dom0: re-order DMA remapping enabling for PVH Dom0
  2017-08-22 14:05     ` Roger Pau Monne
@ 2017-08-23  8:21       ` Jan Beulich
  0 siblings, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-08-23  8:21 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 22.08.17 at 16:05, <roger.pau@citrix.com> wrote:
> On Tue, Aug 22, 2017 at 06:37:15AM -0600, Jan Beulich wrote:
>> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> > Make sure the reserved regions are setup before enabling the DMA
>> > remapping in the IOMMU, by calling dom0_setup_permissions before
>> > iommu_hwdom_init.
>> 
>> I can't match up this part with ...
>> 
>> > --- a/xen/arch/x86/hvm/dom0_build.c
>> > +++ b/xen/arch/x86/hvm/dom0_build.c
>> > @@ -605,13 +605,6 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
>> >          return rc;
>> >      }
>> >  
>> > -    rc = dom0_setup_permissions(d);
>> > -    if ( rc )
>> > -    {
>> > -        panic("Unable to setup Dom0 permissions: %d\n", rc);
>> > -        return rc;
>> > -    }
>> > -
>> >      update_domain_wallclock_time(d);
>> >  
>> >      clear_bit(_VPF_down, &v->pause_flags);
>> > @@ -1059,7 +1052,12 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
>> >  
>> >      printk("** Building a PVH Dom0 **\n");
>> >  
>> > -    iommu_hwdom_init(d);
>> > +    rc = dom0_setup_permissions(d);
>> > +    if ( rc )
>> > +    {
>> > +        printk("Unable to setup Dom0 permissions: %d\n", rc);
>> > +        return rc;
>> > +    }
>> >  
>> >      rc = pvh_setup_p2m(d);
>> >      if ( rc )
>> > @@ -1068,6 +1066,8 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
>> >          return rc;
>> >      }
>> >  
>> > +    iommu_hwdom_init(d);
>> 
>> ... you not changing the relative order between these two function
>> calls. As to the other half I'm inclined to also wait for better
>> understanding of what's going on here, as said in reply to patch 3.
> 
> Why not?

Oh, I'm sorry - I should have looked at function names in the
hunk headers instead of just the sequence of hunks.

Jan
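
For reference, with the hunks above applied the relevant part of
dom0_construct_pvh() ends up ordered as follows, abridged from the
quoted diff:

    rc = dom0_setup_permissions(d);  /* permissions set up first */
    if ( rc )
    {
        printk("Unable to setup Dom0 permissions: %d\n", rc);
        return rc;
    }

    rc = pvh_setup_p2m(d);           /* then the p2m is populated... */
    ...
    iommu_hwdom_init(d);             /* ...and DMA remapping enabled */

so dom0_setup_permissions() indeed runs before iommu_hwdom_init()
builds the IOMMU mappings.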



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-23  8:16       ` Jan Beulich
@ 2017-08-25 12:15         ` Roger Pau Monne
  2017-08-25 12:25           ` Jan Beulich
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-25 12:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Kevin Tian, xen-devel

On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> >> > --- a/xen/arch/x86/dom0_build.c
> >> > +++ b/xen/arch/x86/dom0_build.c
> >> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >> >      }
> >> >  
> >> > +    /* For PVH prevent access to the MMCFG areas. */
> >> > +    if ( dom0_pvh )
> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> >> 
> >> What about ones reported by Dom0 later on? Which then raises the
> >> question whether ...
> > 
> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
> > But since you propose to do white listing, I guess it doesn't matter
> > that much anymore.
> 
> Well, a fundamental question is whether white listing would work in
> the first place. I could see room for severe problems e.g. with ACPI
> methods wanting to access MMIO that's not described by any PCI
> devices' BARs. Typically that would be regions in the chipset which
> firmware is responsible for configuring/managing, the addresses of
> which can be found/set in custom config space registers.

The question would also be what would Xen allow in such white-listing.
Obviously you can get to map the same using both white-list and
black-listing (see below).

> >> > @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
> >> >             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
> >> >  }
> >> >  
> >> > +int pci_mmcfg_set_domain_permissions(struct domain *d)
> >> > +{
> >> > +    unsigned int idx;
> >> > +    int rc = 0;
> >> > +
> >> > +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> >> > +    {
> >> > +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> >> > +        unsigned long start = PFN_DOWN(cfg->address) +
> >> > +                              PCI_BDF(cfg->start_bus_number, 0, 0);
> >> > +        unsigned long end = PFN_DOWN(cfg->address) +
> >> > +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
> >> > +
> >> > +        rc |= iomem_deny_access(d, start, end);
> >> 
> >> ... this shouldn't be unnecessary by, other than PV Dom0,
> >> starting out with no I/O memory being made accessible (i.e.
> >> white listing just like we decided we would do for other
> >> properties for PVH).
> > 
> > So would you like to switch to this white listing mode even for PV
> > Dom0, or just for PVH?
> 
> No, I certainly don't think we should touch PV here.
> 
> > Should reserved regions and holes be added to it? Maybe only reserved
> > regions?
> 
> See above - reserved regions may be a minimum that's needed to
> be added, but then again we can't be certain all BIOSes properly
> report everything in use by the chipset/firmware as reserved. Otoh
> they're called reserved because no-one outside of the firmware
> should touch them.

Right. As a more general comment, I can see your suspicions about this
series; TBH I don't like implementing something like this either. This
series just papers over an issue in either the VT-d IOMMU
implementation in Xen, or a hardware erratum in some IOMMUs found on
older hardware.

Having said that, I've now tested a slightly less intrusive variant,
which only maps reserved regions. This will still require Xen to
blacklist the MMCFG regions, which reside in reserved areas. Is there
anything else Xen should blacklist from reserved regions?

Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-25 12:15         ` Roger Pau Monne
@ 2017-08-25 12:25           ` Jan Beulich
  2017-08-25 13:58             ` Roger Pau Monne
  0 siblings, 1 reply; 45+ messages in thread
From: Jan Beulich @ 2017-08-25 12:25 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Kevin Tian, xen-devel

>>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
>> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
>> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> >> > --- a/xen/arch/x86/dom0_build.c
>> >> > +++ b/xen/arch/x86/dom0_build.c
>> >> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> >> >      }
>> >> >  
>> >> > +    /* For PVH prevent access to the MMCFG areas. */
>> >> > +    if ( dom0_pvh )
>> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>> >> 
>> >> What about ones reported by Dom0 later on? Which then raises the
>> >> question whether ...
>> > 
>> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
>> > But since you propose to do white listing, I guess it doesn't matter
>> > that much anymore.
>> 
>> Well, a fundamental question is whether white listing would work in
>> the first place. I could see room for severe problems e.g. with ACPI
>> methods wanting to access MMIO that's not described by any PCI
>> devices' BARs. Typically that would be regions in the chipset which
>> firmware is responsible for configuring/managing, the addresses of
>> which can be found/set in custom config space registers.
> 
> The question would also be what would Xen allow in such white-listing.
> Obviously you can get to map the same using both white-list and
> black-listing (see below).

Not really - what you've said there regarding MMCFG regions is
a clear indication that we should _not_ map reserved regions, i.e.
it would need to be full white listing with perhaps just the PCI
device BARs being handled automatically.

Jan
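
In rangeset terms the two models differ only in the default: black
listing starts from everything being accessible and punches holes with
iomem_deny_access(), while white listing starts from nothing and adds
ranges with iomem_permit_access(). A minimal sketch of the latter,
where bar_sfn/bar_efn and mmcfg_sfn/mmcfg_efn are placeholder frame
numbers assumed to come from BAR sizing and the MCFG table:

    /* White listing: nothing is reachable until explicitly permitted. */
    rc = iomem_permit_access(d, bar_sfn, bar_efn);
    if ( rc )
        return rc;

    /* Emulated ranges must still be carved out, e.g. the MMCFG pages. */
    rc = iomem_deny_access(d, mmcfg_sfn, mmcfg_efn);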



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-25 12:25           ` Jan Beulich
@ 2017-08-25 13:58             ` Roger Pau Monne
  2017-08-28  6:18               ` Tian, Kevin
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-25 13:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Kevin Tian, xen-devel

On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> >> >> > --- a/xen/arch/x86/dom0_build.c
> >> >> > +++ b/xen/arch/x86/dom0_build.c
> >> >> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
> >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >> >> >      }
> >> >> >  
> >> >> > +    /* For PVH prevent access to the MMCFG areas. */
> >> >> > +    if ( dom0_pvh )
> >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> >> >> 
> >> >> What about ones reported by Dom0 later on? Which then raises the
> >> >> question whether ...
> >> > 
> >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
> >> > But since you propose to do white listing, I guess it doesn't matter
> >> > that much anymore.
> >> 
> >> Well, a fundamental question is whether white listing would work in
> >> the first place. I could see room for severe problems e.g. with ACPI
> >> methods wanting to access MMIO that's not described by any PCI
> >> devices' BARs. Typically that would be regions in the chipset which
> >> firmware is responsible for configuring/managing, the addresses of
> >> which can be found/set in custom config space registers.
> > 
> > The question would also be what would Xen allow in such white-listing.
> > Obviously you can get to map the same using both white-list and
> > black-listing (see below).
> 
> Not really - what you've said there regarding MMCFG regions is
> a clear indication that we should _not_ map reserved regions, i.e.
> it would need to be full white listing with perhaps just the PCI
> device BARs being handled automatically.

I've tried just mapping the BARs and that sadly doesn't work, the box
hangs after the IOMMU is enabled:

[...]
(XEN) [VT-D]d0:PCI: map 0000:3f:13.5
(XEN) [VT-D]d0:PCI: map 0000:3f:13.6
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000

I will park this ATM and leave it for the Intel guys to diagnose.

For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
E5-1607 0 @ 3.00GHz and a C600/X79 chipset.

Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-17  9:32     ` Roger Pau Monne
@ 2017-08-28  6:04       ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2017-08-28  6:04 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich, Andrew Cooper

> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Thursday, August 17, 2017 5:32 PM
> 
> On Thu, Aug 17, 2017 at 03:12:02AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monne
> > > Sent: Saturday, August 12, 2017 12:43 AM
> > >
> > > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> > > Introduce a helper function to add the MMCFG areas to the list of
> > > denied iomem regions for PVH Dom0.
> > >
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >
> > this patch is a general fix, not just for inclusive mapping. please send
> > it separately.
> 
> Hm, not really.
> 
> PV Dom0 should have access to the MMCFG areas, PVH Dom0 shouldn't
> because they will emulated by Xen.
> 
> So far MMCFG areas are not mapped into PVH Dom0 p2m, but they will be
> once iommu_inclusive_mapping is implemented for PVH Dom0. So I
> consider this a preparatory change before enabling
> iommu_inclusive_mapping for PVH, rather than a fix. It would be a
> fix if iommu_inclusive_mapping was already enabled for PVH Dom0.
>  

Possibly you need a better description here. Otherwise the current
description has nothing to do with inclusive mapping, based on
which it looks like a basic PVH Dom0 problem (while from your
explanation it's not valid today).

Thanks
Kevin


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area
  2017-08-17  9:35     ` Roger Pau Monne
@ 2017-08-28  6:07       ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2017-08-28  6:07 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich, Andrew Cooper

> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Thursday, August 17, 2017 5:35 PM
> 
> On Thu, Aug 17, 2017 at 03:12:45AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monne
> > > Sent: Saturday, August 12, 2017 12:43 AM
> > >
> > > This is emulated by Xen and must not be mapped into PVH Dom0 p2m.
> >
> > same comment as previous one. please send it separately.
> 
> This will only be mapped once iommu_inclusive_mapping is available for
> PVH Dom0, which is what patch #3 does. It's not a bugfix because the
> bug it would be fix doesn't exist yet.
> 

Similarly, please add more explanation of why it's specific to
inclusive mapping. For people not familiar with PVH specifics,
it's hard to get that feeling simply from looking at the current patch
description and the actual patch, which looks like a general change.
E.g. you may want to explain why PVH Dom0 doesn't require
iomem_deny_access so far while it becomes necessary later...

Thanks
kevin


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
  2017-08-17  9:39     ` Roger Pau Monne
@ 2017-08-28  6:13       ` Tian, Kevin
  0 siblings, 0 replies; 45+ messages in thread
From: Tian, Kevin @ 2017-08-28  6:13 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel

> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Thursday, August 17, 2017 5:39 PM
> 
> > >
> > > +void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
> > > +{
> > > +    unsigned long pfn;
> > > +
> > > +    BUG_ON(!is_hardware_domain(d));
> > > +
> > > +    if ( !iommu_inclusive_mapping )
> > > +        return;
> > > +
> > > +    /* NB: the low 1MB is already mapped in pvh_setup_p2m. */
> > > +    for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
> > > +    {
> > > +        p2m_access_t a;
> > > +        int rc;
> > > +
> > > +        if ( !(pfn & 0xfff) )
> > > +            process_pending_softirqs();
> > > +
> > > +        /* Skip RAM, ACPI and unusable regions. */
> > > +        if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
> > > +             page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
> > > +             page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
> > > +             !iomem_access_permitted(d, pfn, pfn) )
> > > +            continue;
> >
> > I'm a bit confused here. So you only handle RESERVED memory
> > type here, which doesn't match the definition of inclusive mapping.
> >
> > /*
> >  * iommu_inclusive_mapping: when set, all memory below 4GB is
> included in dom0
> >  * 1:1 iommu mappings except xen and unusable regions.
> >  */
> >
> > there must be some background which I missed...
> 
> Right, RAM and ACPI regions are already mapped by the Dom0 builder, so
> the only things left are reserved regions not being used by Xen.
> 
> I can expand the comment above to say:
> 
> /*
>  * Skip RAM, ACPI and unusable regions because they have been already
>  * mapped by the PVH Dom0 builder.
>  */
> 
> Does that seem better?
> 
> Roger.

Yes, it's better. Btw, if you can make the function name more specific
it might be clearer, e.g. vtd_set_pvh_hwdom_reserved_mapping
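
For reference, a sketch of the quoted hunk with both suggestions
folded in (not the committed code):

    void __hwdom_init vtd_set_pvh_hwdom_reserved_mapping(struct domain *d)
    {
        unsigned long pfn;

        /* NB: the low 1MB is already mapped in pvh_setup_p2m. */
        for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
        {
            ...

            /*
             * Skip RAM, ACPI and unusable regions because they have
             * already been mapped by the PVH Dom0 builder; what is left
             * below 4GB are reserved regions not used by Xen.
             */
            if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
                 page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
                 page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
                 !iomem_access_permitted(d, pfn, pfn) )
                continue;
            ...
        }
    }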



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
  2017-08-23  8:18       ` Jan Beulich
@ 2017-08-28  6:14         ` Tian, Kevin
  2017-08-29  7:39           ` Roger Pau Monne
  0 siblings, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2017-08-28  6:14 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne; +Cc: xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, August 23, 2017 4:19 PM
> To: Roger Pau Monne <roger.pau@citrix.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH
> implementation of iommu_inclusive_mapping
> 
> >>> On 22.08.17 at 16:01, <roger.pau@citrix.com> wrote:
> > On Tue, Aug 22, 2017 at 06:31:27AM -0600, Jan Beulich wrote:
> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> >> > On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> >> > trying to boot a PVH Dom0 will freeze the box completely, up to the
> point that
> >> > not even the watchdog works. The freeze happens exactly when
> enabling the DMA
> >> > remapping in the IOMMU, the last line seen is:
> >> >
> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg =
> ffff82c00021b000
> >> >
> >> > In order to workaround this (which seems to be a lack of proper RMRR
> entries,
> >> > plus the IOMMU being unable to generate faults and freezing the
> entire system)
> >> > add a PVH specific implementation of iommu_inclusive_mapping, that
> maps
> >> > non-RAM, non-unusable regions into Dom0 p2m. Note that care is
> taken to not map
> >> > device MMIO regions that Xen is emulating, like the local APIC or the IO
> APIC.
> >> >
> >> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>
> >> I don't mean to object to the patch, but it certainly would be helpful
> >> to understand the behavior a little better, in particular also to
> >> perhaps be able to derive what RMRRs are missing (which could
> >> then be added via command line option instead of this all-or-nothing
> >> approach). Kevin, could you perhaps help here?
> >
> > I tried that, but since the system freezes completely I have no idea
> > what's missing. It's quite clear to me that it's related to the IOMMU
> > and its inability to properly generate a fault, but further than that
> > I have no other clue.
> 
> Hence my request for Kevin to help (perhaps indirectly by pulling
> in other Intel folks). Someone being able to check what the chipset
> actually does or being able to observe what's going on in a logic
> analyzer should be able to explain the observed behavior.
> 

We don't have a logic analyzer specifically to examine VT-d, but yes,
we can have a try at whether it's reproducible on our side and
then do some analysis.

what's the hardware configuration?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-25 13:58             ` Roger Pau Monne
@ 2017-08-28  6:18               ` Tian, Kevin
  2017-08-29  7:33                 ` Roger Pau Monne
  0 siblings, 1 reply; 45+ messages in thread
From: Tian, Kevin @ 2017-08-28  6:18 UTC (permalink / raw)
  To: Roger Pau Monne, Jan Beulich; +Cc: Andrew Cooper, Gao, Chao, xen-devel

> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Friday, August 25, 2017 9:59 PM
> 
> On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > >> >> > --- a/xen/arch/x86/dom0_build.c
> > >> >> > +++ b/xen/arch/x86/dom0_build.c
> > >> >> > @@ -440,6 +440,10 @@ int __init
> dom0_setup_permissions(struct domain *d)
> > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> > >> >> >      }
> > >> >> >
> > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
> > >> >> > +    if ( dom0_pvh )
> > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> > >> >>
> > >> >> What about ones reported by Dom0 later on? Which then raises the
> > >> >> question whether ...
> > >> >
> > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
> handler.
> > >> > But since you propose to do white listing, I guess it doesn't matter
> > >> > that much anymore.
> > >>
> > >> Well, a fundamental question is whether white listing would work in
> > >> the first place. I could see room for severe problems e.g. with ACPI
> > >> methods wanting to access MMIO that's not described by any PCI
> > >> devices' BARs. Typically that would be regions in the chipset which
> > >> firmware is responsible for configuring/managing, the addresses of
> > >> which can be found/set in custom config space registers.
> > >
> > > The question would also be what would Xen allow in such white-listing.
> > > Obviously you can get to map the same using both white-list and
> > > black-listing (see below).
> >
> > Not really - what you've said there regarding MMCFG regions is
> > a clear indication that we should _not_ map reserved regions, i.e.
> > it would need to be full white listing with perhaps just the PCI
> > device BARs being handled automatically.
> 
> I've tried just mapping the BARs and that sadly doesn't work, the box
> hangs after the IOMMU is enabled:
> 
> [...]
> (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
> (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> 
> I will park this ATM and leave it for the Intel guys to diagnose.
> 
> For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
> 

+Chao who can help check whether we have such a box at hand.

btw please also give your BIOS version.

Thanks
kevin


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-28  6:18               ` Tian, Kevin
@ 2017-08-29  7:33                 ` Roger Pau Monne
  2017-08-31  7:32                   ` Chao Gao
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-29  7:33 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Andrew Cooper, Gao, Chao, Jan Beulich, xen-devel

On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > Sent: Friday, August 25, 2017 9:59 PM
> > 
> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > > >> >> > --- a/xen/arch/x86/dom0_build.c
> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
> > > >> >> > @@ -440,6 +440,10 @@ int __init
> > dom0_setup_permissions(struct domain *d)
> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> > > >> >> >      }
> > > >> >> >
> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
> > > >> >> > +    if ( dom0_pvh )
> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> > > >> >>
> > > >> >> What about ones reported by Dom0 later on? Which then raises the
> > > >> >> question whether ...
> > > >> >
> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
> > handler.
> > > >> > But since you propose to do white listing, I guess it doesn't matter
> > > >> > that much anymore.
> > > >>
> > > >> Well, a fundamental question is whether white listing would work in
> > > >> the first place. I could see room for severe problems e.g. with ACPI
> > > >> methods wanting to access MMIO that's not described by any PCI
> > > >> devices' BARs. Typically that would be regions in the chipset which
> > > >> firmware is responsible for configuring/managing, the addresses of
> > > >> which can be found/set in custom config space registers.
> > > >
> > > > The question would also be what would Xen allow in such white-listing.
> > > > Obviously you can get to map the same using both white-list and
> > > > black-listing (see below).
> > >
> > > Not really - what you've said there regarding MMCFG regions is
> > > a clear indication that we should _not_ map reserved regions, i.e.
> > > it would need to be full white listing with perhaps just the PCI
> > > device BARs being handled automatically.
> > 
> > I've tried just mapping the BARs and that sadly doesn't work, the box
> > hangs after the IOMMU is enabled:
> > 
> > [...]
> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> > 
> > I will park this ATM and leave it for the Intel guys to diagnose.
> > 
> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
> > 
> 
> +Chao who can help check whether we have such a box at hand.
> 
> btw please also give your BIOS version.

It's a Precision T3600 BIOS A14.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping
  2017-08-28  6:14         ` Tian, Kevin
@ 2017-08-29  7:39           ` Roger Pau Monne
  0 siblings, 0 replies; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-29  7:39 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: xen-devel, Jan Beulich

On Mon, Aug 28, 2017 at 06:14:45AM +0000, Tian, Kevin wrote:
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Wednesday, August 23, 2017 4:19 PM
> > To: Roger Pau Monne <roger.pau@citrix.com>
> > Cc: Tian, Kevin <kevin.tian@intel.com>; xen-devel@lists.xenproject.org
> > Subject: Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH
> > implementation of iommu_inclusive_mapping
> > 
> > >>> On 22.08.17 at 16:01, <roger.pau@citrix.com> wrote:
> > > On Tue, Aug 22, 2017 at 06:31:27AM -0600, Jan Beulich wrote:
> > >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > >> > On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> > >> > trying to boot a PVH Dom0 will freeze the box completely, up to the
> > point that
> > >> > not even the watchdog works. The freeze happens exactly when
> > enabling the DMA
> > >> > remapping in the IOMMU, the last line seen is:
> > >> >
> > >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg =
> > ffff82c00021b000
> > >> >
> > >> > In order to workaround this (which seems to be a lack of proper RMRR
> > entries,
> > >> > plus the IOMMU being unable to generate faults and freezing the
> > entire system)
> > >> > add a PVH specific implementation of iommu_inclusive_mapping, that
> > maps
> > >> > non-RAM, non-unusable regions into Dom0 p2m. Note that care is
> > taken to not map
> > >> > device MMIO regions that Xen is emulating, like the local APIC or the IO
> > APIC.
> > >> >
> > >> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > >>
> > >> I don't mean to object to the patch, but it certainly would be helpful
> > >> to understand the behavior a little better, in particular also to
> > >> perhaps be able to derive what RMRRs are missing (which could
> > >> then be added via command line option instead of this all-or-nothing
> > >> approach). Kevin, could you perhaps help here?
> > >
> > > I tried that, but since the system freezes completely I have no idea
> > > what's missing. It's quite clear to me that it's related to the IOMMU
> > > and its inability to properly generate a fault, but further than that
> > > I have no other clue.
> > 
> > Hence my request for Kevin to help (perhaps indirectly by pulling
> > in other Intel folks). Someone being able to check what the chipset
> > actually does or being able to observe what's going on in a logic
> > analyzer should be able to explain the observed behavior.
> > 
> 
> We don't have logic analyzer specifically to examine VTd, but yes
> we can help have a try whether it's reproducible in our side and
> then do some analysis.
> 
> what's the hardware configuration?

It's a Dell Precision T3600 with Xeon(R) CPU E5-1607 0 @ 3.00GHz, a
C600/X79 chipset and BIOS version A14.

Just trying to boot a Xen kernel with dom0=pvh will show the issue
(i.e. you won't be able to reach the panic at the end of the Dom0
builder because the box will get stuck in the iommu_hwdom_init
call).
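
For illustration, a GRUB2 entry along these lines should trigger it
(paths and module lines are placeholders; the box above actually boots
via the FreeBSD loader):

    multiboot2 /boot/xen.gz dom0=pvh console=com1 com1=115200,8n1 \
        iommu=debug,verbose sync_console
    module2 /boot/vmlinuz-dom0 root=/dev/sda1 console=hvc0
    module2 /boot/initrd-dom0.img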

Roger.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-29  7:33                 ` Roger Pau Monne
@ 2017-08-31  7:32                   ` Chao Gao
  2017-08-31  8:53                     ` Roger Pau Monne
  2017-08-31  9:03                     ` Roger Pau Monne
  0 siblings, 2 replies; 45+ messages in thread
From: Chao Gao @ 2017-08-31  7:32 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Tian, Kevin, Jan Beulich, xen-devel

On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
>On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
>> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
>> > Sent: Friday, August 25, 2017 9:59 PM
>> > 
>> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
>> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
>> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
>> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
>> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> > > >> >> > --- a/xen/arch/x86/dom0_build.c
>> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
>> > > >> >> > @@ -440,6 +440,10 @@ int __init
>> > dom0_setup_permissions(struct domain *d)
>> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> > > >> >> >      }
>> > > >> >> >
>> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
>> > > >> >> > +    if ( dom0_pvh )
>> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>> > > >> >>
>> > > >> >> What about ones reported by Dom0 later on? Which then raises the
>> > > >> >> question whether ...
>> > > >> >
>> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
>> > handler.
>> > > >> > But since you propose to do white listing, I guess it doesn't matter
>> > > >> > that much anymore.
>> > > >>
>> > > >> Well, a fundamental question is whether white listing would work in
>> > > >> the first place. I could see room for severe problems e.g. with ACPI
>> > > >> methods wanting to access MMIO that's not described by any PCI
>> > > >> devices' BARs. Typically that would be regions in the chipset which
>> > > >> firmware is responsible for configuring/managing, the addresses of
>> > > >> which can be found/set in custom config space registers.
>> > > >
>> > > > The question would also be what would Xen allow in such white-listing.
>> > > > Obviously you can get to map the same using both white-list and
>> > > > black-listing (see below).
>> > >
>> > > Not really - what you've said there regarding MMCFG regions is
>> > > a clear indication that we should _not_ map reserved regions, i.e.
>> > > it would need to be full white listing with perhaps just the PCI
>> > > device BARs being handled automatically.
>> > 
>> > I've tried just mapping the BARs and that sadly doesn't work, the box
>> > hangs after the IOMMU is enabled:
>> > 
>> > [...]
>> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
>> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
>> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
>> > 
>> > I will park this ATM and leave it for the Intel guys to diagnose.
>> > 
>> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
>> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
>> > 
>> 
>> +Chao who can help check whether we have such a box at hand.
>> 
>> btw please also give your BIOS version.
>
>It's a Precision T3600 BIOS A14.

Hi, Roger.

I found an Ivy Bridge box with an E5-2697 v2 and tested with "dom0=pvh", and
the bug didn't occur on this box. The log is below:
(XEN) [    7.509588] [VT-D]d0:PCIe: map 0000:ff:1e.2
(XEN) [    7.511047] [VT-D]d0:PCIe: map 0000:ff:1e.3
(XEN) [    7.512463] [VT-D]d0:PCIe: map 0000:ff:1e.4
(XEN) [    7.513927] [VT-D]d0:PCIe: map 0000:ff:1e.5
(XEN) [    7.515342] [VT-D]d0:PCIe: map 0000:ff:1e.6
(XEN) [    7.516808] [VT-D]d0:PCIe: map 0000:ff:1e.7
(XEN) [    7.519449] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
(XEN) [    7.522295] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021d000
(XEN) [    8.675096] OS: linux version: 2.6 loader: generic bitness: 64-bit
(XEN) [    8.726763] 
(XEN) [    8.730171] ****************************************
(XEN) [    8.737491] Panic on CPU 0:
(XEN) [    8.742376] Building a PVHv2 Dom0 is not yet supported.
(XEN) [    8.750148] ****************************************
(XEN) [    8.757457] 
(XEN) [    8.760877] Reboot in five seconds...
(XEN) [   13.769050] Resetting with ACPI MEMORY or I/O RESET_REG

I agree with you that there may be some bugs in firmware and VT-d.
I did trials on a Haswell box with iommu_inclusive_mapping=false. I did
see a DMA translation fault. The address to be translated is reserved in
the e820 but isn't included in any RMRR. Even so, the box booted up
successfully.

But if the bug exists in PVH Dom0, it also exists in PV Dom0. Could you
try to boot with a PV Dom0 and set iommu_inclusive_mapping=false?
Theoretically, the system would halt exactly like what you met with a
PVH Dom0. Is that right? Or am I missing some difference between PVH
Dom0 and PV Dom0?

Thanks
Chao


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-31  9:03                     ` Roger Pau Monne
@ 2017-08-31  8:45                       ` Chao Gao
  2017-08-31 10:09                         ` Roger Pau Monne
  0 siblings, 1 reply; 45+ messages in thread
From: Chao Gao @ 2017-08-31  8:45 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Tian, Kevin, Jan Beulich, xen-devel

On Thu, Aug 31, 2017 at 10:03:19AM +0100, Roger Pau Monne wrote:
>On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
>> On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
>> >On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
>> >> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
>> >> > Sent: Friday, August 25, 2017 9:59 PM
>> >> > 
>> >> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
>> >> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
>> >> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
>> >> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
>> >> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> >> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> >> > > >> >> > --- a/xen/arch/x86/dom0_build.c
>> >> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
>> >> > > >> >> > @@ -440,6 +440,10 @@ int __init
>> >> > dom0_setup_permissions(struct domain *d)
>> >> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> >> > > >> >> >      }
>> >> > > >> >> >
>> >> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
>> >> > > >> >> > +    if ( dom0_pvh )
>> >> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>> >> > > >> >>
>> >> > > >> >> What about ones reported by Dom0 later on? Which then raises the
>> >> > > >> >> question whether ...
>> >> > > >> >
>> >> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
>> >> > handler.
>> >> > > >> > But since you propose to do white listing, I guess it doesn't matter
>> >> > > >> > that much anymore.
>> >> > > >>
>> >> > > >> Well, a fundamental question is whether white listing would work in
>> >> > > >> the first place. I could see room for severe problems e.g. with ACPI
>> >> > > >> methods wanting to access MMIO that's not described by any PCI
>> >> > > >> devices' BARs. Typically that would be regions in the chipset which
>> >> > > >> firmware is responsible for configuring/managing, the addresses of
>> >> > > >> which can be found/set in custom config space registers.
>> >> > > >
>> >> > > > The question would also be what would Xen allow in such white-listing.
>> >> > > > Obviously you can get to map the same using both white-list and
>> >> > > > black-listing (see below).
>> >> > >
>> >> > > Not really - what you've said there regarding MMCFG regions is
>> >> > > a clear indication that we should _not_ map reserved regions, i.e.
>> >> > > it would need to be full white listing with perhaps just the PCI
>> >> > > device BARs being handled automatically.
>> >> > 
>> >> > I've tried just mapping the BARs and that sadly doesn't work, the box
>> >> > hangs after the IOMMU is enabled:
>> >> > 
>> >> > [...]
>> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
>> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
>> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
>> >> > 
>> >> > I will park this ATM and leave it for the Intel guys to diagnose.
>> >> > 
>> >> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
>> >> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
>> >> > 
>> >> 
>> >> +Chao who can help check whether we have such a box at hand.
>> >> 
>> >> btw please also give your BIOS version.
>> >
>> >It's a Precision T3600 BIOS A14.
>> 
>> Hi, Roger.
>> 
>> I found an Ivy Bridge box with an E5-2697 v2 and tested with "dom0=pvh", and
>
>The ones I've seen issues with are Sandy Bridge or Nehalem, can you
>find some of this hardware?

As I expected, I was removed from the recipients :(, which made it
hard for me to notice your replies in time.

Yes, I will. But it may take some time (for even Ivy Bridge is rare).

>
>I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
>work just fine.

The reason why I chose Ivy Bridge is partly that you said you found this
bug on almost all pre-Haswell boxes.

Thanks
Chao


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-31  7:32                   ` Chao Gao
@ 2017-08-31  8:53                     ` Roger Pau Monne
  2017-08-31  9:03                     ` Roger Pau Monne
  1 sibling, 0 replies; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-31  8:53 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
> On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
> >On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
> >> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> >> > Sent: Friday, August 25, 2017 9:59 PM
> >> > 
> >> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> >> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> >> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> >> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> >> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> >> > > >> >> > --- a/xen/arch/x86/dom0_build.c
> >> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
> >> > > >> >> > @@ -440,6 +440,10 @@ int __init
> >> > dom0_setup_permissions(struct domain *d)
> >> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >> > > >> >> >      }
> >> > > >> >> >
> >> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
> >> > > >> >> > +    if ( dom0_pvh )
> >> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> >> > > >> >>
> >> > > >> >> What about ones reported by Dom0 later on? Which then raises the
> >> > > >> >> question whether ...
> >> > > >> >
> >> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
> >> > handler.
> >> > > >> > But since you propose to do white listing, I guess it doesn't matter
> >> > > >> > that much anymore.
> >> > > >>
> >> > > >> Well, a fundamental question is whether white listing would work in
> >> > > >> the first place. I could see room for severe problems e.g. with ACPI
> >> > > >> methods wanting to access MMIO that's not described by any PCI
> >> > > >> devices' BARs. Typically that would be regions in the chipset which
> >> > > >> firmware is responsible for configuring/managing, the addresses of
> >> > > >> which can be found/set in custom config space registers.
> >> > > >
> >> > > > The question would also be what would Xen allow in such white-listing.
> >> > > > Obviously you can get to map the same using both white-list and
> >> > > > black-listing (see below).
> >> > >
> >> > > Not really - what you've said there regarding MMCFG regions is
> >> > > a clear indication that we should _not_ map reserved regions, i.e.
> >> > > it would need to be full white listing with perhaps just the PCI
> >> > > device BARs being handled automatically.
> >> > 
> >> > I've tried just mapping the BARs and that sadly doesn't work, the box
> >> > hangs after the IOMMU is enabled:
> >> > 
> >> > [...]
> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> >> > 
> >> > I will park this ATM and leave it for the Intel guys to diagnose.
> >> > 
> >> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> >> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
> >> > 
> >> 
> >> +Chao who can help check whether we have such a box at hand.
> >> 
> >> btw please also give your BIOS version.
> >
> >It's a Precision T3600 BIOS A14.
> 
> Hi, Roger.
> 
> I found an Ivy Bridge box with an E5-2697 v2 and tested with "dom0=pvh", and
> the bug didn't occur on this box. The log is below:
> (XEN) [    7.509588] [VT-D]d0:PCIe: map 0000:ff:1e.2
> (XEN) [    7.511047] [VT-D]d0:PCIe: map 0000:ff:1e.3
> (XEN) [    7.512463] [VT-D]d0:PCIe: map 0000:ff:1e.4
> (XEN) [    7.513927] [VT-D]d0:PCIe: map 0000:ff:1e.5
> (XEN) [    7.515342] [VT-D]d0:PCIe: map 0000:ff:1e.6
> (XEN) [    7.516808] [VT-D]d0:PCIe: map 0000:ff:1e.7
> (XEN) [    7.519449] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> (XEN) [    7.522295] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021d000
> (XEN) [    8.675096] OS: linux version: 2.6 loader: generic bitness: 64-bit
> (XEN) [    8.726763] 
> (XEN) [    8.730171] ****************************************
> (XEN) [    8.737491] Panic on CPU 0:
> (XEN) [    8.742376] Building a PVHv2 Dom0 is not yet supported.
> (XEN) [    8.750148] ****************************************
> (XEN) [    8.757457] 
> (XEN) [    8.760877] Reboot in five seconds...
> (XEN) [   13.769050] Resetting with ACPI MEMORY or I/O RESET_REG
> 
> I agree with you that there may be some bugs in firmware and VT-d.
> I did trials on a Haswell box with iommu_inclusive_mapping=false. I did
> see a DMA translation fault. The address to be translated is reserved in
> the e820 but isn't included in any RMRR. Even so, the box booted up
> successfully.
> 
> But if the bug exists in PVH Dom0, it also exists in PV Dom0. Could you
> try to boot with a PV Dom0 and set iommu_inclusive_mapping=false?
> Theoretically, the system would halt exactly like what you met with a
> PVH Dom0. Is that right? Or am I missing some difference between PVH
> Dom0 and PV Dom0?

Yes, the same happens with iommu_inclusive_mapping=false on PV; the
issue is not PVH specific. Here is the full dmesg:

 Xen 4.10-unstable
(XEN) Xen version 4.10-unstable (root@) (FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)) debug=y  Thu Aug 31 09:47:55 BST 2017
(XEN) Latest ChangeSet:
(XEN) Console output is synchronous.
(XEN) Bootloader: FreeBSD Loader
(XEN) Command line: dom0_mem=4096M com1=115200,8n1 console=com1,vga guest_loglvl=all loglvl=all iommu=debug,verbose sync_console watchdog iommu_inclusive_mapping=false
(XEN) Xen image load base address: 0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000008dc00 (usable)
(XEN)  000000000008dc00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000018ebb000 (usable)
(XEN)  0000000018ebb000 - 0000000018fe8000 (ACPI NVS)
(XEN)  0000000018fe8000 - 0000000018fe9000 (usable)
(XEN)  0000000018fe9000 - 0000000019000000 (ACPI NVS)
(XEN)  0000000019000000 - 000000001dffd000 (usable)
(XEN)  000000001dffd000 - 000000001e000000 (ACPI data)
(XEN)  000000001e000000 - 00000000ac784000 (usable)
(XEN)  00000000ac784000 - 00000000ac818000 (reserved)
(XEN)  00000000ac818000 - 00000000ad800000 (usable)
(XEN)  00000000b0000000 - 00000000b4000000 (reserved)
(XEN)  00000000fed20000 - 00000000fed40000 (reserved)
(XEN)  00000000fed50000 - 00000000fed90000 (reserved)
(XEN)  00000000ffa00000 - 00000000ffa40000 (reserved)
(XEN)  0000000100000000 - 0000000250000000 (usable)
(XEN) New Xen image base address: 0xad200000
(XEN) ACPI: RSDP 000FE300, 0024 (r2 DELL  )
(XEN) ACPI: XSDT 1DFFEE18, 0074 (r1 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: FACP 18FEFD98, 00F4 (r4 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DSDT 18FA9018, 6373 (r1 DELL    CBX3           0 INTL 20091112)
(XEN) ACPI: FACS 18FF1F40, 0040
(XEN) ACPI: APIC 1DFFDC18, 0158 (r2 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: MCFG 18FFED18, 003C (r1 A M I  OEMMCFG.  6222004 MSFT       97)
(XEN) ACPI: TCPA 18FFEC98, 0032 (r2                        0             0)
(XEN) ACPI: SSDT 18FF0A98, 0306 (r1 DELLTP      TPM     3000 INTL 20091112)
(XEN) ACPI: HPET 18FFEC18, 0038 (r1 A M I   PCHHPET  6222004 AMI.        3)
(XEN) ACPI: BOOT 18FFEB98, 0028 (r1 DELL   CBX3      6222004 AMI     10013)
(XEN) ACPI: SSDT 18FB0018, 36FFE (r2  INTEL    CpuPm     4000 INTL 20091112)
(XEN) ACPI: SLIC 18FEEC18, 0176 (r3 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DMAR 18FF1B18, 0094 (r1 A M I   OEMDMAR        1 INTL        1)
(XEN) System RAM: 8149MB (8345288kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000250000000
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 45 (0x2d), Stepping 7 (raw 000206d7)
(XEN) found SMP MP-table at 000f1db0
(XEN) DMI 2.6 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408 (32 bits)
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:404,1:0], pm1x_evt[1:400,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 18ffdf40/0000000018ff1f40, using 32
(XEN) ACPI:             wakeup_vec[18ffdf4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x11] lapic_id[0x10] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x12] lapic_id[0x11] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x13] lapic_id[0x12] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x14] lapic_id[0x13] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x15] lapic_id[0x14] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x16] lapic_id[0x15] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x17] lapic_id[0x16] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x18] lapic_id[0x17] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x19] lapic_id[0x18] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x19] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x1a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x1b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x1c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x1d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x1e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x20] lapic_id[0x1f] disabled)
(XEN) ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec3f000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec3f000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) [VT-D]Host address width 46
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fbffe000
(XEN) [VT-D]drhd->address = fbffe000 iommu->reg = ffff82c00021b000
(XEN) [VT-D]cap = d2078c106f0462 ecap = f020fa
(XEN) [VT-D] IOAPIC: 0000:00:1f.7
(XEN) [VT-D] IOAPIC: 0000:00:05.4
(XEN) [VT-D] MSI HPET: 0000:f0:0f.0
(XEN) [VT-D]  flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:1d.0
(XEN) [VT-D] endpoint: 0000:00:1a.0
(XEN) [VT-D]dmar.c:638:   RMRR region: base_addr ac7cf000 end_addr ac7defff
(XEN) [VT-D]found ACPI_DMAR_RHSA:
(XEN) [VT-D]  rhsau->address: fbffe000 rhsau->proximity_domain: 0
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 32 CPUs (28 hotplug CPUs)
(XEN) IRQ limits: 48 GSI, 736 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) xstate: size: 0x340 and states: 0x7
(XEN) mce_intel.c:763: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, SER, CMCI
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2992.801 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0804262c0 -> ffff82d080426a34
(XEN) PCI: MCFG configuration 0: base b0000000 segment 0000 buses 00 - 3f
(XEN) PCI: MCFG area at b0000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-3f
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 9
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Allocated console ring of 32 KiB.
(XEN) mwait-idle: MWAIT substates: 0x21120
(XEN) mwait-idle: v0.4.1 model 0x2d
(XEN) mwait-idle: lapic_timer_reliable_states 0xffffffff
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 4 CPUs
(XEN) Testing NMI watchdog on all CPUs: ok
(XEN) Running stub recovery selftests...
(XEN) traps.c:1530: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08034b0fa
(XEN) traps.c:738: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d08034b0fa
(XEN) traps.c:1068: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08034b0fa
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 624 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) *** LOADING DOMAIN 0 ***
(XEN) ELF: phdr: paddr=0x200000 memsz=0x14d0dc8
(XEN) ELF: phdr: paddr=0x18d1000 memsz=0x6b05a8
(XEN) ELF: memory: 0x200000 -> 0x1f815a8
(XEN) ELF: note: GUEST_OS = "FreeBSD"
(XEN) ELF: note: GUEST_VERSION = "0x124f9d"
(XEN) ELF: note: XEN_VERSION = "xen-3.0"
(XEN) ELF: note: VIRT_BASE = 0xffffffff80000000
(XEN) ELF: note: PADDR_OFFSET = 0
(XEN) ELF: note: ENTRY = 0xffffffff80eba000
(XEN) ELF: note: HYPERCALL_PAGE = 0xffffffff80eb9000
(XEN) ELF: note: HV_START_LOW = 0xffff800000000000
(XEN) ELF: note: FEATURES = "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector"
(XEN) ELF: note: PAE_MODE = "yes"
(XEN) ELF: note: unknown (0xd)
(XEN) ELF: note: LOADER = "generic"
(XEN) ELF: note: SUSPEND_CANCEL = 0
(XEN) ELF: note: BSD_SYMTAB = "yes"
(XEN) ELF: note: PHYS32_ENTRY = 0xeba030
(XEN) ELF: using notes from SHT_NOTE section
(XEN) ELF: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff80200000
(XEN)     virt_kend        = 0xffffffff82272768
(XEN)     virt_entry       = 0xffffffff80eba000
(XEN)     p2m_base         = 0xffffffffffffffff
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x200000 -> 0x1f815a8
(XEN)  Dom0 symbol map 0x1f815a8 -> 0x2272768
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000244000000->0000000248000000 (1031217 pages to be allocated)
(XEN)  Init. ramdisk: 000000024fc31000->0000000250000000
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff80200000->ffffffff82272768
(XEN)  Init. ramdisk: ffffffff82273000->ffffffff82642000
(XEN)  Phys-Mach map: ffffffff82642000->ffffffff82e42000
(XEN)  Start info:    ffffffff82e42000->ffffffff82e424b4
(XEN)  Page tables:   ffffffff82e43000->ffffffff82e5e000
(XEN)  Boot stack:    ffffffff82e5e000->ffffffff82e5f000
(XEN)  TOTAL:         ffffffff80000000->ffffffff83000000
(XEN)  ENTRY ADDRESS: ffffffff80eba000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) ELF: phdr 2 at 0xffffffff80200000 -> 0xffffffff816d0dc8
(XEN) ELF: phdr 3 at 0xffffffff818d1000 -> 0xffffffff81a1a7b8
(XEN) [VT-D]d0:Hostbridge: skip 0000:00:00.0 map
(XEN) Masked UR signaling on 0000:00:00.0
(XEN) Masked UR signaling on 0000:00:01.0
(XEN) Masked UR signaling on 0000:00:01.1
(XEN) Masked UR signaling on 0000:00:02.0
(XEN) Masked UR signaling on 0000:00:03.0
(XEN) [VT-D]d0:PCIe: map 0000:00:05.0
(XEN) Masked VT-d error signaling on 0000:00:05.0
(XEN) [VT-D]d0:PCIe: map 0000:00:05.2
(XEN) [VT-D]d0:PCI: map 0000:00:05.4
(XEN) [VT-D]d0:PCI: map 0000:00:16.0
(XEN) [VT-D]d0:PCI: map 0000:00:19.0
(XEN) [VT-D]d0:PCI: map 0000:00:1a.0
(XEN) [VT-D]d0:PCIe: map 0000:00:1b.0
(XEN) [VT-D]d0:PCI: map 0000:00:1d.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.2
(XEN) [VT-D]d0:PCI: map 0000:00:1f.3
(XEN) [VT-D]d0:PCIe: map 0000:03:00.0
(XEN) [VT-D]d0:PCIe: map 0000:03:00.1
(XEN) [VT-D]d0:PCIe: map 0000:05:00.0
(XEN) [VT-D]d0:PCIe: map 0000:05:00.3
(XEN) [VT-D]d0:PCIe: map 0000:07:00.0
(XEN) [VT-D]d0:PCI: map 0000:3f:08.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:08.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:08.4
(XEN) [VT-D]d0:PCI: map 0000:3f:09.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:09.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:09.4
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.2
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.3
(XEN) [VT-D]d0:PCI: map 0000:3f:0b.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0b.3
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.6
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.7
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.6
(XEN) [VT-D]d0:PCI: map 0000:3f:0e.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0e.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.2
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.4
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.5
(XEN) [VT-D]d0:PCI: map 0000:3f:0f.6
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.2
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.4
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.5
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.6
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.7
(XEN) [VT-D]d0:PCI: map 0000:3f:11.0
(XEN) [VT-D]d0:PCI: map 0000:3f:13.0
(XEN) [VT-D]d0:PCI: map 0000:3f:13.1
(XEN) [VT-D]d0:PCI: map 0000:3f:13.4
(XEN) [VT-D]d0:PCI: map 0000:3f:13.5
(XEN) [VT-D]d0:PCI: map 0000:3f:13.6
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-31  7:32                   ` Chao Gao
  2017-08-31  8:53                     ` Roger Pau Monne
@ 2017-08-31  9:03                     ` Roger Pau Monne
  2017-08-31  8:45                       ` Chao Gao
  1 sibling, 1 reply; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-31  9:03 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
> On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
> >On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
> >> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> >> > Sent: Friday, August 25, 2017 9:59 PM
> >> > 
> >> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> >> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> >> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> >> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> >> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> >> > > >> >> > --- a/xen/arch/x86/dom0_build.c
> >> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
> >> > > >> >> > @@ -440,6 +440,10 @@ int __init
> >> > dom0_setup_permissions(struct domain *d)
> >> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >> > > >> >> >      }
> >> > > >> >> >
> >> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
> >> > > >> >> > +    if ( dom0_pvh )
> >> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> >> > > >> >>
> >> > > >> >> What about ones reported by Dom0 later on? Which then raises the
> >> > > >> >> question whether ...
> >> > > >> >
> >> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
> >> > handler.
> >> > > >> > But since you propose to do white listing, I guess it doesn't matter
> >> > > >> > that much anymore.
> >> > > >>
> >> > > >> Well, a fundamental question is whether white listing would work in
> >> > > >> the first place. I could see room for severe problems e.g. with ACPI
> >> > > >> methods wanting to access MMIO that's not described by any PCI
> >> > > >> devices' BARs. Typically that would be regions in the chipset which
> >> > > >> firmware is responsible for configuring/managing, the addresses of
> >> > > >> which can be found/set in custom config space registers.
> >> > > >
> >> > > > The question would also be what would Xen allow in such white-listing.
> >> > > > Obviously you can get to map the same using both white-list and
> >> > > > black-listing (see below).
> >> > >
> >> > > Not really - what you've said there regarding MMCFG regions is
> >> > > a clear indication that we should _not_ map reserved regions, i.e.
> >> > > it would need to be full white listing with perhaps just the PCI
> >> > > device BARs being handled automatically.
> >> > 
> >> > I've tried just mapping the BARs and that sadly doesn't work, the box
> >> > hangs after the IOMMU is enabled:
> >> > 
> >> > [...]
> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> >> > 
> >> > I will park this ATM and leave it for the Intel guys to diagnose.
> >> > 
> >> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> >> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
> >> > 
> >> 
> >> +Chao who can help check whether we have such a box at hand.
> >> 
> >> btw please also give your BIOS version.
> >
> >It's a Precision T3600 BIOS A14.
> 
> Hi, Roger.
> 
> I found a Ivy bridge box with E5-2697 v2 and tested with "dom0=pvh", and

The ones I've seen issues with are Sandy Bridge or Nehalem; can you
find any such hardware?

I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
work just fine.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-31  8:45                       ` Chao Gao
@ 2017-08-31 10:09                         ` Roger Pau Monne
  2017-09-04  6:25                           ` Chao Gao
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monne @ 2017-08-31 10:09 UTC (permalink / raw)
  To: Chao Gao, Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

On Thu, Aug 31, 2017 at 04:45:23PM +0800, Chao Gao wrote:
> On Thu, Aug 31, 2017 at 10:03:19AM +0100, Roger Pau Monne wrote:
> >On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
> >> On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
> >> >On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
> >> >> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> >> >> > Sent: Friday, August 25, 2017 9:59 PM
> >> >> > 
> >> >> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> >> >> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> >> >> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> >> >> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> >> >> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >> >> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> >> >> > > >> >> > --- a/xen/arch/x86/dom0_build.c
> >> >> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
> >> >> > > >> >> > @@ -440,6 +440,10 @@ int __init
> >> >> > dom0_setup_permissions(struct domain *d)
> >> >> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >> >> > > >> >> >      }
> >> >> > > >> >> >
> >> >> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
> >> >> > > >> >> > +    if ( dom0_pvh )
> >> >> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
> >> >> > > >> >>
> >> >> > > >> >> What about ones reported by Dom0 later on? Which then raises the
> >> >> > > >> >> question whether ...
> >> >> > > >> >
> >> >> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
> >> >> > handler.
> >> >> > > >> > But since you propose to do white listing, I guess it doesn't matter
> >> >> > > >> > that much anymore.
> >> >> > > >>
> >> >> > > >> Well, a fundamental question is whether white listing would work in
> >> >> > > >> the first place. I could see room for severe problems e.g. with ACPI
> >> >> > > >> methods wanting to access MMIO that's not described by any PCI
> >> >> > > >> devices' BARs. Typically that would be regions in the chipset which
> >> >> > > >> firmware is responsible for configuring/managing, the addresses of
> >> >> > > >> which can be found/set in custom config space registers.
> >> >> > > >
> >> >> > > > The question would also be what would Xen allow in such white-listing.
> >> >> > > > Obviously you can get to map the same using both white-list and
> >> >> > > > black-listing (see below).
> >> >> > >
> >> >> > > Not really - what you've said there regarding MMCFG regions is
> >> >> > > a clear indication that we should _not_ map reserved regions, i.e.
> >> >> > > it would need to be full white listing with perhaps just the PCI
> >> >> > > device BARs being handled automatically.
> >> >> > 
> >> >> > I've tried just mapping the BARs and that sadly doesn't work, the box
> >> >> > hangs after the IOMMU is enabled:
> >> >> > 
> >> >> > [...]
> >> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
> >> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
> >> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
> >> >> > 
> >> >> > I will park this ATM and leave it for the Intel guys to diagnose.
> >> >> > 
> >> >> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> >> >> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
> >> >> > 
> >> >> 
> >> >> +Chao who can help check whether we have such a box at hand.
> >> >> 
> >> >> btw please also give your BIOS version.
> >> >
> >> >It's a Precision T3600 BIOS A14.
> >> 
> >> Hi, Roger.
> >> 
> >> I found a Ivy bridge box with E5-2697 v2 and tested with "dom0=pvh", and
> >
> >The ones I've seen issues with are Sandy Bridge or Nehalem, can you
> >find some of this hardware?
> 
> As I expected, I was removed from the recipients :(, which made it
> hard for me to notice your replies in time.

Sorry, I have no idea why my MUA does that; it seems to deal fine
with other recipients.

> Yes, I will, but it may take some time (even Ivy Bridge boxes are rare).
> 
> >
> >I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
> >work just fine.
> 
> The reason I chose Ivy Bridge is partly that you said you had found this
> bug on almost all pre-Haswell boxes.

I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
(in fact I didn't even know about Ivy Bridge, that's why I said all
pre-Haswell).

In fact I'm now trying with a Nehalem processor that seems to work, so
whatever this issue is, it certainly doesn't affect all models or
chipsets.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-08-31 10:09                         ` Roger Pau Monne
@ 2017-09-04  6:25                           ` Chao Gao
  2017-09-04  9:00                             ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Chao Gao @ 2017-09-04  6:25 UTC (permalink / raw)
  To: Roger Pau Monne, Tian, Kevin, Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
>On Thu, Aug 31, 2017 at 04:45:23PM +0800, Chao Gao wrote:
>> On Thu, Aug 31, 2017 at 10:03:19AM +0100, Roger Pau Monne wrote:
>> >On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
>> >> On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
>> >> >On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
>> >> >> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
>> >> >> > Sent: Friday, August 25, 2017 9:59 PM
>> >> >> > 
>> >> >> > On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
>> >> >> > > >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
>> >> >> > > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
>> >> >> > > >> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
>> >> >> > > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> >> >> > > >> >> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
>> >> >> > > >> >> > --- a/xen/arch/x86/dom0_build.c
>> >> >> > > >> >> > +++ b/xen/arch/x86/dom0_build.c
>> >> >> > > >> >> > @@ -440,6 +440,10 @@ int __init
>> >> >> > dom0_setup_permissions(struct domain *d)
>> >> >> > > >> >> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> >> >> > > >> >> >      }
>> >> >> > > >> >> >
>> >> >> > > >> >> > +    /* For PVH prevent access to the MMCFG areas. */
>> >> >> > > >> >> > +    if ( dom0_pvh )
>> >> >> > > >> >> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>> >> >> > > >> >>
>> >> >> > > >> >> What about ones reported by Dom0 later on? Which then raises the
>> >> >> > > >> >> question whether ...
>> >> >> > > >> >
>> >> >> > > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
>> >> >> > handler.
>> >> >> > > >> > But since you propose to do white listing, I guess it doesn't matter
>> >> >> > > >> > that much anymore.
>> >> >> > > >>
>> >> >> > > >> Well, a fundamental question is whether white listing would work in
>> >> >> > > >> the first place. I could see room for severe problems e.g. with ACPI
>> >> >> > > >> methods wanting to access MMIO that's not described by any PCI
>> >> >> > > >> devices' BARs. Typically that would be regions in the chipset which
>> >> >> > > >> firmware is responsible for configuring/managing, the addresses of
>> >> >> > > >> which can be found/set in custom config space registers.
>> >> >> > > >
>> >> >> > > > The question would also be what would Xen allow in such white-listing.
>> >> >> > > > Obviously you can get to map the same using both white-list and
>> >> >> > > > black-listing (see below).
>> >> >> > >
>> >> >> > > Not really - what you've said there regarding MMCFG regions is
>> >> >> > > a clear indication that we should _not_ map reserved regions, i.e.
>> >> >> > > it would need to be full white listing with perhaps just the PCI
>> >> >> > > device BARs being handled automatically.
>> >> >> > 
>> >> >> > I've tried just mapping the BARs and that sadly doesn't work, the box
>> >> >> > hangs after the IOMMU is enabled:
>> >> >> > 
>> >> >> > [...]
>> >> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
>> >> >> > (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
>> >> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
>> >> >> > 
>> >> >> > I will park this ATM and leave it for the Intel guys to diagnose.
>> >> >> > 
>> >> >> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
>> >> >> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
>> >> >> > 
>> >> >> 
>> >> >> +Chao who can help check whether we have such a box at hand.
>> >> >> 
>> >> >> btw please also give your BIOS version.
>> >> >
>> >> >It's a Precision T3600 BIOS A14.
>> >> 
>> >> Hi, Roger.
>> >> 
>> >> I found a Ivy bridge box with E5-2697 v2 and tested with "dom0=pvh", and
>> >
>> >The ones I've seen issues with are Sandy Bridge or Nehalem, can you
>> >find some of this hardware?
>> 
>> As I expected, I was removed from recipents :(, which made me
>> hard to notice your replies in time. 
>
>Sorry, I have no idea why my MUA does that, it seems to be able to
>deal fine with other recipients.
>
>> Yes. I will. But may take some time (for even Ivy Bridge is rare).
>> 
>> >
>> >I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
>> >work just fine.
>> 
>> The reason why I chose Ivy Bridge partly is you said you found this bug on
>> almost pre-haswell box.
>
>I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
>(in fact I didn't even know about Ivy Bridge, that's why I said all
>pre-Haswell).
>
>In fact I'm now trying with a Nehalem processor that seem to work, so
>whatever this issue is it certainly doesn't affect all models or
>chipsets.

Hi, Roger.

Last week, I borrowed a Sandy Bridge box with an Intel(R) Xeon(R) E5-2690
@ 2.7GHz and tested with 'dom0=pvh', but I didn't see the machine hang.

I also tested on Haswell and found that the RMRRs in the DMAR are
incorrect on my Haswell box. The e820 on that machine is:
(XEN) [    0.000000] Xen-e820 RAM map:
(XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
(XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
(XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
(XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
(XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
(XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
(XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
(XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
(XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
(XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
(XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)

And the RMRRs in DMAR are:
(XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
(XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
(XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000
end_addr 7a3f3fff
(XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
(XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
(XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
(XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000
end_addr 723aefff
(Endpoint 05:00.0 is a RAID bus controller. Endpoints 00:1d.0 and 00:1a.0
are USB controllers.)

After DMA remapping is enabled, two DMA translation faults are reported
by VT-d:
(XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg =
ffff82c00021b000
(XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg =
ffff82c00021d000
(XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary
Pending Fault
(XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0]
fault addr 7a3f5000, iommu reg = ffff82c00021d000
(XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn
7a3f5
(XEN) [    9.561179]     root_entry[00] = 107277c001
(XEN) [    9.562447]     context[d0] = 2_1072c06001
(XEN) [    9.563776]     l4[000] = 9c0000202f171107
(XEN) [    9.565125]     l3[001] = 9c0000202f152107
(XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
(XEN) [    9.567821]     l1[1f5] = 8000000000000000
(XEN) [    9.569168]     l1[1f5] not present
(XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0]
fault addr 7a3f4000, iommu reg = ffff82c00021d000
(XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn
7a3f4
(XEN) [    9.575819]     root_entry[00] = 107277c001
(XEN) [    9.577129]     context[e8] = 2_1072c06001
(XEN) [    9.578439]     l4[000] = 9c0000202f171107
(XEN) [    9.579778]     l3[001] = 9c0000202f152107
(XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
(XEN) [    9.582482]     l1[1f4] = 8000000000000000
(XEN) [    9.583812]     l1[1f4] not present
(XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
(XEN) [   10.521499] Failed to load Dom0 kernel
(XEN) [   10.532171] 
(XEN) [   10.535464] ****************************************
(XEN) [   10.542636] Panic on CPU 0:
(XEN) [   10.547394] Could not set up DOM0 guest OS
(XEN) [   10.553605] ****************************************

The fault addresses the devices failed to access are marked as reserved in
the e820 but aren't reserved for those devices by any RMRR in the DMAR.
So I think we can conclude that some existing BIOSes don't expose correct
RMRRs to the OS via the DMAR table, and that we need a workaround such as
iommu_inclusive_mapping to deal with this kind of BIOS for both PV Dom0
and PVH Dom0.
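
For what it's worth, the mismatch is easy to replay offline. Below is a
minimal standalone C sketch (not Xen code; only the hardcoded ranges are
taken from the logs above, the rest is illustration) that checks a fault
address first against the RMRRs and then against the reserved e820 holes:

#include <inttypes.h>
#include <stdio.h>

struct range { uint64_t base, end; const char *what; };

/* Inclusive ranges copied from the DMAR/e820 dumps above
 * (e820 ends converted from exclusive to inclusive). */
static const struct range rmrrs[] = {
    { 0x723b4000, 0x7a3f3fff, "RMRR for 0000:05:00.0" },
    { 0x723ac000, 0x723aefff, "RMRR for 0000:00:1d.0/00:1a.0" },
};
static const struct range e820_resv[] = {
    { 0x6ff84000, 0x7ac50fff, "e820 reserved" },
    { 0x7b800000, 0x8fffffff, "e820 reserved" },
};

static void classify(uint64_t addr)
{
    unsigned int i;

    for ( i = 0; i < sizeof(rmrrs) / sizeof(rmrrs[0]); i++ )
        if ( addr >= rmrrs[i].base && addr <= rmrrs[i].end )
        {
            printf("%#" PRIx64 ": covered by %s\n", addr, rmrrs[i].what);
            return;
        }

    for ( i = 0; i < sizeof(e820_resv) / sizeof(e820_resv[0]); i++ )
        if ( addr >= e820_resv[i].base && addr <= e820_resv[i].end )
        {
            printf("%#" PRIx64 ": e820 reserved but in no RMRR\n", addr);
            return;
        }

    printf("%#" PRIx64 ": not reserved anywhere\n", addr);
}

int main(void)
{
    classify(0x7a3f5000);    /* fault address of 0000:00:1a.0 */
    classify(0x7a3f4000);    /* fault address of 0000:00:1d.0 */
    return 0;
}

Both fault addresses sit just past the end of the 0000:05:00.0 RMRR,
inside an e820 reserved hole, which is exactly the broken-DMAR situation
iommu_inclusive_mapping papers over.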

As to the machine hang Roger observed, I have no idea about the cause.
Roger, have you ever seen the VT-d on that machine report a DMA
translation fault? If not, can you trigger one on native? I think this
would tell us whether the hardware's fault reporting works properly or
whether there are bugs in the Xen code. What is your opinion on this
experiment?

Thanks
chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-09-04  9:26                               ` Roger Pau Monné
@ 2017-09-04  8:52                                 ` Chao Gao
  2017-09-04 15:06                                   ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Chao Gao @ 2017-09-04  8:52 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Tian, Kevin, Jan Beulich, xen-devel

On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
>(Adding Chao again because my MUA seems to drop him each time)
>
>On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
>> On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
>> > On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
>> > >I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
>> > >(in fact I didn't even know about Ivy Bridge, that's why I said all
>> > >pre-Haswell).
>> > >
>> > >In fact I'm now trying with a Nehalem processor that seem to work, so
>> > >whatever this issue is it certainly doesn't affect all models or
>> > >chipsets.
>> > 
>> > Hi, Roger.
>> > 
>> > Last week, I borrowed a Sandy Bridge with Intel(R) Xeon(R) E5-2690
>> > 2.7GHz and tested with 'dom0=pvh'. But I didn't see the machine hang.
>> > 
>> > I also tested on Haswell and found RMRRs in dmar are incorrect on my
>> > haswell. The e820 on that machine is:
>> > (XEN) [    0.000000] Xen-e820 RAM map:
>> > (XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
>> > (XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
>> > (XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
>> > (XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
>> > (XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
>> > (XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
>> > (XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
>> > (XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
>> > (XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
>> > (XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
>> > (XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
>> > (XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)
>> > 
>> > And the RMRRs in DMAR are:
>> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
>> > (XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
>> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000
>> > end_addr 7a3f3fff
>> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
>> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
>> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
>> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000
>> > end_addr 723aefff
>> > (Endpoint 05:00.0 is a RAID bus controller. Endpoints 00.1d.0 and 00.1a.0
>> > are USB controllers.)
>> > 
>> > After DMA remapping is enabled, two DMA translation faults are reported
>> > by VT-d:
>> > (XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg =
>> > ffff82c00021b000
>> > (XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg =
>> > ffff82c00021d000
>> > (XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary
>> > Pending Fault
>> > (XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0]
>> > fault addr 7a3f5000, iommu reg = ffff82c00021d000
>> > (XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
>> > (XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn
>> > 7a3f5
>> > (XEN) [    9.561179]     root_entry[00] = 107277c001
>> > (XEN) [    9.562447]     context[d0] = 2_1072c06001
>> > (XEN) [    9.563776]     l4[000] = 9c0000202f171107
>> > (XEN) [    9.565125]     l3[001] = 9c0000202f152107
>> > (XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
>> > (XEN) [    9.567821]     l1[1f5] = 8000000000000000
>> > (XEN) [    9.569168]     l1[1f5] not present
>> > (XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0]
>> > fault addr 7a3f4000, iommu reg = ffff82c00021d000
>> > (XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
>> > (XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn
>> > 7a3f4
>> > (XEN) [    9.575819]     root_entry[00] = 107277c001
>> > (XEN) [    9.577129]     context[e8] = 2_1072c06001
>> > (XEN) [    9.578439]     l4[000] = 9c0000202f171107
>> > (XEN) [    9.579778]     l3[001] = 9c0000202f152107
>> > (XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
>> > (XEN) [    9.582482]     l1[1f4] = 8000000000000000
>> > (XEN) [    9.583812]     l1[1f4] not present
>> > (XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
>> > (XEN) [   10.521499] Failed to load Dom0 kernel
>> > (XEN) [   10.532171] 
>> > (XEN) [   10.535464] ****************************************
>> > (XEN) [   10.542636] Panic on CPU 0:
>> > (XEN) [   10.547394] Could not set up DOM0 guest OS
>> > (XEN) [   10.553605] ****************************************
>> > 
>> > The fault address the devices failed to access is marked as reserved in
>> > e820 and isn't reserved for the devices according to the RMRRs in DMAR.
>> > So I think we can draw a conclusion that some existing BIOSs don't
>> > expose correct RMRR to OS by DMAR. And we need a workaround such as
>> > iommu_inclusive_mapping to deal with such kind of BIOS for both pv dom0
>> > and pvh dom0.
>> 
>> So your box seems to be capable of generating faults. Missing RMRR
>> regions is (sadly) expected, but at least you get faults and not a
>> complete hang. Which chipset does this box have? Is it a C600/X79?

No. The Haswell box's chipset is the C610/X99.

>> 
>> > 
>> > As to the machine hang Roger observed, I have no idea on the cause. Roger,
>> > have you ever seen the VT-d on that machine reporting a DMA
>> > translation fault? If not, can you create one fault in native? I think
>> > this can tell us whether the hardware's fault report function works well
>> > or there are some bugs in Xen code. What is your opinion on this trial?
>> 
>> Is there any simple way to create such a fault? Does the IOMMU have
>> some kind of self-test thing that can be used to generate a synthetic
>> fault?

I don't know of such a tool. Maybe you can hack the IOMMU driver.
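
Lacking a ready-made tool, the simplest hack I can think of is to clear a
present leaf PTE in the VT-d page table of a page some device is known to
DMA to, flush the IOTLB, and wait for the fault. A rough C sketch of the
idea (the two helpers are hypothetical stand-ins, not real Xen functions):

/*
 * Hypothetical sketch: iommu_lookup_pte() and iommu_flush_all() are
 * made-up names for whatever the VT-d driver uses internally.
 */
static void force_synthetic_vtd_fault(struct domain *d, uint64_t gfn)
{
    uint64_t *pte = iommu_lookup_pte(d, gfn);   /* hypothetical */

    if ( !pte )
        return;

    *pte &= ~1ULL;           /* clear the read-permission bit */
    iommu_flush_all(d);      /* hypothetical: drop cached translations */

    /*
     * The next DMA read of gfn from a device behind this IOMMU should
     * now raise a "PTE Read access is not set" fault like the ones
     * above, which would confirm the fault-reporting path works.
     */
}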

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-09-04  6:25                           ` Chao Gao
@ 2017-09-04  9:00                             ` Roger Pau Monné
  2017-09-04  9:26                               ` Roger Pau Monné
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2017-09-04  9:00 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
> On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
> >I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
> >(in fact I didn't even know about Ivy Bridge, that's why I said all
> >pre-Haswell).
> >
> >In fact I'm now trying with a Nehalem processor that seem to work, so
> >whatever this issue is it certainly doesn't affect all models or
> >chipsets.
> 
> Hi, Roger.
> 
> Last week, I borrowed a Sandy Bridge with Intel(R) Xeon(R) E5-2690
> 2.7GHz and tested with 'dom0=pvh'. But I didn't see the machine hang.
> 
> I also tested on Haswell and found RMRRs in dmar are incorrect on my
> haswell. The e820 on that machine is:
> (XEN) [    0.000000] Xen-e820 RAM map:
> (XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
> (XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
> (XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
> (XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
> (XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
> (XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
> (XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
> (XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
> (XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
> (XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
> (XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
> (XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)
> 
> And the RMRRs in DMAR are:
> (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
> (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000
> end_addr 7a3f3fff
> (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
> (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
> (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000
> end_addr 723aefff
> (Endpoint 05:00.0 is a RAID bus controller. Endpoints 00.1d.0 and 00.1a.0
> are USB controllers.)
> 
> After DMA remapping is enabled, two DMA translation faults are reported
> by VT-d:
> (XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg =
> ffff82c00021b000
> (XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg =
> ffff82c00021d000
> (XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary
> Pending Fault
> (XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0]
> fault addr 7a3f5000, iommu reg = ffff82c00021d000
> (XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
> (XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn
> 7a3f5
> (XEN) [    9.561179]     root_entry[00] = 107277c001
> (XEN) [    9.562447]     context[d0] = 2_1072c06001
> (XEN) [    9.563776]     l4[000] = 9c0000202f171107
> (XEN) [    9.565125]     l3[001] = 9c0000202f152107
> (XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
> (XEN) [    9.567821]     l1[1f5] = 8000000000000000
> (XEN) [    9.569168]     l1[1f5] not present
> (XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0]
> fault addr 7a3f4000, iommu reg = ffff82c00021d000
> (XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
> (XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn
> 7a3f4
> (XEN) [    9.575819]     root_entry[00] = 107277c001
> (XEN) [    9.577129]     context[e8] = 2_1072c06001
> (XEN) [    9.578439]     l4[000] = 9c0000202f171107
> (XEN) [    9.579778]     l3[001] = 9c0000202f152107
> (XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
> (XEN) [    9.582482]     l1[1f4] = 8000000000000000
> (XEN) [    9.583812]     l1[1f4] not present
> (XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
> (XEN) [   10.521499] Failed to load Dom0 kernel
> (XEN) [   10.532171] 
> (XEN) [   10.535464] ****************************************
> (XEN) [   10.542636] Panic on CPU 0:
> (XEN) [   10.547394] Could not set up DOM0 guest OS
> (XEN) [   10.553605] ****************************************
> 
> The fault address the devices failed to access is marked as reserved in
> e820 and isn't reserved for the devices according to the RMRRs in DMAR.
> So I think we can draw a conclusion that some existing BIOSs don't
> expose correct RMRR to OS by DMAR. And we need a workaround such as
> iommu_inclusive_mapping to deal with such kind of BIOS for both pv dom0
> and pvh dom0.

So your box seems to be capable of generating faults. Missing RMRR
regions are (sadly) expected, but at least you get faults and not a
complete hang. Which chipset does this box have? Is it a C600/X79?
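
(One quick way to check is to look at the ISA/LPC bridge with plain
lspci:

    lspci -nn | grep -i lpc

For Intel PCHs the chipset name is usually part of the LPC controller's
device string, e.g. "C600/X79 series chipset LPC Controller".)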

> 
> As to the machine hang Roger observed, I have no idea on the cause. Roger,
> have you ever seen the VT-d on that machine reporting a DMA
> translation fault? If not, can you create one fault in native? I think
> this can tell us whether the hardware's fault report function works well
> or there are some bugs in Xen code. What is your opinion on this trial?

Is there any simple way to create such a fault? Does the IOMMU have
some kind of self-test thing that can be used to generate a synthetic
fault?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-09-04  9:00                             ` Roger Pau Monné
@ 2017-09-04  9:26                               ` Roger Pau Monné
  2017-09-04  8:52                                 ` Chao Gao
  0 siblings, 1 reply; 45+ messages in thread
From: Roger Pau Monné @ 2017-09-04  9:26 UTC (permalink / raw)
  To: Chao Gao, Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

(Adding Chao again because my MUA seems to drop him each time)

On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
> On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
> > On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
> > >I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
> > >(in fact I didn't even know about Ivy Bridge, that's why I said all
> > >pre-Haswell).
> > >
> > >In fact I'm now trying with a Nehalem processor that seem to work, so
> > >whatever this issue is it certainly doesn't affect all models or
> > >chipsets.
> > 
> > Hi, Roger.
> > 
> > Last week, I borrowed a Sandy Bridge with Intel(R) Xeon(R) E5-2690
> > 2.7GHz and tested with 'dom0=pvh'. But I didn't see the machine hang.
> > 
> > I also tested on Haswell and found RMRRs in dmar are incorrect on my
> > haswell. The e820 on that machine is:
> > (XEN) [    0.000000] Xen-e820 RAM map:
> > (XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
> > (XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
> > (XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
> > (XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
> > (XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
> > (XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
> > (XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
> > (XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
> > (XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
> > (XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
> > (XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
> > (XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)
> > 
> > And the RMRRs in DMAR are:
> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000
> > end_addr 7a3f3fff
> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000
> > end_addr 723aefff
> > (Endpoint 05:00.0 is a RAID bus controller. Endpoints 00.1d.0 and 00.1a.0
> > are USB controllers.)
> > 
> > After DMA remapping is enabled, two DMA translation faults are reported
> > by VT-d:
> > (XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg =
> > ffff82c00021b000
> > (XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg =
> > ffff82c00021d000
> > (XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary
> > Pending Fault
> > (XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0]
> > fault addr 7a3f5000, iommu reg = ffff82c00021d000
> > (XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
> > (XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn
> > 7a3f5
> > (XEN) [    9.561179]     root_entry[00] = 107277c001
> > (XEN) [    9.562447]     context[d0] = 2_1072c06001
> > (XEN) [    9.563776]     l4[000] = 9c0000202f171107
> > (XEN) [    9.565125]     l3[001] = 9c0000202f152107
> > (XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
> > (XEN) [    9.567821]     l1[1f5] = 8000000000000000
> > (XEN) [    9.569168]     l1[1f5] not present
> > (XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0]
> > fault addr 7a3f4000, iommu reg = ffff82c00021d000
> > (XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
> > (XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn
> > 7a3f4
> > (XEN) [    9.575819]     root_entry[00] = 107277c001
> > (XEN) [    9.577129]     context[e8] = 2_1072c06001
> > (XEN) [    9.578439]     l4[000] = 9c0000202f171107
> > (XEN) [    9.579778]     l3[001] = 9c0000202f152107
> > (XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
> > (XEN) [    9.582482]     l1[1f4] = 8000000000000000
> > (XEN) [    9.583812]     l1[1f4] not present
> > (XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
> > (XEN) [   10.521499] Failed to load Dom0 kernel
> > (XEN) [   10.532171] 
> > (XEN) [   10.535464] ****************************************
> > (XEN) [   10.542636] Panic on CPU 0:
> > (XEN) [   10.547394] Could not set up DOM0 guest OS
> > (XEN) [   10.553605] ****************************************
> > 
> > The fault address the devices failed to access is marked as reserved in
> > e820 and isn't reserved for the devices according to the RMRRs in DMAR.
> > So I think we can draw a conclusion that some existing BIOSs don't
> > expose correct RMRR to OS by DMAR. And we need a workaround such as
> > iommu_inclusive_mapping to deal with such kind of BIOS for both pv dom0
> > and pvh dom0.
> 
> So your box seems to be capable of generating faults. Missing RMRR
> regions is (sadly) expected, but at least you get faults and not a
> complete hang. Which chipset does this box have? Is it a C600/X79?
> 
> > 
> > As to the machine hang Roger observed, I have no idea on the cause. Roger,
> > have you ever seen the VT-d on that machine reporting a DMA
> > translation fault? If not, can you create one fault in native? I think
> > this can tell us whether the hardware's fault report function works well
> > or there are some bugs in Xen code. What is your opinion on this trial?
> 
> Is there any simple way to create such a fault? Does the IOMMU have
> some kind of self-test thing that can be used to generate a synthetic
> fault?
> 
> Thanks, Roger.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-09-04  8:52                                 ` Chao Gao
@ 2017-09-04 15:06                                   ` Roger Pau Monné
  2017-09-04 15:19                                     ` Roger Pau Monné
  2017-09-04 15:39                                     ` Jan Beulich
  0 siblings, 2 replies; 45+ messages in thread
From: Roger Pau Monné @ 2017-09-04 15:06 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

On Mon, Sep 04, 2017 at 04:52:35PM +0800, Chao Gao wrote:
> On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
> >(Adding Chao again because my MUA seems to drop him each time)
> >
> >On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
> >> On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
> >> > On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
> >> > >I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
> >> > >(in fact I didn't even know about Ivy Bridge, that's why I said all
> >> > >pre-Haswell).
> >> > >
> >> > >In fact I'm now trying with a Nehalem processor that seem to work, so
> >> > >whatever this issue is it certainly doesn't affect all models or
> >> > >chipsets.
> >> > 
> >> > Hi, Roger.
> >> > 
> >> > Last week, I borrowed a Sandy Bridge with Intel(R) Xeon(R) E5-2690
> >> > 2.7GHz and tested with 'dom0=pvh'. But I didn't see the machine hang.
> >> > 
> >> > I also tested on Haswell and found RMRRs in dmar are incorrect on my
> >> > haswell. The e820 on that machine is:
> >> > (XEN) [    0.000000] Xen-e820 RAM map:
> >> > (XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
> >> > (XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
> >> > (XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
> >> > (XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
> >> > (XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
> >> > (XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
> >> > (XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
> >> > (XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
> >> > (XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
> >> > (XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
> >> > (XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
> >> > (XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)
> >> > 
> >> > And the RMRRs in DMAR are:
> >> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> >> > (XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
> >> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000
> >> > end_addr 7a3f3fff
> >> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> >> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
> >> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
> >> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000
> >> > end_addr 723aefff
> >> > (Endpoint 05:00.0 is a RAID bus controller. Endpoints 00.1d.0 and 00.1a.0
> >> > are USB controllers.)
> >> > 
> >> > After DMA remapping is enabled, two DMA translation faults are reported
> >> > by VT-d:
> >> > (XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg =
> >> > ffff82c00021b000
> >> > (XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg =
> >> > ffff82c00021d000
> >> > (XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary
> >> > Pending Fault
> >> > (XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0]
> >> > fault addr 7a3f5000, iommu reg = ffff82c00021d000
> >> > (XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
> >> > (XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn
> >> > 7a3f5
> >> > (XEN) [    9.561179]     root_entry[00] = 107277c001
> >> > (XEN) [    9.562447]     context[d0] = 2_1072c06001
> >> > (XEN) [    9.563776]     l4[000] = 9c0000202f171107
> >> > (XEN) [    9.565125]     l3[001] = 9c0000202f152107
> >> > (XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
> >> > (XEN) [    9.567821]     l1[1f5] = 8000000000000000
> >> > (XEN) [    9.569168]     l1[1f5] not present
> >> > (XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0]
> >> > fault addr 7a3f4000, iommu reg = ffff82c00021d000
> >> > (XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
> >> > (XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn
> >> > 7a3f4
> >> > (XEN) [    9.575819]     root_entry[00] = 107277c001
> >> > (XEN) [    9.577129]     context[e8] = 2_1072c06001
> >> > (XEN) [    9.578439]     l4[000] = 9c0000202f171107
> >> > (XEN) [    9.579778]     l3[001] = 9c0000202f152107
> >> > (XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
> >> > (XEN) [    9.582482]     l1[1f4] = 8000000000000000
> >> > (XEN) [    9.583812]     l1[1f4] not present
> >> > (XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
> >> > (XEN) [   10.521499] Failed to load Dom0 kernel
> >> > (XEN) [   10.532171] 
> >> > (XEN) [   10.535464] ****************************************
> >> > (XEN) [   10.542636] Panic on CPU 0:
> >> > (XEN) [   10.547394] Could not set up DOM0 guest OS
> >> > (XEN) [   10.553605] ****************************************
> >> > 
> >> > The fault address the devices failed to access is marked as reserved in
> >> > e820 and isn't reserved for the devices according to the RMRRs in DMAR.
> >> > So I think we can draw a conclusion that some existing BIOSs don't
> >> > expose correct RMRR to OS by DMAR. And we need a workaround such as
> >> > iommu_inclusive_mapping to deal with such kind of BIOS for both pv dom0
> >> > and pvh dom0.
> >> 
> >> So your box seems to be capable of generating faults. Missing RMRR
> >> regions is (sadly) expected, but at least you get faults and not a
> >> complete hang. Which chipset does this box have? Is it a C600/X79?
> 
> No. The haswell's chipset is C610/x99. 

Can you try with the C600/X79 chipset? I'm afraid the issue is
probably more related to the chipset than to the CPU itself.

> >> 
> >> > 
> >> > As to the machine hang Roger observed, I have no idea on the cause. Roger,
> >> > have you ever seen the VT-d on that machine reporting a DMA
> >> > translation fault? If not, can you create one fault in native? I think
> >> > this can tell us whether the hardware's fault report function works well
> >> > or there are some bugs in Xen code. What is your opinion on this trial?
> >> 
> >> Is there any simple way to create such a fault? Does the IOMMU have
> >> some kind of self-test thing that can be used to generate a synthetic
> >> fault?
> 
> I don't know such tool. Maybe you can hack the iommu driver.

Hm, OK, it doesn't look very easy to implement something like this.
Will try to find some time, but I'm fairly busy ATM.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-09-04 15:06                                   ` Roger Pau Monné
@ 2017-09-04 15:19                                     ` Roger Pau Monné
  2017-09-04 15:39                                     ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Roger Pau Monné @ 2017-09-04 15:19 UTC (permalink / raw)
  To: Chao Gao, Tian, Kevin, Jan Beulich, Andrew Cooper, xen-devel

OK, I now know why my MUA doesn't add your email to the To or Cc when
replying: it's because your original email contains the following
header:

Mail-Followup-To: Roger Pau =?iso-8859-1?Q?Monn=E9?= <roger.pau@citrix.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	Jan Beulich <JBeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>

When replying, the addresses from that header are placed in the "To"
field, and as you can see your address is missing from the list. So
either add your address to it, or stop setting "Mail-Followup-To".
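
In case it helps, this is also tunable on the receiving side. A sketch of
the relevant muttrc knobs (assuming mutt; other MUAs will differ):

    # Ignore Mail-Followup-To when group-replying, keeping all
    # recipients on the To/Cc lines.
    set honor_followup_to = no

    # On the sending side, don't generate the header for list mail.
    set followup_to = no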

Roger.

On Mon, Sep 04, 2017 at 04:06:51PM +0100, Roger Pau Monné wrote:
> On Mon, Sep 04, 2017 at 04:52:35PM +0800, Chao Gao wrote:
> > On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
> > >(Adding Chao again because my MUA seems to drop him each time)
> > >
> > >On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
> > >> On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
> > >> > On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
> > >> > >I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
> > >> > >(in fact I didn't even know about Ivy Bridge, that's why I said all
> > >> > >pre-Haswell).
> > >> > >
> > >> > >In fact I'm now trying with a Nehalem processor that seem to work, so
> > >> > >whatever this issue is it certainly doesn't affect all models or
> > >> > >chipsets.
> > >> > 
> > >> > Hi, Roger.
> > >> > 
> > >> > Last week, I borrowed a Sandy Bridge with Intel(R) Xeon(R) E5-2690
> > >> > 2.7GHz and tested with 'dom0=pvh'. But I didn't see the machine hang.
> > >> > 
> > >> > I also tested on Haswell and found RMRRs in dmar are incorrect on my
> > >> > haswell. The e820 on that machine is:
> > >> > (XEN) [    0.000000] Xen-e820 RAM map:
> > >> > (XEN) [    0.000000]  0000000000000000 - 000000000009a400 (usable)
> > >> > (XEN) [    0.000000]  000000000009a400 - 00000000000a0000 (reserved)
> > >> > (XEN) [    0.000000]  00000000000e0000 - 0000000000100000 (reserved)
> > >> > (XEN) [    0.000000]  0000000000100000 - 000000006ff84000 (usable)
> > >> > (XEN) [    0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
> > >> > (XEN) [    0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
> > >> > (XEN) [    0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
> > >> > (XEN) [    0.000000]  000000007b7cf000 - 000000007b800000 (usable)
> > >> > (XEN) [    0.000000]  000000007b800000 - 0000000090000000 (reserved)
> > >> > (XEN) [    0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
> > >> > (XEN) [    0.000000]  00000000ff400000 - 0000000100000000 (reserved)
> > >> > (XEN) [    0.000000]  0000000100000000 - 0000002080000000 (usable)
> > >> > 
> > >> > And the RMRRs in DMAR are:
> > >> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> > >> > (XEN) [    0.000000] [VT-D] endpoint: 0000:05:00.0
> > >> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723b4000
> > >> > end_addr 7a3f3fff
> > >> > (XEN) [    0.000000] [VT-D]found ACPI_DMAR_RMRR:
> > >> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1d.0
> > >> > (XEN) [    0.000000] [VT-D] endpoint: 0000:00:1a.0
> > >> > (XEN) [    0.000000] [VT-D]dmar.c:638:   RMRR region: base_addr 723ac000
> > >> > end_addr 723aefff
> > >> > (Endpoint 05:00.0 is a RAID bus controller. Endpoints 00.1d.0 and 00.1a.0
> > >> > are USB controllers.)
> > >> > 
> > >> > After DMA remapping is enabled, two DMA translation faults are reported
> > >> > by VT-d:
> > >> > (XEN) [    9.547924] [VT-D]iommu_enable_translation: iommu->reg =
> > >> > ffff82c00021b000
> > >> > (XEN) [    9.550620] [VT-D]iommu_enable_translation: iommu->reg =
> > >> > ffff82c00021d000
> > >> > (XEN) [    9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary
> > >> > Pending Fault
> > >> > (XEN) [    9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0]
> > >> > fault addr 7a3f5000, iommu reg = ffff82c00021d000
> > >> > (XEN) [    9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
> > >> > (XEN) [    9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn
> > >> > 7a3f5
> > >> > (XEN) [    9.561179]     root_entry[00] = 107277c001
> > >> > (XEN) [    9.562447]     context[d0] = 2_1072c06001
> > >> > (XEN) [    9.563776]     l4[000] = 9c0000202f171107
> > >> > (XEN) [    9.565125]     l3[001] = 9c0000202f152107
> > >> > (XEN) [    9.566483]     l2[1d1] = 9c000010727ce107
> > >> > (XEN) [    9.567821]     l1[1f5] = 8000000000000000
> > >> > (XEN) [    9.569168]     l1[1f5] not present
> > >> > (XEN) [    9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0]
> > >> > fault addr 7a3f4000, iommu reg = ffff82c00021d000
> > >> > (XEN) [    9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
> > >> > (XEN) [    9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn
> > >> > 7a3f4
> > >> > (XEN) [    9.575819]     root_entry[00] = 107277c001
> > >> > (XEN) [    9.577129]     context[e8] = 2_1072c06001
> > >> > (XEN) [    9.578439]     l4[000] = 9c0000202f171107
> > >> > (XEN) [    9.579778]     l3[001] = 9c0000202f152107
> > >> > (XEN) [    9.581111]     l2[1d1] = 9c000010727ce107
> > >> > (XEN) [    9.582482]     l1[1f4] = 8000000000000000
> > >> > (XEN) [    9.583812]     l1[1f4] not present
> > >> > (XEN) [   10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
> > >> > (XEN) [   10.521499] Failed to load Dom0 kernel
> > >> > (XEN) [   10.532171] 
> > >> > (XEN) [   10.535464] ****************************************
> > >> > (XEN) [   10.542636] Panic on CPU 0:
> > >> > (XEN) [   10.547394] Could not set up DOM0 guest OS
> > >> > (XEN) [   10.553605] ****************************************
> > >> > 
> > >> > The fault address the devices failed to access is marked as reserved in
> > >> > e820 and isn't reserved for the devices according to the RMRRs in DMAR.
> > >> > So I think we can draw a conclusion that some existing BIOSs don't
> > >> > expose correct RMRR to OS by DMAR. And we need a workaround such as
> > >> > iommu_inclusive_mapping to deal with such kind of BIOS for both pv dom0
> > >> > and pvh dom0.
> > >> 
> > >> So your box seems to be capable of generating faults. Missing RMRR
> > >> regions is (sadly) expected, but at least you get faults and not a
> > >> complete hang. Which chipset does this box have? Is it a C600/X79?
> > 
> > No. The haswell's chipset is C610/x99. 
> 
> Can you try with the C600/x79 chipset? I'm afraid the issue is
> probably more related to the chipset rather than the CPU itself.
> 
> > >> 
> > >> > 
> > >> > As to the machine hang Roger observed, I have no idea on the cause. Roger,
> > >> > have you ever seen the VT-d on that machine reporting a DMA
> > >> > translation fault? If not, can you create one fault in native? I think
> > >> > this can tell us whether the hardware's fault report function works well
> > >> > or there are some bugs in Xen code. What is your opinion on this trial?
> > >> 
> > >> Is there any simple way to create such a fault? Does the IOMMU have
> > >> some kind of self-test thing that can be used to generate a synthetic
> > >> fault?
> > 
> > I don't know of such a tool. Maybe you can hack the IOMMU driver.
> 
> Hm, OK, it doesn't look very easy to implement something like this.
> I'll try to find some time, but I'm fairly busy ATM.
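
One crude option, assuming a device owned by Dom0 keeps DMAing to a
known gfn: tear down just that one IOMMU mapping and let the next access
fault. Sketch only; test_iommu_fault is a made-up debug helper, though
iommu_unmap_page is the existing generic unmap hook:

    /* Debug hack, sketch only: drop the IOMMU mapping of one gfn a
     * device is known to DMA to; the next access should then be
     * reported as a translation fault (reason 06 in the logs above). */
    static void test_iommu_fault(struct domain *d, unsigned long gfn)
    {
        if ( iommu_unmap_page(d, gfn) )
            printk("failed to unmap gfn %#lx\n", gfn);
    }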
> 
> Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0
  2017-09-04 15:06                                   ` Roger Pau Monné
  2017-09-04 15:19                                     ` Roger Pau Monné
@ 2017-09-04 15:39                                     ` Jan Beulich
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Beulich @ 2017-09-04 15:39 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Kevin Tian, xen-devel

>>> On 04.09.17 at 17:06, <roger.pau@citrix.com> wrote:
> On Mon, Sep 04, 2017 at 04:52:35PM +0800, Chao Gao wrote:
>> On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
>> >On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
>> >> So your box seems to be capable of generating faults. Missing RMRR
>> >> regions are (sadly) expected, but at least you get faults and not a
>> >> complete hang. Which chipset does this box have? Is it a C600/X79?
>> 
>> No. The Haswell box's chipset is the C610/X99.
> 
> Can you try with the C600/X79 chipset? I'm afraid the issue is
> probably more related to the chipset than to the CPU itself.

Or even the firmware.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-09-04 15:39 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-11 16:43 [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping for PVH Dom0 Roger Pau Monne
2017-08-11 16:43 ` [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas " Roger Pau Monne
2017-08-17  3:12   ` Tian, Kevin
2017-08-17  9:32     ` Roger Pau Monne
2017-08-28  6:04       ` Tian, Kevin
2017-08-22 12:26   ` Jan Beulich
2017-08-22 13:54     ` Roger Pau Monne
2017-08-23  8:16       ` Jan Beulich
2017-08-25 12:15         ` Roger Pau Monne
2017-08-25 12:25           ` Jan Beulich
2017-08-25 13:58             ` Roger Pau Monne
2017-08-28  6:18               ` Tian, Kevin
2017-08-29  7:33                 ` Roger Pau Monne
2017-08-31  7:32                   ` Chao Gao
2017-08-31  8:53                     ` Roger Pau Monne
2017-08-31  9:03                     ` Roger Pau Monne
2017-08-31  8:45                       ` Chao Gao
2017-08-31 10:09                         ` Roger Pau Monne
2017-09-04  6:25                           ` Chao Gao
2017-09-04  9:00                             ` Roger Pau Monné
2017-09-04  9:26                               ` Roger Pau Monné
2017-09-04  8:52                                 ` Chao Gao
2017-09-04 15:06                                   ` Roger Pau Monné
2017-09-04 15:19                                     ` Roger Pau Monné
2017-09-04 15:39                                     ` Jan Beulich
2017-08-11 16:43 ` [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area Roger Pau Monne
2017-08-17  3:12   ` Tian, Kevin
2017-08-17  9:35     ` Roger Pau Monne
2017-08-28  6:07       ` Tian, Kevin
2017-08-22 12:28   ` Jan Beulich
2017-08-11 16:43 ` [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping Roger Pau Monne
2017-08-17  3:28   ` Tian, Kevin
2017-08-17  9:39     ` Roger Pau Monne
2017-08-28  6:13       ` Tian, Kevin
2017-08-22 12:31   ` Jan Beulich
2017-08-22 14:01     ` Roger Pau Monne
2017-08-23  8:18       ` Jan Beulich
2017-08-28  6:14         ` Tian, Kevin
2017-08-29  7:39           ` Roger Pau Monne
2017-08-11 16:43 ` [PATCH v2 4/4] x86/dom0: re-order DMA remapping enabling for PVH Dom0 Roger Pau Monne
2017-08-22 12:37   ` Jan Beulich
2017-08-22 14:05     ` Roger Pau Monne
2017-08-23  8:21       ` Jan Beulich
2017-08-17  3:10 ` [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping " Tian, Kevin
2017-08-17  9:28   ` Roger Pau Monne
