All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] 86/iommu: PVH Dom0 workarounds for missing RMRR entries
@ 2018-08-08 10:07 Roger Pau Monne
  2018-08-08 10:07 ` [PATCH v4 1/4] iommu: introduce dom0-iommu option Roger Pau Monne
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Roger Pau Monne @ 2018-08-08 10:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

Hello,

The following series implement a workaround for missing RMRR
entries for a PVH Dom0. It's based on the iommu_inclusive_mapping VTd
option.

The PVH workaround identity maps all regions marked as reserved in the
memory map.

Note that this workaround is enabled by default on Intel hardware. It's
also available to AMD hardware, although it's disabled by default in
that case.

The series can be found at:

git://xenbits.xen.org/people/royger/xen.git iommu_inclusive_v4

Thanks.

Roger Pau Monne (4):
  iommu: introduce dom0-iommu option
  iommu: make iommu_inclusive_mapping a suboption of dom0-iommu
  dom0/pvh: change the order of the MMCFG initialization
  x86/iommu: add reserved dom0-iommu option to map reserved memory
    ranges

 docs/misc/xen-command-line.markdown         | 47 +++++++++++
 xen/arch/x86/hvm/dom0_build.c               |  9 +-
 xen/arch/x86/hvm/io.c                       |  5 ++
 xen/arch/x86/x86_64/mm.c                    |  3 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 11 ++-
 xen/drivers/passthrough/arm/iommu.c         |  4 +
 xen/drivers/passthrough/iommu.c             | 62 ++++++++++++--
 xen/drivers/passthrough/vtd/extern.h        |  2 -
 xen/drivers/passthrough/vtd/iommu.c         | 25 +++---
 xen/drivers/passthrough/vtd/x86/vtd.c       | 58 +------------
 xen/drivers/passthrough/x86/iommu.c         | 93 +++++++++++++++++++++
 xen/include/asm-x86/hvm/io.h                |  3 +
 xen/include/xen/iommu.h                     |  8 +-
 14 files changed, 246 insertions(+), 86 deletions(-)

-- 
2.18.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-08 10:07 [PATCH v4 0/4] 86/iommu: PVH Dom0 workarounds for missing RMRR entries Roger Pau Monne
@ 2018-08-08 10:07 ` Roger Pau Monne
  2018-08-08 12:10   ` Jan Beulich
  2018-08-08 10:07 ` [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu Roger Pau Monne
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2018-08-08 10:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Suravee Suthikulpanit,
	George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Brian Woods, Roger Pau Monne

To select the iommu configuration used by Dom0. This option supersedes
iommu=dom0-strict|dom0-passthrough.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v2:
 - Change the style and text used in the xen command line document.
 - Don't allow none, strict or relaxed to use the no- prefix.

Changes since v1:
 - New in this version.
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Brian Woods <brian.woods@amd.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 docs/misc/xen-command-line.markdown         | 23 +++++++++++
 xen/arch/x86/x86_64/mm.c                    |  3 +-
 xen/drivers/passthrough/amd/iommu_init.c    |  2 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 +-
 xen/drivers/passthrough/iommu.c             | 42 +++++++++++++++++----
 xen/drivers/passthrough/vtd/iommu.c         | 16 ++++----
 xen/include/xen/iommu.h                     |  6 ++-
 7 files changed, 75 insertions(+), 21 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 65b4754418..ea451f088e 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1150,12 +1150,18 @@ detection of systems known to misbehave upon accesses to that port.
 
 > `dom0-passthrough`
 
+> **WARNING: This command line option is deprecated, and superseded by
+> _dom0-iommu=none_ - using both options in combination is undefined.**
+
 > Default: `false`
 
 >> Control whether to disable DMA remapping for Dom0.
 
 > `dom0-strict`
 
+> **WARNING: This command line option is deprecated, and superseded by
+> _dom0-iommu=strict_ - using both options in combination is undefined.**
+
 > Default: `false`
 
 >> Control whether to set up DMA remapping only for the memory Dom0 actually
@@ -1198,6 +1204,23 @@ detection of systems known to misbehave upon accesses to that port.
 
 >> Enable IOMMU debugging code (implies `verbose`).
 
+### dom0-iommu
+> `= List of [ none | strict | relaxed ]`
+
+* `none`: disables DMA remapping for Dom0.
+
+The following two options control how RAM regions are mapped in the iommu for
+PV Dom0:
+
+* `strict`: sets up DMA remapping only for the memory Dom0 actually got
+  assigned.
+
+* `relaxed`: sets DMA remapping for all the host RAM except regions in use by
+  Xen. This is the default iommu behaviour.
+
+Note that all the above options are mutually exclusive. Specifying more than
+one on the `dom0-iommu` command line will result in undefined behavior.
+
 ### iommu\_dev\_iotlb\_timeout
 > `= <integer>`
 
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index cca4ae926e..84226b3326 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
     if ( ret )
         goto destroy_m2p;
 
-    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
+    if ( iommu_enabled && !iommu_dom0_passthrough &&
+         !need_iommu(hardware_domain) )
     {
         for ( i = spfn; i < epfn; i++ )
             if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 474992a75a..ad8c48be1c 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1062,7 +1062,7 @@ static void __init amd_iommu_init_cleanup(void)
     radix_tree_destroy(&ivrs_maps, xfree);
 
     iommu_enabled = 0;
-    iommu_passthrough = 0;
+    iommu_dom0_passthrough = false;
     iommu_intremap = 0;
     iommuv2_enabled = 0;
 }
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 12d2695b89..eeacf713e4 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -121,7 +121,7 @@ static void amd_iommu_setup_domain_device(
     BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
             !iommu->dev_table.buffer );
 
-    if ( iommu_passthrough && is_hardware_domain(domain) )
+    if ( iommu_dom0_passthrough && is_hardware_domain(domain) )
         valid = 0;
 
     if ( ats_enabled )
@@ -256,7 +256,7 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
     if ( allocate_domain_resources(dom_iommu(d)) )
         BUG();
 
-    if ( !iommu_passthrough && !need_iommu(d) )
+    if ( !iommu_dom0_passthrough && !need_iommu(d) )
     {
         int rc = 0;
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 70d218f910..830560bdcf 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -22,6 +22,7 @@
 #include <xsm/xsm.h>
 
 static int parse_iommu_param(const char *s);
+static int parse_dom0_iommu_param(const char *s);
 static void iommu_dump_p2m_table(unsigned char key);
 
 unsigned int __read_mostly iommu_dev_iotlb_timeout = 1000;
@@ -52,11 +53,9 @@ custom_param("iommu", parse_iommu_param);
 bool_t __initdata iommu_enable = 1;
 bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
-bool_t __hwdom_initdata iommu_dom0_strict;
 bool_t __read_mostly iommu_verbose;
 bool_t __read_mostly iommu_workaround_bios_bug;
 bool_t __read_mostly iommu_igfx = 1;
-bool_t __read_mostly iommu_passthrough;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
@@ -72,6 +71,10 @@ bool_t __read_mostly iommu_hap_pt_share = 1;
 bool_t __read_mostly iommu_debug;
 bool_t __read_mostly amd_iommu_perdev_intremap = 1;
 
+custom_param("dom0-iommu", parse_dom0_iommu_param);
+bool __hwdom_initdata iommu_dom0_strict;
+bool __read_mostly iommu_dom0_passthrough;
+
 DEFINE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 
 DEFINE_SPINLOCK(iommu_pt_cleanup_lock);
@@ -121,7 +124,7 @@ static int __init parse_iommu_param(const char *s)
         else if ( !strncmp(s, "amd-iommu-perdev-intremap", ss - s) )
             amd_iommu_perdev_intremap = val;
         else if ( !strncmp(s, "dom0-passthrough", ss - s) )
-            iommu_passthrough = val;
+            iommu_dom0_passthrough = val;
         else if ( !strncmp(s, "dom0-strict", ss - s) )
             iommu_dom0_strict = val;
         else if ( !strncmp(s, "sharept", ss - s) )
@@ -135,6 +138,31 @@ static int __init parse_iommu_param(const char *s)
     return rc;
 }
 
+static int __init parse_dom0_iommu_param(const char *s)
+{
+    const char *ss;
+    int rc = 0;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        if ( !strncmp(s, "none", ss - s) )
+            iommu_dom0_passthrough = true;
+        else if ( !strncmp(s, "strict", ss - s) )
+            iommu_dom0_strict = true;
+        else if ( !strncmp(s, "relaxed", ss - s) )
+            iommu_dom0_strict = false;
+        else
+            rc = -EINVAL;
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+
 int iommu_domain_init(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
@@ -158,7 +186,7 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d)
 
     arch_iommu_check_autotranslated_hwdom(d);
 
-    if ( iommu_passthrough )
+    if ( iommu_dom0_passthrough )
         panic("Dom0 uses paging translated mode, dom0-passthrough must not be "
               "enabled\n");
 
@@ -372,7 +400,7 @@ int __init iommu_setup(void)
     bool_t force_intremap = force_iommu && iommu_intremap;
 
     if ( iommu_dom0_strict )
-        iommu_passthrough = 0;
+        iommu_dom0_passthrough = false;
 
     if ( iommu_enable )
     {
@@ -393,14 +421,14 @@ int __init iommu_setup(void)
     if ( !iommu_enabled )
     {
         iommu_snoop = 0;
-        iommu_passthrough = 0;
+        iommu_dom0_passthrough = false;
         iommu_dom0_strict = 0;
     }
     printk("I/O virtualisation %sabled\n", iommu_enabled ? "en" : "dis");
     if ( iommu_enabled )
     {
         printk(" - Dom0 mode: %s\n",
-               iommu_passthrough ? "Passthrough" :
+               iommu_dom0_passthrough ? "Passthrough" :
                iommu_dom0_strict ? "Strict" : "Relaxed");
         printk("Interrupt remapping %sabled\n", iommu_intremap ? "en" : "dis");
         tasklet_init(&iommu_pt_cleanup_tasklet, iommu_free_pagetables, 0);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 1710256823..8ac774215b 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1304,7 +1304,7 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
 {
     struct acpi_drhd_unit *drhd;
 
-    if ( !iommu_passthrough && is_pv_domain(d) )
+    if ( !iommu_dom0_passthrough && is_pv_domain(d) )
     {
         /* Set up 1:1 page table for hardware domain. */
         vtd_set_hwdom_mapping(d);
@@ -1391,7 +1391,7 @@ int domain_context_mapping_one(
         return res;
     }
 
-    if ( iommu_passthrough && is_hardware_domain(domain) )
+    if ( iommu_dom0_passthrough && is_hardware_domain(domain) )
     {
         context_set_translation_type(*context, CONTEXT_TT_PASS_THRU);
         agaw = level_to_agaw(iommu->nr_pt_levels);
@@ -1781,7 +1781,7 @@ static int __must_check intel_iommu_map_page(struct domain *d,
         return 0;
 
     /* Do nothing if hardware domain and iommu supports pass thru. */
-    if ( iommu_passthrough && is_hardware_domain(d) )
+    if ( iommu_dom0_passthrough && is_hardware_domain(d) )
         return 0;
 
     spin_lock(&hd->arch.mapping_lock);
@@ -1826,7 +1826,7 @@ static int __must_check intel_iommu_unmap_page(struct domain *d,
                                                unsigned long gfn)
 {
     /* Do nothing if hardware domain and iommu supports pass thru. */
-    if ( iommu_passthrough && is_hardware_domain(d) )
+    if ( iommu_dom0_passthrough && is_hardware_domain(d) )
         return 0;
 
     return dma_pte_clear_one(d, (paddr_t)gfn << PAGE_SHIFT_4K);
@@ -2269,8 +2269,8 @@ int __init intel_vtd_setup(void)
         if ( iommu_snoop && !ecap_snp_ctl(iommu->ecap) )
             iommu_snoop = 0;
 
-        if ( iommu_passthrough && !ecap_pass_thru(iommu->ecap) )
-            iommu_passthrough = 0;
+        if ( iommu_dom0_passthrough && !ecap_pass_thru(iommu->ecap) )
+            iommu_dom0_passthrough = false;
 
         if ( iommu_qinval && !ecap_queued_inval(iommu->ecap) )
             iommu_qinval = 0;
@@ -2308,7 +2308,7 @@ int __init intel_vtd_setup(void)
 
 #define P(p,s) printk("Intel VT-d %s %senabled.\n", s, (p)? "" : "not ")
     P(iommu_snoop, "Snoop Control");
-    P(iommu_passthrough, "Dom0 DMA Passthrough");
+    P(iommu_dom0_passthrough, "Dom0 DMA Passthrough");
     P(iommu_qinval, "Queued Invalidation");
     P(iommu_intremap, "Interrupt Remapping");
     P(iommu_intpost, "Posted Interrupt");
@@ -2330,7 +2330,7 @@ int __init intel_vtd_setup(void)
  error:
     iommu_enabled = 0;
     iommu_snoop = 0;
-    iommu_passthrough = 0;
+    iommu_dom0_passthrough = false;
     iommu_qinval = 0;
     iommu_intremap = 0;
     iommu_intpost = 0;
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 6b42e3b876..c0c6975ac4 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -29,13 +29,15 @@
 #include <asm/iommu.h>
 
 extern bool_t iommu_enable, iommu_enabled;
-extern bool_t force_iommu, iommu_dom0_strict, iommu_verbose;
-extern bool_t iommu_workaround_bios_bug, iommu_igfx, iommu_passthrough;
+extern bool_t force_iommu, iommu_verbose;
+extern bool_t iommu_workaround_bios_bug, iommu_igfx;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
 
+extern bool iommu_dom0_strict, iommu_dom0_passthrough;
+
 extern unsigned int iommu_dev_iotlb_timeout;
 
 int iommu_setup(void);
-- 
2.18.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu
  2018-08-08 10:07 [PATCH v4 0/4] 86/iommu: PVH Dom0 workarounds for missing RMRR entries Roger Pau Monne
  2018-08-08 10:07 ` [PATCH v4 1/4] iommu: introduce dom0-iommu option Roger Pau Monne
@ 2018-08-08 10:07 ` Roger Pau Monne
  2018-08-08 12:32   ` Jan Beulich
  2018-08-08 10:07 ` [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization Roger Pau Monne
  2018-08-08 10:07 ` [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges Roger Pau Monne
  3 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2018-08-08 10:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Jan Beulich, Roger Pau Monne

Introduce a new dom0-iommu=inclusive generic option that supersedes
iommu_inclusive_mapping. The previous behaviour is preserved and the
option should only be enabled by default on Intel hardware.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v2:
 - Fix typo in commit message.
 - Change style and text of the documentation in xen command line.
 - Set the defaults in {intel/amd}_iommu_hwdom_init for inclusive.
 - Re-add the iommu_dom0_passthrough || !is_pv_domain(d) check.

Changes since v1:
 - Use dom0-iommu instead of the iommu option.
 - Only enable by default on Intel hardware.
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>
---
 docs/misc/xen-command-line.markdown         | 17 +++++-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  4 ++
 xen/drivers/passthrough/arm/iommu.c         |  4 ++
 xen/drivers/passthrough/iommu.c             | 23 ++++++--
 xen/drivers/passthrough/vtd/extern.h        |  2 -
 xen/drivers/passthrough/vtd/iommu.c         |  8 ++-
 xen/drivers/passthrough/vtd/x86/vtd.c       | 58 +-------------------
 xen/drivers/passthrough/x86/iommu.c         | 59 +++++++++++++++++++++
 xen/include/xen/iommu.h                     |  2 +
 9 files changed, 109 insertions(+), 68 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index ea451f088e..90b32fe3f0 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses to that port.
 >> Enable IOMMU debugging code (implies `verbose`).
 
 ### dom0-iommu
-> `= List of [ none | strict | relaxed ]`
+> `= List of [ none | strict | relaxed | inclusive ]`
 
 * `none`: disables DMA remapping for Dom0.
 
@@ -1221,6 +1221,18 @@ PV Dom0:
 Note that all the above options are mutually exclusive. Specifying more than
 one on the `dom0-iommu` command line will result in undefined behavior.
 
+The following options control whether non-RAM regions are added to the Dom0
+iommu tables. Note that they can be prefixed with `no-` to effect the inverse
+meaning:
+
+* `inclusive`: sets up DMA remapping for all the non-RAM memory below 4GB
+  except for unusable ranges. Use this to work around firmware issues providing
+  incorrect RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU
+  accesses for Dom0, with this option all pages up to 4GB, not marked as
+  unusable in the E820 table, will get a mapping established. Note that this
+  option is only applicable to a PV Dom0 and is enabled by default on Intel
+  hardware.
+
 ### iommu\_dev\_iotlb\_timeout
 > `= <integer>`
 
@@ -1233,6 +1245,9 @@ wait descriptor timed out', try increasing this value.
 ### iommu\_inclusive\_mapping (VT-d)
 > `= <boolean>`
 
+**WARNING: This command line option is deprecated, and superseded by
+_dom0-iommu=inclusive_ - using both options in combination is undefined.**
+
 > Default: `true`
 
 Use this to work around firmware issues providing incorrect RMRR entries.
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index eeacf713e4..0e0c99c942 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -253,6 +253,10 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
     unsigned long i; 
     const struct amd_iommu *iommu;
 
+    /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
+    iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
+                                                      : iommu_dom0_inclusive;
+
     if ( allocate_domain_resources(dom_iommu(d)) )
         BUG();
 
diff --git a/xen/drivers/passthrough/arm/iommu.c b/xen/drivers/passthrough/arm/iommu.c
index 95b1abb972..325997b19f 100644
--- a/xen/drivers/passthrough/arm/iommu.c
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -73,3 +73,7 @@ int arch_iommu_populate_page_table(struct domain *d)
     /* The IOMMU shares the p2m with the CPU */
     return -ENOSYS;
 }
+
+void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
+{
+}
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 830560bdcf..f15c94be42 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -74,6 +74,7 @@ bool_t __read_mostly amd_iommu_perdev_intremap = 1;
 custom_param("dom0-iommu", parse_dom0_iommu_param);
 bool __hwdom_initdata iommu_dom0_strict;
 bool __read_mostly iommu_dom0_passthrough;
+int8_t __hwdom_initdata iommu_dom0_inclusive = -1;
 
 DEFINE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 
@@ -144,16 +145,23 @@ static int __init parse_dom0_iommu_param(const char *s)
     int rc = 0;
 
     do {
+        bool val = !!strncmp(s, "no-", 3);
+
+        if ( !val )
+            s += 3;
+
         ss = strchr(s, ',');
         if ( !ss )
             ss = strchr(s, '\0');
 
-        if ( !strncmp(s, "none", ss - s) )
+        if ( !strncmp(s, "none", ss - s) && val )
             iommu_dom0_passthrough = true;
-        else if ( !strncmp(s, "strict", ss - s) )
+        else if ( !strncmp(s, "strict", ss - s) && val )
             iommu_dom0_strict = true;
-        else if ( !strncmp(s, "relaxed", ss - s) )
+        else if ( !strncmp(s, "relaxed", ss - s) && val )
             iommu_dom0_strict = false;
+        else if ( !strncmp(s, "inclusive", ss - s) )
+            iommu_dom0_inclusive = val;
         else
             rc = -EINVAL;
 
@@ -202,6 +210,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
     if ( !iommu_enabled )
         return;
 
+    if ( iommu_dom0_inclusive == true && !is_pv_domain(d) )
+    {
+        printk(XENLOG_WARNING
+               "IOMMU inclusive mappings are only supported on PV Dom0\n");
+        iommu_dom0_inclusive = false;
+    }
+
     register_keyhandler('o', &iommu_dump_p2m_table, "dump iommu p2m table", 0);
     d->need_iommu = !!iommu_dom0_strict;
     if ( need_iommu(d) && !iommu_use_hap_pt(d) )
@@ -236,6 +251,8 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
     }
 
     hd->platform_ops->hwdom_init(d);
+
+    arch_iommu_hwdom_init(d);
 }
 
 void iommu_teardown(struct domain *d)
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index fb7edfaef9..91cadc602e 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -99,6 +99,4 @@ void pci_vtd_quirk(const struct pci_dev *);
 bool_t platform_supports_intremap(void);
 bool_t platform_supports_x2apic(void);
 
-void vtd_set_hwdom_mapping(struct domain *d);
-
 #endif // _VTD_EXTERN_H_
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 8ac774215b..7c7e15755d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1304,11 +1304,9 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
 {
     struct acpi_drhd_unit *drhd;
 
-    if ( !iommu_dom0_passthrough && is_pv_domain(d) )
-    {
-        /* Set up 1:1 page table for hardware domain. */
-        vtd_set_hwdom_mapping(d);
-    }
+    /* Inclusive mappings are enabled by default on Intel hardware for PV. */
+    iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? is_pv_domain(d)
+                                                      : iommu_dom0_inclusive;
 
     setup_hwdom_pci_devices(d, setup_hwdom_device);
     setup_hwdom_rmrr(d);
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index 00a9891005..20323051d0 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -25,7 +25,6 @@
 #include <xen/irq.h>
 #include <xen/numa.h>
 #include <asm/fixmap.h>
-#include <asm/setup.h>
 #include "../iommu.h"
 #include "../dmar.h"
 #include "../vtd.h"
@@ -35,8 +34,7 @@
  * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
  * 1:1 iommu mappings except xen and unusable regions.
  */
-static bool_t __hwdom_initdata iommu_inclusive_mapping = 1;
-boolean_param("iommu_inclusive_mapping", iommu_inclusive_mapping);
+boolean_param("iommu_inclusive_mapping", iommu_dom0_inclusive);
 
 void *map_vtd_domain_page(u64 maddr)
 {
@@ -108,57 +106,3 @@ void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq)
     spin_unlock(&d->event_lock);
 }
 
-void __hwdom_init vtd_set_hwdom_mapping(struct domain *d)
-{
-    unsigned long i, top, max_pfn;
-
-    BUG_ON(!is_hardware_domain(d));
-
-    max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
-    top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
-
-    for ( i = 0; i < top; i++ )
-    {
-        unsigned long pfn = pdx_to_pfn(i);
-        bool map;
-        int rc;
-
-        /*
-         * Set up 1:1 mapping for dom0. Default to include only
-         * conventional RAM areas and let RMRRs include needed reserved
-         * regions. When set, the inclusive mapping additionally maps in
-         * every pfn up to 4GB except those that fall in unusable ranges.
-         */
-        if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
-            continue;
-
-        if ( iommu_inclusive_mapping && pfn <= max_pfn )
-            map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
-        else
-            map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
-
-        if ( !map )
-            continue;
-
-        /* Exclude Xen bits */
-        if ( xen_in_range(pfn) )
-            continue;
-
-        /*
-         * If dom0-strict mode is enabled then exclude conventional RAM
-         * and let the common code map dom0's pages.
-         */
-        if ( iommu_dom0_strict &&
-             page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
-            continue;
-
-        rc = iommu_map_page(d, pfn, pfn, IOMMUF_readable|IOMMUF_writable);
-        if ( rc )
-            printk(XENLOG_WARNING VTDPREFIX " d%d: IOMMU mapping failed: %d\n",
-                   d->domain_id, rc);
-
-        if (!(i & 0xfffff))
-            process_pending_softirqs();
-    }
-}
-
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 68182afd91..5a7a765e9d 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -20,6 +20,8 @@
 #include <xen/softirq.h>
 #include <xsm/xsm.h>
 
+#include <asm/setup.h>
+
 void iommu_update_ire_from_apic(
     unsigned int apic, unsigned int reg, unsigned int value)
 {
@@ -132,6 +134,63 @@ void arch_iommu_domain_destroy(struct domain *d)
 {
 }
 
+void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
+{
+    unsigned long i, top, max_pfn;
+
+    BUG_ON(!is_hardware_domain(d));
+
+    if ( iommu_dom0_passthrough || !is_pv_domain(d) )
+        return;
+
+    max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
+    top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
+
+    for ( i = 0; i < top; i++ )
+    {
+        unsigned long pfn = pdx_to_pfn(i);
+        bool map;
+        int rc;
+
+        /*
+         * Set up 1:1 mapping for dom0. Default to include only
+         * conventional RAM areas and let RMRRs include needed reserved
+         * regions. When set, the inclusive mapping additionally maps in
+         * every pfn up to 4GB except those that fall in unusable ranges.
+         */
+        if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
+            continue;
+
+        if ( iommu_dom0_inclusive && pfn <= max_pfn )
+            map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
+        else
+            map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
+
+        if ( !map )
+            continue;
+
+        /* Exclude Xen bits */
+        if ( xen_in_range(pfn) )
+            continue;
+
+        /*
+         * If dom0-strict mode is enabled then exclude conventional RAM
+         * and let the common code map dom0's pages.
+         */
+        if ( iommu_dom0_strict &&
+             page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
+            continue;
+
+        rc = iommu_map_page(d, pfn, pfn, IOMMUF_readable|IOMMUF_writable);
+        if ( rc )
+            printk(XENLOG_WARNING " d%d: IOMMU mapping failed: %d\n",
+                   d->domain_id, rc);
+
+        if (!(i & 0xfffff))
+            process_pending_softirqs();
+    }
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index c0c6975ac4..99e5b89c0f 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -37,6 +37,7 @@ extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
 
 extern bool iommu_dom0_strict, iommu_dom0_passthrough;
+extern int8_t iommu_dom0_inclusive;
 
 extern unsigned int iommu_dev_iotlb_timeout;
 
@@ -51,6 +52,7 @@ void arch_iommu_domain_destroy(struct domain *d);
 int arch_iommu_domain_init(struct domain *d);
 int arch_iommu_populate_page_table(struct domain *d);
 void arch_iommu_check_autotranslated_hwdom(struct domain *d);
+void arch_iommu_hwdom_init(struct domain *d);
 
 int iommu_construct(struct domain *d);
 
-- 
2.18.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization
  2018-08-08 10:07 [PATCH v4 0/4] 86/iommu: PVH Dom0 workarounds for missing RMRR entries Roger Pau Monne
  2018-08-08 10:07 ` [PATCH v4 1/4] iommu: introduce dom0-iommu option Roger Pau Monne
  2018-08-08 10:07 ` [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu Roger Pau Monne
@ 2018-08-08 10:07 ` Roger Pau Monne
  2018-08-08 12:35   ` Jan Beulich
  2018-08-08 10:07 ` [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges Roger Pau Monne
  3 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2018-08-08 10:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

So it's done before the iommu is initialized. This is required in
order to be able to fetch the MMCFG regions from the domain struct.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - New in this version.
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/dom0_build.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index f0cd63b1ec..5065729106 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1100,6 +1100,13 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
+    /*
+     * NB: MMCFG initialization needs to be performed before iommu
+     * initialization so the iommu code can fetch the MMCFG regions used by the
+     * domain.
+     */
+    pvh_setup_mmcfg(d);
+
     iommu_hwdom_init(d);
 
     rc = pvh_load_kernel(d, image, image_headroom, initrd, bootstrap_map(image),
@@ -1124,8 +1131,6 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
         return rc;
     }
 
-    pvh_setup_mmcfg(d);
-
     printk("WARNING: PVH is an experimental mode with limited functionality\n");
     return 0;
 }
-- 
2.18.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
  2018-08-08 10:07 [PATCH v4 0/4] 86/iommu: PVH Dom0 workarounds for missing RMRR entries Roger Pau Monne
                   ` (2 preceding siblings ...)
  2018-08-08 10:07 ` [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization Roger Pau Monne
@ 2018-08-08 10:07 ` Roger Pau Monne
  2018-08-08 13:15   ` Jan Beulich
  3 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monne @ 2018-08-08 10:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, Jan Beulich,
	Roger Pau Monne

Several people have reported hardware issues (malfunctioning USB
controllers) due to iommu page faults on Intel hardware. Those faults
are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
be worked around on VTd hardware by manually adding RMRR entries on
the command line, this is however limited to Intel hardware and quite
cumbersome to do.

In order to solve those issues add a new dom0-iommu=reserved option
that identity maps all regions marked as reserved in the memory map.
Note that regions used by devices emulated by Xen (LAPIC, IO-APIC or
PCIe MCFG regions) are specifically avoided. Note that this option is
available to a PVH Dom0 (as opposed to the inclusive option which only
works for PV Dom0).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v3:
 - Add mappings if the iommu page tables are shared.

Changes since v2:
 - Fix comment regarding dom0-strict.
 - Change documentation style of xen command line.
 - Rename iommu_map to hwdom_iommu_map.
 - Move all the checks to hwdom_iommu_map.

Changes since v1:
 - Introduce a new reserved option instead of abusing the inclusive
   option.
 - Use the same helper function for PV and PVH in order to decide if a
   page should be added to the domain page tables.
 - Use the data inside of the domain struct to detect overlaps with
   emulated MMIO regions.
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 docs/misc/xen-command-line.markdown         | 11 ++-
 xen/arch/x86/hvm/io.c                       |  5 ++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 +
 xen/drivers/passthrough/iommu.c             |  3 +
 xen/drivers/passthrough/vtd/iommu.c         |  3 +
 xen/drivers/passthrough/x86/iommu.c         | 92 ++++++++++++++-------
 xen/include/asm-x86/hvm/io.h                |  3 +
 xen/include/xen/iommu.h                     |  2 +-
 8 files changed, 91 insertions(+), 31 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 90b32fe3f0..59ec2afc5d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses to that port.
 >> Enable IOMMU debugging code (implies `verbose`).
 
 ### dom0-iommu
-> `= List of [ none | strict | relaxed | inclusive ]`
+> `= List of [ none | strict | relaxed | inclusive | reserved ]`
 
 * `none`: disables DMA remapping for Dom0.
 
@@ -1233,6 +1233,15 @@ meaning:
   option is only applicable to a PV Dom0 and is enabled by default on Intel
   hardware.
 
+* `reserved`: sets up DMA remapping for all the reserved regions in the memory
+  map for Dom0. Use this to work around firmware issues providing incorrect
+  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
+  for Dom0, all memory regions marked as reserved in the memory map that don't
+  overlap with any MMIO region from emulated devices will be identity mapped.
+  This option maps a subset of the memory that would be mapped when using the
+  `inclusive` option. This option is available to a PVH Dom0 and is enabled by
+  default on Intel hardware.
+
 ### iommu\_dev\_iotlb\_timeout
 > `= <integer>`
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index bf4d8748d3..5e01c33890 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const struct domain *d,
     return NULL;
 }
 
+bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
+{
+    return vpci_mmcfg_find(d, addr);
+}
+
 static unsigned int vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
                                            paddr_t addr, pci_sbdf_t *sbdf)
 {
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 0e0c99c942..2c2867d088 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
     /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
     iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
                                                       : iommu_dom0_inclusive;
+    /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
+    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
+                                                    : iommu_dom0_reserved;
 
     if ( allocate_domain_resources(dom_iommu(d)) )
         BUG();
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index f15c94be42..9c991bd2cf 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -75,6 +75,7 @@ custom_param("dom0-iommu", parse_dom0_iommu_param);
 bool __hwdom_initdata iommu_dom0_strict;
 bool __read_mostly iommu_dom0_passthrough;
 int8_t __hwdom_initdata iommu_dom0_inclusive = -1;
+int8_t __hwdom_initdata iommu_dom0_reserved = -1;
 
 DEFINE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
 
@@ -162,6 +163,8 @@ static int __init parse_dom0_iommu_param(const char *s)
             iommu_dom0_strict = false;
         else if ( !strncmp(s, "inclusive", ss - s) )
             iommu_dom0_inclusive = val;
+        else if ( !strncmp(s, "reserved", ss - s) )
+            iommu_dom0_reserved = val;
         else
             rc = -EINVAL;
 
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 7c7e15755d..77a076215b 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1307,6 +1307,9 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
     /* Inclusive mappings are enabled by default on Intel hardware for PV. */
     iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? is_pv_domain(d)
                                                       : iommu_dom0_inclusive;
+    /* Reserved IOMMU mappings are enabled by default on Intel hardware. */
+    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? true
+                                                    : iommu_dom0_reserved;
 
     setup_hwdom_pci_devices(d, setup_hwdom_device);
     setup_hwdom_rmrr(d);
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 5a7a765e9d..6271d8b671 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -20,6 +20,7 @@
 #include <xen/softirq.h>
 #include <xsm/xsm.h>
 
+#include <asm/hvm/io.h>
 #include <asm/setup.h>
 
 void iommu_update_ire_from_apic(
@@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
 {
 }
 
+static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned long pfn,
+                                         unsigned long max_pfn)
+{
+    unsigned int i;
+
+    /*
+     * Ignore any address below 1MB, that's already identity mapped by the
+     * domain builder for HVM.
+     */
+    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
+         /* Exclude Xen bits. */
+         xen_in_range(pfn) || (pfn > max_pfn && !mfn_valid(_mfn(pfn))) )
+        return false;
+
+    /*
+     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
+     * conventional RAM and let the common code map dom0's pages.
+     */
+    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
+         (iommu_dom0_strict || is_hvm_domain(d)) )
+        return false;
+    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
+         !iommu_dom0_reserved && !iommu_dom0_inclusive )
+        return false;
+    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
+         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
+         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
+         (!iommu_dom0_inclusive || pfn > max_pfn) )
+        return false;
+
+    /* Check that it doesn't overlap with the LAPIC */
+    if ( has_vlapic(d) )
+    {
+        const struct vcpu *v;
+
+        for_each_vcpu(d, v)
+            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
+                return false;
+    }
+    /* ... or the IO-APIC */
+    for ( i = 0; has_vioapic(d) && i < d->arch.hvm_domain.nr_vioapics; i++ )
+        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
+            return false;
+    /*
+     * ... or the PCIe MCFG regions.
+     * TODO: runtime added MMCFG regions are not checked to make sure they
+     * don't overlap with already mapped regions, thus preventing trapping.
+     */
+    if ( has_vpci(d) && vpci_mmcfg_address(d, pfn << PAGE_SHIFT) )
+        return false;
+
+    return true;
+}
+
 void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 {
     unsigned long i, top, max_pfn;
 
     BUG_ON(!is_hardware_domain(d));
 
-    if ( iommu_dom0_passthrough || !is_pv_domain(d) )
+    if ( iommu_dom0_passthrough )
         return;
 
     max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
@@ -149,39 +204,18 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     for ( i = 0; i < top; i++ )
     {
         unsigned long pfn = pdx_to_pfn(i);
-        bool map;
         int rc;
 
-        /*
-         * Set up 1:1 mapping for dom0. Default to include only
-         * conventional RAM areas and let RMRRs include needed reserved
-         * regions. When set, the inclusive mapping additionally maps in
-         * every pfn up to 4GB except those that fall in unusable ranges.
-         */
-        if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
+        if ( !hwdom_iommu_map(d, pfn, max_pfn) )
             continue;
 
-        if ( iommu_dom0_inclusive && pfn <= max_pfn )
-            map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
+        if ( iommu_use_hap_pt(d) )
+        {
+            ASSERT(is_hvm_domain(d));
+            rc = set_identity_p2m_entry(d, pfn, p2m_access_rw, 0);
+        }
         else
-            map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
-
-        if ( !map )
-            continue;
-
-        /* Exclude Xen bits */
-        if ( xen_in_range(pfn) )
-            continue;
-
-        /*
-         * If dom0-strict mode is enabled then exclude conventional RAM
-         * and let the common code map dom0's pages.
-         */
-        if ( iommu_dom0_strict &&
-             page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
-            continue;
-
-        rc = iommu_map_page(d, pfn, pfn, IOMMUF_readable|IOMMUF_writable);
+            rc = iommu_map_page(d, pfn, pfn, IOMMUF_readable|IOMMUF_writable);
         if ( rc )
             printk(XENLOG_WARNING " d%d: IOMMU mapping failed: %d\n",
                    d->domain_id, rc);
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index e6b6ed0b92..8cca456b55 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -180,6 +180,9 @@ int register_vpci_mmcfg_handler(struct domain *d, paddr_t addr,
 /* Destroy tracked MMCFG areas. */
 void destroy_vpci_mmcfg(struct domain *d);
 
+/* Check if an address is between a MMCFG region for a domain. */
+bool vpci_mmcfg_address(const struct domain *d, paddr_t addr);
+
 #endif /* __ASM_X86_HVM_IO_H__ */
 
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 99e5b89c0f..fed1b1ea7a 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -37,7 +37,7 @@ extern bool_t iommu_debug;
 extern bool_t amd_iommu_perdev_intremap;
 
 extern bool iommu_dom0_strict, iommu_dom0_passthrough;
-extern int8_t iommu_dom0_inclusive;
+extern int8_t iommu_dom0_inclusive, iommu_dom0_reserved;
 
 extern unsigned int iommu_dev_iotlb_timeout;
 
-- 
2.18.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-08 10:07 ` [PATCH v4 1/4] iommu: introduce dom0-iommu option Roger Pau Monne
@ 2018-08-08 12:10   ` Jan Beulich
  2018-08-08 15:50     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-08 12:10 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

>>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> @@ -1198,6 +1204,23 @@ detection of systems known to misbehave upon accesses to that port.
>  
>  >> Enable IOMMU debugging code (implies `verbose`).
>  
> +### dom0-iommu

This is now misplaced, as the file is (meant to be) alphabetically
sorted.

> +> `= List of [ none | strict | relaxed ]`
> +
> +* `none`: disables DMA remapping for Dom0.
> +
> +The following two options control how RAM regions are mapped in the iommu for
> +PV Dom0:
> +
> +* `strict`: sets up DMA remapping only for the memory Dom0 actually got
> +  assigned.

s/memory/RAM/ ?

> +* `relaxed`: sets DMA remapping for all the host RAM except regions in use by
> +  Xen. This is the default iommu behaviour.

Drop "iommu" here?

> +Note that all the above options are mutually exclusive. Specifying more than
> +one on the `dom0-iommu` command line will result in undefined behavior.

Isn't this more strict than it needs to be? "none", afaict, always takes
precedence if enabled. What color a bike shed is simply doesn't matter
when it doesn't exist.

> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>      if ( ret )
>          goto destroy_m2p;
>  
> -    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
> +    if ( iommu_enabled && !iommu_dom0_passthrough &&
> +         !need_iommu(hardware_domain) )

This makes already clear that you need to better distinguish Dom0 and
hwdom here, but it's not immediately clear to me which direction the
changes should be made: Do you mean truly only Dom0 throughout
this patch, or hwdom? While the doc and command line option name can
perhaps left as is, internal variable names should not say Dom0 when
they don't mean Dom0. Otoh if you mean only Dom0, then the use of
hardware_domain above (and elsewhere) is now wrong.

Of course I won't demand (but even less so object to) you renaming
the other related variable that is affected here.

> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -22,6 +22,7 @@
>  #include <xsm/xsm.h>
>  
>  static int parse_iommu_param(const char *s);
> +static int parse_dom0_iommu_param(const char *s);

Please don't. Instead ...

> @@ -72,6 +71,10 @@ bool_t __read_mostly iommu_hap_pt_share = 1;
>  bool_t __read_mostly iommu_debug;
>  bool_t __read_mostly amd_iommu_perdev_intremap = 1;
>  
> +custom_param("dom0-iommu", parse_dom0_iommu_param);

... move this immediately after (with no intervening blank line)
parse_dom0_iommu_param()'s definition.

> +static int __init parse_dom0_iommu_param(const char *s)
> +{
> +    const char *ss;
> +    int rc = 0;
> +
> +    do {
> +        ss = strchr(s, ',');
> +        if ( !ss )
> +            ss = strchr(s, '\0');
> +
> +        if ( !strncmp(s, "none", ss - s) )
> +            iommu_dom0_passthrough = true;
> +        else if ( !strncmp(s, "strict", ss - s) )
> +            iommu_dom0_strict = true;
> +        else if ( !strncmp(s, "relaxed", ss - s) )
> +            iommu_dom0_strict = false;

Perhaps better just have one of the two, and make them truly
boolean? Or would that conflict with further patches / plans?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu
  2018-08-08 10:07 ` [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu Roger Pau Monne
@ 2018-08-08 12:32   ` Jan Beulich
  2018-08-08 16:09     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-08 12:32 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> Introduce a new dom0-iommu=inclusive generic option that supersedes
> iommu_inclusive_mapping. The previous behaviour is preserved and the
> option should only be enabled by default on Intel hardware.

Why "should" instead of "is"?

> @@ -1221,6 +1221,18 @@ PV Dom0:
>  Note that all the above options are mutually exclusive. Specifying more than
>  one on the `dom0-iommu` command line will result in undefined behavior.
>  
> +The following options control whether non-RAM regions are added to the Dom0
> +iommu tables. Note that they can be prefixed with `no-` to effect the inverse
> +meaning:

I'm not particularly happy about the mentioning of "no-" here: Why is
this better than the also permitted "=0" etc suffixes? Keep it generic,
just like other options do.

> +* `inclusive`: sets up DMA remapping for all the non-RAM memory below 4GB
> +  except for unusable ranges. Use this to work around firmware issues providing
> +  incorrect RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU
> +  accesses for Dom0, with this option all pages up to 4GB, not marked as
> +  unusable in the E820 table, will get a mapping established. Note that this
> +  option is only applicable to a PV Dom0 and is enabled by default on Intel
> +  hardware.

No word at all about the interaction with none/strict/relaxed? I think,
as mentioned for patch 1, "none" renders this option meaningless as
well. But for "relaxed" it's pretty unclear, because from E820 alone
you can't judge whether e.g. a reserved region is RAM or MMIO. (As
an implication, the mentioning of RAM in patch 1's doc for "relaxed"
then looks symmetrically wrong, just like I've already asked to replace
"memory" by "RAM" for "strict".)

> --- a/xen/drivers/passthrough/arm/iommu.c
> +++ b/xen/drivers/passthrough/arm/iommu.c
> @@ -73,3 +73,7 @@ int arch_iommu_populate_page_table(struct domain *d)
>      /* The IOMMU shares the p2m with the CPU */
>      return -ENOSYS;
>  }
> +
> +void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> +{
> +}

The option being in common code, I think you also need to set it for
ARM, so it won't remain at its default of -1.

> @@ -144,16 +145,23 @@ static int __init parse_dom0_iommu_param(const char *s)
>      int rc = 0;
>  
>      do {
> +        bool val = !!strncmp(s, "no-", 3);

Oh, you do a literal comparison against "no-". Please don't, that's what
we have parse_boolean() for.

> @@ -202,6 +210,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
>      if ( !iommu_enabled )
>          return;
>  
> +    if ( iommu_dom0_inclusive == true && !is_pv_domain(d) )

Why the "== true"? It shouldn't have its initializer value of -1 anymore
at this point.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization
  2018-08-08 10:07 ` [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization Roger Pau Monne
@ 2018-08-08 12:35   ` Jan Beulich
  2018-08-10 10:04     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-08 12:35 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> So it's done before the iommu is initialized. This is required in
> order to be able to fetch the MMCFG regions from the domain struct.

Is this a useful change to make? Regions not reported through the MCFG
table will need punching holes anyway, so why not punch holes uniformly
in all cases, allowing the hole punching code to be tested even on systems
without non-boot-time-available regions?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
  2018-08-08 10:07 ` [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges Roger Pau Monne
@ 2018-08-08 13:15   ` Jan Beulich
  2018-08-08 16:18     ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-08 13:15 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> Several people have reported hardware issues (malfunctioning USB
> controllers) due to iommu page faults on Intel hardware. Those faults
> are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
> be worked around on VTd hardware by manually adding RMRR entries on
> the command line, this is however limited to Intel hardware and quite
> cumbersome to do.
> 
> In order to solve those issues add a new dom0-iommu=reserved option
> that identity maps all regions marked as reserved in the memory map.

Considering that report we've had (yesterday? earlier today?), don't
we need to go further and use the union of RMRRs and reserved
regions? Iirc they had a case where an RMRR was not in a reserved
region ...

> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses 
> to that port.
>  >> Enable IOMMU debugging code (implies `verbose`).
>  
>  ### dom0-iommu
> -> `= List of [ none | strict | relaxed | inclusive ]`
> +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
>  
>  * `none`: disables DMA remapping for Dom0.
>  
> @@ -1233,6 +1233,15 @@ meaning:
>    option is only applicable to a PV Dom0 and is enabled by default on Intel
>    hardware.
>  
> +* `reserved`: sets up DMA remapping for all the reserved regions in the  memory
> +  map for Dom0. Use this to work around firmware issues providing incorrect
> +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
> +  for Dom0, all memory regions marked as reserved in the memory map that don't
> +  overlap with any MMIO region from emulated devices will be identity mapped.
> +  This option maps a subset of the memory that would be mapped when using the
> +  `inclusive` option. This option is available to a PVH Dom0 and is enabled by
> +  default on Intel hardware.

The sub-options so far were quite clear in their meanings, but
"dom0-iommu=reserved" might mean all sorts of things to me, but quite
certainly not "map all reserved regions". "map-reserved" perhaps?

> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const struct domain *d,
>      return NULL;
>  }
>  
> +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
> +{
> +    return vpci_mmcfg_find(d, addr);
> +}

I think the function name should have an "is_" somewhere, to make
clear address is a noun here and not a verb.

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
>      /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
>      iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
>                                                        : iommu_dom0_inclusive;
> +    /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> +    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
> +                                                    : iommu_dom0_reserved;

Especially seeing these two together now, I think an if() each would
be easier to read (not the least because of being shorter). To me,
the main purpose of the conditional operator is to allow shortening
simple if/else pairs, rather than lengthening simple if()-s.

> @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
>  {
>  }
>  
> +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned long pfn,
> +                                         unsigned long max_pfn)
> +{
> +    unsigned int i;
> +
> +    /*
> +     * Ignore any address below 1MB, that's already identity mapped by the
> +     * domain builder for HVM.
> +     */
> +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||

Careful again here with the distinction between Dom0 and hwdom:
The domain builder you refer to is - aiui - the in-Xen one, i.e. the
one _only_ dealing with Dom0.

> +    /*
> +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
> +     * conventional RAM and let the common code map dom0's pages.
> +     */
> +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> +         (iommu_dom0_strict || is_hvm_domain(d)) )
> +        return false;
> +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
> +        return false;
> +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> +         (!iommu_dom0_inclusive || pfn > max_pfn) )
> +        return false;

As page_is_ram_type() is, especially on systems with many E820
entries, not really cheap, I think at least a minimum amount of
optimization is on order here - after all you do this for every
single page of the system. Hence minimally the result of the first
CONVENTIONAL and RESERVED queries should be latched and
re-used (or "else" be used suitably). Ideally an approach would
be used which involved just a single iteration through the E820
map, but I realize this may be more than is feasible here.

Furthermore I'm unconvinced the !page_is_ram_type() uses
are really what you want: The function returning false means
"don't know", not "no". Perhaps the function needs to be made
return a tristate (yes, no, or part of it).

> +    /* Check that it doesn't overlap with the LAPIC */
> +    if ( has_vlapic(d) )
> +    {
> +        const struct vcpu *v;
> +
> +        for_each_vcpu(d, v)
> +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> +                return false;
> +    }

I don't, btw, recall any code adjusting the IOMMU mappings in
case the domain relocates its LAPIC. If you do the check above,
wouldn't that other side too need taking care of?

> +    /* ... or the IO-APIC */
> +    for ( i = 0; has_vioapic(d) && i < d->arch.hvm_domain.nr_vioapics; i++ )
> +        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
> +            return false;
> +    /*
> +     * ... or the PCIe MCFG regions.
> +     * TODO: runtime added MMCFG regions are not checked to make sure they
> +     * don't overlap with already mapped regions, thus preventing trapping.
> +     */
> +    if ( has_vpci(d) && vpci_mmcfg_address(d, pfn << PAGE_SHIFT) )

ITYM pfn_to_paddr() here.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-08 12:10   ` Jan Beulich
@ 2018-08-08 15:50     ` Roger Pau Monné
  2018-08-09  7:00       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-08 15:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

On Wed, Aug 08, 2018 at 06:10:39AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> > +Note that all the above options are mutually exclusive. Specifying more than
> > +one on the `dom0-iommu` command line will result in undefined behavior.
> 
> Isn't this more strict than it needs to be? "none", afaict, always takes
> precedence if enabled. What color a bike shed is simply doesn't matter
> when it doesn't exist.

Right, that's due to the current implementation and the way this is
stored, but I don't think we want to spell out any of this in order to
not give any guarantees. For example:

dom0-iommu=none,relaxed

Shouldn't be used, albeit with the current implementation relaxed will
be basically ignored I don't think we want to write this down
anywhere because people shouldn't rely on this behavior.

> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
> >      if ( ret )
> >          goto destroy_m2p;
> >  
> > -    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
> > +    if ( iommu_enabled && !iommu_dom0_passthrough &&
> > +         !need_iommu(hardware_domain) )
> 
> This makes already clear that you need to better distinguish Dom0 and
> hwdom here, but it's not immediately clear to me which direction the
> changes should be made: Do you mean truly only Dom0 throughout
> this patch, or hwdom? While the doc and command line option name can
> perhaps left as is, internal variable names should not say Dom0 when
> they don't mean Dom0. Otoh if you mean only Dom0, then the use of
> hardware_domain above (and elsewhere) is now wrong.

Hm, everything is kind of mixed here. Existing variables already use
_dom0_ (iommu_dom0_strict for example). I can rename them to
iommu_hwdom_, because AFAICT this applies to the hardware domain.

> > +static int __init parse_dom0_iommu_param(const char *s)
> > +{
> > +    const char *ss;
> > +    int rc = 0;
> > +
> > +    do {
> > +        ss = strchr(s, ',');
> > +        if ( !ss )
> > +            ss = strchr(s, '\0');
> > +
> > +        if ( !strncmp(s, "none", ss - s) )
> > +            iommu_dom0_passthrough = true;
> > +        else if ( !strncmp(s, "strict", ss - s) )
> > +            iommu_dom0_strict = true;
> > +        else if ( !strncmp(s, "relaxed", ss - s) )
> > +            iommu_dom0_strict = false;
> 
> Perhaps better just have one of the two, and make them truly
> boolean? Or would that conflict with further patches / plans?

I don't think this will cause a lot of conflicts, some rebasing
issues but no big deal. I've used this syntax as discussed
in a previous version and agreed with Paul and Kevin. I'm OK with
this, and I think it's clear, but I don't have a strong opinion so if
you think this is not clear I can change it.

I would just like to reach a consensus on the nomenclature of the
option ASAP, so the bikeshed can be as small as possible :).

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu
  2018-08-08 12:32   ` Jan Beulich
@ 2018-08-08 16:09     ` Roger Pau Monné
  2018-08-09  7:17       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-08 16:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Aug 08, 2018 at 06:32:00AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> > Introduce a new dom0-iommu=inclusive generic option that supersedes
> > iommu_inclusive_mapping. The previous behaviour is preserved and the
> > option should only be enabled by default on Intel hardware.
> 
> Why "should" instead of "is"?
> 
> > @@ -1221,6 +1221,18 @@ PV Dom0:
> >  Note that all the above options are mutually exclusive. Specifying more than
> >  one on the `dom0-iommu` command line will result in undefined behavior.
> >  
> > +The following options control whether non-RAM regions are added to the Dom0
> > +iommu tables. Note that they can be prefixed with `no-` to effect the inverse
> > +meaning:
> 
> I'm not particularly happy about the mentioning of "no-" here: Why is
> this better than the also permitted "=0" etc suffixes? Keep it generic,
> just like other options do.

Oh, I've just copied this text from the iommu option. Should I change
this to:

"The following boolean options control whether non-RAM regions are
added to the Dom0 iommu tables:"

> > +* `inclusive`: sets up DMA remapping for all the non-RAM memory below 4GB
> > +  except for unusable ranges. Use this to work around firmware issues providing
> > +  incorrect RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU
> > +  accesses for Dom0, with this option all pages up to 4GB, not marked as
> > +  unusable in the E820 table, will get a mapping established. Note that this
> > +  option is only applicable to a PV Dom0 and is enabled by default on Intel
> > +  hardware.
> 
> No word at all about the interaction with none/strict/relaxed? I think,
> as mentioned for patch 1, "none" renders this option meaningless as
> well. But for "relaxed" it's pretty unclear, because from E820 alone
> you can't judge whether e.g. a reserved region is RAM or MMIO. (As
> an implication, the mentioning of RAM in patch 1's doc for "relaxed"
> then looks symmetrically wrong, just like I've already asked to replace
> "memory" by "RAM" for "strict".)

Hm, I'm not sure I follow. 'relaxed' refers to how the regions marked
as RAM in the memory map (E820_RAM) will be mapped to the guest.

The inclusive option OTOH refers to if/how non-RAM regions in the
memory map will be mapped to the guest. It doesn't matter if a
reserved region (E820_RESERVED) is actually backed by RAM or MMIO, it
will be mapped to the guest because it's a reserved region on the
memory map.

> > --- a/xen/drivers/passthrough/arm/iommu.c
> > +++ b/xen/drivers/passthrough/arm/iommu.c
> > @@ -73,3 +73,7 @@ int arch_iommu_populate_page_table(struct domain *d)
> >      /* The IOMMU shares the p2m with the CPU */
> >      return -ENOSYS;
> >  }
> > +
> > +void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> > +{
> > +}
> 
> The option being in common code, I think you also need to set it for
> ARM, so it won't remain at its default of -1.
> 
> > @@ -144,16 +145,23 @@ static int __init parse_dom0_iommu_param(const char *s)
> >      int rc = 0;
> >  
> >      do {
> > +        bool val = !!strncmp(s, "no-", 3);
> 
> Oh, you do a literal comparison against "no-". Please don't, that's what
> we have parse_boolean() for.

Thanks, I will fix it. I've mostly cloned what the iommu option
currently does.

> > @@ -202,6 +210,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
> >      if ( !iommu_enabled )
> >          return;
> >  
> > +    if ( iommu_dom0_inclusive == true && !is_pv_domain(d) )
> 
> Why the "== true"? It shouldn't have its initializer value of -1 anymore
> at this point.

It can have a value of -1, AFAICT the default values are set by
hd->platform_ops->hwdom_init which is called later in this function.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
  2018-08-08 13:15   ` Jan Beulich
@ 2018-08-08 16:18     ` Roger Pau Monné
  2018-08-09  7:36       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-08 16:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> > Several people have reported hardware issues (malfunctioning USB
> > controllers) due to iommu page faults on Intel hardware. Those faults
> > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
> > be worked around on VTd hardware by manually adding RMRR entries on
> > the command line, this is however limited to Intel hardware and quite
> > cumbersome to do.
> > 
> > In order to solve those issues add a new dom0-iommu=reserved option
> > that identity maps all regions marked as reserved in the memory map.
> 
> Considering that report we've had (yesterday? earlier today?), don't
> we need to go further and use the union of RMRRs and reserved
> regions? Iirc they had a case where an RMRR was not in a reserved
> region ...

AFAICT (and I could be reading the code wrong) RMRR regions not on
reserved regions are still correctly mapped to the guest.

> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses 
> > to that port.
> >  >> Enable IOMMU debugging code (implies `verbose`).
> >  
> >  ### dom0-iommu
> > -> `= List of [ none | strict | relaxed | inclusive ]`
> > +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
> >  
> >  * `none`: disables DMA remapping for Dom0.
> >  
> > @@ -1233,6 +1233,15 @@ meaning:
> >    option is only applicable to a PV Dom0 and is enabled by default on Intel
> >    hardware.
> >  
> > +* `reserved`: sets up DMA remapping for all the reserved regions in the  memory
> > +  map for Dom0. Use this to work around firmware issues providing incorrect
> > +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
> > +  for Dom0, all memory regions marked as reserved in the memory map that don't
> > +  overlap with any MMIO region from emulated devices will be identity mapped.
> > +  This option maps a subset of the memory that would be mapped when using the
> > +  `inclusive` option. This option is available to a PVH Dom0 and is enabled by
> > +  default on Intel hardware.
> 
> The sub-options so far were quite clear in their meanings, but
> "dom0-iommu=reserved" might mean all sorts of things to me, but quite
> certainly not "map all reserved regions". "map-reserved" perhaps?

Then we should also have 'map-inclusive' for symmetry IMO.

> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const struct domain *d,
> >      return NULL;
> >  }
> >  
> > +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
> > +{
> > +    return vpci_mmcfg_find(d, addr);
> > +}
> 
> I think the function name should have an "is_" somewhere, to make
> clear address is a noun here and not a verb.
> 
> > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> > @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
> >      /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
> >      iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
> >                                                        : iommu_dom0_inclusive;
> > +    /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> > +    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
> > +                                                    : iommu_dom0_reserved;
> 
> Especially seeing these two together now, I think an if() each would
> be easier to read (not the least because of being shorter). To me,
> the main purpose of the conditional operator is to allow shortening
> simple if/else pairs, rather than lengthening simple if()-s.
> 
> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
> >  {
> >  }
> >  
> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned long pfn,
> > +                                         unsigned long max_pfn)
> > +{
> > +    unsigned int i;
> > +
> > +    /*
> > +     * Ignore any address below 1MB, that's already identity mapped by the
> > +     * domain builder for HVM.
> > +     */
> > +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
> 
> Careful again here with the distinction between Dom0 and hwdom:
> The domain builder you refer to is - aiui - the in-Xen one, i.e. the
> one _only_ dealing with Dom0.

So this should instead be:

if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||

> > +    /*
> > +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
> > +     * conventional RAM and let the common code map dom0's pages.
> > +     */
> > +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> > +         (iommu_dom0_strict || is_hvm_domain(d)) )
> > +        return false;
> > +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> > +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
> > +        return false;
> > +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> > +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> > +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> > +         (!iommu_dom0_inclusive || pfn > max_pfn) )
> > +        return false;
> 
> As page_is_ram_type() is, especially on systems with many E820
> entries, not really cheap, I think at least a minimum amount of
> optimization is on order here - after all you do this for every
> single page of the system. Hence minimally the result of the first
> CONVENTIONAL and RESERVED queries should be latched and
> re-used (or "else" be used suitably). Ideally an approach would
> be used which involved just a single iteration through the E820
> map, but I realize this may be more than is feasible here.

This would be quite better if I could just fetch the type, then I
could add a switch (type) { ... and it would be better IMO.

> Furthermore I'm unconvinced the !page_is_ram_type() uses
> are really what you want: The function returning false means
> "don't know", not "no". Perhaps the function needs to be made
> return a tristate (yes, no, or part of it).

Right, that's why I have three different !page_is_ram_type checks in
the last branch of the if, so that I can make sure it's not one of the
previous types and also account for holes.

> > +    /* Check that it doesn't overlap with the LAPIC */
> > +    if ( has_vlapic(d) )
> > +    {
> > +        const struct vcpu *v;
> > +
> > +        for_each_vcpu(d, v)
> > +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> > +                return false;
> > +    }
> 
> I don't, btw, recall any code adjusting the IOMMU mappings in
> case the domain relocates its LAPIC. If you do the check above,
> wouldn't that other side too need taking care of?

Yes. I can add something later but this is already an issue if the
guest for example relocates the LAPIC over a RAM region.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-08 15:50     ` Roger Pau Monné
@ 2018-08-09  7:00       ` Jan Beulich
  2018-08-09 10:01         ` Roger Pau Monné
  2018-08-09 10:18         ` Paul Durrant
  0 siblings, 2 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-09  7:00 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

>>> On 08.08.18 at 17:50, <roger.pau@citrix.com> wrote:
> On Wed, Aug 08, 2018 at 06:10:39AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
>> > +Note that all the above options are mutually exclusive. Specifying more than
>> > +one on the `dom0-iommu` command line will result in undefined behavior.
>> 
>> Isn't this more strict than it needs to be? "none", afaict, always takes
>> precedence if enabled. What color a bike shed is simply doesn't matter
>> when it doesn't exist.
> 
> Right, that's due to the current implementation and the way this is
> stored, but I don't think we want to spell out any of this in order to
> not give any guarantees. For example:
> 
> dom0-iommu=none,relaxed
> 
> Shouldn't be used, albeit with the current implementation relaxed will
> be basically ignored I don't think we want to write this down
> anywhere because people shouldn't rely on this behavior.

Well, there's one very particular case to be considered: In a number
of environments you can (easily) append to the command line, but
you can't (easily) alter what has been put there e.g. in some config
file. If the config file says "dom0-iommu=relaxed" but for the current
run you want "dom0-iommu=none", with your restrictions you'd be
unable to (legitimately) do so.

Therefore I think we should try to avoid spelling out undefined
behavior for command line option combinations wherever we can.

>> > --- a/xen/arch/x86/x86_64/mm.c
>> > +++ b/xen/arch/x86/x86_64/mm.c
>> > @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>> >      if ( ret )
>> >          goto destroy_m2p;
>> >  
>> > -    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
>> > +    if ( iommu_enabled && !iommu_dom0_passthrough &&
>> > +         !need_iommu(hardware_domain) )
>> 
>> This makes already clear that you need to better distinguish Dom0 and
>> hwdom here, but it's not immediately clear to me which direction the
>> changes should be made: Do you mean truly only Dom0 throughout
>> this patch, or hwdom? While the doc and command line option name can
>> perhaps left as is, internal variable names should not say Dom0 when
>> they don't mean Dom0. Otoh if you mean only Dom0, then the use of
>> hardware_domain above (and elsewhere) is now wrong.
> 
> Hm, everything is kind of mixed here. Existing variables already use
> _dom0_ (iommu_dom0_strict for example). I can rename them to
> iommu_hwdom_, because AFAICT this applies to the hardware domain.

Well, as said - I'd like you to do so for ones you rename anyway.
I'd appreciate (but won't demand) you to also do so for others.

>> > +static int __init parse_dom0_iommu_param(const char *s)
>> > +{
>> > +    const char *ss;
>> > +    int rc = 0;
>> > +
>> > +    do {
>> > +        ss = strchr(s, ',');
>> > +        if ( !ss )
>> > +            ss = strchr(s, '\0');
>> > +
>> > +        if ( !strncmp(s, "none", ss - s) )
>> > +            iommu_dom0_passthrough = true;
>> > +        else if ( !strncmp(s, "strict", ss - s) )
>> > +            iommu_dom0_strict = true;
>> > +        else if ( !strncmp(s, "relaxed", ss - s) )
>> > +            iommu_dom0_strict = false;
>> 
>> Perhaps better just have one of the two, and make them truly
>> boolean? Or would that conflict with further patches / plans?
> 
> I don't think this will cause a lot of conflicts, some rebasing
> issues but no big deal. I've used this syntax as discussed
> in a previous version and agreed with Paul and Kevin. I'm OK with
> this, and I think it's clear, but I don't have a strong opinion so if
> you think this is not clear I can change it.

Well, I'm certainly of the pretty strong opinion that inverse
options should be specifiable by a boolean mechanism, not
(only) by entirely distinct names. I wouldn't mind you retaining
both "relaxed" and "strict", as long as "relaxed=0" means
"strict" and vice versa. Paul, Kevin?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu
  2018-08-08 16:09     ` Roger Pau Monné
@ 2018-08-09  7:17       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-09  7:17 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 08.08.18 at 18:09, <roger.pau@citrix.com> wrote:
> On Wed, Aug 08, 2018 at 06:32:00AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
>> > Introduce a new dom0-iommu=inclusive generic option that supersedes
>> > iommu_inclusive_mapping. The previous behaviour is preserved and the
>> > option should only be enabled by default on Intel hardware.
>> 
>> Why "should" instead of "is"?
>> 
>> > @@ -1221,6 +1221,18 @@ PV Dom0:
>> >  Note that all the above options are mutually exclusive. Specifying more than
>> >  one on the `dom0-iommu` command line will result in undefined behavior.
>> >  
>> > +The following options control whether non-RAM regions are added to the Dom0
>> > +iommu tables. Note that they can be prefixed with `no-` to effect the inverse
>> > +meaning:
>> 
>> I'm not particularly happy about the mentioning of "no-" here: Why is
>> this better than the also permitted "=0" etc suffixes? Keep it generic,
>> just like other options do.
> 
> Oh, I've just copied this text from the iommu option. Should I change
> this to:
> 
> "The following boolean options control whether non-RAM regions are
> added to the Dom0 iommu tables:"

Yes please, much better.

>> > +* `inclusive`: sets up DMA remapping for all the non-RAM memory below 4GB
>> > +  except for unusable ranges. Use this to work around firmware issues providing
>> > +  incorrect RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU
>> > +  accesses for Dom0, with this option all pages up to 4GB, not marked as
>> > +  unusable in the E820 table, will get a mapping established. Note that this
>> > +  option is only applicable to a PV Dom0 and is enabled by default on Intel
>> > +  hardware.
>> 
>> No word at all about the interaction with none/strict/relaxed? I think,
>> as mentioned for patch 1, "none" renders this option meaningless as
>> well. But for "relaxed" it's pretty unclear, because from E820 alone
>> you can't judge whether e.g. a reserved region is RAM or MMIO. (As
>> an implication, the mentioning of RAM in patch 1's doc for "relaxed"
>> then looks symmetrically wrong, just like I've already asked to replace
>> "memory" by "RAM" for "strict".)
> 
> Hm, I'm not sure I follow. 'relaxed' refers to how the regions marked
> as RAM in the memory map (E820_RAM) will be mapped to the guest.

Hmm, it may well be that I'm (once again) confused with the
various options and their precise effects; perhaps I didn't pay
(enough) attention to the use of iommu_inclusive_mapping in
vtd_set_hwdom_mapping().

>> > @@ -202,6 +210,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
>> >      if ( !iommu_enabled )
>> >          return;
>> >  
>> > +    if ( iommu_dom0_inclusive == true && !is_pv_domain(d) )
>> 
>> Why the "== true"? It shouldn't have its initializer value of -1 anymore
>> at this point.
> 
> It can have a value of -1, AFAICT the default values are set by
> hd->platform_ops->hwdom_init which is called later in this function.

Hmm, I see. But that's not very nice. Can't you move this check
between the ->hwdom_init() hook invocation and the call to
arch_iommu_hwdom_init() that you add? The latter is the only
place where it's actually needed.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
  2018-08-08 16:18     ` Roger Pau Monné
@ 2018-08-09  7:36       ` Jan Beulich
  2018-08-09 10:23         ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-09  7:36 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 08.08.18 at 18:18, <roger.pau@citrix.com> wrote:
> On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
>> > Several people have reported hardware issues (malfunctioning USB
>> > controllers) due to iommu page faults on Intel hardware. Those faults
>> > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
>> > be worked around on VTd hardware by manually adding RMRR entries on
>> > the command line, this is however limited to Intel hardware and quite
>> > cumbersome to do.
>> > 
>> > In order to solve those issues add a new dom0-iommu=reserved option
>> > that identity maps all regions marked as reserved in the memory map.
>> 
>> Considering that report we've had (yesterday? earlier today?), don't
>> we need to go further and use the union of RMRRs and reserved
>> regions? Iirc they had a case where an RMRR was not in a reserved
>> region ...
> 
> AFAICT (and I could be reading the code wrong) RMRR regions not on
> reserved regions are still correctly mapped to the guest.

Oh, yes, you're right - the logged message is really just that,
with no other enforced effects.

>> > --- a/docs/misc/xen-command-line.markdown
>> > +++ b/docs/misc/xen-command-line.markdown
>> > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses 
>> > to that port.
>> >  >> Enable IOMMU debugging code (implies `verbose`).
>> >  
>> >  ### dom0-iommu
>> > -> `= List of [ none | strict | relaxed | inclusive ]`
>> > +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
>> >  
>> >  * `none`: disables DMA remapping for Dom0.
>> >  
>> > @@ -1233,6 +1233,15 @@ meaning:
>> >    option is only applicable to a PV Dom0 and is enabled by default on Intel
>> >    hardware.
>> >  
>> > +* `reserved`: sets up DMA remapping for all the reserved regions in the  memory
>> > +  map for Dom0. Use this to work around firmware issues providing incorrect
>> > +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
>> > +  for Dom0, all memory regions marked as reserved in the memory map that don't
>> > +  overlap with any MMIO region from emulated devices will be identity mapped.
>> > +  This option maps a subset of the memory that would be mapped when using the
>> > +  `inclusive` option. This option is available to a PVH Dom0 and is enabled by
>> > +  default on Intel hardware.
>> 
>> The sub-options so far were quite clear in their meanings, but
>> "dom0-iommu=reserved" might mean all sorts of things to me, but quite
>> certainly not "map all reserved regions". "map-reserved" perhaps?
> 
> Then we should also have 'map-inclusive' for symmetry IMO.

I don't mind at all such symmetry to be established.

>> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
>> >  {
>> >  }
>> >  
>> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned long pfn,
>> > +                                         unsigned long max_pfn)
>> > +{
>> > +    unsigned int i;
>> > +
>> > +    /*
>> > +     * Ignore any address below 1MB, that's already identity mapped by the
>> > +     * domain builder for HVM.
>> > +     */
>> > +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
>> 
>> Careful again here with the distinction between Dom0 and hwdom:
>> The domain builder you refer to is - aiui - the in-Xen one, i.e. the
>> one _only_ dealing with Dom0.
> 
> So this should instead be:
> 
> if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||

From an abstract perspective (long term plans) is_control_domain()
can be true for multiple domains, none of which may be Dom0 or
hwdom. So no, I don't think the addition would help in any way.
With the reference to the in-Xen domain builder, I think you really
mean Dom0 here.

>> > +    /*
>> > +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
>> > +     * conventional RAM and let the common code map dom0's pages.
>> > +     */
>> > +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
>> > +         (iommu_dom0_strict || is_hvm_domain(d)) )
>> > +        return false;
>> > +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
>> > +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
>> > +        return false;
>> > +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
>> > +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
>> > +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
>> > +         (!iommu_dom0_inclusive || pfn > max_pfn) )
>> > +        return false;
>> 
>> As page_is_ram_type() is, especially on systems with many E820
>> entries, not really cheap, I think at least a minimum amount of
>> optimization is on order here - after all you do this for every
>> single page of the system. Hence minimally the result of the first
>> CONVENTIONAL and RESERVED queries should be latched and
>> re-used (or "else" be used suitably). Ideally an approach would
>> be used which involved just a single iteration through the E820
>> map, but I realize this may be more than is feasible here.
> 
> This would be quite better if I could just fetch the type, then I
> could add a switch (type) { ... and it would be better IMO.

Except that, as said, there's no "the" type for an entire page.
Only a single byte can have an exact type.

>> Furthermore I'm unconvinced the !page_is_ram_type() uses
>> are really what you want: The function returning false means
>> "don't know", not "no". Perhaps the function needs to be made
>> return a tristate (yes, no, or part of it).
> 
> Right, that's why I have three different !page_is_ram_type checks in
> the last branch of the if, so that I can make sure it's not one of the
> previous types and also account for holes.

I'm afraid I don't understand. Take the example of a single
page being split in an unusable and a reserved part. Both
respective function invocations will return false. Yet you
want to exclude both unusable and reserved types when
!iommu_dom0_inclusive, and hence your goal would be to
exclude that page here.

As to unusable - don't you break original behavior here
anyway? Shouldn't the function return false when
page_is_ram_type(pfn, RAM_TYPE_UNUSABLE), effectively
independent of any command line options?

>> > +    /* Check that it doesn't overlap with the LAPIC */
>> > +    if ( has_vlapic(d) )
>> > +    {
>> > +        const struct vcpu *v;
>> > +
>> > +        for_each_vcpu(d, v)
>> > +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
>> > +                return false;
>> > +    }
>> 
>> I don't, btw, recall any code adjusting the IOMMU mappings in
>> case the domain relocates its LAPIC. If you do the check above,
>> wouldn't that other side too need taking care of?
> 
> Yes. I can add something later but this is already an issue if the
> guest for example relocates the LAPIC over a RAM region.

Well, I'm fine leaving out parts that are currently broken anyway,
but please leave at least a TODO note just like you do for MCFG.
Or (for both) don't do anything here until the situation gets taken
care of uniformly (but presumably that's the worse option).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-09  7:00       ` Jan Beulich
@ 2018-08-09 10:01         ` Roger Pau Monné
  2018-08-09 10:29           ` Jan Beulich
  2018-08-09 10:18         ` Paul Durrant
  1 sibling, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-09 10:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

On Thu, Aug 09, 2018 at 01:00:59AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 17:50, <roger.pau@citrix.com> wrote:
> > On Wed, Aug 08, 2018 at 06:10:39AM -0600, Jan Beulich wrote:
> >> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> >> > +Note that all the above options are mutually exclusive. Specifying more than
> >> > +one on the `dom0-iommu` command line will result in undefined behavior.
> >> 
> >> Isn't this more strict than it needs to be? "none", afaict, always takes
> >> precedence if enabled. What color a bike shed is simply doesn't matter
> >> when it doesn't exist.
> > 
> > Right, that's due to the current implementation and the way this is
> > stored, but I don't think we want to spell out any of this in order to
> > not give any guarantees. For example:
> > 
> > dom0-iommu=none,relaxed
> > 
> > Shouldn't be used, albeit with the current implementation relaxed will
> > be basically ignored I don't think we want to write this down
> > anywhere because people shouldn't rely on this behavior.
> 
> Well, there's one very particular case to be considered: In a number
> of environments you can (easily) append to the command line, but
> you can't (easily) alter what has been put there e.g. in some config
> file. If the config file says "dom0-iommu=relaxed" but for the current
> run you want "dom0-iommu=none", with your restrictions you'd be
> unable to (legitimately) do so.
> 
> Therefore I think we should try to avoid spelling out undefined
> behavior for command line option combinations wherever we can.

I'm fine with just having:

"Note that all the above options are mutually exclusive."

Leaving out the undefined behavior part.

Note that your example won't work as expected the other way around, if
you have dom0-iommu=none in the config and try to append
dom0-iommu=relaxed.

> >> > --- a/xen/arch/x86/x86_64/mm.c
> >> > +++ b/xen/arch/x86/x86_64/mm.c
> >> > @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
> >> >      if ( ret )
> >> >          goto destroy_m2p;
> >> >  
> >> > -    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
> >> > +    if ( iommu_enabled && !iommu_dom0_passthrough &&
> >> > +         !need_iommu(hardware_domain) )
> >> 
> >> This makes already clear that you need to better distinguish Dom0 and
> >> hwdom here, but it's not immediately clear to me which direction the
> >> changes should be made: Do you mean truly only Dom0 throughout
> >> this patch, or hwdom? While the doc and command line option name can
> >> perhaps left as is, internal variable names should not say Dom0 when
> >> they don't mean Dom0. Otoh if you mean only Dom0, then the use of
> >> hardware_domain above (and elsewhere) is now wrong.
> > 
> > Hm, everything is kind of mixed here. Existing variables already use
> > _dom0_ (iommu_dom0_strict for example). I can rename them to
> > iommu_hwdom_, because AFAICT this applies to the hardware domain.
> 
> Well, as said - I'd like you to do so for ones you rename anyway.
> I'd appreciate (but won't demand) you to also do so for others.

In fact I think this would be clearer if something like:

enum {
    NONE,
    RELAXED,
    STRICT,
} iommu_hwdom = RELAXED;

Was used instead of iommu_dom0_passthrough and iommu_dom0_strict.

> >> > +static int __init parse_dom0_iommu_param(const char *s)
> >> > +{
> >> > +    const char *ss;
> >> > +    int rc = 0;
> >> > +
> >> > +    do {
> >> > +        ss = strchr(s, ',');
> >> > +        if ( !ss )
> >> > +            ss = strchr(s, '\0');
> >> > +
> >> > +        if ( !strncmp(s, "none", ss - s) )
> >> > +            iommu_dom0_passthrough = true;
> >> > +        else if ( !strncmp(s, "strict", ss - s) )
> >> > +            iommu_dom0_strict = true;
> >> > +        else if ( !strncmp(s, "relaxed", ss - s) )
> >> > +            iommu_dom0_strict = false;
> >> 
> >> Perhaps better just have one of the two, and make them truly
> >> boolean? Or would that conflict with further patches / plans?
> > 
> > I don't think this will cause a lot of conflicts, some rebasing
> > issues but no big deal. I've used this syntax as discussed
> > in a previous version and agreed with Paul and Kevin. I'm OK with
> > this, and I think it's clear, but I don't have a strong opinion so if
> > you think this is not clear I can change it.
> 
> Well, I'm certainly of the pretty strong opinion that inverse
> options should be specifiable by a boolean mechanism, not
> (only) by entirely distinct names. I wouldn't mind you retaining
> both "relaxed" and "strict", as long as "relaxed=0" means
> "strict" and vice versa. Paul, Kevin?

The issue I see with this is that no-none or none=0 doesn't make much
sense as pointed out by Kevin in a previous review. Also adding
support for relaxed=0 or strict=0 just makes the syntax more
complicated IMO, because:

none=0 -> relaxed?
strict=0 -> relaxed?
relaxed=0 -> strict?

We would have to spell out all this combinations in the command line
document.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-09  7:00       ` Jan Beulich
  2018-08-09 10:01         ` Roger Pau Monné
@ 2018-08-09 10:18         ` Paul Durrant
  1 sibling, 0 replies; 24+ messages in thread
From: Paul Durrant @ 2018-08-09 10:18 UTC (permalink / raw)
  To: 'Jan Beulich', Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Tim (Xen.org),
	George Dunlap, Julien Grall, Suravee Suthikulpanit, xen-devel,
	Ian Jackson, Brian Woods

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Jan Beulich
> Sent: 09 August 2018 08:01
> To: Roger Pau Monne <roger.pau@citrix.com>
> Cc: Kevin Tian <kevin.tian@intel.com>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wei.liu2@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Tim
> (Xen.org) <tim@xen.org>; Julien Grall <julien.grall@arm.com>; Suravee
> Suthikulpanit <suravee.suthikulpanit@amd.com>; xen-devel <xen-
> devel@lists.xenproject.org>; Brian Woods <brian.woods@amd.com>
> Subject: Re: [Xen-devel] [PATCH v4 1/4] iommu: introduce dom0-iommu
> option
> 
> >>> On 08.08.18 at 17:50, <roger.pau@citrix.com> wrote:
> > On Wed, Aug 08, 2018 at 06:10:39AM -0600, Jan Beulich wrote:
> >> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> >> > +Note that all the above options are mutually exclusive. Specifying more
> than
> >> > +one on the `dom0-iommu` command line will result in undefined
> behavior.
> >>
> >> Isn't this more strict than it needs to be? "none", afaict, always takes
> >> precedence if enabled. What color a bike shed is simply doesn't matter
> >> when it doesn't exist.
> >
> > Right, that's due to the current implementation and the way this is
> > stored, but I don't think we want to spell out any of this in order to
> > not give any guarantees. For example:
> >
> > dom0-iommu=none,relaxed
> >
> > Shouldn't be used, albeit with the current implementation relaxed will
> > be basically ignored I don't think we want to write this down
> > anywhere because people shouldn't rely on this behavior.
> 
> Well, there's one very particular case to be considered: In a number
> of environments you can (easily) append to the command line, but
> you can't (easily) alter what has been put there e.g. in some config
> file. If the config file says "dom0-iommu=relaxed" but for the current
> run you want "dom0-iommu=none", with your restrictions you'd be
> unable to (legitimately) do so.
> 
> Therefore I think we should try to avoid spelling out undefined
> behavior for command line option combinations wherever we can.
> 
> >> > --- a/xen/arch/x86/x86_64/mm.c
> >> > +++ b/xen/arch/x86/x86_64/mm.c
> >> > @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned
> long epfn, unsigned int pxm)
> >> >      if ( ret )
> >> >          goto destroy_m2p;
> >> >
> >> > -    if ( iommu_enabled && !iommu_passthrough &&
> !need_iommu(hardware_domain) )
> >> > +    if ( iommu_enabled && !iommu_dom0_passthrough &&
> >> > +         !need_iommu(hardware_domain) )
> >>
> >> This makes already clear that you need to better distinguish Dom0 and
> >> hwdom here, but it's not immediately clear to me which direction the
> >> changes should be made: Do you mean truly only Dom0 throughout
> >> this patch, or hwdom? While the doc and command line option name can
> >> perhaps left as is, internal variable names should not say Dom0 when
> >> they don't mean Dom0. Otoh if you mean only Dom0, then the use of
> >> hardware_domain above (and elsewhere) is now wrong.
> >
> > Hm, everything is kind of mixed here. Existing variables already use
> > _dom0_ (iommu_dom0_strict for example). I can rename them to
> > iommu_hwdom_, because AFAICT this applies to the hardware domain.
> 
> Well, as said - I'd like you to do so for ones you rename anyway.
> I'd appreciate (but won't demand) you to also do so for others.
> 
> >> > +static int __init parse_dom0_iommu_param(const char *s)
> >> > +{
> >> > +    const char *ss;
> >> > +    int rc = 0;
> >> > +
> >> > +    do {
> >> > +        ss = strchr(s, ',');
> >> > +        if ( !ss )
> >> > +            ss = strchr(s, '\0');
> >> > +
> >> > +        if ( !strncmp(s, "none", ss - s) )
> >> > +            iommu_dom0_passthrough = true;
> >> > +        else if ( !strncmp(s, "strict", ss - s) )
> >> > +            iommu_dom0_strict = true;
> >> > +        else if ( !strncmp(s, "relaxed", ss - s) )
> >> > +            iommu_dom0_strict = false;
> >>
> >> Perhaps better just have one of the two, and make them truly
> >> boolean? Or would that conflict with further patches / plans?
> >
> > I don't think this will cause a lot of conflicts, some rebasing
> > issues but no big deal. I've used this syntax as discussed
> > in a previous version and agreed with Paul and Kevin. I'm OK with
> > this, and I think it's clear, but I don't have a strong opinion so if
> > you think this is not clear I can change it.
> 
> Well, I'm certainly of the pretty strong opinion that inverse
> options should be specifiable by a boolean mechanism, not
> (only) by entirely distinct names. I wouldn't mind you retaining
> both "relaxed" and "strict", as long as "relaxed=0" means
> "strict" and vice versa. Paul, Kevin?
> 

Well, the problem is that the options relating to RAM mappings are not boolean. I guess they could be made boolean if we went with a scheme that tried to call out specifically what areas of RAM are mapped. I think the areas in question are just 'guest-ram' and 'non-guest-ram'.

Relaxed -> guest-ram+non-guest-ram
Strict -> guest-ram
None -> neither

  Paul

> Jan
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
  2018-08-09  7:36       ` Jan Beulich
@ 2018-08-09 10:23         ` Roger Pau Monné
  2018-08-09 10:33           ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-09 10:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

On Thu, Aug 09, 2018 at 01:36:57AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 18:18, <roger.pau@citrix.com> wrote:
> > On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
> >> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> >> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
> >> >  {
> >> >  }
> >> >  
> >> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned long pfn,
> >> > +                                         unsigned long max_pfn)
> >> > +{
> >> > +    unsigned int i;
> >> > +
> >> > +    /*
> >> > +     * Ignore any address below 1MB, that's already identity mapped by the
> >> > +     * domain builder for HVM.
> >> > +     */
> >> > +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
> >> 
> >> Careful again here with the distinction between Dom0 and hwdom:
> >> The domain builder you refer to is - aiui - the in-Xen one, i.e. the
> >> one _only_ dealing with Dom0.
> > 
> > So this should instead be:
> > 
> > if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
> 
> From an abstract perspective (long term plans) is_control_domain()
> can be true for multiple domains, none of which may be Dom0 or
> hwdom. So no, I don't think the addition would help in any way.
> With the reference to the in-Xen domain builder, I think you really
> mean Dom0 here.

OK, I will check against the domid then.

> >> > +    /*
> >> > +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
> >> > +     * conventional RAM and let the common code map dom0's pages.
> >> > +     */
> >> > +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> >> > +         (iommu_dom0_strict || is_hvm_domain(d)) )
> >> > +        return false;
> >> > +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> >> > +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
> >> > +        return false;
> >> > +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> >> > +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> >> > +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> >> > +         (!iommu_dom0_inclusive || pfn > max_pfn) )
> >> > +        return false;
> >> 
> >> As page_is_ram_type() is, especially on systems with many E820
> >> entries, not really cheap, I think at least a minimum amount of
> >> optimization is on order here - after all you do this for every
> >> single page of the system. Hence minimally the result of the first
> >> CONVENTIONAL and RESERVED queries should be latched and
> >> re-used (or "else" be used suitably). Ideally an approach would
> >> be used which involved just a single iteration through the E820
> >> map, but I realize this may be more than is feasible here.
> > 
> > This would be quite better if I could just fetch the type, then I
> > could add a switch (type) { ... and it would be better IMO.
> 
> Except that, as said, there's no "the" type for an entire page.
> Only a single byte can have an exact type.

Right, I don0t think the original code was much better in that regard
anyway, neither I'm sure about how to handle this any better.

> >> Furthermore I'm unconvinced the !page_is_ram_type() uses
> >> are really what you want: The function returning false means
> >> "don't know", not "no". Perhaps the function needs to be made
> >> return a tristate (yes, no, or part of it).
> > 
> > Right, that's why I have three different !page_is_ram_type checks in
> > the last branch of the if, so that I can make sure it's not one of the
> > previous types and also account for holes.
> 
> I'm afraid I don't understand. Take the example of a single
> page being split in an unusable and a reserved part. Both
> respective function invocations will return false. Yet you
> want to exclude both unusable and reserved types when
> !iommu_dom0_inclusive, and hence your goal would be to
> exclude that page here.

Right, but I'm not sure how to fix this given the current interfaces.
I plan to introduce something like page_get_type() that returns the
whole type for a page in a similar fashion to what page_is_ram_type
currently does, but this is not going to solve the issue of a page
having different types.

> As to unusable - don't you break original behavior here
> anyway? Shouldn't the function return false when
> page_is_ram_type(pfn, RAM_TYPE_UNUSABLE), effectively
> independent of any command line options?

Yes, I think so, will fix it in the next version.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-09 10:01         ` Roger Pau Monné
@ 2018-08-09 10:29           ` Jan Beulich
  2018-08-09 10:51             ` Roger Pau Monné
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-09 10:29 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

>>> On 09.08.18 at 12:01, <roger.pau@citrix.com> wrote:
> On Thu, Aug 09, 2018 at 01:00:59AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 17:50, <roger.pau@citrix.com> wrote:
>> > On Wed, Aug 08, 2018 at 06:10:39AM -0600, Jan Beulich wrote:
>> >> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
>> >> > +Note that all the above options are mutually exclusive. Specifying more than
>> >> > +one on the `dom0-iommu` command line will result in undefined behavior.
>> >> 
>> >> Isn't this more strict than it needs to be? "none", afaict, always takes
>> >> precedence if enabled. What color a bike shed is simply doesn't matter
>> >> when it doesn't exist.
>> > 
>> > Right, that's due to the current implementation and the way this is
>> > stored, but I don't think we want to spell out any of this in order to
>> > not give any guarantees. For example:
>> > 
>> > dom0-iommu=none,relaxed
>> > 
>> > Shouldn't be used, albeit with the current implementation relaxed will
>> > be basically ignored I don't think we want to write this down
>> > anywhere because people shouldn't rely on this behavior.
>> 
>> Well, there's one very particular case to be considered: In a number
>> of environments you can (easily) append to the command line, but
>> you can't (easily) alter what has been put there e.g. in some config
>> file. If the config file says "dom0-iommu=relaxed" but for the current
>> run you want "dom0-iommu=none", with your restrictions you'd be
>> unable to (legitimately) do so.
>> 
>> Therefore I think we should try to avoid spelling out undefined
>> behavior for command line option combinations wherever we can.
> 
> I'm fine with just having:
> 
> "Note that all the above options are mutually exclusive."

But they aren't.

> Note that your example won't work as expected the other way around, if
> you have dom0-iommu=none in the config and try to append
> dom0-iommu=relaxed.

Indeed, which means there would need to be an opposite of
"none". I'm hesitant to suggest "no-none". Perhaps "strict"
and "relaxed" could also clear that other flag?

>> >> > --- a/xen/arch/x86/x86_64/mm.c
>> >> > +++ b/xen/arch/x86/x86_64/mm.c
>> >> > @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
>> >> >      if ( ret )
>> >> >          goto destroy_m2p;
>> >> >  
>> >> > -    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
>> >> > +    if ( iommu_enabled && !iommu_dom0_passthrough &&
>> >> > +         !need_iommu(hardware_domain) )
>> >> 
>> >> This makes already clear that you need to better distinguish Dom0 and
>> >> hwdom here, but it's not immediately clear to me which direction the
>> >> changes should be made: Do you mean truly only Dom0 throughout
>> >> this patch, or hwdom? While the doc and command line option name can
>> >> perhaps left as is, internal variable names should not say Dom0 when
>> >> they don't mean Dom0. Otoh if you mean only Dom0, then the use of
>> >> hardware_domain above (and elsewhere) is now wrong.
>> > 
>> > Hm, everything is kind of mixed here. Existing variables already use
>> > _dom0_ (iommu_dom0_strict for example). I can rename them to
>> > iommu_hwdom_, because AFAICT this applies to the hardware domain.
>> 
>> Well, as said - I'd like you to do so for ones you rename anyway.
>> I'd appreciate (but won't demand) you to also do so for others.
> 
> In fact I think this would be clearer if something like:
> 
> enum {
>     NONE,
>     RELAXED,
>     STRICT,
> } iommu_hwdom = RELAXED;
> 
> Was used instead of iommu_dom0_passthrough and iommu_dom0_strict.

Fine with me.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges
  2018-08-09 10:23         ` Roger Pau Monné
@ 2018-08-09 10:33           ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-09 10:33 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Ian Jackson, Tim Deegan, Julien Grall, xen-devel

>>> On 09.08.18 at 12:23, <roger.pau@citrix.com> wrote:
> On Thu, Aug 09, 2018 at 01:36:57AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 18:18, <roger.pau@citrix.com> wrote:
>> > On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
>> >> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
>> >> > +    /*
>> >> > +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
>> >> > +     * conventional RAM and let the common code map dom0's pages.
>> >> > +     */
>> >> > +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
>> >> > +         (iommu_dom0_strict || is_hvm_domain(d)) )
>> >> > +        return false;
>> >> > +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
>> >> > +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
>> >> > +        return false;
>> >> > +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
>> >> > +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
>> >> > +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
>> >> > +         (!iommu_dom0_inclusive || pfn > max_pfn) )
>> >> > +        return false;
>> >> 
>> >> As page_is_ram_type() is, especially on systems with many E820
>> >> entries, not really cheap, I think at least a minimum amount of
>> >> optimization is on order here - after all you do this for every
>> >> single page of the system. Hence minimally the result of the first
>> >> CONVENTIONAL and RESERVED queries should be latched and
>> >> re-used (or "else" be used suitably). Ideally an approach would
>> >> be used which involved just a single iteration through the E820
>> >> map, but I realize this may be more than is feasible here.
>> > 
>> > This would be quite better if I could just fetch the type, then I
>> > could add a switch (type) { ... and it would be better IMO.
>> 
>> Except that, as said, there's no "the" type for an entire page.
>> Only a single byte can have an exact type.
> 
> Right, I don0t think the original code was much better in that regard
> anyway, neither I'm sure about how to handle this any better.

And I'm not demanding that you improve it beyond what's there
at this point. But we need to be reasonably certain it doesn't
regress.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-09 10:29           ` Jan Beulich
@ 2018-08-09 10:51             ` Roger Pau Monné
  2018-08-09 11:46               ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-09 10:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

On Thu, Aug 09, 2018 at 04:29:51AM -0600, Jan Beulich wrote:
> >>> On 09.08.18 at 12:01, <roger.pau@citrix.com> wrote:
> > On Thu, Aug 09, 2018 at 01:00:59AM -0600, Jan Beulich wrote:
> >> >>> On 08.08.18 at 17:50, <roger.pau@citrix.com> wrote:
> >> > On Wed, Aug 08, 2018 at 06:10:39AM -0600, Jan Beulich wrote:
> >> >> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> >> >> > +Note that all the above options are mutually exclusive. Specifying more than
> >> >> > +one on the `dom0-iommu` command line will result in undefined behavior.
> >> >> 
> >> >> Isn't this more strict than it needs to be? "none", afaict, always takes
> >> >> precedence if enabled. What color a bike shed is simply doesn't matter
> >> >> when it doesn't exist.
> >> > 
> >> > Right, that's due to the current implementation and the way this is
> >> > stored, but I don't think we want to spell out any of this in order to
> >> > not give any guarantees. For example:
> >> > 
> >> > dom0-iommu=none,relaxed
> >> > 
> >> > Shouldn't be used, albeit with the current implementation relaxed will
> >> > be basically ignored I don't think we want to write this down
> >> > anywhere because people shouldn't rely on this behavior.
> >> 
> >> Well, there's one very particular case to be considered: In a number
> >> of environments you can (easily) append to the command line, but
> >> you can't (easily) alter what has been put there e.g. in some config
> >> file. If the config file says "dom0-iommu=relaxed" but for the current
> >> run you want "dom0-iommu=none", with your restrictions you'd be
> >> unable to (legitimately) do so.
> >> 
> >> Therefore I think we should try to avoid spelling out undefined
> >> behavior for command line option combinations wherever we can.
> > 
> > I'm fine with just having:
> > 
> > "Note that all the above options are mutually exclusive."
> 
> But they aren't.
> 
> > Note that your example won't work as expected the other way around, if
> > you have dom0-iommu=none in the config and try to append
> > dom0-iommu=relaxed.
> 
> Indeed, which means there would need to be an opposite of
> "none". I'm hesitant to suggest "no-none". Perhaps "strict"
> and "relaxed" could also clear that other flag?

See below.

> >> >> > --- a/xen/arch/x86/x86_64/mm.c
> >> >> > +++ b/xen/arch/x86/x86_64/mm.c
> >> >> > @@ -1426,7 +1426,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
> >> >> >      if ( ret )
> >> >> >          goto destroy_m2p;
> >> >> >  
> >> >> > -    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
> >> >> > +    if ( iommu_enabled && !iommu_dom0_passthrough &&
> >> >> > +         !need_iommu(hardware_domain) )
> >> >> 
> >> >> This makes already clear that you need to better distinguish Dom0 and
> >> >> hwdom here, but it's not immediately clear to me which direction the
> >> >> changes should be made: Do you mean truly only Dom0 throughout
> >> >> this patch, or hwdom? While the doc and command line option name can
> >> >> perhaps left as is, internal variable names should not say Dom0 when
> >> >> they don't mean Dom0. Otoh if you mean only Dom0, then the use of
> >> >> hardware_domain above (and elsewhere) is now wrong.
> >> > 
> >> > Hm, everything is kind of mixed here. Existing variables already use
> >> > _dom0_ (iommu_dom0_strict for example). I can rename them to
> >> > iommu_hwdom_, because AFAICT this applies to the hardware domain.
> >> 
> >> Well, as said - I'd like you to do so for ones you rename anyway.
> >> I'd appreciate (but won't demand) you to also do so for others.
> > 
> > In fact I think this would be clearer if something like:
> > 
> > enum {
> >     NONE,
> >     RELAXED,
> >     STRICT,
> > } iommu_hwdom = RELAXED;
> > 
> > Was used instead of iommu_dom0_passthrough and iommu_dom0_strict.
> 
> Fine with me.

Switching to a single enum also means that the last dom0-iommu = [
none | strict | relaxed ] will be the one enforced, so I could keep
them as-is, and add:

"Note that all the above options are mutually exclusive and the last
one found on the command line will be the one to take effect."

This will also deal with your request to be able to override previous
options on the command line while keeping the options as they are.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu: introduce dom0-iommu option
  2018-08-09 10:51             ` Roger Pau Monné
@ 2018-08-09 11:46               ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-09 11:46 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
	Suravee Suthikulpanit, xen-devel, Brian Woods

>>> On 09.08.18 at 12:51, <roger.pau@citrix.com> wrote:
> On Thu, Aug 09, 2018 at 04:29:51AM -0600, Jan Beulich wrote:
>> >>> On 09.08.18 at 12:01, <roger.pau@citrix.com> wrote:
>> > In fact I think this would be clearer if something like:
>> > 
>> > enum {
>> >     NONE,
>> >     RELAXED,
>> >     STRICT,
>> > } iommu_hwdom = RELAXED;
>> > 
>> > Was used instead of iommu_dom0_passthrough and iommu_dom0_strict.
>> 
>> Fine with me.
> 
> Switching to a single enum also means that the last dom0-iommu = [
> none | strict | relaxed ] will be the one enforced, so I could keep
> them as-is, and add:
> 
> "Note that all the above options are mutually exclusive and the last
> one found on the command line will be the one to take effect."

Right. I don't think the second half of the sentence is needed -
that's what happens for most other options as well.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization
  2018-08-08 12:35   ` Jan Beulich
@ 2018-08-10 10:04     ` Roger Pau Monné
  2018-08-10 11:41       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Roger Pau Monné @ 2018-08-10 10:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Wed, Aug 08, 2018 at 06:35:47AM -0600, Jan Beulich wrote:
> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
> > So it's done before the iommu is initialized. This is required in
> > order to be able to fetch the MMCFG regions from the domain struct.
> 
> Is this a useful change to make? Regions not reported through the MCFG
> table will need punching holes anyway, so why not punch holes uniformly
> in all cases, allowing the hole punching code to be tested even on systems
> without non-boot-time-available regions?

I can add this hole-punching code to register_vpci_mmcfg_handler so I
can remove this reordering.

I'm however struggling to find a function that will set a p2m range to
p2m_invalid. All the functions that I find to deal with this assume
that you know the memory type you are trying to remove (for example
clear_identity_p2m_entry) and fail if the current type is different
than expected.

In this case it's quite likely that the MMCFG region is not mapped to
anything, should I introduce a new helper that sets a p2m range to
p2m_invalid regardless of the current type?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization
  2018-08-10 10:04     ` Roger Pau Monné
@ 2018-08-10 11:41       ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-10 11:41 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 10.08.18 at 12:04, <roger.pau@citrix.com> wrote:
> On Wed, Aug 08, 2018 at 06:35:47AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 12:07, <roger.pau@citrix.com> wrote:
>> > So it's done before the iommu is initialized. This is required in
>> > order to be able to fetch the MMCFG regions from the domain struct.
>> 
>> Is this a useful change to make? Regions not reported through the MCFG
>> table will need punching holes anyway, so why not punch holes uniformly
>> in all cases, allowing the hole punching code to be tested even on systems
>> without non-boot-time-available regions?
> 
> I can add this hole-punching code to register_vpci_mmcfg_handler so I
> can remove this reordering.
> 
> I'm however struggling to find a function that will set a p2m range to
> p2m_invalid. All the functions that I find to deal with this assume
> that you know the memory type you are trying to remove (for example
> clear_identity_p2m_entry) and fail if the current type is different
> than expected.
> 
> In this case it's quite likely that the MMCFG region is not mapped to
> anything, should I introduce a new helper that sets a p2m range to
> p2m_invalid regardless of the current type?

Well, if there is none - why not? But really that's more like something to
get George to agree to.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-08-10 11:41 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-08 10:07 [PATCH v4 0/4] 86/iommu: PVH Dom0 workarounds for missing RMRR entries Roger Pau Monne
2018-08-08 10:07 ` [PATCH v4 1/4] iommu: introduce dom0-iommu option Roger Pau Monne
2018-08-08 12:10   ` Jan Beulich
2018-08-08 15:50     ` Roger Pau Monné
2018-08-09  7:00       ` Jan Beulich
2018-08-09 10:01         ` Roger Pau Monné
2018-08-09 10:29           ` Jan Beulich
2018-08-09 10:51             ` Roger Pau Monné
2018-08-09 11:46               ` Jan Beulich
2018-08-09 10:18         ` Paul Durrant
2018-08-08 10:07 ` [PATCH v4 2/4] iommu: make iommu_inclusive_mapping a suboption of dom0-iommu Roger Pau Monne
2018-08-08 12:32   ` Jan Beulich
2018-08-08 16:09     ` Roger Pau Monné
2018-08-09  7:17       ` Jan Beulich
2018-08-08 10:07 ` [PATCH v4 3/4] dom0/pvh: change the order of the MMCFG initialization Roger Pau Monne
2018-08-08 12:35   ` Jan Beulich
2018-08-10 10:04     ` Roger Pau Monné
2018-08-10 11:41       ` Jan Beulich
2018-08-08 10:07 ` [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges Roger Pau Monne
2018-08-08 13:15   ` Jan Beulich
2018-08-08 16:18     ` Roger Pau Monné
2018-08-09  7:36       ` Jan Beulich
2018-08-09 10:23         ` Roger Pau Monné
2018-08-09 10:33           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.