All of lore.kernel.org
 help / color / mirror / Atom feed
* [v3][PATCH 00/16] Fix RMRR
@ 2015-06-11  1:15 Tiejun Chen
  2015-06-11  1:15 ` [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
                   ` (17 more replies)
  0 siblings, 18 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

v3:

* Rearrange all patches orderly as Wei suggested
* Rebase on the latest tree
* Address some Wei's comments on tools side
* Two changes for runtime cycle
   patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side

  a>. Introduce paging_mode_translate()
  Otherwise, we'll see this error when boot Xen/Dom0

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
....
(XEN) Xen call trace:
(XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
(XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
(XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
(XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
(XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
(XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
(XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
(XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702

Note I don't copy all info since I think the above is enough.

  b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
  we're getting an invalid mfn.

* Add patch #16 to handle those devices which share same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, "rdm_mem_boundary", to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, "force/try" -> "strict/relaxed"

* Indeed, Wei and Jan gave me more and more comments to refine codes
  * Code style
  * Better and reasonable code implementation
  * Correct or improve code comments.

* A little bit to work well with ARM.

Open:

* We should fail assigning device which has a shared RMRR with
another device. We can only do group assignment when RMRR is shared
among devices.

We need more time to figure a good policy/way out because something
is not clear to me.

As you know all devices are owned by Dom0 firstly before we create any
DomU, right? Do we allow Dom0 still own a group device while assign another
device in the same group?

Really appreciate any comments to policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
setup those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance existing Xen RMRR implementation to fix
various reported and theoretical problems. Most noteworthy changes are
to setup identity mapping in p2m layer and handle possible conflicts between
reported regions and gfn space. Initial proposal can be found at:
    http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
    http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to agreed proposal:

1. Use RDM (Reserved Device Memory) name in user space as a general 
description instead of using ACPI RMRR name directly.

2. Introduce configuration parameters to allow user control both per-device 
and global RDM resources along with desired policies upon a detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle 
potential conflicts between reserved regions and gfn space. One simplification
goal is made to keep existing lowmem / mmio / highmem layout which is
passed around various function blocks. So a reasonable assumption
is made, that conflicts falling into below areas are not re-arranged otherwise
it will result in a more scattered layout:
    a) in highmem region (>4G)
    b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From VT-d spec this is 
possible but no such observation in real-world. So we can make this
reasonable assumption until there's real usage on it.

5. Extend XENMEM_set_memory_map usable for HVM guest, and then have
libxl to use that hypercall to carry RDM information to hvmloader. There
is one difference from original discussion. Previously we discussed to
introduce a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. Actually hvmloader
doesn't need to know whether the reserved entries come from RDM or
from other purposes. 

6. Then in hvmloader the change is generic for XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating all reserved entries for other usages (opregion, mmio, etc.)

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix that policy
  conflict between the per-pci policy and the global policy but firstly we think
  we'd better collect some good or correct ideas to step next in RFC. 

So here I made this as RFC to collect your any comments.

----------------------------------------------------------------
Jan Beulich (1):
      xen: introduce XENMEM_reserved_device_memory_map
 
Tiejun Chen (15):
      xen/x86/p2m: introduce set_identity_p2m_entry
      xen/vtd: create RMRR mapping
      xen/passthrough: extend hypercall to support rdm reservation policy
      xen: enable XENMEM_memory_map in hvm
      hvmloader: get guest memory map into memory_map[]
      hvmloader/pci: skip reserved ranges
      hvmloader/e820: construct guest e820 table
      tools/libxc: Expose new hypercall xc_reserved_device_memory_map
      tools: extend xc_assign_device() to support rdm reservation policy
      tools: introduce some new parameters to set rdm policy
      tools/libxl: passes rdm reservation policy
      tools/libxl: detect and avoid conflicts with RDM
      tools/libxl: extend XENMEM_set_memory_map
      xen/vtd: enable USB device assignment
      xen/vtd: prevent from assign the device with shared rmrr

 docs/man/xl.cfg.pod.5                       |  71 ++++++
 docs/man/xl.pod.1                           |   7 +-
 docs/misc/vtd.txt                           |  24 ++
 tools/firmware/hvmloader/e820.c             |  62 +++--
 tools/firmware/hvmloader/e820.h             |   7 +
 tools/firmware/hvmloader/hvmloader.c        |  37 +++
 tools/firmware/hvmloader/pci.c              |  36 ++-
 tools/firmware/hvmloader/util.c             |  26 ++
 tools/firmware/hvmloader/util.h             |  11 +
 tools/libxc/include/xenctrl.h               |  11 +-
 tools/libxc/xc_domain.c                     |  42 +++-
 tools/libxc/xc_hvm_build_x86.c              |   5 +-
 tools/libxl/libxl.h                         |   6 +
 tools/libxl/libxl_create.c                  |  19 +-
 tools/libxl/libxl_dm.c                      | 255 ++++++++++++++++++++
 tools/libxl/libxl_dom.c                     |  16 +-
 tools/libxl/libxl_internal.h                |  37 ++-
 tools/libxl/libxl_pci.c                     |  14 +-
 tools/libxl/libxl_types.idl                 |  26 ++
 tools/libxl/libxl_x86.c                     |  78 ++++++
 tools/libxl/libxlu_pci.c                    |  92 +++++++
 tools/libxl/libxlutil.h                     |   4 +
 tools/libxl/xl_cmdimpl.c                    |  36 ++-
 tools/libxl/xl_cmdtable.c                   |   2 +-
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  18 +-
 tools/python/xen/lowlevel/xc/xc.c           |  29 ++-
 xen/arch/x86/hvm/hvm.c                      |   2 -
 xen/arch/x86/mm.c                           |   6 -
 xen/arch/x86/mm/p2m.c                       |  47 ++++
 xen/common/compat/memory.c                  |  66 +++++
 xen/common/memory.c                         |  64 +++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c          |   2 +-
 xen/drivers/passthrough/device_tree.c       |  11 +-
 xen/drivers/passthrough/iommu.c             |  10 +
 xen/drivers/passthrough/pci.c               |  10 +-
 xen/drivers/passthrough/vtd/dmar.c          |  32 +++
 xen/drivers/passthrough/vtd/dmar.h          |   1 -
 xen/drivers/passthrough/vtd/extern.h        |   1 +
 xen/drivers/passthrough/vtd/iommu.c         |  63 +++--
 xen/drivers/passthrough/vtd/utils.c         |   7 -
 xen/include/asm-x86/p2m.h                   |   4 +
 xen/include/public/domctl.h                 |   5 +
 xen/include/public/memory.h                 |  32 ++-
 xen/include/xen/iommu.h                     |  12 +-
 xen/include/xen/pci.h                       |   2 +
 xen/include/xlat.lst                        |   3 +-
 47 files changed, 1264 insertions(+), 90 deletions(-)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  8:56   ` Tian, Kevin
  2015-06-11  1:15 ` [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

From: Jan Beulich <jbeulich@suse.com>

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/common/compat/memory.c           | 66 ++++++++++++++++++++++++++++++++++++
 xen/common/memory.c                  | 64 ++++++++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 10 ++++++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h          | 32 ++++++++++++++++-
 xen/include/xen/iommu.h              | 10 ++++++
 xen/include/xen/pci.h                |  2 ++
 xen/include/xlat.lst                 |  3 +-
 10 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct compat_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+    struct compat_reserved_device_memory rdm = {
+        .start_pfn = start, .nr_pages = nr
+    };
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+                return -ERANGE;
+
+            if ( __copy_to_compat_offset(grdm->map.buffer,
+                                         grdm->used_entries,
+                                         &rdm,
+                                         1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
     int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             break;
         }
 
+#ifdef HAS_PASSTHROUGH
+        case XENMEM_reserved_device_memory_map:
+        {
+            struct get_reserved_device_memory grdm;
+
+            if ( copy_from_guest(&grdm.map, compat, 1) ||
+                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+                return -EFAULT;
+
+            grdm.used_entries = 0;
+            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                                  &grdm);
+
+            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+                rc = -ENOBUFS;
+
+            grdm.map.nr_entries = grdm.used_entries;
+            if ( grdm.map.nr_entries )
+            {
+                if ( __copy_to_guest(compat, &grdm.map, 1) )
+                    rc = -EFAULT;
+            }
+
+            return rc;
+        }
+#endif
+
         default:
             return compat_arch_memory_op(cmd, compat);
         }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 063a1c5..c789f72 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
     return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+    struct xen_reserved_device_memory_map map;
+    unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+                                      u32 id, void *ctxt)
+{
+    struct get_reserved_device_memory *grdm = ctxt;
+    u32 sbdf;
+
+    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+    {
+        if ( grdm->used_entries < grdm->map.nr_entries )
+        {
+            struct xen_reserved_device_memory rdm = {
+                .start_pfn = start, .nr_pages = nr
+            };
+
+            if ( __copy_to_guest_offset(grdm->map.buffer,
+                                        grdm->used_entries,
+                                        &rdm,
+                                        1) )
+            {
+                return -EFAULT;
+            }
+        }
+        ++grdm->used_entries;
+        return 1;
+    }
+
+    return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef HAS_PASSTHROUGH
+    case XENMEM_reserved_device_memory_map:
+    {
+        struct get_reserved_device_memory grdm;
+
+        if ( copy_from_guest(&grdm.map, arg, 1) ||
+             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+            return -EFAULT;
+
+        grdm.used_entries = 0;
+        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+                                              &grdm);
+
+        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+            rc = -ENOBUFS;
+
+        grdm.map.nr_entries = grdm.used_entries;
+        if ( grdm.map.nr_entries )
+        {
+            if ( __copy_to_guest(arg, &grdm.map, 1) )
+                rc = -EFAULT;
+        }
+
+        break;
+    }
+#endif
+
     default:
         rc = arch_memory_op(cmd, arg);
         break;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 06cb38f..0b2ef52 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
     iommu_enabled = iommu_intremap = 0;
 }
 
+int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+
+    if ( !iommu_enabled || !ops->get_reserved_device_memory )
+        return 0;
+
+    return ops->get_reserved_device_memory(func, ctxt);
+}
+
 bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
 {
     const struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a730de5 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
     unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
     return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
 }
+
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
+{
+    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
+    int rc = 0;
+    unsigned int i;
+    u16 bdf;
+
+    for_each_rmrr_device ( rmrr, bdf, i )
+    {
+        if ( rmrr != rmrr_cur )
+        {
+            rc = func(PFN_DOWN(rmrr->base_address),
+                      PFN_UP(rmrr->end_address) -
+                        PFN_DOWN(rmrr->base_address),
+                      PCI_SBDF(rmrr->segment, bdf),
+                      ctxt);
+
+            if ( unlikely(rc < 0) )
+                return rc;
+
+            if ( !rc )
+                continue;
+
+            /* Just go next. */
+            if ( rc == 1 )
+                rmrr_cur = rmrr;
+        }
+    }
+
+    return 0;
+}
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 5524dba..f9ee9b0 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct iommu *iommu,
                                u8 bus, u8 devfn, const struct pci_dev *);
 int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
                              u8 bus, u8 devfn);
+int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
 
 unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
 void io_apic_write_remap_rte(unsigned int apic,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 9053a1f..6a37624 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
     .crash_shutdown = vtd_crash_shutdown,
     .iotlb_flush = intel_iommu_iotlb_flush,
     .iotlb_flush_all = intel_iommu_iotlb_flush_all,
+    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_p2m_table = vtd_dump_p2m_table,
 };
 
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 832559a..7b25275 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -573,7 +573,37 @@ struct xen_vnuma_topology_info {
 typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
 DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
 
-/* Next available subop number is 27 */
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM.  This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map   27
+struct xen_reserved_device_memory {
+    xen_pfn_t start_pfn;
+    xen_ulong_t nr_pages;
+};
+typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
+
+struct xen_reserved_device_memory_map {
+    /* IN */
+    /* Currently just one bit to indicate checkng all Reserved Device Memory. */
+#define PCI_DEV_RDM_ALL   0x1
+    uint32_t        flag;
+    /* IN */
+    uint16_t        seg;
+    uint8_t         bus;
+    uint8_t         devfn;
+    /* IN/OUT */
+    unsigned int    nr_entries;
+    /* OUT */
+    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
+};
+typedef struct xen_reserved_device_memory_map xen_reserved_device_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
+
+/* Next available subop number is 28 */
 
 #endif /* __XEN_PUBLIC_MEMORY_H__ */
 
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index b30bf41..e2f584d 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
 
 struct page_info;
 
+/*
+ * Any non-zero value returned from callbacks of this type will cause the
+ * function the callback was handed to terminate its iteration. Assigning
+ * meaning of these non-zero values is left to the top level caller /
+ * callback pair.
+ */
+typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
+
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
@@ -157,12 +165,14 @@ struct iommu_ops {
     void (*crash_shutdown)(void);
     void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
     void (*iotlb_flush_all)(struct domain *d);
+    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
     void (*dump_p2m_table)(struct domain *d);
 };
 
 void iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
+int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
 
 void iommu_share_p2m_table(struct domain *d);
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..d176e8b 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
+#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
 
 struct pci_dev_info {
     bool_t is_extfn;
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 9c9fd9a..dd23559 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -61,9 +61,10 @@
 !	memory_exchange			memory.h
 !	memory_map			memory.h
 !	memory_reservation		memory.h
-?	mem_access_op		memory.h
+?	mem_access_op			memory.h
 !	pod_target			memory.h
 !	remove_from_physmap		memory.h
+!	reserved_device_memory_map	memory.h
 ?	vmemrange			memory.h
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-06-11  1:15 ` [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  7:33   ` Jan Beulich
  2015-06-11  9:00   ` Tian, Kevin
  2015-06-11  1:15 ` [v3][PATCH 03/16] xen/vtd: create RMRR mapping Tiejun Chen
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

We will create this sort of identity mapping as follows:

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by 1:1 mappings, do nothing. Failed for any
other cases.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c     | 35 +++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/p2m.h |  4 ++++
 2 files changed, 39 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1fd1194..a6db236 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -898,6 +898,41 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
     return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma)
+{
+    p2m_type_t p2mt;
+    p2m_access_t a;
+    mfn_t mfn;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+
+    if ( paging_mode_translate(p2m->domain) )
+    {
+        gfn_lock(p2m, gfn, 0);
+
+        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+        if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )
+            ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+                                p2m_mmio_direct, p2ma);
+        else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+            ret = 0;
+        else
+        {
+            ret = -EBUSY;
+            printk(XENLOG_G_WARNING
+                   "Cannot identity map d%d:%lx, already mapped to %lx.\n",
+                   d->domain_id, gfn, mfn_x(mfn));
+        }
+
+        gfn_unlock(p2m, gfn, 0);
+        return ret;
+    }
+
+    return 0;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..95b6266 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -543,6 +543,10 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
                        p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                           p2m_access_t p2ma);
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
                     unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
  2015-06-11  1:15 ` [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
  2015-06-11  1:15 ` [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  9:14   ` Tian, Kevin
  2015-06-17 10:03   ` Jan Beulich
  2015-06-11  1:15 ` [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

RMRR reserved regions must be setup in the pfn space with an identity
mapping to reported mfn. However existing code has problem to setup
correct mapping when VT-d shares EPT page table, so lead to problem
when assigning devices (e.g GPU) with RMRR reported. So instead, this
patch aims to setup identity mapping in p2m layer, regardless of
whether EPT is shared or not. And we still keep creating VT-d table.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c               | 10 ++++++++--
 xen/drivers/passthrough/vtd/iommu.c |  3 +--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index a6db236..c7198a5 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -927,10 +927,16 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         }
 
         gfn_unlock(p2m, gfn, 0);
-        return ret;
     }
+    else
+        ret = 0;
 
-    return 0;
+    if( ret == 0 )
+    {
+        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
+    }
+
+    return ret;
 }
 
 /* Returns: 0 for success, -errno for failure */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 6a37624..31ce1af 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1856,8 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-                                       IOMMUF_readable|IOMMUF_writable);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
         if ( err )
             return err;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 03/16] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  9:28   ` Tian, Kevin
  2015-06-17 10:11   ` Jan Beulich
  2015-06-11  1:15 ` [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/mm/p2m.c                       |  8 +++++++-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c          |  2 +-
 xen/drivers/passthrough/device_tree.c       | 11 ++++++++++-
 xen/drivers/passthrough/pci.c               | 10 ++++++----
 xen/drivers/passthrough/vtd/iommu.c         | 20 ++++++++++++--------
 xen/include/asm-x86/p2m.h                   |  2 +-
 xen/include/public/domctl.h                 |  5 +++++
 xen/include/xen/iommu.h                     |  2 +-
 9 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c7198a5..3fcdcac 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma)
+                           p2m_access_t p2ma, u32 flag)
 {
     p2m_type_t p2mt;
     p2m_access_t a;
@@ -924,6 +924,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
             printk(XENLOG_G_WARNING
                    "Cannot identity map d%d:%lx, already mapped to %lx.\n",
                    d->domain_id, gfn, mfn_x(mfn));
+
+            if ( flag == XEN_DOMCTL_DEV_RDM_RELAXED )
+            {
+                ret = 0;
+                printk(XENLOG_G_WARNING "Some devices may work failed.\n");
+            }
         }
 
         gfn_unlock(p2m, gfn, 0);
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-                                   struct pci_dev *pdev)
+                                   struct pci_dev *pdev,
+                                   u32 flag)
 {
     struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
     int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-			       struct device *dev)
+			       struct device *dev, u32 flag)
 {
 	struct iommu_domain *domain;
 	struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 5d3842a..ea85645 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
             goto fail;
     }
 
-    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+                                         XEN_DOMCTL_DEV_NO_RDM);
 
     if ( rc )
         goto fail;
@@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
         if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
             break;
 
+        if ( domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM )
+        {
+            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
+                   " to dom%u failed (%d) since we don't support RDM.\n",
+                   dt_node_full_name(dev), d->domain_id, ret);
+            break;
+        }
+
         if ( unlikely(d->is_dying) )
         {
             ret = -EINVAL;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..557c87e 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
     return pdev ? 0 : -EBUSY;
 }
 
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
+static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
     struct pci_dev *pdev;
@@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
 
     pdev->fault.count = 0;
 
-    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
+    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
         goto done;
 
     for ( ; pdev->phantom_stride; rc = 0 )
@@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
+        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
         if ( rc )
             printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed (%d)\n",
                    d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
@@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
 {
     u16 seg;
     u8 bus, devfn;
+    u32 flag;
     int ret = 0;
     uint32_t machine_sbdf;
 
@@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
         seg = machine_sbdf >> 16;
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN2(machine_sbdf);
+        flag = domctl->u.assign_device.flag;
 
         ret = device_assigned(seg, bus, devfn) ?:
-              assign_device(d, seg, bus, devfn);
+              assign_device(d, seg, bus, devfn, flag);
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
                                                 "h", u_domctl);
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 31ce1af..d7c9e1c 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1808,7 +1808,8 @@ static void iommu_set_pgd(struct domain *d)
 }
 
 static int rmrr_identity_mapping(struct domain *d, bool_t map,
-                                 const struct acpi_rmrr_unit *rmrr)
+                                 const struct acpi_rmrr_unit *rmrr,
+                                 u32 flag)
 {
     unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
     unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;
@@ -1856,7 +1857,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
 
     while ( base_pfn < end_pfn )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
+        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
 
         if ( err )
             return err;
@@ -1899,7 +1900,8 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
              PCI_BUS(bdf) == pdev->bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
+                                        XEN_DOMCTL_DEV_RDM_RELAXED);
             if ( ret )
                 dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
                         pdev->domain->domain_id);
@@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
              PCI_DEVFN2(bdf) != devfn )
             continue;
 
-        rmrr_identity_mapping(pdev->domain, 0, rmrr);
+        rmrr_identity_mapping(pdev->domain, 0, rmrr,
+                              XEN_DOMCTL_DEV_RDM_RELAXED);
     }
 
     return domain_context_unmap(pdev->domain, devfn, pdev);
@@ -2098,7 +2101,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
     spin_lock(&pcidevs_lock);
     for_each_rmrr_device ( rmrr, bdf, i )
     {
-        ret = rmrr_identity_mapping(d, 1, rmrr);
+        ret = rmrr_identity_mapping(d, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED);
         if ( ret )
             dprintk(XENLOG_ERR VTDPREFIX,
                      "IOMMU: mapping reserved region failed\n");
@@ -2241,7 +2244,8 @@ static int reassign_device_ownership(
                  PCI_BUS(bdf) == pdev->bus &&
                  PCI_DEVFN2(bdf) == devfn )
             {
-                ret = rmrr_identity_mapping(source, 0, rmrr);
+                ret = rmrr_identity_mapping(source, 0, rmrr,
+                                            XEN_DOMCTL_DEV_RDM_RELAXED);
                 if ( ret != -ENOENT )
                     return ret;
             }
@@ -2265,7 +2269,7 @@ static int reassign_device_ownership(
 }
 
 static int intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev)
+    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
 {
     struct acpi_rmrr_unit *rmrr;
     int ret = 0, i;
@@ -2294,7 +2298,7 @@ static int intel_iommu_assign_device(
              PCI_BUS(bdf) == bus &&
              PCI_DEVFN2(bdf) == devfn )
         {
-            ret = rmrr_identity_mapping(d, 1, rmrr);
+            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
             if ( ret )
             {
                 reassign_device_ownership(d, hardware_domain, devfn, pdev);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 95b6266..a80b4f8 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
 /* Set identity addresses in the p2m table (for pass-through) */
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-                           p2m_access_t p2ma);
+                           p2m_access_t p2ma, u32 flag);
 
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index bc45ea5..2f9e40e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -478,6 +478,11 @@ struct xen_domctl_assign_device {
             XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
         } dt;
     } u;
+    /* IN */
+#define XEN_DOMCTL_DEV_NO_RDM           0
+#define XEN_DOMCTL_DEV_RDM_RELAXED      1
+#define XEN_DOMCTL_DEV_RDM_STRICT       2
+    uint32_t  flag;   /* flag of assigned device */
 };
 typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e2f584d..02b2b02 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -140,7 +140,7 @@ struct iommu_ops {
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);
-    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
+    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
     int (*reassign_device)(struct domain *s, struct domain *t,
                            u8 devfn, device_t *dev);
 #ifdef HAS_PCI
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (3 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  9:29   ` Tian, Kevin
  2015-06-17 10:14   ` Jan Beulich
  2015-06-11  1:15 ` [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

This patch enables XENMEM_memory_map in hvm. So we can use it to
setup the e820 mappings.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Reviewed-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c      | 6 ------
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f354cb7..fab5637 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4728,7 +4728,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
@@ -4804,7 +4803,6 @@ static long hvm_memory_op_compat32(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     switch ( cmd & MEMOP_CMD_MASK )
     {
-    case XENMEM_memory_map:
     case XENMEM_machine_memory_map:
     case XENMEM_machphys_mapping:
         return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 472c494..4923ccd 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        if ( is_hvm_domain(d) )
-        {
-            rcu_unlock_domain(d);
-            return -EPERM;
-        }
-
         e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
         if ( e820 == NULL )
         {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (4 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  9:38   ` Tian, Kevin
  2015-06-17 10:22   ` Jan Beulich
  2015-06-11  1:15 ` [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges Tiejun Chen
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

Now we get this map layout by call XENMEM_memory_map then
save them into one global variable memory_map[]. It should
include lowmem range, rdm range and highmem range. Note
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory in hvmloder level, and
we would lead hvmloader failed in case of conflict since its
another rare possibility in real world.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/e820.h      |  7 +++++++
 tools/firmware/hvmloader/hvmloader.c | 37 ++++++++++++++++++++++++++++++++++++
 tools/firmware/hvmloader/util.c      | 26 +++++++++++++++++++++++++
 tools/firmware/hvmloader/util.h      | 11 +++++++++++
 4 files changed, 81 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
     uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX	128
+
+struct e820map {
+    unsigned int nr_map;
+    struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..c9f170e 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -107,6 +107,8 @@ asm (
     "    .text                       \n"
     );
 
+struct e820map memory_map;
+
 unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS;
 
 static void init_hypercalls(void)
@@ -199,6 +201,39 @@ static void apic_setup(void)
     ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
 }
 
+void memory_map_setup(void)
+{
+    unsigned int nr_entries = E820MAX, i;
+    int rc;
+    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+    if ( rc )
+    {
+        printf("Failed to get guest memory map.\n");
+        BUG();
+    }
+
+    BUG_ON(!nr_entries);
+    memory_map.nr_map = nr_entries;
+
+    for ( i = 0; i < nr_entries; i++ )
+    {
+        if ( memory_map.map[i].type == E820_RESERVED )
+        {
+            if ( check_overlap(alloc_addr, alloc_size,
+                               memory_map.map[i].addr,
+                               memory_map.map[i].size) )
+            {
+                printf("RDM conflicts Memory allocation.\n");
+                BUG();
+            }
+        }
+    }
+}
+
 struct bios_info {
     const char *key;
     const struct bios_config *bios;
@@ -262,6 +297,8 @@ int main(void)
 
     init_hypercalls();
 
+    memory_map_setup();
+
     xenbus_setup();
 
     bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size)
+{
+    return (start + size > reserved_start) &&
+            (start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
     asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
     *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+    int rc;
+    struct xen_memory_map memmap = {
+        .nr_entries = *max_entries
+    };
+
+    set_xen_guest_handle(memmap.buffer, entries);
+
+    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+    *max_entries = memmap.nr_entries;
+
+    return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
     static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index a70e4aa..70e19c4 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include <stdint.h>
 #include <stddef.h>
+#include <stdbool.h>
 #include <xen/xen.h>
 #include <xen/hvm/hvm_info_table.h>
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci_setup(void);
 
+/* Setup memory map  */
+void memory_map_setup(void);
+
 /* Prepare the 32bit BIOS */
 uint32_t rombios_highbios_setup(void);
 
@@ -249,6 +254,12 @@ void perform_tests(void);
 
 extern char _start[], _end[];
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries);
+
+extern struct e820map memory_map;
+bool check_overlap(uint64_t start, uint64_t size,
+                   uint64_t reserved_start, uint64_t reserved_size);
+
 #endif /* __HVMLOADER_UTIL_H__ */
 
 /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (5 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  9:51   ` Tian, Kevin
  2015-06-11  1:15 ` [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table Tiejun Chen
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/pci.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..98af568 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -59,8 +59,8 @@ void pci_setup(void)
         uint32_t bar_reg;
         uint64_t bar_sz;
     } *bars = (struct bars *)scratch_start;
-    unsigned int i, nr_bars = 0;
-    uint64_t mmio_hole_size = 0;
+    unsigned int i, j, nr_bars = 0;
+    uint64_t mmio_hole_size = 0, reserved_end, max_bar_sz = 0;
 
     const char *s;
     /*
@@ -226,6 +226,8 @@ void pci_setup(void)
             bars[i].devfn   = devfn;
             bars[i].bar_reg = bar_reg;
             bars[i].bar_sz  = bar_sz;
+            if ( bar_sz > max_bar_sz )
+                max_bar_sz = bar_sz;
 
             if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
                   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
@@ -301,6 +303,21 @@ void pci_setup(void)
             pci_mem_start <<= 1;
     }
 
+    /* Relocate PCI memory that overlaps reserved space, like RDM. */
+    for ( j = 0; j < memory_map.nr_map ; j++ )
+    {
+        if ( memory_map.map[j].type != E820_RAM )
+        {
+            reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+            if ( check_overlap(pci_mem_start, pci_mem_end,
+                               memory_map.map[j].addr,
+                               memory_map.map[j].size) )
+                pci_mem_start -= memory_map.map[j].size >> PAGE_SHIFT;
+                pci_mem_start = (pci_mem_start + max_bar_sz - 1) &
+                                    ~(uint64_t)(max_bar_sz - 1);
+        }
+    }
+
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"
@@ -407,8 +424,23 @@ void pci_setup(void)
         }
 
         base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_mmio:
         bar_data |= (uint32_t)base;
         bar_data_upper = (uint32_t)(base >> 32);
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
+                if ( check_overlap(base, bar_sz,
+                                   memory_map.map[j].addr,
+                                   memory_map.map[j].size) )
+                {
+                    base = (reserved_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+                    goto reallocate_mmio;
+                }
+            }
+        }
         base += bar_sz;
 
         if ( (base < resource->base) || (base > resource->max) )
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (6 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11  9:59   ` Tian, Kevin
  2015-06-11  1:15 ` [v3][PATCH 09/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

Now we can use that memory map to build our final
e820 table but it may need to reorder all e820
entries.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/firmware/hvmloader/e820.c | 62 +++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..c39b0aa 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -73,7 +73,8 @@ int build_e820_table(struct e820entry *e820,
                      unsigned int lowmem_reserved_base,
                      unsigned int bios_image_base)
 {
-    unsigned int nr = 0;
+    unsigned int nr = 0, i, j;
+    uint64_t low_mem_pgend = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -117,13 +118,6 @@ int build_e820_table(struct e820entry *e820,
     e820[nr].type = E820_RESERVED;
     nr++;
 
-    /* Low RAM goes here. Reserve space for special pages. */
-    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-    e820[nr].addr = 0x100000;
-    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-    e820[nr].type = E820_RAM;
-    nr++;
-
     /*
      * Explicitly reserve space for special pages.
      * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -159,16 +153,56 @@ int build_e820_table(struct e820entry *e820,
         nr++;
     }
 
-
-    if ( hvm_info->high_mem_pgend )
+    /*
+     * Construct the remaining according memory_map.
+     *
+     * Note memory_map includes,
+     *
+     * #1. Low memory region
+     *
+     * Low RAM starts at least from 1M to make sure all standard regions
+     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+     * have enough space.
+     *
+     * #2. RDM region if it exists
+     *
+     * #3. High memory region if it exists
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
     {
-        e820[nr].addr = ((uint64_t)1 << 32);
-        e820[nr].size =
-            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-        e820[nr].type = E820_RAM;
+        e820[nr] = memory_map.map[i];
         nr++;
     }
 
+    /* Low RAM goes here. Reserve space for special pages. */
+    BUG_ON(low_mem_pgend < (2u << 20));
+    /*
+     * We may need to adjust real lowmem end since we may
+     * populate RAM to get enough MMIO previously.
+     */
+    for ( i = 0; i < memory_map.nr_map; i++ )
+    {
+        uint64_t end = e820[i].addr + e820[i].size;
+        if ( e820[i].type == E820_RAM &&
+             low_mem_pgend > e820[i].addr && low_mem_pgend < end )
+            e820[i].size = low_mem_pgend - e820[i].addr;
+    }
+
+    /* Finally we need to reorder all e820 entries. */
+    for ( j = 0; j < nr-1; j++ )
+    {
+        for ( i = j+1; i < nr; i++ )
+        {
+            if ( e820[j].addr > e820[i].addr )
+            {
+                struct e820entry tmp;
+                tmp = e820[j];
+                e820[j] = e820[i];
+                e820[i] = tmp;
+            }
+        }
+    }
+
     return nr;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 09/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (7 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11 10:00   ` Tian, Kevin
  2015-06-11  1:15 ` [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

We will introduce the hypercall xc_reserved_device_memory_map
approach to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Or we just expose that rdm entry specific to
a SBDF.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxc/include/xenctrl.h |  8 ++++++++
 tools/libxc/xc_domain.c       | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 50fa9e7..6c01362 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 1ff6d0a..4f96e1b 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
     return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+                                  uint32_t flag,
+                                  uint16_t seg,
+                                  uint8_t bus,
+                                  uint8_t devfn,
+                                  struct xen_reserved_device_memory entries[],
+                                  uint32_t *max_entries)
+{
+    int rc;
+    struct xen_reserved_device_memory_map xrdmmap = {
+        .flag = flag,
+        .seg = seg,
+        .bus = bus,
+        .devfn = devfn,
+        .nr_entries = *max_entries
+    };
+    DECLARE_HYPERCALL_BOUNCE(entries,
+                             sizeof(struct xen_reserved_device_memory) *
+                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( xc_hypercall_bounce_pre(xch, entries) )
+        return -1;
+
+    set_xen_guest_handle(xrdmmap.buffer, entries);
+
+    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+                      &xrdmmap, sizeof(xrdmmap));
+
+    xc_hypercall_bounce_post(xch, entries);
+
+    *max_entries = xrdmmap.nr_entries;
+
+    return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (8 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 09/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11 10:02   ` Tian, Kevin
  2015-06-12 15:43   ` Wei Liu
  2015-06-11  1:15 ` [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxc/include/xenctrl.h       |  3 ++-
 tools/libxc/xc_domain.c             |  6 +++++-
 tools/libxl/libxl_pci.c             |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
 tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
 5 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 6c01362..7fd60d5 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
                      uint32_t domid,
-                     uint32_t machine_sbdf);
+                     uint32_t machine_sbdf,
+                     uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
                      uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 4f96e1b..19127ec 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
     xc_interface *xch,
     uint32_t domid,
-    uint32_t machine_sbdf)
+    uint32_t machine_sbdf,
+    uint32_t flag)
 {
     DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
     domctl.domain = domid;
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
     domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+    domctl.u.assign_device.flag = flag;
 
     return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,8 @@ int xc_assign_dt_device(
 
     domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
     domctl.u.assign_device.u.dt.size = size;
+    /* DT doesn't own any RDM. */
+    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
     set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
     rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
-        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
             return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..317bf75 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
 	CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+    XEN_DOMCTL_DEV_NO_RDM,
+    XEN_DOMCTL_DEV_RDM_RELAXED,
+    XEN_DOMCTL_DEV_RDM_STRICT,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+                                            value rflag)
 {
-	CAMLparam3(xch, domid, desc);
+	CAMLparam4(xch, domid, desc, rflag);
 	int ret;
 	int domain, bus, dev, func;
-	uint32_t sbdf;
+	uint32_t sbdf, flag;
 
 	domain = Int_val(Field(desc, 0));
 	bus = Int_val(Field(desc, 1));
@@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
 	func = Int_val(Field(desc, 3));
 	sbdf = encode_sbdf(domain, bus, dev, func);
 
-	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+	ret = Int_val(Field(rflag, 0));
+	flag = domain_assign_device_rdm_flag_table[ret];
+
+	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
 	if (ret < 0)
 		failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..172bdf0 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)
     return strtol(token, NULL, 16);
 }
 
-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
+                    int *flag)
 {
     char *token;
 
@@ -607,8 +608,16 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
     *dev  = token_value(token);
     token = strchr(token, ',') + 1;
     *func  = token_value(token);
-    token = strchr(token, ',');
-    *str = token ? token + 1 : NULL;
+    token = strchr(token, ',') + 1;
+    if ( token ) {
+        *flag = token_value(token);
+        *str = token + 1;
+    }
+    else
+    {
+        *flag = XEN_DOMCTL_DEV_RDM_STRICT;
+        *str = NULL;
+    }
 
     return 1;
 }
@@ -620,14 +629,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
@@ -653,21 +662,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
         sbdf |= (dev & 0x1f) << 3;
         sbdf |= (func & 0x7);
 
-        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
+        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
         {
             if (errno == ENOSYS)
                 sbdf = -1;
@@ -686,14 +695,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
     uint32_t dom;
     char *pci_str;
     int32_t sbdf = 0;
-    int seg, bus, dev, func;
+    int seg, bus, dev, func, flag;
 
     static char *kwd_list[] = { "domid", "pci", NULL };
     if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
                                       &dom, &pci_str) )
         return NULL;
 
-    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
+    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
     {
         sbdf = seg << 16;
         sbdf |= (bus & 0xff) << 8;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (9 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-12 16:02   ` Wei Liu
  2015-06-11  1:15 ` [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy Tiejun Chen
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
    rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
    pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Global RDM parameter, "type", allows user to specify reserved regions
explicitly, e.g. using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. Instead, 'none' means we have nothing
to do all reserved regions and ignore all policies, so guest work as before.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM will be killed, while 'relaxed' allows moving forward with a warning
message thrown out.

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. When both policies are specified on a given region, 'strict' is
always preferred.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 docs/man/xl.cfg.pod.5        | 50 ++++++++++++++++++++++++
 docs/misc/vtd.txt            | 24 ++++++++++++
 tools/libxl/libxl_create.c   | 13 +++++++
 tools/libxl/libxl_internal.h |  2 +
 tools/libxl/libxl_pci.c      |  3 ++
 tools/libxl/libxl_types.idl  | 18 +++++++++
 tools/libxl/libxlu_pci.c     | 92 ++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxlutil.h      |  4 ++
 tools/libxl/xl_cmdimpl.c     | 10 +++++
 9 files changed, 216 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..638b350 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,49 @@ assigned slave device.
 
 =back
 
+=item B<rdm="RDM_RESERVE_STRING">
+
+(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<type="STRING">
+
+Currently we just have two types:
+
+"host" means all reserved device memory on this platform should be reserved
+in this VM's guest address space space. This global RDM parameter allows
+user to specify reserved regions explicitly. And using "host" to include all
+reserved regions reported on this platform which is good to handle hotplug
+scenario. In the future this parameter may be further extended to allow
+specifying random regions, e.g. even those belonging to another platform as
+a preparation for live migration with passthrough devices.
+
+"none" means we have nothing to do all reserved regions and ignore all policies,
+so guest work as before.
+
+=over 4
+
+=item B<reserve="STRING">
+
+Conflict may be detected when reserving reserved device memory in guest address
+space. "strict" means an unsolved conflict leads to immediate VM crash, while
+"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
+is default.
+
+Note this may be overridden by rdm_reserve option in PCI device configuration.
+
 =item B<pci=[ "PCI_SPEC_STRING", "PCI_SPEC_STRING", ... ]>
 
 Specifies the host PCI devices to passthrough to this guest. Each B<PCI_SPEC_STRING>
@@ -717,6 +760,13 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B<rdm_reserv="STRING">
+
+(HVM/x86 only) This is same as reserve option above but just specific
+to a given device, and "strict" is default here.
+
+Note this would override global B<rdm> option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..7d63c47 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
 	pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+---------------------------------------------------------
+
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+
+To enable this globally, add "rdm" in the config file:
+
+    rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+    pci = [ '01:00.0,rdm_reserve=relaxed', '03:00.0,rdm_reserve=strict' ]
+
+For all the options available to RDM, see xl.cfg(5).
+
 
 Caveat on Conventional PCI Device Passthrough
 ---------------------------------------------
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 86384d2..6c8ec63 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -105,6 +105,12 @@ static int sched_params_valid(libxl__gc *gc,
     return 1;
 }
 
+void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
+{
+    if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+}
+
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
                                         libxl_domain_build_info *b_info)
 {
@@ -419,6 +425,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
                    libxl_domain_type_to_string(b_info->type));
         return ERROR_INVAL;
     }
+
+    libxl__rdm_setdefault(gc, b_info);
     return 0;
 }
 
@@ -1450,6 +1458,11 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev,
     }
 
     for (i = 0; i < d_config->num_pcidevs; i++) {
+        /*
+         * If the rdm global policy is 'strict' we should override each device.
+         */
+        if (d_config->b_info.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_STRICT)
+            d_config->pcidevs[i].rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
         ret = libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1);
         if (ret < 0) {
             LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index bb3a5c7..e9ac886 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1108,6 +1108,8 @@ _hidden int libxl__device_vtpm_setdefault(libxl__gc *gc, libxl_device_vtpm *vtpm
 _hidden int libxl__device_vfb_setdefault(libxl__gc *gc, libxl_device_vfb *vfb);
 _hidden int libxl__device_vkb_setdefault(libxl__gc *gc, libxl_device_vkb *vkb);
 _hidden int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci);
+_hidden void libxl__rdm_setdefault(libxl__gc *gc,
+                                   libxl_domain_build_info *b_info);
 
 _hidden const char *libxl__device_nic_devname(libxl__gc *gc,
                                               uint32_t domid,
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 632c15e..a00d799 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1040,6 +1040,9 @@ static int libxl__device_pci_reset(libxl__gc *gc, unsigned int domain, unsigned
 
 int libxl__device_pci_setdefault(libxl__gc *gc, libxl_device_pci *pci)
 {
+    /* We'd like to force reserve rdm specific to a device by default.*/
+    if ( pci->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
+        pci->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
     return 0;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 23f27d4..4dfcaf7 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -73,6 +73,17 @@ libxl_domain_type = Enumeration("domain_type", [
     (2, "PV"),
     ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
 
+libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
+    (0, "none"),
+    (1, "host"),
+    ], init_val = "LIBXL_RDM_RESERVE_TYPE_NONE")
+
+libxl_rdm_reserve_flag = Enumeration("rdm_reserve_flag", [
+    (-1, "invalid"),
+    (0, "strict"),
+    (1, "relaxed"),
+    ], init_val = "LIBXL_RDM_RESERVE_FLAG_INVALID")
+
 libxl_channel_connection = Enumeration("channel_connection", [
     (0, "UNKNOWN"),
     (1, "PTY"),
@@ -366,6 +377,11 @@ libxl_vnode_info = Struct("vnode_info", [
     ("vcpus", libxl_bitmap), # vcpus in this node
     ])
 
+libxl_rdm_reserve = Struct("rdm_reserve", [
+    ("type",    libxl_rdm_reserve_type),
+    ("reserve",   libxl_rdm_reserve_flag),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -413,6 +429,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("kernel",           string),
     ("cmdline",          string),
     ("ramdisk",          string),
+    ("rdm",     libxl_rdm_reserve),
     # Given the complexity of verifying the validity of a device tree,
     # libxl doesn't do any security check on it. It's the responsibility
     # of the caller to provide only trusted device tree.
@@ -539,6 +556,7 @@ libxl_device_pci = Struct("device_pci", [
     ("power_mgmt", bool),
     ("permissive", bool),
     ("seize", bool),
+    ("rdm_reserve",   libxl_rdm_reserve_flag),
     ])
 
 libxl_device_dtdev = Struct("device_dtdev", [
diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..9255878 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE      9
+#define STATE_RDM_TYPE      10
+#define STATE_RESERVE_FLAG      11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str)
 {
     unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str
                     pcidev->permissive = atoi(tok);
                 }else if ( !strcmp(optkey, "seize") ) {
                     pcidev->seize = atoi(tok);
+                }else if ( !strcmp(optkey, "rdm_reserve") ) {
+                    if ( !strcmp(tok, "strict") ) {
+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+                    } else if ( !strcmp(tok, "relaxed") ) {
+                        pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+                    } else {
+                        XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+                                          " flag: 'strict' or 'relaxed'.",
+                                     tok);
+                        goto parse_error;
+                    }
                 }else{
                     XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
                 }
@@ -167,6 +181,84 @@ parse_error:
     return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+    unsigned state = STATE_TYPE;
+    char *buf2, *tok, *ptr, *end;
+
+    if (NULL == (buf2 = ptr = strdup(str)))
+        return ERROR_NOMEM;
+
+    for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+        switch(state) {
+        case STATE_TYPE:
+            if (*ptr == '=') {
+                state = STATE_RDM_TYPE;
+                *ptr = '\0';
+                if (strcmp(tok, "type")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RDM_TYPE:
+            if (*ptr == '\0' || *ptr == ',') {
+                state = STATE_RESERVE_FLAG;
+                *ptr = '\0';
+                if (!strcmp(tok, "host")) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_HOST;
+                } else if (!strcmp(tok, "none")) {
+                    rdm->type = LIBXL_RDM_RESERVE_TYPE_NONE;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM type option: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_RESERVE_FLAG:
+            if (*ptr == '=') {
+                state = STATE_OPTIONS_V;
+                *ptr = '\0';
+                if (strcmp(tok, "reserve")) {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+            break;
+        case STATE_OPTIONS_V:
+            if (*ptr == ',' || *ptr == '\0') {
+                state = STATE_TERMINAL;
+                *ptr = '\0';
+                if (!strcmp(tok, "strict")) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+                } else if (!strcmp(tok, "relaxed")) {
+                    rdm->reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+                } else {
+                    XLU__PCI_ERR(cfg, "Unknown RDM property flag value: %s",
+                                 tok);
+                    goto parse_error;
+                }
+                tok = ptr + 1;
+            }
+        default:
+            break;
+        }
+    }
+
+    free(buf2);
+
+    if (tok != ptr || state != STATE_TERMINAL)
+        goto parse_error;
+
+    return 0;
+
+parse_error:
+    return ERROR_INVAL;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 989605a..e81b644 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -106,6 +106,10 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs,
  */
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str);
 
+/*
+ * RDM parsing
+ */
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
 
 /*
  * Vif rate parsing.
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..aedbd4b 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1920,6 +1920,14 @@ skip_vfb:
         xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
     }
 
+    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
+        libxl_rdm_reserve rdm;
+        if (!xlu_rdm_parse(config, &rdm, buf)) {
+            b_info->rdm.type = rdm.type;
+            b_info->rdm.reserve = rdm.reserve;
+        }
+    }
+
     if (!xlu_cfg_get_list (config, "pci", &pcis, 0, 0)) {
         d_config->num_pcidevs = 0;
         d_config->pcidevs = NULL;
@@ -1934,6 +1942,8 @@ skip_vfb:
             pcidev->power_mgmt = pci_power_mgmt;
             pcidev->permissive = pci_permissive;
             pcidev->seize = pci_seize;
+            /* We'd like to force reserve rdm specific to a device by default.*/
+            pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
             if (!xlu_pci_parse_bdf(config, pcidev, buf))
                 d_config->num_pcidevs++;
         }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (10 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-12 16:17   ` Wei Liu
  2015-06-11  1:15 ` [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

This patch passes our rdm reservation policy inside libxl
when we assign a device or attach a device.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 docs/man/xl.pod.1         |  7 ++++++-
 tools/libxl/libxl_pci.c   | 10 +++++++++-
 tools/libxl/xl_cmdimpl.c  | 23 +++++++++++++++++++----
 tools/libxl/xl_cmdtable.c |  2 +-
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 4eb929d..c5c4809 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
 usable by Domain 0 again.  If the device is not bound to pciback, it will
 return success.
 
-=item B<pci-attach> I<domain-id> I<BDF>
+=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
 
 Hot-plug a new pass-through pci device to the specified domain.
 B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
+B<rdm policy> is about how to handle conflict between reserving reserved device
+memory and guest address space. "strict" means an unsolved conflict leads to
+immediate VM crash, while "relaxed" allows VM moving forward with a warning
+message thrown out. Here "strict" is default.
+
 
 =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
 
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index a00d799..d2e8911 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,7 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
     FILE *f;
     unsigned long long start, end, flags, size;
     int irq, i, rc, hvm = 0;
-    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+    uint32_t flag;
 
     if (type == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
@@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
 
 out:
     if (!libxl_is_stubdom(ctx, domid, NULL)) {
+        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
+            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            flag = XEN_DOMCTL_DEV_RDM_STRICT;
+        } else {
+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unkwon rdm check flag.");
+            return ERROR_FAIL;
+        }
         rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
         if (rc < 0 && (hvm || errno != ENOSYS)) {
             LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index aedbd4b..4364ba4 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3359,7 +3359,8 @@ int main_pcidetach(int argc, char **argv)
     pcidetach(domid, bdf, force);
     return 0;
 }
-static void pciattach(uint32_t domid, const char *bdf, const char *vs)
+static void pciattach(uint32_t domid, const char *bdf, const char *vs,
+                      uint32_t flag)
 {
     libxl_device_pci pcidev;
     XLU_Config *config;
@@ -3369,6 +3370,7 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
     config = xlu_cfg_init(stderr, "command line");
     if (!config) { perror("xlu_cfg_inig"); exit(-1); }
 
+    pcidev.rdm_reserve = flag;
     if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
         fprintf(stderr, "pci-attach: malformed BDF specification \"%s\"\n", bdf);
         exit(2);
@@ -3381,9 +3383,9 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
 
 int main_pciattach(int argc, char **argv)
 {
-    uint32_t domid;
+    uint32_t domid, flag;
     int opt;
-    const char *bdf = NULL, *vs = NULL;
+    const char *bdf = NULL, *vs = NULL, *rdm_policy = NULL;
 
     SWITCH_FOREACH_OPT(opt, "", NULL, "pci-attach", 2) {
         /* No options */
@@ -3395,7 +3397,20 @@ int main_pciattach(int argc, char **argv)
     if (optind + 1 < argc)
         vs = argv[optind + 2];
 
-    pciattach(domid, bdf, vs);
+    if (optind + 2 < argc) {
+        rdm_policy = argv[optind + 3];
+    }
+    if (!strcmp(rdm_policy, "strict")) {
+        flag = LIBXL_RDM_RESERVE_FLAG_STRICT;
+    } else if (!strcmp(rdm_policy, "relaxed")) {
+        flag = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+    } else {
+        fprintf(stderr, "%s is an invalid rdm policy: 'strict'|'relaxed'\n",
+                rdm_policy);
+        exit(2);
+    }
+
+    pciattach(domid, bdf, vs, flag);
     return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7f4759b..552fbec 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
     { "pci-attach",
       &main_pciattach, 0, 1,
       "Insert a new pass-through pci device",
-      "<Domain> <BDF> [Virtual Slot]",
+      "<Domain> <BDF> [Virtual Slot] <policy to reserve rdm['strice'|'relaxed']>",
     },
     { "pci-detach",
       &main_pcidetach, 0, 1,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (11 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11 10:19   ` Tian, Kevin
  2015-06-12 16:39   ` Wei Liu
  2015-06-11  1:15 ` [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
                   ` (4 subsequent siblings)
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RAM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR probably scatter the whole RAM space.
Especially multiple RMRR entries would worsen this to lead a complicated
memory layout. And then its hard to extend hvm_info_table{} to work
hvmloader out. So here we're trying to figure out a simple solution to
avoid breaking existing layout. So when a conflict occurs,

    #1. Above a predefined boundary (default 2G)
        - move lowmem_end below reserved region to solve conflict;

    #2. Below a predefined boundary (default 2G)
        - Check strict/relaxed policy.
        "strict" policy leads to fail libxl. Note when both policies
        are specified on a given region, 'strict' is always preferred.
        "relaxed" policy issue a warning message and also mask this entry INVALID
        to indicate we shouldn't expose this entry to hvmloader.

Note this predefined boundary can be changes with the parameter
"rdm_mem_boundary" in .cfg file.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 docs/man/xl.cfg.pod.5          |  21 ++++
 tools/libxc/xc_hvm_build_x86.c |   5 +-
 tools/libxl/libxl.h            |   6 +
 tools/libxl/libxl_create.c     |   6 +-
 tools/libxl/libxl_dm.c         | 255 +++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_dom.c        |  11 +-
 tools/libxl/libxl_internal.h   |  11 +-
 tools/libxl/libxl_types.idl    |   8 ++
 tools/libxl/xl_cmdimpl.c       |   3 +
 9 files changed, 322 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 638b350..6fd2370 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
 
 Note this would override global B<rdm> option.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RMRR entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+    #1. Above a predefined boundary
+        - move lowmem_end below reserved region to solve conflict;
+
+    #2. Below a predefined boundary
+        - Check strict/relaxed policy.
+        "strict" policy leads to fail libxl. Note when both policies
+        are specified on a given region, 'strict' is always preferred.
+        "relaxed" policy issue a warning message and also mask this entry INVALID
+        to indicate we shouldn't expose this entry to hvmloader.
+
+Here the default is 2G.
+
 =back
 
 =back
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 0e98c84..5142578 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -21,6 +21,7 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <zlib.h>
+#include <assert.h>
 
 #include "xg_private.h"
 #include "xc_private.h"
@@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch,
 
     elf_parse_binary(&elf);
     v_start = 0;
-    v_end = args->mem_size;
+    v_end = args->lowmem_end;
 
     if ( nr_pages > target_pages )
         memflags |= XENMEMF_populate_on_demand;
@@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
     args.mem_size = (uint64_t)memsize << 20;
     args.mem_target = (uint64_t)target << 20;
     args.image_file_name = image_name;
+    if ( args.mmio_size == 0 )
+        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
 
     return xc_hvm_build(xch, domid, &args);
 }
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a7913b..a6212fb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
     uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 6c8ec63..0438731 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info)
 {
     if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
         b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+
+    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+        b_info->rdm_mem_boundary_memkb =
+                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
@@ -460,7 +464,7 @@ int libxl__domain_build(libxl__gc *gc,
 
     switch (info->type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        ret = libxl__build_hvm(gc, domid, info, state);
+        ret = libxl__build_hvm(gc, domid, d_config, state);
         if (ret)
             goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 33f9ce6..d908350 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,261 @@ const char *libxl__domain_device_model(libxl__gc *gc,
     return dm;
 }
 
+static struct xen_reserved_device_memory
+*xc_device_get_rdm(libxl__gc *gc,
+                   uint32_t flag,
+                   uint16_t seg,
+                   uint8_t bus,
+                   uint8_t devfn,
+                   unsigned int *nr_entries)
+{
+    struct xen_reserved_device_memory *xrdm;
+    int rc;
+
+    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                       NULL, nr_entries);
+    assert(rc <= 0);
+    /* "0" means we have no any rdm entry. */
+    if (!rc)
+        goto out;
+
+    if (errno == ENOBUFS) {
+        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
+        if (!xrdm) {
+            LOG(ERROR, "Could not allocate RDM buffer!\n");
+            goto out;
+        }
+        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+                                           xrdm, nr_entries);
+        if (rc) {
+            LOG(ERROR, "Could not get reserved device memory maps.\n");
+            *nr_entries = 0;
+            free(xrdm);
+            xrdm = NULL;
+        }
+    } else
+        LOG(ERROR, "Could not get reserved device memory maps.\n");
+
+ out:
+    return xrdm;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+                         uint64_t rdm_start, uint64_t rdm_size)
+{
+    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RMRR can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could still
+ * be supported if no conflict.
+ *
+ * But in the case of lowmem, RMRR probably scatter the whole RAM space.
+ * Especially multiple RMRR entries would worsen this to lead a complicated
+ * memory layout. And then its hard to extend hvm_info_table{} to work
+ * hvmloader out. So here we're trying to figure out a simple solution to
+ * avoid breaking existing layout. So when a conflict occurs,
+ *
+ * #1. Above a predefined boundary (default 2G)
+ * - Move lowmem_end below reserved region to solve conflict;
+ *
+ * #2. Below a predefined boundary (default 2G)
+ * - Check strict/relaxed policy.
+ * "strict" policy leads to fail libxl. Note when both policies
+ * are specified on a given region, 'strict' is always preferred.
+ * "relaxed" policy issue a warning message and also mask this entry
+ * INVALID to indicate we shouldn't expose this entry to hvmloader.
+ */
+int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                       libxl_domain_config *d_config,
+                                       uint64_t rdm_mem_boundary,
+                                       struct xc_hvm_build_args *args)
+{
+    int i, j, conflict;
+    struct xen_reserved_device_memory *xrdm = NULL;
+    uint32_t type = d_config->b_info.rdm.type;
+    uint16_t seg;
+    uint8_t bus, devfn;
+    uint64_t rdm_start, rdm_size;
+    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
+
+    /* Might not expose rdm. */
+    if (type == LIBXL_RDM_RESERVE_TYPE_NONE && !d_config->num_pcidevs)
+        return 0;
+
+    /* Query all RDM entries in this platform */
+    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
+        unsigned int nr_entries;
+
+        /* Collect all rdm info if exist. */
+        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
+                                 0, 0, 0, &nr_entries);
+        if (!nr_entries)
+            return 0;
+
+        assert(xrdm);
+
+        d_config->num_rdms = nr_entries;
+        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+        for (i = 0; i < d_config->num_rdms; i++) {
+            d_config->rdms[i].start =
+                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[i].size =
+                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
+        }
+
+        free(xrdm);
+    } else
+        d_config->num_rdms = 0;
+
+    /* Query RDM entries per-device */
+    for (i = 0; i < d_config->num_pcidevs; i++) {
+        unsigned int nr_entries;
+        bool new = true;
+
+        seg = d_config->pcidevs[i].domain;
+        bus = d_config->pcidevs[i].bus;
+        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
+        nr_entries = 0;
+        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
+                                 seg, bus, devfn, &nr_entries);
+        /* No RDM to associated with this device. */
+        if (!nr_entries)
+            continue;
+
+        assert(xrdm);
+
+        /*
+         * Need to check whether this entry is already saved in the array.
+         * This could come from two cases:
+         *
+         *   - user may configure to get all RMRRs in this platform, which
+         *   is already queried before this point
+         *   - or two assigned devices may share one RMRR entry
+         *
+         * different policies may be configured on the same RMRR due to above
+         * two cases. We choose a simple policy to always favor stricter policy
+         */
+        for (j = 0; j < d_config->num_rdms; j++) {
+            if (d_config->rdms[j].start ==
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
+             {
+                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
+                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
+                new = false;
+                break;
+            }
+        }
+
+        if (new) {
+            d_config->num_rdms++;
+            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
+                                d_config->num_rdms * sizeof(libxl_device_rdm));
+
+            d_config->rdms[d_config->num_rdms - 1].start =
+                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].size =
+                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
+            d_config->rdms[d_config->num_rdms - 1].flag =
+                                d_config->pcidevs[i].rdm_reserve;
+        }
+        free(xrdm);
+    }
+
+    /*
+     * Next step is to check and avoid potential conflict between RDM entries
+     * and guest RAM. To avoid intrusive impact to existing memory layout
+     * {lowmem, mmio, highmem} which is passed around various function blocks,
+     * below conflicts are not handled which are rare and handling them would
+     * lead to a more scattered layout:
+     *  - RMRR in highmem area (>4G)
+     *  - RMRR lower than a defined memory boundary (e.g. 2G)
+     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
+     * end below reserved region to solve conflict.
+     *
+     * If a conflict is detected on a given RMRR entry, an error will be
+     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
+     * specified, this conflict is treated just as a warning, but we mark this
+     * RMRR entry as INVALID to indicate that this entry shouldn't be exposed
+     * to hvmloader.
+     *
+     * Firstly we should check the case of rdm < 4G because we may need to
+     * expand highmem_end.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        /* Just check if RDM > our memory boundary. */
+        if (rdm_start > rdm_mem_boundary) {
+            /*
+             * We will move downwards lowmem_end so we have to expand
+             * highmem_end.
+             */
+            highmem_end += (args->lowmem_end - rdm_start);
+            /* Now move downwards lowmem_end. */
+            args->lowmem_end = rdm_start;
+        }
+    }
+
+    /* Sync highmem_end. */
+    args->highmem_end = highmem_end;
+
+    /*
+     * Finally we can take same policy to check lowmem(< 2G) and
+     * highmem adjusted above.
+     */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        rdm_start = d_config->rdms[i].start;
+        rdm_size = d_config->rdms[i].size;
+        /* Does this entry conflict with lowmem? */
+        conflict = overlaps_rdm(0, args->lowmem_end,
+                                rdm_start, rdm_size);
+        /* Does this entry conflict with highmem? */
+        conflict |= overlaps_rdm((1ULL<<32),
+                                 args->highmem_end - (1ULL<<32),
+                                 rdm_start, rdm_size);
+
+        if (!conflict)
+            continue;
+
+        if(d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
+            goto out;
+        } else {
+            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
+                      d_config->rdms[i].start);
+
+            /*
+             * Then mask this INVALID to indicate we shouldn't expose this
+             * to hvmloader.
+             */
+            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
+        }
+    }
+
+    return 0;
+
+ out:
+    return ERROR_FAIL;
+}
+
 const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
 {
     const libxl_vnc_info *vnc = NULL;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 867172a..1777b32 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -914,13 +914,14 @@ out:
 }
 
 int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state)
 {
     libxl_ctx *ctx = libxl__gc_owner(gc);
     struct xc_hvm_build_args args = {};
     int ret, rc = ERROR_FAIL;
     uint64_t mmio_start, lowmem_end, highmem_end;
+    libxl_domain_build_info *const info = &d_config->b_info;
 
     memset(&args, 0, sizeof(struct xc_hvm_build_args));
     /* The params from the configuration file are in Mb, which are then
@@ -958,6 +959,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
     args.highmem_end = highmem_end;
     args.mmio_start = mmio_start;
 
+    ret = libxl__domain_device_construct_rdm(gc, d_config,
+                                             info->rdm_mem_boundary_memkb*1024,
+                                             &args);
+    if (ret) {
+        LOG(ERROR, "checking reserved device memory failed");
+        goto out;
+    }
+
     if (info->num_vnuma_nodes != 0) {
         int i;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e9ac886..52f3831 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1011,7 +1011,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
 _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
              libxl_domain_build_info *info, libxl__domain_build_state *state);
 _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
-              libxl_domain_build_info *info,
+              libxl_domain_config *d_config,
               libxl__domain_build_state *state);
 
 _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
@@ -1519,6 +1519,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
         int nr_channels, libxl_device_channel *channels);
 
 /*
+ * This function will fix reserved device memory conflict
+ * according to user's configuration.
+ */
+_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
+                                   libxl_domain_config *d_config,
+                                   uint64_t rdm_mem_guard,
+                                   struct xc_hvm_build_args *args);
+
+/*
  * This function will cause the whole libxl process to hang
  * if the device model does not respond.  It is deprecated.
  *
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 4dfcaf7..b4282a0 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
     ("shadow_memkb",    MemKB),
+    ("rdm_mem_boundary_memkb",    MemKB),
     ("rtc_timeoffset",  uint32),
     ("exec_ssidref",    uint32),
     ("exec_ssid_label", string),
@@ -559,6 +560,12 @@ libxl_device_pci = Struct("device_pci", [
     ("rdm_reserve",   libxl_rdm_reserve_flag),
     ])
 
+libxl_device_rdm = Struct("device_rdm", [
+    ("start", uint64),
+    ("size", uint64),
+    ("flag", libxl_rdm_reserve_flag),
+    ])
+
 libxl_device_dtdev = Struct("device_dtdev", [
     ("path", string),
     ])
@@ -589,6 +596,7 @@ libxl_domain_config = Struct("domain_config", [
     ("disks", Array(libxl_device_disk, "num_disks")),
     ("nics", Array(libxl_device_nic, "num_nics")),
     ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
+    ("rdms", Array(libxl_device_rdm, "num_rdms")),
     ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
     ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
     ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 4364ba4..85d74fd 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
     if (!xlu_cfg_get_long (config, "videoram", &l, 0))
         b_info->video_memkb = l * 1024;
 
+    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
+        b_info->rdm_mem_boundary_memkb = l * 1024;
+
     if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
         b_info->event_channels = l;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (12 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-12 16:43   ` Wei Liu
  2015-06-11  1:15 ` [v3][PATCH 15/16] xen/vtd: enable USB device assignment Tiejun Chen
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist. And hvmloader would need this info
later.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 tools/libxl/libxl_dom.c      |  5 +++
 tools/libxl/libxl_internal.h | 24 ++++++++++++++
 tools/libxl/libxl_x86.c      | 78 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 1777b32..3125ac0 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
         goto out;
     }
 
+    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+        LOG(ERROR, "setting domain memory map failed");
+        goto out;
+    }
+
     ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
                                &state->store_mfn, state->console_port,
                                &state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 52f3831..d838639 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3713,6 +3713,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
                                     const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..291f6ab 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,84 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__domain_construct_e820(libxl__gc *gc,
+                                 libxl_domain_config *d_config,
+                                 uint32_t domid,
+                                 struct xc_hvm_build_args *args)
+{
+    unsigned int nr = 0, i;
+    /* We always own at least one lowmem entry. */
+    unsigned int e820_entries = 1;
+    struct e820entry *e820 = NULL;
+    uint64_t highmem_size =
+                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+    /* Add all rdm entries. */
+    for (i = 0; i < d_config->num_rdms; i++)
+        if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
+            e820_entries++;
+
+
+    /* If we should have a highmem range. */
+    if (highmem_size)
+        e820_entries++;
+
+    if (e820_entries >= E820MAX) {
+        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+        return -1;
+    }
+
+    e820 = libxl__malloc(NOGC, sizeof(struct e820entry) * e820_entries);
+
+    /* Low memory */
+    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+    e820[nr].type = E820_RAM;
+    nr++;
+
+    /* RDM mapping */
+    for (i = 0; i < d_config->num_rdms; i++) {
+        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
+            continue;
+
+        e820[nr].addr = d_config->rdms[i].start;
+        e820[nr].size = d_config->rdms[i].size;
+        e820[nr].type = E820_RESERVED;
+        nr++;
+    }
+
+    /* High memory */
+    if (highmem_size) {
+        e820[nr].addr = ((uint64_t)1 << 32);
+        e820[nr].size = highmem_size;
+        e820[nr].type = E820_RAM;
+    }
+
+    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0)
+        return -1;
+
+    return 0;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-basic-offset: 4
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 15/16] xen/vtd: enable USB device assignment
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (13 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11 10:22   ` Tian, Kevin
  2015-06-11  1:15 ` [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

Before we refine RMRR mechanism, USB RMRR may conflict with guest bios
region so we always ignore USB RMRR. Now this can be gone when we enable
pci_force to check/reserve RMRR.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
 xen/drivers/passthrough/vtd/utils.c |  7 -------
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {                                                \
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d7c9e1c..d3233b8 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
     /*
      * If the device belongs to the hardware domain, and it has RMRR, don't
      * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time. Also account for the special casing of USB below (in
-     * intel_iommu_assign_device()).
+     * booting time.
      */
-    if ( !is_hardware_domain(source) &&
-         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+    if ( !is_hardware_domain(source) )
     {
         const struct acpi_rmrr_unit *rmrr;
         u16 bdf;
@@ -2283,13 +2281,8 @@ static int intel_iommu_assign_device(
     if ( ret )
         return ret;
 
-    /* FIXME: Because USB RMRR conflicts with guest bios region,
-     * ignore USB RMRR temporarily.
-     */
     seg = pdev->seg;
     bus = pdev->bus;
-    if ( is_usb_device(seg, bus, pdev->devfn) )
-        return 0;
 
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include <asm/io_apic.h>
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-                                PCI_CLASS_DEVICE);
-    return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (14 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 15/16] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-06-11  1:15 ` Tiejun Chen
  2015-06-11 10:25   ` Tian, Kevin
  2015-06-17 10:28   ` Jan Beulich
  2015-06-11  7:27 ` [v3][PATCH 00/16] Fix RMRR Jan Beulich
  2015-06-11 12:52 ` Tim Deegan
  17 siblings, 2 replies; 114+ messages in thread
From: Tiejun Chen @ 2015-06-11  1:15 UTC (permalink / raw)
  To: jbeulich, tim, andrew.cooper3, kevin.tian, yang.z.zhang,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

Currently we're intending to cover this kind of devices
with shared RMRR simply since the case of shared RMRR is
a rare case according to our previous experiences. But
late we can group these devices which shared rmrr, and
then allow all devices within a group to be assigned to
same domain.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d3233b8..f220081 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
     if ( list_empty(&acpi_drhd_units) )
         return -ENODEV;
 
+    seg = pdev->seg;
+    bus = pdev->bus;
+    /*
+     * In rare cases one given rmrr is shared by multiple devices but
+     * obviously this would put the security of a system at risk. So
+     * we should prevent from this sort of device assignment.
+     *
+     * TODO: actually we can group these devices which shared rmrr, and
+     * then allow all devices within a group to be assigned to same domain.
+     */
+    for_each_rmrr_device( rmrr, bdf, i )
+    {
+        if ( rmrr->segment == seg &&
+             PCI_BUS(bdf) == bus &&
+             PCI_DEVFN2(bdf) == devfn )
+        {
+            if ( rmrr->scope.devices_cnt > 1 )
+            {
+                ret = -EPERM;
+                printk(XENLOG_G_ERR VTDPREFIX
+                       " cannot assign this device with shared RMRR for Dom%d (%d)\n",
+                       d->domain_id, ret);
+                return ret;
+            }
+        }
+    }
+
     ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
     if ( ret )
         return ret;
 
-    seg = pdev->seg;
-    bus = pdev->bus;
-
     /* Setup rmrr identity mapping */
     for_each_rmrr_device( rmrr, bdf, i )
     {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (15 preceding siblings ...)
  2015-06-11  1:15 ` [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-06-11  7:27 ` Jan Beulich
  2015-06-11  8:42   ` Tian, Kevin
  2015-06-11 12:52 ` Tim Deegan
  17 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-11  7:27 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> v3:
> 
> * Rearrange all patches orderly as Wei suggested
> * Rebase on the latest tree
> * Address some Wei's comments on tools side
> * Two changes for runtime cycle
>    patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor 
> side
> 
>   a>. Introduce paging_mode_translate()
>   Otherwise, we'll see this error when boot Xen/Dom0
> 
> (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> ....
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
> (XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
> (XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
> (XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
> (XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
> (XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
> (XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
> (XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
> 
> Note I don't copy all info since I think the above is enough.
> 
>   b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
>   we're getting an invalid mfn.
> 
> * Add patch #16 to handle those devices which share same RMRR.

Summarizing the changed in the overview mail is fine, but the primary
place for them to live to help reviewing should be in the patches
themselves, after a first --- marker. This is especially so for as
extensive an explanation as you give for patch 2 here (but I'll reply
to that in the context of that patch).

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  1:15 ` [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
@ 2015-06-11  7:33   ` Jan Beulich
  2015-06-11  8:23     ` Chen, Tiejun
  2015-06-11  9:00   ` Tian, Kevin
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-11  7:33 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> We will create this sort of identity mapping as follows:
> 
> If the gfn space is unoccupied, we just set the mapping. If the space
> is already occupied by 1:1 mappings, do nothing. Failed for any
> other cases.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

First of all you continue to be copying each patch to every
maintainer involved in some part of the series. Please limit the
Cc list of each patch to the actual list of people needing to be
Cc-ed on it (or you know explicitly wanting a copy).

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -898,6 +898,41 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>      return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>  }
>  
> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                           p2m_access_t p2ma)
> +{
> +    p2m_type_t p2mt;
> +    p2m_access_t a;
> +    mfn_t mfn;
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int ret;
> +
> +    if ( paging_mode_translate(p2m->domain) )
> +    {
> +        gfn_lock(p2m, gfn, 0);
> +
> +        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
> +
> +        if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )

I'm not fundamentally opposed to this extra INVALID_MFN check, but
could you please clarify for which P2M type you saw INVALID_MFN
coming back here, and for which p2m_invalid cases you didn't also
see INVALID_MFN? I.e. I'd really prefer a single check to be used
when that can cover all cases.

> +            ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> +                                p2m_mmio_direct, p2ma);
> +        else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> +            ret = 0;
> +        else
> +        {
> +            ret = -EBUSY;
> +            printk(XENLOG_G_WARNING
> +                   "Cannot identity map d%d:%lx, already mapped to %lx.\n",
> +                   d->domain_id, gfn, mfn_x(mfn));
> +        }
> +
> +        gfn_unlock(p2m, gfn, 0);
> +        return ret;
> +    }
> +
> +    return 0;
> +}

Either have a single return point, or invert the original if() and bail
early (reducing the indentation level on the main body of the code).

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  7:33   ` Jan Beulich
@ 2015-06-11  8:23     ` Chen, Tiejun
  2015-06-11  9:23       ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-11  8:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

On 2015/6/11 15:33, Jan Beulich wrote:
>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>> We will create this sort of identity mapping as follows:
>>
>> If the gfn space is unoccupied, we just set the mapping. If the space
>> is already occupied by 1:1 mappings, do nothing. Failed for any
>> other cases.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> First of all you continue to be copying each patch to every
> maintainer involved in some part of the series. Please limit the

I just hope all involved guys can see this series on the whole to 
review. But,

> Cc list of each patch to the actual list of people needing to be
> Cc-ed on it (or you know explicitly wanting a copy).

Next, I will just send them to each associated maintainer.

>
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -898,6 +898,41 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>       return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>   }
>>
>> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>> +                           p2m_access_t p2ma)
>> +{
>> +    p2m_type_t p2mt;
>> +    p2m_access_t a;
>> +    mfn_t mfn;
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int ret;
>> +
>> +    if ( paging_mode_translate(p2m->domain) )
>> +    {
>> +        gfn_lock(p2m, gfn, 0);
>> +
>> +        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
>> +
>> +        if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )
>
> I'm not fundamentally opposed to this extra INVALID_MFN check, but
> could you please clarify for which P2M type you saw INVALID_MFN
> coming back here, and for which p2m_invalid cases you didn't also
> see INVALID_MFN? I.e. I'd really prefer a single check to be used
> when that can cover all cases.

Actually, I initially adopted "!mfn_valid(mfn)" in our previous version. 
But Tim issued one comment about this,

"I don't think this check is quite right -- for example, this p2m entry
might be an MMIO mapping or a PoD entry.  "if ( p2mt == p2m_invalid )"
would be better."

But if I just keep his recommended check, you can see the following when 
I pass through IGD,

(XEN) Cannot identity map d1:ad800, already mapped to ffffffffffffffff 
with p2mt:4.

Looks "4" indicates p2m_mmio_dm, right?

>
>> +            ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
>> +                                p2m_mmio_direct, p2ma);
>> +        else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
>> +            ret = 0;
>> +        else
>> +        {
>> +            ret = -EBUSY;
>> +            printk(XENLOG_G_WARNING
>> +                   "Cannot identity map d%d:%lx, already mapped to %lx.\n",
>> +                   d->domain_id, gfn, mfn_x(mfn));
>> +        }
>> +
>> +        gfn_unlock(p2m, gfn, 0);
>> +        return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>
> Either have a single return point, or invert the original if() and bail
> early (reducing the indentation level on the main body of the code).
>

Sure, I guess I can follow this patten,

int ret = 0;

if ()
{
     ...
}

return ret;

Thanks
Tiejun

> Jan
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-11  7:27 ` [v3][PATCH 00/16] Fix RMRR Jan Beulich
@ 2015-06-11  8:42   ` Tian, Kevin
  2015-06-11  9:06     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  8:42 UTC (permalink / raw)
  To: Jan Beulich, Chen, Tiejun
  Cc: tim, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, Zhang, Yang Z

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, June 11, 2015 3:28 PM
> 
> >>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> > v3:
> >
> > * Rearrange all patches orderly as Wei suggested
> > * Rebase on the latest tree
> > * Address some Wei's comments on tools side
> > * Two changes for runtime cycle
> >    patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor
> > side
> >
> >   a>. Introduce paging_mode_translate()
> >   Otherwise, we'll see this error when boot Xen/Dom0
> >
> > (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
> > (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> > ....
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
> > (XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
> > (XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
> > (XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
> > (XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
> > (XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
> > (XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
> > (XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
> >
> > Note I don't copy all info since I think the above is enough.
> >
> >   b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
> >   we're getting an invalid mfn.
> >
> > * Add patch #16 to handle those devices which share same RMRR.
> 
> Summarizing the changed in the overview mail is fine, but the primary
> place for them to live to help reviewing should be in the patches
> themselves, after a first --- marker. This is especially so for as
> extensive an explanation as you give for patch 2 here (but I'll reply
> to that in the context of that patch).
> 

Agree. Tiejun could you add per-patch version history and resend a 
new version with right maintainers CCed?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map
  2015-06-11  1:15 ` [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
@ 2015-06-11  8:56   ` Tian, Kevin
  0 siblings, 0 replies; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  8:56 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> From: Jan Beulich <jbeulich@suse.com>
> 
> This is a prerequisite for punching holes into HVM and PVH guests' P2M
> to allow passing through devices that are associated with (on VT-d)
> RMRRs.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/common/compat/memory.c           | 66
> ++++++++++++++++++++++++++++++++++++
>  xen/common/memory.c                  | 64
> ++++++++++++++++++++++++++++++++++
>  xen/drivers/passthrough/iommu.c      | 10 ++++++
>  xen/drivers/passthrough/vtd/dmar.c   | 32 +++++++++++++++++
>  xen/drivers/passthrough/vtd/extern.h |  1 +
>  xen/drivers/passthrough/vtd/iommu.c  |  1 +
>  xen/include/public/memory.h          | 32 ++++++++++++++++-
>  xen/include/xen/iommu.h              | 10 ++++++
>  xen/include/xen/pci.h                |  2 ++
>  xen/include/xlat.lst                 |  3 +-
>  10 files changed, 219 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
> index b258138..b608496 100644
> --- a/xen/common/compat/memory.c
> +++ b/xen/common/compat/memory.c
> @@ -17,6 +17,45 @@ CHECK_TYPE(domid);
>  CHECK_mem_access_op;
>  CHECK_vmemrange;
> 
> +#ifdef HAS_PASSTHROUGH
> +struct get_reserved_device_memory {
> +    struct compat_reserved_device_memory_map map;
> +    unsigned int used_entries;
> +};
> +
> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                      u32 id, void *ctxt)
> +{
> +    struct get_reserved_device_memory *grdm = ctxt;
> +    u32 sbdf;
> +    struct compat_reserved_device_memory rdm = {
> +        .start_pfn = start, .nr_pages = nr
> +    };
> +
> +    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
> +    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
> +    {
> +        if ( grdm->used_entries < grdm->map.nr_entries )
> +        {
> +            if ( rdm.start_pfn != start || rdm.nr_pages != nr )
> +                return -ERANGE;
> +
> +            if ( __copy_to_compat_offset(grdm->map.buffer,
> +                                         grdm->used_entries,
> +                                         &rdm,
> +                                         1) )
> +            {
> +                return -EFAULT;
> +            }
> +        }
> +        ++grdm->used_entries;
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +#endif
> +
>  int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>  {
>      int split, op = cmd & MEMOP_CMD_MASK;
> @@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd,
> XEN_GUEST_HANDLE_PARAM(void) compat)
>              break;
>          }
> 
> +#ifdef HAS_PASSTHROUGH
> +        case XENMEM_reserved_device_memory_map:
> +        {
> +            struct get_reserved_device_memory grdm;
> +
> +            if ( copy_from_guest(&grdm.map, compat, 1) ||
> +                 !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
> +                return -EFAULT;
> +
> +            grdm.used_entries = 0;
> +            rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
> +                                                  &grdm);
> +
> +            if ( !rc && grdm.map.nr_entries < grdm.used_entries )
> +                rc = -ENOBUFS;
> +
> +            grdm.map.nr_entries = grdm.used_entries;
> +            if ( grdm.map.nr_entries )
> +            {
> +                if ( __copy_to_guest(compat, &grdm.map, 1) )
> +                    rc = -EFAULT;
> +            }
> +
> +            return rc;
> +        }
> +#endif
> +
>          default:
>              return compat_arch_memory_op(cmd, compat);
>          }
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 063a1c5..c789f72 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
>      return 0;
>  }
> 
> +#ifdef HAS_PASSTHROUGH
> +struct get_reserved_device_memory {
> +    struct xen_reserved_device_memory_map map;
> +    unsigned int used_entries;
> +};
> +
> +static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
> +                                      u32 id, void *ctxt)
> +{
> +    struct get_reserved_device_memory *grdm = ctxt;
> +    u32 sbdf;
> +
> +    sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
> +    if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
> +    {
> +        if ( grdm->used_entries < grdm->map.nr_entries )
> +        {
> +            struct xen_reserved_device_memory rdm = {
> +                .start_pfn = start, .nr_pages = nr
> +            };
> +
> +            if ( __copy_to_guest_offset(grdm->map.buffer,
> +                                        grdm->used_entries,
> +                                        &rdm,
> +                                        1) )
> +            {
> +                return -EFAULT;
> +            }
> +        }
> +        ++grdm->used_entries;
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +#endif
> +
>  long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>      struct domain *d;
> @@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
> 
> +#ifdef HAS_PASSTHROUGH
> +    case XENMEM_reserved_device_memory_map:
> +    {
> +        struct get_reserved_device_memory grdm;
> +
> +        if ( copy_from_guest(&grdm.map, arg, 1) ||
> +             !guest_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
> +            return -EFAULT;
> +
> +        grdm.used_entries = 0;
> +        rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
> +                                              &grdm);
> +
> +        if ( !rc && grdm.map.nr_entries < grdm.used_entries )
> +            rc = -ENOBUFS;
> +
> +        grdm.map.nr_entries = grdm.used_entries;
> +        if ( grdm.map.nr_entries )
> +        {
> +            if ( __copy_to_guest(arg, &grdm.map, 1) )
> +                rc = -EFAULT;
> +        }
> +
> +        break;
> +    }
> +#endif
> +
>      default:
>          rc = arch_memory_op(cmd, arg);
>          break;
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 06cb38f..0b2ef52 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -375,6 +375,16 @@ void iommu_crash_shutdown(void)
>      iommu_enabled = iommu_intremap = 0;
>  }
> 
> +int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> +{
> +    const struct iommu_ops *ops = iommu_get_ops();
> +
> +    if ( !iommu_enabled || !ops->get_reserved_device_memory )
> +        return 0;
> +
> +    return ops->get_reserved_device_memory(func, ctxt);
> +}
> +
>  bool_t iommu_has_feature(struct domain *d, enum iommu_feature feature)
>  {
>      const struct hvm_iommu *hd = domain_hvm_iommu(d);
> diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
> index 2b07be9..a730de5 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -893,3 +893,35 @@ int platform_supports_x2apic(void)
>      unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
>      return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
>  }
> +
> +int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> +{
> +    struct acpi_rmrr_unit *rmrr, *rmrr_cur = NULL;
> +    int rc = 0;
> +    unsigned int i;
> +    u16 bdf;
> +
> +    for_each_rmrr_device ( rmrr, bdf, i )
> +    {
> +        if ( rmrr != rmrr_cur )
> +        {
> +            rc = func(PFN_DOWN(rmrr->base_address),
> +                      PFN_UP(rmrr->end_address) -
> +                        PFN_DOWN(rmrr->base_address),
> +                      PCI_SBDF(rmrr->segment, bdf),
> +                      ctxt);
> +
> +            if ( unlikely(rc < 0) )
> +                return rc;
> +
> +            if ( !rc )
> +                continue;
> +
> +            /* Just go next. */
> +            if ( rc == 1 )
> +                rmrr_cur = rmrr;
> +        }
> +    }
> +
> +    return 0;
> +}
> diff --git a/xen/drivers/passthrough/vtd/extern.h
> b/xen/drivers/passthrough/vtd/extern.h
> index 5524dba..f9ee9b0 100644
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -75,6 +75,7 @@ int domain_context_mapping_one(struct domain *domain, struct
> iommu *iommu,
>                                 u8 bus, u8 devfn, const struct pci_dev *);
>  int domain_context_unmap_one(struct domain *domain, struct iommu *iommu,
>                               u8 bus, u8 devfn);
> +int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
> 
>  unsigned int io_apic_read_remap_rte(unsigned int apic, unsigned int reg);
>  void io_apic_write_remap_rte(unsigned int apic,
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 9053a1f..6a37624 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2491,6 +2491,7 @@ const struct iommu_ops intel_iommu_ops = {
>      .crash_shutdown = vtd_crash_shutdown,
>      .iotlb_flush = intel_iommu_iotlb_flush,
>      .iotlb_flush_all = intel_iommu_iotlb_flush_all,
> +    .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
>      .dump_p2m_table = vtd_dump_p2m_table,
>  };
> 
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 832559a..7b25275 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -573,7 +573,37 @@ struct xen_vnuma_topology_info {
>  typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t);
> 
> -/* Next available subop number is 27 */
> +/*
> + * With some legacy devices, certain guest-physical addresses cannot safely
> + * be used for other purposes, e.g. to map guest RAM.  This hypercall
> + * enumerates those regions so the toolstack can avoid using them.
> + */
> +#define XENMEM_reserved_device_memory_map   27
> +struct xen_reserved_device_memory {
> +    xen_pfn_t start_pfn;
> +    xen_ulong_t nr_pages;
> +};
> +typedef struct xen_reserved_device_memory xen_reserved_device_memory_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_t);
> +
> +struct xen_reserved_device_memory_map {
> +    /* IN */
> +    /* Currently just one bit to indicate checkng all Reserved Device Memory. */
> +#define PCI_DEV_RDM_ALL   0x1
> +    uint32_t        flag;
> +    /* IN */
> +    uint16_t        seg;
> +    uint8_t         bus;
> +    uint8_t         devfn;
> +    /* IN/OUT */
> +    unsigned int    nr_entries;
> +    /* OUT */
> +    XEN_GUEST_HANDLE(xen_reserved_device_memory_t) buffer;
> +};
> +typedef struct xen_reserved_device_memory_map
> xen_reserved_device_memory_map_t;
> +DEFINE_XEN_GUEST_HANDLE(xen_reserved_device_memory_map_t);
> +
> +/* Next available subop number is 28 */
> 
>  #endif /* __XEN_PUBLIC_MEMORY_H__ */
> 
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index b30bf41..e2f584d 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -126,6 +126,14 @@ int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
> 
>  struct page_info;
> 
> +/*
> + * Any non-zero value returned from callbacks of this type will cause the
> + * function the callback was handed to terminate its iteration. Assigning
> + * meaning of these non-zero values is left to the top level caller /
> + * callback pair.
> + */
> +typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
> +
>  struct iommu_ops {
>      int (*init)(struct domain *d);
>      void (*hwdom_init)(struct domain *d);
> @@ -157,12 +165,14 @@ struct iommu_ops {
>      void (*crash_shutdown)(void);
>      void (*iotlb_flush)(struct domain *d, unsigned long gfn, unsigned int page_count);
>      void (*iotlb_flush_all)(struct domain *d);
> +    int (*get_reserved_device_memory)(iommu_grdm_t *, void *);
>      void (*dump_p2m_table)(struct domain *d);
>  };
> 
>  void iommu_suspend(void);
>  void iommu_resume(void);
>  void iommu_crash_shutdown(void);
> +int iommu_get_reserved_device_memory(iommu_grdm_t *, void *);
> 
>  void iommu_share_p2m_table(struct domain *d);
> 
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 3908146..d176e8b 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -33,6 +33,8 @@
>  #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
>  #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
>  #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
> +#define PCI_SBDF(s,bdf) (((s & 0xffff) << 16) | (bdf & 0xffff))
> +#define PCI_SBDF2(s,b,df) (((s & 0xffff) << 16) | PCI_BDF2(b,df))
> 
>  struct pci_dev_info {
>      bool_t is_extfn;
> diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
> index 9c9fd9a..dd23559 100644
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -61,9 +61,10 @@
>  !	memory_exchange			memory.h
>  !	memory_map			memory.h
>  !	memory_reservation		memory.h
> -?	mem_access_op		memory.h
> +?	mem_access_op			memory.h
>  !	pod_target			memory.h
>  !	remove_from_physmap		memory.h
> +!	reserved_device_memory_map	memory.h
>  ?	vmemrange			memory.h
>  !	vnuma_topology_info		memory.h
>  ?	physdev_eoi			physdev.h
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  1:15 ` [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
  2015-06-11  7:33   ` Jan Beulich
@ 2015-06-11  9:00   ` Tian, Kevin
  2015-06-11  9:18     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:00 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> We will create this sort of identity mapping as follows:
> 
> If the gfn space is unoccupied, we just set the mapping. If the space
> is already occupied by 1:1 mappings, do nothing. Failed for any
> other cases.

"If space is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned."

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>, plus one small
comment as below:


> +            printk(XENLOG_G_WARNING
> +                   "Cannot identity map d%d:%lx, already mapped to %lx.\n",
> +                   d->domain_id, gfn, mfn_x(mfn));

"Cannot setup identity map, ..., gfn already mapped to..."

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-11  8:42   ` Tian, Kevin
@ 2015-06-11  9:06     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-11  9:06 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: tim, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, Zhang, Yang Z

On 2015/6/11 16:42, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, June 11, 2015 3:28 PM
>>
>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>> v3:
>>>
>>> * Rearrange all patches orderly as Wei suggested
>>> * Rebase on the latest tree
>>> * Address some Wei's comments on tools side
>>> * Two changes for runtime cycle
>>>     patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor
>>> side
>>>
>>>    a>. Introduce paging_mode_translate()
>>>    Otherwise, we'll see this error when boot Xen/Dom0
>>>
>>> (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>>> ....
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d0801f53db>] p2m_pt_get_entry+0x29/0x558
>>> (XEN)    [<ffff82d0801f0b5c>] set_identity_p2m_entry+0xfc/0x1f0
>>> (XEN)    [<ffff82d08014ebc8>] rmrr_identity_mapping+0x154/0x1ce
>>> (XEN)    [<ffff82d0802abb46>] intel_iommu_hwdom_init+0x76/0x158
>>> (XEN)    [<ffff82d0802ab169>] iommu_hwdom_init+0x179/0x188
>>> (XEN)    [<ffff82d0802cc608>] construct_dom0+0x2fed/0x35d8
>>> (XEN)    [<ffff82d0802bdaa0>] __start_xen+0x22d8/0x2381
>>> (XEN)    [<ffff82d080100067>] __high_start+0x53/0x55
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
>>>
>>> Note I don't copy all info since I think the above is enough.
>>>
>>>    b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
>>>    we're getting an invalid mfn.
>>>
>>> * Add patch #16 to handle those devices which share same RMRR.
>>
>> Summarizing the changed in the overview mail is fine, but the primary
>> place for them to live to help reviewing should be in the patches
>> themselves, after a first --- marker. This is especially so for as
>> extensive an explanation as you give for patch 2 here (but I'll reply
>> to that in the context of that patch).
>>
>
> Agree. Tiejun could you add per-patch version history and resend a
> new version with right maintainers CCed?
>

Yes, I should do this as Jan mentioned but I'd like to do this next.

Because to compare v2, I didn't introduce any changes on hypervisor 
side, except for these twos listed here. Others focus on refactoring 
codes on tools side. But indeed, I still should comment this per patch 
as you guys said.

So next, I will do

#1. Make sure send per patch to its associated maintainers
#2. Includes revision history to each patch

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-11  1:15 ` [v3][PATCH 03/16] xen/vtd: create RMRR mapping Tiejun Chen
@ 2015-06-11  9:14   ` Tian, Kevin
  2015-06-11  9:31     ` Chen, Tiejun
  2015-06-17 10:03   ` Jan Beulich
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:14 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> RMRR reserved regions must be setup in the pfn space with an identity
> mapping to reported mfn. However existing code has problem to setup
> correct mapping when VT-d shares EPT page table, so lead to problem
> when assigning devices (e.g GPU) with RMRR reported. So instead, this
> patch aims to setup identity mapping in p2m layer, regardless of
> whether EPT is shared or not. And we still keep creating VT-d table.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/arch/x86/mm/p2m.c               | 10 ++++++++--
>  xen/drivers/passthrough/vtd/iommu.c |  3 +--
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index a6db236..c7198a5 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -927,10 +927,16 @@ int set_identity_p2m_entry(struct domain *d, unsigned long
> gfn,
>          }
> 
>          gfn_unlock(p2m, gfn, 0);
> -        return ret;
>      }
> +    else
> +        ret = 0;
> 
> -    return 0;
> +    if( ret == 0 )
> +    {
> +        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
> +    }
> +
> +    return ret;

p2m_set_entry will setup IOMMU pages already. You don't need
another explicit iommu map here. 

>  }
> 
>  /* Returns: 0 for success, -errno for failure */
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 6a37624..31ce1af 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1856,8 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
> 
>      while ( base_pfn < end_pfn )
>      {
> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> -                                       IOMMUF_readable|IOMMUF_writable);
> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
> 
>          if ( err )
>              return err;

Tim has another comment to replace earlier unmap with 
guest_physmap_remove_page() which will call iommu
unmap internally. Please include this change too.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  9:00   ` Tian, Kevin
@ 2015-06-11  9:18     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-11  9:18 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 17:00, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> We will create this sort of identity mapping as follows:
>>
>> If the gfn space is unoccupied, we just set the mapping. If the space
>> is already occupied by 1:1 mappings, do nothing. Failed for any
>> other cases.
>
> "If space is already occupied by desired identity mapping, do nothing.
> Otherwise, failure is returned."

Okay.

>
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>, plus one small
> comment as below:
>
>
>> +            printk(XENLOG_G_WARNING
>> +                   "Cannot identity map d%d:%lx, already mapped to %lx.\n",
>> +                   d->domain_id, gfn, mfn_x(mfn));
>
> "Cannot setup identity map, ..., gfn already mapped to..."
>

Fixed.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  8:23     ` Chen, Tiejun
@ 2015-06-11  9:23       ` Jan Beulich
  2015-06-11  9:25         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-11  9:23 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

>>> On 11.06.15 at 10:23, <tiejun.chen@intel.com> wrote:
> On 2015/6/11 15:33, Jan Beulich wrote:
>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>> We will create this sort of identity mapping as follows:
>>>
>>> If the gfn space is unoccupied, we just set the mapping. If the space
>>> is already occupied by 1:1 mappings, do nothing. Failed for any
>>> other cases.
>>>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>
>> First of all you continue to be copying each patch to every
>> maintainer involved in some part of the series. Please limit the
> 
> I just hope all involved guys can see this series on the whole to 
> review. But,
> 
>> Cc list of each patch to the actual list of people needing to be
>> Cc-ed on it (or you know explicitly wanting a copy).
> 
> Next, I will just send them to each associated maintainer.
> 
>>
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -898,6 +898,41 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
> gfn, mfn_t mfn,
>>>       return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>>   }
>>>
>>> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>>> +                           p2m_access_t p2ma)
>>> +{
>>> +    p2m_type_t p2mt;
>>> +    p2m_access_t a;
>>> +    mfn_t mfn;
>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +    int ret;
>>> +
>>> +    if ( paging_mode_translate(p2m->domain) )
>>> +    {
>>> +        gfn_lock(p2m, gfn, 0);
>>> +
>>> +        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
>>> +
>>> +        if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )
>>
>> I'm not fundamentally opposed to this extra INVALID_MFN check, but
>> could you please clarify for which P2M type you saw INVALID_MFN
>> coming back here, and for which p2m_invalid cases you didn't also
>> see INVALID_MFN? I.e. I'd really prefer a single check to be used
>> when that can cover all cases.
> 
> Actually, I initially adopted "!mfn_valid(mfn)" in our previous version. 
> But Tim issued one comment about this,
> 
> "I don't think this check is quite right -- for example, this p2m entry
> might be an MMIO mapping or a PoD entry.  "if ( p2mt == p2m_invalid )"
> would be better."

Ah, I right, I now remember. In which case checking against
INVALID_MFN would cover the MMIO case, but not the PoD one.

> But if I just keep his recommended check, you can see the following when 
> I pass through IGD,
> 
> (XEN) Cannot identity map d1:ad800, already mapped to ffffffffffffffff 
> with p2mt:4.
> 
> Looks "4" indicates p2m_mmio_dm, right?

And it seems to me that this particular combination would need
special treatment, i.e. you'd need

       if ( p2mt == p2m_invalid ||
            (p2mt == p2m_mmio_dm && mfn_x(mfn) == INVALID_MFN) )

as long as p2m_invalid isn't the default type lookups return. But
I'd recommend waiting for Tim to confirm or further adjust that.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry
  2015-06-11  9:23       ` Jan Beulich
@ 2015-06-11  9:25         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-11  9:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

On 2015/6/11 17:23, Jan Beulich wrote:
>>>> On 11.06.15 at 10:23, <tiejun.chen@intel.com> wrote:
>> On 2015/6/11 15:33, Jan Beulich wrote:
>>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>>> We will create this sort of identity mapping as follows:
>>>>
>>>> If the gfn space is unoccupied, we just set the mapping. If the space
>>>> is already occupied by 1:1 mappings, do nothing. Failed for any
>>>> other cases.
>>>>
>>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>>
>>> First of all you continue to be copying each patch to every
>>> maintainer involved in some part of the series. Please limit the
>>
>> I just hope all involved guys can see this series on the whole to
>> review. But,
>>
>>> Cc list of each patch to the actual list of people needing to be
>>> Cc-ed on it (or you know explicitly wanting a copy).
>>
>> Next, I will just send them to each associated maintainer.
>>
>>>
>>>> --- a/xen/arch/x86/mm/p2m.c
>>>> +++ b/xen/arch/x86/mm/p2m.c
>>>> @@ -898,6 +898,41 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long
>> gfn, mfn_t mfn,
>>>>        return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>>>    }
>>>>
>>>> +int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>>>> +                           p2m_access_t p2ma)
>>>> +{
>>>> +    p2m_type_t p2mt;
>>>> +    p2m_access_t a;
>>>> +    mfn_t mfn;
>>>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +    int ret;
>>>> +
>>>> +    if ( paging_mode_translate(p2m->domain) )
>>>> +    {
>>>> +        gfn_lock(p2m, gfn, 0);
>>>> +
>>>> +        mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
>>>> +
>>>> +        if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )
>>>
>>> I'm not fundamentally opposed to this extra INVALID_MFN check, but
>>> could you please clarify for which P2M type you saw INVALID_MFN
>>> coming back here, and for which p2m_invalid cases you didn't also
>>> see INVALID_MFN? I.e. I'd really prefer a single check to be used
>>> when that can cover all cases.
>>
>> Actually, I initially adopted "!mfn_valid(mfn)" in our previous version.
>> But Tim issued one comment about this,
>>
>> "I don't think this check is quite right -- for example, this p2m entry
>> might be an MMIO mapping or a PoD entry.  "if ( p2mt == p2m_invalid )"
>> would be better."
>
> Ah, I right, I now remember. In which case checking against
> INVALID_MFN would cover the MMIO case, but not the PoD one.
>
>> But if I just keep his recommended check, you can see the following when
>> I pass through IGD,
>>
>> (XEN) Cannot identity map d1:ad800, already mapped to ffffffffffffffff
>> with p2mt:4.
>>
>> Looks "4" indicates p2m_mmio_dm, right?
>
> And it seems to me that this particular combination would need
> special treatment, i.e. you'd need
>
>         if ( p2mt == p2m_invalid ||
>              (p2mt == p2m_mmio_dm && mfn_x(mfn) == INVALID_MFN) )
>
> as long as p2m_invalid isn't the default type lookups return. But
> I'd recommend waiting for Tim to confirm or further adjust that.
>

Sure.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-11  1:15 ` [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
@ 2015-06-11  9:28   ` Tian, Kevin
  2015-06-12  6:31     ` Chen, Tiejun
  2015-06-17 10:11   ` Jan Beulich
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:28 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> This patch extends the existing hypercall to support rdm reservation policy.
> We return error or just throw out a warning message depending on whether
> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
> Note in some special cases, e.g. add a device to hwdomain, and remove a
> device from user domain, 'relaxed' is fine enough since this is always safe
> to hwdomain.

could you elaborate " add a device to hwdomain, and remove a device 
from user domain "? move a device from user domain to hwdomain
or completely irrelevant?

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/arch/x86/mm/p2m.c                       |  8 +++++++-
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
>  xen/drivers/passthrough/arm/smmu.c          |  2 +-
>  xen/drivers/passthrough/device_tree.c       | 11 ++++++++++-
>  xen/drivers/passthrough/pci.c               | 10 ++++++----
>  xen/drivers/passthrough/vtd/iommu.c         | 20 ++++++++++++--------
>  xen/include/asm-x86/p2m.h                   |  2 +-
>  xen/include/public/domctl.h                 |  5 +++++
>  xen/include/xen/iommu.h                     |  2 +-
>  9 files changed, 45 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index c7198a5..3fcdcac 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn,
> mfn_t mfn,
>  }
> 
>  int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> -                           p2m_access_t p2ma)
> +                           p2m_access_t p2ma, u32 flag)
>  {
>      p2m_type_t p2mt;
>      p2m_access_t a;
> @@ -924,6 +924,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>              printk(XENLOG_G_WARNING
>                     "Cannot identity map d%d:%lx, already mapped to %lx.\n",
>                     d->domain_id, gfn, mfn_x(mfn));
> +
> +            if ( flag == XEN_DOMCTL_DEV_RDM_RELAXED )
> +            {
> +                ret = 0;
> +                printk(XENLOG_G_WARNING "Some devices may work failed.\n");

Do you need this extra printk? The warning message is already given
several lines above and here you just need to change return value
for relaxed policy.

> +            }
>          }
> 
>          gfn_unlock(p2m, gfn, 0);
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index e83bb35..920b35a 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain
> *target,
>  }
> 
>  static int amd_iommu_assign_device(struct domain *d, u8 devfn,
> -                                   struct pci_dev *pdev)
> +                                   struct pci_dev *pdev,
> +                                   u32 flag)
>  {
>      struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
>      int bdf = PCI_BDF2(pdev->bus, devfn);
> diff --git a/xen/drivers/passthrough/arm/smmu.c
> b/xen/drivers/passthrough/arm/smmu.c
> index 6cc4394..9a667e9 100644
> --- a/xen/drivers/passthrough/arm/smmu.c
> +++ b/xen/drivers/passthrough/arm/smmu.c
> @@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct
> iommu_domain *domain)
>  }
> 
>  static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
> -			       struct device *dev)
> +			       struct device *dev, u32 flag)
>  {
>  	struct iommu_domain *domain;
>  	struct arm_smmu_xen_domain *xen_domain;
> diff --git a/xen/drivers/passthrough/device_tree.c
> b/xen/drivers/passthrough/device_tree.c
> index 5d3842a..ea85645 100644
> --- a/xen/drivers/passthrough/device_tree.c
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct
> dt_device_node *dev)
>              goto fail;
>      }
> 
> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
> +                                         XEN_DOMCTL_DEV_NO_RDM);
> 
>      if ( rc )
>          goto fail;
> @@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct
> domain *d,
>          if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
>              break;
> 
> +        if ( domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM )
> +        {
> +            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
> +                   " to dom%u failed (%d) since we don't support RDM.\n",
> +                   dt_node_full_name(dev), d->domain_id, ret);
> +            break;
> +        }
> +
>          if ( unlikely(d->is_dying) )
>          {
>              ret = -EINVAL;
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index e30be43..557c87e 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
>      return pdev ? 0 : -EBUSY;
>  }
> 
> -static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
> +static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>  {
>      struct hvm_iommu *hd = domain_hvm_iommu(d);
>      struct pci_dev *pdev;
> @@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8
> devfn)
> 
>      pdev->fault.count = 0;
> 
> -    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
> +    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
>          goto done;
> 
>      for ( ; pdev->phantom_stride; rc = 0 )
> @@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8
> devfn)
>          devfn += pdev->phantom_stride;
>          if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
>              break;
> -        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
> +        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>          if ( rc )
>              printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed
> (%d)\n",
>                     d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> @@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
>  {
>      u16 seg;
>      u8 bus, devfn;
> +    u32 flag;
>      int ret = 0;
>      uint32_t machine_sbdf;
> 
> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>          seg = machine_sbdf >> 16;
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN2(machine_sbdf);
> +        flag = domctl->u.assign_device.flag;
> 
>          ret = device_assigned(seg, bus, devfn) ?:
> -              assign_device(d, seg, bus, devfn);
> +              assign_device(d, seg, bus, devfn, flag);
>          if ( ret == -ERESTART )
>              ret = hypercall_create_continuation(__HYPERVISOR_domctl,
>                                                  "h", u_domctl);
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 31ce1af..d7c9e1c 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1808,7 +1808,8 @@ static void iommu_set_pgd(struct domain *d)
>  }
> 
>  static int rmrr_identity_mapping(struct domain *d, bool_t map,
> -                                 const struct acpi_rmrr_unit *rmrr)
> +                                 const struct acpi_rmrr_unit *rmrr,
> +                                 u32 flag)
>  {
>      unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
>      unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >>
> PAGE_SHIFT_4K;
> @@ -1856,7 +1857,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
> 
>      while ( base_pfn < end_pfn )
>      {
> -        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
> 
>          if ( err )
>              return err;
> @@ -1899,7 +1900,8 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev
> *pdev)
>               PCI_BUS(bdf) == pdev->bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);

Why did you hardcode relax policy here? Shouldn't the policy come
from hypercall flag?

>              if ( ret )
>                  dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
>                          pdev->domain->domain_id);
> @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev
> *pdev)
>               PCI_DEVFN2(bdf) != devfn )
>              continue;
> 
> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
> +                              XEN_DOMCTL_DEV_RDM_RELAXED);

ditto

>      }
> 
>      return domain_context_unmap(pdev->domain, devfn, pdev);
> @@ -2098,7 +2101,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
>      spin_lock(&pcidevs_lock);
>      for_each_rmrr_device ( rmrr, bdf, i )
>      {
> -        ret = rmrr_identity_mapping(d, 1, rmrr);
> +        ret = rmrr_identity_mapping(d, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED);
>          if ( ret )
>              dprintk(XENLOG_ERR VTDPREFIX,
>                       "IOMMU: mapping reserved region failed\n");
> @@ -2241,7 +2244,8 @@ static int reassign_device_ownership(
>                   PCI_BUS(bdf) == pdev->bus &&
>                   PCI_DEVFN2(bdf) == devfn )
>              {
> -                ret = rmrr_identity_mapping(source, 0, rmrr);
> +                ret = rmrr_identity_mapping(source, 0, rmrr,
> +                                            XEN_DOMCTL_DEV_RDM_RELAXED);
>                  if ( ret != -ENOENT )
>                      return ret;
>              }
> @@ -2265,7 +2269,7 @@ static int reassign_device_ownership(
>  }
> 
>  static int intel_iommu_assign_device(
> -    struct domain *d, u8 devfn, struct pci_dev *pdev)
> +    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
>  {
>      struct acpi_rmrr_unit *rmrr;
>      int ret = 0, i;
> @@ -2294,7 +2298,7 @@ static int intel_iommu_assign_device(
>               PCI_BUS(bdf) == bus &&
>               PCI_DEVFN2(bdf) == devfn )
>          {
> -            ret = rmrr_identity_mapping(d, 1, rmrr);
> +            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
>              if ( ret )
>              {
>                  reassign_device_ownership(d, hardware_domain, devfn, pdev);
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 95b6266..a80b4f8 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn,
> mfn_t mfn);
> 
>  /* Set identity addresses in the p2m table (for pass-through) */
>  int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> -                           p2m_access_t p2ma);
> +                           p2m_access_t p2ma, u32 flag);
> 
>  /* Add foreign mapping to the guest's p2m table. */
>  int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index bc45ea5..2f9e40e 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -478,6 +478,11 @@ struct xen_domctl_assign_device {
>              XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>          } dt;
>      } u;
> +    /* IN */
> +#define XEN_DOMCTL_DEV_NO_RDM           0
> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
> +#define XEN_DOMCTL_DEV_RDM_STRICT       2

I don't understand why we require a NO_RDM flag. Whether there
is RDM associated with a given device, it's reported by system 
BIOS or through cmdline extension in coming patch. Why do we
require the hypercall to ask NO_RDM to hypervisor? The only flags
we want to pass to hypervisor is relaxed/strict policy, so hypervisor
know whether to fail or warn upon caught conflicts of identity
mapping...

> +    uint32_t  flag;   /* flag of assigned device */
>  };
>  typedef struct xen_domctl_assign_device xen_domctl_assign_device_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_assign_device_t);
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index e2f584d..02b2b02 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -140,7 +140,7 @@ struct iommu_ops {
>      int (*add_device)(u8 devfn, device_t *dev);
>      int (*enable_device)(device_t *dev);
>      int (*remove_device)(u8 devfn, device_t *dev);
> -    int (*assign_device)(struct domain *, u8 devfn, device_t *dev);
> +    int (*assign_device)(struct domain *, u8 devfn, device_t *dev, u32 flag);
>      int (*reassign_device)(struct domain *s, struct domain *t,
>                             u8 devfn, device_t *dev);
>  #ifdef HAS_PCI
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm
  2015-06-11  1:15 ` [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
@ 2015-06-11  9:29   ` Tian, Kevin
  2015-06-17 10:14   ` Jan Beulich
  1 sibling, 0 replies; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:29 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> This patch enables XENMEM_memory_map in hvm. So we can use it to
> setup the e820 mappings.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> Reviewed-by: Tim Deegan <tim@xen.org>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  xen/arch/x86/hvm/hvm.c | 2 --
>  xen/arch/x86/mm.c      | 6 ------
>  2 files changed, 8 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index f354cb7..fab5637 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4728,7 +4728,6 @@ static long hvm_memory_op(int cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> 
>      switch ( cmd & MEMOP_CMD_MASK )
>      {
> -    case XENMEM_memory_map:
>      case XENMEM_machine_memory_map:
>      case XENMEM_machphys_mapping:
>          return -ENOSYS;
> @@ -4804,7 +4803,6 @@ static long hvm_memory_op_compat32(int cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg)
> 
>      switch ( cmd & MEMOP_CMD_MASK )
>      {
> -    case XENMEM_memory_map:
>      case XENMEM_machine_memory_map:
>      case XENMEM_machphys_mapping:
>          return -ENOSYS;
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 472c494..4923ccd 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd,
> XEN_GUEST_HANDLE_PARAM(void) arg)
>              return rc;
>          }
> 
> -        if ( is_hvm_domain(d) )
> -        {
> -            rcu_unlock_domain(d);
> -            return -EPERM;
> -        }
> -
>          e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
>          if ( e820 == NULL )
>          {
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-11  9:14   ` Tian, Kevin
@ 2015-06-11  9:31     ` Chen, Tiejun
  2015-06-11 14:07       ` Tim Deegan
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-11  9:31 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 17:14, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> RMRR reserved regions must be setup in the pfn space with an identity
>> mapping to reported mfn. However existing code has problem to setup
>> correct mapping when VT-d shares EPT page table, so lead to problem
>> when assigning devices (e.g GPU) with RMRR reported. So instead, this
>> patch aims to setup identity mapping in p2m layer, regardless of
>> whether EPT is shared or not. And we still keep creating VT-d table.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   xen/arch/x86/mm/p2m.c               | 10 ++++++++--
>>   xen/drivers/passthrough/vtd/iommu.c |  3 +--
>>   2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>> index a6db236..c7198a5 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -927,10 +927,16 @@ int set_identity_p2m_entry(struct domain *d, unsigned long
>> gfn,
>>           }
>>
>>           gfn_unlock(p2m, gfn, 0);
>> -        return ret;
>>       }
>> +    else
>> +        ret = 0;
>>
>> -    return 0;
>> +    if( ret == 0 )
>> +    {
>> +        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>> +    }
>> +
>> +    return ret;
>
> p2m_set_entry will setup IOMMU pages already. You don't need
> another explicit iommu map here.

Right.

>
>>   }
>>
>>   /* Returns: 0 for success, -errno for failure */
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c
>> index 6a37624..31ce1af 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1856,8 +1856,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>>
>>       while ( base_pfn < end_pfn )
>>       {
>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
>> -                                       IOMMUF_readable|IOMMUF_writable);
>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>>
>>           if ( err )
>>               return err;
>
> Tim has another comment to replace earlier unmap with

Yes, I knew this.

> guest_physmap_remove_page() which will call iommu
> unmap internally. Please include this change too.
>

But,

guest_physmap_remove_page()
     |
     + p2m_remove_page()
	|
	+ iommu_unmap_page()
	|
	+ p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)

I think this already remove these pages both on ept/vt-d sides, right?

Or I'm misunderstand what you guys mean?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]
  2015-06-11  1:15 ` [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
@ 2015-06-11  9:38   ` Tian, Kevin
  2015-06-12  7:33     ` Chen, Tiejun
  2015-06-17 10:22   ` Jan Beulich
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:38 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> Now we get this map layout by call XENMEM_memory_map then
> save them into one global variable memory_map[]. It should
> include lowmem range, rdm range and highmem range. Note
> rdm range and highmem range may not exist in some cases.
> 
> And here we need to check if any reserved memory conflicts with
> [RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
> This range is used to allocate memory in hvmloder level, and
> we would lead hvmloader failed in case of conflict since its
> another rare possibility in real world.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/e820.h      |  7 +++++++
>  tools/firmware/hvmloader/hvmloader.c | 37
> ++++++++++++++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.c      | 26 +++++++++++++++++++++++++
>  tools/firmware/hvmloader/util.h      | 11 +++++++++++
>  4 files changed, 81 insertions(+)
> 
> diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
> index b2ead7f..8b5a9e0 100644
> --- a/tools/firmware/hvmloader/e820.h
> +++ b/tools/firmware/hvmloader/e820.h
> @@ -15,6 +15,13 @@ struct e820entry {
>      uint32_t type;
>  } __attribute__((packed));
> 
> +#define E820MAX	128
> +
> +struct e820map {
> +    unsigned int nr_map;
> +    struct e820entry map[E820MAX];
> +};
> +
>  #endif /* __HVMLOADER_E820_H__ */
> 
>  /*
> diff --git a/tools/firmware/hvmloader/hvmloader.c
> b/tools/firmware/hvmloader/hvmloader.c
> index 25b7f08..c9f170e 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -107,6 +107,8 @@ asm (
>      "    .text                       \n"
>      );
> 
> +struct e820map memory_map;
> +
>  unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS;
> 
>  static void init_hypercalls(void)
> @@ -199,6 +201,39 @@ static void apic_setup(void)
>      ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
>  }
> 
> +void memory_map_setup(void)
> +{
> +    unsigned int nr_entries = E820MAX, i;
> +    int rc;
> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
> +
> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
> +
> +    if ( rc )
> +    {
> +        printf("Failed to get guest memory map.\n");
> +        BUG();
> +    }
> +
> +    BUG_ON(!nr_entries);
> +    memory_map.nr_map = nr_entries;
> +
> +    for ( i = 0; i < nr_entries; i++ )
> +    {
> +        if ( memory_map.map[i].type == E820_RESERVED )
> +        {
> +            if ( check_overlap(alloc_addr, alloc_size,
> +                               memory_map.map[i].addr,
> +                               memory_map.map[i].size) )
> +            {
> +                printf("RDM conflicts Memory allocation.\n");

hvmloader has no concept of RDM here. It's just E820_RESERVED
type. Please make the error message clear, e.g. "Fail to setup
memory map due to conflict on dynamic reserved memory range."

Otherwise:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-11  1:15 ` [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges Tiejun Chen
@ 2015-06-11  9:51   ` Tian, Kevin
  2015-06-12  7:53     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:51 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> When allocating mmio address for PCI bars, we need to make
> sure they don't overlap with reserved regions.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/pci.c | 36
> ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 5ff87a7..98af568 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -59,8 +59,8 @@ void pci_setup(void)
>          uint32_t bar_reg;
>          uint64_t bar_sz;
>      } *bars = (struct bars *)scratch_start;
> -    unsigned int i, nr_bars = 0;
> -    uint64_t mmio_hole_size = 0;
> +    unsigned int i, j, nr_bars = 0;
> +    uint64_t mmio_hole_size = 0, reserved_end, max_bar_sz = 0;
> 
>      const char *s;
>      /*
> @@ -226,6 +226,8 @@ void pci_setup(void)
>              bars[i].devfn   = devfn;
>              bars[i].bar_reg = bar_reg;
>              bars[i].bar_sz  = bar_sz;
> +            if ( bar_sz > max_bar_sz )
> +                max_bar_sz = bar_sz;
> 
>              if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>                    PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> @@ -301,6 +303,21 @@ void pci_setup(void)
>              pci_mem_start <<= 1;
>      }
> 
> +    /* Relocate PCI memory that overlaps reserved space, like RDM. */
> +    for ( j = 0; j < memory_map.nr_map ; j++ )
> +    {
> +        if ( memory_map.map[j].type != E820_RAM )
> +        {
> +            reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
> +            if ( check_overlap(pci_mem_start, pci_mem_end,
> +                               memory_map.map[j].addr,
> +                               memory_map.map[j].size) )
> +                pci_mem_start -= memory_map.map[j].size >> PAGE_SHIFT;

what's the point of subtracting reserved size here? I think you want
to move pci_mem_start higher instead of lower to avoid conflict, right?

> +                pci_mem_start = (pci_mem_start + max_bar_sz - 1) &
> +                                    ~(uint64_t)(max_bar_sz - 1);

better have some comment to explain what exactly you're trying to
achieve here.

> +        }
> +    }
> +
>      if ( mmio_total > (pci_mem_end - pci_mem_start) )
>      {
>          printf("Low MMIO hole not large enough for all devices,"
> @@ -407,8 +424,23 @@ void pci_setup(void)
>          }
> 
>          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> + reallocate_mmio:

In earlier comment you said:

> +    /* Relocate PCI memory that overlaps reserved space, like RDM. */

If pci_mem_start has been relocated to avoid overlapping, how will actual
allocation here will conflict again? Sorry I may miss the two relocations here...

>          bar_data |= (uint32_t)base;
>          bar_data_upper = (uint32_t)(base >> 32);
> +        for ( j = 0; j < memory_map.nr_map ; j++ )
> +        {
> +            if ( memory_map.map[j].type != E820_RAM )
> +            {
> +                reserved_end = memory_map.map[j].addr +
> memory_map.map[j].size;
> +                if ( check_overlap(base, bar_sz,
> +                                   memory_map.map[j].addr,
> +                                   memory_map.map[j].size) )
> +                {
> +                    base = (reserved_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> +                    goto reallocate_mmio;
> +                }
> +            }
> +        }
>          base += bar_sz;
> 
>          if ( (base < resource->base) || (base > resource->max) )
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table
  2015-06-11  1:15 ` [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table Tiejun Chen
@ 2015-06-11  9:59   ` Tian, Kevin
  2015-06-12  8:19     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11  9:59 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> Now we can use that memory map to build our final
> e820 table but it may need to reorder all e820
> entries.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/firmware/hvmloader/e820.c | 62
> +++++++++++++++++++++++++++++++----------
>  1 file changed, 48 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
> index 2e05e93..c39b0aa 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -73,7 +73,8 @@ int build_e820_table(struct e820entry *e820,
>                       unsigned int lowmem_reserved_base,
>                       unsigned int bios_image_base)
>  {
> -    unsigned int nr = 0;
> +    unsigned int nr = 0, i, j;
> +    uint64_t low_mem_pgend = hvm_info->low_mem_pgend << PAGE_SHIFT;

You may call it low_mem_end to differentiate from original
low_mem_pgend since one means actual address while the
other means pfn.

> 
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -117,13 +118,6 @@ int build_e820_table(struct e820entry *e820,
>      e820[nr].type = E820_RESERVED;
>      nr++;
> 
> -    /* Low RAM goes here. Reserve space for special pages. */
> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
> -    e820[nr].addr = 0x100000;
> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -    e820[nr].type = E820_RAM;
> -    nr++;
> -
>      /*
>       * Explicitly reserve space for special pages.
>       * This space starts at RESERVED_MEMBASE an extends to cover various
> @@ -159,16 +153,56 @@ int build_e820_table(struct e820entry *e820,
>          nr++;
>      }
> 
> -
> -    if ( hvm_info->high_mem_pgend )
> +    /*
> +     * Construct the remaining according memory_map.

"Construct E820 table according to recorded memory map"

> +     *
> +     * Note memory_map includes,

The memory map created by toolstack may include:

> +     *
> +     * #1. Low memory region
> +     *
> +     * Low RAM starts at least from 1M to make sure all standard regions
> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> +     * have enough space.
> +     *
> +     * #2. RDM region if it exists

"Reserved regions if they exist"

> +     *
> +     * #3. High memory region if it exists
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )
>      {
> -        e820[nr].addr = ((uint64_t)1 << 32);
> -        e820[nr].size =
> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
> -        e820[nr].type = E820_RAM;
> +        e820[nr] = memory_map.map[i];
>          nr++;
>      }
> 
> +    /* Low RAM goes here. Reserve space for special pages. */
> +    BUG_ON(low_mem_pgend < (2u << 20));
> +    /*
> +     * We may need to adjust real lowmem end since we may
> +     * populate RAM to get enough MMIO previously.
> +     */
> +    for ( i = 0; i < memory_map.nr_map; i++ )

since you already translate memory map into e820 earlier, here
you should use 'nr' instead of memory_map.nr_map.

> +    {
> +        uint64_t end = e820[i].addr + e820[i].size;
> +        if ( e820[i].type == E820_RAM &&
> +             low_mem_pgend > e820[i].addr && low_mem_pgend < end )
> +            e820[i].size = low_mem_pgend - e820[i].addr;
> +    }

Sorry I may miss the code but could you elaborate where the
low_mem_pgend is changed after memory map is created? If
it happens within hvmloader, suppose the amount of reduced
memory from original E820_RAM entry should be added to
another E820_RAM entry for highmem, right?

> +
> +    /* Finally we need to reorder all e820 entries. */
> +    for ( j = 0; j < nr-1; j++ )
> +    {
> +        for ( i = j+1; i < nr; i++ )
> +        {
> +            if ( e820[j].addr > e820[i].addr )
> +            {
> +                struct e820entry tmp;
> +                tmp = e820[j];
> +                e820[j] = e820[i];
> +                e820[i] = tmp;
> +            }
> +        }
> +    }
> +
>      return nr;
>  }
> 
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 09/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  2015-06-11  1:15 ` [v3][PATCH 09/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
@ 2015-06-11 10:00   ` Tian, Kevin
  0 siblings, 0 replies; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11 10:00 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> We will introduce the hypercall xc_reserved_device_memory_map
> approach to libxc. This helps us get rdm entry info according to
> different parameters. If flag == PCI_DEV_RDM_ALL, all entries
> should be exposed. Or we just expose that rdm entry specific to
> a SBDF.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  tools/libxc/include/xenctrl.h |  8 ++++++++
>  tools/libxc/xc_domain.c       | 36
> ++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 50fa9e7..6c01362 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
>  int xc_get_machine_memory_map(xc_interface *xch,
>                                struct e820entry entries[],
>                                uint32_t max_entries);
> +
> +int xc_reserved_device_memory_map(xc_interface *xch,
> +                                  uint32_t flag,
> +                                  uint16_t seg,
> +                                  uint8_t bus,
> +                                  uint8_t devfn,
> +                                  struct xen_reserved_device_memory entries[],
> +                                  uint32_t *max_entries);
>  #endif
>  int xc_domain_set_time_offset(xc_interface *xch,
>                                uint32_t domid,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 1ff6d0a..4f96e1b 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
> 
>      return rc;
>  }
> +
> +int xc_reserved_device_memory_map(xc_interface *xch,
> +                                  uint32_t flag,
> +                                  uint16_t seg,
> +                                  uint8_t bus,
> +                                  uint8_t devfn,
> +                                  struct xen_reserved_device_memory entries[],
> +                                  uint32_t *max_entries)
> +{
> +    int rc;
> +    struct xen_reserved_device_memory_map xrdmmap = {
> +        .flag = flag,
> +        .seg = seg,
> +        .bus = bus,
> +        .devfn = devfn,
> +        .nr_entries = *max_entries
> +    };
> +    DECLARE_HYPERCALL_BOUNCE(entries,
> +                             sizeof(struct xen_reserved_device_memory) *
> +                             *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> +
> +    if ( xc_hypercall_bounce_pre(xch, entries) )
> +        return -1;
> +
> +    set_xen_guest_handle(xrdmmap.buffer, entries);
> +
> +    rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
> +                      &xrdmmap, sizeof(xrdmmap));
> +
> +    xc_hypercall_bounce_post(xch, entries);
> +
> +    *max_entries = xrdmmap.nr_entries;
> +
> +    return rc;
> +}
> +
>  int xc_get_machine_memory_map(xc_interface *xch,
>                                struct e820entry entries[],
>                                uint32_t max_entries)
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-11  1:15 ` [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
@ 2015-06-11 10:02   ` Tian, Kevin
  2015-06-12  8:25     ` Chen, Tiejun
  2015-06-12 15:43   ` Wei Liu
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11 10:02 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> This patch passes rdm reservation policy to xc_assign_device() so the policy
> is checked when assigning devices to a VM.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/libxc/include/xenctrl.h       |  3 ++-
>  tools/libxc/xc_domain.c             |  6 +++++-
>  tools/libxl/libxl_pci.c             |  3 ++-
>  tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
>  tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
>  5 files changed, 42 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 6c01362..7fd60d5 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>  /* HVM guest pass-through */
>  int xc_assign_device(xc_interface *xch,
>                       uint32_t domid,
> -                     uint32_t machine_sbdf);
> +                     uint32_t machine_sbdf,
> +                     uint32_t flag);
> 
>  int xc_get_device_group(xc_interface *xch,
>                       uint32_t domid,
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 4f96e1b..19127ec 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
>  int xc_assign_device(
>      xc_interface *xch,
>      uint32_t domid,
> -    uint32_t machine_sbdf)
> +    uint32_t machine_sbdf,
> +    uint32_t flag)
>  {
>      DECLARE_DOMCTL;
> 
> @@ -1705,6 +1706,7 @@ int xc_assign_device(
>      domctl.domain = domid;
>      domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
>      domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
> +    domctl.u.assign_device.flag = flag;
> 
>      return do_domctl(xch, &domctl);
>  }
> @@ -1792,6 +1794,8 @@ int xc_assign_dt_device(
> 
>      domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
>      domctl.u.assign_device.u.dt.size = size;
> +    /* DT doesn't own any RDM. */
> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;

still not clear about this NO_RDM flag. If a device-tree device doesn't
own RDM, the hypervisor will know it. Why do we require toolstack
to tell hypervisor not use it?

>      set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
> 
>      rc = do_domctl(xch, &domctl);
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index e0743f8..632c15e 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid,
> libxl_device_pci *pcidev, i
>      FILE *f;
>      unsigned long long start, end, flags, size;
>      int irq, i, rc, hvm = 0;
> +    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> 
>      if (type == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
> @@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid,
> libxl_device_pci *pcidev, i
> 
>  out:
>      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> -        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
> +        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
>          if (rc < 0 && (hvm || errno != ENOSYS)) {
>              LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
>              return ERROR_FAIL;
> diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
> index 64f1137..317bf75 100644
> --- a/tools/ocaml/libs/xc/xenctrl_stubs.c
> +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
> @@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value
> xch, value domid, value d
>  	CAMLreturn(Val_bool(ret == 0));
>  }
> 
> -CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
> +static int domain_assign_device_rdm_flag_table[] = {
> +    XEN_DOMCTL_DEV_NO_RDM,
> +    XEN_DOMCTL_DEV_RDM_RELAXED,
> +    XEN_DOMCTL_DEV_RDM_STRICT,
> +};
> +
> +CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
> +                                            value rflag)
>  {
> -	CAMLparam3(xch, domid, desc);
> +	CAMLparam4(xch, domid, desc, rflag);
>  	int ret;
>  	int domain, bus, dev, func;
> -	uint32_t sbdf;
> +	uint32_t sbdf, flag;
> 
>  	domain = Int_val(Field(desc, 0));
>  	bus = Int_val(Field(desc, 1));
> @@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch,
> value domid, value desc)
>  	func = Int_val(Field(desc, 3));
>  	sbdf = encode_sbdf(domain, bus, dev, func);
> 
> -	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
> +	ret = Int_val(Field(rflag, 0));
> +	flag = domain_assign_device_rdm_flag_table[ret];
> +
> +	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
> 
>  	if (ret < 0)
>  		failwith_xc(_H(xch));
> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
> index c77e15b..172bdf0 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -592,7 +592,8 @@ static int token_value(char *token)
>      return strtol(token, NULL, 16);
>  }
> 
> -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
> +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
> +                    int *flag)
>  {
>      char *token;
> 
> @@ -607,8 +608,16 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int
> *func)
>      *dev  = token_value(token);
>      token = strchr(token, ',') + 1;
>      *func  = token_value(token);
> -    token = strchr(token, ',');
> -    *str = token ? token + 1 : NULL;
> +    token = strchr(token, ',') + 1;
> +    if ( token ) {
> +        *flag = token_value(token);
> +        *str = token + 1;
> +    }
> +    else
> +    {
> +        *flag = XEN_DOMCTL_DEV_RDM_STRICT;
> +        *str = NULL;
> +    }
> 
>      return 1;
>  }
> @@ -620,14 +629,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
>      uint32_t dom;
>      char *pci_str;
>      int32_t sbdf = 0;
> -    int seg, bus, dev, func;
> +    int seg, bus, dev, func, flag;
> 
>      static char *kwd_list[] = { "domid", "pci", NULL };
>      if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
>                                        &dom, &pci_str) )
>          return NULL;
> 
> -    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
> +    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>      {
>          sbdf = seg << 16;
>          sbdf |= (bus & 0xff) << 8;
> @@ -653,21 +662,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
>      uint32_t dom;
>      char *pci_str;
>      int32_t sbdf = 0;
> -    int seg, bus, dev, func;
> +    int seg, bus, dev, func, flag;
> 
>      static char *kwd_list[] = { "domid", "pci", NULL };
>      if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
>                                        &dom, &pci_str) )
>          return NULL;
> 
> -    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
> +    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>      {
>          sbdf = seg << 16;
>          sbdf |= (bus & 0xff) << 8;
>          sbdf |= (dev & 0x1f) << 3;
>          sbdf |= (func & 0x7);
> 
> -        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
> +        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
>          {
>              if (errno == ENOSYS)
>                  sbdf = -1;
> @@ -686,14 +695,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
>      uint32_t dom;
>      char *pci_str;
>      int32_t sbdf = 0;
> -    int seg, bus, dev, func;
> +    int seg, bus, dev, func, flag;
> 
>      static char *kwd_list[] = { "domid", "pci", NULL };
>      if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
>                                        &dom, &pci_str) )
>          return NULL;
> 
> -    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
> +    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>      {
>          sbdf = seg << 16;
>          sbdf |= (bus & 0xff) << 8;
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-11  1:15 ` [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
@ 2015-06-11 10:19   ` Tian, Kevin
  2015-06-12  8:30     ` Chen, Tiejun
  2015-06-12 16:39   ` Wei Liu
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11 10:19 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> While building a VM, HVM domain builder provides struct hvm_info_table{}
> to help hvmloader. Currently it includes two fields to construct guest
> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
> check them to fix any conflict with RAM.
> 
> RMRR can reside in address space beyond 4G theoretically, but we never
> see this in real world. So in order to avoid breaking highmem layout
> we don't solve highmem conflict. Note this means highmem rmrr could still
> be supported if no conflict.
> 
> But in the case of lowmem, RMRR probably scatter the whole RAM space.
> Especially multiple RMRR entries would worsen this to lead a complicated
> memory layout. And then its hard to extend hvm_info_table{} to work
> hvmloader out. So here we're trying to figure out a simple solution to
> avoid breaking existing layout. So when a conflict occurs,
> 
>     #1. Above a predefined boundary (default 2G)
>         - move lowmem_end below reserved region to solve conflict;
> 
>     #2. Below a predefined boundary (default 2G)
>         - Check strict/relaxed policy.
>         "strict" policy leads to fail libxl. Note when both policies
>         are specified on a given region, 'strict' is always preferred.
>         "relaxed" policy issue a warning message and also mask this entry INVALID
>         to indicate we shouldn't expose this entry to hvmloader.
> 
> Note this predefined boundary can be changes with the parameter
> "rdm_mem_boundary" in .cfg file.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Reviewed-by: Kevin Tian <kevint.tian@intel.com>

One comment though. could you be consistent to use RDM in the code? 
RMRR  is just an example of RDM...


> ---
>  docs/man/xl.cfg.pod.5          |  21 ++++
>  tools/libxc/xc_hvm_build_x86.c |   5 +-
>  tools/libxl/libxl.h            |   6 +
>  tools/libxl/libxl_create.c     |   6 +-
>  tools/libxl/libxl_dm.c         | 255
> +++++++++++++++++++++++++++++++++++++++++
>  tools/libxl/libxl_dom.c        |  11 +-
>  tools/libxl/libxl_internal.h   |  11 +-
>  tools/libxl/libxl_types.idl    |   8 ++
>  tools/libxl/xl_cmdimpl.c       |   3 +
>  9 files changed, 322 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 638b350..6fd2370 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
> 
>  Note this would override global B<rdm> option.
> 
> +=item B<rdm_mem_boundary=MBYTES>
> +
> +Number of megabytes to set a boundary for checking rdm conflict.
> +
> +When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
> +Especially multiple RMRR entries would worsen this to lead a complicated
> +memory layout. So here we're trying to figure out a simple solution to
> +avoid breaking existing layout. So when a conflict occurs,
> +
> +    #1. Above a predefined boundary
> +        - move lowmem_end below reserved region to solve conflict;
> +
> +    #2. Below a predefined boundary
> +        - Check strict/relaxed policy.
> +        "strict" policy leads to fail libxl. Note when both policies
> +        are specified on a given region, 'strict' is always preferred.
> +        "relaxed" policy issue a warning message and also mask this entry INVALID
> +        to indicate we shouldn't expose this entry to hvmloader.
> +
> +Here the default is 2G.
> +
>  =back
> 
>  =back
> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
> index 0e98c84..5142578 100644
> --- a/tools/libxc/xc_hvm_build_x86.c
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -21,6 +21,7 @@
>  #include <stdlib.h>
>  #include <unistd.h>
>  #include <zlib.h>
> +#include <assert.h>
> 
>  #include "xg_private.h"
>  #include "xc_private.h"
> @@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch,
> 
>      elf_parse_binary(&elf);
>      v_start = 0;
> -    v_end = args->mem_size;
> +    v_end = args->lowmem_end;
> 
>      if ( nr_pages > target_pages )
>          memflags |= XENMEMF_populate_on_demand;
> @@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>      args.mem_size = (uint64_t)memsize << 20;
>      args.mem_target = (uint64_t)target << 20;
>      args.image_file_name = image_name;
> +    if ( args.mmio_size == 0 )
> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
> 
>      return xc_hvm_build(xch, domid, &args);
>  }
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 0a7913b..a6212fb 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
>  #define LIBXL_TIMER_MODE_DEFAULT -1
>  #define LIBXL_MEMKB_DEFAULT ~0ULL
> 
> +/*
> + * We'd like to set a memory boundary to determine if we need to check
> + * any overlap with reserved device memory.
> + */
> +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
> +
>  #define LIBXL_MS_VM_GENID_LEN 16
>  typedef struct {
>      uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 6c8ec63..0438731 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info
> *b_info)
>  {
>      if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
>          b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
> +
> +    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
> +        b_info->rdm_mem_boundary_memkb =
> +                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
>  }
> 
>  int libxl__domain_build_info_setdefault(libxl__gc *gc,
> @@ -460,7 +464,7 @@ int libxl__domain_build(libxl__gc *gc,
> 
>      switch (info->type) {
>      case LIBXL_DOMAIN_TYPE_HVM:
> -        ret = libxl__build_hvm(gc, domid, info, state);
> +        ret = libxl__build_hvm(gc, domid, d_config, state);
>          if (ret)
>              goto out;
> 
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 33f9ce6..d908350 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -90,6 +90,261 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>      return dm;
>  }
> 
> +static struct xen_reserved_device_memory
> +*xc_device_get_rdm(libxl__gc *gc,
> +                   uint32_t flag,
> +                   uint16_t seg,
> +                   uint8_t bus,
> +                   uint8_t devfn,
> +                   unsigned int *nr_entries)
> +{
> +    struct xen_reserved_device_memory *xrdm;
> +    int rc;
> +
> +    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                       NULL, nr_entries);
> +    assert(rc <= 0);
> +    /* "0" means we have no any rdm entry. */
> +    if (!rc)
> +        goto out;
> +
> +    if (errno == ENOBUFS) {
> +        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
> +        if (!xrdm) {
> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
> +            goto out;
> +        }
> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                           xrdm, nr_entries);
> +        if (rc) {
> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
> +            *nr_entries = 0;
> +            free(xrdm);
> +            xrdm = NULL;
> +        }
> +    } else
> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
> +
> + out:
> +    return xrdm;
> +}
> +
> +/*
> + * Check whether there exists rdm hole in the specified memory range.
> + * Returns true if exists, else returns false.
> + */
> +static bool overlaps_rdm(uint64_t start, uint64_t memsize,
> +                         uint64_t rdm_start, uint64_t rdm_size)
> +{
> +    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
> +}
> +
> +/*
> + * Check reported RDM regions and handle potential gfn conflicts according
> + * to user preferred policy.
> + *
> + * RMRR can reside in address space beyond 4G theoretically, but we never
> + * see this in real world. So in order to avoid breaking highmem layout
> + * we don't solve highmem conflict. Note this means highmem rmrr could still
> + * be supported if no conflict.
> + *
> + * But in the case of lowmem, RMRR probably scatter the whole RAM space.
> + * Especially multiple RMRR entries would worsen this to lead a complicated
> + * memory layout. And then its hard to extend hvm_info_table{} to work
> + * hvmloader out. So here we're trying to figure out a simple solution to
> + * avoid breaking existing layout. So when a conflict occurs,
> + *
> + * #1. Above a predefined boundary (default 2G)
> + * - Move lowmem_end below reserved region to solve conflict;
> + *
> + * #2. Below a predefined boundary (default 2G)
> + * - Check strict/relaxed policy.
> + * "strict" policy leads to fail libxl. Note when both policies
> + * are specified on a given region, 'strict' is always preferred.
> + * "relaxed" policy issue a warning message and also mask this entry
> + * INVALID to indicate we shouldn't expose this entry to hvmloader.
> + */
> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
> +                                       libxl_domain_config *d_config,
> +                                       uint64_t rdm_mem_boundary,
> +                                       struct xc_hvm_build_args *args)
> +{
> +    int i, j, conflict;
> +    struct xen_reserved_device_memory *xrdm = NULL;
> +    uint32_t type = d_config->b_info.rdm.type;
> +    uint16_t seg;
> +    uint8_t bus, devfn;
> +    uint64_t rdm_start, rdm_size;
> +    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
> +
> +    /* Might not expose rdm. */
> +    if (type == LIBXL_RDM_RESERVE_TYPE_NONE && !d_config->num_pcidevs)
> +        return 0;
> +
> +    /* Query all RDM entries in this platform */
> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
> +        unsigned int nr_entries;
> +
> +        /* Collect all rdm info if exist. */
> +        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
> +                                 0, 0, 0, &nr_entries);
> +        if (!nr_entries)
> +            return 0;
> +
> +        assert(xrdm);
> +
> +        d_config->num_rdms = nr_entries;
> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +        for (i = 0; i < d_config->num_rdms; i++) {
> +            d_config->rdms[i].start =
> +                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
> +            d_config->rdms[i].size =
> +                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
> +        }
> +
> +        free(xrdm);
> +    } else
> +        d_config->num_rdms = 0;
> +
> +    /* Query RDM entries per-device */
> +    for (i = 0; i < d_config->num_pcidevs; i++) {
> +        unsigned int nr_entries;
> +        bool new = true;
> +
> +        seg = d_config->pcidevs[i].domain;
> +        bus = d_config->pcidevs[i].bus;
> +        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
> +        nr_entries = 0;
> +        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
> +                                 seg, bus, devfn, &nr_entries);
> +        /* No RDM to associated with this device. */
> +        if (!nr_entries)
> +            continue;
> +
> +        assert(xrdm);
> +
> +        /*
> +         * Need to check whether this entry is already saved in the array.
> +         * This could come from two cases:
> +         *
> +         *   - user may configure to get all RMRRs in this platform, which
> +         *   is already queried before this point
> +         *   - or two assigned devices may share one RMRR entry
> +         *
> +         * different policies may be configured on the same RMRR due to above
> +         * two cases. We choose a simple policy to always favor stricter policy
> +         */
> +        for (j = 0; j < d_config->num_rdms; j++) {
> +            if (d_config->rdms[j].start ==
> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
> +             {
> +                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
> +                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
> +                new = false;
> +                break;
> +            }
> +        }
> +
> +        if (new) {
> +            d_config->num_rdms++;
> +            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
> +
> +            d_config->rdms[d_config->num_rdms - 1].start =
> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
> +            d_config->rdms[d_config->num_rdms - 1].size =
> +                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
> +            d_config->rdms[d_config->num_rdms - 1].flag =
> +                                d_config->pcidevs[i].rdm_reserve;
> +        }
> +        free(xrdm);
> +    }
> +
> +    /*
> +     * Next step is to check and avoid potential conflict between RDM entries
> +     * and guest RAM. To avoid intrusive impact to existing memory layout
> +     * {lowmem, mmio, highmem} which is passed around various function blocks,
> +     * below conflicts are not handled which are rare and handling them would
> +     * lead to a more scattered layout:
> +     *  - RMRR in highmem area (>4G)
> +     *  - RMRR lower than a defined memory boundary (e.g. 2G)
> +     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
> +     * end below reserved region to solve conflict.
> +     *
> +     * If a conflict is detected on a given RMRR entry, an error will be
> +     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
> +     * specified, this conflict is treated just as a warning, but we mark this
> +     * RMRR entry as INVALID to indicate that this entry shouldn't be exposed
> +     * to hvmloader.
> +     *
> +     * Firstly we should check the case of rdm < 4G because we may need to
> +     * expand highmem_end.
> +     */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        rdm_start = d_config->rdms[i].start;
> +        rdm_size = d_config->rdms[i].size;
> +        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
> +
> +        if (!conflict)
> +            continue;
> +
> +        /* Just check if RDM > our memory boundary. */
> +        if (rdm_start > rdm_mem_boundary) {
> +            /*
> +             * We will move downwards lowmem_end so we have to expand
> +             * highmem_end.
> +             */
> +            highmem_end += (args->lowmem_end - rdm_start);
> +            /* Now move downwards lowmem_end. */
> +            args->lowmem_end = rdm_start;
> +        }
> +    }
> +
> +    /* Sync highmem_end. */
> +    args->highmem_end = highmem_end;
> +
> +    /*
> +     * Finally we can take same policy to check lowmem(< 2G) and
> +     * highmem adjusted above.
> +     */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        rdm_start = d_config->rdms[i].start;
> +        rdm_size = d_config->rdms[i].size;
> +        /* Does this entry conflict with lowmem? */
> +        conflict = overlaps_rdm(0, args->lowmem_end,
> +                                rdm_start, rdm_size);
> +        /* Does this entry conflict with highmem? */
> +        conflict |= overlaps_rdm((1ULL<<32),
> +                                 args->highmem_end - (1ULL<<32),
> +                                 rdm_start, rdm_size);
> +
> +        if (!conflict)
> +            continue;
> +
> +        if(d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
> +            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
> +            goto out;
> +        } else {
> +            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
> +                      d_config->rdms[i].start);
> +
> +            /*
> +             * Then mask this INVALID to indicate we shouldn't expose this
> +             * to hvmloader.
> +             */
> +            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
> +        }
> +    }
> +
> +    return 0;
> +
> + out:
> +    return ERROR_FAIL;
> +}
> +
>  const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
>  {
>      const libxl_vnc_info *vnc = NULL;
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 867172a..1777b32 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -914,13 +914,14 @@ out:
>  }
> 
>  int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> -              libxl_domain_build_info *info,
> +              libxl_domain_config *d_config,
>                libxl__domain_build_state *state)
>  {
>      libxl_ctx *ctx = libxl__gc_owner(gc);
>      struct xc_hvm_build_args args = {};
>      int ret, rc = ERROR_FAIL;
>      uint64_t mmio_start, lowmem_end, highmem_end;
> +    libxl_domain_build_info *const info = &d_config->b_info;
> 
>      memset(&args, 0, sizeof(struct xc_hvm_build_args));
>      /* The params from the configuration file are in Mb, which are then
> @@ -958,6 +959,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>      args.highmem_end = highmem_end;
>      args.mmio_start = mmio_start;
> 
> +    ret = libxl__domain_device_construct_rdm(gc, d_config,
> +                                             info->rdm_mem_boundary_memkb*1024,
> +                                             &args);
> +    if (ret) {
> +        LOG(ERROR, "checking reserved device memory failed");
> +        goto out;
> +    }
> +
>      if (info->num_vnuma_nodes != 0) {
>          int i;
> 
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index e9ac886..52f3831 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1011,7 +1011,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
>  _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>               libxl_domain_build_info *info, libxl__domain_build_state *state);
>  _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
> -              libxl_domain_build_info *info,
> +              libxl_domain_config *d_config,
>                libxl__domain_build_state *state);
> 
>  _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
> @@ -1519,6 +1519,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
>          int nr_channels, libxl_device_channel *channels);
> 
>  /*
> + * This function will fix reserved device memory conflict
> + * according to user's configuration.
> + */
> +_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
> +                                   libxl_domain_config *d_config,
> +                                   uint64_t rdm_mem_guard,
> +                                   struct xc_hvm_build_args *args);
> +
> +/*
>   * This function will cause the whole libxl process to hang
>   * if the device model does not respond.  It is deprecated.
>   *
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 4dfcaf7..b4282a0 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("target_memkb",    MemKB),
>      ("video_memkb",     MemKB),
>      ("shadow_memkb",    MemKB),
> +    ("rdm_mem_boundary_memkb",    MemKB),
>      ("rtc_timeoffset",  uint32),
>      ("exec_ssidref",    uint32),
>      ("exec_ssid_label", string),
> @@ -559,6 +560,12 @@ libxl_device_pci = Struct("device_pci", [
>      ("rdm_reserve",   libxl_rdm_reserve_flag),
>      ])
> 
> +libxl_device_rdm = Struct("device_rdm", [
> +    ("start", uint64),
> +    ("size", uint64),
> +    ("flag", libxl_rdm_reserve_flag),
> +    ])
> +
>  libxl_device_dtdev = Struct("device_dtdev", [
>      ("path", string),
>      ])
> @@ -589,6 +596,7 @@ libxl_domain_config = Struct("domain_config", [
>      ("disks", Array(libxl_device_disk, "num_disks")),
>      ("nics", Array(libxl_device_nic, "num_nics")),
>      ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
> +    ("rdms", Array(libxl_device_rdm, "num_rdms")),
>      ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
>      ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
>      ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 4364ba4..85d74fd 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
>      if (!xlu_cfg_get_long (config, "videoram", &l, 0))
>          b_info->video_memkb = l * 1024;
> 
> +    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
> +        b_info->rdm_mem_boundary_memkb = l * 1024;
> +
>      if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
>          b_info->event_channels = l;
> 
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 15/16] xen/vtd: enable USB device assignment
  2015-06-11  1:15 ` [v3][PATCH 15/16] xen/vtd: enable USB device assignment Tiejun Chen
@ 2015-06-11 10:22   ` Tian, Kevin
  2015-06-12  8:59     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11 10:22 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> Before we refine RMRR mechanism, USB RMRR may conflict with guest bios
> region so we always ignore USB RMRR. 

If USB RMRR conflicts with guest bios, the conflict is always there
before and after your refinement. :-)

> Now this can be gone when we enable
> pci_force to check/reserve RMRR.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Kevin Tian <kevin.tian@intel.com> except one small comment below

> ---
>  xen/drivers/passthrough/vtd/dmar.h  |  1 -
>  xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
>  xen/drivers/passthrough/vtd/utils.c |  7 -------
>  3 files changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
> index af1feef..af205f5 100644
> --- a/xen/drivers/passthrough/vtd/dmar.h
> +++ b/xen/drivers/passthrough/vtd/dmar.h
> @@ -129,7 +129,6 @@ do {                                                \
> 
>  int vtd_hw_check(void);
>  void disable_pmr(struct iommu *iommu);
> -int is_usb_device(u16 seg, u8 bus, u8 devfn);
>  int is_igd_drhd(struct acpi_drhd_unit *drhd);
> 
>  #endif /* _DMAR_H_ */
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index d7c9e1c..d3233b8 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
>      /*
>       * If the device belongs to the hardware domain, and it has RMRR, don't
>       * remove it from the hardware domain, because BIOS may use RMRR at
> -     * booting time. Also account for the special casing of USB below (in
> -     * intel_iommu_assign_device()).
> +     * booting time.

this code is run-time right?

>       */
> -    if ( !is_hardware_domain(source) &&
> -         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
> +    if ( !is_hardware_domain(source) )
>      {
>          const struct acpi_rmrr_unit *rmrr;
>          u16 bdf;
> @@ -2283,13 +2281,8 @@ static int intel_iommu_assign_device(
>      if ( ret )
>          return ret;
> 
> -    /* FIXME: Because USB RMRR conflicts with guest bios region,
> -     * ignore USB RMRR temporarily.
> -     */
>      seg = pdev->seg;
>      bus = pdev->bus;
> -    if ( is_usb_device(seg, bus, pdev->devfn) )
> -        return 0;
> 
>      /* Setup rmrr identity mapping */
>      for_each_rmrr_device( rmrr, bdf, i )
> diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
> index bd14c02..b8a077f 100644
> --- a/xen/drivers/passthrough/vtd/utils.c
> +++ b/xen/drivers/passthrough/vtd/utils.c
> @@ -29,13 +29,6 @@
>  #include "extern.h"
>  #include <asm/io_apic.h>
> 
> -int is_usb_device(u16 seg, u8 bus, u8 devfn)
> -{
> -    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> -                                PCI_CLASS_DEVICE);
> -    return (class == 0xc03);
> -}
> -
>  /* Disable vt-d protected memory registers. */
>  void disable_pmr(struct iommu *iommu)
>  {
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-06-11  1:15 ` [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
@ 2015-06-11 10:25   ` Tian, Kevin
  2015-06-12  8:44     ` Chen, Tiejun
  2015-06-17 10:28   ` Jan Beulich
  1 sibling, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-11 10:25 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Thursday, June 11, 2015 9:15 AM
> 
> Currently we're intending to cover this kind of devices

we're -> we're not?

> with shared RMRR simply since the case of shared RMRR is
> a rare case according to our previous experiences. But
> late we can group these devices which shared rmrr, and
> then allow all devices within a group to be assigned to
> same domain.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Kevin Tian <kevin.tian@intel.com> except one text
comment.

> ---
>  xen/drivers/passthrough/vtd/iommu.c | 30
> +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index d3233b8..f220081 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
>      if ( list_empty(&acpi_drhd_units) )
>          return -ENODEV;
> 
> +    seg = pdev->seg;
> +    bus = pdev->bus;
> +    /*
> +     * In rare cases one given rmrr is shared by multiple devices but
> +     * obviously this would put the security of a system at risk. So
> +     * we should prevent from this sort of device assignment.
> +     *
> +     * TODO: actually we can group these devices which shared rmrr, and
> +     * then allow all devices within a group to be assigned to same domain.

TODO: in the future we can introduce group device assignment
interface to make sure devices sharing RMRR are assigned to the 
same domain together.

> +     */
> +    for_each_rmrr_device( rmrr, bdf, i )
> +    {
> +        if ( rmrr->segment == seg &&
> +             PCI_BUS(bdf) == bus &&
> +             PCI_DEVFN2(bdf) == devfn )
> +        {
> +            if ( rmrr->scope.devices_cnt > 1 )
> +            {
> +                ret = -EPERM;
> +                printk(XENLOG_G_ERR VTDPREFIX
> +                       " cannot assign this device with shared RMRR for Dom%d (%d)\n",
> +                       d->domain_id, ret);
> +                return ret;
> +            }
> +        }
> +    }
> +
>      ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
>      if ( ret )
>          return ret;
> 
> -    seg = pdev->seg;
> -    bus = pdev->bus;
> -
>      /* Setup rmrr identity mapping */
>      for_each_rmrr_device( rmrr, bdf, i )
>      {
> --
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
                   ` (16 preceding siblings ...)
  2015-06-11  7:27 ` [v3][PATCH 00/16] Fix RMRR Jan Beulich
@ 2015-06-11 12:52 ` Tim Deegan
  2015-06-12  2:10   ` Chen, Tiejun
  17 siblings, 1 reply; 114+ messages in thread
From: Tim Deegan @ 2015-06-11 12:52 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang

Hi,

At 09:15 +0800 on 11 Jun (1434014109), Tiejun Chen wrote:
> * Two changes for runtime cycle
>    patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side
> 
>   a>. Introduce paging_mode_translate()
>   Otherwise, we'll see this error when boot Xen/Dom0

Righto.  Looking at the patch, it would be neater to have

    if ( !paging_mode_translate(p2m->domain) )
        return 0;

at the start, instead of indenting the whole body of the function in
an inner scope.

>   b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
>   we're getting an invalid mfn.

Can you give a concrete example of when this is needed?  The code now
looks like this: 

       if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )
           ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
                               p2m_mmio_direct, p2ma);

which will catch any invalid-mfn mapping, including paged-out pages
&c.   I suspect the case you care about is the default p2m_mmio_dm type,
which would be better handeld explicitly:

       if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
           ...

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-11  9:31     ` Chen, Tiejun
@ 2015-06-11 14:07       ` Tim Deegan
  2015-06-12  2:43         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tim Deegan @ 2015-06-11 14:07 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Tian, Kevin, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, Zhang, Yang Z

At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:
> >>       while ( base_pfn < end_pfn )
> >>       {
> >> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> >> -                                       IOMMUF_readable|IOMMUF_writable);
> >> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
> >>
> >>           if ( err )
> >>               return err;
> >
> > Tim has another comment to replace earlier unmap with
> 
> Yes, I knew this.
> 
> > guest_physmap_remove_page() which will call iommu
> > unmap internally. Please include this change too.
> >
> 
> But,
> 
> guest_physmap_remove_page()
>      |
>      + p2m_remove_page()
> 	|
> 	+ iommu_unmap_page()
> 	|
> 	+ p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)
> 
> I think this already remove these pages both on ept/vt-d sides, right?

Yes; this is about this code further up in the same function:

           while ( base_pfn < end_pfn )
           {
               if ( intel_iommu_unmap_page(d, base_pfn) )
                   ret = -ENXIO;
               base_pfn++;
           }

which ought to be calling guest_physmap_remove_page() or similar, to
make sure that both iommu and EPT mappings get removed.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-11 12:52 ` Tim Deegan
@ 2015-06-12  2:10   ` Chen, Tiejun
  2015-06-12  8:04     ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  2:10 UTC (permalink / raw)
  To: Tim Deegan
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang

On 2015/6/11 20:52, Tim Deegan wrote:
> Hi,
>
> At 09:15 +0800 on 11 Jun (1434014109), Tiejun Chen wrote:
>> * Two changes for runtime cycle
>>     patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side
>>
>>    a>. Introduce paging_mode_translate()
>>    Otherwise, we'll see this error when boot Xen/Dom0
>
> Righto.  Looking at the patch, it would be neater to have
>
>      if ( !paging_mode_translate(p2m->domain) )
>          return 0;
>
> at the start, instead of indenting the whole body of the function in
> an inner scope.

Right.

>
>>    b>. Actually we still need to use "mfn_x(mfn) == INVALID_MFN" to confirm
>>    we're getting an invalid mfn.
>
> Can you give a concrete example of when this is needed?  The code now

Actually we're considering the case that a VM is set with a big memory 
like setting "memory = 2800" [0xaf000000] in the .cfg file, but here 
RMRR is owning such some overlap ranges,

(XEN) [VT-D]dmar.c:808: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:677:   RMRR region: base_addr ac6d3000 end_address 
ac6e6fff
(XEN) [VT-D]dmar.c:808: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:677:   RMRR region: base_addr ad800000 end_address 
afffffff

Furthermore, our current policy can help us eliminate this sort of 
conflict. For this example above, the real low memory is limited to 
0xacd3000 finally. So indeed, all RMRR regions aren't treated as RAM, 
and then at least p2m->get_entry() should always return "mfn_x(mfn) == 
INVALID_MFN", right? And,

> looks like this:
>
>         if ( p2mt == p2m_invalid || mfn_x(mfn) == INVALID_MFN )
>             ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
>                                 p2m_mmio_direct, p2ma);
>
> which will catch any invalid-mfn mapping, including paged-out pages
> &c.   I suspect the case you care about is the default p2m_mmio_dm type,

based on my understanding,

static mfn_t ept_get_entry(struct p2m_domain *p2m,
                            unsigned long gfn, p2m_type_t *t, 
p2m_access_t* a,
                            p2m_query_t q, unsigned int *page_order)
{
     ...
     mfn_t mfn = _mfn(INVALID_MFN);


     *t = p2m_mmio_dm;
     ...;

     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
         goto out;
     ...

Actually we're falling into this condition so at last, we're always 
getting this combination of (*t = p2m_mmio_dm) and (mfn_t mfn = 
_mfn(INVALID_MFN)).

> which would be better handeld explicitly:
>
>         if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
>             ...
>

So if I'm correct, we should do this check explicitly,

        if ( p2mt == p2m_invalid ||
             (p2mt == p2m_mmio_dm && !mfn_valid(mfn) )

Note this is equivalent to Jan's comment.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-11 14:07       ` Tim Deegan
@ 2015-06-12  2:43         ` Chen, Tiejun
  2015-06-12  5:58           ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  2:43 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, Zhang, Yang Z

On 2015/6/11 22:07, Tim Deegan wrote:
> At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:
>>>>        while ( base_pfn < end_pfn )
>>>>        {
>>>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
>>>> -                                       IOMMUF_readable|IOMMUF_writable);
>>>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>>>>
>>>>            if ( err )
>>>>                return err;
>>>
>>> Tim has another comment to replace earlier unmap with
>>
>> Yes, I knew this.
>>
>>> guest_physmap_remove_page() which will call iommu
>>> unmap internally. Please include this change too.
>>>
>>
>> But,
>>
>> guest_physmap_remove_page()
>>       |
>>       + p2m_remove_page()
>> 	|
>> 	+ iommu_unmap_page()
>> 	|
>> 	+ p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)
>>
>> I think this already remove these pages both on ept/vt-d sides, right?
>
> Yes; this is about this code further up in the same function:
>
>             while ( base_pfn < end_pfn )
>             {
>                 if ( intel_iommu_unmap_page(d, base_pfn) )
>                     ret = -ENXIO;
>                 base_pfn++;
>             }
>
> which ought to be calling guest_physmap_remove_page() or similar, to
> make sure that both iommu and EPT mappings get removed.
>

I still just think current implementation might be fine at this point.

We have two scenarios here, the case of shared ept and the case of 
non-shared ept. But no matter what case we're tracking, shouldn't 
guest_physmap_remove_page() always call p2m->set_entry() to clear *all* 
*valid* mfn which is owned by a given VM? And p2m->set_entry() also 
calls iommu_unmap_page() internally. So nothing special should further 
consider.

If I'm wrong or misunderstanding, please correct me :)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-12  2:43         ` Chen, Tiejun
@ 2015-06-12  5:58           ` Chen, Tiejun
  2015-06-12  5:59             ` Tian, Kevin
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  5:58 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, Zhang, Yang Z

On 2015/6/12 10:43, Chen, Tiejun wrote:
> On 2015/6/11 22:07, Tim Deegan wrote:
>> At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:
>>>>>        while ( base_pfn < end_pfn )
>>>>>        {
>>>>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
>>>>> -
>>>>> IOMMUF_readable|IOMMUF_writable);
>>>>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>>>>>
>>>>>            if ( err )
>>>>>                return err;
>>>>
>>>> Tim has another comment to replace earlier unmap with
>>>
>>> Yes, I knew this.
>>>
>>>> guest_physmap_remove_page() which will call iommu
>>>> unmap internally. Please include this change too.
>>>>
>>>
>>> But,
>>>
>>> guest_physmap_remove_page()
>>>       |
>>>       + p2m_remove_page()
>>>     |
>>>     + iommu_unmap_page()
>>>     |
>>>     + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)
>>>
>>> I think this already remove these pages both on ept/vt-d sides, right?
>>
>> Yes; this is about this code further up in the same function:
>>
>>             while ( base_pfn < end_pfn )
>>             {
>>                 if ( intel_iommu_unmap_page(d, base_pfn) )
>>                     ret = -ENXIO;
>>                 base_pfn++;
>>             }
>>
>> which ought to be calling guest_physmap_remove_page() or similar, to
>> make sure that both iommu and EPT mappings get removed.
>>
>
> I still just think current implementation might be fine at this point.
>
> We have two scenarios here, the case of shared ept and the case of
> non-shared ept. But no matter what case we're tracking, shouldn't
> guest_physmap_remove_page() always call p2m->set_entry() to clear *all*
> *valid* mfn which is owned by a given VM? And p2m->set_entry() also
> calls iommu_unmap_page() internally. So nothing special should further
> consider.
>
> If I'm wrong or misunderstanding, please correct me :)
>

Sorry for my misunderstanding to this. Right now Kevin help me 
understand what you mean. Sounds like we should address something 
specific to unmap rmrr here.

So I will do this as follows:

#1. Provide a clear helper

+int clear_identity_p2m_entry(struct domain *d, unsigned long gfn,
+                             unsigned int page_order)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    int ret;
+    gfn_lock(p2m, gfn, page_order);
+    ret = p2m_remove_page(p2m, gfn, gfn, page_order);
+    gfn_unlock(p2m, gfn, page_order);
+    return ret;
+}
+

#2. Call such a helper

@@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d, 
bool_t map,

              while ( base_pfn < end_pfn )
              {
-                if ( intel_iommu_unmap_page(d, base_pfn) )
+                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
                      ret = -ENXIO;
                  base_pfn++;
              }
Is this right?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-12  5:58           ` Chen, Tiejun
@ 2015-06-12  5:59             ` Tian, Kevin
  2015-06-12  6:13               ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-12  5:59 UTC (permalink / raw)
  To: Chen, Tiejun, Tim Deegan
  Cc: wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson, xen-devel,
	stefano.stabellini, jbeulich, Zhang, Yang Z

> From: Chen, Tiejun
> Sent: Friday, June 12, 2015 1:58 PM
> 
> On 2015/6/12 10:43, Chen, Tiejun wrote:
> > On 2015/6/11 22:07, Tim Deegan wrote:
> >> At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:
> >>>>>        while ( base_pfn < end_pfn )
> >>>>>        {
> >>>>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> >>>>> -
> >>>>> IOMMUF_readable|IOMMUF_writable);
> >>>>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
> >>>>>
> >>>>>            if ( err )
> >>>>>                return err;
> >>>>
> >>>> Tim has another comment to replace earlier unmap with
> >>>
> >>> Yes, I knew this.
> >>>
> >>>> guest_physmap_remove_page() which will call iommu
> >>>> unmap internally. Please include this change too.
> >>>>
> >>>
> >>> But,
> >>>
> >>> guest_physmap_remove_page()
> >>>       |
> >>>       + p2m_remove_page()
> >>>     |
> >>>     + iommu_unmap_page()
> >>>     |
> >>>     + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)
> >>>
> >>> I think this already remove these pages both on ept/vt-d sides, right?
> >>
> >> Yes; this is about this code further up in the same function:
> >>
> >>             while ( base_pfn < end_pfn )
> >>             {
> >>                 if ( intel_iommu_unmap_page(d, base_pfn) )
> >>                     ret = -ENXIO;
> >>                 base_pfn++;
> >>             }
> >>
> >> which ought to be calling guest_physmap_remove_page() or similar, to
> >> make sure that both iommu and EPT mappings get removed.
> >>
> >
> > I still just think current implementation might be fine at this point.
> >
> > We have two scenarios here, the case of shared ept and the case of
> > non-shared ept. But no matter what case we're tracking, shouldn't
> > guest_physmap_remove_page() always call p2m->set_entry() to clear *all*
> > *valid* mfn which is owned by a given VM? And p2m->set_entry() also
> > calls iommu_unmap_page() internally. So nothing special should further
> > consider.
> >
> > If I'm wrong or misunderstanding, please correct me :)
> >
> 
> Sorry for my misunderstanding to this. Right now Kevin help me
> understand what you mean. Sounds like we should address something
> specific to unmap rmrr here.
> 
> So I will do this as follows:
> 
> #1. Provide a clear helper
> 
> +int clear_identity_p2m_entry(struct domain *d, unsigned long gfn,
> +                             unsigned int page_order)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    int ret;
> +    gfn_lock(p2m, gfn, page_order);
> +    ret = p2m_remove_page(p2m, gfn, gfn, page_order);
> +    gfn_unlock(p2m, gfn, page_order);
> +    return ret;
> +}
> +
> 
> #2. Call such a helper
> 
> @@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d,
> bool_t map,
> 
>               while ( base_pfn < end_pfn )
>               {
> -                if ( intel_iommu_unmap_page(d, base_pfn) )
> +                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
>                       ret = -ENXIO;
>                   base_pfn++;
>               }
> Is this right?
> 
> Thanks
> Tiejun

could you explain why existing guest_physmap_remove_page can't
serve the purpose so you need invent a new identity mapping
specific one? For unmapping suppose it should be common regardless
of whether it's identity-mapped or not. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-12  5:59             ` Tian, Kevin
@ 2015-06-12  6:13               ` Chen, Tiejun
  2015-06-18 10:07                 ` Tim Deegan
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  6:13 UTC (permalink / raw)
  To: Tian, Kevin, Tim Deegan
  Cc: wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson, xen-devel,
	stefano.stabellini, jbeulich, Zhang, Yang Z

On 2015/6/12 13:59, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Friday, June 12, 2015 1:58 PM
>>
>> On 2015/6/12 10:43, Chen, Tiejun wrote:
>>> On 2015/6/11 22:07, Tim Deegan wrote:
>>>> At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote:
>>>>>>>         while ( base_pfn < end_pfn )
>>>>>>>         {
>>>>>>> -        int err = intel_iommu_map_page(d, base_pfn, base_pfn,
>>>>>>> -
>>>>>>> IOMMUF_readable|IOMMUF_writable);
>>>>>>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>>>>>>>
>>>>>>>             if ( err )
>>>>>>>                 return err;
>>>>>>
>>>>>> Tim has another comment to replace earlier unmap with
>>>>>
>>>>> Yes, I knew this.
>>>>>
>>>>>> guest_physmap_remove_page() which will call iommu
>>>>>> unmap internally. Please include this change too.
>>>>>>
>>>>>
>>>>> But,
>>>>>
>>>>> guest_physmap_remove_page()
>>>>>        |
>>>>>        + p2m_remove_page()
>>>>>      |
>>>>>      + iommu_unmap_page()
>>>>>      |
>>>>>      + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx)
>>>>>
>>>>> I think this already remove these pages both on ept/vt-d sides, right?
>>>>
>>>> Yes; this is about this code further up in the same function:
>>>>
>>>>              while ( base_pfn < end_pfn )
>>>>              {
>>>>                  if ( intel_iommu_unmap_page(d, base_pfn) )
>>>>                      ret = -ENXIO;
>>>>                  base_pfn++;
>>>>              }
>>>>
>>>> which ought to be calling guest_physmap_remove_page() or similar, to
>>>> make sure that both iommu and EPT mappings get removed.
>>>>
>>>
>>> I still just think current implementation might be fine at this point.
>>>
>>> We have two scenarios here, the case of shared ept and the case of
>>> non-shared ept. But no matter what case we're tracking, shouldn't
>>> guest_physmap_remove_page() always call p2m->set_entry() to clear *all*
>>> *valid* mfn which is owned by a given VM? And p2m->set_entry() also
>>> calls iommu_unmap_page() internally. So nothing special should further
>>> consider.
>>>
>>> If I'm wrong or misunderstanding, please correct me :)
>>>
>>
>> Sorry for my misunderstanding to this. Right now Kevin help me
>> understand what you mean. Sounds like we should address something
>> specific to unmap rmrr here.
>>
>> So I will do this as follows:
>>
>> #1. Provide a clear helper
>>
>> +int clear_identity_p2m_entry(struct domain *d, unsigned long gfn,
>> +                             unsigned int page_order)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +    int ret;
>> +    gfn_lock(p2m, gfn, page_order);
>> +    ret = p2m_remove_page(p2m, gfn, gfn, page_order);
>> +    gfn_unlock(p2m, gfn, page_order);
>> +    return ret;
>> +}
>> +
>>
>> #2. Call such a helper
>>
>> @@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d,
>> bool_t map,
>>
>>                while ( base_pfn < end_pfn )
>>                {
>> -                if ( intel_iommu_unmap_page(d, base_pfn) )
>> +                if ( clear_identity_p2m_entry(d, base_pfn, 0) )
>>                        ret = -ENXIO;
>>                    base_pfn++;
>>                }
>> Is this right?
>>
>> Thanks
>> Tiejun
>
> could you explain why existing guest_physmap_remove_page can't
> serve the purpose so you need invent a new identity mapping
> specific one? For unmapping suppose it should be common regardless
> of whether it's identity-mapped or not. :-)

I have some concerns here:

#1. guest_physmap_remove_page() is a void function without a returning 
value, so you still need a little change.

#2. guest_physmap_remove_page() doesn't make readable in such a code 
context;

rmrr_identity_mapping()
{
     ...
     guest_physmap_remove_page()
     ...
}

#3. a new helper is good to further extend if necessary.

Of course, I'd like to modify-to-use guest_physmap_remove_page() if you 
guys aren't in agreement with me :)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-11  9:28   ` Tian, Kevin
@ 2015-06-12  6:31     ` Chen, Tiejun
  2015-06-12  8:45       ` Jan Beulich
  2015-06-16  2:30       ` Tian, Kevin
  0 siblings, 2 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  6:31 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 17:28, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> This patch extends the existing hypercall to support rdm reservation policy.
>> We return error or just throw out a warning message depending on whether
>> the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
>> Note in some special cases, e.g. add a device to hwdomain, and remove a
>> device from user domain, 'relaxed' is fine enough since this is always safe
>> to hwdomain.
>
> could you elaborate " add a device to hwdomain, and remove a device
> from user domain "? move a device from user domain to hwdomain
> or completely irrelevant?

Yes, they're not relevant. And I think we shouldn't care our policy,

#1. When add a device to hwdomain

I think RMRR is always reserved on e820 so either of flag is fine.

#2. remove a device from domain

"remove" action also can ignore that since the original mechanism is 
enough.

>
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   xen/arch/x86/mm/p2m.c                       |  8 +++++++-
>>   xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
>>   xen/drivers/passthrough/arm/smmu.c          |  2 +-
>>   xen/drivers/passthrough/device_tree.c       | 11 ++++++++++-
>>   xen/drivers/passthrough/pci.c               | 10 ++++++----
>>   xen/drivers/passthrough/vtd/iommu.c         | 20 ++++++++++++--------
>>   xen/include/asm-x86/p2m.h                   |  2 +-
>>   xen/include/public/domctl.h                 |  5 +++++
>>   xen/include/xen/iommu.h                     |  2 +-
>>   9 files changed, 45 insertions(+), 18 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
>> index c7198a5..3fcdcac 100644
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn,
>> mfn_t mfn,
>>   }
>>
>>   int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>> -                           p2m_access_t p2ma)
>> +                           p2m_access_t p2ma, u32 flag)
>>   {
>>       p2m_type_t p2mt;
>>       p2m_access_t a;
>> @@ -924,6 +924,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>>               printk(XENLOG_G_WARNING
>>                      "Cannot identity map d%d:%lx, already mapped to %lx.\n",
>>                      d->domain_id, gfn, mfn_x(mfn));
>> +
>> +            if ( flag == XEN_DOMCTL_DEV_RDM_RELAXED )
>> +            {
>> +                ret = 0;
>> +                printk(XENLOG_G_WARNING "Some devices may work failed.\n");
>
> Do you need this extra printk? The warning message is already given
> several lines above and here you just need to change return value
> for relaxed policy.

Okay.

>
>> +            }
>>           }
>>
>>           gfn_unlock(p2m, gfn, 0);
>> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> index e83bb35..920b35a 100644
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain
>> *target,
>>   }
>>
>>   static int amd_iommu_assign_device(struct domain *d, u8 devfn,
>> -                                   struct pci_dev *pdev)
>> +                                   struct pci_dev *pdev,
>> +                                   u32 flag)
>>   {
>>       struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
>>       int bdf = PCI_BDF2(pdev->bus, devfn);
>> diff --git a/xen/drivers/passthrough/arm/smmu.c
>> b/xen/drivers/passthrough/arm/smmu.c
>> index 6cc4394..9a667e9 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct
>> iommu_domain *domain)
>>   }
>>
>>   static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
>> -			       struct device *dev)
>> +			       struct device *dev, u32 flag)
>>   {
>>   	struct iommu_domain *domain;
>>   	struct arm_smmu_xen_domain *xen_domain;
>> diff --git a/xen/drivers/passthrough/device_tree.c
>> b/xen/drivers/passthrough/device_tree.c
>> index 5d3842a..ea85645 100644
>> --- a/xen/drivers/passthrough/device_tree.c
>> +++ b/xen/drivers/passthrough/device_tree.c
>> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct
>> dt_device_node *dev)
>>               goto fail;
>>       }
>>
>> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
>> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
>> +                                         XEN_DOMCTL_DEV_NO_RDM);
>>
>>       if ( rc )
>>           goto fail;
>> @@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct
>> domain *d,
>>           if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
>>               break;
>>
>> +        if ( domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM )
>> +        {
>> +            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
>> +                   " to dom%u failed (%d) since we don't support RDM.\n",
>> +                   dt_node_full_name(dev), d->domain_id, ret);
>> +            break;
>> +        }
>> +
>>           if ( unlikely(d->is_dying) )
>>           {
>>               ret = -EINVAL;
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index e30be43..557c87e 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1335,7 +1335,7 @@ static int device_assigned(u16 seg, u8 bus, u8 devfn)
>>       return pdev ? 0 : -EBUSY;
>>   }
>>
>> -static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn)
>> +static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
>>   {
>>       struct hvm_iommu *hd = domain_hvm_iommu(d);
>>       struct pci_dev *pdev;
>> @@ -1371,7 +1371,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8
>> devfn)
>>
>>       pdev->fault.count = 0;
>>
>> -    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev))) )
>> +    if ( (rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag)) )
>>           goto done;
>>
>>       for ( ; pdev->phantom_stride; rc = 0 )
>> @@ -1379,7 +1379,7 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8
>> devfn)
>>           devfn += pdev->phantom_stride;
>>           if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
>>               break;
>> -        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev));
>> +        rc = hd->platform_ops->assign_device(d, devfn, pci_to_dev(pdev), flag);
>>           if ( rc )
>>               printk(XENLOG_G_WARNING "d%d: assign %04x:%02x:%02x.%u failed
>> (%d)\n",
>>                      d->domain_id, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>> @@ -1496,6 +1496,7 @@ int iommu_do_pci_domctl(
>>   {
>>       u16 seg;
>>       u8 bus, devfn;
>> +    u32 flag;
>>       int ret = 0;
>>       uint32_t machine_sbdf;
>>
>> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>>           seg = machine_sbdf >> 16;
>>           bus = PCI_BUS(machine_sbdf);
>>           devfn = PCI_DEVFN2(machine_sbdf);
>> +        flag = domctl->u.assign_device.flag;
>>
>>           ret = device_assigned(seg, bus, devfn) ?:
>> -              assign_device(d, seg, bus, devfn);
>> +              assign_device(d, seg, bus, devfn, flag);
>>           if ( ret == -ERESTART )
>>               ret = hypercall_create_continuation(__HYPERVISOR_domctl,
>>                                                   "h", u_domctl);
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c
>> index 31ce1af..d7c9e1c 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1808,7 +1808,8 @@ static void iommu_set_pgd(struct domain *d)
>>   }
>>
>>   static int rmrr_identity_mapping(struct domain *d, bool_t map,
>> -                                 const struct acpi_rmrr_unit *rmrr)
>> +                                 const struct acpi_rmrr_unit *rmrr,
>> +                                 u32 flag)
>>   {
>>       unsigned long base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
>>       unsigned long end_pfn = PAGE_ALIGN_4K(rmrr->end_address) >>
>> PAGE_SHIFT_4K;
>> @@ -1856,7 +1857,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map,
>>
>>       while ( base_pfn < end_pfn )
>>       {
>> -        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
>> +        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
>>
>>           if ( err )
>>               return err;
>> @@ -1899,7 +1900,8 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev
>> *pdev)
>>                PCI_BUS(bdf) == pdev->bus &&
>>                PCI_DEVFN2(bdf) == devfn )
>>           {
>> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
>> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
>> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);
>
> Why did you hardcode relax policy here? Shouldn't the policy come
> from hypercall flag?

I just saw we have one path to use intel_iommu_add_device(),

pci_add_device()
     |
     + if ( !pdev->domain )
       {
         pdev->domain = hardware_domain;
         ret = iommu_add_device(pdev);
	    |
	    + hd->platform_ops->add_device()
		|
		+ intel_iommu_add_device()

So I think intel_iommu_add_device() is used to add a device to 
hardware_domain. And in our case hardware_domain should be special as I 
explained above.

>
>>               if ( ret )
>>                   dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
>>                           pdev->domain->domain_id);
>> @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev
>> *pdev)
>>                PCI_DEVFN2(bdf) != devfn )
>>               continue;
>>
>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>
> ditto

It doesn't matter when we're trying to remove a device since we don't 
care this flag.

>
>>       }
>>
>>       return domain_context_unmap(pdev->domain, devfn, pdev);
>> @@ -2098,7 +2101,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
>>       spin_lock(&pcidevs_lock);
>>       for_each_rmrr_device ( rmrr, bdf, i )
>>       {
>> -        ret = rmrr_identity_mapping(d, 1, rmrr);
>> +        ret = rmrr_identity_mapping(d, 1, rmrr, XEN_DOMCTL_DEV_RDM_RELAXED);
>>           if ( ret )
>>               dprintk(XENLOG_ERR VTDPREFIX,
>>                        "IOMMU: mapping reserved region failed\n");
>> @@ -2241,7 +2244,8 @@ static int reassign_device_ownership(
>>                    PCI_BUS(bdf) == pdev->bus &&
>>                    PCI_DEVFN2(bdf) == devfn )
>>               {
>> -                ret = rmrr_identity_mapping(source, 0, rmrr);
>> +                ret = rmrr_identity_mapping(source, 0, rmrr,
>> +                                            XEN_DOMCTL_DEV_RDM_RELAXED);
>>                   if ( ret != -ENOENT )
>>                       return ret;
>>               }
>> @@ -2265,7 +2269,7 @@ static int reassign_device_ownership(
>>   }
>>
>>   static int intel_iommu_assign_device(
>> -    struct domain *d, u8 devfn, struct pci_dev *pdev)
>> +    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
>>   {
>>       struct acpi_rmrr_unit *rmrr;
>>       int ret = 0, i;
>> @@ -2294,7 +2298,7 @@ static int intel_iommu_assign_device(
>>                PCI_BUS(bdf) == bus &&
>>                PCI_DEVFN2(bdf) == devfn )
>>           {
>> -            ret = rmrr_identity_mapping(d, 1, rmrr);
>> +            ret = rmrr_identity_mapping(d, 1, rmrr, flag);
>>               if ( ret )
>>               {
>>                   reassign_device_ownership(d, hardware_domain, devfn, pdev);
>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>> index 95b6266..a80b4f8 100644
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -545,7 +545,7 @@ int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn,
>> mfn_t mfn);
>>
>>   /* Set identity addresses in the p2m table (for pass-through) */
>>   int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>> -                           p2m_access_t p2ma);
>> +                           p2m_access_t p2ma, u32 flag);
>>
>>   /* Add foreign mapping to the guest's p2m table. */
>>   int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
>> index bc45ea5..2f9e40e 100644
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -478,6 +478,11 @@ struct xen_domctl_assign_device {
>>               XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
>>           } dt;
>>       } u;
>> +    /* IN */
>> +#define XEN_DOMCTL_DEV_NO_RDM           0
>> +#define XEN_DOMCTL_DEV_RDM_RELAXED      1
>> +#define XEN_DOMCTL_DEV_RDM_STRICT       2
>
> I don't understand why we require a NO_RDM flag. Whether there
> is RDM associated with a given device, it's reported by system
> BIOS or through cmdline extension in coming patch. Why do we
> require the hypercall to ask NO_RDM to hypervisor? The only flags
> we want to pass to hypervisor is relaxed/strict policy, so hypervisor
> know whether to fail or warn upon caught conflicts of identity
> mapping...
>

This is just introduced to ARM as we discussed previously since we're 
touching some common interfaces.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]
  2015-06-11  9:38   ` Tian, Kevin
@ 2015-06-12  7:33     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  7:33 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 17:38, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> Now we get this map layout by call XENMEM_memory_map then
>> save them into one global variable memory_map[]. It should
>> include lowmem range, rdm range and highmem range. Note
>> rdm range and highmem range may not exist in some cases.
>>
>> And here we need to check if any reserved memory conflicts with
>> [RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
>> This range is used to allocate memory in hvmloder level, and
>> we would lead hvmloader failed in case of conflict since its
>> another rare possibility in real world.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/firmware/hvmloader/e820.h      |  7 +++++++
>>   tools/firmware/hvmloader/hvmloader.c | 37
>> ++++++++++++++++++++++++++++++++++++
>>   tools/firmware/hvmloader/util.c      | 26 +++++++++++++++++++++++++
>>   tools/firmware/hvmloader/util.h      | 11 +++++++++++
>>   4 files changed, 81 insertions(+)
>>
>> diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
>> index b2ead7f..8b5a9e0 100644
>> --- a/tools/firmware/hvmloader/e820.h
>> +++ b/tools/firmware/hvmloader/e820.h
>> @@ -15,6 +15,13 @@ struct e820entry {
>>       uint32_t type;
>>   } __attribute__((packed));
>>
>> +#define E820MAX	128
>> +
>> +struct e820map {
>> +    unsigned int nr_map;
>> +    struct e820entry map[E820MAX];
>> +};
>> +
>>   #endif /* __HVMLOADER_E820_H__ */
>>
>>   /*
>> diff --git a/tools/firmware/hvmloader/hvmloader.c
>> b/tools/firmware/hvmloader/hvmloader.c
>> index 25b7f08..c9f170e 100644
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -107,6 +107,8 @@ asm (
>>       "    .text                       \n"
>>       );
>>
>> +struct e820map memory_map;
>> +
>>   unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS;
>>
>>   static void init_hypercalls(void)
>> @@ -199,6 +201,39 @@ static void apic_setup(void)
>>       ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
>>   }
>>
>> +void memory_map_setup(void)
>> +{
>> +    unsigned int nr_entries = E820MAX, i;
>> +    int rc;
>> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
>> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
>> +
>> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
>> +
>> +    if ( rc )
>> +    {
>> +        printf("Failed to get guest memory map.\n");
>> +        BUG();
>> +    }
>> +
>> +    BUG_ON(!nr_entries);
>> +    memory_map.nr_map = nr_entries;
>> +
>> +    for ( i = 0; i < nr_entries; i++ )
>> +    {
>> +        if ( memory_map.map[i].type == E820_RESERVED )
>> +        {
>> +            if ( check_overlap(alloc_addr, alloc_size,
>> +                               memory_map.map[i].addr,
>> +                               memory_map.map[i].size) )
>> +            {
>> +                printf("RDM conflicts Memory allocation.\n");
>
> hvmloader has no concept of RDM here. It's just E820_RESERVED
> type. Please make the error message clear, e.g. "Fail to setup
> memory map due to conflict on dynamic reserved memory range."
>

Okay,

+            {
+                printf("Fail to setup memory map due to conflict");
+                printf(" on dynamic reserved memory range.\n");
+                BUG();
+            }


> Otherwise:
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-11  9:51   ` Tian, Kevin
@ 2015-06-12  7:53     ` Chen, Tiejun
  2015-06-16  5:47       ` Tian, Kevin
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  7:53 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 17:51, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> When allocating mmio address for PCI bars, we need to make
>> sure they don't overlap with reserved regions.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/firmware/hvmloader/pci.c | 36
>> ++++++++++++++++++++++++++++++++++--
>>   1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
>> index 5ff87a7..98af568 100644
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -59,8 +59,8 @@ void pci_setup(void)
>>           uint32_t bar_reg;
>>           uint64_t bar_sz;
>>       } *bars = (struct bars *)scratch_start;
>> -    unsigned int i, nr_bars = 0;
>> -    uint64_t mmio_hole_size = 0;
>> +    unsigned int i, j, nr_bars = 0;
>> +    uint64_t mmio_hole_size = 0, reserved_end, max_bar_sz = 0;
>>
>>       const char *s;
>>       /*
>> @@ -226,6 +226,8 @@ void pci_setup(void)
>>               bars[i].devfn   = devfn;
>>               bars[i].bar_reg = bar_reg;
>>               bars[i].bar_sz  = bar_sz;
>> +            if ( bar_sz > max_bar_sz )
>> +                max_bar_sz = bar_sz;
>>
>>               if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>>                     PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>> @@ -301,6 +303,21 @@ void pci_setup(void)
>>               pci_mem_start <<= 1;
>>       }
>>
>> +    /* Relocate PCI memory that overlaps reserved space, like RDM. */
>> +    for ( j = 0; j < memory_map.nr_map ; j++ )
>> +    {
>> +        if ( memory_map.map[j].type != E820_RAM )
>> +        {
>> +            reserved_end = memory_map.map[j].addr + memory_map.map[j].size;
>> +            if ( check_overlap(pci_mem_start, pci_mem_end,
>> +                               memory_map.map[j].addr,
>> +                               memory_map.map[j].size) )
>> +                pci_mem_start -= memory_map.map[j].size >> PAGE_SHIFT;
>
> what's the point of subtracting reserved size here? I think you want
> to move pci_mem_start higher instead of lower to avoid conflict, right?

No.

>
>> +                pci_mem_start = (pci_mem_start + max_bar_sz - 1) &
>> +                                    ~(uint64_t)(max_bar_sz - 1);
>
> better have some comment to explain what exactly you're trying to
> achieve here.

Actually I didn't have this code fragment. Here I add this chunk of 
codes to address one concern that Jan raised. Please see the below.

>
>> +        }
>> +    }
>> +
>>       if ( mmio_total > (pci_mem_end - pci_mem_start) )
>>       {
>>           printf("Low MMIO hole not large enough for all devices,"
>> @@ -407,8 +424,23 @@ void pci_setup(void)
>>           }
>>
>>           base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>> + reallocate_mmio:
>
> In earlier comment you said:
>
>> +    /* Relocate PCI memory that overlaps reserved space, like RDM. */
>
> If pci_mem_start has been relocated to avoid overlapping, how will actual
> allocation here will conflict again? Sorry I may miss the two relocations here...
>
>>           bar_data |= (uint32_t)base;
>>           bar_data_upper = (uint32_t)(base >> 32);
>> +        for ( j = 0; j < memory_map.nr_map ; j++ )
>> +        {
>> +            if ( memory_map.map[j].type != E820_RAM )
>> +            {
>> +                reserved_end = memory_map.map[j].addr +
>> memory_map.map[j].size;
>> +                if ( check_overlap(base, bar_sz,
>> +                                   memory_map.map[j].addr,
>> +                                   memory_map.map[j].size) )
>> +                {
>> +                    base = (reserved_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>> +                    goto reallocate_mmio;

That is because our previous implementation is just skipping that 
conflict region,

"But you do nothing to make sure the MMIO regions all fit in the
available window (see the code ahead of this relocating RAM if
necessary)." and "...it simply skips assigning resources. Your changes 
potentially growing the space needed to fit all MMIO BARs therefore also 
needs to adjust the up front calculation, such that if necessary more 
RAM can be relocated to make the hole large enough."

And then I replied as follows,

"You're right.

Just think about we're always trying to check pci_mem_start to populate 
more RAM to obtain enough PCI mempry,

     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
     {
         struct xen_add_to_physmap xatp;
         unsigned int nr_pages = min_t(
             unsigned int,
             hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
             (1u << 16) - 1);
         if ( hvm_info->high_mem_pgend == 0 )
             hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
         hvm_info->low_mem_pgend -= nr_pages;
         printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
                " for lowmem MMIO hole\n",
                nr_pages,
                PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),

PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
         xatp.domid = DOMID_SELF;
         xatp.space = XENMAPSPACE_gmfn_range;
         xatp.idx   = hvm_info->low_mem_pgend;
         xatp.gpfn  = hvm_info->high_mem_pgend;
         xatp.size  = nr_pages;
         if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
             BUG();
         hvm_info->high_mem_pgend += nr_pages;
     }
"

I hope this can help you understand this background. And I will update 
that code comment like this,

     /*
      * We'll skip all space overlapping with reserved memory later,
      * so we need to decrease pci_mem_start to populate more RAM
      * to compensate them.
      */

Thanks
Tiejun

>> +                }
>> +            }
>> +        }
>>           base += bar_sz;
>>
>>           if ( (base < resource->base) || (base > resource->max) )
>> --
>> 1.9.1
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-12  2:10   ` Chen, Tiejun
@ 2015-06-12  8:04     ` Jan Beulich
  2015-06-12  8:20       ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-12  8:04 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Tim Deegan, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

>>> On 12.06.15 at 04:10, <tiejun.chen@intel.com> wrote:
> On 2015/6/11 20:52, Tim Deegan wrote:
>> which would be better handeld explicitly:
>>
>>         if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
>>             ...
>>
> 
> So if I'm correct, we should do this check explicitly,
> 
>         if ( p2mt == p2m_invalid ||
>              (p2mt == p2m_mmio_dm && !mfn_valid(mfn) )
> 
> Note this is equivalent to Jan's comment.

I think the !mfn_valid() part is really redundant - p2m_mmio_dm
should never be put on a page translating to a valid MFN.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table
  2015-06-11  9:59   ` Tian, Kevin
@ 2015-06-12  8:19     ` Chen, Tiejun
  2015-06-16  5:54       ` Tian, Kevin
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  8:19 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 17:59, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> Now we can use that memory map to build our final
>> e820 table but it may need to reorder all e820
>> entries.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/firmware/hvmloader/e820.c | 62
>> +++++++++++++++++++++++++++++++----------
>>   1 file changed, 48 insertions(+), 14 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
>> index 2e05e93..c39b0aa 100644
>> --- a/tools/firmware/hvmloader/e820.c
>> +++ b/tools/firmware/hvmloader/e820.c
>> @@ -73,7 +73,8 @@ int build_e820_table(struct e820entry *e820,
>>                        unsigned int lowmem_reserved_base,
>>                        unsigned int bios_image_base)
>>   {
>> -    unsigned int nr = 0;
>> +    unsigned int nr = 0, i, j;
>> +    uint64_t low_mem_pgend = hvm_info->low_mem_pgend << PAGE_SHIFT;
>
> You may call it low_mem_end to differentiate from original
> low_mem_pgend since one means actual address while the
> other means pfn.

Okay.

>
>>
>>       if ( !lowmem_reserved_base )
>>               lowmem_reserved_base = 0xA0000;
>> @@ -117,13 +118,6 @@ int build_e820_table(struct e820entry *e820,
>>       e820[nr].type = E820_RESERVED;
>>       nr++;
>>
>> -    /* Low RAM goes here. Reserve space for special pages. */
>> -    BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
>> -    e820[nr].addr = 0x100000;
>> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
>> -    e820[nr].type = E820_RAM;
>> -    nr++;
>> -
>>       /*
>>        * Explicitly reserve space for special pages.
>>        * This space starts at RESERVED_MEMBASE an extends to cover various
>> @@ -159,16 +153,56 @@ int build_e820_table(struct e820entry *e820,
>>           nr++;
>>       }
>>
>> -
>> -    if ( hvm_info->high_mem_pgend )
>> +    /*
>> +     * Construct the remaining according memory_map.
>
> "Construct E820 table according to recorded memory map"

Fixed.

>
>> +     *
>> +     * Note memory_map includes,
>
> The memory map created by toolstack may include:
>

Fixed.

>> +     *
>> +     * #1. Low memory region
>> +     *
>> +     * Low RAM starts at least from 1M to make sure all standard regions
>> +     * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
>> +     * have enough space.
>> +     *
>> +     * #2. RDM region if it exists
>
> "Reserved regions if they exist"

Fixed.

>
>> +     *
>> +     * #3. High memory region if it exists
>> +     */
>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>>       {
>> -        e820[nr].addr = ((uint64_t)1 << 32);
>> -        e820[nr].size =
>> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
>> -        e820[nr].type = E820_RAM;
>> +        e820[nr] = memory_map.map[i];
>>           nr++;
>>       }
>>
>> +    /* Low RAM goes here. Reserve space for special pages. */
>> +    BUG_ON(low_mem_pgend < (2u << 20));
>> +    /*
>> +     * We may need to adjust real lowmem end since we may
>> +     * populate RAM to get enough MMIO previously.
>> +     */
>> +    for ( i = 0; i < memory_map.nr_map; i++ )
>
> since you already translate memory map into e820 earlier, here
> you should use 'nr' instead of memory_map.nr_map.
>

As we're saying in the code comment above, we're just handling the 
lowmem entry, so I think memory_map.nr_map is enough.

>> +    {
>> +        uint64_t end = e820[i].addr + e820[i].size;
>> +        if ( e820[i].type == E820_RAM &&
>> +             low_mem_pgend > e820[i].addr && low_mem_pgend < end )
>> +            e820[i].size = low_mem_pgend - e820[i].addr;
>> +    }
>
> Sorry I may miss the code but could you elaborate where the
> low_mem_pgend is changed after memory map is created? If
> it happens within hvmloader, suppose the amount of reduced
> memory from original E820_RAM entry should be added to
> another E820_RAM entry for highmem, right?

You're right so I really should compensate this in highmem entry,

     add_high_mem = end - low_mem_end;

     /*
      * And then we also need to adjust highmem.
      */
     if ( add_high_mem )
     {
         for ( i = 0; i < memory_map.nr_map; i++ )
         {
             if ( e820[i].type == E820_RAM &&
                  e820[i].addr > (1ull << 32))
                 e820[i].size += add_high_mem;
         }
     }


Thanks
Tiejun

>
>> +
>> +    /* Finally we need to reorder all e820 entries. */
>> +    for ( j = 0; j < nr-1; j++ )
>> +    {
>> +        for ( i = j+1; i < nr; i++ )
>> +        {
>> +            if ( e820[j].addr > e820[i].addr )
>> +            {
>> +                struct e820entry tmp;
>> +                tmp = e820[j];
>> +                e820[j] = e820[i];
>> +                e820[i] = tmp;
>> +            }
>> +        }
>> +    }
>> +
>>       return nr;
>>   }
>>
>> --
>> 1.9.1
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 00/16] Fix RMRR
  2015-06-12  8:04     ` Jan Beulich
@ 2015-06-12  8:20       ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  8:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, kevin.tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, yang.z.zhang

On 2015/6/12 16:04, Jan Beulich wrote:
>>>> On 12.06.15 at 04:10, <tiejun.chen@intel.com> wrote:
>> On 2015/6/11 20:52, Tim Deegan wrote:
>>> which would be better handeld explicitly:
>>>
>>>          if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
>>>              ...
>>>
>>
>> So if I'm correct, we should do this check explicitly,
>>
>>          if ( p2mt == p2m_invalid ||
>>               (p2mt == p2m_mmio_dm && !mfn_valid(mfn) )
>>
>> Note this is equivalent to Jan's comment.
>
> I think the !mfn_valid() part is really redundant - p2m_mmio_dm
> should never be put on a page translating to a valid MFN.
>

I'm not sure immediately but I believe you're definitely right at this 
point, so lets goes there :)

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-11 10:02   ` Tian, Kevin
@ 2015-06-12  8:25     ` Chen, Tiejun
  2015-06-16  2:28       ` Tian, Kevin
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  8:25 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 18:02, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> This patch passes rdm reservation policy to xc_assign_device() so the policy
>> is checked when assigning devices to a VM.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxc/include/xenctrl.h       |  3 ++-
>>   tools/libxc/xc_domain.c             |  6 +++++-
>>   tools/libxl/libxl_pci.c             |  3 ++-
>>   tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++++++++++++++----
>>   tools/python/xen/lowlevel/xc/xc.c   | 29 +++++++++++++++++++----------
>>   5 files changed, 42 insertions(+), 17 deletions(-)
>>
>> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
>> index 6c01362..7fd60d5 100644
>> --- a/tools/libxc/include/xenctrl.h
>> +++ b/tools/libxc/include/xenctrl.h
>> @@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
>>   /* HVM guest pass-through */
>>   int xc_assign_device(xc_interface *xch,
>>                        uint32_t domid,
>> -                     uint32_t machine_sbdf);
>> +                     uint32_t machine_sbdf,
>> +                     uint32_t flag);
>>
>>   int xc_get_device_group(xc_interface *xch,
>>                        uint32_t domid,
>> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
>> index 4f96e1b..19127ec 100644
>> --- a/tools/libxc/xc_domain.c
>> +++ b/tools/libxc/xc_domain.c
>> @@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
>>   int xc_assign_device(
>>       xc_interface *xch,
>>       uint32_t domid,
>> -    uint32_t machine_sbdf)
>> +    uint32_t machine_sbdf,
>> +    uint32_t flag)
>>   {
>>       DECLARE_DOMCTL;
>>
>> @@ -1705,6 +1706,7 @@ int xc_assign_device(
>>       domctl.domain = domid;
>>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
>>       domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
>> +    domctl.u.assign_device.flag = flag;
>>
>>       return do_domctl(xch, &domctl);
>>   }
>> @@ -1792,6 +1794,8 @@ int xc_assign_dt_device(
>>
>>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
>>       domctl.u.assign_device.u.dt.size = size;
>> +    /* DT doesn't own any RDM. */
>> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
>
> still not clear about this NO_RDM flag. If a device-tree device doesn't
> own RDM, the hypervisor will know it. Why do we require toolstack
> to tell hypervisor not use it?

I think an explicit flag can make this sort of case identified, right? 
And other flags brings easily some confusions, or even a potential risk 
in the future.

Thanks
Tiejun

>
>>       set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
>>
>>       rc = do_domctl(xch, &domctl);
>> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
>> index e0743f8..632c15e 100644
>> --- a/tools/libxl/libxl_pci.c
>> +++ b/tools/libxl/libxl_pci.c
>> @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid,
>> libxl_device_pci *pcidev, i
>>       FILE *f;
>>       unsigned long long start, end, flags, size;
>>       int irq, i, rc, hvm = 0;
>> +    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>
>>       if (type == LIBXL_DOMAIN_TYPE_INVALID)
>>           return ERROR_FAIL;
>> @@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid,
>> libxl_device_pci *pcidev, i
>>
>>   out:
>>       if (!libxl_is_stubdom(ctx, domid, NULL)) {
>> -        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
>> +        rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
>>           if (rc < 0 && (hvm || errno != ENOSYS)) {
>>               LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
>>               return ERROR_FAIL;
>> diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
>> index 64f1137..317bf75 100644
>> --- a/tools/ocaml/libs/xc/xenctrl_stubs.c
>> +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
>> @@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value
>> xch, value domid, value d
>>   	CAMLreturn(Val_bool(ret == 0));
>>   }
>>
>> -CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
>> +static int domain_assign_device_rdm_flag_table[] = {
>> +    XEN_DOMCTL_DEV_NO_RDM,
>> +    XEN_DOMCTL_DEV_RDM_RELAXED,
>> +    XEN_DOMCTL_DEV_RDM_STRICT,
>> +};
>> +
>> +CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
>> +                                            value rflag)
>>   {
>> -	CAMLparam3(xch, domid, desc);
>> +	CAMLparam4(xch, domid, desc, rflag);
>>   	int ret;
>>   	int domain, bus, dev, func;
>> -	uint32_t sbdf;
>> +	uint32_t sbdf, flag;
>>
>>   	domain = Int_val(Field(desc, 0));
>>   	bus = Int_val(Field(desc, 1));
>> @@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch,
>> value domid, value desc)
>>   	func = Int_val(Field(desc, 3));
>>   	sbdf = encode_sbdf(domain, bus, dev, func);
>>
>> -	ret = xc_assign_device(_H(xch), _D(domid), sbdf);
>> +	ret = Int_val(Field(rflag, 0));
>> +	flag = domain_assign_device_rdm_flag_table[ret];
>> +
>> +	ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
>>
>>   	if (ret < 0)
>>   		failwith_xc(_H(xch));
>> diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
>> index c77e15b..172bdf0 100644
>> --- a/tools/python/xen/lowlevel/xc/xc.c
>> +++ b/tools/python/xen/lowlevel/xc/xc.c
>> @@ -592,7 +592,8 @@ static int token_value(char *token)
>>       return strtol(token, NULL, 16);
>>   }
>>
>> -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
>> +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
>> +                    int *flag)
>>   {
>>       char *token;
>>
>> @@ -607,8 +608,16 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int
>> *func)
>>       *dev  = token_value(token);
>>       token = strchr(token, ',') + 1;
>>       *func  = token_value(token);
>> -    token = strchr(token, ',');
>> -    *str = token ? token + 1 : NULL;
>> +    token = strchr(token, ',') + 1;
>> +    if ( token ) {
>> +        *flag = token_value(token);
>> +        *str = token + 1;
>> +    }
>> +    else
>> +    {
>> +        *flag = XEN_DOMCTL_DEV_RDM_STRICT;
>> +        *str = NULL;
>> +    }
>>
>>       return 1;
>>   }
>> @@ -620,14 +629,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self,
>>       uint32_t dom;
>>       char *pci_str;
>>       int32_t sbdf = 0;
>> -    int seg, bus, dev, func;
>> +    int seg, bus, dev, func, flag;
>>
>>       static char *kwd_list[] = { "domid", "pci", NULL };
>>       if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
>>                                         &dom, &pci_str) )
>>           return NULL;
>>
>> -    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
>> +    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>>       {
>>           sbdf = seg << 16;
>>           sbdf |= (bus & 0xff) << 8;
>> @@ -653,21 +662,21 @@ static PyObject *pyxc_assign_device(XcObject *self,
>>       uint32_t dom;
>>       char *pci_str;
>>       int32_t sbdf = 0;
>> -    int seg, bus, dev, func;
>> +    int seg, bus, dev, func, flag;
>>
>>       static char *kwd_list[] = { "domid", "pci", NULL };
>>       if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
>>                                         &dom, &pci_str) )
>>           return NULL;
>>
>> -    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
>> +    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>>       {
>>           sbdf = seg << 16;
>>           sbdf |= (bus & 0xff) << 8;
>>           sbdf |= (dev & 0x1f) << 3;
>>           sbdf |= (func & 0x7);
>>
>> -        if ( xc_assign_device(self->xc_handle, dom, sbdf) != 0 )
>> +        if ( xc_assign_device(self->xc_handle, dom, sbdf, flag) != 0 )
>>           {
>>               if (errno == ENOSYS)
>>                   sbdf = -1;
>> @@ -686,14 +695,14 @@ static PyObject *pyxc_deassign_device(XcObject *self,
>>       uint32_t dom;
>>       char *pci_str;
>>       int32_t sbdf = 0;
>> -    int seg, bus, dev, func;
>> +    int seg, bus, dev, func, flag;
>>
>>       static char *kwd_list[] = { "domid", "pci", NULL };
>>       if ( !PyArg_ParseTupleAndKeywords(args, kwds, "is", kwd_list,
>>                                         &dom, &pci_str) )
>>           return NULL;
>>
>> -    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func) )
>> +    while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>>       {
>>           sbdf = seg << 16;
>>           sbdf |= (bus & 0xff) << 8;
>> --
>> 1.9.1
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-11 10:19   ` Tian, Kevin
@ 2015-06-12  8:30     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  8:30 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 18:19, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> While building a VM, HVM domain builder provides struct hvm_info_table{}
>> to help hvmloader. Currently it includes two fields to construct guest
>> e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
>> check them to fix any conflict with RAM.
>>
>> RMRR can reside in address space beyond 4G theoretically, but we never
>> see this in real world. So in order to avoid breaking highmem layout
>> we don't solve highmem conflict. Note this means highmem rmrr could still
>> be supported if no conflict.
>>
>> But in the case of lowmem, RMRR probably scatter the whole RAM space.
>> Especially multiple RMRR entries would worsen this to lead a complicated
>> memory layout. And then its hard to extend hvm_info_table{} to work
>> hvmloader out. So here we're trying to figure out a simple solution to
>> avoid breaking existing layout. So when a conflict occurs,
>>
>>      #1. Above a predefined boundary (default 2G)
>>          - move lowmem_end below reserved region to solve conflict;
>>
>>      #2. Below a predefined boundary (default 2G)
>>          - Check strict/relaxed policy.
>>          "strict" policy leads to fail libxl. Note when both policies
>>          are specified on a given region, 'strict' is always preferred.
>>          "relaxed" policy issue a warning message and also mask this entry INVALID
>>          to indicate we shouldn't expose this entry to hvmloader.
>>
>> Note this predefined boundary can be changes with the parameter
>> "rdm_mem_boundary" in .cfg file.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> Reviewed-by: Kevin Tian <kevint.tian@intel.com>
>
> One comment though. could you be consistent to use RDM in the code?
> RMRR  is just an example of RDM...

Sure.

Thanks
Tiejun

>
>
>> ---
>>   docs/man/xl.cfg.pod.5          |  21 ++++
>>   tools/libxc/xc_hvm_build_x86.c |   5 +-
>>   tools/libxl/libxl.h            |   6 +
>>   tools/libxl/libxl_create.c     |   6 +-
>>   tools/libxl/libxl_dm.c         | 255
>> +++++++++++++++++++++++++++++++++++++++++
>>   tools/libxl/libxl_dom.c        |  11 +-
>>   tools/libxl/libxl_internal.h   |  11 +-
>>   tools/libxl/libxl_types.idl    |   8 ++
>>   tools/libxl/xl_cmdimpl.c       |   3 +
>>   9 files changed, 322 insertions(+), 4 deletions(-)
>>
>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>> index 638b350..6fd2370 100644
>> --- a/docs/man/xl.cfg.pod.5
>> +++ b/docs/man/xl.cfg.pod.5
>> @@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
>>
>>   Note this would override global B<rdm> option.
>>
>> +=item B<rdm_mem_boundary=MBYTES>
>> +
>> +Number of megabytes to set a boundary for checking rdm conflict.
>> +
>> +When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
>> +Especially multiple RMRR entries would worsen this to lead a complicated
>> +memory layout. So here we're trying to figure out a simple solution to
>> +avoid breaking existing layout. So when a conflict occurs,
>> +
>> +    #1. Above a predefined boundary
>> +        - move lowmem_end below reserved region to solve conflict;
>> +
>> +    #2. Below a predefined boundary
>> +        - Check strict/relaxed policy.
>> +        "strict" policy leads to fail libxl. Note when both policies
>> +        are specified on a given region, 'strict' is always preferred.
>> +        "relaxed" policy issue a warning message and also mask this entry INVALID
>> +        to indicate we shouldn't expose this entry to hvmloader.
>> +
>> +Here the default is 2G.
>> +
>>   =back
>>
>>   =back
>> diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
>> index 0e98c84..5142578 100644
>> --- a/tools/libxc/xc_hvm_build_x86.c
>> +++ b/tools/libxc/xc_hvm_build_x86.c
>> @@ -21,6 +21,7 @@
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>   #include <zlib.h>
>> +#include <assert.h>
>>
>>   #include "xg_private.h"
>>   #include "xc_private.h"
>> @@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch,
>>
>>       elf_parse_binary(&elf);
>>       v_start = 0;
>> -    v_end = args->mem_size;
>> +    v_end = args->lowmem_end;
>>
>>       if ( nr_pages > target_pages )
>>           memflags |= XENMEMF_populate_on_demand;
>> @@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>>       args.mem_size = (uint64_t)memsize << 20;
>>       args.mem_target = (uint64_t)target << 20;
>>       args.image_file_name = image_name;
>> +    if ( args.mmio_size == 0 )
>> +        args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
>>
>>       return xc_hvm_build(xch, domid, &args);
>>   }
>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
>> index 0a7913b..a6212fb 100644
>> --- a/tools/libxl/libxl.h
>> +++ b/tools/libxl/libxl.h
>> @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
>>   #define LIBXL_TIMER_MODE_DEFAULT -1
>>   #define LIBXL_MEMKB_DEFAULT ~0ULL
>>
>> +/*
>> + * We'd like to set a memory boundary to determine if we need to check
>> + * any overlap with reserved device memory.
>> + */
>> +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
>> +
>>   #define LIBXL_MS_VM_GENID_LEN 16
>>   typedef struct {
>>       uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
>> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
>> index 6c8ec63..0438731 100644
>> --- a/tools/libxl/libxl_create.c
>> +++ b/tools/libxl/libxl_create.c
>> @@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, libxl_domain_build_info
>> *b_info)
>>   {
>>       if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
>>           b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
>> +
>> +    if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
>> +        b_info->rdm_mem_boundary_memkb =
>> +                            LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
>>   }
>>
>>   int libxl__domain_build_info_setdefault(libxl__gc *gc,
>> @@ -460,7 +464,7 @@ int libxl__domain_build(libxl__gc *gc,
>>
>>       switch (info->type) {
>>       case LIBXL_DOMAIN_TYPE_HVM:
>> -        ret = libxl__build_hvm(gc, domid, info, state);
>> +        ret = libxl__build_hvm(gc, domid, d_config, state);
>>           if (ret)
>>               goto out;
>>
>> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>> index 33f9ce6..d908350 100644
>> --- a/tools/libxl/libxl_dm.c
>> +++ b/tools/libxl/libxl_dm.c
>> @@ -90,6 +90,261 @@ const char *libxl__domain_device_model(libxl__gc *gc,
>>       return dm;
>>   }
>>
>> +static struct xen_reserved_device_memory
>> +*xc_device_get_rdm(libxl__gc *gc,
>> +                   uint32_t flag,
>> +                   uint16_t seg,
>> +                   uint8_t bus,
>> +                   uint8_t devfn,
>> +                   unsigned int *nr_entries)
>> +{
>> +    struct xen_reserved_device_memory *xrdm;
>> +    int rc;
>> +
>> +    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>> +                                       NULL, nr_entries);
>> +    assert(rc <= 0);
>> +    /* "0" means we have no any rdm entry. */
>> +    if (!rc)
>> +        goto out;
>> +
>> +    if (errno == ENOBUFS) {
>> +        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
>> +        if (!xrdm) {
>> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
>> +            goto out;
>> +        }
>> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>> +                                           xrdm, nr_entries);
>> +        if (rc) {
>> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
>> +            *nr_entries = 0;
>> +            free(xrdm);
>> +            xrdm = NULL;
>> +        }
>> +    } else
>> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
>> +
>> + out:
>> +    return xrdm;
>> +}
>> +
>> +/*
>> + * Check whether there exists rdm hole in the specified memory range.
>> + * Returns true if exists, else returns false.
>> + */
>> +static bool overlaps_rdm(uint64_t start, uint64_t memsize,
>> +                         uint64_t rdm_start, uint64_t rdm_size)
>> +{
>> +    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
>> +}
>> +
>> +/*
>> + * Check reported RDM regions and handle potential gfn conflicts according
>> + * to user preferred policy.
>> + *
>> + * RMRR can reside in address space beyond 4G theoretically, but we never
>> + * see this in real world. So in order to avoid breaking highmem layout
>> + * we don't solve highmem conflict. Note this means highmem rmrr could still
>> + * be supported if no conflict.
>> + *
>> + * But in the case of lowmem, RMRR probably scatter the whole RAM space.
>> + * Especially multiple RMRR entries would worsen this to lead a complicated
>> + * memory layout. And then its hard to extend hvm_info_table{} to work
>> + * hvmloader out. So here we're trying to figure out a simple solution to
>> + * avoid breaking existing layout. So when a conflict occurs,
>> + *
>> + * #1. Above a predefined boundary (default 2G)
>> + * - Move lowmem_end below reserved region to solve conflict;
>> + *
>> + * #2. Below a predefined boundary (default 2G)
>> + * - Check strict/relaxed policy.
>> + * "strict" policy leads to fail libxl. Note when both policies
>> + * are specified on a given region, 'strict' is always preferred.
>> + * "relaxed" policy issue a warning message and also mask this entry
>> + * INVALID to indicate we shouldn't expose this entry to hvmloader.
>> + */
>> +int libxl__domain_device_construct_rdm(libxl__gc *gc,
>> +                                       libxl_domain_config *d_config,
>> +                                       uint64_t rdm_mem_boundary,
>> +                                       struct xc_hvm_build_args *args)
>> +{
>> +    int i, j, conflict;
>> +    struct xen_reserved_device_memory *xrdm = NULL;
>> +    uint32_t type = d_config->b_info.rdm.type;
>> +    uint16_t seg;
>> +    uint8_t bus, devfn;
>> +    uint64_t rdm_start, rdm_size;
>> +    uint64_t highmem_end = args->highmem_end ? args->highmem_end : (1ull<<32);
>> +
>> +    /* Might not expose rdm. */
>> +    if (type == LIBXL_RDM_RESERVE_TYPE_NONE && !d_config->num_pcidevs)
>> +        return 0;
>> +
>> +    /* Query all RDM entries in this platform */
>> +    if (type == LIBXL_RDM_RESERVE_TYPE_HOST) {
>> +        unsigned int nr_entries;
>> +
>> +        /* Collect all rdm info if exist. */
>> +        xrdm = xc_device_get_rdm(gc, PCI_DEV_RDM_ALL,
>> +                                 0, 0, 0, &nr_entries);
>> +        if (!nr_entries)
>> +            return 0;
>> +
>> +        assert(xrdm);
>> +
>> +        d_config->num_rdms = nr_entries;
>> +        d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +        for (i = 0; i < d_config->num_rdms; i++) {
>> +            d_config->rdms[i].start =
>> +                                (uint64_t)xrdm[i].start_pfn << XC_PAGE_SHIFT;
>> +            d_config->rdms[i].size =
>> +                                (uint64_t)xrdm[i].nr_pages << XC_PAGE_SHIFT;
>> +            d_config->rdms[i].flag = d_config->b_info.rdm.reserve;
>> +        }
>> +
>> +        free(xrdm);
>> +    } else
>> +        d_config->num_rdms = 0;
>> +
>> +    /* Query RDM entries per-device */
>> +    for (i = 0; i < d_config->num_pcidevs; i++) {
>> +        unsigned int nr_entries;
>> +        bool new = true;
>> +
>> +        seg = d_config->pcidevs[i].domain;
>> +        bus = d_config->pcidevs[i].bus;
>> +        devfn = PCI_DEVFN(d_config->pcidevs[i].dev, d_config->pcidevs[i].func);
>> +        nr_entries = 0;
>> +        xrdm = xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL,
>> +                                 seg, bus, devfn, &nr_entries);
>> +        /* No RDM to associated with this device. */
>> +        if (!nr_entries)
>> +            continue;
>> +
>> +        assert(xrdm);
>> +
>> +        /*
>> +         * Need to check whether this entry is already saved in the array.
>> +         * This could come from two cases:
>> +         *
>> +         *   - user may configure to get all RMRRs in this platform, which
>> +         *   is already queried before this point
>> +         *   - or two assigned devices may share one RMRR entry
>> +         *
>> +         * different policies may be configured on the same RMRR due to above
>> +         * two cases. We choose a simple policy to always favor stricter policy
>> +         */
>> +        for (j = 0; j < d_config->num_rdms; j++) {
>> +            if (d_config->rdms[j].start ==
>> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT)
>> +             {
>> +                if (d_config->rdms[j].flag != LIBXL_RDM_RESERVE_FLAG_STRICT)
>> +                    d_config->rdms[j].flag = d_config->pcidevs[i].rdm_reserve;
>> +                new = false;
>> +                break;
>> +            }
>> +        }
>> +
>> +        if (new) {
>> +            d_config->num_rdms++;
>> +            d_config->rdms = libxl__realloc(NOGC, d_config->rdms,
>> +                                d_config->num_rdms * sizeof(libxl_device_rdm));
>> +
>> +            d_config->rdms[d_config->num_rdms - 1].start =
>> +                                (uint64_t)xrdm[0].start_pfn << XC_PAGE_SHIFT;
>> +            d_config->rdms[d_config->num_rdms - 1].size =
>> +                                (uint64_t)xrdm[0].nr_pages << XC_PAGE_SHIFT;
>> +            d_config->rdms[d_config->num_rdms - 1].flag =
>> +                                d_config->pcidevs[i].rdm_reserve;
>> +        }
>> +        free(xrdm);
>> +    }
>> +
>> +    /*
>> +     * Next step is to check and avoid potential conflict between RDM entries
>> +     * and guest RAM. To avoid intrusive impact to existing memory layout
>> +     * {lowmem, mmio, highmem} which is passed around various function blocks,
>> +     * below conflicts are not handled which are rare and handling them would
>> +     * lead to a more scattered layout:
>> +     *  - RMRR in highmem area (>4G)
>> +     *  - RMRR lower than a defined memory boundary (e.g. 2G)
>> +     * Otherwise for conflicts between boundary and 4G, we'll simply move lowmem
>> +     * end below reserved region to solve conflict.
>> +     *
>> +     * If a conflict is detected on a given RMRR entry, an error will be
>> +     * returned if 'strict' policy is specified. Instead, if 'relaxed' policy
>> +     * specified, this conflict is treated just as a warning, but we mark this
>> +     * RMRR entry as INVALID to indicate that this entry shouldn't be exposed
>> +     * to hvmloader.
>> +     *
>> +     * Firstly we should check the case of rdm < 4G because we may need to
>> +     * expand highmem_end.
>> +     */
>> +    for (i = 0; i < d_config->num_rdms; i++) {
>> +        rdm_start = d_config->rdms[i].start;
>> +        rdm_size = d_config->rdms[i].size;
>> +        conflict = overlaps_rdm(0, args->lowmem_end, rdm_start, rdm_size);
>> +
>> +        if (!conflict)
>> +            continue;
>> +
>> +        /* Just check if RDM > our memory boundary. */
>> +        if (rdm_start > rdm_mem_boundary) {
>> +            /*
>> +             * We will move downwards lowmem_end so we have to expand
>> +             * highmem_end.
>> +             */
>> +            highmem_end += (args->lowmem_end - rdm_start);
>> +            /* Now move downwards lowmem_end. */
>> +            args->lowmem_end = rdm_start;
>> +        }
>> +    }
>> +
>> +    /* Sync highmem_end. */
>> +    args->highmem_end = highmem_end;
>> +
>> +    /*
>> +     * Finally we can take same policy to check lowmem(< 2G) and
>> +     * highmem adjusted above.
>> +     */
>> +    for (i = 0; i < d_config->num_rdms; i++) {
>> +        rdm_start = d_config->rdms[i].start;
>> +        rdm_size = d_config->rdms[i].size;
>> +        /* Does this entry conflict with lowmem? */
>> +        conflict = overlaps_rdm(0, args->lowmem_end,
>> +                                rdm_start, rdm_size);
>> +        /* Does this entry conflict with highmem? */
>> +        conflict |= overlaps_rdm((1ULL<<32),
>> +                                 args->highmem_end - (1ULL<<32),
>> +                                 rdm_start, rdm_size);
>> +
>> +        if (!conflict)
>> +            continue;
>> +
>> +        if(d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_STRICT) {
>> +            LOG(ERROR, "RDM conflict at 0x%lx.\n", d_config->rdms[i].start);
>> +            goto out;
>> +        } else {
>> +            LOG(WARN, "Ignoring RDM conflict at 0x%lx.\n",
>> +                      d_config->rdms[i].start);
>> +
>> +            /*
>> +             * Then mask this INVALID to indicate we shouldn't expose this
>> +             * to hvmloader.
>> +             */
>> +            d_config->rdms[i].flag = LIBXL_RDM_RESERVE_FLAG_INVALID;
>> +        }
>> +    }
>> +
>> +    return 0;
>> +
>> + out:
>> +    return ERROR_FAIL;
>> +}
>> +
>>   const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config)
>>   {
>>       const libxl_vnc_info *vnc = NULL;
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 867172a..1777b32 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -914,13 +914,14 @@ out:
>>   }
>>
>>   int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>> -              libxl_domain_build_info *info,
>> +              libxl_domain_config *d_config,
>>                 libxl__domain_build_state *state)
>>   {
>>       libxl_ctx *ctx = libxl__gc_owner(gc);
>>       struct xc_hvm_build_args args = {};
>>       int ret, rc = ERROR_FAIL;
>>       uint64_t mmio_start, lowmem_end, highmem_end;
>> +    libxl_domain_build_info *const info = &d_config->b_info;
>>
>>       memset(&args, 0, sizeof(struct xc_hvm_build_args));
>>       /* The params from the configuration file are in Mb, which are then
>> @@ -958,6 +959,14 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>       args.highmem_end = highmem_end;
>>       args.mmio_start = mmio_start;
>>
>> +    ret = libxl__domain_device_construct_rdm(gc, d_config,
>> +                                             info->rdm_mem_boundary_memkb*1024,
>> +                                             &args);
>> +    if (ret) {
>> +        LOG(ERROR, "checking reserved device memory failed");
>> +        goto out;
>> +    }
>> +
>>       if (info->num_vnuma_nodes != 0) {
>>           int i;
>>
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index e9ac886..52f3831 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -1011,7 +1011,7 @@ _hidden int libxl__build_post(libxl__gc *gc, uint32_t domid,
>>   _hidden int libxl__build_pv(libxl__gc *gc, uint32_t domid,
>>                libxl_domain_build_info *info, libxl__domain_build_state *state);
>>   _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>> -              libxl_domain_build_info *info,
>> +              libxl_domain_config *d_config,
>>                 libxl__domain_build_state *state);
>>
>>   _hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid,
>> @@ -1519,6 +1519,15 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
>>           int nr_channels, libxl_device_channel *channels);
>>
>>   /*
>> + * This function will fix reserved device memory conflict
>> + * according to user's configuration.
>> + */
>> +_hidden int libxl__domain_device_construct_rdm(libxl__gc *gc,
>> +                                   libxl_domain_config *d_config,
>> +                                   uint64_t rdm_mem_guard,
>> +                                   struct xc_hvm_build_args *args);
>> +
>> +/*
>>    * This function will cause the whole libxl process to hang
>>    * if the device model does not respond.  It is deprecated.
>>    *
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 4dfcaf7..b4282a0 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>>       ("target_memkb",    MemKB),
>>       ("video_memkb",     MemKB),
>>       ("shadow_memkb",    MemKB),
>> +    ("rdm_mem_boundary_memkb",    MemKB),
>>       ("rtc_timeoffset",  uint32),
>>       ("exec_ssidref",    uint32),
>>       ("exec_ssid_label", string),
>> @@ -559,6 +560,12 @@ libxl_device_pci = Struct("device_pci", [
>>       ("rdm_reserve",   libxl_rdm_reserve_flag),
>>       ])
>>
>> +libxl_device_rdm = Struct("device_rdm", [
>> +    ("start", uint64),
>> +    ("size", uint64),
>> +    ("flag", libxl_rdm_reserve_flag),
>> +    ])
>> +
>>   libxl_device_dtdev = Struct("device_dtdev", [
>>       ("path", string),
>>       ])
>> @@ -589,6 +596,7 @@ libxl_domain_config = Struct("domain_config", [
>>       ("disks", Array(libxl_device_disk, "num_disks")),
>>       ("nics", Array(libxl_device_nic, "num_nics")),
>>       ("pcidevs", Array(libxl_device_pci, "num_pcidevs")),
>> +    ("rdms", Array(libxl_device_rdm, "num_rdms")),
>>       ("dtdevs", Array(libxl_device_dtdev, "num_dtdevs")),
>>       ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
>>       ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index 4364ba4..85d74fd 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>> @@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
>>       if (!xlu_cfg_get_long (config, "videoram", &l, 0))
>>           b_info->video_memkb = l * 1024;
>>
>> +    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
>> +        b_info->rdm_mem_boundary_memkb = l * 1024;
>> +
>>       if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
>>           b_info->event_channels = l;
>>
>> --
>> 1.9.1
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-06-11 10:25   ` Tian, Kevin
@ 2015-06-12  8:44     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  8:44 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 18:25, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> Currently we're intending to cover this kind of devices
>
> we're -> we're not?

I mean currently we want to handle this shared case *simply* so I think 
its still "we're", right?

>
>> with shared RMRR simply since the case of shared RMRR is
>> a rare case according to our previous experiences. But
>> late we can group these devices which shared rmrr, and
>> then allow all devices within a group to be assigned to
>> same domain.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> Acked-by: Kevin Tian <kevin.tian@intel.com> except one text
> comment.
>
>> ---
>>   xen/drivers/passthrough/vtd/iommu.c | 30
>> +++++++++++++++++++++++++++---
>>   1 file changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c
>> index d3233b8..f220081 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
>>       if ( list_empty(&acpi_drhd_units) )
>>           return -ENODEV;
>>
>> +    seg = pdev->seg;
>> +    bus = pdev->bus;
>> +    /*
>> +     * In rare cases one given rmrr is shared by multiple devices but
>> +     * obviously this would put the security of a system at risk. So
>> +     * we should prevent from this sort of device assignment.
>> +     *
>> +     * TODO: actually we can group these devices which shared rmrr, and
>> +     * then allow all devices within a group to be assigned to same domain.
>
> TODO: in the future we can introduce group device assignment
> interface to make sure devices sharing RMRR are assigned to the
> same domain together.

Thank you to rephrase this.

Tiejun

>
>> +     */
>> +    for_each_rmrr_device( rmrr, bdf, i )
>> +    {
>> +        if ( rmrr->segment == seg &&
>> +             PCI_BUS(bdf) == bus &&
>> +             PCI_DEVFN2(bdf) == devfn )
>> +        {
>> +            if ( rmrr->scope.devices_cnt > 1 )
>> +            {
>> +                ret = -EPERM;
>> +                printk(XENLOG_G_ERR VTDPREFIX
>> +                       " cannot assign this device with shared RMRR for Dom%d (%d)\n",
>> +                       d->domain_id, ret);
>> +                return ret;
>> +            }
>> +        }
>> +    }
>> +
>>       ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
>>       if ( ret )
>>           return ret;
>>
>> -    seg = pdev->seg;
>> -    bus = pdev->bus;
>> -
>>       /* Setup rmrr identity mapping */
>>       for_each_rmrr_device( rmrr, bdf, i )
>>       {
>> --
>> 1.9.1
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-12  6:31     ` Chen, Tiejun
@ 2015-06-12  8:45       ` Jan Beulich
  2015-06-12  9:20         ` Chen, Tiejun
  2015-06-16  2:30       ` Tian, Kevin
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-12  8:45 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 12.06.15 at 08:31, <tiejun.chen@intel.com> wrote:
> On 2015/6/11 17:28, Tian, Kevin wrote:
>>> From: Chen, Tiejun
>>> Sent: Thursday, June 11, 2015 9:15 AM
>>> @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev
>>> *pdev)
>>>                PCI_DEVFN2(bdf) != devfn )
>>>               continue;
>>>
>>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>
>> ditto
> 
> It doesn't matter when we're trying to remove a device since we don't 
> care this flag.

In such a case it helps to add a brief comment saying that the precise
value passed is irrelevant. Or maybe this could be expressed by
folding this and the "map" parameters of the function (in which case it
might become self-documenting)?

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 15/16] xen/vtd: enable USB device assignment
  2015-06-11 10:22   ` Tian, Kevin
@ 2015-06-12  8:59     ` Chen, Tiejun
  2015-06-16  5:58       ` Tian, Kevin
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  8:59 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/11 18:22, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Thursday, June 11, 2015 9:15 AM
>>
>> Before we refine RMRR mechanism, USB RMRR may conflict with guest bios
>> region so we always ignore USB RMRR.
>
> If USB RMRR conflicts with guest bios, the conflict is always there
> before and after your refinement. :-)

Yeah :)

>
>> Now this can be gone when we enable
>> pci_force to check/reserve RMRR.

So what about this?

USB RMRR may conflict with guest bios region so we always ignore
USB RMRR. But now this can be checked to handle after we introduce
our policy mechanism.

>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>
> Acked-by: Kevin Tian <kevin.tian@intel.com> except one small comment below
>
>> ---
>>   xen/drivers/passthrough/vtd/dmar.h  |  1 -
>>   xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
>>   xen/drivers/passthrough/vtd/utils.c |  7 -------
>>   3 files changed, 2 insertions(+), 17 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h
>> index af1feef..af205f5 100644
>> --- a/xen/drivers/passthrough/vtd/dmar.h
>> +++ b/xen/drivers/passthrough/vtd/dmar.h
>> @@ -129,7 +129,6 @@ do {                                                \
>>
>>   int vtd_hw_check(void);
>>   void disable_pmr(struct iommu *iommu);
>> -int is_usb_device(u16 seg, u8 bus, u8 devfn);
>>   int is_igd_drhd(struct acpi_drhd_unit *drhd);
>>
>>   #endif /* _DMAR_H_ */
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c
>> index d7c9e1c..d3233b8 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
>>       /*
>>        * If the device belongs to the hardware domain, and it has RMRR, don't
>>        * remove it from the hardware domain, because BIOS may use RMRR at
>> -     * booting time. Also account for the special casing of USB below (in
>> -     * intel_iommu_assign_device()).
>> +     * booting time.
>
> this code is run-time right?

According to one associated commit,

commit 8b99f4400b695535153dcd5d949b3f63602ca8bf
Author: Jan Beulich <jbeulich@suse.com>
Date:   Fri Oct 10 10:54:21 2014 +0200

     VT-d: fix RMRR related error handling

     - reassign_device_ownership() now tears down RMRR mappings (for other
       than Dom0)
     - to facilitate that, rmrr_identity_mapping() now deals with both
       establishing and tearing down of these mappings (the open coded
       equivalent in intel_iommu_remove_device() is being replaced at once)
     - intel_iommu_assign_device() now unrolls the assignment upon RMRR
       mapping errors
     - intel_iommu_add_device() now returns consistent values upon RMRR
       mapping failures (was: failure when last iteration ran into a
       problem, success otherwise)
     - intel_iommu_remove_device() no longer special cases Dom0 (it only
       ever gets called for devices removed from the _system_, not a domain)
     - rmrr_identity_mapping() now returns a proper error indicator instead
       of -1 when intel_iommu_map_page() failed

     Signed-off-by: Jan Beulich <jbeulich@suse.com>
     Acked-by: Kevin Tian <kevin.tian@intel.com>

This chunk of codes resides inside intel_iommu_remove_device() so I 
think this shouldn't be for a running domain.

Thanks
Tiejun

>
>>        */
>> -    if ( !is_hardware_domain(source) &&
>> -         !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
>> +    if ( !is_hardware_domain(source) )
>>       {
>>           const struct acpi_rmrr_unit *rmrr;
>>           u16 bdf;
>> @@ -2283,13 +2281,8 @@ static int intel_iommu_assign_device(
>>       if ( ret )
>>           return ret;
>>
>> -    /* FIXME: Because USB RMRR conflicts with guest bios region,
>> -     * ignore USB RMRR temporarily.
>> -     */
>>       seg = pdev->seg;
>>       bus = pdev->bus;
>> -    if ( is_usb_device(seg, bus, pdev->devfn) )
>> -        return 0;
>>
>>       /* Setup rmrr identity mapping */
>>       for_each_rmrr_device( rmrr, bdf, i )
>> diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c
>> index bd14c02..b8a077f 100644
>> --- a/xen/drivers/passthrough/vtd/utils.c
>> +++ b/xen/drivers/passthrough/vtd/utils.c
>> @@ -29,13 +29,6 @@
>>   #include "extern.h"
>>   #include <asm/io_apic.h>
>>
>> -int is_usb_device(u16 seg, u8 bus, u8 devfn)
>> -{
>> -    u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>> -                                PCI_CLASS_DEVICE);
>> -    return (class == 0xc03);
>> -}
>> -
>>   /* Disable vt-d protected memory registers. */
>>   void disable_pmr(struct iommu *iommu)
>>   {
>> --
>> 1.9.1
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-12  8:45       ` Jan Beulich
@ 2015-06-12  9:20         ` Chen, Tiejun
  2015-06-12  9:26           ` Jan Beulich
  2015-06-15  7:39           ` Chen, Tiejun
  0 siblings, 2 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-12  9:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/12 16:45, Jan Beulich wrote:
>>>> On 12.06.15 at 08:31, <tiejun.chen@intel.com> wrote:
>> On 2015/6/11 17:28, Tian, Kevin wrote:
>>>> From: Chen, Tiejun
>>>> Sent: Thursday, June 11, 2015 9:15 AM
>>>> @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev
>>>> *pdev)
>>>>                 PCI_DEVFN2(bdf) != devfn )
>>>>                continue;
>>>>
>>>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>>>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>>>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>>
>>> ditto
>>
>> It doesn't matter when we're trying to remove a device since we don't
>> care this flag.
>
> In such a case it helps to add a brief comment saying that the precise
> value passed is irrelevant. Or maybe this could be expressed by

Okay.

> folding this and the "map" parameters of the function (in which case it
> might become self-documenting)?
>

Sorry, I don't know exactly how to implement this idea. Have we any 
similar example on Xen side?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-12  9:20         ` Chen, Tiejun
@ 2015-06-12  9:26           ` Jan Beulich
  2015-06-15  7:39           ` Chen, Tiejun
  1 sibling, 0 replies; 114+ messages in thread
From: Jan Beulich @ 2015-06-12  9:26 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 12.06.15 at 11:20, <tiejun.chen@intel.com> wrote:
> On 2015/6/12 16:45, Jan Beulich wrote:
>>>>> On 12.06.15 at 08:31, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/11 17:28, Tian, Kevin wrote:
>>>>> From: Chen, Tiejun
>>>>> Sent: Thursday, June 11, 2015 9:15 AM
>>>>> @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct 
> pci_dev
>>>>> *pdev)
>>>>>                 PCI_DEVFN2(bdf) != devfn )
>>>>>                continue;
>>>>>
>>>>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>>>>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>>>>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>>>
>>>> ditto
>>>
>>> It doesn't matter when we're trying to remove a device since we don't
>>> care this flag.
>>
>> In such a case it helps to add a brief comment saying that the precise
>> value passed is irrelevant. Or maybe this could be expressed by
> 
> Okay.
> 
>> folding this and the "map" parameters of the function (in which case it
>> might become self-documenting)?
>>
> 
> Sorry, I don't know exactly how to implement this idea. Have we any 
> similar example on Xen side?

No idea what you're after. What we have with your change are
tuples like
(map, relaxed)
(map, strict)
(unmap, <ignored>)
Clearly these can be represented with three distinct numbers. I.e.
along with using XEN_DOMCTL_DEV_RDM_* without the "map"
boolean, (unmap, <ignored>) could e.g. be expressed by passing
-1.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-11  1:15 ` [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
  2015-06-11 10:02   ` Tian, Kevin
@ 2015-06-12 15:43   ` Wei Liu
  2015-06-15  1:12     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-12 15:43 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Thu, Jun 11, 2015 at 09:15:19AM +0800, Tiejun Chen wrote:
[...]
>  
> -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
> +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
> +                    int *flag)

This is unrelated change. It should be moved to appropriate patch.

Wei.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy
  2015-06-11  1:15 ` [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
@ 2015-06-12 16:02   ` Wei Liu
  2015-06-15  1:19     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-12 16:02 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Thu, Jun 11, 2015 at 09:15:20AM +0800, Tiejun Chen wrote:
> This patch introduces user configurable parameters to specify RDM
> resource and according policies,
> 
> Global RDM parameter:
>     rdm = "type=none/host,reserve=strict/relaxed"
> Per-device RDM parameter:
>     pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
> 
> Global RDM parameter, "type", allows user to specify reserved regions
> explicitly, e.g. using 'host' to include all reserved regions reported
> on this platform which is good to handle hotplug scenario. In the future
> this parameter may be further extended to allow specifying random regions,
> e.g. even those belonging to another platform as a preparation for live
> migration with passthrough devices. Instead, 'none' means we have nothing
> to do all reserved regions and ignore all policies, so guest work as before.
> 
> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
> regions in pfn space. If conflict exists, 'strict' means an immediate error
> so VM will be killed, while 'relaxed' allows moving forward with a warning
> message thrown out.
> 
> Default per-device RDM policy is 'strict', while default global RDM policy
> is 'relaxed'. When both policies are specified on a given region, 'strict' is
> always preferred.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
[...]
>  }
>  
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 23f27d4..4dfcaf7 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -73,6 +73,17 @@ libxl_domain_type = Enumeration("domain_type", [
>      (2, "PV"),
>      ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
>  
> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
> +    (0, "none"),
> +    (1, "host"),
> +    ], init_val = "LIBXL_RDM_RESERVE_TYPE_NONE")
> +

No need to define init_val if that value is 0.

Other than this minor nit this patch does what we've discussed before.


> + */
> +int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
>  
>  /*
>   * Vif rate parsing.
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index c858068..aedbd4b 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1920,6 +1920,14 @@ skip_vfb:
>          xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
>      }
>  
> +    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
> +        libxl_rdm_reserve rdm;
> +        if (!xlu_rdm_parse(config, &rdm, buf)) {
> +            b_info->rdm.type = rdm.type;
> +            b_info->rdm.reserve = rdm.reserve;
> +        }
> +    }
> +

You might want to consider breaking out changes to xl and libxlu to a
final patch.

My thought is that even if those changes don't break bisection (which
I'm not very sure at this point), they are useless. If you think it is
difficult or I'm talking non-sense, do let me know.

Wei.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy
  2015-06-11  1:15 ` [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy Tiejun Chen
@ 2015-06-12 16:17   ` Wei Liu
  2015-06-15  1:26     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-12 16:17 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Thu, Jun 11, 2015 at 09:15:21AM +0800, Tiejun Chen wrote:
> This patch passes our rdm reservation policy inside libxl
> when we assign a device or attach a device.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  docs/man/xl.pod.1         |  7 ++++++-
>  tools/libxl/libxl_pci.c   | 10 +++++++++-
>  tools/libxl/xl_cmdimpl.c  | 23 +++++++++++++++++++----
>  tools/libxl/xl_cmdtable.c |  2 +-
>  4 files changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
> index 4eb929d..c5c4809 100644
> --- a/docs/man/xl.pod.1
> +++ b/docs/man/xl.pod.1
> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>  usable by Domain 0 again.  If the device is not bound to pciback, it will
>  return success.
>  
> -=item B<pci-attach> I<domain-id> I<BDF>
> +=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
>  
>  Hot-plug a new pass-through pci device to the specified domain.
>  B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
> +B<rdm policy> is about how to handle conflict between reserving reserved device
> +memory and guest address space. "strict" means an unsolved conflict leads to
> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
> +message thrown out. Here "strict" is default.
> +
>  
>  =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
>  
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index a00d799..d2e8911 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -894,7 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>      FILE *f;
>      unsigned long long start, end, flags, size;
>      int irq, i, rc, hvm = 0;
> -    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> +    uint32_t flag;
>  
>      if (type == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
> @@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>  
>  out:
>      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> +        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
> +        } else {
> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unkwon rdm check flag.");

Typo "unkwon" and use LOG(ERROR,...).

> +            return ERROR_FAIL;
> +        }
>          rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
>          if (rc < 0 && (hvm || errno != ENOSYS)) {
>              LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index aedbd4b..4364ba4 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -3359,7 +3359,8 @@ int main_pcidetach(int argc, char **argv)
>      pcidetach(domid, bdf, force);
>      return 0;
>  }
> -static void pciattach(uint32_t domid, const char *bdf, const char *vs)
> +static void pciattach(uint32_t domid, const char *bdf, const char *vs,
> +                      uint32_t flag)
>  {
>      libxl_device_pci pcidev;
>      XLU_Config *config;
> @@ -3369,6 +3370,7 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
>      config = xlu_cfg_init(stderr, "command line");
>      if (!config) { perror("xlu_cfg_inig"); exit(-1); }
>  
> +    pcidev.rdm_reserve = flag;
>      if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
>          fprintf(stderr, "pci-attach: malformed BDF specification \"%s\"\n", bdf);
>          exit(2);
> @@ -3381,9 +3383,9 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
>  
>  int main_pciattach(int argc, char **argv)
>  {
> -    uint32_t domid;
> +    uint32_t domid, flag;
>      int opt;
> -    const char *bdf = NULL, *vs = NULL;
> +    const char *bdf = NULL, *vs = NULL, *rdm_policy = NULL;
>  
>      SWITCH_FOREACH_OPT(opt, "", NULL, "pci-attach", 2) {
>          /* No options */
> @@ -3395,7 +3397,20 @@ int main_pciattach(int argc, char **argv)
>      if (optind + 1 < argc)
>          vs = argv[optind + 2];
>  
> -    pciattach(domid, bdf, vs);
> +    if (optind + 2 < argc) {
> +        rdm_policy = argv[optind + 3];
> +    }
> +    if (!strcmp(rdm_policy, "strict")) {
> +        flag = LIBXL_RDM_RESERVE_FLAG_STRICT;
> +    } else if (!strcmp(rdm_policy, "relaxed")) {
> +        flag = LIBXL_RDM_RESERVE_FLAG_RELAXED;
> +    } else {
> +        fprintf(stderr, "%s is an invalid rdm policy: 'strict'|'relaxed'\n",
> +                rdm_policy);
> +        exit(2);
> +    }
> +
> +    pciattach(domid, bdf, vs, flag);
>      return 0;
>  }
>  
> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> index 7f4759b..552fbec 100644
> --- a/tools/libxl/xl_cmdtable.c
> +++ b/tools/libxl/xl_cmdtable.c
> @@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
>      { "pci-attach",
>        &main_pciattach, 0, 1,
>        "Insert a new pass-through pci device",
> -      "<Domain> <BDF> [Virtual Slot]",
> +      "<Domain> <BDF> [Virtual Slot] <policy to reserve rdm['strice'|'relaxed']>",

Should use "[]" to indicate it's optional.

Wei.

>      },
>      { "pci-detach",
>        &main_pcidetach, 0, 1,
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-11  1:15 ` [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
  2015-06-11 10:19   ` Tian, Kevin
@ 2015-06-12 16:39   ` Wei Liu
  2015-06-15  1:50     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-12 16:39 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Thu, Jun 11, 2015 at 09:15:22AM +0800, Tiejun Chen wrote:
[...]
> +++ b/tools/libxc/xc_hvm_build_x86.c
> @@ -21,6 +21,7 @@
>  #include <stdlib.h>
>  #include <unistd.h>
>  #include <zlib.h>
> +#include <assert.h>
>  
>  #include "xg_private.h"
>  #include "xc_private.h"
> @@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch,
>  
>      elf_parse_binary(&elf);
>      v_start = 0;
> -    v_end = args->mem_size;
> +    v_end = args->lowmem_end;

Why is this needed?
>  
>      if ( nr_pages > target_pages )
>          memflags |= XENMEMF_populate_on_demand;
> @@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>      args.mem_size = (uint64_t)memsize << 20;
>      args.mem_target = (uint64_t)target << 20;
>      args.image_file_name = image_name;
[...]
>  
> +static struct xen_reserved_device_memory
> +*xc_device_get_rdm(libxl__gc *gc,
> +                   uint32_t flag,
> +                   uint16_t seg,
> +                   uint8_t bus,
> +                   uint8_t devfn,
> +                   unsigned int *nr_entries)
> +{
> +    struct xen_reserved_device_memory *xrdm;
> +    int rc;
> +
> +    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                       NULL, nr_entries);

xc_reserved_device_memory_map dereferences nr_entries. You need to make
sure there is no garbage value in nr_entries. I.e. you need to
initialise nr_entries to 0 before passing it in.

> +    assert(rc <= 0);
> +    /* "0" means we have no any rdm entry. */
> +    if (!rc)
> +        goto out;
> +
> +    if (errno == ENOBUFS) {
> +        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
> +        if (!xrdm) {
> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
> +            goto out;
> +        }
> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> +                                           xrdm, nr_entries);
> +        if (rc) {
> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
> +            *nr_entries = 0;
> +            free(xrdm);
> +            xrdm = NULL;
> +        }
> +    } else
> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
> +
> + out:
> +    return xrdm;
> +}
> +
> +/*
> + * Check whether there exists rdm hole in the specified memory range.
> + * Returns true if exists, else returns false.
> + */
> +static bool overlaps_rdm(uint64_t start, uint64_t memsize,
> +                         uint64_t rdm_start, uint64_t rdm_size)
> +{
> +    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
> +}
> +
> +/*
> + * Check reported RDM regions and handle potential gfn conflicts according
> + * to user preferred policy.
> + *
> + * RMRR can reside in address space beyond 4G theoretically, but we never
> + * see this in real world. So in order to avoid breaking highmem layout
> + * we don't solve highmem conflict. Note this means highmem rmrr could still
> + * be supported if no conflict.
> + *
> + * But in the case of lowmem, RMRR probably scatter the whole RAM space.
> + * Especially multiple RMRR entries would worsen this to lead a complicated
> + * memory layout. And then its hard to extend hvm_info_table{} to work
> + * hvmloader out. So here we're trying to figure out a simple solution to
> + * avoid breaking existing layout. So when a conflict occurs,
> + *
> + * #1. Above a predefined boundary (default 2G)
> + * - Move lowmem_end below reserved region to solve conflict;
> + *
> + * #2. Below a predefined boundary (default 2G)
> + * - Check strict/relaxed policy.
> + * "strict" policy leads to fail libxl. Note when both policies
> + * are specified on a given region, 'strict' is always preferred.
> + * "relaxed" policy issue a warning message and also mask this entry
> + * INVALID to indicate we shouldn't expose this entry to hvmloader.
> + */

This looks sensible. Thanks.

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 4364ba4..85d74fd 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
>      if (!xlu_cfg_get_long (config, "videoram", &l, 0))
>          b_info->video_memkb = l * 1024;
>  
> +    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
> +        b_info->rdm_mem_boundary_memkb = l * 1024;
> +

Maybe you need to rearrange this patch series a bit more. The toolstack
side patches have mixed libxc, libxl and xl changes which is a bit hard
for me to tell what is needed by what. We can discuss this if you have
questions.

Wei.

>      if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
>          b_info->event_channels = l;
>  
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map
  2015-06-11  1:15 ` [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
@ 2015-06-12 16:43   ` Wei Liu
  2015-06-15  2:15     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-12 16:43 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, wei.liu2, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Thu, Jun 11, 2015 at 09:15:23AM +0800, Tiejun Chen wrote:
> Here we'll construct a basic guest e820 table via
> XENMEM_set_memory_map. This table includes lowmem, highmem
> and RDMs if they exist. And hvmloader would need this info
> later.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  tools/libxl/libxl_dom.c      |  5 +++
>  tools/libxl/libxl_internal.h | 24 ++++++++++++++
>  tools/libxl/libxl_x86.c      | 78 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 107 insertions(+)
> 
> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
> index 1777b32..3125ac0 100644
> --- a/tools/libxl/libxl_dom.c
> +++ b/tools/libxl/libxl_dom.c
> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>          goto out;
>      }
>  
> +    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
> +        LOG(ERROR, "setting domain memory map failed");
> +        goto out;
> +    }
> +
>      ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
>                                 &state->store_mfn, state->console_port,
>                                 &state->console_mfn, state->store_domid,
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 52f3831..d838639 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -3713,6 +3713,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
>   */
>  void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
>                                      const libxl_bitmap *sptr);
> +
> +/*
> + * Here we're just trying to set these kinds of e820 mappings:
> + *
> + * #1. Low memory region
> + *
> + * Low RAM starts at least from 1M to make sure all standard regions
> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> + * have enough space.
> + * Note: Those stuffs below 1M are still constructed with multiple
> + * e820 entries by hvmloader. At this point we don't change anything.
> + *
> + * #2. RDM region if it exists
> + *
> + * #3. High memory region if it exists
> + *
> + * Note: these regions are not overlapping since we already check
> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
> + */
> +int libxl__domain_construct_e820(libxl__gc *gc,
> +                                 libxl_domain_config *d_config,
> +                                 uint32_t domid,
> +                                 struct xc_hvm_build_args *args);
> +
>  #endif
>  
>  /*
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index ed2bd38..291f6ab 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -438,6 +438,84 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
>  }
>  
>  /*
> + * Here we're just trying to set these kinds of e820 mappings:
> + *
> + * #1. Low memory region
> + *
> + * Low RAM starts at least from 1M to make sure all standard regions
> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
> + * have enough space.
> + * Note: Those stuffs below 1M are still constructed with multiple
> + * e820 entries by hvmloader. At this point we don't change anything.
> + *
> + * #2. RDM region if it exists
> + *
> + * #3. High memory region if it exists
> + *
> + * Note: these regions are not overlapping since we already check
> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
> + */
> +#define GUEST_LOW_MEM_START_DEFAULT 0x100000
> +int libxl__domain_construct_e820(libxl__gc *gc,
> +                                 libxl_domain_config *d_config,
> +                                 uint32_t domid,
> +                                 struct xc_hvm_build_args *args)
> +{
> +    unsigned int nr = 0, i;
> +    /* We always own at least one lowmem entry. */
> +    unsigned int e820_entries = 1;
> +    struct e820entry *e820 = NULL;
> +    uint64_t highmem_size =
> +                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
> +
> +    /* Add all rdm entries. */
> +    for (i = 0; i < d_config->num_rdms; i++)
> +        if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
> +            e820_entries++;
> +
> +
> +    /* If we should have a highmem range. */
> +    if (highmem_size)
> +        e820_entries++;
> +
> +    if (e820_entries >= E820MAX) {
> +        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
> +        return -1;

Please use goto style error handling.

> +    }
> +
> +    e820 = libxl__malloc(NOGC, sizeof(struct e820entry) * e820_entries);
> +

You should use libxl__malloc(gc,) here.

> +    /* Low memory */
> +    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
> +    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
> +    e820[nr].type = E820_RAM;
> +    nr++;
> +
> +    /* RDM mapping */
> +    for (i = 0; i < d_config->num_rdms; i++) {
> +        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
> +            continue;
> +
> +        e820[nr].addr = d_config->rdms[i].start;
> +        e820[nr].size = d_config->rdms[i].size;
> +        e820[nr].type = E820_RESERVED;
> +        nr++;
> +    }
> +
> +    /* High memory */
> +    if (highmem_size) {
> +        e820[nr].addr = ((uint64_t)1 << 32);
> +        e820[nr].size = highmem_size;
> +        e820[nr].type = E820_RAM;
> +    }
> +
> +    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0)
> +        return -1;
> +
> +    return 0;
> +}
> +
> +/*
>   * Local variables:
>   * mode: C
>   * c-basic-offset: 4
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-12 15:43   ` Wei Liu
@ 2015-06-15  1:12     ` Chen, Tiejun
  2015-06-15 14:58       ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-15  1:12 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/12 23:43, Wei Liu wrote:
> On Thu, Jun 11, 2015 at 09:15:19AM +0800, Tiejun Chen wrote:
> [...]
>>
>> -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
>> +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
>> +                    int *flag)
>
> This is unrelated change. It should be moved to appropriate patch.
>

This is in the tools/python/xen/lowlevel/xc/xc.c file,

pyxc_assign_device
     |
     + while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
	|
	+ ...
	|
	+ xc_assign_device()

this really should be related to extend xc_assign_device().

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy
  2015-06-12 16:02   ` Wei Liu
@ 2015-06-15  1:19     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-15  1:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/13 0:02, Wei Liu wrote:
> On Thu, Jun 11, 2015 at 09:15:20AM +0800, Tiejun Chen wrote:
>> This patch introduces user configurable parameters to specify RDM
>> resource and according policies,
>>
>> Global RDM parameter:
>>      rdm = "type=none/host,reserve=strict/relaxed"
>> Per-device RDM parameter:
>>      pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]
>>
>> Global RDM parameter, "type", allows user to specify reserved regions
>> explicitly, e.g. using 'host' to include all reserved regions reported
>> on this platform which is good to handle hotplug scenario. In the future
>> this parameter may be further extended to allow specifying random regions,
>> e.g. even those belonging to another platform as a preparation for live
>> migration with passthrough devices. Instead, 'none' means we have nothing
>> to do all reserved regions and ignore all policies, so guest work as before.
>>
>> 'strict/relaxed' policy decides how to handle conflict when reserving RDM
>> regions in pfn space. If conflict exists, 'strict' means an immediate error
>> so VM will be killed, while 'relaxed' allows moving forward with a warning
>> message thrown out.
>>
>> Default per-device RDM policy is 'strict', while default global RDM policy
>> is 'relaxed'. When both policies are specified on a given region, 'strict' is
>> always preferred.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
> [...]
>>   }
>>
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index 23f27d4..4dfcaf7 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -73,6 +73,17 @@ libxl_domain_type = Enumeration("domain_type", [
>>       (2, "PV"),
>>       ], init_val = "LIBXL_DOMAIN_TYPE_INVALID")
>>
>> +libxl_rdm_reserve_type = Enumeration("rdm_reserve_type", [
>> +    (0, "none"),
>> +    (1, "host"),
>> +    ], init_val = "LIBXL_RDM_RESERVE_TYPE_NONE")
>> +
>
> No need to define init_val if that value is 0.

Okay.

>
> Other than this minor nit this patch does what we've discussed before.
>
>
>> + */
>> +int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str);
>>
>>   /*
>>    * Vif rate parsing.
>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index c858068..aedbd4b 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>> @@ -1920,6 +1920,14 @@ skip_vfb:
>>           xlu_cfg_get_defbool(config, "e820_host", &b_info->u.pv.e820_host, 0);
>>       }
>>
>> +    if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
>> +        libxl_rdm_reserve rdm;
>> +        if (!xlu_rdm_parse(config, &rdm, buf)) {
>> +            b_info->rdm.type = rdm.type;
>> +            b_info->rdm.reserve = rdm.reserve;
>> +        }
>> +    }
>> +
>
> You might want to consider breaking out changes to xl and libxlu to a
> final patch.
>
> My thought is that even if those changes don't break bisection (which
> I'm not very sure at this point), they are useless. If you think it is
> difficult or I'm talking non-sense, do let me know.
>

Its a little difficult but just let me try to split again if possible.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy
  2015-06-12 16:17   ` Wei Liu
@ 2015-06-15  1:26     ` Chen, Tiejun
  2015-06-15 15:00       ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-15  1:26 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/13 0:17, Wei Liu wrote:
> On Thu, Jun 11, 2015 at 09:15:21AM +0800, Tiejun Chen wrote:
>> This patch passes our rdm reservation policy inside libxl
>> when we assign a device or attach a device.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   docs/man/xl.pod.1         |  7 ++++++-
>>   tools/libxl/libxl_pci.c   | 10 +++++++++-
>>   tools/libxl/xl_cmdimpl.c  | 23 +++++++++++++++++++----
>>   tools/libxl/xl_cmdtable.c |  2 +-
>>   4 files changed, 35 insertions(+), 7 deletions(-)
>>
>> diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
>> index 4eb929d..c5c4809 100644
>> --- a/docs/man/xl.pod.1
>> +++ b/docs/man/xl.pod.1
>> @@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its original driver, making it
>>   usable by Domain 0 again.  If the device is not bound to pciback, it will
>>   return success.
>>
>> -=item B<pci-attach> I<domain-id> I<BDF>
>> +=item B<pci-attach> I<domain-id> I<BDF> [I<rdm>]
>>
>>   Hot-plug a new pass-through pci device to the specified domain.
>>   B<BDF> is the PCI Bus/Device/Function of the physical device to pass-through.
>> +B<rdm policy> is about how to handle conflict between reserving reserved device
>> +memory and guest address space. "strict" means an unsolved conflict leads to
>> +immediate VM crash, while "relaxed" allows VM moving forward with a warning
>> +message thrown out. Here "strict" is default.
>> +
>>
>>   =item B<pci-detach> [I<-f>] I<domain-id> I<BDF>
>>
>> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
>> index a00d799..d2e8911 100644
>> --- a/tools/libxl/libxl_pci.c
>> +++ b/tools/libxl/libxl_pci.c
>> @@ -894,7 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>>       FILE *f;
>>       unsigned long long start, end, flags, size;
>>       int irq, i, rc, hvm = 0;
>> -    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>> +    uint32_t flag;
>>
>>       if (type == LIBXL_DOMAIN_TYPE_INVALID)
>>           return ERROR_FAIL;
>> @@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i
>>
>>   out:
>>       if (!libxl_is_stubdom(ctx, domid, NULL)) {
>> +        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
>> +            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>> +        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
>> +            flag = XEN_DOMCTL_DEV_RDM_STRICT;
>> +        } else {
>> +            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unkwon rdm check flag.");
>
> Typo "unkwon" and use LOG(ERROR,...).

Will fix that typo, s/unkwon/unknown, but are you sure we should use 
LOG() here? Because this function always uses LIBXL__LOG_ERRNO(),

>
>> +            return ERROR_FAIL;
>> +        }
>>           rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
>>           if (rc < 0 && (hvm || errno != ENOSYS)) {
>>               LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");

like here.

>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index aedbd4b..4364ba4 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>> @@ -3359,7 +3359,8 @@ int main_pcidetach(int argc, char **argv)
>>       pcidetach(domid, bdf, force);
>>       return 0;
>>   }
>> -static void pciattach(uint32_t domid, const char *bdf, const char *vs)
>> +static void pciattach(uint32_t domid, const char *bdf, const char *vs,
>> +                      uint32_t flag)
>>   {
>>       libxl_device_pci pcidev;
>>       XLU_Config *config;
>> @@ -3369,6 +3370,7 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
>>       config = xlu_cfg_init(stderr, "command line");
>>       if (!config) { perror("xlu_cfg_inig"); exit(-1); }
>>
>> +    pcidev.rdm_reserve = flag;
>>       if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
>>           fprintf(stderr, "pci-attach: malformed BDF specification \"%s\"\n", bdf);
>>           exit(2);
>> @@ -3381,9 +3383,9 @@ static void pciattach(uint32_t domid, const char *bdf, const char *vs)
>>
>>   int main_pciattach(int argc, char **argv)
>>   {
>> -    uint32_t domid;
>> +    uint32_t domid, flag;
>>       int opt;
>> -    const char *bdf = NULL, *vs = NULL;
>> +    const char *bdf = NULL, *vs = NULL, *rdm_policy = NULL;
>>
>>       SWITCH_FOREACH_OPT(opt, "", NULL, "pci-attach", 2) {
>>           /* No options */
>> @@ -3395,7 +3397,20 @@ int main_pciattach(int argc, char **argv)
>>       if (optind + 1 < argc)
>>           vs = argv[optind + 2];
>>
>> -    pciattach(domid, bdf, vs);
>> +    if (optind + 2 < argc) {
>> +        rdm_policy = argv[optind + 3];
>> +    }
>> +    if (!strcmp(rdm_policy, "strict")) {
>> +        flag = LIBXL_RDM_RESERVE_FLAG_STRICT;
>> +    } else if (!strcmp(rdm_policy, "relaxed")) {
>> +        flag = LIBXL_RDM_RESERVE_FLAG_RELAXED;
>> +    } else {
>> +        fprintf(stderr, "%s is an invalid rdm policy: 'strict'|'relaxed'\n",
>> +                rdm_policy);
>> +        exit(2);
>> +    }
>> +
>> +    pciattach(domid, bdf, vs, flag);
>>       return 0;
>>   }
>>
>> diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
>> index 7f4759b..552fbec 100644
>> --- a/tools/libxl/xl_cmdtable.c
>> +++ b/tools/libxl/xl_cmdtable.c
>> @@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
>>       { "pci-attach",
>>         &main_pciattach, 0, 1,
>>         "Insert a new pass-through pci device",
>> -      "<Domain> <BDF> [Virtual Slot]",
>> +      "<Domain> <BDF> [Virtual Slot] <policy to reserve rdm['strice'|'relaxed']>",
>
> Should use "[]" to indicate it's optional.
>

What about this?

"<Domain> <BDF> [Virtual Slot] [policy to reserve rdm<'strice'|'relaxed'>]",

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-12 16:39   ` Wei Liu
@ 2015-06-15  1:50     ` Chen, Tiejun
  2015-06-15 15:01       ` Wei Liu
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-15  1:50 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/13 0:39, Wei Liu wrote:
> On Thu, Jun 11, 2015 at 09:15:22AM +0800, Tiejun Chen wrote:
> [...]
>> +++ b/tools/libxc/xc_hvm_build_x86.c
>> @@ -21,6 +21,7 @@
>>   #include <stdlib.h>
>>   #include <unistd.h>
>>   #include <zlib.h>
>> +#include <assert.h>
>>
>>   #include "xg_private.h"
>>   #include "xc_private.h"
>> @@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch,
>>
>>       elf_parse_binary(&elf);
>>       v_start = 0;
>> -    v_end = args->mem_size;
>> +    v_end = args->lowmem_end;
>
> Why is this needed?

This was left to handle something inside modules_init(). But I think 
this change can be removed now.

>>
>>       if ( nr_pages > target_pages )
>>           memflags |= XENMEMF_populate_on_demand;
>> @@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
>>       args.mem_size = (uint64_t)memsize << 20;
>>       args.mem_target = (uint64_t)target << 20;
>>       args.image_file_name = image_name;
> [...]
>>
>> +static struct xen_reserved_device_memory
>> +*xc_device_get_rdm(libxl__gc *gc,
>> +                   uint32_t flag,
>> +                   uint16_t seg,
>> +                   uint8_t bus,
>> +                   uint8_t devfn,
>> +                   unsigned int *nr_entries)
>> +{
>> +    struct xen_reserved_device_memory *xrdm;
>> +    int rc;
>> +
>> +    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>> +                                       NULL, nr_entries);
>
> xc_reserved_device_memory_map dereferences nr_entries. You need to make
> sure there is no garbage value in nr_entries. I.e. you need to
> initialise nr_entries to 0 before passing it in.

Sure, so what about this?

/*
  * We really can't presume how many entries we can get in advance.
  */
if (*nr_entries)
     *nr_entries = 0;

>
>> +    assert(rc <= 0);
>> +    /* "0" means we have no any rdm entry. */
>> +    if (!rc)
>> +        goto out;
>> +
>> +    if (errno == ENOBUFS) {
>> +        xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
>> +        if (!xrdm) {
>> +            LOG(ERROR, "Could not allocate RDM buffer!\n");
>> +            goto out;
>> +        }
>> +        rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>> +                                           xrdm, nr_entries);
>> +        if (rc) {
>> +            LOG(ERROR, "Could not get reserved device memory maps.\n");
>> +            *nr_entries = 0;
>> +            free(xrdm);
>> +            xrdm = NULL;
>> +        }
>> +    } else
>> +        LOG(ERROR, "Could not get reserved device memory maps.\n");
>> +
>> + out:
>> +    return xrdm;
>> +}
>> +
>> +/*
>> + * Check whether there exists rdm hole in the specified memory range.
>> + * Returns true if exists, else returns false.
>> + */
>> +static bool overlaps_rdm(uint64_t start, uint64_t memsize,
>> +                         uint64_t rdm_start, uint64_t rdm_size)
>> +{
>> +    return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
>> +}
>> +
>> +/*
>> + * Check reported RDM regions and handle potential gfn conflicts according
>> + * to user preferred policy.
>> + *
>> + * RMRR can reside in address space beyond 4G theoretically, but we never
>> + * see this in real world. So in order to avoid breaking highmem layout
>> + * we don't solve highmem conflict. Note this means highmem rmrr could still
>> + * be supported if no conflict.
>> + *
>> + * But in the case of lowmem, RMRR probably scatter the whole RAM space.
>> + * Especially multiple RMRR entries would worsen this to lead a complicated
>> + * memory layout. And then its hard to extend hvm_info_table{} to work
>> + * hvmloader out. So here we're trying to figure out a simple solution to
>> + * avoid breaking existing layout. So when a conflict occurs,
>> + *
>> + * #1. Above a predefined boundary (default 2G)
>> + * - Move lowmem_end below reserved region to solve conflict;
>> + *
>> + * #2. Below a predefined boundary (default 2G)
>> + * - Check strict/relaxed policy.
>> + * "strict" policy leads to fail libxl. Note when both policies
>> + * are specified on a given region, 'strict' is always preferred.
>> + * "relaxed" policy issue a warning message and also mask this entry
>> + * INVALID to indicate we shouldn't expose this entry to hvmloader.
>> + */
>
> This looks sensible. Thanks.
>
>> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
>> index 4364ba4..85d74fd 100644
>> --- a/tools/libxl/xl_cmdimpl.c
>> +++ b/tools/libxl/xl_cmdimpl.c
>> @@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
>>       if (!xlu_cfg_get_long (config, "videoram", &l, 0))
>>           b_info->video_memkb = l * 1024;
>>
>> +    if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
>> +        b_info->rdm_mem_boundary_memkb = l * 1024;
>> +
>
> Maybe you need to rearrange this patch series a bit more. The toolstack
> side patches have mixed libxc, libxl and xl changes which is a bit hard
> for me to tell what is needed by what. We can discuss this if you have
> questions.

Okay, just let me try firstly.

Thanks
Tiejun

>
> Wei.
>
>>       if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
>>           b_info->event_channels = l;
>>
>> --
>> 1.9.1
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map
  2015-06-12 16:43   ` Wei Liu
@ 2015-06-15  2:15     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-15  2:15 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/13 0:43, Wei Liu wrote:
> On Thu, Jun 11, 2015 at 09:15:23AM +0800, Tiejun Chen wrote:
>> Here we'll construct a basic guest e820 table via
>> XENMEM_set_memory_map. This table includes lowmem, highmem
>> and RDMs if they exist. And hvmloader would need this info
>> later.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   tools/libxl/libxl_dom.c      |  5 +++
>>   tools/libxl/libxl_internal.h | 24 ++++++++++++++
>>   tools/libxl/libxl_x86.c      | 78 ++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 107 insertions(+)
>>
>> diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
>> index 1777b32..3125ac0 100644
>> --- a/tools/libxl/libxl_dom.c
>> +++ b/tools/libxl/libxl_dom.c
>> @@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
>>           goto out;
>>       }
>>
>> +    if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
>> +        LOG(ERROR, "setting domain memory map failed");
>> +        goto out;
>> +    }
>> +
>>       ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
>>                                  &state->store_mfn, state->console_port,
>>                                  &state->console_mfn, state->store_domid,
>> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
>> index 52f3831..d838639 100644
>> --- a/tools/libxl/libxl_internal.h
>> +++ b/tools/libxl/libxl_internal.h
>> @@ -3713,6 +3713,30 @@ static inline void libxl__update_config_vtpm(libxl__gc *gc,
>>    */
>>   void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
>>                                       const libxl_bitmap *sptr);
>> +
>> +/*
>> + * Here we're just trying to set these kinds of e820 mappings:
>> + *
>> + * #1. Low memory region
>> + *
>> + * Low RAM starts at least from 1M to make sure all standard regions
>> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
>> + * have enough space.
>> + * Note: Those stuffs below 1M are still constructed with multiple
>> + * e820 entries by hvmloader. At this point we don't change anything.
>> + *
>> + * #2. RDM region if it exists
>> + *
>> + * #3. High memory region if it exists
>> + *
>> + * Note: these regions are not overlapping since we already check
>> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
>> + */
>> +int libxl__domain_construct_e820(libxl__gc *gc,
>> +                                 libxl_domain_config *d_config,
>> +                                 uint32_t domid,
>> +                                 struct xc_hvm_build_args *args);
>> +
>>   #endif
>>
>>   /*
>> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
>> index ed2bd38..291f6ab 100644
>> --- a/tools/libxl/libxl_x86.c
>> +++ b/tools/libxl/libxl_x86.c
>> @@ -438,6 +438,84 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
>>   }
>>
>>   /*
>> + * Here we're just trying to set these kinds of e820 mappings:
>> + *
>> + * #1. Low memory region
>> + *
>> + * Low RAM starts at least from 1M to make sure all standard regions
>> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
>> + * have enough space.
>> + * Note: Those stuffs below 1M are still constructed with multiple
>> + * e820 entries by hvmloader. At this point we don't change anything.
>> + *
>> + * #2. RDM region if it exists
>> + *
>> + * #3. High memory region if it exists
>> + *
>> + * Note: these regions are not overlapping since we already check
>> + * to adjust them. Please refer to libxl__domain_device_construct_rdm().
>> + */
>> +#define GUEST_LOW_MEM_START_DEFAULT 0x100000
>> +int libxl__domain_construct_e820(libxl__gc *gc,
>> +                                 libxl_domain_config *d_config,
>> +                                 uint32_t domid,
>> +                                 struct xc_hvm_build_args *args)
>> +{
>> +    unsigned int nr = 0, i;
>> +    /* We always own at least one lowmem entry. */
>> +    unsigned int e820_entries = 1;
>> +    struct e820entry *e820 = NULL;
>> +    uint64_t highmem_size =
>> +                    args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
>> +
>> +    /* Add all rdm entries. */
>> +    for (i = 0; i < d_config->num_rdms; i++)
>> +        if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
>> +            e820_entries++;
>> +
>> +
>> +    /* If we should have a highmem range. */
>> +    if (highmem_size)
>> +        e820_entries++;
>> +
>> +    if (e820_entries >= E820MAX) {
>> +        LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
>> +        return -1;
>
> Please use goto style error handling.

Okay.

>
>> +    }
>> +
>> +    e820 = libxl__malloc(NOGC, sizeof(struct e820entry) * e820_entries);
>> +
>
> You should use libxl__malloc(gc,) here.

Okay.

Thanks
Tiejun

>
>> +    /* Low memory */
>> +    e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
>> +    e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
>> +    e820[nr].type = E820_RAM;
>> +    nr++;
>> +
>> +    /* RDM mapping */
>> +    for (i = 0; i < d_config->num_rdms; i++) {
>> +        if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
>> +            continue;
>> +
>> +        e820[nr].addr = d_config->rdms[i].start;
>> +        e820[nr].size = d_config->rdms[i].size;
>> +        e820[nr].type = E820_RESERVED;
>> +        nr++;
>> +    }
>> +
>> +    /* High memory */
>> +    if (highmem_size) {
>> +        e820[nr].addr = ((uint64_t)1 << 32);
>> +        e820[nr].size = highmem_size;
>> +        e820[nr].type = E820_RAM;
>> +    }
>> +
>> +    if (xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries) != 0)
>> +        return -1;
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>>    * Local variables:
>>    * mode: C
>>    * c-basic-offset: 4
>> --
>> 1.9.1
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-12  9:20         ` Chen, Tiejun
  2015-06-12  9:26           ` Jan Beulich
@ 2015-06-15  7:39           ` Chen, Tiejun
  1 sibling, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-15  7:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/12 17:20, Chen, Tiejun wrote:
> On 2015/6/12 16:45, Jan Beulich wrote:
>>>>> On 12.06.15 at 08:31, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/11 17:28, Tian, Kevin wrote:
>>>>> From: Chen, Tiejun
>>>>> Sent: Thursday, June 11, 2015 9:15 AM
>>>>> @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8
>>>>> devfn, struct pci_dev
>>>>> *pdev)
>>>>>                 PCI_DEVFN2(bdf) != devfn )
>>>>>                continue;
>>>>>
>>>>> -        rmrr_identity_mapping(pdev->domain, 0, rmrr);
>>>>> +        rmrr_identity_mapping(pdev->domain, 0, rmrr,
>>>>> +                              XEN_DOMCTL_DEV_RDM_RELAXED);
>>>>
>>>> ditto
>>>
>>> It doesn't matter when we're trying to remove a device since we don't
>>> care this flag.
>>
>> In such a case it helps to add a brief comment saying that the precise
>> value passed is irrelevant. Or maybe this could be expressed by
>
> Okay.

Just let me go this simple way now.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-15  1:12     ` Chen, Tiejun
@ 2015-06-15 14:58       ` Wei Liu
  2015-06-16  2:31         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-15 14:58 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, Wei Liu, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Mon, Jun 15, 2015 at 09:12:17AM +0800, Chen, Tiejun wrote:
> On 2015/6/12 23:43, Wei Liu wrote:
> >On Thu, Jun 11, 2015 at 09:15:19AM +0800, Tiejun Chen wrote:
> >[...]
> >>
> >>-static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
> >>+static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
> >>+                    int *flag)
> >
> >This is unrelated change. It should be moved to appropriate patch.
> >
> 
> This is in the tools/python/xen/lowlevel/xc/xc.c file,
> 
> pyxc_assign_device
>     |
>     + while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
> 	|
> 	+ ...
> 	|
> 	+ xc_assign_device()
> 
> this really should be related to extend xc_assign_device().
> 

Right, this is fine then.

Please write in the commit message that you also adjusted all language
bindings. Normally when I notice the code doesn't match the commit
message I become very skeptical about the code.

Wei.

> Thanks
> Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy
  2015-06-15  1:26     ` Chen, Tiejun
@ 2015-06-15 15:00       ` Wei Liu
  0 siblings, 0 replies; 114+ messages in thread
From: Wei Liu @ 2015-06-15 15:00 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, Wei Liu, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Mon, Jun 15, 2015 at 09:26:30AM +0800, Chen, Tiejun wrote:
[...]
> >>  out:
> >>      if (!libxl_is_stubdom(ctx, domid, NULL)) {
> >>+        if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
> >>+            flag = XEN_DOMCTL_DEV_RDM_RELAXED;
> >>+        } else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
> >>+            flag = XEN_DOMCTL_DEV_RDM_STRICT;
> >>+        } else {
> >>+            LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unkwon rdm check flag.");
> >
> >Typo "unkwon" and use LOG(ERROR,...).
> 
> Will fix that typo, s/unkwon/unknown, but are you sure we should use LOG()
> here? Because this function always uses LIBXL__LOG_ERRNO(),
> 

Yeah. Consistency is also a strong argument. I won't force my opinion on
you. Fixing the typo is good enough for me.

> >
> >>+            return ERROR_FAIL;
> >>+        }
> >>          rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
> >>          if (rc < 0 && (hvm || errno != ENOSYS)) {
> >>              LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
> 
[...]
> >>  }
> >>
> >>diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
> >>index 7f4759b..552fbec 100644
> >>--- a/tools/libxl/xl_cmdtable.c
> >>+++ b/tools/libxl/xl_cmdtable.c
> >>@@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
> >>      { "pci-attach",
> >>        &main_pciattach, 0, 1,
> >>        "Insert a new pass-through pci device",
> >>-      "<Domain> <BDF> [Virtual Slot]",
> >>+      "<Domain> <BDF> [Virtual Slot] <policy to reserve rdm['strice'|'relaxed']>",
> >
> >Should use "[]" to indicate it's optional.
> >
> 
> What about this?
> 
> "<Domain> <BDF> [Virtual Slot] [policy to reserve rdm<'strice'|'relaxed'>]",
> 

Fine by me.

Wei.

> Thanks
> Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-15  1:50     ` Chen, Tiejun
@ 2015-06-15 15:01       ` Wei Liu
  2015-06-16  1:44         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Wei Liu @ 2015-06-15 15:01 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: kevin.tian, Wei Liu, ian.campbell, andrew.cooper3, tim,
	xen-devel, stefano.stabellini, jbeulich, yang.z.zhang,
	Ian.Jackson

On Mon, Jun 15, 2015 at 09:50:49AM +0800, Chen, Tiejun wrote:
[...]
> >>+                   uint32_t flag,
> >>+                   uint16_t seg,
> >>+                   uint8_t bus,
> >>+                   uint8_t devfn,
> >>+                   unsigned int *nr_entries)
> >>+{
> >>+    struct xen_reserved_device_memory *xrdm;
> >>+    int rc;
> >>+
> >>+    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
> >>+                                       NULL, nr_entries);
> >
> >xc_reserved_device_memory_map dereferences nr_entries. You need to make
> >sure there is no garbage value in nr_entries. I.e. you need to
> >initialise nr_entries to 0 before passing it in.
> 
> Sure, so what about this?
> 
> /*
>  * We really can't presume how many entries we can get in advance.
>  */
> if (*nr_entries)
>     *nr_entries = 0;
> 

You might just unconditionally set *nr_entries to 0.

Wei.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
  2015-06-15 15:01       ` Wei Liu
@ 2015-06-16  1:44         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-16  1:44 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/15 23:01, Wei Liu wrote:
> On Mon, Jun 15, 2015 at 09:50:49AM +0800, Chen, Tiejun wrote:
> [...]
>>>> +                   uint32_t flag,
>>>> +                   uint16_t seg,
>>>> +                   uint8_t bus,
>>>> +                   uint8_t devfn,
>>>> +                   unsigned int *nr_entries)
>>>> +{
>>>> +    struct xen_reserved_device_memory *xrdm;
>>>> +    int rc;
>>>> +
>>>> +    rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
>>>> +                                       NULL, nr_entries);
>>>
>>> xc_reserved_device_memory_map dereferences nr_entries. You need to make
>>> sure there is no garbage value in nr_entries. I.e. you need to
>>> initialise nr_entries to 0 before passing it in.
>>
>> Sure, so what about this?
>>
>> /*
>>   * We really can't presume how many entries we can get in advance.
>>   */
>> if (*nr_entries)
>>      *nr_entries = 0;
>>
>
> You might just unconditionally set *nr_entries to 0.
>

Okay.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-12  8:25     ` Chen, Tiejun
@ 2015-06-16  2:28       ` Tian, Kevin
  0 siblings, 0 replies; 114+ messages in thread
From: Tian, Kevin @ 2015-06-16  2:28 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Friday, June 12, 2015 4:25 PM
> 
> >> @@ -1792,6 +1794,8 @@ int xc_assign_dt_device(
> >>
> >>       domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
> >>       domctl.u.assign_device.u.dt.size = size;
> >> +    /* DT doesn't own any RDM. */
> >> +    domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
> >
> > still not clear about this NO_RDM flag. If a device-tree device doesn't
> > own RDM, the hypervisor will know it. Why do we require toolstack
> > to tell hypervisor not use it?
> 
> I think an explicit flag can make this sort of case identified, right?
> And other flags brings easily some confusions, or even a potential risk
> in the future.
> 

Thanks. That explanation is OK to me.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-12  6:31     ` Chen, Tiejun
  2015-06-12  8:45       ` Jan Beulich
@ 2015-06-16  2:30       ` Tian, Kevin
  1 sibling, 0 replies; 114+ messages in thread
From: Tian, Kevin @ 2015-06-16  2:30 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Friday, June 12, 2015 2:31 PM
> 
> >> @@ -1899,7 +1900,8 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev
> >> *pdev)
> >>                PCI_BUS(bdf) == pdev->bus &&
> >>                PCI_DEVFN2(bdf) == devfn )
> >>           {
> >> -            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
> >> +            ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
> >> +                                        XEN_DOMCTL_DEV_RDM_RELAXED);
> >
> > Why did you hardcode relax policy here? Shouldn't the policy come
> > from hypercall flag?
> 
> I just saw we have one path to use intel_iommu_add_device(),
> 
> pci_add_device()
>      |
>      + if ( !pdev->domain )
>        {
>          pdev->domain = hardware_domain;
>          ret = iommu_add_device(pdev);
> 	    |
> 	    + hd->platform_ops->add_device()
> 		|
> 		+ intel_iommu_add_device()
> 
> So I think intel_iommu_add_device() is used to add a device to
> hardware_domain. And in our case hardware_domain should be special as I
> explained above.

Then please add a clear comment in such case.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
  2015-06-15 14:58       ` Wei Liu
@ 2015-06-16  2:31         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-16  2:31 UTC (permalink / raw)
  To: Wei Liu
  Cc: kevin.tian, ian.campbell, andrew.cooper3, tim, xen-devel,
	stefano.stabellini, jbeulich, yang.z.zhang, Ian.Jackson

On 2015/6/15 22:58, Wei Liu wrote:
> On Mon, Jun 15, 2015 at 09:12:17AM +0800, Chen, Tiejun wrote:
>> On 2015/6/12 23:43, Wei Liu wrote:
>>> On Thu, Jun 11, 2015 at 09:15:19AM +0800, Tiejun Chen wrote:
>>> [...]
>>>>
>>>> -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func)
>>>> +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func,
>>>> +                    int *flag)
>>>
>>> This is unrelated change. It should be moved to appropriate patch.
>>>
>>
>> This is in the tools/python/xen/lowlevel/xc/xc.c file,
>>
>> pyxc_assign_device
>>      |
>>      + while ( next_bdf(&pci_str, &seg, &bus, &dev, &func, &flag) )
>> 	|
>> 	+ ...
>> 	|
>> 	+ xc_assign_device()
>>
>> this really should be related to extend xc_assign_device().
>>
>
> Right, this is fine then.
>
> Please write in the commit message that you also adjusted all language
> bindings. Normally when I notice the code doesn't match the commit

I want to add this line into the patch head description,

"Note this also bring some fallout to python usage of xc_assign_device()."

> message I become very skeptical about the code.
>

Sorry for this confusion.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-12  7:53     ` Chen, Tiejun
@ 2015-06-16  5:47       ` Tian, Kevin
  2015-06-16  9:29         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-16  5:47 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Friday, June 12, 2015 3:54 PM
> >
> >>           bar_data |= (uint32_t)base;
> >>           bar_data_upper = (uint32_t)(base >> 32);
> >> +        for ( j = 0; j < memory_map.nr_map ; j++ )
> >> +        {
> >> +            if ( memory_map.map[j].type != E820_RAM )
> >> +            {
> >> +                reserved_end = memory_map.map[j].addr +
> >> memory_map.map[j].size;
> >> +                if ( check_overlap(base, bar_sz,
> >> +                                   memory_map.map[j].addr,
> >> +                                   memory_map.map[j].size) )
> >> +                {
> >> +                    base = (reserved_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> >> +                    goto reallocate_mmio;
> 
> That is because our previous implementation is just skipping that
> conflict region,
> 
> "But you do nothing to make sure the MMIO regions all fit in the
> available window (see the code ahead of this relocating RAM if
> necessary)." and "...it simply skips assigning resources. Your changes
> potentially growing the space needed to fit all MMIO BARs therefore also
> needs to adjust the up front calculation, such that if necessary more
> RAM can be relocated to make the hole large enough."
> 
> And then I replied as follows,
> 
> "You're right.
> 
> Just think about we're always trying to check pci_mem_start to populate
> more RAM to obtain enough PCI mempry,
> 
>      /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
>      while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>      {
>          struct xen_add_to_physmap xatp;
>          unsigned int nr_pages = min_t(
>              unsigned int,
>              hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
>              (1u << 16) - 1);
>          if ( hvm_info->high_mem_pgend == 0 )
>              hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
>          hvm_info->low_mem_pgend -= nr_pages;
>          printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
>                 " for lowmem MMIO hole\n",
>                 nr_pages,
>                 PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
> 
> PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
>          xatp.domid = DOMID_SELF;
>          xatp.space = XENMAPSPACE_gmfn_range;
>          xatp.idx   = hvm_info->low_mem_pgend;
>          xatp.gpfn  = hvm_info->high_mem_pgend;
>          xatp.size  = nr_pages;
>          if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>              BUG();
>          hvm_info->high_mem_pgend += nr_pages;
>      }
> "
> 
> I hope this can help you understand this background. And I will update
> that code comment like this,
> 
>      /*
>       * We'll skip all space overlapping with reserved memory later,
>       * so we need to decrease pci_mem_start to populate more RAM
>       * to compensate them.
>       */
> 

Jan's comment is correct. However I don't think adjusting pci_mem_start
is the right way here (even now I don't quite understand how it's adjusted
in your earlier code). There are other limitations on that value. We can simply 
adjust mmio_total to include conflicting reserved ranges, so more bars will
be moved to high_mem_resource automatically by below code:

 380         using_64bar = bars[i].is_64bar && bar64_relocate
 381             && (mmio_total > (mem_resource.max - mem_resource.base));
 382         bar_data = pci_readl(devfn, bar_reg);
 383 
 384         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
 385              PCI_BASE_ADDRESS_SPACE_MEMORY )
 386         {
 387             /* Mapping high memory if PCI device is 64 bits bar */
 388             if ( using_64bar ) {
 389                 if ( high_mem_resource.base & (bar_sz - 1) )
 390                     high_mem_resource.base = high_mem_resource.base - 
 391                         (high_mem_resource.base & (bar_sz - 1)) + bar_sz;
 392                 if ( !pci_hi_mem_start )
 393                     pci_hi_mem_start = high_mem_resource.base;
 394                 resource = &high_mem_resource;
 395                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
 396             } 
 397             else {
 398                 resource = &mem_resource;
 399                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
 400             }
 401             mmio_total -= bar_sz;
 402         }

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table
  2015-06-12  8:19     ` Chen, Tiejun
@ 2015-06-16  5:54       ` Tian, Kevin
  0 siblings, 0 replies; 114+ messages in thread
From: Tian, Kevin @ 2015-06-16  5:54 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Friday, June 12, 2015 4:19 PM
> >
> >> +     *
> >> +     * #3. High memory region if it exists
> >> +     */
> >> +    for ( i = 0; i < memory_map.nr_map; i++ )
> >>       {
> >> -        e820[nr].addr = ((uint64_t)1 << 32);
> >> -        e820[nr].size =
> >> -            ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) -
> e820[nr].addr;
> >> -        e820[nr].type = E820_RAM;
> >> +        e820[nr] = memory_map.map[i];
> >>           nr++;
> >>       }
> >>
> >> +    /* Low RAM goes here. Reserve space for special pages. */
> >> +    BUG_ON(low_mem_pgend < (2u << 20));
> >> +    /*
> >> +     * We may need to adjust real lowmem end since we may
> >> +     * populate RAM to get enough MMIO previously.
> >> +     */
> >> +    for ( i = 0; i < memory_map.nr_map; i++ )
> >
> > since you already translate memory map into e820 earlier, here
> > you should use 'nr' instead of memory_map.nr_map.
> >
> 
> As we're saying in the code comment above, we're just handling the
> lowmem entry, so I think memory_map.nr_map is enough.
> 

OK

> >> +    {
> >> +        uint64_t end = e820[i].addr + e820[i].size;
> >> +        if ( e820[i].type == E820_RAM &&
> >> +             low_mem_pgend > e820[i].addr && low_mem_pgend < end )
> >> +            e820[i].size = low_mem_pgend - e820[i].addr;
> >> +    }
> >
> > Sorry I may miss the code but could you elaborate where the
> > low_mem_pgend is changed after memory map is created? If
> > it happens within hvmloader, suppose the amount of reduced
> > memory from original E820_RAM entry should be added to
> > another E820_RAM entry for highmem, right?
> 
> You're right so I really should compensate this in highmem entry,
> 
>      add_high_mem = end - low_mem_end;
> 
>      /*
>       * And then we also need to adjust highmem.
>       */
>      if ( add_high_mem )
>      {
>          for ( i = 0; i < memory_map.nr_map; i++ )
>          {
>              if ( e820[i].type == E820_RAM &&
>                   e820[i].addr > (1ull << 32))
>                  e820[i].size += add_high_mem;
>          }
>      }
> 
> 

Need to see more code in next version.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 15/16] xen/vtd: enable USB device assignment
  2015-06-12  8:59     ` Chen, Tiejun
@ 2015-06-16  5:58       ` Tian, Kevin
  2015-06-16  6:09         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tian, Kevin @ 2015-06-16  5:58 UTC (permalink / raw)
  To: Chen, Tiejun, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

> From: Chen, Tiejun
> Sent: Friday, June 12, 2015 5:00 PM
> 
> On 2015/6/11 18:22, Tian, Kevin wrote:
> >> From: Chen, Tiejun
> >> Sent: Thursday, June 11, 2015 9:15 AM
> >>
> >> Before we refine RMRR mechanism, USB RMRR may conflict with guest bios
> >> region so we always ignore USB RMRR.
> >
> > If USB RMRR conflicts with guest bios, the conflict is always there
> > before and after your refinement. :-)
> 
> Yeah :)
> 
> >
> >> Now this can be gone when we enable
> >> pci_force to check/reserve RMRR.
> 
> So what about this?
> 
> USB RMRR may conflict with guest bios region so we always ignore
> USB RMRR. But now this can be checked to handle after we introduce
> our policy mechanism.

USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we 
can handle this scenario cleanly with new policy mechanism so previous 
hack code can be removed now.

> 
> >>
> >> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> >
> > Acked-by: Kevin Tian <kevin.tian@intel.com> except one small comment below
> >
> >> ---
> >>   xen/drivers/passthrough/vtd/dmar.h  |  1 -
> >>   xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
> >>   xen/drivers/passthrough/vtd/utils.c |  7 -------
> >>   3 files changed, 2 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/xen/drivers/passthrough/vtd/dmar.h
> b/xen/drivers/passthrough/vtd/dmar.h
> >> index af1feef..af205f5 100644
> >> --- a/xen/drivers/passthrough/vtd/dmar.h
> >> +++ b/xen/drivers/passthrough/vtd/dmar.h
> >> @@ -129,7 +129,6 @@ do {                                                \
> >>
> >>   int vtd_hw_check(void);
> >>   void disable_pmr(struct iommu *iommu);
> >> -int is_usb_device(u16 seg, u8 bus, u8 devfn);
> >>   int is_igd_drhd(struct acpi_drhd_unit *drhd);
> >>
> >>   #endif /* _DMAR_H_ */
> >> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> >> b/xen/drivers/passthrough/vtd/iommu.c
> >> index d7c9e1c..d3233b8 100644
> >> --- a/xen/drivers/passthrough/vtd/iommu.c
> >> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> @@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
> >>       /*
> >>        * If the device belongs to the hardware domain, and it has RMRR, don't
> >>        * remove it from the hardware domain, because BIOS may use RMRR at
> >> -     * booting time. Also account for the special casing of USB below (in
> >> -     * intel_iommu_assign_device()).
> >> +     * booting time.
> >
> > this code is run-time right?
> 
> According to one associated commit,
> 
> commit 8b99f4400b695535153dcd5d949b3f63602ca8bf
> Author: Jan Beulich <jbeulich@suse.com>
> Date:   Fri Oct 10 10:54:21 2014 +0200
> 
>      VT-d: fix RMRR related error handling
> 
>      - reassign_device_ownership() now tears down RMRR mappings (for other
>        than Dom0)
>      - to facilitate that, rmrr_identity_mapping() now deals with both
>        establishing and tearing down of these mappings (the open coded
>        equivalent in intel_iommu_remove_device() is being replaced at once)
>      - intel_iommu_assign_device() now unrolls the assignment upon RMRR
>        mapping errors
>      - intel_iommu_add_device() now returns consistent values upon RMRR
>        mapping failures (was: failure when last iteration ran into a
>        problem, success otherwise)
>      - intel_iommu_remove_device() no longer special cases Dom0 (it only
>        ever gets called for devices removed from the _system_, not a domain)
>      - rmrr_identity_mapping() now returns a proper error indicator instead
>        of -1 when intel_iommu_map_page() failed
> 
>      Signed-off-by: Jan Beulich <jbeulich@suse.com>
>      Acked-by: Kevin Tian <kevin.tian@intel.com>
> 
> This chunk of codes resides inside intel_iommu_remove_device() so I
> think this shouldn't be for a running domain.
> 

sorry I thought you meant intel_iommu_assign_device()) only used at
booting time. Wrong catch on the patch format. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 15/16] xen/vtd: enable USB device assignment
  2015-06-16  5:58       ` Tian, Kevin
@ 2015-06-16  6:09         ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-16  6:09 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/16 13:58, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Friday, June 12, 2015 5:00 PM
>>
>> On 2015/6/11 18:22, Tian, Kevin wrote:
>>>> From: Chen, Tiejun
>>>> Sent: Thursday, June 11, 2015 9:15 AM
>>>>
>>>> Before we refine RMRR mechanism, USB RMRR may conflict with guest bios
>>>> region so we always ignore USB RMRR.
>>>
>>> If USB RMRR conflicts with guest bios, the conflict is always there
>>> before and after your refinement. :-)
>>
>> Yeah :)
>>
>>>
>>>> Now this can be gone when we enable
>>>> pci_force to check/reserve RMRR.
>>
>> So what about this?
>>
>> USB RMRR may conflict with guest bios region so we always ignore
>> USB RMRR. But now this can be checked to handle after we introduce
>> our policy mechanism.
>
> USB RMRR may conflict with guest BIOS region. In such case, identity
> mapping setup is simply skipped in previous implementation. Now we
> can handle this scenario cleanly with new policy mechanism so previous
> hack code can be removed now.

Will update.

Thanks
Tiejun

>
>>
>>>>
>>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>>
>>> Acked-by: Kevin Tian <kevin.tian@intel.com> except one small comment below
>>>
>>>> ---
>>>>    xen/drivers/passthrough/vtd/dmar.h  |  1 -
>>>>    xen/drivers/passthrough/vtd/iommu.c | 11 ++---------
>>>>    xen/drivers/passthrough/vtd/utils.c |  7 -------
>>>>    3 files changed, 2 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/xen/drivers/passthrough/vtd/dmar.h
>> b/xen/drivers/passthrough/vtd/dmar.h
>>>> index af1feef..af205f5 100644
>>>> --- a/xen/drivers/passthrough/vtd/dmar.h
>>>> +++ b/xen/drivers/passthrough/vtd/dmar.h
>>>> @@ -129,7 +129,6 @@ do {                                                \
>>>>
>>>>    int vtd_hw_check(void);
>>>>    void disable_pmr(struct iommu *iommu);
>>>> -int is_usb_device(u16 seg, u8 bus, u8 devfn);
>>>>    int is_igd_drhd(struct acpi_drhd_unit *drhd);
>>>>
>>>>    #endif /* _DMAR_H_ */
>>>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>>>> b/xen/drivers/passthrough/vtd/iommu.c
>>>> index d7c9e1c..d3233b8 100644
>>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>>> @@ -2229,11 +2229,9 @@ static int reassign_device_ownership(
>>>>        /*
>>>>         * If the device belongs to the hardware domain, and it has RMRR, don't
>>>>         * remove it from the hardware domain, because BIOS may use RMRR at
>>>> -     * booting time. Also account for the special casing of USB below (in
>>>> -     * intel_iommu_assign_device()).
>>>> +     * booting time.
>>>
>>> this code is run-time right?
>>
>> According to one associated commit,
>>
>> commit 8b99f4400b695535153dcd5d949b3f63602ca8bf
>> Author: Jan Beulich <jbeulich@suse.com>
>> Date:   Fri Oct 10 10:54:21 2014 +0200
>>
>>       VT-d: fix RMRR related error handling
>>
>>       - reassign_device_ownership() now tears down RMRR mappings (for other
>>         than Dom0)
>>       - to facilitate that, rmrr_identity_mapping() now deals with both
>>         establishing and tearing down of these mappings (the open coded
>>         equivalent in intel_iommu_remove_device() is being replaced at once)
>>       - intel_iommu_assign_device() now unrolls the assignment upon RMRR
>>         mapping errors
>>       - intel_iommu_add_device() now returns consistent values upon RMRR
>>         mapping failures (was: failure when last iteration ran into a
>>         problem, success otherwise)
>>       - intel_iommu_remove_device() no longer special cases Dom0 (it only
>>         ever gets called for devices removed from the _system_, not a domain)
>>       - rmrr_identity_mapping() now returns a proper error indicator instead
>>         of -1 when intel_iommu_map_page() failed
>>
>>       Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>       Acked-by: Kevin Tian <kevin.tian@intel.com>
>>
>> This chunk of codes resides inside intel_iommu_remove_device() so I
>> think this shouldn't be for a running domain.
>>
>
> sorry I thought you meant intel_iommu_assign_device()) only used at
> booting time. Wrong catch on the patch format. :-)
>
> Thanks
> Kevin
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-16  5:47       ` Tian, Kevin
@ 2015-06-16  9:29         ` Chen, Tiejun
  2015-06-16  9:40           ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-16  9:29 UTC (permalink / raw)
  To: Tian, Kevin, jbeulich, tim, andrew.cooper3, Zhang, Yang Z,
	wei.liu2, ian.campbell, Ian.Jackson, stefano.stabellini
  Cc: xen-devel

On 2015/6/16 13:47, Tian, Kevin wrote:
>> From: Chen, Tiejun
>> Sent: Friday, June 12, 2015 3:54 PM
>>>
>>>>            bar_data |= (uint32_t)base;
>>>>            bar_data_upper = (uint32_t)(base >> 32);
>>>> +        for ( j = 0; j < memory_map.nr_map ; j++ )
>>>> +        {
>>>> +            if ( memory_map.map[j].type != E820_RAM )
>>>> +            {
>>>> +                reserved_end = memory_map.map[j].addr +
>>>> memory_map.map[j].size;
>>>> +                if ( check_overlap(base, bar_sz,
>>>> +                                   memory_map.map[j].addr,
>>>> +                                   memory_map.map[j].size) )
>>>> +                {
>>>> +                    base = (reserved_end  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>>>> +                    goto reallocate_mmio;
>>
>> That is because our previous implementation is just skipping that
>> conflict region,
>>
>> "But you do nothing to make sure the MMIO regions all fit in the
>> available window (see the code ahead of this relocating RAM if
>> necessary)." and "...it simply skips assigning resources. Your changes
>> potentially growing the space needed to fit all MMIO BARs therefore also
>> needs to adjust the up front calculation, such that if necessary more
>> RAM can be relocated to make the hole large enough."
>>
>> And then I replied as follows,
>>
>> "You're right.
>>
>> Just think about we're always trying to check pci_mem_start to populate
>> more RAM to obtain enough PCI mempry,
>>
>>       /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
>>       while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>>       {
>>           struct xen_add_to_physmap xatp;
>>           unsigned int nr_pages = min_t(
>>               unsigned int,
>>               hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
>>               (1u << 16) - 1);
>>           if ( hvm_info->high_mem_pgend == 0 )
>>               hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
>>           hvm_info->low_mem_pgend -= nr_pages;
>>           printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
>>                  " for lowmem MMIO hole\n",
>>                  nr_pages,
>>                  PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
>>
>> PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
>>           xatp.domid = DOMID_SELF;
>>           xatp.space = XENMAPSPACE_gmfn_range;
>>           xatp.idx   = hvm_info->low_mem_pgend;
>>           xatp.gpfn  = hvm_info->high_mem_pgend;
>>           xatp.size  = nr_pages;
>>           if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>>               BUG();
>>           hvm_info->high_mem_pgend += nr_pages;
>>       }
>> "
>>
>> I hope this can help you understand this background. And I will update
>> that code comment like this,
>>
>>       /*
>>        * We'll skip all space overlapping with reserved memory later,
>>        * so we need to decrease pci_mem_start to populate more RAM
>>        * to compensate them.
>>        */
>>
>
> Jan's comment is correct. However I don't think adjusting pci_mem_start
> is the right way here (even now I don't quite understand how it's adjusted
> in your earlier code). There are other limitations on that value. We can simply
> adjust mmio_total to include conflicting reserved ranges, so more bars will

Agreed.

I'm trying to walk into this direction:

     /*
      * We'll skip all space overlapping with reserved memory later,
      * so we need to increase mmio_total to compensate them.
      */
     for ( j = 0; j < memory_map.nr_map ; j++ )
     {
         uint64_t conflict_size = 0;
         if ( memory_map.map[j].type != E820_RAM )
         {
             reserved_start = memory_map.map[j].addr;
             reserved_size = memory_map.map[j].size;
             reserved_end = reserved_start + reserved_size;
             if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
                                reserved_start, reserved_size) )
             {
                 /*
                  * Calculate how much mmio range conflict with
                  * reserved device memory.
                  */
                 conflict_size += reserved_size;

                 /*
                  * But we may need to subtract those sizes beyond the
                  * pci memory, [pci_mem_start, pci_mem_end].
                  */
                 if ( reserved_start < pci_mem_start )
                     conflict_size -= (pci_mem_start - reserved_start);
                 if ( reserved_end > pci_mem_end )
                     conflict_size -= (reserved_end - pci_mem_end);
             }
         }

         if ( conflict_size )
         {
             uint64_t conflict_size = max_t(
                     uint64_t, conflict_size, max_bar_sz);
             conflict_size &= ~(conflict_size - 1);
             mmio_total += conflict_size;
         }
     }


Note max_bar_sz just represents the most biggest bar size among all pci 
devices.

Thanks
Tiejun

> be moved to high_mem_resource automatically by below code:
>
>   380         using_64bar = bars[i].is_64bar && bar64_relocate
>   381             && (mmio_total > (mem_resource.max - mem_resource.base));
>   382         bar_data = pci_readl(devfn, bar_reg);
>   383
>   384         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
>   385              PCI_BASE_ADDRESS_SPACE_MEMORY )
>   386         {
>   387             /* Mapping high memory if PCI device is 64 bits bar */
>   388             if ( using_64bar ) {
>   389                 if ( high_mem_resource.base & (bar_sz - 1) )
>   390                     high_mem_resource.base = high_mem_resource.base -
>   391                         (high_mem_resource.base & (bar_sz - 1)) + bar_sz;
>   392                 if ( !pci_hi_mem_start )
>   393                     pci_hi_mem_start = high_mem_resource.base;
>   394                 resource = &high_mem_resource;
>   395                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
>   396             }
>   397             else {
>   398                 resource = &mem_resource;
>   399                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
>   400             }
>   401             mmio_total -= bar_sz;
>   402         }
>
> Thanks
> Kevin
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-16  9:29         ` Chen, Tiejun
@ 2015-06-16  9:40           ` Jan Beulich
  2015-06-17  7:10             ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-16  9:40 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 16.06.15 at 11:29, <tiejun.chen@intel.com> wrote:
> I'm trying to walk into this direction:
> 
>      /*
>       * We'll skip all space overlapping with reserved memory later,
>       * so we need to increase mmio_total to compensate them.
>       */
>      for ( j = 0; j < memory_map.nr_map ; j++ )
>      {
>          uint64_t conflict_size = 0;
>          if ( memory_map.map[j].type != E820_RAM )
>          {
>              reserved_start = memory_map.map[j].addr;
>              reserved_size = memory_map.map[j].size;
>              reserved_end = reserved_start + reserved_size;
>              if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>                                 reserved_start, reserved_size) )
>              {
>                  /*
>                   * Calculate how much mmio range conflict with
>                   * reserved device memory.
>                   */
>                  conflict_size += reserved_size;
> 
>                  /*
>                   * But we may need to subtract those sizes beyond the
>                   * pci memory, [pci_mem_start, pci_mem_end].
>                   */
>                  if ( reserved_start < pci_mem_start )
>                      conflict_size -= (pci_mem_start - reserved_start);
>                  if ( reserved_end > pci_mem_end )
>                      conflict_size -= (reserved_end - pci_mem_end);
>              }
>          }
> 
>          if ( conflict_size )
>          {
>              uint64_t conflict_size = max_t(
>                      uint64_t, conflict_size, max_bar_sz);
>              conflict_size &= ~(conflict_size - 1);
>              mmio_total += conflict_size;
>          }
>      }

This last thing goes in the right direction, but is complete overkill
when you have a small reserved region and a huge BAR. You
ought to work out the smallest power-of-2 region enclosing the
reserved range (albeit there are tricky corner cases to consider).

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-16  9:40           ` Jan Beulich
@ 2015-06-17  7:10             ` Chen, Tiejun
  2015-06-17  7:19               ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-17  7:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/16 17:40, Jan Beulich wrote:
>>>> On 16.06.15 at 11:29, <tiejun.chen@intel.com> wrote:
>> I'm trying to walk into this direction:
>>
>>       /*
>>        * We'll skip all space overlapping with reserved memory later,
>>        * so we need to increase mmio_total to compensate them.
>>        */
>>       for ( j = 0; j < memory_map.nr_map ; j++ )
>>       {
>>           uint64_t conflict_size = 0;
>>           if ( memory_map.map[j].type != E820_RAM )
>>           {
>>               reserved_start = memory_map.map[j].addr;
>>               reserved_size = memory_map.map[j].size;
>>               reserved_end = reserved_start + reserved_size;
>>               if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>                                  reserved_start, reserved_size) )
>>               {
>>                   /*
>>                    * Calculate how much mmio range conflict with
>>                    * reserved device memory.
>>                    */
>>                   conflict_size += reserved_size;
>>
>>                   /*
>>                    * But we may need to subtract those sizes beyond the
>>                    * pci memory, [pci_mem_start, pci_mem_end].
>>                    */
>>                   if ( reserved_start < pci_mem_start )
>>                       conflict_size -= (pci_mem_start - reserved_start);
>>                   if ( reserved_end > pci_mem_end )
>>                       conflict_size -= (reserved_end - pci_mem_end);
>>               }
>>           }
>>
>>           if ( conflict_size )
>>           {
>>               uint64_t conflict_size = max_t(
>>                       uint64_t, conflict_size, max_bar_sz);
>>               conflict_size &= ~(conflict_size - 1);
>>               mmio_total += conflict_size;
>>           }
>>       }
>
> This last thing goes in the right direction, but is complete overkill
> when you have a small reserved region and a huge BAR. You

Yeah, this may waste some spaces in this worst case but I this think 
this can guarantee our change don't impact on the original expectation, 
right?

> ought to work out the smallest power-of-2 region enclosing the

Okay. I remember the smallest size of a given PCI I/O space is 8 bytes, 
and the smallest size of a PCI memory space is 16 bytes. So

/* At least 16 bytes to align a PCI BAR size. */
uint64_t align = 16;

reserved_start = memory_map.map[j].addr;
reserved_size = memory_map.map[j].size;

reserved_start = (reserved_star + align) & ~(align - 1);
reserved_size = (reserved_size + align) & ~(align - 1);

Is this correct?

> reserved range (albeit there are tricky corner cases to consider).
>

Yeah, its a little tricky since RMRR always owns a fixed start address, 
so we can't reorder them with all pci bars. I just think at least we 
should provide a correct solution now, then further look into what can 
be optimized. So I think we'd better get conflict_size with 
max(conflict_size, max_bar_sz), right?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  7:10             ` Chen, Tiejun
@ 2015-06-17  7:19               ` Jan Beulich
  2015-06-17  7:54                 ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17  7:19 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 17.06.15 at 09:10, <tiejun.chen@intel.com> wrote:
> On 2015/6/16 17:40, Jan Beulich wrote:
>>>>> On 16.06.15 at 11:29, <tiejun.chen@intel.com> wrote:
>>> I'm trying to walk into this direction:
>>>
>>>       /*
>>>        * We'll skip all space overlapping with reserved memory later,
>>>        * so we need to increase mmio_total to compensate them.
>>>        */
>>>       for ( j = 0; j < memory_map.nr_map ; j++ )
>>>       {
>>>           uint64_t conflict_size = 0;
>>>           if ( memory_map.map[j].type != E820_RAM )
>>>           {
>>>               reserved_start = memory_map.map[j].addr;
>>>               reserved_size = memory_map.map[j].size;
>>>               reserved_end = reserved_start + reserved_size;
>>>               if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>>                                  reserved_start, reserved_size) )
>>>               {
>>>                   /*
>>>                    * Calculate how much mmio range conflict with
>>>                    * reserved device memory.
>>>                    */
>>>                   conflict_size += reserved_size;
>>>
>>>                   /*
>>>                    * But we may need to subtract those sizes beyond the
>>>                    * pci memory, [pci_mem_start, pci_mem_end].
>>>                    */
>>>                   if ( reserved_start < pci_mem_start )
>>>                       conflict_size -= (pci_mem_start - reserved_start);
>>>                   if ( reserved_end > pci_mem_end )
>>>                       conflict_size -= (reserved_end - pci_mem_end);
>>>               }
>>>           }
>>>
>>>           if ( conflict_size )
>>>           {
>>>               uint64_t conflict_size = max_t(
>>>                       uint64_t, conflict_size, max_bar_sz);
>>>               conflict_size &= ~(conflict_size - 1);
>>>               mmio_total += conflict_size;
>>>           }
>>>       }
>>
>> This last thing goes in the right direction, but is complete overkill
>> when you have a small reserved region and a huge BAR. You
> 
> Yeah, this may waste some spaces in this worst case but I this think 
> this can guarantee our change don't impact on the original expectation, 
> right?

"Some space" may be multiple Gb (e.g. the frame buffer of a graphics
card), which is totally unacceptable.

>> ought to work out the smallest power-of-2 region enclosing the
> 
> Okay. I remember the smallest size of a given PCI I/O space is 8 bytes, 
> and the smallest size of a PCI memory space is 16 bytes. So
> 
> /* At least 16 bytes to align a PCI BAR size. */
> uint64_t align = 16;
> 
> reserved_start = memory_map.map[j].addr;
> reserved_size = memory_map.map[j].size;
> 
> reserved_start = (reserved_star + align) & ~(align - 1);
> reserved_size = (reserved_size + align) & ~(align - 1);
> 
> Is this correct?

Simply aligning the region doesn't help afaict. You need to fit it
with the other MMIO allocations.

>> reserved range (albeit there are tricky corner cases to consider).
>>
> 
> Yeah, its a little tricky since RMRR always owns a fixed start address, 
> so we can't reorder them with all pci bars. I just think at least we 
> should provide a correct solution now, then further look into what can 
> be optimized. So I think we'd better get conflict_size with 
> max(conflict_size, max_bar_sz), right?

As per above - no, this is not an option.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  7:19               ` Jan Beulich
@ 2015-06-17  7:54                 ` Chen, Tiejun
  2015-06-17  8:05                   ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-17  7:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/17 15:19, Jan Beulich wrote:
>>>> On 17.06.15 at 09:10, <tiejun.chen@intel.com> wrote:
>> On 2015/6/16 17:40, Jan Beulich wrote:
>>>>>> On 16.06.15 at 11:29, <tiejun.chen@intel.com> wrote:
>>>> I'm trying to walk into this direction:
>>>>
>>>>        /*
>>>>         * We'll skip all space overlapping with reserved memory later,
>>>>         * so we need to increase mmio_total to compensate them.
>>>>         */
>>>>        for ( j = 0; j < memory_map.nr_map ; j++ )
>>>>        {
>>>>            uint64_t conflict_size = 0;
>>>>            if ( memory_map.map[j].type != E820_RAM )
>>>>            {
>>>>                reserved_start = memory_map.map[j].addr;
>>>>                reserved_size = memory_map.map[j].size;
>>>>                reserved_end = reserved_start + reserved_size;
>>>>                if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
>>>>                                   reserved_start, reserved_size) )
>>>>                {
>>>>                    /*
>>>>                     * Calculate how much mmio range conflict with
>>>>                     * reserved device memory.
>>>>                     */
>>>>                    conflict_size += reserved_size;
>>>>
>>>>                    /*
>>>>                     * But we may need to subtract those sizes beyond the
>>>>                     * pci memory, [pci_mem_start, pci_mem_end].
>>>>                     */
>>>>                    if ( reserved_start < pci_mem_start )
>>>>                        conflict_size -= (pci_mem_start - reserved_start);
>>>>                    if ( reserved_end > pci_mem_end )
>>>>                        conflict_size -= (reserved_end - pci_mem_end);
>>>>                }
>>>>            }
>>>>
>>>>            if ( conflict_size )
>>>>            {
>>>>                uint64_t conflict_size = max_t(
>>>>                        uint64_t, conflict_size, max_bar_sz);
>>>>                conflict_size &= ~(conflict_size - 1);
>>>>                mmio_total += conflict_size;
>>>>            }
>>>>        }
>>>
>>> This last thing goes in the right direction, but is complete overkill
>>> when you have a small reserved region and a huge BAR. You
>>
>> Yeah, this may waste some spaces in this worst case but I this think
>> this can guarantee our change don't impact on the original expectation,
>> right?
>
> "Some space" may be multiple Gb (e.g. the frame buffer of a graphics

Sure.

> card), which is totally unacceptable.
>

But then I don't understand what's your way. How can we fit all pci 
devices just with "the smallest power-of-2 region enclosing the reserved 
device memory"?

For example, the whole pci memory is sitting at
[0xa0000000, 0xa2000000]. And there are two PCI devices, A and B. Note 
each device needs to be allocated with 0x1000000. So if without 
concerning RMRR,

A. [0xa0000000,0xa1000000]
B. [0xa1000000,0xa2000000]

But if one RMRR resides at [0xa0f00000, 0xa1f00000] which obviously 
generate its own alignment with 0x1000000. So the pci memory is expended 
as [0xa0000000, 0xa3000000], right?

Then actually the whole pci memory can be separated three segments like,

#1. [0xa0000000, 0xa0f00000]
#2. [0xa0f00000, 0xa1f00000] -> RMRR would occupy
#3. [0xa1f00000, 0xa3000000]

So just #3 can suffice to allocate but just for one device, right?

If I'm wrong please correct me.

>>> ought to work out the smallest power-of-2 region enclosing the
>>
>> Okay. I remember the smallest size of a given PCI I/O space is 8 bytes,
>> and the smallest size of a PCI memory space is 16 bytes. So
>>
>> /* At least 16 bytes to align a PCI BAR size. */
>> uint64_t align = 16;
>>
>> reserved_start = memory_map.map[j].addr;
>> reserved_size = memory_map.map[j].size;
>>
>> reserved_start = (reserved_star + align) & ~(align - 1);
>> reserved_size = (reserved_size + align) & ~(align - 1);
>>
>> Is this correct?
>
> Simply aligning the region doesn't help afaict. You need to fit it
> with the other MMIO allocations.

I guess you're saying just those mmio allocations conflicting with RMRR? 
But we don't know these exact addresses until we finalize to allocate 
them, right?

Thanks
Tiejun

>
>>> reserved range (albeit there are tricky corner cases to consider).
>>>
>>
>> Yeah, its a little tricky since RMRR always owns a fixed start address,
>> so we can't reorder them with all pci bars. I just think at least we
>> should provide a correct solution now, then further look into what can
>> be optimized. So I think we'd better get conflict_size with
>> max(conflict_size, max_bar_sz), right?
>
> As per above - no, this is not an option.
>
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  7:54                 ` Chen, Tiejun
@ 2015-06-17  8:05                   ` Jan Beulich
  2015-06-17  8:26                     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17  8:05 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 17.06.15 at 09:54, <tiejun.chen@intel.com> wrote:
> On 2015/6/17 15:19, Jan Beulich wrote:
>>>>> On 17.06.15 at 09:10, <tiejun.chen@intel.com> wrote:
>>> Yeah, this may waste some spaces in this worst case but I this think
>>> this can guarantee our change don't impact on the original expectation,
>>> right?
>>
>> "Some space" may be multiple Gb (e.g. the frame buffer of a graphics
> 
> Sure.
> 
>> card), which is totally unacceptable.
>>
> 
> But then I don't understand what's your way. How can we fit all pci 
> devices just with "the smallest power-of-2 region enclosing the reserved 
> device memory"?
> 
> For example, the whole pci memory is sitting at
> [0xa0000000, 0xa2000000]. And there are two PCI devices, A and B. Note 
> each device needs to be allocated with 0x1000000. So if without 
> concerning RMRR,
> 
> A. [0xa0000000,0xa1000000]
> B. [0xa1000000,0xa2000000]
> 
> But if one RMRR resides at [0xa0f00000, 0xa1f00000] which obviously 
> generate its own alignment with 0x1000000. So the pci memory is expended 
> as [0xa0000000, 0xa3000000], right?
> 
> Then actually the whole pci memory can be separated three segments like,
> 
> #1. [0xa0000000, 0xa0f00000]
> #2. [0xa0f00000, 0xa1f00000] -> RMRR would occupy
> #3. [0xa1f00000, 0xa3000000]
> 
> So just #3 can suffice to allocate but just for one device, right?

Right, i.e. this isn't even sufficient - you need [a0000000,a3ffffff]
to fit everything (but of course you can put smaller BARs into the
unused ranges [a0000000,a0efffff] and [a1f00000,a1ffffff]).
That's why I said it's not going to be tricky to get all corner cases
right _and_ not use up more space than needed.

>>>> ought to work out the smallest power-of-2 region enclosing the
>>>
>>> Okay. I remember the smallest size of a given PCI I/O space is 8 bytes,
>>> and the smallest size of a PCI memory space is 16 bytes. So
>>>
>>> /* At least 16 bytes to align a PCI BAR size. */
>>> uint64_t align = 16;
>>>
>>> reserved_start = memory_map.map[j].addr;
>>> reserved_size = memory_map.map[j].size;
>>>
>>> reserved_start = (reserved_star + align) & ~(align - 1);
>>> reserved_size = (reserved_size + align) & ~(align - 1);
>>>
>>> Is this correct?
>>
>> Simply aligning the region doesn't help afaict. You need to fit it
>> with the other MMIO allocations.
> 
> I guess you're saying just those mmio allocations conflicting with RMRR? 
> But we don't know these exact addresses until we finalize to allocate 
> them, right?

That's the point - you need to allocate them _around_ the reserved
regions.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  8:05                   ` Jan Beulich
@ 2015-06-17  8:26                     ` Chen, Tiejun
  2015-06-17  8:47                       ` Chen, Tiejun
  2015-06-17  9:02                       ` Jan Beulich
  0 siblings, 2 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-17  8:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/17 16:05, Jan Beulich wrote:
>>>> On 17.06.15 at 09:54, <tiejun.chen@intel.com> wrote:
>> On 2015/6/17 15:19, Jan Beulich wrote:
>>>>>> On 17.06.15 at 09:10, <tiejun.chen@intel.com> wrote:
>>>> Yeah, this may waste some spaces in this worst case but I this think
>>>> this can guarantee our change don't impact on the original expectation,
>>>> right?
>>>
>>> "Some space" may be multiple Gb (e.g. the frame buffer of a graphics
>>
>> Sure.
>>
>>> card), which is totally unacceptable.
>>>
>>
>> But then I don't understand what's your way. How can we fit all pci
>> devices just with "the smallest power-of-2 region enclosing the reserved
>> device memory"?
>>
>> For example, the whole pci memory is sitting at
>> [0xa0000000, 0xa2000000]. And there are two PCI devices, A and B. Note
>> each device needs to be allocated with 0x1000000. So if without
>> concerning RMRR,
>>
>> A. [0xa0000000,0xa1000000]
>> B. [0xa1000000,0xa2000000]
>>
>> But if one RMRR resides at [0xa0f00000, 0xa1f00000] which obviously
>> generate its own alignment with 0x1000000. So the pci memory is expended
>> as [0xa0000000, 0xa3000000], right?
>>
>> Then actually the whole pci memory can be separated three segments like,
>>
>> #1. [0xa0000000, 0xa0f00000]
>> #2. [0xa0f00000, 0xa1f00000] -> RMRR would occupy
>> #3. [0xa1f00000, 0xa3000000]
>>
>> So just #3 can suffice to allocate but just for one device, right?
>
> Right, i.e. this isn't even sufficient - you need [a0000000,a3ffffff]
> to fit everything (but of course you can put smaller BARs into the
> unused ranges [a0000000,a0efffff] and [a1f00000,a1ffffff]).

Yes, I knew there's this sort of hole that we should use efficiently as 
you said. And I also thought about this way previously but current pci 
allocation framework isn't friend to implement this easily,

     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {

I mean it isn't easy to calculate what's the most sufficient size in 
advance, and its also difficult to find to fit a appropriate pci device 
into those "holes", so see below,

> That's why I said it's not going to be tricky to get all corner cases
> right _and_ not use up more space than needed.
>
>>>>> ought to work out the smallest power-of-2 region enclosing the
>>>>
>>>> Okay. I remember the smallest size of a given PCI I/O space is 8 bytes,
>>>> and the smallest size of a PCI memory space is 16 bytes. So
>>>>
>>>> /* At least 16 bytes to align a PCI BAR size. */
>>>> uint64_t align = 16;
>>>>
>>>> reserved_start = memory_map.map[j].addr;
>>>> reserved_size = memory_map.map[j].size;
>>>>
>>>> reserved_start = (reserved_star + align) & ~(align - 1);
>>>> reserved_size = (reserved_size + align) & ~(align - 1);
>>>>
>>>> Is this correct?
>>>
>>> Simply aligning the region doesn't help afaict. You need to fit it
>>> with the other MMIO allocations.
>>
>> I guess you're saying just those mmio allocations conflicting with RMRR?
>> But we don't know these exact addresses until we finalize to allocate
>> them, right?
>
> That's the point - you need to allocate them _around_ the reserved
> regions.
>

Something hits me to generate another idea,

#1. Still allocate all devices as before.
#2. Lookup all actual bars to check if they're conflicting RMRR

We can skip these bars to keep zero. Then later it would make lookup easily.

#3. Need to reallocate these conflicting bars.
#3.1 Trying to reallocate them with the remaining resources
#3.2 If the remaining resources aren't enough, we need to allocate them 
from high_mem_resource.

I just feel this way may be easy and better. And even, this way also can 
help terminate the preexisting allocation failures, right?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  8:26                     ` Chen, Tiejun
@ 2015-06-17  8:47                       ` Chen, Tiejun
  2015-06-17  9:02                       ` Jan Beulich
  1 sibling, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-17  8:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, wei.liu2, ian.campbell, tim, Ian.Jackson, xen-devel,
	stefano.stabellini, andrew.cooper3, Yang Z Zhang

On 2015/6/17 16:26, Chen, Tiejun wrote:
> On 2015/6/17 16:05, Jan Beulich wrote:
>>>>> On 17.06.15 at 09:54, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/17 15:19, Jan Beulich wrote:
>>>>>>> On 17.06.15 at 09:10, <tiejun.chen@intel.com> wrote:
>>>>> Yeah, this may waste some spaces in this worst case but I this think
>>>>> this can guarantee our change don't impact on the original
>>>>> expectation,
>>>>> right?
>>>>
>>>> "Some space" may be multiple Gb (e.g. the frame buffer of a graphics
>>>
>>> Sure.
>>>
>>>> card), which is totally unacceptable.
>>>>
>>>
>>> But then I don't understand what's your way. How can we fit all pci
>>> devices just with "the smallest power-of-2 region enclosing the reserved
>>> device memory"?
>>>
>>> For example, the whole pci memory is sitting at
>>> [0xa0000000, 0xa2000000]. And there are two PCI devices, A and B. Note
>>> each device needs to be allocated with 0x1000000. So if without
>>> concerning RMRR,
>>>
>>> A. [0xa0000000,0xa1000000]
>>> B. [0xa1000000,0xa2000000]
>>>
>>> But if one RMRR resides at [0xa0f00000, 0xa1f00000] which obviously
>>> generate its own alignment with 0x1000000. So the pci memory is expended
>>> as [0xa0000000, 0xa3000000], right?
>>>
>>> Then actually the whole pci memory can be separated three segments like,
>>>
>>> #1. [0xa0000000, 0xa0f00000]
>>> #2. [0xa0f00000, 0xa1f00000] -> RMRR would occupy
>>> #3. [0xa1f00000, 0xa3000000]
>>>
>>> So just #3 can suffice to allocate but just for one device, right?
>>
>> Right, i.e. this isn't even sufficient - you need [a0000000,a3ffffff]
>> to fit everything (but of course you can put smaller BARs into the
>> unused ranges [a0000000,a0efffff] and [a1f00000,a1ffffff]).
>
> Yes, I knew there's this sort of hole that we should use efficiently as
> you said. And I also thought about this way previously but current pci
> allocation framework isn't friend to implement this easily,
>
>      /* Assign iomem and ioport resources in descending order of size. */
>      for ( i = 0; i < nr_bars; i++ )
>      {
>
> I mean it isn't easy to calculate what's the most sufficient size in
> advance, and its also difficult to find to fit a appropriate pci device
> into those "holes", so see below,
>
>> That's why I said it's not going to be tricky to get all corner cases
>> right _and_ not use up more space than needed.
>>
>>>>>> ought to work out the smallest power-of-2 region enclosing the
>>>>>
>>>>> Okay. I remember the smallest size of a given PCI I/O space is 8
>>>>> bytes,
>>>>> and the smallest size of a PCI memory space is 16 bytes. So
>>>>>
>>>>> /* At least 16 bytes to align a PCI BAR size. */
>>>>> uint64_t align = 16;
>>>>>
>>>>> reserved_start = memory_map.map[j].addr;
>>>>> reserved_size = memory_map.map[j].size;
>>>>>
>>>>> reserved_start = (reserved_star + align) & ~(align - 1);
>>>>> reserved_size = (reserved_size + align) & ~(align - 1);
>>>>>
>>>>> Is this correct?
>>>>
>>>> Simply aligning the region doesn't help afaict. You need to fit it
>>>> with the other MMIO allocations.
>>>
>>> I guess you're saying just those mmio allocations conflicting with RMRR?
>>> But we don't know these exact addresses until we finalize to allocate
>>> them, right?
>>
>> That's the point - you need to allocate them _around_ the reserved
>> regions.
>>
>
> Something hits me to generate another idea,
>
> #1. Still allocate all devices as before.
> #2. Lookup all actual bars to check if they're conflicting RMRR
>
> We can skip these bars to keep zero. Then later it would make lookup
> easily.

Sorry this may bring a confusion when I reread these lines.

I mean,

Or we can skip allocating these bars to keep zero during #1. But further 
consider this way, I think this lead that it can't make sure the 
remaining resources don't conflict RMRR. So please ignore this option 
and just do #1 purely.

Thanks
Tiejun

>
> #3. Need to reallocate these conflicting bars.
> #3.1 Trying to reallocate them with the remaining resources
> #3.2 If the remaining resources aren't enough, we need to allocate them
> from high_mem_resource.
>
> I just feel this way may be easy and better. And even, this way also can
> help terminate the preexisting allocation failures, right?
>
> Thanks
> Tiejun
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  8:26                     ` Chen, Tiejun
  2015-06-17  8:47                       ` Chen, Tiejun
@ 2015-06-17  9:02                       ` Jan Beulich
  2015-06-17  9:18                         ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17  9:02 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
> Something hits me to generate another idea,
> 
> #1. Still allocate all devices as before.
> #2. Lookup all actual bars to check if they're conflicting RMRR
> 
> We can skip these bars to keep zero. Then later it would make lookup easily.
> 
> #3. Need to reallocate these conflicting bars.
> #3.1 Trying to reallocate them with the remaining resources
> #3.2 If the remaining resources aren't enough, we need to allocate them 
> from high_mem_resource.

That's possible onyl for 64-bit BARs.

> I just feel this way may be easy and better. And even, this way also can 
> help terminate the preexisting allocation failures, right?

I think this complicates things rather than simplifying them: The
more passes (and adjustments to previous settings) you do, the
more error prone the whole logic will become. It may well be that
you need to basically re-write what is there right now...

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  9:02                       ` Jan Beulich
@ 2015-06-17  9:18                         ` Chen, Tiejun
  2015-06-17  9:24                           ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-17  9:18 UTC (permalink / raw)
  To: Jan Beulich, Kevin Tian
  Cc: tim, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/17 17:02, Jan Beulich wrote:
>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>> Something hits me to generate another idea,
>>
>> #1. Still allocate all devices as before.
>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>
>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>
>> #3. Need to reallocate these conflicting bars.
>> #3.1 Trying to reallocate them with the remaining resources
>> #3.2 If the remaining resources aren't enough, we need to allocate them
>> from high_mem_resource.
>
> That's possible onyl for 64-bit BARs.

You're right so this means its not proper to adjust mmio_total to 
include conflicting reserved ranges and finally moved all conflicting 
bars to high_mem_resource as Kevin suggested previously, so i high 
level, we still need to decrease pci_mem_start to populate more RAM to 
compensate them as I did, right?

>
>> I just feel this way may be easy and better. And even, this way also can
>> help terminate the preexisting allocation failures, right?
>
> I think this complicates things rather than simplifying them: The
> more passes (and adjustments to previous settings) you do, the
> more error prone the whole logic will become. It may well be that
> you need to basically re-write what is there right now...
>

Yeah,  we need to think about this carefully.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  9:18                         ` Chen, Tiejun
@ 2015-06-17  9:24                           ` Jan Beulich
  2015-06-18  6:17                             ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17  9:24 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>> Something hits me to generate another idea,
>>>
>>> #1. Still allocate all devices as before.
>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>
>>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>>
>>> #3. Need to reallocate these conflicting bars.
>>> #3.1 Trying to reallocate them with the remaining resources
>>> #3.2 If the remaining resources aren't enough, we need to allocate them
>>> from high_mem_resource.
>>
>> That's possible onyl for 64-bit BARs.
> 
> You're right so this means its not proper to adjust mmio_total to 
> include conflicting reserved ranges and finally moved all conflicting 
> bars to high_mem_resource as Kevin suggested previously, so i high 
> level, we still need to decrease pci_mem_start to populate more RAM to 
> compensate them as I did, right?

You probably should do both: Prefer moving things beyond 4Gb,
but if not possible increase the MMIO hole.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-11  1:15 ` [v3][PATCH 03/16] xen/vtd: create RMRR mapping Tiejun Chen
  2015-06-11  9:14   ` Tian, Kevin
@ 2015-06-17 10:03   ` Jan Beulich
  2015-06-18  6:23     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17 10:03 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -927,10 +927,16 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>          }
>  
>          gfn_unlock(p2m, gfn, 0);
> -        return ret;
>      }
> +    else
> +        ret = 0;
>  
> -    return 0;
> +    if( ret == 0 )
> +    {
> +        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
> +    }

Pointless braces and missing blank in the if() statement. I also think
this should be added right when the function gets introduced.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-11  1:15 ` [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
  2015-06-11  9:28   ` Tian, Kevin
@ 2015-06-17 10:11   ` Jan Beulich
  2015-06-18  7:14     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17 10:11 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> @@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>  }
>  
>  int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> -                           p2m_access_t p2ma)
> +                           p2m_access_t p2ma, u32 flag)

Please avoid using fixed width types unless really needed. Using
uint32_t in the public interface is the right thing to do, but in all
internal parts affected this can simply be (unsigned) int.

> --- a/xen/drivers/passthrough/device_tree.c
> +++ b/xen/drivers/passthrough/device_tree.c
> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
>              goto fail;
>      }
>  
> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
> +                                         XEN_DOMCTL_DEV_NO_RDM);
>  
>      if ( rc )
>          goto fail;
> @@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
>          if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
>              break;
>  
> +        if ( domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM )
> +        {
> +            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
> +                   " to dom%u failed (%d) since we don't support RDM.\n",
> +                   dt_node_full_name(dev), d->domain_id, ret);
> +            break;
> +        }

Isn't the condition inverted, i.e. don't you mean != there?

> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>          seg = machine_sbdf >> 16;
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN2(machine_sbdf);
> +        flag = domctl->u.assign_device.flag;
>  
>          ret = device_assigned(seg, bus, devfn) ?:
> -              assign_device(d, seg, bus, devfn);
> +              assign_device(d, seg, bus, devfn, flag);

I think you should range check the flag passed to make future
extensions possible (and to avoid ambiguity on what out of
range values would mean).

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm
  2015-06-11  1:15 ` [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
  2015-06-11  9:29   ` Tian, Kevin
@ 2015-06-17 10:14   ` Jan Beulich
  2015-06-18  8:53     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17 10:14 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> This patch enables XENMEM_memory_map in hvm. So we can use it to
> setup the e820 mappings.

I think saying "hvmloader" instead of "we" would make things more
explicit. In the context here, "we" would be the hypervisor, and
in that context enabling this subop to set up e820 mappings makes
no sense.

As to the change itself:
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]
  2015-06-11  1:15 ` [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
  2015-06-11  9:38   ` Tian, Kevin
@ 2015-06-17 10:22   ` Jan Beulich
  2015-06-18  9:13     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17 10:22 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -107,6 +107,8 @@ asm (
>      "    .text                       \n"
>      );
>  
> +struct e820map memory_map;

Imo this should live in e820.c.

> @@ -199,6 +201,39 @@ static void apic_setup(void)
>      ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
>  }
>  
> +void memory_map_setup(void)

And perhaps this one too. Or if not, it should be static.

> +{
> +    unsigned int nr_entries = E820MAX, i;
> +    int rc;
> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
> +
> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
> +
> +    if ( rc )
> +    {
> +        printf("Failed to get guest memory map.\n");
> +        BUG();
> +    }
> +
> +    BUG_ON(!nr_entries);

Please be consistent: printf()+BUG() or BUG_ON(). Also I think the
two (sanity) checks above could combined into one (and the
printf() should then print both rc and nr_entries).


> @@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
>      *p = '\0';
>  }
>  
> +int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)

Again no need for a fixed width type here afaict.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-06-11  1:15 ` [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
  2015-06-11 10:25   ` Tian, Kevin
@ 2015-06-17 10:28   ` Jan Beulich
  2015-06-18  9:23     ` Chen, Tiejun
  1 sibling, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-17 10:28 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
>      if ( list_empty(&acpi_drhd_units) )
>          return -ENODEV;
>  
> +    seg = pdev->seg;
> +    bus = pdev->bus;
> +    /*
> +     * In rare cases one given rmrr is shared by multiple devices but
> +     * obviously this would put the security of a system at risk. So
> +     * we should prevent from this sort of device assignment.
> +     *
> +     * TODO: actually we can group these devices which shared rmrr, and
> +     * then allow all devices within a group to be assigned to same domain.
> +     */
> +    for_each_rmrr_device( rmrr, bdf, i )
> +    {
> +        if ( rmrr->segment == seg &&
> +             PCI_BUS(bdf) == bus &&
> +             PCI_DEVFN2(bdf) == devfn )
> +        {
> +            if ( rmrr->scope.devices_cnt > 1 )
> +            {
> +                ret = -EPERM;
> +                printk(XENLOG_G_ERR VTDPREFIX
> +                       " cannot assign this device with shared RMRR for Dom%d (%d)\n",
> +                       d->domain_id, ret);
> +                return ret;

return -EPERM. No need to assign the value to ret, and no need to
add the constant error code to the log entry. What's missing otoh
is what "this device" is - you should print SBDF instead.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-17  9:24                           ` Jan Beulich
@ 2015-06-18  6:17                             ` Chen, Tiejun
  2015-06-18  6:29                               ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  6:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/17 17:24, Jan Beulich wrote:
>>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
>> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>>> Something hits me to generate another idea,
>>>>
>>>> #1. Still allocate all devices as before.
>>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>>
>>>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>>>
>>>> #3. Need to reallocate these conflicting bars.
>>>> #3.1 Trying to reallocate them with the remaining resources
>>>> #3.2 If the remaining resources aren't enough, we need to allocate them
>>>> from high_mem_resource.
>>>
>>> That's possible onyl for 64-bit BARs.
>>
>> You're right so this means its not proper to adjust mmio_total to
>> include conflicting reserved ranges and finally moved all conflicting
>> bars to high_mem_resource as Kevin suggested previously, so i high
>> level, we still need to decrease pci_mem_start to populate more RAM to
>> compensate them as I did, right?
>
> You probably should do both: Prefer moving things beyond 4Gb,
> but if not possible increase the MMIO hole.
>

I'm trying to figure out a better solution. Perhaps we can allocate 
32-bit bars and 64-bit bars orderly. This may help us bypass those 
complicated corner cases.

#1. We don't calculate how much memory should be compensated to add them 
to expand something like we though previously.

#2. Instead, before allocating bars, we just check if reserved device 
memory is really conflicting this default region [pci_mem_start, 
pci_mem_end]

#2.1 If not, obviously nothing is changed.
#2.2 If yes, we introduce a new local bool, bar32_allocating, which 
indicates if we want to allocate 32-bit bars and 64-bit bars separately.

So here we should set as true, and we also need to set 'bar64_relocate' 
to relocate bars to 64-bit.

     /*
      * Check if reserved device memory conflicts current pci memory.
      * If yes, we need to first allocate bar32 since reserved devices
      * always occupy low memory, and also enable relocating some BARs
      * to 64bit as possible.
      */
     for ( i = 0; i < memory_map.nr_map ; i++ )
     {
         reserved_start = memory_map.map[i].addr;
         reserved_size = memory_map.map[i].size;
         reserved_end = reserved_start + reserved_size;
         if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
                            reserved_start, reserved_size) )
         {
             printf("Reserved device memory conflicts current PCI memory,"
                    " so first to allocate 32-bit BAR and trying to"
                    " relocating some BARs to 64-bit\n");
             bar32_allocating = 1;
             if ( !bar64_relocate )
                 bar64_relocate = 1;
         }
     }

#2.2 then this also means we may allocate many times so add a label at 
the allocation,

+ further_allocate:
      /* Assign iomem and ioport resources in descending order of size. */
      for ( i = 0; i < nr_bars; i++ )
      {

#2.3 Make sure we can allocate all bars separately as we expect,

          bar_sz  = bars[i].bar_sz;

          /*
+         * This means we'd like to first allocate 32bit bar to make sure
+         * all 32bit bars can be allocated as possible.
+         */
+        if ( bars[i].is_64bar && bar32_allocating )
+            continue;
+
+        /*
           * Relocate to high memory if the total amount of MMIO needed
           * is more than the low MMIO available.  Because devices are
           * processed in order of bar_sz, this will preferentially

#2.4 We still need to skip reserved device memory

          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
          bar_data |= (uint32_t)base;
          bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * Skip all reserved device memory.
+         *
+         * Note you need to make sure memory_map.map doesn't go high 
memory,
+         * then here we can make life easy.
+         */
+        if ( !using_64bar )
+        {
+            for ( j = 0; j < memory_map.nr_map ; j++ )
+            {
+                if ( memory_map.map[j].type != E820_RAM )
+                {
+                    reserved_end = memory_map.map[j].addr + 
memory_map.map[j].size;
+                    if ( check_overlap(base, bar_sz,
+                                       memory_map.map[j].addr,
+                                       memory_map.map[j].size) )
+                    {
+                        base = reserved_end;
+                        continue;
+                    }
+                }
+            }
+        }
          base += bar_sz;

          if ( (base < resource->base) || (base > resource->max) )

#2.5 Then go to the second allocation if necessary

#2.5.1 After we're trying to allocate all 32bit-bars at the first 
allocation, we'll check if any 32bit bars are not allocated.

#2.5.1 If all 32bit-bars are allocated, we just set something as follows,

+                bar32_allocating = 0;
+                goto further_allocate;

then we allocate 64bit-bards at the second allocation. But note this 
doesn't mean we will allocate 64bit-bars just from highmem since the 
original mechanism still work at this point, so we're still trying to 
allocate 64bit-bars from the remaining low pci memory & highmem like before.

#2.5.2 If not all 32bit-bars are allocated, we need to populate RAM to 
extend low pci memory.

#2.5.2.1 Calculate how much memory are still needed

+            /* Calculate the remaining 32bars. */
+            for ( n = 0; n < nr_bars ; n++ )
+            {
+                if ( !bars[n].is_64bar )
+                {
+                    uint32_t devfn32, bar_reg32, bar_data32;
+                    uint64_t bar_sz32;
+                    devfn32   = bars[n].devfn;
+                    bar_reg32 = bars[n].bar_reg;
+                    bar_sz32  = bars[n].bar_sz;
+                    bar_data32 = pci_readl(devfn32, bar_reg32);
+                    if ( !bar_data32 )
+                        mmio32_unallocated_total  += bar_sz32;
+                }
+            }

#2.5.2.2 populate these memory

+                cur_pci_mem_start = pci_mem_start - 
mmio32_unallocated_total;
+                relocate_ram_for_pci_memory(cur_pci_mem_start);
+                exp_mem_resource.base = cur_pci_mem_start;
+                exp_mem_resource.max = pci_mem_start;

#2.5.2.3 goto further_allocate to reallocate the remaining 32bit-bars. 
Note we should make sure we allocate them just from exp_mem_resource above.

#2.5.3 In theory we can allocate all 32bit-bars successfully. But we can 
check if mmio32_unallocated_total is already set to make sure we don't 
fall into an infinite loop.

#2.5.4 Then as before, we would go to allocate 64bit-bars at the first 
allocation.

The following patch is trying to implement my idea,

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..f2953c0 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
  enum virtual_vga virtual_vga = VGA_none;
  unsigned long igd_opregion_pgbase = 0;

+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+    struct xen_add_to_physmap xatp;
+    unsigned int nr_pages = min_t(
+        unsigned int,
+        hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+        (1u << 16) - 1);
+    if ( hvm_info->high_mem_pgend == 0 )
+        hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+    hvm_info->low_mem_pgend -= nr_pages;
+    printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+           " for lowmem MMIO hole\n",
+           nr_pages,
+           PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+           PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+    xatp.domid = DOMID_SELF;
+    xatp.space = XENMAPSPACE_gmfn_range;
+    xatp.idx   = hvm_info->low_mem_pgend;
+    xatp.gpfn  = hvm_info->high_mem_pgend;
+    xatp.size  = nr_pages;
+    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+        BUG();
+    hvm_info->high_mem_pgend += nr_pages;
+}
+
  void pci_setup(void)
  {
      uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,7 +75,7 @@ void pci_setup(void)
      /* Resources assignable to PCI devices via BARs. */
      struct resource {
          uint64_t base, max;
-    } *resource, mem_resource, high_mem_resource, io_resource;
+    } *resource, mem_resource, high_mem_resource, io_resource, 
exp_mem_resource;

      /* Create a list of device BARs in descending order of size. */
      struct bars {
@@ -59,8 +84,11 @@ void pci_setup(void)
          uint32_t bar_reg;
          uint64_t bar_sz;
      } *bars = (struct bars *)scratch_start;
-    unsigned int i, nr_bars = 0;
-    uint64_t mmio_hole_size = 0;
+    unsigned int i, j, n, nr_bars = 0;
+    uint64_t mmio_hole_size = 0, reserved_start, reserved_end, 
reserved_size;
+    bool bar32_allocating = 0;
+    uint64_t mmio32_unallocated_total = 0;
+    unsigned long cur_pci_mem_start = 0;

      const char *s;
      /*
@@ -309,29 +337,31 @@ void pci_setup(void)
      }

      /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
+    cur_pci_mem_start = pci_mem_start;
      while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
+        relocate_ram_for_pci_memory(cur_pci_mem_start);
+
+    /*
+     * Check if reserved device memory conflicts current pci memory.
+     * If yes, we need to first allocate bar32 since reserved devices
+     * always occupy low memory, and also enable relocating some BARs
+     * to 64bit as possible.
+     */
+    for ( i = 0; i < memory_map.nr_map ; i++ )
      {
-        struct xen_add_to_physmap xatp;
-        unsigned int nr_pages = min_t(
-            unsigned int,
-            hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
-            (1u << 16) - 1);
-        if ( hvm_info->high_mem_pgend == 0 )
-            hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
-        hvm_info->low_mem_pgend -= nr_pages;
-        printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
-               " for lowmem MMIO hole\n",
-               nr_pages,
-               PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
- 
PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
-        xatp.domid = DOMID_SELF;
-        xatp.space = XENMAPSPACE_gmfn_range;
-        xatp.idx   = hvm_info->low_mem_pgend;
-        xatp.gpfn  = hvm_info->high_mem_pgend;
-        xatp.size  = nr_pages;
-        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
-            BUG();
-        hvm_info->high_mem_pgend += nr_pages;
+        reserved_start = memory_map.map[i].addr;
+        reserved_size = memory_map.map[i].size;
+        reserved_end = reserved_start + reserved_size;
+        if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start,
+                           reserved_start, reserved_size) )
+        {
+            printf("Reserved device memory conflicts current PCI memory,"
+                   " so first to allocate 32-bit BAR and trying to"
+                   " relocating some BARs to 64-bit\n");
+            bar32_allocating = 1;
+            if ( !bar64_relocate )
+                bar64_relocate = 1;
+        }
      }

      high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << 
PAGE_SHIFT;
@@ -352,6 +382,7 @@ void pci_setup(void)
      io_resource.base = 0xc000;
      io_resource.max = 0x10000;

+ further_allocate:
      /* Assign iomem and ioport resources in descending order of size. */
      for ( i = 0; i < nr_bars; i++ )
      {
@@ -360,6 +391,13 @@ void pci_setup(void)
          bar_sz  = bars[i].bar_sz;

          /*
+         * This means we'd like to first allocate 32bit bar to make sure
+         * all 32bit bars can be allocated as possible.
+         */
+        if ( bars[i].is_64bar && bar32_allocating )
+            continue;
+
+        /*
           * Relocate to high memory if the total amount of MMIO needed
           * is more than the low MMIO available.  Because devices are
           * processed in order of bar_sz, this will preferentially
@@ -395,7 +433,14 @@ void pci_setup(void)
                  bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
              }
              else {
-                resource = &mem_resource;
+                /*
+                 * This menas we're trying to use that expanded
+                 * memory to reallocate 32bars.
+                 */
+                if ( mmio32_unallocated_total )
+                    resource = &exp_mem_resource;
+                else
+                    resource = &mem_resource;
                  bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
              }
              mmio_total -= bar_sz;
@@ -409,6 +454,29 @@ void pci_setup(void)
          base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
          bar_data |= (uint32_t)base;
          bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * Skip all reserved device memory.
+         *
+         * Note you need to make sure memory_map.map doesn't go high 
memory,
+         * then here we can make life easy.
+         */
+        if ( !using_64bar )
+        {
+            for ( j = 0; j < memory_map.nr_map ; j++ )
+            {
+                if ( memory_map.map[j].type != E820_RAM )
+                {
+                    reserved_end = memory_map.map[j].addr + 
memory_map.map[j].size;
+                    if ( check_overlap(base, bar_sz,
+                                       memory_map.map[j].addr,
+                                       memory_map.map[j].size) )
+                    {
+                        base = reserved_end;
+                        continue;
+                    }
+                }
+            }
+        }
          base += bar_sz;

          if ( (base < resource->base) || (base > resource->max) )
@@ -439,6 +507,54 @@ void pci_setup(void)
          else
              cmd |= PCI_COMMAND_IO;
          pci_writew(devfn, PCI_COMMAND, cmd);
+
+        /* If we finish allocating bar32 at the first time. */
+        if ( i == nr_bars && bar32_allocating )
+        {
+            /*
+             * We won't repeat to populate more RAM to finalize
+             * allocate all 32bars, so just go to allocate 64bit-bars.
+             */
+            if ( mmio32_unallocated_total )
+            {
+                bar32_allocating = 0;
+                mmio32_unallocated_total = 0;
+                high_mem_resource.base =
+                        ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+                goto further_allocate;
+            }
+
+            /* Calculate the remaining 32bars. */
+            for ( n = 0; n < nr_bars ; n++ )
+            {
+                if ( !bars[n].is_64bar )
+                {
+                    uint32_t devfn32, bar_reg32, bar_data32;
+                    uint64_t bar_sz32;
+                    devfn32   = bars[n].devfn;
+                    bar_reg32 = bars[n].bar_reg;
+                    bar_sz32  = bars[n].bar_sz;
+                    bar_data32 = pci_readl(devfn32, bar_reg32);
+                    if ( !bar_data32 )
+                        mmio32_unallocated_total  += bar_sz32;
+                }
+            }
+
+            /*
+             * We have to populate more RAM to further allocate
+             * the remaining 32bars.
+             */
+            if ( mmio32_unallocated_total )
+            {
+                cur_pci_mem_start = pci_mem_start - 
mmio32_unallocated_total;
+                relocate_ram_for_pci_memory(cur_pci_mem_start);
+                exp_mem_resource.base = cur_pci_mem_start;
+                exp_mem_resource.max = pci_mem_start;
+            }
+            else
+                bar32_allocating = 0;
+            goto further_allocate;
+        }
      }

      if ( pci_hi_mem_start )


Thanks
Tiejun

^ permalink raw reply related	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-17 10:03   ` Jan Beulich
@ 2015-06-18  6:23     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  6:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/17 18:03, Jan Beulich wrote:
>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -927,10 +927,16 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>>           }
>>
>>           gfn_unlock(p2m, gfn, 0);
>> -        return ret;
>>       }
>> +    else
>> +        ret = 0;
>>
>> -    return 0;
>> +    if( ret == 0 )
>> +    {
>> +        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>> +    }
>
> Pointless braces and missing blank in the if() statement. I also think
> this should be added right when the function gets introduced.

Finally I think we should remove this change since p2m_set_entry() 
always set iommu page map internally.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-18  6:17                             ` Chen, Tiejun
@ 2015-06-18  6:29                               ` Jan Beulich
  2015-06-18  7:01                                 ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-18  6:29 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 18.06.15 at 08:17, <tiejun.chen@intel.com> wrote:
> On 2015/6/17 17:24, Jan Beulich wrote:
>>>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>>>> Something hits me to generate another idea,
>>>>>
>>>>> #1. Still allocate all devices as before.
>>>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>>>
>>>>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>>>>
>>>>> #3. Need to reallocate these conflicting bars.
>>>>> #3.1 Trying to reallocate them with the remaining resources
>>>>> #3.2 If the remaining resources aren't enough, we need to allocate them
>>>>> from high_mem_resource.
>>>>
>>>> That's possible onyl for 64-bit BARs.
>>>
>>> You're right so this means its not proper to adjust mmio_total to
>>> include conflicting reserved ranges and finally moved all conflicting
>>> bars to high_mem_resource as Kevin suggested previously, so i high
>>> level, we still need to decrease pci_mem_start to populate more RAM to
>>> compensate them as I did, right?
>>
>> You probably should do both: Prefer moving things beyond 4Gb,
>> but if not possible increase the MMIO hole.
>>
> 
> I'm trying to figure out a better solution. Perhaps we can allocate 
> 32-bit bars and 64-bit bars orderly. This may help us bypass those 
> complicated corner cases.

Dealing with 32- and 64-bit BARs separately won't help at all, as
there may only be 32-bit ones, or the set of 32-bit ones may
already require you to do re-arrangements. Plus, for compatibility
reasons (just like physical machines' BIOSes do) avoiding to place
MMIO above 4Gb where possible is still a goal.

> #1. We don't calculate how much memory should be compensated to add them 
> to expand something like we though previously.
> 
> #2. Instead, before allocating bars, we just check if reserved device 
> memory is really conflicting this default region [pci_mem_start, 
> pci_mem_end]
> 
> #2.1 If not, obviously nothing is changed.
> #2.2 If yes, we introduce a new local bool, bar32_allocating, which 
> indicates if we want to allocate 32-bit bars and 64-bit bars separately.
> 
> So here we should set as true, and we also need to set 'bar64_relocate' 
> to relocate bars to 64-bit.

Doesn't look like the right approach to me. As said before, I think
you should allocate BARs _around_ reserved regions (perhaps
filling non-aligned areas first, again utilizing that BARs are always
a power of 2 in size).

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-18  6:29                               ` Jan Beulich
@ 2015-06-18  7:01                                 ` Chen, Tiejun
  2015-06-18  8:05                                   ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  7:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/18 14:29, Jan Beulich wrote:
>>>> On 18.06.15 at 08:17, <tiejun.chen@intel.com> wrote:
>> On 2015/6/17 17:24, Jan Beulich wrote:
>>>>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
>>>> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>>>>> Something hits me to generate another idea,
>>>>>>
>>>>>> #1. Still allocate all devices as before.
>>>>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>>>>
>>>>>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>>>>>
>>>>>> #3. Need to reallocate these conflicting bars.
>>>>>> #3.1 Trying to reallocate them with the remaining resources
>>>>>> #3.2 If the remaining resources aren't enough, we need to allocate them
>>>>>> from high_mem_resource.
>>>>>
>>>>> That's possible onyl for 64-bit BARs.
>>>>
>>>> You're right so this means its not proper to adjust mmio_total to
>>>> include conflicting reserved ranges and finally moved all conflicting
>>>> bars to high_mem_resource as Kevin suggested previously, so i high
>>>> level, we still need to decrease pci_mem_start to populate more RAM to
>>>> compensate them as I did, right?
>>>
>>> You probably should do both: Prefer moving things beyond 4Gb,
>>> but if not possible increase the MMIO hole.
>>>
>>
>> I'm trying to figure out a better solution. Perhaps we can allocate
>> 32-bit bars and 64-bit bars orderly. This may help us bypass those
>> complicated corner cases.
>
> Dealing with 32- and 64-bit BARs separately won't help at all, as

More precisely I'm saying to deal with them orderly.

> there may only be 32-bit ones, or the set of 32-bit ones may
> already require you to do re-arrangements. Plus, for compatibility

Yes but I don't understand they are specific cases to my idea.

> reasons (just like physical machines' BIOSes do) avoiding to place
> MMIO above 4Gb where possible is still a goal.

So are you sure you see my idea completely? I don't intend to expand pci 
memory above 4GB.

Let me clear this simply,

#1. I'm still trying to allocate all 32bit bars from 
[pci_mem_start,pci_mem_end] as before.

#2. But [pci_mem_start,pci_mem_end] mightn't enough cover all 32bit-bars 
again because of RMRR, right? So I will populate RAM to push downward at 
cur_pci_mem_start ( = pci_mem_start - reserved device memory), then 
allocate the remaining 32bit-bars from [cur_pci_mem_start , pci_mem_start]

#3. Then I'm still trying to allocate 64bit-bars from 
[pci_mem_start,pci_mem_end], unless its not enough. This is just going 
to follow the original.

So anything is breaking that goal? And overall, its same as the original.

>
>> #1. We don't calculate how much memory should be compensated to add them
>> to expand something like we though previously.
>>
>> #2. Instead, before allocating bars, we just check if reserved device
>> memory is really conflicting this default region [pci_mem_start,
>> pci_mem_end]
>>
>> #2.1 If not, obviously nothing is changed.
>> #2.2 If yes, we introduce a new local bool, bar32_allocating, which
>> indicates if we want to allocate 32-bit bars and 64-bit bars separately.
>>
>> So here we should set as true, and we also need to set 'bar64_relocate'
>> to relocate bars to 64-bit.
>

'bar64_relocate' doesn't indicate we always allocate them from highmem. 
Instead, we're trying to fist allocate them from low pci memory, but if 
low memory is not enough to allocate, then we'll relocate bars to 
64-bit. This is a original mechanism and I just use that.

> Doesn't look like the right approach to me. As said before, I think

Could you see what I'm saying again? I just feel you don't understand 
what you mean. If you still think I'm wrong let me know.

> you should allocate BARs _around_ reserved regions (perhaps

I don't involve to allocate BAR directly.

> filling non-aligned areas first, again utilizing that BARs are always
> a power of 2 in size).

We're populating RAM aligned to *page* before allocating as before.

Thanks
Tiejun

>
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-17 10:11   ` Jan Beulich
@ 2015-06-18  7:14     ` Chen, Tiejun
  2015-06-18  7:53       ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  7:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/17 18:11, Jan Beulich wrote:
>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>> @@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>   }
>>
>>   int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
>> -                           p2m_access_t p2ma)
>> +                           p2m_access_t p2ma, u32 flag)
>
> Please avoid using fixed width types unless really needed. Using
> uint32_t in the public interface is the right thing to do, but in all
> internal parts affected this can simply be (unsigned) int.

Will do.

>
>> --- a/xen/drivers/passthrough/device_tree.c
>> +++ b/xen/drivers/passthrough/device_tree.c
>> @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev)
>>               goto fail;
>>       }
>>
>> -    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
>> +    rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
>> +                                         XEN_DOMCTL_DEV_NO_RDM);
>>
>>       if ( rc )
>>           goto fail;
>> @@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d,
>>           if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
>>               break;
>>
>> +        if ( domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM )
>> +        {
>> +            printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
>> +                   " to dom%u failed (%d) since we don't support RDM.\n",
>> +                   dt_node_full_name(dev), d->domain_id, ret);
>> +            break;
>> +        }
>
> Isn't the condition inverted, i.e. don't you mean != there?

You're right and thanks.

>
>> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>>           seg = machine_sbdf >> 16;
>>           bus = PCI_BUS(machine_sbdf);
>>           devfn = PCI_DEVFN2(machine_sbdf);
>> +        flag = domctl->u.assign_device.flag;
>>
>>           ret = device_assigned(seg, bus, devfn) ?:
>> -              assign_device(d, seg, bus, devfn);
>> +              assign_device(d, seg, bus, devfn, flag);
>
> I think you should range check the flag passed to make future
> extensions possible (and to avoid ambiguity on what out of
> range values would mean).

Yeah.

Maybe I can set this comment,

     /* Make sure this is always the last. */ 

#define XEN_DOMCTL_DEV_NO_RDM           2 

     uint32_t  flag;   /* flag of assigned device */


and then

         flag = domctl->u.assign_device.flag;
         if ( flag > XEN_DOMCTL_DEV_NO_RDM )
         {
             printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
                    "assign %04x:%02x:%02x.%u to dom%d failed "
                    "with unknown rdm flag %x. (%d)\n",
                    seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                    d->domain_id, flag, ret);
             ret = -EINVAL;
             break;
         }


Thanks
Tiejun
>
> Jan
>
>
>

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-18  7:14     ` Chen, Tiejun
@ 2015-06-18  7:53       ` Jan Beulich
  2015-06-18  8:48         ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-18  7:53 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 18.06.15 at 09:14, <tiejun.chen@intel.com> wrote:
> On 2015/6/17 18:11, Jan Beulich wrote:
>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>>>           seg = machine_sbdf >> 16;
>>>           bus = PCI_BUS(machine_sbdf);
>>>           devfn = PCI_DEVFN2(machine_sbdf);
>>> +        flag = domctl->u.assign_device.flag;
>>>
>>>           ret = device_assigned(seg, bus, devfn) ?:
>>> -              assign_device(d, seg, bus, devfn);
>>> +              assign_device(d, seg, bus, devfn, flag);
>>
>> I think you should range check the flag passed to make future
>> extensions possible (and to avoid ambiguity on what out of
>> range values would mean).
> 
> Yeah.
> 
> Maybe I can set this comment,
> 
>      /* Make sure this is always the last. */ 
> 
> #define XEN_DOMCTL_DEV_NO_RDM           2 
> 
>      uint32_t  flag;   /* flag of assigned device */

Why would you want to needlessly break the interface is a new
constant gets added? It's a domctl, so it can be changed, but we
shouldn't change for no reason.

> and then
> 
>          flag = domctl->u.assign_device.flag;
>          if ( flag > XEN_DOMCTL_DEV_NO_RDM )

All that needs updating when a new constant gets added is this
line.

>          {
>              printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
>                     "assign %04x:%02x:%02x.%u to dom%d failed "
>                     "with unknown rdm flag %x. (%d)\n",
>                     seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>                     d->domain_id, flag, ret);

I see absolutely no reason for such a log message.

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-18  7:01                                 ` Chen, Tiejun
@ 2015-06-18  8:05                                   ` Jan Beulich
  2015-06-19  2:02                                     ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-18  8:05 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

>>> On 18.06.15 at 09:01, <tiejun.chen@intel.com> wrote:
> On 2015/6/18 14:29, Jan Beulich wrote:
>>>>> On 18.06.15 at 08:17, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/17 17:24, Jan Beulich wrote:
>>>>>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
>>>>> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>>>>>> Something hits me to generate another idea,
>>>>>>>
>>>>>>> #1. Still allocate all devices as before.
>>>>>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>>>>>
>>>>>>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>>>>>>
>>>>>>> #3. Need to reallocate these conflicting bars.
>>>>>>> #3.1 Trying to reallocate them with the remaining resources
>>>>>>> #3.2 If the remaining resources aren't enough, we need to allocate them
>>>>>>> from high_mem_resource.
>>>>>>
>>>>>> That's possible onyl for 64-bit BARs.
>>>>>
>>>>> You're right so this means its not proper to adjust mmio_total to
>>>>> include conflicting reserved ranges and finally moved all conflicting
>>>>> bars to high_mem_resource as Kevin suggested previously, so i high
>>>>> level, we still need to decrease pci_mem_start to populate more RAM to
>>>>> compensate them as I did, right?
>>>>
>>>> You probably should do both: Prefer moving things beyond 4Gb,
>>>> but if not possible increase the MMIO hole.
>>>>
>>>
>>> I'm trying to figure out a better solution. Perhaps we can allocate
>>> 32-bit bars and 64-bit bars orderly. This may help us bypass those
>>> complicated corner cases.
>>
>> Dealing with 32- and 64-bit BARs separately won't help at all, as
> 
> More precisely I'm saying to deal with them orderly.
> 
>> there may only be 32-bit ones, or the set of 32-bit ones may
>> already require you to do re-arrangements. Plus, for compatibility
> 
> Yes but I don't understand they are specific cases to my idea.

Perhaps the problem is that you don't say what "orderly" is supposed
to mean here?

>> reasons (just like physical machines' BIOSes do) avoiding to place
>> MMIO above 4Gb where possible is still a goal.
> 
> So are you sure you see my idea completely? I don't intend to expand pci 
> memory above 4GB.
> 
> Let me clear this simply,
> 
> #1. I'm still trying to allocate all 32bit bars from 
> [pci_mem_start,pci_mem_end] as before.
> 
> #2. But [pci_mem_start,pci_mem_end] mightn't enough cover all 32bit-bars 
> again because of RMRR, right? So I will populate RAM to push downward at 
> cur_pci_mem_start ( = pci_mem_start - reserved device memory), then 
> allocate the remaining 32bit-bars from [cur_pci_mem_start , pci_mem_start]
> 
> #3. Then I'm still trying to allocate 64bit-bars from 
> [pci_mem_start,pci_mem_end], unless its not enough. This is just going 
> to follow the original.
> 
> So anything is breaking that goal?

Maybe not, from the above.

> And overall, its same as the original.

If the model follows the original, what's the point of outlining
supposed changes to the model? All I'm trying to understand is how
you want to change the current code to accommodate the not
aligned reserved memory regions. If everything is the same as
before, this can't have been taken care of. If something is different
from the original, that's what needs spelling out (and nothing else,
as that would only clutter the picture).

>> Doesn't look like the right approach to me. As said before, I think
> 
> Could you see what I'm saying again? I just feel you don't understand 
> what you mean. If you still think I'm wrong let me know.

I think I understand what _I_ mean, but I'm indeed unsure I see
what _you_ mean. Part of the problem is that you toggle between
sending (incomplete) patches, code fragments, and discussing the
approach verbally. I'd much prefer if either you started with a clear
picture of what you intend to implement, or with an implementation
that at least attempts to take care of all the corner cases (showing
that you understand what the corner cases are, which so far I'm
getting the - perhaps false - impression that you don't).

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-18  7:53       ` Jan Beulich
@ 2015-06-18  8:48         ` Chen, Tiejun
  2015-06-18  9:13           ` Jan Beulich
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  8:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/18 15:53, Jan Beulich wrote:
>>>> On 18.06.15 at 09:14, <tiejun.chen@intel.com> wrote:
>> On 2015/6/17 18:11, Jan Beulich wrote:
>>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>>> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>>>>            seg = machine_sbdf >> 16;
>>>>            bus = PCI_BUS(machine_sbdf);
>>>>            devfn = PCI_DEVFN2(machine_sbdf);
>>>> +        flag = domctl->u.assign_device.flag;
>>>>
>>>>            ret = device_assigned(seg, bus, devfn) ?:
>>>> -              assign_device(d, seg, bus, devfn);
>>>> +              assign_device(d, seg, bus, devfn, flag);
>>>
>>> I think you should range check the flag passed to make future
>>> extensions possible (and to avoid ambiguity on what out of
>>> range values would mean).
>>
>> Yeah.
>>
>> Maybe I can set this comment,
>>
>>       /* Make sure this is always the last. */
>>
>> #define XEN_DOMCTL_DEV_NO_RDM           2
>>
>>       uint32_t  flag;   /* flag of assigned device */
>
> Why would you want to needlessly break the interface is a new
> constant gets added? It's a domctl, so it can be changed, but we
> shouldn't change for no reason.

I just think XEN_DOMCTL_DEV_NO_RDM is prone to represent a sort of 
ending of all flags, and I also add this comment,

/* Make sure this is always the last. */

>
>> and then
>>
>>           flag = domctl->u.assign_device.flag;
>>           if ( flag > XEN_DOMCTL_DEV_NO_RDM )
>
> All that needs updating when a new constant gets added is this
> line.

This place really isn't one spotlight to take a attention when a new 
flag is introduced, right? So what I intend to is trying to make sure we 
don't need to change this.

>
>>           {
>>               printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
>>                      "assign %04x:%02x:%02x.%u to dom%d failed "
>>                      "with unknown rdm flag %x. (%d)\n",
>>                      seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>                      d->domain_id, flag, ret);
>
> I see absolutely no reason for such a log message.
>

Do you mean I should simplify this log message? Or remove completely?

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm
  2015-06-17 10:14   ` Jan Beulich
@ 2015-06-18  8:53     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  8:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/17 18:14, Jan Beulich wrote:
>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>> This patch enables XENMEM_memory_map in hvm. So we can use it to
>> setup the e820 mappings.
>
> I think saying "hvmloader" instead of "we" would make things more
> explicit. In the context here, "we" would be the hypervisor, and

Fixed.

> in that context enabling this subop to set up e820 mappings makes
> no sense.
>
> As to the change itself:
> Acked-by: Jan Beulich <jbeulich@suse.com>
>

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]
  2015-06-17 10:22   ` Jan Beulich
@ 2015-06-18  9:13     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  9:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/17 18:22, Jan Beulich wrote:
>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>> --- a/tools/firmware/hvmloader/hvmloader.c
>> +++ b/tools/firmware/hvmloader/hvmloader.c
>> @@ -107,6 +107,8 @@ asm (
>>       "    .text                       \n"
>>       );
>>
>> +struct e820map memory_map;
>
> Imo this should live in e820.c.

Okay.

>
>> @@ -199,6 +201,39 @@ static void apic_setup(void)
>>       ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0)));
>>   }
>>
>> +void memory_map_setup(void)
>
> And perhaps this one too. Or if not, it should be static.

I'd like to move this into e820.c as well.

>
>> +{
>> +    unsigned int nr_entries = E820MAX, i;
>> +    int rc;
>> +    uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
>> +    uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
>> +
>> +    rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
>> +
>> +    if ( rc )
>> +    {
>> +        printf("Failed to get guest memory map.\n");
>> +        BUG();
>> +    }
>> +
>> +    BUG_ON(!nr_entries);
>
> Please be consistent: printf()+BUG() or BUG_ON(). Also I think the
> two (sanity) checks above could combined into one (and the
> printf() should then print both rc and nr_entries).

What about this?

     if ( rc || !nr_entries ) 

     { 

         printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, 
rc);
         BUG(); 

     }

>
>
>> @@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
>>       *p = '\0';
>>   }
>>
>> +int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)

s/uint32_t/unsigned int/

>
> Again no need for a fixed width type here afaict.
>

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-18  8:48         ` Chen, Tiejun
@ 2015-06-18  9:13           ` Jan Beulich
  2015-06-18  9:31             ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Jan Beulich @ 2015-06-18  9:13 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

>>> On 18.06.15 at 10:48, <tiejun.chen@intel.com> wrote:
> On 2015/6/18 15:53, Jan Beulich wrote:
>>>>> On 18.06.15 at 09:14, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/17 18:11, Jan Beulich wrote:
>>>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>>>> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>>>>>            seg = machine_sbdf >> 16;
>>>>>            bus = PCI_BUS(machine_sbdf);
>>>>>            devfn = PCI_DEVFN2(machine_sbdf);
>>>>> +        flag = domctl->u.assign_device.flag;
>>>>>
>>>>>            ret = device_assigned(seg, bus, devfn) ?:
>>>>> -              assign_device(d, seg, bus, devfn);
>>>>> +              assign_device(d, seg, bus, devfn, flag);
>>>>
>>>> I think you should range check the flag passed to make future
>>>> extensions possible (and to avoid ambiguity on what out of
>>>> range values would mean).
>>>
>>> Yeah.
>>>
>>> Maybe I can set this comment,
>>>
>>>       /* Make sure this is always the last. */
>>>
>>> #define XEN_DOMCTL_DEV_NO_RDM           2
>>>
>>>       uint32_t  flag;   /* flag of assigned device */
>>
>> Why would you want to needlessly break the interface is a new
>> constant gets added? It's a domctl, so it can be changed, but we
>> shouldn't change for no reason.
> 
> I just think XEN_DOMCTL_DEV_NO_RDM is prone to represent a sort of 
> ending of all flags, and I also add this comment,
> 
> /* Make sure this is always the last. */
> 
>>
>>> and then
>>>
>>>           flag = domctl->u.assign_device.flag;
>>>           if ( flag > XEN_DOMCTL_DEV_NO_RDM )
>>
>> All that needs updating when a new constant gets added is this
>> line.
> 
> This place really isn't one spotlight to take a attention when a new 
> flag is introduced, right? So what I intend to is trying to make sure we 
> don't need to change this.

Anyone adding a new value will need to test their code. And this
testing would not succeed without the range check above having
got adjusted.

>>>           {
>>>               printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
>>>                      "assign %04x:%02x:%02x.%u to dom%d failed "
>>>                      "with unknown rdm flag %x. (%d)\n",
>>>                      seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>>                      d->domain_id, flag, ret);
>>
>> I see absolutely no reason for such a log message.
>>
> 
> Do you mean I should simplify this log message? Or remove completely?

Remove. (And I think you generally need to reduce verbosity of
your additions - please don't mix up what might be useful for your
debugging with what will be useful once the code went in.)

Jan

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr
  2015-06-17 10:28   ` Jan Beulich
@ 2015-06-18  9:23     ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  9:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/17 18:28, Jan Beulich wrote:
>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device(
>>       if ( list_empty(&acpi_drhd_units) )
>>           return -ENODEV;
>>
>> +    seg = pdev->seg;
>> +    bus = pdev->bus;
>> +    /*
>> +     * In rare cases one given rmrr is shared by multiple devices but
>> +     * obviously this would put the security of a system at risk. So
>> +     * we should prevent from this sort of device assignment.
>> +     *
>> +     * TODO: actually we can group these devices which shared rmrr, and
>> +     * then allow all devices within a group to be assigned to same domain.
>> +     */
>> +    for_each_rmrr_device( rmrr, bdf, i )
>> +    {
>> +        if ( rmrr->segment == seg &&
>> +             PCI_BUS(bdf) == bus &&
>> +             PCI_DEVFN2(bdf) == devfn )
>> +        {
>> +            if ( rmrr->scope.devices_cnt > 1 )
>> +            {
>> +                ret = -EPERM;
>> +                printk(XENLOG_G_ERR VTDPREFIX
>> +                       " cannot assign this device with shared RMRR for Dom%d (%d)\n",
>> +                       d->domain_id, ret);
>> +                return ret;
>
> return -EPERM. No need to assign the value to ret, and no need to
> add the constant error code to the log entry. What's missing otoh
> is what "this device" is - you should print SBDF instead.
>

Right.

                 printk(XENLOG_G_ERR VTDPREFIX
                        " cannot assign %04x:%02x:%02x.%u"
                        " with shared RMRR for Dom%d.\n",
                        seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
                        d->domain_id);
                 return -EPERM;


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
  2015-06-18  9:13           ` Jan Beulich
@ 2015-06-18  9:31             ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-18  9:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: yang.z.zhang, andrew.cooper3, kevin.tian, tim, xen-devel

On 2015/6/18 17:13, Jan Beulich wrote:
>>>> On 18.06.15 at 10:48, <tiejun.chen@intel.com> wrote:
>> On 2015/6/18 15:53, Jan Beulich wrote:
>>>>>> On 18.06.15 at 09:14, <tiejun.chen@intel.com> wrote:
>>>> On 2015/6/17 18:11, Jan Beulich wrote:
>>>>>>>> On 11.06.15 at 03:15, <tiejun.chen@intel.com> wrote:
>>>>>> @@ -1577,9 +1578,10 @@ int iommu_do_pci_domctl(
>>>>>>             seg = machine_sbdf >> 16;
>>>>>>             bus = PCI_BUS(machine_sbdf);
>>>>>>             devfn = PCI_DEVFN2(machine_sbdf);
>>>>>> +        flag = domctl->u.assign_device.flag;
>>>>>>
>>>>>>             ret = device_assigned(seg, bus, devfn) ?:
>>>>>> -              assign_device(d, seg, bus, devfn);
>>>>>> +              assign_device(d, seg, bus, devfn, flag);
>>>>>
>>>>> I think you should range check the flag passed to make future
>>>>> extensions possible (and to avoid ambiguity on what out of
>>>>> range values would mean).
>>>>
>>>> Yeah.
>>>>
>>>> Maybe I can set this comment,
>>>>
>>>>        /* Make sure this is always the last. */
>>>>
>>>> #define XEN_DOMCTL_DEV_NO_RDM           2
>>>>
>>>>        uint32_t  flag;   /* flag of assigned device */
>>>
>>> Why would you want to needlessly break the interface is a new
>>> constant gets added? It's a domctl, so it can be changed, but we
>>> shouldn't change for no reason.
>>
>> I just think XEN_DOMCTL_DEV_NO_RDM is prone to represent a sort of
>> ending of all flags, and I also add this comment,
>>
>> /* Make sure this is always the last. */
>>
>>>
>>>> and then
>>>>
>>>>            flag = domctl->u.assign_device.flag;
>>>>            if ( flag > XEN_DOMCTL_DEV_NO_RDM )
>>>
>>> All that needs updating when a new constant gets added is this
>>> line.
>>
>> This place really isn't one spotlight to take a attention when a new
>> flag is introduced, right? So what I intend to is trying to make sure we
>> don't need to change this.
>
> Anyone adding a new value will need to test their code. And this
> testing would not succeed without the range check above having
> got adjusted.

Okay.

>
>>>>            {
>>>>                printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: "
>>>>                       "assign %04x:%02x:%02x.%u to dom%d failed "
>>>>                       "with unknown rdm flag %x. (%d)\n",
>>>>                       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
>>>>                       d->domain_id, flag, ret);
>>>
>>> I see absolutely no reason for such a log message.
>>>
>>
>> Do you mean I should simplify this log message? Or remove completely?
>
> Remove. (And I think you generally need to reduce verbosity of
> your additions - please don't mix up what might be useful for your
> debugging with what will be useful once the code went in.)
>

Yes, I should follow this rule.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-12  6:13               ` Chen, Tiejun
@ 2015-06-18 10:07                 ` Tim Deegan
  2015-06-19  0:37                   ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Tim Deegan @ 2015-06-18 10:07 UTC (permalink / raw)
  To: Chen, Tiejun
  Cc: Tian, Kevin, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, Zhang, Yang Z

At 14:13 +0800 on 12 Jun (1434118407), Chen, Tiejun wrote:
> > could you explain why existing guest_physmap_remove_page can't
> > serve the purpose so you need invent a new identity mapping
> > specific one? For unmapping suppose it should be common regardless
> > of whether it's identity-mapped or not. :-)
> 
> I have some concerns here:
> 
> #1. guest_physmap_remove_page() is a void function without a returning 
> value, so you still need a little change.

I'd be happy with adding a return value to it -- even if other callers
don't check it yet it's better to have errors ignored by callers than
ignored inside the function. :)

> #2. guest_physmap_remove_page() doesn't make readable in such a code 
> context;
> 
> rmrr_identity_mapping()
> {
>      ...
>      guest_physmap_remove_page()
>      ...
> }

I think it's fine there.

In general I'd prefer to avoid the code duplication of another helper
function if we can.

Cheers,

Tim.

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 03/16] xen/vtd: create RMRR mapping
  2015-06-18 10:07                 ` Tim Deegan
@ 2015-06-19  0:37                   ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-19  0:37 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Tian, Kevin, wei.liu2, ian.campbell, andrew.cooper3, Ian.Jackson,
	xen-devel, stefano.stabellini, jbeulich, Zhang, Yang Z

On 2015/6/18 18:07, Tim Deegan wrote:
> At 14:13 +0800 on 12 Jun (1434118407), Chen, Tiejun wrote:
>>> could you explain why existing guest_physmap_remove_page can't
>>> serve the purpose so you need invent a new identity mapping
>>> specific one? For unmapping suppose it should be common regardless
>>> of whether it's identity-mapped or not. :-)
>>
>> I have some concerns here:
>>
>> #1. guest_physmap_remove_page() is a void function without a returning
>> value, so you still need a little change.
>
> I'd be happy with adding a return value to it -- even if other callers
> don't check it yet it's better to have errors ignored by callers than
> ignored inside the function. :)
>
>> #2. guest_physmap_remove_page() doesn't make readable in such a code
>> context;
>>
>> rmrr_identity_mapping()
>> {
>>       ...
>>       guest_physmap_remove_page()
>>       ...
>> }
>
> I think it's fine there.
>
> In general I'd prefer to avoid the code duplication of another helper
> function if we can.
>

Fine to me.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-18  8:05                                   ` Jan Beulich
@ 2015-06-19  2:02                                     ` Chen, Tiejun
  2015-06-23  9:46                                       ` Chen, Tiejun
  0 siblings, 1 reply; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-19  2:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, Kevin Tian, wei.liu2, ian.campbell, andrew.cooper3,
	Ian.Jackson, xen-devel, stefano.stabellini, Yang Z Zhang

On 2015/6/18 16:05, Jan Beulich wrote:
>>>> On 18.06.15 at 09:01, <tiejun.chen@intel.com> wrote:
>> On 2015/6/18 14:29, Jan Beulich wrote:
>>>>>> On 18.06.15 at 08:17, <tiejun.chen@intel.com> wrote:
>>>> On 2015/6/17 17:24, Jan Beulich wrote:
>>>>>>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
>>>>>> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>>>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>>>>>>> Something hits me to generate another idea,
>>>>>>>>
>>>>>>>> #1. Still allocate all devices as before.
>>>>>>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>>>>>>
>>>>>>>> We can skip these bars to keep zero. Then later it would make lookup easily.
>>>>>>>>
>>>>>>>> #3. Need to reallocate these conflicting bars.
>>>>>>>> #3.1 Trying to reallocate them with the remaining resources
>>>>>>>> #3.2 If the remaining resources aren't enough, we need to allocate them
>>>>>>>> from high_mem_resource.
>>>>>>>
>>>>>>> That's possible onyl for 64-bit BARs.
>>>>>>
>>>>>> You're right so this means its not proper to adjust mmio_total to
>>>>>> include conflicting reserved ranges and finally moved all conflicting
>>>>>> bars to high_mem_resource as Kevin suggested previously, so i high
>>>>>> level, we still need to decrease pci_mem_start to populate more RAM to
>>>>>> compensate them as I did, right?
>>>>>
>>>>> You probably should do both: Prefer moving things beyond 4Gb,
>>>>> but if not possible increase the MMIO hole.
>>>>>
>>>>
>>>> I'm trying to figure out a better solution. Perhaps we can allocate
>>>> 32-bit bars and 64-bit bars orderly. This may help us bypass those
>>>> complicated corner cases.
>>>
>>> Dealing with 32- and 64-bit BARs separately won't help at all, as
>>
>> More precisely I'm saying to deal with them orderly.
>>
>>> there may only be 32-bit ones, or the set of 32-bit ones may
>>> already require you to do re-arrangements. Plus, for compatibility
>>
>> Yes but I don't understand they are specific cases to my idea.
>
> Perhaps the problem is that you don't say what "orderly" is supposed
> to mean here?

You're right. Here when "separately" vs "orderly", I should definitely 
use "orderly" to make more understandable.

>
>>> reasons (just like physical machines' BIOSes do) avoiding to place
>>> MMIO above 4Gb where possible is still a goal.
>>
>> So are you sure you see my idea completely? I don't intend to expand pci
>> memory above 4GB.
>>
>> Let me clear this simply,
>>
>> #1. I'm still trying to allocate all 32bit bars from
>> [pci_mem_start,pci_mem_end] as before.
>>
>> #2. But [pci_mem_start,pci_mem_end] mightn't enough cover all 32bit-bars
>> again because of RMRR, right? So I will populate RAM to push downward at
>> cur_pci_mem_start ( = pci_mem_start - reserved device memory), then
>> allocate the remaining 32bit-bars from [cur_pci_mem_start , pci_mem_start]
>>
>> #3. Then I'm still trying to allocate 64bit-bars from
>> [pci_mem_start,pci_mem_end], unless its not enough. This is just going
>> to follow the original.
>>
>> So anything is breaking that goal?
>
> Maybe not, from the above.
>
>> And overall, its same as the original.
>
> If the model follows the original, what's the point of outlining
> supposed changes to the model? All I'm trying to understand is how

Its not same completely, or let me change this statement, "same" -> 
"similar".

> you want to change the current code to accommodate the not
> aligned reserved memory regions. If everything is the same as
> before, this can't have been taken care of. If something is different
> from the original, that's what needs spelling out (and nothing else,
> as that would only clutter the picture).
>
>>> Doesn't look like the right approach to me. As said before, I think
>>
>> Could you see what I'm saying again? I just feel you don't understand
>> what you mean. If you still think I'm wrong let me know.
>
> I think I understand what _I_ mean, but I'm indeed unsure I see
> what _you_ mean. Part of the problem is that you toggle between
> sending (incomplete) patches, code fragments, and discussing the
> approach verbally. I'd much prefer if either you started with a clear
> picture of what you intend to implement, or with an implementation
> that at least attempts to take care of all the corner cases (showing
> that you understand what the corner cases are, which so far I'm
> getting the - perhaps false - impression that you don't).
>

Based on my previous recognition and our recent discussion, my current 
understanding can be summarized as follows;

#1. Goal

MMIO region should exclude all reserved device memory

#2. Requirements

#2.1 Still need to make sure MMIO region is fit all pci devices as before

#2.2 Accommodate the not aligned reserved memory regions

If I'm missing something let me know.

#3. How to

#3.1 Address #2.1

We need to either of populating more RAM, or of expanding more highmem. 
But we should know just 64bit-bar can work with highmem, and as you 
mentioned we also should avoid expanding highmem as possible.

So my implementation is to allocate 32bit-bar and 64bit-bar orderly.

1>. The first allocation round just to 32bit-bar

If we can finish allocating all 32bit-bar, we just go to allocate 
64bit-bar with all remaining resources including low pci memory.

If not, we need to calculate how much RAM should be populated to 
allocate the remaining 32bit-bars, then populate sufficient RAM as
exp_mem_resource to go to the second allocation round 2>.

2>. The second allocation round to the remaining 32bit-bar

We should can finish allocating all 32bit-bar in theory, then go to the 
third allocation round 3>.

3>. The third allocation round to 64bit-bar

We'll try to first allocate from the remaining low memory resource. If 
that isn't enough, we try to expand highmem to allocate for 64bit-bar. 
This process should be same as the original.

I think my pasted patch should represent this framework above.

#3.2 Address #2.2

As you said, we need to accommodate the not aligned reserved memory 
regions. I didn't consider this more previously.

Now I'm trying to rewrite that chunk of code fragment to address this 
point like this,

-        base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
+ reallocate_bar:
+        base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
          bar_data |= (uint32_t)base;
          bar_data_upper = (uint32_t)(base >> 32);
+        /*
+         * We should skip all reserved device memory, but we also need
+         * to check if other smaller bars can be allocated if a mmio hole
+         * exists between resource->base and reserved device memory.
+         */
+        for ( j = 0; j < memory_map.nr_map ; j++ )
+        {
+            if ( memory_map.map[j].type != E820_RAM )
+            {
+                reserved_start = memory_map.map[i].addr;
+                reserved_size = memory_map.map[i].size;
+                reserved_end = reserved_start + reserved_size;
+                if ( check_overlap(base, bar_sz,
+                                   reserved_start, reserved_size) )
+                {
+                    /*
+                     * If a hole exists between base and reserved device
+                     * memory, lets go out simply to try allocate for next
+                     * bar since all bars are in descending order of size.
+                     */
+                    if ( resource->base < reserved_start )
+                        continue;
+                    /*
+                     * If not, we need to move resource->base to
+                     * reserved_end just to reallocate this bar.
+                     */
+                    else
+                    {
+                        resource->base = reserved_end;
+                        goto reallocate_bar;
+                    }
+                }
+            }
+        }
          base += bar_sz;


Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
  2015-06-19  2:02                                     ` Chen, Tiejun
@ 2015-06-23  9:46                                       ` Chen, Tiejun
  0 siblings, 0 replies; 114+ messages in thread
From: Chen, Tiejun @ 2015-06-23  9:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, wei.liu2, ian.campbell, tim, Ian.Jackson, xen-devel,
	stefano.stabellini, andrew.cooper3, Yang Z Zhang

On 2015/6/19 10:02, Chen, Tiejun wrote:
> On 2015/6/18 16:05, Jan Beulich wrote:
>>>>> On 18.06.15 at 09:01, <tiejun.chen@intel.com> wrote:
>>> On 2015/6/18 14:29, Jan Beulich wrote:
>>>>>>> On 18.06.15 at 08:17, <tiejun.chen@intel.com> wrote:
>>>>> On 2015/6/17 17:24, Jan Beulich wrote:
>>>>>>>>> On 17.06.15 at 11:18, <tiejun.chen@intel.com> wrote:
>>>>>>> On 2015/6/17 17:02, Jan Beulich wrote:
>>>>>>>>>>> On 17.06.15 at 10:26, <tiejun.chen@intel.com> wrote:
>>>>>>>>> Something hits me to generate another idea,
>>>>>>>>>
>>>>>>>>> #1. Still allocate all devices as before.
>>>>>>>>> #2. Lookup all actual bars to check if they're conflicting RMRR
>>>>>>>>>
>>>>>>>>> We can skip these bars to keep zero. Then later it would make
>>>>>>>>> lookup easily.
>>>>>>>>>
>>>>>>>>> #3. Need to reallocate these conflicting bars.
>>>>>>>>> #3.1 Trying to reallocate them with the remaining resources
>>>>>>>>> #3.2 If the remaining resources aren't enough, we need to
>>>>>>>>> allocate them
>>>>>>>>> from high_mem_resource.
>>>>>>>>
>>>>>>>> That's possible onyl for 64-bit BARs.
>>>>>>>
>>>>>>> You're right so this means its not proper to adjust mmio_total to
>>>>>>> include conflicting reserved ranges and finally moved all
>>>>>>> conflicting
>>>>>>> bars to high_mem_resource as Kevin suggested previously, so i high
>>>>>>> level, we still need to decrease pci_mem_start to populate more
>>>>>>> RAM to
>>>>>>> compensate them as I did, right?
>>>>>>
>>>>>> You probably should do both: Prefer moving things beyond 4Gb,
>>>>>> but if not possible increase the MMIO hole.
>>>>>>
>>>>>
>>>>> I'm trying to figure out a better solution. Perhaps we can allocate
>>>>> 32-bit bars and 64-bit bars orderly. This may help us bypass those
>>>>> complicated corner cases.
>>>>
>>>> Dealing with 32- and 64-bit BARs separately won't help at all, as
>>>
>>> More precisely I'm saying to deal with them orderly.
>>>
>>>> there may only be 32-bit ones, or the set of 32-bit ones may
>>>> already require you to do re-arrangements. Plus, for compatibility
>>>
>>> Yes but I don't understand they are specific cases to my idea.
>>
>> Perhaps the problem is that you don't say what "orderly" is supposed
>> to mean here?
>
> You're right. Here when "separately" vs "orderly", I should definitely
> use "orderly" to make more understandable.
>
>>
>>>> reasons (just like physical machines' BIOSes do) avoiding to place
>>>> MMIO above 4Gb where possible is still a goal.
>>>
>>> So are you sure you see my idea completely? I don't intend to expand pci
>>> memory above 4GB.
>>>
>>> Let me clear this simply,
>>>
>>> #1. I'm still trying to allocate all 32bit bars from
>>> [pci_mem_start,pci_mem_end] as before.
>>>
>>> #2. But [pci_mem_start,pci_mem_end] mightn't enough cover all 32bit-bars
>>> again because of RMRR, right? So I will populate RAM to push downward at
>>> cur_pci_mem_start ( = pci_mem_start - reserved device memory), then
>>> allocate the remaining 32bit-bars from [cur_pci_mem_start ,
>>> pci_mem_start]
>>>
>>> #3. Then I'm still trying to allocate 64bit-bars from
>>> [pci_mem_start,pci_mem_end], unless its not enough. This is just going
>>> to follow the original.
>>>
>>> So anything is breaking that goal?
>>
>> Maybe not, from the above.
>>
>>> And overall, its same as the original.
>>
>> If the model follows the original, what's the point of outlining
>> supposed changes to the model? All I'm trying to understand is how
>
> Its not same completely, or let me change this statement, "same" ->
> "similar".
>
>> you want to change the current code to accommodate the not
>> aligned reserved memory regions. If everything is the same as
>> before, this can't have been taken care of. If something is different
>> from the original, that's what needs spelling out (and nothing else,
>> as that would only clutter the picture).
>>
>>>> Doesn't look like the right approach to me. As said before, I think
>>>
>>> Could you see what I'm saying again? I just feel you don't understand
>>> what you mean. If you still think I'm wrong let me know.
>>
>> I think I understand what _I_ mean, but I'm indeed unsure I see
>> what _you_ mean. Part of the problem is that you toggle between
>> sending (incomplete) patches, code fragments, and discussing the
>> approach verbally. I'd much prefer if either you started with a clear
>> picture of what you intend to implement, or with an implementation
>> that at least attempts to take care of all the corner cases (showing
>> that you understand what the corner cases are, which so far I'm
>> getting the - perhaps false - impression that you don't).
>>
>
> Based on my previous recognition and our recent discussion, my current
> understanding can be summarized as follows;
>
> #1. Goal
>
> MMIO region should exclude all reserved device memory
>
> #2. Requirements
>
> #2.1 Still need to make sure MMIO region is fit all pci devices as before
>
> #2.2 Accommodate the not aligned reserved memory regions
>
> If I'm missing something let me know.
>
> #3. How to
>
> #3.1 Address #2.1
>
> We need to either of populating more RAM, or of expanding more highmem.
> But we should know just 64bit-bar can work with highmem, and as you
> mentioned we also should avoid expanding highmem as possible.
>
> So my implementation is to allocate 32bit-bar and 64bit-bar orderly.
>
> 1>. The first allocation round just to 32bit-bar
>
> If we can finish allocating all 32bit-bar, we just go to allocate
> 64bit-bar with all remaining resources including low pci memory.
>
> If not, we need to calculate how much RAM should be populated to
> allocate the remaining 32bit-bars, then populate sufficient RAM as
> exp_mem_resource to go to the second allocation round 2>.
>
> 2>. The second allocation round to the remaining 32bit-bar
>
> We should can finish allocating all 32bit-bar in theory, then go to the
> third allocation round 3>.
>
> 3>. The third allocation round to 64bit-bar
>
> We'll try to first allocate from the remaining low memory resource. If
> that isn't enough, we try to expand highmem to allocate for 64bit-bar.
> This process should be same as the original.
>
> I think my pasted patch should represent this framework above.
>
> #3.2 Address #2.2
>
> As you said, we need to accommodate the not aligned reserved memory
> regions. I didn't consider this more previously.
>
> Now I'm trying to rewrite that chunk of code fragment to address this
> point like this,
>
> -        base = (resource->base  + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> + reallocate_bar:
> +        base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>           bar_data |= (uint32_t)base;
>           bar_data_upper = (uint32_t)(base >> 32);
> +        /*
> +         * We should skip all reserved device memory, but we also need
> +         * to check if other smaller bars can be allocated if a mmio hole
> +         * exists between resource->base and reserved device memory.
> +         */
> +        for ( j = 0; j < memory_map.nr_map ; j++ )
> +        {
> +            if ( memory_map.map[j].type != E820_RAM )
> +            {
> +                reserved_start = memory_map.map[i].addr;
> +                reserved_size = memory_map.map[i].size;
> +                reserved_end = reserved_start + reserved_size;
> +                if ( check_overlap(base, bar_sz,
> +                                   reserved_start, reserved_size) )
> +                {
> +                    /*
> +                     * If a hole exists between base and reserved device
> +                     * memory, lets go out simply to try allocate for next
> +                     * bar since all bars are in descending order of size.
> +                     */
> +                    if ( resource->base < reserved_start )
> +                        continue;
> +                    /*
> +                     * If not, we need to move resource->base to
> +                     * reserved_end just to reallocate this bar.
> +                     */
> +                    else
> +                    {
> +                        resource->base = reserved_end;
> +                        goto reallocate_bar;
> +                    }
> +                }
> +            }
> +        }
>           base += bar_sz;
>
>

Just let me include this into next series to seek further review or 
discussion.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2015-06-23  9:46 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-11  1:15 [v3][PATCH 00/16] Fix RMRR Tiejun Chen
2015-06-11  1:15 ` [v3][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map Tiejun Chen
2015-06-11  8:56   ` Tian, Kevin
2015-06-11  1:15 ` [v3][PATCH 02/16] xen/x86/p2m: introduce set_identity_p2m_entry Tiejun Chen
2015-06-11  7:33   ` Jan Beulich
2015-06-11  8:23     ` Chen, Tiejun
2015-06-11  9:23       ` Jan Beulich
2015-06-11  9:25         ` Chen, Tiejun
2015-06-11  9:00   ` Tian, Kevin
2015-06-11  9:18     ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 03/16] xen/vtd: create RMRR mapping Tiejun Chen
2015-06-11  9:14   ` Tian, Kevin
2015-06-11  9:31     ` Chen, Tiejun
2015-06-11 14:07       ` Tim Deegan
2015-06-12  2:43         ` Chen, Tiejun
2015-06-12  5:58           ` Chen, Tiejun
2015-06-12  5:59             ` Tian, Kevin
2015-06-12  6:13               ` Chen, Tiejun
2015-06-18 10:07                 ` Tim Deegan
2015-06-19  0:37                   ` Chen, Tiejun
2015-06-17 10:03   ` Jan Beulich
2015-06-18  6:23     ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy Tiejun Chen
2015-06-11  9:28   ` Tian, Kevin
2015-06-12  6:31     ` Chen, Tiejun
2015-06-12  8:45       ` Jan Beulich
2015-06-12  9:20         ` Chen, Tiejun
2015-06-12  9:26           ` Jan Beulich
2015-06-15  7:39           ` Chen, Tiejun
2015-06-16  2:30       ` Tian, Kevin
2015-06-17 10:11   ` Jan Beulich
2015-06-18  7:14     ` Chen, Tiejun
2015-06-18  7:53       ` Jan Beulich
2015-06-18  8:48         ` Chen, Tiejun
2015-06-18  9:13           ` Jan Beulich
2015-06-18  9:31             ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 05/16] xen: enable XENMEM_memory_map in hvm Tiejun Chen
2015-06-11  9:29   ` Tian, Kevin
2015-06-17 10:14   ` Jan Beulich
2015-06-18  8:53     ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[] Tiejun Chen
2015-06-11  9:38   ` Tian, Kevin
2015-06-12  7:33     ` Chen, Tiejun
2015-06-17 10:22   ` Jan Beulich
2015-06-18  9:13     ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges Tiejun Chen
2015-06-11  9:51   ` Tian, Kevin
2015-06-12  7:53     ` Chen, Tiejun
2015-06-16  5:47       ` Tian, Kevin
2015-06-16  9:29         ` Chen, Tiejun
2015-06-16  9:40           ` Jan Beulich
2015-06-17  7:10             ` Chen, Tiejun
2015-06-17  7:19               ` Jan Beulich
2015-06-17  7:54                 ` Chen, Tiejun
2015-06-17  8:05                   ` Jan Beulich
2015-06-17  8:26                     ` Chen, Tiejun
2015-06-17  8:47                       ` Chen, Tiejun
2015-06-17  9:02                       ` Jan Beulich
2015-06-17  9:18                         ` Chen, Tiejun
2015-06-17  9:24                           ` Jan Beulich
2015-06-18  6:17                             ` Chen, Tiejun
2015-06-18  6:29                               ` Jan Beulich
2015-06-18  7:01                                 ` Chen, Tiejun
2015-06-18  8:05                                   ` Jan Beulich
2015-06-19  2:02                                     ` Chen, Tiejun
2015-06-23  9:46                                       ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table Tiejun Chen
2015-06-11  9:59   ` Tian, Kevin
2015-06-12  8:19     ` Chen, Tiejun
2015-06-16  5:54       ` Tian, Kevin
2015-06-11  1:15 ` [v3][PATCH 09/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map Tiejun Chen
2015-06-11 10:00   ` Tian, Kevin
2015-06-11  1:15 ` [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy Tiejun Chen
2015-06-11 10:02   ` Tian, Kevin
2015-06-12  8:25     ` Chen, Tiejun
2015-06-16  2:28       ` Tian, Kevin
2015-06-12 15:43   ` Wei Liu
2015-06-15  1:12     ` Chen, Tiejun
2015-06-15 14:58       ` Wei Liu
2015-06-16  2:31         ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 11/16] tools: introduce some new parameters to set rdm policy Tiejun Chen
2015-06-12 16:02   ` Wei Liu
2015-06-15  1:19     ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 12/16] tools/libxl: passes rdm reservation policy Tiejun Chen
2015-06-12 16:17   ` Wei Liu
2015-06-15  1:26     ` Chen, Tiejun
2015-06-15 15:00       ` Wei Liu
2015-06-11  1:15 ` [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM Tiejun Chen
2015-06-11 10:19   ` Tian, Kevin
2015-06-12  8:30     ` Chen, Tiejun
2015-06-12 16:39   ` Wei Liu
2015-06-15  1:50     ` Chen, Tiejun
2015-06-15 15:01       ` Wei Liu
2015-06-16  1:44         ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 14/16] tools/libxl: extend XENMEM_set_memory_map Tiejun Chen
2015-06-12 16:43   ` Wei Liu
2015-06-15  2:15     ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 15/16] xen/vtd: enable USB device assignment Tiejun Chen
2015-06-11 10:22   ` Tian, Kevin
2015-06-12  8:59     ` Chen, Tiejun
2015-06-16  5:58       ` Tian, Kevin
2015-06-16  6:09         ` Chen, Tiejun
2015-06-11  1:15 ` [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr Tiejun Chen
2015-06-11 10:25   ` Tian, Kevin
2015-06-12  8:44     ` Chen, Tiejun
2015-06-17 10:28   ` Jan Beulich
2015-06-18  9:23     ` Chen, Tiejun
2015-06-11  7:27 ` [v3][PATCH 00/16] Fix RMRR Jan Beulich
2015-06-11  8:42   ` Tian, Kevin
2015-06-11  9:06     ` Chen, Tiejun
2015-06-11 12:52 ` Tim Deegan
2015-06-12  2:10   ` Chen, Tiejun
2015-06-12  8:04     ` Jan Beulich
2015-06-12  8:20       ` Chen, Tiejun

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.